pandas: str.extract() returns NaN

omqzjyyz  posted on 2022-12-09  in Other

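I need to extract specific values from one column of a pandas DataFrame and then group the other columns by the extracted values. The patterns I am looking for are U1, U2, U3, ..., U9.
First, I selected the rows whose values match: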

filtered = df[df['column1'].str.match(r'(U\s*\d)') == True]
print(filtered['column1'])

The output looks like this:

9370       U 1 / U 2; Gleisdreieck, barrierefreier Ausbau
9371                           U 1 / U 3; Tunnelsanierung
9372     U 1 / U 6; Hallesches Tor; barrierefreier Ausbau
9373     U 1 / U 8; Kottbusser Tor, barrierefreier Ausbau
9374     U 1 / U 9; Kurfürstendamm, barrierefreier Ausbau
                               ...                       
34032               U9, Hansaplatz: barrierefreier Ausbau
34033            U9, Nauener Platz: barrierefreier Ausbau
34034             U9, Schloßstraße: barrierefreier Ausbau
34035               U9, Turmstraße: barrierefreier Ausbau
34250                                                 U25

Now I need to extract U1, ..., U9. I changed the code to:

extracted = df[df['column1'].str.extract(r'(U\s*\d)') == True]

But all I get is NaN:

0        NaN
1        NaN
2        NaN
3        NaN
4        NaN
        ... 
40815    NaN
40816    NaN
40817    NaN
40818    NaN
40819    NaN
vaqhlq81  #1

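Writing df['column1'].str.extract(r'(U\s*\d)') == True implicitly calls pandas.DataFrame.eq, which the documentation describes as:
Get Equal to of dataframe and other, element-wise (binary operator eq).
Returns: DataFrame of bool
No extracted string is equal to True, so every cell of that boolean DataFrame is False, and indexing df with an all-False boolean DataFrame replaces every value with NaN.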
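To make the cause of the NaN values concrete, here is a minimal sketch. The three sample strings are made up for illustration and only stand in for df['column1']; they are not the asker's data.

```python
import pandas as pd

# Hypothetical sample data standing in for df['column1'].
df = pd.DataFrame(
    {"column1": ["U 1 / U 2; Gleisdreieck", "U9, Hansaplatz", "S-Bahn"]}
)

# str.extract returns a DataFrame with one column (named 0) of strings/NaN.
extracted = df["column1"].str.extract(r"(U\s*\d)")

# Comparing with True is an element-wise DataFrame.eq(True).
# No string equals the boolean True, so every cell is False.
mask = extracted == True
print(mask)

# Indexing with a boolean DataFrame keeps the shape of df and
# turns every cell that is not True in the mask into NaN.
result = df[mask]
print(result)
```

The comparison is the problem, not the regular expression: str.match already returns booleans and can be used as a row filter, while str.extract returns the matched text itself.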
To get the first match, simply use pandas.Series.str.extract without the comparison:

extracted = df['column1'].str.extract(r'(U\s*\d)')

Or use pandas.Series.str.findall to get a list of all matches:

extracted = df['column1'].str.findall(r'(U\s*\d)')
# Output (extract and findall side by side):
      extract     findall
9370      U 1  [U 1, U 2]
9371      U 1  [U 1, U 3]
9372      U 1  [U 1, U 6]
9373      U 1  [U 1, U 8]
9374      U 1  [U 1, U 9]
...       ...         ...
34032      U9        [U9]
34033      U9        [U9]
34034      U9        [U9]
34035      U9        [U9]
34250      U2        [U2]

[11 rows x 2 columns]
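
A side-by-side table like the one above can be built as follows. This is only a sketch: the three rows are hypothetical samples modelled on the output in the question, and the variable names are chosen for illustration.

```python
import pandas as pd

# Hypothetical rows modelled on the output shown in the question.
s = pd.Series(
    [
        "U 1 / U 2; Gleisdreieck, barrierefreier Ausbau",
        "U9, Hansaplatz: barrierefreier Ausbau",
        "U25",
    ],
    index=[9370, 34032, 34250],
)

pattern = r"(U\s*\d)"
comparison = pd.DataFrame(
    {
        # expand=False returns a Series holding only the first match per row.
        "extract": s.str.extract(pattern, expand=False),
        # findall returns a list with every match per row.
        "findall": s.str.findall(pattern),
    }
)
print(comparison)
```

Note that "U25" is reported as "U2", because \d matches exactly one digit.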
# Edit:

Based on the comments, you can pick one of the following solutions, depending on what you expect:

import os
import pandas as pd

donatation = pd.read_csv(os.path.join("zuwendungen-berlin.csv"))

# --- To create a Series with the extracted value (#Return: Series)
extracted = donatation['Zweck'].str.extract(r'(U\s*\d)', expand=False)
print(extracted)

# --- To create a new DataFrame with a single column containing (if possible) the extracted value (#Return: DataFrame)
extracted = donatation['Zweck'].str.extract(r'(U\s*\d)')
print(extracted)

# --- To create a new Column with the extracted value in the original dataframe (#Return: DataFrame)
extracted = donatation.assign(New_Zweck=donatation['Zweck'].str.extract(r'(U\s*\d)', expand=False))
print(extracted['New_Zweck'])

# --- To filter the original dataframe without creating a column (#Return: DataFrame)
# No capture group here: str.contains emits a UserWarning for patterns with match groups
extracted = donatation.loc[donatation['Zweck'].str.contains(r'U\s*\d', na=False)]
print(extracted['Zweck'])

# --- To create a column with the extracted value and remove NaN values/rows (#Return: DataFrame)
# expand=False returns a Series, which is the natural value for a single new column
donatation['New_Zweck'] = donatation['Zweck'].str.extract(r'(U\s*\d)', expand=False)
extracted = donatation.loc[donatation['New_Zweck'].notna()]
print(extracted['New_Zweck'])
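
Since the question's goal is to group other columns by the extracted line, here is one possible sketch of that last step. The sample rows, the numeric column "Betrag", the new column "line", and the variable names are assumptions made for illustration and are not taken from the CSV. The pattern is also tightened, as an assumption about what is wanted, so that only U1 to U9 match and "U25" is not counted as "U2".

```python
import pandas as pd

# Hypothetical data; "Betrag" is an assumed numeric column.
donations = pd.DataFrame(
    {
        "Zweck": [
            "U 1 / U 2; Gleisdreieck",
            "U1, Tunnelsanierung",
            "U9, Hansaplatz",
            "U25",
            "Radweg",
        ],
        "Betrag": [100.0, 50.0, 30.0, 7.0, 5.0],
    }
)

# [1-9] limits the line number to U1..U9, and the negative lookahead
# (?!\d) rejects a following digit, so "U25" yields NaN instead of "U2".
line = donations["Zweck"].str.extract(r"(U\s*[1-9])(?!\d)", expand=False)

# Normalise "U 1" and "U1" to the same key before grouping.
donations["line"] = line.str.replace(r"\s+", "", regex=True)

# groupby drops rows whose key is NaN by default.
totals = donations.groupby("line")["Betrag"].sum()
print(totals)
```

Only the first line mentioned in each row is used as the key here. If a row such as "U 1 / U 2" should count for both lines, str.findall followed by DataFrame.explode would be one way to get a row per match.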
