Get the maximum of each pandas row from an object column containing strings and lists

vq8itlhq  posted on 2023-01-15 in Other

I have a DataFrame:

import pandas as pd
import numpy as np

df1 = pd.DataFrame.from_dict(
    {"col1": [0, 0, 0, 0, 0],
    "col2": ["15", [10,15,20], "30", [20, 25], np.nan]})

It looks like this:
| col1 | col2 |
| --- | --- |
| 0 | "15" |
| 0 | [10, 15, 20] |
| 0 | "30" |
| 0 | [20, 25] |
| 0 | NaN |
For col2, I need the maximum of each row, e.g. 15 for the first row and 20 for the second, so that I end up with the following DataFrame:

df2 = pd.DataFrame.from_dict(
    {"col1": [0, 0, 0, 0, 0],
    "col2": [15, 20, 30, 25, np.nan]})

It should look like this:
| col1 | col2 |
| --- | --- |
| 0 | 15 |
| 0 | 20 |
| 0 | 30 |
| 0 | 25 |
| 0 | NaN |
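I tried using a for loop that checks the type of each row in col2, then converts str to int, applies max() to the lists, and leaves the NaN unchanged, but without success. This is what I tried (although I suggest ignoring my attempt):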

col = df1["col2"]
coltypes = []

for i in col:
#get type of each row
    coltype = type(i) 
    coltypes.append(coltype)

df1["coltypes"] = coltypes

#assign value to col3 based on type
df1["col3"] = np.where(df1["coltypes"] == str, df1["col1"].astype(int), 
                      np.where(df1["coltypes"] == list, max(df1["coltypes"]), np.nan))

This gives the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-b8eb266d5519> in <module>
      9 
     10 df1["col3"] = np.where(df1["coltypes"] == str, df1["col1"].astype(int), 
---> 11                       np.where(df1["coltypes"] == list, max(df1["coltypes"]), np.nan))

TypeError: '>' not supported between instances of 'type' and 'type'
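
For context, the traceback points at max(df1["coltypes"]): that call runs once over the whole column of type objects (str, list, float) instead of once per row, and Python cannot order type objects. The sketch below reproduces just that step in isolation; the variable names col2, coltypes, raised and message are illustrative and not part of the original post.

```python
import numpy as np
import pandas as pd

# Same values as col2 in the question.
col2 = pd.Series(["15", [10, 15, 20], "30", [20, 25], np.nan])

# One type object per row: str, list, str, list, float.
coltypes = col2.apply(type)

try:
    # max() tries to compare the type objects themselves with '>'.
    max(coltypes)
    raised = False
    message = ""
except TypeError as exc:
    raised = True
    message = str(exc)

print(raised, message)
```

In other words, the maximum has to be taken inside each cell rather than over the helper column.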

brccelvz  #1

Try explode first, then groupby on the index level with max:

out = df1.col2.explode().groupby(level=0).max()
Out[208]: 
0     15
1     20
2     30
3     25
4    NaN
Name: col2, dtype: object
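
Note that the result above has dtype object: explode leaves the string cells ("15", "30") as strings, so rows 0 and 2 still hold text. If a numeric column is wanted, one possible variant (a sketch, not part of the original answer) is to pass the exploded values through pd.to_numeric before grouping. The names exploded, numeric and out_numeric are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {"col1": [0, 0, 0, 0, 0],
     "col2": ["15", [10, 15, 20], "30", [20, 25], np.nan]})

# One row per list element; the original index is repeated for list rows.
exploded = df1["col2"].explode()

# Parse numeric strings and keep existing numbers; NaN stays NaN.
numeric = pd.to_numeric(exploded)

# Collapse back to one value per original row.
out_numeric = numeric.groupby(level=0).max()

print(out_numeric)
```

Because the column contains a NaN, the result is float64 rather than an integer dtype.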

i5desfxk  #2

Another approach, which may be easier to follow, is to use apply() with a simple function that returns the maximum depending on the type.

import pandas as pd
import numpy as np

df1 = pd.DataFrame.from_dict(
    {"col1": [0, 0, 0, 0, 0],
    "col2": ["15", [10,15,20], "30", [20, 25], np.nan]})

def get_max(x):
    if isinstance(x, list):
        return max(x)
    elif isinstance(x, str):
        return int(x)
    else:
        return x

df1['max'] = df1['col2'].apply(get_max)

print(df1)

The output is:

   col1          col2   max
0     0            15  15.0
1     0  [10, 15, 20]  20.0
2     0            30  30.0
3     0      [20, 25]  25.0
4     0           NaN   NaN

50few1ms  #3

import pandas as pd
import numpy as np
df1 = pd.DataFrame.from_dict(
    {"col1": [0, 0, 0, 0, 0],
    "col2": ["15", [10,15,20], "30", [20, 25], np.nan]})
res = df1['col2']
lis = []
for i in res:
    if isinstance(i, str):
        i = int(i)      # numeric string -> int
    elif isinstance(i, list):
        i = max(i)      # list -> its largest element
    lis.append(i)       # NaN passes through unchanged
df1['col2'] = lis
df1

I think this is the answer you are looking for.


zfciruhq  #4

Here are two more options:

df1['col2'].map(lambda x: max([int(x)]) if type(x)==str else max(x),na_action='ignore')

pd.to_numeric(df1['col2'],errors = 'coerce').fillna(df1['col2'].map(max,na_action='ignore'))

Output:

0    15.0
1    20.0
2    30.0
3    25.0
4     NaN
