Get the max number of each pandas row from an object column with strings and lists

Posted by vq8itlhq on 2023-01-15 in Other


I have a DataFrame:

import pandas as pd
import numpy as np

df1 = pd.DataFrame.from_dict(
    {"col1": [0, 0, 0, 0, 0],
    "col2": ["15", [10,15,20], "30", [20, 25], np.nan]})

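which looks like this:
| col1 | col2 |
| --- | --- |
| 0 | "15" |
| 0 | [10, 15, 20] |
| 0 | "30" |
| 0 | [20, 25] |
| 0 | NaN |
For col2, I need the maximum of each row, e.g. 15 for the first row and 20 for the second, so that I end up with the following DataFrame: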

df2 = pd.DataFrame.from_dict(
    {"col1": [0, 0, 0, 0, 0],
    "col2": [15, 20, 30, 25, np.nan]})

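It should look like this:
| col1 | col2 |
| --- | --- |
| 0 | 15 |
| 0 | 20 |
| 0 | 30 |
| 0 | 25 |
| 0 | NaN |
I tried a for loop that checks the type of each row in col2, then converts str to int, applies max() to the lists, and leaves the NaN as it is, but without success. This is what I tried (although I suggest ignoring my attempt):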

col = df1["col2"]
coltypes = []

for i in col:
#get type of each row
    coltype = type(i) 
    coltypes.append(coltype)

df1["coltypes"] = coltypes

#assign value to col3 based on type
df1["col3"] = np.where(df1["coltypes"] == str, df1["col1"].astype(int), 
                      np.where(df1["coltypes"] == list, max(df1["coltypes"]), np.nan))

This gives the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-b8eb266d5519> in <module>
      9 
     10 df1["col3"] = np.where(df1["coltypes"] == str, df1["col1"].astype(int), 
---> 11                       np.where(df1["coltypes"] == list, max(df1["coltypes"]), np.nan))

TypeError: '>' not supported between instances of 'type' and 'type'
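A likely reading of the traceback above: `max(df1["coltypes"])` is evaluated once, over the whole column of type objects, before `np.where` ever sees it, and Python cannot order classes such as `str` and `list`. The sketch below reproduces the error in isolation; the `coltypes` Series is built by hand for illustration and is assumed to match the types in the question's column.

```python
import pandas as pd

# Hand-built stand-in for df1["coltypes"]: one type object per row
coltypes = pd.Series([str, list, str, list, float])

try:
    # max() iterates over the values and compares type objects with '>'
    max(coltypes)
except TypeError as exc:
    message = str(exc)

print(message)
```

Because `max` is called on the column as a whole, it would not produce a per-row maximum even if the comparison were allowed, which is why the answers work element by element instead.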

pandas

Source: https://stackoverflow.com/questions/70929680/get-max-number-of-each-pandas-row-from-object-column-with-strings-and-lists

4 Answers

brccelvz1#

Let's try explode first, then groupby with max:

out = df1.col2.explode().groupby(level=0).max()
Out[208]: 
0     15
1     20
2     30
3     25
4    NaN
Name: col2, dtype: object
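The result above has dtype object, which suggests that values such as "15" are still strings after `explode`. If a numeric column is needed, one option is to convert the exploded values with `pd.to_numeric` before grouping. This is a sketch under that assumption, not part of the original answer; `out_numeric` is a name chosen here for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame.from_dict(
    {"col1": [0, 0, 0, 0, 0],
     "col2": ["15", [10, 15, 20], "30", [20, 25], np.nan]})

# explode() gives every list element its own row and repeats the index;
# strings and NaN are scalars, so they pass through unchanged
exploded = df1["col2"].explode()

# Convert the strings to numbers, then take the max per original row
out_numeric = pd.to_numeric(exploded).groupby(level=0).max()

print(out_numeric)
```

Converting before the `groupby` also avoids comparing strings with integers, should a single row ever contain both.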


i5desfxk2#

Another approach, possibly easier to follow, is to use apply() with a simple function that returns the max depending on the type.

import pandas as pd
import numpy as np

df1 = pd.DataFrame.from_dict(
    {"col1": [0, 0, 0, 0, 0],
    "col2": ["15", [10,15,20], "30", [20, 25], np.nan]})

def get_max(x):
    if isinstance(x, list):
        return max(x)
    elif isinstance(x, str):
        return int(x)
    else:
        return x

df1['max'] = df1['col2'].apply(get_max)

print(df1)

Output:

   col1          col2   max
0     0            15  15.0
1     0  [10, 15, 20]  20.0
2     0            30  30.0
3     0      [20, 25]  25.0
4     0           NaN   NaN
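The `max` column above comes out as float (15.0, 20.0, ...) because NaN forces a float dtype. If integer values are preferred, one possibility is pandas' nullable `Int64` dtype, which stores the missing value as `<NA>`. The following is a sketch of that idea rather than part of the original answer; `max_int` is a hypothetical column name.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame.from_dict(
    {"col1": [0, 0, 0, 0, 0],
     "col2": ["15", [10, 15, 20], "30", [20, 25], np.nan]})

def get_max(x):
    # Lists: largest element; strings: parse as int; anything else (NaN): unchanged
    if isinstance(x, list):
        return max(x)
    elif isinstance(x, str):
        return int(x)
    else:
        return x

# apply() yields a float64 column because of the NaN;
# astype("Int64") switches to the nullable integer dtype
df1["max_int"] = df1["col2"].apply(get_max).astype("Int64")

print(df1)
```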


50few1ms3#

import pandas as pd
import numpy as np
df1 = pd.DataFrame.from_dict(
    {"col1": [0, 0, 0, 0, 0],
    "col2": ["15", [10,15,20], "30", [20, 25], np.nan]})
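res = df1['col2']
lis = []
for i in res:
    # A string holds a single number: convert it to int
    if type(i) == str:
        i = int(i)
    # A list holds several numbers: keep only the largest
    elif type(i) == list:
        i = max(i)
    # Anything else (here NaN) is appended unchanged
    lis.append(i)
df1['col2'] = lis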
df1

I think this is the answer you are looking for.


zfciruhq4#

Here are two more options:

df1['col2'].map(lambda x: max([int(x)]) if type(x)==str else max(x),na_action='ignore')

or

pd.to_numeric(df1['col2'],errors = 'coerce').fillna(df1['col2'].map(max,na_action='ignore'))

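To illustrate what `na_action='ignore'` does in the first option: `Series.map` skips missing values, so the mapped function only ever sees strings and lists and needs no NaN branch. A minimal sketch, where `row_max` is a hypothetical helper name introduced here:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame.from_dict(
    {"col1": [0, 0, 0, 0, 0],
     "col2": ["15", [10, 15, 20], "30", [20, 25], np.nan]})

def row_max(x):
    # Only called for non-missing values thanks to na_action='ignore'
    return int(x) if isinstance(x, str) else max(x)

# NaN rows are passed through untouched instead of being handed to row_max
result = df1["col2"].map(row_max, na_action="ignore")

print(result)
```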
