pandas 使用包含运算符添加新列

bvuwiixz  于 2023-03-06  发布在  其他
关注(0)|答案(1)|浏览(137)

我有一个从excel文件导入的数据框,我们可以说不同类型的费用。这个数据框有一个名为“概念”的列。我想根据费用类型添加两个新列。为此,我确定了概念中的“关键词”。例如〉

import pandas as pd

dictionary = {
      "DATE" : ['12/02/2023', '02/01/2023', '02/01/2023', '10/02/2023'],    
      "CONCEPT" : ['Supermarket','Restaurant', 'petrol station', 'decathlon'],
      "EUR" : [-150,-50,-45,-95]
          }
df = pd.DataFrame(dictionary)

df['EXPENSE TYPE'] = pd.Series(dtype="string")

df['EXPENSE TYPE'][df['CONCEPT'].str.upper().str.contains('SUPERMARKET')] = 'FOOD'
df['EXPENSE TYPE'][df['CONCEPT'].str.upper().str.contains('RESTAURANT')] = 'FOOD'
df['EXPENSE TYPE'][df['CONCEPT'].str.upper().str.contains('PETROL')] = 'GAS'
df['EXPENSE TYPE'][df['CONCEPT'].str.upper().str.contains('DECATHLON')] = 'CLOTHES'

使用这段代码,我得到了预期的输出〉

DATE         CONCEPT  EUR EXPENSE TYPE
0  12/02/2023     Supermarket -150         FOOD
1  02/01/2023      Restaurant  -50         FOOD
2  02/01/2023  petrol station  -45          GAS
3  10/02/2023       decathlon  -95      CLOTHES

但是,我想添加两个字段而不是一个。因此,当我识别单词“Supermarket”时,我添加了“Expense TYPE”和“SUBCATEGORY”,例如:

DATE         CONCEPT  EUR EXPENSE TYPE  SUBCATEGORY
0  12/02/2023     Supermarket -150         FOOD         HOME
1  02/01/2023      Restaurant  -50         FOOD  OUT OF HOME
2  02/01/2023  petrol station  -45          GAS       DIESEL
3  10/02/2023       decathlon  -95      CLOTHES        SPORT

如何添加两个新列而不是一个?

lqfhib0f

lqfhib0f1#

您可以将.apply与适当的函数一起使用:

import pandas as pd

dictionary = {
      "DATE" : ['12/02/2023', '02/01/2023', '02/01/2023', '10/02/2023'],    
      "CONCEPT" : ['Supermarket','Restaurant', 'petrol station', 'decathlon'],
      "EUR" : [-150,-50,-45,-95]
          }
df = pd.DataFrame(dictionary)

df['EXPENSE TYPE'] = pd.Series(dtype="string")
df['SUBCATEGORY'] = pd.Series(dtype="string")

def func(row):
    s = row['CONCEPT'].upper()
    if 'SUPERMARKET' in s:
        r1 = 'FOOD'
        r2 = 'HOME'
    elif 'RESTAURANT' in s:
        r1 = 'FOOD'
        r2 = 'OUT OF HOME'
    elif 'PETROL' in s:
        r1 = 'GAS'
        r2 = 'DIESEL'
    elif 'DECATHLON' in s:
        r1 = 'CLOTHES'
        r2 = 'SPORT'
    else:
        r1 = 'UNKNOWN'
        r2 = 'UNKNOWN'
    row['EXPENSE TYPE'] = r1
    row['SUBCATEGORY'] = r2
    return row

df = df.apply(func, axis=1)

print(df)

它给出了

DATE         CONCEPT  EUR EXPENSE TYPE  SUBCATEGORY
0  12/02/2023     Supermarket -150         FOOD         HOME
1  02/01/2023      Restaurant  -50         FOOD  OUT OF HOME
2  02/01/2023  petrol station  -45          GAS       DIESEL
3  10/02/2023       decathlon  -95      CLOTHES        SPORT

相关问题