Python: Why can't I use np.where and str.contains to assign categories?

Posted by wtlkbnrh on 2023-09-29 in Python

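I'm trying to assign each description to a category, but I keep getting an error. I'm new to Python, so any help is appreciated!

| | Date | Description | Amount | Category |
| --|--|--|--|--|
| 0 | 8/11/2023 | vending machine | -2.40 | Unassigned |
| 1 | 8/14/2023 | christianacare | -155.00 | Unassigned |
| 2 | 8/14/2023 | shakeshack | -33.56 | Unassigned |
| 3 | 8/14/2023 | capitalone | -317.14 | Unassigned |
| 4 | 8/15/2023 | nordproducts | -18.28 | Unassigned |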

#Assign transactions to the correct category

# Bills
df['Category'] = np.where(df['Description'].str.contains('christianacare',
  'capitalone', 'usaa', 'zelle'), 'Bills', df['Category'])
# Food
df['Category'] = np.where(df['Description'].str.contains('vending machine',
  'tacobell', 'shakeshack', 'univlqr', 'gridiron', 'starbucks'), 'Food', df['Category'])
# Shopping
df['Category'] = np.where(df['Description'].str.contains('amazon'), 'Shopping', df['Category'])
# Services
df['Category'] = np.where(df['Description'].str.contains('coursera',
  'empowerme', 'albert', 'apple', 'peacock', 'nordproducts', 'patreon'), 'Services', df['Category'])
# Entertainment
df['Category'] = np.where(df['Description'].str.contains('playstation',
  'microsoft'), 'Entertainment', df['Category'])
# Transport
df['Category'] = np.where(df['Description'].str.contains('parkmobile'), 'Transport', df['Category'])

Error:

TypeError                                 Traceback (most recent call last)
Cell In[15], line 5
      1 #Assign transactions to the correct category
      2 
      3 # Bills
----> 5 df['Category'] = np.where(df['Description'].str.contains('christianacare',
      6   'capitalone', 'usaa', 'zelle'), 'Bills', df['Category'])
      8 # Food
     10 df['Category'] = np.where(df['Description'].str.contains('vending machine',
     11   'tacobell', 'shakeshack', 'univlqr', 'gridiron', 'starbucks'), 'Food', df['Category'])

File ~\anaconda3\Lib\site-packages\pandas\core\strings\accessor.py:129, in forbid_nonstring_types.<locals>._forbid_nonstring_types.<locals>.wrapper(self, *args, **kwargs)
    124     msg = (
    125         f"Cannot use .str.{func_name} with values of "
    126         f"inferred dtype '{self._inferred_dtype}'."
    127     )
    128     raise TypeError(msg)
--> 129 return func(self, *args, **kwargs)

File ~\anaconda3\Lib\site-packages\pandas\core\strings\accessor.py:1260, in StringMethods.contains(self, pat, case, flags, na, regex)
   1252 if regex and re.compile(pat).groups:
   1253     warnings.warn(
   1254         "This pattern is interpreted as a regular expression, and has "
   1255         "match groups. To actually get the groups, use str.extract.",
   1256         UserWarning,
   1257         stacklevel=find_stack_level(),
   1258     )
-> 1260 result = self._data.array._str_contains(pat, case, flags, na, regex)
   1261 return self._wrap_result(result, fill_value=na, returns_string=False)

File ~\anaconda3\Lib\site-packages\pandas\core\strings\object_array.py:122, in ObjectStringArrayMixin._str_contains(self, pat, case, flags, na, regex)
    119     if not case:
    120         flags |= re.IGNORECASE
--> 122     pat = re.compile(pat, flags=flags)
    124     f = lambda x: pat.search(x) is not None
    125 else:

File ~\anaconda3\Lib\re\__init__.py:227, in compile(pattern, flags)
    225 def compile(pattern, flags=0):
    226     "Compile a regular expression pattern, returning a Pattern object."
--> 227     return _compile(pattern, flags)

File ~\anaconda3\Lib\re\__init__.py:287, in _compile(pattern, flags)
    285 if not _compiler.isstring(pattern):
    286     raise TypeError("first argument must be string or compiled pattern")
--> 287 if flags & T:
    288     import warnings
    289     warnings.warn("The re.TEMPLATE/re.T flag is deprecated "
    290               "as it is an undocumented flag "
    291               "without an obvious purpose. "
    292               "Don't use it.",
    293               DeprecationWarning)

TypeError: unsupported operand type(s) for &: 'str' and 'RegexFlag'

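I tried using str.match instead of str.contains. I want each description to be assigned to a specific category, so that the Category column is filled in automatically when I look at the dataset.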

oxalkeyp (answer #1)

One possible solution is to use .isin:

mapping = {
    'Bills': {'capitalone', 'christianacare', 'usaa', 'zelle'},
    'Food': {'gridiron', 'shakeshack', 'starbucks', 'tacobell', 'univlqr', 'vending machine'},
    'Shopping': {'amazon'},
    'Services': {'albert', 'apple', 'coursera', 'empowerme', 'nordproducts', 'patreon', 'peacock'},
    'Entertainment': {'microsoft', 'playstation'},
    'Transport': {'parkmobile'}
}

for k, v in mapping.items():
    df["Category"] = np.where(df["Description"].isin(v), k, df["Category"])

print(df)

Output:

        Date      Description  Amount  Category
0  8/11/2023  vending machine   -2.40      Food
1  8/14/2023   christianacare -155.00     Bills
2  8/14/2023       shakeshack  -33.56      Food
3  8/14/2023       capitalone -317.14     Bills
4  8/15/2023     nordproducts  -18.28  Services
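As for why the original code fails: str.contains accepts a single pattern as its first argument. In the signature shown in the traceback, contains(self, pat, case, flags, na, regex), any extra positional strings bind to case, flags and na. In the Bills line, 'usaa' therefore arrives as flags, and re.compile fails at `flags & T` with the TypeError above.

Note also that .isin only matches when the whole cell equals one of the keywords. If the real descriptions contain extra text around the keyword, substring matching may be a better fit. Below is a minimal sketch of that variant, not a definitive implementation. It assumes the sample rows from the question, the name `keywords_by_category` is made up for illustration, and the keywords are assumed to contain no regex special characters (if they might, each one could be passed through re.escape first).

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question's table.
df = pd.DataFrame({
    "Date": ["8/11/2023", "8/14/2023", "8/14/2023", "8/14/2023", "8/15/2023"],
    "Description": ["vending machine", "christianacare", "shakeshack",
                    "capitalone", "nordproducts"],
    "Amount": [-2.40, -155.00, -33.56, -317.14, -18.28],
    "Category": ["Unassigned"] * 5,
})

# Hypothetical name; the keywords are the ones listed in the question.
keywords_by_category = {
    "Bills": ["christianacare", "capitalone", "usaa", "zelle"],
    "Food": ["vending machine", "tacobell", "shakeshack", "univlqr",
             "gridiron", "starbucks"],
    "Shopping": ["amazon"],
    "Services": ["coursera", "empowerme", "albert", "apple", "peacock",
                 "nordproducts", "patreon"],
    "Entertainment": ["playstation", "microsoft"],
    "Transport": ["parkmobile"],
}

for category, keywords in keywords_by_category.items():
    # Join the keywords into ONE regex alternation, e.g. "usaa|zelle",
    # because str.contains takes a single pattern, not several strings.
    # Assumption: keywords are plain text without regex metacharacters.
    pattern = "|".join(keywords)
    # case=False ignores letter case; na=False treats missing values as no match.
    matches = df["Description"].str.contains(pattern, case=False, na=False)
    df["Category"] = np.where(matches, category, df["Category"])

print(df)
```

With this approach a description only needs to contain a keyword somewhere, so later categories can overwrite earlier ones if a description happens to match keywords from more than one category.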
