Python: select tuples based on the number of items

Posted by qfe3c7zg on 2022-12-28 in Python


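I'm working on an NLP task. I have already tokenized my data, so each row is now a tuple of words. Now I want to select the rows that contain more than 4 items (words). Here is a sample of my dataset.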

ID                                content
 0         [yes, no, check, sample, word]
 1                           [never, you]
 2 [non, program, more, link, draft, ask]
 3                                 [able]
 4       [to, ask, you, other, man, will]

I want to create a new dataset that contains rows 0, 2, and 4 (the ones with more than 4 items). Here is the expected result.

ID                                content
 0         [yes, no, check, sample, word]
 2 [non, program, more, link, draft, ask]
 4       [to, ask, you, other, man, will]

This is the code I have so far...

df_new = df.loc[df.content.map(len).ne(>4)]

python

Source: https://stackoverflow.com/questions/74936069/select-tuple-based-on-number-of-item

2 Answers


xe55xuns1#

You can use pandas.Series.gt.

>>> import pandas as pd
>>> 
>>> df = pd.DataFrame({'ID': [0, 1], 'content': [['yes', 'no', 'check', 'sample', 'word'], ['able']]})
>>> df
   ID                         content
0   0  [yes, no, check, sample, word]
1   1                          [able]
>>> df[df.content.map(len).gt(4)]
   ID                         content
0   0  [yes, no, check, sample, word]

2022-12-28

k97glaaz2#

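You can use ge (greater than or equal to) instead of ne. Since the goal is strictly more than 4 items, the threshold should be 5; ge(4) would also keep rows with exactly 4 items:

import pandas as pd

df = pd.DataFrame({
    'content': [
        ['yes', 'no', 'check', 'sample', 'word'],
        ['never', 'you'],
        ['non', 'program', 'more', 'link', 'draft', 'ask'],
        ['able'],
        ['to', 'ask', 'you', 'other', 'man', 'will']],
})

# Keep rows whose list has at least 5 items, i.e. more than 4.
df_new = df.loc[df.content.map(len).ge(5)]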

print(df_new)
"""
                                  content
0          [yes, no, check, sample, word]
2  [non, program, more, link, draft, ask]
4        [to, ask, you, other, man, will]
"""

For details, see: https://pandas.pydata.org/docs/reference/api/pandas.Series.ge.html
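The choice between gt and ge only matters at the boundary, which the question's sample never hits because no row has exactly 4 items. A minimal sketch of the difference, using a small made-up DataFrame (the rows below are hypothetical and not taken from the question):

```python
import pandas as pd

# Hypothetical rows with 4, 5, and 1 items to exercise the boundary case.
df = pd.DataFrame({
    'content': [
        ['a', 'b', 'c', 'd'],
        ['a', 'b', 'c', 'd', 'e'],
        ['a'],
    ],
})

lengths = df['content'].map(len)

more_than_4 = df[lengths.gt(4)]  # strictly more than 4 items
at_least_4 = df[lengths.ge(4)]   # also keeps the row with exactly 4 items
at_least_5 = df[lengths.ge(5)]   # equivalent to gt(4) for integer lengths

print(more_than_4.index.tolist())
print(at_least_4.index.tolist())
print(at_least_5.index.tolist())
```

Because list lengths are integers, gt(4) and ge(5) select the same rows; either one matches "more than 4 items".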

2022-12-28
