pandas: How to remove sentences that contain a specific character?

Posted by sycxhyv7 on 2022-11-20 in Other


I have a DataFrame of article texts. One of the rows contains several sentences with the copyright symbol "©".
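| Article text |
| --- |
| © Aaron Davidson/Getty Images Aaron Davidson/Getty Images Beyond Meat cuts 19% of workforce including disgraced COO, according to a release from the company. CEO Ethan Brown says the plant-based company is "significantly reducing expenses" in an effort to focus on growth. It was one of the best fast food meals I've ever had. 6/25 SLIDES © Mary Meisenzahl/Insider The plant-based protein wasn't meant to be indistinguishable from Taco Bell's signature beef, but "equally cravable," according to Taco Bell's director of global nutrition & sustainability Missy Schaaphok. 22/25 SLIDES © Diana G./Yelp In 2019, Taco Bell North America president Julie Felss Masino publicly said that the chain was relying on its own vegetarian options instead of creating new plant-based meat substitutes. Although it remains unclear exactly how many employees were let go, the company ended 2021 with about 1,100 employees. |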
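I only want to remove the sentences in the row that contain the copyright symbol, and I want to do this for every row in the dataset. I would like the result to look like this: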
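| Article text |
| --- |
| CEO Ethan Brown says the plant-based company is "significantly reducing expenses" in an effort to focus on growth. It was one of the best fast food meals I've ever had. /Yelp In 2019, Taco Bell North America president Julie Felss Masino publicly said that the chain was relying on its own vegetarian options instead of creating new plant-based meat substitutes. Although it remains unclear exactly how many employees were let go, the company ended 2021 with about 1,100 employees. |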
This is what I tried:

for i in df['article_texts']:
    try:
        paragraph = i
        tokens = paragraph.split(".")
        for sentence in tokens:
            if "©" in sentence:
                tokens.remove(sentence)
                final = (".").join(tokens)
                df['summaries'].loc[(df['summaries'] == i)] = final
    except:
        print("Yeah, we good.")

However, I still get this:
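| Article text |
| --- |
| CEO Ethan Brown says the plant-based company is "significantly reducing expenses" in an effort to focus on growth. It was one of the best fast food meals I've ever had. 6/25 SLIDES © Mary Meisenzahl/Insider The plant-based protein wasn't meant to be indistinguishable from Taco Bell's signature beef, but "equally cravable," according to Taco Bell's director of global nutrition & sustainability Missy Schaaphok. 22/25 SLIDES © Diana G./Yelp In 2019, Taco Bell North America president Julie Felss Masino publicly said that the chain was relying on its own vegetarian options instead of creating new plant-based meat substitutes. Although it remains unclear exactly how many employees were let go, the company ended 2021 with about 1,100 employees. |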
What am I doing wrong?

pandas

Source: https://stackoverflow.com/questions/74448247/how-to-remove-sentences-with-a-specific-character

2 Answers


xqk2d5yq1#

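I'd like to expand a little on the other answer. Any problem that requires transforming the values of a column is a good candidate for .map(). I think it makes the code more readable.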

def remove_sentences_with_copyright(paragraph):
    return '.'.join(sentence for sentence in paragraph.split(".") if "©" not in sentence)

df['summaries'] = df['article_texts'].map(remove_sentences_with_copyright)
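To make the `.map()` call above concrete, here is a minimal sketch on a tiny DataFrame. The sample rows are made up for illustration, and the trailing `.str.strip()` is an addition (not part of the answer) that trims the leading space left behind when the first sentence is dropped.

```python
import pandas as pd

def remove_sentences_with_copyright(paragraph):
    # Keep only the "."-separated pieces that do not contain the symbol.
    return '.'.join(sentence for sentence in paragraph.split(".") if "©" not in sentence)

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({
    "article_texts": [
        "© Photo credit. Real sentence one. Real sentence two.",
        "No symbol here. Still fine.",
    ]
})

# map() calls the function once per cell and returns a new Series.
df["summaries"] = df["article_texts"].map(remove_sentences_with_copyright).str.strip()

print(df["summaries"].tolist())
```

If the column may contain missing values, `Series.map` accepts `na_action="ignore"`, which passes them through without calling the function.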

2022-11-20

anauzrmj2#

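Here is a simple process:

1. Replace © with a mask string.
2. Split the string on ".".
3. Remove the unwanted elements with a list comprehension.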
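A side note on the last step: a list comprehension builds a new list instead of deleting items in place. That matters because calling `remove()` on a list while a `for` loop is iterating over it, as the loop in the question does, can skip elements. The sketch below uses a made-up four-item list to show the effect; it is an illustration of the pitfall, not a full diagnosis of the question's code.

```python
# Hypothetical sample: two of the four pieces contain "©", back to back.
tokens = ["a ©", "b ©", "c", "d"]

# Removing while iterating: once "a ©" is removed the list shifts left,
# so the loop's next position lands on "c" and "b ©" is never examined.
buggy = list(tokens)
for sentence in buggy:
    if "©" in sentence:
        buggy.remove(sentence)

# Building a new list visits every element exactly once.
clean = [sentence for sentence in tokens if "©" not in sentence]

print(buggy)  # ['b ©', 'c', 'd']
print(clean)  # ['c', 'd']
```

Separately, the question's loop writes back through the mask `df['summaries'] == i` after each removal. If `summaries` started out as a copy of `article_texts` (an assumption, since the question does not say), that mask may match only the first time, which would be consistent with only the first © sentence disappearing.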

text ="""© Aaron Davidson/Getty Images Aaron Davidson/Getty Images Beyond Meat cuts 19% of workforce including disgraced COO, according to a release from the company. CEO Ethan Brown says the plant-based company is 'significantly reducing expenses' in an effort to focus on growth. It was one of the best fast food meals I've ever had. 6/25 SLIDES © Mary Meisenzahl/Insider The plant-based protein wasn't meant to be indistinguishable from Taco Bell's signature beef, but "equally cravable," according to Taco Bell's director of global nutrition & sustainability Missy Schaaphok. 22/25 SLIDES © Diana G./Yelp In 2019, Taco Bell North America president Julie Felss Masino publicly said that the chain was relying on
its own vegetarian options instead of creating new plant-based meat substitutes. Although it remains unclear exactly how
many employees were let go, the company ended 2021 with about 1,100 employees."""
my_list = text.replace("©", 'mask')
my_list = my_list.split(".")

mask = ['mask']

filtered = ([el for el in my_list if not any(ignore in el for ignore in mask)])
print(filtered)

Output list:

[" CEO Ethan Brown says the plant-based company is 'significantly reducing expenses' in an effort to focus on growth", " It was one of the best fast food meals I've ever had", '/Yelp In 2019, Taco Bell North America president Julie Felss Masino publicly said that the chain was relying on\nits own vegetarian options instead of creating new plant-based meat substitutes', ' Although it remains unclear exactly how\nmany employees were let go, the company ended 2021 with about 1,100 employees', '']

Join the list:

filtered ='. '.join(filtered)

Output:

CEO Ethan Brown says the plant-based company is 'significantly reducing 
expenses' in an effort to focus on growth.  It was one of the best fast 
food meals I've ever had. /Yelp In 2019, Taco Bell North America 
president Julie Felss Masino publicly said that the chain was relying on
its own vegetarian options instead of creating new plant-based meat 
substitutes.  Although it remains unclear exactly how
many employees were let go, the company ended 2021 with about 1,100 
employees.
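The output above still contains the fragment "/Yelp", because splitting on every "." cuts "Diana G./Yelp" in two. If that leftover is unwanted, one possible variation is to split only where sentence-ending punctuation is followed by whitespace. The sketch below is an assumption-laden alternative, not the answer's method: the function name `drop_sentences_containing` and the sample text are hypothetical, and the rule still splits after abbreviations such as "U.S." when a space follows.

```python
import re

def drop_sentences_containing(text, symbol="©"):
    # Split after ".", "!" or "?" only when whitespace follows,
    # so a credit such as "G./Site" stays inside one sentence.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Keep the sentences without the symbol and rejoin with single spaces.
    return " ".join(s for s in sentences if symbol not in s)

# Hypothetical sample text shaped like the article in the question.
sample = (
    "© Jane Doe/Agency Some caption. First real sentence. "
    "2/5 SLIDES © Pat G./Site In 2019 something happened. Last sentence."
)

print(drop_sentences_containing(sample))  # First real sentence. Last sentence.
```

Note the trade-off: this drops the whole sentence that follows a mid-text credit (here "In 2019 something happened."), whereas the "."-based split keeps it along with the stray "/Site" prefix. Which behavior is preferable depends on the data.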

2022-11-20
