For example, given the following DataFrame (note that the original dtype of this column is dtype('O')):
df = pd.DataFrame({'product_description': ["CUTLERY HVY DUTY FORKS", "XYZ DISP LQD SOAP", "ABCD FOOD STRG CNTNR"]})
how can I efficiently identify and separate the abbreviations to produce a result like this:
   product_description       abbreviations
0  CUTLERY HVY DUTY FORKS    [HVY]
1  XYZ DISP LQD SOAP         [XYZ,DISP,LQD]
2  ABCD FOOD STRG CNTNR      [ABCD,STRG,CNTNR]
The goal is to then convert these abbreviations into full words.
I tried this:
import pandas as pd
import re

df = pd.DataFrame({'product_description': ["CUTLERY HVY DUTY FORKS", "XYZ DISP LQD SOAP", "ABCD FOOD STRG CNTNR"]})

def extract_abbreviations(description):
    abbreviation_pattern = r'\b[A-Z]{2,}(?![a-z])'  # Updated regular expression pattern to match abbreviations
    abbreviations = re.findall(abbreviation_pattern, description)
    return abbreviations

df['abbreviations'] = df['product_description'].apply(extract_abbreviations)
print(df)
But what I get is:
   product_description       abbreviations
0  CUTLERY HVY DUTY FORKS    [CUTLERY,HVY,DUTY,FORKS]
1  XYZ DISP LQD SOAP         [XYZ,DISP,LQD,SOAP]
2  ABCD FOOD STRG CNTNR      [ABCD,FOOD,STRG,CNTNR]
Any help is much appreciated. Thank you.
1 Answer
If you have a list of the abbreviations, abb, such as
['XYZ', 'DISP', 'LQD', 'ABCD', 'STRG', 'CNTNR', 'HVY', 'SOAP']
you should be able to apply the following logic to get the desired result:
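A minimal sketch of one way to apply such a list, not necessarily the exact code this answer had in mind: split each description on whitespace and keep only the words found in abb. The names abb_set and extract_abbreviations are illustrative choices, and the sketch assumes the descriptions are space-separated, uppercase words that match the list entries exactly.

```python
import pandas as pd

df = pd.DataFrame({'product_description': ["CUTLERY HVY DUTY FORKS", "XYZ DISP LQD SOAP", "ABCD FOOD STRG CNTNR"]})

# Known abbreviations, copied from the list in the answer.
abb = ['XYZ', 'DISP', 'LQD', 'ABCD', 'STRG', 'CNTNR', 'HVY', 'SOAP']
abb_set = set(abb)  # a set gives fast membership checks (illustrative name)

def extract_abbreviations(description):
    # Keep only the whitespace-separated words that appear in the known list,
    # preserving their order in the description.
    return [word for word in description.split() if word in abb_set]

df['abbreviations'] = df['product_description'].apply(extract_abbreviations)
print(df)
```

A regular expression alone cannot separate abbreviations from ordinary words here, because every word in the column is uppercase, which is why a lookup list is used. Note that 'SOAP' is in the list as given, so it is returned for the second row; remove it from abb to match the output requested in the question.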