pandas: Extract specific words from a DataFrame - Python

Posted by zdwk9cvp on 2022-12-28 in Python


1. The first DataFrame I have looks like this:
| String1 |
| --- |
| Table 671usa50452.tab has been created as of the process date (12-19-22). |
| Table 643usa50552.tab has been created as of the process date (12-19-22). |
| Table 681usa50532.tab has been created as of the process date (12-19-22). |
| Table 621usa56452.tab has been created as of the process date (12-19-22). |
| Table 547usa67452.tab has been created as of the process date (12-19-22). |
From every row I want to extract the account (the token containing 'usa') and the date, so that I end up with something like this:
| String1 | Account | Date |
| --- | --- | --- |
| Table 671usa50452.tab has been created as of the process date (12-19-22). | 671usa50452 | 12-19-22 |
| Table 643usa50552.tab has been created as of the process date (12-19-22). | 643usa50552 | 12-19-22 |
| Table 681usa50532.tab has been created as of the process date (12-19-22). | 681usa50532 | 12-19-22 |
| Table 621usa56452.tab has been created as of the process date (12-19-22). | 621usa56452 | 12-19-22 |
| Table 547usa67452.tab has been created as of the process date (12-19-22). | 547usa67452 | 12-19-22 |
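I have been trying the following, but the information does not end up in the columns of the new DataFrame:
[code snippet not preserved in the source]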
2. The second DataFrame is similar:
| String2 |
| --- |
| 3203usa34088: assetUSA1/asd011245 |
| 3203usa34088: assetUSA2/ghf023345 |
| 3203usa34088: assetUSA3/hgf012735 |
| 3203usa34088: assetUSA4/wet012455 |
| 3203usa34088: assetUSA5/nbj012245 |
I would like to get the following:
| String2 | Account2 |
| --- | --- |
| 3200usa34088: assetUSA1/asd011245 | 3200usa34088 |
| 3201usa34088: assetUSA2/ghf023345 | 3201usa34088 |
| 3202usa34088: assetUSA3/hgf012735 | 3202usa34088 |
| 3203usa34088: assetUSA4/wet012455 | 3203usa34088 |
| 3204usa34088: assetUSA5/nbj012245 | 3204usa34088 |

pandas

来源：https://stackoverflow.com/questions/74891138/extract-specific-words-from-a-dataframe-python

3 Answers


bwntbbo3 #1

For the first DataFrame, we can use str.extract as follows:

df["Account"] = df["String1"].str.extract(r'(\w+)\.tab\b')
df["Date"] = df["String1"].str.extract(r'\((\d{2}-\d{2}-\d{2})\)')

For the second DataFrame:

df["Account2"] = df["String2"].str.extract(r'^(\w+)')
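The two snippets above can be tried end to end with a small sketch like the one below. The sample rows are taken from the question (the exact spelling of the `String2` values is an assumption), and `expand=False` is added here so that each single-group pattern returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Sample rows based on the question; the column names "String1" and
# "String2" follow the answer above and are assumed to match the real data.
df1 = pd.DataFrame({
    "String1": [
        "Table 671usa50452.tab has been created as of the process date (12-19-22).",
        "Table 643usa50552.tab has been created as of the process date (12-19-22).",
    ]
})
df2 = pd.DataFrame({
    "String2": [
        "3203usa34088: assetUSA1/asd011245",
        "3203usa34088: assetUSA2/ghf023345",
    ]
})

# Account: the word characters immediately before ".tab".
df1["Account"] = df1["String1"].str.extract(r"(\w+)\.tab\b", expand=False)
# Date: a dd-dd-dd group enclosed in literal parentheses.
df1["Date"] = df1["String1"].str.extract(r"\((\d{2}-\d{2}-\d{2})\)", expand=False)
# Account2: the leading run of word characters (stops at the colon).
df2["Account2"] = df2["String2"].str.extract(r"^(\w+)", expand=False)

print(df1[["Account", "Date"]])
print(df2["Account2"].tolist())
```

Rows that do not match a pattern get NaN in the new column, so the original rows stay aligned with the extracted values.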


roqulrg3 #2

I think this works:

# Pandas lib
import pandas as pd

# -------------------------------------------------------------- FIRST DATAFRAME

# Assumption: the DataFrame is loaded from an Excel file
df1 = pd.read_excel("First_df.xlsx")

#First case:
list_account = []
list_date = []
for string in df1['String1']:
    if "usa" in string:
        new_string = string.split()
        newnew_string = new_string[1].split(".")
        # Token 10 is "(12-19-22)."; keep the text between the parentheses
        date_string = new_string[10].split("(")
        datedate_string = date_string[1].split(")")
        
        list_account.append(newnew_string[0])
        list_date.append(datedate_string[0])

df_output = pd.DataFrame({'Account': list_account})
df_output['Date'] = list_date

# -------------------------------------------------------------- SECOND DATAFRAME

df2 = pd.read_excel("Second_df.xlsx")

list_account2 = []

for string in df2['String2']:
    if "usa" in string:
        # Keep only the text before the colon, e.g. "3203usa34088"
        new_string = string.split(":")
        list_account2.append(new_string[0].strip())
        
df_output2 = pd.DataFrame({'Account2': list_account2})
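The loops above collect only the rows that contain "usa" into a new DataFrame, so the result no longer lines up with the original rows. As a rough sketch of the same split-based idea, pandas' vectorized string methods can keep every row and leave NaN where nothing matches. The second sample row is hypothetical and only there to show a non-matching line.

```python
import pandas as pd

df1 = pd.DataFrame({
    "String1": [
        "Table 671usa50452.tab has been created as of the process date (12-19-22).",
        "No table was created today.",  # hypothetical non-matching row
    ]
})

# Rows to process: only those mentioning "usa" (regex=False for a literal match).
has_usa = df1["String1"].str.contains("usa", regex=False)

# Split on whitespace once; .str[i] picks the i-th token of each row.
tokens = df1["String1"].str.split()

# Token 1 is "<account>.tab"; drop the file extension.
account = tokens.str[1].str.replace(r"\.tab$", "", regex=True)
# The last token is "(MM-DD-YY)."; strip the parentheses and trailing period.
date = tokens.str[-1].str.strip("().")

# where() keeps values for matching rows and leaves NaN elsewhere.
df1["Account"] = account.where(has_usa)
df1["Date"] = date.where(has_usa)

print(df1[["Account", "Date"]])
```

Like the loop, this depends on the fixed word order of the message, so a regular expression is likely the more robust choice if the wording can vary.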


lfapxunr #3

Answer for the first use case:

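l=[]   # account tokens
l2=[]  # dates
rows=df.string1.tolist()  # convert once instead of on every iteration
for i in range(len(rows)):
    tokens=rows[i].split(" ")
    l.append(tokens[1])   # e.g. "671usa50452.tab" (the ".tab" suffix is kept)
    s=tokens[10]          # e.g. "(12-19-22)."
    l2.append(s[s.find("(")+1:s.find(")")])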

df['Account']=l
df['Date']=l2

Output:

                                             string1          Account      Date
0  Table 671usa50452.tab has been created as of t...  671usa50452.tab  12-19-22
1  Table 643usa50552.tab has been created as of t...  643usa50552.tab  12-19-22
2  Table 681usa50532.tab has been created as of t...  681usa50532.tab  12-19-22
3  Table 621usa56452.tab has been created as of t...  621usa56452.tab  12-19-22
4  Table 547usa67452.tab has been created as of t...  547usa67452.tab  12-19-22
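The output above still carries the `.tab` suffix in the Account column and keeps the date as plain text. One possible alternative, sketched here under the assumption that the date is written month-day-year, is a single pattern with named groups passed to `str.extract`, followed by `pd.to_datetime`. The variable names `pattern` and `extracted` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "string1": [
        "Table 671usa50452.tab has been created as of the process date (12-19-22).",
        "Table 547usa67452.tab has been created as of the process date (12-19-22).",
    ]
})

# Named groups become the column names of the DataFrame returned by extract.
pattern = r"(?P<Account>\w+)\.tab\b.*\((?P<Date>\d{2}-\d{2}-\d{2})\)"
extracted = df["string1"].str.extract(pattern)

# Assumption: the date is month-day-two-digit-year, as "12-19-22" suggests.
extracted["Date"] = pd.to_datetime(extracted["Date"], format="%m-%d-%y")

# Attach both new columns to the original rows by index.
df = df.join(extracted)

print(df[["Account", "Date"]])
```

Keeping the date as a real datetime makes later filtering or sorting by date straightforward; drop the `to_datetime` step if the text form is what you need.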

For the second case:

l3=[]
for i in range(len(df)):
    # take the first space-separated token of each string2 value
    l3.append(df.string2.tolist()[i].split(" ")[0])
df['Account2']=l3

