pandas: extracting date/time from a string

Posted by xfyts7mz on 2023-01-07 in Other


I have a DataFrame that looks like this:

ID     RESULT
1      Pivot (Triage) Form Entered On:  12/30/2022 23:20 EST    Performed On:  12/30/2022 23:16 EST

I want to extract the two datetime values so that the new DataFrame looks like this:

ID        END_TIME            START_TIME
1         12/30/2022 23:20    12/30/2022 23:16

I have tried several approaches, but the 'END_TIME' and 'START_TIME' columns both come out as "NA".

TEST['END_TIME']=TEST['RESULT'].str.extract("Entered On:  (\d+) EST")
TEST['START_TIME']=TEST['RESULT'].str.extract("Performed On:  (\d+) EST")

pandas

Source: https://stackoverflow.com/questions/75007444/extract-date-time-from-string

2 Answers


ecfsfe2w1#

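**Test DataFrame setup:**

We build the following DataFrame before applying the regex functions (I assume the end date always appears before the start date in the string):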

import pandas as pd
import re

### We build dataframe test first ###
s = "Pivot (Triage) Form Entered On:  12/30/2022 23:20 EST    Performed On:  12/30/2022 23:16 EST"

df = pd.DataFrame([('1', s)], columns=['ID', 'RESULT'])

### ----------------------------- ###

ID                                             RESULT
0  1  Pivot (Triage) Form Entered On:  12/30/2022 23...

You can plug the regex below into your own code, or use the full code that follows, whichever suits you best:

regex = r'\d{2}\/\d{2}\/\d{4} \d{2}:\d{2}'
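For context, here is a minimal sketch of why the pattern in the question yields missing values: `(\d+)` matches digits only, so it cannot span the slashes, the space, and the colon in `12/30/2022 23:20`. The sample string is the one from the question; the variable names `broken` and `found` are illustrative.

```python
import re

s = "Pivot (Triage) Form Entered On:  12/30/2022 23:20 EST    Performed On:  12/30/2022 23:16 EST"

# Pattern from the question: a run of digits that must be followed directly by " EST".
broken = re.search(r"Entered On:  (\d+) EST", s)

# Pattern from this answer: spells out the date and time separators explicitly.
regex = r"\d{2}/\d{2}/\d{4} \d{2}:\d{2}"
found = re.findall(regex, s)

print(broken)  # None
print(found)   # ['12/30/2022 23:20', '12/30/2022 23:16']
```

Because `re.search` finds nothing for the first pattern, `str.extract` with the same pattern has nothing to capture either.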

**Using your script:**

import pandas as pd
import re

### We build dataframe test first ###
s = "Pivot (Triage) Form Entered On:  12/30/2022 23:20 EST    Performed On:  12/30/2022 23:16 EST"

df = pd.DataFrame([('1', s)], columns=['ID', 'RESULT'])

### ----------------------------- ###
# We define regex
regex = r'Form Entered On:  (\d{1,2}\/\d{1,2}\/\d{4} \d{1,2}:\d{1,2}) EST'
df['END_TIME'] = df['RESULT'].str.extract(regex)
regex = r'Performed On:  (\d{1,2}\/\d{1,2}\/\d{4} \d{1,2}:\d{1,2}) EST'
df['START_TIME'] = df['RESULT'].str.extract(regex)

**Another way:**

import pandas as pd
import re

### We build dataframe test first ###
s = "Pivot (Triage) Form Entered On:  12/30/2022 23:20 EST    Performed On:  12/30/2022 23:16 EST"

df = pd.DataFrame([('1', s)], columns=['ID', 'RESULT'])

### ----------------------------- ###
# We define regex
regex = r'\d{1,2}\/\d{1,2}\/\d{4} \d{1,2}:\d{1,2}'

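# Collect both timestamps for every row (assumes each row contains two matches, end first)
matches = df['RESULT'].apply(lambda s: re.findall(regex, s))
df[['END_TIME', 'START_TIME']] = pd.DataFrame(matches.tolist(), index=df.index)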

**df['END_TIME']:**

0    12/30/2022 23:20
Name: END_TIME, dtype: object

**df['START_TIME']:**

0    12/30/2022 23:16
Name: START_TIME, dtype: object
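The extracted values are still strings (dtype `object`). If real timestamps are needed, one option is `pd.to_datetime` with an explicit format. This is a sketch that goes slightly beyond the answer above: it assumes every value follows `MM/DD/YYYY HH:MM`, it ignores the `EST` suffix (which was not captured), and the `ELAPSED` column is a hypothetical addition for illustration.

```python
import pandas as pd

# Stand-in for the frame produced above, holding the extracted strings.
df = pd.DataFrame({
    "ID": ["1"],
    "END_TIME": ["12/30/2022 23:20"],
    "START_TIME": ["12/30/2022 23:16"],
})

fmt = "%m/%d/%Y %H:%M"
for col in ["END_TIME", "START_TIME"]:
    # errors="coerce" turns missing or unparseable values into NaT instead of raising
    df[col] = pd.to_datetime(df[col], format=fmt, errors="coerce")

# With datetime columns, arithmetic such as a duration becomes possible.
df["ELAPSED"] = df["END_TIME"] - df["START_TIME"]
print(df.dtypes)
```

The resulting timestamps are timezone-naive; attaching a timezone would be a separate step.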

Answered 2023-01-07

agxfikkp2#

Assuming there are always exactly two timestamps, and using a more general regex pattern, we can try:

test[["END_TIME", "START_TIME"]] = test["RESULT"].str.extract(r'Entered On:\s*(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{1,2}) [A-Z]{3}\s+Performed On:\s*(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{1,2}) [A-Z]{3}')

A regex demo shows that the pattern and its capture groups work as intended.
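To try this pattern on its own, here is a self-contained sketch. The two-row `test` frame, the `stamp` helper string, and the named groups (`END_TIME`, `START_TIME`) are additions for illustration; with named groups, `str.extract` returns columns that are already labelled.

```python
import pandas as pd

test = pd.DataFrame({
    "ID": [1, 2],
    "RESULT": [
        "Pivot (Triage) Form Entered On:  12/30/2022 23:20 EST    Performed On:  12/30/2022 23:16 EST",
        "no timestamps here",  # hypothetical row that should not match
    ],
})

# Same timestamp sub-pattern as in the answer, reused for both groups.
stamp = r"\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{1,2}"
pattern = (
    r"Entered On:\s*(?P<END_TIME>" + stamp + r") [A-Z]{3}\s+"
    r"Performed On:\s*(?P<START_TIME>" + stamp + r") [A-Z]{3}"
)

# Rows that do not match the whole pattern get NaN in both columns.
extracted = test["RESULT"].str.extract(pattern)
test = test.join(extracted)
print(test[["ID", "END_TIME", "START_TIME"]])
```

Since the pattern requires both labels in order, a row with only one timestamp would also come back as NaN in both columns.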

Answered 2023-01-07
