Is it possible to split a pandas column after the last integer?

Posted by a1o7rhls on 2023-04-18 in Other


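I am trying to split a pandas column into two separate columns: the first should contain only the date and the second only the string. However, I don't want to split at a hard-coded position, for example by counting where the last integer is located. Instead, I want to write code that works for the general case.
My column looks like this:
| Column A |
| -------------- |
| 01.01.2000John Doe |
| 01.01.2002Jane Doe |
And I want it to look like this:
| Column A | Column B |
| -------------- | -------------- |
| 01.01.2000 | John Doe |
| 01.01.2002 | Jane Doe |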

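# Fixed-position slicing: assumes the date always occupies the first 19 characters
df_t['date'] = df_t['date_time'].str[0:19]
df_t['name'] = df_t['date_time'].str[19:]

# Drop the original combined column
tid = df_t.drop(['date_time'], axis=1)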

This is how I did it, but I need a general approach as described above.

pandas

Source: https://stackoverflow.com/questions/76033467/is-it-possible-to-split-a-pandas-column-after-last-integer

2 Answers


cgh8pdjw1#

You can use str.extract with a regular expression:

import pandas as pd

# Sample data
data = {'Column A': ['01.01.2000John Doe', '01.01.2002Jane Doe']}
df = pd.DataFrame(data)

# Regular expression pattern
pattern = r'(?P<Date>\d{2}\.\d{2}\.\d{4})(?P<Name>.*)'

# Extracting the date and name into separate columns
df[['Column A', 'Column B']] = df['Column A'].str.extract(pattern)

print(df)

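Explanation:

The pattern variable holds the regular expression. The group (?P<Date>\d{2}\.\d{2}\.\d{4}) captures the date, and (?P<Name>.*) captures the name, which is everything after the date.
The ?P<...> syntax names the captured groups. The names become the column labels of the DataFrame returned by str.extract, which makes it easier to create the new columns.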
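
To make the role of the named groups concrete, here is a minimal sketch that reuses the sample strings from this answer. The variable names s and extracted, and the third row that does not match, are assumptions added for illustration. It shows that str.extract returns a DataFrame labelled with the group names, and that rows without a match come back as NaN:

```python
import pandas as pd

# Sample strings from the answer, plus one hypothetical row that does not match
s = pd.Series(['01.01.2000John Doe', '01.01.2002Jane Doe', 'no date here'])

pattern = r'(?P<Date>\d{2}\.\d{2}\.\d{4})(?P<Name>.*)'

# The named groups become the column labels of the result
extracted = s.str.extract(pattern)

print(list(extracted.columns))
print(extracted)
```

Checking for NaN in the result is a simple way to find rows that did not follow the expected layout.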

Edit

import pandas as pd

# Sample data
data = {
    '1Column A': ['2000-01-01 00:00:00John Doe', '2002-01-01 00:00:00Jane Doe'],
    '2Column B': ['2000-01-01 00:00:00Alice', '2002-01-01 00:00:00Bob'],
    '3Column C': ['Some other data', 'Not a date and name'],
}

df = pd.DataFrame(data)

# Regular expression pattern
pattern = r'(?P<Date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?P<Name>.*)'

# Iterate through columns and apply the pattern conditionally
for col in df.columns:
    if col.startswith("1") or col.startswith("2"):
        # Extract date and name into separate columns with suffixes
        df[[f"{col}_date", f"{col}_name"]] = df[col].str.extract(pattern)
        # Drop the original column
        df.drop(col, axis=1, inplace=True)

print(df)
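
Both patterns above hard-code a date layout. If the layout is not known in advance, one possible variation is to split after the last digit in the string, which is what the question title asks for. This is only a sketch under the assumption that the name part never contains digits; the column name raw, the group names date and name, and the mixed sample data are hypothetical:

```python
import pandas as pd

# Hypothetical sample mixing two date layouts; names are assumed to contain no digits
df = pd.DataFrame({'raw': ['01.01.2000John Doe', '2002-01-01 00:00:00Jane Doe']})

# The greedy first group runs up to the last digit in the string;
# the second group takes the digit-free remainder
pattern = r'^(?P<date>.*\d)(?P<name>\D*)$'

parts = df['raw'].str.extract(pattern)
parts['name'] = parts['name'].str.strip()  # drop stray spaces around the name

print(parts)
```

If a name could contain digits, this pattern would split in the wrong place, so an explicit date pattern as in the answer is the safer choice in that case.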

2023-04-18

sycxhyv72#

You can simply use string slicing by position:

df['Column A'], df['Column B'] = df['Column A'].str[:10], df['Column A'].str[10:]
print(df)

# Output
     Column A  Column B
0  01.01.2000  John Doe
1  01.01.2002  Jane Doe

If you want to convert the date to datetime64:

df['Column A'], df['Column B'] = \
    pd.to_datetime(df['Column A'].str[:10], dayfirst=True), df['Column A'].str[10:]
print(df)

# Output
    Column A  Column B
0 2000-01-01  John Doe
1 2002-01-01  Jane Doe
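
As a possible variation on the conversion above, the date layout can be stated explicitly instead of relying on dayfirst, and values that cannot be parsed can be turned into NaT. This is a sketch, not part of the original answer; the third row and the names dates, names and result are assumptions for illustration:

```python
import pandas as pd

# Sample data from the answer, plus one hypothetical row without a date
df = pd.DataFrame({'Column A': ['01.01.2000John Doe', '01.01.2002Jane Doe', 'n/a']})

# Slice the first 10 characters, then parse with an explicit format;
# values that do not match the format become NaT instead of raising
dates = pd.to_datetime(df['Column A'].str[:10], format='%d.%m.%Y', errors='coerce')
names = df['Column A'].str[10:]

result = pd.DataFrame({'Column A': dates, 'Column B': names})
print(result)
```

Like the slicing in this answer, this still assumes the date occupies exactly the first 10 characters.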

2023-04-18
