pandas: how to extract part of a string into another column

hpxqektj  posted on 2023-04-04  in  Other

I have a column that contains data like the following.

Dummy data:

import pandas as pd

df = pd.DataFrame(["Lyreco A-Type small 2i",
"Lyreco C-Type small 4i",
"Lyreco N-Part medium",
"Lyreco AKG MT 4i small",
"Lyreco AKG/ N-Type medium 4i",
"Lyreco C-Type medium 2i",
"Lyreco C-Type/ SNU medium 2i",
"Lyreco K-part small 4i",
"Lyreco K-Part medium",
"Lyreco SNU small 2i",
"Lyreco C-Part large 2i",
"Lyreco N-Type large 4i"], columns=["Column_1"])

I want to create an additional column that strips each string down to the part I need, as shown below.

Column_1                      Column_2
Lyreco A-Type small 2i         A-Type
Lyreco C-Type small 4i         C-Type
Lyreco N-Part medium           N-Part
Lyreco AKG MT 4i small         AKG MT
Lyreco AKG/ N-Type medium 4i   AKG/ N-Type
Lyreco C-Type medium 2i        C-Type
Lyreco C-Type/ SNU medium 2i   C-Type/ SNU
Lyreco K-part small 4i         K-part
Lyreco K-Part medium           K-Part
Lyreco SNU small 2i            SNU
Lyreco C-Part large 2i         C-Part
Lyreco N-Type large 4i         N-Type

How can I extract Column_2 from Column_1?


iklwldmw1#

You may find that the following logic works for your data:

df["Column_2"] = df["Column_1"].str.extract(r'\w+ (\S+(?: \S+)*) \b(?:small|medium|large)\b')

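The pattern above captures everything from the second term up to, but not including, the keyword small, medium, or large. Note that when another token sits between the desired text and the size keyword, as in "Lyreco AKG MT 4i small", the capture is "AKG MT 4i" rather than "AKG MT".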
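As a quick check of how this pattern behaves, the sketch below runs it on a few of the sample rows and compares it with a variant. The variant, `pattern_alt`, is hypothetical and not part of the original answer; it assumes that a token of the form `2i` or `4i` may appear directly before the size keyword and should be left out of the result.

```python
import pandas as pd

df = pd.DataFrame({"Column_1": [
    "Lyreco A-Type small 2i",
    "Lyreco N-Part medium",
    "Lyreco AKG MT 4i small",
    "Lyreco C-Type/ SNU medium 2i",
]})

# Pattern from the answer: capture everything between the first word
# and the size keyword.
pattern = r'\w+ (\S+(?: \S+)*) \b(?:small|medium|large)\b'

# Hypothetical variant (an assumption, not from the answer): capture lazily
# and skip an optional "<digits>i" token that precedes the size keyword.
pattern_alt = r'^\w+ (.+?)(?: \d+i)? (?:small|medium|large)\b'

# expand=False returns a Series because each pattern has one capture group.
df["Column_2"] = df["Column_1"].str.extract(pattern, expand=False)
df["Column_2_alt"] = df["Column_1"].str.extract(pattern_alt, expand=False)
print(df)
```

The two columns differ only on the "Lyreco AKG MT 4i small" row. Whether the variant is appropriate depends on whether every such in-between token really follows the `<digits>i` form in the full data.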


50few1ms2#

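Judging by the examples you posted, it is enough to split each column value and return the "middle" items. You can wrap that logic in a small function and apply it to the DataFrame.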

import pandas as pd
from math import floor

df = pd.DataFrame(
    {'Columns_1':
     ["Lyreco A-Type small 2i",
      "Lyreco C-Type small 4i",
      "Lyreco N-Part medium", 
      "Lyreco AKG MT 4i small",
      "Lyreco AKG/ N-Type medium 4i",
      "Lyreco C-Type medium 2i",
      "Lyreco C-Type/ SNU medium 2i",
      "Lyreco K-part small 4i",
      "Lyreco K-Part medium", 
      "Lyreco SNU small 2i",
      "Lyreco C-Part large 2i",
      "Lyreco N-Type large 4i"
     ]
    }
)

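def f(row):
    # Split the string into whitespace-separated tokens.
    blocks = row['Columns_1'].split()
    # With 4 tokens or fewer keep only the second token;
    # otherwise keep everything up to the middle token.
    mid_index = 1 if len(blocks) <= 4 else floor(len(blocks)/2)
    # Skip the first token ("Lyreco") and join the tokens that are kept.
    return ' '.join(blocks[1:mid_index+1])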

df['Columns_2'] = df.apply(f, axis=1)

print(df)

Output:

                       Columns_1    Columns_2
0         Lyreco A-Type small 2i       A-Type
1         Lyreco C-Type small 4i       C-Type
2           Lyreco N-Part medium       N-Part
3         Lyreco AKG MT 4i small       AKG MT
4   Lyreco AKG/ N-Type medium 4i  AKG/ N-Type
5        Lyreco C-Type medium 2i       C-Type
6   Lyreco C-Type/ SNU medium 2i  C-Type/ SNU
7         Lyreco K-part small 4i       K-part
8           Lyreco K-Part medium       K-Part
9            Lyreco SNU small 2i          SNU
10        Lyreco C-Part large 2i       C-Part
11        Lyreco N-Type large 4i       N-Type
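The positional rule above depends on how many tokens each string has. As an alternative sketch, the tokens after the first can be collected until a "stop" token is reached. This assumes that the only stop tokens are the size keywords and tokens of the form `2i` or `4i`; the names `SIZES`, `QUANTITY`, and `leading_terms` are hypothetical and do not come from the answer.

```python
import re

import pandas as pd

SIZES = {"small", "medium", "large"}   # assumed complete list of size keywords
QUANTITY = re.compile(r"\d+i")         # assumed form of tokens such as "2i" or "4i"


def leading_terms(text):
    """Return the tokens after the first one, up to the first stop token."""
    kept = []
    for token in text.split()[1:]:
        if token in SIZES or QUANTITY.fullmatch(token):
            break
        kept.append(token)
    return " ".join(kept)


df = pd.DataFrame({"Columns_1": [
    "Lyreco A-Type small 2i",
    "Lyreco N-Part medium",
    "Lyreco AKG MT 4i small",
    "Lyreco AKG/ N-Type medium 4i",
    "Lyreco SNU small 2i",
]})

# Series.map applies the function to each string in the column.
df["Columns_2"] = df["Columns_1"].map(leading_terms)
print(df)
```

Because the function works on a single string, it can be mapped over the column directly instead of using `df.apply(..., axis=1)` over whole rows.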

nwnhqdif3#

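# Name the single unnamed column so it can be accessed as df.column_1.
df.columns = ['column_1']

# Take the second space-separated token of each string.
# This returns one token only, so "Lyreco AKG MT 4i small" yields "AKG", not "AKG MT".
df["column_2"] = [col.split(" ")[1] for col in df.column_1]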
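For comparison, the same second-token lookup can be written with pandas string methods. This is a minimal sketch on three of the sample rows; the column name `column_2_vec` is hypothetical and used only to place the two results side by side.

```python
import pandas as pd

df = pd.DataFrame({"column_1": [
    "Lyreco A-Type small 2i",
    "Lyreco AKG MT 4i small",
    "Lyreco C-Type/ SNU medium 2i",
]})

# List comprehension from the answer: second space-separated token.
df["column_2"] = [col.split(" ")[1] for col in df.column_1]

# Same lookup with the .str accessor: split each string, then take item 1.
df["column_2_vec"] = df["column_1"].str.split(" ").str[1]
print(df)
```

Both columns hold a single token per row, so multi-word values such as "AKG MT" or "C-Type/ SNU" are cut short by either form.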
