Parsing list elements into multiple lists in Python

Posted by kzmpq1sx on 2022-10-23 in Python

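I have successfully extracted a list from a data source. The list elements are formatted as follows (note that the first number is not the index):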

0                   cheese    100
1                   cheddar cheese    1100
2                   gorgonzola    1300
3                   smoked cheese    200


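This means that when printed, a line contains "0 cheese 100", including all the spaces.
What I want to do is parse each entry and split it into two lists. I don't need the first number; what I want is the cheese type and the number that follows it.
For example: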

cheese
cheddar cheese
gorgonzola
smoked cheese

and:

100
1100
1300
200

The ultimate goal is to assign these two lists to columns in a pd.DataFrame so that each can be processed in its own way.
Any help is much appreciated.

icnyk63a  #1

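If the goal is a DataFrame, why not build it directly instead of going through two lists? If you convert the string to a Series, you can use pandas.Series.str.extract() to split it into the columns you need: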

import pandas as pd

s = '''0                   cheese    100
1                   cheddar cheese    1100
2                   gorgonzola    1300
3                   smoked cheese    200'''

pd.Series(s.split('\n')).str.extract(r'.*?\s+(?P<type>.*?)\s+(?P<value>\d+)')

This gives a DataFrame:

    type             value
0   cheese           100
1   cheddar cheese   1100
2   gorgonzola       1300
3   smoked cheese    200
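
To see what the named groups capture without pandas, the same pattern can be tried with the standard re module. This is only a sketch: the names PATTERN, types and values are made up for illustration, and converting the value to int is an extra step the answer does not perform (str.extract returns strings).

```python
import re

# Same pattern as in the answer: skip the leading number, then capture name and value
PATTERN = re.compile(r'.*?\s+(?P<type>.*?)\s+(?P<value>\d+)')

lines = [
    "0                   cheese    100",
    "1                   cheddar cheese    1100",
    "2                   gorgonzola    1300",
    "3                   smoked cheese    200",
]

types = []
values = []
for line in lines:
    match = PATTERN.match(line)
    if match is None:
        continue  # skip lines that do not fit the expected layout
    types.append(match.group("type"))
    values.append(int(match.group("value")))  # extra step: digits converted to int

print(types)
print(values)
```

The two lists line up by position, so they can be passed straight to a DataFrame constructor as two columns.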

41ik7eoe  #2

IIUC, the strings are elements of a list. You can use re.split to split wherever two or more spaces occur:

import re
import pandas as pd

your_list = [
  "0                   cheese    100",
  "1                   cheddar cheese    1100",
  "2                   gorgonzola    1300",
  "3                   smoked cheese    200",
]

df = pd.DataFrame([re.split(r'\s{2,}', s)[1:] for s in your_list], columns=["type", "value"])

Output:

             type value
0          cheese   100
1  cheddar cheese  1100
2      gorgonzola  1300
3   smoked cheese   200
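
If the two plain lists from the question are wanted rather than a DataFrame, the split rows can be transposed with zip. A minimal sketch, assuming every line splits into exactly three fields; the strip() call is an added precaution against trailing spaces, which would otherwise produce an extra empty field.

```python
import re

your_list = [
    "0                   cheese    100",
    "1                   cheddar cheese    1100",
    "2                   gorgonzola    1300",
    "3                   smoked cheese    200",
]

# Split on runs of two or more whitespace characters;
# the single space inside "cheddar cheese" is left alone
rows = [re.split(r'\s{2,}', s.strip()) for s in your_list]

# zip(*rows) transposes the rows into columns; the first column is discarded
_, types, values = (list(col) for col in zip(*rows))

print(types)
print(values)
```

The values stay as strings here, just as they do in the DataFrame above.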

pes8fvy9  #3

I think the following may be useful:

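import pandas as pd
import re

mylist = ['0 cheese 100', '1 cheddar cheese 200']

numbers = '[0-9]'

# The last whitespace-separated token is the number
list1 = [i.split()[-1] for i in mylist]
# Removing every digit and stripping the leftover whitespace leaves the name
list2 = [re.sub(numbers, '', i).strip() for i in mylist]

your_df = pd.DataFrame({'name1': list1, 'name2': list2})
your_df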
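
One caveat with removing every digit: a name that itself contains digits would be damaged. The sketch below illustrates this with a made-up entry ('2 gouda 48 900' is not from the question) and shows one possible variation, an anchored pattern that removes only the leading and trailing numbers. The names edge_numbers, stripped_all and names are hypothetical.

```python
import re

# The last entry is hypothetical, added only to show the edge case
mylist = ['0 cheese 100', '1 cheddar cheese 200', '2 gouda 48 900']

# Stripping every digit also removes the "48" that belongs to the name
stripped_all = [re.sub('[0-9]', '', i).strip() for i in mylist]

# Anchored alternative: remove only a leading number (^\d+\s+)
# or a trailing number (\s+\d+$), leaving digits inside the name intact
edge_numbers = re.compile(r'^\d+\s+|\s+\d+$')
names = [edge_numbers.sub('', i) for i in mylist]

print(stripped_all)
print(names)
```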

vdzxcuhz  #4

May I suggest this simple solution:

lines = [
         "1                   cheddar cheese    1100 ",
         "2                   gorgonzola    1300 ",
         "3                   smoked cheese    200",
        ]

for line in lines:
  words = line.strip().split()
  print( ' '.join( words[1:-1]), words[-1])

Result:

cheddar cheese 1100
gorgonzola 1300
smoked cheese 200
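
The loop above only prints. To collect the two lists the question asks for, the same word split can be combined with extended unpacking. A sketch under the same assumptions as the answer (first word is the unwanted number, last word is the value); the list names names and numbers are made up, and the int conversion is an addition.

```python
lines = [
    "1                   cheddar cheese    1100 ",
    "2                   gorgonzola    1300 ",
    "3                   smoked cheese    200",
]

names = []
numbers = []
for line in lines:
    # First word: unwanted number; last word: the value; everything between: the name
    _, *name_words, number = line.split()
    names.append(' '.join(name_words))
    numbers.append(int(number))

print(names)
print(numbers)
```

Note that joining the words with a single space collapses any run of spaces inside a name, which is harmless for the sample data.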

c0vxltue  #5

You can achieve this using slicing:

# No import is needed: the code below uses the built-in str.isdigit() method

inList = ['0                   cheese    100', '1                   cheddar cheese    1100', '2                   gorgonzola    1300', '3                   smoked cheese    200']

cheese = []
prices = []

for i in inList:
    temp = i[:19:-1] #Cuts out first number and all empty spaces until first character and reverses the string
    counter = 0
    counter2 = 0
    for char in temp: #Temp is reversed, meaning the number e.g. '100' for 'cheese' is in front but reversed
        if char.isdigit(): 
            counter += 1
        else:   #If the character is an empty space, we know the number is over
            prices.append((temp[:counter])[::-1]) #We know where the number begins (at position 0) and ends (at position counter), we flip it and store it in prices

            cheeseWithSpace = (temp[counter:]) #Since we cut out the number, the rest has to be the cheese name with some more spaces in front
            for char in cheeseWithSpace:
                if char == ' ': #We count how many spaces are in front
                    counter2 += 1
                else:   #If we reach something other than an empty space, we know the cheese name begins.
                    cheese.append(cheeseWithSpace[counter2:][::-1]) #We know where the cheese name begins (at position counter2) cut everything else out, flip it and store it
                    break
            break

print(prices)
print(cheese)

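See the code comments for the approach. Basically, you can use [::-1] to reverse the string, which makes it easier to work with, and then strip off each part one at a time.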
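
The slice i[:19:-1] relies on every name starting at exactly the same column. As a possible variation that avoids the hard-coded offset, the sketch below uses str.partition and str.rsplit instead of reversing the string. It is a different technique from the one in the answer, and the name in_list is made up.

```python
in_list = [
    '0                   cheese    100',
    '1                   cheddar cheese    1100',
    '2                   gorgonzola    1300',
    '3                   smoked cheese    200',
]

cheese = []
prices = []
for item in in_list:
    # Everything after the first space: padding, the name, more padding, the number
    _, _, rest = item.partition(' ')
    # Split once from the right so that only the trailing number is separated
    name, price = rest.rsplit(maxsplit=1)
    cheese.append(name.strip())  # drop the padding left in front of the name
    prices.append(price)

print(prices)
print(cheese)
```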

wfsdck30  #6

If you have:

text = '''0                   cheese    100
1                   cheddar cheese    1100
2                   gorgonzola    1300
3                   smoked cheese    200'''

# OR

your_list = [
 '0                   cheese    100',
 '1                   cheddar cheese    1100',
 '2                   gorgonzola    1300',
 '3                   smoked cheese    200'
]

text = '\n'.join(your_list)

Then run:

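from io import StringIO

import pandas as pd

# The separator is a regex (two or more whitespace characters), so engine='python' is used
df = pd.read_csv(StringIO(text), sep=r'\s\s+', names=['col1', 'col2'], engine='python')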
print(df)

Output:

             col1  col2
0          cheese   100
1  cheddar cheese  1100
2      gorgonzola  1300
3   smoked cheese   200
  • This takes the first number as the index, but if needed you can reset it with df = df.reset_index(drop=True).
