Pandas: key-value pairs to a DataFrame

rryofs0p  posted on 2023-02-07 in Other

Can pandas convert key-value pairs into a custom table? Here is a sample of the data.

1675484100 customer=A.1 area=1 height=20 width={10,10} length=1
1675484101 customer=B.1 area=10 height=30 width={20,11} length=2
1675484102 customer=C.1 area=11 height=40 width={30,12} length=3 remarks=call

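I want to generate a table with the keys as column headers and the corresponding values in the rows. The first field is the time.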

wljmcqd8 #1

I would use a regular expression to extract each key/value pair, then reshape:

import pandas as pd

data = '''1675484100 customer=A.1 area=1 height=20 width={10,10} length=1
1675484101 customer=B.1 area=10 height=30 width={20,11} length=2
1675484102 customer=C.1 area=11 height=40 width={30,12} length=3 remarks=call'''

# prefix each line with 'time=' so the leading timestamp becomes a key=value pair
df = (pd.Series(data.splitlines()).radd('time=')
      # one row per pair: column 0 holds the key, column 1 the value
      .str.extractall(r'([^\s=]+)=([^\s=]+)')
      # index by (line number, key) and keep only the values
      .droplevel('match').set_index(0, append=True)[1]
      # unstack keeping order
      .pipe(lambda d: d.unstack()[d.index.get_level_values(-1).unique()])
      )

print(df)

Output:

0        time customer area height    width length remarks
0  1675484100      A.1    1     20  {10,10}      1     NaN
1  1675484101      B.1   10     30  {20,11}      2     NaN
2  1675484102      C.1   11     40  {30,12}      3    call
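To see what the regular expression captures before any reshaping, here is a minimal sketch that applies the same pattern to a single sample line with the standard `re` module. The names `PAIR`, `line`, and `pairs` are only for illustration and do not appear in the answer.

```python
import re

# Same pattern as in the answer: a key with no whitespace or '=',
# then '=', then a value with no whitespace or '='
PAIR = re.compile(r'([^\s=]+)=([^\s=]+)')

line = '1675484102 customer=C.1 area=11 height=40 width={30,12} length=3 remarks=call'

# Prepending 'time=' turns the bare leading timestamp into a regular key=value pair
pairs = PAIR.findall('time=' + line)

for key, value in pairs:
    print(f'{key:>8} -> {value}')
```

Each tuple in `pairs` corresponds to one row produced by `str.extractall`. Note that this pattern assumes values never contain whitespace or `=`.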
qcbq4gxm #2

Assuming your input is a string named data, you can use the following code:

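import pandas as pd

# keep only non-empty lines so that rows and timestamps stay aligned
lines = [l for l in data.split("\n") if l.strip()]

# one dict per line, built from the key=value tokens after the timestamp
L = [dict(x.split("=", 1) for x in l.split()[1:]) for l in lines]

df = pd.DataFrame(L)

# the first field of each line is the time in epoch seconds
df.insert(0, "time", [pd.to_datetime(int(l.split()[0]), unit="s")
                      for l in lines])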

Alternatively, if the data is stored in a file (for example a .txt file), add the following at the beginning:

with open("file.txt", "r") as f:
    data = f.read()

Output:

print(df)

                 time customer area height    width length remarks
0 2023-02-04 04:15:00      A.1    1     20  {10,10}      1     NaN
1 2023-02-04 04:15:01      B.1   10     30  {20,11}      2     NaN
2 2023-02-04 04:15:02      C.1   11     40  {30,12}      3    call
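The `time` column above comes from treating the first field as seconds since the Unix epoch, and the displayed values correspond to UTC. As a quick sanity check, a small sketch using only the standard library (the variable names are illustrative) converts the first sample timestamp the same way:

```python
from datetime import datetime, timezone

# First field of the first sample line, in seconds since the Unix epoch
epoch_seconds = 1675484100

# Interpret the value explicitly as UTC
ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
print(ts.isoformat())
```

This should agree with the first row of the output above. If local time is needed instead, the column would have to be localized and converted separately.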
