Pandas: split only the columns with complex data into real and imaginary parts

Posted by kokeuurv on 2023-08-01 in Other


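I'm new to pandas and am trying to work with a DataFrame that contains complex-valued data alongside other things (strings, etc.).
A simplified version of what I'm talking about: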

import numpy as np
import pandas as pd
# NB: NumPy coerces this mixed-type array (and the column labels below) to strings
a = np.array([
        [0.1 + 1j, 0.2 + 0.2j, 0.2, 0.1j, "label_a", 1],
        [0.1 + 1j, 0.5 + 1.2j, 0.5, 1.0j, "label_b", 3],
    ])
columns = np.array([-12, -10, 10, 12, "label", "number"])
df = pd.DataFrame(data=a, columns=columns)

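To save the data to disk and read it back, I need to split the complex values into real and imaginary parts, since apparently none of the relevant on-disk formats (HDF5, Parquet, etc.) support complex numbers.
Now, if the DataFrame contained only complex numbers, I could do this by introducing a MultiIndex, which other questions already cover (e.g. Modify dataframe with complex values into a new multiindexed dataframe with real and imaginary parts using pandas).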

# save to file
pd.concat(
    [df.apply(np.real), df.apply(np.imag)],
    axis=1,
    keys=("R", "I"),
).swaplevel(0, 1, 1).sort_index(axis=1).to_parquet(file)

# read from file
df = pd.read_parquet(file)
real = df.loc[:, (slice(None), "R")].droplevel(1, axis=1)
imag = df.loc[:, (slice(None), "I")].droplevel(1, axis=1)
df = real + 1j * imag

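However, this approach breaks down in the presence of, e.g., string fields.
I currently work around this by splitting the DataFrame into one that contains only the complex numbers (i.e. the first four columns here) and the rest. I then apply the approach above to the former, merge it with the latter, and save the result to file. This works, but it is not particularly nice, especially when the columns are not arranged so neatly.
I'm hoping someone with more pandas experience has a simpler way to achieve this. If it matters: performance-wise, I don't care about writing, but I do care about reading the file back into a DataFrame.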

pandas

Source: https://stackoverflow.com/questions/76736152/pandas-split-only-columns-with-complex-data-into-real-and-imaginary-part

2 Answers


vh0rcniy #1

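You can process the columns you know to be complex and handle the other columns independently, adding a dummy second level for the other columns: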

Writing

N = 4
cols = df.columns[:N] # or define an explicit list of names

# ensure the type is complex
# you might need to adjust to other types (np.complex128, np.complex256…)
tmp = df[cols].astype(np.complex64)

(pd.concat(
    # slice the complex columns
    # NB. using a more efficient way to get the real/imaginary parts
    [pd.DataFrame(np.real(tmp), index=tmp.index, columns=cols),
     pd.DataFrame(np.imag(tmp), index=tmp.index, columns=cols),
    ],
    axis=1,
    keys=("R", "I"),
          )
   # add the other columns
   .join(pd.concat({None: df[df.columns.difference(cols)]}, axis=1))
   .swaplevel(0, 1, 1).sort_index(axis=1)
   .to_parquet('test_pqt')
)


Reading

# read from file
df = pd.read_parquet('test_pqt')

N = 4
# level 0 lists every complex name twice (once for "I", once for "R"),
# so deduplicate before taking the first N names
cols = df.columns.get_level_values(0).unique()[:N] # or define an explicit list of names

other_cols = df.columns.get_level_values(0).difference(cols)

real = df.loc[:, (cols, "R")].droplevel(1, axis=1)
imag = df.loc[:, (cols, "I")].droplevel(1, axis=1)
df = (real + 1j * imag).join(df.droplevel(1, axis=1)[other_cols])

print(df)

Output:

        -10       -12        10        12    label number
0  0.2+0.2j  0.1+1.0j  0.2+0.0j  0.0+0.1j  label_a      1
1  0.5+1.2j  0.1+1.0j  0.5+0.0j  0.0+1.0j  label_b      3

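If the complex columns are not the first N, they could also be picked by dtype instead of by position. The following is only a minimal sketch under an assumption that does not hold for the question's frame as posted: the columns must carry genuine complex dtypes rather than strings, so the example builds its own small, hypothetical frame. `pd.api.types.is_complex_dtype` is an existing pandas helper; the frame contents and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with real dtypes (complex128, object, int64);
# the question's frame holds only strings, so it would need casting first.
df = pd.DataFrame({
    "-12": np.array([0.1 + 1j, 0.1 + 1j]),
    "10": np.array([0.2 + 0j, 0.5 + 0j]),
    "label": ["label_a", "label_b"],
    "number": [1, 3],
})

# Select the complex columns by dtype rather than by position.
cols = [c for c in df.columns if pd.api.types.is_complex_dtype(df[c])]
other_cols = df.columns.difference(cols)

print(cols)              # names of the complex columns
print(list(other_cols))  # everything else
```

The resulting `cols` can then be used in place of `df.columns[:N]` in the writing step.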

Answered 2023-08-01

lmvvr0a8 #2

An alternative approach:

import numpy as np
import pandas as pd

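N = 4
cols = df.columns[:N]  # the complex columns, as in the answer above

tmp = df[cols].astype(np.complex64)
for col in tmp:

    col1 = col + "_1"  # real part
    col2 = col + "_2"  # imaginary part
    # take the parts numerically: splitting the string on '+' fails for
    # values without a '+' (purely real or negative-imaginary ones)
    df[col1] = tmp[col].to_numpy().real
    df[col2] = tmp[col].to_numpy().imag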

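The flat-column idea can be turned into a round trip. The sketch below is one possible way to do it, not the answerer's code: `split_complex`, `merge_complex` and the `_re`/`_im` suffixes are hypothetical names chosen for illustration, and it assumes string column labels and genuine complex dtypes. The file I/O is left out so the example stays self-contained; `to_parquet` and `read_parquet` would sit between the two calls.

```python
import numpy as np
import pandas as pd

def split_complex(df, suffixes=("_re", "_im")):
    """Replace every complex-dtype column with two float columns."""
    out = {}
    for col in df.columns:
        if pd.api.types.is_complex_dtype(df[col]):
            values = df[col].to_numpy()
            out[f"{col}{suffixes[0]}"] = values.real
            out[f"{col}{suffixes[1]}"] = values.imag
        else:
            out[col] = df[col]
    return pd.DataFrame(out, index=df.index)

def merge_complex(df, suffixes=("_re", "_im")):
    """Inverse of split_complex: recombine '<name>_re'/'<name>_im' pairs."""
    re_sfx, im_sfx = suffixes
    out = {}
    for col in df.columns:
        name = str(col)
        if name.endswith(re_sfx) and name[: -len(re_sfx)] + im_sfx in df.columns:
            base = name[: -len(re_sfx)]
            out[base] = df[col].to_numpy() + 1j * df[base + im_sfx].to_numpy()
        elif name.endswith(im_sfx) and name[: -len(im_sfx)] + re_sfx in df.columns:
            continue  # already consumed together with its real part
        else:
            out[col] = df[col]
    return pd.DataFrame(out, index=df.index)

# Hypothetical example data with proper dtypes
df = pd.DataFrame({
    "-12": np.array([0.1 + 1j, 0.1 + 1j]),
    "10": np.array([0.2 + 0j, 0.5 + 0j]),
    "label": ["label_a", "label_b"],
    "number": [1, 3],
})

flat = split_complex(df)        # flat.to_parquet(...) would go here
restored = merge_complex(flat)  # ... and pd.read_parquet(...) here
print(flat.columns.tolist())
print(restored.dtypes)
```

Reading then costs one pass over the columns and no MultiIndex slicing. The trade-off is that an unrelated column whose name happens to end in one of the suffixes, and has a matching partner, would be merged by mistake.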

Answered 2023-08-01
