pandas: Expand NumPy arrays inside a DataFrame into columns

azpvetkf posted 12 months ago in Other

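I have a DataFrame that looks like this:
| Col A| Col B| keys| values|
| --|--|--|--|
| data| data| [key1, key2, key3]| [value1a, value2a, value3a]|
| data| data| [key1, key2, key3]| [value1b, value2b, value3b]|

The "keys" are the same in every row. The "values" differ from row to row. Both "keys" and "values" are NumPy arrays.
My goal is to use the keys as column names and assign the values to the cells in those columns, like this:
| Col A| Col B| key1| key2| key3|
| --|--|--|--|--|
| data| data| value1a| value2a| value3a|
| data| data| value1b| value2b| value3b|
| data| data| value1c| value2c| value3c|
| data| data| value1d| value2d| value3d|

I looked at this post, but it seems to just create a new DataFrame using similar logic.
Here is some code to create the DataFrames: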

import numpy as np
import pandas as pd

source_df = pd.DataFrame(
    [
        ['data', 'data', np.array(['key1', 'key2']), np.array(['value1a', 'value2a'])],
        ['data', 'data', np.array(['key1', 'key2']), np.array(['value1b', 'value2b'])]
    ],
    columns=['col 1', 'col 2', 'keys', 'values']
)
goal_df = pd.DataFrame(
    [
        ['data', 'data', 'value1a', 'value2a'],
        ['data', 'data', 'value1b', 'value2b'],
    ],
    columns=['col 1', 'col 2', 'key1', 'key2']
)


nhjlsmyf #1

If the keys are identical in all rows, you can build a DataFrame from the values with the DataFrame constructor and tolist, then join it back onto the original frame:

out = (df.drop(columns=['keys', 'values'])
         .join(pd.DataFrame(df['values'].tolist(),
                            columns=df['keys'].iloc[0]))
      )

Alternatively, to modify df in place:

df[df.pop('keys').iloc[0]] = pd.DataFrame(df.pop('values').tolist())


Output:

  Col A Col B     key1     key2     key3
0  data  data  value1a  value2a  value3a
1  data  data  value1b  value2b  value3b
2  data  data  value1c  value2c  value3c
3  data  data  value1d  value2d  value3d
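
Both snippets above rely on index alignment: the frame built from `df['values'].tolist()` gets a fresh 0..n-1 index, so they assume `df` still has the default RangeIndex. A minimal sketch of an index-safe variant follows, assuming a hypothetical frame whose index is `[10, 20]` (for instance, left over from filtering); the sample data is illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a non-default index, e.g. left over from filtering.
df = pd.DataFrame(
    [
        ['data', 'data', np.array(['key1', 'key2']), np.array(['value1a', 'value2a'])],
        ['data', 'data', np.array(['key1', 'key2']), np.array(['value1b', 'value2b'])],
    ],
    columns=['Col A', 'Col B', 'keys', 'values'],
    index=[10, 20],
)

# Reuse the original row labels so that join() lines the rows up correctly.
expanded = pd.DataFrame(
    df['values'].tolist(),
    columns=df['keys'].iloc[0],
    index=df.index,
)
out = df.drop(columns=['keys', 'values']).join(expanded)
print(out)
```

Without `index=df.index`, the join here would find no matching labels and fill the new columns with NaN.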

Non-identical keys

If the keys are not the same in all rows, you can use:

out = (df.drop(columns=['keys', 'values'])
         .join(pd.DataFrame([dict(zip(k, v)) for k, v in
                             zip(df['keys'], df['values'])]))
       )


Example:

# input
  Col A Col B                keys                       values
0  data  data  [key1, key2, key3]  [value1a, value2a, value3a]
1  data  data  [key3, key4, key1]  [value3b, value4b, value1b]

# output
  Col A Col B     key1     key2     key3     key4
0  data  data  value1a  value2a  value3a      NaN
1  data  data  value1b      NaN  value3b  value4b
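
Which of the two approaches applies can be checked at runtime. The sketch below wraps both in one function; the name `expand_key_value_columns` and its parameters are hypothetical, not part of pandas, and it assumes a non-empty frame whose key and value cells are array-like and of equal length within each row.

```python
import numpy as np
import pandas as pd

def expand_key_value_columns(df, key_col='keys', val_col='values'):
    """Hypothetical helper: expand key/value array columns into real columns."""
    first = df[key_col].iloc[0]
    # True only if every row carries exactly the same keys in the same order.
    same_keys = all(np.array_equal(first, k) for k in df[key_col])
    if same_keys:
        # Fast path: one constructor call, column names taken from the first row.
        expanded = pd.DataFrame(df[val_col].tolist(), columns=first, index=df.index)
    else:
        # General path: one dict per row; keys missing from a row become NaN.
        expanded = pd.DataFrame(
            [dict(zip(k, v)) for k, v in zip(df[key_col], df[val_col])],
            index=df.index,
        )
    return df.drop(columns=[key_col, val_col]).join(expanded)

# The two-row example from above, with different keys per row.
df = pd.DataFrame(
    [
        ['data', 'data', np.array(['key1', 'key2', 'key3']),
         np.array(['value1a', 'value2a', 'value3a'])],
        ['data', 'data', np.array(['key3', 'key4', 'key1']),
         np.array(['value3b', 'value4b', 'value1b'])],
    ],
    columns=['Col A', 'Col B', 'keys', 'values'],
)
out = expand_key_value_columns(df)
print(out)
```

The dict-based path also works when the keys are identical, so the check is only an optimization; dropping it and always using the general path is a reasonable simplification.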

2hh7jdfx #2

I adapted mozway's code, adding functions and documentation to demonstrate the column transformation.

#!/usr/bin/env python3

import numpy as np
import pandas as pd

def create_dataframe(df: pd.DataFrame, key_col: str, val_col: str) -> pd.DataFrame:
    """Create a new dataframe with keys and values transposed.

    Args:
        df (pd.DataFrame): The source dataframe.
        key_col (str): The column name that represents the keys.
        val_col (str): The column name that represents the values.

    Returns:
        pd.DataFrame: A new dataframe with keys and values transposed.
    """
    return (
        df
        .drop(columns=[key_col, val_col])
        .join(pd.DataFrame(
            df[val_col].tolist(),
            columns=df[key_col].iloc[0]
        ))
    )

def update_dataframe(df: pd.DataFrame, key_col: str, val_col: str) -> None:
    """Update the dataframe in-place by transposing keys and values.

    Args:
        df (pd.DataFrame): The dataframe to be updated.
        key_col (str): The column name that represents the keys.
        val_col (str): The column name that represents the values.

    Returns:
        None
    """
    df[df.pop(key_col).iloc[0]] = pd.DataFrame(df.pop(val_col).tolist())

if __name__ == '__main__':
    source_df = pd.DataFrame(
        [
            ['data', 'data', np.array(['key1', 'key2']), np.array(['value1a', 'value2a'])],
            ['data', 'data', np.array(['key1', 'key2']), np.array(['value1b', 'value2b'])]
        ],
        columns=['Col A', 'Col B', 'keys', 'values']
    )

    # Create a new dataframe
    goal_df = create_dataframe(source_df, 'keys', 'values')
    print(goal_df)  # New dataframe

    print(source_df)  # Source is unmodified

    # Modify the dataframe in-place
    update_dataframe(source_df, 'keys', 'values')
    print(source_df)  # Source is updated

Output:

# Copy of Original
# =============================================
  Col A Col B     key1     key2
0  data  data  value1a  value2a
1  data  data  value1b  value2b

# Original
# =============================================
  Col A Col B          keys              values
0  data  data  [key1, key2]  [value1a, value2a]
1  data  data  [key1, key2]  [value1b, value2b]

# Modified Original
# =============================================
  Col A Col B     key1     key2
0  data  data  value1a  value2a
1  data  data  value1b  value2b

rnmwe5a2 #3

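My solution uses df.explode() and df.assign() to accomplish the task. Note that it yields one row per exploded value, and every key column holds that same value, so the result differs from the goal layout in the question.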

df.explode('values', ignore_index=True).assign(**{k:lambda x: x['values'] for k in df['keys'].iloc[0]}).drop(['keys', 'values'], axis=1)

Output:

  col 1 col 2     key1     key2
0  data  data  value1a  value1a
1  data  data  value2a  value2a
2  data  data  value1b  value1b
3  data  data  value2b  value2b
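
An explode-based route can still reach the goal layout if keys and values are exploded together and then pivoted back into columns. This is a sketch rather than the answerer's method; it assumes pandas 1.3 or later (where `explode` accepts a list of columns) and that each row's keys are unique.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ['data', 'data', np.array(['key1', 'key2']), np.array(['value1a', 'value2a'])],
        ['data', 'data', np.array(['key1', 'key2']), np.array(['value1b', 'value2b'])],
    ],
    columns=['col 1', 'col 2', 'keys', 'values'],
)

# Explode keys and values in lockstep; each original row label is repeated
# once per key/value pair.
long = df.explode(['keys', 'values'])

# Turn the keys back into columns, giving one row per original row label.
wide = long.pivot(columns='keys', values='values')

out = df.drop(columns=['keys', 'values']).join(wide)
print(out)
```

The pivot sorts the new columns by key name, so the column order may differ from the order in the original arrays.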
