How to make every other row a column, using the row below as the value, in Pandas?

Posted by m2xkgtsf on 2023-03-11 in Other


I have a Pandas DataFrame that looks like this:

import pandas as pd

df = pd.DataFrame.from_dict({'type': {4: 'Second Product',
  5: 'table',
  6: 'First Product',
  7: 'chair',
  8: 'Second Product',
  9: 'desk',
  10: 'First Product',
  11: 'chair'},
 'id': {4: 'cust1',
  5: 'cust1',
  6: 'cust1',
  7: 'cust1',
  8: 'cust2',
  9: 'cust2',
  10: 'cust2',
  11: 'cust2'}})

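But I need to spread the "type" column out into value columns. The column names would be "Second Product" and "First Product", and each value would be the row directly below its header row, like this: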

df = pd.DataFrame.from_dict({'cust': {4: 'cust1', 5: 'cust2'},
 'Second Product': {4: 'table', 5: 'desk'},
 'First Product': {4: 'chair', 5: 'chair'}})

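A further complication is that there may be more than just a first and a second product. I want to get all of the columns and fill in blanks or NaNs where a value does not exist. So if one customer has a "Third Product", I need it as a column, filled with blanks or NaNs for the customers that have no third product value.
I have tried transposing, stacking, unstacking, setting the index and so on, but I am stuck on how to do this.
Edit: I am not worried about the index being reset, so it does not need to match my example exactly.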

pandas

Source: https://stackoverflow.com/questions/75665344/how-to-make-every-other-row-a-column-using-row-below-as-the-value-in-pandas

3 Answers


gudnpqoy #1

Code

# check if value is the header/column name
m = df['type'].str.endswith('Product')

# mask and forward fill to associate
# column name with each value below it
df['cols'] = df['type'].mask(~m).ffill()

# pivot to reshape to wide format
result = df[~m].pivot(index='id', columns='cols', values='type')

Result

cols  First Product Second Product
id                                
cust1         chair          table
cust2         chair           desk
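
On the follow-up about customers with extra products: as far as I can tell, `pivot` already leaves NaN wherever a customer has no value for a column. Below is a minimal sketch of that, assuming a made-up third customer `cust3` with a `Third Product` row (neither is in the question's data) and assuming, as above, that header rows can be recognised by ending in "Product".

```python
import pandas as pd

# Same header/value layout as the question; 'cust3', 'Third Product' and
# 'lamp' are invented here purely for illustration.
df = pd.DataFrame({
    'type': ['Second Product', 'table', 'First Product', 'chair',
             'Second Product', 'desk', 'Third Product', 'lamp'],
    'id': ['cust1', 'cust1', 'cust1', 'cust1',
           'cust2', 'cust2', 'cust3', 'cust3'],
})

# True on header rows (the future column names)
m = df['type'].str.endswith('Product')

# blank out the value rows, then carry each header down onto the row below it
df['cols'] = df['type'].mask(~m).ffill()

# keep only the value rows and reshape; missing combinations become NaN
result = df[~m].pivot(index='id', columns='cols', values='type')

# optional: empty strings instead of NaN
result_blank = result.fillna('')

print(result)
```

If blanks are preferred over NaN, `result_blank` is the frame to use; otherwise `result` can be kept as is.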

2023-03-11

inkz8wg9 #2

You can try joining the DataFrame with a shifted copy of itself and pivoting the result:

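# pair each row with the row above it; the shifted columns get the '_2' suffix
temp = df.join(df.shift(), rsuffix='_2').rename_axis(index='ix').reset_index()
# keep the value rows (odd positions), where type_2 holds the header from the row above
temp = temp.drop(columns=['id_2'])[temp.index % 2 == 1].set_index('ix')
# pivot arguments are keyword-only in pandas 2.0+
result = temp.pivot(index='id', columns='type_2', values='type').rename_axis(
    index='cust', columns=None).reset_index()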

This gives:

    cust First Product Second Product
0  cust1         chair          table
1  cust2         chair           desk
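
The shift-based pairing relies on the rows strictly alternating between a header and its value. Under that same assumption, the pairing could also be sketched with positional slicing instead of a join. The names `pairs`, `product`, `value` and `wide` are illustrative and do not come from the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'type': ['Second Product', 'table', 'First Product', 'chair',
             'Second Product', 'desk', 'First Product', 'chair'],
    'id': ['cust1'] * 4 + ['cust2'] * 4,
})

# Assumption: rows strictly alternate header, value, header, value, ...
pairs = pd.DataFrame({
    'cust': df['id'].iloc[1::2].to_numpy(),
    'product': df['type'].iloc[::2].to_numpy(),   # header rows (even positions)
    'value': df['type'].iloc[1::2].to_numpy(),    # the rows right below them
})

# one row per customer, one column per product header
wide = (pairs.pivot(index='cust', columns='product', values='value')
             .rename_axis(columns=None)
             .reset_index())

print(wide)
```

If the alternation cannot be guaranteed, detecting header rows explicitly (as in the first answer) is probably the safer route.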

2023-03-11

at0kjp5o #3

Using zip and a list comprehension, you can do the following:

df = pd.DataFrame.from_dict({
  'type': { 4: 'Second Product', 5: 'table', 6: 'First Product', 7: 'chair',
            8: 'Second Product', 9: 'desk', 10: 'First Product', 11: 'chair' },
  'id': { 4: 'cust1', 5: 'cust1', 6: 'cust1', 7: 'cust1',
          8: 'cust2', 9: 'cust2', 10: 'cust2', 11: 'cust2' }
})

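# (id, header, value) triples: each header row paired with the row right below it
itpZip = list(zip(df['id'][::2], df['type'][::2], df['type'][1::2]))
# product types seen for one customer, in order of first appearance
getProdTypes = lambda c: list(dict.fromkeys(t for ci, t, _ in itpZip if ci == c))
# customers in order of first appearance (a set would not keep a stable order)
custs = list(dict.fromkeys(df['id'][::2]))
df = pd.DataFrame([{
    'cust': ci, **{ti: ', '.join(
        p for c, t, p in itpZip if c == ci and t == ti
      ) for ti in getProdTypes(ci)}
} for ci in custs])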
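
One thing the comprehension above does that a plain `pivot` does not: repeated values for the same customer and product are joined with `', '` instead of raising an error. A rough pandas-only sketch of that behaviour follows, assuming header rows end in "Product" (as in the first answer) and using an invented extra `stool` row to create a duplicate.

```python
import pandas as pd

# 'stool' is an invented extra row, so cust1 has two 'First Product' values
df = pd.DataFrame({
    'type': ['Second Product', 'table', 'First Product', 'chair',
             'First Product', 'stool', 'Second Product', 'desk'],
    'id': ['cust1'] * 6 + ['cust2'] * 2,
})

# True on header rows; carry each header down onto its value row
m = df['type'].str.endswith('Product')
df['cols'] = df['type'].mask(~m).ffill()

# join duplicates per (customer, product), then spread products into columns
wide = (df[~m]
        .groupby(['id', 'cols'])['type']
        .agg(', '.join)
        .unstack()
        .rename_axis(index='cust', columns=None)
        .reset_index())

print(wide)
```

Customers without a given product end up with NaN in that column, which matches the fill behaviour the question asks for.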


2023-03-11
