python-3.x: Combine the columns of a Pandas DataFrame into a new column of lists containing only the non-zero values

pinkon5k posted on 2022-12-05 in Python

I have a Pandas dataframe that looks like:

import pandas as pd

mwe5a = pd.DataFrame({'a': [0.1, 0.0],
                      'b': [0.0, 0.2],
                      'c': [0.3, 0.0]
                    }
                   )
mwe5a

    a      b       c
0   0.1    0.0     0.3
1   0.0    0.2     0.0

My desired output is:

mwe5b

output_column
[0.1, 0.3]
[0.2]

How do I do that?
After that, I'd like to use those values to sort the lists in a column of another Pandas dataframe, from largest value to smallest:

mwe7a = pd.DataFrame({'items': [ ['item1', 'item2'],
                                 ['item3']
                               ]})

['item1', 'item2']
['item3']

which should then look like

mwe7b

['item2', 'item1']
['item3']

UPDATE:
I updated the MWE dataframes to be less confusing. So to review, I can get the following to work:

token_uniqueness_sparse = pd.DataFrame({'token_a': [0.1, 0.0],
                                        'token_b': [0.0, 0.2],
                                        'token_c': [0.3, 0.0]
                                       }
                                      )
token_uniqueness_sparse

   token_a  token_b  token_c
0      0.1      0.0      0.3
1      0.0      0.2      0.0

sf_fake = pd.DataFrame({'items': [ ['token_a', 'token_c'],
                                   ['token_b']],
                        'rcol': [1,2]
                       })
sf_fake

                items  rcol
0  [token_a, token_c]     1
1           [token_b]     2

token_uniqueness_dense = (token_uniqueness_sparse
         .apply(lambda x: list(x[x.ne(0)]), axis=1)
         .to_frame('output_column'))
token_uniqueness_dense

output_column
0   [0.1, 0.3]
1   [0.2]

(sf_fake.apply(lambda x: sorted(x['items'],
                                key=lambda y: token_uniqueness_dense.loc[x.name, 'output_column'][x['items'].index(y)],
                                reverse=True),
               axis=1))

So I know the solution works on the toy data. But when I apply it to my actual dataframes, I get the following error:

Input In [76], in <lambda>(x)
----> 1 (forbes_df.apply(lambda x: sorted(x['tokenized_company_name'], 
      2                                   key=lambda y: tfidf_df_dense.loc[x.name,
      3  'output_column'][x['tokenized_company_name'].index(y)], reverse=True), axis=1))

Input In [76], in <lambda>.<locals>.<lambda>(y)
      1 (forbes_df.apply(lambda x: sorted(x['tokenized_company_name'], 
----> 2                                   key=lambda y: tfidf_df_dense.loc[x.name,
      3  'output_column'][x['tokenized_company_name'].index(y)], reverse=True), axis=1))

IndexError: list index out of range

Any ideas what to check for?

xwbd5t1u


A possible solution:

mwe5b = (mwe5a
         .apply(lambda x: list(x[x.ne(0)].sort_values(ascending=False)), axis=1)
         .to_frame('output_column'))

Output:

output_column
0    [0.3, 0.1]
1         [0.2]
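A variant, sketched here as an illustration and not part of the original answer: storing each row's non-zero entries as a dict keeps the column names attached to their values, which a plain list of values drops. The names `nonzero_by_name` and `largest_first` are hypothetical, and the sketch assumes the column names are the labels you will want to sort later.

```python
import pandas as pd

mwe5a = pd.DataFrame({'a': [0.1, 0.0],
                      'b': [0.0, 0.2],
                      'c': [0.3, 0.0]})

# For each row, keep only the non-zero entries as {column_name: value}.
# Unlike a plain list of values, the dict keeps each value's column name.
nonzero_by_name = pd.Series(
    [row[row.ne(0)].to_dict() for _, row in mwe5a.iterrows()],
    index=mwe5a.index,
    name='output_column',
)

# Ordering each row's column names by their value, largest first.
largest_first = nonzero_by_name.map(
    lambda d: sorted(d, key=d.get, reverse=True))
print(largest_first.tolist())  # [['c', 'a'], ['b']]
```

Because the names travel with the values, the later sorting step no longer has to rely on list positions.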

EDIT

To achieve what the OP wants with mwe7a, here is a solution:

(mwe7a.apply(lambda x: sorted(x['items'],
                              key=lambda y: mwe5b.loc[x.name, 'output_column'][x['items'].index(y)],
                              reverse=True),
             axis=1))
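The key function above finds each item's position with `list.index`, which scans the list on every call and always returns the first match, so a repeated item would get the score of its first occurrence. Below is a sketch of the same positional idea using `zip`. Like the answer, it assumes the i-th score belongs to the i-th item and looks rows up by index label. `sort_by_scores` is a hypothetical helper. Its length check raises an explicit error for a misaligned row instead of an `IndexError` or a silently wrong order.

```python
import pandas as pd

# Unsorted non-zero scores (as built by the unsorted mwe5b snippet) and the items.
mwe5b = pd.DataFrame({'output_column': [[0.1, 0.3], [0.2]]})
mwe7a = pd.DataFrame({'items': [['item1', 'item2'], ['item3']]})


def sort_by_scores(items, scores):
    """Return items ordered by their positionally matched score, largest first."""
    # Hypothetical helper: assumes scores[i] is the score of items[i].
    if len(items) != len(scores):
        raise ValueError(f"{len(items)} items but {len(scores)} scores")
    pairs = sorted(zip(items, scores), key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in pairs]


# Match rows by index label, as mwe5b.loc[x.name, ...] does in the answer.
mwe7b = pd.Series(
    [sort_by_scores(items, mwe5b.loc[idx, 'output_column'])
     for idx, items in mwe7a['items'].items()],
    index=mwe7a.index,
)
print(mwe7b.tolist())  # [['item2', 'item1'], ['item3']]
```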

For the mwe7a step, mwe5b must be built without sorting, so that each value stays at the same position as its item:

mwe5b = (mwe5a
         .apply(lambda x: list(x[x.ne(0)]), axis=1)
         .to_frame('output_column'))

Output of the mwe7a sorting step:

0    [item2, item1]
1           [item3]
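On the `IndexError` in the update: in the key function, `x['tokenized_company_name'].index(y)` is a position in the token list, and that position is then used to index the row's list of non-zero scores. So the error suggests that at least one row has more tokens than non-zero scores, for example a token whose score is 0 or that has no matching column. Even when the lengths match, positional pairing only works if the score columns appear in the same order as the tokens in each list. The sketch below runs on the toy data from the update. It renames the column to `token_c` so it matches the tokens. It first counts tokens and non-zero scores per row to find mismatched rows, then sorts by looking each token's score up by column name instead of by position. It assumes the score frame's columns are the token strings and that both frames share the same row index. `sort_tokens_by_score`, and scoring unknown tokens as 0.0, are hypothetical choices for illustration.

```python
import pandas as pd

# Toy data from the update; column names must equal the tokens in `items`.
token_uniqueness_sparse = pd.DataFrame({'token_a': [0.1, 0.0],
                                        'token_b': [0.0, 0.2],
                                        'token_c': [0.3, 0.0]})
sf_fake = pd.DataFrame({'items': [['token_a', 'token_c'], ['token_b']],
                        'rcol': [1, 2]})

# 1) Diagnose: rows whose token count differs from their non-zero score count.
#    Assumes both frames use the same row index.
n_tokens = sf_fake['items'].map(len)
n_scores = token_uniqueness_sparse.ne(0).sum(axis=1)
mismatched = n_tokens.index[(n_tokens != n_scores).to_numpy()].tolist()
print(mismatched)  # [] for the toy data


# 2) Sort by looking each token's score up by column name, not by position.
def sort_tokens_by_score(tokens, scores):
    # Hypothetical helper; tokens missing from `scores` are treated as 0.0.
    return sorted(tokens, key=lambda t: scores.get(t, 0.0), reverse=True)


sorted_items = pd.Series(
    [sort_tokens_by_score(tokens, token_uniqueness_sparse.loc[idx])
     for idx, tokens in sf_fake['items'].items()],
    index=sf_fake.index,
)
print(sorted_items.tolist())  # [['token_c', 'token_a'], ['token_b']]
```

On the real data, the rows listed in `mismatched` are the ones to inspect first.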
