I have a Pandas dataframe that looks like:
import pandas as pd

mwe5a = pd.DataFrame({'a': [0.1, 0.0],
                      'b': [0.0, 0.2],
                      'c': [0.3, 0.0]})
mwe5a
     a    b    c
0  0.1  0.0  0.3
1  0.0  0.2  0.0
My desired output is:
mwe5b
output_column
[0.1, 0.3]
[0.2]
How do I do that?
After that, I'd like to sort the lists in a column of another Pandas dataframe by those values, from largest to smallest.
mwe7a = pd.DataFrame({'items': [['item1', 'item2'],
                                ['item3']]})
['item1', 'item2']
['item3']
which should then look like
mwe7b
['item2', 'item1']
['item3']
UPDATE:
I updated the MWE dataframes to be less confusing. So to review, I can get the following to work:
token_uniqueness_sparse = pd.DataFrame({'token_a': [0.1, 0.0],
                                        'token_b': [0.0, 0.2],
                                        'token_c': [0.3, 0.0]})
token_uniqueness_sparse
   token_a  token_b  token_c
0      0.1      0.0      0.3
1      0.0      0.2      0.0
sf_fake = pd.DataFrame({'items': [['token_a', 'token_c'],
                                  ['token_b']],
                        'rcol': [1, 2]})
sf_fake
                items  rcol
0  [token_a, token_c]     1
1           [token_b]     2
token_uniqueness_dense = (token_uniqueness_sparse
                          .apply(lambda x: list(x[x.ne(0)]), axis=1)
                          .to_frame('output_column'))
token_uniqueness_dense
output_column
0 [0.1, 0.3]
1 [0.2]
(sf_fake.apply(
    lambda x: sorted(
        x['items'],
        key=lambda y: token_uniqueness_dense.loc[x.name, 'output_column'][x['items'].index(y)],
        reverse=True),
    axis=1))
So I know the solution works. But when I attempt to apply it to my actual dataframes and not the toy ones above, I get the following error:
Input In [76], in <lambda>(x)
----> 1 (forbes_df.apply(lambda x: sorted(x['tokenized_company_name'],
2 key=lambda y: tfidf_df_dense.loc[x.name,
3 'output_column'][x['tokenized_company_name'].index(y)], reverse=True), axis=1))
Input In [76], in <lambda>.<locals>.<lambda>(y)
1 (forbes_df.apply(lambda x: sorted(x['tokenized_company_name'],
----> 2 key=lambda y: tfidf_df_dense.loc[x.name,
3 'output_column'][x['tokenized_company_name'].index(y)], reverse=True), axis=1))
IndexError: list index out of range
Any ideas what to check for?
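One check that fits this error: the sort key looks up each token's value by the token's position in its list, so an `IndexError` would occur on any row where the token list is longer than the matching value list. The sketch below compares the two lengths row by row. The frames `items_df` and `dense_df` are hypothetical stand-ins for `forbes_df` and `tfidf_df_dense`, and the second row is mismatched on purpose. It also assumes both frames share the same index.

```python
import pandas as pd

# Hypothetical stand-ins for forbes_df and tfidf_df_dense (not the real data).
# Row 1 deliberately has two tokens but only one value.
items_df = pd.DataFrame({'tokenized_company_name': [['token_a', 'token_c'],
                                                    ['token_b', 'token_d']]})
dense_df = pd.DataFrame({'output_column': [[0.1, 0.3],
                                           [0.2]]})

# .str.len() returns the length of each list stored in an object column.
n_tokens = items_df['tokenized_company_name'].str.len()
n_values = dense_df['output_column'].str.len()

# Rows with more tokens than values: positional lookup runs off the end there.
mismatched = items_df[n_tokens > n_values]
print(mismatched)
```

If `mismatched` is non-empty on the real data, likely candidates are tokens whose value is exactly zero (dropped by the `ne(0)` filter), tokens that have no column in the sparse frame, or tokens repeated within one name. Note also that the dense lists follow the column order of the sparse frame, not the token order in each name, so a positional lookup may pair tokens with the wrong values even on rows that raise no error.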
1 Answer
xwbd5t1u1#
Possible solution:
Output:
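For the toy frame `mwe5a`, the target is the two-row `output_column` shown in the question. One way to get there is a plain list comprehension over the rows, sketched here on the assumption that an exact `0.0` means "no value". This is only an illustration and may differ from other approaches in the thread.

```python
import pandas as pd

# Toy frame from the question.
mwe5a = pd.DataFrame({'a': [0.1, 0.0],
                      'b': [0.0, 0.2],
                      'c': [0.3, 0.0]})

# Convert to nested Python lists, then keep only the non-zero entries of each
# row. Values stay in column order (a, b, c).
rows = mwe5a.to_numpy().tolist()
mwe5b = pd.DataFrame({'output_column': [[v for v in row if v != 0] for row in rows]},
                     index=mwe5a.index)

print(mwe5b)
```

Building the lists outside `DataFrame.apply` keeps the result a single column of lists regardless of how many values each row retains.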
Edit
To achieve what the OP wants to do with mwe7a, I offer the following solution:
To get mwe5b without sorting, as needed for mwe7a:
Output:
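A sketch of the sorting step, using the `token_*` toy frames from the question's update. Instead of looking values up by position in the dense lists, it looks each token up by column label in the sparse frame, which avoids any mismatch between list lengths. It assumes both frames share the same index and that every column name is unique. The helper `sort_by_score`, the new column `items_sorted`, and the `0.0` default for unknown tokens are choices made for this illustration.

```python
import pandas as pd

token_uniqueness_sparse = pd.DataFrame({'token_a': [0.1, 0.0],
                                        'token_b': [0.0, 0.2],
                                        'token_c': [0.3, 0.0]})
sf_fake = pd.DataFrame({'items': [['token_a', 'token_c'],
                                  ['token_b']],
                        'rcol': [1, 2]})


def sort_by_score(idx, tokens):
    """Sort one row's tokens by their value in the sparse frame, largest first."""
    scores = token_uniqueness_sparse.loc[idx]  # Series indexed by column name
    # Tokens without a column fall back to 0.0 (an assumption of this sketch).
    return sorted(tokens, key=lambda tok: scores.get(tok, 0.0), reverse=True)


# Series.items() yields (index label, list of tokens) pairs.
sf_fake['items_sorted'] = [sort_by_score(idx, tokens)
                           for idx, tokens in sf_fake['items'].items()]

print(sf_fake)
```

For the toy data, the first row becomes `['token_c', 'token_a']` because 0.3 is larger than 0.1, and the second row stays `['token_b']`.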