Pandas map returns a list

8fq7wneg posted on 2023-11-15 in Other

I need to merge the 'companies' dataset and the 'locations' dataset onto the 'relations' dataset, once on 'target_cw_id' and once on 'source_cw_id'.
companies

row_id     cw_id   cik     company_name    source_type     source_id
0   1   1   20.0    MOTHER COMPANY  filers  35791
1   2   2   1750.0  FATHER COMPANY  filers  40788
2   3   3   1800.0  LITTLE SISTER   filers  60238
3   4   4   1800.0  MIDDLE SISTER   filers  60238
4   5   5   2132.0  BABY BROTHER    filers  8286
5   6   6   543.0   NAUGHTY COUSIN  filers  8286
6   7   7   4546.0  BIG BROTHER     filers  8286

relations

relation_id     target_cw_id    source_cw_id    relation_type   relation_origin     origin_id   year
0   1   3   1   NaN     relationships   2507504     2010
1   2   4   1   NaN     relationships   824847  2005
2   3   5   2   NaN     relationships   841281  2006
3   4   6   2   NaN     relationships   864758  2007
4   5   7   2   NaN     relationships   1288382     2008


locations

cw_id   country_code
0   1   US
1   2   AT
2   3   US
3   4   US
5   5   SU
6   6   US
7   7   US


This works as expected, but I would like to reduce the redundancy:

merged = pd.merge(left=relations, right=companies, left_on="source_cw_id", right_on="cw_id", how="left")
merged = pd.merge(left=merged, right=companies, left_on="target_cw_id", right_on="cw_id", how="left",  suffixes=('_source', '_target'))
merged = pd.merge(left=merged, right=locations, left_on="source_cw_id", right_on="cw_id", how="left")
merged = pd.merge(left=merged, right=locations, left_on="target_cw_id", right_on="cw_id", how="left",  suffixes=('_source', '_target'))
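Why only the second and fourth calls need suffixes: because left_on and right_on have different names, pandas keeps the right-hand cw_id column after the first merge, so the next merge against the same table finds every column on both sides. A small check, assuming a trimmed, hypothetical subset of the question's data:

```python
import pandas as pd

# Trimmed subset of the question's data (hypothetical, for illustration only)
relations = pd.DataFrame({"relation_id": [1, 2],
                          "target_cw_id": [3, 4],
                          "source_cw_id": [1, 1]})
companies = pd.DataFrame({"cw_id": [1, 3, 4],
                          "company_name": ["MOTHER COMPANY", "LITTLE SISTER",
                                           "MIDDLE SISTER"]})

# First merge: the key names differ, so both source_cw_id and cw_id survive
step1 = pd.merge(left=relations, right=companies,
                 left_on="source_cw_id", right_on="cw_id", how="left")

# Second merge: cw_id and company_name now exist on both sides, so the
# suffixes decide which copy is labelled _source and which _target
step2 = pd.merge(left=step1, right=companies,
                 left_on="target_cw_id", right_on="cw_id", how="left",
                 suffixes=("_source", "_target"))

print(list(step1.columns))
print(list(step2.columns))
```

The same thing happens again with locations, which is why the desired column list contains cw_id_source and cw_id_target twice.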


So I tried map with a lambda:

merged = pd.DataFrame()

dfs = [relations, merged, merged, merged]
dfs2 = [companies, companies, locations, locations]
ids = ["source_cw_id","target_cw_id","source_cw_id","target_cw_id"]

merged = map(lambda x, y, z: pd.merge(left=x, right=y, left_on=z, right_on="cw_id", how="left",suffixes=('_source','_target')), dfs,dfs2,ids)


However, the first iteration returns a list instead of a DataFrame, and then I get a
KeyError "target_cw_id"
These are the column names I want in the final file:

[u'relation_id', u'source_cw_id', u'target_cw_id', u'relation_type',
       u'relation_origin', u'origin_id', u'year', u'row_id_source',
       u'cw_id_source', u'cik_source', u'company_name_source',
       u'source_type_source', u'source_id_source', u'row_id_target',
       u'cw_id_target', u'cik_target', u'company_name_target',
       u'source_type_target', u'source_id_target', u'cw_id_source',
       u'country_code_source', u'cw_id_target', u'country_code_target']

Answer by eanckbw9:

First, you are using map incorrectly (see the documentation):

merged = map(
    lambda x, y, z: pd.merge(left=x, right=y, left_on=z, right_on="cw_id", how="left",suffixes=('_source','_target')),
    dfs,dfs2,ids
)

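The first argument is a function, which is correct. The iterables are the problem: map calls the function once for each position in dfs, dfs2 and ids and collects the results side by side; it never passes the result of one call into the next. In addition, dfs holds the object that merged referred to when the list was built (the empty DataFrame), so the second call tries to merge an empty frame on "target_cw_id", which is where the KeyError comes from.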
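A minimal reproduction of the map call from the question, assuming Python 3 (where map is lazy, so each call can be stepped through with next) and trimmed, hypothetical versions of relations and companies:

```python
import pandas as pd

# Trimmed, hypothetical versions of the question's frames
relations = pd.DataFrame({"relation_id": [1, 2],
                          "target_cw_id": [3, 4],
                          "source_cw_id": [1, 1]})
companies = pd.DataFrame({"cw_id": [1, 3, 4],
                          "company_name": ["MOTHER COMPANY", "LITTLE SISTER",
                                           "MIDDLE SISTER"]})

merged = pd.DataFrame()      # empty frame: no columns at all
dfs = [relations, merged]    # the list stores this empty frame, not a future result
ids = ["source_cw_id", "target_cw_id"]

# map() calls the lambda once per (x, z) pair; no call sees another call's result
results = map(lambda x, z: pd.merge(left=x, right=companies, left_on=z,
                                    right_on="cw_id", how="left"),
              dfs, ids)

first = next(results)        # relations merged on source_cw_id: works
error = None
try:
    next(results)            # the empty frame has no "target_cw_id" column
except KeyError as exc:
    error = exc
print(type(first).__name__, repr(error))
```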
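You are performing 4 merges, and each one has to start from the result of the previous one. I would do something like the following.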

import pandas as pd

def mymerge(x, y, z):
    # Left-join y onto x, matching column z of x against y's "cw_id"
    kwargs = dict(left=x, right=y, left_on=z,
                  right_on="cw_id", how="left",
                  suffixes=('_source', '_target'))
    return pd.merge(**kwargs)

# Initialize merged
merged = relations.copy()

dfs2 = [companies, companies, locations, locations]
ids = ["source_cw_id", "target_cw_id", "source_cw_id", "target_cw_id"]

# Instead of map: each merge starts from the previous result.
# Note: pandas 2.x may raise a MergeError on the last step, because
# cw_id_source / cw_id_target already exist from the company merges.
for y, z in zip(dfs2, ids):
    merged = mymerge(merged, y, z)
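If you want to keep the functional style the question was aiming for, functools.reduce (not map) is the tool that feeds each result into the next call. A sketch, assuming the sample data from the question trimmed to a few columns; it swaps pd.merge with suffixes for DataFrame.join against a cw_id-indexed lookup, which drops the redundant cw_id_source / cw_id_target columns, so its column list differs from the one requested in the question:

```python
from functools import reduce

import pandas as pd

# Subset of the question's sample data (only a few columns, for brevity)
relations = pd.DataFrame({
    "relation_id": [1, 2, 3, 4, 5],
    "target_cw_id": [3, 4, 5, 6, 7],
    "source_cw_id": [1, 1, 2, 2, 2],
})
companies = pd.DataFrame({
    "cw_id": [1, 2, 3, 4, 5, 6, 7],
    "company_name": ["MOTHER COMPANY", "FATHER COMPANY", "LITTLE SISTER",
                     "MIDDLE SISTER", "BABY BROTHER", "NAUGHTY COUSIN",
                     "BIG BROTHER"],
})
locations = pd.DataFrame({
    "cw_id": [1, 2, 3, 4, 5, 6, 7],
    "country_code": ["US", "AT", "US", "US", "SU", "US", "US"],
})

# One entry per merge: (lookup table, key column in relations, suffix)
steps = [
    (companies, "source_cw_id", "_source"),
    (companies, "target_cw_id", "_target"),
    (locations, "source_cw_id", "_source"),
    (locations, "target_cw_id", "_target"),
]


def merge_step(left, step):
    """Left-join one lookup table onto `left` via the given key column."""
    right, key, suffix = step
    # Index the lookup by cw_id and suffix its remaining columns, so the
    # joined columns never clash with what is already in `left`
    lookup = right.set_index("cw_id").add_suffix(suffix)
    return left.join(lookup, on=key, how="left")


# reduce() passes each intermediate result into the next merge_step call
merged = reduce(merge_step, steps, relations)
print(merged)
```

Adding a fifth merge then only means appending one tuple to steps.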
