I need to merge the 'companies' dataset and the 'locations' dataset onto the 'relations' dataset, matching on both 'target_cw_id' and 'source_cw_id'.
companies
   row_id  cw_id     cik    company_name  source_type  source_id
0       1      1    20.0  MOTHER COMPANY       filers      35791
1       2      2  1750.0  FATHER COMPANY       filers      40788
2       3      3  1800.0   LITTLE SISTER       filers      60238
3       4      4  1800.0   MIDDLE SISTER       filers      60238
4       5      5  2132.0    BABY BROTHER       filers       8286
5       6      6   543.0  NAUGHTY COUSIN       filers       8286
6       7      7  4546.0     BIG BROTHER       filers       8286
relations
   relation_id  target_cw_id  source_cw_id  relation_type  relation_origin  origin_id  year
0            1             3             1            NaN    relationships    2507504  2010
1            2             4             1            NaN    relationships     824847  2005
2            3             5             2            NaN    relationships     841281  2006
3            4             6             2            NaN    relationships     864758  2007
4            5             7             2            NaN    relationships    1288382  2008
locations
   cw_id  country_code
0      1            US
1      2            AT
2      3            US
3      4            US
5      5            SU
6      6            US
7      7            US
This works as expected, but I would like to make it less redundant:
merged = pd.merge(left=relations, right=companies, left_on="source_cw_id", right_on="cw_id", how="left")
merged = pd.merge(left=merged, right=companies, left_on="target_cw_id", right_on="cw_id", how="left", suffixes=('_source', '_target'))
merged = pd.merge(left=merged, right=locations, left_on="source_cw_id", right_on="cw_id", how="left")
merged = pd.merge(left=merged, right=locations, left_on="target_cw_id", right_on="cw_id", how="left", suffixes=('_source', '_target'))
So I tried map and lambda:
merged = pd.DataFrame()
dfs = [relations, merged, merged, merged]
dfs2 = [companies, companies, locations, locations]
ids = ["source_cw_id","target_cw_id","source_cw_id","target_cw_id"]
merged = map(lambda x, y, z: pd.merge(left=x, right=y, left_on=z, right_on="cw_id", how="left",suffixes=('_source','_target')), dfs,dfs2,ids)
However, the first iteration returns a list rather than a DataFrame, and then I get a KeyError: "target_cw_id".
These are the column names I want in the final file:
[u'relation_id', u'source_cw_id', u'target_cw_id', u'relation_type',
u'relation_origin', u'origin_id', u'year', u'row_id_source',
u'cw_id_source', u'cik_source', u'company_name_source',
u'source_type_source', u'source_id_source', u'row_id_target',
u'cw_id_target', u'cik_target', u'company_name_target',
u'source_type_target', u'source_id_target', u'cw_id_source',
u'country_code_source', u'cw_id_target', u'country_code_target']
1 Answer
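First, you are using map incorrectly (see the docs).
The first argument is a function, and that part is correct. The second argument is an iterable, and that part is wrong.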
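To illustrate the point about map, here is a small sketch using plain integers instead of DataFrames (the values are made up for illustration). map calls the function once per position across its iterables and never passes one call's result into the next call, which is why the empty `merged` frame placed in `dfs` never receives the earlier merge result. functools.reduce is shown only as a contrast, since it does carry an accumulator forward.

```python
from functools import reduce

# map applies the function position by position; each call is independent.
# In Python 3 it returns a lazy iterator, so list() is needed to see the values.
pairwise = list(map(lambda x, y: x + y, [1, 2, 3], [10, 20, 30]))
print(pairwise)  # [11, 22, 33]

# reduce feeds the previous result (acc) into the next call,
# which is the chaining behaviour a sequence of merges needs.
chained = reduce(lambda acc, y: acc + y, [10, 20, 30], 1)
print(chained)  # 61
```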
You perform 4 merges, so I would expect 4 items to be merged. I would do the following.
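A minimal sketch of one way to express the four merges as a loop, not necessarily the approach the answerer had in mind. It rebuilds the sample frames from the question, and the names `steps` and `lookup` are hypothetical. It assumes `cw_id` is unique in both `companies` and `locations`. Each lookup table is indexed by `cw_id` and its columns are suffixed before joining, so the result has no separate `cw_id_source` / `cw_id_target` columns (they would only duplicate `source_cw_id` / `target_cw_id`).

```python
import pandas as pd

# Sample data copied from the question.
companies = pd.DataFrame({
    "row_id": [1, 2, 3, 4, 5, 6, 7],
    "cw_id": [1, 2, 3, 4, 5, 6, 7],
    "cik": [20.0, 1750.0, 1800.0, 1800.0, 2132.0, 543.0, 4546.0],
    "company_name": ["MOTHER COMPANY", "FATHER COMPANY", "LITTLE SISTER",
                     "MIDDLE SISTER", "BABY BROTHER", "NAUGHTY COUSIN",
                     "BIG BROTHER"],
    "source_type": ["filers"] * 7,
    "source_id": [35791, 40788, 60238, 60238, 8286, 8286, 8286],
})
relations = pd.DataFrame({
    "relation_id": [1, 2, 3, 4, 5],
    "target_cw_id": [3, 4, 5, 6, 7],
    "source_cw_id": [1, 1, 2, 2, 2],
    "relation_type": [float("nan")] * 5,
    "relation_origin": ["relationships"] * 5,
    "origin_id": [2507504, 824847, 841281, 864758, 1288382],
    "year": [2010, 2005, 2006, 2007, 2008],
})
locations = pd.DataFrame({
    "cw_id": [1, 2, 3, 4, 5, 6, 7],
    "country_code": ["US", "AT", "US", "US", "SU", "US", "US"],
})

# One entry per merge: (lookup table, key column in relations, column suffix).
steps = [
    (companies, "source_cw_id", "_source"),
    (companies, "target_cw_id", "_target"),
    (locations, "source_cw_id", "_source"),
    (locations, "target_cw_id", "_target"),
]

merged = relations
for table, key, suffix in steps:
    # Index the lookup table by cw_id and suffix its remaining columns,
    # then left-join it onto the running result via the key column.
    lookup = table.set_index("cw_id").add_suffix(suffix)
    merged = merged.join(lookup, on=key)

print(merged.columns.tolist())
```

Because each step reassigns `merged`, every join sees the output of the previous one, which is the chaining that the map call could not provide.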