Apache Spark: Why can't I create a DataFrame with the correct values when combining two DataFrames?

qkf9rpyu  posted on 2023-04-21  in Apache

I have two DataFrames that I want to combine.
The first DataFrame has the columns content and embeddings:

myquerycontents_and_embeddings_df.content
0                           i live in space
1                 i live my life to fullest
2                       dogs live in kennel
3        we live to eat and not eat to live
4    cricket lives in heart of every indian
5                         live and let live
6                  my house is in someplace
7            my office is in someotherplace
8                              chair is red
Name: content, dtype: object

myquerycontents_and_embeddings_df.embeddings
0    [0.0016913715517148376, -0.013320472091436386,...
1    [-0.01872972585260868, -0.010366685688495636, ...
2    [8.654659177409485e-05, -0.024498699232935905,...
3    [-0.024393899366259575, -0.008192254230380058,...
4    [-0.021614402532577515, -0.006505827885121107,...
5    [-0.01553483959287405, -0.014875221997499466, ...
6    [0.002573014236986637, -0.005427114199846983, ...
7    [0.013354390859603882, -0.007010389119386673, ...
8    [0.00505671463906765, -0.00909961387515068, -0...
Name: embeddings, dtype: object

The second DataFrame has the column cosinesimilarity:

similarityvaluedf.cosinesimilarity
0    0.994341
1    0.808836
2    0.818914
3    0.727792
4    0.675430
5    0.802331
6    0.849596
7    0.778798
8    0.776794
Name: cosinesimilarity, dtype: float64

I want to create a new DataFrame with 3 columns and 9 rows (one row per index 0 to 8), but every value comes out as NaN:

combineddf = pd.DataFrame((myquerycontents_and_embeddings_df.content,myquerycontents_and_embeddings_df.embeddings,similarityvaluedf.cosinesimilarity),columns=['content','embeddings','cosine_similarity'])
combineddf
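
The NaN result can be reproduced with a few rows of stand-in data. This is only a sketch: the names `left`, `right`, and `broken` and the sample values are hypothetical, not the original data. As far as I can tell, pandas treats each Series in the tuple as one row and looks up the requested column labels in each Series' index (0, 1, 2, ...). None of the labels match, so every cell is filled with NaN.

```python
import pandas as pd

# Hypothetical stand-in data; the real frames hold 9 rows.
left = pd.DataFrame({
    "content": ["i live in space", "chair is red", "live and let live"],
    "embeddings": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
})
right = pd.DataFrame({"cosinesimilarity": [0.99, 0.78, 0.80]})

# Same construction as in the question: a tuple of Series plus `columns`.
# Each Series becomes a row; the column labels are searched for in the
# Series indexes, where they do not exist.
broken = pd.DataFrame(
    (left.content, left.embeddings, right.cosinesimilarity),
    columns=["content", "embeddings", "cosine_similarity"],
)

print(broken.shape)               # one row per Series, one column per label
print(broken.isna().all().all())  # whether every cell is missing
```
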

j8ag8udp  #1

I solved it by building the DataFrame from a dict of lists:

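import pandas as pd

# Convert each column to a plain list so the Series indexes are ignored,
# then let the dict keys become the column names.
data = {
    'content': myquerycontents_and_embeddings_df.content.to_list(),
    'embeddings': myquerycontents_and_embeddings_df.embeddings.to_list(),
    'cosine similarity': similarityvaluedf.cosinesimilarity.to_list(),
}
combineddf = pd.DataFrame(data)
combineddf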
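
A possible alternative, sketched here with hypothetical stand-in frames `left` and `right`, is `pd.concat` with `axis=1`, which places the columns side by side. Unlike the list-based approach, `pd.concat` aligns rows on the index, so this sketch resets both indexes first. That assumes the rows of the two frames are already in matching order.

```python
import pandas as pd

# Hypothetical stand-in data with the same column names as in the question.
left = pd.DataFrame({
    "content": ["i live in space", "chair is red", "live and let live"],
    "embeddings": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
})
right = pd.DataFrame({"cosinesimilarity": [0.99, 0.78, 0.80]})

# Reset both indexes so rows pair up by position, rename the similarity
# column, and concatenate column-wise.
combined = pd.concat(
    [
        left[["content", "embeddings"]].reset_index(drop=True),
        right["cosinesimilarity"].reset_index(drop=True).rename("cosine_similarity"),
    ],
    axis=1,
)

print(combined)
```

If the two frames had different indexes and were concatenated without the reset, rows that do not line up would show NaN in the unmatched columns.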
