pandas: How to create a DataFrame by appending heterogeneous column values

Posted by 7vhp5slm on 2023-04-19 in Other

I want to create a DataFrame with 2 columns and several rows, like this:

[
 ['text1',[float1, float2, float3]]
 ['text2',[float4, float5, float6]]
 .
 .
 .
]

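The column names should be content and embeddings. text1, text2, and so on go in the content column, and the lists of floats go in the embeddings column.
The code I wrote is: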

mycontent = ["i live in space","i live my life to fullest", "dogs live in kennel","we live to eat and not eat to live","cricket lives in heart of every indian","live and let live","my house is in someplace","my office is in someotherplace","chair is red"]

contents_and_embeddings_df = pd.DataFrame(columns=['content','embeddings'])

for content in mycontent:
    embedding = get_embedding(content,engine='textsearchcuriedoc001mc') #returns list of floats
    contents_and_embeddings_df.append(pd.DataFrame([content,embedding]))
   

contents_and_embeddings_df

In the output I get several warnings like this one:

/tmp/ipykernel_15879/3971327095.py:8: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  contents_and_embeddings_df.append(pd.DataFrame([content,embedding]))

The DataFrame is empty. I only see the two headers, content and embeddings.
I also tried a few other approaches, but could not create the DataFrame I want:

for content in mycontent:
    embedding = get_embedding(content,engine='textsearchcuriedoc001mc')
    #pd.concat(contents_and_embeddings_df,pd.DataFrame([content,embedding])) --> doesn't work
#contents_and_embeddings_df.append(pd.DataFrame([content,embedding])) --> doesn't work
    tempdf = pd.DataFrame([content,embedding]) #doesn't work.
#    tempdf = pd.DataFrame([content,embedding], columns=['content','embeddings']) --> doesn't compile
    contents_and_embeddings_df.add(tempdf) # doesn't work. 
 

contents_and_embeddings_df #shows empty

Answer 1 (wribegjk)

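DataFrame.append was deprecated and has been removed in pandas 2.0. It is inefficient because every call allocates new memory to hold the resulting DataFrame. It also returns that new DataFrame instead of modifying the original in place, which is why the frame in the question stays empty when the return value is discarded.
It is better to build the two columns separately and then assemble them into a DataFrame: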

embeddings = [get_embedding(content, engine="textsearchcuriedoc001mc") for content in mycontent]
contents_and_embeddings_df = pd.DataFrame({
    "content": mycontent,
    "embeddings": embeddings
})
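
To try this without access to the embedding service, here is a minimal sketch that swaps in a stand-in function. `fake_embedding` is hypothetical and only imitates the shape of what `get_embedding` is described as returning (a list of floats); the shortened `mycontent` list is also just for illustration.

```python
import pandas as pd

def fake_embedding(text):
    """Hypothetical stand-in for get_embedding: returns a list of 3 floats."""
    return [float(len(text)), float(text.count(" ")), float(text.count("e"))]

mycontent = ["i live in space", "live and let live", "chair is red"]

# Build each column as a plain Python list first ...
embeddings = [fake_embedding(text) for text in mycontent]

# ... then assemble the DataFrame in a single step.
df = pd.DataFrame({"content": mycontent, "embeddings": embeddings})

print(df.shape)                        # rows x columns
print(type(df.loc[0, "embeddings"]))   # each cell holds a whole list
```

Each cell of the embeddings column holds a whole Python list, so the column has object dtype. That matches the layout asked for in the question.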

Answer 2 (yhxst69z)

You can simply use a list comprehension to generate the data for the DataFrame:

data = [(content, get_embedding(content,engine='textsearchcuriedoc001mc')) for content in mycontent]
contents_and_embeddings_df = pd.DataFrame(data, columns=['content','embeddings'])
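
If the rows really have to be produced one at a time, the `pandas.concat` route suggested by the warning can also work. It needs the frames passed as a list and the result assigned to a variable. A rough sketch, assuming a hypothetical `fake_embedding` in place of the real `get_embedding`:

```python
import pandas as pd

def fake_embedding(text):
    """Hypothetical stand-in for get_embedding: returns a list of 2 floats."""
    return [float(len(text)), float(text.count(" "))]

mycontent = ["live and let live", "chair is red"]

frames = []
for text in mycontent:
    # Wrap the (content, embedding) pair in an outer list so that it is
    # read as ONE row with two columns.
    row = pd.DataFrame([(text, fake_embedding(text))],
                       columns=["content", "embeddings"])
    frames.append(row)  # list.append mutates the list in place

# pd.concat takes a list of frames and returns a new DataFrame,
# so the result has to be assigned.
df = pd.concat(frames, ignore_index=True)
print(df)
```

Concatenating once after the loop avoids copying the growing frame on every iteration. For this particular task the list comprehension above is still the simpler option.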
