pandas: How do I add sentence embeddings derived from an existing column as a new column?

xcitsw88 posted 9 months ago in Other

I have a DataFrame, nw_data, with four columns: 'Qn_id', 'Qn_context', 'Qns', and 'Answers'.

Qn_id  |     Qn_context       |   Qns        |     Answers
 01    | In 1962, Uk gave...  | what year....| the year 1962 was.....
 02    | Major kanuti raised..| Who raised...| Kanuti akorimo rasied.

I want to add a fifth column to this DataFrame that holds the sentence embeddings of the 'Answers' column.
I am using sentence_transformers to generate the sentence embeddings.

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')


I tried the following approach:

#Created a var for the column
sent = nw_data['Answers']


#Passed the variable sent into the model and created the embeddings
embeddings = model.encode(sent)


Then:

#Tried passing the embeddings into a new column named Embeddings
nw_data['Embeddings'] = embeddings


I get an error:

KeyError: 'Embeddings'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
KeyError: 'Embeddings'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/blocks.py in check_ndim(values, placement, ndim)
   1978         if len(placement) != len(values):
   1979             raise ValueError(
-> 1980                 f"Wrong number of items passed {len(values)}, "
   1981                 f"placement implies {len(placement)}"
   1982             )

ValueError: Wrong number of items passed 384, placement implies 1


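How can I create these embeddings and add them as a new column in the same DataFrame, nw_data?
I was told this is possible and was advised to try **the .apply() method with a lambda function**, but I am not sure how or when to use them.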

dfuffjeb1#

If I understand correctly, you want to store a list (the embedding) in a single cell.
Try using at:

>>> import pandas as pd
>>> from sentence_transformers import SentenceTransformer
>>> model = SentenceTransformer('all-MiniLM-L6-v2')
>>> sentences = 'Absence of sanity'
>>> embedding = model.encode(sentences)
>>> df = pd.DataFrame({'foo': [1, 2], 'Embedding': None})
>>> df.at[0, 'Embedding'] = embedding.tolist()
>>> df.dtypes
foo           int64
Embedding    object
dtype: object
>>> df.head()
   foo                                          Embedding
0    1  [0.2954030930995941, 0.29181134700775146, 2.16...
1    2                                               None

If you have multiple sentences, just pass a list:

>>> import pandas as pd
>>> sentences = ['Absence of sanity', 'its a new day', 'make the best of it']
>>> embeddings = model.encode(sentences)
>>> df = pd.DataFrame({'foo': [1, 2, 3], 'Embedding': None})
>>> df['Embedding'] = embeddings.tolist()
>>> print(df.head())
   foo                                          Embedding
0    1  [0.29540303349494934, 0.29181137681007385, 2.1...
1    2  [0.0362740121781826, -0.8035800457000732, 2.44...
2    3  [-0.4539063572883606, -0.4333038330078125, 2.2...
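
Applied to the DataFrame from the question, the same idea might look like the sketch below. The 384 in the error message matches the embedding width of all-MiniLM-L6-v2: encoding a whole column returns a 2-D array with one 384-value row per sentence, and pandas cannot place that into a single column until it is split into one list per row. So that the sketch runs without downloading a model, `fake_encode` is a hypothetical stand-in for `model.encode` that only reproduces the output shape; the two-row `nw_data` is also made up for illustration.

```python
import numpy as np
import pandas as pd


def fake_encode(sentences):
    """Hypothetical stand-in for model.encode (NOT the real model).

    Returns one 384-dimensional float32 vector per input sentence,
    which is the output shape of all-MiniLM-L6-v2.
    """
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(sentences), 384)).astype(np.float32)


# Illustrative data only; the real nw_data has four columns.
nw_data = pd.DataFrame({
    "Qn_id": ["01", "02"],
    "Answers": ["the year 1962 was ...", "Kanuti akorimo raised ..."],
})

# Encode the whole column in one call: the result is 2-D.
embeddings = fake_encode(nw_data["Answers"].tolist())
print(embeddings.shape)  # (2, 384)

# Split the 2-D array into one Python list per row before assigning,
# so each cell of the new column holds a single embedding.
nw_data["Embeddings"] = embeddings.tolist()
print(nw_data.dtypes)
```

With the real model, replacing `fake_encode(...)` with `model.encode(...)` should be the only change needed.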

inn6fuwd2#

I found another way to do this; please let me know whether it works:

def embed_text(sentence):
    return model.encode(sentence)

nw_data['Embeddings'] = nw_data['Answers'].apply(embed_text)
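
This should work, because `model.encode` returns a single 1-D vector when given one string, so `.apply` puts exactly one array in each cell. The sketch below compares the row-by-row approach with a single batched call and shows one way to get a 2-D matrix back out of the column. It is only an illustration: `fake_encode` is a hypothetical, deterministic stand-in for `model.encode` (8 dimensions instead of 384) and the sample sentences are made up.

```python
import numpy as np
import pandas as pd


def fake_encode(sentences, dim=8):
    """Hypothetical stand-in for model.encode (NOT the real model).

    Like the real method, it returns a 1-D vector for a single string
    and a 2-D array (one row per sentence) for a list of strings.
    """
    if isinstance(sentences, str):
        return fake_encode([sentences], dim)[0]
    out = np.zeros((len(sentences), dim), dtype=np.float32)
    for i, sentence in enumerate(sentences):
        for j, ch in enumerate(sentence):
            out[i, j % dim] += ord(ch)
    return out


nw_data = pd.DataFrame(
    {"Answers": ["Absence of sanity", "its a new day", "make the best of it"]}
)

# Row by row: one encode call per sentence, one 1-D array per cell.
nw_data["Embeddings"] = nw_data["Answers"].apply(fake_encode)

# Batched: a single encode call for the whole column.
batched = fake_encode(nw_data["Answers"].tolist())

# Stack the per-row arrays back into a 2-D matrix, e.g. for similarity search.
matrix = np.vstack(nw_data["Embeddings"].tolist())
print(matrix.shape)  # (3, 8)
print(np.allclose(matrix, batched))  # True
```

Both routes give the same column. With the real model, the batched call is usually the faster of the two on larger DataFrames, since `encode` processes its input in batches internally.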
