get_feature_names not found in CountVectorizer()

Posted by wqsoz72f on 2023-01-15 in Other


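I am mining a Stack Overflow data dump of posts about deep learning libraries. I want to identify the stop words in the corpus (for example, 'python'). To do that, I want to get my feature names so I can find the words with the highest term frequency.
I created the documents and the corpus as follows: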

import csv
from sklearn.feature_extraction.text import CountVectorizer

with open("StackOverflow_2018_Data.csv") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    pytorch_doc = ''
    tensorflow_doc = ''
    cotag_list = []
    keras_doc = ''
    counte = 0
    for row in csv_reader:
        if row[2] == 'tensorflow':
            tensorflow_doc += row[3] + ' '
        if row[2] == 'keras':
            keras_doc += row[3] + ' '
        if row[2] == 'pytorch':
            pytorch_doc += row[3] + ' '

corpus = [pytorch_doc, tensorflow_doc, keras_doc]
vectorizer = CountVectorizer()
x = vectorizer.fit_transform(corpus)
print(x)
x.toarray()
Dict = []
feat = x.get_feature_names()  # this is the line that raises the AttributeError below
for i,arr in enumerate(x):
    for x, ele in enumerate(arr):
        if i == 0:
            Dict += ('pytorch', feat[x], ele)
        if i == 1:
            Dict += ('tensorflow', feat[x], ele)
        if i == 2:
            Dict += ('keras', feat[x], ele)

sorted_arr = sorted(Dict, key=lambda tup: tup[2])

However, I get:

File "sklearn_stopwords.py", line 83, in <module>
    main()
  File "sklearn_stopwords.py", line 50, in main
    feat = x.get_feature_names()
  File "/opt/anaconda3/lib/python3.7/site-packages/scipy/sparse/base.py", line 686, in __getattr__
    raise AttributeError(attr + " not found")
AttributeError: get_feature_names not found


Source: https://stackoverflow.com/questions/55523491/get-feature-names-not-found-in-countvectorizer

2 Answers


w8rqjzmb #1

get_feature_names is a method of the CountVectorizer object. You are calling it on the result of fit_transform, which is a scipy.sparse matrix and has no such method.
You need to use vectorizer.get_feature_names() instead.
Try this MVCE:

from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()
corpus = ['This is the first document.',
          'This is the second second document.',
          'And the third one.',
          'Is this the first document?']

X = vectorizer.fit_transform(corpus)

features = vectorizer.get_feature_names()

features

Output:

['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']
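
Applied to the loop in the question, the same fix could look roughly like the sketch below. The three short documents are hypothetical stand-ins for the strings the question builds from its CSV file, and the names labels, counts and rows are introduced here for illustration only. Besides moving the call onto the vectorizer, the sketch appends each tuple to the list (in the question, `Dict += (...)` extends the list with the three separate values, so the later sort on `tup[2]` would not work) and avoids reusing `x` as a loop variable.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in documents; the question builds these from a CSV file.
labels = ["pytorch", "tensorflow", "keras"]
corpus = [
    "python tensor autograd python",
    "python graph session tensor",
    "python layers model model",
]

vectorizer = CountVectorizer()
x = vectorizer.fit_transform(corpus)

# Feature names live on the vectorizer, not on the sparse matrix.
# get_feature_names_out() exists in scikit-learn >= 1.0; older releases
# only provide get_feature_names().
if hasattr(vectorizer, "get_feature_names_out"):
    feat = list(vectorizer.get_feature_names_out())
else:
    feat = vectorizer.get_feature_names()

# Densify so each row can be iterated as plain counts (fine for 3 documents).
counts = x.toarray()

rows = []
for label, row in zip(labels, counts):
    for term, count in zip(feat, row):
        # append() keeps each (library, term, count) triple together.
        rows.append((label, term, int(count)))

# Highest counts first.
sorted_rows = sorted(rows, key=lambda tup: tup[2], reverse=True)
print(sorted_rows[:3])
```

Calling toarray() is fine for three documents; for a large corpus it may be better to keep the matrix sparse and sum the columns instead.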


wljmcqd8 #2

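Make sure you are using sklearn version 1.0 or later.
The method **get_feature_names_out()** replaces get_feature_names(), which was deprecated in version 1.0 and removed in version 1.2.

Example: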

from sklearn.feature_extraction.text import CountVectorizer

n_gram_range = (1, 1)
stop_words = "english"

doc = """
         Supervised learning is the machine learning task of 
         learning a function that maps an input to an output based 
         on example input-output pairs.
      """

# Extract candidate words/phrases
count = CountVectorizer(ngram_range=n_gram_range,
                        stop_words=stop_words).fit([doc])

# candidates = count.get_feature_names()
candidates = count.get_feature_names_out()
candidates

Output:

array(['based', 'example', 'function', 'input', 'learning', 'machine',
       'maps', 'output', 'pairs', 'supervised', 'task'], dtype=object)
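
For the original goal of finding the most frequent terms, one possible approach is to sum the count matrix over all documents and pair the totals with the feature names. The sketch below is only an illustration: get_vocabulary and top_terms are hypothetical helper names, not part of scikit-learn, and the corpus is the one from the first answer. The helper falls back to get_feature_names() so that it also runs on releases older than 1.0.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer


def get_vocabulary(vectorizer):
    """Return fitted feature names on both old and new scikit-learn (hypothetical helper)."""
    if hasattr(vectorizer, "get_feature_names_out"):
        return list(vectorizer.get_feature_names_out())
    return list(vectorizer.get_feature_names())


def top_terms(corpus, n=5):
    """Return the n terms with the highest total count across the corpus."""
    vectorizer = CountVectorizer()
    x = vectorizer.fit_transform(corpus)
    # Column sums give one total per vocabulary entry; flatten to a 1-D array.
    totals = np.asarray(x.sum(axis=0)).ravel()
    vocab = get_vocabulary(vectorizer)
    # Stable sort keeps alphabetical order among terms with equal totals.
    order = np.argsort(-totals, kind="stable")[:n]
    return [(vocab[i], int(totals[i])) for i in order]


corpus = ['This is the first document.',
          'This is the second second document.',
          'And the third one.',
          'Is this the first document?']

print(top_terms(corpus, n=3))
```

Terms that dominate this ranking are candidates for a custom stop-word list, which can then be passed to CountVectorizer through its stop_words parameter.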
