Python: cannot import 'process_tweet' from 'utils'

vsdwdz23 posted on 2023-01-19 in Python

Thanks in advance for looking into this. I have a Python program that needs `process_tweet` and `build_freqs` for some NLP tasks. nltk is already installed; utils was not, so I installed it with `pip install utils`, but the two functions mentioned above are apparently not part of that package, and the error I get is the standard one:

ImportError: cannot import name 'process_tweet' from
'utils' (C:\Python\lib\site-packages\utils\__init__.py)

What am I doing wrong, or what am I missing? I also tried this Stack Overflow answer, but it did not help.


lymgl2op1#

You can easily inspect the source of any function with `??`, in this case `process_tweet??` (the code below comes from the custom utils library of the deeplearning.ai NLP course):

import re
import string

from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import TweetTokenizer

def process_tweet(tweet):
    """Process tweet function.
    Input:
        tweet: a string containing a tweet
    Output:
        tweets_clean: a list of words containing the processed tweet
    """
    stemmer = PorterStemmer()
    stopwords_english = stopwords.words('english')
    # remove stock market tickers like $GE
    tweet = re.sub(r'\$\w*', '', tweet)
    # remove old style retweet text "RT"
    tweet = re.sub(r'^RT[\s]+', '', tweet)
    # remove hyperlinks
    tweet = re.sub(r'https?:\/\/.*[\r\n]*', '', tweet)
    # remove hashtags (only the hash # sign, the word itself stays)
    tweet = re.sub(r'#', '', tweet)
    # tokenize tweets
    tokenizer = TweetTokenizer(preserve_case=False, strip_handles=True,
                               reduce_len=True)
    tweet_tokens = tokenizer.tokenize(tweet)

    tweets_clean = []
    for word in tweet_tokens:
        if (word not in stopwords_english and  # remove stopwords
                word not in string.punctuation):  # remove punctuation
            stem_word = stemmer.stem(word)  # stemming word
            tweets_clean.append(stem_word)

    return tweets_clean
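For reference, `??` works in IPython and Jupyter; a minimal sketch, assuming the course's utils.py is already importable:

# In an IPython or Jupyter cell, ?? prints a function's source code:
from utils import process_tweet
process_tweet??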

2uluyalo2#

Try this code, it should work:

# uses the same imports as the previous answer (re, string, nltk)
def process_tweet(tweet):
    stemmer = PorterStemmer()
    stopwords_english = stopwords.words('english')
    tweet = re.sub(r'\$\w*', '', tweet)
    tweet = re.sub(r'^RT[\s]+', '', tweet)
    tweet = re.sub(r'https?:\/\/.*[\r\n]*', '', tweet)
    tweet = re.sub(r'#', '', tweet)
    tokenizer = TweetTokenizer(preserve_case=False, strip_handles=True,
                               reduce_len=True)
    tweet_tokens = tokenizer.tokenize(tweet)

    tweets_clean = []
    for word in tweet_tokens:
        if (word not in stopwords_english and
                word not in string.punctuation):
            stem_word = stemmer.stem(word)  # stemming word
            tweets_clean.append(stem_word)

    return tweets_clean

ycl3bljg3#

If you are taking the NLP course on deeplearning.ai, then I believe the utils.py file was created by the course instructors for the lab sessions and should not be confused with the general-purpose `utils` package on PyPI.
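In practice that means the course's utils.py has to sit next to your own script (or somewhere on sys.path); it cannot be pip-installed. A minimal sketch, with a hypothetical layout:

# my_project/
#     utils.py    <- the course-provided file defining process_tweet and build_freqs
#     main.py     <- your script

# main.py
from utils import process_tweet, build_freqs

print(process_tweet("RT @someone: I love #NLP! https://t.co/abc123"))

Note that a local utils.py shadows an installed package of the same name, so running pip uninstall utils as well avoids further confusion.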


yyyllmsg4#

I don't think you need `process_tweet` at all; the code in the course is just a shortcut that wraps up everything you did from the start through stemming. So skip this step and print out `tweet_stem` to see the difference between the raw text and the preprocessed text.
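For example, a minimal sketch assuming `tweet` holds the raw text and `tweet_stem` the list produced by your stemming step:

print('raw text:     ', tweet)
print('preprocessed: ', tweet_stem)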


jjhzyzn05#

You can try this:

# requires: re, string, and the nltk imports shown in the first answer
def preprocess_tweet(tweet):
    # cleaning
    tweet = re.sub(r'^RT[\s]+', '', tweet)
    tweet = re.sub(r'https?://[^\s\n\r]+', '', tweet)
    tweet = re.sub(r'#', '', tweet)
    tweet = re.sub(r'@', '', tweet)

    # tokenization
    token = TweetTokenizer(preserve_case=False, strip_handles=True,
                           reduce_len=True)
    tweet_tokenized = token.tokenize(tweet)

    # stop words and punctuation
    stopwords_english = stopwords.words('english')
    tweet_processed = []
    for word in tweet_tokenized:
        if (word not in stopwords_english and
                word not in string.punctuation):
            tweet_processed.append(word)

    # stemming
    tweet_stem = []
    stem = PorterStemmer()
    for word in tweet_processed:
        stem_word = stem.stem(word)
        tweet_stem.append(stem_word)

    return tweet_stem

Example input and output:
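A quick sketch (the sample tweet is made up; the comment shows what the steps above should produce):

sample = "RT @bob: I loved the #NLP course! https://t.co/abc123"
print(preprocess_tweet(sample))
# expected: ['bob', 'love', 'nlp', 'cours']
# ('bob' survives because this version deletes '@' before tokenizing,
#  so strip_handles no longer sees a handle)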


3zwjbxry6#

This should get you all the way there.

import re
import string
import numpy as np

from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import TweetTokenizer

def process_tweet(tweet):
    """Process tweet function.
    Input:
        tweet: a string containing a tweet
    Output:
        tweets_clean: a list of words containing the processed tweet
    """
    stemmer = PorterStemmer()
    stopwords_english = stopwords.words('english')
    # remove stock market tickers like $GE
    tweet = re.sub(r'\$\w*', '', tweet)
    # remove old style retweet text "RT"
    tweet = re.sub(r'^RT[\s]+', '', tweet)
    # remove hyperlinks
    tweet = re.sub(r'https?:\/\/.*[\r\n]*', '', tweet)
    # remove hashtags
    # only removing the hash # sign from the word
    tweet = re.sub(r'#', '', tweet)
    # tokenize tweets
    tokenizer = TweetTokenizer(preserve_case=False, strip_handles=True,
                               reduce_len=True)
    tweet_tokens = tokenizer.tokenize(tweet)

    tweets_clean = []
    for word in tweet_tokens:
        if (word not in stopwords_english and  # remove stopwords
                word not in string.punctuation):  # remove punctuation
            # tweets_clean.append(word)
            stem_word = stemmer.stem(word)  # stemming word
            tweets_clean.append(stem_word)

    return tweets_clean

def build_freqs(tweets, ys):
    """Build frequencies.
    Input:
        tweets: a list of tweets
        ys: an m x 1 array with the sentiment label of each tweet
            (either 0 or 1)
    Output:
        freqs: a dictionary mapping each (word, sentiment) pair to its
        frequency
    """
    # Convert np array to list since zip needs an iterable.
    # The squeeze is necessary or the list ends up with one element.
    # Also note that this is just a NOP if ys is already a list.
    yslist = np.squeeze(ys).tolist()

    # Start with an empty dictionary and populate it by looping over all tweets
    # and over all processed words in each tweet.
    freqs = {}
    for y, tweet in zip(yslist, tweets):
        for word in process_tweet(tweet):
            pair = (word, y)
            if pair in freqs:
                freqs[pair] += 1
            else:
                freqs[pair] = 1

    return freqs

All the utility functions you need are above.
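A quick usage sketch (the sample tweets and labels are made up):

tweets = ['i am happy :)', 'i am sad :(']
ys = np.array([[1], [0]])   # 1 = positive, 0 = negative

freqs = build_freqs(tweets, ys)
print(freqs)
# expected: {('happi', 1): 1, (':)', 1): 1, ('sad', 0): 1, (':(', 0): 1}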
