How to write Arabic to a CSV file

5jvtdoz2  posted on 2022-12-06  in  Other

I'm trying to extract tweets with Python and store them in a CSV file, but I can't seem to include all languages. Arabic shows up as special characters.

import csv

def recup_all_tweets(screen_name, api):
    all_tweets = []
    new_tweets = api.user_timeline(screen_name, count=300)
    all_tweets.extend(new_tweets)
    # outtweets = [[tweet.id_str, tweet.created_at, tweet.text, tweet.retweet_count, get_hashtagslist(tweet.text)] for tweet in all_tweets]
    outtweets = [[tweet.text, tweet.entities['hashtags']] for tweet in all_tweets]
    # with open('recup_all_tweets.json', 'w', encoding='utf-8') as f:
    #     f.write(json.dumps(outtweets, indent=4, sort_keys=True))

    # Plain UTF-8 without a BOM: this is the version that shows the problem
    with open('recup_all_tweets.csv', 'w', encoding='utf-8') as f:
        writer = csv.writer(f, delimiter=',')
        writer.writerow(["text", "tag"])
        writer.writerows(outtweets)

    return outtweets

cqoc49vn #1

An example of writing CSV and JSON:

#coding:utf8
import csv
import json

s = ['عربى','عربى','عربى']

with open('output.csv','w',encoding='utf-8-sig',newline='') as f:
    r = csv.writer(f)
    r.writerow(['header1','header2','header3'])
    r.writerow(s)

with open('output.json','w',encoding='utf8') as f:
    json.dump(s,f,ensure_ascii=False)

output.csv:

header1,header2,header3
عربى,عربى,عربى

[Screenshot: output.csv viewed in Excel]

output.json:

["عربى", "عربى", "عربى"]

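Note that Microsoft Excel needs utf-8-sig to read a UTF-8 file correctly. Other applications may or may not need it to display the file properly. Many Windows applications require a UTF-8 "BOM" signature at the start of a text file and otherwise assume an ANSI encoding. The ANSI encoding varies with the localized version of Windows in use.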
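
To make the difference concrete, here is a minimal sketch that writes the same row with utf-8 and with utf-8-sig and then checks the first three bytes of each file. The file names and the helper functions write_csv and starts_with_bom are made up for this illustration; the files go into a temporary directory.

```python
import csv
import os
import tempfile

row = ['عربى', 'عربى', 'عربى']
BOM = b'\xef\xbb\xbf'  # the UTF-8 byte order mark ("signature")

def write_csv(path, encoding):
    # Hypothetical helper: write one row using the given text encoding.
    with open(path, 'w', encoding=encoding, newline='') as f:
        csv.writer(f).writerow(row)

def starts_with_bom(path):
    # Hypothetical helper: read the raw bytes and look for the signature.
    with open(path, 'rb') as f:
        return f.read(3) == BOM

tmp_dir = tempfile.mkdtemp()
plain_path = os.path.join(tmp_dir, 'plain.csv')  # illustrative file names
sig_path = os.path.join(tmp_dir, 'sig.csv')
write_csv(plain_path, 'utf-8')
write_csv(sig_path, 'utf-8-sig')

print(starts_with_bom(plain_path))  # False
print(starts_with_bom(sig_path))    # True

# Reading with utf-8-sig strips the signature again, so the data round-trips.
with open(sig_path, encoding='utf-8-sig', newline='') as f:
    read_back = next(csv.reader(f))
print(read_back == row)  # True
```

Apart from those three leading bytes the two files are identical, so switching the question's open() call to utf-8-sig should not change the data itself.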


jqjz2hbq #2

Maybe try f.write(json.dumps(outtweets, indent=4, sort_keys=True, ensure_ascii=False))
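
A small sketch of what ensure_ascii changes, using a made-up stand-in for outtweets (one tweet text with an empty hashtag list). By default json.dumps escapes every non-ASCII character; with ensure_ascii=False the Arabic text is written as-is.

```python
import json

# Hypothetical sample data shaped like the question's outtweets list.
outtweets = [['عربى', []]]

escaped = json.dumps(outtweets)                        # default: ensure_ascii=True
readable = json.dumps(outtweets, ensure_ascii=False)   # keeps the Arabic characters

print(escaped)   # [["\u0639\u0631\u0628\u0649", []]]
print(readable)  # [["عربى", []]]

# Both strings decode to the same data; only the on-disk form differs.
print(json.loads(escaped) == json.loads(readable))  # True
```

The escaped form is still valid JSON, so this option matters mainly when the file should be readable in a text editor.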


ojsjcaue #3

I searched a lot and ended up writing the following code:

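from time import sleep

import arabic_reshaper
import pandas as pd
from bidi.algorithm import get_display
from selenium.webdriver.common.by import By

# Assumption: `webdriver` is an already-initialized Selenium WebDriver
# instance with the profile page loaded (the setup was not part of the post).
itemsX = webdriver.find_elements(By.CLASS_NAME, "x1i10hfl")
item_linksX = [itemX.get_attribute("href") for itemX in itemsX]
# Keep only links to individual posts; skip elements that have no href.
item_linksX = filter(lambda k: k and '/p/' in k, item_linksX)
counter = 0
for item_linkX in item_linksX:
    AllComments2 = []
    counter = counter + 1
    webdriver.get(item_linkX)
    print(item_linkX)
    sleep(11)  # give the page time to load its comments
    comments = webdriver.find_elements(By.CLASS_NAME, "_aacl")
    for comment in comments:
        try:
            reshaped_text = arabic_reshaper.reshape(comment.text)
            bidi_text = get_display(reshaped_text)  # display-order form; computed but not saved
            AllComments2.append(reshaped_text)
        except Exception:
            pass  # skip comments that cannot be read or reshaped
    # One tab-separated UTF-16 file per post
    df = pd.DataFrame({'col': AllComments2})
    df.to_csv(r'C:\Crawler\Comments' + str(counter) + '.csv', sep='\t', encoding='utf-16')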

This code works very well for me, and I hope it helps anyone for whom the code in the previous posts didn't work.
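
The file-writing step of this answer (tab-separated, UTF-16) can be tried on its own without Selenium or pandas. The following is only a sketch using the standard library's csv module; the sample comments and the file name are made up, and whether a given spreadsheet application opens the result correctly is something to check locally.

```python
import csv
import os
import tempfile

# Hypothetical sample comments standing in for the scraped text.
comments = ['مرحبا', 'عربى']

path = os.path.join(tempfile.mkdtemp(), 'comments.csv')  # illustrative path

# Python's 'utf-16' codec writes a byte order mark at the start of the file.
with open(path, 'w', encoding='utf-16', newline='') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerow(['col'])
    writer.writerows([c] for c in comments)

# Check the first two raw bytes for a UTF-16 byte order mark.
with open(path, 'rb') as f:
    head = f.read(2)
print(head in (b'\xff\xfe', b'\xfe\xff'))  # True

# Reading the file back with the same settings returns the original text.
with open(path, encoding='utf-16', newline='') as f:
    rows = list(csv.reader(f, delimiter='\t'))
print(rows == [['col']] + [[c] for c in comments])  # True
```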
