How to extract data from a field in JSON Lines format and store it as text in a new file in Python

Posted by 2ledvvac on 2023-01-29 in Python

I have a JSON file that looks like this:

{"reviewerID": "A11N155CW1UV02", "asin": "B000H00VBQ", "reviewerName": "AdrianaM", "helpful": [0, 0], "reviewText": "I had big expectations because I love English TV, in particular Investigative and detective stuff but this guy is really boring. It didn't appeal to me at all.", "overall": 2.0, "summary": "A little bit boring for me", "unixReviewTime": 1399075200, "reviewTime": "05 3, 2014"}
{"reviewerID": "A3BC8O2KCL29V2", "asin": "B000H00VBQ", "reviewerName": "Carol T", "helpful": [0, 0], "reviewText": "I highly recommend this series. It is a must for anyone who is yearning to watch \"grown up\" television. Complex characters and plots to keep one totally involved. Thank you Amazin Prime.", "overall": 5.0, "summary": "Excellent Grown Up TV", "unixReviewTime": 1346630400, "reviewTime": "09 3, 2012"}
{"reviewerID": "A60D5HQFOTSOM", "asin": "B000H00VBQ", "reviewerName": "Daniel Cooper \"dancoopermedia\"", "helpful": [0, 1], "reviewText": "This one is a real snoozer. Don't believe anything you read or hear, it's awful. I had no idea what the title means. Neither will you.", "overall": 1.0, "summary": "Way too boring for me", "unixReviewTime": 1381881600, "reviewTime": "10 16, 2013"}

I need to extract the data from the "summary" and "reviewText" fields and store it in two new files for further analysis, such as tokenization.
Here is what I'm trying:

import json
rt = open("review.txt", "a") #creates new file for storage
su = open("summary.txt", "a")

with open("/Users/anano/Desktop/MAXWELL/SPRING/NLP/Amazon_Instant_Video_5.json") as json_file:
    for line in json_file: #runs the loop to extract info
        data = json.loads(line)
        rt.write(data['reviewText'])
        su.write(data['summary'])
    rt.close()
    su.close()

Because the sentences in summary don't end with a period, all of the strings are saved as one run-on sentence, like this:

A little bit boring for meExcellent Grown Up TVWay too boring for meRobson Green is mesmerizing

This makes tokenization impossible. How can I fix this?

python

Source: https://stackoverflow.com/questions/75264433/how-to-extract-data-from-field-in-json-line-format-and-store-it-in-a-new-file-in

1 answer

djp7away1#

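All you need to do is append \n to the end of each string before writing it. (\n is an escape sequence that represents a newline character; write() does not add one for you.)
So your code would look like this: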

import json

# Open both output files in append mode; the with block closes them automatically
with open("review.txt", "a") as rt, open("summary.txt", "a") as su:
    with open("/Users/anano/Desktop/MAXWELL/SPRING/NLP/Amazon_Instant_Video_5.json") as json_file:
        for line in json_file:  # each line holds one JSON object
            data = json.loads(line)
            # write() adds no line break, so append '\n' explicitly
            rt.write(data['reviewText'] + '\n')
            su.write(data['summary'] + '\n')
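
As a more general variant of the same idea, the sketch below wraps the extraction in a function. The name `extract_fields`, its parameters, and the `<field>.txt` output naming are hypothetical and not part of the original answer. It assumes the input is UTF-8, opens the outputs in "w" mode (so re-running overwrites rather than appends), skips blank lines, and collapses any line breaks inside a field into spaces so that each record occupies exactly one line. Adjust those choices if your data needs something different.

```python
import json
from contextlib import ExitStack
from pathlib import Path


def extract_fields(src_path, fields, out_dir, encoding="utf-8"):
    """Write each requested field of a JSON Lines file to <out_dir>/<field>.txt.

    One record per output line. Returns the number of records processed.
    (Hypothetical helper for illustration.)
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    # ExitStack closes every opened file, even if an error occurs midway
    with ExitStack() as stack:
        outputs = {
            field: stack.enter_context(
                open(out_dir / f"{field}.txt", "w", encoding=encoding)
            )
            for field in fields
        }
        src = stack.enter_context(open(src_path, encoding=encoding))
        for line in src:
            line = line.strip()
            if not line:  # skip blank lines
                continue
            record = json.loads(line)
            for field, out in outputs.items():
                # A missing field becomes an empty line; embedded whitespace
                # (including newlines) is collapsed to single spaces
                text = " ".join(str(record.get(field, "")).split())
                out.write(text + "\n")
            count += 1
    return count
```

For the file in the question, a call might look like `extract_fields("Amazon_Instant_Video_5.json", ["summary", "reviewText"], "out")`, which would write `out/summary.txt` and `out/reviewText.txt`. Each file can then be read back line by line for tokenization.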

Answered on 2023-01-29
