Unable to parse JSON-format data into CSV in Python

Posted by s2j5cfk0 on 2023-09-28 in Python

The data fetched by this code does not come out as properly formatted CSV.

import requests
import csv

def Download_data():
    s = requests.Session()
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36'}
    s.headers.update(headers)
    resp = s.get('https://www.nseindia.com/market-data/live-equity-market')
    resp.raise_for_status()
    resp = s.get('https://www.nseindia.com/api/equity-stockIndices?csv=true&index=NIFTY%2050')
    resp.raise_for_status()
    data_79 = resp.text
    data_79 = resp.text.replace('","', '')
    with open('___N50__.csv', 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerows([line.split(',') for line in data_79.splitlines()])

if __name__ == "__main__":
    Download_data()

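The headers end up spread across rows instead of each sitting at the top of its own column.
After a lot of studying I managed to get as far as this code, but from my perspective I still don't know enough to finish it. Please help!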

63lcw9qa1#

One possibility is to skip the header rows and add the column names explicitly:

from io import StringIO

import pandas as pd
import requests

def download_data():
    url = "https://www.nseindia.com/api/equity-stockIndices?csv=true&index=NIFTY%2050"
    headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36"}

    with requests.Session() as session:
        # Visit the market page first so the session can pick up any cookies the site sets.
        session.get(url="https://www.nseindia.com/market-data/live-equity-market", headers=headers)
        response = session.get(url=url, headers=headers)
    # Raise an exception on any 4xx/5xx response.
    response.raise_for_status()

    df = pd.read_csv(filepath_or_buffer=StringIO(initial_value=response.text), skiprows=13, header=None,
                     names=["SYMBOL", "OPEN", "HIGH", "LOW", "PREV. CLOSE", "LTP",
                            "CHNG", "%CHNG", "VOLUME", "52W H", "52W L",
                            "30 D %CHNG", "365 D % CHNG", "TODAY"])
    df.to_csv(path_or_buf="/path/to/file/N50.csv", index=False)

if __name__ == "__main__":
    download_data()
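
The same "skip the header, supply the names yourself" idea can be tried offline with only the standard library. This is a minimal sketch on a made-up sample, not the real NSE response: `SAMPLE`, `COLUMN_NAMES`, `HEADER_LINES`, `replace_header` and `to_csv_text` are all hypothetical names, and the number of header lines to drop is an assumption that has to be checked against the actual data.

```python
import csv
import io

# Made-up sample: the header is broken across three physical lines,
# followed by two ordinary data rows.
SAMPLE = '"SYMBOL \n","OPEN \n","HIGH"\n"ABC","10.5","11.0"\n"XYZ","20.0","21.5"\n'

COLUMN_NAMES = ["SYMBOL", "OPEN", "HIGH"]
HEADER_LINES = 3  # physical lines occupied by the broken header in SAMPLE


def replace_header(text, names, header_lines):
    """Drop the first `header_lines` physical lines and prepend `names`."""
    data_lines = text.splitlines()[header_lines:]
    rows = [list(names)]
    rows.extend(csv.reader(data_lines))
    return rows


def to_csv_text(rows):
    """Serialize rows back to CSV text with plain "\n" line endings."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = replace_header(SAMPLE, COLUMN_NAMES, HEADER_LINES)
print(to_csv_text(rows))
```

The trade-off is the same as with `skiprows` in the pandas version: the hard-coded line count silently breaks if the site ever changes its header layout.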

o4tp2gmn2#

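Have you looked at the data in a text editor? resp.text appears to start with a BOM, and every field in the header ends with a linefeed. IMHO the data needs some cleaning: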

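import re

# ... (inside Download_data(), after resp has been fetched)

    # strip BOM: drop everything before the first double quote
    data_79 = re.sub(r'\A[^"]+"', '"', resp.text, count=1)

    # strip unwanted linefeeds: those not directly preceded by a double quote,
    # i.e. the ones inside the header fields
    data_79 = re.sub(r'([^"])\n', r'\1', data_79)

    # save the data in a file
    with open('___N50__.csv', 'w', newline='') as file:
        file.write(data_79)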
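
As an alternative to the regular expressions, the `csv` module itself copes with linefeeds inside quoted fields, so the cleanup can happen after parsing. This is only a sketch on a made-up sample: `SAMPLE` and `clean_rows` are hypothetical, and it assumes the BOM arrives as the single character `\ufeff` (if the response was decoded with the wrong encoding, the BOM shows up as other characters and this will not remove it).

```python
import csv
import io

# Made-up sample mimicking the described response: a leading BOM and
# header fields that each end with a linefeed inside the quotes.
SAMPLE = '\ufeff"SYMBOL \n","OPEN \n","HIGH \n"\n"ABC","10.5","11.0"\n'


def clean_rows(text):
    """Parse quoted CSV text, dropping a leading BOM and tidying the header."""
    text = text.lstrip("\ufeff")
    # newline="" passes linefeeds inside quoted fields through to csv.reader.
    reader = csv.reader(io.StringIO(text, newline=""))
    rows = list(reader)
    if rows:
        # Remove the stray spaces and linefeeds around each header name.
        rows[0] = [field.strip() for field in rows[0]]
    return rows


rows = clean_rows(SAMPLE)
print(rows)
```

The resulting rows can then be written out with `csv.writer(...).writerows(rows)`, as in the question's code, without splitting lines on commas by hand.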

rseugnpd3#

If the URL you are accessing serves a valid CSV file, you can read it straight into a pandas DataFrame and then save it to your local machine, like this:

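import io

import pandas as pd
import requests

url = "https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
# Download the raw bytes, decode them, and let pandas parse the text.
raw = requests.get(url).content
df = pd.read_csv(io.StringIO(raw.decode("utf-8")))
# Save a local copy (adjust the path for your machine).
df.to_csv(r"C:\users\123\Downloads\countries.csv")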
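
For readers without pandas, the same bytes-to-text-to-file round trip can be sketched with the standard library. Everything here is illustrative: `RAW` is a made-up stand-in for `requests.get(url).content`, `save_csv_bytes` is a hypothetical helper, and the output goes to a temporary directory rather than a real download folder.

```python
import csv
import io
import os
import tempfile

# Made-up stand-in for the downloaded bytes (requests.get(url).content).
RAW = b"Country,Region\nAlgeria,AFRICA\nAngola,AFRICA\n"


def save_csv_bytes(raw, path):
    """Decode CSV bytes, parse them, and write the rows to `path`."""
    text = raw.decode("utf-8")
    rows = list(csv.reader(io.StringIO(text, newline="")))
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
    return rows


out_path = os.path.join(tempfile.mkdtemp(), "countries.csv")
rows = save_csv_bytes(RAW, out_path)
print(rows)
```

This only works when the payload really is well-formed CSV, which is the same precondition the answer states.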
