How to read a CSV from a zip file in Python?

Posted by yqlxgs2m on 2023-11-20 in Python

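I'm trying to read a CSV that is inside a zip file. My task is to read the file rad_15min.csv, but when I read the zip file (I copied the link address from the download button), I get an error:

Code: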

import pandas as pd
df = pd.read_csv('https://www.kaggle.com/datasets/lucafrance/bike-traffic-in-munich/download?datasetVersionNumber=7')

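Error: ParserError: Error tokenizing data. C error: Expected 1 fields in line 9, saw 2

Data: https://www.kaggle.com/datasets/lucafrance/bike-traffic-in-munich
Zip file link: https://www.kaggle.com/datasets/lucafrance/bike-traffic-in-munich/download?datasetVersionNumber=7
I need to read this CSV dynamically. I don't want to download it; I just want to use the download link and read the CSV on the fly. Is there another approach I can try?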

2lpgd968 (answer 1):

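For me, the link redirects to an HTML page instead of downloading the file. Why not use the official Kaggle API? (You need to set up an API token first.)
This is what I tried: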

import csv
import requests

url = 'https://www.kaggle.com/datasets/lucafrance/bike-traffic-in-munich/download?datasetVersionNumber=7'

# Open the URL and create a response object
response = requests.get(url)

# Create a CSV reader object
csv_reader = csv.reader(response.iter_lines(decode_unicode=True), delimiter=',')

# Iterate over each row in the CSV file
for row in csv_reader:
    # Process each row as needed
    print(row)

The result I got:

[]
[]
['<!DOCTYPE html>']
['<html lang="en">']
[]
['<head>']
['  <title>Bike Traffic in Munich | Kaggle</title>']
['  <meta charset="utf-8" />']
['    <meta name="robots" content="index', ' follow" />']
['  <meta name="description" content="Bike traffic measured over time at different stations in Munich." />']
['  <meta name="turbolinks-cache-control" content="no-cache" />']
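That output is Kaggle's HTML page, not the archive. Parsed as CSV, its line 9 (the robots meta tag, which contains a comma) most likely explains the "Expected 1 fields in line 9, saw 2" error in the question. Below is a minimal sketch of a sanity check to run on response.content before parsing. It assumes the payload is either a ZIP archive or an HTML page. The helper names looks_like_zip and looks_like_html are hypothetical, and the HTML test is only a prefix heuristic:

```python
import zipfile
from io import BytesIO


def looks_like_zip(content: bytes) -> bool:
    """Hypothetical helper: True if the bytes form a readable ZIP archive.

    zipfile.is_zipfile accepts a file-like object, so the bytes are wrapped
    in BytesIO instead of being written to disk.
    """
    return zipfile.is_zipfile(BytesIO(content))


def looks_like_html(content: bytes) -> bool:
    """Hypothetical helper: heuristic check for an HTML page.

    Leading whitespace (the blank lines seen above) is skipped, then the
    start of the payload is compared case-insensitively with a doctype or
    <html> tag.
    """
    head = content.lstrip()[:15].lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")
```

For a response like the one above, looks_like_html(response.content) would be expected to return True and looks_like_zip(response.content) False. That suggests stopping with a clear message instead of handing the page to read_csv.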

hlswsv35 (answer 2):

  • I tried using the Kaggle API, but I don't want to download the data, just read it dynamically.
  • I only want to read one file, rad15_min.csv, from the zip, with pandas.

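You can try sending the request with the __Host-KAGGLEID cookie.
I'm not sure there is an easy way to obtain this cookie programmatically, but you can hardcode it. Press Ctrl+Shift+I to open the browser's developer tools, go to Application > Cookies, and copy the relevant cookie (make sure you are logged in to Kaggle first).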

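from io import BytesIO
from zipfile import ZipFile

import pandas as pd
import requests

url = ("https://www.kaggle.com/datasets/"
       "lucafrance/bike-traffic-in-munich/"
       "download?datasetVersionNumber=7")

# Session cookie copied from the browser's developer tools (value truncated here)
cookies = {"__Host-KAGGLEID": "CfDJ8IPkmlRqhQhDn1PidxljKKQWcrozwJuFfsIn..."}

response = requests.get(url, cookies=cookies)
response.raise_for_status()  # fail fast on an HTTP error status

# Open the archive in memory and read only the member that is needed
with ZipFile(BytesIO(response.content)) as zf:
    df = pd.read_csv(zf.open("rad_15min.csv"))  # not rad15_min.csv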
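The member name must match exactly (rad_15min.csv, not rad15_min.csv), so listing the archive with ZipFile.namelist() first can save some guessing. Below is a minimal sketch built around a hypothetical helper, read_member_csv. It matches on the base name, so a member stored inside a folder in the archive is still found. When nothing matches, it reports the available names:

```python
import posixpath
import zipfile
from io import BytesIO

import pandas as pd


def read_member_csv(zip_bytes: bytes, filename: str) -> pd.DataFrame:
    """Hypothetical helper: read one CSV member of an in-memory ZIP.

    ZIP member names always use "/" as the separator, so posixpath.basename
    gives the file name even for members stored in sub-folders.
    """
    with zipfile.ZipFile(BytesIO(zip_bytes)) as zf:
        names = zf.namelist()
        matches = [n for n in names if posixpath.basename(n) == filename]
        if not matches:
            raise KeyError(f"{filename!r} not in archive; members: {names}")
        # zf.open returns a binary file object that pandas can read directly
        with zf.open(matches[0]) as fh:
            return pd.read_csv(fh)
```

With the download from the answer, this would be called as read_member_csv(response.content, "rad_15min.csv"). Passing the misspelled name raises a KeyError that lists the actual members.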

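Note: if the zip contains only one CSV, or the dataset is not an archive at all (i.e. a single CSV), you can pass BytesIO(response.content) directly to read_csv (for a single-file zip, also pass compression="zip").
Output: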

print(df)

              datum uhrzeit_start  ... richtung_2 gesamt
0        2017.01.01         00:00  ...          0      0
1        2017.01.01         00:00  ...          0      0
2        2017.01.01         00:00  ...          0      0
3        2017.01.01         00:00  ...          0      0
4        2017.01.01         00:00  ...          0      0
...             ...           ...  ...        ...    ...
1255761  2022.12.31         23:45  ...          2      7
1255762  2022.12.31         23:45  ...          0      0
1255763  2022.12.31         23:45  ...          0      0
1255764  2022.12.31         23:45  ...          0      0
1255765  2022.12.31         23:45  ...          5     17

[1255766 rows x 7 columns]
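The note above covers the single-file case. The sketch below illustrates it by building a one-member ZIP in memory as a stand-in for response.content (the CSV contents are made up) and reading it with pd.read_csv. It passes compression="zip" explicitly. The pandas documentation describes compression="infer" as detecting the format from a path's extension, and a BytesIO buffer has none. With "zip", pandas expects exactly one data file in the archive:

```python
import zipfile
from io import BytesIO

import pandas as pd

# Build a single-member ZIP in memory as a stand-in for response.content
# (the CSV contents here are made up for illustration).
buf = BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("only.csv", "a,b\n1,2\n3,4\n")
zip_bytes = buf.getvalue()

# A buffer has no file extension for compression="infer" to inspect, so the
# archive format is named explicitly; the zip must hold exactly one file.
df = pd.read_csv(BytesIO(zip_bytes), compression="zip")
print(df)
```

If the archive held several files, pandas would refuse to pick one. In that case, opening the member through ZipFile, as in the answer, is the way to go.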
