Problem reading a CSV file from a URL with pandas.read_csv

Asked by dgjrabp2 on 2022-12-06

I am trying to import a CSV file from the following URL

"https://www.marketwatch.com/games/stackoverflowq/download?view=holdings&pub=4JwsLs_Gm4kj&isDownload=true"

using the pandas read_csv function. However, I get the following error:

StopIteration: 

The above exception was the direct cause of the following exception:
...
--> 386         raise EmptyDataError("No columns to parse from file") from err
    388     line = self.names[:]
    390 this_columns: list[Scalar | None] = []

EmptyDataError: No columns to parse from file
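The error message is itself a clue: pandas raises EmptyDataError when the input contains no content at all, not when the content is badly formatted. The minimal sketch below shows that an empty body reproduces exactly this exception, while the same call on real CSV text succeeds. It assumes the server answered the unauthenticated request with an empty body; the helper name `try_parse` and the sample columns are hypothetical.

```python
from io import StringIO

import pandas as pd
from pandas.errors import EmptyDataError


def try_parse(text: str):
    """Return a DataFrame, or None if pandas finds nothing to parse (hypothetical helper)."""
    try:
        return pd.read_csv(StringIO(text))
    except EmptyDataError as err:
        print(f"EmptyDataError: {err}")
        return None


# An empty body (assumed to be what the download URL returns without a login) fails...
empty = try_parse("")
# ...while the same call on actual CSV text parses normally.
ok = try_parse("Symbol,Shares\nAAPL,10\n")
```

If the empty case matches what you see, check the raw HTTP response before calling read_csv.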

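Downloading the CSV manually and then reading it with pd.read_csv gives the expected output without any problems. Since I need to repeat this for several CSV files, I would like to import them directly instead of downloading each one by hand.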
I also tried [this solution](https://stackoverflow.com/questions/47243024/pandas-read-csv-on-dynamic-url-gives-emptydataerror-no-columns-to-parse-from-fi), but it also results in the "No columns to parse from file" error.
The only link I could find, both in the HTML and behind the button on the site, does not end in .csv:

<a href="/games/stackoverflowq/download?view=holdings&amp;pub=4JwsLs_Gm4kj&amp;isDownload=true" download="Holdings - Stack Overflowq.csv" rel="nofollow">Download</a>
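Decoding the HTML entities in that href (`&amp;` is an escaped `&`) and resolving it against the site's origin gives exactly the URL quoted at the top of the question. So the missing .csv suffix is unlikely to be the cause: pd.read_csv does not require a URL to end in .csv, and the extension only matters when `compression="infer"` guesses a compression format such as .gz or .zip. A small sketch using only the standard library:

```python
from html import unescape
from urllib.parse import urljoin

# href copied from the Download button's <a> tag; "&amp;" is the HTML-escaped "&"
href = "/games/stackoverflowq/download?view=holdings&amp;pub=4JwsLs_Gm4kj&amp;isDownload=true"

# Decode the entities, then resolve the relative path against the site origin
url = urljoin("https://www.marketwatch.com", unescape(href))
print(url)
# prints https://www.marketwatch.com/games/stackoverflowq/download?view=holdings&pub=4JwsLs_Gm4kj&isDownload=true
```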

Edit: cleaned up the question in case anyone has a similar problem.

Answer by rxztt3cl

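The problem was indeed that the data is only accessible after logging in. I managed to solve it using Selenium and this answer: log in with Selenium, then copy the browser's user agent and cookies into a requests session and download the CSV with that session.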

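from io import StringIO
import pandas as pd
import requests
from selenium import webdriver

url = "https://www.marketwatch.com/games/stackoverflowq/download?view=holdings&pub=4JwsLs_Gm4kj&isDownload=true"

# open a browser and log in to MarketWatch (manually or with further Selenium steps)
driver = webdriver.Chrome()
driver.get("https://www.marketwatch.com/")
input("Log in in the Selenium browser window, then press Enter...")

# start a requests session that uses the Selenium driver's user agent
s = requests.Session()
selenium_user_agent = driver.execute_script("return navigator.userAgent;")
s.headers.update({"user-agent": selenium_user_agent})

# copy the login cookies from the Selenium driver into the session
for cookie in driver.get_cookies():
    s.cookies.set(cookie['name'], cookie['value'], domain=cookie['domain'])

# download the CSV with the logged-in session and parse it from memory
response = s.get(url)
if response.ok:
    data = response.content.decode('utf8')
    df = pd.read_csv(StringIO(data))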
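Because the question needs this for several CSV files, the logged-in session `s` can be reused for every download instead of logging in each time. The sketch below assumes all the URLs are behind the same login; the function name `read_csvs` is hypothetical, and `session` is expected to be a requests.Session (or anything with a compatible `.get`). It also raises a clearer error on an empty body, the symptom behind EmptyDataError.

```python
from io import StringIO
from typing import Any, Dict, Iterable

import pandas as pd


def read_csvs(session: Any, urls: Iterable[str]) -> Dict[str, pd.DataFrame]:
    """Download each URL with an already logged-in session and parse it as CSV (hypothetical helper)."""
    frames: Dict[str, pd.DataFrame] = {}
    for url in urls:
        response = session.get(url)
        # fail loudly on 4xx/5xx instead of trying to parse an error page
        response.raise_for_status()
        text = response.content.decode("utf8")
        if not text.strip():
            # an empty body is what makes read_csv raise EmptyDataError
            raise ValueError(f"empty response from {url}; is the session still logged in?")
        frames[url] = pd.read_csv(StringIO(text))
    return frames
```

Usage with the session from the answer would look like `frames = read_csvs(s, [url1, url2])`, where `url1` and `url2` stand for your own download links.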
