Using $limit and $offset to fetch more than 1,000 rows of JSON from an API

Posted by qv7cva1a on 2023-05-02 in Other

I am using the following Python code to pull data through an API:

response = requests.get('https://healthdata.gov/resource/uqq2-txqb.json')

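The dataset contains 434,865 rows, but the API only returns the first 1,000. I saw in another question that $limit can be used to get the first 50,000 rows, but how do I combine it with $offset to get all 434,865 rows?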

I figured out how to use $offset and ended up with the code below. Is there a way to condense it?

response1 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000')
response2 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=50001')
response3 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=100002')
response4 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=150003')
response5 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=200004')
response6 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=250005')
response7 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=300006')
response8 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=350007')
response9 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=400008')

dzhpxtsq1#

This is called paging, and it is documented here: https://dev.socrata.com/docs/paging.html
That page also describes two versions of the API:

  • v2.0, where $limit can be at most 50,000
  • v2.1, where $limit is unlimited

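The endpoint you are using appears to support v2.1, at least according to https://dev.socrata.com/foundry/healthdata.gov/uqq2-txqb, so you should be able to pass a large value for $limit and retrieve the whole set in one request.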
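As a rough sketch of the single-request approach, assuming the endpoint really does accept an arbitrarily large $limit: the helper names `single_request_url` and `fetch_all_rows`, the limit of 500,000, and the timeout are illustrative choices, not anything the API prescribes.

```python
from urllib.parse import urlencode

BASE_URL = "https://healthdata.gov/resource/uqq2-txqb.json"


def single_request_url(limit, base_url=BASE_URL):
    """Build a URL that asks for up to `limit` rows in one response."""
    # safe="$" keeps the "$" in "$limit" from being percent-encoded
    return f"{base_url}?{urlencode({'$limit': limit}, safe='$')}"


def fetch_all_rows(limit=500_000):
    """Fetch the dataset in one request (assumes limit >= number of rows)."""
    # Imported here so the URL helper above works without requests installed
    import requests

    response = requests.get(single_request_url(limit), timeout=120)
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    print(single_request_url(500_000))
```

If the response still comes back capped, the endpoint is probably not honoring the larger limit, and paging with $offset is the fallback.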
If you do page through the results, note that $offset is 0-based, so your queries should be rewritten as:

response1 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000')
response2 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=50000')
response3 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=100000')
response4 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=150000')

Note that the offsets are aligned on multiples of $limit.
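The aligned offsets do not need to be typed out by hand. A minimal sketch, assuming the row count (434,865, from the question) is known up front; `page_offsets` and `page_urls` are hypothetical helper names:

```python
BASE_URL = "https://healthdata.gov/resource/uqq2-txqb.json"


def page_offsets(total_rows, limit):
    """Return the 0-based offsets, each a multiple of `limit`, that cover total_rows."""
    return list(range(0, total_rows, limit))


def page_urls(base_url, total_rows, limit):
    """Build one query URL per page."""
    return [
        f"{base_url}?$limit={limit}&$offset={offset}"
        for offset in page_offsets(total_rows, limit)
    ]


if __name__ == "__main__":
    for url in page_urls(BASE_URL, 434_865, 50_000):
        print(url)
```

With 434,865 rows and a limit of 50,000 this gives nine pages, the last one starting at offset 400000.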


qij5mzcb3#

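import requests

# Endpoint from the question
url = 'https://healthdata.gov/resource/uqq2-txqb.json'
# Assumption: no extra headers are needed; put an app token here if you have one
headers = {}

# Initialize variables for pagination
limit = 50000
offset = 0
data = []

while True:
    # Set query parameters
    params = {
        '$limit': limit,
        '$offset': offset
    }

    # Make a GET request to the API endpoint with the query parameters
    response = requests.get(url, headers=headers, params=params)
    response.raise_for_status()

    # The original snippet stops at the request; the rest is an assumed completion
    page = response.json()
    data.extend(page)

    # A page shorter than the limit means there is nothing left to fetch
    if len(page) < limit:
        break
    # Advance to the next page, keeping the offset a multiple of the limit
    offset += limit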
