Sending a large JSON file to the HubSpot API in batches

xfb7svmp  posted on 2022-12-01  in Other

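I have tried many approaches, tested many scenarios, and done a lot of research, but I cannot find the problem or a solution.
My requirement: the HubSpot API accepts only 15k records per request. We have a large JSON file, so we need to split it into batches of 15k records and send each batch to the API. After a batch of 15k has been posted, the script should sleep for 10 seconds and capture the response, and the process should continue until all records have been sent.
I tried chunking the data and using the modulo operator, but got no response.
I am not sure whether the code below works. Can anyone suggest a better way?
How do I send the batches to the HubSpot API, and how do I post them?
Thanks in advance, this would be a great help!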

import requests
import urllib3

with open(r'D:\Users\lakshmi.vijaya\Desktop\Invalidemail\allhubusers_data.json', 'r') as run:
    dict_run = run.readlines()
    dict_ready = ''.join(dict_run)
    count = 1000
    # Note: this slices the raw text every 1000 characters, not every N records
    subsets = (dict_ready[x:x + count] for x in range(0, len(dict_ready), count))
    url = 'https://api.hubapi.com/contacts/v1/contact/batch'
    headers = {'Authorization': "Bearer pat-na1-**************************", 'Accept': 'application/json', 'Content-Type': 'application/json', 'Transfer-encoding': 'chunked'}
    for subset in subsets:
        # print(subset)
        urllib3.disable_warnings()
        r = requests.post(url, data=subset, headers=headers, verify=False,
                          timeout=(15, 20), stream=True)
        print(r.status_code)
        print(r.content)

Error: 400 b'\r\n400 Bad Request\r\n\r\n400 Bad Request\r\n\r\ncloudflare\r\n\r\n\r\n'
Here is another approach:

import requests
import urllib3

with open(r'D:\Users\lakshmi.vijaya\Desktop\Invalidemail\allhubusers_data.json', 'r') as run:
    dict_run = run.readlines()
    dict_ready = ''.join(dict_run)
    url = 'https://api.hubapi.com/contacts/v1/contact/batch'
    headers = {'Authorization': "Bearer pat-na1***********-", 'Accept': 'application/json', 'Content-Type': 'application/json', 'Transfer-encoding': 'chunked'}

    urllib3.disable_warnings()
    # The whole file is posted as a single request body
    r = requests.post(url, data=dict_ready, headers=headers, verify=False,
                      timeout=(15, 20), stream=True)
    # iter_content() returns a generator that is never consumed, so this line has no effect
    r.iter_content(chunk_size=1000000)
    print(r.status_code)
    print(r.content)

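Error: raise SSLError(e, request=request) requests.exceptions.SSLError: HTTPSConnectionPool(host='api.hubapi.com', port=443): Max retries exceeded with url: /contacts/v1/contact/batch (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2396)')))
This is what the data in the large JSON file looks like: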

{
    "email": "aaazaj21@yahoo.com",
    "properties": [
        {
            "property": "XlinkUserID",
            "value": 422211111
        },
        {
            "property": "register_time",
            "value": "2021-09-02"
        },
        {
            "property": "linked_alexa",
            "value": 1
        },
        {
            "property": "linked_googlehome",
            "value": 0
        },
        {
            "property": "fan_speed_switch_0x51_",
            "value": 2
        }
    ]
},
{
    "email": "zzz7@gmail.com",
    "properties": [
        {
            "property": "XlinkUserID",
            "value": 13333666
        },
        {
            "property": "register_time",
            "value": "2021-04-24"
        },
        {
            "property": "linked_alexa",
            "value": 1
        },
        {
            "property": "linked_googlehome",
            "value": 0
        },
        {
            "property": "full_colora19_st_0x06_",
            "value": 2
        }
    ]
}

I also tried wrapping the objects in a list:

[
{
    "email": "aaazaj21@yahoo.com",
    "properties": [
        {
            "property": "XlinkUserID",
            "value": 422211111
        },
        {
            "property": "register_time",
            "value": "2021-09-02"
        },
        {
            "property": "linked_alexa",
            "value": 1
        },
        {
            "property": "linked_googlehome",
            "value": 0
        },
        {
            "property": "fan_speed_switch_0x51_",
            "value": 2
        }
    ]
},
{
    "email": "zzz7@gmail.com",
    "properties": [
        {
            "property": "XlinkUserID",
            "value": 13333666
        },
        {
            "property": "register_time",
            "value": "2021-04-24"
        },
        {
            "property": "linked_alexa",
            "value": 1
        },
        {
            "property": "linked_googlehome",
            "value": 0
        },
        {
            "property": "full_colora19_st_0x06_",
            "value": 2
        }
    ]
}
]
kwvwclae #1

You haven't said whether your JSON file represents an array of objects or a single object. json.load converts an array into a Python list and an object into a Python dict.
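As a quick illustration of that conversion, here is a minimal sketch. It uses json.loads on inline strings instead of json.load on a file (both apply the same type mapping), and the e-mail addresses are made-up sample values:

```python
import json

# A top-level JSON array becomes a Python list
records = json.loads('[{"email": "a@example.com"}, {"email": "b@example.com"}]')
print(type(records).__name__, len(records))

# A top-level JSON object becomes a Python dict
record = json.loads('{"email": "a@example.com", "properties": []}')
print(type(record).__name__, list(record))
```
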
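The code below assumes it is an array of objects. If it is not, see https://stackoverflow.com/a/22878842/839338, though the same principle applies.
It also assumes you need 15k bytes rather than 15k records. If you mean the number of records, you can simplify the code and just pass 15000 as the second argument to chunk_list().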

import json
import math
import pprint
import sys

# See https://stackoverflow.com/a/312464/839338
def chunk_list(list_to_chunk, number_of_list_items):
    """Yield successive chunks of number_of_list_items items from the list."""
    for i in range(0, len(list_to_chunk), number_of_list_items):
        yield list_to_chunk[i:i + number_of_list_items]

with open('./allhubusers_data.json', 'r') as run:
    json_data = json.load(run)

if not isinstance(json_data, list):
    sys.exit("Data not list")
print("Found a list")

desired_size = 15000  # target size of each serialized sub-set, in characters
json_size = len(json.dumps(json_data))
number_of_subsets = math.ceil(json_size / desired_size)
# At least one item per sub-set, otherwise range() would get a step of 0
items_per_subset = max(1, len(json_data) // number_of_subsets)
print(f'{json_size=}')
print(f'Divide into {number_of_subsets} sub-sets')
print(f'Number of list items per subset = {items_per_subset}')

for sub_set in chunk_list(json_data, items_per_subset):
    pprint.pprint(sub_set)
    print(f'Length of sub-set {len(json.dumps(sub_set))}')
    # Do stuff with the sub sets...
    text_subset = json.dumps(sub_set)  # ...

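The sub-sets hold the same number of items but their serialized lengths can differ, so you may need to lower desired_size to keep every sub-set under the limit.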
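
If a hard size limit matters, one alternative to tuning desired_size by hand is to fill each sub-set greedily until the next record would push it over the limit. The following is only a sketch under stated assumptions: chunk_by_size and max_bytes are hypothetical names that do not appear in the code above, sizes are measured on the default json.dumps output, and a single record larger than the limit is still emitted on its own.

```python
import json

def chunk_by_size(records, max_bytes):
    """Yield lists of records whose json.dumps() encoding stays within max_bytes.

    A single record that is larger than max_bytes is yielded alone.
    """
    chunk = []
    size = 2  # the enclosing "[" and "]"
    for record in records:
        encoded = len(json.dumps(record).encode("utf-8"))
        # json.dumps separates list items with ", " (2 bytes)
        extra = encoded + (2 if chunk else 0)
        if chunk and size + extra > max_bytes:
            yield chunk
            chunk, size = [], 2
            extra = encoded  # first item of a new chunk needs no separator
        chunk.append(record)
        size += extra
    if chunk:
        yield chunk

# Made-up sample data, just to exercise the function
sample = [{"email": f"user{i}@example.com"} for i in range(10)]
for sub_set in chunk_by_size(sample, max_bytes=100):
    print(len(sub_set), len(json.dumps(sub_set)))
```

The trade-off is that sub-sets no longer contain a fixed number of records, which only matters if the API limit is expressed in records rather than bytes.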

Update based on the comments: if you just need 15,000 records per request, this code should work for you.

import json
import sys
import requests

# See https://stackoverflow.com/a/312464/839338
def chunk_list(list_to_chunk, number_of_list_items):
    """Yield successive chunks of number_of_list_items items from the list."""
    for i in range(0, len(list_to_chunk), number_of_list_items):
        yield list_to_chunk[i:i + number_of_list_items]

url = 'https://api.hubapi.com/contacts/v1/contact/batch'
# Transfer-Encoding is left out: requests sends the body with a Content-Length,
# and declaring it as chunked as well would make the request malformed.
headers = {
    'Authorization': "Bearer pat-na1-**************************",
    'Accept': 'application/json',
    'Content-Type': 'application/json'
}

with open(r'D:\Users\lakshmi.vijaya\Desktop\Invalidemail\allhubusers_data.json', 'r') as run:
    json_data = json.load(run)

if not isinstance(json_data, list):
    sys.exit("Data not list")
print("Found a list")

desired_size = 15000  # number of records per request
for sub_set in chunk_list(json_data, desired_size):
    print(f'Length of sub-set {len(sub_set)}')
    r = requests.post(
        url,
        data=json.dumps(sub_set),
        headers=headers,
        verify=False,  # disables TLS certificate verification, as in the question
        timeout=(15, 20),
        stream=True
    )
    print(r.status_code)
    print(r.content)
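
The question also asks for a 10-second pause after each batch and for every response to be captured, which the loop above does not do. A minimal sketch of that part could look like the following. The names send_in_batches, send, pause_seconds and sleep are hypothetical and only serve the illustration; the sender is passed in as a function so the batching logic can be tried without calling the API.

```python
import time

def chunk_list(list_to_chunk, number_of_list_items):
    """Yield successive chunks of number_of_list_items items from the list."""
    for i in range(0, len(list_to_chunk), number_of_list_items):
        yield list_to_chunk[i:i + number_of_list_items]

def send_in_batches(records, send, batch_size=15000, pause_seconds=10, sleep=time.sleep):
    """Call send(batch) for every batch, pause between batches, return all results."""
    results = []
    batches = list(chunk_list(records, batch_size))
    for index, batch in enumerate(batches):
        results.append(send(batch))
        if index < len(batches) - 1:  # no need to wait after the last batch
            sleep(pause_seconds)
    return results

# Dry run with a stand-in sender and no real waiting
sent = send_in_batches(list(range(7)), send=len, batch_size=3, sleep=lambda seconds: None)
print(sent)
```

For real use, send could be a small function that wraps the requests.post call shown above and returns, for example, (r.status_code, r.text), so the returned list holds one entry per batch.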
