Iterating over parts of a dataset's rows concurrently with Python

q7solyqu  posted on 2023-02-26  in  Python

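I am looking for direction on iterating over parts of a dataset (CSV or Google Sheet) with more than 100K rows and two columns concurrently, but with a delay.
The task is to perform an API request with each row's data and save the ID returned in the API response in a third column (iterating over 100K+ rows is expected to take several hours).
To save time, I would like to know whether it is possible and appropriate to start a second API request that uses the request IDs saved in the third column while the first pass is most likely still populating the dataset further down, but delayed by about 10 minutes. The ten minutes allow the task triggered by the first API request to complete on the remote device. The alternative is to wait for all 100K+ rows to finish before running the second API request across all rows. (The second API request checks whether the task from the first request has completed.)
At this stage I am looking for direction on a specific approach so that I do not go too far down the wrong path. Thanks.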

qhhrdooz1#

Perhaps you could use Thread from the threading module.
It is easier to show this in code than to explain it in words, so:

from threading import Thread
from time import sleep

def function(minutes):
    sleep(60 * minutes)  # wait for the number of minutes you pass in
    # your code for the second API request

Thread(target=function,  # the function to run in the new thread
       args=(10,)        # arguments passed to the function
       ).start()         # start the thread

# If the main code finishes first, Python waits for this thread
# to finish and only then ends the program.

# Alternatively, if you want the thread to end when the main program ends:

Thread(target=function,  # the function to run in the new thread
       args=(10,),       # arguments passed to the function
       daemon=True       # do not keep the program alive for this thread
       ).start()         # start the thread

# With daemon=True, the thread is stopped abruptly when the main program ends.
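
For a single delayed call, the standard library's threading.Timer does the sleeping for you. This is a minimal sketch, assuming a hypothetical check_status function that stands in for the second API request, and a short delay so that it finishes quickly (the delay described in the question would be 600 seconds):

```python
import threading

completed = []  # filled in by the timer thread


def check_status(request_id):
    # Hypothetical placeholder for the second API request.
    completed.append(request_id)


# Run check_status("abc123") once, 0.05 seconds from now.
timer = threading.Timer(0.05, check_status, args=("abc123",))
timer.start()
timer.join()  # wait for the delayed call to finish
```

Each Timer runs in its own thread, so creating one per row is likely too heavy for 100K rows; it fits best when only a few delayed calls are needed.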

See the threading documentation.
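
One way the delayed second request could be applied row by row is to pass each request ID through a queue to a single checker thread, which waits until the delay has elapsed for that row before checking it. The sketch below is only an illustration under assumptions: first_request, second_request and run_pipeline are hypothetical names, the two request functions are placeholders for the real API calls, and the delay is given in seconds so the example runs quickly.

```python
import queue
import threading
import time


def first_request(row):
    # Hypothetical placeholder for the first API call; returns a request ID.
    return f"id-{row[0]}-{row[1]}"


def second_request(request_id):
    # Hypothetical placeholder for the status check of an earlier request.
    return f"done:{request_id}"


def run_pipeline(rows, delay_seconds):
    pending = queue.Queue()
    results = []  # only the checker thread appends to this list

    def checker():
        while True:
            item = pending.get()
            if item is None:  # sentinel: no more rows will arrive
                break
            row, request_id, sent_at = item
            # Sleep only for the part of the delay that has not passed yet.
            remaining = sent_at + delay_seconds - time.monotonic()
            if remaining > 0:
                time.sleep(remaining)
            status = second_request(request_id)
            results.append((row[0], row[1], request_id, status))

    worker = threading.Thread(target=checker)
    worker.start()

    # Main thread: send the first request for each row and hand the ID over.
    for row in rows:
        request_id = first_request(row)
        pending.put((row, request_id, time.monotonic()))

    pending.put(None)
    worker.join()
    return results


rows_out = run_pipeline([("a", "b"), ("c", "d")], delay_seconds=0.05)
```

Because rows enter the queue in the order they were sent, the checker only ever has to wait for the oldest pending row, so one extra thread is enough. With the delay from the question, delay_seconds would be 600.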
If you want to work on a dataset that cannot be modified concurrently, I would simply create a file and check whether that file exists:

from threading import Thread
from time import sleep
import os
from random import uniform

def function(minutes):
    sleep(60 * minutes)  # wait for the number of minutes you pass in
    # your code for the second API request
    # now, if you want to change the dataset, take the lock file first

    while True:
        try:
            # Mode 'x' creates the file and raises FileExistsError if it
            # already exists, so checking and creating happen in one step.
            open('RUNNING', 'x').close()
            break
        except FileExistsError:
            # Another thread holds the lock file. Sleep for a random
            # time so that waiting threads do not all retry at once.
            sleep(uniform(0.5, 1.5))

    try:
        pass  # work with the dataset here
    finally:
        os.remove('RUNNING')  # remove the file, even if the work failed

    # This is useful when two or more functions work on something
    # that cannot be done simultaneously, because it prevents that.

Thread(target=function,  # the function to run in the new thread
       args=(10,)        # arguments passed to the function
       ).start()         # start the thread
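
If all of the threads live in the same Python process, threading.Lock from the standard library can play the same role as the RUNNING file without touching the disk. This is a different technique from the file-based one above, shown as a minimal sketch; dataset, dataset_lock and record_result are hypothetical names, and the list stands in for the real dataset.

```python
import threading

dataset = []                     # stand-in for the shared dataset
dataset_lock = threading.Lock()  # only one thread may hold this at a time


def record_result(row_id, status):
    # The with-statement acquires the lock and always releases it,
    # even if the code inside raises an exception.
    with dataset_lock:
        dataset.append((row_id, status))


threads = [
    threading.Thread(target=record_result, args=(i, "done"))
    for i in range(5)
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until every thread has written its row
```

A Lock only coordinates threads inside one process. A marker file can also be seen by separate processes, which may be the reason to prefer it in some setups.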
