python-3.x: requests takes a lot of time

0kjbasz6  posted on 2023-05-30  in Python

This code has become very slow, although it used to run quickly.

from openpyxl import load_workbook
import requests

wb = load_workbook(r"C:\TranslateMfestSite\ProductsInformation.xlsx")

sheet=wb['Sheet']
Designer=551
 
for i in range(1, 14498):
    if int(sheet["C"+str(i)].value)==Designer and 'xlink:href="' in requests.get(str(sheet["B"+str(i)].value)).text:
        r = requests.get(str(sheet["B"+str(i)].value)).text

rjjhvcjd1#

This code is not very well written.
Let's start by rewriting it for readability:

from openpyxl import load_workbook
import requests

workbook = load_workbook(r"C:\TranslateMfestSite\ProductsInformation.xlsx")

sheet=workbook['Sheet']
designer=551
 
for i in range(1, 14498):
    response = requests.get(
        str(sheet["B"+str(i)].value)
    ).text
    
    if int(sheet["C"+str(i)].value)==designer and 'xlink:href="' in response:
        r = response

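We can see that you make a REST call just to check whether the response contains a certain attribute, and then you make the same call again. That's bad.
But there is more to do here! One of your logical checks has nothing to do with the REST call, so you may be wasting a lot of calls to this endpoint. Test that condition first and skip the row when it fails!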

from openpyxl import load_workbook
import requests

workbook = load_workbook(r"C:\TranslateMfestSite\ProductsInformation.xlsx")

sheet=workbook['Sheet']
designer=551
 
for i in range(1, 14498):
    if int(sheet["C"+str(i)].value)!=designer:
        continue

    response = requests.get(
        str(sheet["B"+str(i)].value)
    ).text
    
    if 'xlink:href="' in response:
        r = response

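Now... I get the gist, but I wonder: what is r = response? Is that the end goal? Once it is set, what happens next?
Let's assume it isn't, and clean things up further. The iteration is messy.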

from openpyxl import load_workbook
import requests

workbook = load_workbook(r"C:\TranslateMfestSite\ProductsInformation.xlsx")

sheet=workbook['Sheet']
designer=551

for B, C in sheet.iter_rows(min_row=1, max_row=14498, min_col=2, max_col=3):
    if C.value != designer:
        continue

    response = requests.get(
        str(B.value)
    ).text
    
    if 'xlink:href="' in response:
        r = response

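Now the code is cleaner and clearer, and you can see what it is doing. The extra REST call is gone. What remains is the question of whether the loop should exit at that point.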
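If the first match is all that is needed, the loop can stop as soon as it finds one. The sketch below is only an illustration under that assumption: `find_first_match`, `MARKER`, and the injected `fetch` callable are hypothetical names that are not in the question, and the fetcher is passed in so the logic can be tried without touching the network.

```python
from typing import Callable, Iterable, Optional, Tuple

# The substring the question looks for in each page.
MARKER = 'xlink:href="'


def find_first_match(
    rows: Iterable[Tuple[str, int]],
    designer: int,
    fetch: Callable[[str], str],
) -> Optional[str]:
    """Return the first page text containing MARKER, or None if nothing matches.

    `rows` yields (url, designer_id) pairs, i.e. columns B and C of the sheet.
    `fetch` is any callable that turns a URL into the page text.
    """
    for url, row_designer in rows:
        # Cheap spreadsheet check first: non-matching rows cost no HTTP request.
        if row_designer != designer:
            continue
        text = fetch(url)  # exactly one request per candidate row
        if MARKER in text:
            return text  # stop here, later rows are never fetched
    return None
```

With the real workbook, `rows` could come from `sheet.iter_rows(min_col=2, max_col=3, values_only=True)` and `fetch` could be `lambda url: requests.get(url).text`. If every match is needed rather than just the first, the early `return` would not apply.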


mwkjh3gx2#

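If every row in the spreadsheet had a value equal to DESIGNER, you would be making more than 14,000 HTTP GET requests. That is probably not the case. Even so, doing this synchronously is unlikely to perform well.
Multithreading is a good fit for this kind of situation.
Let's use multithreading, where the threads build a global dictionary keyed by the URLs taken from column B of the spreadsheet, with the text fetched from each URL as the associated value.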

**Note:** This is untested (for obvious reasons)

import requests
import openpyxl
from concurrent.futures import ThreadPoolExecutor

WORKBOOK = r"C:\TranslateMfestSite\ProductsInformation.xlsx"
SHEET = 'Sheet'
DESIGNER = 551
B, C = 1, 2

results = {}
wb = None

def get_text(url):
    print(f'{url=}')
    try:
        with requests.get(url) as response:
            response.raise_for_status()
            results[url] = response.text
    except Exception:
        pass

try:
    wb = openpyxl.load_workbook(WORKBOOK)
    ws = wb[SHEET]
    with ThreadPoolExecutor() as tpe:
        for row in ws.iter_rows():
            if len(row) > 2 and int(row[C].value) == DESIGNER:
                tpe.submit(get_text, str(row[B].value))
except Exception as e:
    print(e)
finally:
    if wb:
        wb.close()

for url, text in results.items():
    print(url, text)
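The code above writes into a global dictionary and silently discards failed requests. As a possible variation, and not part of the original answer, the results can instead be collected from the futures, which also makes failures visible. `fetch_all`, the injected `fetch` callable, and the default of 8 workers are assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    max_workers: int = 8,  # arbitrary illustrative value, tune for the target site
) -> Dict[str, str]:
    """Fetch every URL in a thread pool and return {url: text} for the successes."""
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Remember which future belongs to which URL.
        future_to_url = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                # result() re-raises any exception raised inside the worker.
                results[url] = future.result()
            except Exception as exc:
                # Report the failure instead of hiding it.
                print(f"{url} failed: {exc}")
    return results
```

Here `fetch` could be something like `lambda url: requests.get(url).text`, called on the URLs of the rows whose column C equals DESIGNER. Only the main thread writes to `results`, so no shared state is modified from the workers.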
