How can I send a POST request with Python to a website that doesn't provide a POST endpoint?

wfveoks0 posted on 2022-12-30 in Python

I want to scrape the list that the form on https://gmail.inputekno.com/ produces, but when I inspected the site it does not make any POST request. Is there a way around this?

This is what I tried, but it failed:

import requests

cookies = {
    '_ga': 'GA1.1.1869494453.1672283765',
    '__gads': 'ID=e350f661fb3c1b6a-22c8b78c11d900b8:T=1672283764:RT=1672283764:S=ALNI_MYcfleQdj417a3BQakIyzzrp83MdQ',
    '__gpi': 'UID=00000b9a196b924e:T=1672283764:RT=1672283764:S=ALNI_MZ4UaRvjE4GzR-k0Na5Jj-HBksD4w',
    '_ga_R3D1879B9V': 'GS1.1.1672283764.1.1.1672284364.0.0.0',
}

headers = {
    'authority': 'gmail.inputekno.com',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-language': 'en-US,en;q=0.9',
    'cache-control': 'no-cache',
    # 'cookie': '_ga=GA1.1.1869494453.1672283765; __gads=ID=e350f661fb3c1b6a-22c8b78c11d900b8:T=1672283764:RT=1672283764:S=ALNI_MYcfleQdj417a3BQakIyzzrp83MdQ; __gpi=UID=00000b9a196b924e:T=1672283764:RT=1672283764:S=ALNI_MZ4UaRvjE4GzR-k0Na5Jj-HBksD4w; _ga_R3D1879B9V=GS1.1.1672283764.1.1.1672284364.0.0.0',
    'pragma': 'no-cache',
    'sec-ch-ua': '"Not?A_Brand";v="8", "Chromium";v="108", "Google Chrome";v="108"',
    'sec-ch-ua-arch': '""',
    'sec-ch-ua-bitness': '"64"',
    'sec-ch-ua-full-version-list': '"Not?A_Brand";v="8.0.0.0", "Chromium";v="108.0.5359.125", "Google Chrome";v="108.0.5359.125"',
    'sec-ch-ua-mobile': '?1',
    'sec-ch-ua-model': '"Nexus 5"',
    'sec-ch-ua-platform': '"Android"',
    'sec-ch-ua-platform-version': '"6.0"',
    'sec-ch-ua-wow64': '?0',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'none',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Mobile Safari/537.36',
}

payload = {'username': 'danielmantha'}
response = requests.post('https://gmail.inputekno.com/', cookies=cookies, headers=headers, data=payload)

Answer #1 (ut6juiuv)

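As CharlesHan commented, there is no POST because the emails are generated by the site's JavaScript code: https://cdn.jsdelivr.net/gh/rulnoveid/CodeBlog@main/gmail%20trick/trick.js
The list is built in the browser and never comes back in a server response, so a requests-based approach cannot scrape this particular data.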
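One way to check this kind of thing before writing a scraper is to look at the page source for forms and external scripts. The sketch below uses only the standard library; `FormScriptParser`, `inspect_page` and `SAMPLE_HTML` are hypothetical names, and the sample markup is an assumption standing in for the real page, which may be structured differently.

```python
from html.parser import HTMLParser


class FormScriptParser(HTMLParser):
    """Collect <form> methods and external <script> URLs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.form_methods = []  # one entry per <form>; HTML defaults to "get"
        self.script_srcs = []   # src attribute of every external <script>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.form_methods.append((attrs.get("method") or "get").lower())
        elif tag == "script" and attrs.get("src"):
            self.script_srcs.append(attrs["src"])


def inspect_page(html: str):
    """Return (form methods, external script URLs) found in the HTML text."""
    parser = FormScriptParser()
    parser.feed(html)
    return parser.form_methods, parser.script_srcs


# Hypothetical stand-in for the page source; the real markup may differ.
SAMPLE_HTML = """
<html><body>
  <input id="username" type="text">
  <textarea id="emails"></textarea>
  <script src="https://cdn.jsdelivr.net/gh/rulnoveid/CodeBlog@main/gmail%20trick/trick.js"></script>
</body></html>
"""

form_methods, script_srcs = inspect_page(SAMPLE_HTML)
print(form_methods)  # forms found in the sample and their methods
print(script_srcs)   # scripts that may be producing the output client-side
```

To run it against the live page, pass `requests.get(url).text` to `inspect_page`. If no form posts anywhere and the output field is filled in by a script, the data is most likely computed client-side.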
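If you really want to scrape it anyway, there is selenium [and other web automation tools], and you can use scrapeGmailDotTrix [short version below]. For example, calling scrapeGmailDotTrix('somename') types somename into the input and returns the list of generated emails scraped from the site.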

### full version at https://pastebin.com/tgC4EpGZ
## YOU MUST DOWNLOAD CHROMEDRIVER.EXE FOR THIS ##
## https://chromedriver.chromium.org/downloads ##

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def scrapeGmailDotTrix(username: str, wait_start=10, wait_op=200):
    input_xpath = '//input[@id="username"][@type="text"]'
    output_xpath = '//textarea[@id="emails"]'

    driver = webdriver.Chrome()
    try:
        driver.get('https://gmail.inputekno.com/')

        # Wait for the username field to load, then type the name into it.
        WebDriverWait(driver, wait_start).until(
            EC.visibility_of_element_located((By.XPATH, input_xpath)))
        driver.find_element(By.XPATH, input_xpath).send_keys(str(username))

        # Wait until the fully dotted variant (e.g. 'r.a.n.d') shows up
        # in the output textarea.
        WebDriverWait(driver, wait_op).until(
            EC.text_to_be_present_in_element_value(
                (By.XPATH, output_xpath), '.'.join(str(username))))

        return [em.strip() for em in driver.find_element(
            By.XPATH, output_xpath
        ).get_attribute('value').splitlines()]
    finally:
        driver.quit()  # close the browser even if a wait times out

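[The full version has error handling, a precautionary wait for the input to load, and so on.]
However, if you only want to generate the emails the same way the site does, this approach is unnecessarily complicated.
By comparison, replicating the main JavaScript logic in Python is fairly simple: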

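def gmailTrix_gen(uname: str):
    if len(uname) > 1:
        head, tail = uname[0], uname[1:]
        # Each variant of the tail is emitted twice: joined directly to the
        # first character, and joined to it with a dot.
        for item in gmailTrix_gen(tail):
            yield head + item
            yield head + '.' + item
    else:
        yield uname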

def gmailTrix_list(username:str):
    return [f'{gu}@gmail.com' for gu in gmailTrix_gen(username)]

For example, gmailTrix_list('rand') returns (just like the site):

['rand@gmail.com', 'r.and@gmail.com', 'ra.nd@gmail.com', 'r.a.nd@gmail.com', 
 'ran.d@gmail.com', 'r.an.d@gmail.com', 'ra.n.d@gmail.com', 'r.a.n.d@gmail.com']
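The recursion above makes one yes/no choice per gap between adjacent characters, so a name of length n gives 2^(n-1) variants. As an alternative sketch, not taken from the site's script, the same set can be produced iteratively with `itertools.product`. `dot_variants` is a hypothetical name, and it emits the variants in a different order than `gmailTrix_gen`.

```python
from itertools import product


def dot_variants(username: str):
    """Yield every way of placing '.' between adjacent characters."""
    if not username:
        return  # nothing to generate for an empty name
    gaps = len(username) - 1
    # One choice per gap: either no separator or a dot.
    for choice in product(("", "."), repeat=gaps):
        # Pair each character except the last with the separator that follows it.
        body = "".join(ch + sep for ch, sep in zip(username, choice))
        yield body + username[-1]


variants = list(dot_variants("rand"))
print(len(variants))  # 2 ** (len("rand") - 1) == 8
```

Unlike the recursive version, this sketch yields nothing for an empty string instead of a bare `@gmail.com` address, which may or may not be what you want.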
