我认为我对控制流的理解是有缺陷的。我想要的是抓取一堆url,当我遇到它们时,从池中启动一个连接,将url插入我的数据库。
我认为这里的问题是我对收益率的理解。如何使tormysql接受一个参数并异步使用连接将该参数写入数据库?
基本上,我只想insert \u to \u db()异步插入一个指向我数据库的url。但是insert \u to \u db()实际上没有运行。
from lxml import html
import tormysql
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import WebDriverException
import time
import sys
import csv
from shutil import copyfile
import requests
from bs4 import BeautifulSoup
pool = tormysql.ConnectionPool(
max_connections = 20, #max open connections
idle_seconds = 7200, #conntion idle timeout time, 0 is not timeout
wait_connection_timeout = 3, #wait connection timeout
host = "unidb.cfavdkskfrrc.us-west-2.rds.amazonaws.com",
user = "#",
passwd = "#",
db = "diplomacy_data",
charset = "utf8"
)
def insert_to_db(game_url):
print(game_url)
with (yield pool.Connection()) as conn:
try:
with conn.cursor() as cursor:
yield cursor.execute("INSERT INTO FvA_urls(URL) VALUES('%s')" % game_url)
except:
yield conn.rollback()
else:
print('committed', game_url)
yield conn.commit()
username = "#"
password = "#"
login_url = "https://webdiplomacy.net/logon.php"
driver = webdriver.Chrome()
driver.get(login_url)
driver.find_element_by_css_selector('body > div.content.content-follow-on > form > ul > li:nth-child(2) > input[type="text"]').send_keys(username)
driver.find_element_by_css_selector('body > div.content.content-follow-on > form > ul > li:nth-child(5) > input[type="password"]').send_keys(password)
driver.find_element_by_css_selector('body > div.content.content-follow-on > form > ul > li:nth-child(10) > input').click()
url = "https://webdiplomacy.net/gamelistings.php?"
params = "page-games=1&gamelistType=Finished&searchOn=on"
driver.get(url + params)
driver.find_element_by_css_selector('body > div:nth-child(5) > div:nth-child(2) > form > li:nth-child(5) > input[type="radio"]:nth-child(6)').click()
driver.find_element_by_css_selector('body > div:nth-child(5) > div:nth-child(2) > form > input').click()
# Get all the URLS
page_num = 0
while True:
page_num += 1
if page_num % 20 == 0:
print(page_num)
a = driver.find_elements(By.XPATH, '/html/body/div[5]/div[3]/div[*]/div[6]/div[2]/a')
if len(a) < 1:
pool.close()
exit()
else:
for button in a:
game_url = button.get_attribute('href')
insert_to_db(game_url)
driver.find_element_by_css_selector('body > div:nth-child(6) > div:nth-child(4) > div:nth-child(1) > div:nth-child(2) > a:nth-child(3) > img').click()
我在任何地方都能找到使用tormysql的例子,这些例子只是在表中重复异步插入1。显然完全没用。
是不是不可能让每个连接独立地做一些事情?就像这个库只对异步多次执行完全相同的操作有用吗?
1条答案
按热度按时间qij5mzcb1#
我相信字符串插值的“%”语法是一种被证明易受sql注入攻击的模式。必须正确转义包含在sql文本中的任何潜在不安全值。
我还认为,包含单引号会导致生成无效的sql语法,并且语句的执行会因语法错误而失败
)可能是我找错树了。我还没有通过测试来证实这一点。可能需要或允许单引号,也可能正确转义了值。)
我认为更好的模式是使用逗号代替百分号,并传递元组作为参数,如下所示:
为了进行比较,请查看中的这行代码
example/pool.py
```cur = yield POOL.execute("SELECT SLEEP(%s)", (t,))
^ ^ ^^