Selenium: trying to figure out how to loop through a LinkedIn job search and scrape the data

cunj1qz1, posted on 2022-11-24 in Other
from selenium import webdriver
from selenium.webdriver.support.select import Select
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

PATH = "C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome()

driver.get("https://www.linkedin.com/jobs/search/?currentJobId=3354966649&geoId=103644278&keywords=Software%20Engineer"
           "&location=United%20States&refresh=true")

try:
    main = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located(
        (By.ID, "main")))

    jobList = main.find_elements(By.CLASS_NAME, "scaffold-layout__list-container")
    for companyName in jobList:
        name = companyName.find_element(By.XPATH, "/html/body/div[5]/div[3]/div[4]/div/div/main/div/section[1]/div/ul/li[1]/div/div[1]/div[1]/div[2]/div[2]/a")
        print(name.text)
except:
    driver.quit()

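I can't seem to find a way to print the company names (I would also like to move down the list and grab the names of, say, 20 companies). I would like to do the same for other attributes, such as the job title, but I'm stumped. LinkedIn link: https://www.linkedin.com/jobs/search/?currentJobId=3354951485&geoId=103644278&keywords=Software%20Engineer&location=United%20States&refresh=true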

kokeuurv (answer #1)

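1. It is best to log in to LinkedIn first so that you are working with the real page.
2. You have to choose the right element locator strategy and apply it correctly.
3. You have to scroll the page in order to scrape all of the elements on it.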

Full working code as an example:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

options = webdriver.ChromeOptions()
# All of the options below are optional
options.add_experimental_option("detach", True)
options.add_argument("--disable-extensions")
options.add_argument("--disable-notifications")
options.add_argument("--disable-Advertisement")
options.add_argument("--disable-popup-blocking")
options.add_argument("start-maximized")

s=Service('./chromedriver')
driver= webdriver.Chrome(service=s,options = options)

driver.get('https://www.linkedin.com/')
time.sleep(4)

username = driver.find_element(By.CSS_SELECTOR,'#session_key')
username.send_keys('your email')
time.sleep(1)

password = driver.find_element(By.CSS_SELECTOR,'#session_password')
password.send_keys('your password')
time.sleep(1)

# click() returns None, so there is nothing useful to assign here
driver.find_element(By.XPATH,'//*[@class="sign-in-form__submit-button"]').click()
time.sleep(1)

data = []

driver.get('https://www.linkedin.com/jobs/search/?currentJobId=3354951485&geoId=103644278&keywords=Software%20Engineer&location=United%20States&refresh=true')
time.sleep(5)

jobs_block = driver.find_element(By.CSS_SELECTOR,'.scaffold-layout__list-container')
jobs_list= jobs_block.find_elements(By.CSS_SELECTOR, '.jobs-search-results__list-item')
    
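for job in jobs_list:
    # find_elements returns an empty list (instead of raising) when nothing matches,
    # so a card without a title link yields None
    links = job.find_elements(By.XPATH,'.//*[@class="disabled ember-view job-card-container__link job-card-list__title"]')
    title = links[0].text if links else None
    data.append(title)
    print(title)

    # Scroll this card into view so the list keeps moving down the page
    driver.execute_script("arguments[0].scrollIntoView();", job)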

Output:

Machine Learning Engineer
Angular Web Developer (C#)
Growth Hacker
Python Developer
Python Developer
Senior Data Engineer
Senior Frontend Developer
SQL Developer
Senior Site Reliability Engineer
Senior Software Engineer
Entry level Software Engineer Developer
Software Engineer - Data, ML | Talos (Remote US)
Lead Data Solutions Engineer (remote)
C Developer
Salesforce Developer
Senior Data Engineer
Lead Data Engineer
Software Engineer, Growth Systems
Software Engineer, Developer Productivity
Data Engineer
Data Engineer
SQL Developer Intern
System Engineer II
Field Sales Professional (AE/SE)
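
The question also asks for company names and a cap of about 20 entries, which the code above does not cover. Here is a minimal sketch of one way to extend it, assuming each job card exposes the wanted fields under CSS selectors that you look up yourself in the browser's developer tools. The helper name `scrape_cards` and the company selector shown in the usage comment are hypothetical and are not part of the original answer; only `.job-card-list__title` comes from the class string used above.

```python
from typing import Dict, List, Optional

# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR
CSS = "css selector"


def scrape_cards(cards, selectors: Dict[str, str], limit: int = 20) -> List[Dict[str, Optional[str]]]:
    """Read several fields from the first `limit` job cards.

    `cards` is a list of WebElements (for example jobs_list above) and
    `selectors` maps a field name to a CSS selector that is evaluated
    relative to each card, so every card yields its own values.
    """
    rows = []
    for card in cards[:limit]:
        row = {}
        for field, css in selectors.items():
            # find_elements returns [] when nothing matches, so a missing
            # field becomes None instead of raising an exception.
            matches = card.find_elements(CSS, css)
            row[field] = matches[0].text.strip() if matches else None
        rows.append(row)
    return rows


# Hypothetical usage with the jobs_list from the answer above; the company
# selector is a placeholder that must be checked against the live page:
# rows = scrape_cards(jobs_list, {
#     "title": ".job-card-list__title",
#     "company": ".your-company-name-class",
# })
```

Because each lookup is relative to its own card, this avoids the problem in the question, where an absolute XPath starting at /html always resolves to the same first list item no matter which element it is called on. Scrolling each card into view, as in the loop above, may still be needed before its text is available.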
