Web scraping with ralger "scrap" returns empty values

Posted by xbp102n0 on 2023-04-18 in Other

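I'm trying to scrape Indeed, but when I run the code to get the job descriptions I only get partial results. For example: total jobs = 1500, total links = 1500, descriptions = fewer than 1500. Sometimes, when I rerun the chunk that fetches the descriptions, the result also changes. I'd appreciate help figuring out how to get all the values, or how to replace the missing results with NA.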

library(ralger)

#Search Method
base_link <- "https://www.indeed.com/jobs?q&l=mexico&from=searchOnHP&vjk=c339451b33a29c91"
links <- paste0(base_link, 1:100)

#Getting link
scraped_url<- attribute_scrap(links, node = '[data-hide-spinner = "true"]', attr = 'href')
job_url <- paste0("https://www.indeed.com",scraped_url)

#Getting Job Description
job_description <- scrap(link = job_url, node = '.jobsearch-jobDescriptionText')

#Creating Data Frame
df <- data.frame(job_description,job_url)

Error in data.frame(fullds, job_description, job_url) : 
arguments imply differing number of rows: 1500, 1485
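The mismatch (1500 links, 1485 descriptions) suggests that some job pages returned no match for .jobsearch-jobDescriptionText, so the description vector ends up shorter than the link vector and data.frame() cannot pair them. A minimal sketch of one way to keep the two aligned, assuming it is acceptable to scrape the links one at a time: call the scraper once per URL and substitute NA when the call errors or finds nothing. scrape_one_per_link is a hypothetical helper (not part of ralger), and the demo uses a fake scraper instead of real network requests.

```r
# Hypothetical helper (not part of ralger): call `scrape_fun` once per URL and
# return exactly one string per URL, using NA when the call errors or matches nothing.
scrape_one_per_link <- function(urls, scrape_fun) {
  vapply(urls, function(u) {
    res <- tryCatch(scrape_fun(u), error = function(e) character(0))
    if (length(res) == 0 || all(is.na(res))) {
      NA_character_
    } else {
      # Several matching nodes on one page are joined into a single string.
      paste(res[!is.na(res)], collapse = "\n")
    }
  }, character(1), USE.NAMES = FALSE)
}

# Demo with a fake scraper: "b" matches nothing, "c" fails with an error.
fake_scraper <- function(u) {
  if (u == "b") return(character(0))
  if (u == "c") stop("request failed")
  paste("description for", u)
}
out <- scrape_one_per_link(c("a", "b", "c"), fake_scraper)
# out is c("description for a", NA, NA)

# With ralger installed, the same idea might look like this (untested sketch):
# job_description <- scrape_one_per_link(
#   job_url,
#   function(u) ralger::scrap(link = u, node = ".jobsearch-jobDescriptionText")
# )
# df <- data.frame(job_url, job_description)
```

Separately, paste0(base_link, 1:100) appends the numbers to the end of the vjk value rather than adding a page parameter, so the 100 URLs may well load the same results page; checking length(unique(job_url)) would show how many distinct links were actually collected.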
Answer #1, by nhhxz33t

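I was able to extract the job descriptions with RSelenium using the code below. I think you cannot extract all the information from the site with the R package ralger because the page is not fully loaded at the moment the information is extracted. RSelenium lets the page load before we extract the information from the site. I added an example below for one link.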
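Since the missing values are attributed to pages that were not fully loaded, a fixed pause such as Sys.sleep(2) may be too short on slow pages and wasteful on fast ones. A minimal sketch of an alternative, assuming an RSelenium remoteDriver: poll findElements(), which returns an empty list while nothing matches, until an element appears or a timeout expires. wait_for_element and the fake driver used in the demo are hypothetical.

```r
# Hypothetical helper: poll until at least one element matches, instead of
# sleeping for a fixed time. Returns the first match, or NULL on timeout.
wait_for_element <- function(remDr, using, value, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    found <- remDr$findElements(using = using, value = value)
    if (length(found) > 0) return(found[[1]])
    if (Sys.time() >= deadline) return(NULL)
    Sys.sleep(interval)
  }
}

# Demo with a stand-in driver whose element "appears" after n_empty polls.
make_fake_driver <- function(n_empty) {
  calls <- 0
  list(findElements = function(using, value) {
    calls <<- calls + 1
    if (calls <= n_empty) list() else list("element")
  })
}
found <- wait_for_element(make_fake_driver(2), "id", "jobDescriptionText",
                          timeout = 5, interval = 0.01)
not_found <- wait_for_element(make_fake_driver(Inf), "id", "jobDescriptionText",
                              timeout = 0.05, interval = 0.01)
```

With a live session this might be used as el <- wait_for_element(remDr, "id", "jobDescriptionText"), storing NA when el is NULL and el$getElementText()[[1]] otherwise.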

library(RSelenium)
library(rvest)

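# Start a Selenium/Firefox container (system() works on every OS; shell() is Windows-only)
system('docker run -d -p 4446:4444 selenium/standalone-firefox')
Sys.sleep(5)  # give the Selenium server a few seconds to start
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4446L, browserName = "firefox")
remDr$open()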

base_link <- "https://www.indeed.com/jobs?q&l=mexico&from=searchOnHP&vjk=c339451b33a29c91"
links <- paste0(base_link, 1)
job_Url <- list()
remDr$navigate(links)  

# Scroll down in small steps so that lazily loaded results get rendered
for(i in 1 : 200)
{
  print(i)
  java_Script <- paste0("scroll(0,", i * 20, ")")
  remDr$executeScript(java_Script)
}

# Collect the links of the (up to 30) job cards on the results page
counter <- 0

for(i in 1 : 30)
{
  print(i)
  xpath <- paste0('/html/body/main/div/div[1]/div/div/div[5]/div[1]/div[5]/div/ul/li[', i, ']/div/div[1]/div/div[1]/div/table[1]/tbody/tr/td/div[1]/h2/a')
  # findElement() throws an error when nothing matches; return NULL in that case
  web_Obj <- tryCatch(remDr$findElement("xpath", xpath), error = function(e) NULL)

  if(!is.null(web_Obj))
  {
    counter <- counter + 1
    job_Url[[counter]] <- web_Obj$getElementAttribute("href")[[1]]
    print(job_Url[[counter]])
  }
}

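nb_Job_Url <- length(job_Url)
list_Text_Job_Description <- list()

# seq_len() avoids iterating over 1:0 when no links were found
for(i in seq_len(nb_Job_Url))
{
  print(i)
  remDr$navigate(job_Url[[i]])
  Sys.sleep(2)  # fixed pause to let the job page finish loading
  # Store NA instead of stopping when a page has no description element
  list_Text_Job_Description[[i]] <- tryCatch(
    remDr$findElement("id", "jobDescriptionText")$getElementText()[[1]],
    error = function(e) NA_character_
  )
}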

list_Text_Job_Description[[1]]

[1] "Job Description\nCiudad de México, México\nNivel de Estudios\nBachelor´s degree\nExperiencia Requerida\n3+\n3+ years office administration experience with senior level management and front desk experience.\nResumen\nThe receptionist will perform a variety of tasks to directly support daily activities for the Mexico City Beer division office and said front desk responsibilities, including but not limited to answering the main telephone lines, routing calls, greeting visitors, ordering supplies, managing couriers and cross training with the Facilities Supervisor to assist when needed on other office related needs and asks of the Facilities team.\nHabilidades\nTelephone Skills\nAssertive Communication & Listening skill.\nCustomer Service Attitude.\nExcellent service and positive attitude.\nWillingness to help everyone.\nResilience & Professionalism.\nStrong verbal and written communication.\nStrong analytical and problem-solving skills.\nLearning agility.\nOrganized.\nResponsabilidades\n1.Responsible for the Front Desk (manages 2.5 floors for + 200 employees) from 8am to 5pm, (with a 1-hour lunch) Monday to Thursday, and 8am to 1:00pm on Friday. Activities to include routing incoming calls, greeting visitors, sending/receiving couriers, and parcel carrier packages, overseeing vendor access and issuing parking ticket vouchers as requested internally or externally.\n2.Greet visitors, track and manage visitors through logbooks or electronic system, notify employees of visitor arrivals, provide visitors with a positive experience (i.e. coffee, water, take coats, etc.) 
and work with building security on visitor access as applicable\n3.Assist Security in the administration of the access cards for employees in Mexico City office, to include printing and deliver of cards to employees, maintenance of an active card inventory summary, ensuring compliance with corporate Security policy.\n4.Issue new hire welcoming e- mail to include guidance about local Product Allowance program, stationary ordering, parking regulation, etc. Responsibilities to include maintenance and edits to electronic guide as directed by Human Resources or Facilities.\n5.Assist local administrative assistants, as needed, with on-site meeting conference room reservations and scheduling to ensure rooms are ready for meetings (i.e. proper number of chairs, clean/ready to use room, etc.). Work with local IT team to ensure AV is functioning and ready for meeting use.\n6.Responsible for inventory and ordering of office supplies and bar products.\n7.Place vendor service calls and issue Ariba purchase orders as required for all maintenance and service required for equipment, goods and other maintenance services, as directed by the Facilities Supervisor.\n8.Assist in the delivery of employee product allowance orders, business cards, or other seasonal employee gifts.\n9.Completes a variety of responsibilities, administrative duties and special projects as assigned by Facilities Management.\nLocation\nMexico City\nAdditional Locations\nJob Type\nFull time\nJob Area\nOperations and Production\nEqual Opportunity\nConstellation Brands is committed to a continuing program of equal employment opportunity. 
All persons have equal employment opportunities with Constellation Brands, regardless of their sex, race, color, age, religion, creed, sexual orientation, national origin or citizenship, ancestry, physical or mental disability, medical condition (cancer or genetic characteristics), marital status, gender (including gender identity or gender expression), familial status, military or veteran status, genetic information, pregnancy, childbirth, breastfeeding, or related conditions (or any other group or category within the framework of the applicable discrimination laws and regulations)."
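The answer scrolls a fixed 200 steps of 20 px. A sketch of an alternative, assuming the results are loaded lazily as the page scrolls: scroll to the bottom repeatedly and stop once document.body.scrollHeight stops growing. scroll_until_stable is a hypothetical helper built on RSelenium's executeScript(); the demo uses a stand-in driver that reports preset page heights instead of a real browser.

```r
# Hypothetical helper: scroll to the bottom until the page height stops growing
# (or max_rounds is reached). Returns the number of scroll rounds performed.
scroll_until_stable <- function(remDr, max_rounds = 20, pause = 1) {
  get_height <- function() {
    as.numeric(remDr$executeScript("return document.body.scrollHeight;",
                                   args = list())[[1]])
  }
  last <- get_height()
  for (i in seq_len(max_rounds)) {
    remDr$executeScript("window.scrollTo(0, document.body.scrollHeight);",
                        args = list())
    Sys.sleep(pause)  # give lazily loaded content time to render
    current <- get_height()
    if (current <= last) return(i)  # height unchanged: assume fully loaded
    last <- current
  }
  max_rounds
}

# Demo with a stand-in driver: each scroll advances to the next preset height.
make_fake_driver <- function(heights) {
  step <- 1
  list(executeScript = function(script, args = list()) {
    if (startsWith(script, "return")) {
      return(list(heights[min(step, length(heights))]))
    }
    step <<- step + 1
    list()
  })
}
rounds_stable <- scroll_until_stable(make_fake_driver(c(1000, 2000, 3000, 3000)),
                                     pause = 0)
rounds_capped <- scroll_until_stable(make_fake_driver(seq(100, 10000, by = 100)),
                                     max_rounds = 5, pause = 0)
```

In the demo the height stops growing on the third scroll, while the second driver keeps growing and is stopped by max_rounds.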
