I'm scraping nested text from parent categories and their child categories. My loops look like this:
The first for loop scrapes all the parent categories;
...the second for loop scrapes the child1 categories of each parent category;
...the third for loop scrapes the child2 categories of each child1 category.
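The three nested loops above amount to flattening a parent → child1 → child2 tree into one CSV row per child2 item. Below is a minimal sketch of that flattening with plain Python data instead of Selenium elements. The tree shape, the sample names, and the helpers tree_to_rows and write_csv are hypothetical and exist only for illustration. As in the question's code, a child1 with no child2 items produces no row.

```python
import csv
import io

# Hypothetical shape of the scraped data: parent -> child1 -> [child2, ...]
MenuTree = dict[str, dict[str, list[str]]]


def tree_to_rows(tree: MenuTree) -> list[list[str]]:
    """Flatten the three loop levels into [main, sub1, sub2] rows."""
    rows = []
    for main_cat, children in tree.items():                # first loop: parents
        for sub_cat_1, grandchildren in children.items():  # second loop: child1
            for sub_cat_2 in grandchildren:                # third loop: child2
                rows.append([main_cat, sub_cat_1, sub_cat_2])
    return rows


def write_csv(rows: list[list[str]], f) -> None:
    """Write the same header as the question, then one line per row."""
    writer = csv.writer(f)
    writer.writerow(["Main Category", "Sub Category 1", "Sub Category 2"])
    writer.writerows(rows)


# Usage with made-up sample data; a real run would build the tree from the page.
sample = {"Women's Fashion": {"Women's Clothing": ["Dresses", "Kurtis"]}}
buf = io.StringIO()
write_csv(tree_to_rows(sample), buf)
print(buf.getvalue())
```

Keeping the scraping and the CSV writing separate like this makes it easier to check each level's output before touching the file.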
I'm trying to scrape all the parent and child categories from this page (https://www.daraz.com.bd/).
If "sub_cat_1 = y.text" gives None or an empty string, I want to increment the number in Level_1_Category_No{increment_by_1} by 1. This is the variable in question:
sub_category_one = driver.find_elements(By.CSS_SELECTOR, ".Level_1_Category_No1 .lzd-site-menu-sub-item > a span")
Here is my full code:
import csv
import time

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # any WebDriver works; Chrome is only an example
driver.get("https://www.daraz.com.bd/")
time.sleep(10)  # crude wait for the menu to load
main_category = driver.find_elements(By.CSS_SELECTOR, ".lzd-site-menu-root-item span")
with open("all_category_subcat.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Main Category", "Sub Category 1", "Sub Category 2"])
    for i in main_category:
        # hover the parent so its child1 menu is rendered
        hover = ActionChains(driver).move_to_element(i)
        hover.perform()
        main_cat = i.text
        print(main_cat)
        sub_category_one = driver.find_elements(By.CSS_SELECTOR, ".Level_1_Category_No1 .lzd-site-menu-sub-item > a span")
        for y in sub_category_one:
            # hover the child1 item so its child2 menu is rendered
            hover = ActionChains(driver).move_to_element(y)
            hover.perform()
            sub_cat_1 = y.text
            print("--------------", sub_cat_1, "--------------")
            if not sub_cat_1:  # None or empty string
                # update the value of sub_category_one and run the for loop again
                pass  # TODO: not implemented yet
            sub_category_two = driver.find_elements(By.CSS_SELECTOR, ".lzd-site-menu-grand-active span")
            for z in sub_category_two:
                sub_cat_2 = z.text
                print(sub_cat_2)
                writer.writerow([main_cat, sub_cat_1, sub_cat_2])
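One plausible reason sub_cat_1 comes back empty is that Selenium's .text only returns text that is currently rendered, so submenu items that are not visible read as "". Below is a minimal sketch of an alternative to "increment when empty". It assumes the submenu class really follows the Level_1_Category_No{n} pattern, with n being the parent's 1-based position, builds the selector from that index, and falls back to the DOM textContent attribute for hidden items. The helper names level1_selector, element_text and scrape_child1 are hypothetical. The code uses the string "css selector", which is the value of By.CSS_SELECTOR, so it does not need to import selenium.

```python
# Selenium's By.CSS_SELECTOR constant is the plain string "css selector",
# so this sketch does not import selenium; pass your real WebDriver in.
CSS_SELECTOR = "css selector"


def level1_selector(n: int) -> str:
    """Selector for the child1 labels under the n-th (1-based) parent.

    Assumes the submenu class follows the Level_1_Category_No{n} pattern
    from the question: No1 for the first parent, No2 for the second, ...
    """
    return f".Level_1_Category_No{n} .lzd-site-menu-sub-item > a span"


def element_text(el) -> str:
    """Visible text if there is any, otherwise the DOM textContent."""
    text = el.text  # Selenium returns "" for elements that are not rendered
    if not text:
        text = el.get_attribute("textContent") or ""
    return text.strip()


def scrape_child1(driver, parent_count: int) -> dict[int, list[str]]:
    """Map each parent index (1-based) to its non-empty child1 labels."""
    result: dict[int, list[str]] = {}
    for n in range(1, parent_count + 1):
        elements = driver.find_elements(CSS_SELECTOR, level1_selector(n))
        names = (element_text(el) for el in elements)
        result[n] = [name for name in names if name]  # skip empty labels
    return result
```

With a real driver, parent_count would be len(main_category). Inside the question's own loop, "for n, i in enumerate(main_category, start=1):" yields the same index, so the hover code can stay as it is and only the selector string changes.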
1 Answer
It gives you output like this; I believe this is what you actually want in the end?
[["Women's Fashion", "Women's Clothing", "https://www.daraz.com.bd/womens-clothing/?price=999-&service=OS&from=filter/", "Women's Dresses", "https://www.daraz.com.bd/womens-dresses/?from=filter&price=1200-&service=OS&from=filter/"],
["Women's Fashion", "Women's Clothing", "Women's Knitwear", "Women's Knitwear", "Women's Knitwear", "Women's Knitwear"],
["Women's Fashion", "Womens-Clothing", "Womens-Clothing", "Womens-Kurtis", "Womens-Kurtis", "Womens-Clothing", ...],
["Health & Beauty", "Bath-Body", "Bath-Body", "Body-Oils", "Body-Oils", "Body-Oils"], ...
["Watches, Bags, Jewellery", "School Bags", "Kids Backpacks", "Kids Backpacks"],
["Watches, Bags, Jewellery", "School Bags", "https://www.daraz.com.bd/school-bags/", "Kids Shoulder Bags", "https://www.daraz.com.bd/kids-shoulder-bags/"],
["Watches, Bags, Jewellery", "School Bags", "https://www.daraz.com.bd/school-bags/", "School Bags-Backpack", "https://www.daraz.com.bd/school-bags-backpack/"], ...
["Automotive & Motorbike", "Automotive", "https://www.daraz.com.bd/automotive/?spm=a2a0e.home.cate_12.1.735212f73vbvt5", "Motorcycle Helmets", "https://www.daraz.com.bd/motorcycle-helmets/"],
["Automotive & Motorbike", "Automotive", "https://www.daraz.com.bd/automotive/?spm=a2a0e.home.cate_12.1.735212f73vbvt5", "Motorcycle Tools & Maintenance", "https://www.daraz.com.bd/motorcycle-tools-maintenance/"]]
💪🏽
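Note: output abridged. Bonus: the individual URLs are included and can be dropped if you don't need them. The code needed modifying: there is no "hover" or for loop in the query.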
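For the bonus in the note, here is a small sketch that removes the URL cells from rows shaped like the output above, leaving only the category names. It assumes every URL cell starts with http:// or https://, as the ones shown do; drop_urls is a hypothetical helper name.

```python
def drop_urls(rows: list[list[str]]) -> list[list[str]]:
    """Keep only the category names from each row, removing URL cells."""
    return [
        [cell for cell in row if not cell.startswith(("http://", "https://"))]
        for row in rows
    ]


rows = [
    ["Watches, Bags, Jewellery", "School Bags",
     "https://www.daraz.com.bd/school-bags/",
     "Kids Shoulder Bags", "https://www.daraz.com.bd/kids-shoulder-bags/"],
]
print(drop_urls(rows))
# [['Watches, Bags, Jewellery', 'School Bags', 'Kids Shoulder Bags']]
```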