pandas: Getting data out of clutch.co with BS4 and Requests failed

Posted by mznpcxlj on 2023-06-04 in Other


I am trying to collect data from the page https://clutch.co/il/it-services, and I think there are a few ways to do it:
A. using bs4 and requests
B. using pandas
My first attempt uses A:

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://clutch.co/il/it-services"
response = requests.get(url)

soup = BeautifulSoup(response.content, "html.parser")

company_names = soup.find_all("h3", class_="company-name")
locations = soup.find_all("span", class_="locality")

company_names_list = [name.get_text(strip=True) for name in company_names]
locations_list = [location.get_text(strip=True) for location in locations]

data = {"Company Name": company_names_list, "Location": locations_list}
df = pd.DataFrame(data)

df.to_csv("it_services_data.csv", index=False)

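This code is supposed to:
a. scrape the company names and locations from the specified web page,
b. store them in a pandas DataFrame, and
c. save the data to a CSV file named it_services_data.csv in the current working directory.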
But I ended up with an empty result file. The CSV really contains no data.
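An empty CSV usually means the selectors matched nothing in the HTML the server actually returned, rather than a pandas problem. A minimal diagnostic sketch, assuming the same selectors as the question (`h3.company-name`, `span.locality`); the helper names `fetch_html` and `count_matches` and the User-Agent value are illustrative assumptions:

```python
import requests
from bs4 import BeautifulSoup

# Example desktop User-Agent; the exact value is an assumption, not a requirement
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def fetch_html(url: str) -> tuple[int, str]:
    """Hypothetical helper: fetch a page and return (HTTP status code, HTML text)."""
    response = requests.get(url, headers=HEADERS, timeout=30)
    return response.status_code, response.text


def count_matches(html: str) -> dict[str, int]:
    """Count how many elements each of the question's two selectors matches."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        "company-name": len(soup.find_all("h3", class_="company-name")),
        "locality": len(soup.find_all("span", class_="locality")),
    }
```

For example, `status, html = fetch_html("https://clutch.co/il/it-services")` followed by `print(status, count_matches(html))`. A non-200 status, or zero matches for both selectors, suggests that the HTML that came back is not the directory listing or that the selectors do not fit it; either way the DataFrame will be empty.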
Here is what I did:
1. Install the required packages:

pip install beautifulsoup4 requests pandas

2. Import the necessary libraries:
import requests
from bs4 import BeautifulSoup
import pandas as pd
3. Send a GET request to the web page and retrieve the HTML content:
url = "https://clutch.co/il/it-services"
response = requests.get(url)
4. Create a BeautifulSoup object to parse the HTML content:
soup = BeautifulSoup(response.content, "html.parser")
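5. Identify the HTML elements that contain the data we want to scrape. Inspect the page's source code to find the relevant tags and attributes. For example, suppose we want to extract the company names and their respective locations. In this case, the company names are in `h3` tags with the class "company-name", and the locations are in `span` tags with the class "locality":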
company_names = soup.find_all("h3", class_="company-name")
locations = soup.find_all("span", class_="locality")
6. Extract the data from the HTML elements and store it in lists:
company_names_list = [name.get_text(strip=True) for name in company_names]
locations_list = [location.get_text(strip=True) for location in locations]
7. Create a pandas DataFrame to organize the extracted data:
data = {"Company Name": company_names_list, "Location": locations_list}
df = pd.DataFrame(data)
8. Optionally, do further processing or analysis with the pandas DataFrame, or export the data to a file. For example, to save the data to a CSV file:

`df.to_csv("it_services_data.csv", index=False)`
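Step 8 writes the CSV unconditionally, so a scrape that finds nothing still produces an empty file without any error. A small guard, sketched here as a hypothetical helper `save_if_not_empty`, makes that failure visible instead:

```python
import pandas as pd


def save_if_not_empty(df: pd.DataFrame, path: str) -> None:
    """Write df to CSV, but raise instead of silently writing an empty file."""
    if df.empty:
        raise ValueError(
            "No rows scraped; check the HTTP response and the CSS selectors "
            "before saving."
        )
    df.to_csv(path, index=False)
```

Calling `save_if_not_empty(df, "it_services_data.csv")` in place of `df.to_csv(...)` turns an empty result into an immediate error at the point where the problem can still be inspected.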

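That's it! That's all I did. I thought this approach would let me scrape the company names and their locations from the given page using Python with the Beautiful Soup, Requests, and pandas packages.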
OK, I also need the companies' URLs. It would be great if I could collect even more data.
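For the URLs, one option is to read the `href` of the anchor that holds each company name and resolve it against the site root, since listing pages often use relative links. A minimal sketch, assuming the company-name anchors match the `.company_info a` selector used in the answer below (the real markup may differ) and that relative links resolve against `https://clutch.co`; `extract_company_links` is a hypothetical helper:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://clutch.co"  # assumption: relative hrefs are relative to the site root


def extract_company_links(html: str, selector: str = ".company_info a") -> list[dict[str, str]]:
    """Return the name and absolute URL of every anchor matched by `selector`."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.select(selector):
        href = anchor.get("href")
        if not href:
            continue  # skip anchors that have no link target
        links.append({
            "Company Name": anchor.get_text(strip=True),
            "URL": urljoin(BASE_URL, href),  # absolute hrefs pass through unchanged
        })
    return links
```

The result is a list of dicts, so `pd.DataFrame(extract_company_links(html))` gives a two-column table directly.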
Update: many thanks to badduker. I tried it in Colab: after installing the cloudscraper package, I ran the code and got the following:

ERROR:root:Internal Python error in the inspect module.
Below is the traceback from this internal error.

ERROR:root:Internal Python error in the inspect module.
Below is the traceback from this internal error.

ERROR:root:Internal Python error in the inspect module.
Below is the traceback from this internal error.

cloudscraper.exceptions.CloudflareChallengeError: Detected a Cloudflare version 2 Captcha challenge, This feature is not available in the opensource (free) version.

During handling of the above exception, another exception occurred:

AttributeError: 'CloudflareChallengeError' object has no attribute '_render_traceback_'

During handling of the above exception, another exception occurred:

AssertionError
cloudscraper.exceptions.CloudflareChallengeError: Detected a Cloudflare version 2 Captcha challenge, This feature is not available in the opensource (free) version.

During handling of the above exception, another exception occurred:

AttributeError: 'CloudflareChallengeError' object has no attribute '_render_traceback_'

During handling of the above exception, another exception occurred:

TypeError: object of type 'NoneType' has no len()

During handling of the above exception, another exception occurred:

AttributeError: 'TypeError' object has no attribute '_render_traceback_'

During handling of the above exception, another exception occurred:

AssertionError
cloudscraper.exceptions.CloudflareChallengeError: Detected a Cloudflare version 2 Captcha challenge, This feature is not available in the opensource (free) version.

During handling of the above exception, another exception occurred:

AttributeError: 'CloudflareChallengeError' object has no attribute '_render_traceback_'

During handling of the above exception, another exception occurred:

TypeError: object of type 'NoneType' has no len()

During handling of the above exception, another exception occurred:

AttributeError: 'TypeError' object has no attribute '_render_traceback_'

During handling of the above exception, another exception occurred:

TypeError: object of type 'NoneType' has no len()

During handling of the above exception, another exception occurred:

AttributeError: 'TypeError' object has no attribute '_render_traceback_'

During handling of the above exception, another exception occurred:

AssertionError


Source: https://stackoverflow.com/questions/76362600/getting-data-out-of-clutch-co-with-bs4-and-requests-failed

1 Answer


juzqafwq1#

The site returns an error page saying that you need to enable JavaScript. In other words, plain requests may not be enough.
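One way to tell this case apart from a selector problem is to check the returned HTML for that JavaScript notice before parsing it for company data. A minimal heuristic sketch, assuming the block page mentions "enable JavaScript" in its visible text (the exact wording is a guess; compare it with what the site actually sends); `requires_javascript` is a hypothetical helper:

```python
from bs4 import BeautifulSoup


def requires_javascript(html: str) -> bool:
    """Heuristic: True if the page's visible text asks the visitor to enable JavaScript.

    Assumption: the block page contains a phrase like "enable JavaScript";
    adjust the check to the text the site really returns.
    """
    # Join text nodes with spaces so "Enable <b>JavaScript</b>" still matches
    text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True).lower()
    return "enable javascript" in text
```

If this returns True for the response body, changing the CSS selectors will not help, because the listing was never delivered.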
To get past it, you can try the cloudscraper module instead.
For example:

import cloudscraper
import pandas as pd
from bs4 import BeautifulSoup
from tabulate import tabulate

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.57",
}

scraper = cloudscraper.create_scraper()
response = scraper.get("https://clutch.co/il/it-services", headers=headers)
soup = BeautifulSoup(response.content, "html.parser")

company_names = soup.select(".directory-list div.provider-info--header .company_info a")
locations = soup.select(".locality")

company_names_list = [name.get_text(strip=True) for name in company_names]
locations_list = [location.get_text(strip=True) for location in locations]

data = {"Company Name": company_names_list, "Location": locations_list}
df = pd.DataFrame(data)
df.index += 1
print(tabulate(df, headers="keys", tablefmt="psql"))
df.to_csv("it_services_data.csv", index=False)
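The code above builds `company_names_list` and `locations_list` independently. `pd.DataFrame(data)` raises a ValueError if the two lists differ in length, and if extra and missing `.locality` matches happen to cancel out, rows can silently pair a company with the wrong location. A sketch of an alternative that reads both fields from the same listing element; the three selectors are parameters because the element wrapping one listing on the live page is an assumption to be filled in after inspecting it, and `scrape_cards` is a hypothetical helper:

```python
import pandas as pd
from bs4 import BeautifulSoup


def scrape_cards(html: str, card_selector: str, name_selector: str,
                 location_selector: str) -> pd.DataFrame:
    """Build one row per listing card so names and locations stay aligned.

    All selectors are assumptions about the page markup. A card without a
    location gets a missing value instead of shifting every later row.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select(card_selector):
        # Search only inside this card, not the whole page
        name = card.select_one(name_selector)
        location = card.select_one(location_selector)
        rows.append({
            "Company Name": name.get_text(strip=True) if name else None,
            "Location": location.get_text(strip=True) if location else None,
        })
    return pd.DataFrame(rows, columns=["Company Name", "Location"])
```

Usage would look like `scrape_cards(response.text, card_selector, ".company_info a", ".locality")`, with `card_selector` set to whatever element wraps a single listing.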

Output:

+----+-----------------------------------------------------+--------------------------------+
|    | Company Name                                        | Location                       |
|----+-----------------------------------------------------+--------------------------------|
|  1 | Brainhub                                            | Gliwice, Poland                |
|  2 | Vates                                               | Atlanta, GA                    |
|  3 | UVIK Software                                       | Tallinn, Estonia               |
|  4 | TLVTech                                             | Ramat Gan, Israel              |
|  5 | Broscorp                                            | Beersheba, Israel              |
|  6 | Exoft                                               | Vienna, VA                     |
|  7 | EchoGlobal                                          | Tallinn, Estonia               |
|  8 | Codup                                               | Karachi, Pakistan              |
|  9 | Dofinity                                            | Bnei Brak, Israel              |
| 10 | Insitu S2 Tikshuv LTD                               | Haifa, Israel                  |
| 11 | Sogo Services                                       | Tel Aviv-Yafo, Israel          |
| 12 | Naviteq LTD                                         | Tel Aviv-Yafo, Israel          |
| 13 | BMT - Business Marketing Tools                      | Ra'anana, Israel               |
| 14 | Accedia                                             | Sofia, Bulgaria                |
| 15 | Profisea                                            | Hod Hasharon, Israel           |
| 16 | Trivium Solutions                                   | Herzliya, Israel               |
| 17 | Dynomind.tech                                       | Jerusalem, Israel              |
| 18 | Madeira Data Solutions                              | Kefar Sava, Israel             |
| 19 | Titanium Blockchain                                 | Tel Aviv-Yafo, Israel          |
| 20 | Octopus Computer Solutions                          | Tel Aviv-Yafo, Israel          |
| 21 | Reblaze                                             | Tel Aviv-Yafo, Israel          |
| 22 | ELPC Networks Ltd                                   | Rosh Haayin, Israel            |
| 23 | Taldor                                              | Holon, Israel                  |
| 24 | Opsfleet                                            | Kfar Bin Nun, Israel           |
| 25 | Clarity                                             | Petah Tikva, Israel            |
| 26 | Hozek Technologies Ltd.                             | Petah Tikva, Israel            |
| 27 | ERG Solutions                                       | Ramat Gan, Israel              |
| 28 | SCADAfence                                          | Ramat Gan, Israel              |
| 29 | Ness Technologies | נס טכנולוגיות                   | Tel Aviv-Yafo, Israel          |
| 30 | Bynet Data Communications Bynet Data Communications | Tel Aviv-Yafo, Israel          |
| 31 | Radware                                             | Tel Aviv-Yafo, Israel          |
| 32 | BigData Boutique                                    | Rishon LeTsiyon, Israel        |
| 33 | NetNUt                                              | Tel Aviv-Yafo, Israel          |
| 34 | Asperii                                             | Petah Tikva, Israel            |
| 35 | PractiProject                                       | Ramat Gan, Israel              |
| 36 | K8Support                                           | Bnei Brak, Israel              |
| 37 | Odix                                                | Rosh Haayin, Israel            |
| 38 | Adaptiq                                             | Tel Aviv-Yafo, Israel          |
| 39 | Israel IT                                           | Tel Aviv-Yafo, Israel          |
| 40 | Panaya                                              | Hod Hasharon, Israel           |
| 41 | MazeBolt Technologies                               | Giv'atayim, Israel             |
| 42 | ActiveFence                                         | Binyamina-Giv'at Ada, Israel   |
| 43 | Komodo Consulting                                   | Ra'anana, Israel               |
| 44 | MindU                                               | Tel Aviv-Yafo, Israel          |
| 45 | Valinor Ltd.                                        | Petah Tikva, Israel            |
| 46 | entrypoint                                          | Modi'in-Maccabim-Re'ut, Israel |
| 47 | Code n' Roll                                        | Haifa, Israel                  |
| 48 | Linnovate                                           | Bnei Brak, Israel              |
| 49 | Adelante                                            | Tel Aviv-Yafo, Israel          |
| 50 | develeap                                            | Tel Aviv-Yafo, Israel          |
| 51 | Chalir.com                                          | Binyamina-Giv'at Ada, Israel   |
| 52 | Trinity Agency                                      | Tel Aviv-Yafo, Israel          |
| 53 | MeteorOps                                           | Tel Aviv-Yafo, Israel          |
| 54 | Penguin Strategies                                  | Ra'anana, Israel               |
| 55 | ANG Solutions                                       | Tel Aviv-Yafo, Israel          |
| 56 | Sanapix - Web & Media Services                      | Umm al-Fahm, Israel            |
| 57 | Pen and Chip Consulting                             | Netanya, Israel                |
+----+-----------------------------------------------------+--------------------------------+

Posted 2023-06-04
