Scraping car review data with Python and BeautifulSoup

Site scraped:

https://koubei.16888.com/57233/0-0-0-2

The full code is at the end of the article.

Usage:

from bs4 import BeautifulSoup

Once you have the HTML, use find_all() to pull out the text. The review text sits in span tags whose class is show_dp f_r:

content_text = soup.find_all('span', class_='show_dp f_r')
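To see what this call matches, here is a minimal sketch on a made-up HTML fragment (the markup is an assumption for illustration, not copied from the site). When class_ is given a string containing a space, BeautifulSoup compares it against the exact value of the class attribute, so a tag with the same classes in a different order is not matched. A CSS selector passed to select() matches regardless of order.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for illustration; the real page may differ.
html = """
<div>
  <span class="show_dp f_r">Spacious interior</span>
  <span class="f_r show_dp">Noisy at high speed</span>
  <span class="show_dp">Unrelated</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A multi-class string is matched against the exact attribute value.
exact = soup.find_all("span", class_="show_dp f_r")

# A CSS selector requires both classes but ignores their order.
any_order = soup.select("span.show_dp.f_r")

exact_texts = [tag.get_text(strip=True) for tag in exact]
any_order_texts = [tag.get_text(strip=True) for tag in any_order]
print(exact_texts)
print(any_order_texts)
```

If the site ever reorders the two class names, the select() form would keep working while the find_all() form would silently return nothing.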

The pros, cons, and summary all share the same class name, so I wrote a small classifier that sorts them by position:

# Each review yields three spans in order: pros, cons, summary
for index, x in enumerate(content_text):
    if index % 3 == 0:
        with open("car_post.txt", "a", encoding='utf-8') as f:
            f.write(x.text + "\n")
    elif index % 3 == 1:
        with open("car_nev.txt", "a", encoding='utf-8') as f:
            f.write(x.text + "\n")
    else:
        with open("car_text.txt", "a", encoding='utf-8') as f:
            f.write(x.text + "\n")
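The modulo test only works if every review contributes exactly three spans in a fixed order (pros, cons, summary). Here is a minimal sketch of the same split using list slicing on plain strings. The sample texts and the helper name split_reviews are made up for illustration.

```python
def split_reviews(texts):
    """Split a flat [pro, con, summary, pro, con, summary, ...] list into three lists."""
    if len(texts) % 3 != 0:
        # The fixed-order assumption is broken, so positions can't be trusted.
        raise ValueError("expected a multiple of 3 items, got %d" % len(texts))
    # Every third item starting at offset 0, 1, and 2 respectively.
    return texts[0::3], texts[1::3], texts[2::3]


# Hypothetical review texts, made up for illustration.
sample = ["quiet cabin", "small trunk", "good value",
          "smooth ride", "thirsty engine", "would buy again"]
pros, cons, summaries = split_reviews(sample)
print(pros)
print(cons)
print(summaries)
```

The length check is a cheap way to notice a page where a review is missing one of its three parts, which would otherwise shift every later item into the wrong file.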

Results preview

Negative: car_nev.txt

Positive: car_post.txt

Summary: car_text.txt

Full code

from bs4 import BeautifulSoup
import requests

# Browser-like user agent, sent with every request
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36 Edg/107.0.1418.35"
}

# Walk through review pages 1 to 299
for j in range(1, 300):
    url = "https://koubei.16888.com/57233/0-0-0-{}".format(j)
    resp = requests.get(url, headers=headers, timeout=10)
    resp.encoding = "utf-8"
    soup = BeautifulSoup(resp.text, "html.parser")
    content_text = soup.find_all('span', class_='show_dp f_r')

    # Each review yields three spans in order: pros, cons, summary
    for index, x in enumerate(content_text):
        if index % 3 == 0:
            with open("car_post.txt", "a", encoding='utf-8') as f:
                f.write(x.text + "\n")
        elif index % 3 == 1:
            with open("car_nev.txt", "a", encoding='utf-8') as f:
                f.write(x.text + "\n")
        else:
            with open("car_text.txt", "a", encoding='utf-8') as f:
                f.write(x.text + "\n")
    print(j)  # progress: page number just finished
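The loop above assumes all 299 pages exist and that every request succeeds. Below is a hedged variant that reuses one requests.Session, checks the HTTP status, and stops at the first page with no matching spans. The helper names, the shortened user agent, the 10-second timeout, and the one-second pause are all assumptions, not part of the original script.

```python
import time

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://koubei.16888.com/57233/0-0-0-{}"
# Shortened placeholder; swap in a full browser string if the site needs it.
HEADERS = {"user-agent": "Mozilla/5.0"}


def page_url(page):
    """Build the URL of one review page."""
    return BASE_URL.format(page)


def extract_texts(html):
    """Return the text of every span whose class attribute is 'show_dp f_r'."""
    soup = BeautifulSoup(html, "html.parser")
    return [span.text for span in soup.find_all("span", class_="show_dp f_r")]


def crawl(max_pages=299, pause=1.0):
    """Yield (page, texts) for each page until one has no matching spans."""
    with requests.Session() as session:
        session.headers.update(HEADERS)
        for page in range(1, max_pages + 1):
            resp = session.get(page_url(page), timeout=10)
            resp.raise_for_status()  # fail loudly on 4xx/5xx responses
            resp.encoding = "utf-8"
            texts = extract_texts(resp.text)
            if not texts:
                break  # assume an empty page means we ran past the last one
            yield page, texts
            time.sleep(pause)  # short pause between requests
```

Nothing is fetched when these functions are merely defined. Iterating with `for page, texts in crawl():` performs real network requests, and the texts from each page can then be written to the three files as before.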

 
