Web scraping: Douban movies, prices, and book titles

1. Scraping the Douban Top 250 movies

import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent so the request looks like it comes from a regular browser
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
}

for i in range(0, 250, 25):
    print(f"-------- Movies {i + 1} to {i + 25} --------")
    # Each page lists 25 movies; the start parameter is the offset of the first one
    response = requests.get(f"https://movie.douban.com/top250?start={i}", headers=headers)

    if response.ok:
        html = response.text
        soup = BeautifulSoup(html, "html.parser")
        all_titles = soup.find_all("span", attrs={"class": "title"})
        j = i
        for title in all_titles:
            title_string = title.string
            # Skip spans without a plain string and alternative titles, which contain "/"
            if title_string and "/" not in title_string:
                j += 1
                print(f"{j}. {title_string}")
    else:
        print("Request failed")
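The paging and filtering logic of the loop above can be tried without touching the network. The following is a minimal sketch, not part of the original script: the helper names `page_offsets` and `primary_titles` and the sample strings are hypothetical, chosen only to illustrate how the `start` offsets are generated and how titles containing "/" are dropped.

```python
def page_offsets(total=250, page_size=25):
    """Return the offsets passed as the ?start= query parameter, one per page."""
    return list(range(0, total, page_size))


def primary_titles(strings):
    """Keep only non-empty strings without a slash, i.e. the main titles."""
    return [s for s in strings if s and "/" not in s]


# Hypothetical span texts in the order they might appear on one page.
sample = ["肖申克的救赎", "\xa0/\xa0The Shawshank Redemption", "霸王别姬"]

print(page_offsets())
print(primary_titles(sample))
```

Keeping the filter in its own function makes it easy to check against a handful of strings before running the full crawl.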

2. Scraping prices

import requests
from bs4 import BeautifulSoup

content = requests.get("http://books.toscrape.com/").text
soup = BeautifulSoup(content, "html.parser")
# Prices sit in <p> tags with class="price_color", so search for those
all_prices = soup.find_all("p", attrs={"class": "price_color"})
print(all_prices)
for price in all_prices:
    # Drop the first two characters (the currency prefix as decoded here) and keep the number
    print(price.string[2:])
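Slicing with `[2:]` depends on the currency prefix being exactly two characters long. As an alternative, here is a minimal sketch that pulls the number out with a regular expression, so it works whatever the prefix looks like. The function name `parse_price` and the sample strings are assumptions made for illustration; they do not come from the original script.

```python
import re


def parse_price(text):
    """Extract the first number in a price string and return it as a float."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return float(match.group())


# Hypothetical price strings with prefixes of different lengths.
samples = ["£51.77", "Â£53.74", "$12"]

print([parse_price(s) for s in samples])
```

Returning a float also makes the values ready for sorting or summing, which the string slices are not.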

3. Scraping book titles

import requests
from bs4 import BeautifulSoup

content = requests.get("http://books.toscrape.com/").text
soup = BeautifulSoup(content, "html.parser")
# Each book title is inside an <h3> that wraps an <a>, so find the <h3> first, then the <a>
all_titles = soup.find_all("h3")
for title in all_titles:
    all_links = title.find_all("a")
    for link in all_links:
        print(link.string)
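The same "find the h3, then the a" idea can be sketched offline with only the standard library. This swaps BeautifulSoup for Python's built-in `html.parser` and reads a `title` attribute instead of the link text. The sample HTML, the class name `TitleCollector`, and the assumption that each link carries its full title in a `title` attribute are all hypothetical, so check the real page before relying on it.

```python
from html.parser import HTMLParser

# Hypothetical markup mimicking the h3 > a structure described above.
SAMPLE_HTML = """
<article><h3><a href="a.html" title="A Light in the Attic">A Light in the ...</a></h3></article>
<article><h3><a href="b.html" title="Tipping the Velvet">Tipping the Velvet</a></h3></article>
<p><a href="c.html" title="Not a book">ignored</a></p>
"""


class TitleCollector(HTMLParser):
    """Collects the title attribute of every <a> nested inside an <h3>."""

    def __init__(self):
        super().__init__()
        self.in_h3 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.in_h3 = True
        elif tag == "a" and self.in_h3:
            title = dict(attrs).get("title")
            if title:
                self.titles.append(title)

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_h3 = False


def extract_titles(html):
    """Parse the HTML and return the collected titles in document order."""
    parser = TitleCollector()
    parser.feed(html)
    return parser.titles


print(extract_titles(SAMPLE_HTML))
```

Links outside an `<h3>` are ignored, which mirrors the nested `find_all` calls in the script above.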
