Python Web Scraping: Downloading Images from the Douban Movie Home Page

URL: https://movie.douban.com/

import os

import requests
from lxml import etree

url = 'https://movie.douban.com'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.5735.289 Safari/537.36'
}
# One session is reused for the page request and for every image request
session = requests.Session()
res = session.get(url, headers=headers)
# If the page text comes back garbled, set the encoding explicitly:
# res.encoding = 'utf-8'
# res.encoding = res.apparent_encoding
# print(res.text)
# Output: the page source
tree = etree.HTML(res.text)
# print(tree)
# Output: <Element html at 0x186fa6a3100>
# Each <img> tag on the page looks like this:
# <img src="https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2900931370.jpg" alt="&#x5C0F;&#x884C;&#x661F;&#x730E;&#x4EBA;" rel="nofollow" class=""/>
# Select the src attribute of every <img> tag in a single query
img_urls = tree.xpath('//img/@src')
# img_names = tree.xpath('//img/@alt')
# print(img_urls)
# Output: a list of image URLs
# Create the output folder first; open() fails if ./img does not exist
os.makedirs('./img', exist_ok=True)
# Loop over the URLs once, so each image is downloaded a single time
for img_url in img_urls:
    last_str = img_url.split('/')[-1]
    # print(last_str)
    # Output: p2900931370.jpg, p2901057189.jpg, ...
    every_name = last_str.split('.')[0]
    # print(every_name)
    # Output: p2900931370, p2901057189, ...
    res_url = session.get(img_url, headers=headers)
    with open(f'./img/{every_name}.jpg', 'wb') as f:
        f.write(res_url.content)
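
The script builds each file name by splitting the URL on '/' and '.', then always appends .jpg. As a minimal sketch of a slightly sturdier variant, the helper below uses only the standard library to ignore query strings and keep whatever extension the URL really has. The names filename_from_url, default_ext and unique_urls are made up for this illustration and are not part of the original script.

```python
import posixpath
from urllib.parse import urlsplit


def filename_from_url(img_url, default_ext=".jpg"):
    """Return the last path segment of img_url, keeping its own extension.

    Hypothetical helper for illustration; default_ext is only used when
    the URL path has no extension at all.
    """
    # urlsplit separates the path from the query string and fragment,
    # so something like "?v=2" never ends up in the file name.
    path = urlsplit(img_url).path
    name = posixpath.basename(path)
    stem, ext = posixpath.splitext(name)
    if not stem:
        return None  # nothing usable, e.g. the URL path ends with "/"
    return stem + (ext or default_ext)


def unique_urls(urls):
    """Drop repeated URLs while keeping their first-seen order."""
    return list(dict.fromkeys(urls))
```

In the download loop, filename_from_url could stand in for the two split calls, and unique_urls could be applied to the XPath result so that an image referenced twice on the page is only fetched once.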

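Result: each image is saved in the ./img folder under the name taken from its URL, for example p2900931370.jpg.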
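
Every response body is written to a .jpg file without checking what the server actually returned, so an error page would be saved as if it were an image. The sketch below is one possible check, assuming it is enough to look at the well-known leading bytes of JPEG, PNG, GIF and WebP files. sniff_image_type is a hypothetical helper name, not something from the original script or from requests.

```python
def sniff_image_type(data):
    """Guess an image format from the first bytes of a downloaded body.

    Returns "jpeg", "png", "gif" or "webp", or None when nothing matches.
    Hypothetical helper for illustration only.
    """
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    # WebP is a RIFF container: "RIFF", four size bytes, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None
```

Inside the loop, a line such as `if sniff_image_type(res_url.content) is None: continue` placed before the `with open(...)` statement would skip bodies that do not look like images.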
