[Python Web Scraping] Case Study: Douyu

Disclaimer: this example is for learning purposes only and must not be used maliciously.

Goal: scrape each live room's title, category, streamer, and popularity, and page through the results.

Note: the locators below may stop working as the site is updated; adjust them yourself as needed.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

chrome_options = Options()
chrome_options.page_load_strategy = 'eager'
service = Service('path/to/chromedriver.exe')  # replace with the path to your chromedriver.exe

class Douyu(object):

    def __init__(self):
        self.url = 'https://www.douyu.com/directory/all'
        self.driver = webdriver.Chrome(service=service, options=chrome_options)
        self.driver.implicitly_wait(5)

    def parse_data(self):
        # Give the room list time to finish rendering.
        time.sleep(3)
        data_list = []
        # Walk the room list and read the fields from each room node.
        # A full page is expected to hold 120 rooms; stop early on a shorter page.
        for i in range(1, 121):
            room = f'//li[{i}]/div/a/div[2]'
            if not self.driver.find_elements(By.XPATH, f'{room}/div[1]/h3'):
                break
            temp = {}
            temp['title'] = self.driver.find_element(By.XPATH, f'{room}/div[1]/h3').text
            temp['type'] = self.driver.find_element(By.XPATH, f'{room}/div[1]/span').text
            temp['owner'] = self.driver.find_element(By.XPATH, f'{room}/div[2]/h2').text
            temp['num'] = self.driver.find_element(By.XPATH, f'{room}/div[2]/span').text
            data_list.append(temp)
        return data_list

    def save_data(self, data_list):
        for data in data_list:
            print(data)

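    def run(self):
        self.driver.get(self.url)
        try:
            while True:
                data_list = self.parse_data()
                self.save_data(data_list)

                try:
                    # Look for an enabled "next page" button (its title attribute is "下一页").
                    el_next = self.driver.find_element(By.XPATH, '//*[@title="下一页"][@aria-disabled="false"]')
                    # Scroll to the bottom so the button is in view before clicking.
                    self.driver.execute_script('scrollTo(0,1000000)')
                    el_next.click()
                except Exception:
                    # No enabled next-page button, or the click failed: stop paging.
                    break
        finally:
            # Always close the browser, even if scraping fails partway through.
            self.driver.quit()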

if __name__ == '__main__':
    douyu = Douyu()
    douyu.run()
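
The script's save_data only prints each room. If the results should be kept, one option is to write them to CSV with the standard library. The following is a minimal sketch, not part of the original script: the helper name write_rooms_csv and the FIELDS list are illustrative assumptions, and the column names simply mirror the dictionary keys built in parse_data.

```python
import csv
import io

# Column order mirrors the keys built in parse_data (assumed, for illustration).
FIELDS = ['title', 'type', 'owner', 'num']


def write_rooms_csv(data_list, fileobj, write_header=True):
    """Write room dictionaries to an open text file object as CSV rows.

    Pass write_header=False for every page after the first so the
    header row appears only once in the output.
    """
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    if write_header:
        writer.writeheader()
    writer.writerows(data_list)


if __name__ == '__main__':
    # Made-up sample rows standing in for one page of scraped data.
    sample = [{'title': 'demo room', 'type': 'demo', 'owner': 'someone', 'num': '1234'}]
    buffer = io.StringIO()
    write_rooms_csv(sample, buffer)
    print(buffer.getvalue())
```

To use it from the class, save_data could open a file once with open('rooms.csv', 'a', newline='', encoding='utf-8') and pass it to the helper for each page; the file name here is likewise only an example.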

