Scrapy Crawler Practice (Partial Source Code)

items.py

# Define here the models for your scraped items
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/items.html

import scrapy


class SyItem(scrapy.Item):
    # Text paragraphs extracted from the chapter page (a list of strings)
    name = scrapy.Field()

spider_title.py

import scrapy
from sy.items import SyItem

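class SpiderTitleSpider(scrapy.Spider):
    name = "spider_title"
    # Registered domain, so the read.zongheng.com start URL is in scope
    allowed_domains = ["zongheng.com"]
    start_urls = ["https://read.zongheng.com/chapter/1215341/68208370.html"]

    def parse(self, response):
        # Collect the text of every <p> inside the chapter content container
        titles = response.xpath('//*[@id="Jcontent"]/div/div[4]/p/text()').getall()
        self.logger.info("Extracted %d paragraphs", len(titles))

        # Save one paragraph per line; the context manager closes the file
        with open('aa.txt', 'w', encoding='utf-8') as f:
            for line in titles:
                f.write(line + '\n')

        item = SyItem()
        item['name'] = titles
        return item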
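
The spider above writes aa.txt directly inside parse(). A common alternative in Scrapy projects is to leave parse() to return the item and do the file writing in an item pipeline. The following is only a minimal sketch under assumptions: the class name TitleFilePipeline and its path parameter are hypothetical and not part of the original project. The method names (open_spider, close_spider, process_item) follow the interface described in Scrapy's item pipeline documentation.

```python
class TitleFilePipeline:
    """Hypothetical pipeline: writes each item's 'name' list to a text file."""

    def __init__(self, path="aa.txt"):
        # Default matches the file name used by the spider; the parameter
        # itself is an assumption added for illustration.
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the output file.
        self.file = open(self.path, "w", encoding="utf-8")

    def close_spider(self, spider):
        # Called once when the spider finishes: close the file if it was opened.
        if self.file is not None:
            self.file.close()

    def process_item(self, item, spider):
        # item["name"] holds the list of paragraph strings set in parse().
        for line in item["name"]:
            self.file.write(line + "\n")
        # Return the item so any later pipeline stages still receive it.
        return item
```

To enable such a pipeline it would have to be registered in settings.py, for example ITEM_PIPELINES = {"sy.pipelines.TitleFilePipeline": 300}, assuming the class is placed in the project's pipelines.py module.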
