A simple crawler: scraping the first ten pages of reviews for a product on JD.com

Product page: 【博世四坑5系 6x100x160】博世(BOSCH)四坑5系(1支装)圆柄两坑两槽混凝土钻头 6x100x160mm【行情 报价 价格 评测】-京东 (a Bosch 6x100x160 mm concrete drill bit listed on JD.com)

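First, capture the page's network traffic. Then use the search box in the capture tool to search for a piece of review text and see which request carries the reviews.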

To keep the view tidy, filter the list down to the requests that contain reviews.
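The same search can be done offline on an exported capture. The sketch below assumes the traffic was saved as a HAR file from the browser's developer tools and loaded as a dict; the function name `find_entries_containing` and the tiny hand-made sample are illustrative only. It also assumes response bodies are stored as plain text (a HAR file may store some bodies base64-encoded, which this sketch does not handle).

```python
def find_entries_containing(har: dict, keyword: str) -> list[str]:
    """Return the URLs of captured requests whose response body contains keyword."""
    urls = []
    for entry in har.get("log", {}).get("entries", []):
        # HAR keeps the response body under response.content.text (may be absent)
        body = entry.get("response", {}).get("content", {}).get("text") or ""
        if keyword in body:
            urls.append(entry["request"]["url"])
    return urls


# Hand-made stand-in for a real capture, which would be loaded with json.load(...)
sample_har = {
    "log": {
        "entries": [
            {
                "request": {"url": "https://item.jd.com/"},
                "response": {"content": {"text": "<html>product page</html>"}},
            },
            {
                "request": {"url": "https://api.m.jd.com/?functionId=pc_club_productPageComments"},
                "response": {"content": {"text": '{"comments": [{"content": "很好用"}]}'}},
            },
        ]
    }
}

matches = find_entries_containing(sample_har, "很好用")
print(matches)
```

Searching for a phrase copied from a visible review is usually enough to single out the one request that returns the review data.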

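Click "next page" and compare the request parameters. Only the page number and the timestamp change; everything else stays the same. That makes multi-page collection straightforward: loop over the page number and generate a fresh timestamp for each request.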
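As a rough sketch of that observation, the fixed parameters can be kept in one dict and only `page` and `t` filled in per request. The parameter values are copied from the captured URL used in this post; the helper name `build_url` is made up, and the `uuid`, `bbtf` and `shield` parameters are left out here for brevity, so whether the server accepts the shortened URL is an untested assumption.

```python
import time
from urllib.parse import urlencode

BASE_URL = "https://api.m.jd.com/"

# Parameters that did not change between pages in the captured requests
FIXED_PARAMS = {
    "appid": "item-v3",
    "functionId": "pc_club_productPageComments",
    "client": "pc",
    "clientVersion": "1.0.0",
    "loginType": "3",
    "productId": "4196453",
    "score": "0",
    "sortType": "5",
    "pageSize": "10",
    "isShadowSku": "0",
    "rid": "0",
    "fold": "1",
}


def build_url(page: int) -> str:
    """Build the review URL for one page with a fresh millisecond timestamp."""
    params = dict(FIXED_PARAMS, page=page, t=round(time.time() * 1000))
    return f"{BASE_URL}?{urlencode(params)}"


# Pages are numbered from 0, so this yields the first ten pages
urls = [build_url(page) for page in range(10)]
print(urls[0])
```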

Note: the review data is only visible after logging in, so the request headers must include the Cookie from a logged-in session.
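One way to avoid pasting a session cookie into the script is to read it from the environment. This is only a sketch: the environment variable name `JD_COOKIE`, the helper `build_headers` and the placeholder cookie value are all assumptions for illustration, while the Referer and User-Agent values are the ones used in this post.

```python
import os


def build_headers(cookie: str) -> dict[str, str]:
    """Build request headers; refuse to continue without a cookie string."""
    if not cookie:
        raise ValueError("a Cookie string from a logged-in session is required")
    return {
        "Cookie": cookie,
        "Referer": "https://item.jd.com/",
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
        ),
    }


# JD_COOKIE is a hypothetical variable name; the fallback is a dummy placeholder
cookie = os.environ.get("JD_COOKIE", "name=placeholder")
headers = build_headers(cookie)
print(sorted(headers))
```

A cookie copied from the browser expires after a while, so it has to be refreshed when requests start coming back without review data.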

Code:

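import csv
import random
import time

import requests

# utf-8-sig writes a byte-order mark so that Excel detects the encoding
f = open('京东评论.csv', 'w', encoding='utf-8-sig', newline='')
# Columns: username, id, region, review text, score, posted at, product name
csv_writer = csv.DictWriter(f, fieldnames=[
    '用户名',
    'id',
    '地区',
    '评论内容',
    '评分',
    '发布时间',
    '产品名称',
])
csv_writer.writeheader()
# The loop starts at page=0, so range(10) requests ten pages (range(11) would fetch eleven)
for page in range(10):
    # Pause 1 or 2 seconds between requests to go easy on the server
    time.sleep(random.randint(1, 2))
    # Millisecond timestamp, sent as the t parameter
    time_stamp = round(time.time() * 1000)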
    url = f'https://api.m.jd.com/?appid=item-v3&functionId=pc_club_productPageComments&client=pc&clientVersion=1.0.0&t={time_stamp}&loginType=3&uuid=181111935.1930210335.1708610333.1709306254.1711849783.4&productId=4196453&score=0&sortType=5&page={page}&pageSize=10&isShadowSku=0&rid=0&fold=1&bbtf=&shield='

    headers = {
        "Cookie":"__jdu=1930210335; shshshfpa=50f375eb-d40e-875b-b12e-60f25c558f68-1708833205; shshshfpx=50f375eb-d40e-875b-b12e-60f25c558f68-1708833205; pinId=TE2ybcOoPKG8rrYc3tjSZQ; pin=jd_EqEAJlJpFMdo; unick=jd_EqEAJlJpFMdo; _tp=RWXTruHLzClu1Jr8dalxnA%3D%3D; _pst=jd_EqEAJlJpFMdo; unpl=JF8EAJlnNSttXBlQAxkAS0ZHQ19QWwgOSB4DPzAMA1gKTlYCElVIExN7XlVdWBRKFR9tZxRVVVNOXA4eBysSEXteU11bD00VB2xXXAQDGhUQR09SWEBJJVlQXl4ITxcFZ2A1ZF5Ye1QEKwITEBFIXVVcXwx7FjNoVzVkW15LXAAfMhoiEXsfAAJaCkkWBWsqBVxfWUhUBBkAHyIRe14; __jdv=76161171|haosou-search|t_262767352_haosousearch|cpc|5512151796_0_5c5733aef9354d7281af8f4c4368fb02|1711849782659; 3AB9D23F7A4B3CSS=jdd03ACXEBVQFK5CENBAAIJBFXEVTJJLASPPEJFZ3OCGW4XMUIJOJEFOYFJW65TOP4KLW5NUCXDZJI6EZMVMZZBGCGSTDK4AAAAMOSIZMZYQAAAAACNODVDBW2XGC7AX; areaId=16; PCSYCityID=CN_350000_350200_0; jsavif=1; mba_muid=1930210335; mba_sid=17118498046253294520777664478.1; wlfstk_smdl=ruq378izoxctsf68q0moa62n09avs0vk; 3AB9D23F7A4B3C9B=ACXEBVQFK5CENBAAIJBFXEVTJJLASPPEJFZ3OCGW4XMUIJOJEFOYFJW65TOP4KLW5NUCXDZJI6EZMVMZZBGCGSTDK4; TrackID=1ImlE8evpOBJ-A7TfoaTf1rj17ecUqX_rFVmKW5koLDJ-z-hPF891kXrl_pPR-Vl_OreLzXiFrIAlSNa8u7EJ7VljgDRFlmcDgDwXDVtoQzc; thor=14A2BF46C9D373164FD5F0F0853ABB0A32213C0D7A122DF30D8F8EE5ABA85119CD6D2E7F58BC88478FAC27561ACEA557C5AF3D3C9CAB12D8AFC18C78B4F2604EEB88104BDCC0E46B424C96FF4A7BF4919F39CE2A6B9F05A3AE1F3EC169975EA6EDD572EACAF20DBB2F06C7747B18EEA871EB8163C380EE4669DD2B3C63E22684C9C584434F1270A149839B8B75B5037818D994C78A8099CD96DCEDFE2B6C1282; flash=2_MuVfxCqVzUYO62kneu2pLOLP0HXmZ_vy1NhBv2n_l3cJIO_SbKDwIGmxmM6TrJuMScbxB0IM961Talmbo9UN9HMSu9bQ-0b_g6iIVbVOGXP*; ceshi3.com=000; token=1b7217805648cce5b01ea895ad9a9580,3,951027; __tk=047903b7493ced1bbf22203c719a07f2,3,951027; __jda=181111935.1930210335.1708610333.1709306254.1711849783.4; __jdc=181111935; ipLoc-djd=16-1315-3486-59641; __jdb=181111935.7.1930210335|4.1711849783; shshshfpb=BApXeo-o8ketABdv7Jklp7wPnCYXnkz6yBkrCNSpg9xJ1MguCx4O2",
        "Referer":"https://item.jd.com/",
        "User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    }

    response = requests.get(url=url, headers=headers, timeout=10)
    response.encoding = response.apparent_encoding
    for index in response.json()['comments']:
        dit = {
            '用户名': index['nickname'],
            'id': index['id'],
            # Some reviews carry no location; fall back to '未知' (unknown)
            '地区': index.get('location', '未知'),
            '评论内容': index['content'],
            '评分': index['score'],
            '发布时间': index['creationTime'],
            '产品名称': index['referenceName'],
        }
        print(dit)
        csv_writer.writerow(dit)

# Close the file so that all rows are flushed to disk
f.close()
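The row-building step can also be pulled out into a small function, which makes it testable without a network call. This is a sketch under assumptions: the function name `extract_rows` and the sample payload are made up, and treating a response without a `comments` key as "no rows" is a guess at how an expired cookie or a blocked request might look, not documented behavior of the API.

```python
def extract_rows(payload: dict) -> list[dict]:
    """Turn one decoded JSON response into a list of CSV rows."""
    rows = []
    # A response without 'comments' (or with null) yields no rows instead of a KeyError
    for comment in payload.get('comments') or []:
        rows.append({
            '用户名': comment.get('nickname', ''),
            'id': comment.get('id', ''),
            # Fall back to '未知' (unknown) when the location is missing or empty
            '地区': comment.get('location') or '未知',
            '评论内容': comment.get('content', ''),
            '评分': comment.get('score', ''),
            '发布时间': comment.get('creationTime', ''),
            '产品名称': comment.get('referenceName', ''),
        })
    return rows


# Made-up payload in the shape the code above relies on
sample_payload = {
    'comments': [
        {
            'nickname': 'j***o',
            'id': 1,
            'content': '很好用',
            'score': 5,
            'creationTime': '2024-03-30 10:00:00',
            'referenceName': '钻头',
        }
    ]
}

rows = extract_rows(sample_payload)
print(rows[0]['地区'])
```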

Results:

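Note: with plain utf-8, the saved CSV shows garbled Chinese text when opened, presumably in Excel, which does not detect UTF-8 without a byte-order mark. Writing the file with encoding='utf-8-sig' is the usual fix.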
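A common fix for garbled CSV text in Excel is Python's `utf-8-sig` encoding, which prepends a UTF-8 byte-order mark to the file. The sketch below writes a small CSV to a temporary directory (the file name and the sample row are made up) and reads back the first three bytes to confirm that the mark is there.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.csv")

# utf-8-sig writes the bytes EF BB BF before the first character
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["用户名", "评论内容"])
    writer.writeheader()
    writer.writerow({"用户名": "j***o", "评论内容": "很好用"})

with open(path, "rb") as f:
    first_bytes = f.read(3)

print(first_bytes)  # b'\xef\xbb\xbf'
```

When reading such a file back in Python, opening it with `encoding="utf-8-sig"` strips the mark again, so the first column name is not prefixed with an invisible character.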
