Python Lab Project 9: Web Crawlers and Automation

2023-12-17 18:58:01

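Experiment 1: Fetch data from a web page.

Requirement: use the urllib library and the requests library, one after the other, to fetch the first 360 bytes of the http://www.sohu.com home page.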

# Requirement: use urllib and requests to fetch the first 360 bytes of the http://www.sohu.com home page.
import urllib.request
import requests

# Fetch the first 360 bytes with urllib.
url = 'http://www.sohu.com'
req = urllib.request.Request(url)
res = urllib.request.urlopen(req)
data = res.read(360)  # read at most 360 bytes of the response body
print(data)


# Fetch the first 360 bytes with requests.
url = 'http://www.sohu.com'
res = requests.get(url)
data = res.content[:360]  # the whole body is downloaded, then sliced to 360 bytes
print(data)
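
The urllib half of this experiment could also be wrapped in a small helper that sets a timeout and closes the connection when it is done. This is only a sketch: the name `fetch_prefix` and the 10-second default timeout are assumptions made for illustration and are not part of the lab handout.

```python
import urllib.request


def fetch_prefix(url, n=360, timeout=10):
    """Return at most the first n bytes of the response body at url.

    fetch_prefix, n and timeout are hypothetical names chosen for this sketch.
    """
    req = urllib.request.Request(url)
    # The with-block closes the connection even if read() raises.
    with urllib.request.urlopen(req, timeout=timeout) as res:
        return res.read(n)
```

Calling `print(fetch_prefix('http://www.sohu.com'))` should print a bytes object of at most 360 bytes. The helper returns raw bytes on purpose, because cutting at a fixed byte count can split a multi-byte character; a caller who wants text can decode with `errors='replace'`.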

Experiment 2: Test the methods of a BeautifulSoup object.

Requirements:

1) Create a BeautifulSoup object.

2) Test the find_all() and find() methods for searching the document tree.

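# Experiment 2: test the methods of a BeautifulSoup object.
# Requirements:
# 1) Create a BeautifulSoup object.
# 2) Test the find_all() and find() methods for searching the document tree.
from bs4 import BeautifulSoup
import requests
# Load the web page with an HTTP request
response = requests.get("http://www.sohu.com")
# Create the BeautifulSoup object
soup = BeautifulSoup(response.text, "html.parser")
# find_all() returns every matching tag in the document tree
print(soup.find_all("a"))
# find() returns only the first matching tag
print(soup.find("a"))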
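
The difference between find_all() and find() is easier to see on a tiny document whose contents are known in advance. The sketch below uses a hand-written HTML string, which is an assumption for illustration and not taken from any real site, so the output is the same on every run.

```python
from bs4 import BeautifulSoup

# A small made-up document, so the result does not depend on a live site.
html = """
<html><body>
  <a href="/news">News</a>
  <a href="/sports">Sports</a>
  <p>No links here</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

links = soup.find_all("a")      # list-like ResultSet holding every <a> tag
first = soup.find("a")          # only the first <a> tag
missing = soup.find("table")    # the document has no <table>, so this is None

print(len(links))                        # 2
print(first["href"])                     # /news
print([a.get_text() for a in links])     # ['News', 'Sports']
print(missing)                           # None
```

Because find() returns None when nothing matches, code that goes on to index the result (for example `soup.find("table")["id"]`) should check for None first.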

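Experiment 3: Crawl and analyze web page data.

(1) Use the requests library to fetch the content of the home page https://www.hnnu.edu.cn/main.htm.

(2) Write a program that retrieves the notices and announcements listed at https://www.hnnu.edu.cn/119/list.htm.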

# Experiment 3: crawl and analyze web page data.
# (1) Use the requests library to fetch the home page https://www.hnnu.edu.cn/main.htm.
# (2) Write a program that retrieves the notices listed at https://www.hnnu.edu.cn/119/list.htm.
import requests
from bs4 import BeautifulSoup

# (1) Fetch and parse the home page.
url = 'https://www.hnnu.edu.cn/main.htm'
res = requests.get(url)
soup = BeautifulSoup(res.text, 'html.parser')
print(soup.find_all('a'))
print(soup.find('a'))

# (2) Walk through notice list pages 1 to 22.
# Assumption: list pages follow the naming pattern list1.htm, list2.htm, ...
for i in range(1, 23):
    url = 'https://www.hnnu.edu.cn/119/list{}.htm'.format(i)
    res = requests.get(url)
    soup = BeautifulSoup(res.text, 'html.parser')
    print("-------------------------------------------------------")
    print(soup)
    #print(soup.find('a'))
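
Printing the whole parsed page leaves the notice titles buried in markup. One possible next step is to pull out each link's text and absolute URL. The sketch below is illustrative only: `extract_links` is a hypothetical helper, the sample markup is made up, and the real list page may need a narrower selector to separate notices from navigation links.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return (text, absolute_url) pairs for every <a> tag that has both
    visible text and an href attribute.

    extract_links is a hypothetical helper; it cannot tell notices from
    other links, so the caller still has to filter the result.
    """
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for a in soup.find_all("a", href=True):
        text = a.get_text(strip=True)
        if text:
            # Resolve relative hrefs against the page they came from.
            results.append((text, urljoin(base_url, a["href"])))
    return results


# Made-up markup standing in for one list page; the real page may differ.
sample = (
    '<ul>'
    '<li><a href="/info/notice1.htm">Sample notice</a></li>'
    '<li><a href="#"></a></li>'
    '</ul>'
)
for title, link in extract_links(sample, "https://www.hnnu.edu.cn/119/list.htm"):
    print(title, link)
```

In the loop above, `extract_links(res.text, url)` could take the place of `print(soup)` once the right filter for notice entries has been worked out from the page source.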

Original article: https://blog.csdn.net/m0_63949203/article/details/135044464