网页如何集成各社区征文活动

2024-05-14 06:54:15
开发
30

Helllo , 我是小恒
由于我需要腾讯云社区，稀土掘金以及CSDN的征文活动RSS，找了一下没发现，所以使用GET
请求接口对网页定时进行拉取清洗，甚至无意间做了一个简单的json格式API

最终网址:hub.liheng.work
API:http://hub.liheng.work/activities.json
GitHub:https://github.com/lmliheng/hub
在这里插入图片描述

原理

由于浏览器的同源策略产生的跨域问题，使得CSDN官方URL无法被请求获取展示到前端
使用后端代码GET网页代码，对其进行数据清洗，并导入json文件
注意后端程序的定时任务以及日志打印
前端代码调用本地json，也不存在跨域，从而实现需求

代码结构

├───pyproject/
│   ├───activities.json
│   ├───htmlone.py
│   ├───index.html
│   ├───script.log

后端

实现HTML转json的数据清洗，以及打印日志到scripts.log文件

#作者：小恒不会java
#时间：2024年5月13日
#微信：a13551458597
# -*- coding: utf-8 -*-
import requests
from bs4 import BeautifulSoup
import json
import logging
from datetime import datetime

logging.basicConfig(filename='script.log', level=logging.INFO)
logging.info('Script started at {}'.format(datetime.now()))

# 获取HTML内容,这种形式是避免get请求的跨域问题
url = 'https://bbs.csdn.net/forums/activity?spm=1035.2022.3001.8781&typeId=745490'
response = requests.get(url)
html_content = response.text

# 使用BeautifulSoup解析HTML
soup = BeautifulSoup(html_content, 'html.parser')

activities = []

# 检查做到避免重复活动
posts = soup.find_all('div', {'class': 'content'})
for post in posts:
    activity = {}
    
    # 获取活动名称
    title_element = post.find('div', {'class': 'long-text-title'})
    if title_element:
        activity['name'] = title_element.text.strip()
    
    # 获取活动简介
    desc_element = post.find('div', {'class': 'item-desc'})
    if desc_element:
        activity['description'] = desc_element.text.strip()
    
    # 获取活动链接
    link_element = post.find('a', href=True)
    if link_element:
        activity['link'] = link_element['href']
    
    # 检查活动是否已存在
    if 'link' in activity and not any(existing_activity['link'] == activity['link'] for existing_activity in activities):
        activities.append(activity)

print(activities)

with open('activities.json', 'w', encoding='utf-8') as f:
    json.dump(activities, f, ensure_ascii=False, indent=4)


logging.info('Script finished at {}'.format(datetime.now()))

定时任务

我服务器系统是linux centos7
使用cron完成定时运行，并通过python代码日志打印检验运行情况

检查cron服务是否正在运行：
```shell
sudo systemctl status cron或者ceond

如果cron服务未运行，请使用以下命令启动它：

sudo systemctl start cron

编辑crontab文件

crontab -e

在打开的编辑器中，添加一行以设置定时任务。例如，要每天凌晨1点运行Python脚本，请添加以下行

0 1 * * * /usr/bin/python /path/to/your/script.py

列出当前用户的crontab条目：

crontab -l

日志打印检查

scripts.log

[root@iZ7xvavc793m36sybr4bw4Z hub.liheng.work]# cat scripts.log
INFO:root:Script started at 2024-05-13 21:11:36.571745
INFO:root:Script finished at 2024-05-13 21:11:37.311995
[root@iZ7xvavc793m36sybr4bw4Z hub.liheng.work]#

原文地址:https://blog.csdn.net/2302_81578551/article/details/138820849 本文来自互联网用户投稿，该文观点仅代表作者本人，不代表本站立场。本站仅提供信息存储空间服务，不拥有所有权，不承担相关法律责任。如若转载，请注明出处：https://www.suanlizi.com/kf/1790153611699949568.html 如若内容造成侵权/违法违规/事实不符，请联系《酸梨子》网邮箱：1419361763@qq.com进行投诉反馈，一经查实，立即删除！

阅读全部

网页如何集成各社区征文活动

原理

代码结构

后端

定时任务

日志打印检查

相关推荐

最近更新

热门阅读