Scraping DrugBank with Python

Crawler code:

# coding: utf-8
import json
import time

import requests
from bs4 import BeautifulSoup

def dig(drugbank_accession_number="DB00460"):
    url = "https://go.drugbank.com/drugs/" + drugbank_accession_number
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
    }

    # Proxy settings (local Clash proxy)
    proxies = {
        "http": "http://127.0.0.1:7890",
        "https": "http://127.0.0.1:7890",
    }

    # Send the request; fail loudly on HTTP errors instead of parsing an error page
    response = requests.get(url, headers=headers, proxies=proxies, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, 'html.parser')

    def field(field_id):
        # Text of the <dd> that follows <dt id="field_id">, or '' if the page lacks that field
        dt = soup.find('dt', {'id': field_id})
        dd = dt.find_next_sibling('dd') if dt else None
        return dd.text.strip() if dd else ''

    # Extract Drug Name, Background, Type and Chemical Formula
    # (the accession number is not re-extracted: it is already the function argument)
    drug_name = field('generic-name')
    background = field('background')
    type_value = field('type')
    chemical_formula = field('chemical-formula')

    # Build one descriptive sentence per drug
    drug_text = ''
    if background != '':
        drug_text += background + ' '
    if drug_name != '':
        drug_text += drug_name
    if type_value != '':
        drug_text += ' is of the type {}'.format(type_value)
    drug_text += ', number {}'.format(drugbank_accession_number)
    if chemical_formula != '':
        drug_text += ' and has the molecular formula {}'.format(chemical_formula)
    drug_text += '.'

    # Append one JSON object per line, so results already written survive an interrupted run
    with open('drug_text.json', 'a', encoding='utf-8') as f:
        f.write(json.dumps({drug_name: drug_text}, ensure_ascii=False) + '\n')
    with open('drug_order_name.json', 'a', encoding='utf-8') as f:
        f.write(json.dumps({drugbank_accession_number: drug_name}, ensure_ascii=False) + '\n')

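def main():
    # id2node.json maps string indices ("0" to "1709") to DrugBank accession numbers;
    # look up each drug's accession number and call dig() to fetch its information
    with open('id2node.json', 'r', encoding='utf-8') as f:
        id2node = json.load(f)

    # Here the loop starts at index 1007 (e.g. to resume an interrupted run); use 0 to crawl everything
    for i in range(1007, len(id2node)):
        drugbank_accession_number = id2node[str(i)]
        print("{},{}".format(i, drugbank_accession_number), end='')
        dig(drugbank_accession_number)
        print(', over.')
        time.sleep(3)  # pause 3 seconds between requests so the server is not hammered


if __name__ == '__main__':
    main()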
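To see what each line of drug_text.json looks like without hitting the site, the sentence-building step of dig() can be pulled out into a pure function. This is a minimal sketch: build_drug_text and its parameter names are hypothetical, not part of the original script, and the field values in the demo are placeholders rather than real DrugBank data. The sketch always closes the sentence with a period.

```python
def build_drug_text(accession, name='', background='', drug_type='', formula=''):
    """Assemble the one-line description that dig() writes to drug_text.json.

    Hypothetical helper mirroring dig(): background first, then
    "<name> is of the type <type>, number <accession> and has the
    molecular formula <formula>." Empty fields are skipped.
    """
    text = ''
    if background:
        text += background + ' '
    if name:
        text += name
    if drug_type:
        text += ' is of the type {}'.format(drug_type)
    text += ', number {}'.format(accession)
    if formula:
        text += ' and has the molecular formula {}'.format(formula)
    return text + '.'


if __name__ == '__main__':
    # Placeholder values for illustration only, not scraped from DrugBank
    print(build_drug_text('DB00460', name='ExampleDrug', background='Background text.',
                          drug_type='Small Molecule', formula='C1H2O3'))
```

Keeping this logic in a separate function makes it testable offline, while dig() stays responsible only for fetching and parsing the page.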

Here,

    # Proxy settings (local Clash proxy)
    proxies = {
        "http": "http://127.0.0.1:7890",
        "https": "http://127.0.0.1:7890",
    }

refers to a local VPN proxy. I use the Clash client, which listens on "http://127.0.0.1:7890" by default.
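Because the proxy address is hard-coded, requests raises a connection error whenever Clash is not running. One way to make the proxy optional is to read the address from an environment variable. A minimal sketch, assuming a hypothetical variable name DRUGBANK_PROXY (it is not defined by Clash or requests); get_proxies is likewise an illustrative helper, not part of the original script:

```python
import os


def get_proxies(env_var='DRUGBANK_PROXY', default=None):
    """Return a proxies dict for requests.get(), or None for no explicit proxy.

    DRUGBANK_PROXY is a hypothetical variable name chosen for this sketch;
    set it to e.g. http://127.0.0.1:7890 when Clash is running.
    """
    address = os.environ.get(env_var, default)
    if not address:
        # None means "no explicit proxy"; requests may still pick up
        # HTTP_PROXY / HTTPS_PROXY from the environment on its own
        return None
    # Route both plain-HTTP and HTTPS traffic through the same local proxy
    return {'http': address, 'https': address}
```

In dig(), the request would then become requests.get(url, headers=headers, proxies=get_proxies()), so the same script runs with or without Clash.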
