Monitoring GPUs with Prometheus in Docker

The server at 10.20.13.8 monitors the servers at 10.20.13.110 and 10.20.13.111.

The monitored servers (110 and 111) only need node-exporter and nvidia-container-toolkit installed.

Pull the images

docker pull prom/node-exporter
docker pull prom/prometheus
docker pull grafana/grafana

Create the directory and write the Prometheus configuration file

mkdir /opt/prometheus
cd /opt/prometheus/
vim prometheus.yml
global:
  scrape_interval:     60s
  evaluation_interval: 60s
 
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
        labels:
          instance: prometheus
 
  - job_name: linux
    static_configs:
      - targets: ['10.20.13.8:9100']
        labels:
          instance: master

  - job_name: node
    static_configs:
      - targets: ['10.20.13.111:9100','10.20.13.110:9100']
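
Before starting the server, it can help to validate the file. The sketch below assumes the prom/prometheus image ships promtool at /bin/promtool; the function name check_prometheus_config is only illustrative and not part of the original setup.

```shell
# Validate a Prometheus config file with promtool from the prom/prometheus image.
# Assumption: promtool is available at /bin/promtool inside the image.
check_prometheus_config() {
  config_path="${1:-/opt/prometheus/prometheus.yml}"
  docker run --rm \
    -v "${config_path}:/etc/prometheus/prometheus.yml:ro" \
    --entrypoint /bin/promtool \
    prom/prometheus check config /etc/prometheus/prometheus.yml
}
```

Calling check_prometheus_config with no argument checks /opt/prometheus/prometheus.yml; a non-zero exit status indicates a problem with the file.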

Start Prometheus

docker run  -d \
  -p 9090:9090 \
  -v /opt/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml  \
  prom/prometheus
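
To confirm the container came up, one option is to poll Prometheus's /-/ready endpoint. This is a minimal sketch; the function name wait_for_prometheus and the default of 10 retries are assumptions chosen for illustration.

```shell
# Poll the Prometheus readiness endpoint until it answers or the retries run out.
# Assumption: Prometheus is reachable at the given base URL (default localhost:9090).
wait_for_prometheus() {
  base_url="${1:-http://localhost:9090}"
  retries="${2:-10}"
  i=0
  while [ "$i" -lt "$retries" ]; do
    if curl -fsS "${base_url}/-/ready" >/dev/null 2>&1; then
      echo "ready"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "not ready"
  return 1
}
```

For example, wait_for_prometheus http://10.20.13.8:9090 30 would wait up to roughly 30 seconds.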

Start node-exporter (on each server that Prometheus scrapes on port 9100)

docker run -d -p 9100:9100 \
  -v "/proc:/host/proc:ro" \
  -v "/sys:/host/sys:ro" \
  -v "/:/rootfs:ro" \
  prom/node-exporter \
  --path.procfs=/host/proc \
  --path.sysfs=/host/sys \
  --path.rootfs=/rootfs
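
A quick way to check that node-exporter is serving metrics is to read one value from its /metrics endpoint. The sketch below picks node_load1 (the 1-minute load average); the helper name node_load1 and the default target are illustrative assumptions.

```shell
# Print the 1-minute load average reported by a node-exporter instance.
# Assumption: the exporter listens on host:9100 as configured above.
node_load1() {
  target="${1:-localhost:9100}"
  curl -fsS "http://${target}/metrics" | awk '$1 == "node_load1" { print $2 }'
}
```

For example, node_load1 10.20.13.110:9100 queries the exporter on the first monitored server.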

Create the Grafana storage directory

mkdir /opt/grafana-storage
chmod 777 -R /opt/grafana-storage

Start Grafana

docker run -d \
  -p 3000:3000 \
  --name=grafana \
  -v /opt/grafana-storage:/var/lib/grafana \
  grafana/grafana

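Open Grafana in a browser

http://10.20.13.8:3000
You are redirected to the login page first; the default username and password are both admin.

When adding the data source, enter the host's IP address rather than localhost, because inside the Grafana container localhost refers to the container itself:     http://ip:9090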
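
The data source can also be added without the web UI through Grafana's HTTP API (POST /api/datasources). The following is only a sketch under assumptions: it uses the default admin/admin credentials, the data source name "Prometheus" is arbitrary, and the function names are hypothetical.

```shell
# Build the JSON body for Grafana's POST /api/datasources call.
datasource_payload() {
  prom_url="$1"
  printf '{"name":"Prometheus","type":"prometheus","access":"proxy","url":"%s","isDefault":true}\n' "$prom_url"
}

# Register Prometheus as a data source in Grafana.
# Assumptions: default admin/admin login, Grafana on :3000, Prometheus on :9090.
add_prometheus_datasource() {
  grafana_url="${1:-http://10.20.13.8:3000}"
  prom_url="${2:-http://10.20.13.8:9090}"
  curl -fsS -u admin:admin \
    -H 'Content-Type: application/json' \
    -X POST "${grafana_url}/api/datasources" \
    -d "$(datasource_payload "$prom_url")"
}
```

If the admin password was changed at first login, the -u value needs to be adjusted accordingly.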

Set up GPU monitoring

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
apt update
apt upgrade 
apt-get install -y nvidia-container-toolkit
nvidia-ctk runtime configure --runtime=docker
systemctl restart docker
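
After Docker restarts, a simple check is to run nvidia-smi inside a throwaway container. This sketch assumes the NVIDIA driver is already installed on the host and uses the plain ubuntu image as a sample workload; the function name is illustrative.

```shell
# Run nvidia-smi in a temporary container to confirm GPUs are visible to Docker.
# Assumption: the host already has a working NVIDIA driver.
check_gpu_in_docker() {
  docker run --rm --gpus all ubuntu nvidia-smi
}
```

If the toolkit is configured correctly, the command prints the usual nvidia-smi table listing the host's GPUs.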

Run the dcgm-exporter container

docker run -d \
  --restart always \
  --gpus all \
  -p 9400:9400 \
  --name gpu-exporter \
  nvcr.io/nvidia/k8s/dcgm-exporter:3.2.5-3.1.8-ubuntu22.04
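
To see whether the exporter is producing GPU metrics, its /metrics endpoint can be filtered for a single series. The sketch below assumes DCGM_FI_DEV_GPU_UTIL is among the exported metrics (it is part of the exporter's default set, as far as I know); the helper name gpu_utilization is hypothetical.

```shell
# Print the GPU utilization samples exposed by dcgm-exporter.
# Assumption: the exporter listens on host:9400 and exports DCGM_FI_DEV_GPU_UTIL.
gpu_utilization() {
  target="${1:-localhost:9400}"
  curl -fsS "http://${target}/metrics" | grep '^DCGM_FI_DEV_GPU_UTIL{'
}
```

For example, gpu_utilization 10.20.13.111:9400 prints one line per GPU on that server.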

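Add the exporter port to the Prometheus configuration file

vim /opt/prometheus/prometheus.yml

Append this job under scrape_configs, then restart the Prometheus container so the change takes effect

  - job_name: gpu_metrics
    static_configs:
      - targets: ['10.20.13.111:9400','10.20.13.110:9400']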
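
Prometheus reads prometheus.yml at startup, so the running container has to be restarted to pick up the new job. Because the container was started without --name, this sketch looks it up by image with docker ps --filter ancestor=...; the function name restart_prometheus is an assumption for illustration.

```shell
# Restart the running Prometheus container so it re-reads prometheus.yml.
# Assumption: exactly one container created from prom/prometheus is running.
restart_prometheus() {
  container_id="$(docker ps -q --filter ancestor=prom/prometheus | head -n 1)"
  if [ -z "$container_id" ]; then
    echo "no running prom/prometheus container found" >&2
    return 1
  fi
  docker restart "$container_id"
}
```

After the restart, the Targets page at http://10.20.13.8:9090/targets should list the gpu_metrics job.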

In Grafana, import the GPU monitoring dashboard template with ID 12239
