Python 获取 SQL 指纹和 HASH 值

前言

本文介绍一个提取 SQL 指纹的方法,就是将 SQL 语句的条件转换为 ?可用于脱敏和 SQL 聚类分析的场景。

1. 工具安装

这里用到的工具,就是 pt 工具集中的 pt-fingerprint 含在 Percona Toolkit 中,安装方法可参考 Percona Toolkit Install 这篇文章的 1.1 小节。

2. 实验案例

测试一个简单的 SQL:

pt-fingerprint --query "select a, b, c from users where id = 500"

输出:
select a, b, c from users where id = ?

复杂的长 SQL 建议使用文件的方式,否则会出现一些预期之外的问题,另外 SQL 中不能包含 # – 这些注释符号。

pt-fingerprint select.sql

3. Python 组合分析

代码中的 command_bin_path 就是 pt-fingerprint 的路径,可使用 which 命令查看,

# -*- encoding: utf-8 -*-
import os
import uuid
import time
import hashlib
import subprocess

# which pt-fingerprint
command_bin_path = "/usr/local/bin/pt-fingerprint"


def get_sql_hash(sql_query: str):
    sql_info = ' '.join(sql_query.split())
    sql_hash = hashlib.md5(sql_info.encode()).hexdigest()
    return sql_hash


def exec_dos_command(command):
    """
    Execute system commands.
    """
    process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE,
                               stderr=subprocess.STDOUT)
    content = process.stdout.read()
    process.communicate()

    if process.returncode != 0:
        print('Program Error: {0}'.format(command))
        print(content)
        sys.exit(0)
    else:
        return content.decode()


def get_sql_fingerprint(sql_text):
    """
    输出 SQL 语句,返回指纹化的 SQL 语句和 SQL HASH
    """
    timestamp_ns = time.time_ns()

    uuid_str = str(uuid.uuid4())[:16]

    file_name = f"{timestamp_ns}_{uuid_str}" + '.sql'

    with open(file_name, 'w') as w1:
        w1.write(sql_text)

    dos_command = command_bin_path + ' ' + file_name

    content = exec_dos_command(dos_command)

    sql_hash = get_sql_hash(content)

    os.remove(file_name)

    return content, sql_hash


sql1 = "select * from tb_user where id = 10;"
sql2 = "select * from tb_user where id = 11;"
sql3 = "select * from tb_user where id = 13;"
sql4 = "select * from tb_user where id = 14;"
sql5 = "select * from tb_user where id = 1576;"
sql6 = "select * from tb_user where id = 19;"

print(get_sql_fingerprint(sql1))
print(get_sql_fingerprint(sql2))
print(get_sql_fingerprint(sql3))
print(get_sql_fingerprint(sql4))
print(get_sql_fingerprint(sql5))
print(get_sql_fingerprint(sql6))

输出:

('select * from tb_user where id = ?;\n', 'ea72157cdf3e46c55792f49d01d1ce19')
('select * from tb_user where id = ?;\n', 'ea72157cdf3e46c55792f49d01d1ce19')
('select * from tb_user where id = ?;\n', 'ea72157cdf3e46c55792f49d01d1ce19')
('select * from tb_user where id = ?;\n', 'ea72157cdf3e46c55792f49d01d1ce19')
('select * from tb_user where id = ?;\n', 'ea72157cdf3e46c55792f49d01d1ce19')
('select * from tb_user where id = ?;\n', 'ea72157cdf3e46c55792f49d01d1ce19')

提供了将 sql 转换为指纹和 sql hash 的函数,大家可以将此定制到自己的代码和功能中。

相关推荐

  1. Python 获取 SQL 指纹 HASH

    2024-07-11 14:02:03       26 阅读

最近更新

  1. docker php8.1+nginx base 镜像 dockerfile 配置

    2024-07-11 14:02:03       67 阅读
  2. Could not load dynamic library ‘cudart64_100.dll‘

    2024-07-11 14:02:03       71 阅读
  3. 在Django里面运行非项目文件

    2024-07-11 14:02:03       58 阅读
  4. Python语言-面向对象

    2024-07-11 14:02:03       69 阅读

热门阅读

  1. 井字棋 AI-Python

    2024-07-11 14:02:03       25 阅读
  2. android解锁remount

    2024-07-11 14:02:03       27 阅读
  3. 洛谷 P3008 [USACO11JAN] Roads and Planes G

    2024-07-11 14:02:03       22 阅读
  4. 2.Spring的IOC容器里面加入对象的常见方式

    2024-07-11 14:02:03       24 阅读
  5. React基础学习-Day02

    2024-07-11 14:02:03       20 阅读
  6. MyClass.static_method() 加不加括号有什么区别

    2024-07-11 14:02:03       23 阅读
  7. AcWing 1633:外观数列

    2024-07-11 14:02:03       26 阅读
  8. nginx的重定向

    2024-07-11 14:02:03       24 阅读
  9. SpringBoot整合Easy-Es最佳实践

    2024-07-11 14:02:03       21 阅读