[ElasticSearch] Vector plugin support for ES 5.6.15

Reference:
https://github.com/lior-k/fast-elasticsearch-vector-scoring

  1. Download the plugin

  2. Install the plugin
    Plugin directory:
    elasticsearch/plugins
    After installation the directory looks like this:

     plugins
     └── vector
         ├── elasticsearch-binary-vector-scoring-5.6.9.jar
         └── plugin-descriptor.properties
    

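    Set elasticsearch.version in plugin-descriptor.properties to 5.6.15. The jar is built for 5.6.9, but this setup runs ES 5.6.15, and ES refuses to load a plugin whose declared version does not match its own. Restart ES once the plugin is in place.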
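The descriptor edit can also be scripted, which helps when the same change has to be made on several nodes. The following is a minimal sketch, not part of the plugin: the function name `set_es_version` and the sample descriptor text are made up for illustration, and a real descriptor contains more keys than shown here.

```python
def set_es_version(descriptor_text, es_version):
    """Return the descriptor text with elasticsearch.version set to es_version."""
    out = []
    replaced = False
    for line in descriptor_text.splitlines():
        # Only the elasticsearch.version key is touched; the plugin's own
        # "version" key starts differently and is left alone.
        if line.strip().startswith("elasticsearch.version="):
            out.append("elasticsearch.version=" + es_version)
            replaced = True
        else:
            out.append(line)
    if not replaced:
        raise ValueError("elasticsearch.version not found in descriptor")
    return "\n".join(out) + "\n"

# Hypothetical descriptor content, for illustration only.
sample = "name=vector\nversion=5.6.9\nelasticsearch.version=5.6.9\n"
patched = set_es_version(sample, "5.6.15")
print(patched)
```

To apply it to a real installation, read plugins/vector/plugin-descriptor.properties into a string, pass it through the function, and write the result back before restarting ES.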

  3. Create a test index

    PUT /vector_test
    {
      "settings": {
        "index": {
          "number_of_shards": 3,
          "number_of_replicas": 0
        }
      },
      "mappings": {
        "resume": {
          "dynamic": "strict",
          "properties": {
            "file_hash": {
              "type": "keyword"
            },
            "embedding_vector": {
              "type": "binary",
              "doc_values": true
            },
            "doc": {
              "type": "text"
            }
          }
        }
      }
    }
    
  4. Build test data

Generate the base64 string for a vector as follows:

import base64
import numpy as np

# Vectors are stored as big-endian 32-bit floats.
dfloat32 = np.dtype('>f4')

def decode_float_list(base64_string):
    """Decode a base64 string back into a list of floats."""
    raw = base64.b64decode(base64_string)
    return np.frombuffer(raw, dtype=dfloat32).tolist()

def encode_array(arr):
    """Encode a list of floats as a base64 string of big-endian float32 values."""
    raw = np.array(arr).astype(dfloat32).tobytes()
    return base64.b64encode(raw).decode("utf-8")

print(encode_array([0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]))
print(encode_array([0.001,0.002,0.003,0.004,0.005,0.006,0.007,0.008,0.009,0.010]))

Put the strings printed above into the embedding_vector field of the documents below. The field only accepts a base64 string produced this way.

PUT /vector_test/resume/1
{
  "file_hash": "hash1",
  "embedding_vector": "PczMzT5MzM0+mZmaPszMzT8AAAA/GZmaPzMzMz9MzM0/ZmZmP4AAAA==",
  "doc": "This is the content of the first document."
}

PUT /vector_test/resume/2
{
  "file_hash": "hash2",
  "embedding_vector": "OoMSbzsDEm87RJumO4MSbzuj1wo7xJumO+VgQjwDEm88E3S8PCPXCg==",
  "doc": "This is the content of the second document."
}
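
For more than a handful of documents, the two PUT requests above can be replaced by a single request to the standard `_bulk` endpoint. The sketch below only builds the newline-delimited request body and does not send it. The helper names `encode_vector` and `build_bulk_body` are illustrative, and `struct` is used instead of numpy so the snippet has no dependencies; it is meant to produce the same big-endian float32 encoding as `encode_array`.

```python
import base64
import json
import struct

def encode_vector(values):
    """Pack floats as big-endian float32 and return them as a base64 string."""
    raw = struct.pack(">%df" % len(values), *values)
    return base64.b64encode(raw).decode("utf-8")

def build_bulk_body(docs, index="vector_test", doc_type="resume"):
    """Build an NDJSON _bulk body from (id, file_hash, vector, text) tuples."""
    lines = []
    for doc_id, file_hash, vector, text in docs:
        # Action line, followed by the document source line.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps({
            "file_hash": file_hash,
            "embedding_vector": encode_vector(vector),
            "doc": text,
        }))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

docs = [
    ("1", "hash1", [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
     "This is the content of the first document."),
    ("2", "hash2", [0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009, 0.010],
     "This is the content of the second document."),
]
body = build_bulk_body(docs)
print(body)
```

The printed body can be sent with POST /_bulk and the header Content-Type: application/x-ndjson.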
  5. Query test

    POST /vector_test/resume/_search
    {
      "query": {
        "function_score": {
          "boost_mode": "replace",
          "script_score": {
            "script": {
              "source": "binary_vector_score",
              "lang": "knn",
              "params": {
                "cosine": true,
                "field": "embedding_vector",
                "vector": [
                  1.0,
                  0.8,
                  0.2223,
                  0.7,
                  0.6,
                  0.5,
                  0.4,
                  0.3,
                  0.2,
                  0.1
                ]
              }
            }
          }
        }
      },
      "size": 2,
      "_source": [
        "file_hash"
      ]
    }
    

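    Query result (this sample appears to have been captured from a different index state: it reports 5 shards and 4 hits, whereas the index created above has 3 shards and 2 documents)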

    {
      "took": 1,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": 4,
        "max_score": 0.998783,
        "hits": [
          {
            "_index": "vector_test",
            "_type": "resume",
            "_id": "4",
            "_score": 0.998783,
            "_source": {
              "file_hash": "hash4"
            }
          },
          {
            "_index": "vector_test",
            "_type": "resume",
            "_id": "1",
            "_score": 0.5818508,
            "_source": {
              "file_hash": "hash1"
            }
          }
        ]
      }
    }
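
As a sanity check, the score for document 1 can be reproduced outside ES. The sketch below decodes the stored base64 string and computes the plain cosine similarity against the query vector. It assumes that with "cosine": true and boost_mode "replace" the returned _score is the raw cosine value; the helper names are illustrative.

```python
import base64
import math
import struct

def decode_vector(b64):
    """Decode a base64 string of big-endian float32 values into a list of floats."""
    raw = base64.b64decode(b64)
    return list(struct.unpack(">%df" % (len(raw) // 4), raw))

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# embedding_vector of document 1 and the query vector from the search request.
stored = decode_vector("PczMzT5MzM0+mZmaPszMzT8AAAA/GZmaPzMzMz9MzM0/ZmZmP4AAAA==")
query = [1.0, 0.8, 0.2223, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]

score = cosine(stored, query)
print(score)
```

The printed value should land very close to the 0.5818508 reported for hash1. Document 2 is document 1 scaled by 0.01, so its cosine score against the same query is the same.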
    
