[Elasticsearch 7.0] Document APIs: the mtermvectors API


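The mtermvectors (multi termvectors) API retrieves the term vectors of several documents in a single request. The documents are identified by index and id, but artificial documents can also be supplied in the request body itself. The response contains a docs array holding every term vector that was fetched, and each element has the same structure as a response from the termvectors API. For example: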

curl -XPOST "http://127.0.0.1:9200/_mtermvectors?pretty" -H "Content-Type:application/json" -d'
{
   "docs": [
      {
         "_index": "twitter",
         "_id": "2",
         "term_statistics": true
      },
      {
         "_index": "twitter",
         "_id": "1",
         "fields": [
            "message"
         ]
      }
   ]
}'

The response is:

{
  "docs" : [
    {
      "_index" : "twitter",
      "_type" : "_doc",
      "_id" : "2",
      "_version" : 1,
      "found" : true,
      "took" : 0,
      "term_vectors" : {
        "text" : {
          "field_statistics" : {
            "sum_doc_freq" : 6,
            "doc_count" : 2,
            "sum_ttf" : 8
          },
          "terms" : {
            "..." : {
              "doc_freq" : 1,
              "ttf" : 1,
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 3,
                  "start_offset" : 21,
                  "end_offset" : 24,
                  "payload" : "d29yZA=="
                }
              ]
            },
            "another" : {
              "doc_freq" : 1,
              "ttf" : 1,
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 0,
                  "start_offset" : 0,
                  "end_offset" : 7,
                  "payload" : "d29yZA=="
                }
              ]
            },
            "test" : {
              "doc_freq" : 2,
              "ttf" : 4,
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 2,
                  "start_offset" : 16,
                  "end_offset" : 20,
                  "payload" : "d29yZA=="
                }
              ]
            },
            "twitter" : {
              "doc_freq" : 2,
              "ttf" : 2,
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 1,
                  "start_offset" : 8,
                  "end_offset" : 15,
                  "payload" : "d29yZA=="
                }
              ]
            }
          }
        },
        "fullname" : {
          "field_statistics" : {
            "sum_doc_freq" : 4,
            "doc_count" : 2,
            "sum_ttf" : 4
          },
          "terms" : {
            "doe" : {
              "doc_freq" : 2,
              "ttf" : 2,
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 1,
                  "start_offset" : 5,
                  "end_offset" : 8,
                  "payload" : "d29yZA=="
                }
              ]
            },
            "jane" : {
              "doc_freq" : 1,
              "ttf" : 1,
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 0,
                  "start_offset" : 0,
                  "end_offset" : 4,
                  "payload" : "d29yZA=="
                }
              ]
            }
          }
        }
      }
    },
    {
      "_index" : "twitter",
      "_type" : "_doc",
      "_id" : "1",
      "_version" : 1,
      "found" : true,
      "took" : 0,
      "term_vectors" : { }
    }
  ]
}
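
To make the response structure concrete, here is a minimal Python sketch that walks a docs array shaped like the one above and collects the term_freq of every term, grouped by document id and field. The helper name summarize_term_vectors and the trimmed sample are illustrative assumptions, not part of Elasticsearch or of any client library. The sketch also decodes one payload, since payloads are returned as base64-encoded bytes.

```python
import base64
import json

# Trimmed copy of the response shown above (only the "fullname" field is kept).
SAMPLE_RESPONSE = json.loads("""
{
  "docs": [
    {
      "_index": "twitter", "_id": "2", "found": true,
      "term_vectors": {
        "fullname": {
          "terms": {
            "doe": {
              "term_freq": 1,
              "tokens": [{"position": 1, "start_offset": 5, "end_offset": 8, "payload": "d29yZA=="}]
            },
            "jane": {
              "term_freq": 1,
              "tokens": [{"position": 0, "start_offset": 0, "end_offset": 4, "payload": "d29yZA=="}]
            }
          }
        }
      }
    },
    {"_index": "twitter", "_id": "1", "found": true, "term_vectors": {}}
  ]
}
""")


def summarize_term_vectors(response):
    """Map each document id to {field: {term: term_freq}} (hypothetical helper)."""
    summary = {}
    for doc in response.get("docs", []):
        fields = {}
        for field, data in doc.get("term_vectors", {}).items():
            fields[field] = {
                term: info["term_freq"] for term, info in data.get("terms", {}).items()
            }
        summary[doc.get("_id")] = fields
    return summary


summary = summarize_term_vectors(SAMPLE_RESPONSE)
print(summary)

# Payloads arrive base64-encoded; decode the first token payload of "doe".
doe_tokens = SAMPLE_RESPONSE["docs"][0]["term_vectors"]["fullname"]["terms"]["doe"]["tokens"]
decoded_payload = base64.b64decode(doe_tokens[0]["payload"]).decode("utf-8")
print(decoded_payload)
```

Document 1 maps to an empty dictionary here because its term_vectors object is empty in the response.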

The index can also be given in the URL instead of in each entry, for example:

curl -XPOST "http://127.0.0.1:9200/twitter/_mtermvectors?pretty" -H "Content-Type:application/json" -d'
{
   "docs": [
      {
         "_id": "2",
         "fields": [
            "message"
         ],
         "term_statistics": true
      },
      {
         "_id": "1"
      }
   ]
}'

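The response is similar to the one above.
If all the requested documents live in the same index and share the same parameters, the request can be written more compactly: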

curl -XPOST "http://127.0.0.1:9200/twitter/_mtermvectors?pretty" -H "Content-Type:application/json" -d'
{
    "ids" : ["1", "2"],
    "parameters": {
        "fields": [
            "text"
        ],
        "term_statistics": true
    }
}'

The response is:

{
  "docs" : [
    {
      "_index" : "twitter",
      "_type" : "_doc",
      "_id" : "1",
      "_version" : 1,
      "found" : true,
      "took" : 0,
      "term_vectors" : {
        "text" : {
          "field_statistics" : {
            "sum_doc_freq" : 6,
            "doc_count" : 2,
            "sum_ttf" : 8
          },
          "terms" : {
            "test" : {
              "doc_freq" : 2,
              "ttf" : 4,
              "term_freq" : 3,
              "tokens" : [
                {
                  "position" : 1,
                  "start_offset" : 8,
                  "end_offset" : 12,
                  "payload" : "d29yZA=="
                },
                {
                  "position" : 2,
                  "start_offset" : 13,
                  "end_offset" : 17,
                  "payload" : "d29yZA=="
                },
                {
                  "position" : 3,
                  "start_offset" : 18,
                  "end_offset" : 22,
                  "payload" : "d29yZA=="
                }
              ]
            },
            "twitter" : {
              "doc_freq" : 2,
              "ttf" : 2,
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 0,
                  "start_offset" : 0,
                  "end_offset" : 7,
                  "payload" : "d29yZA=="
                }
              ]
            }
          }
        }
      }
    },
    {
      "_index" : "twitter",
      "_type" : "_doc",
      "_id" : "2",
      "_version" : 1,
      "found" : true,
      "took" : 0,
      "term_vectors" : {
        "text" : {
          "field_statistics" : {
            "sum_doc_freq" : 6,
            "doc_count" : 2,
            "sum_ttf" : 8
          },
          "terms" : {
            "..." : {
              "doc_freq" : 1,
              "ttf" : 1,
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 3,
                  "start_offset" : 21,
                  "end_offset" : 24,
                  "payload" : "d29yZA=="
                }
              ]
            },
            "another" : {
              "doc_freq" : 1,
              "ttf" : 1,
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 0,
                  "start_offset" : 0,
                  "end_offset" : 7,
                  "payload" : "d29yZA=="
                }
              ]
            },
            "test" : {
              "doc_freq" : 2,
              "ttf" : 4,
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 2,
                  "start_offset" : 16,
                  "end_offset" : 20,
                  "payload" : "d29yZA=="
                }
              ]
            },
            "twitter" : {
              "doc_freq" : 2,
              "ttf" : 2,
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 1,
                  "start_offset" : 8,
                  "end_offset" : 15,
                  "payload" : "d29yZA=="
                }
              ]
            }
          }
        }
      }
    }
  ]
}
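
One way to read the compact ids/parameters form is as shorthand for an explicit docs list in which every entry repeats the same parameters. The Python sketch below performs that expansion on the client side. The function name expand_ids_request is a hypothetical helper, and the equivalence is an assumption drawn from the examples in this article rather than a statement about how the server processes the request.

```python
import json


def expand_ids_request(body):
    """Expand the short ids/parameters form into an explicit docs list."""
    params = body.get("parameters", {})
    # The shared parameters are merged into one entry per id.
    return {"docs": [{"_id": doc_id, **params} for doc_id in body["ids"]]}


short_form = {
    "ids": ["1", "2"],
    "parameters": {"fields": ["text"], "term_statistics": True},
}

long_form = expand_ids_request(short_form)
print(json.dumps(long_form, indent=2))
```

The expanded body has the same shape as the earlier request that was sent to /twitter/_mtermvectors with one entry per document.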

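In addition, just as with the termvectors API, term vectors can be generated for documents supplied by the user (artificial documents). The mapping used is determined by _index, for example: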

curl -XPOST "http://127.0.0.1:9200/_mtermvectors?pretty" -H "Content-Type:application/json" -d'
{
   "docs": [
      {
         "_index": "twitter",
         "doc" : {
            "text" : "John Doe",
            "message" : "twitter test test test"
         }
      },
      {
         "_index": "twitter",
         "doc" : {
           "text" : "Jane Doe",
           "message" : "Another twitter test ..."
         }
      }
   ]
}'

The response is:

{
  "docs" : [
    {
      "_index" : "twitter",
      "_type" : "_doc",
      "_version" : 0,
      "found" : true,
      "took" : 0,
      "term_vectors" : {
        "text" : {
          "field_statistics" : {
            "sum_doc_freq" : 6,
            "doc_count" : 2,
            "sum_ttf" : 8
          },
          "terms" : {
            "doe" : {
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 1,
                  "start_offset" : 5,
                  "end_offset" : 8
                }
              ]
            },
            "john" : {
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 0,
                  "start_offset" : 0,
                  "end_offset" : 4
                }
              ]
            }
          }
        }
      }
    },
    {
      "_index" : "twitter",
      "_type" : "_doc",
      "_version" : 0,
      "found" : true,
      "took" : 0,
      "term_vectors" : {
        "text" : {
          "field_statistics" : {
            "sum_doc_freq" : 6,
            "doc_count" : 2,
            "sum_ttf" : 8
          },
          "terms" : {
            "doe" : {
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 1,
                  "start_offset" : 5,
                  "end_offset" : 8
                }
              ]
            },
            "jane" : {
              "term_freq" : 1,
              "tokens" : [
                {
                  "position" : 0,
                  "start_offset" : 0,
                  "end_offset" : 4
                }
              ]
            }
          }
        }
      }
    }
  ]
}
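
The start_offset and end_offset values can be checked against the submitted text. The Python sketch below slices the "John Doe" value from the first artificial document using the offsets returned for its terms. It assumes plain ASCII text, so that Python string indices line up with the reported offsets, and slice_tokens is a hypothetical helper name. The terms come back lowercased, which suggests that the analyzer on this field lowercases its input; the mapping itself is not shown in this article.

```python
def slice_tokens(text, terms):
    """Return {term: [substring of text for each token]} using the token offsets."""
    return {
        term: [text[tok["start_offset"]:tok["end_offset"]] for tok in info["tokens"]]
        for term, info in terms.items()
    }


source_text = "John Doe"

# The "terms" object of the first document in the response above.
terms = {
    "doe": {"term_freq": 1, "tokens": [{"position": 1, "start_offset": 5, "end_offset": 8}]},
    "john": {"term_freq": 1, "tokens": [{"position": 0, "start_offset": 0, "end_offset": 4}]},
}

slices = slice_tokens(source_text, terms)
print(slices)  # {'doe': ['Doe'], 'john': ['John']}
```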


This article is the author's original work and may not be reproduced without the author's permission.
