Term-level and full-text queries compared #

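You can use both term-level and full-text queries to search text. Term-level queries are typically used to search structured data, while full-text queries are used for full-text search. The main difference is that a term-level query searches documents for the exact term you specify, without analyzing it, whereas a full-text query analyzes the query string before searching. The following table summarizes the differences.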

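	Term-level queries	Full-text queries
Description	Term-level queries answer which documents match a query.	Full-text queries answer how well documents match a query.
Analyzer	The search term is not analyzed. This means that a term-level query searches for your search term exactly as you entered it.	The search term is analyzed by the same analyzer that was used for the specific document field at index time. This means that your search term goes through the same analysis process as the document field.
Relevance	Term-level queries simply return the documents that match, without sorting them by relevance score. A relevance score is still calculated, but it is the same for every returned document.	Full-text queries calculate a relevance score for each match and sort the results in descending order of relevance.
Use case	Use term-level queries when you want to match exact values, such as numbers, dates, or tags, and don't need the matches sorted by relevance.	Use full-text queries to match text fields and sort by relevance, taking into account factors such as letter case and stemming variants.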

Easysearch uses the BM25 ranking algorithm to calculate relevance scores. For more information, see Okapi BM25.
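To give a feel for how a BM25-style relevance score behaves, here is a minimal Python sketch of the textbook Okapi BM25 formula. It is an illustration only and not Easysearch's actual implementation: the function name `bm25_score`, the toy corpus, and the parameter values `k1=1.2` and `b=0.75` are assumptions chosen for the example.

```python
import math

def bm25_score(query_terms, doc_tokens, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against the query using textbook Okapi BM25.

    corpus is a list of tokenized documents; k1 and b are assumed tuning values.
    """
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in query_terms:
        # Number of documents that contain the term at least once.
        doc_freq = sum(1 for d in corpus if term in d)
        if doc_freq == 0:
            continue
        # Rare terms get a higher inverse document frequency.
        idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
        tf = doc_tokens.count(term)
        # Term frequency saturates, and longer documents are penalized.
        norm = tf + k1 * (1 - b + b * len(doc_tokens) / avg_len)
        score += idf * tf * (k1 + 1) / norm
    return score

# Toy corpus of already-tokenized lines (hypothetical, for illustration).
corpus = [
    ["to", "be", "or", "not", "to", "be", "that", "is", "the", "question"],
    ["not", "like", "a", "corse", "or", "if", "not", "to", "be", "buried"],
    ["ay", "madam", "it", "is", "common"],
]
query = ["to", "be", "or", "not", "to", "be"]
scores = [bm25_score(query, doc, corpus) for doc in corpus]
```

In this toy corpus the first line scores highest because it repeats more of the query terms, and the last line scores zero because it shares no term with the query.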

Should I use a full-text or a term-level query? #

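To illustrate the difference between full-text and term-level queries, consider the following two examples, each of which searches for specific text. The complete works of Shakespeare are indexed in an Easysearch cluster.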

Example: Phrase search #

In this example, you'll search the complete works of Shakespeare for the phrase "To be, or not to be" in the text_entry field.

First, run the search using a term-level query:

GET shakespeare/_search
{
  "query": {
    "term": {
      "text_entry": "To be, or not to be"
    }
  }
}

The response contains no matches, as indicated by a hits total of 0:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

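This is because the term-level query looks up "To be, or not to be" literally, as a single term, in the inverted index, while the inverted index stores only the individual tokens produced by analyzing the text field. Term-level queries are not suited to analyzed text fields and often return unexpected results there. When working with text data, use term-level queries only on fields mapped as keyword.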
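The mismatch described above can be reproduced with a toy inverted index. The following Python sketch is a simplified model under stated assumptions: the `analyze` function (lowercase, then split on non-alphanumeric characters) only approximates a real analyzer, and `build_index` and `term_lookup` are hypothetical helper names, not Easysearch APIs.

```python
import re
from collections import defaultdict

def analyze(text):
    """Toy analyzer: lowercase, then split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def build_index(docs):
    """Map each analyzed token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index

def term_lookup(index, term):
    """Term-level lookup: the input is NOT analyzed and must equal a stored token."""
    return index.get(term, set())

docs = {
    "34229": "To be, or not to be: that is the question:",
    "109930": "Not like a corse; or if, not to be buried,",
}
index = build_index(docs)

# The whole phrase was never stored as a single token, so nothing matches.
phrase_hits = term_lookup(index, "To be, or not to be")
# A single lowercase token, by contrast, is present in the index.
token_hits = term_lookup(index, "be")
```

Here `phrase_hits` is empty while `token_hits` contains both document IDs, which mirrors why the term-level query above returns zero hits on an analyzed text field.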

Now search for the same phrase using a full-text query:

GET shakespeare/_search
{
  "query": {
    "match": {
      "text_entry": "To be, or not to be"
    }
  }
}

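The query string "To be, or not to be" is analyzed into an array of tokens, in the same way as the text_entry field of each document. The full-text query then intersects the query tokens with the tokens of the text_entry field across all documents and sorts the results by relevance score: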

{
  "took" : 19,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : 17.419369,
    "hits" : [
      {
        "_index" : "shakespeare",
        "_id" : "34229",
        "_score" : 17.419369,
        "_source" : {
          "type" : "line",
          "line_id" : 34230,
          "play_name" : "Hamlet",
          "speech_number" : 19,
          "line_number" : "3.1.64",
          "speaker" : "HAMLET",
          "text_entry" : "To be, or not to be: that is the question:"
        }
      },
      {
        "_index" : "shakespeare",
        "_id" : "109930",
        "_score" : 14.883024,
        "_source" : {
          "type" : "line",
          "line_id" : 109931,
          "play_name" : "A Winters Tale",
          "speech_number" : 23,
          "line_number" : "4.4.153",
          "speaker" : "PERDITA",
          "text_entry" : "Not like a corse; or if, not to be buried,"
        }
      },
      {
        "_index" : "shakespeare",
        "_id" : "103117",
        "_score" : 14.782743,
        "_source" : {
          "type" : "line",
          "line_id" : 103118,
          "play_name" : "Twelfth Night",
          "speech_number" : 53,
          "line_number" : "1.3.95",
          "speaker" : "SIR ANDREW",
          "text_entry" : "will not be seen; or if she be, its four to one"
        }
      }
    ]
  }
}
...
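The matching step of the full-text query above can be sketched in Python as a token intersection. This is a rough illustration under assumptions: `analyze` is a toy stand-in for the real analyzer, `match_query` is a hypothetical helper, and ranking by the number of shared tokens is a deliberate simplification of the relevance scoring that Easysearch performs.

```python
import re

def analyze(text):
    """Toy analyzer: lowercase, then split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def match_query(docs, query_text):
    """Return (doc_id, shared_token_count) pairs for documents sharing any query token."""
    query_tokens = set(analyze(query_text))
    results = []
    for doc_id, text in docs.items():
        shared = query_tokens & set(analyze(text))
        if shared:  # one shared token is enough to match
            results.append((doc_id, len(shared)))
    # Simplified ranking: more shared tokens first (a real engine uses a relevance score).
    return sorted(results, key=lambda pair: pair[1], reverse=True)

docs = {
    "34229": "To be, or not to be: that is the question:",
    "32702": "Not so, my lord; I am too much i' the sun.",
    "32709": "Ay, madam, it is common.",
}
ranked = match_query(docs, "To be, or not to be")
```

Because a single shared token is enough to match, a query made of common words matches a very large number of lines, which is consistent with the response above reporting at least 10000 hits.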

Example: Exact term search #

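If you want to search for the exact term "HAMLET" in the speaker field and don't need the results sorted by relevance score, a term-level query is more efficient: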

GET shakespeare/_search
{
  "query": {
    "term": {
      "speaker": "HAMLET"
    }
  }
}

The response contains the matching documents:

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1582,
      "relation" : "eq"
    },
    "max_score" : 4.2540946,
    "hits" : [
      {
        "_index" : "shakespeare",
        "_id" : "32700",
        "_score" : 4.2540946,
        "_source" : {
          "type" : "line",
          "line_id" : 32701,
          "play_name" : "Hamlet",
          "speech_number" : 9,
          "line_number" : "1.2.66",
          "speaker" : "HAMLET",
          "text_entry" : "[Aside]  A little more than kin, and less than kind."
        }
      },
      {
        "_index" : "shakespeare",
        "_id" : "32702",
        "_score" : 4.2540946,
        "_source" : {
          "type" : "line",
          "line_id" : 32703,
          "play_name" : "Hamlet",
          "speech_number" : 11,
          "line_number" : "1.2.68",
          "speaker" : "HAMLET",
          "text_entry" : "Not so, my lord; I am too much i' the sun."
        }
      },
      {
        "_index" : "shakespeare",
        "_id" : "32709",
        "_score" : 4.2540946,
        "_source" : {
          "type" : "line",
          "line_id" : 32710,
          "play_name" : "Hamlet",
          "speech_number" : 13,
          "line_number" : "1.2.75",
          "speaker" : "HAMLET",
          "text_entry" : "Ay, madam, it is common."
        }
      }
    ]
  }
}
...

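Term-level queries match terms exactly. If you search for "Hamlet" instead, you get no matches, because speaker is a keyword field: the value "HAMLET" is stored in Easysearch verbatim, with its original casing, rather than in analyzed form. The query term "HAMLET" is likewise searched verbatim. To match this field, the search term must therefore be character-for-character identical to the stored value, including letter case.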
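The case sensitivity described above can be modeled in a few lines of Python. This sketch assumes that a keyword field behaves like a plain string compared for exact equality; `keyword_term_query` is a hypothetical helper name, and the documents are abbreviated copies of the hits shown earlier.

```python
def keyword_term_query(docs, field, value):
    """Return IDs of documents whose keyword field equals the value exactly."""
    return [doc_id for doc_id, source in docs.items() if source.get(field) == value]

docs = {
    "32700": {"speaker": "HAMLET", "text_entry": "[Aside]  A little more than kin, and less than kind."},
    "32702": {"speaker": "HAMLET", "text_entry": "Not so, my lord; I am too much i' the sun."},
    "109930": {"speaker": "PERDITA", "text_entry": "Not like a corse; or if, not to be buried,"},
}

# Exact casing matches the stored keyword values.
upper_hits = keyword_term_query(docs, "speaker", "HAMLET")
# Different casing is a different string, so nothing matches.
mixed_hits = keyword_term_query(docs, "speaker", "Hamlet")
```

With this model, `upper_hits` lists both HAMLET lines and `mixed_hits` is empty, because neither the stored value nor the search term is normalized before comparison.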