Trim token filter
The trim token filter removes leading and trailing whitespace characters from tokens.
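The minimal Python sketch below models what the trim filter does to each token. It assumes a hypothetical Token type that holds the term text and its character offsets. Only the term text is stripped, and the offsets keep pointing at the original untrimmed span, which matches the start_offset/end_offset values in the response shown later on this page. Python's str.strip() is used as an approximation of the filter's whitespace test; this is not Easysearch's implementation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Token:
    """Hypothetical token model: term text plus character offsets into the input."""
    term: str
    start_offset: int
    end_offset: int


def trim_filter(tokens):
    """Strip leading/trailing whitespace from each term; offsets are left unchanged."""
    return [replace(t, term=t.term.strip()) for t in tokens]


if __name__ == "__main__":
    tokens = [Token(" Easysearch ", 0, 12)]
    print(trim_filter(tokens)[0])
    # Token(term='Easysearch', start_offset=0, end_offset=12)
```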
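Many commonly used tokenizers, such as the standard and whitespace tokenizers, already drop leading and trailing whitespace during tokenization, so there is no need to configure an additional trim token filter when using them. The keyword tokenizer is an exception: it emits the entire input as a single token with any surrounding whitespace kept, so it can still benefit from the trim filter.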
Example
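The following example request creates a new index named my_pattern_trim_index and configures an analyzer that combines a pattern tokenizer with the lowercase filter and a trim filter. The pattern tokenizer does not remove leading or trailing whitespace from the tokens it produces, which is why the trim filter is needed here.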
PUT /my_pattern_trim_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_trim_filter": {
          "type": "trim"
        }
      },
      "tokenizer": {
        "my_pattern_tokenizer": {
          "type": "pattern",
          "pattern": ","
        }
      },
      "analyzer": {
        "my_pattern_trim_analyzer": {
          "type": "custom",
          "tokenizer": "my_pattern_tokenizer",
          "filter": [
            "lowercase",
            "my_trim_filter"
          ]
        }
      }
    }
  }
}
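To show why this analyzer needs the trim filter, here is a minimal Python sketch of the chain configured above: the tokenizer splits the text on the regex "," and keeps the spaces around each piece, then the lowercase and trim steps run in the order listed under "filter". The Token type and the helper names (pattern_tokenize, lowercase_filter, trim_filter, analyze) are hypothetical; the sketch approximates the behavior under these assumptions and is not Easysearch's actual implementation.

```python
import re
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Token:
    """Hypothetical token model: term text, character offsets, and position."""
    term: str
    start_offset: int
    end_offset: int
    position: int


def pattern_tokenize(text, pattern=","):
    """Split text on regex matches, keeping each piece's offsets; empty pieces are skipped."""
    pieces, start = [], 0
    for m in re.finditer(pattern, text):
        if m.start() > start:
            pieces.append((text[start:m.start()], start, m.start()))
        start = m.end()
    if start < len(text):
        pieces.append((text[start:], start, len(text)))
    return [Token(term, s, e, pos) for pos, (term, s, e) in enumerate(pieces)]


def lowercase_filter(tokens):
    """Lowercase each term; offsets and positions are unchanged."""
    return [replace(t, term=t.term.lower()) for t in tokens]


def trim_filter(tokens):
    """Strip leading/trailing whitespace from each term; offsets are unchanged."""
    return [replace(t, term=t.term.strip()) for t in tokens]


def analyze(text):
    """Run the tokenizer, then the filters in the configured order."""
    tokens = pattern_tokenize(text)
    for token_filter in (lowercase_filter, trim_filter):
        tokens = token_filter(tokens)
    return tokens


if __name__ == "__main__":
    print([t.term for t in analyze(" Easysearch , is , powerful ")])
    # ['easysearch', 'is', 'powerful']
```

If trim_filter is removed from the loop in analyze, the first term comes out as " easysearch " with its spaces intact, which is the situation the trim filter exists to fix.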
Generated tokens
Use the following request to examine the tokens generated by the analyzer:
GET /my_pattern_trim_index/_analyze
{
  "analyzer": "my_pattern_trim_analyzer",
  "text": " Easysearch , is , powerful "
}
The response contains the generated tokens:
{
  "tokens": [
    {
      "token": "easysearch",
      "start_offset": 0,
      "end_offset": 12,
      "type": "word",
      "position": 0
    },
    {
      "token": "is",
      "start_offset": 13,
      "end_offset": 18,
      "type": "word",
      "position": 1
    },
    {
      "token": "powerful",
      "start_offset": 19,
      "end_offset": 32,
      "type": "word",
      "position": 2
    }
  ]
}