# dump_hash

## Description
The dump_hash processor exports the documents of an index from an Elasticsearch cluster and calculates a hash value for each document.
## Configuration Example
A simple example is as follows:
```yaml
pipeline:
- name: bulk_request_ingest
  auto_start: true
  keep_running: true
  processor:
    - dump_hash: # dump es1's doc
        indices: "medcl-dr3"
        scroll_time: "10m"
        elasticsearch: "source"
        query: "field1:elastic"
        fields: "doc_hash"
        output_queue: "source_docs"
        batch_size: 10000
        slice_size: 5
```
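For reference, the `scroll_time`, `batch_size`, and `slice_size` settings map naturally onto Elasticsearch's sliced scroll API. The following Go sketch is purely illustrative, not the gateway's source code: the endpoint, the mapping of `query` onto a `query_string` clause, and the slice layout are assumptions taken from the example above.

```go
// Illustrative sketch: the kind of scroll request one slice worker
// (slice id 0 of slice_size=5) might issue against the source cluster.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	body := map[string]interface{}{
		"size": 10000, // batch_size
		"slice": map[string]int{
			"id":  0, // this worker's slice, in 0..slice_size-1
			"max": 5, // slice_size
		},
		// Assumed mapping of the query parameter onto a query_string clause.
		"query": map[string]interface{}{
			"query_string": map[string]string{"query": "field1:elastic"},
		},
	}
	buf, err := json.Marshal(body)
	if err != nil {
		panic(err)
	}
	// scroll=10m corresponds to scroll_time in the example.
	resp, err := http.Post(
		"http://localhost:9200/medcl-dr3/_search?scroll=10m",
		"application/json", bytes.NewReader(buf))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```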
## Parameter Description
| Name | Type | Description |
| --- | --- | --- |
| elasticsearch | string | Name of the cluster from which documents are exported |
| scroll_time | string | Scroll session timeout duration |
| batch_size | int | Scroll batch size. Default: `5000`. |
| slice_size | int | Slice size. Default: `1`. |
| sort_type | string | Document sorting type. Default: `asc`. |
| sort_field | string | Field used to sort documents |
| indices | string | Name of the index to export |
| level | string | Request processing level. Set to `cluster` to skip node- and shard-level request splitting, which is useful when a proxy sits in front of Elasticsearch. |
| query | string | Query conditions used to filter documents |
| fields | string | List of fields to be returned |
| sort_document_fields | bool | Whether to sort the fields in `_source` before the hash value is calculated. Default: `false`. |
| hash_func | string | Hash function: `xxhash32`, `xxhash64`, or `fnv1a`. Default: `xxhash32`. |
| output_queue | string | Name of the queue to which results are written |
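To make the `sort_document_fields` and `hash_func` options concrete, here is a minimal Go sketch of per-document hashing. It is not the gateway's actual implementation: the `docHash` helper is hypothetical, and it uses the standard library's FNV-1a (the `fnv1a` option) because the xxhash variants would require a third-party package.

```go
// Illustrative sketch: hash a document's _source, optionally with its
// fields sorted first so that field order does not affect the result.
package main

import (
	"encoding/json"
	"fmt"
	"hash/fnv"
)

// docHash is a hypothetical helper, not part of the gateway.
func docHash(source []byte, sortFields bool) (uint32, error) {
	payload := source
	if sortFields {
		// Round-tripping through a map makes encoding/json re-emit the
		// object with keys in sorted order at every nesting level.
		var m map[string]interface{}
		if err := json.Unmarshal(source, &m); err != nil {
			return 0, err
		}
		var err error
		payload, err = json.Marshal(m)
		if err != nil {
			return 0, err
		}
	}
	h := fnv.New32a() // stdlib FNV-1a, matching the fnv1a option
	h.Write(payload)
	return h.Sum32(), nil
}

func main() {
	a := []byte(`{"field1":"elastic","field2":1}`)
	b := []byte(`{"field2":1,"field1":"elastic"}`)
	ha, _ := docHash(a, true)
	hb, _ := docHash(b, true)
	// With sorting enabled, the two orderings hash identically.
	fmt.Printf("%x %x equal=%v\n", ha, hb, ha == hb)
}
```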