# dump_hash

## Description
The dump_hash processor exports the documents of an index from a cluster and calculates a hash value for each document.
## Configuration Example
A simple example is as follows:
```yaml
pipeline:
  - name: bulk_request_ingest
    auto_start: true
    keep_running: true
    processor:
      - dump_hash: # dump es1's doc
          indices: "medcl-dr3"
          scroll_time: "10m"
          elasticsearch: "source"
          query: "field1:elastic"
          fields: "doc_hash"
          output_queue: "source_docs"
          batch_size: 10000
          slice_size: 5
```
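In this example, the processor scrolls over the `medcl-dr3` index on the cluster named `source`, keeps only documents matching the query `field1:elastic`, calculates a hash for each document, and writes the results to the `source_docs` queue, using five scroll slices and a batch size of 10,000 documents.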
## Parameter Description
| Name | Type | Description |
|---|---|---|
| elasticsearch | string | Name of the cluster from which documents are dumped |
| scroll_time | string | Scroll session timeout duration |
| batch_size | int | Scroll batch size. The default value is 5000. |
| slice_size | int | Number of scroll slices. The default value is 1. |
| sort_type | string | Document sorting order (asc or desc). The default value is asc. |
| sort_field | string | Field used to sort documents |
| indices | string | Name of the index to dump |
| level | string | Request processing level. When it is set to cluster, requests are not split at the node or shard level, which is suitable for scenarios in which a proxy sits in front of Elasticsearch. |
| query | string | Query string used to filter documents |
| fields | string | List of fields to return |
| sort_document_fields | bool | Whether to sort the fields in _source before the hash value is calculated. The default value is false. |
| hash_func | string | Hash function, which can be set to xxhash32, xxhash64, or fnv1a. The default value is xxhash32. |
| output_queue | string | Name of the queue to which results are written |
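The optional sorting, hashing, and level parameters from the table can be combined as needed. Below is a minimal sketch of such a configuration; the pipeline name, the sort field `updated_at`, and the queue name are illustrative placeholders, not values prescribed by the processor:

```yaml
pipeline:
  - name: dump_sorted_fnv1a
    auto_start: true
    keep_running: true
    processor:
      - dump_hash:
          elasticsearch: "source"
          indices: "medcl-dr3"
          level: "cluster"          # skip node-/shard-level request splitting (proxy in front of Elasticsearch)
          query: "field1:elastic"
          sort_field: "updated_at"  # hypothetical field name
          sort_type: "desc"         # default is asc
          sort_document_fields: true # sort _source fields before hashing
          hash_func: "fnv1a"        # xxhash32 (default), xxhash64, or fnv1a
          output_queue: "source_docs_sorted"
```

Setting sort_document_fields to true makes the hash insensitive to field ordering in _source, at the cost of some extra processing per document.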