Document Summarization Processor

Generates an AI summary and structured insights for a document using a language model.

Documents shorter than min_input_document_length or longer than max_input_document_length (both measured in bytes) are skipped and not summarized.

Requirements

A configured language model provider is required. Set model_provider and model in the processor config, or configure a default language model in the application settings.

Configuration

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| message_field | string | No | messages | Pipeline context key for the input messages |
| output_queue | object | No | null | Queue to push processed documents to |
| model_provider | string | No | (app default) | Language model provider ID |
| model | string | No | (app default) | Language model name |
| model_context_length | int | No | | Model context window size in tokens (minimum 4000) |
| min_input_document_length | int | No | 100 | Skip documents shorter than this (bytes) |
| max_input_document_length | int | No | 100000 | Skip documents longer than this (bytes) |
| ai_insights_max_length | int | No | 500 | Target length for the generated summary (tokens) |
| llm_generation_lang | string | No | (app default) | BCP 47 language tag for generated content (e.g. en-US, zh-CN) |

Example

- document_summarization:
    model_provider: openai
    model: gpt-4o-mini
    model_context_length: 8000
    output_queue:
      name: "documents_summarized"
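
The optional parameters from the configuration table can be combined with the settings above. The following is a sketch only: the parameter names come from the table, but every value is an illustrative assumption rather than a recommendation, and the queue name is the one used in the example above.

```yaml
- document_summarization:
    model_provider: openai
    model: gpt-4o-mini
    model_context_length: 8000          # tokens; the documented minimum is 4000
    message_field: messages             # same as the documented default
    min_input_document_length: 200      # bytes; illustrative value (default 100)
    max_input_document_length: 50000    # bytes; illustrative value (default 100000)
    ai_insights_max_length: 300         # tokens; illustrative value (default 500)
    llm_generation_lang: en-US          # BCP 47 language tag
    output_queue:
      name: "documents_summarized"
```

Because the length limits are expressed in bytes rather than characters, text in multi-byte scripts (for example zh-CN content encoded as UTF-8) reaches max_input_document_length with fewer characters than plain ASCII text does.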