Document Embedding Processor

Generates vector embeddings for a document’s text chunks using an embedding model to enable semantic search and retrieval.

Requirements

A configured embedding model provider is required. Set model_provider and model in the processor config, or configure a default embedding model in the application settings.
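If a default embedding model is already configured in the application settings, model_provider and model can be omitted. A minimal sketch, assuming such a default exists; the queue name is reused from the example on this page:

```yaml
# Sketch: model_provider and model are omitted, so the processor is
# expected to fall back to the application's default embedding model.
- document_embedding:
    output_queue:
      name: "documents_embedded"
```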

Coco semantic search supports only 1024-dimensional vectors, so every embedding the processor stores has exactly 1024 dimensions. For OpenAI-compatible providers, the processor automatically requests 1024 dimensions from the API. For other providers, such as Ollama, the configured model must itself output 1024-dimensional vectors. If it does not, the document is passed through without embeddings, and its chunks will not appear in semantic search results.
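For a provider that is not OpenAI-compatible, the model you choose must already produce 1024-dimensional output. The sketch below assumes an embedding provider registered in the application under the hypothetical ID `ollama`, and uses `mxbai-embed-large`, an embedding model available through Ollama that outputs 1024-dimensional vectors. Replace both values with the provider ID and model you actually configured.

```yaml
- document_embedding:
    # Hypothetical provider ID: must match an embedding provider configured in the app.
    model_provider: ollama
    # mxbai-embed-large outputs 1024-dimensional vectors, the size Coco semantic search expects.
    model: mxbai-embed-large
    output_queue:
      name: "documents_embedded"
```

With a model of a different output size (for example, 768 dimensions), documents would pass through this processor without embeddings.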

Configuration

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| message_field | string | No | messages | Pipeline context key for the input messages |
| output_queue | object | No | null | Queue to push processed documents to |
| model_provider | string | No | (app default) | Embedding model provider ID |
| model | string | No | (app default) | Embedding model name |

Example

```yaml
- document_embedding:
    model_provider: openai
    model: text-embedding-3-small
    output_queue:
      name: "documents_embedded"
```