S3

S3 Connector #

Register S3 Connector #

curl -XPUT "http://localhost:9000/connector/s3?replace=true" -d '{
  "name": "S3 Object Storage Connector",
  "description": "Fetch file metadata from S3 cloud storage.",
  "category": "cloud_storage",
  "icon": "/assets/icons/connector/s3/icon.png",
  "tags": [
    "s3",
    "storage"
  ],
  "url": "http://coco.rs/connectors/s3",
  "assets": {
    "icons": {
      "default": "/assets/icons/connector/s3/icon.png"
    }
  },
  "processor": {
    "enabled": true,
    "name": "s3"
  }
}'

Use s3 as a unique identifier, as it is a builtin connector.

Use the S3 Connector #

The S3 Connector allows you to index data from your S3 service into your system. Follow these steps to set it up:

Configure S3 Client #

To configure your S3 connection, you’ll need to provide several key parameters. This setup allows your application to securely authenticate with and access objects within a specified S3 bucket.

First, you’ll need your S3 credentials:

access_key_id: This is your public identifier, similar to a username.

secret_access_key: This is your confidential key, like a password, used to sign requests and prove your identity. Keep this highly secure.

Next, specify the location of your data:

bucket: This is the name of the S3 bucket where your objects are stored and which you intend to interact with.

endpoint: This is the URL of your S3 service. It could be an AWS S3 endpoint (e.g., s3.us-west-2.amazonaws.com) or the address of another S3-compatible service (e.g., a MinIO instance).

Finally, you have a few optional parameters to fine-tune data access:

use_ssl: A boolean flag (defaults to true) to enable or disable SSL/TLS (HTTPS) for secure communication. It’s strongly recommended to keep this enabled.

prefix: A string that acts as a filter. If provided, the system will only process objects whose keys (paths) start with this string. For instance, a prefix of documents/ would limit access to files within that “folder.”

extensions: An array of strings defining specific file extensions to include (e.g., [“pdf”, “docx”]). If this list is empty or omitted, all file types in the specified path will be considered.

Datasource Configuration #

Each datasource has its own sync configuration and S3 settings:

curl -H 'Content-Type: application/json' -XPOST "http://localhost:9000/datasource/" -d '{
    "name": "My S3 Documents",
    "type": "connector",
    "enabled": true,
    "connector": {
        "id": "s3",
        "config": {
            "access_key_id": "your-access_key_id",
            "secret_access_key": "your-secret_access_key",
            "bucket": "your-bucket",
            "endpoint": "your-s3-service-endpoint",
            "use_ssl": true,
            "prefix": "",
            "extensions": ["pdf", "docx", "txt"]
        }
    },
    "sync": {
        "enabled": true,
        "interval": "5m"
    }
}'

Supported Config Parameters for S3 Connector #

Below are the configuration parameters supported by the S3 Connector:

Datasource Config Parameters #

FieldTypeDescription
access_key_idstringYour S3 access key ID (required).
secret_access_keystringYour S3 secret access key (required).
bucketstringThe name of the S3 bucket to index (required).
endpointstringThe S3 service endpoint URL (required).
use_sslboolWhether to use SSL for the connection. Defaults to true.
prefixstringA prefix to filter objects in the bucket (e.g., documents/). Optional.
extensions[]stringAn array of file extensions to include (e.g., ["pdf", "docx"]). If omitted or empty, all files will be indexed.
sync.enabledbooleanEnable/disable syncing for this datasource.
sync.intervalstringSync interval for this datasource (e.g., “5m”, “1h”, “30s”).
Edit Edit this page