OpenSearch

Open-source search and analytics engine. Use Zeotap to bulk-index documents into OpenSearch indices, keeping your search data in sync with your warehouse.

Prerequisites

  • An OpenSearch cluster (version 1.x or later) accessible over HTTPS
  • An API key or username/password credentials with write permissions on the target index
  • The cluster endpoint URL (e.g., https://my-cluster.us-east-1.es.amazonaws.com)

Authentication

The OpenSearch destination supports two authentication methods: API key and basic auth. Choose the one that matches your cluster configuration.

API Key

Field | Type | Required | Description
API Key | Password | Yes | Base64-encoded API key (the encoded value from the Create API Key response)

To create an API key:

  1. Open OpenSearch Dashboards and navigate to Security > Auth Tokens
  2. Click Create API key (requires the Security plugin to be enabled)
  3. Assign a name and set the appropriate permissions (at minimum, write access on the target index)
  4. Copy the encoded value from the response — this is your API key

Basic Auth

Field | Type | Required | Description
Username | Text | Yes | OpenSearch username
Password | Password | Yes | OpenSearch password

Use the credentials configured in your OpenSearch Security plugin. For Amazon OpenSearch Service, use the master user credentials or fine-grained access control credentials.
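
For reference, HTTP Basic authentication sends the username and password as a Base64-encoded `username:password` pair in the `Authorization` header (RFC 7617). The sketch below shows that encoding only. The function name is hypothetical, and the source does not describe how Zeotap builds the header internally.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build an HTTP Basic Authorization header (RFC 7617).

    Hypothetical helper for illustration; not Zeotap's implementation.
    """
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


if __name__ == "__main__":
    # Placeholder credentials, not real ones.
    print(basic_auth_header("user", "pass"))
```

Base64 is an encoding, not encryption, which is one reason to connect over HTTPS.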

Configuration

Field | Type | Required | Description
Endpoint URL | Text | Yes | The OpenSearch cluster endpoint URL. Must start with http:// or https://

Target Settings

Field | Type | Required | Description
Index Name | Text | Yes | The OpenSearch index to write documents to. If the index does not exist, OpenSearch auto-creates it on the first write

Supported Operations

Sync Modes

Mode | Supported | Description
Upsert | Yes | Creates new documents or fully replaces existing ones (uses the OpenSearch index action)
Insert | Yes | Creates new documents only; fails if a document with the same ID already exists (uses the create action)
Update | Yes | Partially updates existing documents (uses the update action with doc merge)
Mirror | Not supported |

Audience Sync Modes

OpenSearch does not support audience sync modes. It has no list or segment membership API.

Features

  • Field Mapping: Yes — map source columns to OpenSearch document fields
  • Schema Introspection: No — OpenSearch indices accept dynamic mappings

Required Mapping Fields

There are no strictly required mapping fields. However, mapping a field to _id is strongly recommended for upsert and update modes so that Zeotap can address specific documents.

Default Destination Fields

Field | Type | Description
_id | string | OpenSearch document ID. If mapped, used as the document _id for upserts and updates

How It Works

Zeotap writes data to OpenSearch using the Bulk API:

  1. Rows from the sync batch are converted to NDJSON (newline-delimited JSON) format
  2. Each row becomes a two-line pair: an action/metadata line and a document body line
  3. The action type depends on the sync mode:
    • Upsert: {"index": {"_index": "my-index", "_id": "doc-123"}} followed by the full document
    • Insert: {"create": {"_index": "my-index", "_id": "doc-123"}} followed by the full document
    • Update: {"update": {"_index": "my-index", "_id": "doc-123"}} followed by {"doc": {...}}
  4. Rows are sent in chunks of 500 documents per _bulk request
  5. The Content-Type header is set to application/x-ndjson
  6. Each chunk is sent with automatic retry on transient errors (429 Too Many Requests, 5xx)
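
The steps above can be sketched in a few lines of Python. This is a minimal illustration of the NDJSON layout, not Zeotap's actual code: the function names, the `id_field` parameter, and the choice to remove the mapped `_id` from the document body are all assumptions.

```python
import json

# Sync mode -> Bulk API action, as listed in step 3.
ACTION_BY_MODE = {"upsert": "index", "insert": "create", "update": "update"}


def build_bulk_body(rows, index, mode, id_field="_id"):
    """Serialize rows into an NDJSON _bulk request body (hypothetical helper)."""
    action = ACTION_BY_MODE[mode]
    lines = []
    for row in rows:
        doc = dict(row)  # copy so the caller's row is not mutated
        meta = {"_index": index}
        doc_id = doc.pop(id_field, None)  # assumption: _id goes in metadata only
        if doc_id is not None:
            meta["_id"] = str(doc_id)
        # Line 1 of the pair: action and metadata.
        lines.append(json.dumps({action: meta}))
        # Line 2 of the pair: full document, or {"doc": ...} for partial updates.
        lines.append(json.dumps({"doc": doc} if mode == "update" else doc))
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def chunked(rows, size=500):
    """Yield consecutive slices of at most `size` rows (500 per step 4)."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


if __name__ == "__main__":
    rows = [{"_id": "doc-123", "name": "Ada"}]
    print(build_bulk_body(rows, "my-index", "upsert"), end="")
```

Each chunk's body would then be sent as a POST to the `_bulk` endpoint with `Content-Type: application/x-ndjson`.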

Response Handling

The Bulk API returns per-item status in its response. Zeotap inspects each item:

  • 2xx status: Document indexed successfully
  • 4xx status: Permanent failure (e.g., mapping conflict, document already exists in insert mode). The row is marked as failed with the OpenSearch error type and reason.
  • 5xx status: Transient failure. The entire chunk is retried with exponential backoff.
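
As an illustration of this per-item inspection, the sketch below walks the `items` array of a Bulk API response and sorts each entry by its `status`. The function name and return shape are hypothetical. Treating a per-item 429 as retryable rather than permanent is an assumption, made to match the transient errors listed under How It Works.

```python
def classify_bulk_response(response: dict):
    """Split Bulk API response items into succeeded, failed, and retryable.

    Hypothetical helper; items are identified by their position in the
    request, which the Bulk API preserves in the response.
    """
    succeeded, failed, retryable = [], [], []
    for position, item in enumerate(response.get("items", [])):
        # Each item is a one-key dict named after the action
        # (index, create, or update).
        _action, result = next(iter(item.items()))
        status = result.get("status", 0)
        if 200 <= status < 300:
            succeeded.append(position)
        elif status == 429 or status >= 500:
            # Assumption: per-item 429 is treated as transient, like 5xx.
            retryable.append(position)
        else:
            error = result.get("error", {})
            failed.append((position, error.get("type"), error.get("reason")))
    return succeeded, failed, retryable


if __name__ == "__main__":
    sample = {
        "errors": True,
        "items": [
            {"index": {"_id": "1", "status": 201}},
            {"create": {"_id": "2", "status": 409, "error": {
                "type": "version_conflict_engine_exception",
                "reason": "document already exists"}}},
        ],
    }
    print(classify_bulk_response(sample))
```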

Rate Limits

OpenSearch does not impose fixed rate limits at the API level. Instead, each shard has a configurable number of bulk request slots. When all slots are full, the cluster returns HTTP 429 (Too Many Requests).

Zeotap handles 429 responses with exponential backoff and automatic retry (up to 3 retries per chunk).
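
A retry loop of this kind might look like the following sketch. Only the limit of 3 retries comes from the text above. The base delay, the jitter, the `TransientError` class, and the `send_chunk` callable are assumptions introduced for illustration.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for a 429 or 5xx response."""


def send_with_retry(send_chunk, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call send_chunk(), retrying on TransientError with exponential backoff.

    The delay doubles on each retry (base, 2x base, 4x base) plus random
    jitter. The delay values are assumptions, not Zeotap's settings.
    """
    attempt = 0
    while True:
        try:
            return send_chunk()
        except TransientError:
            if attempt >= max_retries:
                raise  # retries exhausted; surface the last error
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
            attempt += 1
```

With `max_retries=3`, a chunk is attempted at most four times: the initial request plus three retries.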

Typical bulk request sizes by document size:

  • Small documents (< 1 KB): 500–2,000 documents per request
  • Medium documents (1–10 KB): 200–500 documents per request
  • Large documents (> 10 KB): 50–200 documents per request

Zeotap uses a default chunk size of 500 documents, which works well for typical use cases.
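
To see which size band a dataset falls into, one option is to measure the average serialized size of a sample of documents. The sketch below does that. The helper name is hypothetical, 1 KB is assumed to mean 1,024 bytes, and the size of the compact JSON encoding is used as a proxy for document size.

```python
import json


def suggested_docs_per_request(sample_docs):
    """Return the (min, max) documents-per-request band for a sample.

    Hypothetical helper based on the size bands listed above.
    """
    if not sample_docs:
        raise ValueError("sample_docs must not be empty")
    sizes = [
        len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))
        for doc in sample_docs
    ]
    average = sum(sizes) / len(sizes)
    if average < 1024:
        return (500, 2000)  # small documents
    if average <= 10 * 1024:
        return (200, 500)  # medium documents
    return (50, 200)  # large documents


if __name__ == "__main__":
    print(suggested_docs_per_request([{"email": "a@example.com", "plan": "pro"}]))
```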

Best Practices

  • Map _id explicitly for upsert and update modes. Without a document ID, OpenSearch auto-generates one, so later syncs cannot address the same document: upserts create duplicates and updates have nothing to target.
  • Create the index with explicit mappings before the first sync. While OpenSearch auto-creates indices with dynamic mapping, explicit mappings give you control over field types and analyzers.
  • Use Basic Auth or API key authentication depending on your cluster setup. For Amazon OpenSearch Service, use fine-grained access control with a dedicated user for sync operations.
  • Monitor cluster health during large syncs. The Bulk API can put significant load on the cluster, especially with large documents or high throughput.
  • Use upsert mode for most use cases. It is the most forgiving — it creates documents that don’t exist and replaces those that do.
  • Avoid insert mode unless you specifically need uniqueness enforcement. Insert mode fails if the document already exists, which can cause high failure rates on re-syncs.
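
For the explicit-mappings recommendation, the sketch below builds (but does not send) a create-index request using Python's standard library. The endpoint, index name, and field names are placeholders, and the helper name is hypothetical. The field types `keyword`, `integer`, and `date` are standard OpenSearch mapping types.

```python
import json
import urllib.request


def build_create_index_request(endpoint, index, properties, headers=None):
    """Build a PUT /<index> request with explicit field mappings.

    Hypothetical helper; pass auth headers via `headers` and send the
    result with urllib.request.urlopen() against a real cluster.
    """
    body = {"mappings": {"properties": properties}}
    return urllib.request.Request(
        url=f"{endpoint.rstrip('/')}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", **(headers or {})},
        method="PUT",
    )


if __name__ == "__main__":
    # Placeholder endpoint and fields for illustration only.
    request = build_create_index_request(
        "https://localhost:9200",
        "my-index",
        {
            "email": {"type": "keyword"},
            "age": {"type": "integer"},
            "signed_up_at": {"type": "date"},
        },
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Creating the index this way before the first sync means field types are fixed up front instead of being inferred by dynamic mapping from the first documents written.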

Troubleshooting

Authentication failed (401)

Verify your API key or username/password are correct. For API keys, ensure you are using the encoded value (Base64-encoded), not the raw id or api_key fields separately. For Amazon OpenSearch Service, verify that your fine-grained access control credentials are correct and the master user has not been changed.

Forbidden (403)

The authenticated user or API key lacks the required permissions. Ensure the credentials have write privileges on the target index. For Amazon OpenSearch Service, check the access policy and fine-grained access control role mappings.

Index not found (404)

If using insert or update mode, the index must exist before writing. Create the index manually or switch to upsert mode, which triggers auto-creation.

Mapper parsing exception

A field value does not match the index mapping, for example a string sent to a field mapped as integer. Check the index mapping and ensure your source data types are compatible. Consider using explicit field mappings in Zeotap to cast or rename fields.

Version conflict engine exception

This occurs in update mode when there is a concurrent write to the same document. Zeotap retries these automatically. If conflicts persist, check for other processes writing to the same index.

Circuit breaker exception

The cluster is running low on memory. Reduce the sync batch size, add more nodes to the cluster, or increase the JVM heap size. This is a cluster-capacity issue, not a Zeotap issue.

Connection timeout

The OpenSearch cluster is unreachable or slow to respond. Verify the endpoint URL, check network connectivity, and ensure the cluster is healthy. For Amazon OpenSearch Service, verify the domain has not been paused or deleted, and that VPC security groups allow inbound traffic on port 443.

Too many requests (429)

The cluster’s bulk queue is full. Zeotap retries with exponential backoff automatically. If 429 errors persist, consider reducing the sync frequency, increasing the cluster’s thread pool queue size, or scaling the cluster.
