OpenSearch Search Pipelines: Transforming Results Without Changing Your Clients

Most OpenSearch users think of search as a single request-response cycle: you send a query, OpenSearch executes it, and you get results back. But what if you could intercept and modify that journey at multiple points along the way? What if you could inject custom logic between the query execution and the response formatting, without changing a single line of client code?

That is exactly what OpenSearch Search Pipelines enable. Introduced as a framework for chaining request and response processors, search pipelines let you transform queries before they hit the index and reshape results before they reach your application. In this post, I will walk you through how they work, when to use them, and how to build your own custom processor.

Why Search Pipelines Exist

Before pipelines, if you needed to modify search behavior, you had a few options: rewrite the query client-side, use a plugin that hooks into the search flow, or abuse script fields for post-processing. Each approach had trade-offs. Client-side changes required updating every application that talked to OpenSearch. Plugin development meant writing Java, building a .zip artifact, and installing it on every node. Script fields were limited to per-document computations and ran at query time with performance overhead.

Search pipelines provide a middle ground: a declarative, configurable way to transform search requests and responses without writing a full plugin or touching client code. They are designed for common operational patterns:

Injecting default filters or boosting rules across all queries
Removing or masking sensitive fields from results (PII filtering)
A/B testing different ranking strategies by routing queries to different pipelines
Re-ranking results using an external model or custom scoring logic
Adding query metadata or telemetry without client awareness

The key insight is that pipelines are defined once at the cluster level and selected at query time, either per request or through an index's default pipeline setting. This means you can keep multiple pipelines for the same index and choose which one to apply based on your application context.

Pipeline Architecture: Three Types of Processors

A search pipeline is a named, ordered sequence of processors. Each processor is a unit of transformation that operates on one of three stages:

1. Request Processors

Request processors run after the query is parsed but before it is executed against the shards. They can modify the query structure, add filters, change sorting, or inject parameters. This is the ideal place for:

Adding tenant-aware filters in multi-tenant applications
Injecting time-range filters for time-series data
Applying default boost values based on user context
Rewriting query types for backward compatibility
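Most of these patterns come down to wrapping the incoming query rather than editing it in place. Here is a minimal sketch, assuming queries are modeled as JSON-like Java maps (a real processor works on OpenSearch's QueryBuilder objects instead). It wraps the original query in a bool query, keeping it as a must clause and adding the injected query as a filter clause, which is roughly what the built-in filter_query processor does. The FilterInjector class is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: wraps a query (modeled as a JSON-like Map) so an extra
// filter is always applied, the way a filter-injecting request processor behaves.
final class FilterInjector {

    static Map<String, Object> withFilter(Map<String, Object> originalQuery, Map<String, Object> filter) {
        Map<String, Object> bool = new LinkedHashMap<>();
        if (originalQuery != null) {
            // The original query still drives relevance scoring.
            bool.put("must", List.of(originalQuery));
        }
        // Filter clauses restrict matches without contributing to the score.
        bool.put("filter", List.of(filter));
        return Map.of("bool", bool);
    }

    public static void main(String[] args) {
        Map<String, Object> query = Map.of("match", Map.of("title", "OpenSearch"));
        Map<String, Object> recent = Map.of("range", Map.of("timestamp", Map.of("gte", "now-30d")));
        System.out.println(withFilter(query, recent));
    }
}
```

Putting the injected condition in filter context means it narrows the result set without changing how the user's own query scores documents.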

2. Response Processors

Response processors run after the shards return results but before the final response is serialized and sent to the client. They operate on the aggregated SearchResponse and can:

Remove or mask fields from the _source documents
Re-rank hits based on external signals
Add computed metadata to each hit
Truncate or paginate results differently
Filter out results based on post-processing logic
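Re-ranking in a response processor amounts to rescoring the hits already on the page and sorting them again. Here is a minimal sketch, assuming a hypothetical per-document click-through rate as the external signal and a simple linear blend with a fixed weight (one of many possible choices, not a built-in OpenSearch processor). The Hit record stands in for OpenSearch's SearchHit.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical re-ranking step: blend the engine score with an external signal.
final class HitReranker {

    record Hit(String id, double score) {}

    static List<Hit> rerank(List<Hit> hits, Map<String, Double> ctrById, double weight) {
        List<Hit> rescored = new ArrayList<>();
        for (Hit hit : hits) {
            // Documents without a signal get a neutral value of 0.
            double signal = ctrById.getOrDefault(hit.id(), 0.0);
            rescored.add(new Hit(hit.id(), (1 - weight) * hit.score() + weight * signal));
        }
        // Highest blended score first.
        rescored.sort(Comparator.comparingDouble(Hit::score).reversed());
        return rescored;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(new Hit("a", 2.0), new Hit("b", 1.5));
        // prints [Hit[id=b, score=3.25], Hit[id=a, score=1.0]]
        System.out.println(rerank(hits, Map.of("b", 5.0), 0.5));
    }
}
```

In practice the raw engine score and the external signal live on different scales, so you would normalize both before blending.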

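3. Search Phase Results Processors

Search phase results processors run between search phases on the coordinating node: after the shards return their query-phase results and before the fetch phase retrieves the documents. The best-known example is the normalization-processor used by hybrid search, which normalizes the scores produced by different query clauses and combines them into one ranking. They are the least understood of the three types.

Related but separate is the ext section of the search request body. Through a search extension registered by your plugin, you can add custom parameters that only your processors understand. Think of them as custom DSL elements for your pipeline. This is where you can implement:

Custom A/B testing parameters
External model invocation for neural re-ranking
Complex feature injection that does not fit standard query structures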

How Pipelines Are Defined and Applied

Pipelines are defined using the Search Pipeline API and stored in the cluster state. Here is what a simple pipeline definition looks like:

PUT /_search/pipeline/my-pipeline
{
  "description": "Restrict to recent documents and rename an internal field",
  "request_processors": [
    {
      "filter_query": {
        "query": {
          "range": {
            "timestamp": {
              "gte": "now-30d"
            }
          }
        }
      }
    }
  ],
  "response_processors": [
    {
      "rename_field": {
        "field": "internal_id",
        "target_field": "id"
      }
    }
  ]
}

To apply this pipeline to a search request, pass its name in the search_pipeline query parameter:

POST /my-index/_search?search_pipeline=my-pipeline
{
  "query": {
    "match": {
      "title": "OpenSearch"
    }
  }
}
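From a Java client, selecting the pipeline is just a query parameter on the search URL. Here is a minimal sketch using the JDK's built-in java.net.http client, assuming a cluster at a placeholder address that accepts unauthenticated requests (a secured cluster needs credentials and TLS settings). The PipelineSearch class and its method name are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class PipelineSearch {

    // Builds POST {host}/{index}/_search?search_pipeline={pipeline}
    static HttpRequest searchRequest(String host, String index, String pipeline, String jsonBody) {
        String uri = host + "/" + index + "/_search?search_pipeline="
            + URLEncoder.encode(pipeline, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(uri))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
            .build();
    }

    public static void main(String[] args) throws Exception {
        String body = "{\"query\":{\"match\":{\"title\":\"OpenSearch\"}}}";
        HttpRequest request = searchRequest("http://localhost:9200", "my-index", "my-pipeline", body);
        // Sending requires a running cluster at the placeholder address.
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```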

You can also set a default pipeline for an index:

PUT /my-index/_settings
{
  "index": {
    "search.default_pipeline": "my-pipeline"
  }
}

When a default pipeline is set, it applies to all search requests on that index unless a request names a different pipeline in search_pipeline or disables pipelines entirely with search_pipeline=_none.
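The precedence rule fits in a few lines. This is a simplified model written for illustration, not OpenSearch's actual resolution code: an explicit search_pipeline parameter wins, the reserved value _none (the name the OpenSearch documentation uses for opting out) disables pipelines, and otherwise the index default applies if one is set. The PipelineResolver class is hypothetical.

```java
import java.util.Optional;

// Simplified model of how a search picks its pipeline.
final class PipelineResolver {

    static final String NONE = "_none";

    // requestParam: value of ?search_pipeline=..., or null if absent
    // indexDefault: value of index.search.default_pipeline, or null if unset
    static Optional<String> resolve(String requestParam, String indexDefault) {
        String chosen = requestParam != null ? requestParam : indexDefault;
        if (chosen == null || chosen.equals(NONE)) {
            return Optional.empty(); // the search runs without a pipeline
        }
        return Optional.of(chosen);
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, "my-pipeline"));         // Optional[my-pipeline]
        System.out.println(resolve("_none", "my-pipeline"));      // Optional.empty
        System.out.println(resolve("ranking-v2", "my-pipeline")); // Optional[ranking-v2]
    }
}
```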

Building a Custom Processor: The Java Side

While OpenSearch ships with several built-in processors (such as filter_query, script, and rename_field), the real power comes from building custom ones. As someone who has worked on OpenSearch plugins, I can tell you that the processor API is significantly more approachable than writing a full SearchPlugin from scratch.

A custom processor requires three components:

1. The Processor Class

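This implements the transformation logic. For a response processor, you implement the SearchResponseProcessor interface; extending AbstractProcessor supplies the tag, description, and ignore_failure handling that every processor has to expose. Exact import paths shift a little between 2.x releases, so check them against your target version:

import java.io.IOException;
import java.util.Map;

import org.opensearch.action.search.SearchRequest;
import org.opensearch.action.search.SearchResponse;
import org.opensearch.common.xcontent.XContentFactory;
import org.opensearch.core.common.bytes.BytesReference; // org.opensearch.common.bytes in older 2.x releases
import org.opensearch.search.SearchHit;
import org.opensearch.search.pipeline.AbstractProcessor;
import org.opensearch.search.pipeline.SearchResponseProcessor;

public class PiiMaskingProcessor extends AbstractProcessor implements SearchResponseProcessor {
    private final String field;
    private final String mask;

    public PiiMaskingProcessor(String tag, String description, boolean ignoreFailure, String field, String mask) {
        super(tag, description, ignoreFailure);
        this.field = field;
        this.mask = mask;
    }

    @Override
    public SearchResponse processResponse(SearchRequest request, SearchResponse response) throws IOException {
        for (SearchHit hit : response.getHits().getHits()) {
            if (!hit.hasSource()) {
                continue; // _source is disabled or was excluded for this hit
            }
            Map<String, Object> source = hit.getSourceAsMap();
            if (source.containsKey(field)) {
                source.put(field, mask);
                // The response is serialized from the raw _source bytes, not from
                // getSourceAsMap(), so write the edited map back to the hit.
                hit.sourceRef(BytesReference.bytes(XContentFactory.jsonBuilder().map(source)));
            }
        }
        return response;
    }

    @Override
    public String getType() {
        return "pii_mask";
    }
}

2. The Factory

The factory creates processor instances from the JSON configuration:

import java.util.Map;

import org.opensearch.search.pipeline.Processor;
import org.opensearch.search.pipeline.SearchResponseProcessor;

public class PiiMaskingProcessorFactory implements Processor.Factory<SearchResponseProcessor> {
    @Override
    public SearchResponseProcessor create(
        Map<String, Processor.Factory<SearchResponseProcessor>> factories,
        String tag,
        String description,
        boolean ignoreFailure,
        Map<String, Object> config
        // newer releases also pass a Processor.PipelineContext here
    ) {
        // remove() consumes each option so that leftover keys can be reported
        // as unrecognized configuration instead of being silently ignored
        Object field = config.remove("field");
        if (!(field instanceof String)) {
            throw new IllegalArgumentException("[field] is required for the pii_mask processor");
        }
        Object mask = config.remove("mask");
        String maskValue = mask == null ? "***MASKED***" : mask.toString();
        return new PiiMaskingProcessor(tag, description, ignoreFailure, (String) field, maskValue);
    }
}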

3. Plugin Registration

In your plugin class, implement the SearchPipelinePlugin interface and register the processor factory under the name that pipeline definitions will use:

@Override
public Map<String, Processor.Factory<SearchResponseProcessor>> getResponseProcessors(
    Parameters parameters
) {
    // The map key is the processor name used inside "response_processors"
    return Map.of("pii_mask", new PiiMaskingProcessorFactory());
}

That is it. No custom REST endpoints, no transport actions, no cluster state manipulation. OpenSearch handles the pipeline execution, and your processor is called at the right point in the flow.

Real-World Use Cases

Let me share three patterns I have seen work well in production:

Use Case 1: Multi-Tenant Data Isolation

In a multi-tenant application, you need to ensure users only see documents belonging to their organization. Instead of every client query including a tenant_id filter, you can create a request processor that injects it automatically:

{
  "filter_query": {
    "query": {
      "term": {
        "tenant_id": "{{_user.tenant_id}}"
      }
    }
  }
}

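The {{_user.tenant_id}} placeholder is illustrative: filter_query applies a static query and does not expand templates, so a per-user value needs either one pipeline per tenant or a custom request processor that reads the tenant from the authenticated user. Also treat the pipeline as a convenience layer rather than a security boundary, because a client can skip a default pipeline with search_pipeline=_none. For guaranteed isolation, enforce the same rule with the security plugin's document-level security.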
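A custom request processor that derives the tenant from the authenticated user has to decide what happens when no tenant can be found. Here is a minimal sketch of that decision, assuming a hypothetical user-to-tenant lookup map and that the processor can obtain the authenticated user name from the request context. It fails closed: an unknown user produces an error, never an unfiltered search.

```java
import java.util.Map;

// Hypothetical core of a tenant-isolation request processor.
final class TenantFilter {

    // Returns the term filter to inject, or throws so the search fails closed.
    static Map<String, Object> tenantFilterFor(String authenticatedUser, Map<String, String> tenantByUser) {
        if (authenticatedUser == null) {
            throw new SecurityException("no authenticated user on the request");
        }
        String tenant = tenantByUser.get(authenticatedUser);
        if (tenant == null) {
            // Never fall back to searching without the tenant filter.
            throw new SecurityException("no tenant mapped for user " + authenticatedUser);
        }
        return Map.of("term", Map.of("tenant_id", tenant));
    }

    public static void main(String[] args) {
        Map<String, String> tenants = Map.of("alice", "acme");
        System.out.println(tenantFilterFor("alice", tenants)); // {term={tenant_id=acme}}
    }
}
```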

Use Case 2: PII Filtering for Different User Roles

Not every user should see all fields. A response processor can conditionally remove fields based on the caller's role:

Regular users see: name, email, department
Managers see: name, email, department, salary_band
Admins see: all fields including ssn, home_address

This requires a custom response processor, since the built-in ones have no notion of roles: it checks the authenticated user's roles in the request context and applies a field-level filter to each hit.
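The role tiers above translate naturally into an allow-list per role. Here is a minimal sketch, assuming hypothetical role names "manager" and "admin" and treating every other caller as a regular user. Using an allow-list rather than a deny-list means a newly added field stays hidden until someone explicitly grants it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical role-to-field policy for the three tiers described above.
final class RoleFieldFilter {

    static final Set<String> REGULAR = Set.of("name", "email", "department");
    static final Set<String> MANAGER = Set.of("name", "email", "department", "salary_band");

    // Returns a copy of the document containing only the fields the caller may see.
    static Map<String, Object> visibleSource(Map<String, Object> source, Set<String> roles) {
        if (roles.contains("admin")) {
            return new LinkedHashMap<>(source); // admins see every field
        }
        Set<String> allowed = roles.contains("manager") ? MANAGER : REGULAR;
        Map<String, Object> filtered = new LinkedHashMap<>();
        source.forEach((field, value) -> {
            if (allowed.contains(field)) {
                filtered.put(field, value);
            }
        });
        return filtered;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", "Ana");
        doc.put("email", "ana@example.com");
        doc.put("salary_band", "B3");
        doc.put("ssn", "000-00-0000");
        // prints {name=Ana, email=ana@example.com, salary_band=B3}
        System.out.println(visibleSource(doc, Set.of("manager")));
    }
}
```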

Use Case 3: A/B Testing Ranking Strategies

You can define two pipelines with different query boosts or re-ranking logic, then route a percentage of traffic to each:

Pipeline ranking-v1: standard BM25 with date decay
Pipeline ranking-v2: BM25 + custom field boost + click-through rate signal

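Your application assigns each request to one of the pipelines (for example, by bucketing users so each one consistently sees the same variant) and passes that name in search_pipeline; you then compare click-through rates between the two groups. Once that routing is in place, you can change either ranking strategy by updating its pipeline, without redeploying the application.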
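Deterministic bucketing keeps each user in one variant for the whole experiment, which makes the click-through comparison cleaner than a fresh coin flip per request. Here is a minimal sketch, assuming the pipeline names from this example and a hypothetical percentage knob for how much traffic goes to ranking-v2.

```java
import java.util.Objects;

// Hypothetical traffic splitter for the ranking-v1 / ranking-v2 experiment.
final class RankingExperiment {

    // percentV2: share of users (0-100) routed to ranking-v2.
    static String pipelineFor(String userId, int percentV2) {
        Objects.requireNonNull(userId, "userId");
        // String.hashCode is specified by the JDK, so a user lands in the same bucket
        // on every request and every JVM; floorMod keeps the bucket in [0, 99].
        int bucket = Math.floorMod(userId.hashCode(), 100);
        return bucket < percentV2 ? "ranking-v2" : "ranking-v1";
    }

    public static void main(String[] args) {
        // Pass the result as ?search_pipeline=... on the search request.
        for (String user : new String[] {"u-1", "u-2", "u-3"}) {
            System.out.println(user + " -> " + pipelineFor(user, 20));
        }
    }
}
```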

Performance Considerations

Search pipelines add overhead, but it is usually small compared to query execution itself. Request processors run on the coordinating node before the request fans out to the shards, so they execute once per search, not once per shard. Response processors run after the coordinating node has merged the shard results, so they see only the final page of hits (at most size of them), not every matching document.

That said, there are pitfalls to avoid:

Deep response processing: If your response processor iterates through all aggregation buckets or performs external API calls, latency will spike. Keep response processors lightweight.
Complex request rewrites: Request processors that drastically expand query complexity can increase shard execution time. Profile your queries before and after pipeline application.
Memory pressure: Processors that materialize large result sets in memory can cause heap pressure. Be especially careful with size values above 1000 when using response processors.

Limitations and Gotchas

Search pipelines are powerful, but they are not a replacement for every plugin use case. Here is where they fall short:

No custom query types: You cannot use a pipeline to add a new query type to the query DSL (one backed by its own Lucene query). For that, you still need a SearchPlugin with a custom QuerySpec.
No shard-level interception: Processors run at the coordinating node level. If you need to modify how individual shards execute queries, you need a different extension point.
No transport layer access: Pipelines transform the search request and response on the coordinating node. They cannot intercept or modify inter-node communication.
Order matters: Processors execute in the order they are defined. If you have a filter_query processor followed by a custom request processor that expects the original query structure, you may get unexpected results.
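A toy model makes the ordering problem concrete. Here is a minimal sketch, assuming queries are JSON-like maps and processors are plain functions over them: one processor wraps the query in a bool filter the way filter_query does, and a hypothetical second processor rewrites a top-level match query into match_phrase. Run after the wrap, the rewrite no longer finds a top-level match and silently does nothing.

```java
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Toy model: queries are JSON-like maps, processors are functions over them.
final class ProcessorOrder {

    // Wraps any query in bool { must: [query], filter: [recent] }.
    static Map<String, Object> addFilter(Map<String, Object> query) {
        Map<String, Object> recent = Map.of("range", Map.of("timestamp", Map.of("gte", "now-30d")));
        Map<String, Object> bool = Map.of("must", List.of(query), "filter", List.of(recent));
        return Map.of("bool", bool);
    }

    // Hypothetical rewrite: turns a top-level match into match_phrase, else no-op.
    static Map<String, Object> matchToPhrase(Map<String, Object> query) {
        return query.containsKey("match") ? Map.of("match_phrase", query.get("match")) : query;
    }

    static Map<String, Object> run(Map<String, Object> query, List<UnaryOperator<Map<String, Object>>> pipeline) {
        Map<String, Object> current = query;
        for (UnaryOperator<Map<String, Object>> processor : pipeline) {
            current = processor.apply(current); // each processor sees the previous one's output
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> query = Map.of("match", Map.of("title", "OpenSearch"));
        List<UnaryOperator<Map<String, Object>>> rewriteFirst =
            List.of(ProcessorOrder::matchToPhrase, ProcessorOrder::addFilter);
        List<UnaryOperator<Map<String, Object>>> filterFirst =
            List.of(ProcessorOrder::addFilter, ProcessorOrder::matchToPhrase);
        System.out.println(run(query, rewriteFirst)); // contains match_phrase
        System.out.println(run(query, filterFirst));  // rewrite silently skipped
    }
}
```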

When to Use Pipelines vs Plugins vs Client Logic

Use Case	Search Pipeline	Full Plugin	Client Logic
Add default filters	Perfect	Overkill	Fragile
Mask sensitive fields	Perfect	Overkill	Risky
A/B test ranking	Perfect	Possible	Complex
Custom query DSL	Not possible	Required	Workaround
Shard-level optimization	Not possible	Required	Not possible
External API calls per hit	Careful	Better	Natural
Complex multi-request workflows	Not possible	Possible	Natural

The Bottom Line

Search pipelines are one of the most practical additions to OpenSearch in recent releases. They fill a gap between "configure it in the query" and "write a Java plugin." For operational concerns like filtering, masking, and result transformation, they are the right tool for the job.

If you are running OpenSearch 2.9 or later, you already have access to the built-in processors. If you need something custom, the processor API is an approachable entry point into OpenSearch plugin development, far simpler than writing a full SearchPlugin or ActionPlugin from scratch.

Start with a simple response processor that removes an internal field from your results. Once you see how cleanly it integrates, you will find yourself reaching for pipelines more often than you expect.

I'm Prithvi S, Staff Software Engineer at Cloudera and Open-source enthusiast. I work on OpenSearch plugins and search relevance. Follow my work on GitHub: https://github.com/iprithv
