Most OpenSearch users think of search as a single request-response cycle: you send a query, OpenSearch executes it, and you get results back. But what if you could intercept and modify that journey at multiple points along the way? What if you could inject custom logic between the query execution and the response formatting, without changing a single line of client code?
That is exactly what OpenSearch Search Pipelines enable. Introduced as a framework for chaining request and response processors, search pipelines let you transform queries before they hit the index and reshape results before they reach your application. In this post, I will walk you through how they work, when to use them, and how to build your own custom processor.
Why Search Pipelines Exist
Before pipelines, if you needed to modify search behavior, you had a few options: rewrite the query client-side, use a plugin that hooks into the search flow, or abuse script fields for post-processing. Each approach had trade-offs. Client-side changes required updating every application that talked to OpenSearch. Plugin development meant writing Java, building a .zip artifact, and installing it on every node. Script fields were limited to script-based transformations and ran at query time with performance overhead.
Search pipelines provide a middle ground: a declarative, configurable way to transform search requests and responses without writing a full plugin or touching client code. They are designed for common operational patterns:
- Injecting default filters or boosting rules across all queries
- Removing or masking sensitive fields from results (PII filtering)
- A/B testing different ranking strategies by routing queries to different pipelines
- Re-ranking results using an external model or custom scoring logic
- Adding query metadata or telemetry without client awareness
The key insight is that pipelines are registered at the cluster level and selected at query time, either per request or through an index's default pipeline setting. This means you can have multiple pipelines available for the same index and choose which one to apply based on your application context.
Pipeline Architecture: Three Types of Processors
A search pipeline is a named, ordered sequence of processors. Each processor is a unit of transformation that operates on one of three stages:
1. Request Processors
Request processors run after the query is parsed but before it is executed against the shards. They can modify the query structure, add filters, change sorting, or inject parameters. This is the ideal place for:
- Adding tenant-aware filters in multi-tenant applications
- Injecting time-range filters for time-series data
- Applying default boost values based on user context
- Rewriting query types for backward compatibility
2. Response Processors
Response processors run after the shards return results but before the final response is serialized and sent to the client. They operate on the aggregated SearchResponse and can:
- Remove or mask fields from the _source documents
- Re-rank hits based on external signals
- Add computed metadata to each hit
- Truncate or paginate results differently
- Filter out results based on post-processing logic
3. Search Ext Processors
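Search extensions (SearchExt) are the most powerful and least understood piece. They are not a separate processor list in the pipeline definition (the third list a pipeline accepts is phase_results_processors, which run on the coordinating node between search phases). Instead, they let you add custom sections under ext in the search request body that are parsed by your plugin and read by your processors. Think of them as custom DSL elements that only your processor understands. This is where you can implement: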
- Custom A/B testing parameters
- External model invocation for neural re-ranking
- Complex feature injection that does not fit standard query structures
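To make the ordering concrete, here is a minimal plain-Java sketch of the control flow described above. It is a toy model, not the OpenSearch API: PipelineSketch, its method names, and the use of maps for requests and hits are all assumptions made for illustration. Only the request and response stages are modeled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Toy model of a search pipeline (hypothetical names, not OpenSearch classes).
public class PipelineSketch {
    private final List<UnaryOperator<Map<String, Object>>> requestProcessors = new ArrayList<>();
    private final List<UnaryOperator<List<Map<String, Object>>>> responseProcessors = new ArrayList<>();

    public PipelineSketch addRequestProcessor(UnaryOperator<Map<String, Object>> processor) {
        requestProcessors.add(processor);
        return this;
    }

    public PipelineSketch addResponseProcessor(UnaryOperator<List<Map<String, Object>>> processor) {
        responseProcessors.add(processor);
        return this;
    }

    // "search" stands in for query execution across the shards.
    public List<Map<String, Object>> run(
        Map<String, Object> request,
        Function<Map<String, Object>, List<Map<String, Object>>> search
    ) {
        Map<String, Object> current = new LinkedHashMap<>(request);
        for (UnaryOperator<Map<String, Object>> processor : requestProcessors) {
            current = processor.apply(current); // 1. rewrite the request, in order
        }
        List<Map<String, Object>> hits = search.apply(current); // 2. execute once
        for (UnaryOperator<List<Map<String, Object>>> processor : responseProcessors) {
            hits = processor.apply(hits); // 3. reshape the merged hits, in order
        }
        return hits;
    }
}
```

The point of the model is that the client only ever calls run: everything the processors do happens between the request arriving and the response leaving.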
How Pipelines Are Defined and Applied
Pipelines are defined using the Search Pipeline API and stored in the cluster state. Here is what a simple pipeline definition looks like:
PUT /_search/pipeline/my-pipeline
{
  "description": "Mask sensitive fields and boost recent documents",
  "request_processors": [
    {
      "filter_query": {
        "query": {
          "range": {
            "timestamp": {
              "gte": "now-30d"
            }
          }
        }
      }
    }
  ],
  "response_processors": [
    {
      "rename_field": {
        "field": "internal_id",
        "target_field": "id"
      }
    }
  ]
}
To apply this pipeline to a search request, pass its name in the search_pipeline query parameter:
POST /my-index/_search?search_pipeline=my-pipeline
{
  "query": {
    "match": {
      "title": "OpenSearch"
    }
  }
}
You can also set a default pipeline for an index:
PUT /my-index/_settings
{
  "index": {
    "search.default_pipeline": "my-pipeline"
  }
}
When a default pipeline is set, it applies to all search requests on that index unless the request names a different pipeline or explicitly disables it with search_pipeline=_none.
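The precedence between the request parameter and the index default can be sketched in a few lines of plain Java. This is a simplified model, not OpenSearch's internal resolution code: the class and method names are hypothetical, and it assumes the sentinel value _none that current OpenSearch releases use to mean "no pipeline".

```java
import java.util.Optional;

// Simplified model of pipeline selection (hypothetical helper, not an OpenSearch class).
public class PipelineResolution {
    public static final String NONE = "_none";

    // requestParam: value of ?search_pipeline=..., or null when the parameter is absent.
    // indexDefault: value of index.search.default_pipeline, or null when unset.
    public static Optional<String> resolve(String requestParam, String indexDefault) {
        // An explicit request parameter always wins over the index default.
        String chosen = requestParam != null ? requestParam : indexDefault;
        if (chosen == null || NONE.equals(chosen)) {
            return Optional.empty(); // run the search without any pipeline
        }
        return Optional.of(chosen);
    }
}
```

One consequence worth noticing: any client that controls the request parameters can opt out of the default pipeline, which matters if you rely on a pipeline for anything security-related.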
Building a Custom Processor: The Java Side
While OpenSearch ships with several built-in processors (including filter_query, script, and rename_field), the real power comes from building custom ones. As someone who has worked on OpenSearch plugins, I can tell you that the processor API is significantly more approachable than writing a full SearchPlugin from scratch.
A custom processor requires three components:
1. The Processor Class
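This implements the transformation logic. For a response processor, you implement SearchResponseProcessor:
// Imports omitted: several of these classes changed packages across 2.x releases.
public class PiiMaskingProcessor implements SearchResponseProcessor {
    private final String field;
    private final String mask;

    public PiiMaskingProcessor(String field, String mask) {
        this.field = field;
        this.mask = mask;
    }

    @Override
    public SearchResponse processResponse(SearchRequest request, SearchResponse response) throws Exception {
        // Iterate through hits and mask the sensitive field
        for (SearchHit hit : response.getHits().getHits()) {
            if (!hit.hasSource()) {
                continue;
            }
            Map<String, Object> source = new LinkedHashMap<>(hit.getSourceAsMap());
            if (source.containsKey(field)) {
                source.put(field, mask);
                // Editing the parsed map alone does not change the serialized _source,
                // so write the masked document back onto the hit.
                XContentBuilder builder = XContentFactory.jsonBuilder();
                builder.map(source);
                hit.sourceRef(BytesReference.bytes(builder));
            }
        }
        return response;
    }

    @Override
    public String getType() {
        return "pii_mask";
    }

    // The remaining Processor accessors (tag, description, and so on) are omitted for brevity.
}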
2. The Factory
The factory creates processor instances from the JSON configuration:
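public class PiiMaskingProcessorFactory implements Processor.Factory<SearchResponseProcessor> {
    @Override
    public SearchResponseProcessor create(
        Map<String, Processor.Factory<SearchResponseProcessor>> factories,
        String tag,
        String description,
        boolean ignoreFailure,
        Map<String, Object> config
    ) {
        // remove() consumes each key, so any leftover entries can be flagged as unsupported.
        String field = (String) config.remove("field");
        if (field == null) {
            throw new IllegalArgumentException("[field] is required for the pii_mask processor");
        }
        // "mask" is optional and falls back to a default placeholder.
        Object mask = config.remove("mask");
        return new PiiMaskingProcessor(field, mask == null ? "***MASKED***" : (String) mask);
    }
}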
3. Plugin Registration
In your plugin's SearchPipelinePlugin implementation, register the processor factory:
@Override
public Map<String, Processor.Factory<SearchResponseProcessor>> getResponseProcessors(Parameters parameters) {
    // The map key is the processor name used in pipeline definitions.
    Map<String, Processor.Factory<SearchResponseProcessor>> processors = new HashMap<>();
    processors.put("pii_mask", new PiiMaskingProcessorFactory());
    return processors;
}
That is it. No custom REST endpoints, no transport actions, no cluster state manipulation. OpenSearch handles the pipeline execution, and your processor is called at the right point in the flow.
Real-World Use Cases
Let me share three patterns I have seen work well in production:
Use Case 1: Multi-Tenant Data Isolation
In a multi-tenant application, you need to ensure users only see documents belonging to their organization. Instead of every client query including a tenant_id filter, you can create a request processor that injects it automatically:
{
  "filter_query": {
    "query": {
      "term": {
        "tenant_id": "{{_user.tenant_id}}"
      }
    }
  }
}
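The {{_user.tenant_id}} placeholder is illustrative: the built-in filter_query processor takes a static query, so resolving the tenant from the security plugin's user context requires a custom request processor. With that in place, tenant isolation no longer depends on the client remembering the filter, provided clients cannot override the pipeline through the search_pipeline parameter.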
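Whatever resolves the tenant, the rewrite itself is simple: wrap the caller's query in a bool query and add the tenant term as a filter clause. Here is a minimal sketch of that rewrite using plain maps in place of OpenSearch query builders. TenantFilterSketch and withTenantFilter are hypothetical names, and how tenantId is obtained is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Query rewrite modeled with plain maps (hypothetical helper, not an OpenSearch class).
public class TenantFilterSketch {
    // Returns {"bool": {"must": [originalQuery], "filter": [{"term": {"tenant_id": tenantId}}]}}.
    public static Map<String, Object> withTenantFilter(Map<String, Object> originalQuery, String tenantId) {
        Map<String, Object> term = Map.of("term", Map.of("tenant_id", tenantId));
        Map<String, Object> bool = new LinkedHashMap<>();
        if (originalQuery != null) {
            bool.put("must", List.of(originalQuery)); // keep the caller's query and its scoring
        }
        bool.put("filter", List.of(term)); // filter clauses restrict results without affecting scores
        return Map.of("bool", bool);
    }
}
```

Putting the tenant term in filter rather than must keeps relevance scores identical to what the original query would have produced on the tenant's own documents.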
Use Case 2: PII Filtering for Different User Roles
Not every user should see all fields. A response processor can conditionally remove fields based on the caller's role:
- Regular users see: name, email, department
- Managers see: name, email, department, salary_band
- Admins see: all fields, including ssn and home_address
This is implemented by checking the authenticated user's roles in the request context and applying a field-level filter in the response processor.
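The field-level filter at the heart of this pattern can be sketched without any OpenSearch classes. In the example below, the role names (user, manager, admin) and the RoleFieldFilter class are assumptions for illustration; in a real processor the role would come from the authenticated request context and the map would be each hit's _source.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Role-based field allowlist (hypothetical helper; role names are assumed).
public class RoleFieldFilter {
    public static final Map<String, Set<String>> VISIBLE_FIELDS = Map.of(
        "user", Set.of("name", "email", "department"),
        "manager", Set.of("name", "email", "department", "salary_band")
    );

    public static Map<String, Object> filter(Map<String, Object> source, String role) {
        if ("admin".equals(role)) {
            return new LinkedHashMap<>(source); // admins see every field
        }
        // Unknown or missing roles get an empty allowlist, so they see nothing.
        Set<String> allowed = role == null ? Set.<String>of() : VISIBLE_FIELDS.getOrDefault(role, Set.of());
        Map<String, Object> visible = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            if (allowed.contains(entry.getKey())) {
                visible.put(entry.getKey(), entry.getValue());
            }
        }
        return visible;
    }
}
```

I prefer an allowlist over a blocklist here: a newly added sensitive field stays hidden by default instead of leaking until someone remembers to block it.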
Use Case 3: A/B Testing Ranking Strategies
You can define two pipelines with different query boosts or re-ranking logic, then route a percentage of traffic to each:
- Pipeline ranking-v1: standard BM25 with date decay
- Pipeline ranking-v2: BM25 + custom field boost + click-through rate signal
Your application picks the pipeline per request, for example at random, and you measure the click-through rate difference. Switching strategies only changes the search_pipeline parameter; the query body and the rest of the application code stay the same.
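For the client-side selection, a common alternative to a per-request coin flip is hashing a stable identifier, so the same user always lands in the same arm of the test. The sketch below takes that approach; it is a different technique from pure random selection, and AbRouter, choosePipeline, and the percentV2 parameter are hypothetical names.

```java
// Deterministic traffic split between two pipelines (hypothetical helper).
public class AbRouter {
    // percentV2: share of users, from 0 to 100, routed to ranking-v2.
    public static String choosePipeline(String userId, int percentV2) {
        // floorMod keeps the bucket in [0, 99] even when hashCode() is negative.
        int bucket = Math.floorMod(userId.hashCode(), 100);
        return bucket < percentV2 ? "ranking-v2" : "ranking-v1";
    }
}
```

The returned name is what the application would send as the search_pipeline parameter. Keeping a user in one arm avoids mixing both rankings into a single session, which makes click-through comparisons easier to interpret.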
Performance Considerations
Search pipelines add overhead, but it is usually minimal compared to the query execution itself. Request processors run before the scatter-gather phase, so they only execute once per query, not per shard. Response processors run after the coordinating node merges results, so they only see the hits that will be returned (at most size of them), not every match.
That said, there are pitfalls to avoid:
- Deep response processing: If your response processor iterates through all aggregation buckets or performs external API calls, latency will spike. Keep response processors lightweight.
- Complex request rewrites: Request processors that drastically expand query complexity can increase shard execution time. Profile your queries before and after pipeline application.
- Memory pressure: Processors that materialize large result sets in memory can cause heap pressure. Be especially careful with size values above 1000 when using response processors.
Limitations and Gotchas
Search pipelines are powerful, but they are not a replacement for every plugin use case. Here is where they fall short:
- No custom query types: You cannot use a pipeline to add a new query DSL element that Lucene understands. For that, you still need a SearchPlugin with a custom QuerySpec.
- No shard-level interception: Processors run at the coordinating node level. If you need to modify how individual shards execute queries, you need a different extension point.
- No transport layer access: Pipelines operate on the search request and response at the coordinating node. They cannot intercept or modify inter-node communication.
- Order matters: Processors execute in the order they are defined. If you have a filter_query processor followed by a script processor that expects the original query structure, you may get unexpected results.
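The ordering pitfall is easy to reproduce in miniature. The sketch below uses a different, hypothetical pair of processors than the one in the bullet above: one renames internal_id to id, the other masks id. The class and constant names are assumptions, and plain maps stand in for a hit's _source.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Shows that swapping two processors changes the result (hypothetical names).
public class ProcessorOrderSketch {
    // Renames internal_id to id, like the rename_field example earlier in this post.
    public static final UnaryOperator<Map<String, Object>> RENAME_INTERNAL_ID = source -> {
        Map<String, Object> out = new LinkedHashMap<>(source);
        if (out.containsKey("internal_id")) {
            out.put("id", out.remove("internal_id"));
        }
        return out;
    };

    // Masks the id field, but only if it exists when this processor runs.
    public static final UnaryOperator<Map<String, Object>> MASK_ID = source -> {
        Map<String, Object> out = new LinkedHashMap<>(source);
        if (out.containsKey("id")) {
            out.put("id", "***");
        }
        return out;
    };

    public static Map<String, Object> apply(
        Map<String, Object> source,
        List<UnaryOperator<Map<String, Object>>> processors
    ) {
        Map<String, Object> current = source;
        for (UnaryOperator<Map<String, Object>> processor : processors) {
            current = processor.apply(current); // strictly in list order
        }
        return current;
    }
}
```

With a source of {internal_id: "42"}, running the rename first yields a masked id, while running the mask first lets the raw value through, because id does not exist yet when the mask runs.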
When to Use Pipelines vs Plugins vs Client Logic
| Use Case | Search Pipeline | Full Plugin | Client Logic |
|---|---|---|---|
| Add default filters | Perfect | Overkill | Fragile |
| Mask sensitive fields | Perfect | Overkill | Risky |
| A/B test ranking | Perfect | Possible | Complex |
| Custom query DSL | Not possible | Required | Workaround |
| Shard-level optimization | Not possible | Required | Not possible |
| External API calls per hit | Careful | Better | Natural |
| Complex multi-request workflows | Not possible | Possible | Natural |
The Bottom Line
Search pipelines are one of the most practical additions to OpenSearch in recent releases. They fill a gap between "configure it in the query" and "write a Java plugin." For operational concerns like filtering, masking, and result transformation, they are the right tool for the job.
If you are running OpenSearch 2.9 or later, you already have access to the built-in processors. If you need something custom, the processor API is a gentle introduction to OpenSearch plugin development - much gentler than writing a full SearchPlugin or ActionPlugin from scratch.
Start with a simple response processor that removes an internal field from your results. Once you see how cleanly it integrates, you will find yourself reaching for pipelines more often than you expect.
I'm Prithvi S, Staff Software Engineer at Cloudera and Open-source enthusiast. I work on OpenSearch plugins and search relevance. Follow my work on GitHub: https://github.com/iprithv









