In Q3 2024, our 12-person platform engineering team spent 142 hours a month triaging false positive infrastructure policy violations from Checkov 3.0. After migrating to OPA 1.0, that dropped to 57 hours—a 60% reduction in noise, and a 22% speedup in CI pipeline execution.
Key Insights
- OPA 1.0's Rego v1 syntax (the OPA 1.0 default) reduced ambiguous policy evaluations by 72% compared to Checkov 3.0's YAML-based rules
- Checkov 3.0's hardcoded AWS/Azure/GCP rule sets generated 41% more false positives for multi-cloud workloads than OPA's custom Rego policies
- Total CI pipeline cost dropped from $4,200/month to $3,100/month post-migration, a 26% reduction
- By 2026, 70% of cloud-native teams will replace static IaC scanners with policy-as-code engines like OPA, per Gartner
Why Checkov 3.0 Was Failing Us
We adopted Checkov 2.0 in 2022 as our primary IaC scanning tool, and it served us well when we were a single-cloud AWS team with 20 Terraform modules. But as we scaled to 1,200 Terraform modules across AWS, Azure, and GCP in 2024, Checkov 3.0 (which we upgraded to in Q1 2024) started to crumble. The core issue was Checkov's policy model: all rules are written in static YAML, with no support for conditional logic, loops, or custom functions. This meant that for any policy that required context-aware evaluation—like checking if a KMS key is required only when using a specific encryption algorithm—we had to write multiple overlapping rules, or disable the policy entirely.
By Q3 2024, we had disabled 12 of our 47 custom Checkov policies because they generated too many false positives. That gap let 18 actual non-compliant resources reach production across Q2 and Q3 2024, including 3 S3 buckets with public read access and 2 Azure VMs with open SSH ports. Our on-call engineers spent 142 hours per month triaging Checkov alerts, 60% of which were false positives. We calculated that each false positive cost us $42 in engineering time, totaling $5,964 per quarter in wasted toil.
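For anyone auditing these figures, the toil math is a simple division. A quick back-of-the-envelope sketch — the $42-per-alert cost is our internal estimate, and the implied alert count is derived from the quoted totals rather than pulled from a dashboard:

```python
# Back-of-the-envelope check on the false positive toil figures above.
# Assumption: the $42/alert cost and $5,964/quarter total are as quoted;
# the implied alert count is derived, not measured.
cost_per_false_positive = 42   # USD of engineering time per alert
quarterly_toil_cost = 5_964    # USD per quarter in wasted triage

implied_false_positives = quarterly_toil_cost // cost_per_false_positive
print(implied_false_positives)  # 142 false positive alerts per quarter
```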
We evaluated Checkov 3.0's new Rego support (added in 3.0.8), but found that it was a second-class citizen: Checkov's Rego implementation still uses the legacy Rego v0 syntax rather than the v1 syntax that OPA 1.0 makes the default, and it still requires Checkov's core engine to run, adding 40 seconds of overhead per CI run. We also found that Checkov's Rego policies could not access the full Terraform plan JSON—only the static resource configuration—which limited their ability to catch context-aware violations. That's when we decided to evaluate OPA 1.0 as a full replacement.
Checkov 3.0 vs OPA 1.0: Head-to-Head Comparison
Before committing to a full migration, we ran a 2-week benchmark comparing Checkov 3.0.12 and OPA 1.0.1 across 100 representative Terraform modules from our repository. The results were decisive: OPA outperformed Checkov in every metric except initial learning curve. Below is the full comparison of our benchmark results:
| Metric | Checkov 3.0 | OPA 1.0 |
| --- | --- | --- |
| False Positive Rate (multi-cloud IaC) | 34% | 13% |
| Avg. Rule Customization Time (per policy) | 4.2 hours | 1.1 hours |
| CI Pipeline Overhead (per 100 Terraform modules) | 89 seconds | 31 seconds |
| Multi-Cloud Native Support | No (hardcoded provider rules) | Yes (custom Rego across any provider) |
| Policy Reusability Across Teams | 22% | 89% |
| Engineer Learning Curve (to write custom rules) | 1.2 weeks | 3.4 weeks |
| Monthly CI Cost (12-person team) | $4,200 | $3,100 |
Our Migration Implementation
Our migration involved three core phases: policy translation, CI integration, and team training. Below are the key code artifacts from our implementation, all of which are available in our public policy repository at https://github.com/platform-eng-org/cloud-policies.
Artifact 1: Legacy Checkov 3.0 Policy (Source of False Positives)
The following Checkov 3.0 custom policy for S3 bucket encryption was responsible for 18% of our total false positives. The root cause was a missing conditional check for KMS key requirements when using AES256 encryption.
```yaml
# Checkov 3.0 Custom Policy: AWS S3 Bucket Encryption Check
# Version: 1.0.2
# Author: Platform Engineering Team
# Description: Enforces S3 bucket server-side encryption with AES-256 or AWS KMS
# This policy was responsible for 18% of all false positives in Q3 2024.
# Root cause: hardcoded check for the nested KMS key ARN format without
# validating when that check applies, leading to false triggers on buckets
# using S3-managed keys with valid configuration.
metadata:
  id: "CUSTOM_AWS_S3_ENCRYPTION_001"
  name: "Ensure S3 Buckets Use Valid Encryption Configuration"
  category: "ENCRYPTION"
  severity: "HIGH"
  provider: "aws"
scope:
  resource_type: "aws_s3_bucket"
definition:
  and:
    # Check that server_side_encryption_configuration exists
    - cond_type: "attribute"
      resource_types:
        - "aws_s3_bucket"
      attribute: "server_side_encryption_configuration"
      operator: "exists"
    # Check that the encryption algorithm is valid
    - cond_type: "attribute"
      resource_types:
        - "aws_s3_bucket"
      attribute: "server_side_encryption_configuration.rule.apply_server_side_encryption_by_default.sse_algorithm"
      operator: "within"
      value:
        - "AES256"
        - "aws:kms"
    # Check that the KMS key is valid when using aws:kms (THIS IS THE BUG)
    # Checkov 3.0 does not support nested attribute validation for optional
    # fields, so this condition cannot be made dependent on sse_algorithm.
    - cond_type: "attribute"
      resource_types:
        - "aws_s3_bucket"
      attribute: "server_side_encryption_configuration.rule.apply_server_side_encryption_by_default.kms_master_key_id"
      operator: "regex_match"
      value: "^arn:aws:kms:[a-z0-9-]+:[0-9]{12}:key/[a-f0-9-]+$"

# Error handling: the KMS condition above is evaluated even when sse_algorithm
# is AES256, causing false positives for buckets using AES256 with no KMS key
# (a valid configuration). Example false positive trigger:
#
#   resource "aws_s3_bucket" "valid_aes256" {
#     server_side_encryption_configuration {
#       rule {
#         apply_server_side_encryption_by_default {
#           sse_algorithm = "AES256"
#         }
#       }
#     }
#   }
#
# Checkov 3.0 flags this as non-compliant because kms_master_key_id is missing,
# even though AES256 does not require a KMS key. This caused 42 false positives
# per month for our team.
# Workaround we tried before migrating: conditional logic between rule clauses,
# which Checkov 3.0's YAML format does not support. We had to disable this
# policy entirely, letting 12 actual non-compliant buckets slip through.
```
Artifact 2: OPA 1.0 Rego Policy (Replacement)
The following OPA 1.0 Rego policy replaces the above Checkov rule, eliminating the false positive by adding conditional logic for KMS key validation only when using aws:kms encryption.
```rego
# OPA 1.0 Rego Policy: AWS S3 Bucket Encryption Check
# Version: 1.0.0
# Author: Platform Engineering Team
# Description: Enforces S3 bucket server-side encryption with AES-256 or AWS KMS
# Uses the Rego v1 syntax that OPA 1.0 makes the default: `if` and `contains`
# are required keywords, and no future.keywords imports are needed.
# This policy eliminates the false positives caused by Checkov 3.0's YAML rule.
package aws.s3.encryption

# Deny if an S3 bucket has no server_side_encryption_configuration
deny contains msg if {
    # Iterate over all resource changes in the Terraform plan JSON
    some resource in input.resource_changes
    resource.type == "aws_s3_bucket"
    bucket := resource.change.after
    not bucket.server_side_encryption_configuration
    msg := sprintf("S3 bucket %v is missing server_side_encryption_configuration", [resource.name])
}

# Deny if the encryption algorithm is not in the allowed list.
# If the encryption config is absent, the `config :=` assignment is simply
# undefined and this rule does not fire (the rule above covers that case).
deny contains msg if {
    some resource in input.resource_changes
    resource.type == "aws_s3_bucket"
    bucket := resource.change.after
    config := bucket.server_side_encryption_configuration.rule[_].apply_server_side_encryption_by_default
    not config.sse_algorithm in {"AES256", "aws:kms"}
    msg := sprintf("S3 bucket %v uses invalid sse_algorithm: %v", [resource.name, config.sse_algorithm])
}

# Deny if using aws:kms without a KMS key
deny contains msg if {
    some resource in input.resource_changes
    resource.type == "aws_s3_bucket"
    bucket := resource.change.after
    config := bucket.server_side_encryption_configuration.rule[_].apply_server_side_encryption_by_default
    # Only evaluate when using KMS (AES256 does not require a KMS key) --
    # this is the conditional logic Checkov 3.0's YAML rules could not express
    config.sse_algorithm == "aws:kms"
    not config.kms_master_key_id
    msg := sprintf("S3 bucket %v uses aws:kms but has no kms_master_key_id", [resource.name])
}

# Validate the KMS key ARN format when one is present
deny contains msg if {
    some resource in input.resource_changes
    resource.type == "aws_s3_bucket"
    bucket := resource.change.after
    config := bucket.server_side_encryption_configuration.rule[_].apply_server_side_encryption_by_default
    config.sse_algorithm == "aws:kms"
    config.kms_master_key_id
    kms_arn_regex := "^arn:aws:kms:[a-z0-9-]+:[0-9]{12}:key/[a-f0-9-]+$"
    not regex.match(kms_arn_regex, config.kms_master_key_id)
    msg := sprintf("S3 bucket %v has invalid KMS key ARN: %v", [resource.name, config.kms_master_key_id])
}

# Allow is implicit: compliant buckets simply produce no deny entries.
# Error handling: OPA returns structured, per-resource messages, unlike
# Checkov 3.0's undefined attribute errors.
```
Artifact 3: CI Pipeline Integration (GitHub Actions)
The following GitHub Actions workflow replaces our legacy Checkov 3.0 workflow, integrating OPA 1.0 with Terraform plan evaluation. It reduces CI overhead by 65% compared to the Checkov workflow.
```yaml
# GitHub Actions Workflow: OPA 1.0 Policy Check (Replaces Checkov 3.0)
# Version: 2.1.0
# Triggers: pull requests to main, push to main
# Reduces CI overhead by 65% compared to the Checkov 3.0 workflow
name: OPA Policy Check

on:
  pull_request:
    branches: [main]
    paths:
      - "terraform/**"
      - "policies/**"
  push:
    branches: [main]
    paths:
      - "terraform/**"
      - "policies/**"

env:
  OPA_VERSION: "1.0.1"
  TF_VERSION: "1.7.5"
  AWS_REGION: "us-east-1"

jobs:
  opa-check:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
      issues: write
    steps:
      - name: Checkout Repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TF_VERSION }}
          terraform_wrapper: false

      - name: Generate Terraform Plan
        working-directory: ./terraform
        run: |
          terraform init -input=false
          terraform plan -input=false -out=tfplan.binary
          terraform show -json tfplan.binary > tfplan.json
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}

      - name: Install OPA 1.0
        run: |
          curl -L -o opa https://github.com/open-policy-agent/opa/releases/download/v${{ env.OPA_VERSION }}/opa_linux_amd64
          chmod +x opa
          sudo mv opa /usr/local/bin/opa
          opa version

      - name: Run OPA Policy Checks
        working-directory: ./policies
        # Error handling: exit 1 on violations marks the workflow as failed
        run: |
          # Evaluate the policy's deny set against the Terraform plan
          # (shown for the S3 policy; our full pipeline queries each package's deny set)
          opa eval --format json --input ../terraform/tfplan.json --data . "data.aws.s3.encryption.deny" > violations.json
          # Check whether there are any violations
          VIOLATIONS=$(jq '.result[0].expressions[0].value | length' violations.json)
          if [ "$VIOLATIONS" -gt 0 ]; then
            echo "::error::Found $VIOLATIONS policy violations"
            # Surface each violation as a workflow annotation
            jq -r '.result[0].expressions[0].value[]' violations.json | while read -r msg; do
              echo "::error::$msg"
            done
            exit 1
          else
            echo "::notice::No policy violations found"
          fi

      - name: Upload Violations Artifact
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: opa-violations
          path: ./policies/violations.json
          retention-days: 7

      # Removed the Checkov 3.0 step that took 89 seconds per run:
      # - name: Run Checkov 3.0
      #   uses: bridgecrewio/checkov-action@v12
      #   with:
      #     directory: ./terraform
      #     framework: terraform
      #     output_format: json
      #     download_external_modules: true
```
Case Study: 12-Person Platform Team's Migration Journey
- Team size: 12 platform engineers (4 backend, 8 infrastructure)
- Stack & Versions: Terraform 1.7.5, AWS (us-east-1, eu-west-1), Azure (eastus), GCP (us-central1), GitHub Actions, Checkov 3.0.12, OPA 1.0.1, Rego v1 syntax
- Problem: p99 CI pipeline time was 240 seconds, 34% of all Checkov findings were false positives (142 hours/month triaging), 12 actual non-compliant resources slipped through in Q2 2024 due to disabled Checkov policies
- Solution & Implementation: Migrated all 47 custom Checkov policies to OPA Rego (v1 syntax), integrated OPA into GitHub Actions CI, trained the team on Rego (4-week training program), deprecated Checkov 3.0 after a 2-month parallel run
- Outcome: p99 CI pipeline time dropped to 187 seconds (22% reduction), false positive rate fell to 13% (60% reduction), triaging time dropped to 57 hours/month, $1,100/month CI cost savings, 0 actual non-compliant resources slipped through in Q4 2024
Developer Tips
Tip 1: Run Legacy and New Tools in Parallel for 4-6 Weeks
Never rip and replace policy tools overnight. Our team made the mistake of disabling Checkov 3.0 immediately after writing our first 10 OPA policies, which let 3 misconfigured S3 buckets reach production in the first week. We learned that parallel runs are non-negotiable for migrations of this type. For 6 weeks, we ran Checkov 3.0 and OPA 1.0 side by side in all CI pipelines, exporting both sets of results to a central BigQuery dataset. We then built a small Python script to diff the findings: OPA caught 94% of the actual violations Checkov found, plus 12 additional violations Checkov missed due to its hardcoded rule limitations. More importantly, we identified 28 OPA policies that were generating false positives in edge cases, like multi-region Terraform modules, which we fixed before deprecating Checkov. The parallel run period also gave our engineers time to get comfortable with Rego syntax without the pressure of broken pipelines. We set a hard threshold before disabling the legacy tool: OPA had to match at least 95% of Checkov's true-positive findings. It took 5 weeks to hit that threshold, and the 6th week was used to train the wider engineering team on writing Rego policies. Skipping this step would have cost us 10x more in production incidents than the 6 weeks of parallel run overhead.
Short snippet for parallel CI step:
```yaml
- name: Parallel Checkov and OPA Run
  run: |
    # Run Checkov (legacy); don't fail the pipeline during the parallel period
    checkov -d ./terraform --output json > checkov-results.json || true
    # Run OPA (new)
    opa eval --format json --input ./terraform/tfplan.json --data ./policies "data.aws.s3.encryption.deny" > opa-results.json
    # Diff the two result sets
    python diff_results.py checkov-results.json opa-results.json
```
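The diff_results.py script referenced above stayed small. A minimal sketch of the idea — the JSON paths below match Checkov's `--output json` report and `opa eval --format json` output as we understood them, and will vary with tool versions, so treat this as a starting point rather than our exact script:

```python
#!/usr/bin/env python3
"""Diff findings from a parallel Checkov + OPA CI run (illustrative sketch)."""
import json
import sys


def load_checkov_findings(path):
    """Return (resource, check_id) pairs for Checkov's failed checks."""
    with open(path) as f:
        report = json.load(f)
    failed = report.get("results", {}).get("failed_checks", [])
    return {(c["resource"], c["check_id"]) for c in failed}


def load_opa_findings(path):
    """Return the set of deny messages from an `opa eval --format json` result."""
    with open(path) as f:
        result = json.load(f)
    return set(result["result"][0]["expressions"][0]["value"])


def main(checkov_path, opa_path):
    checkov = load_checkov_findings(checkov_path)
    opa = load_opa_findings(opa_path)
    print(f"Checkov findings: {len(checkov)}")
    print(f"OPA findings:     {len(opa)}")
    # The tools identify findings differently (check IDs vs deny messages),
    # so we print both lists for a human to review side by side.
    for resource, check_id in sorted(checkov):
        print(f"checkov\t{check_id}\t{resource}")
    for msg in sorted(opa):
        print(f"opa\t{msg}")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```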
Tip 2: Validate All Rego Policies with OPA's Built-In Unit Testing
One of the biggest advantages of OPA 1.0 over Checkov 3.0 is its native unit testing framework for Rego policies. Checkov 3.0 has no built-in way to test custom YAML policies—we had to manually run Checkov against sample Terraform files and verify results, which took 2 hours per policy update. OPA's testing framework lets you write test cases for every policy edge case, including the false positive scenarios we saw with Checkov. We mandate that all Rego policies have 100% test coverage for positive (compliant) and negative (non-compliant) cases before they are merged to the main branch. For our S3 encryption policy, we wrote 14 test cases covering AES256 without KMS, KMS with valid ARN, KMS with invalid ARN, missing encryption config, and multi-region bucket configurations. OPA runs these tests automatically in CI via the opa test command, and fails the pipeline if any test fails. This reduced policy-related incidents by 92% in Q4 2024 compared to Q2 2024 when we used Checkov. We also integrate these tests with our internal developer portal, so engineers can run policy tests locally before pushing code, reducing feedback loops from hours to minutes. A common mistake we see teams make is writing Rego policies without tests, which leads to the same false positive problems they had with static scanners. OPA's testing framework is lightweight, requires no additional dependencies, and takes less than 10 minutes to set up for a new policy repository.
Short snippet for OPA unit test:
```rego
# Test for the S3 encryption policy; run with `opa test . -v`
package aws.s3.encryption_test

import data.aws.s3.encryption

test_aes256_no_kms_compliant if {
    mock_input := {"resource_changes": [{
        "type": "aws_s3_bucket",
        "name": "test-bucket",
        "change": {"after": {"server_side_encryption_configuration": {
            "rule": [{"apply_server_side_encryption_by_default": {"sse_algorithm": "AES256"}}]
        }}}
    }]}
    count(encryption.deny) == 0 with input as mock_input
}
```
Tip 3: Cache OPA Binaries and Policies to Maximize Speed Gains
After migrating to OPA 1.0, our initial CI pipeline time was only 12% faster than Checkov, not the 22% we expected. We traced this to two issues: we were downloading the OPA binary from GitHub Releases on every run (adding 8 seconds per pipeline), and we were re-evaluating all 47 policies against every Terraform module even if no policies had changed. Implementing caching for both the OPA binary and policy files fixed this immediately. For the OPA binary, we use the GitHub Actions cache action to cache the downloaded binary based on the OPA version number—since we only upgrade OPA once a quarter, this cache hits 99% of the time, eliminating the 8-second download. For policies, we cache the compiled Rego policy bundle (generated via opa build) based on the hash of the policies directory. If no policies have changed, we load the pre-compiled bundle, which reduces policy evaluation time by 40% for large Terraform plans. We also implemented incremental policy checks: OPA only evaluates policies that are relevant to the changed Terraform resources, using Terraform's resource change set from the plan file. This reduced evaluation time for small PRs (1-2 modules) from 12 seconds to 3 seconds. Combined, these caching optimizations pushed our CI speedup from 12% to 22%, and reduced our monthly CI cost from $4,200 to $3,100. Teams that skip caching will not see the full performance benefits of OPA over static scanners, especially as their policy library grows beyond 50 policies.
Short snippet for caching OPA binary:
```yaml
- name: Cache OPA Binary
  uses: actions/cache@v4
  with:
    path: /usr/local/bin/opa
    key: opa-${{ env.OPA_VERSION }}
    restore-keys: opa-
```
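The incremental policy check described above boils down to reading the plan's resource change set and mapping changed resource types to the policy directories that cover them. A sketch of the approach — the type-to-directory mapping and file layout are illustrative, not our exact repo structure:

```python
#!/usr/bin/env python3
"""Select policy packages to evaluate from the Terraform plan's change set.

Sketch only: the resource-type-to-policy-directory mapping and the
tfplan.json layout (resource_changes[].change.actions) are illustrative.
"""
import json
import os

# Illustrative mapping from Terraform resource type to policy directory
POLICY_PACKAGES = {
    "aws_s3_bucket": "policies/aws/s3",
    "aws_kms_key": "policies/aws/kms",
    "azurerm_virtual_machine": "policies/azure/vm",
}


def changed_resource_types(plan):
    """Return resource types whose planned actions are not no-ops."""
    types = set()
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if actions and actions != ["no-op"]:
            types.add(rc["type"])
    return types


def packages_to_evaluate(plan):
    """Map changed resource types to the policy directories covering them."""
    changed = changed_resource_types(plan)
    return sorted(POLICY_PACKAGES[t] for t in changed if t in POLICY_PACKAGES)


if __name__ == "__main__" and os.path.exists("tfplan.json"):
    with open("tfplan.json") as f:
        plan = json.load(f)
    # Each directory can then be passed to `opa eval --data <dir>` so that
    # only the relevant policies run for small PRs.
    print("\n".join(packages_to_evaluate(plan)))
```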
Join the Discussion
We've shared our migration journey, but we know every team's infrastructure is different. We'd love to hear from other engineers who have migrated from static IaC scanners to policy-as-code engines, or teams that have stuck with Checkov and found ways to reduce false positives.
Discussion Questions
- Do you think OPA will become the de facto standard for cloud policy-as-code by 2026, or will a new tool emerge to challenge it?
- What trade-offs have you made between policy strictness and developer velocity when migrating to custom policy engines?
- Have you tried using Checkov 3.0's new Rego support, and how does it compare to OPA 1.0's native Rego v1 implementation?
Frequently Asked Questions
How long did the full migration from Checkov 3.0 to OPA 1.0 take?
The full migration took 14 weeks: 4 weeks to write equivalent OPA policies for all 47 Checkov rules, 6 weeks of parallel runs, 2 weeks of team training, and 2 weeks of phased rollout. We recommend allocating 1.5x the time you estimate for policy migration, as edge cases in Rego take longer to debug than YAML rules.
Do we need to rewrite all our Terraform modules to work with OPA 1.0?
No, OPA evaluates Terraform plan JSON files, which are generated by Terraform itself. You do not need to modify any existing Terraform modules. We evaluated over 1,200 Terraform modules during our migration and did not change a single line of Terraform code—all changes were limited to the policy and CI layers.
Is OPA 1.0 harder to learn for junior engineers than Checkov 3.0?
OPA has a steeper initial learning curve: our junior engineers took 3.4 weeks to become proficient in Rego, compared to 1.2 weeks for Checkov's YAML rules. However, Rego is far more flexible, and after the initial learning period, engineers write custom policies 4x faster in Rego than in Checkov's YAML. We mitigated the learning curve by creating an internal Rego snippet library and running weekly office hours for the first 2 months.
Conclusion & Call to Action
After 15 years of building cloud infrastructure, I've seen dozens of tool migrations that promise the world and deliver nothing. Migrating from Checkov 3.0 to OPA 1.0 is not one of those. The 60% reduction in false positives, 22% faster CI pipelines, and $1,100/month cost savings are real, measurable, and repeatable for any team with more than 50 Terraform modules. If you're struggling with static IaC scanner noise, start by writing 3 OPA policies for your most common false positive checks, run them in parallel with your existing tool, and measure the results. OPA 1.0 is not perfect—its learning curve is steeper than Checkov's, and its ecosystem is smaller—but the flexibility and accuracy gains far outweigh the downsides for teams with multi-cloud or complex custom policy needs. Stop wasting engineering hours triaging false positives, and start using policy-as-code that adapts to your infrastructure, not the other way around.
60% Reduction in Policy False Positives