Originally published on graycloudarch.com.
The workload account had passed every review. Provisioned with the same VPC module we'd used for six months. All defaults. No customizations needed.
Three months later, an audit flagged it: traffic from that account was bypassing the centralized inspection VPC. The Network Firewall wasn't seeing it. Direct path out through an internet gateway the module had created by default.
No error. No alert. The module did exactly what it was designed to do. We just hadn't designed it for this context.
That account had an IGW it never needed, because nobody explicitly told it not to create one.
The natural instinct, and where it breaks
The pull toward "batteries included" modules makes sense early. The network module creates the VPC, subnets, IGW, NAT gateways, and route tables: all of it. For a single-account setup, that's convenient.
The problem appears by account three, where some VPCs should have IGWs and some shouldn't. By account six — workload VPCs routing through a hub, an inspection VPC that owns the IGW and NAT, a sandbox account with direct access — you're forking modules, adding count = 0 overrides at the call site, or writing if/else logic at every deployment root. Each workaround is a signal that the module wasn't designed for multiple contexts.
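To make the workaround concrete, here is a rough sketch of what a spoke call site tends to look like against a batteries-included module. The module path and the variable names create_igw and nat_gateway_count are hypothetical, invented for illustration; they are not from the module discussed here.

```hcl
# Hypothetical "batteries included" module call.
# The source path and variable names below are illustrative assumptions.
module "workload_vpc" {
  source = "./modules/network-batteries-included"

  name     = "workloads-prod"
  vpc_cidr = "10.1.0.0/22"

  # Every spoke account has to remember these overrides.
  # Forget one and the module quietly builds a direct path to the internet.
  create_igw        = false
  nat_gateway_count = 0
}
```

The safe configuration depends on the caller remembering to switch things off, which is the failure mode the audit found.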
The fix is a design rule: if a resource is not universally needed, its creation variable defaults to false. The caller opts in explicitly.
variable "create_internet_gateway" {
  description = "Create an IGW and default route in the public route table. Defaults to false because workload VPCs use hub-and-spoke routing through the centralized inspection VPC for all egress."
  type        = bool
  default     = false
}

variable "create_nat_gateway" {
  description = "Create NAT Gateways for private subnet egress. Defaults to false for hub-and-spoke VPCs where egress routes through the TGW to the centralized egress/inspection VPC."
  type        = bool
  default     = false
}

variable "create_public_subnets" {
  description = "Create public subnets, route table, and route table associations. Defaults to false for hub-and-spoke design. Set to true only for hub VPCs that own an IGW."
  type        = bool
  default     = false
}
The description isn't documentation for its own sake. It explains why the default is false — the specific architectural constraint that makes true wrong for most callers. When someone reads it at the call site, they know whether their context matches the assumption.
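A description documents the assumption; a validation rule can enforce part of it. The sketch below is one possible way to reject an inconsistent combination of flags at plan time. It assumes Terraform 1.9 or later (earlier versions do not allow a validation rule to reference another variable) and assumes NAT gateways in this module live in the public subnets. The descriptions are shortened for the example.

```hcl
# Sketch only: cross-variable references in validation need Terraform 1.9+.
variable "create_public_subnets" {
  description = "Create public subnets, route table, and route table associations."
  type        = bool
  default     = false
}

variable "create_nat_gateway" {
  description = "Create NAT Gateways for private subnet egress."
  type        = bool
  default     = false

  validation {
    # Assumption: NAT gateways in this module are placed in the public subnets,
    # so enabling NAT without public subnets is a caller mistake.
    condition     = !var.create_nat_gateway || var.create_public_subnets
    error_message = "create_nat_gateway requires create_public_subnets = true."
  }
}
```

With both flags at their false defaults the condition passes, so workload callers never see the rule.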
What the call sites look like
The hub VPC — the inspection VPC that owns the Network Firewall — explicitly opts in. Workload VPCs call the module with no overrides:
module "inspection_vpc" {
  source = "../../../..//common/modules/network"

  name               = "inspection"
  vpc_cidr           = "10.0.0.0/22"
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]

  # These are true because this is the hub: explicit opt-in
  create_internet_gateway = true
  create_public_subnets   = true
  create_nat_gateway      = true
}

module "workload_vpc" {
  source = "../../../..//common/modules/network"

  name               = "workloads-prod"
  vpc_cidr           = "10.1.0.0/22"
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]

  # No opt-ins needed: all defaults are correct for hub-and-spoke
}
The workload_vpc call is safe to copy-paste for any new workload account. The security-relevant decisions are in the module, not scattered across caller configurations.
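For contrast, a sandbox account with direct internet access would opt in to only what it needs. This is a sketch: the module name, VPC name, and CIDR are made up for illustration, and it assumes a sandbox wants public subnets and an IGW but no NAT gateways.

```hcl
# Hypothetical sandbox VPC: name and CIDR are illustrative assumptions.
module "sandbox_vpc" {
  source = "../../../..//common/modules/network"

  name               = "sandbox"
  vpc_cidr           = "10.2.0.0/22"
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]

  # Direct internet access is a deliberate choice that shows up in review.
  create_internet_gateway = true
  create_public_subnets   = true
}
```

The two opt-in lines are the whole difference from a workload VPC, so a reviewer sees the exception at a glance.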
The resource count gate
Conditional creation only works if every resource that depends on the gated resource is also gated:
resource "aws_internet_gateway" "this" {
  count  = var.create_internet_gateway ? 1 : 0
  vpc_id = aws_vpc.this.id

  tags = merge(local.common_tags, {
    Name = "${var.name}-igw"
  })
}

# Routes that depend on the IGW must also be gated. This route also needs the
# public route table, which only exists when create_public_subnets is true.
resource "aws_route" "public_internet" {
  count                  = var.create_internet_gateway && var.create_public_subnets ? 1 : 0
  route_table_id         = aws_route_table.public[0].id
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = aws_internet_gateway.this[0].id
}
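The same gating carries over to the NAT gateways. The following is a minimal sketch, not the module's actual code: it assumes the module defines aws_subnet.public with one subnet per availability zone (gated on create_public_subnets), and the local name nat_count and the tag names are illustrative.

```hcl
# Assumption: aws_subnet.public exists with one subnet per AZ and is itself
# gated on var.create_public_subnets. Names here are illustrative.
locals {
  # NAT gateways sit in public subnets, so both gates must be open.
  nat_count = var.create_nat_gateway && var.create_public_subnets ? length(var.availability_zones) : 0
}

resource "aws_eip" "nat" {
  count  = local.nat_count
  domain = "vpc" # AWS provider v5+; older provider versions use vpc = true

  tags = merge(local.common_tags, {
    Name = "${var.name}-nat-eip-${count.index}"
  })
}

resource "aws_nat_gateway" "this" {
  count         = local.nat_count
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  tags = merge(local.common_tags, {
    Name = "${var.name}-nat-${count.index}"
  })

  # The NAT gateway needs the IGW in place before it can reach the internet.
  depends_on = [aws_internet_gateway.this]
}
```

Computing the count once in a local keeps the EIP and the NAT gateway from drifting apart if the gate condition changes later.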
Outputs have the same requirement:
output "internet_gateway_id" {
  value = var.create_internet_gateway ? aws_internet_gateway.this[0].id : null
}
A plan for a workload VPC shows zero IGW-related changes. Not suppressed; genuinely not there. The module doesn't create it or reference it, and the output evaluates to null.
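An alternative way to write the gated output, shown here as a sketch rather than the module's actual code, is the built-in one() function with a splat expression. It returns the single element when the IGW exists and null when the list is empty, so the output does not need to repeat the gate condition. The description text is an addition for illustration.

```hcl
# Alternative form of the same output: one() returns null for an empty list
# and the sole element for a one-item list.
output "internet_gateway_id" {
  description = "ID of the IGW, or null when create_internet_gateway is false."
  value       = one(aws_internet_gateway.this[*].id)
}
```

Either form works; this one stays correct if the resource's count expression later picks up a second condition.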
The same pattern, applied everywhere
Networking is the clearest example because the security stakes are visible, but the principle applies to every module type:
ALB module:
- enable_deletion_protection = false: dev environments don't need it; prod opts in
- enable_access_logs = false: caller enables when the S3 bucket for logs is ready
- enable_https_redirect = false: explicit, not assumed; avoids broken behavior on internal ALBs
Security baseline module:
- enable_guardduty = false, enable_security_hub = false, enable_config = false
- One module, two contexts: the bootstrap account enables everything; sandbox accounts enable nothing
- Without this: you're writing conditional logic at the call site for every new account type
Observability baseline:
- enable_cloudwatch_alarms = false, enable_container_insights = false
- Nonprod may or may not want alarms: the caller decides, not the module author
The pattern: if a resource is conditional on the deployment context, the module expresses that conditionality as a boolean defaulting to false.
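As a rough illustration of the same gate outside networking, a security baseline module could look something like the sketch below. The variable names come from the list above; the descriptions and the minimal resource arguments are assumptions, and a real baseline would carry more configuration.

```hcl
# Sketch of a security baseline module. Descriptions and resource
# arguments are illustrative assumptions, trimmed to the gating logic.
variable "enable_guardduty" {
  description = "Create a GuardDuty detector in this account. Defaults to false; the caller opts in."
  type        = bool
  default     = false
}

variable "enable_security_hub" {
  description = "Enable Security Hub in this account. Defaults to false; the caller opts in."
  type        = bool
  default     = false
}

resource "aws_guardduty_detector" "this" {
  count  = var.enable_guardduty ? 1 : 0
  enable = true
}

resource "aws_securityhub_account" "this" {
  count = var.enable_security_hub ? 1 : 0
}
```

A sandbox account calls this module with no arguments and gets nothing; the bootstrap account sets both flags to true.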
When to break it
Not every variable is a gate on resource creation. The rule doesn't apply to:
Configuration variables with opinionated defaults. instance_type = "t3.medium" should default to a sensible value. The question isn't "should we create this?" — the resource always exists, you're just setting its properties.
Required inputs with no safe default. vpc_cidr shouldn't have a default at all. Force the caller to declare it explicitly. A missing required input surfaces immediately; a wrong default doesn't.
Resources that must exist for the module to function. The VPC itself isn't gated — if the module is called, a VPC is created. If a resource is that foundational, don't hide it behind a boolean.
The line: create_* and enable_* variables gate resource existence. Configuration variables set properties of resources that always exist. Required inputs have no default.
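Side by side, the three kinds of variable might be declared like this. It is a sketch using the names from this post; the descriptions and the CIDR validation rule are additions for illustration.

```hcl
# 1. Gate: decides whether a resource exists. Defaults to false.
variable "create_internet_gateway" {
  description = "Create an IGW. Defaults to false; the caller opts in."
  type        = bool
  default     = false
}

# 2. Configuration: the resource always exists, so an opinionated default is fine.
variable "instance_type" {
  description = "Instance type for a resource the module always creates."
  type        = string
  default     = "t3.medium"
}

# 3. Required input: no default, so a missing value fails immediately.
variable "vpc_cidr" {
  description = "CIDR block for the VPC. Must be set by the caller."
  type        = string

  validation {
    # Illustrative check: cidrhost() errors on a malformed CIDR, can() turns that into false.
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "vpc_cidr must be a valid IPv4 CIDR block, for example 10.1.0.0/22."
  }
}
```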
What the audit actually fixed
The inspection gap in that workload account had existed for months. The fix was changing the module default to false and re-applying across all accounts.
Because every other resource in the module was already following this pattern, the re-apply was clean. Zero unexpected changes on correctly configured accounts, which is the second-order effect of this design rule: the module becomes safe to re-apply.
When everything that shouldn't exist defaults to not existing, terraform plan on a correctly configured account comes back empty. That emptiness is a signal you can rely on. It means the module isn't hiding state you didn't ask for.
That's harder to achieve if you're starting from "batteries included" defaults and trying to carve out exceptions. It's straightforward if you start from false and require callers to opt in.