Infrastructure as Code
If infrastructure cannot be recreated from code, it is not production infrastructure. IaC is not just automation — it is governance through code.
Why cloud infrastructure fails
Across organisations, infrastructure failures follow the same patterns. The root cause is almost never the cloud provider — it is the absence of engineering discipline around how infrastructure is defined, changed, and verified.
Manual changes made in cloud consoles
Engineers click through consoles to "fix things quickly." Changes are never recorded, never reviewed, and never reproducible. The system slowly diverges from any documented state.
Environments that drift over time
DEV, TEST, and PROD start identical and gradually diverge through undocumented exceptions. By the time bugs appear, no two environments are the same.
Undocumented exceptions during incidents
Emergency changes made under pressure are never reconciled back into code. Every incident leaves behind hidden configuration debt.
No clear ownership of infrastructure changes
Anyone with console access can change anything. Without version control, there is no record of who changed what, when, or why.
Infrastructure recreated differently each time
Each new environment is built from memory, tribal knowledge, or outdated runbooks. No two environments are truly equivalent.
Audits relying on screenshots and tribal knowledge
When auditors ask "what is running and why," the answer is screenshots, conversations, and guesswork. Compliance becomes a reconstruction exercise.
The consequences
- Unpredictable behaviour that cannot be reproduced in lower environments.
- Security gaps from misconfigured resources that nobody owns.
- Compliance exposure when controls cannot be evidenced.
- Failed disaster recovery because infrastructure cannot be recreated.
- Inability to onboard new engineers without significant knowledge transfer.
Infrastructure is software
If it is not versioned, tested, reviewed, and controlled, it is a liability. This principle applies equally to networks, compute, storage, identity, security policies, and monitoring.
What it actually requires
- Infrastructure definitions are source-controlled like application code.
- Changes are peer-reviewed before being applied.
- Environments are reproducible on demand from code alone.
- History and intent are preserved in version control.
- Rollback is possible because previous state is always known.
What must be codified
- Cloud accounts, subscriptions, and projects
- Network topology and segmentation
- Identity and access controls
- Compute and runtime platforms
- Storage and databases (where feasible)
- Security configurations
- Monitoring and alerting
- Backups and retention policies
Environment parity with controlled variance. Differences between environments are intentional, documented, and injected as configuration — never embedded in templates.
Local and branch integration
Same templates, lower-tier config
Automated pipeline execution
Identical to PROD topology, test data only
Stakeholder acceptance
PROD-equivalent config, controlled access
Live system
Promoted from UAT, no manual intervention
Non-negotiable environment rules
- Same IaC templates across all environments — no environment-specific modules.
- Environment-specific configuration injected separately, never baked into templates.
- No ad-hoc environment creation — all environments provisioned from code.
- Promotion through code, not clicks — every change enters through the pipeline.
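The rules above can be illustrated with a minimal sketch in which one shared template is rendered for every environment and only injected values differ. The names EnvConfig, render_stack, and topology, along with the resource layout, are hypothetical and chosen only for illustration; a real template would live in whichever IaC tool is in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvConfig:
    """Per-environment values, supplied from outside the template (hypothetical fields)."""
    name: str
    instance_size: str
    instance_count: int


def render_stack(cfg: EnvConfig) -> dict:
    """The single shared template: the topology is fixed, only values vary."""
    return {
        "network": {"name": f"{cfg.name}-net", "subnets": ["app", "data"]},
        "compute": {
            "name": f"{cfg.name}-app",
            "size": cfg.instance_size,
            "count": cfg.instance_count,
            "subnet": "app",
        },
        "database": {"name": f"{cfg.name}-db", "subnet": "data"},
    }


def topology(stack: dict) -> dict:
    """Reduce a rendered stack to its shape: resource kinds and their attribute names."""
    return {kind: sorted(spec) for kind, spec in stack.items()}


# Two environments rendered from the same template with different injected values.
dev = render_stack(EnvConfig("dev", "small", 1))
prod = render_stack(EnvConfig("prod", "large", 4))
```

Comparing topology(dev) with topology(prod) gives a cheap parity check: if the shapes ever differ, environment logic has leaked into the template.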
Separation of concerns in IaC design
Foundational platform
Identity, networking, core security controls. Owned by platform team. Rarely changes.
Shared services
Logging, monitoring, secret stores. Consumed by all workloads. Versioned independently.
Application infrastructure
Compute, databases, queues per workload. Owned by product teams using platform modules.
Environment configuration
Variable values injected at apply time. Never hardcoded, always externally sourced.
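One way to keep values externally sourced is to read them from the process environment at apply time and fail fast when one is missing, rather than falling back to a hard-coded default. This is a sketch under assumptions: the variable names ENVIRONMENT, REGION, and INSTANCE_COUNT and the load_config helper are hypothetical.

```python
import os

# Hypothetical set of values every apply must receive from outside the code.
REQUIRED = ("ENVIRONMENT", "REGION", "INSTANCE_COUNT")


def load_config(source=None) -> dict:
    """Build the apply-time configuration from an external mapping.

    Defaults to the process environment. Raises KeyError instead of
    substituting defaults, so nothing is silently hard-coded.
    """
    source = os.environ if source is None else source
    missing = [key for key in REQUIRED if not source.get(key)]
    if missing:
        raise KeyError("missing required configuration: " + ", ".join(missing))
    return {
        "environment": source["ENVIRONMENT"],
        "region": source["REGION"],
        "instance_count": int(source["INSTANCE_COUNT"]),
    }
```

Passing a plain dictionary as source makes the loader easy to exercise in tests without touching the real environment.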
All changes through code
All infrastructure changes must:
- Originate from version control: no change may start from a console.
- Be peer-reviewed before applying: infrastructure changes have the same review bar as application code.
- Be traceable to a request or incident: every change is linked to its business justification.
- Be deployed through controlled pipelines, not via manual terraform apply or CLI commands.
Emergency changes are allowed, but must be reconciled back into code immediately after resolution.
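These requirements could be enforced as a pipeline gate. The sketch below is illustrative only: the ChangeRequest fields and the violations function are hypothetical, and it assumes an emergency change may skip review and the pipeline but stays flagged until it has been committed back to version control.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    """Metadata a pipeline might hold about one infrastructure change (hypothetical)."""
    commit_sha: str = ""
    approved_by: list = field(default_factory=list)
    author: str = ""
    ticket: str = ""
    applied_via: str = ""  # "pipeline" or "manual"
    emergency: bool = False


def violations(change: ChangeRequest) -> list:
    """Return the rules this change breaks; an empty list means it may proceed."""
    problems = []
    if not change.ticket:
        problems.append("no linked request or incident")
    if change.emergency:
        # Emergency changes are allowed, but remain open until codified.
        if not change.commit_sha:
            problems.append("emergency change not yet reconciled into code")
        return problems
    if not change.commit_sha:
        problems.append("did not originate from version control")
    # Approval by the author alone does not count as peer review.
    if not any(reviewer != change.author for reviewer in change.approved_by):
        problems.append("no peer review")
    if change.applied_via != "pipeline":
        problems.append("not deployed through the pipeline")
    return problems
```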
Keeping code and reality in sync
- Restricted console access — read-only by default for non-platform roles.
- Policy enforcement — automated checks block non-compliant resources from being created.
- Automated drift detection — scheduled runs compare declared state against actual state.
- Regular reconciliation checks — any detected drift is treated as a defect requiring resolution.
Drift that goes undetected is treated as a security incident once it is discovered.
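At its core, drift detection is a comparison between declared state and actual state. The sketch below assumes both have already been loaded into dictionaries keyed by resource identifier; the detect_drift function and the sample resource names are hypothetical, and a real scheduled run would obtain the actual state from the cloud provider's API or the IaC tool's plan output.

```python
def detect_drift(declared: dict, actual: dict) -> dict:
    """Compare declared state with actual state, both keyed by resource id.

    Returns one finding per drifted resource; an empty dict means no drift.
    """
    drift = {}
    for resource_id in declared.keys() | actual.keys():
        want = declared.get(resource_id)
        have = actual.get(resource_id)
        if want is None:
            drift[resource_id] = "unmanaged: exists but is not declared in code"
        elif have is None:
            drift[resource_id] = "missing: declared in code but does not exist"
        elif want != have:
            # List the attributes whose values differ between code and reality.
            changed = sorted(
                key for key in want.keys() | have.keys()
                if want.get(key) != have.get(key)
            )
            drift[resource_id] = "modified: " + ", ".join(changed)
    return drift


# Illustrative data only.
declared = {
    "bucket-logs": {"encrypted": True, "public": False},
    "vm-app": {"size": "small"},
}
actual = {
    "bucket-logs": {"encrypted": True, "public": True},
    "vm-tmp": {"size": "large"},
}
findings = detect_drift(declared, actual)
```

Each entry in findings maps naturally onto a defect ticket, which fits the rule that detected drift requires resolution.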
Governance at scale
Clavon encodes governance rules into the platform so they are enforced before deployment, not reviewed after.
- Encryption must be enabled on all storage resources.
- Public access restrictions — no storage bucket or database exposed publicly without explicit approval.
- Resource naming conventions enforced through policy, not documentation.
- Region usage constraints — resources may only be deployed to approved regions.
- Tagging requirements — all resources must carry owner, environment, and cost-centre tags.
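The rules above lend themselves to policy-as-code checks that run before deployment. The following is a minimal sketch, not a description of any particular policy engine: the resource dictionary shape, the approved region list, and the naming pattern are all assumptions made for illustration.

```python
import re

# Assumed values for illustration; real ones come from the governance function.
APPROVED_REGIONS = {"eu-west-1", "eu-central-1"}
REQUIRED_TAGS = {"owner", "environment", "cost-centre"}
# Hypothetical convention: three lowercase segments, e.g. prod-billing-logs.
NAME_PATTERN = re.compile(r"[a-z]+-[a-z0-9]+-[a-z0-9]+")


def check_resource(resource: dict) -> list:
    """Return the policy failures for one planned resource (empty list means compliant)."""
    failures = []
    if resource.get("kind") == "storage" and not resource.get("encrypted", False):
        failures.append("encryption must be enabled")
    if resource.get("public", False) and not resource.get("public_access_approved", False):
        failures.append("public access requires explicit approval")
    if not NAME_PATTERN.fullmatch(resource.get("name", "")):
        failures.append("name violates naming convention")
    if resource.get("region") not in APPROVED_REGIONS:
        failures.append("region is not approved")
    missing = REQUIRED_TAGS - set(resource.get("tags", {}))
    if missing:
        failures.append("missing tags: " + ", ".join(sorted(missing)))
    return failures
```

A pipeline step would call check_resource on every planned resource and refuse to apply if any list is non-empty, which is what enforcing before deployment means in practice.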
IaC as DR foundation
A backup without reproducible infrastructure is incomplete. IaC is foundational to disaster recovery, not a nice-to-have.
- Infrastructure can be recreated — from code alone, without undocumented manual steps.
- Dependencies are documented: each resource's upstream and downstream dependencies are declared in code.
- Recovery steps are tested: DR exercises validate that recreation works in practice, not just in theory.
- RTO/RPO assumptions are validated — targets are agreed and tested, not assumed.
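When dependencies are declared explicitly, the order in which to recreate resources during recovery can be derived rather than remembered. This sketch uses the standard library graphlib module (Python 3.9 or later); the DEPENDS_ON graph and both helper functions are hypothetical examples.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency declarations: each resource maps to what it depends on.
DEPENDS_ON = {
    "network": set(),
    "identity": set(),
    "secret-store": {"identity"},
    "database": {"network", "secret-store"},
    "app": {"network", "database", "identity"},
}


def recreation_order(depends_on: dict) -> list:
    """Return an order in which every resource follows all of its dependencies.

    TopologicalSorter raises graphlib.CycleError if the declarations are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


def dependents_of(resource: str, depends_on: dict) -> set:
    """Return the resources that directly depend on the given one."""
    return {name for name, deps in depends_on.items() if resource in deps}


order = recreation_order(DEPENDS_ON)
```

A DR exercise can then walk the resulting order and time each step, which gives measured input for validating RTO assumptions.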
IaC in regulated and high-assurance contexts
IaC ownership and operating model
Platform team
Owns and maintains core IaC modules. Sets standards, reviews exceptions, governs the module registry.
Product teams
Consume approved modules to provision their workloads. May not build bespoke infrastructure outside the module registry without approval.
Governance
Defines what is and is not permissible. Reviews exception requests. Owns the policy-as-code rules.
What guarantees instability
Mixing environment logic into templates
Templates become unreadable, untestable, and impossible to promote. Environment differences should live in configuration, not modules.
Hard-coded secrets in IaC files
Credentials in version control are a security incident waiting to happen. All secrets must come from a secrets manager at apply time.
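A pre-commit or pipeline check can catch the most obvious cases before they reach version control. The patterns below are a deliberately small, assumed set for illustration and will miss many real secrets; the find_hardcoded_secrets function is hypothetical and does not replace a dedicated secret scanner.

```python
import re

# Illustrative patterns only: a quoted literal assigned to a secret-like name,
# and a PEM private key header. Values starting with "$" (interpolations) are skipped.
SECRET_PATTERNS = [
    re.compile(
        r"""(?i)(password|passwd|secret|api[_-]?key|token)\w*\s*[=:]\s*["'][^"'$\s]{8,}["']"""
    ),
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
]


def find_hardcoded_secrets(text: str) -> list:
    """Return the 1-based line numbers that look like they contain a literal secret."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SECRET_PATTERNS):
            hits.append(lineno)
    return hits
```

Lines that reference a variable or a secrets-manager lookup pass the check, because the value is resolved at apply time instead of being stored in the file.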
Manual hotfixes never codified
Every emergency change that is not reconciled back into code creates drift. The next time the environment is recreated, the hotfix is lost.
Over-engineered modules
Modules with dozens of variables become too complex to use, so teams work around them. Simplicity wins.
Lack of documentation
Without documentation, modules cannot be adopted safely: every consumer has to read the source code to understand what they are deploying.
Ignoring drift
Undetected drift destroys the foundation of IaC. If code no longer reflects reality, the entire value proposition is gone.
What we produce
- IaC reference architecture with module structure and ownership boundaries.
- Environment and promotion model with configuration injection strategy.
- Policy-as-code framework for governance at scale.
- Drift detection and prevention strategy with remediation procedures.
- Change governance model including review, approval, and audit trail.
- Audit-ready infrastructure evidence framework for compliance.
- Disaster recovery enablement through reproducible environment definitions.