Cloud, DevOps & Platform Engineering

Infrastructure as Code

If infrastructure cannot be recreated from code, it is not production infrastructure. IaC is not just automation — it is governance through code.

The Problem

Why cloud infrastructure fails

Across organisations, infrastructure failures follow the same patterns. The root cause is almost never the cloud provider — it is the absence of engineering discipline around how infrastructure is defined, changed, and verified.

Manual changes made in cloud consoles

Engineers click through consoles to "fix things quickly." Changes are never recorded, never reviewed, and never reproducible. The system slowly diverges from any documented state.

Environments that drift over time

DEV, TEST, and PROD start identical and gradually diverge through undocumented exceptions. By the time bugs appear, no two environments are the same.

Undocumented exceptions during incidents

Emergency changes made under pressure are never reconciled back into code. Every incident leaves behind hidden configuration debt.

No clear ownership of infrastructure changes

Anyone with console access can change anything. Without version control, there is no record of who changed what, when, or why.

Infrastructure recreated differently each time

Each new environment is built from memory, tribal knowledge, or outdated runbooks. No two environments are truly equivalent.

Audits relying on screenshots and tribal knowledge

When auditors ask "what is running and why," the answer is screenshots, conversations, and guesswork. Compliance becomes a reconstruction exercise.

The consequences

  • Unpredictable behaviour that cannot be reproduced in lower environments.
  • Security gaps from misconfigured resources that nobody owns.
  • Compliance exposure when controls cannot be evidenced.
  • Failed disaster recovery because infrastructure cannot be recreated.
  • Inability to onboard new engineers without significant knowledge transfer.

Principle

Infrastructure is software

If it is not versioned, tested, reviewed, and controlled, it is a liability. This principle applies equally to networks, compute, storage, identity, security policies, and monitoring.

What IaC means

What it actually requires

  • Infrastructure definitions are source-controlled like application code.
  • Changes are peer-reviewed before being applied.
  • Environments are reproducible on demand from code alone.
  • History and intent are preserved in version control.
  • Rollback is possible because the previous state is always known.

Scope

What must be codified

Cloud accounts, subscriptions, and projects

Network topology and segmentation

Identity and access controls

Compute and runtime platforms

Storage and databases (where feasible)

Security configurations

Monitoring and alerting

Backups and retention policies

Environments

Environment architecture and promotion model

Environment parity with controlled variance. Differences between environments are intentional, documented, and injected as configuration — never embedded in templates.

DEV

Local and branch integration

Same templates, lower-tier config

TEST

Automated pipeline execution

Identical to PROD topology, test data only

UAT

Stakeholder acceptance

PROD-equivalent config, controlled access

PROD

Live system

Promoted from UAT, no manual intervention

Non-negotiable environment rules

  • Same IaC templates across all environments — no environment-specific modules.
  • Environment-specific configuration injected separately, never baked into templates.
  • No ad-hoc environment creation — all environments provisioned from code.
  • Promotion through code, not clicks — every change enters through the pipeline.
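
The rules above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any particular tool's API: `EnvConfig`, `ENV_CONFIG`, `render_stack`, and all field names and sizes are hypothetical. One shared template function receives environment-specific values as input, and a helper confirms that every environment ends up with the same topology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvConfig:
    """Environment-specific values kept outside the template (hypothetical fields)."""
    name: str
    instance_size: str
    instance_count: int


# Hypothetical values; in practice these would live in separate,
# version-controlled configuration files, one per environment.
ENV_CONFIG = {
    "dev": EnvConfig("dev", "small", 1),
    "test": EnvConfig("test", "medium", 2),
    "uat": EnvConfig("uat", "large", 3),
    "prod": EnvConfig("prod", "large", 3),
}


def render_stack(cfg: EnvConfig) -> dict:
    """Single template shared by all environments; it never branches on cfg.name."""
    return {
        "network": {"name": f"{cfg.name}-net", "subnets": ["app", "data"]},
        "compute": {
            "name": f"{cfg.name}-app",
            "size": cfg.instance_size,
            "count": cfg.instance_count,
            "subnet": "app",
        },
        "database": {"name": f"{cfg.name}-db", "subnet": "data"},
    }


def topology(stack: dict) -> dict:
    """Reduce a rendered stack to its shape: resource kinds and their property names."""
    return {kind: sorted(props) for kind, props in stack.items()}


stacks = {env: render_stack(cfg) for env, cfg in ENV_CONFIG.items()}
same_shape = all(topology(s) == topology(stacks["prod"]) for s in stacks.values())
print("identical topology across environments:", same_shape)
```

Because the template takes only injected values, a difference between DEV and PROD can appear in exactly one place: the configuration.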

Design

Separation of concerns in IaC design

Foundational platform

Identity, networking, core security controls. Owned by the platform team. Rarely changes.

Shared services

Logging, monitoring, secret stores. Consumed by all workloads. Versioned independently.

Application infrastructure

Compute, databases, queues per workload. Owned by product teams using platform modules.

Environment configuration

Variable values injected at apply time. Never hardcoded, always externally sourced.

Change Management

All changes through code

Every infrastructure change must:

  • Originate from version control — no change may start from a console.
  • Be peer-reviewed before applying — infrastructure changes have the same review bar as application code.
  • Be traceable to a request or incident — every change linked to its business justification.
  • Be deployed through controlled pipelines — not via manual terraform apply or CLI commands.

Emergency changes are allowed, but must be reconciled back into code immediately after resolution.
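
As a rough sketch, the four requirements above could be expressed as a pipeline gate. Everything here is an assumption made for illustration: the `ChangeRequest` fields, the `CHG-`/`INC-` ticket format, and the `"vcs"` and `"pipeline"` markers are hypothetical, not a description of a specific tool.

```python
import re
from dataclasses import dataclass

# Hypothetical ticket format, e.g. "CHG-1234" for requests or "INC-42" for incidents.
TICKET_PATTERN = re.compile(r"\b(?:CHG|INC)-\d+\b")


@dataclass(frozen=True)
class ChangeRequest:
    source: str        # where the change started: "vcs" or "console" (hypothetical values)
    author: str
    reviewers: tuple   # usernames that approved the change
    description: str
    applied_by: str    # "pipeline" or a username (hypothetical values)


def gate_violations(change: ChangeRequest) -> list:
    """Return the list of rules a change breaks; an empty list means it may proceed."""
    violations = []
    if change.source != "vcs":
        violations.append("change did not originate from version control")
    if not any(reviewer != change.author for reviewer in change.reviewers):
        violations.append("change was not reviewed by someone other than the author")
    if not TICKET_PATTERN.search(change.description):
        violations.append("change is not linked to a request or incident")
    if change.applied_by != "pipeline":
        violations.append("change was not deployed through the pipeline")
    return violations


good = ChangeRequest("vcs", "alice", ("bob",), "CHG-101: widen app subnet", "pipeline")
bad = ChangeRequest("console", "alice", ("alice",), "quick fix", "alice")
print(gate_violations(good))
print(len(gate_violations(bad)), "violations")
```

An emergency change would fail this gate at the time it is made; reconciling it afterwards means raising a change that passes it.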

Drift Prevention

Keeping code and reality in sync

  • Restricted console access — read-only by default for non-platform roles.
  • Policy enforcement — automated checks block non-compliant resources from being created.
  • Automated drift detection — scheduled runs compare declared state against actual state.
  • Regular reconciliation checks — any detected drift is treated as a defect requiring resolution.

Drift that escapes these checks and is discovered later is treated as a security incident.
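
The comparison at the heart of automated drift detection can be sketched as a diff between two dictionaries. This is a simplified illustration: `detect_drift`, the resource names, and the attributes are hypothetical, and a real check would read declared state from the IaC tool and actual state from the cloud provider's API.

```python
def detect_drift(declared: dict, actual: dict) -> dict:
    """Compare declared resources with observed resources and report differences."""
    report = {"missing": [], "unmanaged": [], "changed": {}}
    for name, want in declared.items():
        if name not in actual:
            # Declared in code but absent from the live environment.
            report["missing"].append(name)
            continue
        have = actual[name]
        diffs = {
            key: (want.get(key), have.get(key))
            for key in sorted(set(want) | set(have))
            if want.get(key) != have.get(key)
        }
        if diffs:
            report["changed"][name] = diffs
    # Present in the live environment but not declared anywhere in code.
    report["unmanaged"] = sorted(set(actual) - set(declared))
    report["missing"].sort()
    return report


# Hypothetical example data.
declared = {
    "app-sg": {"port": 443, "cidr": "10.0.0.0/16"},
    "logs-bucket": {"encrypted": True},
}
actual = {
    "app-sg": {"port": 443, "cidr": "0.0.0.0/0"},
    "debug-vm": {"size": "small"},
}
report = detect_drift(declared, actual)
print(report)
```

Each entry in the report maps to a defect: a changed attribute is reverted or codified, a missing resource is recreated, and an unmanaged resource is either imported into code or removed.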

Policy as Code

Governance at scale

Clavon encodes governance rules into the platform so they are enforced before deployment, not reviewed after.

  • Encryption must be enabled on all storage resources.
  • Public access restrictions — no storage bucket or database exposed publicly without explicit approval.
  • Resource naming conventions enforced through policy, not documentation.
  • Region usage constraints — resources may only be deployed to approved regions.
  • Tagging requirements — all resources must carry owner, environment, and cost-centre tags.
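
The five rules above can be sketched as a single pre-deployment check. The approved regions, the naming pattern, and the resource dictionary layout are assumptions chosen for the example; only the required tag names come from the list above.

```python
import re

APPROVED_REGIONS = {"eu-west-1", "eu-central-1"}           # hypothetical approved list
REQUIRED_TAGS = {"owner", "environment", "cost-centre"}
# Hypothetical convention: <workload>-<environment>-<purpose>
NAME_PATTERN = re.compile(r"^[a-z]+-(dev|test|uat|prod)-[a-z0-9]+$")


def evaluate(resource: dict) -> list:
    """Return policy violations for one planned resource; empty means compliant."""
    violations = []
    kind = resource.get("type")
    if kind == "storage" and not resource.get("encrypted", False):
        violations.append("encryption must be enabled on storage")
    if (kind in {"storage", "database"}
            and resource.get("public", False)
            and not resource.get("public_access_approved", False)):
        violations.append("public access requires explicit approval")
    if not NAME_PATTERN.match(resource.get("name", "")):
        violations.append("name does not follow the naming convention")
    if resource.get("region") not in APPROVED_REGIONS:
        violations.append("region is not approved")
    missing = REQUIRED_TAGS - set(resource.get("tags", {}))
    if missing:
        violations.append("missing tags: " + ", ".join(sorted(missing)))
    return violations


compliant = {
    "type": "storage", "name": "billing-prod-archive", "region": "eu-west-1",
    "encrypted": True, "public": False,
    "tags": {"owner": "team-billing", "environment": "prod", "cost-centre": "cc-100"},
}
non_compliant = {
    "type": "storage", "name": "MyBucket", "region": "us-east-1",
    "encrypted": False, "public": True, "tags": {"owner": "someone"},
}
print(evaluate(compliant))
for violation in evaluate(non_compliant):
    print("BLOCKED:", violation)
```

Run against the planned resources in a pipeline, a non-empty result fails the build, so the rule is enforced before deployment rather than reviewed afterwards.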

Disaster Recovery

IaC as DR foundation

A backup without reproducible infrastructure is incomplete. IaC is foundational to disaster recovery, not a nice-to-have.

  • Infrastructure can be recreated: from code alone, without undocumented manual steps.
  • Dependencies are documented: every resource declares what it depends on and what depends on it.
  • Recovery steps are tested: DR exercises prove that recreation works in practice, not just in theory.
  • RTO/RPO assumptions are validated: targets are agreed and tested, not assumed.
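
Documented dependencies are what make an ordered rebuild possible. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+) to derive a recreation order; the `DEPENDS_ON` map and its resource names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each resource lists what it depends on.
DEPENDS_ON = {
    "network": set(),
    "identity": set(),
    "secret-store": {"identity"},
    "database": {"network", "secret-store"},
    "app": {"network", "database", "identity"},
}


def recreation_order(depends_on: dict) -> list:
    """Order resources so that each one comes after everything it depends on."""
    return list(TopologicalSorter(depends_on).static_order())


def dependents(depends_on: dict) -> dict:
    """Invert the map: for each resource, list what depends on it."""
    result = {name: set() for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            result.setdefault(dep, set()).add(name)
    return result


order = recreation_order(DEPENDS_ON)
print("rebuild order:", order)
print("depends on network:", sorted(dependents(DEPENDS_ON)["network"]))
```

`static_order()` raises `graphlib.CycleError` if the map contains a circular dependency, which is worth finding in a DR exercise rather than during a real recovery.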

Regulated Context

IaC in regulated and high-assurance contexts

  • Auditable change history: every infrastructure change has a commit, reviewer, and timestamp.
  • Reproducible environments: any environment can be recreated from code in a documented, repeatable way.
  • Evidence generation: pipeline outputs produce deployment records usable in audits.
  • Segregation of duties: different roles required to write code, review code, and apply to production.
  • Controlled access: access to each environment governed by role-based policies, not shared credentials.
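
A deployment record of the kind described above might look like the following sketch. The field names, the sample values, and the choice to fingerprint the record with SHA-256 are assumptions for illustration. Segregation of duties is approximated here by requiring three distinct identities; a real control would check roles as well.

```python
import hashlib
import json


def duties_are_segregated(author: str, reviewer: str, applier: str) -> bool:
    """True when writing, reviewing, and applying were done by three different identities."""
    return len({author, reviewer, applier}) == 3


def evidence_record(commit: str, author: str, reviewer: str, applier: str,
                    environment: str, timestamp: str) -> dict:
    """Build a deployment record plus a fingerprint that makes later edits detectable."""
    record = {
        "commit": commit,
        "author": author,
        "reviewer": reviewer,
        "applied_by": applier,
        "environment": environment,
        "timestamp": timestamp,
        "segregation_of_duties": duties_are_segregated(author, reviewer, applier),
    }
    # Serialise with sorted keys so the same record always yields the same hash.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return {**record, "sha256": hashlib.sha256(payload).hexdigest()}


# Hypothetical values for illustration only.
record = evidence_record("3f2a9c1", "alice", "bob", "pipeline-bot",
                         "prod", "2024-01-15T10:30:00Z")
print(json.dumps(record, indent=2))
```

Emitting one such record per pipeline run gives auditors a commit, reviewer, and timestamp for every change without anyone reconstructing it by hand.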

Ownership

IaC ownership and operating model

Platform team

Owns and maintains core IaC modules. Sets standards, reviews exceptions, governs the module registry.

Product teams

Consume approved modules to provision their workloads. May not build bespoke infrastructure outside the module registry without approval.

Governance

Defines what is and is not permissible. Reviews exception requests. Owns the policy-as-code rules.

Anti-Patterns

What guarantees instability

Mixing environment logic into templates

Templates become unreadable, untestable, and impossible to promote. Environment differences should live in configuration, not modules.

Hard-coded secrets in IaC files

Credentials in version control are a security incident waiting to happen. All secrets must come from a secrets manager at apply time.
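
A pre-commit or pipeline check can catch the most obvious cases before they reach version control. The sketch below is deliberately small: the two patterns and the sample lines are illustrative assumptions, and dedicated secret scanners use far larger rule sets.

```python
import re

# Illustrative patterns only.
SECRET_PATTERNS = [
    # A secret-looking key assigned a quoted literal (not a variable or interpolation).
    re.compile(r"""(password|secret|api_key|token)\w*\s*=\s*["'][^"'$]+["']""",
               re.IGNORECASE),
    # The shape of an AWS access key ID: "AKIA" followed by 16 characters.
    re.compile(r"AKIA[0-9A-Z]{16}"),
]


def find_hardcoded_secrets(text: str) -> list:
    """Return the 1-based line numbers that look like they contain a literal secret."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SECRET_PATTERNS):
            hits.append(number)
    return hits


# Hypothetical IaC-style lines; the key on line 4 is a made-up placeholder.
SAMPLE = "\n".join([
    'db_user = "app"',
    'db_password = "hunter2"',
    'db_password = var.db_password',
    'access_key = "AKIAABCDEFGHIJKLMNOP"',
])
flagged = find_hardcoded_secrets(SAMPLE)
print("lines with suspected secrets:", flagged)
```

Line 3 passes because the value is a reference resolved at apply time, which is the pattern the rule asks for.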

Manual hotfixes never codified

Every emergency change that is not reconciled back into code creates drift. The next time the environment is recreated, the hotfix is lost.

Over-engineered modules

Modules with dozens of variables become too complex to use, so teams work around them. Simplicity wins.

Lack of documentation

Without documentation, modules cannot be adopted safely: every consumer has to read the source code to understand what they are deploying.

Ignoring drift

Undetected drift destroys the foundation of IaC. If code no longer reflects reality, the entire value proposition is gone.

Deliverables

What we produce

  • IaC reference architecture with module structure and ownership boundaries.
  • Environment and promotion model with configuration injection strategy.
  • Policy-as-code framework for governance at scale.
  • Drift detection and prevention strategy with remediation procedures.
  • Change governance model including review, approval, and audit trail.
  • Audit-ready infrastructure evidence framework for compliance.
  • Disaster recovery enablement through reproducible environment definitions.

Start a Conversation

Make your cloud infrastructure behave like a reliable system

We design, implement, and govern Infrastructure as Code so your environments are reproducible, auditable, and drift-free — from day one.