
How to use GitOps & DevSecOps with Terraform


Here is how I use GitOps and DevSecOps to deliver better Terraform templates more efficiently and securely.

Intro

We use Terraform to provision infrastructure automatically, and a group of developers works on the Terraform templates at the same time.

Challenges We Are Facing

There are three main challenges we are facing:

  • Collaboration
  • Security
  • Reliability

Collaboration

Since there are multiple developers working on the Terraform templates at the same time, everyone has a slightly different copy of the Terraform templates.

There isn’t a single source of truth. It is challenging to coordinate the changes made to Terraform templates.

Security

Every developer needs an AWS user with an administrator role to perform infrastructure testing with the Terraform templates. It means each developer has direct admin access to the AWS cloud, and the credentials are managed by the developer locally.

If one of the developers leaks their credentials, our entire AWS cloud is at risk. The attack surface grows as more developers join the team to work on the Terraform templates. We also need to manage the AWS users associated with the developers properly.

Reliability

The reliability of changes made by developers is not guaranteed.

We don’t have a systematic way to ensure the changes follow best practices, nor a systematic way for other developers to verify the changes made by a developer.

There isn’t a way for us to keep track of the changes. If there are some issues with the latest Terraform templates, it is a nightmare to roll back.

Solution = GitOps + DevSecOps

To overcome the challenges mentioned above, I introduce GitOps and DevSecOps into our Terraform development.

GitOps brings best practices into IaC (Infrastructure as Code) development through a version control system, and DevSecOps provides a flexible, secure, and efficient CI/CD pipeline.

1 GitOps

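GitOps is a development practice that uses Git as the single source of truth for infrastructure and applies every change through merge requests and automated pipelines, which improves developer productivity.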

1.1 Gitlab - Version Control System

In this case, I am using Gitlab as a version control system to store the Terraform templates. Gitlab is the single source of truth. All teammates need to push the changes they make to Gitlab. And Gitlab has the latest copy of the Terraform templates.

By doing so, we can version the Terraform templates with commits and branches. We can also easily trace back who made what changes at what time with the Git history.

Thanks to Terraform’s declarative syntax, each commit contains Terraform templates that describe the complete desired infrastructure. This makes it easy to roll back and to perform disaster recovery.

1.2 GitOps Workflow - Development Flow

A picture is worth a thousand words; the diagram below visualizes a simple GitOps workflow.

/how-to-use-gitops-devsecops-with-terraform/GitOps.png
GitOps Workflow Diagram

When a developer wants to change the Terraform templates, they create a feature branch from the main branch and open a merge request for that feature branch.

After the developer pushes the feature branch with the changes to Gitlab, a CI/CD pipeline is triggered. It validates the changes and deploys them to the testing environment. Once the changes pass the pipeline, another developer reviews and approves them.

After that, the feature branch is merged back into the main branch. This triggers another CI/CD pipeline, which validates the changes and deploys them to the production environment.

As a result, GitOps ensures changes are made transparently and systematically. Gitlab automates and manages the tedious validation and deployment steps, which boosts developer productivity. The quality of the IaC also improves because multiple developers review each change.

GitOps Tip

In this example, the main branch is associated with the production environment, and feature branches are associated with the testing environment.

The main components of GitOps are Git, Merge Request, and CI/CD Pipeline. We can always add in more Git branches, and associate them with more deployment environments to meet our needs.
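To make the tip above concrete, the branch-to-environment mapping can be written down explicitly. The Python sketch below is illustrative only and is not part of the original pipeline. The function name, the extra staging branch, and the environment names are hypothetical.

```python
# Hypothetical branch-to-environment mapping for the GitOps workflow:
# the main branch deploys to production, and any other branch deploys to testing.
ENVIRONMENTS = {
    "main": "production",
    "staging": "staging",  # hypothetical extra long-lived branch
}
DEFAULT_ENVIRONMENT = "testing"  # used for feature branches


def environment_for_branch(branch: str) -> str:
    """Return the deployment environment associated with a Git branch."""
    return ENVIRONMENTS.get(branch, DEFAULT_ENVIRONMENT)


if __name__ == "__main__":
    for name in ["main", "staging", "feature/add-s3-bucket"]:
        print(f"{name} -> {environment_for_branch(name)}")
```

Keeping the mapping in one place means adding a branch only requires a new dictionary entry.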

2 DevSecOps

DevSecOps is a pipeline practice that builds security into software delivery, so every change is checked for security issues before it reaches the infrastructure.

2.1 DevSecOps Pipeline

Let’s dive deep into the CI/CD pipeline. It consists of six pipeline jobs and involves three main components.

The six jobs are:

  • Validate - Dev
  • Build - Dev
  • Security checks - Sec
  • Generate security report - Sec
  • Deploy - Ops
  • Destroy - Ops

The three main components are:

  • Gitlab
  • Terraform Cloud
  • AWS (Amazon Web Services)

Terraform Cloud
Terraform Cloud is a managed service offered by HashiCorp. It helps us to provision infrastructure securely and reliably in the cloud with free remote state storage.

2.2 Pipeline Architecture

The diagram below shows how the six jobs in the pipeline interact with the three main components.

When new changes are pushed to the feature branch or the main branch, a new DevSecOps pipeline is triggered.

/how-to-use-gitops-devsecops-with-terraform/terraform_ci-cd.png
DevSecOps Pipeline

The pipeline flow is as follows:

  1. The validate job ensures the Terraform templates are free of syntax errors. If the job passes, it triggers the build job.
  2. The build job uses the Terraform Cloud API key to contact Terraform Cloud and generate an execution plan.
  3. Terraform Cloud then uses the AWS access credentials to check the live state of the infrastructure on AWS.
  4. The live state of the infrastructure is returned to Terraform Cloud, and an execution plan is created.
  5. Terraform Cloud returns the build outputs to Gitlab CI/CD.
  6. If the build job succeeds, the security checks job is triggered. I am using CheckOV to scan the Terraform templates for security issues.
  7. After the security checks finish, the generate security report job is triggered to produce a security report.
  8. The developer can now manually trigger the deploy job to deploy the changes to the live infrastructure.
  9. The deploy job uses the Terraform Cloud API key to call the Terraform Cloud API, which applies the changes to the live infrastructure.
  10. Terraform Cloud applies the changes to the live infrastructure on AWS.
  11. The outputs of the apply are returned to Terraform Cloud.
  12. Terraform Cloud returns these outputs to the Gitlab deploy job.
  13. After the developer finishes testing the live infrastructure, they can trigger the destroy job to remove all infrastructure and avoid unnecessary charges.
  14. The destroy job uses the Terraform Cloud API key to call the Terraform Cloud API, which removes the live infrastructure.
  15. Terraform Cloud removes the live infrastructure on AWS.
  16. The outputs of the destroy run are returned to Terraform Cloud.
  17. Terraform Cloud returns these outputs to the Gitlab destroy job.

2.3 Pipeline Jobs

Let’s dive deep into the details of each pipeline job with code snippets.

The DevSecOps pipeline is constructed with the .gitlab-ci.yml file.

.gitlab-ci.yml file
We can configure keywords in the .gitlab-ci.yml file to customize the pipeline to our needs.

There are six stages, one for each pipeline job. We define them as shown below.

stages:          # List of stages for jobs, and their order of execution
  - validate
  - build
  - security_checks
  - generate_security_report
  - deploy
  - destroy
2.3.1 validate job
terraform_validate:  
  stage: validate    
  image:
    name: hashicorp/terraform:light
    entrypoint:
      - '/usr/bin/env'
      - 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/go/bin'
  only:
    - master
  script:
    - terraform init -backend=false
    - terraform validate

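The first job in the pipeline is the validate job. Because of the only: master rule, it runs only when changes are pushed to the master branch (our main branch). To also run the pipeline on feature branches, as described in the GitOps workflow, the branch filter in each job needs to be widened or removed.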

Job image
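For Terraform-related jobs, I use the HashiCorp Terraform Docker image, which comes with the terraform command preinstalled. The image’s default entrypoint runs terraform directly, so the job overrides it with /usr/bin/env to let Gitlab run the script commands.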

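For the job script, terraform init -backend=false initializes the Terraform directory without configuring the backend, so this job does not need any Terraform Cloud credentials.

We can then run terraform validate to check that the Terraform templates are syntactically valid and internally consistent, for example that referenced variables and resources exist.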
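A later step may need the validation result in machine-readable form. For that case, terraform validate also accepts a -json flag. The Python sketch below summarizes that output. It assumes the valid, diagnostics, severity, and summary fields of Terraform’s JSON validation output. The function name and sample data are hypothetical.

```python
import json
from typing import List, Tuple


def summarize_validate(output: str) -> Tuple[bool, List[str]]:
    """Summarize `terraform validate -json` output.

    Returns the "valid" flag and the summaries of all error diagnostics.
    """
    result = json.loads(output)
    errors = [
        diagnostic.get("summary", "")
        for diagnostic in result.get("diagnostics", [])
        if diagnostic.get("severity") == "error"
    ]
    return bool(result.get("valid", False)), errors


if __name__ == "__main__":
    # Hypothetical output, e.g. captured with: terraform validate -json > validate.json
    sample = json.dumps({
        "valid": False,
        "error_count": 1,
        "warning_count": 0,
        "diagnostics": [
            {"severity": "error", "summary": "Unsupported argument"},
        ],
    })
    valid, errors = summarize_validate(sample)
    print(valid, errors)  # False ['Unsupported argument']
```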

2.3.2 build job
terraform_build:   
  needs: ['terraform_validate']
  stage: build    
  image:
    name: hashicorp/terraform:light
    entrypoint:
      - '/usr/bin/env'
      - 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/go/bin'
  only:
    - master
  script:
    - mkdir -p ~/.terraform.d
    - echo "$TERRAFORM_CLOUD_API_KEY" > ~/.terraform.d/credentials.tfrc.json
    - terraform init
    - terraform plan 

If the validate job is successful, the build job is automatically triggered next.

For the job script, we write $TERRAFORM_CLOUD_API_KEY to ~/.terraform.d/credentials.tfrc.json. This is the file where the Terraform CLI looks for Terraform Cloud credentials, so the job can then authenticate with Terraform Cloud.

$TERRAFORM_CLOUD_API_KEY

$TERRAFORM_CLOUD_API_KEY is a JSON object containing the Terraform Cloud token. The format is shown below.

{
  "credentials": {
    "app.terraform.io": {
      "token": "<TERRAFORM_CLOUD_API_TOKEN>"
    }
  }
}

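The Terraform Cloud token is obtained manually by running terraform login, which saves it to ~/.terraform.d/credentials.tfrc.json on the local machine.

We can then store $TERRAFORM_CLOUD_API_KEY as a CI/CD variable in the Gitlab project (Settings > CI/CD > Variables) and refer to it inside the job.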
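Writing this JSON by hand is error-prone, because the token must be a quoted string. As an alternative, the file can be generated from the raw token. The Python sketch below does that. It is not part of the original pipeline, and the helper names and the TF_API_TOKEN variable are hypothetical.

```python
import json
import os
from pathlib import Path

# Terraform Cloud hostname used in the credentials file above.
DEFAULT_HOST = "app.terraform.io"


def build_credentials(token: str, host: str = DEFAULT_HOST) -> str:
    """Return credentials.tfrc.json content for a Terraform Cloud API token."""
    return json.dumps({"credentials": {host: {"token": token}}}, indent=2)


def write_credentials(token: str, home: Path) -> Path:
    """Write <home>/.terraform.d/credentials.tfrc.json, creating the folder if needed."""
    target_dir = home / ".terraform.d"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "credentials.tfrc.json"
    target.write_text(build_credentials(token))
    target.chmod(0o600)  # keep the token readable by the current user only
    return target


if __name__ == "__main__":
    # TF_API_TOKEN is a hypothetical CI/CD variable holding only the raw token.
    path = write_credentials(os.environ["TF_API_TOKEN"], Path.home())
    print(f"Wrote {path}")
```

Storing only the raw token in Gitlab and generating the file inside the job keeps the CI/CD variable to a single line.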

We then run terraform init to initialize the Terraform directory.

Lastly, we run terraform plan to build an execution plan. If the command fails, the changes cannot be applied as written, so there is no point in deploying them to AWS.
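terraform plan can also report whether a plan succeeded with or without changes, through its -detailed-exitcode flag (0 = no changes, 1 = error, 2 = changes present). The Python sketch below maps those exit codes to outcomes. The helper names are hypothetical. Whether the flag is honored for remote runs in Terraform Cloud depends on your Terraform version, so verify it before relying on it.

```python
import subprocess

# Exit codes of `terraform plan -detailed-exitcode`.
PLAN_OUTCOMES = {
    0: "no changes",       # plan succeeded, nothing to apply
    1: "error",            # plan failed, the changes cannot be applied
    2: "changes present",  # plan succeeded and contains changes
}


def interpret_plan_exit_code(code: int) -> str:
    """Translate a `terraform plan -detailed-exitcode` exit code into an outcome."""
    return PLAN_OUTCOMES.get(code, "unknown")


def run_plan(workdir: str = ".") -> str:
    """Run the plan in `workdir` and return its outcome (needs the terraform CLI)."""
    completed = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode"], cwd=workdir, check=False
    )
    return interpret_plan_exit_code(completed.returncode)


if __name__ == "__main__":
    outcome = run_plan()
    print(f"plan outcome: {outcome}")
    # Fail the CI job only when the plan did not succeed; changes are expected here.
    raise SystemExit(0 if outcome in ("no changes", "changes present") else 1)
```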

2.3.3 security checks job
terraform_security_checks:
  needs: ['terraform_build']
  stage: security_checks
  image:
    name: bridgecrew/checkov:latest
    entrypoint:
      - '/usr/bin/env'
      - 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'
  only:
    - master
  script:
    - checkov -d .
  allow_failure: true

If the build job is successful, the security checks job is automatically triggered next.

Job image
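For the security checks and generate security report jobs, I use the Bridgecrew CheckOV Docker image, which comes with the checkov command preinstalled. Its default entrypoint runs checkov directly, so these jobs override it with /usr/bin/env to let Gitlab run the script commands.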
CheckOV
CheckOV is a static code analysis tool for scanning infrastructure as code (IaC) files for misconfigurations that may lead to security or compliance problems. Checkov includes more than 750 predefined policies to check for common misconfiguration issues.

For the job script, we have checkov -d . to scan all Terraform templates for misconfigurations.

The command will output the scan result to the console. It first displays the number of passed checks, the number of failed checks, and the number of skipped checks.

$ checkov -d .
       _               _              
   ___| |__   ___  ___| | _______   __
  / __| '_ \ / _ \/ __| |/ / _ \ \ / /
 | (__| | | |  __/ (__|   < (_) \ V / 
  \___|_| |_|\___|\___|_|\_\___/ \_/  
                                      
By bridgecrew.io | version: 2.0.337 
terraform scan results:
Passed checks: 42, Failed checks: 35, Skipped checks: 0
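To track these numbers over time, for example in a job log or a simple trend chart, the summary line can be extracted from the console output. The Python sketch below assumes the summary keeps exactly the format shown above, and the function name is hypothetical. For anything beyond quick checks, CheckOV’s JSON output (-o json) is a sturdier source than console text.

```python
import re
from typing import Dict

# Matches the summary line printed by CheckOV, e.g.
# "Passed checks: 42, Failed checks: 35, Skipped checks: 0"
SUMMARY_RE = re.compile(
    r"Passed checks: (?P<passed>\d+), "
    r"Failed checks: (?P<failed>\d+), "
    r"Skipped checks: (?P<skipped>\d+)"
)


def parse_summary(console_output: str) -> Dict[str, int]:
    """Extract the passed/failed/skipped counts from CheckOV console output."""
    match = SUMMARY_RE.search(console_output)
    if match is None:
        raise ValueError("no CheckOV summary line found")
    return {
        "passed": int(match["passed"]),
        "failed": int(match["failed"]),
        "skipped": int(match["skipped"]),
    }


if __name__ == "__main__":
    output = "terraform scan results:\nPassed checks: 42, Failed checks: 35, Skipped checks: 0\n"
    print(parse_summary(output))  # {'passed': 42, 'failed': 35, 'skipped': 0}
```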

The console output of a passed check is shown below. It includes:

  • The name of the check
  • The specific resource
  • The Terraform template
  • The related guide on the check
Check: CKV_AWS_41: "Ensure no hard coded AWS access key and secret key exists in provider"
	PASSED for resource: aws.default
	File: /main.tf:9-11
	Guide: https://docs.bridgecrew.io/docs/bc_aws_secrets_5

The console output of a failed check is shown below. It includes:

  • The name of the check
  • The specific resource
  • The Terraform template
  • The related guide on the check
  • The block of Terraform codes that causes the failed check
Check: CKV2_AWS_4: "Ensure API Gateway stage have logging level defined as appropriate"
	FAILED for resource: aws_api_gateway_stage.example
	File: /modules/api_gateway/main.tf:85-90
	Guide: https://docs.bridgecrew.io/docs/ensure-api-gateway-stage-have-logging-level-defined-as-appropiate
		85 | resource "aws_api_gateway_stage" "example" {
		86 |   deployment_id = aws_api_gateway_deployment.example-api-gateway-deployment.id
		87 |   rest_api_id   = aws_api_gateway_rest_api.example-api.id
		88 |   stage_name    = "example"
		89 |   
		90 | }
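With dozens of results, it helps to list only the failed checks together with their guides. The Python sketch below parses result blocks laid out like the two examples above. The function name is hypothetical, and the regular expression assumes CheckOV keeps this exact console layout.

```python
import re
from typing import Dict, List

# One result block: a "Check:" line followed by the result, file and guide lines.
CHECK_RE = re.compile(
    r'Check: (?P<check_id>[A-Z0-9_]+): "(?P<title>[^"]*)"\s+'
    r"(?P<result>PASSED|FAILED) for resource: (?P<resource>\S+)\s+"
    r"File: (?P<file>\S+)\s+"
    r"Guide: (?P<guide>\S+)"
)


def parse_checks(console_output: str) -> List[Dict[str, str]]:
    """Return one dict per check result found in CheckOV console output."""
    return [match.groupdict() for match in CHECK_RE.finditer(console_output)]


if __name__ == "__main__":
    sample = (
        'Check: CKV2_AWS_4: "Ensure API Gateway stage have logging level defined as appropriate"\n'
        "\tFAILED for resource: aws_api_gateway_stage.example\n"
        "\tFile: /modules/api_gateway/main.tf:85-90\n"
        "\tGuide: https://docs.bridgecrew.io/docs/ensure-api-gateway-stage-have-logging-level-defined-as-appropiate\n"
        '\t\t85 | resource "aws_api_gateway_stage" "example" {\n'
    )
    for check in parse_checks(sample):
        if check["result"] == "FAILED":
            print(check["check_id"], check["resource"], check["guide"])
```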
CheckOV failed checks remediation
We can open the guide URL to obtain steps to remediate the failed checks.
Visualizing CheckOV output
Console output can be tedious to read through. Bridgecrew provides a cloud dashboard to visualize CheckOV output.
CheckOV custom policy
Are 750 predefined policies not enough for your needs? CheckOV allows us to write our own custom policies in Python and YAML!

For this job, I also set the keyword allow_failure to true. This means that when the job fails (that is, when there are failed checks), the pipeline continues instead of stopping.

There are a few reasons why I want it in this way.

First, when we are building an MVP, the top priority is not to optimize everything. We want to push out the basic set of features first. Therefore, it is acceptable to have some failed security and compliance checks.

Second, some use cases may require configuration that leads to failed checks. For example, we may need public access to S3 objects for people to download without any restriction.
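Accepting some failed checks is easier to manage when the accepted ones are written down. The Python sketch below compares a CheckOV JSON report (-o json) against a team-maintained list of accepted check IDs. It returns the failures that still need attention. This is not part of the original pipeline. The function name, the accepted IDs, and the assumed report layout (results.failed_checks entries with a check_id field) are assumptions to verify against your CheckOV version.

```python
import json
from typing import Iterable, List


def unaccepted_failures(report_json: str, accepted: Iterable[str]) -> List[str]:
    """Return failed check IDs that are not on the accepted list.

    Assumes each report object has results.failed_checks entries with a
    "check_id" field. A scan covering several frameworks is assumed to
    produce a list of such objects, so both shapes are handled.
    """
    report = json.loads(report_json)
    reports = report if isinstance(report, list) else [report]
    accepted_ids = set(accepted)
    unaccepted = {
        check["check_id"]
        for single in reports
        for check in single.get("results", {}).get("failed_checks", [])
        if check["check_id"] not in accepted_ids
    }
    return sorted(unaccepted)


if __name__ == "__main__":
    # Hypothetical report: CKV_AWS_20 stands in for a failure the team accepts,
    # such as a bucket that must stay publicly readable.
    sample = json.dumps({
        "check_type": "terraform",
        "results": {
            "failed_checks": [
                {"check_id": "CKV_AWS_20"},
                {"check_id": "CKV2_AWS_4"},
            ]
        },
    })
    print(unaccepted_failures(sample, accepted=["CKV_AWS_20"]))  # ['CKV2_AWS_4']
```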

2.3.4 generate security report job
generate_terraform_security_checks:
  needs: ['terraform_security_checks']
  stage: generate_security_report
  image:
    name: bridgecrew/checkov:latest
    entrypoint:
      - '/usr/bin/env'
      - 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'
  only:
    - master
  script:
    - checkov -d . -o junitxml >> checkov.test.xml
  artifacts:
    reports:
      junit: "checkov.test.xml"
    paths:
      - "checkov.test.xml"
  allow_failure: true

After the security checks job is finished, the generate security report job is triggered automatically.

For the job script, checkov -d . -o junitxml >> checkov.test.xml writes the scan results to checkov.test.xml in JUnit XML format.

CheckOV security report format
We can also export the security report in JSON format with -o json.

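We then publish the file as a job artifact so it can be downloaded. Listing it under artifacts:reports:junit also lets Gitlab show the results in the pipeline’s test report and in merge requests.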

If the report contains failed checks, checkov exits with a non-zero status, so the job fails and the pipeline stops. Setting allow_failure to true prevents that.
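The JUnit XML artifact can also be inspected outside Gitlab, for example to count failures in a separate reporting script. The Python sketch below uses the standard library’s xml.etree.ElementTree. It relies only on the generic JUnit structure (testcase elements with a failure child). The function name and the sample test-case names are hypothetical and do not reflect CheckOV’s exact naming.

```python
import xml.etree.ElementTree as ET
from typing import List


def failed_testcases(junit_xml: str) -> List[str]:
    """Return the names of <testcase> elements that contain a <failure> child."""
    root = ET.fromstring(junit_xml.strip())
    return [
        case.get("name", "")
        for case in root.iter("testcase")
        if case.find("failure") is not None
    ]


if __name__ == "__main__":
    # Hypothetical report shaped like a generic JUnit XML file.
    sample = """
    <testsuites>
      <testsuite name="terraform scan" tests="2" failures="1">
        <testcase name="CKV_AWS_41 aws.default"/>
        <testcase name="CKV2_AWS_4 aws_api_gateway_stage.example">
          <failure message="Ensure API Gateway stage have logging level defined as appropriate"/>
        </testcase>
      </testsuite>
    </testsuites>
    """
    print(failed_testcases(sample))
```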

2.3.5 deploy job
terraform_cloud_deploy:
  when: manual
  needs: ['terraform_security_checks']
  stage: deploy
  image:
    name: hashicorp/terraform:light
    entrypoint:
      - '/usr/bin/env'
      - 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/go/bin'
  only:
    - master
  script:
    - mkdir -p ~/.terraform.d
    - echo "$TERRAFORM_CLOUD_API_KEY" > ~/.terraform.d/credentials.tfrc.json
    - terraform init 
    - terraform apply --auto-approve

When the security checks job is finished, developers can manually trigger the deploy job to get Terraform Cloud to deploy changes onto AWS.

I set the job to be manually triggered with when: manual.

Why is the deploy job manual?

The reason is that I want developers to decide when to deploy the changes, after they have reviewed the outputs of the security checks job.

Suppose the deploy job were triggered automatically after the security checks job. Any unacceptable security or compliance issues found by the scan would then go straight into the infrastructure as loopholes.

Therefore, I set the deploy job to be triggered manually. Developers are then responsible for ensuring that the security and compliance issues reported by the security checks job are acceptable before deploying the changes to the infrastructure.

Involving developers this way encourages them to produce secure and reliable code. This reduces the potential damage caused by insecure or unreliable code in production.

As in the build job, the script writes $TERRAFORM_CLOUD_API_KEY to ~/.terraform.d/credentials.tfrc.json so the job can authenticate with Terraform Cloud.

We then run terraform init to initialize the Terraform directory.

Lastly, we run terraform apply --auto-approve to get Terraform Cloud to implement the changes onto AWS.

2.3.6 destroy job
terraform_cloud_destroy:
  when: manual
  needs: ['terraform_cloud_deploy']
  stage: destroy
  image:
    name: hashicorp/terraform:light
    entrypoint:
      - '/usr/bin/env'
      - 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/go/bin'
  only:
    - master
  script:
    - mkdir -p ~/.terraform.d
    - echo "$TERRAFORM_CLOUD_API_KEY" > ~/.terraform.d/credentials.tfrc.json
    - terraform init 
    - terraform destroy --auto-approve

When the deploy job is finished, developers can manually trigger the destroy job to get Terraform Cloud to remove the infrastructure created on AWS.

Why is the destroy job manual?
It gives developers greater control over the infrastructure lifecycle: they can destroy the infrastructure whenever they no longer need it, which makes the pipeline more flexible.

As in the deploy job, the script writes $TERRAFORM_CLOUD_API_KEY to ~/.terraform.d/credentials.tfrc.json so the job can authenticate with Terraform Cloud.

We then run terraform init to initialize the Terraform directory.

Lastly, we run terraform destroy --auto-approve to get Terraform Cloud to destroy the infrastructure on AWS.

2.4 Summary of DevSecOps Pipeline

Below is the whole .gitlab-ci.yml file for the Gitlab DevSecOps pipeline.

stages:          # List of stages for jobs, and their order of execution
  - validate
  - build
  - security_checks
  - generate_security_report
  - deploy
  - destroy

terraform_validate:  
  stage: validate    
  image:
    name: hashicorp/terraform:light
    entrypoint:
      - '/usr/bin/env'
      - 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/go/bin'
  only:
    - master
  script:
    - terraform init -backend=false
    - terraform validate

terraform_build:   
  needs: ['terraform_validate']
  stage: build    
  image:
    name: hashicorp/terraform:light
    entrypoint:
      - '/usr/bin/env'
      - 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/go/bin'
  only:
    - master
  script:
    - mkdir -p ~/.terraform.d
    - echo "$TERRAFORM_CLOUD_API_KEY" > ~/.terraform.d/credentials.tfrc.json
    - terraform init
    - terraform plan 

terraform_security_checks:
  needs: ['terraform_build']
  stage: security_checks
  image:
    name: bridgecrew/checkov:latest
    entrypoint:
      - '/usr/bin/env'
      - 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'
  only:
    - master
  script:
    - checkov -d .
  allow_failure: true

generate_terraform_security_checks:
  needs: ['terraform_security_checks']
  stage: generate_security_report
  image:
    name: bridgecrew/checkov:latest
    entrypoint:
      - '/usr/bin/env'
      - 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'
  only:
    - master
  script:
    - checkov -d . -o junitxml >> checkov.test.xml
  artifacts:
    reports:
      junit: "checkov.test.xml"
    paths:
      - "checkov.test.xml"
  allow_failure: true

terraform_cloud_deploy:
  when: manual
  needs: ['terraform_security_checks']
  stage: deploy
  image:
    name: hashicorp/terraform:light
    entrypoint:
      - '/usr/bin/env'
      - 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/go/bin'
  only:
    - master
  script:
    - mkdir -p ~/.terraform.d
    - echo "$TERRAFORM_CLOUD_API_KEY" > ~/.terraform.d/credentials.tfrc.json
    - terraform init 
    - terraform apply --auto-approve

terraform_cloud_destroy:
  when: manual
  needs: ['terraform_cloud_deploy']
  stage: destroy
  image:
    name: hashicorp/terraform:light
    entrypoint:
      - '/usr/bin/env'
      - 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/go/bin'
  only:
    - master
  script:
    - mkdir -p ~/.terraform.d
    - echo "$TERRAFORM_CLOUD_API_KEY" > ~/.terraform.d/credentials.tfrc.json
    - terraform init 
    - terraform destroy --auto-approve
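The needs keywords above form a small dependency graph. Both the generate security report job and the deploy job depend only on the security checks job, so neither waits for the other (the deploy job still waits for a manual trigger). The Python sketch below uses the standard library’s graphlib (Python 3.9+) to derive one valid job order from those dependencies. It only models the graph and is not how Gitlab schedules jobs internally. The NEEDS dictionary is transcribed from the file above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List

# Each job maps to the jobs listed in its `needs` keyword in .gitlab-ci.yml.
NEEDS: Dict[str, List[str]] = {
    "terraform_validate": [],
    "terraform_build": ["terraform_validate"],
    "terraform_security_checks": ["terraform_build"],
    "generate_terraform_security_checks": ["terraform_security_checks"],
    "terraform_cloud_deploy": ["terraform_security_checks"],
    "terraform_cloud_destroy": ["terraform_cloud_deploy"],
}


def execution_order(needs: Dict[str, List[str]]) -> List[str]:
    """Return one valid job order; raises graphlib.CycleError on circular needs."""
    return list(TopologicalSorter(needs).static_order())


if __name__ == "__main__":
    for position, job in enumerate(execution_order(NEEDS), start=1):
        print(position, job)
```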

Below is the Gitlab visualization of the DevSecOps pipeline.

/how-to-use-gitops-devsecops-with-terraform/terraform_ci-cd-gitlab.png
Gitlab DevSecOps Pipeline
Benefits of Terraform Cloud

Terraform Cloud allows us to decouple access to the infrastructure from the development workflow.

Without Terraform Cloud, we need to store the AWS credentials on Gitlab, so the CI/CD jobs have the permission to use the Terraform template to set up infrastructure on AWS.

This is a potential security issue: developers who have access to the Gitlab project can also access the AWS credentials.

Terraform Cloud acts as an abstraction layer in front of the AWS cloud. It ensures that only infrastructure engineers have direct access to AWS, which minimizes the attack surface.

In a nutshell, we only need to call the Terraform Cloud API inside the deploy CI/CD job. Terraform Cloud will run the Terraform templates and manage the infrastructure state, credentials, and history.

Let’s zoom out. What if we have ten Gitlab repositories managing ten different sets of Terraform templates? Without Terraform Cloud, we need to configure AWS credentials in all ten Gitlab repositories, which means ten groups of developers have access to them. With Terraform Cloud, we configure the AWS credentials in only one place. As a result, the attack surface stays the same as we create more GitOps workflows and DevSecOps pipelines.

Outro

I hope this example demonstrates how GitOps and DevSecOps can build Terraform templates and their infrastructure lifecycle securely, reliably, and systematically by combining automation with developer involvement.
