CI/CD Optimization
with a Case Study Involving Ansible and GitLab

Presented by Zackary Lowery
on October 24th, 2023 at Leading EDJE.

Available online at https://presentations.xcjs.com/

...or How to Write a Bad CI/CD Pipeline and Improve it Iteratively

In Review

This presentation is a spiritual follow-up to Ansible: Putting IaC in Your CI/CD.

What is Ansible?

Ansible is a state-management-based IaC solution for host configuration.

https://www.ansible.com/

A Note on Ansible Concepts

Inventory

A list of network-accessible hosts.

Role

Unit of software or configuration state.

⬊ ⬋

Playbook

Assign reusable roles to selected hosts.

How am I using Ansible?

I...have a lot of hosts.

(Still.)

351MP: AmberElec
AERO15: Windows 10
K8S_[0-4]: Raspbian
Lakka: Lakka
MDEV: Ubuntu
MDEV2: Windows 10
NAS: Ubuntu
PlexBox: Ubuntu
Runner: Ubuntu
UDEV3: Ubuntu
USERV: Ubuntu
XCJS: Ubuntu
Z390: Windows 10

Integration Testing Options

ansible-test (Containers)

✅
  • First-party
  • Native performance
🚫
  • Limited OS and distribution support

ansible-test (Cloud VMs)

✅
  • First-party
  • Native performance
🚫
  • Monetary cost
  • Limited provider support

Docker

✅
  • Native performance
🚫
  • CI/CD runner was shared with production Docker host
  • Complexity of Docker in Docker for Docker-based roles
  • Limited OS support (Linux only on Linux hosts)

Vagrant

✅
  • Most encapsulated/isolated
  • Nearly unlimited OS support
🚫
  • Slow
  • Performance-intensive

Iteration 1

The Naive Approach

  • All roles are tested against every supported OS on every feature branch push.
  • A single VM is used for every role per supported OS.

153 roles as of writing multiplied by 5 operating systems (765 role runs per push):

Ubuntu 18.04
Ubuntu 20.04
Ubuntu 22.04
Windows 10
Windows 11

Abbreviated Iteration 1 Pipeline

							flowchart LR
							A[Start] --> B{"Bash: Next OS?"};
							B -- Yes --> C["Vagrant: Create VM"];
							B -- No --> D[End];
							C --> E["Vagrant: Provision VM"];
							E --> F{"Bash: Next Role?"};
							F -- Yes --> G["Vagrant + Ansible: VM Applies Role"];
							F -- No --> H["Vagrant: Destroy VM"];
							G --> F;
							H --> B;
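
The flow above could be sketched in bash roughly as follows. This is a dry-run illustration, not the original pipeline script: the `run` helper only prints commands, the role names are placeholders, and it assumes the Vagrant machine names match the OS identifiers.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the Iteration 1 loop: one VM per OS, every role applied to it.
# Assumptions: Vagrant machine names match the OS identifiers, and the role
# names below are placeholders. Nothing here is the original pipeline script.

run() { echo "+ $*"; }   # print the command instead of executing it

operatingSystems=("ubuntu_22.04" "windows_11")
roles=("nginx" "docker")   # hypothetical role names

naive_pipeline() {
  local os role
  for os in "${operatingSystems[@]}"; do
    run vagrant up --no-provision "${os}"   # create VM
    run vagrant provision "${os}"           # provision VM
    for role in "${roles[@]}"; do
      # every role is applied to the same, increasingly cluttered VM
      echo "+ (apply role ${role} inside ${os})"
    done
    run vagrant destroy -f "${os}"          # destroy VM
  done
}

naive_pipeline
```

With 153 roles and 5 operating systems, the inner loop body runs 765 times, one after another, on every push.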
						

The Problem

Screen shot of a CI/CD task that ran for just over 27 hours.

...or just over 27 hours.

...to fail.

Iteration 1 Pipeline Highlights

  • Every role is installed to the same VM.
  • Vagrant VMs are limited to 50 GB by default.
  • Disk I/O is insane, leading to...

☠️ The Other Problem ☠️

Dead NVMe Drive

😭

Yet Another Reminder

⚠️ If it's important, back it up! ⚠️

Iteration 2

CI/CD Goals

  • Reduce Redundant Testing

Resolutions

  • Perform diff-detection on Ansible roles.
  • Dynamically iterate over roles. (We'll come back to this later.)
  • Test each role in a separate VM.

Implement GitLab CI/CD Rules:Changes Directive

  • Not to be confused with the obsolete only:changes directive
  • Requires a long-running branch to compare against

Rules:Changes Example


							${role} Test:
								stage: Test
								rules:
								- if: '$CI_COMMIT_BRANCH == "testing"'
								  changes:
								    - roles/.common/tasks/*.yml
								    - roles/${role}/tasks/*.yml
								    - test/generate-tests.sh
								    - .gitlab-ci.yml
								    - Vagrantfile
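
To make the rule above concrete, here is a bash sketch of roughly the same decision for a single role. It is only an approximation for illustration, not how GitLab evaluates rules:changes: the `role_needs_test` helper is hypothetical, the branch condition is not modeled, and the changed paths are assumed to be supplied by the caller (for example from `git diff --name-only`).

```shell
#!/usr/bin/env bash
# Sketch: would a role's test job run for a given set of changed paths?
# role_needs_test is a hypothetical helper that approximates the
# rules:changes patterns shown above; it is not GitLab's implementation.

role_needs_test() {
  local role=$1; shift
  local path dir
  for path in "$@"; do
    # Pipeline-wide files trigger every role.
    case "${path}" in
      test/generate-tests.sh|.gitlab-ci.yml|Vagrantfile) return 0 ;;
    esac
    # *.yml directly inside the shared or role-specific tasks directory.
    dir=${path%/*}
    if [[ ${path} == *.yml ]] &&
       [[ ${dir} == "roles/.common/tasks" || ${dir} == "roles/${role}/tasks" ]]; then
      return 0
    fi
  done
  return 1
}

changed=("roles/nginx/tasks/main.yml" "README.md")   # hypothetical change set
role_needs_test nginx "${changed[@]}" && echo "nginx: test"
role_needs_test docker "${changed[@]}" || echo "docker: skip"
```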
						

Abbreviated Iteration 2 Pipeline

							flowchart LR
							A[Start] --> B{"Bash: Next Role?"};
							B -- Yes --> C{"Bash: Next VM?"};
							B -- No --> D[End];
							C -- Yes --> E["Vagrant: Create VM"];
							C -- No --> B;
							E --> G["Vagrant: Provision VM"];
							G --> H["Vagrant + Ansible: VM Applies Role"];
							H --> I["Vagrant: Destroy VM"];
							I --> B;
						

Iteration 2 Pipeline Highlights

  • Only roles altered since the last release are tested.
  • VMs have to be created and destroyed per-role.
  • Disk I/O is down significantly, but still higher than it needs to be.

Iteration 3

CI/CD Goals

  • Utilize separation of concerns to further reduce repeat testing.

Resolutions

  • Skip retesting operating systems that have already passed
    testing for a specific role.

Rules:Changes Updates


							${role}/Ubuntu 20.04:
								stage: Test
								rules:
								  - if: '$CI_COMMIT_BRANCH == "testing"'
								    changes:
								      - roles/.common/tasks/*.yml
								      - roles/${role}/tasks/main.yml
								      - roles/${role}/tasks/linux.yml
								      - roles/${role}/tasks/ubuntu_20.04.yml
								      - test/templates/ubuntu_20.04/.gitlab-ci.part.yml
								      - test/templates/ubuntu_20.04/blacklist.yml
								      - test/templates/ubuntu_20.04/site-test.yml
								      - test/generate-tests.sh
								      - .gitlab-ci.yml
								      - Vagrantfile
						

Abbreviated Iteration 3 Flow

							flowchart LR
							A[Start] --> B{"Bash: Role Changed?"};
							B -- Yes --> C{"Bash: Next OS?"};
							B -- No --> J[End];
							C -- Yes --> D["Vagrant: Prepare VM Disk"];
							C -- No --> B;
							D --> E["Vagrant: Provision VM"];
							E --> F["Vagrant: Start VM"];
							F --> G["Vagrant + Ansible: Apply Role"];
							G --> H["Vagrant: Destroy VM"];
							H --> C;
						

Iteration 3 Highlights

  • Each task list in a role is for a specific operating system.
  • Diff detection now works on a per-role and per-operating-system basis.
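
As an illustration of the per-role, per-OS split, the sketch below prints the paths one role/OS job would watch, following the Ubuntu 20.04 example. The `watched_paths` helper is hypothetical, and the `windows.yml` family file is an assumption, since only `linux.yml` appears in the slides.

```shell
#!/usr/bin/env bash
# Sketch: list the paths a single role/OS test job watches for changes.
# watched_paths is a hypothetical helper mirroring the rules:changes list above.

watched_paths() {
  local role=$1 os=$2 family=""
  case "${os}" in
    ubuntu_*)  family="linux" ;;
    windows_*) family="windows" ;;   # assumption: not shown in the slides
  esac
  printf '%s\n' \
    "roles/.common/tasks/*.yml" \
    "roles/${role}/tasks/main.yml"
  if [ -n "${family}" ]; then
    printf '%s\n' "roles/${role}/tasks/${family}.yml"
  fi
  printf '%s\n' \
    "roles/${role}/tasks/${os}.yml" \
    "test/templates/${os}/.gitlab-ci.part.yml" \
    "test/templates/${os}/blacklist.yml" \
    "test/templates/${os}/site-test.yml" \
    "test/generate-tests.sh" \
    ".gitlab-ci.yml" \
    "Vagrantfile"
}

watched_paths nginx ubuntu_20.04
```

Because `ubuntu_22.04.yml` is not in the Ubuntu 20.04 list, editing it leaves the Ubuntu 20.04 job for that role untouched.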

Iteration 4

CI/CD Goals

  • Know your tooling!

Resolutions

  • Prepare virtual machine images once and revert after each role is applied.

Abbreviated Iteration 4 Flow

Setup Loop

							flowchart LR
							A[Start] --> B{"Bash: Next OS?"};
							B -- Yes --> C["Vagrant: Create VM"];
							B -- No --> D[Start Testing];
							C --> E["Vagrant: Provision VM"];
							E --> F["Vagrant: VM Disk Snapshot"];
							F --> G["Vagrant: Halt VM"];
							G --> B;
						

Abbreviated Iteration 4 Flow

Testing Loop

							flowchart LR
							A[Start] --> B{"Bash: Role Changed?"};
							B -- Yes --> C["Vagrant: Restore Disk Snapshot"];
							B -- No --> G[End];
							C --> E["Vagrant + Ansible: Apply Role"];
							E --> F["Vagrant: Halt VM"];
							D -- Yes --> C;
							D -- No --> B;
							F --> D{"Bash: Next OS?"};
						

Iteration 4 Highlights

  • VMs are now reused across roles: a snapshot restores each VM to the state it was in before the previous role was installed.
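
The setup and testing loops could look something like the dry-run sketch below. It is not the original script: `run` only prints commands, the snapshot name `clean` is hypothetical, and it assumes the Vagrant machine names match the OS identifiers.

```shell
#!/usr/bin/env bash
# Dry-run sketch of Iteration 4: snapshot each VM once, restore it before each role.
# Assumptions: machine names match the OS identifiers; "clean" is a made-up
# snapshot name; run prints commands rather than executing them.

run() { echo "+ $*"; }
snapshotName="clean"

# Setup loop body: create, provision, snapshot, halt.
setup_vm() {
  local os=$1
  run vagrant up "${os}"
  run vagrant snapshot save "${os}" "${snapshotName}"
  run vagrant halt "${os}"
}

# Testing loop body: restore the clean snapshot, apply one role, halt.
test_role() {
  local role=$1 os=$2
  run vagrant snapshot restore "${os}" "${snapshotName}"
  echo "+ (apply role ${role} inside ${os})"
  run vagrant halt "${os}"
}

setup_vm ubuntu_22.04
test_role nginx ubuntu_22.04
test_role docker ubuntu_22.04
```

The expensive create-and-provision step now happens once per OS instead of once per role.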

Results!

Faster GitLab Results!

Wait!

How am I Iterating over Roles in the First Place?

Abusing GitLab CI/CD by Bash-ing it to Death

...and Taking Advantage of GitLab's Child Pipeline Feature

./test/generate-tests.sh

Supported Operating Systems are Stored in a bash Array


							supportedOperatingSystems=(
								"ubuntu_18.04"
								"ubuntu_20.04"
								"ubuntu_22.04"
								"windows_10"
								"windows_11"
							)
						

Supported Roles are Iterated from the Filesystem


							for path in roles/*/; do # Whitespace-safe
								role=$(basename "${path}")
							done
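
A self-contained version of this loop, run against a throwaway directory, might look like the following. The `list_roles` helper and the role names are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Sketch: list role names found under a roles directory.
# list_roles and the sample role names are hypothetical.

list_roles() {
  local rolesDir=$1 path
  for path in "${rolesDir}"/*/; do
    [ -d "${path}" ] || continue   # the glob matched nothing
    basename "${path}"             # strips the trailing slash
  done
}

demoDir=$(mktemp -d)
mkdir -p "${demoDir}/roles/nginx" "${demoDir}/roles/my role" "${demoDir}/roles/.common"
list_roles "${demoDir}/roles"
rm -rf "${demoDir}"
```

Because bash globs skip dot-directories unless `dotglob` is set, a shared directory such as `roles/.common` is not picked up as a role.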
						

Supported Operating Systems are Iterated per Role


							for os in "${supportedOperatingSystems[@]}"
							do
								templateDir="test/templates/${os}/"
								...
							done
						

I could directly iterate the template directories, but I wanted the option to easily sunset formerly supported operating systems.

OS-Specific Task List Files are Verified Per Role


							if [ -f "${path}/tasks/${os}.yml" ] \
							  && ! containsElement "${role}" "${roleBlacklist[@]}"; then
							  ...
							fi
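
The `containsElement` helper is not shown in the slides. One common way to write such a function is sketched below; treat it as an assumption about its behavior rather than the actual definition, and note that the blacklist entries are placeholders.

```shell
#!/usr/bin/env bash
# Sketch of a containsElement helper: succeeds if the first argument
# equals any of the remaining arguments. Not the original definition.

containsElement() {
  local needle=$1; shift
  local element
  for element in "$@"; do
    [[ "${element}" == "${needle}" ]] && return 0
  done
  return 1
}

roleBlacklist=("docker" "kubernetes")   # hypothetical entries
for role in nginx docker; do
  if ! containsElement "${role}" "${roleBlacklist[@]}"; then
    echo "${role}: eligible"
  else
    echo "${role}: blacklisted"
  fi
done
```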
						

Append to the Child Pipeline Using a Template

(Using envsubst)


							echo "Role ${role} supports ${os} and will be added to CI/CD."

							# envsubst only sees exported variables, so role must be exported.
							gitlabRoleCi=$(envsubst '$role' \
							  < "test/templates/${os}/.gitlab-ci.part.yml")

							echo "${gitlabRoleCi}" >> ".gitlab-ci-roles.yml"
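
Restricting the substitution to `$role` matters because the templates also contain GitLab variables such as `$CI_COMMIT_BRANCH` that must reach GitLab unexpanded. The sketch below shows the effect with a pure-bash stand-in for envsubst; it is a swapped-in technique for illustration, the `render_role_template` helper is hypothetical, and unlike envsubst it only handles the braced `${role}` form.

```shell
#!/usr/bin/env bash
# Sketch: substitute only ${role} in a template and leave other variables alone.
# render_role_template is a hypothetical, pure-bash stand-in for
# `envsubst '$role'`; it only replaces the braced ${role} form.

render_role_template() {
  local role=$1 template=$2
  local placeholder='${role}'
  local rendered
  # Quoting the pattern makes bash match it literally instead of as a glob.
  rendered=${template//"${placeholder}"/${role}}
  printf '%s\n' "${rendered}"
}

template=$(cat <<'EOF'
${role}/Ubuntu 20.04:
  rules:
    - if: '$CI_COMMIT_BRANCH == "testing"'
      changes:
        - roles/${role}/tasks/main.yml
EOF
)

render_role_template nginx "${template}"
```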
						

GitLab Provided Diff Detection

.gitlab-ci.part.yml templates include rules:changes directives:


						  - if: '$CI_COMMIT_BRANCH == "testing"'
						    changes:
						      - roles/.common/tasks/*.yml
						      - roles/${role}/tasks/main.yml
						      - roles/${role}/tasks/linux.yml
						      - roles/${role}/tasks/ubuntu_18.04.yml
						      - test/templates/ubuntu_18.04/.gitlab-ci.part.yml
						      - test/templates/ubuntu_18.04/blacklist.yml
						      - test/templates/ubuntu_18.04/playbook-test.yml
						      - test/generate-tests.sh
						      - .gitlab-ci.yml
						      - Vagrantfile
						

Registering the Child Pipeline (Parent)


						  Generate Tests:
						    stage: Test Setup
						    artifacts:
						      name: gitlab-ci-roles
						      expire_in: 1 week
						      paths:
						        - .gitlab-ci-roles-ubuntu_18.04.yml
						        - .gitlab-ci-roles-ubuntu_20.04.yml
						        - .gitlab-ci-roles-ubuntu_22.04.yml
						        - .gitlab-ci-roles-windows_10.yml
						        - .gitlab-ci-roles-windows_11.yml
						        - .vagrant/
						

Registering the Child Pipeline (Child)


						  ${role}/${os} Test:
						    stage: Test
						    needs:
						      - pipeline: $PARENT_PIPELINE_ID
						        job: Generate Tests
						

Additional Enhancements

Use Language Tooling for CI/CD Resources

shellcheck allows us to fail early!


						  Lint Test Generation:
						    stage: Lint
						    tags:
						      - linux
						      - shell
						      - shellcheck
						    script:
						      - shellcheck test/generate-tests.sh
						

Throw More Hardware at the Problem!

New GitLab Runner Host

Future Iterations?

  • Persist VM snapshots across pipelines and update them on box updates?
    vagrant box update
  • Maintain my own base VM images?
  • Support multiple testing strategies?
    • Containers for simple Linux-supported roles?
    • Virtual machines for tests requiring isolation and non-Linux hosts?

End (of the Pipeline)

Questions? Suggestions? Rude Insults?
