[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1140

2026-03-03T22:24:38Z

github-actions[bot]
bot Mar 3, 2026

📊 Current CI/CD Pipeline Status

The repository has a mature, layered CI/CD pipeline with 56 GitHub Actions workflows spanning static analysis, unit tests, integration tests, security scanning, smoke testing, and AI-driven quality checks. Recent run data shows most pipelines healthy, with one active failure: the Integration Tests suite.

Workflow Category	Count	Status
Build & Lint	3	✅ All passing
Unit Test + Coverage	1	✅ Passing
Integration Tests	4 jobs in suite	❌ Currently failing
Chroot Integration Tests	4 jobs	✅ Passing
Security Scanning	3 (CodeQL, Trivy, npm audit)	✅ Passing
Smoke Tests (AI agents)	4 (Claude, Codex, Copilot, Chroot)	✅ Passing
Agentic Build Tests	8 languages	✅ All passing
PR Governance	2 (title check, security guard)	✅ Passing
Docs Deployment	1	✅ Passing

✅ Existing Quality Gates

The following checks currently run on pull requests:

Code Quality

✅ ESLint (lint.yml) — TypeScript linting, runs on all PRs
✅ TypeScript Type Check (test-integration.yml) — strict type checking via tsc --noEmit
✅ Build Verification (build.yml) — builds on Node 20 + 22 matrix, includes API proxy unit tests
✅ PR Title Check (pr-title.yml) — enforces Conventional Commits format with allowed scope list

Testing

✅ Unit Test Coverage (test-coverage.yml) — runs jest with coverage, comments on PRs, fails on regression
✅ Integration Tests (test-integration-suite.yml) — 4 parallel job groups: domain/network, protocol/security, container ops, API proxy (the gate exists but is currently failing; see gap 1 below)
✅ Chroot Integration Tests (test-chroot.yml) — language support, package managers, procfs, edge cases
✅ Examples Test (test-examples.yml) — validates example shell scripts run end-to-end
✅ Test Setup Action (test-action.yml) — validates the GitHub Action itself installs correctly

Security

✅ CodeQL (codeql.yml) — static analysis for JS/TS and GitHub Actions YAML
✅ Container Vulnerability Scan (container-scan.yml) — Trivy scans both squid and agent images, uploads SARIF
✅ Dependency Audit (dependency-audit.yml) — npm audit for main and docs-site packages, SARIF upload
✅ AI Security Guard (security-guard.md) — Claude-based review flags changes that weaken firewall security posture

Smoke / E2E

✅ Smoke Tests for Claude, Codex, Copilot, and Chroot modes — end-to-end runs using local container builds

🔍 Identified Gaps

🔴 High Priority

1. Integration Tests suite is currently failing
The Integration Tests workflow (test-integration-suite.yml) is actively failing in recent runs. This is the most critical gap: PRs may be merging while the primary integration suite is broken, reducing confidence in every merge.

Likely root cause: Docker network conflicts or container build issues (the "Pool overlaps" class of errors documented in AGENTS.md)
Risk: Functional regressions in domain blocking, protocol filtering, or API proxy behavior go undetected

2. Unit test coverage thresholds are critically low
Coverage thresholds are set at only 30–38%, and two of the most critical production files are nearly untested:

File	Coverage	Risk
`cli.ts` (entry point)	0%	High
`docker-manager.ts` (core orchestration)	18%	High
`host-iptables.ts` (security enforcement)	83%	Medium

With cli.ts at 0% coverage, signal/error handling paths, cleanup logic, and CLI argument parsing are completely unvalidated by unit tests.
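As a rough illustration of what a per-file coverage floor would catch, the sketch below compares the line-coverage figures from the table above against minimums. The function name, the data shapes, and the 80% floor for host-iptables.ts are assumptions made for this example; the 50% floors come from the recommendation later in this report. In practice Jest's `coverageThreshold` option accepts per-path entries alongside `global`, which would likely be the simpler route.

```typescript
// Hypothetical helper: not part of the repository, just a sketch of a per-file gate.
interface Shortfall {
  file: string;
  actual: number;   // measured line coverage, in percent
  required: number; // minimum line coverage, in percent
}

// Return every file whose measured line coverage is below its minimum.
// A file missing from the measurements is treated as 0% covered.
function findLineCoverageShortfalls(
  measured: Record<string, number>,
  minimums: Record<string, number>,
): Shortfall[] {
  const shortfalls: Shortfall[] = [];
  for (const [file, required] of Object.entries(minimums)) {
    const actual = measured[file] ?? 0;
    if (actual < required) {
      shortfalls.push({ file, actual, required });
    }
  }
  return shortfalls;
}

// Line coverage as reported in the table above.
const reportedLineCoverage: Record<string, number> = {
  "cli.ts": 0,
  "docker-manager.ts": 18,
  "host-iptables.ts": 83,
};

// Assumed floors: 50% for the two high-risk files, 80% for host-iptables.ts.
const assumedMinimums: Record<string, number> = {
  "cli.ts": 50,
  "docker-manager.ts": 50,
  "host-iptables.ts": 80,
};

for (const s of findLineCoverageShortfalls(reportedLineCoverage, assumedMinimums)) {
  console.log(`${s.file}: ${s.actual}% < required ${s.required}%`);
}
```

With these inputs only cli.ts and docker-manager.ts are reported; host-iptables.ts clears its assumed floor.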

3. Container scan does not run on code-only changes
container-scan.yml triggers only when files under containers/** change (its paths: filter). If a code change in src/ modifies how containers are configured or introduces a misconfiguration, Trivy never runs. Additionally, when base images receive new CVEs between release cycles, nothing detects them until the next commit that touches containers/.
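The effect of the path filter can be shown with a deliberately simplified matcher. This is only a stand-in that understands `dir/**` prefixes and exact file names, not GitHub's actual glob implementation, and the changed file `src/docker-manager.ts` is an assumed path used for illustration.

```typescript
// Simplified stand-in for a workflow `paths:` filter.
// Supports only "dir/**" prefix patterns and exact file names (an assumption of this sketch).
function matchesPathFilter(changedFiles: string[], patterns: string[]): boolean {
  return changedFiles.some((file) =>
    patterns.some((pattern) => {
      if (pattern.endsWith("/**")) {
        // "containers/**" becomes the prefix "containers/"
        return file.startsWith(pattern.slice(0, -2));
      }
      return file === pattern;
    }),
  );
}

const scanFilter = ["containers/**"];

// A change confined to src/ does not match, so the scan would be skipped.
console.log(matchesPathFilter(["src/docker-manager.ts"], scanFilter));

// A change under containers/ matches, so the scan would run.
console.log(matchesPathFilter(["containers/agent/Dockerfile"], scanFilter));
```

A scheduled trigger sidesteps this matching entirely, which is why it also covers the case of new CVEs landing in unchanged base images.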

4. Shell scripts have no linting
The repository contains security-critical shell scripts (containers/agent/setup-iptables.sh, containers/agent/entrypoint.sh, scripts/ci/cleanup.sh, etc.) that are not linted with shellcheck. Shell script bugs in setup-iptables.sh could silently break firewall rules without any CI signal.

🟡 Medium Priority

5. Smoke tests are opt-in via emoji reaction on PRs
The main smoke tests (smoke-claude.md, smoke-codex.md, smoke-copilot.md) require specific emoji reactions (👁️, ❤️, 🎉) to trigger on PRs. Without the reaction, these tests only run on a 12-hour schedule. PRs that change core container/proxy logic can therefore merge without any smoke test having run against them.

6. No secret scanning on PRs
While hourly secret-digger workflows run on the main branch, there is no pre-merge secret scanning (e.g., gitleaks or trufflehog) on pull requests. A credential accidentally committed in a PR would only be caught after merging to main.

7. No performance/startup regression testing
AWF starts Docker containers and configures iptables rules as part of every invocation. There are no benchmarks or timing guards to detect startup time regressions. A slow container start could degrade UX for all users of the tool without any CI signal.
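One possible shape for a timing guard is sketched below: collect several startup timings, take the median to dampen outliers from noisy CI runners, and flag a regression when it exceeds a baseline by more than a tolerance. The function names, the 4000 ms baseline, and the 25% tolerance are all hypothetical; the report does not state any measured startup time.

```typescript
// Median of a non-empty list of samples.
function median(values: number[]): number {
  if (values.length === 0) {
    throw new Error("median requires at least one sample");
  }
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

interface StartupCheck {
  medianMs: number;
  limitMs: number;
  regressed: boolean;
}

// Flag a regression when the median startup time exceeds baseline * (1 + tolerance).
function checkStartupBudget(
  samplesMs: number[],
  baselineMs: number,
  tolerance: number,
): StartupCheck {
  const medianMs = median(samplesMs);
  const limitMs = baselineMs * (1 + tolerance);
  return { medianMs, limitMs, regressed: medianMs > limitMs };
}

// Hypothetical numbers: one slow outlier does not trip the guard, a consistent slowdown does.
console.log(checkStartupBudget([4000, 4200, 9000], 4000, 0.25));
console.log(checkStartupBudget([5900, 6000, 6100], 4000, 0.25));
```

Using the median rather than the mean is a design choice for shared runners, where a single slow container pull would otherwise cause false alarms.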

8. test-integration.yml filename/name mismatch
The file test-integration.yml contains a workflow named TypeScript Type Check, while test-integration-suite.yml contains Integration Tests. This naming inconsistency makes CI status confusing and may cause required status checks to be misconfigured if branch protection rules reference workflow names rather than file names.

9. No mutation testing
Current coverage metrics measure line execution but not test quality. With only 38% line coverage and low thresholds, mutation testing (e.g., Stryker) would reveal whether the existing tests actually assert meaningful behavior or just execute code paths.

10. Build-test agentic workflows use external test repos
The build-test-*.md workflows clone external test repositories (e.g., Mossaka/gh-aw-firewall-test-node) via AI agents. If those repos become unavailable or change, CI silently degrades. There's no fallback or local fixture option.

🟢 Low Priority

11. Documentation build not verified on PRs
deploy-docs.yml only builds the Astro docs site on pushes to main with docs-site/** changes. There is no preview build or broken-link check for documentation PRs, so broken docs only surface after merging.

12. No Node.js version compatibility matrix for integration tests
Unit tests run on Node 20 and 22 (matrix in build.yml), but integration tests only run on Node 22. This means a Node 20 regression in container orchestration code would not be caught.

13. No license compliance check
No automated license scanning (e.g., license-checker) verifies that new dependencies comply with the project's MIT license. A copyleft dependency introduced via a PR would not be detected.
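The kind of check involved can be sketched as a filter over a package-to-license map. The package names, the prefix list, and the decision to treat LGPL as disallowed are assumptions for illustration; a real scanner would also need to handle compound SPDX expressions such as `(MIT OR GPL-3.0-only)`, which this sketch does not.

```typescript
// Assumed deny-list of license identifier prefixes (case-insensitive).
const COPYLEFT_PREFIXES = ["GPL", "AGPL", "LGPL"];

// Return the names of packages whose license identifier starts with a denied prefix.
function findCopyleftPackages(licenses: Record<string, string>): string[] {
  return Object.entries(licenses)
    .filter(([, license]) =>
      COPYLEFT_PREFIXES.some((prefix) =>
        license.toUpperCase().startsWith(prefix),
      ),
    )
    .map(([name]) => name)
    .sort();
}

// Hypothetical dependency set; none of these names come from the repository.
const exampleLicenses: Record<string, string> = {
  "example-http-client": "MIT",
  "example-yaml-parser": "Apache-2.0",
  "example-diff-tool": "GPL-3.0-only",
  "example-net-lib": "AGPL-3.0-or-later",
};

console.log(findCopyleftPackages(exampleLicenses));
```

A CI step would fail the build whenever the returned list is non-empty.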

14. Coverage upload to external service missing
Coverage reports are generated and uploaded as GitHub Actions artifacts, but not uploaded to Codecov, Coveralls, or similar. This means there's no coverage badge, no historical trend tracking, and no PR-level coverage diff visible in external tooling.

15. No Dockerfile linting
containers/agent/Dockerfile and containers/squid/Dockerfile are not linted with hadolint. Best-practice violations (e.g., apt-get without --no-install-recommends, ADD instead of COPY) go undetected.

📋 Actionable Recommendations

Gap	Recommended Solution	Complexity	Impact
Integration tests failing	Investigate root cause; add pre-run cleanup to `test-integration-suite.yml` matching the pattern in `test-chroot.yml`	Low	🔴 Critical
Shell script linting	Add `shellcheck` step to `build.yml` or a new `lint-scripts.yml` workflow	Low	🔴 High
Coverage thresholds too low	Raise thresholds incrementally: `lines: 50`, `functions: 50`, `branches: 40` as a first step; require `cli.ts` and `docker-manager.ts` coverage > 50%	Medium	🔴 High
Container scan on code changes	Remove `paths:` filter from `container-scan.yml` or add a scheduled weekly scan without path filtering	Low	🟡 Medium
Secret scanning on PRs	Add `gitleaks/gitleaks-action` to a PR-gated workflow	Low	🟡 Medium
Smoke tests not mandatory	Add a non-reaction smoke test that runs on `src/` and `containers/` path changes	Medium	🟡 Medium
Filename/name mismatch	Rename `test-integration.yml` to `type-check.yml` to match its `name: TypeScript Type Check`	Low	🟡 Medium
No Dockerfile linting	Add `hadolint` to `build.yml` or a new step scanning `containers/*/Dockerfile`	Low	🟢 Low
No docs build check	Add a PR-triggered `docs-site` build step (without deploy)	Low	🟢 Low
No license compliance	Add `npx license-checker --failOn GPL` to `dependency-audit.yml`	Low	🟢 Low

📈 Metrics Summary

Metric	Value
Total workflows	56
Workflows triggered on PRs	~18
Agentic workflows (AI-driven)	28
Recent workflow success rate (non-agentic)	~94% (1 of 18 failing)
Unit test coverage (lines)	38.31% (threshold: 38%)
Unit test coverage (branches)	31.78% (threshold: 30%)
Integration test groups	8 (4 in integration suite, 4 in chroot suite)
Integration test files	27 test files
Security scan tools	4 (CodeQL, Trivy, npm audit, AI Security Guard)

Critical stat: cli.ts has 0% unit coverage and docker-manager.ts has 18%. These two files implement all container lifecycle, signal handling, and cleanup logic.

AI generated by CI/CD Pipelines and Integration Tests Gap Assessment

expires on Mar 10, 2026, 10:24 PM UTC

2026-03-04T01:02:02Z

github-actions[bot]
bot Mar 4, 2026
Author

🔮 The ancient spirits stir; the smoke test agent was here, and the omens are noted.

🔮 The oracle has spoken through Smoke Codex
