CI/CD Pipelines

All automation for this repository lives in .github/workflows. There are ten workflow files: three that deploy, one PR gate, one teardown job, one DNS-cleanup job (currently disabled), one no-op deployment-freeze placeholder, one docs publisher, and two Claude-bot integrations. The deployment workflows are thin GitHub Actions wrappers around the PowerShell scripts in .scripts/; the heavy lifting (azd, slot swaps, DB-migration handshake, DNS/TLS) happens there, not in YAML.

What CI/CD does here in one sentence

Every push to main bumps the version and rolls the change through both Azure paths — an ephemeral Aspire / Container Apps environment and the App Service dev slots — while a GitHub release drives a gated staging → production slot swap; pull requests are gated by a build/test/format job.

For the deployment architecture these pipelines drive (the "two parallel Azure paths", the Aspire app graph, the App Service slots), read the Deployment Overview first. This page is the per-workflow reference: triggers, jobs, the secrets/inputs each one consumes, and how they relate.

Workflow catalogue

| File | Name | Trigger | Purpose |
| --- | --- | --- | --- |
| azure-dev.yml | Deploy Aspire Environment | push main, manual | azd up → ephemeral Container Apps env + DNS/TLS |
| azure-dev-app-services.yml | Deploy Dev App Services | push main, manual | version bump + publish + App Service dev slots + swap |
| release-deploy-app-services.yml | Deploy Release App Services | GitHub release published | release version + stage Aspire + prod-staging + approval + prod swap |
| pr-validation.yml | PR Validation | PR → main/develop | .NET build/test/format + frontend build/format gate |
| teardown.yml | Teardown Environments | daily cron 0 2 * * *, manual | delete aged rg-ABP-* resource groups by tag |
| cleanup-dns.yml.inactive | Daily DNS Cleanup | disabled (.inactive suffix) | would prune dangling CNAMEs daily |
| deployment-freeze.yml | Deployment Freeze | manual | placeholder no-op (does nothing yet) |
| docs.yml | Deploy Documentation | push main (docs/**), manual | mkdocs build --strict → GitHub Pages |
| claude.yml | Claude Code | @claude mentions on issues/PRs | interactive Claude bot |
| claude-code-review.yml | Claude Code Review | PR opened/synchronize | automated /code-review on PRs |

How they relate

flowchart TD
    PR["Pull request → main / develop"] --> PRV["pr-validation.yml<br/>build · test · csharpier · prettier"]
    PR --> CR["claude-code-review.yml<br/>/code-review"]

    Push["Push to main"] --> AD["azure-dev.yml<br/>azd up → rg-ABP-main (ACA)"]
    Push --> ADAS["azure-dev-app-services.yml"]
    Push -. "docs/** only" .-> Docs["docs.yml → GitHub Pages"]

    subgraph ADAS_jobs["azure-dev-app-services.yml"]
        direction TB
        U["update: UpdateVersion.ps1 -incVersion build"] --> D["deploy: PublishApp → HubTest → HubProd → swap → HubTest"]
    end

    Rel["GitHub release published"] --> RUP["update: UpdateVersion.ps1 -version tag"]
    RUP --> RA["deployAspire → stage.cargonerds.dev"]
    RUP --> RAS["deployAppService → Prod-Staging slot"]
    RAS --> AP["approve: environment 'production' gate"]
    AP --> SW["swap: SwapAppServiceSlots → production"]

    Cron["Daily 02:00 UTC"] --> TD["teardown.yml<br/>delete aged rg-ABP-*"]

    Comment["@claude on issue/PR"] --> CL["claude.yml"]

Shared conventions

A few things are wired identically across the deploying workflows; they are described once here and not repeated per workflow.

OIDC Azure login

Every Azure-touching job authenticates via federated OIDC — no stored Azure password is used for the az CLI. This requires the job-level permission id-token: write and the azure/login@v2 action:

OIDC login block (repeated in every deploy/teardown job)
permissions:
  contents: read
  id-token: write

# …
      - name: Azure Login (OIDC)
        uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

The Aspire workflows additionally run azd auth login with secrets.AZURE_CLIENT_SECRET because azd (unlike the az CLI) does not consume the OIDC token from azure/login.

.NET 10 + ABP CLI toolchain

The deploy jobs set up .NET 10 and the ABP CLI (Volo.Abp.Cli) before building. abp install-libs restores the client-side libraries ABP needs at build time; the AppHost itself uses ABP's .NET Aspire integration:

Toolchain setup (Aspire / App Service deploy jobs)
      - name: Setup .NET
        uses: actions/setup-dotnet@v5
        with:
          dotnet-version: |
            10.x.x
      - name: Install ABP CLI
        run: |
          dotnet tool install -g Volo.Abp.Cli
          abp install-libs
        shell: bash

azure-dev.yml and azure-dev-app-services.yml also run dotnet workload restore (Aspire workloads). PR validation uses a slimmer setup (dotnet-version: '10.0.x', actions/setup-dotnet@v4) and does not install the ABP CLI.

GitHub environments and secrets

Almost every deploy/teardown job declares environment: name: stage, and the release approval gate (the approve job) declares environment: name: production. These are GitHub deployment environments, not azd environments and not the Aspire "Spark environment"; environment-scoped secrets and required reviewers attach to them. A comment in the release workflow calls this out explicitly:

    environment:
      name: stage #env name for github env not the azd env

| Secret | Used by | What for |
| --- | --- | --- |
| AZURE_CLIENT_ID / AZURE_TENANT_ID / AZURE_SUBSCRIPTION_ID | all deploy/teardown jobs | OIDC azure/login |
| AZURE_CLIENT_SECRET | Aspire jobs only | azd auth login |
| GIT_ADMIN_TOKEN | update jobs (version bump) | checkout with write creds so the bump commit can be pushed |
| GITHUB_TOKEN | docs.yml, claude*.yml | release/PR API reads (docs), bot identity |
| CLAUDE_CODE_OAUTH_TOKEN | claude.yml, claude-code-review.yml | authenticate the Claude bot |

Where the values come from

These are GitHub repository/environment secrets, not Azure Key Vault entries and not appsettings keys. Runtime application configuration (connection strings, OpenIddict, etc.) is a separate concern — see Configuration Reference and appsettings.

Deploy workflows

azure-dev.yml — Deploy Aspire Environment

Deploys the Aspire app graph to an ephemeral per-branch Azure Container Apps environment. See Azure Container Apps for the target details.

Triggers

on:
  push:
    branches: [ main ]
  workflow_dispatch:
    inputs:
      environment_name:   # optional override (defaults to branch name)
      spark_environment:  # choice: Default | Test (default Default)

Jobs (run in sequence via needs)

  1. resolve-environment (ubuntu) — derives the env name from the branch (or the manual environment_name override) by lowercasing and replacing every non-alphanumeric run with a single hyphen. It also computes two flags later consumed by teardown tagging:
    • PROTECTED_BRANCHES="main|production|staging" — a protected branch name cannot be supplied as a manual override (the step exit 1s), and protected branches are never auto-torn-down.
    • Outputs env_name, is_protected_branch, is_manual_deployment.
  2. deploy (ubuntu, env stage) — installs azd + .NET 10 + ABP CLI, OIDC login, azd auth login, then runs the wrapper script:
          - name: Deploy Aspire
            working-directory: ./.scripts
            run: >
              ./DeployToAzureContainerApps.ps1
              -EnvironmentName '${{ needs.resolve-environment.outputs.env_name }}'
              -SkipEnvSetup $true
              -SparkEnvironment '${{ inputs.spark_environment || 'Default' }}'
    
  3. configure (ubuntu, env stage) — runs SetupDnsAndCertificates.ps1 against rg-ABP-<env>, then tags the resource group with deployment metadata. autoTeardownEnabled is set to true only when the deploy is a manual run of a non-protected branch:
    AUTO_TEARDOWN="false"
    if [ "…is_manual_deployment" == "true" ] && [ "…is_protected_branch" == "false" ]; then
      AUTO_TEARDOWN="true"
    fi
    az group update --name "rg-ABP-<env>" --tags \
      deployedAt= branch= isProtectedBranch= isManualDeployment= \
      autoTeardownEnabled="$AUTO_TEARDOWN" commitSha= workflowRunId=

These RG tags are the contract read by teardown.yml (see below) to decide what may be deleted.

Resource-group naming

The script prefixes the azd env with ABP-, so EnvironmentName main becomes azd env ABP-main, resource group rg-ABP-main, and domain main.cargonerds.dev. Region for new envs is germanywestcentral.
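The naming rules described for resolve-environment and for the deploy script can be sketched in a few lines of shell. This is a minimal illustration, not the workflow's actual code: the helper names (normalize_env_name, is_protected_branch, resolve_env_name, derived_names) are hypothetical, and checking the override before normalising it is an assumption.

```shell
# Illustrative sketch only; helper names are hypothetical, not taken from the workflow.
PROTECTED_BRANCHES="main|production|staging"

# Lowercase the input and collapse each run of non-alphanumeric
# characters into a single hyphen.
normalize_env_name() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | sed -E 's/[^a-z0-9]+/-/g'
}

# Succeeds when the whole name equals one of the protected branch names.
is_protected_branch() {
  printf '%s\n' "$1" | grep -Eqx "$PROTECTED_BRANCHES"
}

# $1 = branch name, $2 = optional manual override.
# The body runs in a subshell so that "exit 1" only fails this function.
resolve_env_name() (
  if [ -n "$2" ] && is_protected_branch "$2"; then
    echo "error: protected branch '$2' cannot be used as an override" >&2
    exit 1
  fi
  normalize_env_name "${2:-$1}"
)

# Print the azd environment, resource group, and domain for an env name.
derived_names() {
  echo "azd_env=ABP-$1"
  echo "resource_group=rg-ABP-$1"
  echo "domain=$1.cargonerds.dev"
}
```

Under these assumptions, resolve_env_name 'feature/x' 'Demo Env' prints demo-env, and derived_names main prints the ABP-main, rg-ABP-main, and main.cargonerds.dev values quoted above.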

AZD_UP_CONCURRENCY=1 is mandatory

DeployToAzureContainerApps.ps1 sets $env:AZD_UP_CONCURRENCY = "1". Without it, parallel dotnet publish runs collide — a regression since azure-dev-cli 1.25.0. The script comment documents this; do not remove it.

azure-dev-app-services.yml — Deploy Dev App Services

The primary dev App Service pipeline. See Azure App Service for the slot model.

Triggers: push to main, or manual workflow_dispatch.

Jobs

  1. update (windows, env stage, permissions: contents: write) — bumps the 4th version part of common.props and pushes the bump commit, then verifies the push landed on origin:

          - name: Update version
            run: powershell -ExecutionPolicy Bypass -File '.scripts/UpdateVersion.ps1' -incVersion build -commit
    
    Checkout uses token: ${{ secrets.GIT_ADMIN_TOKEN }} with persist-credentials: true so the script can push. The job exposes the pushed SHA as output ref (a retry loop polls git ls-remote origin until the tip matches before continuing).

    Self-bump rerun guard

    The bump push to main would otherwise re-trigger this same workflow. The update job is guarded so it skips its own commit:

    if: >-
      github.event_name != 'push' ||
      !startsWith(github.event.head_commit.message, 'chore: bump version to ')
    
    UpdateVersion.ps1 always commits with the message chore: bump version to <v> (common.props), which is exactly what this prefix match suppresses.

  2. deploy (ubuntu, env stage, needs: update) — checks out the exact needs.update.outputs.ref, sets up the toolchain, logs in via OIDC, then runs five script steps in order:

      - name: Publish Code
        run: ./PublishApp.ps1
      - name: Deploy To Dev-HubTest
        run: ./DeployToAppServices.ps1 -AzureEnvironment Dev-HubTest
      - name: Deploy To Dev-HubProd
        run: ./DeployToAppServices.ps1 -AzureEnvironment Dev-HubProd
      - name: Swap Environments for HubTest to production slot
        run: ./SwapAppServiceSlots.ps1 -AzureEnvironment Dev-HubTest
      - name: Deploy To Dev-HubTest after swap with production slot
        run: ./DeployToAppServices.ps1 -AzureEnvironment Dev-HubTest -SkipDbMigration

So the flow is: publish all service zips → deploy to HubTest (with DB migration) → deploy to HubProd (with DB migration) → swap HubTest into the production slot → redeploy HubTest.

Why HubTest is deployed twice

The slot swap moves HubTest's content into the live production slot, leaving the test slot holding the old production bits. The final step refills the test slot with the new build, this time with -SkipDbMigration (the migration already ran on the first HubTest deploy, so re-running it is unnecessary).
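The self-bump guard on the update job can be reproduced locally to check which commit messages it would skip. The sketch below is an approximation in shell: should_run_update is a hypothetical helper, and the match here is case-sensitive, whereas GitHub's startsWith expression ignores case.

```shell
# Hypothetical local mirror of the update job's "if" guard.
# $1 = event name (e.g. push, workflow_dispatch), $2 = head commit message.
# Returns 0 when the update job would run, 1 when it would be skipped.
should_run_update() {
  # Any trigger other than push always runs.
  [ "$1" != "push" ] && return 0
  case "$2" in
    "chore: bump version to "*) return 1 ;;  # the workflow's own bump commit
    *) return 0 ;;
  esac
}
```

For example, should_run_update push 'chore: bump version to 1.2.3.4 (common.props)' returns 1 (the version shown is made up), while the same message on a workflow_dispatch run returns 0.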

release-deploy-app-services.yml — Deploy Release App Services

The production release pipeline. Triggered when a GitHub release is published (on: release: types: [published]).

Jobs

| Job | Env | needs | Action |
| --- | --- | --- | --- |
| update | stage | | UpdateVersion.ps1 -version ${{ github.event.release.tag_name }} -commit (sets the exact release version) |
| deployAspire | stage | update | DeployToAzureContainerApps.ps1 -EnvironmentName 'stage' -SkipEnvSetup $true -SparkEnvironment 'Test' → stage.cargonerds.dev |
| deployAppService | stage | update | PublishApp.ps1 then DeployToAppServices.ps1 -AzureEnvironment Prod-Staging (the staging slot of prod) |
| approve | production | deployAppService | manual gate: the job body just echoes; the production environment's required reviewers block it |
| swap | stage | approve | SwapAppServiceSlots.ps1 -AzureEnvironment Prod-Staging (staging → production) |

deployAspire and deployAppService both check out ${{ github.event.release.target_commitish }} (the branch or commit the release was created from). The release flow is therefore: set version → deploy to stage Aspire + prod-staging slot in parallel → wait for human approval → swap staging into production.

Rollback

SwapAppServiceSlots.ps1 performs a warm-up swap --action preview, health-polls each service, then swap --action swap. If a preview gets stuck the script prints the swap --action reset commands to cancel it — that is the documented rollback path. See Azure App Service.
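The preview, health-check, swap sequence can be sketched with the az CLI directly. This is not the content of SwapAppServiceSlots.ps1: the slot names (staging, production), the injected health-check command, and the automatic reset on a failed check are all assumptions made for illustration. Setting AZ=echo prints the commands instead of running them.

```shell
# Dry-run friendly sketch; slot names and the health-check hook are assumptions.
AZ="${AZ:-az}"   # set AZ=echo to print the commands instead of running them

# $1 = resource group, $2 = app name, $3 = action: preview | swap | reset
slot_swap() {
  $AZ webapp deployment slot swap --resource-group "$1" --name "$2" \
    --slot staging --target-slot production --action "$3"
}

# $1 = resource group, $2 = app name, $3 = command that exits 0 when healthy
swap_with_preview() {
  # Phase 1: apply production settings to the staging slot and warm it up.
  slot_swap "$1" "$2" preview || return 1
  # Phase 2: only complete the swap if the warmed-up slot looks healthy.
  if ! $3; then
    slot_swap "$1" "$2" reset   # cancel the pending preview
    return 1
  fi
  slot_swap "$1" "$2" swap
}
```

With AZ=echo, swap_with_preview rg-demo app-demo true prints a preview command followed by a swap command; replacing true with false prints preview followed by reset and returns non-zero. The resource names here are placeholders.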

flowchart LR
    R(["release published"]) --> U["update<br/>set version = tag"]
    U --> DA["deployAspire<br/>stage.cargonerds.dev"]
    U --> DS["deployAppService<br/>Prod-Staging slot"]
    DS --> AP{{"approve<br/>environment: production"}}
    AP -->|reviewer approves| SW["swap<br/>staging → production"]

PR validation

pr-validation.yml — PR Validation

The merge gate. Runs on pull_request targeting main or develop. Two independent jobs run in parallel:

  • build-dotnet (ubuntu) — dotnet restore Cargonerds.All.sln → dotnet build … -c Release → dotnet tool restore → dotnet csharpier check . (formatting) → a "build produced no files" worktree check → dotnet test Cargonerds.All.sln -c Release --no-build.
  • build-frontend (ubuntu, Node 22) — npm ci in frontend/ → npm run build in frontend/realtime → npx prettier --check .

Aggregate-then-fail pattern

Every step uses continue-on-error: true and records its outcome; a final if: always() step collects the failures and exits non-zero with a list. This means one PR run reports all problems at once (build and format and tests) instead of stopping at the first failure. The --ignore-exit-code 8 on dotnet test tolerates the "no tests found" exit code so empty test projects do not fail the gate.
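The same aggregate-then-fail idea can be sketched in plain shell, which may help when reproducing the gate locally. The helper names (run_step, report_failures) are hypothetical; the workflow itself relies on continue-on-error and step outcomes rather than on a script like this.

```shell
# Run every step, remember which ones failed, and fail once at the end.
FAILED_STEPS=""

# $1 = step label, remaining arguments = the command to run.
run_step() {
  label="$1"; shift
  if "$@"; then
    echo "ok: $label"
  else
    echo "failed: $label"
    # Append the label, comma-separated, without stopping the run.
    FAILED_STEPS="${FAILED_STEPS}${FAILED_STEPS:+, }$label"
  fi
}

# Equivalent of the final "if: always()" step: non-zero when anything failed.
report_failures() {
  if [ -n "$FAILED_STEPS" ]; then
    echo "Failed steps: $FAILED_STEPS"
    return 1
  fi
  echo "All steps passed"
}
```

A local run might call run_step build dotnet build, run_step format dotnet csharpier check ., and so on, then finish with report_failures so that every failing step is listed in one pass.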

The CSharpier check mirrors local formatting expectations; see AGENTS.md and the Development Workflow page.

Operational workflows

teardown.yml — Teardown Environments

Deletes aged ephemeral environments. Trigger: daily cron 0 2 * * * or manual workflow_dispatch.

Inputs (manual mode): mode (scheduled | manual), environment_name, max_age_days (default 7), dry_run, and confirm_deletion (must equal DELETE for a real manual delete).

The single teardown job (env stage) is entirely inline PowerShell:

  • Scheduled mode lists every rg-ABP-* resource group and reads the tags written by azure-dev.yml:
    • skips the RG unless autoTeardownEnabled == 'true' (protected branches, automatic deployments, and untagged RGs are all kept),
    • then deletes only when age >= max_age_days (default 7 days, computed from the deployedAt tag).
  • Manual mode targets rg-ABP-<environment_name> and refuses unless dry_run is on or confirm_deletion == 'DELETE'.
  • Before deleting, it scans the RG for Key Vaults and, after az group delete, purges the soft-deleted vaults (az keyvault purge) so their names are free for the next deploy.
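The deletion rules above reduce to two small predicates. The sketch below is in shell rather than the workflow's inline PowerShell, the function names are hypothetical, and it takes timestamps as epoch seconds because the format of the deployedAt tag is not shown here; whole-day integer ages are likewise an assumption.

```shell
# $1 = autoTeardownEnabled tag value (empty for untagged resource groups)
# $2 = deployedAt as epoch seconds (empty if the tag is missing)
# $3 = current time as epoch seconds, $4 = max_age_days
should_delete_scheduled() {
  [ "$1" = "true" ] || return 1     # protected, automatic, or untagged: keep
  [ -n "$2" ] || return 1           # no deployment time recorded: keep
  age_days=$(( ($3 - $2) / 86400 )) # whole days since deployment
  [ "$age_days" -ge "$4" ]
}

# $1 = dry_run ("true" or "false"), $2 = confirm_deletion input
manual_delete_allowed() {
  [ "$1" = "true" ] || [ "$2" = "DELETE" ]
}
```

With the default max_age_days of 7, a group tagged autoTeardownEnabled=true and deployed exactly seven days ago qualifies for deletion, while one deployed six days ago, or any group without the tag, is kept.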

Missing custom teardown script

The job probes for ./.scripts/Teardown.ps1 and would prefer it, but that file does not exist in the repo — so the inline az group delete fallback always runs. (The local-developer equivalent is .scripts/TeardownAzureContainerApps.ps1, which is not what this workflow invokes.)

cleanup-dns.yml.inactive — Daily DNS Cleanup (disabled)

A daily (0 2 * * *) job that would run CleanupDnsEntries.ps1 against the cargonerds.dev zone in rg-cargonerds-applications-core-infrastructure to prune dangling CNAMEs.

This workflow does not run

The filename ends in .yml.inactive, so GitHub Actions ignores it (only *.yml / *.yaml are loaded). The DNS-cleanup logic only executes ad hoc or as part of a teardown. To re-enable it, rename the file to cleanup-dns.yml.
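A quick way to see which files in the workflows directory are ignored for this reason is to apply the same extension rule locally. This is a small sketch with hypothetical helper names; it only checks the filename suffix and does not validate the YAML.

```shell
# Succeeds when the filename has an extension GitHub Actions loads.
is_loaded_workflow() {
  case "$1" in
    *.yml|*.yaml) return 0 ;;
    *) return 1 ;;
  esac
}

# $1 = workflows directory; prints one line per file that would be ignored.
list_ignored_workflows() {
  for f in "$1"/*; do
    [ -f "$f" ] || continue
    is_loaded_workflow "$f" || echo "ignored: ${f##*/}"
  done
}
```

Running list_ignored_workflows .github/workflows from the repository root would be expected to report cleanup-dns.yml.inactive as long as the file keeps that suffix.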

deployment-freeze.yml — Deployment Freeze (placeholder)

Manual-only (workflow_dispatch). The single job (named Placehlder — sic) just echoes a message and performs no action:

      - name: "Freeze Deployments"
        run: |
          echo "This workflow is a placeholder to prevent deployments during freeze periods. It does not perform any actions."

Note

Despite holding permissions: actions: write, it does not currently disable or cancel any other workflow. Treat it as a stub.

docs.yml — Deploy Documentation

Builds and publishes this documentation site to GitHub Pages.

Triggers: push to main touching docs/** or .github/workflows/docs.yml, plus manual dispatch. Concurrency group pages (no cancel-in-progress).

Jobs

  1. build (working-directory: docs) — Python 3.12 + pip install -r requirements-docs.txt, then:
          - name: Build documentation
            env:
              GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
            run: mkdocs build --strict
    
    --strict makes any warning (e.g. a broken internal link) fail the build. GITHUB_TOKEN lets gen_releases.py read the repo's releases and merged PRs via the authenticated API (avoiding the anonymous rate limit) when generating the Releases page. The built docs/site is uploaded via actions/upload-pages-artifact.
  2. deploy (needs: build, env github-pages) — actions/deploy-pages@v4 publishes the artifact.

--strict will fail on bad links

Because the build is strict, a relative cross-link to a page that does not exist (or an over-deep ../../../ path that escapes the docs root) breaks the whole publish. Keep cross-links relative to the docs/docs/ tree, and link repo-root files such as common.props or AGENTS.md to their https://github.com/Cargonerds/CargonerdsApp/blob/main/<file> URL rather than with a relative path.

Claude bot workflows

These two are developer-assist integrations via anthropics/claude-code-action@v1 and are not part of any deployment.

  • claude.yml (Claude Code). Fires on issue comments, PR review comments, PR reviews, and issue open/assign, but the job's if condition lets it run only when the body/title contains @claude. It authenticates with secrets.CLAUDE_CODE_OAUTH_TOKEN and is granted actions: read so it can read CI results on PRs.
  • claude-code-review.yml (Claude Code Review). Fires on every PR opened / synchronize / ready_for_review / reopened and runs the code-review plugin automatically:
            plugin_marketplaces: 'https://github.com/anthropics/claude-code.git'
            plugins: 'code-review@claude-code-plugins'
            prompt: '/code-review:code-review ${{ github.repository }}/pull/${{ github.event.pull_request.number }}'
    

Gotchas summary

Things to know before editing these workflows

  • azure-dev-app-services.yml self-triggers. Its own version-bump push to main re-runs the workflow; the startsWith(head_commit.message, 'chore: bump version to ') guard is what prevents an infinite loop. Keep UpdateVersion.ps1's commit message and that guard in sync.
  • HubTest is deployed twice in the dev pipeline — the second deploy is intentional (refills the test slot after the swap) and uses -SkipDbMigration.
  • teardown.yml reads RG tags written by azure-dev.yml. The autoTeardownEnabled/deployedAt/branch tags are a cross-workflow contract; changing the tag names in one place breaks teardown decisions.
  • cleanup-dns.yml.inactive is disabled by filename. Renaming back to .yml re-activates a daily DNS-cleanup run.
  • deployment-freeze.yml is a no-op. It blocks nothing today.
  • teardown.yml references ./.scripts/Teardown.ps1, which does not exist — the inline az group delete path always runs.
  • docs.yml builds with --strict — broken links fail the publish, not just warn.
  • azd needs AZURE_CLIENT_SECRET even though az uses OIDC; the Aspire jobs run azd auth login separately.