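This is the 13th installment of the server operations notes series, focusing on GitLab CI/CD in practice. It runs from basic concepts to advanced configuration, and from single-host deployment to Kubernetes integration, as a hands-on guide whose examples can be adapted directly.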


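1. GitLab CI Basics

1.1 Architecture Overview

The GitLab CI/CD architecture has three core roles:

  • GitLab Server: hosts the code repository, parses .gitlab-ci.yml, and schedules pipeline jobs

  • GitLab Runner: the agent that executes the individual jobs in a pipeline; it can be deployed on any machine

  • Executor: the execution environment inside the Runner (Docker, Shell, Kubernetes, and so on)

Workflow: a developer pushes code → GitLab parses the CI configuration → jobs are scheduled stage by stage → a Runner fetches the code and runs the job → results are reported back to the GitLab UI.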

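1.2 Runner Types

| Type            | Description                                        | Typical use                                   |
|-----------------|----------------------------------------------------|-----------------------------------------------|
| Shared Runner   | Shared globally; available to every project        | Small teams, common resource pools            |
| Group Runner    | Shared within a group; available to its projects   | Department- or team-level isolation           |
| Specific Runner | Bound to a single project                          | Sensitive projects, special environment needs |

Current GitLab documentation calls Shared Runners "instance runners" and Specific Runners "project runners".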

1.3 Core .gitlab-ci.yml Syntax

# Define the pipeline stages (executed in order)
stages:
  - build
  - test
  - deploy

# Define a job
build-app:
  stage: build
  image: node:18-alpine
  script:
    - npm ci
    - npm run build
  artifacts:
    paths:
      - dist/

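Core concepts

  • stages: defines the stage order; jobs in the same stage run in parallel, and stages run one after another

  • jobs: any top-level key that is not a reserved keyword; every job belongs to a stage (test when stage is omitted)

  • script: the list of commands the job actually runs (required)

  • image: the Docker image to use (takes effect with the Docker Executor)

  • tags: labels used to match the job to specific Runners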
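
Of the keywords listed above, tags is the only one the earlier example does not use. The following is a minimal sketch, assuming a Runner that carries the tags docker and linux (hypothetical tag names; substitute whatever tags your Runners were given):

```yaml
stages:
  - test

unit-test:
  stage: test
  image: node:18-alpine    # honored only by executors that run containers
  tags:                    # picked up only by a Runner that has ALL of these tags
    - docker
    - linux
  script:
    - npm ci
    - npm test
```

A job whose tags match no available Runner stays pending, so keep the tag names in sync with the Runner configuration.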


2. Runner Configuration

2.1 Installation and Registration

Install GitLab Runner (Ubuntu/Debian example):

# Add the official repository
curl -L "https://packages.gitlab.com/install/repositories/runner/gitlab-runner/script.deb.sh" | sudo bash

# Install
sudo apt-get install gitlab-runner

# Verify the version
gitlab-runner --version

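Register the Runner. The registration-token flow shown here is deprecated in recent GitLab releases in favor of runner authentication tokens created in the UI and passed with --token, so check which flow your GitLab version expects: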

sudo gitlab-runner register \
  --non-interactive \
  --url "https://gitlab.example.com/" \
  --registration-token "PROJECT_TOKEN_HERE" \
  --executor "docker" \
  --docker-image "alpine:latest" \
  --description "my-docker-runner" \
  --tag-list "docker,linux" \
  --run-untagged="true" \
  --locked="false"

After registration, the configuration file is located at /etc/gitlab-runner/config.toml.

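2.2 Docker Executor Configuration

The Docker Executor is the most commonly used executor. Each job runs in its own container, so jobs are isolated from one another by default.

# /etc/gitlab-runner/config.toml
concurrent = 4          # maximum number of concurrent jobs
check_interval = 3      # polling interval (seconds)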

[[runners]]
  name = "docker-runner"
  url = "https://gitlab.example.com/"
  token = "RUNNER_TOKEN"
  executor = "docker"

  [runners.docker]
    image = "alpine:latest"           # default image
    privileged = false                # privileged mode (required for Docker-in-Docker)
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]              # mount the cache volume
    shm_size = 0                      # shared memory size
    pull_policy = "if-not-present"    # pull policy: always / if-not-present / never
    allowed_images = ["ruby:*", "python:*", "node:*"]  # image allowlist

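Docker-in-Docker (DinD): use this when a CI job needs to build Docker images. The Runner must be configured with privileged = true for the docker:dind service to start.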

build-docker-image:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  variables:
    DOCKER_TLS_CERTDIR: "/certs"
  script:
    - docker build -t registry.example.com/my-app:$CI_COMMIT_SHA .
    - docker push registry.example.com/my-app:$CI_COMMIT_SHA

2.3 Shell Executor Configuration

The Shell Executor runs commands directly on the host. It has the lowest overhead but the weakest isolation.

[[runners]]
  name = "shell-runner"
  url = "https://gitlab.example.com/"
  token = "RUNNER_TOKEN"
  executor = "shell"

  [runners.custom_build_dir]
    enabled = true    # allow custom build directories

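⚠️ With the Shell Executor, all jobs share the same environment, so watch for dependency conflicts. Use it only for specific cases, such as jobs that need access to host hardware or GPUs.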
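
As a rough illustration of the GPU case mentioned above, a job can be pinned to the shell Runner through tags. This is only a sketch: the tags shell and gpu, the requirements.txt file, and the script train_smoke_test.py are hypothetical, and it assumes nvidia-smi and python3 are installed on the host.

```yaml
gpu-smoke-test:
  stage: test
  tags:
    - shell                  # hypothetical tag assigned to the shell Runner
    - gpu                    # hypothetical tag marking hosts with a GPU
  script:
    - nvidia-smi                                   # runs directly on the host; no image applies
    - python3 -m venv .venv                        # per-job virtualenv to limit dependency conflicts
    - .venv/bin/pip install -r requirements.txt
    - .venv/bin/python train_smoke_test.py         # hypothetical test script
```

Creating the virtualenv inside the build directory keeps each job's Python packages separate, which offsets some of the isolation the Shell Executor lacks.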

2.4 Autoscaling Configuration (Docker Machine / Autoscaler)

For large-scale setups, the Runner can autoscale, creating and destroying Runner instances on demand.

[[runners]]
  name = "autoscale-runner"
  url = "https://gitlab.example.com/"
  token = "RUNNER_TOKEN"
  executor = "docker+machine"
  limit = 10                    # maximum number of instances

  [runners.machine]
    IdleCount = 2               # number of idle instances to keep
    IdleTime = 600              # idle timeout (seconds)
    MaxBuilds = 100             # maximum builds per instance
    MachineName = "runner-%s"
    MachineDriver = "amazonec2"
    MachineOptions = [
      "amazonec2-instance-type=t3.medium",
      "amazonec2-region=cn-north-1",
      "amazonec2-vpc-id=vpc-xxx",
      "amazonec2-subnet-id=subnet-xxx",
      "amazonec2-security-group=sg-xxx",
    ]

    [[runners.machine.autoscaling]]
      Periods = ["* * 8-18 * * mon-fri *"]   # working hours
      IdleCount = 5
      IdleTime = 300
      Timezone = "Asia/Shanghai"

    [[runners.machine.autoscaling]]
      Periods = ["* * * * * sat,sun *"]       # weekends
      IdleCount = 0
      IdleTime = 60

3. Pipeline Design

3.1 Stages and Jobs

stages:
  - lint
  - build
  - test
  - security
  - deploy

# Jobs in the same stage run in parallel
lint-js:
  stage: lint
  script:
    - npx eslint . --max-warnings=0

lint-python:
  stage: lint
  script:
    - flake8 src/
    - mypy src/

3.2 Dependencies and Artifact Passing

Use dependencies to control which earlier jobs a job downloads artifacts from, avoiding unnecessary downloads.

build-frontend:
  stage: build
  script:
    - npm ci && npm run build
  artifacts:
    paths:
      - dist/frontend/

build-backend:
  stage: build
  script:
    - go build -o bin/server ./cmd/server
  artifacts:
    paths:
      - bin/

deploy-staging:
  stage: deploy
  dependencies:
    - build-frontend     # download the frontend artifacts
    - build-backend      # download the backend artifacts
  script:
    - scp dist/frontend/* staging:/var/www/html/
    - scp bin/server staging:/opt/app/

3.3 Artifacts Configuration

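test-unit:
  stage: test
  script:
    # Assumes go-junit-report and gocover-cobertura are already installed in the job image
    - go test ./... -coverprofile=coverage.out -v 2>&1 | tee test-output.txt
    - go-junit-report < test-output.txt > junit.xml        # convert go test output to JUnit XML
    - gocover-cobertura < coverage.out > coverage.xml      # convert the Go coverage profile to Cobertura XML
  artifacts:
    when: always              # always / on_success / on_failure
    paths:
      - coverage.out
      - test-output.txt
    reports:
      junit: junit.xml        # parsed and shown as a test report in the merge request
      coverage_report:
        coverage_format: cobertura
        path: coverage.xml
    expire_in: 7 days         # artifact expiration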

3.4 Cache Configuration

variables:
  npm_config_cache: "$CI_PROJECT_DIR/.npm"

cache:
  key:
    files:
      - package-lock.json     # cache key derived from the file contents
  paths:
    - .npm/
    - node_modules/

build:
  stage: build
  script:
    - npm ci --cache .npm
    - npm run build

4. Variable Management

4.1 Predefined Variables

GitLab provides a large set of built-in variables. Commonly used ones include:

# Use them directly in script
deploy-production:
  stage: deploy
  script:
    # Each command is single-quoted: an unquoted ": " would make YAML parse the line as a mapping
    - 'echo "Project: $CI_PROJECT_NAME"'
    - 'echo "Branch: $CI_COMMIT_REF_NAME"'
    - 'echo "Commit: $CI_COMMIT_SHA"'
    - 'echo "Short SHA: $CI_COMMIT_SHORT_SHA"'
    - 'echo "Tag: $CI_COMMIT_TAG"'
    - 'echo "MR source branch: $CI_MERGE_REQUEST_SOURCE_BRANCH_NAME"'
    - 'echo "Pipeline ID: $CI_PIPELINE_ID"'
    - 'echo "Job ID: $CI_JOB_ID"'
    - 'echo "Runner description: $CI_RUNNER_DESCRIPTION"'
    - 'echo "Registry: $CI_REGISTRY"'
    - 'echo "Registry image: $CI_REGISTRY_IMAGE"'

For the full list of predefined variables, see the official GitLab documentation → CI/CD → Predefined variables.

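4.2 Custom Variables

Variables can be defined at three levels. Within .gitlab-ci.yml, job-level variables override pipeline-level (global) ones; variables set in project or group Settings take precedence over both.

# Pipeline-level variables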
variables:
  APP_NAME: "my-awesome-app"
  DEPLOY_ENV: "staging"

build:
  stage: build
  variables:
    BUILD_TYPE: "release"       # job-level variable
  script:
    - echo "Building $APP_NAME ($BUILD_TYPE)"
    - make BUILD_TYPE=$BUILD_TYPE
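
To see how the two YAML-level scopes interact, here is a small sketch in which one job inherits a global variable and another overrides it. The job names are made up for illustration; DEPLOY_ENV is the variable from the example above.

```yaml
variables:
  DEPLOY_ENV: "staging"          # global default for every job

print-default:
  script:
    - echo "$DEPLOY_ENV"         # uses the global value

print-override:
  variables:
    DEPLOY_ENV: "production"     # job-level value replaces the global one for this job only
  script:
    - echo "$DEPLOY_ENV"
```

The override is scoped to print-override; other jobs in the same pipeline keep seeing the global value.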

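4.3 Protected and Masked Variables

Configure these under Settings → CI/CD → Variables:

  • Protected: exposed only to pipelines running on protected branches or tags

  • Masked: automatically hidden in job logs (shown as [MASKED]); the value must meet the masking requirements, for example a single line of at least 8 characters drawn from the Base64 alphabet plus a few permitted symbols

# Use Protected + Masked variables (configured in Settings, never exposed in YAML)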
deploy-production:
  stage: deploy
  script:
    - echo "$PROD_DB_PASSWORD" | docker secret create db_password -  # any occurrence in the log shows as [MASKED]
  only:
    - main                          # run only on the main branch
  environment:
    name: production

4.4 Secrets Management (Vault Integration)

For stricter security requirements, integrate HashiCorp Vault:

# Requires the Vault integration to be configured in the project Settings
deploy-with-vault:
  stage: deploy
  id_tokens:
    VAULT_ID_TOKEN:
      aud: https://vault.example.com
  secrets:
    DB_PASSWORD:
      vault:
        engine: { name: kv-v2, path: secret }
        path: production/db
        field: password
  script:
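    - echo "DB password loaded from Vault"   # never print the actual value
    # By default the secret is written to a temporary file and DB_PASSWORD holds that file's path
    - deploy --db-password "$(cat "$DB_PASSWORD")"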

5. Common Scenarios

5.1 Build → Test → Deploy (Full Pipeline)

stages:
  - build
  - test
  - deploy

variables:
  DOCKER_REGISTRY: "registry.example.com"

# ============ Build ============
build:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  script:
    - docker build -t $DOCKER_REGISTRY/$CI_PROJECT_NAME:$CI_COMMIT_SHA .
    - docker push $DOCKER_REGISTRY/$CI_PROJECT_NAME:$CI_COMMIT_SHA

# ============ Test ============
test-unit:
  stage: test
  image: $DOCKER_REGISTRY/$CI_PROJECT_NAME:$CI_COMMIT_SHA
  script:
    - npm test -- --coverage
  coverage: '/All files\s*\|\s*([\d.]+)/'
  artifacts:
    reports:
      junit: junit.xml
      coverage_report:
        coverage_format: cobertura
        path: coverage.xml

test-integration:
  stage: test
  image: $DOCKER_REGISTRY/$CI_PROJECT_NAME:$CI_COMMIT_SHA
  services:
    - postgres:15
    - redis:7
  variables:
    POSTGRES_DB: test_db
    POSTGRES_USER: test
    POSTGRES_PASSWORD: test
    DATABASE_URL: "postgresql://test:test@postgres:5432/test_db"
    REDIS_URL: "redis://redis:6379"
  script:
    - npm run test:integration

# ============ Deploy ============
deploy-staging:
  stage: deploy
  environment:
    name: staging
    url: https://staging.example.com
  script:
    - kubectl set image deployment/$CI_PROJECT_NAME
        app=$DOCKER_REGISTRY/$CI_PROJECT_NAME:$CI_COMMIT_SHA
  only:
    - develop
  when: manual                  # triggered manually

5.2 Multi-Environment Deployment

.deploy_template: &deploy_template
  stage: deploy
  image:
    name: bitnami/kubectl:latest
    entrypoint: [""]            # clear the image entrypoint so the script commands run in a shell
  script:
    - kubectl config use-context $KUBE_CONTEXT
    - envsubst < k8s/deployment.yml | kubectl apply -f -
    - kubectl rollout status deployment/$CI_PROJECT_NAME -n $K8S_NAMESPACE

deploy-staging:
  <<: *deploy_template
  variables:
    KUBE_CONTEXT: staging
    K8S_NAMESPACE: staging
    REPLICAS: "2"
  environment:
    name: staging
    url: https://staging.example.com
  only:
    - develop

deploy-production:
  <<: *deploy_template
  variables:
    KUBE_CONTEXT: production
    K8S_NAMESPACE: production
    REPLICAS: "5"
  environment:
    name: production
    url: https://www.example.com
  only:
    - main
  when: manual
  allow_failure: false          # cannot be skipped: the pipeline blocks until this job is run

5.3 Parallel Jobs

test-suite:
  stage: test
  parallel: 4                   # start 4 parallel jobs
  script:
    - |
      case $CI_NODE_INDEX in
        1) TEST_PATTERN="tests/unit/models/**" ;;
        2) TEST_PATTERN="tests/unit/services/**" ;;
        3) TEST_PATTERN="tests/unit/controllers/**" ;;
        4) TEST_PATTERN="tests/unit/middleware/**" ;;
      esac
    - pytest $TEST_PATTERN --junitxml=results-$CI_NODE_INDEX.xml
  artifacts:
    reports:
      junit:
        - results-*.xml

5.4 Matrix Builds

build-matrix:
  stage: build
  parallel:
    matrix:
      - GO_VERSION: ["1.21", "1.22"]
        OS: ["linux", "darwin"]
        ARCH: ["amd64", "arm64"]
  image: golang:$GO_VERSION
  script:
    - GOOS=$OS GOARCH=$ARCH go build -o bin/app-$OS-$ARCH ./cmd/app
  artifacts:
    paths:
      - bin/

6. Artifact Management

6.1 Detailed Artifacts Configuration

build:
  stage: build
  script:
    - make build
    - make docs
  artifacts:
    name: "$CI_PROJECT_NAME-$CI_COMMIT_REF_SLUG-$CI_PIPELINE_ID"
    paths:
      - bin/
      - docs/
      - config/production.yml
    exclude:
      - "**/*.test"             # exclude test files
      - "**/*.log"
    when: on_success            # on_success (default) / on_failure / always
    expire_in: 30 days
    access: all                 # all / developer / none

6.2 Expiration Policy

# Short-lived intermediate artifacts
test-results:
  artifacts:
    expire_in: 3 days

# Release artifacts kept permanently
release-package:
  artifacts:
    expire_in: never            # never expires
  only:
    - tags                      # run only for tags

6.3 Controlling Artifact Dependencies

build-frontend:
  stage: build
  script: npm run build
  artifacts:
    paths: [dist/]

build-backend:
  stage: build
  script: go build ./...
  artifacts:
    paths: [bin/]

# Default: each job downloads the artifacts from all jobs in earlier stages
# Explicit control:
deploy:
  stage: deploy
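  dependencies:                 # download artifacts only from the listed jobs
    - build-frontend
  script:
    - ls dist/                  # ✅ frontend artifacts are present
    - ls bin/ 2>/dev/null || true   # ❌ backend artifacts are absent; || true keeps the job from failing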

# Download no artifacts at all
lint:
  stage: test
  dependencies: []              # empty list = download nothing
  script: eslint .

7. Deployment Strategies

7.1 SSH Deployment

deploy-ssh:
  stage: deploy
  before_script:
    - apt-get update && apt-get install -y openssh-client
    - eval $(ssh-agent -s)
    - echo "$SSH_PRIVATE_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh && chmod 700 ~/.ssh
    - ssh-keyscan -H $DEPLOY_HOST >> ~/.ssh/known_hosts
  script:
    - scp -r dist/* deploy@$DEPLOY_HOST:/opt/app/
    - ssh deploy@$DEPLOY_HOST "cd /opt/app && docker compose pull && docker compose up -d"
  environment:
    name: production
    url: https://www.example.com

7.2 Docker Deployment (with Registry)

build-and-push:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  before_script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin $CI_REGISTRY
  script:
    - |
      docker build \
        --cache-from $CI_REGISTRY_IMAGE:latest \
        --tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA \
        --tag $CI_REGISTRY_IMAGE:latest \
        .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
    - docker push $CI_REGISTRY_IMAGE:latest

deploy-docker:
  stage: deploy
  before_script:
    - eval $(ssh-agent -s)
    - echo "$SSH_PRIVATE_KEY" | ssh-add -
  script:
    - |
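      # Unquoted EOF: the CI variables are expanded on the Runner before the commands reach the remote host
      ssh deploy@$DEPLOY_HOST << EOF
        echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"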
        docker pull $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
        docker stop app || true
        docker rm app || true
        docker run -d --name app \
          --restart unless-stopped \
          -p 8080:8080 \
          -e DATABASE_URL="$DATABASE_URL" \
          $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
      EOF

7.3 Kubernetes Deployment

deploy-k8s:
  stage: deploy
  image:
    name: bitnami/kubectl:latest
    entrypoint: [""]
  before_script:
    - echo "$KUBE_CONFIG" | base64 -d > /tmp/kubeconfig
    - export KUBECONFIG=/tmp/kubeconfig
  script:
    # Update the image tag
    - kubectl set image deployment/$CI_PROJECT_NAME
        app=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
        -n $K8S_NAMESPACE
    - kubectl rollout status deployment/$CI_PROJECT_NAME
        -n $K8S_NAMESPACE
        --timeout=300s
  after_script:
    - rm -f /tmp/kubeconfig
  environment:
    name: production
    url: https://www.example.com
    kubernetes:
      namespace: production

Deploying with Helm:

deploy-helm:
  stage: deploy
  image:
    name: alpine/helm:3.14
    entrypoint: [""]
  before_script:
    - echo "$KUBE_CONFIG" | base64 -d > /tmp/kubeconfig
    - export KUBECONFIG=/tmp/kubeconfig
    - helm repo add myrepo https://charts.example.com
    - helm repo update
  script:
    - |
      helm upgrade --install $CI_PROJECT_NAME myrepo/app-chart \
        --namespace $K8S_NAMESPACE \
        --create-namespace \
        --set image.repository=$CI_REGISTRY_IMAGE \
        --set image.tag=$CI_COMMIT_SHA \
        --set ingress.host=$DEPLOY_HOST \
        --values values-production.yaml \
        --wait --timeout 5m

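7.4 Auto DevOps

GitLab's built-in Auto DevOps can run the whole flow from build to deployment with little or no pipeline configuration:

# Option A: a .gitlab-ci.yml in the project root that only includes the template
include:
  - template: Auto-DevOps.gitlab-ci.yml

# Option B (a separate file, not a continuation of Option A): override settings with variables
# Confirm each variable name against the Auto DevOps CI/CD variables reference for your GitLab version
variables:
  AUTO_DEVOPS_DEPLOY_TARGET: kubernetes
  KUBE_NAMESPACE: my-namespace
  HELM_UPGRADE_EXTRA_ARGS: "--set replicas=3"

include:
  - template: Auto-DevOps.gitlab-ci.yml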

8. Cache Optimization

8.1 Global Cache

# Global cache configuration, inherited by all jobs
cache:
  key: "${CI_COMMIT_REF_SLUG}"    # isolate caches per branch
  paths:
    - node_modules/
    - .npm/
    - .cache/
  policy: pull-push                # pull / push / pull-push

# Pull only (never updates the cache; for read-only jobs)
test:
  stage: test
  cache:
    key: "${CI_COMMIT_REF_SLUG}"
    paths:
      - node_modules/
    policy: pull                   # read-only cache
  script:
    - npm test

8.2 Per-Job Cache

build-frontend:
  stage: build
  cache:
    - key: frontend-deps
      paths:
        - frontend/node_modules/
    - key: frontend-build-cache
      paths:
        - frontend/.cache/
  script:
    - cd frontend && npm ci && npm run build

build-backend:
  stage: build
  cache:
    - key: backend-deps
      paths:
        - backend/vendor/
  script:
    - cd backend && go mod download && go build ./...

8.3 Cache Key Strategies

# Strategy 1: derive the cache key from the lock file (recommended)
cache:
  key:
    files:
      - package-lock.json          # file contents change → new cache
  paths:
    - node_modules/

# Strategy 2: branch combined with the lock file
cache:
  key:
    files:
      - package-lock.json
    prefix: $CI_COMMIT_REF_SLUG
  paths:
    - node_modules/

# Strategy 3: fixed key with manual versioning
cache:
  key: "deps-v2"                   # bump the version to invalidate the cache
  paths:
    - node_modules/

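# Ways to clear the cache:
# Option 1: in the GitLab UI, CI/CD → Pipelines → Clear runner caches
# Option 2: change the version number in the cache key
# Option 3: via the API (unverified; confirm this endpoint against the REST API reference for your GitLab version before relying on it)
#   curl --request DELETE --header "PRIVATE-TOKEN: $TOKEN" \
#     "https://gitlab.example.com/api/v4/projects/$PROJECT_ID/clean"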

9. Security Scanning

9.1 SAST (Static Application Security Testing)

include:
  - template: Security/SAST.gitlab-ci.yml

sast:
  variables:
    SAST_EXCLUDED_ANALYZERS: "spotbugs"     # exclude specific analyzers
    SAST_EXCLUDED_PATHS: "test,spec,docs"   # exclude directories

9.2 DAST (Dynamic Application Security Testing)

include:
  - template: Security/DAST.gitlab-ci.yml

dast:
  variables:
    DAST_WEBSITE: "https://staging.example.com"
    DAST_FULL_SCAN_ENABLED: "true"
    DAST_AUTH_URL: "https://staging.example.com/login"
    DAST_USERNAME: "scanner@example.com"
    DAST_PASSWORD: "$DAST_PASSWORD"         # configured in CI/CD Variables

9.3 Dependency Scanning

include:
  - template: Security/Dependency-Scanning.gitlab-ci.yml

dependency_scanning:
  variables:
    DS_EXCLUDED_PATHS: "test,spec,docs"

9.4 Container Scanning

include:
  - template: Security/Container-Scanning.gitlab-ci.yml

container_scanning:
  variables:
    CS_IMAGE: "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA"
    CS_SEVERITY_THRESHOLD: "high"           # report only high severity and above

Complete security scanning pipeline example:

include:
  - template: Security/SAST.gitlab-ci.yml
  - template: Security/Dependency-Scanning.gitlab-ci.yml
  - template: Security/Container-Scanning.gitlab-ci.yml
  - template: Security/Secret-Detection.gitlab-ci.yml

stages:
  - build
  - test
  - security-sast
  - security-container
  - deploy

build:
  stage: build
  script:
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA

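# The security scanning jobs come from the included templates and, by default, run in the test stage
# Scan results are surfaced automatically in the merge request security panel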
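
The template jobs land in the test stage unless told otherwise, so the security-sast and security-container stages above stay empty as written. One way to use them is to redefine the template jobs with only the keys to change. This is a sketch, assuming the job names sast and container_scanning used in sections 9.1 and 9.4:

```yaml
include:
  - template: Security/SAST.gitlab-ci.yml
  - template: Security/Container-Scanning.gitlab-ci.yml

stages:
  - build
  - test
  - security-sast
  - security-container
  - deploy

# Redefine only the keys to change; everything else still comes from the template.
sast:
  stage: security-sast

container_scanning:
  stage: security-container
  variables:
    CS_IMAGE: "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA"   # image pushed by the build job
```

Placing container scanning after build also guarantees the image exists in the registry before the scanner tries to pull it.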

10. Best Practices

10.1 Reusing Configuration with YAML Anchors

# Define reusable job fragments
.docker_login: &docker_login
  before_script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"

.base_deploy: &base_deploy
  stage: deploy
  image: bitnami/kubectl:latest
  <<: *docker_login
  when: manual
  allow_failure: false

# Extend via the anchors
deploy-staging:
  <<: *base_deploy
  variables:
    K8S_NAMESPACE: staging
  environment:
    name: staging
  only:
    - develop

deploy-production:
  <<: *base_deploy
  variables:
    K8S_NAMESPACE: production
  environment:
    name: production
  only:
    - main

10.2 Using extends (Preferred)

.docker_login:
  before_script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"

.base_deploy:
  stage: deploy
  image: bitnami/kubectl:latest
  extends: .docker_login
  when: manual

deploy-staging:
  extends: .base_deploy
  variables:
    K8S_NAMESPACE: staging
  environment:
    name: staging

deploy-production:
  extends: .base_deploy
  variables:
    K8S_NAMESPACE: production
  environment:
    name: production

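💡 extends is more flexible than YAML anchors: it supports multi-level inheritance, works across files pulled in with include, and deep-merges hashes key by key. Lists such as script are not merged; the child's list replaces the parent's. Prefer extends where possible.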
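
One detail worth checking before relying on extends is how it merges: hashes are merged key by key, while lists are replaced wholesale. The sketch below uses made-up job and variable names to show both behaviors:

```yaml
.base:
  variables:
    LOG_LEVEL: "info"
    REGION: "eu"
  script:
    - echo "base step"

child:
  extends: .base
  variables:
    REGION: "us"            # merged key by key with the .base variables
  script:
    - echo "child step"     # replaces the whole .base script list

# Effective definition of child after merging:
#   variables: { LOG_LEVEL: "info", REGION: "us" }
#   script:    [ echo "child step" ]
```

If the child needs the parent's commands as well, move them into before_script or pull them in explicitly, since the script list is not concatenated.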

10.3 Monorepo Strategy

# Use rules:changes to trigger jobs based on the changed file paths
build-frontend:
  stage: build
  rules:
    - changes:
        - frontend/**/*
        - shared/**/*
      when: on_success
  script:
    - cd frontend && npm ci && npm run build

build-backend:
  stage: build
  rules:
    - changes:
        - backend/**/*
        - shared/**/*
      when: on_success
  script:
    - cd backend && go build ./...

deploy-frontend:
  stage: deploy
  rules:
    - changes:
        - frontend/**/*
      when: manual
  script:
    - echo "Deploy frontend"

deploy-backend:
  stage: deploy
  rules:
    - changes:
        - backend/**/*
      when: manual
  script:
    - echo "Deploy backend"

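10.4 Debugging Tips

debug-job:
  stage: test
  image: ubuntu:22.04
  variables:
    CI_DEBUG_TRACE: "true"          # enable debug tracing (shell set -x); this can expose secrets in the log
  script:
    - echo "Debug info:"
    - 'echo "Shell: $0"'            # single-quoted: an unquoted ": " would be parsed as a YAML mapping
    - 'echo "User: $(whoami)"'
    - 'echo "PWD: $(pwd)"'
    - echo "ENV:"
    - env | sort
    - echo "Network:"
    - ip addr show || ifconfig || true   # neither tool is guaranteed to exist in the image
    - echo "Disk:"
    - df -h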

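Other debugging methods:

# 1. Run a job locally (requires gitlab-runner; the exec command is deprecated and absent from recent Runner releases)
gitlab-runner exec docker build-job

# 2. Use CI_DEBUG_SERVICES to view service logs
#    Set CI_DEBUG_SERVICES=true in Variables

# 3. Keep the container after a job fails (Docker Executor)
#    Set cleanup = false under [runners.docker] in config.toml (unverified; confirm the option against the config.toml reference)

# 4. Use after_script to inspect the environment after a failure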
job:
  script:
    - make test
  after_script:
    - ls -la /builds/
    - cat /tmp/*.log 2>/dev/null || true

10.5 Common Pitfalls and Fixes

Pitfall 1: variable values leaking into job logs

# ❌ Wrong: printing the password directly
deploy:
  script:
    - 'echo "Connecting with password: $DB_PASSWORD"'    # visible in the log!
    - mysql -u root -p$DB_PASSWORD

# ✅ Right: use a Masked variable and avoid printing it
deploy:
  script:
    - mysql -u root -p"$DB_PASSWORD" -e "SELECT 1"    # variable set as Masked in Settings

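Pitfall 2: file permission problems with the Docker Executor

# ❌ Files created in a container running as root may be inaccessible to the runner user on the host
build:
  script:
    - touch output.txt

# ✅ Run the container as a specific user (--user), or fix the permissions in script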
build:
  script:
    - touch output.txt
    - chmod 644 output.txt

Pitfall 3: cache not taking effect

# ❌ Different branches share one cache key although their lock files differ
cache:
  key: "deps"

# ✅ Derive the key from the lock file
cache:
  key:
    files:
      - package-lock.json

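Pitfall 4: mixing rules with only/except

# ❌ Do not combine rules with only/except in one job (only/except is legacy, and the combination is rejected as a configuration error)
job:
  only:
    - main
  rules:
    - if: $CI_COMMIT_BRANCH == "main"

# ✅ Use rules consistently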
job:
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
    - if: $CI_COMMIT_TAG
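
Several earlier examples pair only with when: manual. A possible rules-based equivalent is sketched below, with a placeholder deploy command. The explicit allow_failure: true is there because a manual job defined inside rules blocks the pipeline by default, whereas a top-level when: manual does not.

```yaml
# Legacy form being replaced:
#   deploy-staging:
#     only:
#       - develop
#     when: manual

deploy-staging:
  stage: deploy
  script:
    - echo "Deploy to staging"            # placeholder command
  rules:
    - if: $CI_COMMIT_BRANCH == "develop"
      when: manual
      allow_failure: true                 # keep the job optional, as with a top-level when: manual
    # if no rule matches, the job is not added to the pipeline
```

Drop allow_failure: true when the pipeline should wait for the manual job, as in the production deployment examples.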

Pitfall 5: oversized artifacts causing upload failures

# ❌ Packaging all of node_modules (possibly gigabytes)
artifacts:
  paths:
    - node_modules/

# ✅ Package only the build output; manage dependencies with cache
artifacts:
  paths:
    - dist/
  expire_in: 7 days

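Appendix: Complete Project Configuration Example

Below is a complete .gitlab-ci.yml template covering most of the features discussed in this article:

# ============================================
# .gitlab-ci.yml - complete project template
# ============================================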
# ============================================

stages:
  - prepare
  - build
  - test
  - security
  - deploy

variables:
  DOCKER_REGISTRY: "$CI_REGISTRY"
  DOCKER_IMAGE: "$CI_REGISTRY_IMAGE"
  K8S_NAMESPACE: "default"

# ---- Global cache ----
default:
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
      - .npm/
  image: node:18-alpine

# ---- Reusable templates ----
.docker_auth: &docker_auth
  before_script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin $CI_REGISTRY

.kubectl_base: &kubectl_base
  image:
    name: bitnami/kubectl:latest
    entrypoint: [""]
  before_script:
    - echo "$KUBE_CONFIG" | base64 -d > /tmp/kubeconfig
    - export KUBECONFIG=/tmp/kubeconfig

# ============ Prepare Stage ============
install-deps:
  stage: prepare
  script:
    - npm ci --cache .npm
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
      - .npm/
    policy: pull-push

# ============ Build Stage ============
build-app:
  stage: build
  script:
    - npm run build
  artifacts:
    paths:
      - dist/
    expire_in: 7 days

build-docker:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  <<: *docker_auth
  script:
    - docker build --cache-from $DOCKER_IMAGE:latest -t $DOCKER_IMAGE:$CI_COMMIT_SHA -t $DOCKER_IMAGE:latest .
    - docker push $DOCKER_IMAGE:$CI_COMMIT_SHA
    - docker push $DOCKER_IMAGE:latest

# ============ Test Stage ============
test-unit:
  stage: test
  script:
    - npm run test:unit -- --coverage
  coverage: '/Lines\s*:\s*([\d.]+)/'
  artifacts:
    reports:
      junit: junit.xml
      coverage_report:
        coverage_format: cobertura
        path: coverage.xml

test-integration:
  stage: test
  services:
    - postgres:15
    - redis:7
  variables:
    POSTGRES_DB: test
    POSTGRES_USER: test
    POSTGRES_PASSWORD: test
    DATABASE_URL: "postgresql://test:test@postgres:5432/test"
  script:
    - npm run test:integration

# ============ Security Stage ============
include:
  - template: Security/SAST.gitlab-ci.yml
  - template: Security/Dependency-Scanning.gitlab-ci.yml
  - template: Security/Secret-Detection.gitlab-ci.yml

# ============ Deploy Stage ============
deploy-staging:
  <<: *kubectl_base
  stage: deploy
  script:
    - kubectl set image deployment/app app=$DOCKER_IMAGE:$CI_COMMIT_SHA -n staging
    - kubectl rollout status deployment/app -n staging --timeout=300s
  environment:
    name: staging
    url: https://staging.example.com
  rules:
    - if: $CI_COMMIT_BRANCH == "develop"

deploy-production:
  <<: *kubectl_base
  stage: deploy
  script:
    - kubectl set image deployment/app app=$DOCKER_IMAGE:$CI_COMMIT_SHA -n production
    - kubectl rollout status deployment/app -n production --timeout=300s
  environment:
    name: production
    url: https://www.example.com
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual
  allow_failure: false

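📝 Closing note: GitLab CI/CD is a capable continuous integration and continuous deployment platform. This article covers the configurations and scenarios used most often in day-to-day operations, so keep it handy as a quick reference. In real projects, choose the Runner architecture and pipeline design that fit your team size and business needs, and avoid over-engineering. Start simple and iterate from there; that is the sound way to practice DevOps.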