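This is part 16 of the server operations series. It covers CI/CD pipeline architecture, multi-environment deployment, release patterns (blue-green, canary, rolling update), rollback, artifact management, database migration, and security and compliance. Each section includes configuration templates that can be adapted to your own project.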


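1. CI/CD Core Concepts

CI and CD are often lumped together, but the abbreviations cover three progressive stages: continuous integration, continuous delivery, and continuous deployment.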

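1.1 Continuous Integration

Developers merge code into the main branch frequently (at least once a day), and every merge triggers an automated build and test run.

Core value: find problems early. Merge conflicts, compile errors, and unit test failures surface within minutes of the merge instead of days later.

Typical flow:

commit → trigger CI → compile → unit tests → static analysis → build artifact → notify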

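1.2 Continuous Delivery

Builds on CI and ensures the code can be deployed to production at any time. The production deployment itself still requires manual approval.

Core value: deployment becomes a business decision. Technically a release is possible at any moment; the business picks the right time.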

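1.3 Continuous Deployment

Goes one step beyond continuous delivery: once the code passes all tests it is deployed to production automatically, with no human intervention.

Core value: the fastest delivery speed. It suits mature teams and highly automated systems.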

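1.4 How the three relate

Continuous integration ⊂ continuous delivery ⊂ continuous deployment (each stage builds on the previous one)

CI:   code → build → test → artifact
CD*:  artifact → staging → [manual approval] → production (continuous delivery)
CD**: artifact → staging → automatic → production (continuous deployment)

Which one to choose:

  • Startup team / new project: get CI right first

  • Mid-sized team: continuous delivery (recommended for most cases)

  • Mature microservices with solid monitoring: continuous deployment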


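2. Pipeline Design Principles

2.1 Fast feedback

Every second a pipeline runs costs the team time. The faster the feedback, the cheaper the fix.

Key practices:

  • Unit tests under 5 minutes, integration tests under 15 minutes

  • Notify on failure immediately instead of waiting for the whole pipeline to finish

  • Layer the tests: fast checks first, slow checks later

GitLab CI example (staged pipeline):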

stages:
  - lint        # about 30 seconds
  - unit-test   # about 2 minutes
  - build       # about 3 minutes
  - integration # about 10 minutes
  - deploy      # about 2 minutes

lint:
  stage: lint
  script:
    - npm run lint
    - npm run type-check
  rules:
    - changes:
        - "src/**/*"

unit-test:
  stage: unit-test
  script:
    - npm run test:unit -- --coverage
  coverage: '/Lines\s*:\s*(\d+\.?\d*)%/'
  artifacts:
    reports:
      coverage_report:
        coverage_format: cobertura
        path: coverage/cobertura-coverage.xml

2.2 Parallelization

Run independent tasks in parallel to shorten the total run time.

# GitLab CI parallel example
unit-test:
  stage: test
  parallel: 4  # split into 4 parallel jobs
  script:
    - jest --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL

# GitHub Actions matrix build
jobs:
  test:
    strategy:
      matrix:
        node-version: [18, 20, 22]
        os: [ubuntu-latest, macos-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4   # check out the repository before installing and testing
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
      - run: npm ci
      - run: npm test

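2.3 Idempotency

Running the pipeline several times produces the same result and no side effects.

# Idempotent deploy script example
deploy:
  stage: deploy
  script:
    # Check the deployed version to avoid redeploying the same commit
    - CURRENT=$(kubectl get deployment app -o jsonpath='{.metadata.annotations.version}' 2>/dev/null || true)
    - if [ "$CURRENT" = "$CI_COMMIT_SHA" ]; then echo "Already deployed"; exit 0; fi
    # Declarative deployment
    - kubectl apply -f k8s/deployment.yaml
    - kubectl rollout status deployment/app --timeout=300s
    # Record the deployed commit so the check above works on the next run
    - kubectl annotate deployment app version="$CI_COMMIT_SHA" --overwrite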

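2.4 Rollback capability

Any change can be reverted quickly to the last known good version.

# Keep build artifacts for 30 days so earlier versions stay available for rollback
deploy:
  artifacts:
    paths:
      - dist/
    expire_in: 30 days

# Rollback job (no `action: stop` here: that would mark the environment as stopped)
rollback:
  stage: deploy
  when: manual
  script:
    - kubectl rollout undo deployment/app
    - kubectl rollout status deployment/app --timeout=120s
  environment:
    name: production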

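3. Multi-Environment Deployment

3.1 Environment tiers

A typical three-environment model:

dev (development) → staging (pre-release) → production
  ↓                    ↓                       ↓
 auto deploy          auto deploy             deploy after manual approval
 feature checks       integration checks      live verification
 reset at any time    close to production     high-availability guarantees

3.2 Managing environment differences

Principle: the code stays the same, the configuration lives outside it.

GitLab CI multi-environment template: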

.deploy_template: &deploy_template
  stage: deploy
  image: bitnami/kubectl:latest
  script:
    - kubectl config use-context $KUBE_CONTEXT
    - envsubst < k8s/deployment.template.yaml | kubectl apply -f -
    - kubectl rollout status deployment/$APP_NAME -n $NAMESPACE --timeout=300s
  variables:
    APP_NAME: my-service

deploy:dev:
  <<: *deploy_template
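  variables:
    APP_NAME: my-service   # repeated here: the YAML merge key (<<) is shallow, so this map replaces the template's variables
    KUBE_CONTEXT: dev-cluster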
    NAMESPACE: dev
    REPLICAS: "1"
    CPU_LIMIT: "500m"
    MEMORY_LIMIT: "512Mi"
    LOG_LEVEL: "debug"
  environment:
    name: development
    url: https://dev.example.com
  rules:
    - if: $CI_COMMIT_BRANCH == "develop"

deploy:staging:
  <<: *deploy_template
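  variables:
    APP_NAME: my-service   # repeated here: the YAML merge key (<<) is shallow, so this map replaces the template's variables
    KUBE_CONTEXT: staging-cluster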
    NAMESPACE: staging
    REPLICAS: "2"
    CPU_LIMIT: "1000m"
    MEMORY_LIMIT: "1Gi"
    LOG_LEVEL: "info"
  environment:
    name: staging
    url: https://staging.example.com
  rules:
    - if: $CI_COMMIT_BRANCH == "main"

deploy:prod:
  <<: *deploy_template
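  variables:
    APP_NAME: my-service   # repeated here: the YAML merge key (<<) is shallow, so this map replaces the template's variables
    KUBE_CONTEXT: prod-cluster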
    NAMESPACE: production
    REPLICAS: "5"
    CPU_LIMIT: "2000m"
    MEMORY_LIMIT: "2Gi"
    LOG_LEVEL: "warn"
  environment:
    name: production
    url: https://www.example.com
  rules:
    - if: $CI_COMMIT_TAG =~ /^v\d+\.\d+\.\d+$/
  when: manual
  allow_failure: false

3.3 Configuration injection

Option 1: Kubernetes ConfigMap + Secret

# k8s/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: ${NAMESPACE}
data:
  DATABASE_HOST: "${DB_HOST}"
  REDIS_URL: "redis://${REDIS_HOST}:6379"
  LOG_LEVEL: "${LOG_LEVEL}"
  FEATURE_FLAG_NEW_UI: "true"

---
# k8s/secret.yaml (values injected from CI variables, already base64-encoded)
apiVersion: v1
kind: Secret
metadata:
  name: app-secret
  namespace: ${NAMESPACE}
type: Opaque
data:
  DATABASE_PASSWORD: "${DB_PASSWORD_B64}"
  API_KEY: "${API_KEY_B64}"

Option 2: Vault dynamic secrets

# Inject secrets with the Vault Agent sidecar
annotations:
  vault.hashicorp.com/agent-inject: "true"
  vault.hashicorp.com/role: "my-service"
  vault.hashicorp.com/agent-inject-secret-db: "database/creds/my-service"
  vault.hashicorp.com/agent-inject-template-db: |
    {{- with secret "database/creds/my-service" -}}
    DATABASE_URL=postgresql://{{ .Data.username }}:{{ .Data.password }}@db:5432/mydb
    {{- end }}

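4. Blue-Green Deployment

4.1 How it works

Two identical production environments (blue and green) are maintained side by side, and only one of them serves traffic at any time. A new version is deployed to the idle environment; after it has been verified, traffic is switched over, and if something goes wrong it is switched straight back.

                  ┌───────────────┐
  user requests → │ load balancer │
                  └───────┬───────┘
                          │
               ┌──────────┴──────────┐
               ▼                     ▼
          ┌──────────┐          ┌──────────┐
          │ blue     │          │ green    │
          │ (live)   │          │ (new)    │
          │ v1.2.3   │          │ v1.3.0   │
          └──────────┘          └──────────┘

  Before release: traffic → blue
  After release:  traffic → green (blue is kept for rollback)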

4.2 Nginx implementation

# /etc/nginx/conf.d/blue-green.conf
upstream app {
    # Swap which of the two lines is active to switch between blue and green
    server 10.0.1.10:8080;  # blue environment
    # server 10.0.2.10:8080;  # green environment
}

server {
    listen 80;
    server_name www.example.com;

    location / {
        proxy_pass http://app;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

Automated switch script:

#!/bin/bash
# switch-env.sh - blue-green switch script
set -euo pipefail

CONF=/etc/nginx/conf.d/blue-green.conf
BLUE="10.0.1.10"
GREEN="10.0.2.10"

# Read only the active (uncommented) server line; the commented-out one must be ignored
CURRENT=$(grep -oP '^\s*server \K[\d.]+' "$CONF" | head -n1)

if [ "$CURRENT" = "$BLUE" ]; then
    NEW_TARGET=$GREEN
    ENV_NAME="green"
else
    NEW_TARGET=$BLUE
    ENV_NAME="blue"
fi

# Pre-check: confirm the new environment is healthy
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "http://${NEW_TARGET}:8080/health" || true)
if [ "$HTTP_CODE" != "200" ]; then
    echo "❌ Health check failed for the new environment (HTTP $HTTP_CODE), aborting switch"
    exit 1
fi

# Switch: rewrite only the active server line, then validate before reloading
sed -i -E "s/^([[:space:]]*)server ${CURRENT}:8080/\1server ${NEW_TARGET}:8080/" "$CONF"
nginx -t
nginx -s reload

echo "✅ Switched to the ${ENV_NAME} environment (${NEW_TARGET})"
echo "To roll back, run this script again."

4.3 Docker Compose implementation

# docker-compose.blue-green.yml
version: "3.8"

services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - app-blue
      - app-green

  app-blue:
    image: registry.example.com/myapp:${BLUE_VERSION:-1.2.3}
    environment:
      - NODE_ENV=production
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 10s
      timeout: 5s
      retries: 3

  app-green:
    image: registry.example.com/myapp:${GREEN_VERSION:-1.3.0}
    environment:
      - NODE_ENV=production
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 10s
      timeout: 5s
      retries: 3

4.4 Kubernetes implementation

# k8s/blue-green-deployment.yaml
# Blue environment (current version)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-blue
  labels:
    app: myapp
    version: blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
        - name: app
          image: registry.example.com/myapp:1.2.3
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5

---
# The Service switches between blue and green through its selector
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    version: blue  # change to green to switch
  ports:
    - port: 80
      targetPort: 8080

Switch commands:

# Switch to the green environment
kubectl patch service myapp -p '{"spec":{"selector":{"version":"green"}}}'

# Roll back to the blue environment
kubectl patch service myapp -p '{"spec":{"selector":{"version":"blue"}}}'

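5. Canary Release

5.1 How it works

The new version is rolled out to a small share of users first and promoted to everyone only after the metrics stay healthy. Like the canary in a coal mine, a small "scout" group verifies that it is safe.

  Traffic split:
  ┌────────────────────────────────────────┐
  │ █████████████████████████████░░░ 90%   │  ← old version (stable)
  │ ███░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 10%   │  ← new version (canary)
  └────────────────────────────────────────┘

  Timeline:
  0% → 10% → 25% → 50% → 100%
  │      │      │      │      │
  watch  watch  watch  watch  full rollout
  15min  15min  15min  15min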
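
The stepped schedule in the timeline can be driven by a small helper. The following is a minimal bash sketch, not part of the original setup: the function name `next_canary_weight` is hypothetical, and the step list simply mirrors the 10 → 25 → 50 → 100 progression shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given the current canary weight (in percent), print the
# next weight in the 0 -> 10 -> 25 -> 50 -> 100 schedule from the timeline.
next_canary_weight() {
  local current="$1"
  local steps=(10 25 50 100)
  local step
  for step in "${steps[@]}"; do
    if (( step > current )); then
      echo "$step"
      return 0
    fi
  done
  # Already at (or beyond) full rollout.
  echo 100
}

next_canary_weight 10   # prints 25
```

A release job could call this after each observation window and pass the result to whatever actually shifts traffic (Nginx weights, replica counts, or a service mesh).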

5.2 Nginx traffic control

# Weight-based canary
upstream app {
    server 10.0.1.10:8080 weight=90;  # stable
    server 10.0.2.10:8080 weight=10;  # canary
}

# Header-based canary (for internal test users)
map $http_x_canary $backend {
    "true"  canary_backend;
    default stable_backend;
}

upstream stable_backend {
    server 10.0.1.10:8080;
}

upstream canary_backend {
    server 10.0.2.10:8080;
}

server {
    listen 80;
    location / {
        proxy_pass http://$backend;
    }
}

5.3 Kubernetes canary

# Stable Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-stable
spec:
  replicas: 9
  selector:
    matchLabels:
      app: myapp
      track: stable
  template:
    metadata:
      labels:
        app: myapp
        track: stable
    spec:
      containers:
        - name: app
          image: registry.example.com/myapp:1.2.3

---
# Canary Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
      track: canary
  template:
    metadata:
      labels:
        app: myapp
        track: canary
    spec:
      containers:
        - name: app
          image: registry.example.com/myapp:1.3.0

---
# The Service matches both stable and canary Pods
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp  # no track label, so both Deployments match (9:1 replicas gives roughly 90/10 traffic)
  ports:
    - port: 80
      targetPort: 8080

5.4 Monitoring and automatic rollback

Prometheus alerting rules (canary metrics):

# prometheus-rules.yaml
groups:
  - name: canary-monitoring
    rules:
      - alert: CanaryHighErrorRate
        expr: |
          sum(rate(http_requests_total{track="canary",status=~"5.."}[5m]))
          /
          sum(rate(http_requests_total{track="canary"}[5m]))
          > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Canary error rate above 5%"
          description: "Canary error rate is {{ $value | humanizePercentage }}, rollback recommended"

      - alert: CanaryHighLatency
        expr: |
          histogram_quantile(0.95,
            sum(rate(http_request_duration_seconds_bucket{track="canary"}[5m])) by (le)
          ) > 2
        for: 3m
        labels:
          severity: warning
        annotations:
          summary: "Canary P95 latency above 2 seconds"

Automatic rollback script:

#!/bin/bash
# canary-rollback.sh
ERROR_RATE=$(curl -s 'http://prometheus:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(http_requests_total{track="canary",status=~"5.."}[5m]))/sum(rate(http_requests_total{track="canary"}[5m]))' \
  | jq -r '.data.result[0].value[1] // "0"')

# With no canary traffic the query returns no sample or NaN; treat both as 0
if [ "$ERROR_RATE" = "NaN" ]; then ERROR_RATE=0; fi

if (( $(echo "$ERROR_RATE > 0.05" | bc -l) )); then
    echo "⚠️ Canary error rate is ${ERROR_RATE}, rolling back"
    kubectl scale deployment app-canary --replicas=0
    kubectl scale deployment app-stable --replicas=10
    echo "✅ Rollback complete"
else
    echo "✅ Canary metrics are healthy, error rate ${ERROR_RATE}"
fi

5.5 Automated canary with Flagger

# Flagger Canary resource (works with Istio / Linkerd / Nginx Ingress)
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myapp
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  progressDeadlineSeconds: 600
  service:
    port: 80
  analysis:
    interval: 1m
    threshold: 5           # maximum number of failed checks before rollback
    maxWeight: 50          # maximum canary traffic weight
    stepWeight: 10         # weight added at each step
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99          # success rate of at least 99%
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500         # P99 latency of at most 500 ms
        interval: 1m
    webhooks:
      - name: load-test
        type: rollout
        url: http://flagger-loadtester.test/
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://myapp-canary.test/"

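6. Rolling Update

6.1 Strategy configuration

Rolling update is the default Deployment strategy in Kubernetes: old Pods are replaced by new Pods step by step.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 10
  selector:              # required by apps/v1; must match the Pod template labels
    matchLabels:
      app: myapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 3        # at most 3 extra Pods (13 Pods in total)
      maxUnavailable: 1  # at most 1 Pod unavailable (at least 9 available)
  # Effect: up to 4 Pods can be in transition at once (3 surge + 1 unavailable), while at least 9 stay available
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: app
          image: registry.example.com/myapp:1.3.0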
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10

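Tuning suggestions:

Scenario            maxSurge  maxUnavailable  Notes
High availability   25%       0               No Pod may become unavailable
Fast release        50%       25%             Faster rollout
Tight resources     0         1               Delete first, then create; stays within the resource quota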
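
To see what percentage settings mean for a concrete replica count, the bounds can be computed by hand. The sketch below is illustrative only: `rollout_bounds` is a hypothetical helper, and it relies on the Kubernetes rule that a percentage `maxSurge` is rounded up while a percentage `maxUnavailable` is rounded down.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Pod-count bounds of a rolling update for
# percentage-based settings.
#   $1 = replicas, $2 = maxSurge in percent, $3 = maxUnavailable in percent
rollout_bounds() {
  local replicas="$1" surge_pct="$2" unavail_pct="$3"
  # maxSurge rounds up, maxUnavailable rounds down (integer arithmetic).
  local surge=$(( (replicas * surge_pct + 99) / 100 ))
  local unavail=$(( replicas * unavail_pct / 100 ))
  echo "max_total=$(( replicas + surge )) min_available=$(( replicas - unavail ))"
}

rollout_bounds 10 25 25   # prints max_total=13 min_available=8
```

With 10 replicas, the "high availability" row (25% / 0) therefore allows up to 13 Pods and never drops below 10 available.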

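6.2 The three health probes

containers:
  - name: app
    # Startup probe: for slow-starting applications, prevents the liveness probe from killing them too early
    startupProbe:
      httpGet:
        path: /health
        port: 8080
      failureThreshold: 30    # waits up to 30 * 5 = 150 seconds
      periodSeconds: 5

    # Readiness probe: decides whether the Pod receives traffic
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
      successThreshold: 1
      failureThreshold: 3     # removed from the endpoints after 3 consecutive failures

    # Liveness probe: decides whether the container is restarted
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10
      failureThreshold: 3     # restarted after 3 consecutive failures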

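6.3 Rollback mechanism

# Show the rollout history
kubectl rollout history deployment/myapp

# Roll back to the previous revision
kubectl rollout undo deployment/myapp

# Roll back to a specific revision
kubectl rollout undo deployment/myapp --to-revision=3

# Watch the rollback status
kubectl rollout status deployment/myapp --timeout=120s

# Pause the rollout (when something looks wrong)
kubectl rollout pause deployment/myapp

# Resume the rollout
kubectl rollout resume deployment/myapp

Rollback integrated into CI/CD:

# GitLab CI automatic rollback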
deploy:
  stage: deploy
  script:
    - kubectl apply -f k8s/
    - kubectl rollout status deployment/myapp --timeout=300s || {
        echo "Deployment timed out, rolling back automatically";
        kubectl rollout undo deployment/myapp;
        kubectl rollout status deployment/myapp --timeout=120s;
        exit 1;
      }

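7. Rollback Strategy

7.1 Version management

Semantic versioning plus immutable artifacts:

Version number: MAJOR.MINOR.PATCH (for example 1.3.2)
  MAJOR: incompatible API changes
  MINOR: backward-compatible new features
  PATCH: backward-compatible bug fixes

Artifact tags: v1.3.2 / v1.3.2-abc1234 (with commit hash)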
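
The MAJOR.MINOR.PATCH rules above can be applied mechanically when cutting a release. This is a small bash sketch under stated assumptions: `bump_version` is a hypothetical helper, it accepts an optional leading `v`, and it ignores pre-release or build suffixes such as `-abc1234`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: bump one part of a MAJOR.MINOR.PATCH version.
#   $1 = current version (with or without a leading "v"), $2 = major|minor|patch
bump_version() {
  local version="${1#v}" part="$2"
  local major minor patch
  IFS=. read -r major minor patch <<< "$version"
  case "$part" in
    major) echo "$(( major + 1 )).0.0" ;;
    minor) echo "${major}.$(( minor + 1 )).0" ;;
    patch) echo "${major}.${minor}.$(( patch + 1 ))" ;;
    *) echo "unknown part: $part" >&2; return 1 ;;
  esac
}

bump_version v1.3.2 minor   # prints 1.4.0
```

Resetting the lower parts to 0 on a major or minor bump follows the semantic versioning convention.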

Docker image tagging strategy:

# ❌ Avoid relying on latest
docker build -t myapp:latest .

# ✅ Use a semantic version plus the commit hash
docker build -t myapp:v1.3.2 -t myapp:v1.3.2-abc1234 .

# Generate the tag automatically in CI
VERSION=$(git describe --tags --always)
docker build -t registry.example.com/myapp:${VERSION} .
docker push registry.example.com/myapp:${VERSION}

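7.2 Database rollback

Database changes are the hardest part of a rollback. Core principle: keep every schema change compatible with the previous application version, and apply it in steps.

Safe schema change pattern (expand, then contract):

Step 1: add the new column (additive only) → deploy the new code
Step 2: new code reads and writes both the old and the new column → verify
Step 3: new code reads and writes only the new column → stop writing the old column
Step 4: drop the old column (in a later release)

Rollback script template:

-- migration/V3__add_email_column.sql (forward)
ALTER TABLE users ADD COLUMN email VARCHAR(255);
UPDATE users SET email = CONCAT(username, '@example.com');
-- Caution: NOT NULL makes INSERTs from the previous application version fail;
-- to follow the additive-only rule, move this constraint to a later migration.
ALTER TABLE users ALTER COLUMN email SET NOT NULL;

-- rollback/V3__rollback_email_column.sql (rollback)
ALTER TABLE users DROP COLUMN IF EXISTS email;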

7.3 Fast rollback SOP

#!/bin/bash
# rollback.sh - fast production rollback SOP
set -euo pipefail

APP_NAME="${1:?usage: ./rollback.sh <app-name> [revision]}"
REVISION="${2:-}"

echo "🔄 Starting rollback of ${APP_NAME}"

# Step 1: show the current version
echo "Current version:"
kubectl get deployment "${APP_NAME}" -o jsonpath='{.spec.template.spec.containers[0].image}'
echo ""

# Step 2: roll back
if [ -n "$REVISION" ]; then
    kubectl rollout undo "deployment/${APP_NAME}" --to-revision="${REVISION}"
else
    kubectl rollout undo "deployment/${APP_NAME}"
fi

# Step 3: wait for the rollback to finish
kubectl rollout status "deployment/${APP_NAME}" --timeout=180s

# Step 4: verify
NEW_IMAGE=$(kubectl get deployment "${APP_NAME}" -o jsonpath='{.spec.template.spec.containers[0].image}')
echo "✅ Rollback complete, current image: ${NEW_IMAGE}"

# Step 5: check Pod status
kubectl get pods -l "app=${APP_NAME}" -o wide

# Step 6: health check
for i in {1..5}; do
    STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://www.example.com/health || true)
    if [ "$STATUS" = "200" ]; then
        echo "✅ Health check passed"
        exit 0
    fi
    echo "Waiting for health check... (${i}/5)"
    sleep 5
done
echo "❌ Health check failed, manual intervention required"
exit 1

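8. Artifact Repositories

8.1 Comparison of common artifact repositories

Feature            Nexus Repository      JFrog Artifactory     Harbor
Type               General-purpose       General-purpose       Docker only
Docker support     Yes                   Yes                   Yes (core feature)
Maven/npm/PyPI     Yes                   Yes                   No
Free edition       OSS edition is free   CE edition is free    Fully open source
High availability  Paid edition          Paid edition          Yes
Best fit           Small/medium teams    Enterprise            K8s / containerized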

8.2 Nexus configuration example

# Deploy Nexus with Docker Compose
version: "3.8"
services:
  nexus:
    image: sonatype/nexus3:3.68.0
    ports:
      - "8081:8081"
      - "8082:8082"  # Docker registry port
    volumes:
      - nexus-data:/nexus-data
    environment:
      - NEXUS_SECURITY_RANDOMPASSWORD=false

volumes:
  nexus-data:

Using the Nexus Docker registry:

# Log in
docker login nexus.example.com:8082

# Push an image
docker tag myapp:v1.3.0 nexus.example.com:8082/myapp:v1.3.0
docker push nexus.example.com:8082/myapp:v1.3.0

# Usage in CI
# .gitlab-ci.yml
build:
  stage: build
  script:
    - docker login -u $NEXUS_USER -p $NEXUS_PASS nexus.example.com:8082
    - docker build -t nexus.example.com:8082/myapp:${CI_COMMIT_TAG} .
    - docker push nexus.example.com:8082/myapp:${CI_COMMIT_TAG}

8.3 Harbor configuration

# Install Harbor (docker-compose based)
# Download the Harbor offline installer
wget https://github.com/goharbor/harbor/releases/download/v2.11.0/harbor-offline-installer-v2.11.0.tgz
tar xzf harbor-offline-installer-v2.11.0.tgz
cd harbor

# Edit harbor.yml
# hostname: harbor.example.com
# harbor_admin_password: StrongPassword123!
# database.password: dbpassword

./install.sh --with-trivy  # enable vulnerability scanning

Harbor projects and image policies:

# Create a project
curl -u admin:password -X POST https://harbor.example.com/api/v2.0/projects \
  -H "Content-Type: application/json" \
  -d '{"project_name": "myteam", "public": false}'

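# Configure an image retention policy (keep the 30 most recently pushed tags)
# Note: Harbor v2 manages retention policies under /api/v2.0/retentions; check the
# path and payload below against your instance's API reference before using them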
curl -u admin:password -X POST \
  "https://harbor.example.com/api/v2.0/projects/myteam/repositories/myapp/policies" \
  -H "Content-Type: application/json" \
  -d '{
    "rules": [{
      "template": "latestPushedK",
      "params": {"latestPushedK": 30},
      "tag_selectors": [{"kind": "wildcard", "decoration": "matches", "pattern": "**"}],
      "repo_selectors": [{"kind": "doublestar", "decoration": "matches", "pattern": "**"}]
    }]
  }'

8.4 Artifact versioning and garbage collection

# GitLab CI artifact version management
variables:
  IMAGE_REGISTRY: harbor.example.com/myteam

build:
  stage: build
  before_script:
    # GitLab does not evaluate shell-style ${VAR:-default} inside `variables:`, so compute the tag in the shell
    - export IMAGE_TAG="${CI_COMMIT_TAG:-$CI_COMMIT_SHORT_SHA}"
  script:
    - docker build -t ${IMAGE_REGISTRY}/myapp:${IMAGE_TAG} .
    - docker push ${IMAGE_REGISTRY}/myapp:${IMAGE_TAG}
    # Also push latest (main branch only)
    - if [ "$CI_COMMIT_BRANCH" = "main" ]; then
        docker tag ${IMAGE_REGISTRY}/myapp:${IMAGE_TAG} ${IMAGE_REGISTRY}/myapp:latest;
        docker push ${IMAGE_REGISTRY}/myapp:latest;
      fi
  rules:
    - if: $CI_COMMIT_TAG
    - if: $CI_COMMIT_BRANCH == "main"

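Harbor garbage collection:

# Create a garbage-collection schedule (weekly in this example)
# Note: the schedule payload fields differ between Harbor versions; check your instance's API reference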
curl -u admin:password -X POST \
  "https://harbor.example.com/api/v2.0/system/gc/schedule" \
  -H "Content-Type: application/json" \
  -d '{
    "parameters": {"delete_untagged": true},
    "schedule": {"type": "Weekly", "weekday": 1, "offtime": 0}
  }'

9. Database Migration

9.1 Flyway

Flyway uses plain SQL scripts. It is simple, direct, and quick to learn.

Directory layout:

db/migration/
  V1__create_users_table.sql
  V2__add_phone_column.sql
  V3__create_orders_table.sql
  V4__add_index_on_email.sql

SQL examples:

-- V1__create_users_table.sql
CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    username VARCHAR(100) NOT NULL UNIQUE,
    email VARCHAR(255),
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- V2__add_phone_column.sql (email already exists in V1, so this migration adds phone)
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
CREATE INDEX idx_users_phone ON users(phone);
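
Before handing a migration directory to Flyway, a pipeline can lint the file names. The following bash sketch is an assumption-laden illustration: `check_migrations` is a hypothetical helper, and it enforces only the `V<integer>__<description>.sql` pattern used in this article, which is stricter than what Flyway itself accepts (Flyway also allows dotted versions such as `V1.1__`).

```shell
#!/usr/bin/env bash
# Hypothetical pre-check: every *.sql file in a directory must be named
# V<integer>__<description>.sql, and no version number may appear twice.
check_migrations() {
  local dir="$1" file name version status=0
  local seen=" "
  local pattern='^V([0-9]+)__[A-Za-z0-9_]+\.sql$'
  for file in "$dir"/*.sql; do
    [ -e "$file" ] || continue          # directory without .sql files
    name="$(basename "$file")"
    if [[ ! "$name" =~ $pattern ]]; then
      echo "bad name: $name"
      status=1
      continue
    fi
    version="${BASH_REMATCH[1]}"
    if [[ "$seen" == *" $version "* ]]; then
      echo "duplicate version: $version"
      status=1
    fi
    seen+="$version "
  done
  return "$status"
}

# Example call (path taken from the directory layout above):
# check_migrations db/migration
```

The function returns non-zero when it finds a problem, so it can be used directly as a CI step that fails the job.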

Integrating Flyway into CI/CD:

# GitLab CI database migration
db:migrate:
  stage: migrate
  image: flyway/flyway:10
  script:
    - flyway -url=jdbc:postgresql://${DB_HOST}:5432/${DB_NAME}
             -user=${DB_USER}
             -password=${DB_PASS}
             -locations=filesystem:./db/migration
             migrate
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      changes:
        - "db/migration/**/*"

db:info:
  stage: check
  image: flyway/flyway:10
  script:
    - flyway -url=jdbc:postgresql://${DB_HOST}:5432/${DB_NAME}
             -user=${DB_USER}
             -password=${DB_PASS}
             -locations=filesystem:./db/migration
             info

db:repair:
  stage: fix
  image: flyway/flyway:10
  script:
    - flyway -url=jdbc:postgresql://${DB_HOST}:5432/${DB_NAME}
             -user=${DB_USER}
             -password=${DB_PASS}
             repair
  when: manual

9.2 Liquibase

Liquibase describes changes in XML, YAML, or JSON and supports rollback definitions per change set.

changelog.yaml:

databaseChangeLog:
  - changeSet:
      id: 1
      author: devteam
      changes:
        - createTable:
            tableName: users
            columns:
              - column:
                  name: id
                  type: bigint
                  autoIncrement: true
                  constraints:
                    primaryKey: true
                    nullable: false
              - column:
                  name: username
                  type: varchar(100)
                  constraints:
                    nullable: false
                    unique: true
      rollback:
        - dropTable:
            tableName: users

  - changeSet:
      id: 2
      author: devteam
      changes:
        - addColumn:
            tableName: users
            columns:
              - column:
                  name: phone
                  type: varchar(20)
        - createIndex:
            tableName: users
            indexName: idx_users_phone
            columns:
              - column:
                  name: phone
      rollback:
        - dropIndex:
            tableName: users
            indexName: idx_users_phone
        - dropColumn:
            tableName: users
            columnName: phone

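9.3 Migration strategy

Golden rules:

  1. Append only, never modify: do not ALTER the data type of an existing column

  2. Backward compatible: the previous application version must keep working on the new schema, because the migration runs before the deployment and a code rollback does not revert the schema

  3. Small steps: each migration makes exactly one change

  4. Expand, then contract: add the new column → migrate the data → drop the old column (spread across releases)

CI integration pattern:

merge → CI builds the image → run the DB migration first → then deploy the application → health check
                              ↑ the database change precedes the application deployment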
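
The ordering above (migrate first, deploy second, stop at the first failure) can be sketched as a tiny step runner. Everything here is illustrative: `run_steps` and the four step functions are hypothetical placeholders, and their bodies would be replaced with the real build, Flyway, and kubectl commands.

```shell
#!/usr/bin/env bash
# Hypothetical step runner: execute the given functions in order and stop at
# the first one that fails, so a failed migration never reaches the deploy.
run_steps() {
  local step
  for step in "$@"; do
    echo "running: $step"
    if ! "$step"; then
      echo "failed: $step" >&2
      return 1
    fi
  done
}

# Placeholder steps; replace the bodies with real commands.
build_image()  { :; }
migrate_db()   { :; }
deploy_app()   { :; }
health_check() { :; }

run_steps build_image migrate_db deploy_app health_check
```

In GitLab CI the same ordering is usually expressed with stages or `needs`, as in the backend template in section 11.2; the script form is mainly useful for local runs or single-job pipelines.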

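10. Security and Compliance

10.1 SAST / DAST / SCA integration

SAST (static application security testing), at the code level:

# GitLab CI with Semgrep
sast:
  stage: security
  image: returntocorp/semgrep
  script:
    # --gitlab-sast writes the report in the format GitLab's sast report type expects
    - semgrep --config=auto --gitlab-sast --output=semgrep-results.json .
  artifacts:
    reports:
      sast: semgrep-results.json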
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"

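SCA (software composition analysis), at the dependency level:

# GitLab CI with Trivy scanning dependencies for vulnerabilities
dependency-scan:
  stage: security
  image: aquasec/trivy:latest
  script:
    - trivy fs --format json --output trivy-results.json --severity HIGH,CRITICAL .
    - trivy fs --exit-code 1 --severity CRITICAL .  # fail the job on CRITICAL vulnerabilities
  artifacts:
    paths:
      - trivy-results.json  # plain Trivy JSON; it is not in GitLab's dependency_scanning report schema

# Docker image vulnerability scan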
image-scan:
  stage: security
  image: aquasec/trivy:latest
  script:
    - trivy image --exit-code 1 --severity CRITICAL ${IMAGE_REGISTRY}/myapp:${CI_COMMIT_TAG}

DAST (dynamic application security testing), at runtime:

# Integrating OWASP ZAP
dast:
  stage: security
  image: ghcr.io/zaproxy/zaproxy:stable
  script:
    - zap-baseline.py -t https://staging.example.com -r zap-report.html
  artifacts:
    paths:
      - zap-report.html
    when: always
  rules:
    - if: $CI_COMMIT_BRANCH == "main"

10.2 Approval workflow

# GitLab CI multi-level approval
deploy:staging:
  stage: deploy
  environment:
    name: staging
  script:
    - ./deploy.sh staging
  rules:
    - if: $CI_COMMIT_BRANCH == "main"

deploy:prod:
  stage: deploy
  environment:
    name: production
    deployment_tier: production
  script:
    - ./deploy.sh production
  rules:
    - if: $CI_COMMIT_TAG =~ /^v\d+\.\d+\.\d+$/
  when: manual
  allow_failure: false

# GitLab Protected Environment approval
# Settings → CI/CD → Protected Environments
# Add protection for the production environment
# Roles allowed to deploy: Maintainer
# Required approvals: 2

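10.3 Audit log

# Deployment audit record
deploy:prod:
  stage: deploy
  script:
    - |
      # Build the audit record with jq so that quotes and newlines in the
      # commit message are escaped correctly (requires jq in the job image)
      AUDIT_LOG=$(jq -nc \
        --arg timestamp "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        --arg version "${CI_COMMIT_TAG}" \
        --arg deployer "${GITLAB_USER_LOGIN}" \
        --arg pipeline_id "${CI_PIPELINE_ID}" \
        --arg commit_sha "${CI_COMMIT_SHA}" \
        --arg commit_message "$(git log -1 --pretty=%B)" \
        '{timestamp: $timestamp, action: "deploy", environment: "production",
          version: $version, deployer: $deployer, pipeline_id: $pipeline_id,
          commit_sha: $commit_sha, commit_message: $commit_message}')
      echo "$AUDIT_LOG" >> /var/log/deploy-audit.jsonl
      # Push to centralized logging
      curl -X POST "https://elasticsearch:9200/deploy-audit/_doc" \
        -H "Content-Type: application/json" \
        -d "$AUDIT_LOG"
    - ./deploy.sh production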

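11. Complete Pipeline Templates

11.1 Frontend project (React/Vue + Docker)

# .gitlab-ci.yml - complete pipeline for a frontend project
variables:
  IMAGE_REGISTRY: harbor.example.com/frontend
  NODE_VERSION: "20"

# GitLab does not evaluate shell-style ${VAR:-default} inside `variables:`,
# so the image tag is computed in the shell at the start of every job.
default:
  before_script:
    - export IMAGE_TAG="${CI_COMMIT_TAG:-$CI_COMMIT_SHORT_SHA}"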

stages:
  - install
  - lint
  - test
  - build
  - security
  - docker
  - deploy

cache:
  key: ${CI_COMMIT_REF_SLUG}
  paths:
    - node_modules/
    - .npm/

# ── Install dependencies ──
install:
  stage: install
  image: node:${NODE_VERSION}-alpine
  script:
    - npm ci --prefer-offline
  artifacts:
    paths:
      - node_modules/
    expire_in: 1 hour

# ── Lint ──
lint:
  stage: lint
  image: node:${NODE_VERSION}-alpine
  needs: [install]
  script:
    - npm run lint
    - npm run type-check

# ── Unit tests ──
test:unit:
  stage: test
  image: node:${NODE_VERSION}-alpine
  needs: [install]
  script:
    - npm run test:unit -- --coverage
  coverage: '/Lines\s*:\s*(\d+\.?\d*)%/'
  artifacts:
    reports:
      coverage_report:
        coverage_format: cobertura
        path: coverage/cobertura-coverage.xml
      junit: junit.xml

# ── Build ──
build:
  stage: build
  image: node:${NODE_VERSION}-alpine
  needs: [install]
  script:
    - npm run build
  artifacts:
    paths:
      - dist/
    expire_in: 7 days

# ── Security scanning ──
sast:
  stage: security
  image: returntocorp/semgrep
  script:
    - semgrep --config=auto --gitlab-sast --output=sast.json .
  artifacts:
    reports:
      sast: sast.json
  allow_failure: true

dependency-scan:
  stage: security
  image: aquasec/trivy:latest
  script:
    - trivy fs --exit-code 1 --severity CRITICAL .

# ── Docker build and push ──
docker:build:
  stage: docker
  image: docker:24
  services:
    - docker:24-dind
  needs: [build, test:unit]
  script:
    - docker login -u $HARBOR_USER -p $HARBOR_PASS $IMAGE_REGISTRY
    - docker build -t ${IMAGE_REGISTRY}/myapp:${IMAGE_TAG} .
    - docker push ${IMAGE_REGISTRY}/myapp:${IMAGE_TAG}
    - |
      if [ "$CI_COMMIT_BRANCH" = "main" ]; then
        docker tag ${IMAGE_REGISTRY}/myapp:${IMAGE_TAG} ${IMAGE_REGISTRY}/myapp:latest
        docker push ${IMAGE_REGISTRY}/myapp:latest
      fi
  rules:
    - if: $CI_COMMIT_TAG
    - if: $CI_COMMIT_BRANCH == "main"
    - if: $CI_COMMIT_BRANCH == "develop"   # deploy:dev needs this job on the develop branch

# ── Deploy ──
deploy:dev:
  stage: deploy
  image: bitnami/kubectl:latest
  needs: [docker:build]
  script:
    - kubectl set image deployment/myapp myapp=${IMAGE_REGISTRY}/myapp:${IMAGE_TAG} -n dev
    - kubectl rollout status deployment/myapp -n dev --timeout=180s
  environment:
    name: development
    url: https://dev.example.com
  rules:
    - if: $CI_COMMIT_BRANCH == "develop"

deploy:staging:
  stage: deploy
  image: bitnami/kubectl:latest
  needs: [docker:build]
  script:
    - kubectl set image deployment/myapp myapp=${IMAGE_REGISTRY}/myapp:${IMAGE_TAG} -n staging
    - kubectl rollout status deployment/myapp -n staging --timeout=300s
  environment:
    name: staging
    url: https://staging.example.com
  rules:
    - if: $CI_COMMIT_BRANCH == "main"

deploy:prod:
  stage: deploy
  image: bitnami/kubectl:latest
  needs: [docker:build]
  script:
    - kubectl set image deployment/myapp myapp=${IMAGE_REGISTRY}/myapp:${IMAGE_TAG} -n production
    - kubectl rollout status deployment/myapp -n production --timeout=300s
  environment:
    name: production
    url: https://www.example.com
  rules:
    - if: $CI_COMMIT_TAG =~ /^v\d+\.\d+\.\d+$/
  when: manual

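11.2 Backend project (Java Spring Boot)

# .gitlab-ci.yml - complete pipeline for a Java backend
variables:
  IMAGE_REGISTRY: harbor.example.com/backend
  MAVEN_OPTS: "-Dmaven.repo.local=.m2/repository"

# GitLab does not evaluate shell-style ${VAR:-default} inside `variables:`,
# so the image tag is computed in the shell at the start of every job.
default:
  before_script:
    - export IMAGE_TAG="${CI_COMMIT_TAG:-$CI_COMMIT_SHORT_SHA}"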

stages:
  - build
  - test
  - security
  - docker
  - migrate
  - deploy

cache:
  key: ${CI_COMMIT_REF_SLUG}
  paths:
    - .m2/repository/
    - target/

# ── Compile ──
build:
  stage: build
  image: maven:3.9-eclipse-temurin-21
  script:
    - mvn clean compile -DskipTests
  artifacts:
    paths:
      - target/

# ── Tests ──
test:unit:
  stage: test
  image: maven:3.9-eclipse-temurin-21
  needs: [build]
  script:
    - mvn test
  artifacts:
    reports:
      junit: target/surefire-reports/TEST-*.xml
    when: always

test:integration:
  stage: test
  image: maven:3.9-eclipse-temurin-21
  services:
    - postgres:15
    - redis:7
  variables:
    POSTGRES_DB: testdb
    POSTGRES_USER: test
    POSTGRES_PASSWORD: test
    SPRING_DATASOURCE_URL: "jdbc:postgresql://postgres:5432/testdb"
  needs: [build]
  script:
    - mvn verify -P integration-test
  artifacts:
    reports:
      junit: target/failsafe-reports/TEST-*.xml

# ── Security scanning ──
sast:
  stage: security
  image: returntocorp/semgrep
  script:
    - semgrep --config=p/java --gitlab-sast --output=sast.json .
  artifacts:
    reports:
      sast: sast.json
  allow_failure: true

dependency-scan:
  stage: security
  image: aquasec/trivy:latest
  script:
    - trivy fs --exit-code 1 --severity CRITICAL --scanners vuln .

# ── Docker build ──
docker:build:
  stage: docker
  image: docker:24
  services:
    - docker:24-dind
  needs: [build, test:unit, test:integration]
  script:
    # The JAR is built inside the multi-stage Dockerfile (section 11.3); the docker image has no Maven installed
    - docker login -u $HARBOR_USER -p $HARBOR_PASS $IMAGE_REGISTRY
    - docker build -t ${IMAGE_REGISTRY}/myapi:${IMAGE_TAG} .
    - docker push ${IMAGE_REGISTRY}/myapi:${IMAGE_TAG}
  rules:
    - if: $CI_COMMIT_TAG
    - if: $CI_COMMIT_BRANCH == "main"

# ── Database migration ──
db:migrate:
  stage: migrate
  image: flyway/flyway:10
  needs: [docker:build]
  script:
    - flyway -url=jdbc:postgresql://${PROD_DB_HOST}:5432/${PROD_DB_NAME}
             -user=${PROD_DB_USER}
             -password=${PROD_DB_PASS}
             -locations=filesystem:./db/migration
             migrate
  rules:
    - if: $CI_COMMIT_TAG =~ /^v\d+\.\d+\.\d+$/
  when: manual

# ── Deploy ──
deploy:staging:
  stage: deploy
  image: bitnami/kubectl:latest
  needs: [docker:build]
  script:
    - kubectl set image deployment/myapi myapi=${IMAGE_REGISTRY}/myapi:${IMAGE_TAG} -n staging
    - kubectl rollout status deployment/myapi -n staging --timeout=300s
  environment:
    name: staging
  rules:
    - if: $CI_COMMIT_BRANCH == "main"

deploy:prod:
  stage: deploy
  image: bitnami/kubectl:latest
  needs: [docker:build, db:migrate]
  script:
    - kubectl set image deployment/myapi myapi=${IMAGE_REGISTRY}/myapi:${IMAGE_TAG} -n production
    - kubectl rollout status deployment/myapi -n production --timeout=300s
  environment:
    name: production
  rules:
    - if: $CI_COMMIT_TAG =~ /^v\d+\.\d+\.\d+$/
  when: manual

11.3 Docker multi-stage builds

# Dockerfile - frontend multi-stage build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --prefer-offline
COPY . .
RUN npm run build

FROM nginx:1.25-alpine AS production
COPY --from=builder /app/dist /usr/share/nginx/html
COPY nginx.conf /etc/nginx/conf.d/default.conf
EXPOSE 80
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD wget -qO- http://localhost/health || exit 1

# Dockerfile - Java backend multi-stage build
FROM maven:3.9-eclipse-temurin-21 AS builder
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline
COPY src ./src
RUN mvn package -DskipTests

FROM eclipse-temurin:21-jre-alpine AS production
WORKDIR /app
RUN addgroup -S app && adduser -S app -G app
COPY --from=builder /app/target/*.jar app.jar
USER app
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD wget -qO- http://localhost:8080/actuator/health || exit 1
ENTRYPOINT ["java", "-XX:+UseContainerSupport", "-XX:MaxRAMPercentage=75.0", "-jar", "app.jar"]

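Summary

A CI/CD pipeline is not built in one go. Evolve it in stages:

Stage 1: automated build + unit tests (CI basics)
Stage 2: automated multi-environment deployment + integration tests
Stage 3: security scanning + artifact management
Stage 4: blue-green / canary release + automatic rollback
Stage 5: end-to-end monitoring + complete auditing

Key principles:

  • Same code, external configuration: every environment difference is handled through configuration

  • Immutable artifacts: the same image runs from test all the way to production

  • Compatible schema changes: database changes always stay compatible with the previous application version

  • Small, frequent steps: many small releases beat occasional big ones

  • Fast rollback: the ability to roll back matters more than the ability to release


💡 The configurations in this article are meant as starting points: adjust the parameters to your own project before using them.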