Server Operations Notes: CI/CD Pipeline Design


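This is part 16 of the server operations series. It covers CI/CD pipeline architecture, multi-environment deployment strategies, release patterns (blue-green, canary, rolling update), rollback mechanisms, artifact management, database migrations, and security and compliance. Each section comes with a configuration template that can be reused after adapting it to your environment.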

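I. Core CI/CD Concepts

CI and CD are often lumped together, but they are actually three successive stages, each building on the previous one.

1.1 Continuous Integration

Developers merge code into the mainline frequently (at least once a day), and every merge triggers an automated build and test run.

Core value: problems are caught early. Merge conflicts, compilation errors, and unit-test failures surface within minutes of a merge instead of days later.

Typical flow:

Commit → trigger CI → compile → unit tests → static analysis → build artifact → notify

1.2 Continuous Delivery

Building on CI, continuous delivery keeps the code deployable to production at any time. Deploying to production still requires a manual approval.

Core value: releasing becomes a business decision. Technically you can ship at any moment; the business picks the right time.

1.3 Continuous Deployment

One step beyond continuous delivery: code that passes all tests is deployed to production automatically, with no human intervention.

Core value: the fastest possible delivery, suited to mature teams and highly automated systems.

1.4 How the Three Relate

Continuous Integration ⊂ Continuous Delivery ⊂ Continuous Deployment

CI:   code → build → test → artifact
CD*:  artifact → staging → [manual approval] → production (continuous delivery)
CD**: artifact → staging → automatic → production (continuous deployment)

Choosing a level:

Startups / new projects: get CI right first
Mid-sized teams: continuous delivery (recommended for most scenarios)
Mature microservices with solid monitoring: continuous deployment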

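II. Pipeline Design Principles

2.1 Fast Feedback

Every second the pipeline runs costs the team time. The faster the feedback, the cheaper the fix.

Key practices:

Unit tests under 5 minutes, integration tests under 15 minutes
Notify on failure immediately; don't wait for the whole pipeline to finish
Layered testing: fast checks first, slow checks later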
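The fail-fast ordering can be illustrated with a few lines of bash. This is a minimal sketch, not how GitLab schedules jobs. Each stage is assumed to be a shell function (the stage names in the usage note are hypothetical). Stages run cheapest first, and the runner stops at the first failure instead of spending minutes on the slower stages.

```bash
#!/usr/bin/env bash
# Fail-fast stage runner: run stages in the given order (cheap checks first)
# and stop at the first failure. Stage output goes to stderr, so stdout
# carries only the verdict: "PASSED" or "FAILED: <stage>".
run_stages() {
  local stage start
  for stage in "$@"; do
    start=$SECONDS
    if ! "$stage" >&2; then
      echo "stage ${stage} failed after $((SECONDS - start))s" >&2
      echo "FAILED: ${stage}"
      return 1
    fi
  done
  echo "PASSED"
}
```

With hypothetical stage functions `lint`, `unit_test`, `build` and `integration`, calling `run_stages lint unit_test build integration` never reaches the 10-minute integration stage when lint fails.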

GitLab CI example: a staged pipeline

stages:
  - lint        # ~30 s
  - unit-test   # ~2 min
  - build       # ~3 min
  - integration # ~10 min
  - deploy      # ~2 min

lint:
  stage: lint
  script:
    - npm run lint
    - npm run type-check
  rules:
    - changes:
        - "src/**/*"

unit-test:
  stage: unit-test
  script:
    - npm run test:unit -- --coverage
  coverage: '/Lines\s*:\s*(\d+\.?\d*)%/'
  artifacts:
    reports:
      coverage_report:
        coverage_format: cobertura
        path: coverage/cobertura-coverage.xml

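2.2 Parallelization

Run independent jobs in parallel to cut the total pipeline time.

# GitLab CI parallel example
unit-test:
  stage: test
  parallel: 4  # split into 4 parallel jobs; CI_NODE_INDEX runs 1..4 and CI_NODE_TOTAL is 4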
  script:
    - jest --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL

# GitHub Actions build matrix
jobs:
  test:
    strategy:
      matrix:
        node-version: [18, 20, 22]
        os: [ubuntu-latest, macos-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4   # without a checkout the job has no source code to test
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
      - run: npm ci
      - run: npm test
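The sketch below makes the sharding idea concrete. It splits a list of test files round-robin across `TOTAL` jobs, using the same 1-based index that GitLab exposes as `CI_NODE_INDEX`. It only illustrates the principle that each file runs in exactly one shard. Jest's `--shard` uses its own deterministic assignment, so the real file-to-shard mapping will differ.

```bash
#!/usr/bin/env bash
# Round-robin sharding sketch: print the files that shard INDEX (1-based, like
# GitLab's CI_NODE_INDEX) of TOTAL shards should run. Every file lands in
# exactly one shard. Illustration only; Jest's --shard assigns files its own way.
shard_files() {
  local index=$1 total=$2
  shift 2
  local i=0 f
  for f in "$@"; do
    if (( i % total == index - 1 )); then
      echo "$f"
    fi
    i=$((i + 1))
  done
}
```

Inside a parallel job this would be called as `shard_files "$CI_NODE_INDEX" "$CI_NODE_TOTAL" tests/*.test.js`.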

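2.3 Idempotency

Running the pipeline again with the same input leads to the same end state and causes no extra side effects.

# Idempotent deploy script example
deploy:
  stage: deploy
  script:
    # Skip if this commit is already deployed. Assumes k8s/deployment.yaml carries
    # metadata.annotations.version set to the commit SHA
    - CURRENT=$(kubectl get deployment app -o jsonpath='{.metadata.annotations.version}')
    - if [ "$CURRENT" = "$CI_COMMIT_SHA" ]; then echo "Already deployed"; exit 0; fi
    # Declarative deployment: apply converges the cluster to the desired state
    - kubectl apply -f k8s/deployment.yaml
    - kubectl rollout status deployment/app --timeout=300s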
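`kubectl apply` is idempotent because it declares a desired state instead of replaying actions. Below is the same principle in plain bash, applied to a hypothetical config file. The helper changes the file only when the desired line is missing, so a second run is a no-op.

```bash
#!/usr/bin/env bash
# Idempotent "ensure" operation: append LINE to FILE only if it is not there yet.
# Running it any number of times leaves the file as after the first run.
ensure_line() {
  local file=$1 line=$2
  touch "$file"
  # -q quiet, -x whole-line match, -F literal string (no regex surprises)
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

The contrast is with an imperative `echo "version=abc123" >> file`, which adds one more line on every re-run.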

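2.4 Rollback-Ready

Any change can be reverted quickly to the last known-good version.

# Keep build artifacts for 30 days (fragment: only the artifacts part of the job is shown).
# How far kubectl rollout undo can go back is set by the Deployment's
# revisionHistoryLimit (10 by default).
deploy:
  artifacts:
    paths:
      - dist/
    expire_in: 30 days

# Rollback job
rollback:
  stage: deploy
  when: manual
  script:
    - kubectl rollout undo deployment/app
    - kubectl rollout status deployment/app --timeout=120s
  environment:
    name: production
    # no "action: stop" here: that would mark the production environment as stopped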

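III. Multi-Environment Deployment

3.1 Environment Tiers

The typical three-environment model:

dev (development) → staging (pre-production) → production
  ↓                   ↓                          ↓
 auto deploy         auto deploy                deploy after manual approval
 feature checks      integration checks         live verification
 reset at any time   close to production        high availability

3.2 Managing Environment Differences

Principle: the code stays the same; configuration is externalized.

GitLab CI multi-environment template: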

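variables:
  # Defined globally: a job-level variables: block replaces (does not merge with)
  # the variables: of a <<: *anchor, so a template-level APP_NAME would be lost
  APP_NAME: my-service

.deploy_template: &deploy_template
  stage: deploy
  image:
    name: bitnami/kubectl:latest
    entrypoint: [""]   # the image's entrypoint is kubectl; GitLab needs a shell
  script:
    # envsubst (from gettext) must be available in the job image
    - kubectl config use-context $KUBE_CONTEXT
    - envsubst < k8s/deployment.template.yaml | kubectl apply -f -
    - kubectl rollout status deployment/$APP_NAME -n $NAMESPACE --timeout=300s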

deploy:dev:
  <<: *deploy_template
  variables:
    KUBE_CONTEXT: dev-cluster
    NAMESPACE: dev
    REPLICAS: "1"
    CPU_LIMIT: "500m"
    MEMORY_LIMIT: "512Mi"
    LOG_LEVEL: "debug"
  environment:
    name: development
    url: https://dev.example.com
  rules:
    - if: $CI_COMMIT_BRANCH == "develop"

deploy:staging:
  <<: *deploy_template
  variables:
    KUBE_CONTEXT: staging-cluster
    NAMESPACE: staging
    REPLICAS: "2"
    CPU_LIMIT: "1000m"
    MEMORY_LIMIT: "1Gi"
    LOG_LEVEL: "info"
  environment:
    name: staging
    url: https://staging.example.com
  rules:
    - if: $CI_COMMIT_BRANCH == "main"

deploy:prod:
  <<: *deploy_template
  variables:
    KUBE_CONTEXT: prod-cluster
    NAMESPACE: production
    REPLICAS: "5"
    CPU_LIMIT: "2000m"
    MEMORY_LIMIT: "2Gi"
    LOG_LEVEL: "warn"
  environment:
    name: production
    url: https://www.example.com
  rules:
    - if: $CI_COMMIT_TAG =~ /^v\d+\.\d+\.\d+$/
  when: manual
  allow_failure: false
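The `rules:` above amount to a small routing table from Git ref to target environment. A bash sketch of that mapping can be handy for deploy scripts that run outside GitLab (the function name is hypothetical). Bash regular expressions have no `\d`, so the tag pattern is spelled with `[0-9]`.

```bash
#!/usr/bin/env bash
# Map a Git ref to its target environment, mirroring the rules: above:
#   develop -> development, main -> staging, vX.Y.Z tag -> production (manual).
# Bash regexes have no \d, so the tag pattern uses [0-9].
target_env() {
  local branch=$1 tag=$2
  if [[ -n $tag ]]; then
    if [[ $tag =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
      echo "production"
    else
      echo "none"   # e.g. v1.3.0-rc1 does not match the production rule
    fi
    return 0
  fi
  case $branch in
    develop) echo "development" ;;
    main)    echo "staging" ;;
    *)       echo "none" ;;
  esac
}
```

In a job this would be called as `target_env "$CI_COMMIT_BRANCH" "${CI_COMMIT_TAG:-}"`.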

3.3 Configuration Injection

Option 1: Kubernetes ConfigMap + Secret

# k8s/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: ${NAMESPACE}
data:
  DATABASE_HOST: "${DB_HOST}"
  REDIS_URL: "redis://${REDIS_HOST}:6379"
  LOG_LEVEL: "${LOG_LEVEL}"
  FEATURE_FLAG_NEW_UI: "true"

---
# k8s/secret.yaml (values injected from CI variables; everything under data: must be base64-encoded)
apiVersion: v1
kind: Secret
metadata:
  name: app-secret
  namespace: ${NAMESPACE}
type: Opaque
data:
  DATABASE_PASSWORD: "${DB_PASSWORD_B64}"
  API_KEY: "${API_KEY_B64}"
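Values under a Secret's `data:` must be base64-encoded, which is why the template expects `*_B64` variables. A common pitfall when producing them is `echo`, which appends a newline that silently becomes part of the password. Below is a minimal helper (hypothetical name) that avoids this.

```bash
#!/usr/bin/env bash
# Base64-encode a secret for a Secret's data: field.
# printf '%s' adds no trailing newline (echo would, and the newline would
# become part of the decoded password); tr removes base64's line wrapping.
b64_secret() {
  printf '%s' "$1" | base64 | tr -d '\n'
}
```

Before running envsubst, a job might do `export DB_PASSWORD_B64=$(b64_secret "$DB_PASSWORD")`, where `DB_PASSWORD` is a hypothetical CI variable. Alternatively, Kubernetes accepts plain-text values under `stringData:` and encodes them itself.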

Option 2: Vault dynamic secrets

# Injected by the Vault Agent sidecar (annotations on the Pod template)
annotations:
  vault.hashicorp.com/agent-inject: "true"
  vault.hashicorp.com/role: "my-service"
  vault.hashicorp.com/agent-inject-secret-db: "database/creds/my-service"
  vault.hashicorp.com/agent-inject-template-db: |
    {{- with secret "database/creds/my-service" -}}
    DATABASE_URL=postgresql://{{ .Data.username }}:{{ .Data.password }}@db:5432/mydb
    {{- end }}

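IV. Blue-Green Deployment

4.1 How It Works

Two identical production environments (blue and green) are kept, and only one of them serves traffic at any time. A new version is deployed to the idle environment. Once it is verified, traffic is switched over, and if problems appear it is switched straight back.

                  ┌───────────────┐
  User requests → │ Load balancer │
                  └───────┬───────┘
                          │
               ┌──────────┴──────────┐
               ▼                     ▼
         ┌────────────┐        ┌────────────┐
         │ Blue (live)│        │ Green (new)│
         │   v1.2.3   │        │   v1.3.0   │
         └────────────┘        └────────────┘

  Before release: traffic → blue
  After release:  traffic → green (blue kept for rollback)

4.2 Nginx Implementation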

# /etc/nginx/conf.d/blue-green.conf
upstream app {
    # Changing this line performs the blue-green switch
    server 10.0.1.10:8080;  # blue environment
    # server 10.0.2.10:8080;  # green environment
}

server {
    listen 80;
    server_name www.example.com;

    location / {
        proxy_pass http://app;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

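Automated switch script:

#!/bin/bash
# switch-env.sh - blue-green switch script
set -euo pipefail
CONF=/etc/nginx/conf.d/blue-green.conf
BLUE="10.0.1.10"
GREEN="10.0.2.10"

# Read only the active (uncommented) server line; a plain grep would also match "# server ..."
CURRENT=$(grep -oP '^\s*server \K[\d.]+' "$CONF")

if [ "$CURRENT" = "$BLUE" ]; then
    NEW_TARGET=$GREEN
    ENV_NAME="green"
else
    NEW_TARGET=$BLUE
    ENV_NAME="blue"
fi

# Pre-check: make sure the new environment is healthy
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "http://${NEW_TARGET}:8080/health" || true)
if [ "$HTTP_CODE" != "200" ]; then
    echo "❌ Health check of the new environment failed (HTTP $HTTP_CODE), aborting the switch"
    exit 1
fi

# Switch only the active line, keeping a backup in case the new config is invalid
cp "$CONF" "$CONF.bak"
sed -i -E "s/^(\s*)server ${CURRENT}:8080/\1server ${NEW_TARGET}:8080/" "$CONF"
if ! nginx -t; then
    cp "$CONF.bak" "$CONF"
    echo "❌ nginx -t failed, previous config restored"
    exit 1
fi
nginx -s reload

echo "✅ Switched to the ${ENV_NAME} environment (${NEW_TARGET})"
echo "Rollback command: cp ${CONF}.bak ${CONF} && nginx -s reload"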
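The fragile part of the switch script is reading the active backend from the config. A naive `grep 'server <ip>'` also matches the commented-out line. Below is a small sketch of a parser (hypothetical helper name) that reports only the uncommented `server` entry. It is useful for pre-switch checks or monitoring.

```bash
#!/usr/bin/env bash
# Print the IP of the active (uncommented) "server <ip>:<port>;" entry in an
# nginx config file. Lines starting with "#" and "server_name"/"server {"
# do not match the pattern.
active_upstream() {
  sed -nE 's/^[[:space:]]*server[[:space:]]+([0-9.]+):[0-9]+;.*/\1/p' "$1"
}
```

Running `active_upstream /etc/nginx/conf.d/blue-green.conf` after a switch should print the new target's IP.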

4.3 Docker Compose Implementation

# docker-compose.blue-green.yml
version: "3.8"

services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - app-blue
      - app-green

  app-blue:
    image: registry.example.com/myapp:${BLUE_VERSION:-1.2.3}
    environment:
      - NODE_ENV=production
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 10s
      timeout: 5s
      retries: 3

  app-green:
    image: registry.example.com/myapp:${GREEN_VERSION:-1.3.0}
    environment:
      - NODE_ENV=production
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 10s
      timeout: 5s
      retries: 3

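4.4 Kubernetes Implementation

# k8s/blue-green-deployment.yaml
# Blue environment (current version); the green Deployment is identical apart from
# name: app-green, the label version: green, and the new image tag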
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-blue
  labels:
    app: myapp
    version: blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
        - name: app
          image: registry.example.com/myapp:1.2.3
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5

---
# The Service switches between blue and green through its selector
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    version: blue  # change to green to switch
  ports:
    - port: 80
      targetPort: 8080

Switch commands:

# Switch to green
kubectl patch service myapp -p '{"spec":{"selector":{"version":"green"}}}'

# Roll back to blue
kubectl patch service myapp -p '{"spec":{"selector":{"version":"blue"}}}'

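V. Canary Release

5.1 How It Works

The new version is rolled out gradually to a small share of users, and it is released to everyone only after its metrics show no anomalies. Like the canary in a coal mine, a small "scout" group proves it is safe first.

  Traffic split:
  ┌────────────────────────────────────────┐
  │ ███████████████████████████░░░ 90%     │  ← old version (stable)
  │ ███░░░░░░░░░░░░░░░░░░░░░░░░░░░ 10%     │  ← new version (canary)
  └────────────────────────────────────────┘

  Timeline (each step is observed before moving on):
  0%  →  10%  →  25%  →  50%  →  100%
  │      │       │       │       │
  watch  watch   watch   watch   full release
  15min  15min   15min   15min

5.2 Nginx Traffic Control

# Weight-based canary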
upstream app {
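    server 10.0.1.10:8080 weight=90;  # stable
    server 10.0.2.10:8080 weight=10;  # canary
}

# Header-based canary (internal test users send "X-Canary: true");
# map and upstream blocks must live in the http context, e.g. a file under conf.d/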
map $http_x_canary $backend {
    "true"  canary_backend;
    default stable_backend;
}

upstream stable_backend {
    server 10.0.1.10:8080;
}

upstream canary_backend {
    server 10.0.2.10:8080;
}

server {
    listen 80;
    location / {
        proxy_pass http://$backend;
    }
}
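When traffic is shifted in steps, it helps to generate the weighted `upstream` block instead of editing it by hand. The sketch below (hypothetical helper; IPs taken from the example above) renders the block for a canary percentage. Weighted round-robin spreads requests in proportion to the weights over time, not exactly per request. Since nginx does not accept `weight=0`, the sketch leaves the unused server out entirely at 0% or 100%.

```bash
#!/usr/bin/env bash
# Render the weighted upstream block for a canary percentage (0-100).
# IPs are the ones from the example above. nginx rejects weight=0, so at
# 0% or 100% the unused server is left out instead of being weighted 0.
render_canary_upstream() {
  local pct=$1
  if ! [[ $pct =~ ^[0-9]+$ ]] || (( pct > 100 )); then
    echo "invalid percentage: ${pct}" >&2
    return 1
  fi
  echo "upstream app {"
  if (( pct < 100 )); then
    echo "    server 10.0.1.10:8080 weight=$((100 - pct));  # stable"
  fi
  if (( pct > 0 )); then
    echo "    server 10.0.2.10:8080 weight=${pct};  # canary"
  fi
  echo "}"
}
```

The output would be written into the file that holds the upstream block, followed by `nginx -t && nginx -s reload`.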

5.3 Kubernetes Canary

# Stable Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-stable
spec:
  replicas: 9
  selector:
    matchLabels:
      app: myapp
      track: stable
  template:
    metadata:
      labels:
        app: myapp
        track: stable
    spec:
      containers:
        - name: app
          image: registry.example.com/myapp:1.2.3

---
# Canary Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
      track: canary
  template:
    metadata:
      labels:
        app: myapp
        track: canary
    spec:
      containers:
        - name: app
          image: registry.example.com/myapp:1.3.0

---
# The Service matches both stable and canary Pods
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp  # no track label, so it matches both
  ports:
    - port: 80
      targetPort: 8080
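Because the Service selects Pods from both Deployments, traffic is split roughly by the number of ready Pods behind it: 9 stable plus 1 canary gives about 10%. The helper below (hypothetical) computes that approximate share. The figure is only approximate, since kube-proxy balances connections rather than individual requests. A fine-grained share such as 1% would need 99 stable replicas, which is why weight-based routing (Ingress, service mesh, Flagger) is used for finer control.

```bash
#!/usr/bin/env bash
# Approximate canary share (%) when one Service selects both Deployments:
# traffic is spread over all ready Pods, so share ~ canary / (stable + canary).
# Integer result, rounded to the nearest percent.
canary_share_pct() {
  local stable=$1 canary=$2
  local total=$((stable + canary))
  if (( total == 0 )); then
    echo 0
    return 0
  fi
  echo $(( (canary * 100 + total / 2) / total ))
}
```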

5.4 Monitoring and Automatic Rollback

Prometheus alerting rules for canary metrics:

# prometheus-rules.yaml
groups:
  - name: canary-monitoring
    rules:
      - alert: CanaryHighErrorRate
        expr: |
          sum(rate(http_requests_total{track="canary",status=~"5.."}[5m]))
          /
          sum(rate(http_requests_total{track="canary"}[5m]))
          > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Canary error rate above 5%"
          description: "Canary error rate is {{ $value | humanizePercentage }}; consider rolling back"

      - alert: CanaryHighLatency
        expr: |
          histogram_quantile(0.95,
            sum(rate(http_request_duration_seconds_bucket{track="canary"}[5m])) by (le)
          ) > 2
        for: 3m
        labels:
          severity: warning
        annotations:
          summary: "Canary P95 latency above 2 seconds"

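Automatic rollback script:

#!/bin/bash
# canary-rollback.sh
set -euo pipefail
ERROR_RATE=$(curl -s 'http://prometheus:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(http_requests_total{track="canary",status=~"5.."}[5m]))/sum(rate(http_requests_total{track="canary"}[5m]))' \
  | jq -r '.data.result[0].value[1] // "NaN"')

# No series, or no canary traffic at all (0/0 = NaN): do not act on missing data
if [ "$ERROR_RATE" = "NaN" ]; then
    echo "⚠️ No canary traffic data, skipping the decision"
    exit 0
fi

# awk does the float comparison, so bc is not needed
if awk -v r="$ERROR_RATE" 'BEGIN { exit !(r + 0 > 0.05) }'; then
    echo "⚠️ Canary error rate ${ERROR_RATE}, rolling back"
    kubectl scale deployment app-canary --replicas=0
    kubectl scale deployment app-stable --replicas=10
    echo "✅ Rollback complete"
else
    echo "✅ Canary metrics normal, error rate ${ERROR_RATE}"
fi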
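The rollback decision itself is worth isolating so it can be tested. Two edge cases matter. Prometheus returns `NaN` when the canary received no requests (0/0). A plain `.data.result[0].value[1]` in jq yields `null` when the query returns no series. The sketch below (hypothetical helper) treats both as "no data" and never rolls back on missing data. It uses awk for the float comparison, so it does not depend on `bc`.

```bash
#!/usr/bin/env bash
# Decide whether to roll back: exit status 0 (yes) if RATE > THRESHOLD.
# Empty, "null" (no series) and "NaN" (0/0, no traffic) count as "no data"
# and never trigger a rollback.
should_rollback() {
  local rate=$1 threshold=$2
  case $rate in
    ""|null|NaN|nan)
      echo "no data for the canary, not rolling back" >&2
      return 1 ;;
  esac
  awk -v r="$rate" -v t="$threshold" 'BEGIN { exit !(r + 0 > t + 0) }'
}
```

In the script above this would read `if should_rollback "$ERROR_RATE" 0.05; then ... fi`.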

5.5 Automated Canary Analysis with Flagger

# Flagger Canary resource (supports Istio, Linkerd, NGINX Ingress and other providers)
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myapp
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  progressDeadlineSeconds: 600
  service:
    port: 80
  analysis:
    interval: 1m
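    threshold: 5           # failed checks tolerated before rolling back
    maxWeight: 50          # maximum canary traffic weight (%)
    stepWeight: 10         # weight added at each step
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99          # success rate must stay at or above 99%
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500         # P99 latency must stay at or below 500 ms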
        interval: 1m
    webhooks:
      - name: load-test
        type: rollout
        url: http://flagger-loadtester.test/
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://myapp-canary.test/"
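With `stepWeight: 10` and `maxWeight: 50`, the canary receives 10%, 20%, 30%, 40% and then 50% of traffic, one step per `interval`, before promotion begins. The rough model below reproduces only that progression. It ignores failed checks and the promotion phase, and it assumes `maxWeight` is a multiple of `stepWeight`. With `interval: 1m` the analysis therefore takes at least about five minutes.

```bash
#!/usr/bin/env bash
# Rough model of Flagger's traffic progression: each analysis interval the
# canary weight grows by STEP until MAX, after which promotion starts.
# Ignores failed checks; assumes MAX is a multiple of STEP.
flagger_weights() {
  local step=$1 max=$2 w=0
  local weights=()
  while (( w + step <= max )); do
    w=$((w + step))
    weights+=("$w")
  done
  echo "${weights[*]}"
}
```

The number of printed weights multiplied by `interval` gives a lower bound on the analysis time.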

VI. Rolling Update

6.1 Strategy Configuration

This is the default update strategy in Kubernetes: new Pods gradually replace old ones.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
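spec:
  replicas: 10
  selector:
    matchLabels:
      app: myapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 3        # at most 3 extra Pods (13 in total)
      maxUnavailable: 1  # at most 1 Pod unavailable (at least 9 available)
  # So during the rollout there are at most 13 Pods and at least 9 available;
  # up to 4 Pods (3 surge + 1 unavailable) can be in the middle of being replaced
  template:
    metadata:
      labels:
        app: myapp       # must match spec.selector
    spec: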
      containers:
        - name: app
          image: registry.example.com/myapp:1.3.0
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10

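Tuning suggestions:

Scenario	maxSurge	maxUnavailable	Notes
High availability	25%	0	No Pod may become unavailable
Fast rollout	50%	25%	Faster updates
Tight resources	0	1	Delete before create; stays within the resource quota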
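The bounds in the comments and the table follow from `replicas`, `maxSurge` and `maxUnavailable`. Kubernetes resolves percentages against the replica count, rounding `maxSurge` up and `maxUnavailable` down. If both resolve to 0, it uses a `maxUnavailable` of 1 so the rollout can still make progress. Below is a bash sketch of that arithmetic (hypothetical helper names).

```bash
#!/usr/bin/env bash
# Resolve an int-or-percent value against the replica count:
# mode "up" rounds up (maxSurge), mode "down" rounds down (maxUnavailable).
resolve_value() {
  local v=$1 replicas=$2 mode=$3
  if [[ $v == *% ]]; then
    local p=${v%\%}
    if [[ $mode == up ]]; then
      echo $(( (p * replicas + 99) / 100 ))
    else
      echo $(( p * replicas / 100 ))
    fi
  else
    echo "$v"
  fi
}

# Print "<max total Pods> <min available Pods>" during a rolling update.
rollout_bounds() {
  local replicas=$1 surge unavail
  surge=$(resolve_value "$2" "$replicas" up)
  unavail=$(resolve_value "$3" "$replicas" down)
  # Both zero would stall the rollout, so maxUnavailable falls back to 1
  if (( surge == 0 && unavail == 0 )); then
    unavail=1
  fi
  echo "$((replicas + surge)) $((replicas - unavail))"
}
```

For example, `rollout_bounds 10 3 1` prints `13 9` (the Deployment above), and `rollout_bounds 10 25% 0` prints `13 10` (the high-availability row).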

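6.2 The Three Health Probes

containers:
  - name: app
    # Startup probe: for slow-starting apps; keeps the liveness probe from killing them during startup
    startupProbe:
      httpGet:
        path: /health
        port: 8080
      failureThreshold: 30    # allows up to 30 × 5 = 150 s to start
      periodSeconds: 5

    # Readiness probe: decides whether the Pod receives traffic
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
      successThreshold: 1
      failureThreshold: 3     # removed from Service endpoints after 3 consecutive failures

    # Liveness probe: decides whether the container gets restarted
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10
      failureThreshold: 3     # restarted after 3 consecutive failures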

6.3 Rollback

# View the rollout history
kubectl rollout history deployment/myapp

# Roll back to the previous revision
kubectl rollout undo deployment/myapp

# Roll back to a specific revision
kubectl rollout undo deployment/myapp --to-revision=3

# Watch the rollback status
kubectl rollout status deployment/myapp --timeout=120s

# Pause a rollout (when something looks wrong)
kubectl rollout pause deployment/myapp

# Resume the rollout
kubectl rollout resume deployment/myapp

Rollback integrated into CI/CD:

# GitLab CI automatic rollback
deploy:
  stage: deploy
  script:
    - kubectl apply -f k8s/
    - |
      if ! kubectl rollout status deployment/myapp --timeout=300s; then
        echo "Rollout timed out, rolling back automatically"
        kubectl rollout undo deployment/myapp
        kubectl rollout status deployment/myapp --timeout=120s
        exit 1
      fi

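VII. Rollback Strategy

7.1 Version Management

Semantic versioning + immutable artifacts:

Version: MAJOR.MINOR.PATCH (e.g. 1.3.2)
  MAJOR: incompatible API changes
  MINOR: backward-compatible new features
  PATCH: backward-compatible bug fixes

Artifact tags: v1.3.2 / v1.3.2-abc1234 (with commit hash)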
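The MAJOR/MINOR/PATCH rules translate directly into a version bump. Below is a minimal bash sketch (hypothetical helper). It keeps an optional `v` prefix and, to stay short, deliberately rejects pre-release suffixes such as `-rc1`.

```bash
#!/usr/bin/env bash
# Bump MAJOR, MINOR or PATCH of a plain semantic version, keeping an optional
# leading "v". Pre-release/build suffixes (e.g. -rc1) are rejected.
semver_bump() {
  local ver=$1 part=$2 prefix=""
  if [[ $ver == v* ]]; then
    prefix=v
    ver=${ver#v}
  fi
  if ! [[ $ver =~ ^([0-9]+)\.([0-9]+)\.([0-9]+)$ ]]; then
    echo "not a plain MAJOR.MINOR.PATCH version: $1" >&2
    return 1
  fi
  # 10# forces base 10 so a component like 08 is not read as octal
  local major=$((10#${BASH_REMATCH[1]}))
  local minor=$((10#${BASH_REMATCH[2]}))
  local patch=$((10#${BASH_REMATCH[3]}))
  case $part in
    major) echo "${prefix}$((major + 1)).0.0" ;;
    minor) echo "${prefix}${major}.$((minor + 1)).0" ;;
    patch) echo "${prefix}${major}.${minor}.$((patch + 1))" ;;
    *) echo "part must be major, minor or patch" >&2; return 1 ;;
  esac
}
```

For example, `semver_bump v1.3.2 minor` yields `v1.4.0`, the next tag after a backward-compatible feature.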

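Docker image versioning:

# ❌ Avoid latest
docker build -t myapp:latest .

# ✅ Use semantic version + commit hash
docker build -t myapp:v1.3.2 -t myapp:v1.3.2-abc1234 .

# Generate the tag automatically in CI: e.g. v1.3.2 on a tagged commit,
# v1.3.2-5-gabc1234 five commits after it, or just the hash if no tag exists
VERSION=$(git describe --tags --always)
docker build -t registry.example.com/myapp:${VERSION} .
docker push registry.example.com/myapp:${VERSION}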

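7.2 Database Rollback

Database changes are the hardest part of any rollback. Core principle: stay forward-compatible and change in small steps.

A safe pattern for database changes:

Step 1: add the new column (additive only) → deploy the new code
Step 2: the new code reads and writes both the old and new columns → verify
Step 3: the new code reads and writes only the new column → stop writing the old one
Step 4: drop the old column (in a later release)

Rollback script template:

-- migration/V3__add_email_column.sql (forward)
ALTER TABLE users ADD COLUMN email VARCHAR(255);
UPDATE users SET email = CONCAT(username, '@example.com');
-- SET NOT NULL belongs in a later migration: old code that inserts users without
-- an email must keep working until it is retired (step 4 above)

-- rollback/V3__rollback_email_column.sql (rollback; discards any email data written since V3)
ALTER TABLE users DROP COLUMN IF EXISTS email;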

7.3 Fast Rollback SOP

#!/bin/bash
# rollback.sh - fast production rollback SOP
set -euo pipefail

APP_NAME="${1:?Usage: ./rollback.sh <app-name> [revision]}"
REVISION="${2:-}"

echo "🔄 Rolling back ${APP_NAME}"

# Step 1: record the current version
echo "Current image:"
kubectl get deployment "${APP_NAME}" -o jsonpath='{.spec.template.spec.containers[0].image}'
echo ""

# Step 2: roll back
if [ -n "$REVISION" ]; then
    kubectl rollout undo "deployment/${APP_NAME}" --to-revision="${REVISION}"
else
    kubectl rollout undo "deployment/${APP_NAME}"
fi

# Step 3: wait for the rollback to finish
kubectl rollout status "deployment/${APP_NAME}" --timeout=180s

# Step 4: verify
NEW_IMAGE=$(kubectl get deployment "${APP_NAME}" -o jsonpath='{.spec.template.spec.containers[0].image}')
echo "✅ Rollback finished, current image: ${NEW_IMAGE}"

# Step 5: check Pod status (assumes the Pods carry the label app=<app-name>)
kubectl get pods -l "app=${APP_NAME}" -o wide

# Step 6: health check
for i in {1..5}; do
    STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://www.example.com/health || true)
    if [ "$STATUS" = "200" ]; then
        echo "✅ Health check passed"
        exit 0
    fi
    echo "Waiting for the health check... (${i}/5)"
    sleep 5
done
echo "❌ Health check failed, manual intervention required"
exit 1
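The polling loop in step 6 is a pattern that recurs in deploy scripts. A generic retry helper (hypothetical name) keeps it in one place. It runs a command up to N times with a fixed delay and returns the command's success.

```bash
#!/usr/bin/env bash
# Generic retry helper: run a command up to ATTEMPTS times, sleeping DELAY
# seconds between attempts. Returns 0 on the first success, 1 otherwise.
retry() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for (( i = 1; i <= attempts; i++ )); do
    if "$@"; then
      return 0
    fi
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  echo "failed after ${attempts} attempts: $*" >&2
  return 1
}
```

Step 6 could then become `retry 5 5 curl -fsS -o /dev/null https://www.example.com/health`. Note that `curl -f` treats any status below 400 as success, which is slightly looser than the exact `200` check in the SOP.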

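VIII. Artifact Repositories

8.1 Popular Artifact Repositories Compared

Feature	Nexus Repository	JFrog Artifactory	Harbor
Type	Universal repository	Universal repository	Container registry (Docker/OCI)
Docker support	✅	✅	✅ (core feature)
Maven/npm/PyPI	✅	✅	❌
Free edition	OSS edition is free	CE edition is free	Fully open source
High availability	Paid edition	Paid edition	✅
Best fit	Small and mid-sized teams	Enterprises	Kubernetes / containerized workloads

8.2 Nexus Configuration Example

# Deploy Nexus with Docker Compose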
version: "3.8"
services:
  nexus:
    image: sonatype/nexus3:3.68.0
    ports:
      - "8081:8081"
      - "8082:8082"  # Docker registry port (requires a Docker hosted repository with an HTTP connector on 8082)
    volumes:
      - nexus-data:/nexus-data
    environment:
      - NEXUS_SECURITY_RANDOMPASSWORD=false  # admin password becomes the default admin123; change it after first login

volumes:
  nexus-data:

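Using the Nexus Docker registry:

# Log in (if the connector is plain HTTP, add it to insecure-registries in the Docker daemon config)
docker login nexus.example.com:8082

# Push an image
docker tag myapp:v1.3.0 nexus.example.com:8082/myapp:v1.3.0
docker push nexus.example.com:8082/myapp:v1.3.0

# Use in CI
# .gitlab-ci.yml
build:
  stage: build
  script:
    # --password-stdin keeps the password out of the process list and shell history
    - echo "$NEXUS_PASS" | docker login -u "$NEXUS_USER" --password-stdin nexus.example.com:8082
    - docker build -t nexus.example.com:8082/myapp:${CI_COMMIT_TAG} .
    - docker push nexus.example.com:8082/myapp:${CI_COMMIT_TAG}
  rules:
    - if: $CI_COMMIT_TAG   # the image tag comes from the Git tag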

8.3 Harbor Configuration

# Install Harbor (docker-compose based)
# Download the Harbor offline installer
wget https://github.com/goharbor/harbor/releases/download/v2.11.0/harbor-offline-installer-v2.11.0.tgz
tar xzf harbor-offline-installer-v2.11.0.tgz
cd harbor

# Edit harbor.yml (create it first with: cp harbor.yml.tmpl harbor.yml)
# hostname: harbor.example.com
# harbor_admin_password: StrongPassword123!
# database.password: dbpassword

./install.sh --with-trivy  # enable vulnerability scanning

Harbor projects and image policies:

# Create a project
curl -u admin:password -X POST https://harbor.example.com/api/v2.0/projects \
  -H "Content-Type: application/json" \
  -d '{"project_name": "myteam", "public": false}'

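# Configure tag retention: keep the 30 most recently pushed tags in every repository.
# In the Harbor v2 API this is POST /retentions; scope.ref must be the numeric project ID
# (look it up with GET /api/v2.0/projects?name=myteam). "ref": 1 below is a placeholder.
# Verify the payload against your instance's API explorer before relying on it.
curl -u admin:password -X POST \
  "https://harbor.example.com/api/v2.0/retentions" \
  -H "Content-Type: application/json" \
  -d '{
    "algorithm": "or",
    "rules": [{
      "action": "retain",
      "template": "latestPushedK",
      "params": {"latestPushedK": 30},
      "tag_selectors": [{"kind": "doublestar", "decoration": "matches", "pattern": "**"}],
      "scope_selectors": {"repository": [{"kind": "doublestar", "decoration": "repoMatches", "pattern": "**"}]}
    }],
    "trigger": {"kind": "Schedule", "settings": {"cron": "0 0 0 * * *"}},
    "scope": {"level": "project", "ref": 1}
  }'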
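`latestPushedK` keeps the K most recently pushed tags in each repository. The sketch below previews what such a rule would remove. It takes tags ordered from oldest to newest push and prints the ones beyond the newest K. It models the rule's meaning only, not Harbor's implementation.

```bash
#!/usr/bin/env bash
# Preview what "retain the latest K pushed tags" would delete: tags are given
# oldest -> newest by push time; everything older than the newest K is printed.
# Models the rule's meaning only, not Harbor's implementation.
tags_to_prune() {
  local keep=$1
  shift
  local total=$# i=0 t
  for t in "$@"; do
    i=$((i + 1))
    if (( i <= total - keep )); then
      echo "$t"
    fi
  done
}
```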

8.4 Artifact Versioning and Garbage Collection

# GitLab CI artifact version management
variables:
  IMAGE_REGISTRY: harbor.example.com/myteam

build:
  stage: build
  script:
    # variables: does not evaluate shell defaults like ${A:-B}, so compute the tag in the shell
    - IMAGE_TAG="${CI_COMMIT_TAG:-$CI_COMMIT_SHORT_SHA}"
    - docker build -t ${IMAGE_REGISTRY}/myapp:${IMAGE_TAG} .
    - docker push ${IMAGE_REGISTRY}/myapp:${IMAGE_TAG}
    # Additionally push latest (main branch only)
    - |
      if [ "$CI_COMMIT_BRANCH" = "main" ]; then
        docker tag ${IMAGE_REGISTRY}/myapp:${IMAGE_TAG} ${IMAGE_REGISTRY}/myapp:latest
        docker push ${IMAGE_REGISTRY}/myapp:latest
      fi
  rules:
    - if: $CI_COMMIT_TAG
    - if: $CI_COMMIT_BRANCH == "main"
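GitLab's `variables:` expansion does not handle shell-style defaults such as `${A:-B}`, so the image tag is safer computed in the job script. The sketch below (hypothetical helper) prefers the Git tag and falls back to the short SHA. It refuses an empty value or the mutable `latest` tag as a deploy identifier.

```bash
#!/usr/bin/env bash
# Pick the image tag: the Git tag if present, otherwise the short commit SHA.
# Refuses an empty result and the mutable "latest" tag.
image_tag() {
  local git_tag=$1 short_sha=$2
  local tag=${git_tag:-$short_sha}
  if [[ -z $tag || $tag == latest ]]; then
    echo "refusing empty or mutable tag: '${tag}'" >&2
    return 1
  fi
  echo "$tag"
}
```

In a job: `IMAGE_TAG=$(image_tag "${CI_COMMIT_TAG:-}" "$CI_COMMIT_SHORT_SHA")`.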

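Harbor garbage collection:

# Schedule garbage collection weekly (Sunday 00:00; Harbor cron expressions include a seconds field)
curl -u admin:password -X POST \
  "https://harbor.example.com/api/v2.0/system/gc/schedule" \
  -H "Content-Type: application/json" \
  -d '{
    "parameters": {"delete_untagged": true},
    "schedule": {"type": "Weekly", "cron": "0 0 0 * * 0"}
  }'
# For a one-off run instead of a schedule, send "schedule": {"type": "Manual"}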

IX. Database Migrations

9.1 Flyway

Flyway uses plain SQL scripts: simple, direct, and easy to learn.

Directory layout:

db/migration/
  V1__create_users_table.sql
  V2__add_phone_column.sql
  V3__create_orders_table.sql
  V4__add_index_on_email.sql

SQL examples:

-- V1__create_users_table.sql
CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    username VARCHAR(100) NOT NULL UNIQUE,
    email VARCHAR(255),
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- V2__add_phone_column.sql
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
CREATE INDEX idx_users_phone ON users(phone);
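Flyway recognizes versioned migrations by file name (`V<version>__<description>.sql` with the default settings). A typo such as a single underscore means the script is not picked up as the intended migration. A cheap CI check can catch this early. The sketch below (hypothetical helper) validates the name and prints the version. It does not cover repeatable (`R__`) or undo migrations.

```bash
#!/usr/bin/env bash
# Check a Flyway versioned-migration file name (V<version>__<description>.sql,
# version parts separated by "." or "_") and print its version in dotted form.
flyway_version() {
  local name=${1##*/}   # strip the directory part
  local re='^V([0-9]+([._][0-9]+)*)__.+\.sql$'
  if [[ $name =~ $re ]]; then
    echo "${BASH_REMATCH[1]//_/.}"
  else
    echo "not a versioned migration name: ${name}" >&2
    return 1
  fi
}
```

In CI: `for f in db/migration/*.sql; do flyway_version "$f" >/dev/null || exit 1; done`.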

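Integrating Flyway into CI/CD:

# GitLab CI database migration (the stages migrate, check and fix must be declared in stages:)
db:migrate:
  stage: migrate
  image:
    name: flyway/flyway:10
    entrypoint: [""]   # the image's entrypoint is flyway; GitLab needs a shell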
  script:
    - flyway -url=jdbc:postgresql://${DB_HOST}:5432/${DB_NAME}
             -user=${DB_USER}
             -password=${DB_PASS}
             -locations=filesystem:./db/migration
             migrate
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      changes:
        - "db/migration/**/*"

db:info:
  stage: check
  image:
    name: flyway/flyway:10
    entrypoint: [""]
  script:
    - flyway -url=jdbc:postgresql://${DB_HOST}:5432/${DB_NAME}
             -user=${DB_USER}
             -password=${DB_PASS}
             -locations=filesystem:./db/migration
             info

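db:repair:
  stage: fix
  image:
    name: flyway/flyway:10
    entrypoint: [""]
  script:
    # repair realigns checksums with the migrations it finds, so it needs -locations too
    - flyway -url=jdbc:postgresql://${DB_HOST}:5432/${DB_NAME}
             -user=${DB_USER}
             -password=${DB_PASS}
             -locations=filesystem:./db/migration
             repair
  when: manual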

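9.2 Liquibase

Liquibase describes changes in XML, YAML or JSON. For many change types (createTable, addColumn, createIndex, ...) it can generate the rollback automatically. Explicit rollback blocks, as below, make the rollback explicit and are required for changes Liquibase cannot reverse on its own.

changelog.yaml: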

databaseChangeLog:
  - changeSet:
      id: 1
      author: devteam
      changes:
        - createTable:
            tableName: users
            columns:
              - column:
                  name: id
                  type: bigint
                  autoIncrement: true
                  constraints:
                    primaryKey: true
                    nullable: false
              - column:
                  name: username
                  type: varchar(100)
                  constraints:
                    nullable: false
                    unique: true
      rollback:
        - dropTable:
            tableName: users

  - changeSet:
      id: 2
      author: devteam
      changes:
        - addColumn:
            tableName: users
            columns:
              - column:
                  name: phone
                  type: varchar(20)
        - createIndex:
            tableName: users
            indexName: idx_users_phone
            columns:
              - column:
                  name: phone
      rollback:
        - dropIndex:
            tableName: users
            indexName: idx_users_phone
        - dropColumn:
            tableName: users
            columnName: phone

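9.3 Migration Strategy

Golden rules:

Additive only, never modify: don't ALTER the data type of an existing column
Backward compatible: the new schema must keep working with the currently running (old) code, because the migration runs before the new version is deployed and the code may later be rolled back
Small steps: each migration makes only one change
Expand, then contract: add the new column → migrate the data → drop the old column (spread across releases)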
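The "additive only" and "expand, then contract" rules can be backed by a cheap CI check. It flags migrations that contain non-additive statements for manual review. The sketch below is a heuristic grep with an illustrative pattern list, not a SQL parser, so extend it for your dialect (helper name hypothetical).

```bash
#!/usr/bin/env bash
# Heuristic CI check for the "additive only" rule: exit status 0 if a
# migration file contains a non-additive statement that needs manual review.
# Not a SQL parser; extend the pattern list for your dialect.
is_destructive_sql() {
  grep -Eiq 'DROP[[:space:]]+(TABLE|COLUMN)|RENAME[[:space:]]+(COLUMN|TO)|ALTER[[:space:]]+COLUMN[[:space:]]+[^[:space:]]+[[:space:]]+(TYPE|SET[[:space:]]+DATA[[:space:]]+TYPE|SET[[:space:]]+NOT[[:space:]]+NULL)' "$1"
}
```

Usage in a pipeline: `for f in db/migration/*.sql; do is_destructive_sql "$f" && echo "needs review: $f"; done`.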

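CI integration pattern:

Merge → CI builds image → run DB migration → deploy app → health check
                          ↑ schema changes are applied before the application is deployed

X. Security and Compliance

10.1 Integrating SAST / DAST / SCA

SAST (Static Application Security Testing), at the code level: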

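# GitLab CI: SAST with Semgrep
sast:
  stage: security
  image: returntocorp/semgrep
  script:
    # --gitlab-sast writes GitLab's SAST report format; plain --json output is not that schema
    - semgrep --config=auto --gitlab-sast --output=gl-sast-report.json .
  artifacts:
    reports:
      sast: gl-sast-report.json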
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"

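SCA (Software Composition Analysis), at the dependency level:

# GitLab CI: scan dependencies for vulnerabilities with Trivy
dependency-scan:
  stage: security
  image:
    name: aquasec/trivy:latest
    entrypoint: [""]   # the image's entrypoint is trivy; GitLab needs a shell
  script:
    - trivy fs --format json --output trivy-results.json --severity HIGH,CRITICAL .
    - trivy fs --exit-code 1 --severity CRITICAL .  # fail the job on critical vulnerabilities
  artifacts:
    when: always
    # Trivy's own JSON is not GitLab's dependency_scanning report schema,
    # so it is kept as a plain downloadable artifact
    paths:
      - trivy-results.json

# Docker image vulnerability scan (for a private registry, set TRIVY_USERNAME / TRIVY_PASSWORD)
image-scan:
  stage: security
  image:
    name: aquasec/trivy:latest
    entrypoint: [""]
  script:
    - trivy image --exit-code 1 --severity CRITICAL ${IMAGE_REGISTRY}/myapp:${CI_COMMIT_TAG}
  rules:
    - if: $CI_COMMIT_TAG   # the image reference uses the Git tag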

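DAST (Dynamic Application Security Testing), at runtime:

# OWASP ZAP integration
dast:
  stage: security
  image: ghcr.io/zaproxy/zaproxy:stable
  script:
    # zap-baseline.py writes its report under /zap/wrk, which must exist in CI
    - mkdir -p /zap/wrk
    # exit code 1 means FAIL alerts, 2 means only WARN alerts; keep it so the report is still copied
    - zap-baseline.py -t https://staging.example.com -r zap-report.html || ZAP_EXIT=$?
    - cp /zap/wrk/zap-report.html .
    - exit ${ZAP_EXIT:-0}
  artifacts:
    paths:
      - zap-report.html
    when: always
  rules:
    - if: $CI_COMMIT_BRANCH == "main"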

10.2 Approval Flow

# GitLab CI multi-level approval
deploy:staging:
  stage: deploy
  environment:
    name: staging
  script:
    - ./deploy.sh staging
  rules:
    - if: $CI_COMMIT_BRANCH == "main"

deploy:prod:
  stage: deploy
  environment:
    name: production
    deployment_tier: production
  script:
    - ./deploy.sh production
  rules:
    - if: $CI_COMMIT_TAG =~ /^v\d+\.\d+\.\d+$/
  when: manual
  allow_failure: false

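# GitLab protected environment approvals (deployment approvals require GitLab Premium or Ultimate)
# Settings → CI/CD → Protected environments
# Protect the production environment
# Allowed to deploy: Maintainers
# Required approvals: 2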

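10.3 Audit Logging

# Deployment audit record
deploy:prod:
  stage: deploy
  script:
    - |
      # Build a one-line JSON record with jq (assumed to be installed in the job image);
      # jq escapes quotes and newlines in the commit message, which string interpolation would not
      AUDIT_LOG=$(jq -nc \
        --arg timestamp "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        --arg version "${CI_COMMIT_TAG}" \
        --arg deployer "${GITLAB_USER_LOGIN}" \
        --arg pipeline_id "${CI_PIPELINE_ID}" \
        --arg commit_sha "${CI_COMMIT_SHA}" \
        --arg commit_message "${CI_COMMIT_MESSAGE}" \
        '{timestamp: $timestamp, action: "deploy", environment: "production",
          version: $version, deployer: $deployer, pipeline_id: $pipeline_id,
          commit_sha: $commit_sha, commit_message: $commit_message}')
      # Files inside the CI container disappear with the job, so keep the record as an artifact
      echo "$AUDIT_LOG" >> deploy-audit.jsonl
      # Push to centralized logging
      curl -X POST "https://elasticsearch:9200/deploy-audit/_doc" \
        -H "Content-Type: application/json" \
        -d "$AUDIT_LOG"
    - ./deploy.sh production
  artifacts:
    paths:
      - deploy-audit.jsonl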
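XI. Complete Pipeline Templates

11.1 Frontend Project (React/Vue + Docker)

# .gitlab-ci.yml - complete frontend pipeline
variables:
  IMAGE_REGISTRY: harbor.example.com/frontend
  NODE_VERSION: "20"

# variables: cannot evaluate shell defaults such as ${A:-B}, so the tag is computed in each job's shell
default:
  before_script:
    - export IMAGE_TAG="${CI_COMMIT_TAG:-$CI_COMMIT_SHORT_SHA}"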

stages:
  - install
  - lint
  - test
  - build
  - security
  - docker
  - deploy

cache:
  key: ${CI_COMMIT_REF_SLUG}
  paths:
    - node_modules/
    - .npm/

# ── Install dependencies ──
install:
  stage: install
  image: node:${NODE_VERSION}-alpine
  script:
    - npm ci --cache .npm --prefer-offline   # point npm's cache at the cached .npm/ directory
  artifacts:
    paths:
      - node_modules/
    expire_in: 1 hour

# ── Code checks ──
lint:
  stage: lint
  image: node:${NODE_VERSION}-alpine
  needs: [install]
  script:
    - npm run lint
    - npm run type-check

# ── Unit tests ──
test:unit:
  stage: test
  image: node:${NODE_VERSION}-alpine
  needs: [install]
  script:
    - npm run test:unit -- --coverage
  coverage: '/Lines\s*:\s*(\d+\.?\d*)%/'
  artifacts:
    reports:
      coverage_report:
        coverage_format: cobertura
        path: coverage/cobertura-coverage.xml
      junit: junit.xml   # assumes a JUnit reporter (e.g. jest-junit) is configured

# ── Build ──
build:
  stage: build
  image: node:${NODE_VERSION}-alpine
  needs: [install]
  script:
    - npm run build
  artifacts:
    paths:
      - dist/
    expire_in: 7 days

# ── Security scanning ──
sast:
  stage: security
  image: returntocorp/semgrep
  script:
    - semgrep --config=auto --gitlab-sast --output=sast.json .
  artifacts:
    reports:
      sast: sast.json
  allow_failure: true

dependency-scan:
  stage: security
  image:
    name: aquasec/trivy:latest
    entrypoint: [""]   # the image's entrypoint is trivy; GitLab needs a shell
  script:
    - trivy fs --exit-code 1 --severity CRITICAL .

# ── Build and push the Docker image ──
docker:build:
  stage: docker
  image: docker:24
  services:
    - docker:24-dind
  needs: [build, test:unit]
  script:
    - echo "$HARBOR_PASS" | docker login -u "$HARBOR_USER" --password-stdin harbor.example.com
    - docker build -t ${IMAGE_REGISTRY}/myapp:${IMAGE_TAG} .
    - docker push ${IMAGE_REGISTRY}/myapp:${IMAGE_TAG}
    - |
      if [ "$CI_COMMIT_BRANCH" = "main" ]; then
        docker tag ${IMAGE_REGISTRY}/myapp:${IMAGE_TAG} ${IMAGE_REGISTRY}/myapp:latest
        docker push ${IMAGE_REGISTRY}/myapp:latest
      fi
  rules:
    - if: $CI_COMMIT_TAG
    - if: $CI_COMMIT_BRANCH == "main"
    - if: $CI_COMMIT_BRANCH == "develop"   # deploy:dev needs this job, so it must run on develop too

# ── Deploy ──
deploy:dev:
  stage: deploy
  image:
    name: bitnami/kubectl:latest
    entrypoint: [""]   # the image's entrypoint is kubectl; GitLab needs a shell
  needs: [docker:build]
  script:
    - kubectl set image deployment/myapp myapp=${IMAGE_REGISTRY}/myapp:${IMAGE_TAG} -n dev
    - kubectl rollout status deployment/myapp -n dev --timeout=180s
  environment:
    name: development
    url: https://dev.example.com
  rules:
    - if: $CI_COMMIT_BRANCH == "develop"

deploy:staging:
  stage: deploy
  image:
    name: bitnami/kubectl:latest
    entrypoint: [""]
  needs: [docker:build]
  script:
    - kubectl set image deployment/myapp myapp=${IMAGE_REGISTRY}/myapp:${IMAGE_TAG} -n staging
    - kubectl rollout status deployment/myapp -n staging --timeout=300s
  environment:
    name: staging
    url: https://staging.example.com
  rules:
    - if: $CI_COMMIT_BRANCH == "main"

deploy:prod:
  stage: deploy
  image:
    name: bitnami/kubectl:latest
    entrypoint: [""]
  needs: [docker:build]
  script:
    - kubectl set image deployment/myapp myapp=${IMAGE_REGISTRY}/myapp:${IMAGE_TAG} -n production
    - kubectl rollout status deployment/myapp -n production --timeout=300s
  environment:
    name: production
    url: https://www.example.com
  rules:
    - if: $CI_COMMIT_TAG =~ /^v\d+\.\d+\.\d+$/
  when: manual

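11.2 Backend Project (Java Spring Boot)

# .gitlab-ci.yml - complete Java backend pipeline
variables:
  IMAGE_REGISTRY: harbor.example.com/backend
  MAVEN_OPTS: "-Dmaven.repo.local=.m2/repository"

# Compute the tag in the shell; variables: does not evaluate ${A:-B}
default:
  before_script:
    - export IMAGE_TAG="${CI_COMMIT_TAG:-$CI_COMMIT_SHORT_SHA}"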

stages:
  - build
  - test
  - security
  - docker
  - migrate
  - deploy

cache:
  key: ${CI_COMMIT_REF_SLUG}
  paths:
    - .m2/repository/
    - target/

# ── Compile ──
build:
  stage: build
  image: maven:3.9-eclipse-temurin-21
  script:
    - mvn clean compile -DskipTests
  artifacts:
    paths:
      - target/

# ── Tests ──
test:unit:
  stage: test
  image: maven:3.9-eclipse-temurin-21
  needs: [build]
  script:
    - mvn test
  artifacts:
    reports:
      junit: target/surefire-reports/TEST-*.xml
    when: always

test:integration:
  stage: test
  image: maven:3.9-eclipse-temurin-21
  services:
    - postgres:15
    - redis:7
  variables:
    POSTGRES_DB: testdb
    POSTGRES_USER: test
    POSTGRES_PASSWORD: test
    SPRING_DATASOURCE_URL: "jdbc:postgresql://postgres:5432/testdb"
    SPRING_DATASOURCE_USERNAME: test
    SPRING_DATASOURCE_PASSWORD: test
  needs: [build]
  script:
    - mvn verify -P integration-test
  artifacts:
    reports:
      junit: target/failsafe-reports/TEST-*.xml

# ── Security scanning ──
sast:
  stage: security
  image: returntocorp/semgrep
  script:
    - semgrep --config=p/java --gitlab-sast --output=sast.json .
  artifacts:
    reports:
      sast: sast.json
  allow_failure: true

dependency-scan:
  stage: security
  image:
    name: aquasec/trivy:latest
    entrypoint: [""]   # the image's entrypoint is trivy; GitLab needs a shell
  script:
    - trivy fs --exit-code 1 --severity CRITICAL --scanners vuln .

# ── Docker build ──
docker:build:
  stage: docker
  image: docker:24
  services:
    - docker:24-dind
  needs: [build, test:unit, test:integration]
  script:
    # No mvn here: the docker:24 image has no Maven; the multi-stage Dockerfile (11.3) builds the jar
    - echo "$HARBOR_PASS" | docker login -u "$HARBOR_USER" --password-stdin harbor.example.com
    - docker build -t ${IMAGE_REGISTRY}/myapi:${IMAGE_TAG} .
    - docker push ${IMAGE_REGISTRY}/myapi:${IMAGE_TAG}
  rules:
    - if: $CI_COMMIT_TAG
    - if: $CI_COMMIT_BRANCH == "main"

# ── Database migration ──
db:migrate:
  stage: migrate
  image:
    name: flyway/flyway:10
    entrypoint: [""]   # the image's entrypoint is flyway; GitLab needs a shell
  needs: [docker:build]
  script:
    - flyway -url=jdbc:postgresql://${PROD_DB_HOST}:5432/${PROD_DB_NAME}
             -user=${PROD_DB_USER}
             -password=${PROD_DB_PASS}
             -locations=filesystem:./db/migration
             migrate
  rules:
    - if: $CI_COMMIT_TAG =~ /^v\d+\.\d+\.\d+$/
  when: manual

# ── Deploy ──
deploy:staging:
  stage: deploy
  image:
    name: bitnami/kubectl:latest
    entrypoint: [""]   # the image's entrypoint is kubectl; GitLab needs a shell
  needs: [docker:build]
  script:
    - kubectl set image deployment/myapi myapi=${IMAGE_REGISTRY}/myapi:${IMAGE_TAG} -n staging
    - kubectl rollout status deployment/myapi -n staging --timeout=300s
  environment:
    name: staging
  rules:
    - if: $CI_COMMIT_BRANCH == "main"

deploy:prod:
  stage: deploy
  image:
    name: bitnami/kubectl:latest
    entrypoint: [""]
  needs: [docker:build, db:migrate]
  script:
    - kubectl set image deployment/myapi myapi=${IMAGE_REGISTRY}/myapi:${IMAGE_TAG} -n production
    - kubectl rollout status deployment/myapi -n production --timeout=300s
  environment:
    name: production
  rules:
    - if: $CI_COMMIT_TAG =~ /^v\d+\.\d+\.\d+$/
  when: manual

11.3 Docker Multi-Stage Builds

# Dockerfile - frontend multi-stage build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --prefer-offline
COPY . .
RUN npm run build

FROM nginx:1.25-alpine AS production
COPY --from=builder /app/dist /usr/share/nginx/html
COPY nginx.conf /etc/nginx/conf.d/default.conf
EXPOSE 80
# Assumes nginx.conf serves a /health location
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD wget -qO- http://localhost/health || exit 1

# Dockerfile - Java backend multi-stage build (a separate Dockerfile from the one above)
FROM maven:3.9-eclipse-temurin-21 AS builder
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline
COPY src ./src
RUN mvn package -DskipTests

FROM eclipse-temurin:21-jre-alpine AS production
WORKDIR /app
RUN addgroup -S app && adduser -S app -G app
COPY --from=builder /app/target/*.jar app.jar
USER app
EXPOSE 8080
# Assumes Spring Boot Actuator is on the classpath
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD wget -qO- http://localhost:8080/actuator/health || exit 1
ENTRYPOINT ["java", "-XX:+UseContainerSupport", "-XX:MaxRAMPercentage=75.0", "-jar", "app.jar"]

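Summary

A CI/CD pipeline is not built in one go; evolve it in stages:

Stage 1: automated builds + unit tests (the CI foundation)
Stage 2: automatic multi-environment deployment + integration tests
Stage 3: security scanning + artifact management
Stage 4: blue-green / canary releases + automatic rollback
Stage 5: end-to-end monitoring + thorough auditing

Key principles:

Same code, externalized configuration: every environment difference is handled through configuration
Immutable artifacts: the same image travels from testing all the way to production
Forward compatibility: database changes always stay forward-compatible
Small, fast steps: frequent small releases beat occasional big ones
Fast rollback: the ability to roll back matters more than the ability to release

💡 The configurations in this article are templates: adapt image versions, hostnames, and parameters to your project before using them.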


Author: fish
Published: 2026-06-16
Updated: 2026-06-10
License: CC BY 4.0
Original URL: https://acf1sh.top/console/overview/archives/fu-wu-qi-yun-wei-bi-ji-ci-cd-liu-shui-xian-she-ji
