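This is part 16 of the server operations series. It covers CI/CD pipeline architecture, multi-environment deployment, release patterns (blue-green, canary, rolling update), rollback, artifact management, database migration, and security and compliance. Each section includes configuration templates you can adapt.
1. CI/CD Core Concepts
"CI/CD" is often treated as one thing, but it covers three progressive stages: continuous integration, continuous delivery, and continuous deployment.
1.1 Continuous Integration
Developers merge code into the main branch frequently (at least once a day), and every merge triggers an automated build and test run.
Core value: find problems early. Merge conflicts, compile errors, and failing unit tests surface within minutes of a merge rather than days later.
Typical flow:
Commit → trigger CI → compile → unit tests → static analysis → build artifact → notify
1.2 Continuous Delivery
Builds on CI so that the code can be deployed to production at any time. The production deployment itself requires manual approval.
Core value: releasing becomes a business decision. Technically the code can ship at any time; the business picks the moment.
1.3 Continuous Deployment
Goes one step beyond continuous delivery: once code passes all tests it is deployed to production automatically, with no human intervention.
Core value: the fastest delivery speed. It suits mature teams and highly automated systems.
1.4 How the three relate
Continuous integration ⊂ continuous delivery ⊂ continuous deployment
CI: code → build → test → artifact
CD (continuous delivery): artifact → staging → [manual approval] → production
CD (continuous deployment): artifact → staging → automatic → production
Recommendations:
Startups / new projects: get CI right first
Mid-sized teams: continuous delivery (recommended for most cases)
Mature microservices with solid monitoring: continuous deployment
2. Pipeline Design Principles
2.1 Fast feedback
Every second the pipeline runs costs the team time. The faster the feedback, the cheaper the fix.
Key practices:
Unit tests under 5 minutes, integration tests under 15 minutes
Notify on failure immediately instead of waiting for the whole pipeline to finish
Layer the tests: fast checks first, slow checks later
GitLab CI example: a staged pipeline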
stages:
- lint # 30 seconds
- unit-test # 2 minutes
- build # 3 minutes
- integration # 10 minutes
- deploy # 2 minutes
lint:
stage: lint
script:
- npm run lint
- npm run type-check
rules:
- changes:
- "src/**/*"
unit-test:
stage: unit-test
script:
- npm run test:unit -- --coverage
coverage: '/Lines\s*:\s*(\d+\.?\d*)%/'
artifacts:
reports:
coverage_report:
coverage_format: cobertura
path: coverage/cobertura-coverage.xml
2.2 Parallelization
Run independent tasks in parallel to shorten total pipeline time.
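To make the idea concrete, here is a minimal Python sketch of how a test suite could be split across parallel jobs. The function name `shard` and the file names are hypothetical, and real runners (Jest included) use their own sharding logic; the point is only that each job derives its own disjoint slice from a 1-based index and a total.

```python
from typing import List


def shard(tests: List[str], index: int, total: int) -> List[str]:
    """Return the tests assigned to shard `index` (1-based) out of `total` shards."""
    if not 1 <= index <= total:
        raise ValueError("index must be in 1..total")
    # Sort first so every job computes the same split regardless of input order.
    ordered = sorted(tests)
    # Take every `total`-th test, starting at this shard's offset.
    return ordered[index - 1::total]


if __name__ == "__main__":
    files = ["a.test.js", "b.test.js", "c.test.js", "d.test.js", "e.test.js"]
    for i in range(1, 5):
        print(i, shard(files, i, 4))
```

Because each shard depends only on `index` and `total`, the jobs need no coordination with each other.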
# GitLab CI parallel example
unit-test:
stage: test
parallel: 4 # split into 4 parallel jobs
script:
- jest --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL
# GitHub Actions matrix build
jobs:
test:
strategy:
matrix:
node-version: [18, 20, 22]
os: [ubuntu-latest, macos-latest]
runs-on: ${{ matrix.os }}
steps:
- uses: actions/setup-node@v4
with:
node-version: ${{ matrix.node-version }}
- run: npm test
2.3 Idempotency
Running the pipeline several times produces the same result, with no side effects.
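As an illustration, the following Python sketch models an idempotent deploy step. `Cluster` is a hypothetical in-memory stand-in for the deployment target, not a real client; the sketch only shows the "compare, then act, then record" shape.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    """Hypothetical stand-in for the deployment target; not a real API client."""
    deployed_version: str = ""
    apply_calls: List[str] = field(default_factory=list)


def deploy(cluster: Cluster, version: str) -> bool:
    """Apply `version` only if it is not already live. Returns True if work was done."""
    if cluster.deployed_version == version:
        return False  # already deployed: re-running is a no-op
    cluster.apply_calls.append(version)
    cluster.deployed_version = version  # record the version for the next run
    return True


if __name__ == "__main__":
    c = Cluster()
    print(deploy(c, "abc1234"))  # first run does the work
    print(deploy(c, "abc1234"))  # second run changes nothing
```

Recording the deployed version is what makes the check meaningful on the next run.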
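# Idempotent deployment example
deploy:
  stage: deploy
  script:
    # Skip the deployment if this commit is already live
    - CURRENT=$(kubectl get deployment app -o jsonpath='{.metadata.annotations.version}' 2>/dev/null || true)
    - if [ "$CURRENT" = "$CI_COMMIT_SHA" ]; then echo "Already deployed"; exit 0; fi
    # Declarative apply: re-running converges to the same state
    - kubectl apply -f k8s/deployment.yaml
    - kubectl rollout status deployment/app --timeout=300s
    # Record the deployed commit so the check above works on the next run
    - kubectl annotate deployment app version="$CI_COMMIT_SHA" --overwrite
2.4 Rollback capability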
Any change can be reverted quickly to the last known good version.
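# Keep build artifacts for 30 days so an earlier version can be redeployed
deploy:
  artifacts:
    paths:
      - dist/
    expire_in: 30 days
# Rollback job (the environment keeps running, so it is not marked as stopped)
rollback:
  stage: deploy
  when: manual
  script:
    - kubectl rollout undo deployment/app
    - kubectl rollout status deployment/app --timeout=120s
  environment:
    name: production
3. Multi-environment Deployment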
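3.1 Environment tiers
A typical three-environment model:
dev (development)  →  staging (pre-production)  →  production
       ↓                       ↓                        ↓
auto deploy           auto deploy               deploy after manual approval
feature validation    integration validation    live validation
reset at any time     close to production       high availability
3.2 Managing environment differences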
Principle: the code stays the same; configuration lives outside it.
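A minimal Python sketch of this principle, assuming the application reads its settings from environment variables. The variable names mirror those used in this article's templates, while `Settings` and `load_settings` are hypothetical names introduced for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    log_level: str
    replicas: int
    database_host: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from the environment; the same code runs in every environment."""
    return Settings(
        log_level=env.get("LOG_LEVEL", "info"),   # optional, with a default
        replicas=int(env.get("REPLICAS", "1")),   # optional, with a default
        database_host=env["DATABASE_HOST"],       # required: raises KeyError if missing
    )


if __name__ == "__main__":
    print(load_settings({"DATABASE_HOST": "db.dev.internal", "LOG_LEVEL": "debug"}))
```

Failing fast on a missing required value surfaces a misconfigured environment at startup rather than at the first query.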
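GitLab CI multi-environment template:
# APP_NAME is defined globally: a job-level `variables:` block replaces the one
# merged in through `<<`, so a variable defined inside the template would be lost.
variables:
APP_NAME: my-service
.deploy_template: &deploy_template
stage: deploy
image: bitnami/kubectl:latest
script:
- kubectl config use-context $KUBE_CONTEXT
- envsubst < k8s/deployment.template.yaml | kubectl apply -f -
- kubectl rollout status deployment/$APP_NAME -n $NAMESPACE --timeout=300s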
deploy:dev:
<<: *deploy_template
variables:
KUBE_CONTEXT: dev-cluster
NAMESPACE: dev
REPLICAS: "1"
CPU_LIMIT: "500m"
MEMORY_LIMIT: "512Mi"
LOG_LEVEL: "debug"
environment:
name: development
url: https://dev.example.com
rules:
- if: $CI_COMMIT_BRANCH == "develop"
deploy:staging:
<<: *deploy_template
variables:
KUBE_CONTEXT: staging-cluster
NAMESPACE: staging
REPLICAS: "2"
CPU_LIMIT: "1000m"
MEMORY_LIMIT: "1Gi"
LOG_LEVEL: "info"
environment:
name: staging
url: https://staging.example.com
rules:
- if: $CI_COMMIT_BRANCH == "main"
deploy:prod:
<<: *deploy_template
variables:
KUBE_CONTEXT: prod-cluster
NAMESPACE: production
REPLICAS: "5"
CPU_LIMIT: "2000m"
MEMORY_LIMIT: "2Gi"
LOG_LEVEL: "warn"
environment:
name: production
url: https://www.example.com
rules:
- if: $CI_COMMIT_TAG =~ /^v\d+\.\d+\.\d+$/
when: manual
allow_failure: false
3.3 Configuration injection
Option 1: Kubernetes ConfigMap + Secret
# k8s/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config
namespace: ${NAMESPACE}
data:
DATABASE_HOST: "${DB_HOST}"
REDIS_URL: "redis://${REDIS_HOST}:6379"
LOG_LEVEL: "${LOG_LEVEL}"
FEATURE_FLAG_NEW_UI: "true"
---
# k8s/secret.yaml (values injected from CI variables)
apiVersion: v1
kind: Secret
metadata:
name: app-secret
namespace: ${NAMESPACE}
type: Opaque
data:
DATABASE_PASSWORD: "${DB_PASSWORD_B64}"
API_KEY: "${API_KEY_B64}"
Option 2: Vault dynamic secrets
# Inject with the Vault Agent sidecar
annotations:
vault.hashicorp.com/agent-inject: "true"
vault.hashicorp.com/role: "my-service"
vault.hashicorp.com/agent-inject-secret-db: "database/creds/my-service"
vault.hashicorp.com/agent-inject-template-db: |
{{- with secret "database/creds/my-service" -}}
DATABASE_URL=postgresql://{{ .Data.username }}:{{ .Data.password }}@db:5432/mydb
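{{- end }}
4. Blue-green Deployment
4.1 How it works
Maintain two identical production environments (blue and green), with only one serving traffic at any time. A new version is deployed to the idle environment, traffic is switched over once it passes verification, and it is switched back immediately if something goes wrong.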
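The switching logic can be sketched in a few lines of Python. `BlueGreenRouter` and the health-check callback are hypothetical names for illustration; in practice the "router" is a load balancer or a Service selector.

```python
from typing import Callable, Dict


class BlueGreenRouter:
    """Tracks which of two environments is live and switches only to a healthy one."""

    def __init__(self, targets: Dict[str, str], live: str = "blue") -> None:
        self.targets = targets  # environment name -> address
        self.live = live

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def switch(self, is_healthy: Callable[[str], bool]) -> bool:
        """Send traffic to the idle environment if it passes the health check."""
        candidate = self.idle
        if not is_healthy(self.targets[candidate]):
            return False  # keep serving from the current environment
        self.live = candidate
        return True


if __name__ == "__main__":
    router = BlueGreenRouter({"blue": "10.0.1.10", "green": "10.0.2.10"})
    router.switch(lambda address: True)
    print(router.live)
```

Rolling back is the same operation in the other direction, which is why the previous environment is kept running after the switch.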
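                  ┌───────────────┐
User requests ──→ │ Load balancer │
                  └───────┬───────┘
                          │
               ┌──────────┴──────────┐
               ▼                     ▼
        ┌─────────────┐       ┌─────────────┐
        │ Blue (live) │       │ Green (new) │
        │   v1.2.3    │       │   v1.3.0    │
        └─────────────┘       └─────────────┘
Before release: traffic → blue
After release: traffic → green (blue is kept for rollback)
4.2 Nginx implementation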
# /etc/nginx/conf.d/blue-green.conf
upstream app {
# Swap the active line to switch between blue and green
server 10.0.1.10:8080; # blue environment
# server 10.0.2.10:8080; # green environment
}
server {
listen 80;
server_name www.example.com;
location / {
proxy_pass http://app;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
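}
Automated switch script:
#!/bin/bash
# switch-env.sh - blue-green switch script
set -euo pipefail
CONF=/etc/nginx/conf.d/blue-green.conf
BLUE="10.0.1.10"
GREEN="10.0.2.10"
# Match only the active upstream line: anchoring at line start skips the
# commented-out standby entry
CURRENT=$(grep -oP '^\s*server \K[\d.]+' "$CONF" | head -n1)
if [ "$CURRENT" = "$BLUE" ]; then
  NEW_TARGET=$GREEN
  ENV_NAME="green"
else
  NEW_TARGET=$BLUE
  ENV_NAME="blue"
fi
# Pre-check: confirm the new environment is healthy
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "http://${NEW_TARGET}:8080/health" || true)
if [ "$HTTP_CODE" != "200" ]; then
  echo "❌ Health check failed for the new environment (HTTP $HTTP_CODE), aborting switch"
  exit 1
fi
# Switch: rewrite only the active line, and revert if the config test fails
cp "$CONF" "$CONF.bak"
sed -i -E "s/^(\s*)server ${CURRENT}:8080/\1server ${NEW_TARGET}:8080/" "$CONF"
if ! nginx -t; then
  mv "$CONF.bak" "$CONF"
  echo "❌ nginx config test failed, change reverted"
  exit 1
fi
nginx -s reload
echo "✅ Switched to the ${ENV_NAME} environment (${NEW_TARGET})"
echo "To roll back, run this script again"
4.3 Docker Compose implementation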
# docker-compose.blue-green.yml
version: "3.8"
services:
nginx:
image: nginx:alpine
ports:
- "80:80"
volumes:
- ./nginx.conf:/etc/nginx/nginx.conf:ro
depends_on:
- app-blue
- app-green
app-blue:
image: registry.example.com/myapp:${BLUE_VERSION:-1.2.3}
environment:
- NODE_ENV=production
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
interval: 10s
timeout: 5s
retries: 3
app-green:
image: registry.example.com/myapp:${GREEN_VERSION:-1.3.0}
environment:
- NODE_ENV=production
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
interval: 10s
timeout: 5s
retries: 3
4.4 Kubernetes implementation
# k8s/blue-green-deployment.yaml
# Blue environment (current version)
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-blue
labels:
app: myapp
version: blue
spec:
replicas: 3
selector:
matchLabels:
app: myapp
version: blue
template:
metadata:
labels:
app: myapp
version: blue
spec:
containers:
- name: app
image: registry.example.com/myapp:1.2.3
ports:
- containerPort: 8080
readinessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 10
periodSeconds: 5
---
# The Service switches between blue and green through its selector
apiVersion: v1
kind: Service
metadata:
name: myapp
spec:
selector:
app: myapp
version: blue # change to green to switch
ports:
- port: 80
targetPort: 8080
Switch commands:
# Switch to the green environment
kubectl patch service myapp -p '{"spec":{"selector":{"version":"green"}}}'
# Roll back to the blue environment
kubectl patch service myapp -p '{"spec":{"selector":{"version":"blue"}}}'
5. Canary Release
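5.1 How it works
Roll the new version out to a small share of users first, and go to full release only after the metrics stay healthy. Like a canary in a coal mine, a small "scout" verifies safety before everyone else follows.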
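A small Python sketch of the progression logic, assuming a caller-supplied function that reports the canary's error rate at a given traffic weight. `run_canary` and its parameters are hypothetical; the 5% threshold matches the alert rule used later in this section.

```python
from typing import Callable, Iterable, List, Tuple


def run_canary(
    steps: Iterable[int],
    error_rate_at: Callable[[int], float],
    max_error_rate: float = 0.05,
) -> Tuple[int, List[int]]:
    """Walk through traffic weights; return (final canary weight, weights visited)."""
    visited: List[int] = []
    for weight in steps:
        visited.append(weight)
        if error_rate_at(weight) > max_error_rate:
            return 0, visited  # roll back: the canary receives no traffic
    # Every step passed: the canary stays at the last weight reached.
    return (visited[-1] if visited else 0), visited


if __name__ == "__main__":
    print(run_canary([10, 25, 50, 100], lambda weight: 0.01))
```

In a real rollout each step would also wait for an observation window before reading the metric.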
Traffic split:
┌────────────────────────────────────────┐
│ ████████████████████████████░░░░ 90% │ ← old version (stable)
│ ████████░░░░░░░░░░░░░░░░░░░░ 10% │ ← new version (canary)
└────────────────────────────────────────┘
Timeline:
0%      → 10%     → 25%     → 50%     → 100%
│         │         │         │         │
observe   observe   observe   observe   full release
15 min    15 min    15 min    15 min
5.2 Nginx traffic control
# Weight-based canary
upstream app {
server 10.0.1.10:8080 weight=90; # stable
server 10.0.2.10:8080 weight=10; # canary
}
# Header-based canary (for internal test users)
map $http_x_canary $backend {
"true" canary_backend;
default stable_backend;
}
upstream stable_backend {
server 10.0.1.10:8080;
}
upstream canary_backend {
server 10.0.2.10:8080;
}
server {
listen 80;
location / {
proxy_pass http://$backend;
}
}
5.3 Kubernetes canary
# Stable Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-stable
spec:
replicas: 9
selector:
matchLabels:
app: myapp
track: stable
template:
metadata:
labels:
app: myapp
track: stable
spec:
containers:
- name: app
image: registry.example.com/myapp:1.2.3
---
# Canary Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-canary
spec:
replicas: 1
selector:
matchLabels:
app: myapp
track: canary
template:
metadata:
labels:
app: myapp
track: canary
spec:
containers:
- name: app
image: registry.example.com/myapp:1.3.0
---
# The Service matches both stable and canary Pods
apiVersion: v1
kind: Service
metadata:
name: myapp
spec:
selector:
app: myapp # no track label, so both Deployments match; traffic splits roughly by replica count (9:1)
ports:
- port: 80
targetPort: 8080
5.4 Monitoring and automatic rollback
Prometheus alert rules (canary metrics):
# prometheus-rules.yaml
groups:
- name: canary-monitoring
rules:
- alert: CanaryHighErrorRate
expr: |
sum(rate(http_requests_total{track="canary",status=~"5.."}[5m]))
/
sum(rate(http_requests_total{track="canary"}[5m]))
> 0.05
for: 2m
labels:
severity: critical
annotations:
summary: "Canary error rate above 5%"
description: "Canary error rate is {{ $value | humanizePercentage }}, consider rolling back"
- alert: CanaryHighLatency
expr: |
histogram_quantile(0.95,
sum(rate(http_request_duration_seconds_bucket{track="canary"}[5m])) by (le)
) > 2
for: 3m
labels:
severity: warning
annotations:
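summary: "Canary P95 latency above 2 seconds"
Automatic rollback script:
#!/bin/bash
# canary-rollback.sh
set -euo pipefail
QUERY='sum(rate(http_requests_total{track="canary",status=~"5.."}[5m]))/sum(rate(http_requests_total{track="canary"}[5m]))'
ERROR_RATE=$(curl -s 'http://prometheus:9090/api/v1/query' \
  --data-urlencode "query=${QUERY}" \
  | jq -r '.data.result[0].value[1] // "0"')
# With no canary traffic the result is empty or NaN; treat both as 0
if [ "$ERROR_RATE" = "NaN" ]; then ERROR_RATE=0; fi
# awk handles the scientific notation Prometheus may return for small values
if awk -v r="$ERROR_RATE" 'BEGIN { exit !(r + 0 > 0.05) }'; then
  echo "⚠️ Canary error rate ${ERROR_RATE}, rolling back"
  kubectl scale deployment app-canary --replicas=0
  kubectl scale deployment app-stable --replicas=10
  echo "✅ Rollback complete"
else
  echo "✅ Canary metrics healthy, error rate ${ERROR_RATE}"
fi
5.5 Automated canary with Flagger
# Flagger Canary resource (works with Istio/Linkerd/Nginx Ingress)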
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
name: myapp
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: myapp
progressDeadlineSeconds: 600
service:
port: 80
analysis:
interval: 1m
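threshold: 5 # maximum number of failed checks before rollback
maxWeight: 50 # maximum canary traffic weight (%)
stepWeight: 10 # weight added at each step (%)
metrics:
- name: request-success-rate
thresholdRange:
min: 99 # success rate of at least 99%
interval: 1m
- name: request-duration
thresholdRange:
max: 500 # P99 latency of at most 500 ms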
interval: 1m
webhooks:
- name: load-test
type: rollout
url: http://flagger-loadtester.test/
metadata:
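cmd: "hey -z 1m -q 10 -c 2 http://myapp-canary.test/"
6. Rolling Update
6.1 Strategy configuration
Rolling update is the default strategy for Kubernetes Deployments: old Pods are replaced with new ones gradually.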
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
spec:
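replicas: 10
selector:
matchLabels:
app: myapp
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 3 # up to 3 extra Pods during the rollout (13 Pods in total)
maxUnavailable: 1 # at most 1 Pod unavailable (at least 9 available)
# Together these bound the rollout between 9 available and 13 total Pods
template:
metadata:
labels:
app: myapp
spec:
containers:
- name: app
image: registry.example.com/myapp:1.3.0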
readinessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 15
periodSeconds: 5
successThreshold: 1
failureThreshold: 3
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
Parameter tuning recommendations:
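When tuning `maxSurge` and `maxUnavailable`, it helps to compute the bounds they imply. The Python sketch below is an illustration rather than part of the original article; `rollout_bounds` is a hypothetical helper. It follows the Kubernetes rule that a percentage `maxSurge` rounds up while a percentage `maxUnavailable` rounds down.

```python
import math
from typing import Tuple, Union


def resolve(value: Union[int, str], replicas: int, round_up: bool) -> int:
    """Turn an absolute number or a percentage string such as "25%" into a Pod count."""
    if isinstance(value, str) and value.endswith("%"):
        raw = replicas * int(value[:-1]) / 100
        return math.ceil(raw) if round_up else math.floor(raw)
    return int(value)


def rollout_bounds(
    replicas: int, max_surge: Union[int, str], max_unavailable: Union[int, str]
) -> Tuple[int, int]:
    """Return (maximum total Pods, minimum available Pods) during a rolling update."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return replicas + surge, replicas - unavailable


if __name__ == "__main__":
    print(rollout_bounds(10, 3, 1))
    print(rollout_bounds(10, "25%", "25%"))
```

A larger `maxSurge` speeds up the rollout at the cost of temporary extra capacity; a larger `maxUnavailable` speeds it up at the cost of serving capacity.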
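6.2 The three health probes
containers:
- name: app
# Startup probe: for slow-starting applications, keeps liveness from killing them early
startupProbe:
httpGet:
path: /health
port: 8080
failureThreshold: 30 # wait up to 30 × 5 = 150 seconds
periodSeconds: 5
# Readiness probe: decides whether the Pod receives traffic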
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
successThreshold: 1
failureThreshold: 3 # removed from the Service endpoints after 3 consecutive failures
# Liveness probe: decides whether the container is restarted
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
failureThreshold: 3 # restarted after 3 consecutive failures
6.3 Rollback
# Show rollout history
kubectl rollout history deployment/myapp
# Roll back to the previous revision
kubectl rollout undo deployment/myapp
# Roll back to a specific revision
kubectl rollout undo deployment/myapp --to-revision=3
# Watch rollback status
kubectl rollout status deployment/myapp --timeout=120s
# Pause the rollout (when something looks wrong)
kubectl rollout pause deployment/myapp
# Resume the rollout
kubectl rollout resume deployment/myapp
Rollback integrated into CI/CD:
# GitLab CI automatic rollback
deploy:
stage: deploy
script:
- kubectl apply -f k8s/
- kubectl rollout status deployment/myapp --timeout=300s || {
echo "Deployment timed out, rolling back automatically";
kubectl rollout undo deployment/myapp;
kubectl rollout status deployment/myapp --timeout=120s;
exit 1;
}
7. Rollback Strategy
7.1 Version management
Semantic versioning + immutable artifacts:
Version number: MAJOR.MINOR.PATCH (e.g. 1.3.2)
MAJOR: incompatible API changes
MINOR: backward-compatible new features
PATCH: backward-compatible bug fixes
Artifact tags: v1.3.2 / v1.3.2-abc1234 (with commit hash)
Docker image tagging strategy:
# ❌ Avoid latest
docker build -t myapp:latest .
# ✅ Use a semantic version plus commit hash
docker build -t myapp:v1.3.2 -t myapp:v1.3.2-abc1234 .
# Generate the tag automatically in CI
VERSION=$(git describe --tags --always)
docker build -t registry.example.com/myapp:${VERSION} .
docker push registry.example.com/myapp:${VERSION}
7.2 Database rollback
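Database changes are the hardest part of a rollback. Core principle: keep every schema change compatible with the previous application version, and roll it out in steps.
Safe schema change pattern:
Step 1: add the new column (additive only) → deploy the new code
Step 2: the new code reads and writes both the old and the new column → verify
Step 3: the new code reads and writes only the new column → stop writing the old column
Step 4: drop the old column (in a later release)
Rollback script template: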
-- migration/V3__add_email_column.sql (forward)
ALTER TABLE users ADD COLUMN email VARCHAR(255);
UPDATE users SET email = CONCAT(username, '@example.com');
-- Enforce NOT NULL only once every running application version writes email;
-- otherwise inserts from the previous version fail
ALTER TABLE users ALTER COLUMN email SET NOT NULL;
-- rollback/V3__rollback_email_column.sql (rollback)
ALTER TABLE users DROP COLUMN IF EXISTS email;
7.3 Fast rollback SOP
#!/bin/bash
# rollback.sh - fast production rollback SOP
set -euo pipefail
APP_NAME="${1:?Usage: ./rollback.sh <app-name> [revision]}"
REVISION="${2:-}"
echo "🔄 Rolling back ${APP_NAME}"
# Step 1: show the current version
echo "Current version:"
kubectl get deployment "${APP_NAME}" -o jsonpath='{.spec.template.spec.containers[0].image}'
echo ""
# Step 2: roll back
if [ -n "$REVISION" ]; then
  kubectl rollout undo "deployment/${APP_NAME}" --to-revision="${REVISION}"
else
  kubectl rollout undo "deployment/${APP_NAME}"
fi
# Step 3: wait for the rollback to finish
kubectl rollout status "deployment/${APP_NAME}" --timeout=180s
# Step 4: verify
NEW_IMAGE=$(kubectl get deployment "${APP_NAME}" -o jsonpath='{.spec.template.spec.containers[0].image}')
echo "✅ Rollback complete, current image: ${NEW_IMAGE}"
# Step 5: check Pod status
kubectl get pods -l "app=${APP_NAME}" -o wide
# Step 6: health check
for i in {1..5}; do
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://www.example.com/health || true)
  if [ "$STATUS" = "200" ]; then
    echo "✅ Health check passed"
    exit 0
  fi
  echo "Waiting for health check... (${i}/5)"
  sleep 5
done
echo "❌ Health check failed, manual intervention required"
exit 1
8. Artifact Repositories
8.1 Comparison of mainstream artifact repositories
8.2 Nexus configuration example
# Deploy Nexus with Docker Compose
version: "3.8"
services:
nexus:
image: sonatype/nexus3:3.68.0
ports:
- "8081:8081"
- "8082:8082" # Docker registry port
volumes:
- nexus-data:/nexus-data
environment:
- NEXUS_SECURITY_RANDOMPASSWORD=false
volumes:
nexus-data:
Using the Nexus Docker registry:
# Log in
docker login nexus.example.com:8082
# Push an image
docker tag myapp:v1.3.0 nexus.example.com:8082/myapp:v1.3.0
docker push nexus.example.com:8082/myapp:v1.3.0
# Use in CI
# .gitlab-ci.yml
build:
stage: build
script:
# Pass the password on stdin so it does not appear in the process list
- echo "$NEXUS_PASS" | docker login -u "$NEXUS_USER" --password-stdin nexus.example.com:8082
- docker build -t nexus.example.com:8082/myapp:${CI_COMMIT_TAG} .
- docker push nexus.example.com:8082/myapp:${CI_COMMIT_TAG}
8.3 Harbor configuration
# Harbor installation (docker-compose)
# Download the Harbor offline installer
wget https://github.com/goharbor/harbor/releases/download/v2.11.0/harbor-offline-installer-v2.11.0.tgz
tar xzf harbor-offline-installer-v2.11.0.tgz
cd harbor
# Edit harbor.yml
# hostname: harbor.example.com
# harbor_admin_password: StrongPassword123!
# database.password: dbpassword
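./install.sh --with-trivy # enable vulnerability scanning
Harbor projects and image policies:
# Create a project
curl -u admin:password -X POST https://harbor.example.com/api/v2.0/projects \
-H "Content-Type: application/json" \
-d '{"project_name": "myteam", "public": false}'
# Configure a tag retention policy (keep the 30 most recently pushed tags per repository).
# Retention policies are created through /retentions and scoped to a numeric project ID
# (1 below is a placeholder); check the payload against your instance's API explorer.
curl -u admin:password -X POST \
"https://harbor.example.com/api/v2.0/retentions" \
-H "Content-Type: application/json" \
-d '{
"algorithm": "or",
"scope": {"level": "project", "ref": 1},
"trigger": {"kind": "Schedule", "settings": {"cron": ""}},
"rules": [{
"action": "retain",
"template": "latestPushedK",
"params": {"latestPushedK": 30},
"tag_selectors": [{"kind": "doublestar", "decoration": "matches", "pattern": "**"}],
"scope_selectors": {"repository": [{"kind": "doublestar", "decoration": "repoMatches", "pattern": "**"}]}
}]
}'
8.4 Artifact versioning and garbage collection
# GitLab CI artifact version management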
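variables:
IMAGE_REGISTRY: harbor.example.com/myteam
# GitLab does not expand shell-style defaults such as ${VAR:-default} inside `variables:`,
# so the tag is computed in the shell before each job's script
default:
before_script:
- export IMAGE_TAG="${CI_COMMIT_TAG:-$CI_COMMIT_SHORT_SHA}"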
build:
stage: build
script:
- docker build -t ${IMAGE_REGISTRY}/myapp:${IMAGE_TAG} .
- docker push ${IMAGE_REGISTRY}/myapp:${IMAGE_TAG}
# Also push latest (main branch only)
- if [ "$CI_COMMIT_BRANCH" = "main" ]; then
docker tag ${IMAGE_REGISTRY}/myapp:${IMAGE_TAG} ${IMAGE_REGISTRY}/myapp:latest;
docker push ${IMAGE_REGISTRY}/myapp:latest;
fi
rules:
- if: $CI_COMMIT_TAG
- if: $CI_COMMIT_BRANCH == "main"
Harbor garbage collection:
# Trigger garbage collection manually
curl -u admin:password -X POST \
"https://harbor.example.com/api/v2.0/system/gc/schedule" \
-H "Content-Type: application/json" \
-d '{
"parameters": {"delete_untagged": true},
"schedule": {"type": "Manual"}
}'
9. Database Migration
9.1 Flyway
Flyway uses plain SQL scripts. It is simple, direct, and easy to learn.
Directory layout:
db/migration/
  V1__create_users_table.sql
  V2__add_phone_column.sql
  V3__create_orders_table.sql
  V4__add_index_on_email.sql
SQL examples:
-- V1__create_users_table.sql
CREATE TABLE users (
  id BIGSERIAL PRIMARY KEY,
  username VARCHAR(100) NOT NULL UNIQUE,
  email VARCHAR(255),
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);
-- V2__add_phone_column.sql
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
CREATE INDEX idx_users_phone ON users(phone);
Integrating Flyway into CI/CD:
# GitLab CI database migration
db:migrate:
stage: migrate
image: flyway/flyway:10
script:
- flyway -url=jdbc:postgresql://${DB_HOST}:5432/${DB_NAME}
-user=${DB_USER}
-password=${DB_PASS}
-locations=filesystem:./db/migration
migrate
rules:
- if: $CI_COMMIT_BRANCH == "main"
changes:
- "db/migration/**/*"
db:info:
stage: check
image: flyway/flyway:10
script:
- flyway -url=jdbc:postgresql://${DB_HOST}:5432/${DB_NAME}
-user=${DB_USER}
-password=${DB_PASS}
-locations=filesystem:./db/migration
info
db:repair:
stage: fix
image: flyway/flyway:10
script:
- flyway -url=jdbc:postgresql://${DB_HOST}:5432/${DB_NAME}
-user=${DB_USER}
-password=${DB_PASS}
repair
when: manual
9.2 Liquibase
Liquibase describes changes in XML, YAML, or JSON changelogs and can generate the rollback automatically for many change types.
changelog.yaml:
databaseChangeLog:
- changeSet:
id: 1
author: devteam
changes:
- createTable:
tableName: users
columns:
- column:
name: id
type: bigint
autoIncrement: true
constraints:
primaryKey: true
nullable: false
- column:
name: username
type: varchar(100)
constraints:
nullable: false
unique: true
rollback:
- dropTable:
tableName: users
- changeSet:
id: 2
author: devteam
changes:
- addColumn:
tableName: users
columns:
- column:
name: phone
type: varchar(20)
- createIndex:
tableName: users
indexName: idx_users_phone
columns:
- column:
name: phone
rollback:
- dropIndex:
tableName: users
indexName: idx_users_phone
- dropColumn:
tableName: users
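columnName: phone
9.3 Migration strategy
Golden rules:
Append only, never modify: do not ALTER the data type of an existing column
Backward compatible: the previous application version must keep working on the new schema, because migrations run before the application is deployed and the old version returns after a rollback
Small steps: one change per migration
Expand, then contract: add the new column → migrate data → drop the old column (spread over several releases)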
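One way to enforce the "append only" rule is a small check in CI that flags destructive statements in a migration file. The Python sketch below is a rough illustration under simple assumptions: it splits on `;` and matches a few PostgreSQL-style patterns, so it is no substitute for a real SQL parser. `destructive_statements` is a hypothetical helper name.

```python
import re
from typing import List

# Patterns for changes that can break the previous application version.
DESTRUCTIVE = [
    re.compile(r"\bDROP\s+(TABLE|COLUMN)\b", re.IGNORECASE),
    re.compile(r"\bALTER\s+COLUMN\s+\w+\s+(SET\s+DATA\s+)?TYPE\b", re.IGNORECASE),
    re.compile(r"\bRENAME\s+(COLUMN|TO)\b", re.IGNORECASE),
]


def destructive_statements(sql: str) -> List[str]:
    """Return the statements in `sql` that drop, retype, or rename schema objects."""
    flagged = []
    for statement in sql.split(";"):  # naive split; fine for simple migration files
        text = statement.strip()
        if text and any(pattern.search(text) for pattern in DESTRUCTIVE):
            flagged.append(text)
    return flagged


if __name__ == "__main__":
    migration = "ALTER TABLE users ADD COLUMN phone VARCHAR(20); ALTER TABLE users DROP COLUMN email;"
    print(destructive_statements(migration))
```

A pipeline could fail the migration job when the list is non-empty, unless the change is explicitly marked as the "contract" step of an expand/contract sequence.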
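CI integration pattern:
Merge → CI builds image → run DB migration → deploy application → health check
                          ↑ database changes run before the application is deployed
10. Security and Compliance
10.1 SAST / DAST / SCA integration
SAST (static application security testing), code level:
# GitLab CI with Semgrep
sast:
stage: security
image: returntocorp/semgrep
script:
# --gitlab-sast writes the report in the schema GitLab expects for `reports: sast`
- semgrep --config=auto --gitlab-sast --output=semgrep-results.json .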
artifacts:
reports:
sast: semgrep-results.json
rules:
- if: $CI_PIPELINE_SOURCE == "merge_request_event"
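- if: $CI_COMMIT_BRANCH == "main"
SCA (software composition analysis), dependency level:
# GitLab CI: scan dependencies with Trivy
dependency-scan:
stage: security
image: aquasec/trivy:latest
script:
- trivy fs --format json --output trivy-results.json --severity HIGH,CRITICAL .
- trivy fs --exit-code 1 --severity CRITICAL . # fail the job on CRITICAL vulnerabilities
artifacts:
# Trivy's native JSON is not GitLab's dependency_scanning report schema, so keep it as a plain artifact
paths:
- trivy-results.json
# Docker image vulnerability scan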
image-scan:
stage: security
image: aquasec/trivy:latest
script:
- trivy image --exit-code 1 --severity CRITICAL ${IMAGE_REGISTRY}/myapp:${CI_COMMIT_TAG}
DAST (dynamic application security testing), runtime level:
# OWASP ZAP integration
dast:
stage: security
image: ghcr.io/zaproxy/zaproxy:stable
script:
- zap-baseline.py -t https://staging.example.com -r zap-report.html
artifacts:
paths:
- zap-report.html
when: always
rules:
- if: $CI_COMMIT_BRANCH == "main"
10.2 Approval workflow
# GitLab CI multi-level approval
deploy:staging:
stage: deploy
environment:
name: staging
script:
- ./deploy.sh staging
rules:
- if: $CI_COMMIT_BRANCH == "main"
deploy:prod:
stage: deploy
environment:
name: production
deployment_tier: production
script:
- ./deploy.sh production
rules:
- if: $CI_COMMIT_TAG =~ /^v\d+\.\d+\.\d+$/
when: manual
allow_failure: false
# GitLab protected environment approvals
# Settings → CI/CD → Protected Environments
# Add protection for the production environment
# Roles allowed to deploy: Maintainer
# Required approvals: 2
10.3 Audit log
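# Deployment audit record (assumes jq is available in the job image)
deploy:prod:
  stage: deploy
  script:
    - |
      # Build the record with jq so quotes and newlines in values are escaped correctly
      AUDIT_LOG=$(jq -cn \
        --arg timestamp "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        --arg version "$CI_COMMIT_TAG" \
        --arg deployer "$GITLAB_USER_LOGIN" \
        --arg pipeline_id "$CI_PIPELINE_ID" \
        --arg commit_sha "$CI_COMMIT_SHA" \
        --arg commit_message "$(git log -1 --pretty=%B)" \
        '{timestamp: $timestamp, action: "deploy", environment: "production", version: $version, deployer: $deployer, pipeline_id: $pipeline_id, commit_sha: $commit_sha, commit_message: $commit_message}')
      echo "$AUDIT_LOG" >> /var/log/deploy-audit.jsonl
      # Ship to centralized logging
      curl -X POST "https://elasticsearch:9200/deploy-audit/_doc" \
        -H "Content-Type: application/json" \
        -d "$AUDIT_LOG"
    - ./deploy.sh production
11. Complete Pipeline Templates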
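11.1 Frontend project (React/Vue + Docker)
# .gitlab-ci.yml - complete frontend pipeline
variables:
IMAGE_REGISTRY: harbor.example.com/frontend
NODE_VERSION: "20"
# GitLab does not expand ${VAR:-default} inside `variables:`; compute the tag in the shell
default:
before_script:
- export IMAGE_TAG="${CI_COMMIT_TAG:-$CI_COMMIT_SHORT_SHA}"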
stages:
- install
- lint
- test
- build
- security
- docker
- deploy
cache:
key: ${CI_COMMIT_REF_SLUG}
paths:
- node_modules/
- .npm/
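# ── Install dependencies ──
install:
stage: install
image: node:${NODE_VERSION}-alpine
script:
# Point npm at the cached .npm/ directory so the cache entry above is actually used
- npm ci --cache .npm --prefer-offline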
artifacts:
paths:
- node_modules/
expire_in: 1 hour
# ── Lint ──
lint:
stage: lint
image: node:${NODE_VERSION}-alpine
needs: [install]
script:
- npm run lint
- npm run type-check
# ── Unit tests ──
test:unit:
stage: test
image: node:${NODE_VERSION}-alpine
needs: [install]
script:
- npm run test:unit -- --coverage
coverage: '/Lines\s*:\s*(\d+\.?\d*)%/'
artifacts:
reports:
coverage_report:
coverage_format: cobertura
path: coverage/cobertura-coverage.xml
junit: junit.xml
# ── Build ──
build:
stage: build
image: node:${NODE_VERSION}-alpine
needs: [install]
script:
- npm run build
artifacts:
paths:
- dist/
expire_in: 7 days
# ── Security scanning ──
sast:
stage: security
image: returntocorp/semgrep
script:
- semgrep --config=auto --gitlab-sast --output=sast.json .
artifacts:
reports:
sast: sast.json
allow_failure: true
dependency-scan:
stage: security
image: aquasec/trivy:latest
script:
- trivy fs --exit-code 1 --severity CRITICAL .
# ── Docker build and push ──
docker:build:
stage: docker
image: docker:24
services:
- docker:24-dind
needs: [build, test:unit]
script:
- echo "$HARBOR_PASS" | docker login -u "$HARBOR_USER" --password-stdin harbor.example.com
- docker build -t ${IMAGE_REGISTRY}/myapp:${IMAGE_TAG} .
- docker push ${IMAGE_REGISTRY}/myapp:${IMAGE_TAG}
- |
if [ "$CI_COMMIT_BRANCH" = "main" ]; then
docker tag ${IMAGE_REGISTRY}/myapp:${IMAGE_TAG} ${IMAGE_REGISTRY}/myapp:latest
docker push ${IMAGE_REGISTRY}/myapp:latest
fi
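rules:
- if: $CI_COMMIT_TAG
- if: $CI_COMMIT_BRANCH == "main"
# deploy:dev needs this job, so it must also exist in develop pipelines
- if: $CI_COMMIT_BRANCH == "develop"
# ── Deploy ──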
deploy:dev:
stage: deploy
image: bitnami/kubectl:latest
needs: [docker:build]
script:
- kubectl set image deployment/myapp myapp=${IMAGE_REGISTRY}/myapp:${IMAGE_TAG} -n dev
- kubectl rollout status deployment/myapp -n dev --timeout=180s
environment:
name: development
url: https://dev.example.com
rules:
- if: $CI_COMMIT_BRANCH == "develop"
deploy:staging:
stage: deploy
image: bitnami/kubectl:latest
needs: [docker:build]
script:
- kubectl set image deployment/myapp myapp=${IMAGE_REGISTRY}/myapp:${IMAGE_TAG} -n staging
- kubectl rollout status deployment/myapp -n staging --timeout=300s
environment:
name: staging
url: https://staging.example.com
rules:
- if: $CI_COMMIT_BRANCH == "main"
deploy:prod:
stage: deploy
image: bitnami/kubectl:latest
needs: [docker:build]
script:
- kubectl set image deployment/myapp myapp=${IMAGE_REGISTRY}/myapp:${IMAGE_TAG} -n production
- kubectl rollout status deployment/myapp -n production --timeout=300s
environment:
name: production
url: https://www.example.com
rules:
- if: $CI_COMMIT_TAG =~ /^v\d+\.\d+\.\d+$/
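when: manual
11.2 Backend project (Java Spring Boot)
# .gitlab-ci.yml - complete Java backend pipeline
variables:
IMAGE_REGISTRY: harbor.example.com/backend
MAVEN_OPTS: "-Dmaven.repo.local=.m2/repository"
# GitLab does not expand ${VAR:-default} inside `variables:`; compute the tag in the shell
default:
before_script:
- export IMAGE_TAG="${CI_COMMIT_TAG:-$CI_COMMIT_SHORT_SHA}"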
stages:
- build
- test
- security
- docker
- migrate
- deploy
cache:
key: ${CI_COMMIT_REF_SLUG}
paths:
- .m2/repository/
- target/
# ── Compile ──
build:
stage: build
image: maven:3.9-eclipse-temurin-21
script:
- mvn clean compile -DskipTests
artifacts:
paths:
- target/
# ── Test ──
test:unit:
stage: test
image: maven:3.9-eclipse-temurin-21
needs: [build]
script:
- mvn test
artifacts:
reports:
junit: target/surefire-reports/TEST-*.xml
when: always
test:integration:
stage: test
image: maven:3.9-eclipse-temurin-21
services:
- postgres:15
- redis:7
variables:
POSTGRES_DB: testdb
POSTGRES_USER: test
POSTGRES_PASSWORD: test
SPRING_DATASOURCE_URL: "jdbc:postgresql://postgres:5432/testdb"
needs: [build]
script:
- mvn verify -P integration-test
artifacts:
reports:
junit: target/failsafe-reports/TEST-*.xml
# ── Security scanning ──
sast:
stage: security
image: returntocorp/semgrep
script:
- semgrep --config=p/java --gitlab-sast --output=sast.json .
artifacts:
reports:
sast: sast.json
allow_failure: true
dependency-scan:
stage: security
image: aquasec/trivy:latest
script:
- trivy fs --exit-code 1 --severity CRITICAL --scanners vuln .
# ── Docker build ──
docker:build:
stage: docker
image: docker:24
services:
- docker:24-dind
needs: [build, test:unit, test:integration]
script:
# The Dockerfile's builder stage runs the Maven build; the docker image itself has no Maven
- echo "$HARBOR_PASS" | docker login -u "$HARBOR_USER" --password-stdin harbor.example.com
- docker build -t ${IMAGE_REGISTRY}/myapi:${IMAGE_TAG} .
- docker push ${IMAGE_REGISTRY}/myapi:${IMAGE_TAG}
rules:
- if: $CI_COMMIT_TAG
- if: $CI_COMMIT_BRANCH == "main"
# ── Database migration ──
db:migrate:
stage: migrate
image: flyway/flyway:10
needs: [docker:build]
script:
- flyway -url=jdbc:postgresql://${PROD_DB_HOST}:5432/${PROD_DB_NAME}
-user=${PROD_DB_USER}
-password=${PROD_DB_PASS}
-locations=filesystem:./db/migration
migrate
rules:
- if: $CI_COMMIT_TAG =~ /^v\d+\.\d+\.\d+$/
when: manual
# ── Deploy ──
deploy:staging:
stage: deploy
image: bitnami/kubectl:latest
needs: [docker:build]
script:
- kubectl set image deployment/myapi myapi=${IMAGE_REGISTRY}/myapi:${IMAGE_TAG} -n staging
- kubectl rollout status deployment/myapi -n staging --timeout=300s
environment:
name: staging
rules:
- if: $CI_COMMIT_BRANCH == "main"
deploy:prod:
stage: deploy
image: bitnami/kubectl:latest
needs: [docker:build, db:migrate]
script:
- kubectl set image deployment/myapi myapi=${IMAGE_REGISTRY}/myapi:${IMAGE_TAG} -n production
- kubectl rollout status deployment/myapi -n production --timeout=300s
environment:
name: production
rules:
- if: $CI_COMMIT_TAG =~ /^v\d+\.\d+\.\d+$/
when: manual
11.3 Docker multi-stage builds
# Dockerfile - frontend multi-stage build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --prefer-offline
COPY . .
RUN npm run build
FROM nginx:1.25-alpine AS production
COPY --from=builder /app/dist /usr/share/nginx/html
COPY nginx.conf /etc/nginx/conf.d/default.conf
EXPOSE 80
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
CMD wget -qO- http://localhost/health || exit 1
# Dockerfile - Java backend multi-stage build
FROM maven:3.9-eclipse-temurin-21 AS builder
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline
COPY src ./src
RUN mvn package -DskipTests
FROM eclipse-temurin:21-jre-alpine AS production
WORKDIR /app
RUN addgroup -S app && adduser -S app -G app
COPY --from=builder /app/target/*.jar app.jar
USER app
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
CMD wget -qO- http://localhost:8080/actuator/health || exit 1
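ENTRYPOINT ["java", "-XX:+UseContainerSupport", "-XX:MaxRAMPercentage=75.0", "-jar", "app.jar"]
Summary
A CI/CD pipeline is not built in one go. Evolve it in stages:
Stage 1: automated build + unit tests (CI basics)
Stage 2: automated multi-environment deployment + integration tests
Stage 3: security scanning + artifact management
Stage 4: blue-green/canary releases + automatic rollback
Stage 5: end-to-end monitoring + complete auditing
Key principles:
Code stays the same, configuration lives outside: manage every environment difference through configuration
Immutable artifacts: the same image moves from test to production
Compatible schema changes: every database change keeps the previous application version working
Small steps: frequent small releases beat occasional big ones
Fast rollback: the ability to roll back matters more than the ability to release
💡 The configurations in this article are meant as starting points; adjust the parameters to fit your project.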