Server Operations Notes: Deploying and Operating Node.js Applications

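This is the eleventh article in the Server Operations Notes series. It covers the full path of a Node.js application from development to production: version management, process management, containerised deployment, logging, performance tuning, memory troubleshooting, monitoring and alerting, security hardening, CI/CD integration, and diagnosis of common failures. Each section comes with commands and configuration that can be reused directly.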

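1. Installing Node.js and managing versions

1.1 Why manage versions?

The Node.js version in production has to be under control. Different projects may depend on different versions, and every upgrade needs a staged rollout. Building from source by hand is slow and hard to roll back, so a version manager is effectively a requirement.

1.2 nvm (recommended)

nvm is the most widely used Node.js version manager. It keeps multiple versions installed side by side and switches between them quickly.

Installing nvm: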

# 官方安装脚本
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash

# 重新加载 shell
source ~/.bashrc

# 验证安装
nvm --version

常用操作：

# 列出可安装版本
nvm ls-remote --lts

# 安装指定版本
nvm install 20.18.0
nvm install 22.12.0

# 安装最新 LTS
nvm install --lts

# 切换版本
nvm use 20.18.0

# 设置默认版本
nvm alias default 20.18.0

# 查看已安装版本
nvm ls

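# Create an .nvmrc file in the project directory
echo "20.18.0" > .nvmrc
# Inside that directory, nvm use without arguments reads .nvmrc
# (switching automatically on cd requires an extra shell hook)
nvm use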

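1.3 fnm (a faster alternative)

fnm is written in Rust and starts much faster than nvm, which is a shell script loaded at every shell startup. Figures such as "40x faster" are sometimes quoted, but the real gain depends on the shell and the machine.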

# 安装
curl -fsSL https://fnm.vercel.app/install | bash

# 安装 Node.js
fnm install 22
fnm install --lts

# 切换版本
fnm use 22

# 设置默认
fnm default 22

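# Pin the version in a .node-version file; fnm use picks it up, and fnm switches
# automatically on cd when the shell is set up with: eval "$(fnm env --use-on-cd)"
echo "22.12.0" > .node-version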

1.4 n（简洁方案）

# 安装
npm install -g n

# 安装版本
n lts
n 22.12.0

# Switch versions: run n without arguments and pick
# one of the installed versions from the interactive list
n

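1.5 Choosing a version

Prefer LTS: run only even-numbered release lines in production (for example 20 and 22). Only even lines become LTS, and each receives roughly 30 months of support in total. Check the official release schedule, since older lines such as 18 have reached end-of-life
Active LTS → Maintenance LTS: start new projects on the current Active LTS, and plan the upgrade of older projects while they are in Maintenance
One version for everyone: pin the version in .nvmrc or .node-version and enforce it in CI/CD
Upgrade cadence: review upgrades once a quarter, validate in a test environment first, then roll out gradually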

# CI 中校验版本
node -v | grep -q "v20.18.0" || { echo "Node.js version mismatch!"; exit 1; }

二、PM2 进程管理

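2.1 Why PM2?

A Node.js process runs JavaScript on a single thread, so one process uses at most one CPU core for application code. In production a process manager provides what is missing: multiple processes to use all cores, automatic restart after a crash, log management, and zero-downtime reloads. PM2 is the most mature process manager in the Node.js ecosystem.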

2.2 安装与基础使用

# 全局安装
npm install -g pm2

# 启动应用
pm2 start app.js --name my-app

# 查看进程列表
pm2 list

# 查看详情
pm2 show my-app

# 查看日志
pm2 logs my-app

# 停止/重启/删除
pm2 stop my-app
pm2 restart my-app
pm2 delete my-app

# 开机自启
pm2 startup
pm2 save

2.3 ecosystem.config.js（推荐配置方式）

在项目根目录创建 ecosystem.config.js：

module.exports = {
  apps: [
    {
      // 基本配置
      name: 'my-api',
      script: './dist/server.js',
      cwd: '/opt/apps/my-api',

      // 集群模式
      instances: 'max',          // 使用所有 CPU 核心
      exec_mode: 'cluster',      // 集群模式

      // 环境变量
      env: {
        NODE_ENV: 'development',
        PORT: 3000
      },
      env_production: {
        NODE_ENV: 'production',
        PORT: 8080,
        DB_HOST: 'prod-db.example.com'
      },

      // 日志配置
      log_file: '/var/log/pm2/my-api.log',
      error_file: '/var/log/pm2/my-api-error.log',
      out_file: '/var/log/pm2/my-api-out.log',
      log_date_format: 'YYYY-MM-DD HH:mm:ss Z',
      merge_logs: true,          // 集群模式下合并日志

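      // Memory and restarts
      max_memory_restart: '1G',  // restart when memory exceeds 1G
      restart_delay: 3000,       // wait 3 seconds before each restart
      max_restarts: 10,          // give up (status "errored") after 10 consecutive unstable restarts
      min_uptime: '10s',         // a run shorter than 10 seconds counts as an unstable start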

      // 监听文件变化（仅开发环境）
      watch: false,
      ignore_watch: ['node_modules', 'logs', '.git'],

      // 优雅关闭
      kill_timeout: 5000,        // 5 秒后强制关闭
      listen_timeout: 8000,      // 8 秒内未就绪则超时
      shutdown_with_message: true
    }
  ]
};

使用方式：

# 使用 ecosystem 文件启动
pm2 start ecosystem.config.js

# 指定环境
pm2 start ecosystem.config.js --env production

# 重载（零停机）
pm2 reload ecosystem.config.js

# 重载指定环境
pm2 reload ecosystem.config.js --env production

2.4 集群模式详解

// ecosystem.config.js - 集群模式配置
module.exports = {
  apps: [{
    name: 'web-server',
    script: './server.js',
    instances: 4,              // 指定 4 个实例
    exec_mode: 'cluster',
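    instance_var: 'INSTANCE_ID', // name of the env variable that carries the instance index

    // Load balancing between instances:
    // the Node.js cluster module distributes connections round-robin by default (except on Windows).
    // Setting NODE_CLUSTER_SCHED_POLICY=none leaves the distribution to the operating system.
    // instances: -1 means "number of CPU cores minus one"; it is not a scheduling policy.
    // interpreter_args: '--max-old-space-size=2048'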
  }]
};

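Graceful shutdown inside the application:

// app.js - handle shutdown signals
// `server` is the http.Server returned by app.listen()
function shutdown(reason) {
  console.log(`Received ${reason}, shutting down gracefully...`);
  server.close(() => {
    console.log('Server closed');
    process.exit(0);
  });

  // Force exit if open connections keep close() from finishing.
  // Keep this below kill_timeout, otherwise PM2 sends SIGKILL first.
  setTimeout(() => {
    console.error('Forced shutdown after timeout');
    process.exit(1);
  }, 4000).unref();
}

// PM2 sends SIGINT by default; systemd and Docker send SIGTERM
process.on('SIGINT', () => shutdown('SIGINT'));
process.on('SIGTERM', () => shutdown('SIGTERM'));

// PM2 graceful shutdown message (used with shutdown_with_message: true)
process.on('message', (msg) => {
  if (msg === 'shutdown') shutdown('shutdown message');
});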

2.5 PM2 日志管理

# 查看所有日志
pm2 logs

# 查看指定应用日志
pm2 logs my-api --lines 100

# 清空日志
pm2 flush

# 日志文件位置（默认）
# ~/.pm2/logs/

# 安装日志轮转模块
pm2 install pm2-logrotate

# 配置日志轮转
pm2 set pm2-logrotate:max_size 10M    # 单个文件最大 10M
pm2 set pm2-logrotate:retain 7        # 保留 7 个历史文件
pm2 set pm2-logrotate:compress true   # 压缩历史日志
pm2 set pm2-logrotate:dateFormat YYYY-MM-DD_HH-mm
pm2 set pm2-logrotate:rotateInterval '0 0 * * *'  # 每天轮转

三、应用部署

3.1 systemd 服务部署（推荐裸机部署）

创建 systemd 服务文件 /etc/systemd/system/my-api.service：

[Unit]
Description=My Node.js API Server
Documentation=https://github.com/myorg/my-api
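After=network-online.target
Wants=network-online.target
# Rate-limit restarts: at most 5 starts within 60 seconds (these keys belong in [Unit])
StartLimitIntervalSec=60
StartLimitBurst=5

[Service]
# simple: systemd considers the service started as soon as the process is launched.
# Type=notify only works if the application itself sends sd_notify READY=1;
# a plain Node.js server never does, so the start would time out.
Type=simple
User=nodeapp
Group=nodeapp
WorkingDirectory=/opt/apps/my-api
Environment=NODE_ENV=production
Environment=PORT=8080
EnvironmentFile=/opt/apps/my-api/.env.production

# Start command
ExecStart=/usr/local/bin/node dist/server.js

# Restart policy
Restart=always
RestartSec=5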

# 资源限制
LimitNOFILE=65535
MemoryMax=1G
CPUQuota=200%

# 安全加固
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/log/my-api /opt/apps/my-api/uploads
PrivateTmp=true

# 标准输出
StandardOutput=journal
StandardError=journal
SyslogIdentifier=my-api

[Install]
WantedBy=multi-user.target

操作命令：

# 创建用户
sudo useradd -r -s /bin/false nodeapp

# 重载 systemd
sudo systemctl daemon-reload

# 启动服务
sudo systemctl start my-api
sudo systemctl enable my-api

# 查看状态
sudo systemctl status my-api

# 查看日志
sudo journalctl -u my-api -f
sudo journalctl -u my-api --since "1 hour ago"

# 重启
sudo systemctl restart my-api

3.2 Docker 部署

Dockerfile（多阶段构建）：

# === 构建阶段 ===
FROM node:20-alpine AS builder
WORKDIR /app

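# Copy the dependency manifests first (makes use of the Docker layer cache)
COPY package*.json ./
# --omit=dev replaces the deprecated --only=production flag (npm 8+)
RUN npm ci --omit=dev && \
    cp -R node_modules /prod_modules && \
    npm ci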

# 复制源码并构建
COPY . .
RUN npm run build

# === 生产阶段 ===
FROM node:20-alpine AS production
WORKDIR /app

# 安全：创建非 root 用户
RUN addgroup -g 1001 appgroup && \
    adduser -u 1001 -G appgroup -s /bin/sh -D appuser

# 从构建阶段复制产物
COPY --from=builder /prod_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package*.json ./

# 健康检查
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

USER appuser
EXPOSE 3000

CMD ["node", "dist/server.js"]

docker-compose.yml：

version: '3.8'

services:
  api:
    build:
      context: .
      dockerfile: Dockerfile
      target: production
    container_name: my-api
    restart: unless-stopped
    ports:
      - "8080:3000"
    environment:
      - NODE_ENV=production
      - DB_HOST=db
      - REDIS_HOST=redis
    env_file:
      - .env.production
    volumes:
      - api-logs:/app/logs
      - api-uploads:/app/uploads
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 1G
        reservations:
          memory: 512M
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 5

  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: myapp
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    # The secret must be declared, otherwise /run/secrets/db_password does not exist
    secrets:
      - db_password
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
    volumes:
      - db-data:/var/lib/postgresql/data

  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 5

volumes:
  api-logs:
  api-uploads:
  db-data:

secrets:
  db_password:
    file: ./secrets/db_password.txt  # example path, adjust to your setup

3.3 环境变量管理

推荐使用 dotenv + 类型校验：

# 安装依赖
npm install dotenv zod

// config/env.js
require('dotenv').config({
  path: process.env.NODE_ENV === 'test' ? '.env.test' : '.env'
});

const { z } = require('zod');

const envSchema = z.object({
  NODE_ENV: z.enum(['development', 'production', 'test']).default('development'),
  PORT: z.coerce.number().default(3000),
  DATABASE_URL: z.string().url(),
  REDIS_URL: z.string().url(),
  JWT_SECRET: z.string().min(32),
  LOG_LEVEL: z.enum(['error', 'warn', 'info', 'debug']).default('info'),
});

// 启动时校验，失败则立即退出
const parsed = envSchema.safeParse(process.env);

if (!parsed.success) {
  console.error('❌ 环境变量校验失败:');
  console.error(parsed.error.format());
  process.exit(1);
}

module.exports = parsed.data;
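
To make the fail-fast idea concrete without any dependency, here is a minimal sketch of the same validation written by hand. It is not a replacement for the zod schema above: the function name loadConfig, the shape of the returned object, and the port range check are assumptions made for illustration.

```javascript
// Validate a plain object of environment variables and return typed config.
// Throws a single Error that lists every problem, so startup can fail fast.
function loadConfig(env) {
  const errors = [];

  const nodeEnv = env.NODE_ENV || 'development';
  if (!['development', 'production', 'test'].includes(nodeEnv)) {
    errors.push(`NODE_ENV must be development, production or test (got "${nodeEnv}")`);
  }

  // Environment variables are always strings, so PORT has to be coerced.
  const port = Number(env.PORT === undefined ? 3000 : env.PORT);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    errors.push(`PORT must be an integer between 1 and 65535 (got "${env.PORT}")`);
  }

  for (const name of ['DATABASE_URL', 'REDIS_URL']) {
    try {
      new URL(env[name]); // throws on a missing or malformed URL
    } catch {
      errors.push(`${name} must be a valid URL`);
    }
  }

  if (typeof env.JWT_SECRET !== 'string' || env.JWT_SECRET.length < 32) {
    errors.push('JWT_SECRET must be at least 32 characters');
  }

  if (errors.length > 0) {
    throw new Error(`Invalid environment:\n- ${errors.join('\n- ')}`);
  }

  return {
    nodeEnv,
    port,
    databaseUrl: env.DATABASE_URL,
    redisUrl: env.REDIS_URL,
    jwtSecret: env.JWT_SECRET
  };
}
```

Calling loadConfig(process.env) once at startup and exiting on the thrown error gives the same behaviour as the safeParse branch above: a misconfigured deployment stops immediately instead of failing on the first request.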

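Managing configuration files for multiple environments:

# Project layout
.env.example      # template, committed to Git
.env.development  # local development, not committed
.env.test         # test environment, not committed
.env.production   # production, not committed, injected by CI/CD

# In .gitignore (ignore every .env file except the template)
.env
.env.*
!.env.example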

四、日志管理

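4.1 Why not console.log?

console.log has real drawbacks in production: it has no log levels, produces no structured output, offers no rotation, and writes synchronously when stdout is a file or a terminal, which blocks the event loop under heavy logging. Production services should use a dedicated logging library.

4.2 pino (recommended, high performance)

# pino-http is required by the Express middleware shown later in this section
npm install pino pino-http pino-pretty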

// logger.js
const pino = require('pino');

const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  // 生产环境用 JSON 格式，开发环境用 pretty
  transport: process.env.NODE_ENV === 'production'
    ? undefined
    : {
        target: 'pino-pretty',
        options: {
          colorize: true,
          translateTime: 'SYS:standard',
          ignore: 'pid,hostname'
        }
      },
  // 添加公共字段
  base: {
    service: 'my-api',
    version: process.env.APP_VERSION || 'unknown'
  },
  // 序列化器
  serializers: {
    req: pino.stdSerializers.req,
    res: pino.stdSerializers.res,
    err: pino.stdSerializers.err
  }
});

module.exports = logger;

// 在 Express 中使用
const express = require('express');
const logger = require('./logger');
const pinoHttp = require('pino-http');

const app = express();

// 请求日志中间件
app.use(pinoHttp({ logger }));

app.get('/api/users', (req, res) => {
  req.log.info('Fetching users');  // 自动附加 request ID
  // ...
});

app.use((err, req, res, next) => {
  req.log.error({ err }, 'Request failed');
  res.status(500).json({ error: 'Internal Server Error' });
});

4.3 winston（功能全面）

npm install winston

// logger.js
const winston = require('winston');
const { combine, timestamp, json, errors, printf, colorize } = winston.format;

const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || 'info',
  format: combine(
    errors({ stack: true }),
    timestamp({ format: 'YYYY-MM-DD HH:mm:ss.SSS' }),
    json()
  ),
  defaultMeta: { service: 'my-api' },
  transports: [
    // 控制台输出
    new winston.transports.Console({
      format: process.env.NODE_ENV === 'production'
        ? combine(timestamp(), json())
        : combine(colorize(), timestamp(), printf(({ level, message, timestamp, ...meta }) => {
            const metaStr = Object.keys(meta).length > 1 ? ` ${JSON.stringify(meta)}` : '';
            return `${timestamp} ${level}: ${message}${metaStr}`;
          }))
    }),
    // 错误日志文件
    new winston.transports.File({
      filename: 'logs/error.log',
      level: 'error',
      maxsize: 10 * 1024 * 1024,  // 10MB
      maxFiles: 5,
      tailable: true
    }),
    // 所有日志文件
    new winston.transports.File({
      filename: 'logs/combined.log',
      maxsize: 10 * 1024 * 1024,
      maxFiles: 10,
      tailable: true
    })
  ]
});

module.exports = logger;

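4.4 Log level conventions

// The calls below use pino's argument order: (object, message).
// winston takes the message first: logger.error('message', { ...meta }).
const logger = require('./logger');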

// error - 需要立即关注的错误
logger.error({ err, userId, orderId }, 'Payment processing failed');

// warn - 潜在问题，不影响核心功能
logger.warn({ retryCount: 3, endpoint }, 'API call retrying');

// info - 关键业务流程节点
logger.info({ userId, action: 'login' }, 'User logged in successfully');

// debug - 开发调试信息
logger.debug({ query, duration }, 'Database query executed');

// 静态上下文绑定
const childLogger = logger.child({ module: 'auth', version: 'v2' });
childLogger.info('Auth module initialized');

4.5 系统级日志轮转（logrotate）

创建 /etc/logrotate.d/my-api：

/var/log/my-api/*.log {
    daily
    missingok
    rotate 14
    compress
    delaycompress
    notifempty
    create 0640 nodeapp nodeapp
    sharedscripts
    postrotate
        # 通知应用重新打开日志文件（PM2 场景用 pm2 reloadLogs）
        /usr/local/bin/pm2 reloadLogs
    endscript
}

4.6 集中式日志架构

应用 → pino/winston → stdout/file
                        ↓
                  Filebeat / Promtail → Elasticsearch / Loki → Kibana / Grafana

Docker 环境推荐： 应用直接输出 JSON 到 stdout，由 Docker 日志驱动收集：

# docker-compose.yml
services:
  api:
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "5"
        tag: "{{.Name}}/{{.ID}}"

五、性能分析

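5.1 clinic.js (all-in-one diagnostics)

# Install
npm install -g clinic

# Detect the likely bottleneck category automatically
clinic doctor -- node server.js
# Put the server under load, then stop it with Ctrl+C; an HTML report is generated locally

# Flame graph (CPU hot spots)
clinic flame -- node server.js
# Shows which functions take the most CPU time

# Asynchronous activity analysis
clinic bubbleprof -- node server.js
# Shows where time is spent in and between asynchronous operations

# Heap allocation analysis
clinic heapprofiler -- node server.js
# Shows which functions allocate the most heap memory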

5.2 火焰图分析

# 方法一：使用 clinic flame
clinic flame -- node server.js

# 方法二：使用 0x
npm install -g 0x
0x -- node server.js
# 生成交互式火焰图，浏览器打开

# 方法三：使用 Node.js 内置 CPU profiler
node --prof server.js
# 生成 isolate-*.log 文件
node --prof-process isolate-0x*.log > processed.txt

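How to read a flame graph:

X axis: share of samples (wider = more time on CPU), not the passage of time
Y axis: call stack depth (deeper = longer call chain)
Colour: depends on the tool. In 0x and clinic flame a hotter colour means the function was at the top of the stack more often; schemes that encode the code type (JS, C++, kernel) come from some perf-based flame graphs, and the exact mapping varies
Look for plateaus: a wide, flat-topped frame is a function that burns CPU itself, so optimise it first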

5.3 CPU Profile

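# Start with the inspector enabled (listens on 127.0.0.1:9229 by default)
node --inspect server.js

# Remote debugging: binding to 0.0.0.0 exposes the inspector to the network,
# and anyone who can reach the port can run code in the process.
# Prefer an SSH tunnel (ssh -L 9229:127.0.0.1:9229 user@host); bind to 0.0.0.0 only on a trusted network.
node --inspect=0.0.0.0:9229 server.js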

然后在 Chrome 浏览器中：

打开 chrome://inspect
点击 "Open dedicated DevTools for Node"
切到 "Profiler" 标签
点击 "Start" 开始录制
执行你要分析的操作
点击 "Stop" 查看 CPU Profile

5.4 内存快照（Heap Snapshot）

# 方法一：使用 --inspect
node --inspect server.js
# Chrome DevTools → Memory → Take heap snapshot

# 方法二：代码中触发

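// Trigger a heap snapshot manually
const v8 = require('v8');

function takeHeapSnapshot() {
  // writeHeapSnapshot() is synchronous: it blocks the event loop while writing
  // and returns the name of the generated file (not a stream)
  const snapshotFile = v8.writeHeapSnapshot();
  console.log(`Heap snapshot written to: ${snapshotFile}`);
}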

// 通过信号触发（SIGUSR2）
process.on('SIGUSR2', takeHeapSnapshot);

// 或通过 HTTP 端点（仅调试用，生产环境需鉴权）
app.get('/debug/heapdump', (req, res) => {
  const filename = `heapdump-${Date.now()}.heapsnapshot`;
  v8.writeHeapSnapshot(filename);
  res.json({ file: filename });
});

# 发送信号触发
kill -USR2 <pid>
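
The HTTP endpoint above is marked as needing authentication in production. One possible shape of that check is sketched here: a shared token compared in constant time. The helper name, the x-debug-token header, and the DEBUG_TOKEN variable are illustrative assumptions, not part of the original notes.

```javascript
const crypto = require('crypto');

// Compare a presented token with the expected one in constant time.
// Hashing both sides first gives timingSafeEqual inputs of equal length,
// which it requires, and avoids leaking the token length.
function isAuthorizedDebugRequest(presentedToken, expectedToken) {
  if (typeof presentedToken !== 'string' || typeof expectedToken !== 'string') return false;
  if (expectedToken.length === 0) return false; // refuse when no token is configured
  const presented = crypto.createHash('sha256').update(presentedToken).digest();
  const expected = crypto.createHash('sha256').update(expectedToken).digest();
  return crypto.timingSafeEqual(presented, expected);
}

// Express-style usage (header and variable names are illustrative):
//   app.get('/debug/heapdump', (req, res) => {
//     if (!isAuthorizedDebugRequest(req.get('x-debug-token'), process.env.DEBUG_TOKEN)) {
//       return res.status(403).end();
//     }
//     // ... write the snapshot as shown above
//   });
```

A heap snapshot contains everything on the heap, including secrets and user data, so the endpoint is better left unregistered entirely unless a debug token is configured.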

六、内存泄漏排查

6.1 常见内存泄漏模式

// ❌ 模式一：全局缓存无限增长
const cache = {};
function addToCache(key, value) {
  cache[key] = value;  // 永远不会被清理
}

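// ✅ Fix: use a bounded LRU cache
// lru-cache v10+ exposes a named export; older major versions export the class directly
const { LRUCache } = require('lru-cache');
const lruCache = new LRUCache({ max: 500, ttl: 1000 * 60 * 5 });
lruCache.set(key, value);

// ❌ Pattern 2: event listener leak
// Every request registers a new listener on a long-lived emitter and never removes it
app.get('/events', (req, res) => {
  emitter.on('data', (chunk) => res.write(chunk));
});

// ✅ Fix: remove the listener once it is no longer needed
app.get('/events', (req, res) => {
  const handler = (chunk) => res.write(chunk);
  emitter.on('data', handler);
  req.on('close', () => emitter.removeListener('data', handler));
});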

// ❌ 模式三：闭包持有大对象
function processData() {
  const hugeData = loadHugeData();
  return function callback() {
    // hugeData 被闭包持有，无法回收
    return hugeData.length;
  };
}

// ✅ 修复：只保留需要的数据
function processData() {
  const hugeData = loadHugeData();
  const length = hugeData.length;
  return function callback() {
    return length;
  };
}

// ❌ 模式四：未清理的定时器
setInterval(() => {
  // 如果外部引用丢失，这个 interval 仍然存在
}, 1000);

// ✅ 修复：保存引用并及时清理
const timer = setInterval(() => { /* ... */ }, 1000);
clearInterval(timer);

// ❌ 模式五：未关闭的流
const stream = fs.createReadStream('file.txt');
stream.on('data', () => { /* ... */ });
// 如果忘记 end/error 处理，流不会被关闭

// ✅ 修复：使用 pipeline
const { pipeline } = require('stream/promises');
await pipeline(
  fs.createReadStream('input.txt'),
  transformStream,
  fs.createWriteStream('output.txt')
);
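
To show why a bounded cache fixes pattern 1, here is a minimal dependency-free sketch of LRU eviction built on Map insertion order. The class name BoundedCache is made up for this example; a library such as lru-cache adds TTLs, size accounting and more.

```javascript
// A tiny LRU cache: a Map iterates in insertion order, so the first key
// is always the least recently used one as long as reads re-insert the entry.
class BoundedCache {
  constructor(maxEntries) {
    if (!Number.isInteger(maxEntries) || maxEntries < 1) {
      throw new RangeError('maxEntries must be a positive integer');
    }
    this.maxEntries = maxEntries;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key);      // move the entry to the "most recent" end
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      const oldestKey = this.map.keys().next().value;
      this.map.delete(oldestKey); // evict the least recently used entry
    }
  }

  get size() {
    return this.map.size;
  }
}
```

Because set() evicts as soon as the limit is exceeded, memory use is capped by maxEntries no matter how many distinct keys the application sees.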

6.2 排查步骤

# 步骤 1：监控内存趋势
# 通过 PM2 监控
pm2 monit

# 通过 process.memoryUsage()
setInterval(() => {
  const mem = process.memoryUsage();
  console.log(JSON.stringify({
    rss: `${Math.round(mem.rss / 1024 / 1024)}MB`,
    heapUsed: `${Math.round(mem.heapUsed / 1024 / 1024)}MB`,
    heapTotal: `${Math.round(mem.heapTotal / 1024 / 1024)}MB`,
    external: `${Math.round(mem.external / 1024 / 1024)}MB`
  }));
}, 30000);

# 步骤 2：对比两个 heap snapshot
# 1. 启动应用并记录初始快照
# 2. 执行一些操作
# 3. 强制 GC（node --expose-gc）
# 4. 记录第二个快照
# 5. 在 Chrome DevTools 中对比两个快照
# 6. 查看 "#Delta" 列，找出增长最多的对象

# 步骤 3：使用 --expose-gc 手动触发 GC
node --expose-gc --inspect server.js

# 在代码中
global.gc();
const before = process.memoryUsage().heapUsed;
// 执行操作
global.gc();
const after = process.memoryUsage().heapUsed;
console.log(`Heap delta: ${(after - before) / 1024 / 1024}MB`);
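
Step 1 relies on spotting an upward trend by eye. As a rough sketch, the trend can also be estimated in code by fitting a line through recent heapUsed samples. The function names, the six-sample minimum and the 1 MB threshold are assumptions chosen for illustration and would need tuning per application.

```javascript
// Least-squares slope of the samples: the average growth per sampling interval.
function growthPerSample(samples) {
  const n = samples.length;
  if (n < 2) return 0;
  const meanX = (n - 1) / 2;
  const meanY = samples.reduce((sum, y) => sum + y, 0) / n;
  let numerator = 0;
  let denominator = 0;
  samples.forEach((y, x) => {
    numerator += (x - meanX) * (y - meanY);
    denominator += (x - meanX) ** 2;
  });
  return numerator / denominator;
}

// Flag a possible leak when enough samples exist and the fitted growth
// per interval reaches the threshold.
function looksLikeLeak(samples, { minSamples = 6, minGrowthBytes = 1024 * 1024 } = {}) {
  return samples.length >= minSamples && growthPerSample(samples) >= minGrowthBytes;
}

// Example wiring (hypothetical): keep the last 20 readings, one every 30 seconds.
//   const readings = [];
//   setInterval(() => {
//     readings.push(process.memoryUsage().heapUsed);
//     if (readings.length > 20) readings.shift();
//     if (looksLikeLeak(readings)) console.warn('heapUsed keeps growing');
//   }, 30000).unref();
```

A fitted slope tolerates the sawtooth pattern that normal garbage collection produces, which a simple "larger than the last sample" comparison does not. It is only a hint: confirm a suspected leak with the snapshot comparison in step 2.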

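6.3 Using the heapdump module

heapdump is a native addon that predates the built-in v8.writeHeapSnapshot() (available since Node.js 11.13). On current Node.js versions the built-in API from section 5.4 is usually enough; the module is mainly relevant for older runtimes.

npm install heapdump

const heapdump = require('heapdump');

// Write a heap snapshot file
heapdump.writeSnapshot('./heapdump.heapsnapshot', (err, filename) => {
  if (err) return console.error(err);
  console.log(`Heap dump written to ${filename}`);
});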

// 通过信号触发
process.on('SIGUSR2', () => {
  heapdump.writeSnapshot(`./heapdump-${Date.now()}.heapsnapshot`);
});

七、监控与告警

7.1 PM2 内置监控

# 终端实时监控
pm2 monit

# 查看进程详情
pm2 show my-api

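# PM2 Plus (optional, has a free tier)
pm2 register
# Linking takes both keys of the bucket; the machine name is optional
pm2 link <secret_key> <public_key> [machine_name]
# Open https://app.pm2.io for the web dashboard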

7.2 自定义健康检查端点

// routes/health.js
const express = require('express');
const router = express.Router();
const { Pool } = require('pg');
const Redis = require('ioredis');

const db = new Pool({ connectionString: process.env.DATABASE_URL });
const redis = new Redis(process.env.REDIS_URL);

// 基础健康检查（给负载均衡器用）
router.get('/health', (req, res) => {
  res.status(200).json({ status: 'ok', timestamp: new Date().toISOString() });
});

// 深度健康检查（给监控系统用）
router.get('/health/deep', async (req, res) => {
  const checks = {};

  // 数据库检查
  try {
    const start = Date.now();
    await db.query('SELECT 1');
    checks.database = { status: 'up', latency: Date.now() - start };
  } catch (err) {
    checks.database = { status: 'down', error: err.message };
  }

  // Redis 检查
  try {
    const start = Date.now();
    await redis.ping();
    checks.redis = { status: 'up', latency: Date.now() - start };
  } catch (err) {
    checks.redis = { status: 'down', error: err.message };
  }

  // 内存检查
  const mem = process.memoryUsage();
  const heapUsedMB = Math.round(mem.heapUsed / 1024 / 1024);
  checks.memory = {
    heapUsed: `${heapUsedMB}MB`,
    status: heapUsedMB > 800 ? 'warning' : 'ok'
  };

  // 事件循环延迟检查
  const start = process.hrtime.bigint();
  await new Promise(resolve => setImmediate(resolve));
  const loopDelay = Number(process.hrtime.bigint() - start) / 1e6;
  checks.eventLoop = {
    delay: `${loopDelay.toFixed(2)}ms`,
    status: loopDelay > 100 ? 'warning' : 'ok'
  };

  const hasFailure = Object.values(checks).some(c => c.status === 'down');
  const statusCode = hasFailure ? 503 : 200;

  res.status(statusCode).json({
    status: hasFailure ? 'unhealthy' : 'healthy',
    uptime: process.uptime(),
    version: process.env.APP_VERSION,
    checks
  });
});

module.exports = router;
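
The deep check above waits for each dependency without an upper bound, so a hanging database connection would make the health endpoint hang as well. A minimal sketch of a per-check timeout follows; the helper names withTimeout and runCheck and the 2-second default are assumptions for illustration.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Run one dependency check and convert the outcome to the
// { status, latency | error } shape used by the /health/deep handler.
async function runCheck(label, fn, ms = 2000) {
  const start = Date.now();
  try {
    await withTimeout(fn(), ms, label);
    return { status: 'up', latency: Date.now() - start };
  } catch (err) {
    return { status: 'down', error: err.message };
  }
}

// Usage inside the handler (db and redis as defined in routes/health.js):
//   checks.database = await runCheck('database', () => db.query('SELECT 1'));
//   checks.redis = await runCheck('redis', () => redis.ping());
```

The timeout only stops the health check from waiting; it does not cancel the underlying query, so the client libraries' own timeouts should be configured as well.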

7.3 Prometheus 指标集成

npm install prom-client

// metrics.js
const client = require('prom-client');

// 启用默认指标（CPU、内存、事件循环等）
client.collectDefaultMetrics({ prefix: 'my_api_' });

// 自定义指标
const httpRequestDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP 请求耗时',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.01, 0.05, 0.1, 0.5, 1, 2, 5]
});

const httpRequestTotal = new client.Counter({
  name: 'http_requests_total',
  help: 'HTTP 请求总数',
  labelNames: ['method', 'route', 'status_code']
});

const activeConnections = new client.Gauge({
  name: 'active_connections',
  help: '当前活跃连接数'
});

const dbQueryDuration = new client.Histogram({
  name: 'db_query_duration_seconds',
  help: '数据库查询耗时',
  labelNames: ['operation', 'table'],
  buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1]
});

// Express 中间件
function metricsMiddleware(req, res, next) {
  const end = httpRequestDuration.startTimer();
  res.on('finish', () => {
    end({ method: req.method, route: req.route?.path || 'unknown', status_code: res.statusCode });
    httpRequestTotal.inc({ method: req.method, route: req.route?.path || 'unknown', status_code: res.statusCode });
  });
  next();
}

// Prometheus 端点
function setupMetricsEndpoint(app) {
  app.get('/metrics', async (req, res) => {
    res.set('Content-Type', client.register.contentType);
    res.end(await client.register.metrics());
  });
}

module.exports = { httpRequestDuration, httpRequestTotal, activeConnections, dbQueryDuration, metricsMiddleware, setupMetricsEndpoint };

Prometheus 配置：

# prometheus.yml
scrape_configs:
  - job_name: 'node-api'
    scrape_interval: 15s
    static_configs:
      - targets: ['api-server:8080']
    metrics_path: '/metrics'

Grafana Dashboard 推荐指标：

请求速率 (QPS)
响应时间 P50/P95/P99
错误率 (5xx 比例)
Node.js 堆内存使用量
事件循环延迟
活跃连接数

7.4 Alert rule examples (Prometheus alerting rules, routed by Alertmanager)

# alerts.yml
groups:
  - name: node-api-alerts
    rules:
      - alert: HighErrorRate
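        # Aggregate before dividing: without sum() the division only pairs series with identical labels and always yields 1
        expr: sum by (instance) (rate(http_requests_total{status_code=~"5.."}[5m])) / sum by (instance) (rate(http_requests_total[5m])) > 0.05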
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "错误率超过 5%"
          description: "{{ $labels.instance }} 错误率达 {{ $value | humanizePercentage }}"

      - alert: HighResponseTime
        expr: histogram_quantile(0.95, sum by (le, instance) (rate(http_request_duration_seconds_bucket[5m]))) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P95 响应时间超过 1 秒"

      - alert: HighMemoryUsage
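        # collectDefaultMetrics() was configured with prefix 'my_api_', so the default metrics carry that prefix
        expr: my_api_process_resident_memory_bytes / 1024 / 1024 > 900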
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "内存使用超过 900MB"

八、安全实践

8.1 Helmet（HTTP 安全头）

npm install helmet

const helmet = require('helmet');
const express = require('express');
const app = express();

app.use(helmet({
  contentSecurityPolicy: {
    directives: {
      defaultSrc: ["'self'"],
      scriptSrc: ["'self'", "'unsafe-inline'"],
      styleSrc: ["'self'", "'unsafe-inline'"],
      imgSrc: ["'self'", "data:", "https:"],
    }
  },
  hsts: {
    maxAge: 31536000,       // 1 年
    includeSubDomains: true,
    preload: true
  }
}));

8.2 CORS 配置

npm install cors

const cors = require('cors');

// 生产环境：严格限制来源
const corsOptions = {
  origin: (origin, callback) => {
    const allowedOrigins = [
      'https://app.example.com',
      'https://admin.example.com'
    ];
    // 允许无 origin 的请求（如 curl、服务器端请求）
    if (!origin || allowedOrigins.includes(origin)) {
      callback(null, true);
    } else {
      callback(new Error('CORS not allowed'));
    }
  },
  methods: ['GET', 'POST', 'PUT', 'DELETE', 'PATCH'],
  allowedHeaders: ['Content-Type', 'Authorization', 'X-Request-ID'],
  exposedHeaders: ['X-Total-Count', 'X-Request-ID'],
  credentials: true,
  maxAge: 86400  // 预检缓存 24 小时
};

app.use(cors(corsOptions));

8.3 Rate Limiting（限流）

npm install express-rate-limit rate-limit-redis ioredis

const rateLimit = require('express-rate-limit');
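// Depending on the rate-limit-redis major version the class is the default export
// or a named export ({ RedisStore }); check the README of the installed version
const RedisStore = require('rate-limit-redis');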
const Redis = require('ioredis');

const redisClient = new Redis(process.env.REDIS_URL);

// 全局限流：每 IP 每 15 分钟 100 次请求
const globalLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 100,
  standardHeaders: true,
  legacyHeaders: false,
  // 集群模式下使用 Redis 存储
  store: new RedisStore({
    sendCommand: (...args) => redisClient.call(...args)
  }),
  message: {
    error: 'Too many requests, please try again later.',
    retryAfter: '15 minutes'
  }
});

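// Stricter limit for the login endpoint: 5 requests per IP per 15 minutes
const loginLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 5,
  skipSuccessfulRequests: false,
  // Without a shared store each cluster worker counts separately.
  // Every limiter needs its own store instance and key prefix.
  store: new RedisStore({
    sendCommand: (...args) => redisClient.call(...args),
    prefix: 'rl:login:'
  }),
  message: { error: 'Too many login attempts, please try again later.' }
});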

app.use('/api/', globalLimiter);
app.use('/api/auth/login', loginLimiter);
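
For readers who want to see what such a limiter does underneath, here is a minimal in-memory sketch of a fixed-window counter, the general idea behind this kind of limit. It is not the implementation used by express-rate-limit; the function name and the injectable clock are assumptions made so the logic can be tested.

```javascript
// Fixed-window rate limiter: at most `max` hits per key in each window of `windowMs`.
// `now` is injectable so the behaviour can be verified without waiting.
function createFixedWindowLimiter({ windowMs, max, now = Date.now }) {
  const hits = new Map(); // key -> { count, windowStart }

  return function check(key) {
    const t = now();
    let entry = hits.get(key);

    // Start a new window on the first hit or once the old window has expired.
    if (!entry || t - entry.windowStart >= windowMs) {
      entry = { count: 0, windowStart: t };
      hits.set(key, entry);
    }

    entry.count += 1;
    return {
      allowed: entry.count <= max,
      remaining: Math.max(0, max - entry.count),
      resetMs: entry.windowStart + windowMs - t
    };
  };
}

// Example: 100 requests per IP per 15 minutes.
//   const check = createFixedWindowLimiter({ windowMs: 15 * 60 * 1000, max: 100 });
//   const { allowed } = check(req.ip);
```

Two limitations make the sketch unsuitable for production: the Map never drops idle keys, and the counters live in a single process. The second point is exactly why the configuration above uses a Redis store when running in cluster mode.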

8.4 依赖安全审计

# npm 内置审计
npm audit
npm audit --omit=dev   # production dependencies only (replaces the deprecated --production flag)
npm audit fix
npm audit fix --force  # 可能有破坏性变更，谨慎使用

# 生成审计报告
npm audit --json > audit-report.json

# 使用 Snyk（更强大）
npm install -g snyk
snyk auth
snyk test
snyk monitor  # 持续监控

# 在 CI 中阻止有高危漏洞的构建
npm audit --audit-level=high
# 如果有 high/critical 漏洞，退出码非零

# 使用 npq 在安装前检查
npm install -g npq
npq install lodash  # 安装前检查安全性

8.5 其他安全实践

// 输入校验
const { z } = require('zod');

const userSchema = z.object({
  email: z.string().email(),
  name: z.string().min(1).max(100).trim(),
  age: z.number().int().positive().max(150).optional()
});

app.post('/api/users', (req, res) => {
  const result = userSchema.safeParse(req.body);
  if (!result.success) {
    return res.status(400).json({ errors: result.error.issues });
  }
  // 使用 result.data（已校验和清理的数据）
});

// 请求体大小限制
app.use(express.json({ limit: '1mb' }));
app.use(express.urlencoded({ extended: true, limit: '1mb' }));

// 禁止 X-Powered-By
app.disable('x-powered-by');

// 信任代理（如果在 Nginx 后面）
app.set('trust proxy', 1);

九、CI/CD 集成

9.1 GitHub Actions 部署流程

# .github/workflows/deploy.yml
name: Build and Deploy

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  NODE_VERSION: '20'
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  # ===== 测试阶段 =====
  test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_DB: test_db
          POSTGRES_PASSWORD: test_pass
        ports: ['5432:5432']
        options: --health-cmd pg_isready --health-interval 10s --health-timeout 5s --health-retries 5
      redis:
        image: redis:7
        ports: ['6379:6379']

    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Lint
        run: npm run lint

      - name: Type check
        run: npm run typecheck

      - name: Unit tests
        run: npm run test:unit -- --coverage
        env:
          DATABASE_URL: postgresql://postgres:test_pass@localhost:5432/test_db
          REDIS_URL: redis://localhost:6379

      - name: Integration tests
        run: npm run test:integration
        env:
          DATABASE_URL: postgresql://postgres:test_pass@localhost:5432/test_db
          REDIS_URL: redis://localhost:6379

      - name: Upload coverage
        uses: codecov/codecov-action@v4
        with:
          token: ${{ secrets.CODECOV_TOKEN }}

  # ===== 安全扫描 =====
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'
      - run: npm ci
      - name: Audit dependencies
        run: npm audit --audit-level=high
      - name: Run Snyk
        uses: snyk/actions/node@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}

  # ===== 构建与推送镜像 =====
  build:
    needs: [test, security]
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'

    permissions:
      contents: read
      packages: write

    outputs:
      image_tag: ${{ steps.meta.outputs.tags }}

    steps:
      - uses: actions/checkout@v4

      - name: Login to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,prefix=
            type=raw,value=latest

      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  # ===== 部署到生产 =====
  deploy:
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    environment: production

    steps:
      - name: Deploy to server
        uses: appleboy/ssh-action@v1
        with:
          host: ${{ secrets.DEPLOY_HOST }}
          username: ${{ secrets.DEPLOY_USER }}
          key: ${{ secrets.DEPLOY_KEY }}
          script: |
            cd /opt/apps/my-api
            docker compose pull api
            docker compose up -d --no-deps api
            docker image prune -f

            # 等待健康检查通过
            for i in $(seq 1 30); do
              if curl -sf http://localhost:8080/health > /dev/null; then
                echo "✅ Deployment successful"
                exit 0
              fi
              echo "Waiting for health check... ($i/30)"
              sleep 2
            done
            echo "❌ Health check failed"
            docker compose logs api --tail=50
            exit 1

9.2 GitLab CI 部署流程

# .gitlab-ci.yml
stages:
  - test
  - build
  - deploy

variables:
  NODE_VERSION: "20"

test:
  stage: test
  image: node:${NODE_VERSION}-alpine
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    paths:
      - .npm/   # npm ci deletes node_modules, so cache the npm download cache instead
  script:
    - npm ci --cache .npm --prefer-offline
    - npm run lint
    - npm run test
  coverage: '/All files[^|]*\|[^|]*\s+([\d\.]+)/'
  artifacts:
    reports:
      junit: junit.xml
      coverage_report:
        coverage_format: cobertura
        path: coverage/cobertura-coverage.xml

build:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
  only:
    - main

deploy_staging:
  stage: deploy
  script:
    - ssh deploy@staging-server "cd /opt/app && docker pull $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA && docker compose up -d"
  environment:
    name: staging
  only:
    - main

deploy_production:
  stage: deploy
  script:
    - ssh deploy@prod-server "cd /opt/app && docker pull $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA && docker compose up -d"
  environment:
    name: production
  when: manual  # 手动触发生产部署
  only:
    - main

9.3 零停机发布策略

PM2 零停机重载：

# 零停机重载（集群模式下逐个重启 worker）
pm2 reload ecosystem.config.js --env production

# 验证
pm2 list
pm2 logs --lines 20

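Docker rolling update (Swarm mode: the update_config and rollback_config keys below apply to services deployed with docker stack deploy, and a plain docker compose up does not perform rolling updates):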

# docker-compose.yml
services:
  api:
    deploy:
      replicas: 3
      update_config:
        parallelism: 1          # 每次更新 1 个容器
        delay: 10s              # 等 10 秒再更新下一个
        order: start-first      # 先启动新的，再停旧的
        failure_action: rollback
        monitor: 30s
      rollback_config:
        parallelism: 1
        delay: 5s

蓝绿部署（Nginx 切换）：

# /etc/nginx/conf.d/upstream.conf
upstream app {
    # 蓝绿切换：注释/取消注释
    server 127.0.0.1:3001;  # blue
    # server 127.0.0.1:3002;  # green
}

#!/bin/bash
# deploy.sh - 蓝绿部署脚本
CURRENT=$(docker port my-api-blue 3000 2>/dev/null | head -1)
NEW_SLOT="green"
NEW_PORT=3002

if [[ "$CURRENT" == *"3001"* ]]; then
  NEW_SLOT="green"
  NEW_PORT=3002
else
  NEW_SLOT="blue"
  NEW_PORT=3001
fi

echo "Deploying to $NEW_SLOT (port $NEW_PORT)..."

# 启动新版本
docker run -d --name my-api-$NEW_SLOT -p $NEW_PORT:3000 my-api:$VERSION

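# Wait for the health check; abort without switching traffic if it never passes
HEALTHY=0
for i in $(seq 1 30); do
  if curl -sf http://localhost:$NEW_PORT/health > /dev/null; then
    HEALTHY=1
    break
  fi
  sleep 2
done

if [[ "$HEALTHY" -ne 1 ]]; then
  echo "❌ Health check failed, keeping the current version" >&2
  docker stop my-api-$NEW_SLOT && docker rm my-api-$NEW_SLOT
  exit 1
fi

# Switch Nginx (validate the configuration before reloading)
sed -i "s/server 127.0.0.1:[0-9]*/server 127.0.0.1:$NEW_PORT/" /etc/nginx/conf.d/upstream.conf
nginx -t && nginx -s reload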

# 停止旧版本
OLD_SLOT=$([[ "$NEW_SLOT" == "green" ]] && echo "blue" || echo "green")
docker stop my-api-$OLD_SLOT && docker rm my-api-$OLD_SLOT

echo "✅ Deployed to $NEW_SLOT successfully"

十、常见问题排查

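10.1 Event loop blocking

Symptom: responses slow down or stall for every client at once while memory looks normal. Overall CPU usage can also look moderate on a multi-core host, because a blocked event loop saturates at most one core.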

# 检测事件循环延迟
node -e "
const { monitorEventLoopDelay } = require('perf_hooks');
const h = monitorEventLoopDelay({ resolution: 20 });
h.enable();
setInterval(() => {
  console.log(JSON.stringify({
    min: (h.min / 1e6).toFixed(2) + 'ms',
    max: (h.max / 1e6).toFixed(2) + 'ms',
    mean: (h.mean / 1e6).toFixed(2) + 'ms',
    p50: (h.percentile(50) / 1e6).toFixed(2) + 'ms',
    p99: (h.percentile(99) / 1e6).toFixed(2) + 'ms'
  }));
  h.reset();
}, 5000);
"

常见原因与修复：

// ❌ 同步 JSON 解析大文件
const data = JSON.parse(hugeString);

// ✅ 使用流式解析
const { parser } = require('stream-json');
const { streamArray } = require('stream-json/streamers/StreamArray');
const fs = require('fs');

fs.createReadStream('huge.json')
  .pipe(parser())
  .pipe(streamArray())
  .on('data', ({ key, value }) => {
    // 逐条处理
  });

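// ❌ Synchronous key derivation blocks the event loop for the whole computation
const hash = crypto.scryptSync(password, salt, 64);

// ✅ Use the asynchronous variant, which runs on the libuv thread pool
// (there is no 'crypto/promises' module; promisify the callback API instead)
const { scrypt } = require('crypto');
const { promisify } = require('util');
const scryptAsync = promisify(scrypt);
const hash = await scryptAsync(password, salt, 64);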

// ❌ CPU-heavy computation blocks the event loop
function fibonacci(n) {
  if (n <= 1) return n;
  return fibonacci(n - 1) + fibonacci(n - 2);
}

// ✅ Move it to a worker thread (fibonacci above must live in the same file)
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

if (isMainThread) {
  const worker = new Worker(__filename, { workerData: { n: 40 } });
  worker.on('message', (result) => console.log(result));
  worker.on('error', (err) => console.error('worker failed:', err));
} else {
  const result = fibonacci(workerData.n);
  parentPort.postMessage(result);
}
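When the main thread needs the result, for example inside a request handler, it is often convenient to wrap the worker in a Promise. The sketch below assumes a hypothetical helper named `fibInWorker`; the worker source is inlined with `eval: true` only to keep the example self-contained, and a real project would more likely keep it in a separate file.

```javascript
const { Worker } = require('node:worker_threads');

// Worker code as a string so the example fits in one file
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  function fibonacci(n) {
    return n <= 1 ? n : fibonacci(n - 1) + fibonacci(n - 2);
  }
  parentPort.postMessage(fibonacci(workerData.n));
`;

// Runs fibonacci(n) off the main thread and resolves with the result
function fibInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```

Usage: `const result = await fibInWorker(40);`. Spawning a worker per call has a startup cost, so a pool of long-lived workers is usually a better fit for frequent tasks.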

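10.2 High CPU Usage

# Find which Node.js process is busy
top -c -p "$(pgrep -d',' -f "node")"

# Generate a flame graph with clinic flame
clinic flame -- node server.js

# Sample per-process CPU once per second (pidstat takes a comma-separated PID list)
pidstat -p "$(pgrep -d',' -f "node")" 1

# Summarize system calls for one process; Ctrl+C prints the table
strace -c -p <PID>

# Built-in V8 CPU profiling
node --prof server.js
# After the load test, turn the isolate log into a readable report
node --prof-process isolate-*.log > profile.txt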

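10.3 OOM (Out of Memory) Crashes

# Check the kernel log for OOM kills
dmesg | grep -i "oom\|killed process"

# Check system memory
free -h

# V8 heap limit: the default varies with the Node.js version and the memory
# available to the process (about 1.5 GB on older 64-bit releases, higher on recent ones)
# Raise the old-generation heap limit to 4 GB
node --max-old-space-size=4096 server.js

// With PM2
// ecosystem.config.js
module.exports = {
  apps: [{
    name: 'my-api',
    script: 'server.js',
    node_args: '--max-old-space-size=4096',
    max_memory_restart: '3G'  // PM2-level safety net: restart once memory exceeds 3 GB
  }]
};

# With systemd
# [Service]
# Environment=NODE_OPTIONS=--max-old-space-size=4096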
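To confirm that the flag actually reached the process (easy to get wrong behind PM2, systemd, or Docker), the effective limit can be read back from V8. A minimal sketch; the helper names `heapLimitMb` and `heapUsageRatio` are hypothetical.

```javascript
const v8 = require('node:v8');

// Effective V8 heap limit in MB, as seen by the current process
function heapLimitMb() {
  return Math.round(v8.getHeapStatistics().heap_size_limit / 1024 / 1024);
}

// Fraction of the heap limit currently in use (between 0 and 1)
function heapUsageRatio() {
  const { used_heap_size, heap_size_limit } = v8.getHeapStatistics();
  return used_heap_size / heap_size_limit;
}

console.log(
  `heap limit: ${heapLimitMb()} MB, in use: ${(heapUsageRatio() * 100).toFixed(1)}%`
);
```

The reported limit may be somewhat higher than the `--max-old-space-size` value because it covers more than the old generation; what matters is that it changes when the flag changes.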

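Finding the root cause of an OOM:

# 1. Write heap snapshots automatically as the heap approaches its limit (at most 3 here);
#    --heapsnapshot-signal=SIGUSR2 additionally allows on-demand snapshots via kill -USR2 <PID>
node --heapsnapshot-near-heap-limit=3 --heapsnapshot-signal=SIGUSR2 --max-old-space-size=2048 server.js

# 2. Watch the memory growth trend
watch -n 5 'ps -p "$(pgrep -d, -f node)" -o pid,rss,vsz,comm'

# 3. Log process.memoryUsage() periodically from inside the application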
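Step 3 can be as small as a timer that writes one JSON line per interval. A minimal sketch; `formatMemory`, `startMemoryLog`, and the 60-second default are illustrative choices, and a real service would pass its own logger as `log`.

```javascript
// Bytes to megabytes, rounded to one decimal place
const toMb = (bytes) => Math.round((bytes / 1024 / 1024) * 10) / 10;

function formatMemory(usage = process.memoryUsage()) {
  return {
    rssMb: toMb(usage.rss),
    heapTotalMb: toMb(usage.heapTotal),
    heapUsedMb: toMb(usage.heapUsed),
    externalMb: toMb(usage.external),
  };
}

// Logs memory usage every intervalMs; returns a function that stops the timer.
function startMemoryLog({ intervalMs = 60_000, log = console.log } = {}) {
  const timer = setInterval(() => {
    log(JSON.stringify({ ts: new Date().toISOString(), ...formatMemory() }));
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for logging
  return () => clearInterval(timer);
}
```

A `heapUsedMb` that keeps climbing across many intervals under steady load suggests a leak in the JavaScript heap, while a growing `rssMb` with flat heap figures points at buffers or native memory.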

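10.4 Port Conflicts

# See what is using the port
lsof -i :3000
ss -tlnp | grep 3000
netstat -tlnp | grep 3000

# Show details of the owning process (numeric ports and addresses)
lsof -i :3000 -P -n

# Kill the process listening on the port (-sTCP:LISTEN skips clients that are merely connected to it)
kill $(lsof -ti :3000 -sTCP:LISTEN)

# Force kill as a last resort
kill -9 $(lsof -ti :3000 -sTCP:LISTEN)

# List all ports that Node.js processes are listening on
ss -tlnp | grep node

// Let the OS pick a free port by listening on port 0
const server = app.listen(0, () => {
  console.log(`Server listening on port ${server.address().port}`);
});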
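When the port must stay fixed, an unhandled `EADDRINUSE` otherwise surfaces as a bare stack trace. The sketch below turns listen errors into an actionable message; `describeListenError` and `listen` are hypothetical helper names, and the wording of the messages is only a suggestion.

```javascript
// Maps a listen() error to a message that tells the operator what to do next
function describeListenError(err, port) {
  if (err.code === 'EADDRINUSE') {
    return `Port ${port} is already in use; find the owner with: lsof -i :${port} -sTCP:LISTEN`;
  }
  if (err.code === 'EACCES') {
    return `No permission to bind port ${port} (ports below 1024 usually need elevated privileges)`;
  }
  return `Failed to listen on port ${port}: ${err.message}`;
}

// Works with any net.Server or http.Server instance
function listen(server, port) {
  server.once('error', (err) => {
    console.error(describeListenError(err, port));
    process.exitCode = 1; // let the process manager see the failure
  });
  server.listen(port, () => {
    console.log(`Server listening on port ${server.address().port}`);
  });
  return server;
}
```

Usage: `listen(require('node:http').createServer(app), 3000);`.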

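10.5 DNS Resolution Problems

# Query the DNS servers directly from Node.js (bypasses /etc/hosts)
node -e "require('dns').resolve('api.example.com', console.log)"

# Resolve through the OS resolver (getaddrinfo), the path http and net use by default.
# Node.js keeps no DNS cache of its own, so differing answers point at the OS or resolver layer.
node -e "require('dns').lookup('api.example.com', { all: true }, console.log)"

# Prefer IPv4 addresses when a name resolves to both IPv4 and IPv6
node --dns-result-order=ipv4first server.js

// Use custom DNS servers (affects dns.resolve*() only, not dns.lookup())
const dns = require('dns');
dns.setServers(['8.8.8.8', '8.8.4.4']);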
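`dns.setServers()` changes the process-wide default. To test one upstream without affecting the rest of the application, an independent resolver can be used instead. A minimal sketch: `createResolver` and its `timeoutMs` parameter are hypothetical names, and the `timeout` and `tries` options are assumed to be available, which holds on recent Node.js releases.

```javascript
const { Resolver } = require('node:dns').promises;

// Builds a resolver that queries only the given servers
function createResolver(servers, { timeoutMs = 2000, tries = 2 } = {}) {
  const resolver = new Resolver({ timeout: timeoutMs, tries });
  resolver.setServers(servers);
  return resolver;
}

// Example (needs network access):
//   const resolver = createResolver(['8.8.8.8', '8.8.4.4']);
//   const addresses = await resolver.resolve4('api.example.com');
```

Comparing the answer from a public resolver with the one from the default servers is a quick way to tell a stale internal DNS record from an application bug.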

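10.6 Connection Leaks

# Count TCP connections
ss -s
ss -tnp | grep :5432 | wc -l  # PostgreSQL connections

# Check file descriptor usage (substitute the PID of the Node.js process)
ls /proc/<PID>/fd | wc -l
grep "open files" /proc/<PID>/limits

# Raise the file descriptor limit for login sessions
# /etc/security/limits.conf
# nodeapp soft nofile 65535
# nodeapp hard nofile 65535
# For a systemd service, set LimitNOFILE=65535 in [Service] instead; limits.conf does not apply there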
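The same descriptor count can be sampled from inside the process and exported alongside other metrics, which makes a slow leak visible as a trend. This is a Linux-only sketch that reads `/proc/self/fd`; `countOpenFds` and `fdReport` are hypothetical helpers, and the soft limit is passed in rather than detected.

```javascript
const fs = require('node:fs');

// Number of file descriptors open in this process, or null where /proc is unavailable
function countOpenFds() {
  try {
    return fs.readdirSync('/proc/self/fd').length;
  } catch {
    return null; // non-Linux platform or restricted /proc
  }
}

// Relates the current count to a known soft limit (for example 65535)
function fdReport(softLimit) {
  const open = countOpenFds();
  return { open, softLimit, ratio: open === null ? null : open / softLimit };
}

console.log(fdReport(65535));
```

A count that rises steadily while traffic stays flat usually means sockets or files are opened without being closed, for example database clients created per request instead of taken from a pool.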

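11. Operations Command Cheat Sheet

Process Management

# PM2 basics
pm2 start ecosystem.config.js            # Start
pm2 start ecosystem.config.js --env prod  # Start with a specific environment (uses the env_prod block)
pm2 reload all                            # Zero-downtime reload of all apps
pm2 restart all                           # Restart all apps
pm2 stop all                              # Stop all apps
pm2 delete all                            # Delete all apps
pm2 list                                  # List processes
pm2 show <name>                           # Show details
pm2 monit                                 # Live monitoring dashboard
pm2 logs <name> --lines 100              # View logs
pm2 flush                                 # Clear logs
pm2 save                                  # Save the current process list
pm2 resurrect                             # Restore the saved process list
pm2 startup                               # Configure start on boot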

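Debugging and Diagnostics

# Node.js debugging
node --inspect server.js                  # Enable the inspector
node --inspect-brk server.js              # Enable the inspector and break on the first line
node --inspect=0.0.0.0:9229 server.js    # Remote debugging (allows code execution; restrict with a firewall or an SSH tunnel)
node --prof server.js                     # CPU profiling
node --expose-gc server.js               # Expose gc()
node --max-old-space-size=4096 server.js # Set the heap limit
node --trace-warnings server.js           # Show stack traces for warnings

# Memory analysis
node -e "console.log(process.memoryUsage())"
node -e "console.log(require('v8').getHeapStatistics())"

# Diagnostics with clinic.js
clinic doctor -- node server.js           # Overall diagnosis
clinic flame -- node server.js            # Flame graph
clinic bubbleprof -- node server.js       # Async activity analysis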

Docker Operations

# Build
docker build -t my-api:latest .
docker build --no-cache -t my-api:v2 .    # Build without cache

# Run
docker run -d --name my-api -p 8080:3000 my-api:latest
docker run -d --name my-api -e NODE_ENV=production my-api:latest

# Manage
docker ps                                 # List running containers
docker ps -a                              # List all containers
docker logs -f --tail 100 my-api         # Follow logs
docker exec -it my-api sh                # Open a shell in the container
docker stats my-api                       # Resource usage
docker inspect my-api                     # Container details

# Clean up
docker system prune -a                    # Remove all unused resources, including every unused image
docker image prune -a                     # Remove unused images
docker volume prune                       # Remove unused volumes

Viewing Logs

# systemd journal
journalctl -u my-api -f                   # Follow in real time
journalctl -u my-api --since "1h ago"    # Last hour
journalctl -u my-api -p err              # Errors only
journalctl -u my-api --no-pager -n 50    # Last 50 lines

# Log files
tail -f /var/log/my-api/app.log          # Follow in real time
tail -f /var/log/my-api/app.log | grep ERROR  # Filter errors
grep -r "OOM\|FATAL" /var/log/my-api/    # Search for critical errors
zcat /var/log/my-api/app.log.*.gz | grep ERROR  # Search compressed historical logs

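Network Troubleshooting

# Port checks
ss -tlnp                                 # List all listening ports
lsof -i :3000                            # See what is using a port
curl -v http://localhost:3000/health     # Test the health check
curl -w "@curl-format.txt" -o /dev/null -s http://localhost:3000/api  # Time each request phase (needs a curl-format.txt template)

# Connection statistics
ss -s                                    # Connection summary
ss -tn state established | wc -l         # Number of established connections
ss -tnp | grep node                      # Node.js connection details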

Resource Monitoring

# CPU and memory
top -c -p $(pgrep -d',' -f "node")
pidstat -p $(pgrep -d',' -f "node") 1
ps aux | grep node | grep -v grep
free -h

# Disk
df -h                                    # Disk usage
du -sh /var/log/my-api/                  # Size of the log directory
ncdu /var/log/my-api/                    # Interactive disk usage analysis

# System load
uptime                                   # Load averages
vmstat 1 10                              # Virtual memory statistics
iostat -x 1 5                            # I/O statistics

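Appendix: Recommended Tech Stack

Application frameworks: Express / Fastify / Koa / NestJS

Process management: PM2 / systemd / Docker

Logging: pino (recommended) / winston

Monitoring: Prometheus + Grafana / PM2 Plus / Datadog

Performance analysis: clinic.js / 0x / Chrome DevTools

Security: helmet / cors / express-rate-limit / express-validator

Testing: Jest / Vitest / supertest / k6 (load testing)

CI/CD: GitHub Actions / GitLab CI / Jenkins

Containerization: Docker / Docker Compose / Kubernetes

📝 Ops maxim: production stability = prevention (standards + automation) + detection (monitoring + alerting) + recovery (runbooks + drills). Don't wait for problems to find you; have the system tell you first.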


Tag: Operations

https://acf1sh.top/console/overview/archives/fu-wu-qi-yun-wei-bi-ji-node.js-ying-yong-bu-shu-yu-yun-wei

Author: fish

Published: 2026-06-14

Updated: 2026-06-10

License: CC BY 4.0
