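This is part 15 of the server operations series, focused on automating operations with Ansible. It runs from the basic architecture through to real-world scenarios and covers the patterns used most often in day-to-day operations. The examples are meant to be reused once host names, paths, and versions are adjusted to your environment.
I. Ansible Basics
1.1 Architecture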
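Ansible is an agentless automation tool that runs tasks on target hosts over SSH. Its core architecture:
Control Node
  │
  ├── Inventory (host list)
  ├── Playbook
  ├── Modules
  └── Plugins
  │ SSH
  ▼
Managed Nodes  ← no agent required
Core components:
Inventory: the hosts and groups that Ansible manages.
Playbook: a YAML file describing the tasks to run and the hosts to run them on.
Modules: the units of work that tasks call, such as copy, yum, or service.
Plugins: extensions to Ansible itself, such as connection, callback, and lookup plugins.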
1.2 Installation and Configuration
Option 1: install with pip (recommended)
# Install Python 3 and pip
sudo apt update && sudo apt install -y python3 python3-pip # Debian/Ubuntu
sudo yum install -y python3 python3-pip # CentOS/RHEL
# Install Ansible
pip3 install ansible
# Verify the installation
ansible --version
Option 2: system package manager
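# Ubuntu (PPA)
sudo apt-add-repository ppa:ansible/ansible
sudo apt update && sudo apt install -y ansible
# CentOS/RHEL
sudo yum install -y epel-release
sudo yum install -y ansible
Ansible searches for its configuration file in the following order and uses the first one it finds (highest priority first):
1. The file named by the ANSIBLE_CONFIG environment variable
2. ./ansible.cfg (current directory)
3. ~/.ansible.cfg (user's home directory)
4. /etc/ansible/ansible.cfg (global configuration)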
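The lookup order above can be sketched in a few lines of Python. This only illustrates the precedence and is not Ansible's actual loader: the function name `resolve_ansible_cfg` and the injectable `exists` check are assumptions made for the example, and edge cases such as Ansible ignoring an ansible.cfg in a world-writable directory are left out.

```python
import os
from pathlib import Path
from typing import Callable, Mapping, Optional


def resolve_ansible_cfg(
    env: Mapping[str, str],
    cwd: Path,
    home: Path,
    exists: Callable[[Path], bool] = Path.is_file,
) -> Optional[Path]:
    """Return the first existing config file, highest priority first."""
    candidates = []
    # 1. File named by the ANSIBLE_CONFIG environment variable, if set
    if env.get("ANSIBLE_CONFIG"):
        candidates.append(Path(env["ANSIBLE_CONFIG"]))
    # 2. Current directory, 3. home directory, 4. global configuration
    candidates += [
        cwd / "ansible.cfg",
        home / ".ansible.cfg",
        Path("/etc/ansible/ansible.cfg"),
    ]
    for path in candidates:
        if exists(path):
            return path
    return None


if __name__ == "__main__":
    # Hypothetical usage: resolve against the real environment
    print(resolve_ansible_cfg(os.environ, Path.cwd(), Path.home()))
```

To see which file Ansible really picked, run `ansible --version`, which prints the active config file.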
Common settings in ansible.cfg. Comments sit on their own lines because ansible.cfg accepts # only at the start of a line:
[defaults]
inventory = ./inventory/hosts
remote_user = deploy
private_key_file = ~/.ssh/id_ed25519
host_key_checking = False
timeout = 30
# Number of parallel processes
forks = 20
log_path = ./ansible.log
# Do not create .retry files
retry_files_enabled = False
# More readable output format
stdout_callback = yaml
[privilege_escalation]
become = True
become_method = sudo
become_user = root
become_ask_pass = False
[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no
# Improves performance
pipelining = True
control_path_dir = ~/.ansible/cp
1.3 Inventory Management
Static inventory file (INI format):
# inventory/hosts
# Web 服务器组
[web]
web01 ansible_host=192.168.1.101
web02 ansible_host=192.168.1.102
web03 ansible_host=192.168.1.103
# 数据库服务器组
[db]
db01 ansible_host=192.168.1.201 ansible_port=22022
db02 ansible_host=192.168.1.202 ansible_port=22022
# 缓存服务器
[cache]
redis01 ansible_host=192.168.1.211
# 生产环境 = web + db + cache
[production:children]
web
db
cache
# 全局变量
[all:vars]
ansible_user=deploy
ansible_ssh_private_key_file=~/.ssh/id_ed25519
ansible_python_interpreter=/usr/bin/python3
# 组变量
[web:vars]
http_port=80
nginx_version=1.24.0
[db:vars]
mysql_port=3306
Inventory in YAML format:
# inventory/hosts.yml
all:
  vars:
    ansible_user: deploy
    ansible_python_interpreter: /usr/bin/python3
  children:
    web:
      hosts:
        web01:
          ansible_host: 192.168.1.101
        web02:
          ansible_host: 192.168.1.102
      vars:
        http_port: 80
    db:
      hosts:
        db01:
          ansible_host: 192.168.1.201
          ansible_port: 22022
    production:
      children:
        web:
        db:
Verify the inventory:
# 列出所有主机
ansible-inventory --list -i inventory/hosts.yml
# 图形化展示
ansible-inventory --graph -i inventory/hosts.yml
# 测试连通性
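ansible all -i inventory/hosts.yml -m ping
II. Ad-hoc Commands
Ad-hoc commands suit quick one-off operations that do not justify writing a playbook.
2.1 Basic Syntax
ansible <host-pattern> -m <module> -a '<module-arguments>' [options]
2.2 Common Module Examples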
# 测试所有主机连通性
ansible all -m ping
# 查看所有主机的系统信息(收集 Facts)
ansible all -m setup -a 'filter=ansible_distribution*'
# 执行命令
ansible web -m command -a 'uptime'
ansible web -m shell -a 'df -h | grep /dev/sda'
# 复制文件
ansible web -m copy -a 'src=./app.conf dest=/etc/app.conf mode=0644 backup=yes'
# 安装软件包
ansible web -m yum -a 'name=nginx state=present'
ansible web -m apt -a 'name=nginx state=present update_cache=yes'
# 管理服务
ansible web -m service -a 'name=nginx state=started enabled=yes'
# 创建用户
ansible all -m user -a 'name=deploy shell=/bin/bash groups=wheel append=yes'
# 文件操作
ansible web -m file -a 'path=/data/app state=directory mode=0755 owner=deploy'
ansible web -m file -a 'path=/tmp/test.log state=touch mode=0644'
# 下载文件
ansible web -m get_url -a 'url=https://example.com/app.tar.gz dest=/tmp/ mode=0644'
2.3 Controlling Parallelism
# 同时在 10 台机器上执行
ansible all -m shell -a 'yum update -y' --forks 10
# 逐台执行(串行)
ansible web -m shell -a 'systemctl restart nginx' --forks 1
# 限制到特定主机
ansible web -m shell -a 'hostname' --limit web01,web02
# 从文件读取主机列表
ansible all -m shell -a 'uptime' --limit @host_list.txt
# Ad-hoc commands have no failure-percentage option; to stop a run once too many hosts fail, use max_fail_percentage in a playbook
2.4 Practical Ad-hoc Combinations
# 批量查看磁盘使用率,超过 80% 的标记告警
ansible all -m shell -a 'df -h | awk "NR>1 && int(\$5)>80 {print \$0}"' -o
# 批量同步时间
ansible all -m shell -a 'chronyc makestep' --become
# 批量查找大文件
ansible all -m shell -a 'find /var/log -type f -size +100M -exec ls -lh {} \;' -o
# 批量清理 Docker 资源
ansible all -m shell -a 'docker system prune -f' --become
III. Writing Playbooks
3.1 Basic Structure
---
# deploy-nginx.yml
- name: Install and configure Nginx
  hosts: web
  become: yes
  vars:
    nginx_port: 80
    server_name: example.com
  tasks:
    - name: Install Nginx
      yum:
        name: nginx
        state: present
    - name: Copy the Nginx configuration
      template:
        src: templates/nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        owner: root
        group: root
        mode: '0644'
      notify: Restart Nginx
    - name: Start Nginx and enable it at boot
      service:
        name: nginx
        state: started
        enabled: yes
  handlers:
    - name: Restart Nginx
      service:
        name: nginx
        state: restarted
Run it:
ansible-playbook deploy-nginx.yml -i inventory/hosts.yml
ansible-playbook deploy-nginx.yml -i inventory/hosts.yml --check # Dry Run
ansible-playbook deploy-nginx.yml -i inventory/hosts.yml --diff # show the differences for each change
3.2 Variables
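When the same variable is set in more than one place, Ansible keeps the value from the source with the highest precedence. The sketch below models that for the sources used in this section, assuming the documented order: play vars, then vars_prompt, then vars_files, then set_fact and registered variables, with extra vars passed via -e on top. The layer names and the `effective_vars` helper are hypothetical, and role, block, and inventory variables are left out.

```python
from collections import ChainMap
from typing import Any, Dict

# Variable sources, lowest precedence first (illustrative layer names)
PRECEDENCE = [
    "play_vars",
    "vars_prompt",
    "vars_files",
    "set_fact_or_register",
    "extra_vars",
]


def effective_vars(sources: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Merge variable layers so that higher-precedence layers win."""
    # ChainMap resolves keys left to right, so the highest layer goes first
    layers = [sources.get(name, {}) for name in reversed(PRECEDENCE)]
    return dict(ChainMap(*layers))


if __name__ == "__main__":
    # Hypothetical values: vars_files overrides the port, -e overrides the version
    print(effective_vars({
        "play_vars": {"app_version": "3.2.1", "app_port": 8080},
        "vars_files": {"app_port": 9090},
        "extra_vars": {"app_version": "3.3.0"},
    }))
```

Each value is replaced as a whole by the higher layer; nothing is merged key by key inside a dictionary value.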
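Variables can be defined in several ways. The numbered sources in the example are ordered from lowest to highest precedence (play vars, vars_prompt, vars_files, then register and set_fact):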
---
# 1. Play 级别 vars
- hosts: web
vars:
app_version: "3.2.1"
app_port: 8080
# 2. vars_prompt(交互式输入)
vars_prompt:
- name: db_password
prompt: "请输入数据库密码"
private: yes
# 3. vars_files(外部变量文件)
vars_files:
- vars/common.yml
- "vars/{{ ansible_distribution }}.yml"
tasks:
# 4. register(捕获任务输出作为变量)
- name: 获取磁盘信息
shell: df -h /
register: disk_info
- name: 打印磁盘信息
debug:
msg: "{{ disk_info.stdout_lines }}"
# 5. set_fact(动态设置变量)
- name: 计算内存阈值
set_fact:
memory_threshold_mb: "{{ (ansible_memtotal_mb * 0.8) | int }}"
# 6. 使用变量
- name: 输出版本信息
debug:
msg: "Deploying {{ app_version }} on port {{ app_port }}"
External variables file vars/common.yml:
---
app_name: myapp
app_user: appuser
app_group: appgroup
log_dir: /var/log/{{ app_name }}
data_dir: /data/{{ app_name }}
3.3 Conditionals (when)
tasks:
# 基本条件
- name: 仅在 CentOS 上安装 EPEL
yum:
name: epel-release
state: present
when: ansible_distribution == "CentOS"
# 多条件
- name: 仅在 CentOS 7 或 RHEL 7 上执行
shell: some_command
when:
- ansible_distribution in ["CentOS", "RedHat"]
- ansible_distribution_major_version == "7"
# 基于变量的条件
- name: 仅在主节点上执行
shell: init_master.sh
when: is_master | default(false) | bool
# 基于 register 的条件
- name: 检查服务是否运行
shell: systemctl is-active nginx
register: nginx_status
ignore_errors: yes
- name: 如果 Nginx 未运行则启动
service:
name: nginx
state: started
when: nginx_status.rc != 0
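# Negated condition (config_file has to be registered first)
- name: Check whether the config file exists
stat:
path: /etc/app/config.yml
register: config_file
- name: Create the file if it does not exist
file:
path: /etc/app/config.yml
state: touch
when: not config_file.stat.exists
3.4 Loops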
tasks:
# 简单列表循环
- name: 安装多个软件包
yum:
name: "{{ item }}"
state: present
loop:
- nginx
- redis
- mysql-server
- python3
# 更推荐的写法(直接传列表给 name 参数)
- name: 安装多个软件包(推荐写法)
yum:
name:
- nginx
- redis
- mysql-server
state: present
# 字典循环
- name: 创建多个用户
user:
name: "{{ item.name }}"
groups: "{{ item.groups }}"
shell: "{{ item.shell }}"
loop:
- { name: "dev01", groups: "developers", shell: "/bin/bash" }
- { name: "ops01", groups: "operations", shell: "/bin/zsh" }
- { name: "test01", groups: "testers", shell: "/bin/bash" }
# 嵌套循环
- name: 给多个用户授权多个目录
file:
path: "{{ item.1 }}"
owner: "{{ item.0 }}"
recurse: yes
loop: "{{ ['user1', 'user2'] | product(['/data', '/logs']) | list }}"
# 使用 loop_control
- name: 部署多个虚拟主机
template:
src: vhost.conf.j2
dest: "/etc/nginx/conf.d/{{ item.name }}.conf"
loop: "{{ virtual_hosts }}"
loop_control:
label: "{{ item.name }}" # 精简输出
pause: 2 # 每次循环暂停 2 秒
# 使用 until 重试
- name: 等待服务就绪
uri:
url: "http://localhost:{{ app_port }}/health"
status_code: 200
register: health_check
until: health_check.status == 200
retries: 30
delay: 5
3.5 Handlers
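Two properties of handlers are easy to miss: a handler runs at most once per flush no matter how many tasks notify it, and notified handlers run in the order they are defined under handlers, not in the order they were notified. The sketch below models only those two rules. The `HandlerQueue` class is a hypothetical illustration and not part of Ansible; `flush` stands for the end of the play or an explicit meta: flush_handlers.

```python
from typing import Iterable, List, Set


class HandlerQueue:
    """Collects notifications and runs each notified handler once."""

    def __init__(self, defined_order: Iterable[str]) -> None:
        # Order in which handlers appear in the handlers section
        self.defined_order: List[str] = list(defined_order)
        self.pending: Set[str] = set()

    def notify(self, name: str) -> None:
        """Record a notification; duplicates collapse into one."""
        if name not in self.defined_order:
            raise KeyError(f"unknown handler: {name}")
        self.pending.add(name)

    def flush(self) -> List[str]:
        """Return handlers to run, in definition order, then reset."""
        to_run = [h for h in self.defined_order if h in self.pending]
        self.pending.clear()
        return to_run


if __name__ == "__main__":
    queue = HandlerQueue(["Validate Nginx Config", "Reload Nginx", "Restart App"])
    queue.notify("Reload Nginx")
    queue.notify("Validate Nginx Config")
    queue.notify("Reload Nginx")  # second notification has no extra effect
    print(queue.flush())
```

In a real play only tasks that report a change send the notification, which is why an unchanged template does not trigger a reload.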
---
- name: 配置管理
hosts: web
become: yes
tasks:
- name: 更新 Nginx 配置
template:
src: nginx.conf.j2
dest: /etc/nginx/nginx.conf
notify:
- Validate Nginx Config
- Reload Nginx
- name: 更新应用配置
template:
src: app.conf.j2
dest: /etc/app/config.yml
notify: Restart App
handlers:
- name: Validate Nginx Config
command: nginx -t
listen: "Validate and Reload Nginx"
- name: Reload Nginx
service:
name: nginx
state: reloaded
listen: "Validate and Reload Nginx"
- name: Restart App
service:
name: myapp
state: restarted
# Use flush_handlers to run pending handlers in the middle of a play
# - meta: flush_handlers
3.6 Tags
---
- name: 系统初始化
hosts: all
become: yes
tasks:
- name: 设置时区
timezone:
name: Asia/Shanghai
tags: [timezone, init]
- name: 配置 NTP
template:
src: chrony.conf.j2
dest: /etc/chrony.conf
notify: Restart Chrony
tags: [ntp, init]
- name: 配置 SSH
template:
src: sshd_config.j2
dest: /etc/ssh/sshd_config
notify: Restart SSH
tags: [ssh, security]
- name: 配置防火墙
firewalld:
port: "{{ item }}/tcp"
permanent: yes
state: enabled
loop: [22, 80, 443]
tags: [firewall, security]
handlers:
- name: Restart Chrony
service: { name: chronyd, state: restarted }
- name: Restart SSH
service: { name: sshd, state: restarted }
# Run only the tasks with specific tags
ansible-playbook site.yml --tags "security"
ansible-playbook site.yml --tags "ntp,ssh"
# 排除特定 tag
ansible-playbook site.yml --skip-tags "firewall"
# 列出所有 tag
ansible-playbook site.yml --list-tags
IV. Organizing Roles
4.1 Directory Structure
roles/
└── nginx/
    ├── defaults/
    │   └── main.yml          # default variables (lowest precedence)
    ├── vars/
    │   └── main.yml          # role variables (high precedence)
    ├── tasks/
    │   └── main.yml          # task list
    ├── handlers/
    │   └── main.yml          # handler definitions
    ├── templates/
    │   └── nginx.conf.j2     # Jinja2 template
    ├── files/
    │   └── ssl.crt           # static file
    ├── meta/
    │   └── main.yml          # role metadata and dependencies
    ├── tests/
    │   ├── inventory
    │   └── test.yml
    └── README.md
4.2 Role Example: Nginx
roles/nginx/defaults/main.yml:
---
nginx_worker_processes: "{{ ansible_processor_vcpus }}"
nginx_worker_connections: 1024
nginx_keepalive_timeout: 65
nginx_client_max_body_size: 50m
nginx_server_name: localhost
nginx_ssl_enabled: false
nginx_listen_port: 80
roles/nginx/tasks/main.yml:
---
- name: 安装 Nginx
package:
name: nginx
state: present
- name: 确保配置目录存在
file:
path: "{{ item }}"
state: directory
mode: '0755'
loop:
- /etc/nginx/conf.d
- /etc/nginx/ssl
- name: 部署主配置文件
template:
src: nginx.conf.j2
dest: /etc/nginx/nginx.conf
owner: root
group: root
mode: '0644'
validate: "nginx -t -c %s"
notify: Reload Nginx
- name: 部署虚拟主机配置
template:
src: vhost.conf.j2
dest: "/etc/nginx/conf.d/{{ item.server_name }}.conf"
loop: "{{ nginx_vhosts | default([]) }}"
notify: Reload Nginx
- name: 启动并启用 Nginx
service:
name: nginx
state: started
enabled: yes
- name: 配置防火墙放行
firewalld:
port: "{{ nginx_listen_port }}/tcp"
permanent: yes
state: enabled
notify: Reload Firewalld
when: ansible_os_family == "RedHat"
roles/nginx/handlers/main.yml:
---
- name: Reload Nginx
service:
name: nginx
state: reloaded
- name: Reload Firewalld
service:
name: firewalld
state: reloaded
roles/nginx/templates/nginx.conf.j2:
# Managed by Ansible — DO NOT EDIT MANUALLY
user nginx;
worker_processes {{ nginx_worker_processes }};
error_log /var/log/nginx/error.log warn;
pid /run/nginx.pid;
events {
worker_connections {{ nginx_worker_connections }};
use epoll;
multi_accept on;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
log_format main '$remote_addr - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
'"$http_referer" "$http_user_agent"';
access_log /var/log/nginx/access.log main;
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout {{ nginx_keepalive_timeout }};
client_max_body_size {{ nginx_client_max_body_size }};
gzip on;
gzip_types text/plain text/css application/json application/javascript;
gzip_min_length 1000;
include /etc/nginx/conf.d/*.conf;
}
4.3 Using Roles
---
# site.yml — 主入口 Playbook
- name: 配置 Web 服务器
hosts: web
become: yes
roles:
- role: common # 基础配置
- role: nginx # Nginx
- role: app # 应用部署
# 带条件和标签使用
- name: 数据库服务器
hosts: db
become: yes
roles:
- role: common
- role: mysql
vars:
mysql_root_password: "{{ vault_mysql_root_password }}"
tags: [db, mysql]
4.4 Ansible Galaxy
# 从 Galaxy 安装 Role
ansible-galaxy install geerlingguy.nginx
ansible-galaxy install geerlingguy.mysql -p roles/
# 从 requirements 文件批量安装
cat > requirements.yml << 'EOF'
---
roles:
- name: geerlingguy.nginx
version: "3.1.0"
- name: geerlingguy.mysql
version: "4.0.0"
- src: https://github.com/company/ansible-role-app.git
name: company.app
version: main
EOF
ansible-galaxy install -r requirements.yml
# 创建自定义 Role 脚手架
ansible-galaxy init roles/myapp
V. Common Modules in Detail
5.1 The file Module
# 创建目录
- name: 创建应用目录
file:
path: /opt/myapp/{{ item }}
state: directory
owner: appuser
group: appgroup
mode: '0755'
loop: ['bin', 'conf', 'logs', 'data']
# 创建符号链接
- name: 链接到当前版本
file:
src: /opt/myapp/releases/{{ app_version }}
dest: /opt/myapp/current
state: link
owner: appuser
# 删除文件
- name: 清理临时文件
file:
path: "{{ item }}"
state: absent
loop:
- /tmp/app_build.tar.gz
- /tmp/app_build/
5.2 The copy Module
# 复制文件
- name: 部署配置文件
copy:
src: files/app.conf
dest: /etc/myapp/app.conf
owner: root
group: root
mode: '0644'
backup: yes # 覆盖前备份
# 内联内容
- name: 创建 MOTD
copy:
content: |
====================================
Server: {{ inventory_hostname }}
Environment: {{ env | default('production') }}
Managed by Ansible
====================================
dest: /etc/motd
mode: '0644'
5.3 The template Module
- name: 部署应用配置
template:
src: templates/app.conf.j2
dest: /etc/myapp/app.conf
owner: appuser
group: appgroup
mode: '0640'
validate: "/usr/bin/myapp check-config %s" # 部署前验证
backup: yes
5.4 The service Module
- name: 管理服务
service:
name: "{{ item.name }}"
state: "{{ item.state }}"
enabled: "{{ item.enabled }}"
loop:
- { name: nginx, state: started, enabled: true }
- { name: redis, state: started, enabled: true }
- { name: firewalld, state: stopped, enabled: false }
5.5 The yum and apt Modules
# CentOS/RHEL
- name: 安装软件包
yum:
name:
- nginx
- redis
- git
- vim
state: present
enablerepo: epel
# 安装本地 RPM
- name: 安装本地包
yum:
name: /tmp/app-1.0.rpm
state: present
# Ubuntu/Debian
- name: 安装软件包
apt:
name:
- nginx
- redis-server
- git
state: present
update_cache: yes
cache_valid_time: 3600
# 添加 APT 仓库
- name: 添加 Docker GPG Key
apt_key:
url: https://download.docker.com/linux/ubuntu/gpg
state: present
- name: 添加 Docker 仓库
apt_repository:
repo: "deb [arch=amd64] https://download.docker.com/linux/ubuntu {{ ansible_distribution_release }} stable"
state: present
5.6 The user Module
- name: 创建应用用户
user:
name: appuser
comment: "Application User"
shell: /bin/bash
home: /home/appuser
create_home: yes
system: yes
groups: "{{ app_user_groups | default(omit) }}" # optional supplementary groups; app_user_groups is an illustrative variable name
append: yes
# 授权 sudo
- name: 配置 sudo 权限
copy:
content: "appuser ALL=(ALL) NOPASSWD: /usr/bin/systemctl restart myapp\n"
dest: /etc/sudoers.d/appuser
mode: '0440'
validate: "visudo -cf %s"
5.7 The lineinfile Module
# 修改配置文件中的单行
- name: 设置 SSH 端口
lineinfile:
path: /etc/ssh/sshd_config
regexp: '^#?Port '
line: 'Port 22022'
state: present
notify: Restart SSH
# 确保某行存在
- name: 添加 hosts 记录
lineinfile:
path: /etc/hosts
line: "{{ item.ip }} {{ item.hostname }}"
state: present
loop:
- { ip: "192.168.1.10", hostname: "app-server" }
- { ip: "192.168.1.20", hostname: "db-server" }
# 删除匹配的行
- name: 移除危险配置
lineinfile:
path: /etc/ssh/sshd_config
regexp: '^PermitRootLogin'
state: absent
5.8 The blockinfile Module
- name: 添加自定义配置块
blockinfile:
path: /etc/sysctl.conf
block: |
# Ansible managed — network optimization
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_reuse = 1
marker: "# {mark} ANSIBLE MANAGED BLOCK - Network"
state: present
notify: Reload Sysctl
VI. Jinja2 Templates
6.1 Basic Syntax
{# 这是注释 #}
{# 变量输出 #}
主机名: {{ inventory_hostname }}
IP 地址: {{ ansible_default_ipv4.address }}
CPU 核数: {{ ansible_processor_vcpus }}
{# 条件判断 #}
{% if ansible_memtotal_mb > 8192 %}
worker_connections 4096;
{% elif ansible_memtotal_mb > 4096 %}
worker_connections 2048;
{% else %}
worker_connections 1024;
{% endif %}
{# 循环 #}
{% for user in users %}
{{ user.name }}:{{ user.uid }}:{{ user.shell }}
{% endfor %}
6.2 Common Filters
{# 字符串过滤器 #}
{{ app_name | upper }} {# MYAPP #}
{{ app_name | lower }} {# myapp #}
{{ app_name | capitalize }} {# Myapp #}
{{ app_name | replace('old', 'new') }} {# 替换 #}
{{ path | basename }} {# 文件名 #}
{{ path | dirname }} {# 目录名 #}
{# 数值过滤器 #}
{{ value | int }} {# 转整数 #}
{{ value | float }} {# 浮点数 #}
{{ (ansible_memtotal_mb * 0.8) | round(0) | int }} {# multiply first, then round; filters bind tighter than * #}
{# 列表过滤器 #}
{{ list | unique }} {# 去重 #}
{{ list | sort }} {# 排序 #}
{{ list | length }} {# 长度 #}
{{ list | join(', ') }} {# 连接 #}
{{ list | first }} {# 第一个 #}
{{ list | last }} {# 最后一个 #}
{{ list | default(['item1']) }} {# 默认值 #}
{{ list | map('upper') | list }} {# 映射 #}
{{ list | select('match', '^app') | list }} {# 过滤 #}
{# 字典过滤器 #}
{{ dict | dict2items }} {# 转列表 #}
{{ items | items2dict }} {# 转字典 #}
{{ dict | combine(other_dict) }} {# 合并字典 #}
{{ dict.keys() | list }} {# 获取键 #}
{# JSON/YAML #}
{{ config | to_json }} {# 转 JSON #}
{{ config | to_nice_json(indent=2) }} {# 美化 JSON #}
{{ config | to_yaml }} {# 转 YAML #}
{# 哈希/加密 #}
{{ password | password_hash('sha512') }} {# 生成密码哈希 #}
{{ content | hash('md5') }} {# MD5 #}
{{ content | b64encode }} {# Base64 编码 #}
{{ content | b64decode }} {# Base64 decode #}
6.3 Advanced Template Examples
{# 动态生成 Nginx upstream 配置 #}
upstream {{ app_name }}_backend {
least_conn;
{% for host in groups['web'] %}
server {{ hostvars[host]['ansible_host'] }}:{{ app_port }} weight={{ hostvars[host].weight | default(1) }} max_fails=3 fail_timeout=30s;
{% endfor %}
}
{# 根据主机名生成不同配置 #}
{% set host_num = inventory_hostname | regex_replace('^web(\\d+)$', '\\1') | int %}
{% if host_num % 2 == 0 %}
{# 偶数节点作为备用 #}
server_role: standby
{% else %}
server_role: primary
{% endif %}
{# 条件合并字典 #}
{% set default_config = {'workers': 4, 'max_conn': 1000} %}
{% set final_config = default_config | combine(custom_config | default({})) %}
workers: {{ final_config.workers }}
max_conn: {{ final_config.max_conn }}
VII. Advanced Inventory
7.1 host_vars and group_vars
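For each host, Ansible merges variables from group_vars/all, then from the host's groups (parent groups before child groups), then from host_vars, with later sources overriding earlier ones. A rough sketch of that merge follows, using values from the example files in this section. The function `merge_inventory_vars` is hypothetical and ignores play-level and extra variables.

```python
from typing import Any, Dict, Iterable


def merge_inventory_vars(
    all_vars: Dict[str, Any],
    group_vars: Iterable[Dict[str, Any]],
    host_vars: Dict[str, Any],
) -> Dict[str, Any]:
    """Merge variables for one host; later sources override earlier ones.

    group_vars must be ordered from parent groups to child groups.
    """
    merged = dict(all_vars)
    for layer in group_vars:
        merged.update(layer)
    merged.update(host_vars)
    return merged


if __name__ == "__main__":
    # Values taken from group_vars/web.yml and host_vars/web01.yml in this section
    web = {"nginx_worker_processes": 4, "app_port": 8080}
    web01 = {"nginx_worker_processes": 8, "is_primary": True}
    print(merge_inventory_vars({}, [web], web01))
```

For web01 the host file wins, so nginx_worker_processes ends up as 8, while web02 keeps the group value of 4.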
inventory/
├── hosts.yml
├── host_vars/
│   ├── web01.yml        # variables specific to web01
│   ├── web02.yml        # variables specific to web02
│   └── db01.yml         # variables specific to db01
└── group_vars/
    ├── all.yml          # variables shared by all hosts
    ├── web.yml          # web group variables
    ├── db.yml           # db group variables
    └── production.yml   # production group variables
inventory/group_vars/web.yml:
---
nginx_worker_processes: 4
app_port: 8080
app_env: production
deploy_user: deploy
inventory/host_vars/web01.yml:
---
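nginx_worker_processes: 8 # overrides the group variable
is_primary: true
7.2 Dynamic Inventory
A dynamic inventory fetches host information at run time from an external source such as a cloud API, a CMDB, or a script.
A custom dynamic inventory script: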
#!/usr/bin/env python3
"""dynamic_inventory.py: build an Ansible inventory from a CMDB API."""
import json
import sys

import requests


def get_inventory():
    # Fetch the host list from the CMDB/API; fail fast on timeouts and HTTP errors
    resp = requests.get("http://cmdb.internal/api/hosts", timeout=10)
    resp.raise_for_status()
    hosts = resp.json()
    inventory = {
        "_meta": {"hostvars": {}},
        "all": {"children": []},
    }
    groups = {}
    for host in hosts:
        name = host["hostname"]
        group = host["role"]  # web, db, cache, ...
        if group not in groups:
            groups[group] = {"hosts": [], "vars": {}}
        groups[group]["hosts"].append(name)
        inventory["_meta"]["hostvars"][name] = {
            "ansible_host": host["ip"],
            "ansible_port": host.get("ssh_port", 22),
            "ansible_user": host.get("ssh_user", "deploy"),
            "env": host.get("env", "production"),
        }
    inventory.update(groups)
    inventory["all"]["children"] = list(groups.keys())
    return inventory


if __name__ == "__main__":
    if len(sys.argv) == 2 and sys.argv[1] == "--list":
        print(json.dumps(get_inventory(), indent=2))
    elif len(sys.argv) == 3 and sys.argv[1] == "--host":
        # Per-host variables are already returned through _meta
        print(json.dumps({}))
    else:
        sys.exit(1)
# Use the dynamic inventory
chmod +x dynamic_inventory.py
ansible -i dynamic_inventory.py all -m ping
# AWS EC2 动态 Inventory 插件
pip install boto3
cat > aws_ec2.yml << 'EOF'
plugin: amazon.aws.aws_ec2
regions:
- ap-southeast-1
keyed_groups:
- key: tags.Environment
prefix: env
- key: instance_type
prefix: type
filters:
tag:Managed: ansible
compose:
ansible_host: public_ip_address
EOF
ansible -i aws_ec2.yml all -m ping
VIII. Ansible Vault
8.1 Basic Operations
# 创建加密文件
ansible-vault create secrets.yml
# 加密已有文件
ansible-vault encrypt vars/production/secrets.yml
# 编辑加密文件
ansible-vault edit secrets.yml
# 解密文件
ansible-vault decrypt secrets.yml
# 查看加密内容
ansible-vault view secrets.yml
# 更换密码
ansible-vault rekey secrets.yml
# 加密单个字符串(内联使用)
ansible-vault encrypt_string 'SuperSecret123' --name 'db_password'
# 输出:
# db_password: !vault |
# $ANSIBLE_VAULT;1.1;AES256
# 66386439653236336...
8.2 Using Vault in Practice
# group_vars/production/vault.yml(加密文件)
---
vault_db_password: !vault |
$ANSIBLE_VAULT;1.1;AES256
...
vault_api_key: !vault |
$ANSIBLE_VAULT;1.1;AES256
...
# Provide the password at run time
ansible-playbook site.yml --ask-vault-pass
# 使用密码文件(CI/CD 推荐)
ansible-playbook site.yml --vault-password-file ~/.vault_pass
# 多 Vault ID(不同密钥加密不同级别)
ansible-vault encrypt_string --vault-id prod@prompt 'secret' --name 'api_key'
ansible-vault encrypt_string --vault-id dev@dev_pass.txt 'devsecret' --name 'dev_key'
ansible-playbook site.yml --vault-id prod@prompt --vault-id dev@dev_pass.txt
8.3 CI/CD Integration
# GitLab CI 示例
deploy:
stage: deploy
script:
- echo "$VAULT_PASSWORD" > ~/.vault_pass
- chmod 600 ~/.vault_pass
- ansible-playbook -i inventory/production site.yml
--vault-password-file ~/.vault_pass
--limit "$TARGET_HOSTS"
variables:
ANSIBLE_HOST_KEY_CHECKING: "False"
only:
- main
# Using Credentials Binding in a Jenkins Pipeline
# Store the vault password in Jenkins Credentials and pass it in through an environment variable
echo "${ANSIBLE_VAULT_PASS}" > .vault_pass
ansible-playbook -i inventory site.yml --vault-password-file .vault_pass
rm -f .vault_pass
IX. Real-World Scenarios
9.1 System Initialization Playbook
---
# playbooks/init-server.yml
- name: 系统初始化
hosts: all
become: yes
gather_facts: yes
vars:
timezone: Asia/Shanghai
ssh_port: 22022
swap_size_mb: 2048
sysctl_params:
net.core.somaxconn: 65535
net.ipv4.tcp_max_syn_backlog: 65535
net.ipv4.ip_local_port_range: "1024 65535"
vm.swappiness: 10
fs.file-max: 655350
tasks:
- name: 设置时区
timezone:
name: "{{ timezone }}"
- name: 安装基础软件包
package:
name:
- vim
- git
- curl
- wget
- htop
- iotop
- net-tools
- lsof
- strace
- chrony
- bash-completion
state: present
- name: 配置 NTP
template:
src: chrony.conf.j2
dest: /etc/chrony.conf
notify: Restart Chrony
- name: 设置系统参数
sysctl:
name: "{{ item.key }}"
value: "{{ item.value }}"
state: present
reload: yes
sysctl_file: /etc/sysctl.d/99-ansible.conf
loop: "{{ sysctl_params | dict2items }}"
- name: 配置文件描述符限制
pam_limits:
domain: '*'
limit_type: "{{ item.type }}"
limit_item: nofile
value: "{{ item.value }}"
loop:
- { type: soft, value: 655350 }
- { type: hard, value: 655350 }
- name: 配置 SSH
lineinfile:
path: /etc/ssh/sshd_config
regexp: "{{ item.regexp }}"
line: "{{ item.line }}"
loop:
- { regexp: '^#?Port ', line: "Port {{ ssh_port }}" }
- { regexp: '^#?PermitRootLogin', line: 'PermitRootLogin no' }
- { regexp: '^#?PasswordAuthentication', line: 'PasswordAuthentication no' }
- { regexp: '^#?UseDNS', line: 'UseDNS no' }
notify: Restart SSH
- name: 创建 Swap 文件
block:
- name: 检查 Swap
command: swapon --show
register: swap_check
changed_when: false
- name: 创建 Swap
shell: >
dd if=/dev/zero of=/swapfile bs=1M count={{ swap_size_mb }}
&& chmod 600 /swapfile
&& mkswap /swapfile
&& swapon /swapfile
when: swap_check.stdout == ""
- name: 写入 fstab
lineinfile:
path: /etc/fstab
line: "/swapfile swap swap defaults 0 0"
when: swap_check.stdout == ""
handlers:
- name: Restart Chrony
service: { name: chronyd, state: restarted }
- name: Restart SSH
service: { name: sshd, state: restarted }
9.2 Batch Application Deployment
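The playbook in this section uses serial: "30%" to update hosts in batches. Ansible converts a percentage to a host count by rounding down, with at least one host per batch, so the three web hosts from the inventory are updated one at a time. The sketch below illustrates that arithmetic under the assumption that serial is a positive count or a percentage string; `batch_size` and `rolling_batches` are hypothetical helpers, not Ansible APIs.

```python
from typing import List, Sequence, Union


def batch_size(serial: Union[int, str], total_hosts: int) -> int:
    """Translate a serial value (count or 'N%') into hosts per batch."""
    if isinstance(serial, str) and serial.endswith("%"):
        # Percentages are rounded down to a whole number of hosts
        size = int(total_hosts * float(serial[:-1]) / 100)
    else:
        size = int(serial)
    # Assumption: serial is positive; a batch never drops below one host
    return max(size, 1)


def rolling_batches(hosts: Sequence[str], serial: Union[int, str]) -> List[List[str]]:
    """Split hosts into consecutive batches of the computed size."""
    size = batch_size(serial, len(hosts))
    return [list(hosts[i:i + size]) for i in range(0, len(hosts), size)]


if __name__ == "__main__":
    print(rolling_batches(["web01", "web02", "web03"], "30%"))
    ten_hosts = [f"web{i:02d}" for i in range(1, 11)]
    print([len(batch) for batch in rolling_batches(ten_hosts, "30%")])
```

With ten hosts the same setting yields batches of 3, 3, 3, and 1, which is worth checking against the capacity the load balancer needs to keep in service.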
---
# playbooks/deploy-app.yml
- name: 部署应用
hosts: web
become: yes
serial: "30%" # 滚动更新,每次 30%
max_fail_percentage: 10 # 失败超过 10% 停止
vars:
app_name: myapp
app_version: "3.2.1"
app_repo: "registry.example.com/{{ app_name }}"
deploy_dir: /opt/{{ app_name }}
release_dir: "{{ deploy_dir }}/releases/{{ app_version }}"
current_link: "{{ deploy_dir }}/current"
pre_tasks:
- name: 从负载均衡器摘除
uri:
url: "http://lb.internal/api/remove"
method: POST
body: '{"host": "{{ inventory_hostname }}"}'
body_format: json
delegate_to: localhost
tasks:
- name: 创建目录结构
file:
path: "{{ item }}"
state: directory
owner: deploy
mode: '0755'
loop:
- "{{ release_dir }}"
- "{{ deploy_dir }}/shared/config"
- "{{ deploy_dir }}/shared/logs"
- name: 拉取应用镜像
command: "docker pull {{ app_repo }}:{{ app_version }}"
- name: 部署配置文件
template:
src: templates/app.conf.j2
dest: "{{ deploy_dir }}/shared/config/app.conf"
owner: deploy
mode: '0640'
notify: Restart App
- name: 停止旧容器
docker_container:
name: "{{ app_name }}"
state: stopped
ignore_errors: yes
- name: 启动新容器
docker_container:
name: "{{ app_name }}"
image: "{{ app_repo }}:{{ app_version }}"
state: started
restart_policy: unless-stopped
ports:
- "{{ app_port }}:8080"
volumes:
- "{{ deploy_dir }}/shared/config:/app/config:ro"
- "{{ deploy_dir }}/shared/logs:/app/log"
env:
APP_ENV: "{{ app_env }}"
DB_HOST: "{{ db_host }}"
- name: 等待健康检查通过
uri:
url: "http://localhost:{{ app_port }}/health"
status_code: 200
register: health
until: health.status == 200
retries: 30
delay: 5
- name: 更新软链接
file:
src: "{{ release_dir }}"
dest: "{{ current_link }}"
state: link
owner: deploy
post_tasks:
- name: 重新加入负载均衡
uri:
url: "http://lb.internal/api/add"
method: POST
body: '{"host": "{{ inventory_hostname }}"}'
body_format: json
delegate_to: localhost
handlers:
- name: Restart App
docker_container:
name: "{{ app_name }}"
state: started
restart: yes
9.3 Security Hardening Playbook
---
# playbooks/hardening.yml
- name: 服务器安全加固
hosts: all
become: yes
tasks:
- name: 禁用不需要的服务
service:
name: "{{ item }}"
state: stopped
enabled: no
loop:
- cups
- avahi-daemon
- postfix
ignore_errors: yes
- name: 配置密码策略
lineinfile:
path: /etc/login.defs
regexp: "{{ item.regexp }}"
line: "{{ item.line }}"
loop:
- { regexp: '^PASS_MAX_DAYS', line: 'PASS_MAX_DAYS 90' }
- { regexp: '^PASS_MIN_DAYS', line: 'PASS_MIN_DAYS 7' }
- { regexp: '^PASS_MIN_LEN', line: 'PASS_MIN_LEN 12' }
- name: 锁定 root 账户
user:
name: root
password_lock: yes
- name: 配置 fail2ban
block:
- name: 安装 fail2ban
package:
name: fail2ban
state: present
- name: 配置 fail2ban
copy:
content: |
[DEFAULT]
bantime = 3600
findtime = 600
maxretry = 5
[sshd]
enabled = true
port = {{ ssh_port }}
logpath = /var/log/secure
dest: /etc/fail2ban/jail.local
notify: Restart fail2ban
- name: 启动 fail2ban
service:
name: fail2ban
state: started
enabled: yes
- name: 配置审计规则
copy:
content: |
-w /etc/passwd -p wa -k identity
-w /etc/shadow -p wa -k identity
-w /etc/sudoers -p wa -k sudoers
-w /var/log/ -p wa -k logs
dest: /etc/audit/rules.d/ansible.rules
notify: Restart auditd
handlers:
- name: Restart fail2ban
service: { name: fail2ban, state: restarted }
- name: Restart auditd
service: { name: auditd, state: restarted }
9.4 Configuration Management
---
# playbooks/config-management.yml
- name: 配置文件管理
hosts: all
become: yes
vars:
config_files:
- src: sshd_config.j2
dest: /etc/ssh/sshd_config
mode: '0600'
notify: Restart SSH
- src: sysctl.conf.j2
dest: /etc/sysctl.d/99-custom.conf
mode: '0644'
notify: Reload Sysctl
- src: logrotate.conf.j2
dest: /etc/logrotate.d/custom
mode: '0644'
tasks:
- name: 批量部署配置文件
template:
src: "templates/{{ item.src }}"
dest: "{{ item.dest }}"
owner: root
group: root
mode: "{{ item.mode }}"
validate: "{{ item.validate | default(omit) }}"
loop: "{{ config_files }}"
notify: "{{ item.notify | default(omit) }}"
- name: 检查配置语法
command: "{{ item.check_cmd }}"
loop: "{{ config_files }}"
when: item.check_cmd is defined
changed_when: false
handlers:
- name: Restart SSH
service: { name: sshd, state: restarted }
- name: Reload Sysctl
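command: sysctl --system
X. Best Practices
10.1 Idempotency
Idempotency means that running the same operation repeatedly leaves the system in the same state as running it once. It is a core design principle of Ansible.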
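The difference is easy to see outside Ansible as well. The sketch below contrasts a blind append, which changes the file on every run, with a check-then-write helper that behaves like a simplified lineinfile with state: present. Both function names are made up for this example, and the helper matches whole lines only.

```python
import tempfile
from pathlib import Path


def append_line(path: Path, line: str) -> None:
    """Not idempotent: every call adds another copy of the line."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")


def ensure_line(path: Path, line: str) -> bool:
    """Idempotent: add the line only if missing. Returns True on change."""
    existing = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    if line in existing:
        return False  # already in the desired state, nothing to do
    path.write_text("\n".join(existing + [line]) + "\n", encoding="utf-8")
    return True


if __name__ == "__main__":
    conf = Path(tempfile.mkdtemp()) / "app.conf"
    print(ensure_line(conf, "option=value"))  # first run changes the file
    print(ensure_line(conf, "option=value"))  # second run reports no change
```

The boolean return value plays the role of the changed status that Ansible reports for each task.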
# ❌ 错误写法:不是幂等的
- name: 添加配置行
shell: echo "option=value" >> /etc/app.conf
# ✅ 正确写法:幂等
- name: 添加配置行
lineinfile:
path: /etc/app.conf
line: "option=value"
state: present
# ❌ 错误写法:每次都执行
- name: 初始化数据库
shell: /opt/app/bin/init-db.sh
# ✅ 正确写法:检查后再执行
- name: 检查数据库是否已初始化
command: /opt/app/bin/check-db.sh
register: db_check
changed_when: false
failed_when: false
- name: 初始化数据库
command: /opt/app/bin/init-db.sh
when: db_check.rc != 0
10.2 Error Handling
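The block, rescue, and always keywords used in this section map closely onto try, except, and finally. A minimal Python analogy follows, assuming each section is a plain callable. `run_block` is a hypothetical helper, and real Ansible additionally tracks failure state per host.

```python
from typing import Callable, List


def run_block(
    block: Callable[[], None],
    rescue: Callable[[Exception], None],
    always: Callable[[], None],
) -> bool:
    """Run block; on failure run rescue; run always in either case.

    Returns True if the block succeeded without needing rescue.
    """
    try:
        block()
        return True
    except Exception as exc:
        rescue(exc)
        return False
    finally:
        always()


if __name__ == "__main__":
    log: List[str] = []

    def deploy() -> None:
        log.append("deploy")
        raise RuntimeError("health check failed")  # simulated failure

    run_block(deploy, lambda exc: log.append("rollback"), lambda: log.append("cleanup"))
    print(log)
```

As with rescue in a play, a failure handled this way does not abort the caller, so alerting inside the rescue step matters.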
tasks:
# ignore_errors — 忽略错误继续执行
- name: 尝试停止可能未运行的服务
service:
name: myapp
state: stopped
ignore_errors: yes
# block/rescue/always — 类似 try/catch/finally
- name: 带错误处理的部署流程
block:
- name: 部署新版本
command: deploy.sh
- name: Run the health check
uri:
url: "http://localhost/health"
status_code: 200
register: health
until: health.status == 200
retries: 10
delay: 5
rescue:
- name: 回滚到旧版本
command: rollback.sh
- name: 发送告警
slack:
token: "{{ slack_token }}"
msg: "部署失败,已回滚: {{ inventory_hostname }}"
channel: "#ops-alerts"
always:
- name: 清理构建产物
file:
path: /tmp/build
state: absent
# failed_when — 自定义失败条件
- name: 执行迁移脚本
shell: migrate.sh
register: migrate_result
failed_when:
- migrate_result.rc != 0
- "'already migrated' not in migrate_result.stderr"
# changed_when — 自定义变更判定
- name: 检查配置是否需要更新
shell: md5sum /etc/app.conf
register: config_hash
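changed_when: false
10.3 Performance Tuning
# Performance-related settings in ansible.cfg
[defaults]
# More parallel processes
forks = 50
# Gather facts only when they are not already cached
gathering = smart
# Cache facts as JSON files
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts_cache
# Keep cached facts for 24 hours
fact_caching_timeout = 86400
[ssh_connection]
# Enable pipelining (the key optimization)
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=600s
# Playbook-level optimization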
- name: 优化示例
hosts: web
gather_facts: no # skip automatic gathering; facts are collected on demand below
strategy: free # free 策略:不等待慢主机
tasks:
# 仅在需要时收集指定 Facts
- name: 收集网络信息
setup:
gather_subset:
- network
- hardware
when: need_network_facts | default(false)
# 使用 async 异步执行长任务
- name: 异步安装系统更新
yum:
name: "*"
state: latest
async: 600
poll: 0
register: yum_update
# ... 其他任务 ...
- name: 等待系统更新完成
async_status:
jid: "{{ yum_update.ansible_job_id }}"
register: job_result
until: job_result.finished
retries: 60
delay: 10
10.4 Debugging Tips
# 详细输出
ansible-playbook site.yml -v # 基本
ansible-playbook site.yml -vvv # 详细
ansible-playbook site.yml -vvvv # 连接级别调试
# 只检查不执行
ansible-playbook site.yml --check --diff
# 逐步确认
ansible-playbook site.yml --step
# 限制到特定主机
ansible-playbook site.yml --limit web01
# 从特定任务开始
ansible-playbook site.yml --start-at-task="Deploy Config"
# 列出所有任务
ansible-playbook site.yml --list-tasks
# 列出所有主机
ansible-playbook site.yml --list-hosts
# Performance profiling
ANSIBLE_CALLBACKS_ENABLED=timer,profile_tasks ansible-playbook site.yml
# Debugging tasks inside a playbook
- name: 打印所有变量
debug:
var: hostvars[inventory_hostname]
verbosity: 1 # 仅在 -v 时显示
- name: 打印特定信息
debug:
msg: |
主机名: {{ inventory_hostname }}
IP 地址: {{ ansible_default_ipv4.address }}
系统: {{ ansible_distribution }} {{ ansible_distribution_version }}
内存: {{ ansible_memtotal_mb }}MB
CPU: {{ ansible_processor_vcpus }} cores
# 使用 assert 进行断言检查
- name: 验证前置条件
assert:
that:
- ansible_memtotal_mb >= 4096
- ansible_distribution in ["CentOS", "Ubuntu", "Debian"]
- ansible_distribution_major_version | int >= 7
fail_msg: "主机不满足最低系统要求"
success_msg: "Prerequisite checks passed"
10.5 Recommended Project Layout
ansible-project/
├── ansible.cfg
├── site.yml                 # main entry point
├── requirements.yml         # Galaxy dependencies
├── inventory/
│   ├── production/
│   │   ├── hosts.yml
│   │   ├── group_vars/
│   │   │   ├── all.yml
│   │   │   ├── web.yml
│   │   │   └── vault.yml    # encrypted variables
│   │   └── host_vars/
│   └── staging/
│       ├── hosts.yml
│       └── group_vars/
├── playbooks/
│   ├── init-server.yml
│   ├── deploy-app.yml
│   └── hardening.yml
├── roles/
│   ├── common/
│   ├── nginx/
│   ├── mysql/
│   └── app/
├── templates/
├── files/
├── vars/
└── scripts/
    └── dynamic_inventory.py
Appendix: Command Quick Reference
💡 This document is updated as practical operations experience accumulates. Questions and suggestions are welcome.