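7.2 CI/CD Pipeline Design
🔄 Core goal: build automated continuous integration and continuous deployment pipelines that improve development efficiency and deployment quality
⏱️ Estimated time: 45 minutes
📊 Difficulty: ⭐⭐⭐⭐
🎯 Learning objectives
After completing this section, you will be able to:
- Design a complete CI/CD pipeline architecture
- Implement automated testing and quality checks
- Configure multi-environment deployment strategies
- Set up monitoring and rollback mechanisms
🏗️ CI/CD architecture design
Pipeline architecture diagram
🔧 GitHub Actions implementation
Basic CI/CD configuration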
```yaml
# .github/workflows/ci-cd.yml
name: MCP Server CI/CD

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}
  PYTHON_VERSION: "3.11"

jobs:
  # Code quality checks
  quality-check:
    runs-on: ubuntu-latest
    steps:
      - name: Check out code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ env.PYTHON_VERSION }}
      - name: Install dependencies
        run: |
          pip install --upgrade pip
          pip install poetry
          poetry install --with dev
      - name: Check formatting
        run: |
          poetry run black --check .
          poetry run isort --check-only .
      - name: Lint and type-check
        run: |
          poetry run flake8 .
          poetry run pylint src/
          poetry run mypy src/
      - name: Security scan
        run: |
          poetry run bandit -r src/
          poetry run safety check

  # Unit tests
  unit-tests:
    runs-on: ubuntu-latest
    needs: quality-check
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11", "3.12"]
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          pip install poetry
          poetry install --with dev
      - name: Run tests
        run: |
          poetry run pytest \
            --cov=src \
            --cov-report=xml \
            --cov-report=html \
            --junit-xml=test-results.xml
      - name: Upload coverage report
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage.xml
          flags: unittests
          name: codecov-umbrella

  # Integration tests
  integration-tests:
    runs-on: ubuntu-latest
    needs: unit-tests
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: testpass
          POSTGRES_DB: testdb
        # The job runs directly on the runner, so the service ports must be
        # published for localhost:5432 / localhost:6379 to be reachable.
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
      redis:
        image: redis:7
        ports:
          - 6379:6379
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ env.PYTHON_VERSION }}
      - name: Install dependencies
        run: |
          pip install poetry
          poetry install --with dev
      - name: Run integration tests
        env:
          DATABASE_URL: postgresql://postgres:testpass@localhost:5432/testdb
          REDIS_URL: redis://localhost:6379
        run: |
          poetry run pytest tests/integration/ -v

  # Build the Docker image
  build-image:
    runs-on: ubuntu-latest
    needs: [unit-tests, integration-tests]
    if: github.event_name == 'push'
    # GITHUB_TOKEN needs package write access to push to ghcr.io
    permissions:
      contents: read
      packages: write
    outputs:
      image-digest: ${{ steps.build.outputs.digest }}
      image-url: ${{ steps.image.outputs.image-url }}
    steps:
      - name: Check out code
        uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Log in to the container registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=sha,prefix={{branch}}-
            type=raw,value=latest,enable={{is_default_branch}}
      - name: Build and push image
        id: build
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          platforms: linux/amd64,linux/arm64
      - name: Export image reference
        id: image
        # build-push-action exposes a digest but no image URL, so build a
        # digest-pinned reference here (image names must be lowercase).
        run: |
          echo "image-url=${REGISTRY}/${IMAGE_NAME,,}@${{ steps.build.outputs.digest }}" >> "$GITHUB_OUTPUT"

  # Deploy to the staging environment
  deploy-staging:
    runs-on: ubuntu-latest
    needs: build-image
    if: github.ref == 'refs/heads/develop'
    environment:
      name: staging
      url: https://staging.example.com
    steps:
      - name: Check out code
        uses: actions/checkout@v4
      - name: Set up kubectl
        uses: azure/setup-kubectl@v3
        with:
          version: 'latest'
      - name: Deploy to Kubernetes
        run: |
          echo "${{ secrets.KUBECONFIG_STAGING }}" | base64 -d > /tmp/kubeconfig
          export KUBECONFIG=/tmp/kubeconfig
          # Update the image
          kubectl set image deployment/mcp-server \
            mcp-server=${{ needs.build-image.outputs.image-url }} \
            -n staging
          # Wait for the rollout to finish
          kubectl rollout status deployment/mcp-server -n staging --timeout=300s
      - name: Run smoke tests
        run: |
          # Give the service time to become ready
          sleep 30
          # Health check
          curl -f https://staging.example.com/health || exit 1
          # Basic API tests; this job has no dependencies installed yet.
          # --base-url assumes the pytest-base-url plugin is a dev dependency.
          pip install poetry
          poetry install --with dev
          poetry run pytest tests/smoke/ \
            --base-url=https://staging.example.com

  # Deploy to the production environment
  deploy-production:
    runs-on: ubuntu-latest
    # Depend only on build-image: deploy-staging is skipped on main, and a job
    # that needs a skipped job is skipped as well.
    needs: build-image
    if: github.ref == 'refs/heads/main'
    # The approval action opens an issue, which requires issue write access
    permissions:
      contents: read
      issues: write
    environment:
      name: production
      url: https://api.example.com
    steps:
      - name: Check out code
        uses: actions/checkout@v4
      - name: Wait for manual approval
        uses: trstringer/manual-approval@v1
        with:
          secret: ${{ github.token }}
          approvers: admin,devops-team
          minimum-approvals: 2
          issue-title: "Production deployment approval"
          issue-body: |
            Please approve the following deployment:
            - Branch: ${{ github.ref }}
            - Commit: ${{ github.sha }}
            - Image: ${{ needs.build-image.outputs.image-url }}
      - name: Blue-green deployment
        run: |
          echo "${{ secrets.KUBECONFIG_PRODUCTION }}" | base64 -d > /tmp/kubeconfig
          export KUBECONFIG=/tmp/kubeconfig
          # Create the new (green) deployment; envsubst only fills in
          # variables that are exported in this shell
          envsubst < k8s/deployment-green.yaml | kubectl apply -f -
          # Wait for the new version to become ready
          kubectl rollout status deployment/mcp-server-green -n production
          # Switch traffic
          kubectl patch service mcp-server -n production \
            -p '{"spec":{"selector":{"version":"green"}}}'
          # Verify the deployment
          sleep 60
          curl -f https://api.example.com/health || exit 1
      - name: Clean up the old version
        run: |
          # KUBECONFIG exported in the previous step does not carry over
          export KUBECONFIG=/tmp/kubeconfig
          # Keep the old version for 5 minutes to allow a quick rollback
          sleep 300
          kubectl delete deployment mcp-server-blue -n production --ignore-not-found
```
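The `needs` keys in the workflow form a dependency graph between jobs. As a rough illustration of how such a graph resolves into an execution order (this is not GitHub's scheduler), the sketch below groups the four CI jobs into "waves" that can start once their prerequisites have finished. The `NEEDS` dictionary is copied by hand from the workflow, and `execution_waves` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Job -> jobs it needs, hand-copied from the workflow (CI jobs only).
NEEDS = {
    "quality-check": [],
    "unit-tests": ["quality-check"],
    "integration-tests": ["unit-tests"],
    "build-image": ["unit-tests", "integration-tests"],
}


def execution_waves(needs):
    """Group jobs into waves; every job in a wave has all its needs satisfied."""
    sorter = TopologicalSorter(needs)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(NEEDS), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

With this graph every wave holds a single job, so the CI part of the pipeline is strictly sequential. If `integration-tests` only needed `quality-check`, it would land in the same wave as `unit-tests` and the two could run in parallel.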
🏭 Jenkins Pipeline implementation
Jenkinsfile configuration
```groovy
// Jenkinsfile
pipeline {
    agent any

    environment {
        DOCKER_REGISTRY = 'your-registry.com'
        IMAGE_NAME = 'mcp-server'
        PYTHON_VERSION = '3.11'
    }

    options {
        buildDiscarder(logRotator(numToKeepStr: '10'))
        timeout(time: 1, unit: 'HOURS')
        timestamps()
    }

    stages {
        stage('Prepare environment') {
            steps {
                // Clean the workspace, then check out the code
                cleanWs()
                checkout scm
                script {
                    // Build number plus the first 8 characters of the commit hash
                    env.BUILD_VERSION = "${env.BUILD_NUMBER}-${env.GIT_COMMIT[0..7]}"
                }
                echo "Build version: ${env.BUILD_VERSION}"
                // Create the virtualenv once, before the parallel stages use it.
                // The project's own dependencies must be installed here as well.
                sh '''
                    python -m venv venv
                    . venv/bin/activate
                    pip install --upgrade pip
                    pip install black isort flake8 bandit safety pytest pytest-cov
                '''
            }
        }

        stage('Code quality checks') {
            parallel {
                stage('Format check') {
                    steps {
                        sh '''
                            . venv/bin/activate
                            black --check .
                            isort --check-only .
                            flake8 .
                        '''
                    }
                }
                stage('Security scan') {
                    steps {
                        sh '''
                            . venv/bin/activate
                            bandit -r src/ -f json -o bandit-report.json
                            safety check --json --output safety-report.json
                        '''
                    }
                    post {
                        always {
                            archiveArtifacts artifacts: '*-report.json'
                        }
                    }
                }
            }
        }

        stage('Tests') {
            parallel {
                stage('Unit tests') {
                    steps {
                        sh '''
                            . venv/bin/activate
                            pytest tests/unit/ \
                                --cov=src \
                                --cov-report=xml \
                                --junit-xml=unit-test-results.xml
                        '''
                    }
                    post {
                        always {
                            // junit is the step provided by the JUnit plugin
                            junit 'unit-test-results.xml'
                            publishCoverage adapters: [
                                coberturaAdapter('coverage.xml')
                            ], sourceFileResolver: sourceFiles('STORE_LAST_BUILD')
                        }
                    }
                }
                stage('Integration tests') {
                    steps {
                        script {
                            // Publish the service ports and connect via localhost.
                            // Assumes the agent shares a network namespace with the
                            // Docker host; adjust the host name otherwise.
                            docker.image('postgres:15').withRun('-e POSTGRES_DB=testdb -e POSTGRES_PASSWORD=testpass -p 5432:5432') { postgres ->
                                docker.image('redis:7').withRun('-p 6379:6379') { redis ->
                                    sh '''
                                        . venv/bin/activate
                                        export DATABASE_URL="postgresql://postgres:testpass@localhost:5432/testdb"
                                        export REDIS_URL="redis://localhost:6379"
                                        pytest tests/integration/ \
                                            --junit-xml=integration-test-results.xml
                                    '''
                                }
                            }
                        }
                    }
                    post {
                        always {
                            junit 'integration-test-results.xml'
                        }
                    }
                }
            }
        }

        stage('Build image') {
            steps {
                script {
                    def image = docker.build("${env.DOCKER_REGISTRY}/${env.IMAGE_NAME}:${env.BUILD_VERSION}")
                    // Push the image
                    docker.withRegistry("https://${env.DOCKER_REGISTRY}", 'docker-registry-credentials') {
                        image.push()
                        image.push('latest')
                    }
                    env.IMAGE_TAG = "${env.DOCKER_REGISTRY}/${env.IMAGE_NAME}:${env.BUILD_VERSION}"
                }
            }
        }

        stage('Deploy to staging') {
            when {
                branch 'develop'
            }
            steps {
                script {
                    // Deploy to the staging environment
                    sh """
                        helm upgrade --install mcp-server-staging ./helm/mcp-server \\
                            --set image.tag=${env.BUILD_VERSION} \\
                            --set environment=staging \\
                            --namespace staging
                    """
                    // Wait for the rollout to finish
                    sh 'kubectl rollout status deployment/mcp-server-staging -n staging --timeout=300s'
                    // Run a smoke test
                    sh '''
                        sleep 30
                        curl -f http://mcp-server-staging.staging.svc.cluster.local/health
                    '''
                }
            }
        }

        stage('Production deployment approval') {
            when {
                branch 'main'
            }
            steps {
                script {
                    // With a single parameter, input() returns that parameter's value
                    def deployChoice = input(
                        id: 'deploy-approval',
                        message: 'Deploy to production?',
                        parameters: [
                            choice(
                                choices: ['Yes', 'No'],
                                description: 'Choose whether to continue with the deployment',
                                name: 'DEPLOY_CHOICE'
                            )
                        ]
                    )
                    if (deployChoice != 'Yes') {
                        error('Production deployment was cancelled by the user')
                    }
                }
            }
        }

        stage('Deploy to production') {
            when {
                branch 'main'
            }
            steps {
                script {
                    // Blue-green deployment
                    sh """
                        # Deploy the green version
                        helm upgrade --install mcp-server-green ./helm/mcp-server \\
                            --set image.tag=${env.BUILD_VERSION} \\
                            --set environment=production \\
                            --set service.selector.version=green \\
                            --namespace production
                        # Wait until it is ready
                        kubectl rollout status deployment/mcp-server-green -n production --timeout=600s
                        # Health check (assumes curl is available in the image)
                        kubectl exec -n production deployment/mcp-server-green -- curl -f http://localhost:8000/health
                        # Switch traffic
                        kubectl patch service mcp-server -n production \\
                            -p '{"spec":{"selector":{"version":"green"}}}'
                        # Verify production
                        sleep 60
                        curl -f https://api.example.com/health
                    """
                }
            }
            post {
                success {
                    script {
                        // Remove the blue version
                        sh '''
                            sleep 300  # keep it for 5 minutes to allow a quick rollback
                            helm uninstall mcp-server-blue -n production || true
                        '''
                    }
                }
                failure {
                    script {
                        // Automatic rollback: point the service back at blue
                        sh '''
                            kubectl patch service mcp-server -n production \\
                                -p '{"spec":{"selector":{"version":"blue"}}}'
                            helm uninstall mcp-server-green -n production || true
                        '''
                    }
                }
            }
        }
    }

    post {
        always {
            // Clean up
            sh 'docker system prune -f'
            cleanWs()
        }
        success {
            // Send a success notification
            slackSend(
                channel: '#deployments',
                color: 'good',
                message: "✅ Deployment succeeded: ${env.JOB_NAME} - ${env.BUILD_VERSION}"
            )
        }
        failure {
            // Send a failure notification
            slackSend(
                channel: '#deployments',
                color: 'danger',
                message: "❌ Deployment failed: ${env.JOB_NAME} - ${env.BUILD_VERSION}"
            )
        }
    }
}
```
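The Jenkinsfile derives a build version from the build number and the first eight characters of the commit hash, then reuses it as the image tag. A minimal Python sketch of that naming scheme follows; the function names `build_version` and `image_reference` are hypothetical and only mirror the string handling in the pipeline.

```python
def build_version(build_number: int, git_commit: str) -> str:
    """Combine the build number with the first 8 characters of the commit hash."""
    if len(git_commit) < 8:
        raise ValueError("expected a commit hash of at least 8 characters")
    return f"{build_number}-{git_commit[:8]}"


def image_reference(registry: str, image_name: str, version: str) -> str:
    """Build the full image reference that gets pushed and deployed."""
    return f"{registry}/{image_name}:{version}"


if __name__ == "__main__":
    version = build_version(42, "0123456789abcdef0123456789abcdef01234567")
    print(image_reference("your-registry.com", "mcp-server", version))
```

Tagging by build number and commit keeps every image traceable to one pipeline run, which is what makes a rollback to a specific earlier tag possible.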
🔍 GitLab CI/CD implementation
.gitlab-ci.yml configuration
```yaml
# .gitlab-ci.yml
stages:
  - test
  - build
  - deploy-staging
  - deploy-production

variables:
  DOCKER_DRIVER: overlay2
  DOCKER_TLS_CERTDIR: "/certs"
  IMAGE_TAG: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
  PYTHON_VERSION: "3.11"

# Shared template
.python_template: &python_template
  image: python:$PYTHON_VERSION
  before_script:
    - pip install --upgrade pip
    - pip install poetry
    # Create the virtualenv inside the project so the .venv/ cache is used
    - poetry config virtualenvs.in-project true
    - poetry install --with dev
  cache:
    paths:
      - .venv/

# Code quality checks
code_quality:
  <<: *python_template
  stage: test
  script:
    - poetry run black --check .
    - poetry run isort --check-only .
    - poetry run flake8 .
    - poetry run pylint src/
    - poetry run mypy src/
  artifacts:
    reports:
      # None of the commands above writes this file; it needs a tool that
      # emits a Code Climate style JSON report.
      codequality: codequality-report.json

# Security scan
security_scan:
  <<: *python_template
  stage: test
  script:
    - poetry run bandit -r src/ -f json -o bandit-report.json
    - poetry run safety check --json --output safety-report.json
  artifacts:
    reports:
      sast: bandit-report.json
    paths:
      - safety-report.json

# Unit tests
unit_tests:
  <<: *python_template
  stage: test
  script:
    - poetry run pytest tests/unit/
      --cov=src
      --cov-report=xml
      --cov-report=term
      --junit-xml=unit-test-report.xml
  artifacts:
    reports:
      junit: unit-test-report.xml
      coverage_report:
        coverage_format: cobertura
        path: coverage.xml
  coverage: '/TOTAL.+ ([0-9]{1,3}%)/'

# Integration tests
integration_tests:
  <<: *python_template
  stage: test
  services:
    - name: postgres:15
      alias: postgres
      variables:
        POSTGRES_DB: testdb
        POSTGRES_USER: postgres
        POSTGRES_PASSWORD: testpass
    - name: redis:7
      alias: redis
  variables:
    DATABASE_URL: "postgresql://postgres:testpass@postgres:5432/testdb"
    REDIS_URL: "redis://redis:6379"
  script:
    - poetry run pytest tests/integration/
      --junit-xml=integration-test-report.xml
  artifacts:
    reports:
      junit: integration-test-report.xml

# Build the Docker image
build_image:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  before_script:
    # Read the password from stdin so it does not appear in the process list
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
  script:
    - docker build -t $IMAGE_TAG .
    - docker push $IMAGE_TAG
    - docker tag $IMAGE_TAG $CI_REGISTRY_IMAGE:latest
    - docker push $CI_REGISTRY_IMAGE:latest
  only:
    - main
    - develop

# Deploy to the staging environment
deploy_staging:
  stage: deploy-staging
  # This image's entrypoint is kubectl itself, so override it to get a shell.
  # curl is assumed to be available in the image.
  image:
    name: bitnami/kubectl:latest
    entrypoint: [""]
  environment:
    name: staging
    url: https://staging.example.com
  before_script:
    - echo "$KUBECONFIG_STAGING" | base64 -d > /tmp/kubeconfig
    - export KUBECONFIG=/tmp/kubeconfig
  script:
    - kubectl set image deployment/mcp-server
      mcp-server=$IMAGE_TAG -n staging
    - kubectl rollout status deployment/mcp-server -n staging --timeout=300s
    - sleep 30
    - curl -f https://staging.example.com/health
  only:
    - develop

# Deploy to the production environment
deploy_production:
  stage: deploy-production
  image:
    name: bitnami/kubectl:latest
    entrypoint: [""]
  environment:
    name: production
    url: https://api.example.com
  before_script:
    - echo "$KUBECONFIG_PRODUCTION" | base64 -d > /tmp/kubeconfig
    - export KUBECONFIG=/tmp/kubeconfig
  script:
    # Blue-green deployment strategy
    - |
      # Find the currently active version
      CURRENT_VERSION=$(kubectl get service mcp-server -n production -o jsonpath='{.spec.selector.version}' || echo "blue")
      if [ "$CURRENT_VERSION" = "blue" ]; then
        NEW_VERSION="green"
        OLD_VERSION="blue"
      else
        NEW_VERSION="blue"
        OLD_VERSION="green"
      fi
      echo "Current version: $CURRENT_VERSION"
      echo "New version: $NEW_VERSION"
      # Deploy the new version. IMAGE_TAG contains slashes, so use "|" as the
      # sed delimiter instead of "/".
      sed -e "s|{{VERSION}}|$NEW_VERSION|g" \
          -e "s|{{IMAGE_TAG}}|$IMAGE_TAG|g" \
          k8s/deployment-template.yaml |
        kubectl apply -f -
      # Wait for the new version to become ready
      kubectl rollout status deployment/mcp-server-$NEW_VERSION -n production --timeout=600s
      # Health check
      kubectl exec -n production deployment/mcp-server-$NEW_VERSION -- curl -f http://localhost:8000/health
      # Switch traffic
      kubectl patch service mcp-server -n production -p "{\"spec\":{\"selector\":{\"version\":\"$NEW_VERSION\"}}}"
      # Verify production
      sleep 60
      curl -f https://api.example.com/health
      # Remove the old version after 5 minutes. This runs in the foreground:
      # a background process would be killed when the job ends.
      sleep 300
      kubectl delete deployment mcp-server-$OLD_VERSION -n production --ignore-not-found
  when: manual
  only:
    - main
```
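The production job alternates between two colors: it reads the version the service currently selects, deploys the other one, and patches the service selector. The sketch below restates that toggle in Python. `next_color` and `selector_patch` are hypothetical helper names, and treating a missing selector as "blue" is an assumption of this sketch rather than something the pipeline guarantees.

```python
import json
from typing import Optional, Tuple


def next_color(current: Optional[str]) -> Tuple[str, str]:
    """Return (new_version, old_version) for a blue-green switch.

    A missing or empty current version is treated as "blue" (an assumption),
    so the first deployment goes to green.
    """
    current = current or "blue"
    if current == "blue":
        return "green", "blue"
    return "blue", "green"


def selector_patch(version: str) -> str:
    """JSON patch body that points the Service selector at the given version."""
    return json.dumps({"spec": {"selector": {"version": version}}})


if __name__ == "__main__":
    new_version, old_version = next_color("blue")
    print(f"deploy {new_version}, then retire {old_version}")
    print(selector_patch(new_version))
```

Building the patch with `json.dumps` avoids the hand-escaped quotes in the shell version, which are easy to get wrong.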
📊 Quality gate configuration
SonarQube integration
```properties
# sonar-project.properties
sonar.projectKey=mcp-server
sonar.organization=your-org
sonar.host.url=https://sonarcloud.io

# Source paths
sonar.sources=src/
sonar.tests=tests/

# Python-specific settings
sonar.python.coverage.reportPaths=coverage.xml
sonar.python.xunit.reportPath=test-results.xml

# Quality gate: make the scanner wait for the gate result
sonar.qualitygate.wait=true

# Excluded files
sonar.exclusions=**/*.pyc,**/__pycache__/**,**/migrations/**
```
Quality check script
```python
#!/usr/bin/env python3
# scripts/quality_check.py
import subprocess
import sys
from pathlib import Path
from typing import Dict, List, Tuple


class QualityChecker:
    """Runs the project's code quality checks."""

    def __init__(self, project_root: Path):
        self.project_root = project_root
        self.results: Dict[str, bool] = {}

    def run_command(self, command: List[str]) -> Tuple[bool, str]:
        """Run a command and return (succeeded, combined output)."""
        try:
            result = subprocess.run(
                command,
                cwd=self.project_root,
                capture_output=True,
                text=True,
                check=True,
            )
            return True, result.stdout
        except subprocess.CalledProcessError as e:
            # Several tools (flake8, mypy, pytest) report on stdout, so return both
            return False, (e.stdout or "") + (e.stderr or "")

    def check_formatting(self) -> bool:
        """Check code formatting."""
        print("🔍 Checking code formatting...")
        # Black
        success, output = self.run_command(['black', '--check', '.'])
        if not success:
            print(f"❌ Black format check failed:\n{output}")
            return False
        # isort
        success, output = self.run_command(['isort', '--check-only', '.'])
        if not success:
            print(f"❌ isort import order check failed:\n{output}")
            return False
        print("✅ Formatting check passed")
        return True

    def check_linting(self) -> bool:
        """Run the linters."""
        print("🔍 Linting code...")
        # Flake8
        success, output = self.run_command(['flake8', '.'])
        if not success:
            print(f"❌ Flake8 check failed:\n{output}")
            return False
        # Pylint findings are reported but do not block
        success, output = self.run_command(['pylint', 'src/'])
        if not success:
            print(f"⚠️ Pylint warnings:\n{output}")
        print("✅ Lint check passed")
        return True

    def check_types(self) -> bool:
        """Check type annotations."""
        print("🔍 Checking type annotations...")
        success, output = self.run_command(['mypy', 'src/'])
        if not success:
            print(f"❌ MyPy type check failed:\n{output}")
            return False
        print("✅ Type check passed")
        return True

    def check_security(self) -> bool:
        """Run the security checks."""
        print("🔍 Running security checks...")
        # Bandit
        success, output = self.run_command(['bandit', '-r', 'src/'])
        if not success:
            print(f"❌ Bandit security check failed:\n{output}")
            return False
        # Safety
        success, output = self.run_command(['safety', 'check'])
        if not success:
            print(f"❌ Safety dependency check failed:\n{output}")
            return False
        print("✅ Security checks passed")
        return True

    def run_tests(self) -> bool:
        """Run the test suite."""
        print("🔍 Running the test suite...")
        success, output = self.run_command([
            'pytest',
            'tests/',
            '--cov=src',
            '--cov-report=term-missing',
            '--cov-fail-under=80',
        ])
        if not success:
            print(f"❌ Tests failed:\n{output}")
            return False
        print("✅ Tests passed")
        return True

    def run_all_checks(self) -> bool:
        """Run every check, continuing after failures."""
        checks = [
            ('formatting', self.check_formatting),
            ('linting', self.check_linting),
            ('types', self.check_types),
            ('security', self.check_security),
            ('tests', self.run_tests),
        ]
        all_passed = True
        for check_name, check_func in checks:
            try:
                result = check_func()
            except Exception as e:  # e.g. the tool is not installed
                print(f"❌ Error while running check {check_name}: {e}")
                result = False
            self.results[check_name] = result
            if not result:
                all_passed = False
        return all_passed

    def print_summary(self) -> bool:
        """Print a summary and return True if every check passed."""
        print("\n" + "=" * 50)
        print("📊 Quality check summary")
        print("=" * 50)
        for check_name, result in self.results.items():
            status = "✅ passed" if result else "❌ failed"
            print(f"{check_name.ljust(15)}: {status}")
        overall = all(self.results.values())
        status = "✅ all passed" if overall else "❌ problems found"
        print(f"\n{'overall'.ljust(15)}: {status}")
        if not overall:
            print("\n🚨 Please fix the problems above before committing!")
            return False
        print("\n🎉 Code quality looks good, ready to commit!")
        return True


def main():
    """Entry point."""
    project_root = Path(__file__).parent.parent
    checker = QualityChecker(project_root)
    print("🚀 Starting code quality checks...")
    success = checker.run_all_checks()
    checker.print_summary()
    sys.exit(0 if success else 1)


if __name__ == '__main__':
    main()
```
🚀 Deployment strategies
Blue-green deployment template
```yaml
# k8s/deployment-template.yaml
# {{VERSION}} and {{IMAGE_TAG}} are placeholders replaced before kubectl apply
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server-{{VERSION}}
  namespace: production
  labels:
    app: mcp-server
    version: {{VERSION}}
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mcp-server
      version: {{VERSION}}
  template:
    metadata:
      labels:
        app: mcp-server
        version: {{VERSION}}
    spec:
      containers:
        - name: mcp-server
          image: {{IMAGE_TAG}}
          ports:
            - containerPort: 8000
          env:
            - name: VERSION
              value: {{VERSION}}
            - name: ENVIRONMENT
              value: production
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 5
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"
```
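The template is not valid to apply until its `{{VERSION}}` and `{{IMAGE_TAG}}` placeholders are filled in. As an alternative to chained `sed` calls, the substitution can be sketched in Python as below. `render_template` is a hypothetical helper, and failing on an unknown placeholder is a design choice of this sketch, not something the original pipeline does.

```python
import re
from typing import Dict

# Matches placeholders such as {{VERSION}} or {{IMAGE_TAG}}
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_template(template: str, values: Dict[str, str]) -> str:
    """Replace every {{NAME}} placeholder, failing loudly if a value is missing."""

    def substitute(match: "re.Match[str]") -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for placeholder {key}")
        return values[key]

    # A replacement function keeps slashes or backslashes in values literal
    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    snippet = "name: mcp-server-{{VERSION}}\nimage: {{IMAGE_TAG}}\n"
    print(render_template(snippet, {
        "VERSION": "green",
        "IMAGE_TAG": "registry.example.com/mcp-server:abc123",  # example value
    }))
```

Unlike `sed` with a `/` delimiter, this approach is unaffected by the slashes that image references contain, and a forgotten value raises an error instead of leaving a literal `{{...}}` in the manifest.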
Canary deployment script
```bash
#!/bin/bash
# scripts/canary_deploy.sh
set -e

NAMESPACE=${1:-production}
IMAGE_TAG=${2:-latest}
CANARY_WEIGHT=${3:-10}

echo "🚀 Starting canary deployment"
echo "Namespace: $NAMESPACE"
echo "Image tag: $IMAGE_TAG"
echo "Traffic weight: $CANARY_WEIGHT%"

# Remove the canary deployment. The VirtualService created below still routes
# to the canary subset afterwards and has to be reset separately.
rollback() {
  kubectl delete deployment mcp-server-canary -n "$NAMESPACE"
}

# Deploy the canary version
echo "📦 Deploying the canary version..."
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server-canary
  namespace: $NAMESPACE
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mcp-server
      version: canary
  template:
    metadata:
      labels:
        app: mcp-server
        version: canary
    spec:
      containers:
        - name: mcp-server
          image: $IMAGE_TAG
          ports:
            - containerPort: 8000
          env:
            - name: VERSION
              value: canary
EOF

# Wait for the canary to become ready
echo "⏳ Waiting for the canary version to become ready..."
kubectl rollout status deployment/mcp-server-canary -n "$NAMESPACE"

# Configure traffic splitting.
# Assumes an Istio DestinationRule defining the "stable" and "canary" subsets
# already exists; it is not created by this script.
echo "🔀 Configuring traffic splitting..."
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: mcp-server
  namespace: $NAMESPACE
spec:
  hosts:
    - mcp-server
  http:
    - match:
        - headers:
            canary:
              exact: "true"
      route:
        - destination:
            host: mcp-server
            subset: canary
    - route:
        - destination:
            host: mcp-server
            subset: stable
          weight: $((100 - CANARY_WEIGHT))
        - destination:
            host: mcp-server
            subset: canary
          weight: $CANARY_WEIGHT
EOF

echo "✅ Canary deployment complete"
echo "💡 Monitor the deployment with:"
echo "  kubectl get pods -n $NAMESPACE -l version=canary"
echo "  kubectl logs -n $NAMESPACE -l version=canary -f"

# Monitoring loop
echo "📊 Monitoring the canary version..."
for i in {1..10}; do
  echo "Check $i/10..."
  # Health check
  if ! kubectl exec -n "$NAMESPACE" deployment/mcp-server-canary -- curl -f http://localhost:8000/health; then
    echo "❌ Canary health check failed, rolling back..."
    rollback
    exit 1
  fi
  # Error rate check (requires a monitoring system). The query is URL-encoded
  # by curl, and sum() collapses all matching series into one value.
  ERROR_RATE=$(curl -sG "http://prometheus:9090/api/v1/query" \
    --data-urlencode 'query=sum(rate(http_requests_total{job="mcp-server",status=~"5.."}[5m]))' \
    | jq -r '.data.result[0].value[1] // "0"')
  if (( $(echo "$ERROR_RATE > 0.01" | bc -l) )); then
    echo "❌ Canary error rate too high: $ERROR_RATE, rolling back..."
    rollback
    exit 1
  fi
  sleep 60
done

echo "✅ Canary version is healthy and can be promoted"
```
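The monitoring loop decides whether to roll back by reading an error rate out of a Prometheus instant-query response. The sketch below shows that decision in Python, assuming the standard response envelope of the Prometheus HTTP API (`status`, `data.result`, and a `[timestamp, "value"]` pair per series). `extract_error_rate` and `should_roll_back` are hypothetical names, and the 0.01 threshold is taken from the script.

```python
from typing import Any, Dict


def extract_error_rate(response: Dict[str, Any]) -> float:
    """Sum the sample values of an instant-query response (errors per second).

    An empty result means no matching series, which is treated as a rate of 0.
    """
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    series = response["data"]["result"]
    # Each sample is [unix_timestamp, "value-as-string"]
    return float(sum(float(item["value"][1]) for item in series))


def should_roll_back(error_rate: float, threshold: float = 0.01) -> bool:
    """Roll back when the error rate exceeds the threshold."""
    return error_rate > threshold


if __name__ == "__main__":
    sample = {
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [
                {"metric": {"status": "500"}, "value": [1700000000, "0.004"]},
                {"metric": {"status": "503"}, "value": [1700000000, "0.008"]},
            ],
        },
    }
    rate = extract_error_rate(sample)
    print(f"error rate: {rate:.3f}/s, roll back: {should_roll_back(rate)}")
```

Summing every returned series avoids the pitfall of reading only the first result, which can hide errors reported under a different status code or instance label.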
📈 Monitoring and alerting
Prometheus monitoring configuration
```yaml
# monitoring/prometheus-rules.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: mcp-server-alerts
spec:
  groups:
    - name: mcp-server
      rules:
        - alert: MCPServerDown
          expr: up{job="mcp-server"} == 0
          for: 1m
          labels:
            severity: critical
          annotations:
            summary: "MCP Server is down"
            description: "MCP Server has been down for more than 1 minute"
        - alert: MCPServerHighErrorRate
          expr: rate(http_requests_total{job="mcp-server",status=~"5.."}[5m]) > 0.01
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: "High error rate detected"
            description: "Error rate is {{ $value }} errors per second"
        - alert: MCPServerHighLatency
          expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{job="mcp-server"}[5m])) > 0.5
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High latency detected"
            description: "95th percentile latency is {{ $value }}s"
```
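The latency alert relies on `histogram_quantile`, which estimates a percentile from cumulative bucket counts by interpolating linearly inside the bucket where the target rank falls. The sketch below is a simplified re-implementation of that idea for illustration only; it is not Prometheus's code, edge-case behavior may differ, and the bucket values are made up.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from (upper_bound, cumulative_count) buckets.

    Buckets must be sorted by upper bound and end with a +Inf bucket.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("the last bucket must have an upper bound of +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound
                return buckets[-2][0] if len(buckets) > 1 else math.nan
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Linear interpolation inside the bucket
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return math.nan


if __name__ == "__main__":
    # 100 requests: 50 within 0.1s, 90 within 0.5s, all within 1.0s (example data)
    buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
    print(f"estimated p95: {histogram_quantile(0.95, buckets):.2f}s")
```

With these example buckets the estimate lands between 0.5 s and 1.0 s, so the `> 0.5` condition in the alert would hold. The estimate is only as precise as the bucket boundaries, which is why bucket layout matters when alerting on a latency threshold.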
Deployment notification script
```python
#!/usr/bin/env python3
# scripts/deployment_notification.py
import os
import smtplib
import sys
from datetime import datetime
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

import requests


def send_slack_notification(webhook_url: str, message: str, status: str):
    """Send a Slack notification through an incoming webhook."""
    color = "good" if status == "success" else "danger"
    payload = {
        "attachments": [{
            "color": color,
            "fields": [{
                "title": "Deployment notification",
                "value": message,
                "short": False,
            }],
            # Slack attachments take a Unix epoch timestamp in "ts"
            "ts": int(datetime.now().timestamp()),
        }]
    }
    response = requests.post(webhook_url, json=payload, timeout=10)
    response.raise_for_status()


def send_email_notification(smtp_config: dict, message: str, status: str):
    """Send an email notification."""
    msg = MIMEMultipart()
    msg['From'] = smtp_config['from']
    msg['To'] = smtp_config['to']
    msg['Subject'] = f"Deployment notification - {status.upper()}"
    msg.attach(MIMEText(message, 'plain'))
    # The context manager closes the connection even if sending fails
    with smtplib.SMTP(smtp_config['host'], smtp_config['port']) as server:
        server.starttls()
        server.login(smtp_config['user'], smtp_config['password'])
        server.send_message(msg)


def main():
    """Entry point."""
    if len(sys.argv) < 3:
        print("Usage: python deployment_notification.py <status> <message>")
        sys.exit(1)
    status = sys.argv[1]  # "success" or "failure"
    message = sys.argv[2]

    # Slack notification
    slack_webhook = os.getenv('SLACK_WEBHOOK_URL')
    if slack_webhook:
        try:
            send_slack_notification(slack_webhook, message, status)
            print("✅ Slack notification sent")
        except Exception as e:
            print(f"❌ Failed to send Slack notification: {e}")

    # Email notification, sent only when every SMTP setting is present
    smtp_config = {
        'host': os.getenv('SMTP_HOST'),
        'port': int(os.getenv('SMTP_PORT', 587)),
        'user': os.getenv('SMTP_USER'),
        'password': os.getenv('SMTP_PASSWORD'),
        'from': os.getenv('SMTP_FROM'),
        'to': os.getenv('SMTP_TO'),
    }
    if all(smtp_config.values()):
        try:
            send_email_notification(smtp_config, message, status)
            print("✅ Email notification sent")
        except Exception as e:
            print(f"❌ Failed to send email notification: {e}")


if __name__ == '__main__':
    main()
```
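🎯 Best practices
1. Pipeline design principles
- Fast feedback: surface problems as early as possible
- Automation: reduce manual intervention
- Reliability: keep deployments consistent
- Observability: complete logging and monitoring
2. Branching strategy
- main branch: production code, strictly protected
- develop branch: integration branch, automatically deployed to the staging environment
- feature branches: feature development, reviewed before merging
3. Testing strategy
- Unit tests: fast feedback, coverage above 80%
- Integration tests: verify that components work together
- End-to-end tests: verify complete workflows
- Performance tests: confirm the system meets its performance targets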
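The 80% coverage target can be enforced outside pytest as well, for example in a later pipeline step that reads the Cobertura-style `coverage.xml` produced by `--cov-report=xml`. A minimal sketch follows, assuming the report's root element carries a `line-rate` attribute between 0 and 1; `coverage_percent` and `meets_threshold` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def coverage_percent(cobertura_xml: str) -> float:
    """Read overall line coverage (in percent) from a Cobertura XML report."""
    root = ET.fromstring(cobertura_xml)
    line_rate = root.get("line-rate")
    if line_rate is None:
        raise ValueError("report has no line-rate attribute on its root element")
    return float(line_rate) * 100


def meets_threshold(cobertura_xml: str, minimum: float = 80.0) -> bool:
    """Return True when line coverage reaches the minimum percentage."""
    return coverage_percent(cobertura_xml) >= minimum


if __name__ == "__main__":
    # Trimmed example report; a real coverage.xml contains package details
    report = '<coverage line-rate="0.85" branch-rate="0"></coverage>'
    print(f"coverage: {coverage_percent(report):.1f}%")
    print(f"gate passed: {meets_threshold(report)}")
```

Reading the report file keeps the gate independent of the test runner, so the same check can run in GitHub Actions, Jenkins, or GitLab.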
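4. Choosing a deployment strategy
- Blue-green deployment: zero downtime, fast rollback
- Rolling deployment: efficient resource usage
- Canary deployment: controlled risk, gradual release
🚀 Next steps
After completing the CI/CD pipeline design, you can:
- Learn about monitoring and logging systems → 7.3 Monitoring and Logging Systems
- Explore performance tuning techniques → 7.4 Performance Tuning and Scaling
- Master troubleshooting methods → 7.5 Troubleshooting and Maintenance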