
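7.2 CI/CD Pipeline Design

🔄 Core goal: build automated continuous integration and continuous deployment pipelines that improve development efficiency and deployment quality
⏱️ Estimated time: 45 minutes
📊 Difficulty: ⭐⭐⭐⭐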

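🎯 Learning Objectives

After completing this section, you will be able to:

  • Design a complete CI/CD pipeline architecture
  • Implement automated testing and quality checks
  • Configure multi-environment deployment strategies
  • Set up monitoring and rollback mechanisms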

🏗️ CI/CD Architecture Design

Pipeline Architecture Diagram (figure not available)
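Since the diagram itself is missing, the job graph can be sketched as data instead. The snippet below is a hypothetical model (not a GitHub API) of a workflow shaped like the GitHub Actions example in this section, for push events only: each job lists the jobs it `needs` plus an optional branch condition. It approximates GitHub's documented default rule that a job is skipped when any job in its `needs` list failed or was skipped, unless its `if` uses a status function such as `always()` (not modeled here).

```python
from graphlib import TopologicalSorter

# Hypothetical model of the workflow: job -> (needs, branch condition or None for any ref)
JOBS = {
    "quality-check": ([], None),
    "unit-tests": (["quality-check"], None),
    "integration-tests": (["unit-tests"], None),
    "build-image": (["unit-tests", "integration-tests"], None),
    "deploy-staging": (["build-image"], "refs/heads/develop"),
    "deploy-production": (["build-image"], "refs/heads/main"),
}


def plan(jobs, ref):
    """Return {job: "run" | "skipped"} for a push to `ref`.

    A job runs only if its branch condition holds and every job it needs ran.
    """
    graph = {name: set(needs) for name, (needs, _) in jobs.items()}
    result = {}
    # static_order() yields every job after all of the jobs it depends on
    for name in TopologicalSorter(graph).static_order():
        needs, branch = jobs[name]
        deps_ok = all(result[dep] == "run" for dep in needs)
        cond_ok = branch is None or branch == ref
        result[name] = "run" if deps_ok and cond_ok else "skipped"
    return result


for ref in ("refs/heads/develop", "refs/heads/main"):
    print(ref, plan(JOBS, ref))
```

Changing the production entry to `(["build-image", "deploy-staging"], "refs/heads/main")` makes `plan(JOBS, "refs/heads/main")` report it as skipped, because the staging job only runs on `develop`.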

🔧 GitHub Actions Implementation

Basic CI/CD Configuration

yaml
# .github/workflows/ci-cd.yml
name: MCP Server CI/CD

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}
  PYTHON_VERSION: "3.11"

jobs:
  # Code quality checks
  quality-check:
    runs-on: ubuntu-latest
    steps:
      - name: Check out code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ env.PYTHON_VERSION }}

      - name: Install dependencies
        run: |
          pip install --upgrade pip
          pip install poetry
          poetry install --with dev

      - name: Check code formatting
        run: |
          poetry run black --check .
          poetry run isort --check-only .

      - name: Run linters and type checks
        run: |
          poetry run flake8 .
          poetry run pylint src/
          poetry run mypy src/

      - name: Run security scans
        run: |
          poetry run bandit -r src/
          poetry run safety check

  # Unit tests
  unit-tests:
    runs-on: ubuntu-latest
    needs: quality-check
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11", "3.12"]
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install dependencies
        run: |
          pip install poetry
          poetry install --with dev

      - name: Run tests
        run: |
          poetry run pytest \
            --cov=src \
            --cov-report=xml \
            --cov-report=html \
            --junit-xml=test-results.xml

      - name: Upload coverage report
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage.xml
          flags: unittests
          name: codecov-umbrella

  # Integration tests
  integration-tests:
    runs-on: ubuntu-latest
    needs: unit-tests
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: testpass
          POSTGRES_DB: testdb
        # The job runs directly on the runner, so map the port for localhost:5432 to work
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

      redis:
        image: redis:7
        ports:
          - 6379:6379
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ env.PYTHON_VERSION }}

      - name: Install dependencies
        run: |
          pip install poetry
          poetry install --with dev

      - name: Run integration tests
        env:
          DATABASE_URL: postgresql://postgres:testpass@localhost:5432/testdb
          REDIS_URL: redis://localhost:6379
        run: |
          poetry run pytest tests/integration/ -v

  # Build the Docker image
  build-image:
    runs-on: ubuntu-latest
    needs: [unit-tests, integration-tests]
    if: github.event_name == 'push'
    
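    outputs:
      image-digest: ${{ steps.build.outputs.digest }}
      # build-push-action has no image-url output, so build a digest-pinned reference instead
      # (registry references must be lowercase; this assumes the repository name already is)
      image-url: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ steps.build.outputs.digest }}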

    steps:
      - name: Check out code
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to the container registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=sha,prefix={{branch}}-
            type=raw,value=latest,enable={{is_default_branch}}

      - name: Build and push the image
        id: build
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          platforms: linux/amd64,linux/arm64

  # Deploy to staging
  deploy-staging:
    runs-on: ubuntu-latest
    needs: build-image
    if: github.ref == 'refs/heads/develop'
    environment:
      name: staging
      url: https://staging.example.com

    steps:
      - name: Check out code
        uses: actions/checkout@v4

      - name: Set up kubectl
        uses: azure/setup-kubectl@v3
        with:
          version: 'latest'

      - name: Deploy to Kubernetes
        run: |
          echo "${{ secrets.KUBECONFIG_STAGING }}" | base64 -d > /tmp/kubeconfig
          export KUBECONFIG=/tmp/kubeconfig
          
          # Update the image
          kubectl set image deployment/mcp-server \
            mcp-server=${{ needs.build-image.outputs.image-url }} \
            -n staging
          
          # Wait for the rollout to finish
          kubectl rollout status deployment/mcp-server -n staging --timeout=300s

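      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ env.PYTHON_VERSION }}

      - name: Run smoke tests
        run: |
          # Give the service time to become ready
          sleep 30
          
          # Health check
          curl -f https://staging.example.com/health || exit 1
          
          # Basic API tests; this job starts on a clean runner, so install dependencies first
          # (--base-url comes from the pytest-base-url plugin, assumed to be a dev dependency)
          pip install poetry
          poetry install --with dev
          poetry run pytest tests/smoke/ \
            --base-url=https://staging.example.com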

  # Deploy to production
  deploy-production:
    runs-on: ubuntu-latest
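    # Depend only on build-image: deploy-staging runs only on develop, so it is skipped on main,
    # and a skipped job in needs would cause this job to be skipped as well
    needs: build-image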
    if: github.ref == 'refs/heads/main'
    environment:
      name: production
      url: https://api.example.com

    steps:
      - name: Check out code
        uses: actions/checkout@v4

      - name: Manual approval
        uses: trstringer/manual-approval@v1
        with:
          secret: ${{ github.TOKEN }}
          approvers: admin,devops-team
          minimum-approvals: 2
          issue-title: "Production deployment approval"
          issue-body: |
            Please approve the following deployment:
            - 分支: ${{ github.ref }}
            - 提交: ${{ github.sha }}
            - 镜像: ${{ needs.build-image.outputs.image-url }}

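      - name: Blue-green deployment
        run: |
          echo "${{ secrets.KUBECONFIG_PRODUCTION }}" | base64 -d > /tmp/kubeconfig
          export KUBECONFIG=/tmp/kubeconfig
          
          # Create the new (green) deployment; envsubst fills in any ${VAR} references
          # in the manifest, so those variables must be set in this step's environment
          envsubst < k8s/deployment-green.yaml | kubectl apply -f -
          
          # Wait for the new version to become ready
          kubectl rollout status deployment/mcp-server-green -n production
          
          # Switch traffic
          kubectl patch service mcp-server -n production \
            -p '{"spec":{"selector":{"version":"green"}}}'
          
          # Verify the deployment
          sleep 60
          curl -f https://api.example.com/health || exit 1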

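      - name: Clean up the old version
        run: |
          # Environment variables do not carry over between steps, so point kubectl at the kubeconfig again
          export KUBECONFIG=/tmp/kubeconfig
          # Keep the old version for 5 minutes to allow a fast rollback
          sleep 300
          kubectl delete deployment mcp-server-blue -n production --ignore-not-found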
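The `docker/metadata-action` tag rules in the workflow above decide which tags each push produces. The function below is a rough, hypothetical approximation for push events, written only to make those rules concrete; it is not the action's implementation. It assumes a 7-character short SHA, that characters outside `[A-Za-z0-9_.-]` in branch names become `-`, and that the image name is lowercased; check the action's documentation for the exact behavior.

```python
import re


def image_tags(registry, repository, branch, sha, default_branch="main"):
    """Approximate the tags produced for a push by the metadata-action rules above.

    Assumptions (not the action's real code): branch names are sanitized by
    replacing characters outside [A-Za-z0-9_.-] with '-', the short SHA is 7
    characters long, and the image name is lowercased.
    """
    safe_branch = re.sub(r"[^A-Za-z0-9_.-]", "-", branch)
    tags = [
        safe_branch,                  # type=ref,event=branch
        f"{safe_branch}-{sha[:7]}",   # type=sha,prefix={{branch}}-
    ]
    if branch == default_branch:      # type=raw,value=latest,enable={{is_default_branch}}
        tags.append("latest")
    # type=ref,event=pr contributes nothing on push events
    image = f"{registry}/{repository.lower()}"
    return [f"{image}:{tag}" for tag in tags]


print(image_tags("ghcr.io", "Example/MCP-Server", "feature/login", "0123456789abcdef"))
```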

🏭 Jenkins Pipeline Implementation

Jenkinsfile Configuration

groovy
// Jenkinsfile
pipeline {
    agent any
    
    environment {
        DOCKER_REGISTRY = 'your-registry.com'
        IMAGE_NAME = 'mcp-server'
        PYTHON_VERSION = '3.11'
    }
    
    options {
        buildDiscarder(logRotator(numToKeepStr: '10'))
        timeout(time: 1, unit: 'HOURS')
        timestamps()
    }
    
    stages {
        stage('Prepare environment') {
            steps {
                script {
                    // Build number plus the first 8 characters of the commit hash
                    // (the Groovy range [0..7] is inclusive)
                    env.BUILD_VERSION = "${env.BUILD_NUMBER}-${env.GIT_COMMIT[0..7]}"
                }
                
                echo "Build version: ${env.BUILD_VERSION}"
                
                // Clean the workspace
                cleanWs()
                
                // Check out the code
                checkout scm
                
                // Create one shared virtualenv here, before the parallel stages that use it;
                // creating it inside one parallel branch races with the other branches.
                // Assumption: the project itself is pip-installable (e.g. via its pyproject.toml)
                sh '''
                    python3 -m venv venv
                    . venv/bin/activate
                    pip install --upgrade pip
                    pip install black isort flake8 bandit safety pytest pytest-cov
                    pip install -e .
                '''
            }
        }
        
        stage('Code quality') {
            parallel {
                stage('Format check') {
                    steps {
                        sh '''
                            . venv/bin/activate
                            black --check .
                            isort --check-only .
                            # flake8 does not skip venv/ by default
                            flake8 --extend-exclude venv .
                        '''
                    }
                }
                
                stage('Security scan') {
                    steps {
                        sh '''
                            . venv/bin/activate
                            bandit -r src/ -f json -o bandit-report.json
                            safety check --json > safety-report.json
                        '''
                    }
                    
                    post {
                        always {
                            archiveArtifacts artifacts: '*-report.json', allowEmptyArchive: true
                        }
                    }
                }
            }
        }
        
        stage('Tests') {
            parallel {
                stage('Unit tests') {
                    steps {
                        sh '''
                            . venv/bin/activate
                            pytest tests/unit/ \
                                --cov=src \
                                --cov-report=xml \
                                --junit-xml=unit-test-results.xml
                        '''
                    }
                    
                    post {
                        always {
                            // junit is the JUnit plugin's step for publishing test reports
                            junit 'unit-test-results.xml'
                            publishCoverage adapters: [
                                coberturaAdapter('coverage.xml')
                            ], sourceFileResolver: sourceFiles('STORE_LAST_BUILD')
                        }
                    }
                }
                
                stage('Integration tests') {
                    steps {
                        script {
                            // withRun does not set legacy Docker-link variables such as
                            // POSTGRES_PORT_5432_TCP_ADDR, so publish the ports on the agent instead
                            docker.image('postgres:15').withRun('-e POSTGRES_DB=testdb -e POSTGRES_PASSWORD=testpass -p 5432:5432') { postgres ->
                                docker.image('redis:7').withRun('-p 6379:6379') { redis ->
                                    // Wait until PostgreSQL accepts connections
                                    sh "timeout 60 sh -c 'until docker exec ${postgres.id} pg_isready -U postgres; do sleep 2; done'"
                                    sh '''
                                        . venv/bin/activate
                                        
                                        export DATABASE_URL="postgresql://postgres:testpass@localhost:5432/testdb"
                                        export REDIS_URL="redis://localhost:6379"
                                        
                                        pytest tests/integration/ \
                                            --junit-xml=integration-test-results.xml
                                    '''
                                }
                            }
                        }
                    }
                    
                    post {
                        always {
                            junit 'integration-test-results.xml'
                        }
                    }
                }
            }
        }
        
        stage('Build image') {
            steps {
                script {
                    def image = docker.build("${env.DOCKER_REGISTRY}/${env.IMAGE_NAME}:${env.BUILD_VERSION}")
                    
                    // Push the image
                    docker.withRegistry("https://${env.DOCKER_REGISTRY}", 'docker-registry-credentials') {
                        image.push()
                        image.push('latest')
                    }
                    
                    env.IMAGE_TAG = "${env.DOCKER_REGISTRY}/${env.IMAGE_NAME}:${env.BUILD_VERSION}"
                }
            }
        }
        
        stage('Deploy to staging') {
            when {
                branch 'develop'
            }
            
            steps {
                script {
                    // Deploy to the staging environment
                    sh """
                        helm upgrade --install mcp-server-staging ./helm/mcp-server \\
                            --set image.tag=${env.BUILD_VERSION} \\
                            --set environment=staging \\
                            --namespace staging
                    """
                    
                    // Wait for the rollout to finish
                    sh 'kubectl rollout status deployment/mcp-server-staging -n staging --timeout=300s'
                    
                    // Run a smoke test (the cluster-local URL is only reachable from an agent inside the cluster)
                    sh '''
                        sleep 30
                        curl -f http://mcp-server-staging.staging.svc.cluster.local/health
                    '''
                }
            }
        }
        
        stage('Production approval') {
            when {
                branch 'main'
            }
            
            steps {
                script {
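                    // With a single parameter, input() returns that parameter's value directly
                    def deployApproved = input(
                        id: 'deploy-approval',
                        message: 'Deploy to production?',
                        parameters: [
                            choice(
                                choices: ['Yes', 'No'],
                                description: 'Choose whether to continue the deployment',
                                name: 'DEPLOY_CHOICE'
                            )
                        ]
                    )
                    
                    if (deployApproved != 'Yes') {
                        error('Production deployment cancelled by user')
                    }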
                }
            }
        }
        
        stage('Production deployment') {
            when {
                branch 'main'
            }
            
            steps {
                script {
                    // Blue-green deployment
                    sh """
                        # Deploy the green version
                        helm upgrade --install mcp-server-green ./helm/mcp-server \\
                            --set image.tag=${env.BUILD_VERSION} \\
                            --set environment=production \\
                            --set service.selector.version=green \\
                            --namespace production
                        
                        # Wait until it is ready
                        kubectl rollout status deployment/mcp-server-green -n production --timeout=600s
                        
                        # Health check from inside the new pods
                        kubectl exec -n production deployment/mcp-server-green -- curl -f http://localhost:8000/health
                        
                        # Switch traffic
                        kubectl patch service mcp-server -n production \\
                            -p '{"spec":{"selector":{"version":"green"}}}'
                        
                        # Verify production
                        sleep 60
                        curl -f https://api.example.com/health
                    """
                }
            }
            
            post {
                success {
                    script {
                        // Remove the blue release
                        sh '''
                            sleep 300  # keep it for 5 minutes to allow a fast rollback
                            helm uninstall mcp-server-blue -n production || true
                        '''
                    }
                }
                
                failure {
                    script {
                        // Automatic rollback
                        sh '''
                            kubectl patch service mcp-server -n production \\
                                -p '{"spec":{"selector":{"version":"blue"}}}'
                            
                            helm uninstall mcp-server-green -n production || true
                        '''
                    }
                }
            }
        }
    }
    
    post {
        always {
            // Clean up
            sh 'docker system prune -f'
            cleanWs()
        }
        
        success {
            // Send a success notification
            slackSend(
                channel: '#deployments',
                color: 'good',
                message: "✅ Deployment succeeded: ${env.JOB_NAME} - ${env.BUILD_VERSION}"
            )
        }
        
        failure {
            // Send a failure notification
            slackSend(
                channel: '#deployments',
                color: 'danger',
                message: "❌ Deployment failed: ${env.JOB_NAME} - ${env.BUILD_VERSION}"
            )
        }
    }
}
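The Jenkinsfile's `BUILD_VERSION` joins the build number with a commit prefix. Groovy's `GIT_COMMIT[0..7]` range is inclusive at both ends, so it yields 8 characters, the same as the Python slice `[:8]`. A small sketch of that tag format (the function name is hypothetical) also checks the result against Docker's tag rules: at most 128 characters from `[A-Za-z0-9_.-]`, not starting with `.` or `-`.

```python
import re

# Docker tag grammar: up to 128 chars of [A-Za-z0-9_.-], not starting with '.' or '-'
TAG_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def build_version(build_number: int, git_commit: str) -> str:
    """Mirror "${BUILD_NUMBER}-${GIT_COMMIT[0..7]}": Groovy's 0..7 is inclusive, i.e. 8 chars."""
    version = f"{build_number}-{git_commit[:8]}"
    if not TAG_RE.fullmatch(version):
        raise ValueError(f"not a valid Docker tag: {version!r}")
    return version


print(build_version(42, "3f9c2e1a7b6d5c4e3f2a1b0c9d8e7f6a5b4c3d2e"))
```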

🔍 GitLab CI/CD Implementation

.gitlab-ci.yml Configuration

yaml
# .gitlab-ci.yml
stages:
  - test
  - build
  - deploy-staging
  - deploy-production

variables:
  DOCKER_DRIVER: overlay2
  DOCKER_TLS_CERTDIR: "/certs"
  IMAGE_TAG: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
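  PYTHON_VERSION: "3.11"
  # Create the virtualenv in ./.venv so that the cached .venv/ path actually contains it
  POETRY_VIRTUALENVS_IN_PROJECT: "true"

# Job template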
.python_template: &python_template
  image: python:$PYTHON_VERSION
  before_script:
    - pip install --upgrade pip
    - pip install poetry
    - poetry install --with dev
  cache:
    paths:
      - .venv/

# Code quality checks
code_quality:
  <<: *python_template
  stage: test
  script:
    - poetry run black --check .
    - poetry run isort --check-only .
    - poetry run flake8 .
    - poetry run pylint src/
    - poetry run mypy src/
  artifacts:
    reports:
      codequality: codequality-report.json

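# Security scan
security_scan:
  <<: *python_template
  stage: test
  script:
    - poetry run bandit -r src/ -f json -o bandit-report.json
    - poetry run safety check --json > safety-report.json
  artifacts:
    # Keep the reports even when a scan fails the job.
    # Bandit's JSON does not follow GitLab's SAST report schema, so store it as a plain artifact
    when: always
    paths:
      - bandit-report.json
      - safety-report.json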

# Unit tests
unit_tests:
  <<: *python_template
  stage: test
  script:
    - poetry run pytest tests/unit/ 
        --cov=src 
        --cov-report=xml 
        --cov-report=term
        --junit-xml=unit-test-report.xml
  artifacts:
    reports:
      junit: unit-test-report.xml
      coverage_report:
        coverage_format: cobertura
        path: coverage.xml
  coverage: '/TOTAL.+ ([0-9]{1,3}%)/'

# Integration tests
integration_tests:
  <<: *python_template
  stage: test
  services:
    - name: postgres:15
      alias: postgres
      variables:
        POSTGRES_DB: testdb
        POSTGRES_USER: postgres
        POSTGRES_PASSWORD: testpass
    - name: redis:7
      alias: redis
  variables:
    DATABASE_URL: "postgresql://postgres:testpass@postgres:5432/testdb"
    REDIS_URL: "redis://redis:6379"
  script:
    - poetry run pytest tests/integration/ 
        --junit-xml=integration-test-report.xml
  artifacts:
    reports:
      junit: integration-test-report.xml

# Build the Docker image
build_image:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  before_script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
  script:
    - docker build -t $IMAGE_TAG .
    - docker push $IMAGE_TAG
    - docker tag $IMAGE_TAG $CI_REGISTRY_IMAGE:latest
    - docker push $CI_REGISTRY_IMAGE:latest
  only:
    - main
    - develop

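# Deploy to staging
deploy_staging:
  stage: deploy-staging
  # bitnami/kubectl sets kubectl as its entrypoint; clear it so GitLab can run the job shell.
  # The script below also relies on curl being available in the image
  image:
    name: bitnami/kubectl:latest
    entrypoint: [""]
  environment:
    name: staging
    url: https://staging.example.com
  before_script:
    - echo "$KUBECONFIG_STAGING" | base64 -d > /tmp/kubeconfig
    - export KUBECONFIG=/tmp/kubeconfig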
  script:
    - kubectl set image deployment/mcp-server 
        mcp-server=$IMAGE_TAG -n staging
    - kubectl rollout status deployment/mcp-server -n staging --timeout=300s
    - sleep 30
    - curl -f https://staging.example.com/health
  only:
    - develop

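# Production deployment
deploy_production:
  stage: deploy-production
  # bitnami/kubectl sets kubectl as its entrypoint; clear it so GitLab can run the job shell.
  # The script below also relies on envsubst and curl being available in the image
  image:
    name: bitnami/kubectl:latest
    entrypoint: [""]
  environment:
    name: production
    url: https://api.example.com
  before_script:
    - echo "$KUBECONFIG_PRODUCTION" | base64 -d > /tmp/kubeconfig
    - export KUBECONFIG=/tmp/kubeconfig
  script:
    # Blue-green deployment strategy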
    - |
      # Determine the currently active color (default to blue when none is set)
      CURRENT_VERSION=$(kubectl get service mcp-server -n production -o jsonpath='{.spec.selector.version}' || true)
      CURRENT_VERSION=${CURRENT_VERSION:-blue}
      
      if [ "$CURRENT_VERSION" = "blue" ]; then
        NEW_VERSION="green"
        OLD_VERSION="blue"
      else
        NEW_VERSION="blue"
        OLD_VERSION="green"
      fi
      
      echo "Current version: $CURRENT_VERSION"
      echo "New version: $NEW_VERSION"
      
      # Deploy the new version (| is the sed delimiter because IMAGE_TAG contains slashes)
      envsubst < k8s/deployment-template.yaml |
        sed "s|{{VERSION}}|$NEW_VERSION|g" |
        sed "s|{{IMAGE_TAG}}|$IMAGE_TAG|g" |
        kubectl apply -f -
      
      # Wait for the new version to become ready
      kubectl rollout status deployment/mcp-server-$NEW_VERSION -n production --timeout=600s
      
      # Health check
      kubectl exec -n production deployment/mcp-server-$NEW_VERSION -- curl -f http://localhost:8000/health
      
      # Switch traffic
      kubectl patch service mcp-server -n production -p "{\"spec\":{\"selector\":{\"version\":\"$NEW_VERSION\"}}}"
      
      # Verify production
      sleep 60
      curl -f https://api.example.com/health
      
      # Remove the old version after 5 minutes. This runs in the foreground because
      # background processes are killed when the job script finishes
      sleep 300
      kubectl delete deployment mcp-server-$OLD_VERSION -n production --ignore-not-found
  when: manual
  only:
    - main
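The color selection and template rendering in the production job can be restated as small pure functions, which makes the logic easy to unit-test. This is a hypothetical Python version (function names are illustrative, not part of any tool): it picks the idle color, fills the `{{VERSION}}` and `{{IMAGE_TAG}}` placeholders by plain string replacement, and builds the Service selector patch. Plain replacement also avoids having to pick a `sed` delimiter that cannot appear in an image reference, which always contains `/`.

```python
import json


def next_color(current):
    """Return (new, old): deploy to the idle color, treating a missing selector as blue."""
    current = current or "blue"
    return ("green", "blue") if current == "blue" else ("blue", "green")


def render(template, version, image_tag):
    """Fill the {{VERSION}} and {{IMAGE_TAG}} placeholders of the deployment template."""
    return template.replace("{{VERSION}}", version).replace("{{IMAGE_TAG}}", image_tag)


def selector_patch(version):
    """JSON body for `kubectl patch service mcp-server -p ...` that switches traffic."""
    return json.dumps({"spec": {"selector": {"version": version}}})


new, old = next_color("blue")
snippet = "name: mcp-server-{{VERSION}}\nimage: {{IMAGE_TAG}}"
print(render(snippet, new, "registry.example.com/group/mcp-server:abc123"))
print(selector_patch(new))
```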

📊 Quality Gate Configuration

SonarQube Integration

properties
# sonar-project.properties
sonar.projectKey=mcp-server
sonar.organization=your-org
sonar.host.url=https://sonarcloud.io

# Source and test paths
sonar.sources=src/
sonar.tests=tests/

# Python-specific settings
sonar.python.coverage.reportPaths=coverage.xml
sonar.python.xunit.reportPath=test-results.xml

# Quality gate: wait for the result and fail the analysis if the gate fails
sonar.qualitygate.wait=true

# Excluded files
sonar.exclusions=**/*.pyc,**/__pycache__/**,**/migrations/**
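SonarQube evaluates its gate on the server, but the coverage part of the same idea can be enforced locally from the `coverage.xml` that `pytest --cov-report=xml` writes. coverage.py's XML report uses the Cobertura format, whose root `<coverage>` element carries a `line-rate` attribute between 0 and 1. The sketch below reads it and compares it with the 80% target used elsewhere in this section; the function names and command-line handling are assumptions.

```python
import sys
import xml.etree.ElementTree as ET


def line_coverage(xml_text: str) -> float:
    """Return overall line coverage in percent from a Cobertura-format report."""
    root = ET.fromstring(xml_text)
    if root.tag != "coverage":
        raise ValueError("not a Cobertura coverage report")
    return float(root.get("line-rate", "0")) * 100


def gate(xml_text: str, minimum: float = 80.0) -> bool:
    """True if coverage meets the minimum (the same bar as pytest --cov-fail-under=80)."""
    return line_coverage(xml_text) >= minimum


# Usage: python coverage_gate.py coverage.xml  (exit code 1 when below the threshold)
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        report = f.read()
    print(f"line coverage: {line_coverage(report):.1f}%")
    sys.exit(0 if gate(report) else 1)
```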

Quality Check Script

python
#!/usr/bin/env python3
# scripts/quality_check.py

import subprocess
import sys
from pathlib import Path
from typing import Dict, List, Tuple

class QualityChecker:
    """Code quality checker"""
    
    def __init__(self, project_root: Path):
        self.project_root = project_root
        self.results: Dict[str, bool] = {}
        
    def run_command(self, command: List[str]) -> Tuple[bool, str]:
        """Run a command and return (success, output)"""
        try:
            result = subprocess.run(
                command,
                cwd=self.project_root,
                capture_output=True,
                text=True,
                check=True
            )
            return True, result.stdout
        except subprocess.CalledProcessError as e:
            # flake8, mypy and pytest report findings on stdout, so return both streams
            return False, e.stdout + e.stderr
    
    def check_formatting(self) -> bool:
        """Check code formatting"""
        print("🔍 Checking code formatting...")

        # Black check
        success, output = self.run_command(['black', '--check', '.'])
        if not success:
            print(f"❌ Black formatting check failed:\n{output}")
            return False

        # isort check
        success, output = self.run_command(['isort', '--check-only', '.'])
        if not success:
            print(f"❌ isort import-order check failed:\n{output}")
            return False

        print("✅ Code formatting check passed")
        return True
    
    def check_linting(self) -> bool:
        """Run linters"""
        print("🔍 Running linters...")

        # Flake8 check
        success, output = self.run_command(['flake8', '.'])
        if not success:
            print(f"❌ Flake8 check failed:\n{output}")
            return False

        # Pylint check
        success, output = self.run_command(['pylint', 'src/'])
        if not success:
            print(f"⚠️ Pylint warnings:\n{output}")
            # Pylint findings are reported but do not block

        print("✅ Lint check passed")
        return True
    
    def check_types(self) -> bool:
        """Check type annotations"""
        print("🔍 Checking type annotations...")

        success, output = self.run_command(['mypy', 'src/'])
        if not success:
            print(f"❌ mypy type check failed:\n{output}")
            return False

        print("✅ Type check passed")
        return True
    
    def check_security(self) -> bool:
        """Run security checks"""
        print("🔍 Running security checks...")

        # Bandit check
        success, output = self.run_command(['bandit', '-r', 'src/'])
        if not success:
            print(f"❌ Bandit security check failed:\n{output}")
            return False

        # Safety check
        success, output = self.run_command(['safety', 'check'])
        if not success:
            print(f"❌ Safety dependency check failed:\n{output}")
            return False

        print("✅ Security checks passed")
        return True
    
    def run_tests(self) -> bool:
        """Run the test suite"""
        print("🔍 Running the test suite...")

        success, output = self.run_command([
            'pytest',
            'tests/',
            '--cov=src',
            '--cov-report=term-missing',
            '--cov-fail-under=80'
        ])

        if not success:
            print(f"❌ Tests failed:\n{output}")
            return False

        print("✅ Tests passed")
        return True
    
    def run_all_checks(self) -> bool:
        """Run all checks"""
        checks = [
            ('formatting', self.check_formatting),
            ('linting', self.check_linting),
            ('types', self.check_types),
            ('security', self.check_security),
            ('tests', self.run_tests)
        ]
        
        all_passed = True
        
        for check_name, check_func in checks:
            try:
                result = check_func()
                self.results[check_name] = result
                if not result:
                    all_passed = False
            except Exception as e:
                print(f"❌ Error while running check {check_name}: {e}")
                self.results[check_name] = False
                all_passed = False
        
        return all_passed
    
    def print_summary(self):
        """Print a summary of the checks"""
        print("\n" + "="*50)
        print("📊 Quality check summary")
        print("="*50)

        for check_name, result in self.results.items():
            status = "✅ passed" if result else "❌ failed"
            print(f"{check_name.ljust(15)}: {status}")

        overall = all(self.results.values())
        status = "✅ all passed" if overall else "❌ issues found"
        print(f"\n{'Overall'.ljust(15)}: {status}")

        if not overall:
            print("\n🚨 Please fix the issues above before committing!")
            return False
        else:
            print("\n🎉 Code quality looks good, ready to commit!")
            return True

def main():
    """Entry point"""
    project_root = Path(__file__).parent.parent
    checker = QualityChecker(project_root)
    
    print("🚀 Starting code quality checks...")
    success = checker.run_all_checks()
    checker.print_summary()
    
    sys.exit(0 if success else 1)

if __name__ == '__main__':
    main()

🚀 Deployment Strategies

Blue-Green Deployment Template

yaml
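# k8s/deployment-template.yaml
# {{VERSION}} (blue or green) and {{IMAGE_TAG}} are placeholders that the deploy script replaces before kubectl apply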
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server-{{VERSION}}
  namespace: production
  labels:
    app: mcp-server
    version: {{VERSION}}
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mcp-server
      version: {{VERSION}}
  template:
    metadata:
      labels:
        app: mcp-server
        version: {{VERSION}}
    spec:
      containers:
      - name: mcp-server
        image: {{IMAGE_TAG}}
        ports:
        - containerPort: 8000
        env:
        - name: VERSION
          value: {{VERSION}}
        - name: ENVIRONMENT
          value: production
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 5
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "512Mi"
            cpu: "500m"
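Blue-green switching in this section works because a Service routes to every pod whose labels contain all of the Service's selector pairs, so patching the `mcp-server` Service selector from `version: blue` to `version: green` moves traffic to pods created from this template. The sketch below restates that matching rule for equality-based selectors; the helper names and sample pods are illustrative, and it ignores the fact that Kubernetes only sends traffic to pods that pass their `readinessProbe`.

```python
def matches(selector: dict, labels: dict) -> bool:
    """Equality-based selector: every key/value pair must appear in the pod's labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def routed_pods(selector: dict, pods: dict) -> list:
    """Names of the pods a Service with this selector would send traffic to."""
    return sorted(name for name, labels in pods.items() if matches(selector, labels))


pods = {
    "mcp-server-blue-1": {"app": "mcp-server", "version": "blue"},
    "mcp-server-green-1": {"app": "mcp-server", "version": "green"},
    "mcp-server-green-2": {"app": "mcp-server", "version": "green"},
}

selector = {"app": "mcp-server", "version": "blue"}
print(routed_pods(selector, pods))

# Patching {"spec": {"selector": {"version": "green"}}} keeps "app" and replaces "version"
selector = {**selector, "version": "green"}
print(routed_pods(selector, pods))
```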

Canary Deployment Script

bash
#!/bin/bash
# scripts/canary_deploy.sh

set -e

NAMESPACE=${1:-production}
IMAGE_TAG=${2:-latest}
CANARY_WEIGHT=${3:-10}

echo "🚀 Starting canary deployment"
echo "Namespace: $NAMESPACE"
echo "Image tag: $IMAGE_TAG"
echo "Traffic weight: $CANARY_WEIGHT%"

# Deploy the canary version
echo "📦 Deploying the canary version..."
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server-canary
  namespace: $NAMESPACE
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mcp-server
      version: canary
  template:
    metadata:
      labels:
        app: mcp-server
        version: canary
    spec:
      containers:
      - name: mcp-server
        image: $IMAGE_TAG
        ports:
        - containerPort: 8000
        env:
        - name: VERSION
          value: canary
EOF

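# Wait for the canary to become ready
echo "⏳ Waiting for the canary version to become ready..."
kubectl rollout status deployment/mcp-server-canary -n $NAMESPACE

# Configure traffic splitting
# (assumes a DestinationRule for host mcp-server defines the stable and canary subsets by the version label)
echo "🔀 Configuring traffic splitting..."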
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: mcp-server
  namespace: $NAMESPACE
spec:
  # hosts is a required field of a VirtualService
  hosts:
  - mcp-server
  http:
  - match:
    - headers:
        canary:
          exact: "true"
    route:
    - destination:
        host: mcp-server
        subset: canary
  - route:
    - destination:
        host: mcp-server
        subset: stable
      weight: $((100 - CANARY_WEIGHT))
    - destination:
        host: mcp-server
        subset: canary
      weight: $CANARY_WEIGHT
EOF

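echo "✅ Canary deployment complete"
echo "💡 Monitor the deployment with:"
echo "   kubectl get pods -n $NAMESPACE -l version=canary"
echo "   kubectl logs -n $NAMESPACE -l version=canary -f"

# Roll back: route all traffic to the stable subset first, then remove the canary.
# Deleting the deployment alone would leave the VirtualService sending traffic to an empty subset
rollback() {
    kubectl patch virtualservice mcp-server -n "$NAMESPACE" --type merge \
        -p '{"spec":{"http":[{"route":[{"destination":{"host":"mcp-server","subset":"stable"},"weight":100}]}]}}'
    kubectl delete deployment mcp-server-canary -n "$NAMESPACE"
}

# Monitoring loop
echo "📊 Monitoring the canary version..."
for i in {1..10}; do
    echo "Check $i/10..."
    
    # Health check
    if ! kubectl exec -n $NAMESPACE deployment/mcp-server-canary -- curl -f http://localhost:8000/health; then
        echo "❌ Canary health check failed, rolling back..."
        rollback
        exit 1
    fi
    
    # Error-rate check (requires a monitoring system; this query covers all mcp-server pods,
    # so scope it to the canary if your metrics carry a version label).
    # -G with --data-urlencode URL-encodes the PromQL; otherwise curl treats {} and [] as URL globs
    ERROR_RATE=$(curl -sG "http://prometheus:9090/api/v1/query" \
        --data-urlencode 'query=sum(rate(http_requests_total{job="mcp-server",status=~"5.."}[5m]))' |
        jq -r '.data.result[0].value[1] // "0"')
    
    if (( $(echo "$ERROR_RATE > 0.01" | bc -l) )); then
        echo "❌ Canary error rate too high: $ERROR_RATE, rolling back..."
        rollback
        exit 1
    fi
    
    sleep 60
done

echo "✅ Canary version looks healthy and can be promoted further"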
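The monitoring loop's decision boils down to two steps: query the Prometheus HTTP API (`/api/v1/query`) with a URL-encoded PromQL expression, then read the first sample's value (Prometheus returns it as a string) and roll back when it exceeds the threshold. A standard-library sketch of both steps; the Prometheus address and the 0.01 threshold come from the script, and the function names are hypothetical. Fetching is left out; `urllib.request.urlopen(query_url(...))` would perform the request.

```python
import json
from urllib.parse import urlencode

PROMETHEUS = "http://prometheus:9090"  # address assumed by the canary script
QUERY = 'sum(rate(http_requests_total{job="mcp-server",status=~"5.."}[5m]))'


def query_url(base: str, promql: str) -> str:
    """Build an instant-query URL; urlencode escapes {, }, [, ] and quotes."""
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


def first_value(response_body: str) -> float:
    """First sample's value, or 0 when there is no data (like jq's // "0")."""
    result = json.loads(response_body)["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0


def should_roll_back(error_rate: float, threshold: float = 0.01) -> bool:
    """Roll back when the 5xx rate exceeds the threshold."""
    return error_rate > threshold


body = '{"status":"success","data":{"resultType":"vector","result":[{"metric":{},"value":[1700000000,"0.02"]}]}}'
print(query_url(PROMETHEUS, QUERY))
print(should_roll_back(first_value(body)))
```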

📈 Monitoring and Alerting

Prometheus Monitoring Configuration

yaml
# monitoring/prometheus-rules.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: mcp-server-alerts
spec:
  groups:
  - name: mcp-server
    rules:
    - alert: MCPServerDown
      expr: up{job="mcp-server"} == 0
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: "MCP Server is down"
        description: "MCP Server has been down for more than 1 minute"
    
    - alert: MCPServerHighErrorRate
      expr: sum(rate(http_requests_total{job="mcp-server",status=~"5.."}[5m])) > 0.01
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "High error rate detected"
        description: "Error rate is {{ $value }} errors per second"
    
    - alert: MCPServerHighLatency
      expr: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket{job="mcp-server"}[5m]))) > 0.5
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High latency detected"
        description: "95th percentile latency is {{ $value }}s"
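The latency alert relies on `histogram_quantile`, which estimates a quantile from cumulative `le` buckets by linear interpolation inside the bucket that contains the target rank. The function below is a simplified Python restatement for illustration only (it skips edge cases Prometheus handles, such as non-monotonic buckets); it shows that the reported p95 depends on the bucket boundaries rather than on exact request latencies.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 < q < 1) from cumulative (upper_bound, count) buckets.

    Simplified from Prometheus' approach: the highest bucket must be +Inf and
    observations are assumed to be spread uniformly within each bucket.
    """
    if not 0 < q < 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("the highest bucket must have upper bound +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the target rank
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # The rank lands in the +Inf bucket: report the highest finite bound
        return buckets[-2][0] if len(buckets) > 1 else math.nan
    upper, count = buckets[idx]
    lower, below = (0.0, 0) if idx == 0 else buckets[idx - 1]
    # Linear interpolation between the bucket's lower and upper bounds
    return lower + (upper - lower) * (rank - below) / (count - below)


# Cumulative counts as exposed by http_request_duration_seconds_bucket{le="..."}
buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
print(histogram_quantile(0.95, buckets))
```

Aggregating with `sum by (le)` before calling `histogram_quantile` merges the buckets of all instances into one distribution, so the alert evaluates a single service-wide p95 instead of one per instance.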

Deployment Notification Script

python
#!/usr/bin/env python3
# scripts/deployment_notification.py

import requests
import os
import sys
from datetime import datetime

def send_slack_notification(webhook_url: str, message: str, status: str):
    """Send a Slack notification through an incoming webhook"""
    color = "good" if status == "success" else "danger"
    
    payload = {
        "attachments": [{
            "color": color,
            "fields": [{
                "title": "Deployment notification",
                "value": message,
                "short": False
            }],
            # Slack attachments take a Unix epoch timestamp in the "ts" field
            "ts": int(datetime.now().timestamp())
        }]
    }
    
    response = requests.post(webhook_url, json=payload, timeout=10)
    response.raise_for_status()

def send_email_notification(smtp_config: dict, message: str, status: str):
    """Send an email notification"""
    import smtplib
    from email.mime.text import MIMEText
    from email.mime.multipart import MIMEMultipart
    
    msg = MIMEMultipart()
    msg['From'] = smtp_config['from']
    msg['To'] = smtp_config['to']
    msg['Subject'] = f"Deployment notification - {status.upper()}"
    
    msg.attach(MIMEText(message, 'plain'))
    
    server = smtplib.SMTP(smtp_config['host'], smtp_config['port'])
    server.starttls()
    server.login(smtp_config['user'], smtp_config['password'])
    server.send_message(msg)
    server.quit()

def main():
    """Entry point"""
    if len(sys.argv) < 3:
        print("Usage: python deployment_notification.py <status> <message>")
        sys.exit(1)
    
    status = sys.argv[1]  # "success" or "failure"
    message = sys.argv[2]
    
    # Slack notification
    slack_webhook = os.getenv('SLACK_WEBHOOK_URL')
    if slack_webhook:
        try:
            send_slack_notification(slack_webhook, message, status)
            print("✅ Slack notification sent")
        except Exception as e:
            print(f"❌ Failed to send Slack notification: {e}")
    
    # Email notification
    smtp_config = {
        'host': os.getenv('SMTP_HOST'),
        'port': int(os.getenv('SMTP_PORT', 587)),
        'user': os.getenv('SMTP_USER'),
        'password': os.getenv('SMTP_PASSWORD'),
        'from': os.getenv('SMTP_FROM'),
        'to': os.getenv('SMTP_TO')
    }
    
    if all(smtp_config.values()):
        try:
            send_email_notification(smtp_config, message, status)
            print("✅ Email notification sent")
        except Exception as e:
            print(f"❌ Failed to send email notification: {e}")

if __name__ == '__main__':
    main()
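To test the Slack message format without posting to a real webhook, the payload construction can be split out into a pure function that takes the timestamp as a parameter. A sketch of that refactor (the `build_slack_payload` name is hypothetical); it uses the `color`, `fields` and `ts` (Unix epoch seconds) fields of Slack's legacy message attachments.

```python
import json
from datetime import datetime, timezone


def build_slack_payload(message: str, status: str, now: datetime) -> dict:
    """Build the incoming-webhook body for a deployment notification."""
    return {
        "attachments": [{
            "color": "good" if status == "success" else "danger",
            "fields": [{"title": "Deployment notification", "value": message, "short": False}],
            "ts": int(now.timestamp()),  # Unix epoch seconds
        }]
    }


payload = build_slack_payload("mcp-server 42-3f9c2e1a deployed", "success",
                              datetime(2024, 1, 1, tzinfo=timezone.utc))
print(json.dumps(payload, ensure_ascii=False, indent=2))
```

`send_slack_notification` could then post `build_slack_payload(message, status, datetime.now(timezone.utc))`, keeping the network call as the only untested part.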

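🎯 Best Practices

1. Pipeline design principles

  • Fast feedback: catch problems as early as possible
  • Automation: minimize manual intervention
  • Reliability: keep deployments consistent and repeatable
  • Observability: comprehensive logging and monitoring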

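2. Branching strategy

  • main branch: production code, strictly protected
  • develop branch: integration branch, deployed automatically to staging
  • feature branches: feature development, reviewed before merging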
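The branch strategy maps directly onto the deployment conditions used by the pipelines in this section: pushes to `develop` deploy to staging, pushes to `main` go to production behind a manual approval, and other branches only run the checks. A tiny sketch of that routing, e.g. for a deploy helper script (the table and function are hypothetical):

```python
from typing import Optional, Tuple

# ref -> (target environment, whether a manual approval gates the deployment)
DEPLOY_TARGETS = {
    "refs/heads/develop": ("staging", False),
    "refs/heads/main": ("production", True),
}


def deploy_target(ref: str) -> Optional[Tuple[str, bool]]:
    """Return (environment, needs_approval) for a pushed ref, or None when only CI checks run."""
    return DEPLOY_TARGETS.get(ref)


for ref in ("refs/heads/develop", "refs/heads/main", "refs/heads/feature/login"):
    print(ref, "->", deploy_target(ref) or "CI checks only")
```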

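3. Testing strategy

  • Unit tests: fast feedback, coverage of at least 80%
  • Integration tests: verify that components work together
  • End-to-end tests: verify complete workflows
  • Performance tests: make sure the system meets its performance targets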

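4. Choosing a deployment strategy

  • Blue-green: zero downtime, fast rollback
  • Rolling: efficient use of resources
  • Canary: risk control through gradual rollout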
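The resource trade-off between these strategies can be made concrete with replica arithmetic. For a Kubernetes RollingUpdate, a percentage `maxSurge` is rounded up and a percentage `maxUnavailable` is rounded down (both default to 25%), while blue-green runs a full second copy until the old color is deleted. A sketch for the 3-replica blue-green template above; the helper names are illustrative.

```python
import math


def rolling_bounds(replicas: int, max_surge: float = 0.25, max_unavailable: float = 0.25):
    """Pod-count bounds during a RollingUpdate with percentage settings.

    Kubernetes rounds a percentage maxSurge up and a percentage maxUnavailable down.
    """
    surge = math.ceil(replicas * max_surge)
    unavailable = math.floor(replicas * max_unavailable)
    return {"max_pods": replicas + surge, "min_available": replicas - unavailable}


def blue_green_peak(replicas: int) -> int:
    """Both colors run at full size until the old one is deleted."""
    return 2 * replicas


print(rolling_bounds(3))
print(blue_green_peak(3))
```

With 3 replicas a default rolling update needs at most one extra pod, while blue-green briefly doubles the footprint to six.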

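🚀 Next Steps

With the CI/CD pipeline in place, you can:

  1. Learn about monitoring and logging: 7.3 Monitoring and Logging Systems
  2. Learn performance tuning techniques: 7.4 Performance Tuning and Scaling
  3. Learn troubleshooting methods: 7.5 Troubleshooting and Maintenance

📚 Further Reading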
