SOL CoD Phase 4: Gemini 2.5 API Integration Architecture

📋 Document Overview

Created: 2025-09-03
Version: 1.0.0
Owner: sol_cod_architect
Approval: Pending

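🎯 Goals

Integrate the Gemini 2.5 API with the four expert agents of the SOL CoD system
Design an LLM Service Layer that follows Clean Architecture principles
Fully integrate the LangGraph.js StateGraph with the Gemini API
Meet production security and performance requirements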

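🏗️ Architecture Overview

1. System Architecture Structure

2. Gemini API Integration Architecture

Core design principles:

Hexagonal Architecture: integrate the Gemini API through the Port & Adapter pattern
Repository Pattern: abstract LLM calls behind a repository
Strategy Pattern: allow different LLM providers to be swapped in
Circuit Breaker: recover automatically from API failures
Rate Limiting: control cost and stay within API limits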
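The Port & Adapter and Strategy principles above can be sketched as follows. This is a minimal illustration, not the project's implementation: `SketchLLMPort`, `EchoGeminiAdapter`, `EchoFallbackAdapter`, and `ProviderRegistry` are hypothetical names, and the adapters only echo their input instead of calling a real API.

```typescript
// Hypothetical names throughout; only the port/adapter idea comes from the document.
interface SketchLLMPort {
  readonly providerName: string;
  generate(prompt: string): Promise<string>;
}

// Adapter for one provider. A real adapter would call the Gemini API here.
class EchoGeminiAdapter implements SketchLLMPort {
  readonly providerName = 'gemini';
  async generate(prompt: string): Promise<string> {
    return `[gemini] ${prompt}`;
  }
}

// A second adapter shows that callers do not change when the provider does.
class EchoFallbackAdapter implements SketchLLMPort {
  readonly providerName = 'fallback';
  async generate(prompt: string): Promise<string> {
    return `[fallback] ${prompt}`;
  }
}

// Strategy selection: resolve an adapter by name at runtime.
class ProviderRegistry {
  private readonly adapters = new Map<string, SketchLLMPort>();

  register(adapter: SketchLLMPort): void {
    this.adapters.set(adapter.providerName, adapter);
  }

  resolve(name: string): SketchLLMPort {
    const adapter = this.adapters.get(name);
    if (!adapter) {
      throw new Error(`Unknown LLM provider: ${name}`);
    }
    return adapter;
  }
}
```

Domain code would depend only on the port interface, so replacing Gemini with another provider means registering a different adapter rather than editing the agents.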

🔧 Core Component Design

1. LLM Service Layer Architecture

// Domain Layer - LLM Service Interface
interface LLMServicePort {
  generateResponse(prompt: string, options: LLMOptions): Promise<LLMResponse>;
  validateResponse(response: LLMResponse): Promise<boolean>;
  getTokenUsage(): TokenUsage;
  isHealthy(): Promise<boolean>;
}

// Infrastructure Layer - Gemini Service Implementation
// Method bodies are omitted in this outline; the class must implement every LLMServicePort method.
@Injectable()
export class GeminiLLMService implements LLMServicePort {
  constructor(
    private readonly geminiClient: GeminiClient,
    private readonly tokenManager: TokenManagerService,
    private readonly rateLimiter: RateLimiterService,
    private readonly circuitBreaker: CircuitBreakerService
  ) {}
}

2. Expert Agent + Gemini Integration Pattern

// LLM integration in BaseExpertAgent
export abstract class BaseExpertAgent {
  constructor(
    // existing dependencies...
    private readonly llmService: LLMServicePort,
    private readonly promptManager: PromptManagerService
  ) {}

  protected async performAnalysis(preprocessedData: any): Promise<ExpertAnalysis> {
    // 1. Build the expert-specific prompt
    const prompt = await this.generateAnalysisPrompt(preprocessedData);
    
    // 2. Call the Gemini API (Circuit Breaker + Rate Limiting)
    const llmResponse = await this.llmService.generateResponse(prompt, {
      model: 'gemini-2.5-pro',
      maxTokens: 2048,
      temperature: 0.1,
      expertType: this.agentType
    });
    
    // 3. Structure and validate the response
    return await this.parseAndValidateResponse(llmResponse);
  }
}
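Step 3 above (structuring and validating the response) is not spelled out in this document. The sketch below shows one way it might look, assuming the model returns the JSON shape requested in the prompt templates later in this document. `ParsedExpertAnalysis` and `parseExpertAnalysis` are hypothetical names, not part of the existing codebase.

```typescript
// Shape taken from the JSON response formats used in this document's prompts.
interface ParsedExpertAnalysis {
  solScore: number;        // minutes, expected 0-120
  confidenceScore: number; // expected 0.0-1.0
  keyFactors: string[];
}

// Hypothetical helper: parses the text between the first '{' and the last '}'
// as JSON and checks the numeric ranges. Throws on anything unexpected.
function parseExpertAnalysis(rawText: string): ParsedExpertAnalysis {
  const start = rawText.indexOf('{');
  const end = rawText.lastIndexOf('}');
  if (start === -1 || end <= start) {
    throw new Error('No JSON object found in LLM response');
  }
  const parsed: unknown = JSON.parse(rawText.slice(start, end + 1));
  if (typeof parsed !== 'object' || parsed === null) {
    throw new Error('LLM response is not a JSON object');
  }
  const record = parsed as Record<string, unknown>;
  const solScore = record['solScore'];
  const confidenceScore = record['confidenceScore'];
  const keyFactors = record['keyFactors'];

  if (typeof solScore !== 'number' || solScore < 0 || solScore > 120) {
    throw new Error('solScore must be a number between 0 and 120');
  }
  if (typeof confidenceScore !== 'number' || confidenceScore < 0 || confidenceScore > 1) {
    throw new Error('confidenceScore must be a number between 0.0 and 1.0');
  }
  if (!Array.isArray(keyFactors) || !keyFactors.every((f) => typeof f === 'string')) {
    throw new Error('keyFactors must be an array of strings');
  }
  return { solScore, confidenceScore, keyFactors: keyFactors as string[] };
}
```

Rejecting out-of-range values at this boundary keeps malformed model output from reaching the debate and consensus phases.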

3. LangGraph StateGraph + Gemini Integration

// Gemini API integration inside the StateGraph
// Note: StateGraphBuilder is assumed to be a project-level wrapper; LangGraph.js itself
// exposes StateGraph, whose graphs are finalized with compile().
export class SOLCoDStateGraphService {
  private buildStateGraph(): StateGraph {
    return new StateGraphBuilder()
      .addNode('analysis_phase', async (state) => {
        // The four Expert Agents call the Gemini API in parallel
        const analyses = await Promise.all(
          this.expertAgents.map(agent => 
            agent.executeAsNode(state) // uses the Gemini API internally
          )
        );
        return { ...state, currentAnalyses: analyses };
      })
      .addNode('debate_phase', async (state) => {
        // The debate between Expert Agents is also driven by the Gemini API
        const debates = await this.conductDebateRound(state);
        return { ...state, debateHistory: debates };
      })
      // ... other nodes
      .build();
  }
}

🔐 Security and Key Management

1. Securing the API Key

// Per-environment key management strategy
interface GeminiConfig {
  apiKey: string;          // Loaded from Secret Manager
  projectId: string;       // GCP Project ID
  region: string;          // 'europe-west3' (for GDPR compliance)
  endpoint?: string;       // Custom endpoint if needed
  retryConfig: RetryConfig;
  circuitBreakerConfig: CircuitBreakerConfig;
}

@Injectable()
export class GeminiConfigService {
  private readonly config: GeminiConfig;
  
  constructor(private readonly secretManager: SecretManagerService) {
    this.config = {
      apiKey: this.secretManager.getSecret('GEMINI_API_KEY'),
      projectId: process.env.GCP_PROJECT_ID,
      region: 'europe-west3',
      retryConfig: { maxRetries: 3, backoffMs: 1000 },
      circuitBreakerConfig: { threshold: 5, timeout: 30000 }
    };
  }
}
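The `retryConfig` above (`maxRetries: 3`, `backoffMs: 1000`) does not say how delays grow between attempts. The sketch below assumes the delay doubles on each retry, which matches the `backoffStrategy: 'exponential'` setting in the production configuration later in this document. `SketchRetryConfig`, `backoffDelays`, and `withRetry` are hypothetical names, and the `sleep` function is passed in so the example does not depend on real timers.

```typescript
interface SketchRetryConfig {
  maxRetries: number;
  backoffMs: number;
}

// Delay before each retry, doubling every time (exponential backoff).
function backoffDelays(config: SketchRetryConfig): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < config.maxRetries; attempt++) {
    delays.push(config.backoffMs * Math.pow(2, attempt));
  }
  return delays;
}

// Runs `operation` once, then retries up to `maxRetries` times on failure.
// `sleep` is injected so callers (and tests) control how waiting happens.
async function withRetry<T>(
  operation: () => Promise<T>,
  config: SketchRetryConfig,
  sleep: (ms: number) => Promise<void>,
): Promise<T> {
  const delays = backoffDelays(config);
  let lastError: unknown;
  for (let attempt = 0; attempt <= config.maxRetries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < config.maxRetries) {
        await sleep(delays[attempt]);
      }
    }
  }
  throw lastError;
}
```

With the values above, the waits would be 1000 ms, 2000 ms, and 4000 ms, so a call that keeps failing gives up after roughly seven seconds of backoff plus request time.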

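2. API Key Rotation and Monitoring

// Monitors the health of the API key
@Injectable()
export class GeminiMonitoringService {
  constructor(private readonly geminiService: LLMServicePort) {}

  async checkApiKeyHealth(): Promise<HealthCheckResult> {
    try {
      // Verify the key with a minimal API call
      await this.geminiService.generateResponse('Health check', {
        maxTokens: 10,
        model: 'gemini-2.5-pro'
      });
      return { healthy: true, lastChecked: new Date() };
    } catch (error) {
      return { 
        healthy: false, 
        // `error` is `unknown` under strict TypeScript, so narrow it before reading `.message`
        error: error instanceof Error ? error.message : String(error),
        lastChecked: new Date() 
      };
    }
  }
}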

💰 Token Cost Optimization Strategy

1. Tracking and Managing Token Usage

@Injectable()
export class TokenManagerService {
  private readonly dailyBudget = 10000; // daily token budget
  private readonly userTokenLimits = new Map<string, number>();
  
  async trackTokenUsage(
    userId: string, 
    agentType: ExpertAgentType,
    promptTokens: number,
    completionTokens: number
  ): Promise<void> {
    const totalTokens = promptTokens + completionTokens;
    
    // Track token usage per user
    await this.redisClient.hincrby(`tokens:${userId}:daily`, agentType, totalTokens);
    
    // Track system-wide token usage
    await this.redisClient.hincrby('tokens:system:daily', 'total', totalTokens);
    
    // Compute the cost (based on Gemini 2.5 Pro)
    const cost = this.calculateCost(promptTokens, completionTokens);
    // The cost is a fractional dollar amount; HINCRBY accepts only integers, so use HINCRBYFLOAT
    await this.redisClient.hincrbyfloat('costs:daily', 'gemini', cost);
  }
  
  private calculateCost(promptTokens: number, completionTokens: number): number {
    // Gemini 2.5 Pro pricing (estimated)
    const PROMPT_COST_PER_1K = 0.001;  // $0.001 per 1K tokens
    const COMPLETION_COST_PER_1K = 0.002; // $0.002 per 1K tokens
    
    return (promptTokens / 1000) * PROMPT_COST_PER_1K + 
           (completionTokens / 1000) * COMPLETION_COST_PER_1K;
  }
}
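As a worked example of the cost formula above, the sketch below plugs in the estimated prices from `calculateCost` and the token figures of the sleep analysis template (150 input tokens, up to 300 output tokens). The function and constant names are hypothetical, and the prices remain the document's estimates rather than confirmed list prices.

```typescript
// Estimated prices copied from calculateCost above (USD per 1K tokens).
const SKETCH_PROMPT_COST_PER_1K = 0.001;
const SKETCH_COMPLETION_COST_PER_1K = 0.002;

function estimateCostUsd(promptTokens: number, completionTokens: number): number {
  return (promptTokens / 1000) * SKETCH_PROMPT_COST_PER_1K +
         (completionTokens / 1000) * SKETCH_COMPLETION_COST_PER_1K;
}

// How many calls of a given size fit into a daily token budget.
function callsWithinTokenBudget(dailyTokenBudget: number, tokensPerCall: number): number {
  return Math.floor(dailyTokenBudget / tokensPerCall);
}

// One expert call: 150 prompt tokens + 300 completion tokens.
// 0.15 * 0.001 + 0.30 * 0.002 = 0.00015 + 0.0006, about 0.00075 USD.
const costPerExpertCall = estimateCostUsd(150, 300);

// One analysis round with four experts, one call each: about 0.003 USD.
const costPerAnalysisRound = costPerExpertCall * 4;

// With the 10000-token dailyBudget above and 450 tokens per call: 22 calls.
const callsPerDay = callsWithinTokenBudget(10000, 450);
```

Under these assumptions, a 10000-token daily budget covers only about five four-expert analysis rounds per day before debate rounds are counted, which is worth checking against the 100,000 and 1,000,000 token budgets in the environment configurations.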

2. Prompt Optimization System

// Optimized prompt templates for each expert
export class ExpertPromptTemplates {
  // Optimized prompt for the Sleep Pattern Analyst (tuned for token efficiency)
  static readonly SLEEP_ANALYSIS_TEMPLATE: PromptTemplate = {
    id: 'sleep_analysis_v2_optimized',
    template: `Analyze sleep patterns for SOL prediction:
User Data: {{sleepDataSummary}}
Questionnaire: {{questionnaireSummary}}

Provide structured response:
1. SOL Score (0-120 min): [number]
2. Confidence (0.0-1.0): [number]  
3. Key Factors: [max 3 bullet points]
4. Reasoning: [max 100 words]

Focus on sleep latency indicators only.`,
    
    tokenEstimate: 150, // estimated input tokens
    maxOutputTokens: 300,
    version: '2.0'
  };
  
  // Optimized prompt for the CBT-I Behavior Expert
  static readonly CBTI_ANALYSIS_TEMPLATE: PromptTemplate = {
    id: 'cbti_analysis_v2_optimized',
    template: `CBT-I behavioral analysis for SOL prediction:
Sleep Behaviors: {{behaviorDataSummary}}
Previous SOL: {{previousSOL}}

Response format:
1. SOL Score: [number]
2. Confidence: [number]
3. CBT-I Factors: [max 3]
4. Behavioral Issues: [max 2]
5. Recommendations: [max 2]

Be concise, focus on behavioral patterns.`,
    
    tokenEstimate: 120,
    maxOutputTokens: 250,
    version: '2.0'
  };
}
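The templates above use `{{name}}` placeholders, but the substitution step is not shown. A minimal sketch of how the placeholders might be filled is given below. `renderPrompt` is a hypothetical helper, and failing on a missing variable is an assumption about the desired behavior.

```typescript
// Replaces every {{name}} placeholder with the matching value.
// Throws if the template references a variable that was not supplied.
function renderPrompt(
  template: string,
  values: Record<string, string | number>,
): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match: string, name: string) => {
    if (!Object.prototype.hasOwnProperty.call(values, name)) {
      throw new Error(`Missing prompt variable: ${name}`);
    }
    return String(values[name]);
  });
}

const renderedExample = renderPrompt(
  'Sleep Behaviors: {{behaviorDataSummary}}\nPrevious SOL: {{previousSOL}}',
  { behaviorDataSummary: 'late screen use', previousSOL: 35 },
);
// renderedExample is "Sleep Behaviors: late screen use\nPrevious SOL: 35"
```

Failing loudly on a missing variable avoids sending a prompt that still contains raw `{{...}}` markers, which would waste tokens and likely degrade the prediction.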

🚦 Rate Limiting and Error Handling

1. Layered Rate Limiting Strategy

@Injectable()
export class RateLimiterService {
  private readonly limits = {
    perUser: { requests: 100, windowMs: 3600000 }, // 100 requests per hour per user
    perAgent: { requests: 50, windowMs: 3600000 },  // 50 requests per hour per agent
    system: { requests: 1000, windowMs: 3600000 }   // 1000 requests per hour system-wide
  };
  
  async checkRateLimit(
    userId: string, 
    agentType: ExpertAgentType
  ): Promise<RateLimitResult> {
    const userKey = `rate_limit:user:${userId}`;
    const agentKey = `rate_limit:agent:${agentType}`;
    const systemKey = 'rate_limit:system';
    
    const [userCount, agentCount, systemCount] = await Promise.all([
      this.redisClient.incr(userKey),
      this.redisClient.incr(agentKey),
      this.redisClient.incr(systemKey)
    ]);
    
    // Set the TTL on the first request of each window
    if (userCount === 1) await this.redisClient.expire(userKey, 3600);
    if (agentCount === 1) await this.redisClient.expire(agentKey, 3600);
    if (systemCount === 1) await this.redisClient.expire(systemKey, 3600);
    
    return {
      allowed: userCount <= this.limits.perUser.requests &&
               agentCount <= this.limits.perAgent.requests &&
               systemCount <= this.limits.system.requests,
      // Clamp at zero so the value never goes negative once a limit is exceeded
      remainingRequests: Math.max(0, Math.min(
        this.limits.perUser.requests - userCount,
        this.limits.perAgent.requests - agentCount,
        this.limits.system.requests - systemCount
      ))
    };
  }
}
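The layered check above can be exercised without Redis. The sketch below is an in-memory fixed-window variant under stated assumptions: `InMemoryFixedWindowLimiter` and `WindowLimit` are hypothetical names, the clock is passed in explicitly, and, unlike the Redis version above, a rejected request is not counted against any layer.

```typescript
interface WindowLimit {
  requests: number;
  windowMs: number;
}

interface LimitCheck {
  key: string;
  limit: WindowLimit;
}

class InMemoryFixedWindowLimiter {
  private readonly counters = new Map<string, { count: number; windowStart: number }>();

  // Returns true and counts the request in every layer if all layers have room;
  // otherwise returns false and counts nothing.
  tryAcquire(checks: LimitCheck[], now: number): boolean {
    // First pass: reject without counting if any layer is exhausted.
    for (const check of checks) {
      if (this.currentCount(check, now) >= check.limit.requests) {
        return false;
      }
    }
    // Second pass: record the request in every layer.
    for (const check of checks) {
      const entry = this.counters.get(check.key);
      if (entry && now - entry.windowStart < check.limit.windowMs) {
        entry.count++;
      } else {
        this.counters.set(check.key, { count: 1, windowStart: now });
      }
    }
    return true;
  }

  // Requests already counted in the current window for this key.
  private currentCount(check: LimitCheck, now: number): number {
    const entry = this.counters.get(check.key);
    if (!entry || now - entry.windowStart >= check.limit.windowMs) {
      return 0;
    }
    return entry.count;
  }
}
```

A caller would pass one check per layer (user, agent, system), mirroring the three Redis keys above. Whether rejected requests should consume quota is a policy choice. The Redis version counts them, which penalizes clients that keep retrying while limited.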

2. Circuit Breaker Pattern Implementation

@Injectable()
export class CircuitBreakerService {
  private readonly breakers = new Map<string, CircuitBreaker>();
  
  async executeWithCircuitBreaker<T>(
    serviceName: string,
    operation: () => Promise<T>
  ): Promise<T> {
    const breaker = this.getOrCreateBreaker(serviceName);
    
    if (breaker.state === 'OPEN') {
      const timeSinceLastFailure = Date.now() - breaker.lastFailureTime;
      if (timeSinceLastFailure < breaker.timeout) {
        throw new ServiceUnavailableException(
          `Circuit breaker is OPEN for ${serviceName}`
        );
      } else {
        breaker.state = 'HALF_OPEN';
      }
    }
    
    try {
      const result = await operation();
      
      // Any success closes the breaker and clears the failure count,
      // so the threshold counts consecutive failures rather than lifetime failures
      breaker.state = 'CLOSED';
      breaker.failureCount = 0;
      
      return result;
    } catch (error) {
      breaker.failureCount++;
      breaker.lastFailureTime = Date.now();
      
      if (breaker.failureCount >= breaker.threshold) {
        breaker.state = 'OPEN';
      }
      
      throw error;
    }
  }
}
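The state transitions of the breaker above can be isolated into a small synchronous state machine, which makes them easier to test. The sketch below is an illustration under assumptions: `SketchCircuitBreaker` is a hypothetical class, time is passed in as a parameter, and it resets the failure count on every success and reopens immediately when a trial call fails in the HALF_OPEN state.

```typescript
type BreakerState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

class SketchCircuitBreaker {
  private state: BreakerState = 'CLOSED';
  private failureCount = 0;
  private lastFailureTime = 0;

  constructor(
    private readonly threshold: number,
    private readonly timeoutMs: number,
  ) {}

  getState(): BreakerState {
    return this.state;
  }

  // Whether a call may proceed at `now`; moves OPEN to HALF_OPEN after the timeout.
  allowRequest(now: number): boolean {
    if (this.state === 'OPEN') {
      if (now - this.lastFailureTime < this.timeoutMs) {
        return false;
      }
      this.state = 'HALF_OPEN';
    }
    return true;
  }

  // A success always closes the breaker and clears the failure count.
  recordSuccess(): void {
    this.state = 'CLOSED';
    this.failureCount = 0;
  }

  // A failure opens the breaker at the threshold, or at once during a HALF_OPEN trial.
  recordFailure(now: number): void {
    this.failureCount++;
    this.lastFailureTime = now;
    if (this.state === 'HALF_OPEN' || this.failureCount >= this.threshold) {
      this.state = 'OPEN';
    }
  }
}
```

A wrapper like `executeWithCircuitBreaker` would call `allowRequest` before the operation and `recordSuccess` or `recordFailure` afterwards, keeping the async plumbing separate from the transition rules.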

📊 Prompt Optimization and Performance Tuning

1. Prompt Strategy for Each of the Four Experts

// 1. Sleep Pattern Analyst - sleep pattern analysis expert
export class SleepPatternAnalystAgent extends BaseExpertAgent {
  protected initializePromptTemplates(): void {
    this.registerPromptTemplate({
      id: 'sleep_initial_analysis',
      template: `As a sleep pattern specialist, analyze the following data for SOL prediction:

**Sleep Metrics (Last 7 days)**:
- Average bedtime: {{avgBedtime}}
- Average sleep onset: {{avgSleepOnset}}  
- Sleep efficiency: {{sleepEfficiency}}%
- Wake episodes: {{avgWakeEpisodes}}

**Current Factors**:
{{sleepDataSummary}}

**Task**: Predict tonight's Sleep Onset Latency (SOL) in minutes.

**Response Format** (JSON):
{
  "solScore": number (0-120),
  "confidenceScore": number (0.0-1.0),
  "keyFactors": [string, string, string],
  "analysis": "Brief explanation focusing on sleep patterns (max 50 words)",
  "riskFactors": [string, string]
}

Be precise and data-driven.`,
      
      tokenEstimate: 200,
      maxOutputTokens: 300,
      version: '2.1'
    });
  }
}

// 2. Psychological State Analyst - psychological state analysis expert
export class PsychologicalStateAnalystAgent extends BaseExpertAgent {
  protected initializePromptTemplates(): void {
    this.registerPromptTemplate({
      id: 'psychological_analysis',
      template: `As a sleep psychology expert, assess psychological factors affecting SOL:

**Mental Health Indicators**:
- PHQ-9 Score: {{phq9Score}} (Depression screening)
- GAD-7 Score: {{gad7Score}} (Anxiety screening)  
- Stress Level (1-10): {{stressLevel}}
- Sleep Anxiety: {{sleepAnxiety}}

**Behavioral Patterns**:
{{psychologicalDataSummary}}

**Previous SOL**: {{previousSOL}} minutes

Predict SOL considering psychological state:

**Response Format** (JSON):
{
  "solScore": number,
  "confidenceScore": number,
  "keyFactors": ["psychological factor 1", "factor 2", "factor 3"],
  "analysis": "Psychology-focused explanation (max 50 words)",  
  "interventions": ["suggestion 1", "suggestion 2"]
}

Focus on mind-sleep connection.`,
      
      tokenEstimate: 180,
      maxOutputTokens: 280,
      version: '2.1'
    });
  }
}

// 3. CBT-I Sleep Behavior Expert - cognitive behavioral therapy expert
export class CBTISleepBehaviorExpertAgent extends BaseExpertAgent {
  protected initializePromptTemplates(): void {
    this.registerPromptTemplate({
      id: 'cbti_behavior_analysis',
      template: `As a CBT-I specialist, evaluate sleep behaviors impacting SOL:

**Sleep Hygiene Behaviors**:
- Pre-sleep routine: {{preSleepRoutine}}
- Screen time before bed: {{screenTime}} minutes
- Caffeine intake: {{caffeineIntake}}
- Exercise timing: {{exerciseTiming}}

**CBT-I Compliance**:
- Sleep restriction adherence: {{sleepRestriction}}%
- Stimulus control: {{stimulusControl}}%
- Sleep diary completion: {{sleepDiaryRate}}%

**Problematic Behaviors**: {{behavioralIssues}}

Predict SOL from CBT-I perspective:

**Response Format** (JSON):
{
  "solScore": number,
  "confidenceScore": number,
  "keyFactors": ["behavior 1", "behavior 2", "behavior 3"],
  "analysis": "CBT-I focused explanation (max 50 words)",
  "behavioralTargets": ["target 1", "target 2"],
  "complianceScore": number (0.0-1.0)
}

Emphasize behavioral modifications.`,
      
      tokenEstimate: 220,
      maxOutputTokens: 320,
      version: '2.1'
    });
  }
}

// 4. Digital Sleep Environment Expert - digital environment expert
export class DigitalSleepEnvironmentExpertAgent extends BaseExpertAgent {
  protected initializePromptTemplates(): void {
    this.registerPromptTemplate({
      id: 'digital_environment_analysis',
      template: `As a digital sleep environment specialist, analyze tech impact on SOL:

**Sleep Environment**:
- Room temperature: {{roomTemp}}°C
- Light exposure: {{lightLevel}} lux  
- Noise level: {{noiseLevel}} dB
- Air quality index: {{airQuality}}

**Digital Factors**:  
- Device usage pattern: {{deviceUsage}}
- Blue light exposure: {{blueLightHours}} hours/day
- Sleep app engagement: {{appEngagement}}%
- Smart device interference: {{smartDevices}}

**Environmental Data**: {{environmentalSummary}}

Predict SOL considering digital environment:

**Response Format** (JSON):
{
  "solScore": number,
  "confidenceScore": number,
  "keyFactors": ["env factor 1", "factor 2", "factor 3"],
  "analysis": "Environment-focused explanation (max 50 words)",
  "environmentalRisks": ["risk 1", "risk 2"],
  "optimizationScore": number (0.0-1.0)
}

Focus on environmental optimization.`,
      
      tokenEstimate: 200,
      maxOutputTokens: 300,
      version: '2.1'
    });
  }
}

2. Optimized Chain of Debate Prompts

// Optimized prompts used during the debate phase
export class DebatePromptTemplates {
  static readonly DEBATE_ROUND_TEMPLATE = `**Chain of Debate - Round {{roundNumber}}**

**My Analysis**: SOL {{mySOL}}min (confidence: {{myConfidence}})  
**Other Expert Opinions**:
{{otherAnalyses}}

**Disagreement Points**:
{{disagreements}}

As {{agentType}}, defend or revise your prediction:

**Response Format** (JSON):
{
  "revisedSOL": number (if changed),
  "revisedConfidence": number (if changed), 
  "counterArguments": ["argument 1", "argument 2"],
  "supportingEvidence": ["evidence 1", "evidence 2"],
  "finalStance": "maintain/revise",
  "reasoning": "Brief explanation (max 40 words)"
}

Be concise but convincing.`;

  static readonly CONSENSUS_BUILDING_TEMPLATE = `**Consensus Building Phase**

**Expert Predictions**:
{{expertPredictions}}

**Debate History**:
{{debateHistory}}

**Convergence Analysis**:
- Standard deviation: {{standardDeviation}}
- Agreement level: {{agreementLevel}}%

Build final consensus:

**Response Format** (JSON):
{
  "finalSOLScore": number,
  "consensusConfidence": number,  
  "consensusFactors": ["factor 1", "factor 2", "factor 3"],
  "expertAgreement": number (0.0-1.0),
  "reasoning": "Consensus explanation (max 60 words)",
  "uncertaintyAreas": ["area 1", "area 2"]
}`;
}
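The consensus template above expects `{{standardDeviation}}` and `{{agreementLevel}}`, but this document does not define how they are computed. The sketch below shows one possible definition: the population standard deviation of the experts' SOL predictions, and the percentage of experts within a tolerance of the mean. The agreement definition, the tolerance parameter, and all function names are assumptions.

```typescript
function meanOf(values: number[]): number {
  if (values.length === 0) {
    throw new Error('At least one prediction is required');
  }
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Population standard deviation of the experts' SOL predictions (minutes).
function solStandardDeviation(predictions: number[]): number {
  const avg = meanOf(predictions);
  const variance = meanOf(predictions.map((v) => (v - avg) * (v - avg)));
  return Math.sqrt(variance);
}

// Assumed definition: percentage of experts whose prediction lies within
// `toleranceMin` minutes of the mean prediction.
function solAgreementLevel(predictions: number[], toleranceMin: number): number {
  const avg = meanOf(predictions);
  const within = predictions.filter((v) => Math.abs(v - avg) <= toleranceMin).length;
  return (within / predictions.length) * 100;
}

// Four experts predict 20, 25, 30, and 45 minutes: the mean is 30,
// the variance is (100 + 25 + 0 + 225) / 4 = 87.5, and three of four
// predictions lie within 10 minutes of the mean.
const examplePredictions = [20, 25, 30, 45];
const exampleDeviation = solStandardDeviation(examplePredictions); // sqrt(87.5), about 9.35
const exampleAgreement = solAgreementLevel(examplePredictions, 10); // 75
```

Computing these values in code rather than asking the model for them keeps the convergence analysis deterministic and saves output tokens.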

🔍 Performance Monitoring and Optimization

1. Gemini API Performance Monitoring

@Injectable()
export class GeminiPerformanceMonitor {
  private readonly metrics = {
    totalRequests: 0,
    totalTokens: 0,
    totalCost: 0,
    averageLatency: 0,
    errorRate: 0,
    expertAgentUsage: new Map<ExpertAgentType, number>()
  };
  
  async recordAPICall(
    agentType: ExpertAgentType,
    promptTokens: number,
    completionTokens: number,
    latencyMs: number,
    success: boolean
  ): Promise<void> {
    this.metrics.totalRequests++;
    this.metrics.totalTokens += promptTokens + completionTokens;
    
    // Track usage per Expert Agent
    const currentUsage = this.metrics.expertAgentUsage.get(agentType) || 0;
    this.metrics.expertAgentUsage.set(agentType, currentUsage + 1);
    
    // Update the running average latency
    this.metrics.averageLatency = 
      (this.metrics.averageLatency * (this.metrics.totalRequests - 1) + latencyMs) / 
      this.metrics.totalRequests;
    
    // Track cost (calculateCost is assumed to be available on this class)
    this.metrics.totalCost += this.calculateCost(promptTokens, completionTokens);
    
    // Update the error rate as a running average over all requests
    // (1 for a failure, 0 for a success), so successes lower it again
    this.metrics.errorRate =
      (this.metrics.errorRate * (this.metrics.totalRequests - 1) + (success ? 0 : 1)) /
      this.metrics.totalRequests;
    
    // Persist the metrics to Redis
    await this.saveMetricsToRedis();
  }
  
  async getPerformanceReport(): Promise<PerformanceReport> {
    // Avoid dividing by zero before the first request is recorded.
    // Note: these ratios are per API call; one prediction issues several calls.
    const requests = Math.max(this.metrics.totalRequests, 1);
    return {
      ...this.metrics,
      costPerPrediction: this.metrics.totalCost / requests,
      tokensPerPrediction: this.metrics.totalTokens / requests,
      expertEfficiency: Array.from(this.metrics.expertAgentUsage.entries())
        .map(([agent, usage]) => ({ agent, usage, efficiency: usage / requests }))
    };
  }
}
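The running-average update used for latency in `recordAPICall` works for any per-request value, including a failure indicator. The short sketch below shows the arithmetic under that assumption. `updateRunningMean` is a hypothetical helper, and `n` counts the samples including the new one.

```typescript
// Incremental mean: newMean = (oldMean * (n - 1) + sample) / n.
function updateRunningMean(oldMean: number, sample: number, n: number): number {
  return (oldMean * (n - 1) + sample) / n;
}

// Folds a list of samples into a mean using the incremental update.
function runningMeanOf(samples: number[]): number {
  let mean = 0;
  samples.forEach((sample, index) => {
    mean = updateRunningMean(mean, sample, index + 1);
  });
  return mean;
}

// Latencies of 1000, 2000, and 3000 ms give running means of 1000, 1500, 2000.
const latencyMean = runningMeanOf([1000, 2000, 3000]);

// Outcomes success, failure, success, success encoded as 0, 1, 0, 0
// give an error rate of 1 failure in 4 requests.
const errorRateMean = runningMeanOf([0, 1, 0, 0]);
```

Encoding each request as 0 or 1 keeps the error rate between 0 and 1 and lets it fall again as successful calls accumulate.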

2. Prompt A/B Testing System

@Injectable()
export class PromptOptimizationService {
  private readonly activeTests = new Map<string, PromptABTest>();
  
  async createABTest(
    agentType: ExpertAgentType,
    promptVariants: PromptTemplate[]
  ): Promise<string> {
    const testId = `ab_test_${agentType}_${Date.now()}`;
    
    this.activeTests.set(testId, {
      id: testId,
      agentType,
      variants: promptVariants,
      results: new Map(),
      startTime: Date.now(),
      sampleSize: 100 // analyze after 100 predictions per variant
    });
    
    return testId;
  }
  
  async selectPromptVariant(testId: string): Promise<PromptTemplate> {
    const test = this.activeTests.get(testId);
    if (!test) throw new Error(`AB test not found: ${testId}`);
    
    // Uniform random selection (round-robin is an alternative)
    const variantIndex = Math.floor(Math.random() * test.variants.length);
    return test.variants[variantIndex];
  }
  
  async recordTestResult(
    testId: string,
    variantId: string,
    accuracy: number,
    tokenUsage: number,
    latency: number
  ): Promise<void> {
    const test = this.activeTests.get(testId);
    if (!test) return;
    
    const results = test.results.get(variantId) || [];
    results.push({ accuracy, tokenUsage, latency, timestamp: Date.now() });
    test.results.set(variantId, results);
    
    // Run the statistical analysis once enough samples are collected
    if (results.length >= test.sampleSize) {
      await this.analyzeABTestResults(testId);
    }
  }
}
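`analyzeABTestResults` is referenced above but not defined in this document. The sketch below shows one simple form the analysis might take: average each variant's samples and pick the variant with the highest mean accuracy. The type and function names are hypothetical, and the sketch does not run a significance test, which a production analysis would likely need.

```typescript
interface VariantSample {
  accuracy: number;
  tokenUsage: number;
  latency: number;
}

interface VariantSummary {
  variantId: string;
  samples: number;
  meanAccuracy: number;
  meanTokenUsage: number;
  meanLatency: number;
}

function averageOf(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Averages each variant's recorded samples; variants without samples are skipped.
function summarizeVariants(results: Map<string, VariantSample[]>): VariantSummary[] {
  const summaries: VariantSummary[] = [];
  results.forEach((samples, variantId) => {
    if (samples.length === 0) return;
    summaries.push({
      variantId,
      samples: samples.length,
      meanAccuracy: averageOf(samples.map((s) => s.accuracy)),
      meanTokenUsage: averageOf(samples.map((s) => s.tokenUsage)),
      meanLatency: averageOf(samples.map((s) => s.latency)),
    });
  });
  return summaries;
}

// Picks the variant with the highest mean accuracy; the first one wins a tie.
function pickBestVariant(summaries: VariantSummary[]): VariantSummary {
  if (summaries.length === 0) {
    throw new Error('No variants with samples to compare');
  }
  return summaries.reduce((best, current) =>
    current.meanAccuracy > best.meanAccuracy ? current : best,
  );
}
```

Because the summaries also carry mean token usage and latency, a later version could weigh accuracy against cost instead of ranking on accuracy alone.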

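🌐 Production Deployment and Operations

1. Per-Environment Configuration

// production.config.ts
// Note: rateLimits, circuitBreaker, and retry are not declared on the GeminiConfig interface
// defined earlier (which declares retryConfig and circuitBreakerConfig); align the two before use.
export const productionGeminiConfig: GeminiConfig = {
  apiKey: '${SECRET_MANAGER}', // placeholder; loaded from Secret Manager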
  projectId: 'dta-wide-prod',
  region: 'europe-west3',
  endpoint: 'https://generativelanguage.googleapis.com',
  
  rateLimits: {
    requestsPerMinute: 60,
    tokensPerMinute: 150000,
    dailyBudget: 1000000 // daily limit of 1 million tokens
  },
  
  circuitBreaker: {
    failureThreshold: 5,
    resetTimeout: 30000,
    monitoringWindow: 60000
  },
  
  retry: {
    maxRetries: 3,
    backoffStrategy: 'exponential',
    initialDelayMs: 1000
  }
};

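// development.config.ts
export const developmentGeminiConfig: GeminiConfig = {
  // Never commit a real key to source control; read it from the environment instead.
  // A key that was previously committed here should be treated as exposed and revoked.
  apiKey: process.env.GEMINI_API_KEY ?? '',
  projectId: 'dta-wide-dev',
  region: 'europe-west3',
  
  rateLimits: {
    requestsPerMinute: 30,
    tokensPerMinute: 50000,
    dailyBudget: 100000 // daily limit of 100,000 tokens
  }
};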

2. Monitoring and Alerting System

@Injectable()
export class GeminiAlertingService {
  private readonly alertThresholds = {
    errorRate: 0.05, // alert above a 5% error rate
    latency: 10000,  // alert above 10 seconds
    dailyCost: 100,  // alert above $100 per day
    tokenUsage: 0.8  // alert at 80% of the daily token limit
  };
  
  async checkAlertConditions(): Promise<void> {
    const metrics = await this.performanceMonitor.getPerformanceReport();
    
    // Check the error rate
    if (metrics.errorRate > this.alertThresholds.errorRate) {
      await this.sendAlert('HIGH_ERROR_RATE', {
        currentRate: metrics.errorRate,
        threshold: this.alertThresholds.errorRate
      });
    }
    
    // Check the latency
    if (metrics.averageLatency > this.alertThresholds.latency) {
      await this.sendAlert('HIGH_LATENCY', {
        currentLatency: metrics.averageLatency,
        threshold: this.alertThresholds.latency
      });
    }
    
    // Check the cost
    if (metrics.totalCost > this.alertThresholds.dailyCost) {
      await this.sendAlert('COST_THRESHOLD_EXCEEDED', {
        currentCost: metrics.totalCost,
        threshold: this.alertThresholds.dailyCost
      });
    }
  }
  
  private async sendAlert(type: string, data: any): Promise<void> {
    // Send via Slack, email, or GCP alerting
    await this.notificationService.send({
      type: 'GEMINI_API_ALERT',
      priority: 'HIGH',
      message: `Gemini API Alert: ${type}`,
      data
    });
  }
}

📈 Performance Targets and Validation

1. Performance Targets

export const GeminiPerformanceTargets = {
  // Response time targets
  maxLatencyMs: 8000,        // respond within 8 seconds
  averageLatencyMs: 3000,    // 3-second average response
  
  // Accuracy targets
  minAccuracy: 0.80,         // at least 80% accuracy
  targetAccuracy: 0.85,      // target of 85% accuracy
  
  // Availability targets
  uptime: 0.999,             // 99.9% uptime
  maxErrorRate: 0.01,        // error rate of 1% or less
  
  // Cost targets
  maxCostPerPrediction: 0.05, // 5 cents or less per prediction
  dailyBudgetLimit: 500,      // $500 or less per day
  
  // Token efficiency
  maxTokensPerPrediction: 2000, // at most 2000 tokens per prediction
  targetTokensPerPrediction: 1200 // target of 1200 tokens
};

2. Performance Validation Tests

@Injectable()
export class GeminiPerformanceValidator {
  async validatePerformanceTargets(): Promise<ValidationReport> {
    const metrics = await this.performanceMonitor.getPerformanceReport();
    // Compute accuracy once so that `actual` and `passed` use the same value
    const accuracy = await this.calculateAccuracy();
    
    return {
      latencyCheck: {
        target: GeminiPerformanceTargets.averageLatencyMs,
        actual: metrics.averageLatency,
        passed: metrics.averageLatency <= GeminiPerformanceTargets.averageLatencyMs
      },
      
      accuracyCheck: {
        target: GeminiPerformanceTargets.minAccuracy,
        actual: accuracy,
        passed: accuracy >= GeminiPerformanceTargets.minAccuracy
      },
      
      costCheck: {
        target: GeminiPerformanceTargets.maxCostPerPrediction,
        actual: metrics.costPerPrediction,
        passed: metrics.costPerPrediction <= GeminiPerformanceTargets.maxCostPerPrediction
      },
      
      tokenEfficiencyCheck: {
        // Report the same limit that `passed` is checked against
        target: GeminiPerformanceTargets.maxTokensPerPrediction,
        actual: metrics.tokensPerPrediction,
        passed: metrics.tokensPerPrediction <= GeminiPerformanceTargets.maxTokensPerPrediction
      }
    };
  }
}

🔄 Migration and Rollout Strategy

1. Phased Rollout Plan

export class GeminiMigrationPlan {
  static readonly ROLLOUT_PHASES = [
    {
      phase: 1,
      name: 'Development Testing',
      percentage: 0,
      duration: '1 week',
      criteria: 'Internal testing with synthetic data'
    },
    {
      phase: 2, 
      name: 'Canary Release',
      percentage: 5,
      duration: '1 week', 
      criteria: '5% of production traffic'
    },
    {
      phase: 3,
      name: 'Limited Production',
      percentage: 25,
      duration: '2 weeks',
      criteria: '25% traffic with performance monitoring'  
    },
    {
      phase: 4,
      name: 'Full Production',
      percentage: 100,
      duration: 'Ongoing',
      criteria: 'All traffic after validation'
    }
  ];
}

@Injectable()
export class GeminiRolloutService {
  async getCurrentRolloutPhase(): Promise<number> {
    const configValue = await this.configService.get('GEMINI_ROLLOUT_PERCENTAGE');
    return parseInt(configValue, 10) || 0;
  }
  
  async shouldUseGemini(userId: string): Promise<boolean> {
    const rolloutPercentage = await this.getCurrentRolloutPhase();
    
    if (rolloutPercentage === 0) return false; // not enabled yet
    if (rolloutPercentage === 100) return true; // fully enabled
    
    // Consistent assignment based on the user ID
    const userHash = this.hashUserId(userId);
    return userHash % 100 < rolloutPercentage;
  }
}
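`hashUserId` is used above but not defined in this document. The sketch below shows one way to get a stable bucket per user: an FNV-1a-style 32-bit hash over the ID's UTF-16 code units, reduced modulo 100. The choice of hash and both function names are assumptions; any deterministic, well-distributed hash would serve.

```typescript
// 32-bit FNV-1a-style hash over UTF-16 code units.
// Math.imul keeps the multiplication within 32 bits.
function hashUserIdSketch(userId: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < userId.length; i++) {
    hash ^= userId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// A user is in the rollout when their bucket (0-99) is below the percentage.
function isInRollout(userId: string, rolloutPercentage: number): boolean {
  if (rolloutPercentage <= 0) return false;
  if (rolloutPercentage >= 100) return true;
  return hashUserIdSketch(userId) % 100 < rolloutPercentage;
}
```

Because a user's bucket never changes, anyone included at 5% stays included at 25% and 100%, so users do not flip between the old and new paths as the rollout widens.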

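📋 Implementation Checklist

Phase 4.1: Core Infrastructure (1 week)

Implement GeminiLLMService (Port & Adapter pattern)
Implement TokenManagerService (cost tracking)
Implement RateLimiterService (API limits)
Implement CircuitBreakerService (failure recovery)
Per-environment configuration management

Phase 4.2: Expert Agent Integration (2 weeks)

BaseExpertAgent + Gemini integration
Optimized prompts for each of the four experts
Chain of Debate prompt system
Response parsing and validation logic
LangGraph StateGraph + Gemini integration

Phase 4.3: Performance Optimization (1 week)

Prompt A/B testing system
Performance monitoring dashboard
Notification and alerting system
Automated cost optimization
Token usage optimization

Phase 4.4: Production Readiness (1 week)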

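🎯 Success Criteria

Technical Success Criteria

Response time: 3 seconds or less on average, 8 seconds at most
Accuracy: SOL prediction accuracy of 80% or higher
Availability: service uptime of 99.9% or higher
Cost efficiency: 5 cents or less per prediction

Quality Attributes

Clean Architecture compliance: Dependency Inversion Principle applied throughout (100%)
Security: secure API key management and encrypted communication
Scalability: capacity for 10,000 predictions per day
Monitoring: real-time performance tracking and alerting

Business Success Criteria

Clinician satisfaction: prediction quality rated 4.5/5.0 or higher
System stability: normal service for 99.5% or more of each month
Cost control: operating costs kept within the monthly budget
Regulatory compliance: 100% compliance with GDPR and medical information protection law

Author: sol_cod_architect
Review requested: @sol_cod_pm
Last updated: 2025-09-03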
