# LLMOps & Monitoring

## Theory
LLMOps extends the MLOps stack with three concerns unique to language model systems.
| Pillar | What it covers | Example tool |
|---|---|---|
| Prompt monitoring | Log traces; track judge score, latency, tokens | Comet ML Opik |
| Guardrails | Block injection/PII in; filter PII/hallucination out | Pydantic validators |
| Continuous training | Drift-triggered retrain → eval → promote | ZenML pipeline |
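As a concrete illustration of the input-guardrail pillar, here is a minimal, dependency-free sketch. The regex patterns, `MAX_CHARS` limit, and function name are illustrative assumptions; a production guardrail would typically combine such rules with classifier models (or Pydantic validators, as in the table above) rather than regex alone.

```python
import re

# Hypothetical blocklist patterns -- real systems use trained classifiers too.
INJECTION_PATTERNS = [r"ignore (all|previous) instructions", r"reveal the system prompt"]
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.\w+")
MAX_CHARS = 4000  # assumed input length cap

def input_guardrail(prompt: str) -> tuple[bool, str]:
    """Return (allowed, reason): blocks oversized input, injection attempts, and PII."""
    if len(prompt) > MAX_CHARS:
        return False, "too long"
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, prompt, re.IGNORECASE):
            return False, "possible prompt injection"
    if EMAIL_RE.search(prompt):
        return False, "PII (email) detected"
    return True, "ok"
```

A blocked request would short-circuit before any tokens are spent on inference.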
LLM lifecycle:
1. **Input guardrail** → block injection · PII · length
2. **Inference** → model generates · tokens logged
3. **Output guardrail** → validate format · filter PII
4. **Monitor** → judge score · latency · drift alert
5. **CT pipeline** → collect traces · fine-tune · promote
CI/CD additions for LLM systems:
- Prompt templates versioned in git; changes require passing an eval suite before deploy
- Model promotion gated on the judge score clearing a threshold (4.0 / 5 in LLM Twin)
- Alerts fire when the 1-hour average judge score drops below 3.5 or p95 latency exceeds 5000 ms
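The promotion gate and alert rules above reduce to a few comparisons; a minimal sketch, assuming the 4.0 gate is inclusive and the alert window is passed in as a list of scores (function names are illustrative, not a specific tool's API):

```python
PROMOTE_THRESHOLD = 4.0   # judge-score gate, as in LLM Twin
ALERT_AVG_SCORE = 3.5     # 1-hour rolling-average floor
ALERT_P95_MS = 5000.0     # p95 latency ceiling

def should_promote(avg_judge_score: float) -> bool:
    """Gate a candidate model on its eval-suite judge score."""
    return avg_judge_score >= PROMOTE_THRESHOLD

def should_alert(window_scores: list[float], p95_latency_ms: float) -> bool:
    """Fire an alert on low quality or high latency in the window."""
    avg = sum(window_scores) / len(window_scores)
    return avg < ALERT_AVG_SCORE or p95_latency_ms > ALERT_P95_MS
```

In practice these checks would run inside the CI/CD and monitoring systems, not in application code.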