"summary": "Runbook for detecting, triaging, and mitigating failures that originate in third-party model dependencies (hosted models, external model APIs, or partner model endpoints). Focuses on ...
When something breaks in production, the evidence is scattered across logs, metrics, traces, runbooks, and Slack threads. OpenSRE is an open-source framework for AI SRE agents that resolve production ...