IncidentFox is an open-source AI SRE platform that automatically investigates production incidents end-to-end. Part of YC W2026, it was founded by Chiehmin (Jimmy) Wei (ex-Roblox, ex-Meta FAIR) and Long Yi (ex-Roblox), both with experience building distributed systems serving millions of users.
When an alert fires, IncidentFox starts an investigation in Slack threads: it queries logs, checks pod status, and correlates the alert with recent deployments, then delivers a root cause analysis with executable fix scripts. The platform ships with 300+ prebuilt integrations covering Kubernetes, AWS, Grafana, Prometheus, Datadog, Elasticsearch, PagerDuty, and GitHub. It auto-discovers each team's stack and generates the integrations it needs, reducing setup from months to under a day.
The system combines three techniques: multi-agent orchestration that routes specialist agents to sub-problems, intelligent log sampling (statistical analysis first, then targeted fetching), and 3-layer alert correlation (temporal, topology, semantic) that reduces alert noise by 85-95%. It supports 24+ LLM providers and can be deployed as SaaS, on-prem/VPC, or fully self-hosted. The core is Apache 2.0 licensed, and the free tier has full feature parity.
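To make the idea of layered alert correlation concrete, here is a minimal Python sketch. It is not IncidentFox's implementation: the `Alert` fields, the dependency map, the 120-second window, the word-overlap similarity, and the rule for combining the layers (close in time AND related by topology or wording) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    # Hypothetical alert shape, not an IncidentFox data model.
    id: str
    service: str
    timestamp: float  # seconds since some reference point
    message: str


def temporally_close(a, b, window_s=120.0):
    """Temporal layer: alerts fired within window_s seconds of each other."""
    return abs(a.timestamp - b.timestamp) <= window_s


def topologically_related(a, b, deps):
    """Topology layer: same service, or one directly depends on the other.

    deps maps a service name to the set of services it depends on.
    """
    return (
        a.service == b.service
        or b.service in deps.get(a.service, set())
        or a.service in deps.get(b.service, set())
    )


def semantically_similar(a, b, threshold=0.5):
    """Semantic layer (stand-in): Jaccard overlap of message words."""
    words_a = set(a.message.lower().split())
    words_b = set(b.message.lower().split())
    if not words_a or not words_b:
        return False
    return len(words_a & words_b) / len(words_a | words_b) >= threshold


def correlate(alerts, deps, window_s=120.0, threshold=0.5):
    """Group alerts into incidents using a small union-find structure.

    Assumed rule: two alerts are linked when they are close in time and
    are related either by topology or by message similarity.
    """
    parent = list(range(len(alerts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(alerts)):
        for j in range(i + 1, len(alerts)):
            a, b = alerts[i], alerts[j]
            if temporally_close(a, b, window_s) and (
                topologically_related(a, b, deps)
                or semantically_similar(a, b, threshold)
            ):
                parent[find(j)] = find(i)

    groups = {}
    for i, alert in enumerate(alerts):
        groups.setdefault(find(i), []).append(alert.id)
    return sorted(groups.values())
```

Calling `correlate(alerts, deps)` returns a list of alert-id groups, so several raw alerts that share a time window and a dependency edge collapse into one incident. A production system would likely replace the word-overlap check with embeddings and the pairwise loop with a time-indexed lookup.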
Free trial available
SRE and DevOps teams
IncidentFox investigates production incidents across infrastructure, while Respan monitors AI/LLM-specific issues. Together they provide comprehensive incident response covering both traditional infrastructure and AI application layers.
Last verified: March 27, 2026