SRE Intelligence Platform

Know Before You Deploy.
Recover Before It Hurts.

Strake tells you whether it's safe to push right now — based on system health, error budget burn, open incidents, and deploy velocity. When things break anyway, your runbooks are connected to alerts, your team knows what to do, and the knowledge doesn't live only in whoever has been here longest.

Teams are running more infrastructure with fewer SREs than at any point in the last decade. The tools haven't kept up.

strake / deploy-gate
Current conditions:
feat/checkout-v2 → main
payment-service · 47 files changed · prod-us-east-1
Strake Recommendation
Hold Deploy

Error budget at 18% remaining with SLO window closing in 72 hours. 2 active incidents on service dependencies. 3 deploys in the last 2 hours. Risk of cascading failure is elevated.

Error Budget: 18%
System Health: Degraded
Open Incidents: 2 active
Deploy Velocity: 3 / 2hr
Active Runbooks: 2 triggered
RB-14 · postgres-high-connection-count · Live
RB-07 · payment-service-5xx-spike · Monitoring
Updated just now
01
How Strake Compounds

The system gets smarter
with every incident.

This is not two disconnected products. Deploy Gate and Runbooks feed each other — and feed back into a risk model that improves over time. That's the flywheel.

// 01
Deploy Gate evaluates the push

Before every deploy, Strake reads system health, error budget, incident state, and change velocity. It gives a GO or HOLD recommendation — with reasoning.

// 02
Incident fires — runbook activates

When an alert triggers, the connected runbook opens. Steps are tracked. The engineer works through it — what they found, what they decided, what resolved it.

// 03
Postmortem closes the loop

Every incident record becomes a data point — what the deploy context was, which runbook ran, how long resolution took, whether the gate called it right or wrong.

// 04
Risk model gets smarter

Strake uses the accumulated incident history to refine what signals actually predict failures in your specific system. After 12 months, you have a model no competitor can replicate.

Every deploy Strake evaluates makes the next risk signal more accurate. Every incident it runs makes the next runbook faster. The data moat compounds — and it's built from your own production history, not a generic model.

02
Deploy Gate

What Strake reads
before you push.

Every deploy decision is based on four live signals pulled from your existing stack. No new agents. No new dashboards. Strake reads what's already there and tells you what it means right now.

Signal 01 — SLO Budget

How much error budget is left in the current window, and how fast it's burning. Strake flags when a deploy risks exhausting the rest of it before the window closes.
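As a rough illustration of the budget check described above, here is a minimal sketch in Python. It assumes a constant burn rate, and the field names, the `budget_at_risk` helper, and the burn figures are hypothetical; this is not Strake's actual model.

```python
from dataclasses import dataclass


@dataclass
class BudgetStatus:
    remaining_fraction: float    # share of the error budget still unspent, e.g. 0.18
    burn_per_hour: float         # share of the total budget consumed per hour (assumed constant)
    hours_left_in_window: float  # time until the SLO window closes


def hours_until_exhausted(remaining_fraction: float, burn_per_hour: float) -> float:
    """Hours until the budget reaches zero at the given burn rate."""
    if burn_per_hour <= 0:
        return float("inf")
    return remaining_fraction / burn_per_hour


def budget_at_risk(status: BudgetStatus, extra_burn_per_hour: float = 0.0) -> bool:
    """True if the budget would run out before the window closes.

    extra_burn_per_hour is a hypothetical estimate of the additional burn
    a deploy might introduce.
    """
    projected_burn = status.burn_per_hour + extra_burn_per_hour
    return hours_until_exhausted(status.remaining_fraction, projected_burn) < status.hours_left_in_window


# 18% left and 72 hours to go, as in the payment-service example; the burn rates are made up.
fast_burn = BudgetStatus(remaining_fraction=0.18, burn_per_hour=0.003, hours_left_in_window=72)
slow_burn = BudgetStatus(remaining_fraction=0.18, burn_per_hour=0.001, hours_left_in_window=72)
print(budget_at_risk(fast_burn))  # True: roughly 60 hours of budget left, 72 hours of window
print(budget_at_risk(slow_burn))  # False: roughly 180 hours of budget left
```

A real check would likely look at burn over several lookback windows rather than a single constant rate.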

Signal 02 — Active Incidents

Open incidents across the service and its direct dependencies. Deploying into an active incident almost always makes things worse and makes the root cause harder to find.

Signal 03 — Change Velocity

How many deploys have gone out in the last few hours. High velocity makes root cause isolation nearly impossible when something breaks.

Signal 04 — System Health

Current health of the target service and its dependency graph — latency, error rates, resource saturation. The baseline you're deploying into matters.
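To make the four signals concrete, here is a hedged sketch of how they might combine into a GO or HOLD call with reasons. The thresholds and names (`MIN_BUDGET_PCT`, `MAX_DEPLOYS_2H`, `evaluate`) are illustrative assumptions, not Strake's actual rules, and the page describes a risk model that learns from history rather than fixed cutoffs.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical thresholds, chosen only for illustration.
MIN_BUDGET_PCT = 25.0
MAX_DEPLOYS_2H = 3


@dataclass
class Signals:
    budget_remaining_pct: float  # Signal 01: SLO budget
    open_incidents: int          # Signal 02: active incidents (service + direct dependencies)
    deploys_last_2h: int         # Signal 03: change velocity
    health: str                  # Signal 04: "Nominal", "Elevated", "Degraded", or "Critical"


def evaluate(s: Signals) -> Tuple[str, List[str]]:
    """Return ("GO" | "HOLD", reasons). Any tripped signal holds the deploy."""
    reasons: List[str] = []
    if s.budget_remaining_pct < MIN_BUDGET_PCT:
        reasons.append(f"error budget at {s.budget_remaining_pct:.0f}% remaining")
    if s.open_incidents > 0:
        reasons.append(f"{s.open_incidents} open incident(s)")
    if s.deploys_last_2h >= MAX_DEPLOYS_2H:
        reasons.append(f"{s.deploys_last_2h} deploys in the last 2 hours")
    if s.health != "Nominal":
        reasons.append(f"system health is {s.health}")
    return ("HOLD" if reasons else "GO", reasons)


# Sample readings taken from the payment-service and api-gateway rows on this page.
payment = evaluate(Signals(18, 2, 3, "Degraded"))
gateway = evaluate(Signals(84, 0, 1, "Nominal"))
print(payment[0], payment[1])
print(gateway[0], gateway[1])
```

Returning the reasons alongside the decision matches the "recommendation with reasoning" idea: the engineer sees why the gate said HOLD, not just that it did.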

strake / deploy-gate · all services · updated 12s ago
Service           Budget  Incidents  Velocity  Health    Gate
payment-service   18%     2          3/2hr     Degraded  HOLD
api-gateway       84%     0          1/2hr     Nominal   GO
checkout-svc      41%     1          2/2hr     Elevated  HOLD
user-service      92%     0          0/2hr     Nominal   GO
notification-svc  6%      1          5/2hr     Critical  HOLD
search-indexer    77%     0          1/2hr     Nominal   GO
Services monitored: 6
Currently blocked: 3
Clear to deploy: 3

// The Moat

Every incident Strake runs makes the next deploy decision smarter. After 12 months, your deployment and incident history is a risk model trained on your specific system: a model no competitor can replicate, because they don't have your data.

RB-14 · postgres-high-connection-count · LIVE
Alert: postgres.conn > 90%
Service: postgres-primary
Connections: 100 / 100
On-call: @mwhelan
Steps: 4 / 6 complete
01 · Verify alert is not spurious · 02:14:18
✓ Confirmed: connections at 100/100, p99 latency 1840ms
02 · Check for long-running queries · 02:14:31
✓ Found 14 queries > 30s, all from payment-service v2.1.7
03 · Verify pgBouncer pool configuration · 02:14:47
✓ pool_size: 100 · max_client_conn: 100 · pool_mode: transaction
04 · Correlate with recent deploys · 02:15:02
✓ payment-service v2.1.7 deployed 02:13:44, N+1 query pattern introduced
05 · Resize pool or initiate rollback · In progress
Decision required: pool resize (faster) vs rollback v2.1.7 (safer). See notes for tradeoffs.
06 · Verify recovery and close incident
Monitor error rate for 10 min · update incident record · add postmortem note
This runbook has run 7 times · last updated 2 days ago by @mwhelan
03
Runbook Engine

The Notion page
from 2021 is not
a runbook.

Strake connects runbooks directly to the alerts that trigger them. When PagerDuty fires, the right runbook opens — not a search bar, not a Notion space, not a Slack message asking who knows what to do.

Steps are tracked, along with what the engineer found and what they decided. Every time the runbook runs, the record gets richer. The next engineer who gets paged starts from that, not from zero.
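One way to picture the alert-to-runbook link and the tracked steps is the small sketch below. The routing table, the class names, and the `open_runbook` function are assumptions made for illustration; the step titles come from the RB-14 example on this page, and no real PagerDuty API is called.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StepRecord:
    title: str
    finding: str = ""  # what the engineer found or decided
    done: bool = False


@dataclass
class RunbookRun:
    runbook_id: str
    steps: List[StepRecord]

    def complete(self, index: int, finding: str) -> None:
        """Mark a step done and keep the engineer's note with it."""
        self.steps[index].finding = finding
        self.steps[index].done = True

    def progress(self) -> str:
        done = sum(1 for step in self.steps if step.done)
        return f"{done} / {len(self.steps)} complete"


# Hypothetical routing table: alert name -> runbook ID.
ALERT_ROUTES: Dict[str, str] = {"postgres.conn": "RB-14"}

# Step titles for RB-14, as shown in the example runbook.
RUNBOOK_STEPS: Dict[str, List[str]] = {
    "RB-14": [
        "Verify alert is not spurious",
        "Check for long-running queries",
        "Verify pgBouncer pool configuration",
        "Correlate with recent deploys",
        "Resize pool or initiate rollback",
        "Verify recovery and close incident",
    ],
}


def open_runbook(alert_name: str) -> Optional[RunbookRun]:
    """Start a tracked run of the runbook linked to this alert, if one exists."""
    runbook_id = ALERT_ROUTES.get(alert_name)
    if runbook_id is None:
        return None
    return RunbookRun(runbook_id, [StepRecord(title) for title in RUNBOOK_STEPS[runbook_id]])


run = open_runbook("postgres.conn")
run.complete(0, "Confirmed: connections at 100/100, p99 latency 1840ms")
print(run.runbook_id, run.progress())  # RB-14 1 / 6 complete
```

Keeping the finding on each step is what lets the next run start from the previous engineer's notes instead of from zero.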

RB-14 · Incident History
Date    Resolution                  Time
Jan 15  Pool resize · resolved      22min
Dec 28  Rollback v2.0.4 · resolved  41min
Dec 09  Query kill + pool flush     18min
Nov 22  Escalated · manual DBA      94min
// What this means

The Nov 22 incident took 94 minutes. The last three averaged 27 minutes. That's the runbook getting smarter — and it's the clearest signal of what Strake actually does.
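The arithmetic behind that comparison is easy to check. The sketch below uses the four resolution times from the RB-14 history; the variable and function names are illustrative only.

```python
from statistics import mean

# (date, minutes to resolve), oldest first, from the RB-14 incident history.
history = [("Nov 22", 94), ("Dec 09", 18), ("Dec 28", 41), ("Jan 15", 22)]


def mean_resolution(runs, last_n):
    """Average resolution time in minutes over the most recent last_n runs."""
    return mean([minutes for _, minutes in runs[-last_n:]])


first_run_minutes = history[0][1]
recent_average = mean_resolution(history, 3)
print(first_run_minutes)  # 94
print(recent_average)     # 27, i.e. (18 + 41 + 22) / 3
```

Four runs is a small sample, so the trend is suggestive rather than conclusive.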

Built for the stack you already run
Datadog · PagerDuty · GitHub Actions · Prometheus · Grafana · Kubernetes · OpsGenie · Slack

+ CloudWatch · Terraform · Loki · Confluence · Notion · GCP Cloud Run · AWS ECS · and more

The runbook for our database failover lived in a Confluence page that hadn't been opened in 14 months. Found that out at 3am on a Sunday.

Senior SRE · Series B fintech

We had a deploy gate. It was a Slack message: "anyone know if it's okay to push right now?" Someone always said yes. Then we'd find out.

Staff Engineer · Infrastructure team

The tribal knowledge problem is real. Three incidents in the last year that came down to one person not knowing what another person knew.

VP Engineering · 80-person startup

Built for the engineer
who gets the page.

Strake is in private beta with a small cohort of senior engineers and SRE leads. If you're on-call and your MTTR isn't where it needs to be, we want to talk.

6 of 40 spots remaining
Private Beta · Controlled Access
40 teams · by application only
Apply for Access · Book a Demo
No sales script. No demo theater. A real conversation with the team.