All systems operational

API

api.futurmix.one — run creation, polling, event streams

Operational 99.98%90-day uptime

Agent Runtime

Workflow execution, step orchestration, model routing

Operational 99.98%90-day uptime

Dashboard

Web console, usage analytics, cost attribution

Operational 100.00%90-day uptime

Webhooks

Outbound event delivery to customer endpoints

Operational 100.00%90-day uptime
Operational Degraded performance

Each cell is one UTC day in the 90-day window ending . Degraded-performance minutes are weighted against component uptime by the share of requests affected.

Incident history

What happened, when, and what we changed afterward.

April 2026
Elevated run queue latency Degraded performance API · Agent Runtime Resolved · 38 min
  • Apr 18, 14:02 UTC Investigating Internal SLO monitors alerted on elevated p95 queue wait for new runs. Run creation succeeding, but time-to-first-step rising across regions. Investigating queue infrastructure.
  • Apr 18, 14:11 UTC Identified Cause identified as queue shard hot-spotting: the shard assignment function shipped with the April 9 parallel-execution rollout keyed on workspace ID, so large workspaces fanning out many concurrent branches concentrated load onto two of sixteen shards. p95 queue wait peaked at 3.1s against a 240ms baseline.
  • Apr 18, 14:26 UTC Monitoring Shard assignment rebalanced and temporary enqueue smoothing applied to the affected shards. Queue wait times returning to baseline; monitoring for regression under sustained load.
  • Apr 18, 14:40 UTC Resolved Latency back to baseline across all shards. Total impact 38 minutes of degraded performance. No runs failed and no data was lost; affected runs completed with longer queue waits.
Root cause. The 2.4x parallel-execution rollout changed enqueue patterns: the shard assignment function keyed on workspace ID, so large workspaces fanning out many concurrent branches concentrated load onto a small set of queue shards. Under a period of elevated batch traffic, two shards hot-spotted while fourteen sat under capacity. Shard assignment now keys on workspace plus branch with bounded rebalancing, per-shard depth alarms page before queue wait becomes customer-visible, and throughput-affecting rollouts are load-tested against recorded peak traffic before deployment.
February 2026
Webhook delivery delays Degraded performance Webhooks Resolved · 22 min
  • Feb 3, 09:47 UTC Investigating Webhook delivery queue depth growing; deliveries to customer endpoints delayed. API, Agent Runtime, and Dashboard unaffected — runs completing normally.
  • Feb 3, 09:55 UTC Identified A scheduled certificate rotation on the webhook egress proxy completed without reloading long-lived worker processes, which continued presenting the previous client certificate. Deliveries to endpoints validating the client certificate chain failed TLS handshake and entered retry with exponential backoff.
  • Feb 3, 10:04 UTC Monitoring Rolling restart of egress workers picked up the rotated certificate. Handshakes succeeding; delivery backlog draining at full rate.
  • Feb 3, 10:09 UTC Resolved Backlog fully drained. Total impact 22 minutes of delayed deliveries. No webhook events were lost; all deliveries completed under at-least-once semantics.
Root cause. The certificate rotation procedure on the egress proxy updated certificates on disk but did not signal long-lived worker processes to reload them, leaving workers pinned to a stale client certificate until restart. The rotation pipeline now sends a reload signal to every egress worker and runs a canary delivery against a reference endpoint before the rotation is marked complete.

No other incidents since platform GA in January 2026.

Scheduled maintenance

None planned

The platform deploys continuously without downtime windows. Any maintenance that could affect availability is announced here at least 72 hours in advance.

Subscribe to updates

Incident notifications and maintenance announcements go out by email as updates post.

Subscribe to Updates