Brief queue backlog (~38 minutes) caused by a third-party webhook destination hosted on a US-East CDN that was returning 502s for ~4% of requests. Our retry workers handled the individual failures correctly, but overall queue throughput dropped while those deliveries waited out the slow tier-2 backoff window. Mitigated by raising the worker concurrency limit from 32 to 80 for the duration; permanently fixed by per-destination concurrency limits in v2.4.0. No deliveries were lost.
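The permanent fix caps how many workers a single destination can occupy, so one slow endpoint can no longer starve the shared pool. A minimal sketch of that idea, assuming a semaphore per destination (class and method names are hypothetical, not our production code):

```python
import threading
from collections import defaultdict


class PerDestinationLimiter:
    """Cap in-flight deliveries per webhook destination so one slow or
    failing endpoint cannot monopolize the shared worker pool.
    Hypothetical sketch; names and the default limit are illustrative."""

    def __init__(self, per_destination_limit=8):
        self._lock = threading.Lock()
        self._sems = defaultdict(
            lambda: threading.Semaphore(per_destination_limit))

    def _sem_for(self, destination):
        # Guard defaultdict creation so two threads can't race on insert.
        with self._lock:
            return self._sems[destination]

    def deliver(self, destination, send_fn):
        """Attempt one delivery; return False if the destination is already
        at its concurrency cap so the caller can re-queue the event."""
        sem = self._sem_for(destination)
        if not sem.acquire(blocking=False):
            return False
        try:
            send_fn()
            return True
        finally:
            sem.release()
```

A saturated destination makes `deliver` return `False` immediately instead of tying up a worker, which is what keeps one bad CDN from dragging down overall throughput.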
Pre-announced maintenance window (notification sent 72h in advance to all customers). Read-only mode for 12 minutes during the cut-over. Events submitted during the window were queued in our edge buffer and processed once the primary returned. Total delay for affected events: 11–13 minutes. Total events delayed: 1,847.
Dashboard event-list query slowed from ~80ms to ~6s for a subset of users after the daily index rebuild ran longer than expected. The API itself was unaffected; only the in-browser dashboard was slow. Rolled forward by adding the missing partial index on (project_id, created_at DESC). This surfaced because we don't run the index rebuild on the read replica; we'll fix that next quarter.
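The fix amounts to adding an index that matches the dashboard query's filter and sort. A minimal reproduction with SQLite standing in for the production database (table shape and index name are assumptions, not our schema):

```python
import sqlite3

# Illustration only: SQLite stands in for the production database. The
# point is that an index on (project_id, created_at DESC) lets a
# "filter by project, newest first" query run off the index instead of
# scanning and sorting the whole table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    "id INTEGER PRIMARY KEY, project_id INTEGER, created_at TEXT)")
conn.execute(
    "CREATE INDEX idx_events_project_created "
    "ON events (project_id, created_at DESC)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id FROM events "
    "WHERE project_id = ? ORDER BY created_at DESC LIMIT 50",
    (1,),
).fetchall()
print(plan)  # the plan detail should reference idx_events_project_created
```

Without the index, the same query plan shows a full table scan followed by a sort step, which is the shape of the ~80ms to ~6s regression.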
Less than 0.4% of POST /v1/events requests returned 503 for ~4 minutes after our ingest worker pool was restarted as part of a routine deploy. Requests sent via an SDK were retried automatically and succeeded on the retry. Requests sent without an SDK should already have been re-submitted by your own retry logic. If you don't retry failed POSTs, that's the lesson: wrap them in a retry with backoff.
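If you post to the API without an SDK, a small backoff wrapper is enough to ride out a window like this one. A sketch, assuming `send_once` is your function that performs one POST and returns the HTTP status code (the official SDKs' exact retry policy may differ):

```python
import random
import time


def post_with_retry(send_once, attempts=4, base_delay=0.5):
    """Call `send_once` (one POST returning an HTTP status code) until it
    returns a non-5xx status, retrying with exponential backoff plus a
    little jitter. Connection errors (OSError) are treated as retryable.
    Sketch only; tune attempts and delays for your workload."""
    for attempt in range(attempts):
        try:
            status = send_once()
        except OSError:
            status = None  # network error: retry like a 5xx
        if status is not None and status < 500:
            return status  # success or a non-retryable client error
        if attempt < attempts - 1:
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    raise RuntimeError(f"POST failed after {attempts} attempts")
```

With the defaults this retries at roughly 0.5s, 1s, and 2s, which comfortably covers a brief 503 window without hammering the endpoint.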
Let's Encrypt rate limit briefly hit during certificate renewal because too many subdomains were renewed at once. Existing certificates were still valid for another 28 days at the time, so there was no production impact. Renewals are now staggered. Included here for transparency; this was internal-only, not customer-facing.
Want incident notifications? Subscribe at hello@hookcastle.com or follow @hookcastle. RSS feed: /status/feed.xml.