Observability Stack
TL;DR — Logz.io is the single platform for logs, errors, and alerts. Correlation IDs (injected via an internal shared NestJS package) let you trace a request across microservices. Alerts are delivered to Slack.
Overview
Observability at Securitize is concentrated in one tool (Logz.io) combined with a shared library pattern that propagates correlation IDs across service calls. The goal is that every production error can be traced back from symptom → originating request.
Components
Logz.io
- What it covers: monitoring, error/log management, alert configuration.
- URL: see databases-and-services.md.
- Alert delivery: certain error types (e.g., failing events) are configured to notify Slack channels.
Correlation IDs
- Propagated via an internal NestJS shared package (part of nestjs-shared; see shared-libraries.md).
- Every inbound request either arrives with a correlation ID or has one generated for it.
- All outbound service calls forward the correlation ID in headers.
- Logz.io queries can filter by correlation ID to reconstruct full request traces across microservices.
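The get-or-generate and forward-in-headers behaviour described above can be sketched roughly as follows. This is a minimal, framework-agnostic illustration, not the code in nestjs-shared: the header name `x-correlation-id` and the helpers `resolveCorrelationId` and `withCorrelationId` are hypothetical names chosen for the example.

```typescript
import { randomUUID } from "node:crypto";

// Assumption: the real header name is defined by nestjs-shared and may differ.
const CORRELATION_HEADER = "x-correlation-id";

type HeaderMap = Record<string, string | undefined>;

// Reuse the caller's correlation ID when present; otherwise generate a new one.
function resolveCorrelationId(inbound: HeaderMap): string {
  const existing = inbound[CORRELATION_HEADER];
  return existing !== undefined && existing.trim() !== "" ? existing : randomUUID();
}

// Return a copy of the outbound headers with the correlation ID attached,
// so the next service can continue the same trace.
function withCorrelationId(outbound: HeaderMap, correlationId: string): HeaderMap {
  return { ...outbound, [CORRELATION_HEADER]: correlationId };
}

// Usage: an inbound request that already carries an ID keeps it downstream.
const correlationId = resolveCorrelationId({ "x-correlation-id": "abc-123" });
const downstreamHeaders = withCorrelationId({ accept: "application/json" }, correlationId);
console.log(downstreamHeaders);
```

The lookup assumes lower-cased header names, which is how Node's HTTP server exposes inbound headers. Including the same ID as a field in every log line is what makes the Logz.io filter possible.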
Slack alerts
- Certain high-signal errors (e.g., failing events, pipeline failures) post to Slack.
- Channel routing is configured per alert in Logz.io.
Usage patterns¶
Debugging a user-reported issue
- Ask the user for (or find in the logs) a correlation ID or an approximate timestamp.
- In Logz.io, filter by correlation ID or time window.
- Follow the trace across services — Logz.io shows the chain from entry point to failure.
- Cross-reference with the service deploy (ECR commit hash — see jenkins-k8s-jobs.md) to confirm which code is running.
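To make the filter-and-follow steps concrete, the sketch below does the same thing in memory on a handful of made-up log records: keep the records for one correlation ID, order them by time, and find the first error. The `LogRecord` shape, the service names, and the sample data are all assumptions for illustration; real field names depend on how each service logs to Logz.io.

```typescript
// Hypothetical shape of an exported log record (not the actual Logz.io schema).
interface LogRecord {
  timestamp: string; // ISO 8601
  service: string;
  level: "info" | "warn" | "error";
  correlationId?: string;
  message: string;
}

// Keep only the records for one request and order them chronologically.
// filter() returns a new array, so the caller's input is not mutated by sort().
function reconstructTrace(records: LogRecord[], correlationId: string): LogRecord[] {
  return records
    .filter((r) => r.correlationId === correlationId)
    .sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
}

// The first error in the ordered trace is usually where to start reading.
function firstError(trace: LogRecord[]): LogRecord | undefined {
  return trace.find((r) => r.level === "error");
}

// Made-up sample data; service names are placeholders.
const sampleRecords: LogRecord[] = [
  { timestamp: "2024-01-01T10:00:00.200Z", service: "payments", level: "error", correlationId: "abc-123", message: "charge failed" },
  { timestamp: "2024-01-01T10:00:00.000Z", service: "gateway", level: "info", correlationId: "abc-123", message: "request received" },
  { timestamp: "2024-01-01T10:00:00.050Z", service: "gateway", level: "info", correlationId: "zzz-999", message: "request received" },
  { timestamp: "2024-01-01T10:00:00.100Z", service: "orders", level: "info", correlationId: "abc-123", message: "order created" },
];

const trace = reconstructTrace(sampleRecords, "abc-123");
console.log(trace.map((r) => r.service).join(" -> ")); // gateway -> orders -> payments
console.log(firstError(trace)?.service); // payments
```

Records without a correlation ID (for example from services that do not use the shared package) drop out of the trace, which is why a time-window search is the fallback.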
Current state notes
- Correlation ID emission varies across services; some legacy Express services do not use the shared package.
- Alert configuration lives inside Logz.io (not backed by IaC).
- Formal SLO/SLI documentation is not yet in place.
See also
- Databases & External Services: Logz.io URL and account details.
- Shared Libraries: nestjs-shared hosts the correlation-ID middleware.
- Rollback & Incidents: how alerts trigger incident response.