Monitoring¶
Monitor PolyBot with Prometheus and Grafana.
Quick Setup¶
Access:
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000 (admin/polybot123)
Metrics¶
PolyBot exposes Prometheus metrics at /metrics:
Request Metrics¶
# HTTP request duration
polybot_http_request_duration_seconds{method, endpoint, status}
# HTTP request count
polybot_http_requests_total{method, endpoint, status}
Trading Metrics¶
# Orders placed
polybot_orders_total{strategy, side, status}
# Position value
polybot_position_value_usd{strategy, market}
# P&L
polybot_pnl_total{strategy}
polybot_pnl_unrealized{strategy}
Strategy Metrics¶
# Signals generated
polybot_signals_total{strategy, action}
# Scans performed
polybot_scans_total{strategy}
# Strategy running status
polybot_strategy_running{strategy}
System Metrics¶
# Active WebSocket connections
polybot_websocket_connections
# NNG message queue depth
polybot_nng_queue_depth{channel}
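To inspect these metrics without Prometheus, the text output of /metrics can be read directly. The following is a minimal sketch, not part of PolyBot: the helper names `parse_metrics` and `fetch_metrics` are hypothetical, the URL assumes the service listens on localhost:8000 as in the health check examples, and the parser covers only the common `name{labels} value` line shape.

```python
import re
from urllib.request import urlopen

# Matches sample lines such as: polybot_orders_total{strategy="arb",side="buy"} 12
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text exposition into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and HELP/TYPE comments
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(url="http://localhost:8000/metrics"):
    """Download and parse the metrics page (the URL is an assumption)."""
    with urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

For example, `[s for s in fetch_metrics() if s[0] == "polybot_orders_total"]` would list the order counters with their `strategy`, `side`, and `status` labels.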
Prometheus Configuration¶
deploy/prometheus/prometheus.yml:
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'polybot'
    static_configs:
      - targets: ['polybot:8000']
    metrics_path: '/metrics'
    scrape_interval: 10s
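Once the scrape job is running, one way to confirm that Prometheus reaches PolyBot is to ask its HTTP API for the built-in `up` metric of the `polybot` job. This is a hedged sketch: the function names are hypothetical, and the base URL assumes Prometheus is reachable at localhost:9090 as listed under Quick Setup.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(expr, base_url="http://localhost:9090"):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


def extract_values(payload):
    """Return (labels, value) pairs from an instant-query JSON response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    return [
        (series["metric"], float(series["value"][1]))
        for series in payload["data"]["result"]
    ]


def instant_query(expr, base_url="http://localhost:9090"):
    """Run an instant query and return its (labels, value) pairs."""
    with urlopen(build_query_url(expr, base_url), timeout=5) as resp:
        return extract_values(json.load(resp))
```

Calling `instant_query('up{job="polybot"}')` should return a value of 1.0 for each target that was scraped successfully and 0.0 for targets that failed.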
Grafana Dashboards¶
Import Dashboard¶
- Open Grafana (http://localhost:3000)
- Go to Dashboards → Import
- Upload deploy/grafana/dashboards/polybot.json
Key Panels¶
- P&L Overview: Total and per-strategy P&L
- Order Activity: Orders by status and strategy
- Position Summary: Open positions and exposure
- Signal Rate: Signals per minute by strategy
- API Performance: Request latency and error rates
Alerting¶
Prometheus Alerts¶
# deploy/prometheus/alerts.yml
groups:
  - name: polybot
    rules:
      - alert: HighErrorRate
        expr: rate(polybot_http_requests_total{status="5xx"}[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: High error rate detected
      - alert: DailyLossLimit
        expr: polybot_pnl_total < -500
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: Daily loss limit approaching
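The HighErrorRate threshold is a per-second rate, not a percentage: 0.1 corresponds to roughly 30 matching requests over the five-minute window. Because the expression has no `sum()`, each label combination is compared against the threshold on its own. The arithmetic can be sketched as below, using made-up counter samples; note that Prometheus's real `rate()` also handles counter resets and extrapolates to the window edges, which this simplification ignores.

```python
def simple_rate(first_value, last_value, window_seconds):
    """Per-second counter increase over a window (no reset handling or extrapolation)."""
    return (last_value - first_value) / window_seconds


# Hypothetical samples of one 5xx series, taken five minutes apart.
window = 5 * 60
rate = simple_rate(first_value=1200, last_value=1245, window_seconds=window)

# Same comparison as the alert expression: 45 errors / 300 s = 0.15 per second.
fires = rate > 0.1
print(f"rate={rate:.2f}/s fires={fires}")
```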
Grafana Alerts¶
Configure alerts in Grafana:
1. Edit panel → Alert tab
2. Set conditions
3. Configure notification channels
Health Checks¶
Endpoints¶
# Liveness (is the process running?)
curl http://localhost:8000/health/live
# Readiness (is the service ready for traffic?)
curl http://localhost:8000/health/ready
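The same checks can be scripted, for example to hold a deployment step until the service is ready. This is a minimal sketch with hypothetical helper names; it assumes a healthy endpoint answers with HTTP 200 and treats any other status, timeout, or connection error as unhealthy.

```python
import time
from urllib.error import URLError
from urllib.request import urlopen


def is_healthy(url, timeout=2.0):
    """Return True if the endpoint answers with HTTP 200, False on any error."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):  # covers HTTP errors, refused connections, timeouts
        return False


def wait_until_ready(url, attempts=10, delay=1.0):
    """Poll the endpoint until it is healthy or the attempts run out."""
    for _ in range(attempts):
        if is_healthy(url):
            return True
        time.sleep(delay)
    return False
```

A startup script could call `wait_until_ready("http://localhost:8000/health/ready")` and abort if it returns False.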
Docker Health Check¶
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8000/health/live"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 60s
Logging¶
Structured Logging¶
Set LOG_FORMAT=json for structured logs:
{
  "timestamp": "2026-01-01T12:00:00Z",
  "level": "INFO",
  "message": "Order filled",
  "order_id": "123",
  "strategy": "arbitrage",
  "fill_price": 0.45
}
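One benefit of JSON logs is that they can be filtered by field rather than by text matching. The sketch below assumes one JSON object per line with the field names shown in the example above; the helper names `iter_events` and `fills_for` are hypothetical.

```python
import json


def iter_events(lines):
    """Yield parsed log records, skipping lines that are not JSON objects."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate stray plain-text lines in the stream
        if isinstance(record, dict):
            yield record


def fills_for(lines, strategy):
    """Return the 'Order filled' records logged by one strategy."""
    return [
        record
        for record in iter_events(lines)
        if record.get("message") == "Order filled"
        and record.get("strategy") == strategy
    ]
```

For instance, passing an open log file and `"arbitrage"` to `fills_for` would return every fill record for that strategy, with `fill_price` available as a number.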
Log Aggregation¶
For centralized logging, configure one of:
- Loki (with Grafana)
- Elasticsearch (with Kibana)
- CloudWatch Logs
Best Practices¶
- Set up alerts for critical conditions
- Monitor P&L daily
- Track error rates for early warning
- Review dashboards regularly
- Set an appropriate metrics retention period (15-30 days)