Monitoring & Observability
1. Overview¶
ATTEST provides comprehensive monitoring and observability features through metrics, logging, and distributed tracing to ensure pipeline health and performance.
2. Metrics¶
2.1 Prometheus Integration¶
# Enable Prometheus metrics
monitoring:
prometheus:
enabled: true
endpoint: ":9090"
path: "/metrics"
namespace: "attest"
2.2 Core Metrics¶
# Pipeline metrics
attest_pipeline_executions_total{status="success|failure"}
attest_pipeline_duration_seconds{pipeline="name"}
attest_step_duration_seconds{step="name",status="success|failure"}
# Cache metrics
attest_cache_hits_total{backend="local|s3|redis"}
attest_cache_misses_total{backend="local|s3|redis"}
attest_cache_size_bytes{backend="local|s3|redis"}
# Verification metrics
attest_verifications_total{type="signature|receipt|policy"}
attest_verification_failures_total{reason="invalid_signature|expired|policy_violation"}
attest_verification_duration_seconds{type="signature|receipt|policy"}
3. Logging¶
3.1 Structured Logging¶
{
"timestamp": "2024-12-01T14:30:52Z",
"level": "INFO",
"target": "attest::pipeline",
"message": "Pipeline execution completed",
"fields": {
"pipeline": "build-test-deploy",
"duration_ms": 45230,
"steps_total": 5,
"steps_success": 5,
"cache_hit_rate": 0.8
}
}
3.2 Log Levels¶
# Configure log levels
export ATTEST_LOG_LEVEL=debug
attest run --log-level info
attest config set log_level warn
4. Distributed Tracing¶
4.1 OpenTelemetry Integration¶
# Enable distributed tracing
tracing:
enabled: true
service_name: "attest-pipeline"
exporter:
type: "jaeger"
endpoint: "http://jaeger:14268/api/traces"
# Trace sampling
sampling:
rate: 0.1 # Sample 10% of traces
always_sample_errors: true
4.2 Trace Context¶
# Pipeline execution with tracing
TRACEPARENT=00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01 \
attest run --trace
5. Dashboards¶
5.1 Grafana Dashboard¶
{
"dashboard": {
"title": "ATTEST Pipeline Monitoring",
"panels": [
{
"title": "Pipeline Success Rate",
"type": "stat",
"targets": [
{
"expr": "rate(attest_pipeline_executions_total{status=\"success\"}[5m]) / rate(attest_pipeline_executions_total[5m])"
}
]
},
{
"title": "Cache Hit Rate",
"type": "stat",
"targets": [
{
"expr": "rate(attest_cache_hits_total[5m]) / (rate(attest_cache_hits_total[5m]) + rate(attest_cache_misses_total[5m]))"
}
]
}
]
}
}
6. Alerting¶
6.1 Prometheus Alerts¶
groups:
- name: attest-pipeline
rules:
- alert: PipelineFailureRate
expr: rate(attest_pipeline_executions_total{status="failure"}[5m]) > 0.1
for: 2m
annotations:
summary: "High pipeline failure rate detected"
- alert: CachePerformanceDegraded
expr: rate(attest_cache_hits_total[5m]) / (rate(attest_cache_hits_total[5m]) + rate(attest_cache_misses_total[5m])) < 0.5
for: 5m
annotations:
summary: "Cache hit rate below threshold"
7. Health Checks¶
7.1 Endpoint Configuration¶
# Health check endpoints
health:
enabled: true
endpoints:
liveness: "/health/live"
readiness: "/health/ready"
metrics: "/metrics"
7.2 Custom Health Checks¶
// Custom health check
impl HealthCheck for CacheHealthCheck {
async fn check(&self) -> HealthStatus {
match self.cache.ping().await {
Ok(_) => HealthStatus::Healthy,
Err(e) => HealthStatus::Unhealthy(format!("Cache unavailable: {}", e))
}
}
}
8. Performance Monitoring¶
8.1 Profiling¶
# CPU profiling
attest run --profile cpu --profile-output cpu.prof
# Memory profiling
attest run --profile memory --profile-output memory.prof
# Analyze profiles
attest analyze cpu.prof --format html --output cpu-analysis.html
8.2 Benchmarking¶
# Benchmark pipeline performance
attest benchmark pipeline --iterations 10 --output benchmark.json
# Compare performance across versions
attest benchmark compare \
--baseline v0.1.0 \
--current v0.2.0 \
--format table
This monitoring system provides comprehensive visibility into ATTEST pipeline operations and performance.