OpenTelemetry Integration¶
Haiway provides seamless integration with OpenTelemetry for distributed tracing, metrics collection, and structured logging. This integration allows you to observe your applications with industry-standard tooling while maintaining Haiway's functional programming principles.
Overview¶
The OpenTelemetry integration in Haiway bridges the framework's observability abstractions with the OpenTelemetry SDK, enabling:
- Distributed Tracing: Automatic span creation and context propagation across async operations
- Metrics Collection: Counter, histogram, and gauge metrics with custom attributes
- Structured Logging: Context-aware logs correlated with traces
- External Trace Linking: Connect to existing distributed traces from other services
Quick Start¶
1. Installation¶
The OpenTelemetry integration requires additional dependencies:
pip install haiway[opentelemetry]
# or manually:
pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp
2. Configuration¶
Configure OpenTelemetry once at application startup (configuring it more than once raises an error):
from haiway.opentelemetry import OpenTelemetry
# Configure for local development (console output)...
OpenTelemetry.configure(
service="my-service",
version="1.0.0",
environment="development"
)
# ...or for production (OTLP export)
OpenTelemetry.configure(
service="my-service",
version="1.0.0",
environment="production",
otlp_endpoint="http://jaeger:4317",
insecure=True,
export_interval_millis=5000,
attributes={
"team": "backend",
"component": "api"
}
)
Configuration Options¶
Basic Configuration¶
| Parameter | Type | Description |
|---|---|---|
| `service` | `str` | Name of your service |
| `version` | `str` | Version of your service |
| `environment` | `str` | Deployment environment (e.g., "production", "staging") |
OTLP Export Configuration¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| `otlp_endpoint` | `str \| None` | `None` | OTLP endpoint URL. If `None`, uses console exporters |
| `insecure` | `bool` | `True` | Whether to use insecure connections |
| `export_interval_millis` | `int` | `5000` | Metrics export interval in milliseconds |
| `attributes` | `Mapping[str, Any] \| None` | `None` | Additional resource attributes |
3. Usage with Context¶
Use the configured OpenTelemetry observability in your Haiway contexts:
import asyncio

from haiway import ctx
from haiway.opentelemetry import OpenTelemetry
async def main():
# Use in context scope
async with ctx.scope(
"application",
# Create observability instance
observability=OpenTelemetry.observability()
):
await process_requests()
async def process_requests():
# Automatic span creation and context propagation
async with ctx.scope("request-processing"):
ctx.log_info("Processing batch of requests")
# Record custom metrics
ctx.record_metric("requests.processed", value=10, kind="counter")
# Record custom events
ctx.record_event("batch.started", attributes={
"batch_size": 10,
"priority": "high"
})
await process_individual_requests()
async def process_individual_requests():
# Nested spans are automatically created
async with ctx.scope("individual-request"):
ctx.record_attributes({
"request.id": "req-123",
"user.id": "user-456"
})
# Simulated work
await asyncio.sleep(0.1)
ctx.record_metric("request.duration", value=100, unit="ms", kind="histogram")
Console vs OTLP Export¶
Console Export: When no OTLP endpoint is specified, metrics and logs are emitted through Python's standard logging system via console exporters.
OpenTelemetry.configure(
service="my-service",
version="1.0.0",
environment="development"
# otlp_endpoint=None (default) uses console exporters
)
OTLP Export: When an OTLP endpoint is provided, logs are no longer mirrored to the console; all traces, metrics, and logs are sent to the specified endpoint.
OpenTelemetry.configure(
service="my-service",
version="1.0.0",
environment="production",
otlp_endpoint="http://collector:4317"
)
Distributed Tracing¶
Automatic Span Creation¶
Haiway automatically creates OpenTelemetry spans for each context scope:
async def handle_request():
async with ctx.scope("http-request"): # Creates span "http-request"
async with ctx.scope("database-query"): # Creates child span "database-query"
await query_database()
async with ctx.scope("external-api"): # Creates child span "external-api"
await call_external_service()
External Trace Linking¶
Connect to existing distributed traces from other services:
# Link to external trace (e.g., from HTTP headers)
external_trace_id = request.headers.get("x-trace-id")
observability = OpenTelemetry.observability(
external_trace_id=external_trace_id
)
async with ctx.scope("service-handler", observability=observability):
# This span will be linked to the external trace
await handle_service_request()
Trace Context Propagation¶
Traces automatically propagate across async operations:
async def parent_operation():
async with ctx.scope("parent"):
# Start concurrent operations - they inherit trace context
tasks = [
asyncio.create_task(child_operation(i))
for i in range(3)
]
await asyncio.gather(*tasks)
async def child_operation(task_id: int):
async with ctx.scope(f"child-{task_id}"):
# Each child gets its own span under the parent trace
await asyncio.sleep(0.1)
Metrics Collection¶
Metric Types¶
Haiway supports three OpenTelemetry metric types:
# Counter: Monotonically increasing values
ctx.record_metric("requests.total", value=1, kind="counter")
# Histogram: Distribution of values (e.g., latencies, sizes)
ctx.record_metric("request.duration", value=150, unit="ms", kind="histogram")
# Gauge: Point-in-time values that can go up or down
ctx.record_metric("active_connections", value=42, kind="gauge")
Metric Attributes¶
Add dimensional data to metrics:
ctx.record_metric(
"requests.processed",
value=1,
kind="counter",
attributes={
"method": "POST",
"endpoint": "/api/users",
"status": "success"
}
)
Custom Units¶
Specify units for better observability:
ctx.record_metric("response.size", value=1024, unit="byte", kind="histogram")
ctx.record_metric("cpu.usage", value=75.5, unit="percent", kind="gauge")
ctx.record_metric("request.rate", value=150, unit="1/s", kind="gauge")
Structured Logging¶
Context-Aware Logging¶
Logs are automatically correlated with active spans:
async with ctx.scope("user-service"):
ctx.log_info("Processing user request") # Correlated with span
try:
user = await fetch_user(user_id)
ctx.log_debug("User fetched successfully", user_id=user_id)
except UserNotFound as e:
ctx.log_error("User not found", user_id=user_id, exception=e)
Log Levels¶
Control log verbosity by setting the observability level:
from haiway.context import ObservabilityLevel
# Only log warnings and errors
observability = OpenTelemetry.observability(level=ObservabilityLevel.WARNING)
async with ctx.scope("critical-operation", observability=observability):
ctx.log_debug("This won't be recorded") # Below threshold
ctx.log_warning("This will be recorded") # At or above threshold
Event Recording¶
Custom Events¶
Record significant events with structured attributes:
ctx.record_event("user.login", attributes={
"user_id": "user-123",
"login_method": "oauth",
"client_ip": "192.168.1.100",
"success": True
})
ctx.record_event("cache.miss", attributes={
"cache_key": "user:profile:123",
"ttl": 3600
})
Business Events¶
Track business-relevant events:
ctx.record_event("order.created", attributes={
"order_id": "ord-789",
"customer_id": "cust-456",
"total_amount": 99.99,
"currency": "USD",
"items_count": 3
})
Advanced Usage¶
Custom Resource Attributes¶
Add service-specific metadata:
OpenTelemetry.configure(
service="payment-service",
version="2.1.0",
environment="production",
otlp_endpoint="http://collector:4317",
attributes={
"service.namespace": "payments",
"service.instance.id": os.environ.get("INSTANCE_ID"),
"deployment.version": "v2.1.0-rc.1",
"team": "payments-team",
"region": "us-east-1"
}
)
Error Handling and Status¶
Spans automatically record error status when exceptions occur:
async with ctx.scope("risky-operation"):
try:
await potentially_failing_operation()
# Span status: OK
except Exception as e:
# Span status: ERROR, exception recorded
ctx.log_error("Operation failed", exception=e)
raise
Integration with Popular Tools¶
Jaeger¶
OpenTelemetry.configure(
service="my-service",
version="1.0.0",
environment="production",
otlp_endpoint="http://jaeger-collector:4317",  # Jaeger accepts OTLP over gRPC on port 4317
insecure=True,
)
Prometheus + Grafana¶
OpenTelemetry.configure(
service="my-service",
version="1.0.0",
environment="production",
otlp_endpoint="http://otel-collector:4317",
export_interval_millis=10000, # 10 second export interval
)
SigNoz¶
For self-hosted SigNoz:
OpenTelemetry.configure(
service="my-service",
version="1.0.0",
environment="production",
otlp_endpoint="http://signoz-otel-collector:4317",
insecure=True,
export_interval_millis=5000,
)
Best Practices¶
1. Service Naming¶
Use consistent service names across your organization:
# Good: Consistent with service discovery
OpenTelemetry.configure(service="user-service", ...)
# Avoid: Inconsistent naming
OpenTelemetry.configure(service="userSvc", ...)
2. Meaningful Span Names¶
Use descriptive span names that indicate the operation:
# Good: Descriptive operation names
async with ctx.scope("validate-user-permissions"):
...
async with ctx.scope("fetch-user-profile"):
...
# Avoid: Generic names
async with ctx.scope("operation"):
...
3. Attribute Consistency¶
Use consistent attribute names across your services:
# Good: Consistent attribute naming
ctx.record_attributes({
"user.id": user_id,
"user.role": user_role,
"request.id": request_id
})
# Establish naming conventions:
# - Use dots for namespacing
# - Use snake_case for attribute names
# - Use consistent prefixes (user., request., etc.)
Troubleshooting¶
Common Issues¶
1. No telemetry data appearing
   - Verify the OTLP endpoint is reachable
   - Check that `OpenTelemetry.configure()` was called before creating the observability instance
   - Ensure network connectivity to your observability backend
2. High memory usage
   - Consider increasing export intervals
   - Check whether you are creating too many unique metric label combinations
   - Review span attribute cardinality
3. Missing trace correlation
   - Ensure observability is properly passed through context scopes
   - Verify the external trace ID format is correct
   - Check that async context is properly propagated
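When no telemetry data appears, a plain TCP connect can quickly rule out networking problems. A standalone sketch using only the standard library (the host and port below are examples):

```python
import socket

def otlp_endpoint_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the collector's OTLP port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unresolvable
        return False

# Example: check a local collector on the default OTLP gRPC port
# otlp_endpoint_reachable("localhost", 4317)
```

A successful connect only proves the port is open; the collector must still be configured to accept OTLP on it, but a failure here points clearly at networking rather than instrumentation.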