# Observability
Full-stack LLM observability: trace every agent run with prompts, completions, token counts, tool calls, and latency -- visible in Datadog LLM Observability or any OTel backend.
## How It Works
The SDK instruments every agent run using the OpenTelemetry GenAI semantic conventions (v1.37+). No configuration is needed to collect traces or metrics -- attach an observability engine and every run is traced automatically.
### Trace structure
Each run produces a connected trace tree:
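An illustrative shape for a two-cycle run with one tool call (span names follow the conventions described on this page; the agent, model, and tool names here are placeholders):

```
invoke_agent my-agent                 ← root span for the whole run
├── chat my-model (cycle 1)           ← model call; decides to call a tool
│   └── execute_tool get_weather      ← tool execution
└── chat my-model (cycle 2)           ← final model response
```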
### What Datadog LLM Observability shows
The trace list shows the user prompt and final agent response for each run. Clicking into a trace shows the full per-span breakdown:
| Span | Input Messages | Output Messages |
|---|---|---|
| `invoke_agent` | Original user prompt | Final agent response |
| `chat` (cycle 1) | Conversation at the time of this model call | Model response for this cycle (may be a tool call decision) |
| `chat` (cycle 2+) | Full conversation history including prior cycles | Final model response |
| `execute_tool` | Tool input arguments | Tool result |
### Span attributes
| Attribute | Where set | Notes |
|---|---|---|
| `gen_ai.input.messages` | `invoke_agent`, `chat` | JSON array: `[{"role":"user","parts":[{"type":"text","content":"..."}]}]` |
| `gen_ai.output.messages` | `invoke_agent`, `chat` | JSON array: `[{"role":"assistant","parts":[...],"finish_reason":"end_turn"}]` |
| `gen_ai.system` | `chat` | `aws.bedrock`, `anthropic`, `openai`, etc. |
| `gen_ai.request.model` | `chat` | Model ID as passed to the provider |
| `gen_ai.request.max_tokens` | `chat` | From provider config |
| `gen_ai.request.temperature` | `chat` | From provider config, if set |
| `gen_ai.usage.input_tokens` | `gen_ai.choice` event | |
| `gen_ai.usage.output_tokens` | `gen_ai.choice` event | |
| `gen_ai.tool.name` | `execute_tool` | |
| `gen_ai.tool.call.id` | `execute_tool` | |
| `enduser.id` | Resource attribute (all spans) | Set via the `userId:` parameter -- see User Identity below |
| `ml_app` | Resource attribute (all spans) | Routes the trace to the correct Datadog LLM Obs application |
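Concretely, the message attributes on an `invoke_agent` span for a simple one-cycle run might look like this (illustrative values following the JSON shapes above). `gen_ai.input.messages`:

```json
[{"role": "user", "parts": [{"type": "text", "content": "Summarize the quarterly report"}]}]
```

`gen_ai.output.messages`:

```json
[{"role": "assistant", "parts": [{"type": "text", "content": "The report covers..."}], "finish_reason": "end_turn"}]
```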
## Sending to Datadog LLM Observability
The SDK emits OTel GenAI v1.37+ conventions that Datadog LLM Observability reads natively. Before choosing a setup, read the section below on API key safety.
> **Do not embed a Datadog API key in a shipped app.** An API key extracted from your binary gives full write access to your Datadog account. See API Key Safety below.
```swift
import StrandsAgents

let agent = Agent(
    model: provider,
    tools: [myTool],
    observability: OTelObservabilityEngine.datadog(
        apiKey: "...", // see API Key Safety below
        service: "my-app"
    )
)
```
```swift
OTelObservabilityEngine.datadog(
    apiKey: "your-dd-api-key",
    service: "my-app",            // service name / LLM Obs application
    version: "2.1.0",             // optional, defaults to "1.0"
    site: "datadoghq.eu",         // optional, defaults to "datadoghq.com"
    endpoint: URL(string: "https://your-collector.example.com/v1/traces"),
    userId: "user-123",           // see User Identity below
    extraResourceAttributes: ["tenant.id": "acme-corp"]
)
```
## User Identity

Pass `userId` to attach the authenticated user to every trace. It becomes the `enduser.id` resource attribute, stamped on all spans for that session. In Datadog LLM Observability you can then filter traces by user, or ask "what did this user send?" directly.
### With Cognito / Amplify auth
Create the observability engine after sign-in using the Cognito userId (the user's unique sub). Recreate it per user session -- this is lightweight and correct.
```swift
// Create or update after login
let user = try await Amplify.Auth.getCurrentUser()
let observability = OTelObservabilityEngine.datadog(
    apiKey: "",
    service: "my-app",
    endpoint: URL(string: "https://your-collector.example.com/v1/traces")!,
    userId: user.userId // Cognito sub -- stable unique ID per user
)
```

```swift
// Reset to anonymous on sign-out
let observability = OTelObservabilityEngine.datadog(
    apiKey: "",
    service: "my-app",
    endpoint: URL(string: "https://your-collector.example.com/v1/traces")!
    // userId omitted = anonymous
)
```
Create `OTelObservabilityEngine` once per user session -- not once per agent call. The internal `BatchSpanProcessor` needs to persist between calls to flush spans reliably. Store it as a `@State` property (SwiftUI) or an instance property, and replace it on login/logout.
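A sketch of session-scoped storage, assuming the Amplify sign-in flow above (`SessionModel` and its method names are hypothetical; only `OTelObservabilityEngine.datadog` comes from the SDK):

```swift
import SwiftUI
import StrandsAgents

@MainActor
final class SessionModel: ObservableObject {
    // One engine per session; the BatchSpanProcessor inside it
    // survives across agent calls until the session changes.
    @Published var observability = OTelObservabilityEngine.datadog(
        apiKey: "",
        service: "my-app",
        endpoint: URL(string: "https://your-collector.example.com/v1/traces")!
        // userId omitted = anonymous until sign-in
    )

    func didSignIn(userId: String) {
        // Replace the engine so subsequent traces carry enduser.id
        observability = OTelObservabilityEngine.datadog(
            apiKey: "",
            service: "my-app",
            endpoint: URL(string: "https://your-collector.example.com/v1/traces")!,
            userId: userId
        )
    }

    func didSignOut() {
        // Back to an anonymous engine
        observability = OTelObservabilityEngine.datadog(
            apiKey: "",
            service: "my-app",
            endpoint: URL(string: "https://your-collector.example.com/v1/traces")!
        )
    }
}
```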
### With a local model (no auth)
For apps using MLX local inference with no authentication, pass any stable identifier -- a stored username, a device-specific UUID generated once at install, or any string that identifies the user in your system.
```swift
// Username from local preferences
OTelObservabilityEngine.datadog(
    apiKey: "", service: "my-app", endpoint: collectorURL,
    userId: UserDefaults.standard.string(forKey: "username") ?? "anonymous"
)

// System-provided per-device ID (stable per vendor, but resets if all of
// your apps are uninstalled -- generate your own UUID once and store it
// in the Keychain if you need an ID that survives reinstalls)
OTelObservabilityEngine.datadog(
    apiKey: "", service: "my-app", endpoint: collectorURL,
    userId: UIDevice.current.identifierForVendor?.uuidString ?? "anonymous"
)

// Any custom string
OTelObservabilityEngine.datadog(
    apiKey: "", service: "my-app", endpoint: collectorURL,
    userId: "alice"
)
```
## Custom trace attributes
Use extraResourceAttributes to attach any metadata that should appear on every trace -- tenant, environment, app version, device model, etc.
```swift
OTelObservabilityEngine.datadog(
    apiKey: "", service: "my-app", endpoint: collectorURL,
    userId: user.userId,
    extraResourceAttributes: [
        "tenant.id": "acme-corp",
        "device.model": ProcessInfo.processInfo.machineModel,
        "app.version": Bundle.main.shortVersionString ?? "unknown",
    ]
)
```
## API Key Safety
Datadog's OTLP endpoint requires an API key. Unlike Datadog's native mobile SDK which uses restricted client tokens, the OTLP intake has no client-safe credential type.
| Context | Approach |
|---|---|
| Development / internal tools | Environment variable or `.xcconfig` (gitignored) |
| Server-side agent | Embed directly via environment variable on the server |
| Shipped iOS / macOS app | Collector proxy -- never embed the key in the binary |
### Development: environment variable
```swift
OTelObservabilityEngine.datadog(
    apiKey: ProcessInfo.processInfo.environment["DD_API_KEY"] ?? "",
    service: "my-app"
)
```
### Production: proxy backend
Put a lightweight backend between your app and Datadog. The app sends OTLP with no credentials. The backend adds the API key and forwards to Datadog. The key never reaches the device.
```swift
OTelObservabilityEngine.datadog(
    apiKey: "", // proxy adds the key server-side
    service: "my-app",
    endpoint: URL(string: "https://your-proxy.example.com/v1/traces")!
)
```
#### Option A: Lambda proxy (serverless)
```javascript
export const handler = async (event) => {
  const body = event.isBase64Encoded
    ? Buffer.from(event.body, "base64")
    : Buffer.from(event.body || "");

  const response = await fetch("https://otlp.datadoghq.com/v1/traces", {
    method: "POST",
    headers: {
      "Content-Type": event.headers["content-type"] ?? "application/x-protobuf",
      "dd-api-key": process.env.DD_API_KEY, // set in Lambda env, not in app
      "dd-otlp-source": "llmobs",
    },
    body,
  });

  return {
    statusCode: response.ok ? 200 : response.status,
    headers: { "Access-Control-Allow-Origin": "*" },
    body: "",
  };
};
```
#### Option B: Datadog DDOT Collector
```yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

exporters:
  datadog:
    api:
      key: ${env:DD_API_KEY}
      site: datadoghq.com

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [datadog]
```
## Manual OTel setup (any backend)
```swift
import OpenTelemetrySdk
import OpenTelemetryProtocolExporterHttp
import StrandsAgents

let exporter = OtlpHttpTraceExporter(
    endpoint: URL(string: "https://collector.yourbackend.com/v1/traces")!
)
let provider = TracerProviderBuilder()
    .add(spanProcessor: BatchSpanProcessor(spanExporter: exporter))
    .build()
OpenTelemetry.registerTracerProvider(tracerProvider: provider)

let tracer = provider.get(instrumentationName: "my-app", instrumentationVersion: "1.0")
let observability = OTelObservabilityEngine(tracer: tracer)
```
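The resulting engine attaches to an agent the same way as the Datadog preset shown earlier (`provider` and `myTool` are placeholders for your model provider and tools):

```swift
let agent = Agent(
    model: provider,
    tools: [myTool],
    observability: observability
)
```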
## Run Metrics
Every AgentResult carries metrics for the full run and per-cycle breakdowns with no extra configuration:
```swift
let result = try await agent.run("Summarize the quarterly report")

print(result.metrics.cycleCount)            // number of reasoning cycles
print(result.metrics.totalLatencyMs)        // wall-clock time
print(result.metrics.outputTokensPerSecond)
print(result.usage.inputTokens)
print(result.usage.outputTokens)

for cycle in result.metrics.cycles {
    print("Cycle \(cycle.cycleNumber): \(cycle.modelLatencyMs)ms, \(cycle.usage.outputTokens) tokens")
}
```
## Lifecycle Hooks
Hooks let you observe or react to agent events without modifying the agent:
```swift
agent.hookRegistry.addCallback(BeforeModelCallEvent.self) { event in
    print("Sending \(event.messages.count) messages to model")
}

agent.hookRegistry.addCallback(AfterModelCallEvent.self) { event in
    print("Model responded: \(event.usage?.totalTokens ?? 0) tokens")
}

agent.hookRegistry.addCallback(AfterToolCallEvent.self) { event in
    print("Tool \(event.toolUse.name): \(event.result.status)")
}

agent.hookRegistry.addCallback(MetricsEvent.self) { event in
    myMetrics.record(
        cycles: event.metrics.cycleCount,
        tokens: event.metrics.totalUsage.totalTokens,
        latency: event.metrics.totalLatencyMs
    )
}
```