# Observability

Traces, metrics, and hooks give you visibility into every agent run: token counts, per-cycle latency, and tool execution.

## OpenTelemetry Tracing

The SDK emits structured traces using the OpenTelemetry GenAI semantic conventions. Every agent run produces a connected trace tree with parent-child span relationships, token counts, latency, and tool results.

### Span hierarchy

Each run produces a trace with this structure:

- `invoke_agent`: root span for the full run
  - `execute_event_loop_cycle`: one per reasoning cycle
    - `chat`: model call with token counts
    - `execute_tool <name>`: one per tool call (concurrent tools run in parallel)

### Span attributes

| Attribute | Where set |
| --- | --- |
| `gen_ai.operation.name` | All spans |
| `gen_ai.request.model` | `chat` span |
| `gen_ai.usage.input_tokens` | `gen_ai.choice` event on `chat` |
| `gen_ai.usage.output_tokens` | `gen_ai.choice` event on `chat` |
| `gen_ai.usage.total_tokens` | `gen_ai.choice` event on `chat` |
| `gen_ai.tool.name` | `execute_tool` span |
| `gen_ai.tool.call.id` | `execute_tool` span |
| `gen_ai.tool.status` | `gen_ai.choice` event on tool span |
| `event_loop.cycle_id` | `execute_event_loop_cycle` span |
| `finish_reason` | `gen_ai.choice` event on `chat` |

## Sending to Datadog LLM Observability

The SDK emits GenAI semantic conventions that Datadog's LLM Observability product reads natively. Before choosing a setup, read the section below on API key safety.

> ⚠️ Do not embed a Datadog API key in a shipped app. Datadog's OTLP intake accepts only API keys, not restricted client tokens, and an API key extracted from your binary grants full write access to your Datadog account. See API Key Safety below before choosing a setup.

**Package.swift**

```swift
.target(
    name: "MyApp",
    dependencies: [
        .product(name: "StrandsAgents", package: "strands-agents-swift"),
        .product(name: "StrandsBedrockProvider", package: "strands-agents-swift"),
        .product(name: "StrandsOTelObservability", package: "strands-agents-swift"),
    ]
)
```
**Setup**

```swift
import StrandsAgents
import StrandsOTelObservability

let agent = Agent(
    model: provider,
    tools: [myTool],
    observability: OTelObservabilityEngine.datadog(
        apiKey: "...",   // see API Key Safety section below
        service: "my-app"
    )
)

let result = try await agent.run("What is 42 * 17?")
// Traces appear in Datadog LLM Observability within seconds
```
**All options**

```swift
OTelObservabilityEngine.datadog(
    apiKey:   "your-dd-api-key",
    service:  "my-app",          // service name shown in Datadog
    version:  "2.1.0",           // optional, defaults to "1.0"
    site:     "datadoghq.eu",    // optional, defaults to "datadoghq.com"
    endpoint: URL(string: "https://your-collector.example.com/v1/traces")
    // endpoint overrides site; use this for the Collector proxy pattern
)
```

## API Key Safety

Datadog's OTLP endpoint requires an API key. Unlike Datadog's native mobile SDK (dd-sdk-ios), which uses restricted client tokens, the OTLP intake has no client-safe credential type. This creates a real problem for shipped apps.

| Context | Approach |
| --- | --- |
| Development / internal tools | Environment variable or gitignored `.xcconfig` |
| Server-side agent (not in a user's app) | Environment variable on the server |
| Shipped iOS / macOS app | Collector proxy (see below); never embed the key |

### Development: environment variable

For local development and internal tools, read the key from an environment variable set in your shell, or keep it in a gitignored `.xcconfig` file. Never commit it.

```swift
observability: OTelObservabilityEngine.datadog(
    apiKey: ProcessInfo.processInfo.environment["DD_API_KEY"] ?? "",
    service: "my-app"
)
```
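If you keep the key in a gitignored `.xcconfig` instead, note that build settings are not visible to `ProcessInfo` at runtime; the usual route is to surface the setting through `Info.plist` and read it from the bundle. A minimal sketch, assuming a hypothetical `.xcconfig` entry `DD_API_KEY = ...` and an `Info.plist` key `DDAPIKey` set to `$(DD_API_KEY)` (both names are illustrative, not part of the SDK):

```swift
import Foundation

// Reads the value the .xcconfig injected into Info.plist at build time.
// "DDAPIKey" is whatever key you declared; there is no SDK convention for it.
func datadogAPIKey() -> String {
    Bundle.main.object(forInfoDictionaryKey: "DDAPIKey") as? String ?? ""
}
```

This route is still development-only: the key ends up in the app bundle, so for anything you ship it is no safer than hardcoding it.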

### Production: proxy backend

For any app you ship to users, put a lightweight backend between your app and Datadog. The app sends OTLP to your endpoint with no credentials. The backend adds the API key and forwards to Datadog. The key never reaches the device.

```
iOS / macOS app → OTLP (no key) → Your proxy → + dd-api-key → Datadog
```

The proxy's only job is to inject the credential your app cannot safely hold. The OTLP payload your app built (spans, token counts, trace hierarchy) passes through unchanged.

Point the app at your proxy endpoint instead of Datadog directly:

**App (no credentials in binary)**

```swift
observability: OTelObservabilityEngine.datadog(
    apiKey: "",   // ignored; the proxy adds the key server-side
    service: "my-app",
    endpoint: URL(string: "https://your-proxy.example.com/v1/traces")
)
```

### Option A: Lambda proxy (serverless)

The simplest deployment: an API Gateway + Lambda function that adds the API key header and forwards. No servers to manage.

**Lambda handler (Node.js)**

```javascript
export const handler = async (event) => {
  const apiKey = process.env.DD_API_KEY; // set in Lambda, never in the app

  const response = await fetch("https://otlp.datadoghq.com/v1/traces", {
    method: "POST",
    headers: {
      "Content-Type": event.headers["content-type"] ?? "application/x-protobuf",
      "dd-api-key": apiKey,
      "dd-otlp-source": "llmobs",
    },
    body: Buffer.from(event.body, event.isBase64Encoded ? "base64" : "utf8"),
  });

  return { statusCode: response.status };
};
```
> ⚠️ **Security risks with a basic Lambda proxy:**
>
> - **Unauthenticated writes.** Anyone who finds your API Gateway URL can POST arbitrary spans to your Datadog account. They cannot read your data, but they can pollute your LLM Observability traces with junk.
> - **Cost amplification.** Flooding the endpoint runs up Lambda invocation costs and Datadog ingestion costs simultaneously.
> - **No payload validation.** The Lambda forwards whatever it receives without checking that it is valid OTLP.

#### Hardening the Lambda proxy

Apply these mitigations in order of effort:

| Mitigation | What it prevents | Effort |
| --- | --- | --- |
| API Gateway rate limiting + throttling | Cost amplification from floods | Low (configure in the AWS console) |
| API Gateway request size limit | Oversized payloads inflating Datadog ingestion | Low (one setting) |
| AWS WAF on the API Gateway | Known bad actors, automated scanners | Medium |
| Cognito JWT auth on the API Gateway | Unauthenticated writes entirely | Medium (requires your app to sign in) |
| API key stored in AWS Secrets Manager | Key exposure if the Lambda environment is read | Low (change one env var to a Secrets Manager lookup) |

Rate limiting is the minimum you should add before shipping. A usage plan in API Gateway takes two minutes to configure and caps the blast radius of any abuse:

**API Gateway usage plan (AWS CLI)**

```bash
aws apigateway create-usage-plan \
  --name "otlp-proxy-plan" \
  --throttle burstLimit=50,rateLimit=10 \
  --quota limit=50000,period=DAY
```

### Option B: Datadog DDOT Collector

Datadog recommends running the DDOT Collector on a long-lived server so it can handle batching and retries reliably.

**Install on a Linux server**

```bash
DD_API_KEY=your_api_key DD_SITE=datadoghq.com \
  bash -c "$(curl -L https://install.datadoghq.com/scripts/install_script_agent7.sh)"
```
**Or run with Docker**

```bash
docker run -d \
  --name ddot-collector \
  -e DD_API_KEY=your_api_key \
  -e DD_SITE=datadoghq.com \
  -p 4318:4318 \
  -v $(pwd)/otel-config.yaml:/etc/datadog-agent/otel-config.yaml \
  gcr.io/datadoghq/agent:latest
```
**Collector config (`otel-config.yaml`)**

```yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

exporters:
  datadog:
    api:
      key: ${env:DD_API_KEY}
      site: datadoghq.com   # datadoghq.eu for EU

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [datadog]
```
#### Point the Swift SDK at your Collector

Once it's running, pass the server's IP or hostname to the SDK. Use the public IP for quick testing, or a domain with TLS for production:

```swift
// Quick test: server IP directly
observability: OTelObservabilityEngine.datadog(
    apiKey: "",
    service: "my-app",
    endpoint: URL(string: "http://1.2.3.4:4318/v1/traces")
)

// Production: domain with TLS
observability: OTelObservabilityEngine.datadog(
    apiKey: "",
    service: "my-app",
    endpoint: URL(string: "https://collector.yourserver.com/v1/traces")
)
```

## Manual OTel setup (any backend)

```swift
import OpenTelemetrySdk
import OpenTelemetryProtocolExporterHttp

let exporter = OtlpHttpTraceExporter(
    endpoint: URL(string: "https://collector.yourbackend.com/v1/traces")!
)
let provider = TracerProviderBuilder()
    .add(spanProcessor: BatchSpanProcessor(spanExporter: exporter))
    .build()
OpenTelemetry.registerTracerProvider(tracerProvider: provider)

let tracer = provider.get(instrumentationName: "my-app", instrumentationVersion: "1.0")
let observability = OTelObservabilityEngine(tracer: tracer)
```
> ℹ️ For EU accounts, use `https://otlp.datadoghq.eu/v1/traces` as the endpoint. The `dd-otlp-source: llmobs` header routes spans into the LLM Observability product specifically.
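If you point the manual setup directly at Datadog's intake instead of a Collector, the exporter itself has to attach the `dd-api-key` and `dd-otlp-source` headers. A sketch, assuming the opentelemetry-swift version you pin exposes an `OtlpConfiguration(headers:)` initializer that the HTTP exporter accepts (check the exporter's signature in your pinned release); because the key is in process, this belongs on server-side Swift, never in a shipped app:

```swift
import Foundation
import OpenTelemetrySdk
import OpenTelemetryProtocolExporterHttp

// Direct-to-Datadog export: headers carry the credential and the
// routing hint that sends spans to LLM Observability.
let exporter = OtlpHttpTraceExporter(
    endpoint: URL(string: "https://otlp.datadoghq.com/v1/traces")!,
    config: OtlpConfiguration(headers: [
        ("dd-api-key", ProcessInfo.processInfo.environment["DD_API_KEY"] ?? ""),
        ("dd-otlp-source", "llmobs"),
    ])
)
```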

### Other supported backends

| Backend | Exporter |
| --- | --- |
| Datadog LLM Observability | `OtlpHttpTraceExporter` to `otlp.datadoghq.com` |
| Jaeger | `OtlpGrpcTraceExporter` to your Jaeger instance |
| AWS X-Ray | `OtlpGrpcTraceExporter` to the X-Ray OTLP receiver |
| Any OTLP backend | Any `SpanExporter` from `opentelemetry-swift` |

## Run Metrics

Every AgentResult carries metrics for the entire run and per-cycle breakdowns, with no extra configuration needed:

```swift
let result = try await agent.run("Summarize the quarterly report")

// Run-level metrics
print(result.metrics.cycleCount)           // number of reasoning cycles
print(result.metrics.totalLatencyMs)       // wall-clock time
print(result.metrics.outputTokensPerSecond)

// Token usage
print(result.usage.inputTokens)
print(result.usage.outputTokens)
print(result.usage.totalTokens)

// Per-cycle breakdown
for cycle in result.metrics.cycles {
    print("Cycle \(cycle.cycleNumber):")
    print("  Latency:     \(cycle.modelLatencyMs)ms")
    print("  Stop reason: \(cycle.stopReason)")
    print("  Tools run:   \(cycle.toolsExecuted)")
    print("  Output tokens: \(cycle.usage.outputTokens)")
}
```
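The per-cycle array also makes simple aggregates easy. For example, average model latency across cycles, using only the fields shown above (written defensively so it compiles whether `modelLatencyMs` is an integer or floating-point type):

```swift
// Average model latency across reasoning cycles
let cycles = result.metrics.cycles
let avgLatencyMs = cycles.isEmpty
    ? 0.0
    : cycles.reduce(0.0) { $0 + Double($1.modelLatencyMs) } / Double(cycles.count)
print("Average model latency: \(avgLatencyMs)ms")
```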

## Lifecycle Hooks

Hooks let you observe or react to agent events without modifying the agent. Register callbacks on agent.hookRegistry:

**Registering hooks**

```swift
// Before each model call
agent.hookRegistry.addCallback(BeforeInvocationEvent.self) { event in
    print("Sending \(event.messages.count) messages to model")
}

// After each model call
agent.hookRegistry.addCallback(AfterInvocationEvent.self) { event in
    print("Model responded in \(event.latencyMs)ms")
}

// After each tool execution
agent.hookRegistry.addCallback(AfterToolEvent.self) { event in
    print("Tool \(event.toolName): \(event.result.status)")
}

// At the end of every run (good for logging to a metrics system)
agent.hookRegistry.addCallback(MetricsEvent.self) { event in
    myMetrics.record(
        cycles: event.metrics.cycleCount,
        tokens: event.metrics.totalUsage.totalTokens,
        latency: event.metrics.totalLatencyMs
    )
}
```
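`BeforeToolEvent` follows the same registration pattern. A sketch that counts tool invocations per run, assuming the event exposes `toolName` the way `AfterToolEvent` does (an assumption; check the event's actual properties):

```swift
// Count tool invocations as they start; toolName is assumed to
// mirror AfterToolEvent's property of the same name.
var toolCounts: [String: Int] = [:]
agent.hookRegistry.addCallback(BeforeToolEvent.self) { event in
    toolCounts[event.toolName, default: 0] += 1
    print("Running \(event.toolName) (call #\(toolCounts[event.toolName]!))")
}
```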

### Available hook events

| Event | When it fires |
| --- | --- |
| `BeforeInvocationEvent` | Before each model API call |
| `AfterInvocationEvent` | After each model API call |
| `BeforeToolEvent` | Before a tool is executed |
| `AfterToolEvent` | After a tool returns |
| `MetricsEvent` | At the end of a complete agent run |