Observability

Full-stack LLM observability: trace every agent run with prompts, completions, token counts, tool calls, and latency -- visible in Datadog LLM Observability or any OTel backend.

How It Works

The SDK instruments every agent run using OpenTelemetry GenAI semantic conventions v1.37+. No configuration is needed to collect metrics -- attach an observability engine and every run is traced automatically.

Trace structure

Each run produces a connected trace tree:

- invoke_agent -- root span: input prompt and final response
- execute_event_loop_cycle -- one per reasoning cycle
- chat -- model call: full conversation, model response, and token counts
- execute_tool <name> -- tool input and output, captured per call
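For example, a run that calls one tool before answering nests those spans like this (the agent prompt and tool name are illustrative, not part of the SDK):

```swift
// invoke_agent
// ├── execute_event_loop_cycle        (cycle 1)
// │   ├── chat                        -- model decides to call a tool
// │   └── execute_tool get_weather    -- tool input and result recorded
// └── execute_event_loop_cycle        (cycle 2)
//     └── chat                        -- model writes the final answer
let result = try await agent.run("What's the weather in Paris?")
```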

What Datadog LLM Observability shows

The trace list shows the user prompt and final agent response for each run. Clicking into a trace shows the full per-span breakdown:

| Span | Input Messages | Output Messages |
|---|---|---|
| invoke_agent | Original user prompt | Final agent response |
| chat (cycle 1) | Conversation at the time of this model call | Model response for this cycle (may be a tool-call decision) |
| chat (cycle 2+) | Full conversation history, including prior cycles | Final model response |
| execute_tool | Tool input arguments | Tool result |

Span attributes

| Attribute | Where set | Notes |
|---|---|---|
| gen_ai.input.messages | invoke_agent, chat | JSON array: [{"role":"user","parts":[{"type":"text","content":"..."}]}] |
| gen_ai.output.messages | invoke_agent, chat | JSON array: [{"role":"assistant","parts":[...],"finish_reason":"end_turn"}] |
| gen_ai.system | chat | aws.bedrock, anthropic, openai, etc. |
| gen_ai.request.model | chat | Model ID as passed to the provider |
| gen_ai.request.max_tokens | chat | From provider config |
| gen_ai.request.temperature | chat | From provider config, if set |
| gen_ai.usage.input_tokens | gen_ai.choice event | |
| gen_ai.usage.output_tokens | gen_ai.choice event | |
| gen_ai.tool.name | execute_tool | |
| gen_ai.tool.call.id | execute_tool | |
| enduser.id | Resource attribute (all spans) | Set via the userId: parameter -- see User Identity below |
| ml_app | Resource attribute (all spans) | Routes the trace to the correct Datadog LLM Obs application |

Sending to Datadog LLM Observability

The SDK emits OTel GenAI v1.37+ conventions that Datadog LLM Observability reads natively. Before choosing a setup, read the section below on API key safety.

⚠️ Do not embed a Datadog API key in a shipped app. An API key extracted from your binary gives full write access to your Datadog account. See API Key Safety below.

Minimal setup

```swift
import StrandsAgents

let agent = Agent(
    model: provider,
    tools: [myTool],
    observability: OTelObservabilityEngine.datadog(
        apiKey: "...",   // see API Key Safety below
        service: "my-app"
    )
)
```
All options

```swift
OTelObservabilityEngine.datadog(
    apiKey:                  "your-dd-api-key",
    service:                 "my-app",         // service name / LLM Obs application
    version:                 "2.1.0",          // optional, defaults to "1.0"
    site:                    "datadoghq.eu",   // optional, defaults to "datadoghq.com"
    endpoint:                URL(string: "https://your-collector.example.com/v1/traces"),
    userId:                  "user-123",       // see User Identity below
    extraResourceAttributes: ["tenant.id": "acme-corp"]
)
```

User Identity

Pass userId to attach the authenticated user to every trace. It becomes the enduser.id resource attribute, stamped on all spans for that session. In Datadog LLM Observability you can then filter traces by user, or ask "what did this user send?" directly.

With Cognito / Amplify auth

Create the observability engine after sign-in using the Cognito userId (the user's unique sub). Recreate it per user session -- this is lightweight and correct.

```swift
// Create or update after login
let user = try await Amplify.Auth.getCurrentUser()

let observability = OTelObservabilityEngine.datadog(
    apiKey: "",
    service: "my-app",
    endpoint: URL(string: "https://your-collector.example.com/v1/traces")!,
    userId: user.userId      // Cognito sub -- stable unique ID per user
)

// Reset to anonymous on sign-out
let observability = OTelObservabilityEngine.datadog(
    apiKey: "",
    service: "my-app",
    endpoint: URL(string: "https://your-collector.example.com/v1/traces")!
    // userId omitted = anonymous
)
```
ℹ️ Create OTelObservabilityEngine once per user session -- not once per agent call. The internal BatchSpanProcessor needs to persist between calls to flush spans reliably. Store it as a @State var (SwiftUI) or an instance property, and replace it on login/logout.
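A minimal SwiftUI sketch of that pattern. The authEvents publisher, its event cases, and collectorURL are illustrative placeholders, not part of the SDK:

```swift
import SwiftUI

struct RootView: View {
    // Engine lives for the whole user session, not per agent call
    @State private var observability = OTelObservabilityEngine.datadog(
        apiKey: "", service: "my-app", endpoint: collectorURL
        // anonymous until sign-in
    )

    var body: some View {
        ContentView(observability: observability)
            .onReceive(authEvents) { event in
                switch event {
                case .signedIn(let userId):
                    // Replace the engine so new traces carry enduser.id
                    observability = OTelObservabilityEngine.datadog(
                        apiKey: "", service: "my-app",
                        endpoint: collectorURL, userId: userId
                    )
                case .signedOut:
                    observability = OTelObservabilityEngine.datadog(
                        apiKey: "", service: "my-app", endpoint: collectorURL
                    )
                }
            }
    }
}
```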

With a local model (no auth)

For apps using MLX local inference with no authentication, pass any stable identifier -- a stored username, a device-specific UUID generated once at install, or any string that identifies the user in your system.

```swift
// Username from local preferences
OTelObservabilityEngine.datadog(
    apiKey: "", service: "my-app", endpoint: collectorURL,
    userId: UserDefaults.standard.string(forKey: "username") ?? "anonymous"
)

// Per-device vendor ID (stable while any of your apps remain installed)
OTelObservabilityEngine.datadog(
    apiKey: "", service: "my-app", endpoint: collectorURL,
    userId: UIDevice.current.identifierForVendor?.uuidString ?? "anonymous"
)

// Any custom string
OTelObservabilityEngine.datadog(
    apiKey: "", service: "my-app", endpoint: collectorURL,
    userId: "alice"
)
```
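If you want an ID that also survives reinstalls, one option is to generate a UUID once and persist it in the Keychain. A minimal sketch using only Foundation and Security; the service and account names are arbitrary:

```swift
import Foundation
import Security

/// Returns a UUID generated once and stored in the Keychain,
/// so it survives app reinstalls (unlike identifierForVendor).
func persistentDeviceID() -> String {
    let base: [String: Any] = [
        kSecClass as String: kSecClassGenericPassword,
        kSecAttrService as String: "my-app.observability",
        kSecAttrAccount as String: "device-id",
    ]

    // Return the existing ID if one was stored previously
    var query = base
    query[kSecReturnData as String] = true
    var item: CFTypeRef?
    if SecItemCopyMatching(query as CFDictionary, &item) == errSecSuccess,
       let data = item as? Data,
       let id = String(data: data, encoding: .utf8) {
        return id
    }

    // First launch: generate and store a new ID
    let id = UUID().uuidString
    var add = base
    add[kSecValueData as String] = Data(id.utf8)
    SecItemAdd(add as CFDictionary, nil)
    return id
}
```

Then pass persistentDeviceID() as the userId.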

Custom trace attributes

Use extraResourceAttributes to attach any metadata that should appear on every trace -- tenant, environment, app version, device model, etc.

```swift
OTelObservabilityEngine.datadog(
    apiKey: "", service: "my-app", endpoint: collectorURL,
    userId: user.userId,
    extraResourceAttributes: [
        "tenant.id":    "acme-corp",
        "device.model": UIDevice.current.model,
        "app.version":  Bundle.main.infoDictionary?["CFBundleShortVersionString"] as? String ?? "unknown",
    ]
)
```

API Key Safety

Datadog's OTLP endpoint requires an API key. Unlike Datadog's native mobile SDK which uses restricted client tokens, the OTLP intake has no client-safe credential type.

| Context | Approach |
|---|---|
| Development / internal tools | Environment variable or .xcconfig (gitignored) |
| Server-side agent | Read the key from an environment variable on the server |
| Shipped iOS / macOS app | Collector proxy -- never embed the key in the binary |

Development: environment variable

```swift
OTelObservabilityEngine.datadog(
    apiKey: ProcessInfo.processInfo.environment["DD_API_KEY"] ?? "",
    service: "my-app"
)
```

Production: proxy backend

Put a lightweight backend between your app and Datadog. The app sends OTLP with no credentials. The backend adds the API key and forwards to Datadog. The key never reaches the device.

iOS / macOS app → OTLP (no key) → Your proxy → + dd-api-key → Datadog
App (no credentials in binary)

```swift
OTelObservabilityEngine.datadog(
    apiKey: "",   // proxy adds the key server-side
    service: "my-app",
    endpoint: URL(string: "https://your-proxy.example.com/v1/traces")!
)
```

Option A: Lambda proxy (serverless)

Lambda handler (Node.js)

```javascript
export const handler = async (event) => {
  const body = event.isBase64Encoded
    ? Buffer.from(event.body, "base64")
    : Buffer.from(event.body || "");

  const response = await fetch("https://otlp.datadoghq.com/v1/traces", {
    method: "POST",
    headers: {
      "Content-Type": event.headers["content-type"] ?? "application/x-protobuf",
      "dd-api-key": process.env.DD_API_KEY,  // set in Lambda env, not in app
      "dd-otlp-source": "llmobs",
    },
    body,
  });

  return {
    statusCode: response.ok ? 200 : response.status,
    headers: { "Access-Control-Allow-Origin": "*" },
    body: "",
  };
};
```

Option B: Datadog DDOT Collector

otel-config.yaml

```yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

exporters:
  datadog:
    api:
      key: ${env:DD_API_KEY}
      site: datadoghq.com

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [datadog]
```
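With the collector above listening on port 4318, the app sends keyless OTLP to the collector's /v1/traces path; the collector's datadog exporter attaches the key. The host name here is a placeholder:

```swift
OTelObservabilityEngine.datadog(
    apiKey: "",   // the collector holds the key server-side
    service: "my-app",
    endpoint: URL(string: "https://otel-collector.example.com:4318/v1/traces")!
)
```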

Manual OTel setup (any backend)

```swift
import OpenTelemetrySdk
import OpenTelemetryProtocolExporterHttp

let exporter = OtlpHttpTraceExporter(
    endpoint: URL(string: "https://collector.yourbackend.com/v1/traces")!
)
let provider = TracerProviderBuilder()
    .add(spanProcessor: BatchSpanProcessor(spanExporter: exporter))
    .build()
OpenTelemetry.registerTracerProvider(tracerProvider: provider)

let tracer = provider.get(instrumentationName: "my-app", instrumentationVersion: "1.0")
let observability = OTelObservabilityEngine(tracer: tracer)
```
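The resulting engine attaches to an agent the same way as the built-in Datadog helper, assuming the Agent initializer from the minimal setup above:

```swift
let agent = Agent(
    model: provider,
    tools: [myTool],
    observability: observability
)
```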

Run Metrics

Every AgentResult carries metrics for the full run and per-cycle breakdowns with no extra configuration:

```swift
let result = try await agent.run("Summarize the quarterly report")

print(result.metrics.cycleCount)            // number of reasoning cycles
print(result.metrics.totalLatencyMs)        // wall-clock time
print(result.metrics.outputTokensPerSecond)
print(result.usage.inputTokens)
print(result.usage.outputTokens)

for cycle in result.metrics.cycles {
    print("Cycle \(cycle.cycleNumber): \(cycle.modelLatencyMs)ms, \(cycle.usage.outputTokens) tokens")
}
```

Lifecycle Hooks

Hooks let you observe or react to agent events without modifying the agent:

```swift
agent.hookRegistry.addCallback(BeforeModelCallEvent.self) { event in
    print("Sending \(event.messages.count) messages to model")
}

agent.hookRegistry.addCallback(AfterModelCallEvent.self) { event in
    print("Model responded: \(event.usage?.totalTokens ?? 0) tokens")
}

agent.hookRegistry.addCallback(AfterToolCallEvent.self) { event in
    print("Tool \(event.toolUse.name): \(event.result.status)")
}

agent.hookRegistry.addCallback(MetricsEvent.self) { event in
    myMetrics.record(
        cycles: event.metrics.cycleCount,
        tokens: event.metrics.totalUsage.totalTokens,
        latency: event.metrics.totalLatencyMs
    )
}
```