Comparisons and Benchmarks
Compare the performance of your agents, flows, or content over time, and identify what actually works.
What This Section Covers
Optimly lets you benchmark agents, prompts, topics, and timeframes side by side.
This allows teams to:
- Identify the most effective agent configurations
- Monitor improvements (or regressions) after updates
- Validate changes using real-world usage data
What You Can Compare
Agents
Evaluate different agents across key metrics:
- Resolution rate
- Abandonment rate
- Average session duration
- Tool and document usage
- Flag frequency
This is useful for comparing:
- A/B tests of tone or prompt strategy
- Agents deployed in different regions or channels
- Internal vs. customer-facing agents
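To make the arithmetic behind a side-by-side agent comparison concrete, here is a minimal Python sketch that aggregates these metrics from raw session records. The record schema (`agent_id`, `resolved`, `abandoned`, `duration_s`, `flags`) is hypothetical, invented for illustration rather than taken from Optimly's actual export format:

```python
from collections import defaultdict

# Hypothetical session records; field names are illustrative only,
# not Optimly's actual export schema.
sessions = [
    {"agent_id": "support-a", "resolved": True,  "abandoned": False, "duration_s": 210, "flags": 0},
    {"agent_id": "support-a", "resolved": False, "abandoned": True,  "duration_s": 95,  "flags": 1},
    {"agent_id": "support-b", "resolved": True,  "abandoned": False, "duration_s": 180, "flags": 0},
    {"agent_id": "support-b", "resolved": True,  "abandoned": False, "duration_s": 240, "flags": 2},
]

def agent_metrics(records):
    """Aggregate per-agent resolution rate, abandonment rate,
    mean session duration, and flags per session."""
    grouped = defaultdict(list)
    for r in records:
        grouped[r["agent_id"]].append(r)
    table = {}
    for agent, rows in grouped.items():
        n = len(rows)
        table[agent] = {
            "resolution_rate": sum(r["resolved"] for r in rows) / n,
            "abandonment_rate": sum(r["abandoned"] for r in rows) / n,
            "avg_duration_s": sum(r["duration_s"] for r in rows) / n,
            "flags_per_session": sum(r["flags"] for r in rows) / n,
        }
    return table

for agent, m in sorted(agent_metrics(sessions).items()):
    print(agent, m)
```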
Timeframes
Track how your metrics evolve over time to:
- Measure the impact of updates to prompts, documents, or model versions
- Monitor adoption after a launch
- Compare before/after data when new content or features are added
Example: Has the abandonment rate dropped since your last RAG update?
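A minimal sketch of that before/after comparison, assuming session records carry a timestamp and an abandonment flag. The field names and the update date are placeholders for the example, not part of Optimly:

```python
from datetime import datetime

# Illustrative records with timestamps; the schema is assumed.
sessions = [
    {"ts": datetime(2024, 5, 1),  "abandoned": True},
    {"ts": datetime(2024, 5, 3),  "abandoned": False},
    {"ts": datetime(2024, 5, 20), "abandoned": False},
    {"ts": datetime(2024, 5, 22), "abandoned": False},
]

rag_update = datetime(2024, 5, 10)  # date of the hypothetical RAG update

def abandonment_rate(records):
    return sum(r["abandoned"] for r in records) / len(records) if records else 0.0

before = [r for r in sessions if r["ts"] < rag_update]
after = [r for r in sessions if r["ts"] >= rag_update]

delta = abandonment_rate(after) - abandonment_rate(before)
print(f"before: {abandonment_rate(before):.0%}, "
      f"after: {abandonment_rate(after):.0%}, delta: {delta:+.0%}")
```

The same split-and-compare pattern applies to any of the metrics above: pick a cutoff date, compute the metric on each side, and look at the delta.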
User Segments
Break down performance by:
- Channel (web, WhatsApp, email, etc.)
- User type (lead vs. customer)
- Stage in the customer journey (onboarding vs. support)
This helps tailor content and agent behavior for different use cases.
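Conceptually, a segment breakdown is the same metric grouped by a different key. The sketch below illustrates this with assumed `channel` and `user_type` fields on each session; the schema is invented for the example:

```python
from collections import defaultdict

# Assumed per-session fields, for illustration only.
sessions = [
    {"channel": "web",      "user_type": "lead",     "resolved": True},
    {"channel": "web",      "user_type": "customer", "resolved": False},
    {"channel": "whatsapp", "user_type": "customer", "resolved": True},
    {"channel": "email",    "user_type": "lead",     "resolved": True},
]

def rate_by(records, key):
    """Resolution rate broken down by an arbitrary segment key."""
    grouped = defaultdict(list)
    for r in records:
        grouped[r[key]].append(r["resolved"])
    return {segment: sum(vals) / len(vals) for segment, vals in grouped.items()}

print(rate_by(sessions, "channel"))    # e.g. {'web': 0.5, 'whatsapp': 1.0, 'email': 1.0}
print(rate_by(sessions, "user_type"))  # e.g. {'lead': 1.0, 'customer': 0.5}
```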
Topics or Intents
Compare how well your agents perform across types of user requests:
- Are billing questions being resolved faster than technical ones?
- Which intents trigger the most flags or human takeovers?
This is key for prioritizing improvements.
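One way to prioritize is to rank intents by flag and takeover rates. The sketch below shows this from raw session data; the fields (`intent`, `flagged`, `takeover`) are again assumptions for illustration, not Optimly's schema:

```python
from collections import defaultdict

# Illustrative records; intent labels and fields are assumptions.
sessions = [
    {"intent": "billing",   "duration_s": 120, "flagged": False, "takeover": False},
    {"intent": "billing",   "duration_s": 150, "flagged": True,  "takeover": False},
    {"intent": "technical", "duration_s": 400, "flagged": True,  "takeover": True},
    {"intent": "technical", "duration_s": 350, "flagged": False, "takeover": True},
]

stats = defaultdict(lambda: {"n": 0, "duration": 0, "flags": 0, "takeovers": 0})
for r in sessions:
    s = stats[r["intent"]]
    s["n"] += 1
    s["duration"] += r["duration_s"]
    s["flags"] += r["flagged"]
    s["takeovers"] += r["takeover"]

# Rank intents by human-takeover rate to surface the weakest areas first.
for intent, s in sorted(stats.items(), key=lambda kv: -kv[1]["takeovers"] / kv[1]["n"]):
    print(intent,
          f"avg_duration={s['duration'] / s['n']:.0f}s",
          f"flag_rate={s['flags'] / s['n']:.0%}",
          f"takeover_rate={s['takeovers'] / s['n']:.0%}")
```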
Visualizations and Metrics
- Side-by-side metric tables by agent or version
- Trend lines and deltas over time
- Bar and radar charts for performance breakdowns
- Flag distribution per agent
- Success rate by intent or topic
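If you export the comparison data, you can reproduce a breakdown like the bar chart outside the dashboard. A minimal matplotlib sketch, with made-up metric values standing in for a real export:

```python
import matplotlib.pyplot as plt
import numpy as np

# Illustrative values; in practice these would come from an Optimly export.
metrics = ["Resolution", "Abandonment", "Flags/session"]
agent_a = [0.82, 0.09, 0.12]
agent_b = [0.74, 0.15, 0.21]

x = np.arange(len(metrics))
width = 0.35

fig, ax = plt.subplots()
ax.bar(x - width / 2, agent_a, width, label="Agent A")
ax.bar(x + width / 2, agent_b, width, label="Agent B")
ax.set_xticks(x)
ax.set_xticklabels(metrics)
ax.set_ylabel("Rate")
ax.legend()
plt.show()
```

Grouping the bars per metric keeps the two agents visually adjacent, mirroring the side-by-side layout of the metric tables.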
Use Cases
- Validate a new prompt style before rolling it out to all agents
- Demonstrate ROI of an improved knowledge base
- Identify agents or segments that need retraining
- Optimize agent strategies per use case
Next: Exports and Reports