Comparisons and Benchmarks
Compare the performance of your agents, flows, or content over time, and identify what actually works.
What This Section Covers
Optimly lets you benchmark agents, prompts, topics, and timeframes side by side.
This allows teams to:
- Identify the most effective agent configurations
- Monitor improvements (or regressions) after updates
- Validate changes using real-world usage data
What You Can Compare
Agents
Evaluate different agents across key metrics:
- Resolution rate
- Abandonment rate
- Average session duration
- Tool and document usage
- Flag frequency
This is useful for comparing:
- A/B tests of tone or prompt strategy
- Agents deployed in different regions or channels
- Internal vs. customer-facing agents
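To make the arithmetic behind a side-by-side agent comparison concrete, here is a minimal Python sketch that aggregates these metrics from raw session records. The record schema (`agent_id`, `resolved`, `abandoned`, `duration_s`, `flags`) is hypothetical, invented for illustration rather than taken from Optimly's actual export format:

```python
from collections import defaultdict

# Hypothetical session records; field names are illustrative only,
# not Optimly's actual export schema.
sessions = [
    {"agent_id": "support-a", "resolved": True,  "abandoned": False, "duration_s": 210, "flags": 0},
    {"agent_id": "support-a", "resolved": False, "abandoned": True,  "duration_s": 95,  "flags": 1},
    {"agent_id": "support-b", "resolved": True,  "abandoned": False, "duration_s": 180, "flags": 0},
    {"agent_id": "support-b", "resolved": True,  "abandoned": False, "duration_s": 240, "flags": 2},
]

def agent_metrics(records):
    """Aggregate per-agent resolution rate, abandonment rate,
    mean session duration, and flags per session."""
    grouped = defaultdict(list)
    for r in records:
        grouped[r["agent_id"]].append(r)
    table = {}
    for agent, rows in grouped.items():
        n = len(rows)
        table[agent] = {
            "resolution_rate": sum(r["resolved"] for r in rows) / n,
            "abandonment_rate": sum(r["abandoned"] for r in rows) / n,
            "avg_duration_s": sum(r["duration_s"] for r in rows) / n,
            "flags_per_session": sum(r["flags"] for r in rows) / n,
        }
    return table

for agent, m in sorted(agent_metrics(sessions).items()):
    print(agent, m)
```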
Timeframes
Track how your metrics evolve over time to:
- Measure the impact of updates to prompts, documents, or model versions
- Monitor adoption after a launch
- Compare before/after data when new content or features are added
Example: Has the abandonment rate dropped since your last RAG update?
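A minimal sketch of that before/after comparison, assuming session records carry a timestamp and an abandonment flag. The field names and the update date are placeholders for the example, not part of Optimly:

```python
from datetime import datetime

# Illustrative records with timestamps; the schema is assumed.
sessions = [
    {"ts": datetime(2024, 5, 1),  "abandoned": True},
    {"ts": datetime(2024, 5, 3),  "abandoned": False},
    {"ts": datetime(2024, 5, 20), "abandoned": False},
    {"ts": datetime(2024, 5, 22), "abandoned": False},
]

rag_update = datetime(2024, 5, 10)  # date of the hypothetical RAG update

def abandonment_rate(records):
    return sum(r["abandoned"] for r in records) / len(records) if records else 0.0

before = [r for r in sessions if r["ts"] < rag_update]
after = [r for r in sessions if r["ts"] >= rag_update]

delta = abandonment_rate(after) - abandonment_rate(before)
print(f"before: {abandonment_rate(before):.0%}, "
      f"after: {abandonment_rate(after):.0%}, delta: {delta:+.0%}")
```

The same split-and-compare pattern applies to any of the metrics above: pick a cutoff date, compute the metric on each side, and look at the delta.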
User Segments
Break down performance by:
- Channel (web, WhatsApp, email, etc.)
- User type (lead vs. customer)
- Stage in the customer journey (onboarding vs. support)
This helps tailor content and agent behavior for different use cases.
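Conceptually, a segment breakdown is the same metric grouped by a different key. The sketch below illustrates this with assumed `channel` and `user_type` fields on each session; the schema is invented for the example:

```python
from collections import defaultdict

# Assumed per-session fields, for illustration only.
sessions = [
    {"channel": "web",      "user_type": "lead",     "resolved": True},
    {"channel": "web",      "user_type": "customer", "resolved": False},
    {"channel": "whatsapp", "user_type": "customer", "resolved": True},
    {"channel": "email",    "user_type": "lead",     "resolved": True},
]

def rate_by(records, key):
    """Resolution rate broken down by an arbitrary segment key."""
    grouped = defaultdict(list)
    for r in records:
        grouped[r[key]].append(r["resolved"])
    return {segment: sum(vals) / len(vals) for segment, vals in grouped.items()}

print(rate_by(sessions, "channel"))    # e.g. {'web': 0.5, 'whatsapp': 1.0, 'email': 1.0}
print(rate_by(sessions, "user_type"))  # e.g. {'lead': 1.0, 'customer': 0.5}
```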
Topics or Intents
Compare how well your agents perform across types of user requests:
- Are billing questions being resolved faster than technical ones?
- Which intents trigger the most flags or human takeovers?
This is key for prioritizing improvements.
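One way to prioritize is to rank intents by flag and takeover rates. The sketch below shows this from raw session data; the fields (`intent`, `flagged`, `takeover`) are again assumptions for illustration, not Optimly's schema:

```python
from collections import defaultdict

# Illustrative records; intent labels and fields are assumptions.
sessions = [
    {"intent": "billing",   "duration_s": 120, "flagged": False, "takeover": False},
    {"intent": "billing",   "duration_s": 150, "flagged": True,  "takeover": False},
    {"intent": "technical", "duration_s": 400, "flagged": True,  "takeover": True},
    {"intent": "technical", "duration_s": 350, "flagged": False, "takeover": True},
]

stats = defaultdict(lambda: {"n": 0, "duration": 0, "flags": 0, "takeovers": 0})
for r in sessions:
    s = stats[r["intent"]]
    s["n"] += 1
    s["duration"] += r["duration_s"]
    s["flags"] += r["flagged"]
    s["takeovers"] += r["takeover"]

# Rank intents by human-takeover rate to surface the weakest areas first.
for intent, s in sorted(stats.items(), key=lambda kv: -kv[1]["takeovers"] / kv[1]["n"]):
    print(intent,
          f"avg_duration={s['duration'] / s['n']:.0f}s",
          f"flag_rate={s['flags'] / s['n']:.0%}",
          f"takeover_rate={s['takeovers'] / s['n']:.0%}")
```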
Visualizations and Metrics
- Side-by-side metric tables by agent or version
- Trend lines and deltas over time
- Bar and radar charts for performance breakdowns
- Flag distribution per agent
- Success rate by intent or topic
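If you export the comparison data, you can reproduce a breakdown like the bar chart outside the dashboard. A minimal matplotlib sketch, with made-up metric values standing in for a real export:

```python
import matplotlib.pyplot as plt
import numpy as np

# Illustrative values; in practice these would come from an Optimly export.
metrics = ["Resolution", "Abandonment", "Flags/session"]
agent_a = [0.82, 0.09, 0.12]
agent_b = [0.74, 0.15, 0.21]

x = np.arange(len(metrics))
width = 0.35

fig, ax = plt.subplots()
ax.bar(x - width / 2, agent_a, width, label="Agent A")
ax.bar(x + width / 2, agent_b, width, label="Agent B")
ax.set_xticks(x)
ax.set_xticklabels(metrics)
ax.set_ylabel("Rate")
ax.legend()
plt.show()
```

Grouping the bars per metric keeps the two agents visually adjacent, mirroring the side-by-side layout of the metric tables.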
Use Cases
- Validate a new prompt style before rolling it out to all agents
- Demonstrate ROI of an improved knowledge base
- Identify agents or segments that need retraining
- Optimize agent strategies per use case
Next: Exports and Reports