Response Quality
Measure how well your agents answer user questions, even when users don't give direct feedback.
What This Section Covers
In Optimly, every agent response is evaluated to estimate its effectiveness and clarity. Rather than relying solely on thumbs-up/thumbs-down feedback (which most users never give), we analyze behavioral and content signals from each conversation to estimate response quality.
This helps you continuously improve agent performance without needing manual reviews for every conversation.
How We Estimate Quality
Completion and Relevance
We assess whether the agent’s response:
- Fully answered the user's intent
- Was relevant to the user's original message
- Avoided hallucinations or off-topic content
Responses are classified as:
- Complete
- Partially Complete
- Incomplete
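The classification above can be sketched as a simple threshold rule. This is an illustrative heuristic only; the signal names (`intent_coverage`, `relevance`) and thresholds are assumptions, not Optimly's actual model.

```python
def classify_completion(intent_coverage: float, relevance: float) -> str:
    """Map signal scores (0.0-1.0) to a completion label.

    Thresholds are illustrative, not Optimly's real cutoffs.
    """
    if intent_coverage >= 0.8 and relevance >= 0.8:
        return "Complete"
    if intent_coverage >= 0.4 and relevance >= 0.5:
        return "Partially Complete"
    return "Incomplete"

print(classify_completion(0.9, 0.95))  # Complete
print(classify_completion(0.5, 0.6))   # Partially Complete
print(classify_completion(0.2, 0.9))   # Incomplete
```

In practice these signal scores would come from an LLM judge or a trained classifier rather than hand-set thresholds.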
Document Usage (RAG Context)
If the agent is connected to knowledge sources:
- Did the response reference any documents?
- Was the document appropriate for the topic?
- Was the response grounded in facts from the knowledge base?
This helps evaluate how well the RAG setup is performing.
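A minimal way to approximate the "grounded in facts" check is token overlap between the response and the retrieved documents. Real RAG evaluation typically uses embeddings or entailment models; this sketch, with a hypothetical `grounding_ratio` helper, only illustrates the idea.

```python
def grounding_ratio(response: str, documents: list[str]) -> float:
    """Fraction of response tokens that also appear in the retrieved docs.

    A crude proxy for groundedness: 1.0 means every response token is
    present somewhere in the knowledge-base text.
    """
    resp_tokens = set(response.lower().split())
    doc_tokens: set[str] = set()
    for doc in documents:
        doc_tokens.update(doc.lower().split())
    if not resp_tokens:
        return 0.0
    return len(resp_tokens & doc_tokens) / len(resp_tokens)

docs = ["refunds are processed within 5 business days"]
print(grounding_ratio("refunds are processed within 5 days", docs))  # 1.0
```

A low ratio flags responses that may be hallucinated rather than drawn from the knowledge base.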
Fallbacks and Escalations
Signals that reduce quality score:
- Response triggered a fallback or default message
- Agent prompted the user to rephrase
- The user repeated the same question
- The user abandoned the session shortly after the reply
These behaviors suggest that the response did not satisfy the user.
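One way to fold these signals into a score is a weighted penalty model. The signal names and weights below are hypothetical, chosen purely to illustrate how each detected behavior could reduce a response's score.

```python
# Hypothetical per-signal penalties; not Optimly's actual weights.
PENALTIES = {
    "fallback_triggered": 0.4,
    "asked_to_rephrase": 0.2,
    "question_repeated": 0.3,
    "session_abandoned": 0.3,
}

def quality_score(signals: set[str], base: float = 1.0) -> float:
    """Subtract a penalty for each observed negative signal, floored at 0."""
    penalty = sum(PENALTIES.get(s, 0.0) for s in signals)
    return max(0.0, base - penalty)

print(round(quality_score({"fallback_triggered", "question_repeated"}), 2))  # 0.3
print(quality_score(set()))  # 1.0
```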
Manual Flags
Agents or admins may also manually flag responses as:
- Correct / Incorrect
- Misleading
- Incomplete
- Off-brand
- Needs Review
These manual tags are incorporated into the overall quality model.
Visualizations and Metrics
- % of responses marked as complete / partial / incomplete
- % of responses using connected documents
- Fallback rate per agent or per topic
- Repetition after response (user asks the same thing again)
- Quality trends over time or after prompt changes
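The metrics above can all be derived from per-response evaluation records. The record fields below (`label`, `used_docs`, `fallback`) are an assumed schema used only to show the aggregation, not Optimly's real data model.

```python
from collections import Counter

# Assumed per-response records from the evaluation pipeline.
responses = [
    {"label": "Complete", "used_docs": True, "fallback": False},
    {"label": "Incomplete", "used_docs": False, "fallback": True},
    {"label": "Partially Complete", "used_docs": True, "fallback": False},
]

n = len(responses)
# % of responses per completion label
label_pct = {k: v / n for k, v in Counter(r["label"] for r in responses).items()}
# % of responses that used connected documents
doc_usage_rate = sum(r["used_docs"] for r in responses) / n
# Fallback rate across the agent's responses
fallback_rate = sum(r["fallback"] for r in responses) / n

print(round(doc_usage_rate, 2))  # 0.67
print(round(fallback_rate, 2))   # 0.33
```

Grouping the same records by agent, topic, or time window yields the per-agent breakdowns and trend lines.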
Use Cases
- Identify agents or flows that consistently fail to resolve questions
- Evaluate how well new knowledge content is being used
- Spot low-quality patterns before they affect CSAT or churn
- Benchmark agent performance without relying on explicit thumbs-up/thumbs-down feedback
Next: Anomalies and Flags