How To Evaluate Agents
This guide is for domain experts who are evaluating the performance of their Agent. It covers how to review feedback, review the automated evaluation metrics, and diagnose any issues.
1. Accessing the Agent
To access your Agent, head to your organisation's URL for the Great Wave AI Platform and log in with your username and password or via Microsoft single sign-on. The URL differs between organisations and will typically be a variant of:
app.greatwave.ai
[domain].greatwave.ai
greatwave.[domain].com
Once you've logged in:
On the left-hand side of the page, set the "Agents by User" filter to "All".

Either search for the Agent you want to investigate or filter by tag under "Filter Agents".

Find the Agent you want to investigate and click "Select".

Navigate to the Evaluation page using the panel on the left-hand side of the screen.

2. Filtering for Issues
2.1 Filtering for Live vs Test Usage
You can filter on Live vs Test usage.
Live usage is from anyone using the Agent via the API endpoints or via the live chat front-end.
Test usage is from anyone using the Agent via the Great Wave AI Platform (e.g., via the Instruct screen or the Design screen).
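For illustration, a Live interaction is simply a programmatic call to the Agent's endpoint from outside the platform. The sketch below shows what such a call might look like; the endpoint path, payload fields, and authentication header are assumptions for illustration only and will differ for your organisation's deployment, so check your own API documentation for the real values.

    import requests

    # Hypothetical values -- replace with the endpoint, field names and
    # authentication scheme documented for your own deployment.
    API_URL = "https://app.greatwave.ai/api/agents/<agent-id>/query"  # assumed path
    API_KEY = "YOUR_API_KEY"                                          # assumed auth

    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"question": "What is our refund policy?"},  # assumed field name
        timeout=30,
    )
    print(response.json())

A call like this (or a message sent through the live chat front-end) counts as Live usage; asking the same question from the Instruct or Design screen counts as Test usage.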

2.2 Filtering for Red Flags
If users have given any negative feedback (see How To Give Agent Feedback), it will show up within the Evaluation page as a Red Flag. You can filter for all Red Flags.

Red Flags also show up highlighted in red on the Evaluation screen.

2.3 Filtering on Automated Evaluation
You can also sort on the automated evaluation metrics (see Evaluate for more details on the metrics) by clicking on a metric. This orders the scores high-to-low or low-to-high.

Recommendations on which automated evaluation metrics to check:
Relevance: Responses that score low on relevance (did the Agent answer the question?) show you which questions people are asking that may not be covered by your source information. These are the things people are looking for but struggling to find, which is a powerful flag to help you create additional content in the future.
Data Quality: Responses that score low on data quality may indicate discrepancies in the data set that are worth investigating (e.g., contradictory information). This is a powerful flag to help you understand whether data needs reviewing and fixing.
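If you copy or export evaluation results out of the platform, the triage logic above can be expressed as a short script. This is a minimal sketch only: the record structure, field names, and 0-1 score scale below are assumptions for illustration, not the platform's actual export format.

    # Hypothetical exported evaluation records; field names and the 0-1
    # score scale are assumptions for illustration only.
    records = [
        {"question": "What is the refund window?", "relevance": 0.35, "data_quality": 0.90},
        {"question": "Who approves overtime?", "relevance": 0.88, "data_quality": 0.40},
        {"question": "How do I reset my password?", "relevance": 0.92, "data_quality": 0.95},
    ]

    RELEVANCE_THRESHOLD = 0.5     # low relevance -> likely a content gap
    DATA_QUALITY_THRESHOLD = 0.5  # low data quality -> possible contradictions in sources

    content_gaps = [r for r in records if r["relevance"] < RELEVANCE_THRESHOLD]
    data_issues = [r for r in records if r["data_quality"] < DATA_QUALITY_THRESHOLD]

    print("Candidate content gaps (low relevance):")
    for r in sorted(content_gaps, key=lambda r: r["relevance"]):
        print(f"  {r['relevance']:.2f}  {r['question']}")

    print("Candidate data issues (low data quality):")
    for r in sorted(data_issues, key=lambda r: r["data_quality"]):
        print(f"  {r['data_quality']:.2f}  {r['question']}")

The worst-scoring items in each list are usually the best place to start: content gaps point to new material worth writing, data issues point to sources worth correcting.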
3. Understanding the Issues
Once you've filtered and identified the Question & Response pairing you want to investigate, click into it.

Here you can see:
Question: the Question asked by the user
Response: the Response given by the Agent
Agent Context (Source data): this is the information the Agent used to give its Response. This can be:
Knowledge that you've attached to the Agent (see Knowledge Domain)
Responses from other Agents
Responses from an API call (see API (Agent))
Related Agent Queries: these are other Agents in the chain that were called.
Guardrules: gives detail on whether any Guardrules were invoked (see Guardrules)
Scoring: gives further detail on the automated evaluation metrics.
User Feedback: shows any user feedback given (see How To Give Agent Feedback)
Review: allows you, as an evaluator, to tag and keep track of any issues once you've understood them
3.1 Diagnosing Knowledge
Within the "Agent Context (Source data)" section, you can delve into the chunks of knowledge that the Agent used to generate its Response.
You can:
Open up the chunks of knowledge that were retrieved.
Search through the chunks of knowledge that were retrieved.

This allows you to understand whether the Agent was wrong because (a) the Knowledge/Documents themselves are wrong, or (b) the Knowledge/Documents were right and something else is going wrong.
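One quick way to separate (a) from (b) is to check whether the fact the Agent needed actually appears in the retrieved chunks. The sketch below assumes you have copied the chunk text out of the "Agent Context (Source data)" section into a list; it is not a platform API.

    # Hypothetical chunk text copied from the "Agent Context (Source data)" panel.
    retrieved_chunks = [
        "Refunds are available within 30 days of purchase.",
        "Shipping is free on orders over 50 GBP.",
    ]

    expected_fact = "30 days"  # the detail the Response missed or got wrong

    matches = [c for c in retrieved_chunks if expected_fact.lower() in c.lower()]

    if matches:
        # The correct information was retrieved, so the problem is likely
        # downstream (e.g., the Agent's instructions or generation), not the Knowledge.
        print("Fact found in retrieved chunks:", matches)
    else:
        # The information never reached the Agent: either the source documents
        # are missing it or wrong, or retrieval did not surface the right chunk.
        print("Fact not found in retrieved chunks -- review the Knowledge and retrieval.")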
3.2 Diagnosing Other Agents
You can see which Agents responded to a query, and what their responses were, by clicking into the Agent under "Related Agent Queries".

This will open up that Agent and you can diagnose issues in the same way.