Multi-Hop & Agentic Retrieval
Seer supports logging and evaluating multi-step retrieval workflows, from decomposed queries to agentic RAG patterns.
Overview
Many real-world queries can’t be answered with a single retrieval. Consider:
“What awards did the director of Inception win?”
This requires:
- First, find who directed Inception → Christopher Nolan
- Then, find what awards Christopher Nolan won
Key Fields
task: The Original Query
Always pass the original user query in task. This is what Seer evaluates against for end-to-end relevance.
subquery: The Decomposed Question
The subquery is what this specific retrieval hop is trying to answer. A query rewriter or planner typically generates these.
is_final_context: Final Evidence for the LLM
Mark the retrieval step whose context is passed to the LLM or agent for final answer synthesis. Seer uses this span for trace-level metrics.
Complete Example: Query Decomposition
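A sketch of how the Inception example above could be logged. The field names (`task`, `subquery`, `is_final_context`) come from this page, but `log_retrieval` is a hypothetical stand-in for the actual Seer logging call, not its real API:

```python
# Hypothetical helper standing in for the real Seer SDK logging call.
def log_retrieval(name, task, subquery, contexts, is_final_context=False):
    return {"name": name, "task": task, "subquery": subquery,
            "contexts": contexts, "is_final_context": is_final_context}

task = "What awards did the director of Inception win?"

# Hop 1: resolve the entity the original question refers to.
hop1 = log_retrieval(
    name="retrieval_hop_1",
    task=task,
    subquery="Who directed Inception?",
    contexts=["Inception (2010) was directed by Christopher Nolan."],
)

# Hop 2: answer the decomposed question using the entity from hop 1.
hop2 = log_retrieval(
    name="retrieval_hop_2",
    task=task,
    subquery="What awards has Christopher Nolan won?",
    contexts=["Christopher Nolan won the Academy Award for Best Director (2024)."],
)

# Final span: the joined context handed to the LLM for answer synthesis.
final = log_retrieval(
    name="final_context",
    task=task,
    subquery=None,
    contexts=hop1["contexts"] + hop2["contexts"],
    is_final_context=True,
)
```

Note that `task` is identical on every span, each hop carries its own `subquery`, and only the merged span sets `is_final_context=True`.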
What Seer Evaluates
For each hop, Seer computes:

| Evaluation | Against | Purpose |
|---|---|---|
| Task Recall | Original task | Is this hop contributing to the end goal? |
| Subquery Recall | subquery | Did this hop answer its specific question? |
Example Metrics
| Span | Subquery | Subquery Recall | Task Recall |
|---|---|---|---|
| Hop 1 | “Who directed Inception?” | 100% (found Nolan) | 50% (partial answer) |
| Hop 2 | “What awards has Nolan won?” | 100% (found awards) | 80% (most of answer) |
| Final Context | — | — | 100% (complete) |
Trace-level metrics are computed on the is_final_context=True span (the joined context).
Trace-Level vs Span-Level Metrics
| Metric Level | Scope | Use Case |
|---|---|---|
| Span-level | Individual retrieval step | Debug which hop failed |
| Trace-level | Final context | End-to-end quality for the user |
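As an illustration of how the two levels work together (the metric values below are invented for the example, not actual Seer output):

```python
# Span-level scores, one entry per retrieval step (values illustrative):
span_metrics = [
    {"name": "retrieval_hop_1", "subquery_recall": 1.0},
    {"name": "retrieval_hop_2", "subquery_recall": 0.2},
]
# Trace-level score, computed once on the final context:
trace_metrics = {"task_recall": 0.3}

# A weak trace-level result localizes quickly via span-level scores:
failing = [s["name"] for s in span_metrics if s["subquery_recall"] < 0.5]
# failing == ["retrieval_hop_2"], so hop 2 is the step to debug
```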
Trace-Based Sampling
When you provide a trace_id (auto-detected from OTEL), Seer ensures all spans in the trace get the same sampling decision. You’ll never see partial traces.
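Seer’s actual sampler isn’t documented on this page, but the guarantee can be illustrated with a deterministic hash-based sketch: because every span in a trace shares the same trace_id, they all hash to the same keep/drop decision.

```python
import hashlib

def keep_trace(trace_id: str, sample_rate: float = 0.25) -> bool:
    """Deterministic per-trace decision: hash the trace_id into [0, 1)."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate

spans = [
    {"name": "retrieval_hop_1", "trace_id": "trace-abc"},
    {"name": "retrieval_hop_2", "trace_id": "trace-abc"},
    {"name": "final_context", "trace_id": "trace-abc"},
]

decisions = {keep_trace(span["trace_id"]) for span in spans}
assert len(decisions) == 1  # all-or-nothing: never a partial trace
```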
Agentic RAG Patterns
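A sketch of the span-per-iteration pattern for an agent loop. `log_retrieval` is a hypothetical helper (not the real Seer API), and the planner is a toy stub; the point is one span per iteration plus a final merged span.

```python
# Hypothetical helper standing in for the real Seer SDK logging call.
def log_retrieval(name, task, subquery, contexts, is_final_context=False):
    return {"name": name, "task": task, "subquery": subquery,
            "contexts": contexts, "is_final_context": is_final_context}

def agent_next_query(task, gathered):
    """Toy planner: stop once two pieces of evidence are gathered."""
    if len(gathered) >= 2:
        return None
    return f"follow-up {len(gathered)} for: {task}"

task = "What awards did the director of Inception win?"
gathered, spans, step = [], [], 0
while (subquery := agent_next_query(task, gathered)) is not None:
    contexts = [f"evidence for: {subquery}"]  # stand-in for a real search
    spans.append(log_retrieval(f"agent_retrieval_{step}", task, subquery, contexts))
    gathered.extend(contexts)
    step += 1

# However many iterations ran, the merged evidence becomes the final span.
spans.append(log_retrieval("final_context", task, None, gathered,
                           is_final_context=True))
```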
For agent loops where the number of retrievals is dynamic, log one retrieval span per iteration and one final merged span.
More Examples
Parallel Retrieval
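A sketch of fanning the same question out to several sources at once, one span per source. `log_retrieval` and the two search functions are stubs, not the real Seer API; the span names follow the naming table at the end of this page.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical helper standing in for the real Seer SDK logging call.
def log_retrieval(name, task, subquery, contexts, is_final_context=False):
    return {"name": name, "task": task, "subquery": subquery,
            "contexts": contexts, "is_final_context": is_final_context}

def search_wiki(query):
    return [f"wiki passage for: {query}"]

def search_kb(query):
    return [f"kb passage for: {query}"]

task = "What awards did the director of Inception win?"
sources = {"retrieval_wiki": search_wiki, "retrieval_kb": search_kb}

# Fan the same question out to every source concurrently, one span each.
with ThreadPoolExecutor() as pool:
    futures = {name: pool.submit(search, task) for name, search in sources.items()}
spans = [log_retrieval(name, task, task, future.result())
         for name, future in futures.items()]

# Merge all results into the single span the LLM will actually see.
merged = [ctx for span in spans for ctx in span["contexts"]]
spans.append(log_retrieval("final_context", task, None, merged,
                           is_final_context=True))
```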
When you search multiple sources in parallel, log one span per source plus a final merged span.
Iterative Refinement
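A sketch of the refinement loop: a critic (stubbed here) rewrites the query until the context looks sufficient, and each attempt is logged as its own hop. `log_retrieval`, `retrieve`, and `llm_feedback` are all hypothetical stand-ins.

```python
# Hypothetical helper standing in for the real Seer SDK logging call.
def log_retrieval(name, task, subquery, contexts, is_final_context=False):
    return {"name": name, "task": task, "subquery": subquery,
            "contexts": contexts, "is_final_context": is_final_context}

def retrieve(query):
    return [f"passage for: {query}"]

def llm_feedback(task, contexts):
    """Toy critic: asks for one rewrite, then accepts."""
    if len(contexts) >= 2:
        return None  # context judged sufficient
    return f"{task} (rewritten, attempt {len(contexts) + 1})"

task = "What awards did the director of Inception win?"
contexts, spans, query = [], [], task
for hop in (1, 2, 3):  # cap the number of refinements
    new = retrieve(query)
    contexts += new
    spans.append(log_retrieval(f"retrieval_hop_{hop}", task, query, new))
    query = llm_feedback(task, contexts)
    if query is None:
        break

spans.append(log_retrieval("final_context", task, None, contexts,
                           is_final_context=True))
```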
When you re-retrieve based on LLM feedback, log each retrieval attempt as its own hop.
Best Practices
1. Always Set is_final_context for the Last Hop
This enables trace-level metrics that reflect end-user experience:
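A minimal sketch of what that looks like, using a plain dict in place of the real logging call (field names from this page):

```python
# Only the last span carries is_final_context=True; Seer computes
# trace-level metrics on exactly this span.
final_span = {
    "name": "final_context",
    "task": "What awards did the director of Inception win?",
    "contexts": [
        "Inception (2010) was directed by Christopher Nolan.",
        "Christopher Nolan won the Academy Award for Best Director (2024).",
    ],
    "is_final_context": True,
}
```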
2. Keep task Consistent Across Hops
The original query should stay the same. That’s what you’re ultimately trying to answer:
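For example (plain dicts sketching the payload shape, not the real logging call):

```python
task = "What awards did the director of Inception win?"  # never changes

# Each hop carries the same task; only the subquery differs per hop.
hops = [
    {"task": task, "subquery": "Who directed Inception?"},
    {"task": task, "subquery": "What awards has Christopher Nolan won?"},
]
assert all(hop["task"] == task for hop in hops)
```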
3. Use Subqueries for Decomposition
Subqueries help diagnose which step failed.
4. Use Consistent Span Names
| Pattern | Span Name |
|---|---|
| Sequential hops | retrieval_hop_1, retrieval_hop_2 |
| Parallel sources | retrieval_wiki, retrieval_kb |
| Agent iterations | agent_retrieval_0, agent_retrieval_1 |
| Final merged | final_context |