Reformulation	What it actually retrieves
"luxury hotels in Tokyo"	drifts toward expensive hotels; wrong for most users
"highly rated hotels in Tokyo"	usually the most useful
"popular hotels in Tokyo"	drifts toward tourist traps
"hotels near Tokyo Disneyland"	drifts away from the actual intent

Per-query, online, from reranker feedback:
Which relevance signals to trust → ORE
Which reformulations to keep → ReformIR
Which neighbors to expand → QUAM, SUNAR
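A minimal sketch of the per-query, online selection idea, applied to the hotel reformulations above. It uses plain UCB1, a standard bandit rule, and is not taken from ORE, ReformIR, QUAM, or SUNAR. The function names, the toy relevance values, and the simulated reranker are all assumptions made for illustration.

```python
import math
import random


def ucb1_select(counts, means, t, c=1.0):
    """Return the index of the arm with the highest UCB1 score at step t."""
    # Pull every arm once before trusting any estimate.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [m + c * math.sqrt(2.0 * math.log(t) / n)
              for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def allocate_budget(arms, reward_fn, budget, c=1.0):
    """Spend `budget` feedback calls across `arms`.

    Returns (pull counts, mean observed reward) per arm.
    """
    counts = [0] * len(arms)
    means = [0.0] * len(arms)
    for t in range(1, budget + 1):
        i = ucb1_select(counts, means, t, c)
        r = reward_fn(arms[i])
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]  # incremental mean update
    return counts, means


# Hypothetical numbers: the chance that a document retrieved by each
# reformulation is judged relevant to "good hotels in Tokyo".
TOY_MEAN_RELEVANCE = {
    "luxury hotels in Tokyo": 0.35,
    "highly rated hotels in Tokyo": 0.80,
    "popular hotels in Tokyo": 0.45,
    "hotels near Tokyo Disneyland": 0.15,
}


def make_toy_reranker(seed=0):
    """Simulated reranker feedback (an assumption, not a real model)."""
    rng = random.Random(seed)

    def reward(reformulation):
        # Bernoulli reward: 1.0 if the sampled document counts as relevant.
        p = TOY_MEAN_RELEVANCE[reformulation]
        return 1.0 if rng.random() < p else 0.0

    return reward


arms = list(TOY_MEAN_RELEVANCE)
counts, means = allocate_budget(arms, make_toy_reranker(seed=0), budget=400)
for arm, n, m in zip(arms, counts, means):
    print(f"{arm!r}: pulls={n}, mean_reward={m:.2f}")
```

In this sketch each arm is a reformulation, but the same loop would apply if the arms were relevance signals or candidate neighbors to expand. The budget stands in for the number of reranker calls one can afford per query.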

Avishek Anand · TU Delft

References (cont.)

[9] Yoon, Kim, Kwon, Anand, Hwang. On Listwise Reranking for Corpus Feedback. WSDM 2026, pp. 1273–1277. DOI: 10.1145/3773966.3779404.

[10] Anand, Saha, Venktesh V. Explainable Information Retrieval. ECIR 2025, pp. 254–261. DOI: 10.1007/978-3-031-88720-8_40.

[11] Chungkham, Venktesh V, Setty, Anand. Think Right, Not More: Test-Time Scaling for Numerical Claim Verification. Findings of ACL: EMNLP 2025, pp. 24345–24363.

[12] Heuss, de Rijke, Anand. RankingSHAP — Faithful Listwise Feature Attribution Explanations for Ranking Models. SIGIR 2025, pp. 381–391. DOI: 10.1145/3726302.3729971.

[13] Nanhekhan, Venktesh V, Martin, Vatndal, Setty, Anand. FlashCheck: Exploration of Efficient Evidence Retrieval for Fast Fact-Checking. ECIR 2025, pp. 385–399. DOI: 10.1007/978-3-031-88717-8_28.

[14] Saha, Agarwal, Venktesh V, Anand et al. ir_explain: A Python Library of Explainable IR Methods. SIGIR 2025, pp. 3563–3572. DOI: 10.1145/3726302.3730343.

[15] Venktesh V, Rathee, Anand. SUNAR: Semantic Uncertainty based Neighborhood Aware Retrieval for Complex QA. NAACL 2025, pp. 5818–5835. DOI: 10.18653/v1/2025.naacl-long.300.

[16] Wallat, Heuss, de Rijke, Anand. Correctness is not Faithfulness in Retrieval Augmented Generation Attributions. ICTIR 2025, pp. 22–32. DOI: 10.1145/3731120.3744592.

Towards Self-Improving Retrieval Augmented Systems

Avishek Anand · TU Delft

My research — retrieval-augmented AI systems

I'm known for Explainable IR

But I have a split personality

What goes wrong in production RAG today

Failure 2 — The drift problem

Reformulating "good hotels in Tokyo"

Failure 3 — The cost wall

What these failures share

The principle

The power of feedback

The reranker is a better estimator of relevance

Different queries need different signals

Algorithm 3 — ORE — Online Relevance Estimation

But signals aren't the only thing we can pick per query

ReformIR — picking reformulations per query

ReformIR — same idea as ORE, one level up

This is the paradigm shift

Bandits give us the language for all of it

The science: subset selection under uncertainty

Explainable Information Retrieval

Ways in which you can use explanations

EXPLORA — Choosing Explanations

CASE — what EXPLORA leaves open

Explain-and-predict isn't always perfect

RAG faithfulness isn't perfect either

Towards AutoIR

Personas — generating behavior, not labels

Where this leaves us

References

References (cont.)

That's it!

First-generation RAG — where it breaks

CASE — what EXPLORA leaves open

EXPLORA — learning to score subsets

Algorithm 1 — QUAM
