Keeping Your AI Chatbot on a Leash: Tiers of Agency

Tiered Querying: The Story of How AI Karl Popper Knows When to Ask for Help

Imagine AI Karl Popper (or your own AI Agent) sitting at his virtual desk, faced with a user’s question. Does he immediately summon the full power of Gemini and ask it to scour the web? Not quite. Instead, he follows a series of ever‑stronger “escalations,” trying the cheapest, fastest trick first, and only calling in the big guns when simpler steps fail. Here’s the heart of that logic:

Tier 1: Raw search results from our Haystack datastore

First, AI Karl tries the user's query directly against the datastore to see whether it yields a good hit.

# Run the raw query through the retrieval pipeline and grab the best similarity score
retrieved_docs, all_docs = self._doc_pipeline.generate_response(message)
max_score = self.get_max_score(retrieved_docs)

If the best of those results earns a similarity score of at least 0.50, he feels comfortable using them directly.
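The post never shows `get_max_score` itself; here is a minimal sketch of what such a helper might look like, assuming each retrieved document carries a float `score` and that the helper returns `None` for an empty result (the `Document` dataclass here is illustrative, not the project's real type):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    content: str
    score: float


def get_max_score(retrieved_docs) -> Optional[float]:
    """Return the best similarity score, or None when nothing came back."""
    if not retrieved_docs:
        return None
    return max(doc.score for doc in retrieved_docs)
```

Returning `None` rather than 0.0 for an empty result is why the later checks guard with `max_score is not None`.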

Tier 2: Let Gemini Rewrite the Query

But if even the best match scores below 0.50, it’s a sign that the user’s phrasing might be muddy. Rather than immediately plunging into expensive LLM research, Karl slips in a polite request to Gemini: “Could you reword this question for me?” With a cleaner search phrase, he tries once more against the same archive and only adopts the new results if they clearly outperform the first batch.

For example, suppose someone asks AI Karl Popper something like this:

My grandma once went to go see Bozo the clown. He told my grandma that he believed in Induction. There was Bozo, with his big shoes and red nose and all of Bozo's friends were getting out of a tiny car. And then he whispered to my grandma "See, you can tell by the first clown that another will follow! So induction is correct! Hume was off base! Honk Honk" And so my grandma told my father who told my mother who told my hairdresser who told my sister who told me that Bozo was right! Induction is correct!

There is a legitimate question hidden in there among a lot of ridiculous narrative, but semantic search will find little of value from the raw query. So we do something like this:

# If nothing scores above 0.50, ask the LLM to rewrite a clearer query
if max_score is not None and max_score < 0.50:
    improved_query = self.ask_llm_for_improved_query(message, gemini_chat_history)
    if improved_query:
        new_docs, _ = self._doc_pipeline.generate_response(improved_query)
        new_max = self.get_max_score(new_docs)
        if new_max > max_score + 0.05:
            retrieved_docs, max_score = new_docs, new_max
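The post doesn't show the prompt that `ask_llm_for_improved_query` sends to Gemini. One plausible shape for it is sketched below; the wording and the function name `build_rewrite_prompt` are our own, not taken from the project:

```python
def build_rewrite_prompt(message: str) -> str:
    """Assemble a query-rewriting prompt; the model's reply becomes the new search phrase."""
    return (
        "The following user message did not retrieve good matches from a "
        "semantic search over Karl Popper's writings. Strip away the narrative "
        "and restate the underlying philosophical question as one short, "
        "keyword-rich search query. Reply with the query only.\n\n"
        f"User message: {message}"
    )
```

Run against the Bozo story above, a prompt like this should coax out something closer to "Popper's critique of induction and Hume's problem", which the datastore can actually match.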

Tier 3: Still weak (max_score < 0.30)? Let Gemini itself pick the relevant quotes

Next, if the top score still hasn’t broken 0.30, Karl recognizes that neither the original nor the rewritten query is yielding solid matches. He switches tactics—from refining the question to refining the answers themselves. Here he asks Gemini to look at the low‑scoring candidates and pick out the handful of quotes it deems most relevant. If that succeeds, Karl moves forward with those hand‑picked snippets; if not, he applies a simple filter, discarding any quotes below a modest threshold (0.20 if there are at least three solid contenders, or a lower bar of 0.10 otherwise).

ranked_docs = []
if max_score is not None and max_score < 0.30:
    # Ask the LLM which of the weak candidates are actually relevant
    response_text = self.ask_llm_for_quote_relevance(message, retrieved_docs)
    relevant_ids = [int(x) for x in response_text.split(',') if x.strip().isdigit()]
    ranked_docs = [doc for i, doc in enumerate(retrieved_docs, 1) if i in relevant_ids]
if not ranked_docs:
    # Fall back to a simple score filter when the LLM picked nothing (or scores were decent)
    threshold = 0.20 if sum(d.score >= 0.20 for d in retrieved_docs) >= 3 else 0.10
    ranked_docs = [d for d in retrieved_docs if d.score >= threshold]
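Splitting the LLM's reply on commas works only when the model cooperates; in practice it may answer "1, 3, and 5." or wrap the list in prose. A slightly more defensive parse (a sketch of ours, using a regex rather than `str.split`, with a bounds check on the 1-based indices) looks like this:

```python
import re


def parse_relevant_ids(response_text: str, n_docs: int) -> list:
    """Pull 1-based quote indices out of a free-form LLM reply, dropping duplicates
    and anything outside the range of documents we actually showed the model."""
    ids = {int(m) for m in re.findall(r"\d+", response_text)}
    return sorted(i for i in ids if 1 <= i <= n_docs)
```

Out-of-range numbers (the model hallucinating a quote 9 when it only saw 5) are silently discarded rather than crashing the indexing step.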

Tier 4: Still fewer than five strong hits, or max_score < 0.60? Do full research

Finally, even after these steps, there may still not be enough high‑quality material: fewer than five quotes or a top score under 0.60. In that case, Karl waves the white flag on simple retrieval and says, “Alright, Gemini, do your own research.” At this stage, the agent spins up a full ReAct‑style investigation—calling functions, mining external sources (including Wikipedia), and looping until a coherent answer emerges.

# Not enough high-quality material: trigger a full LLM-powered research sprint
# (guard against max_score being None when nothing was retrieved at all)
if len(ranked_docs) < 5 or max_score is None or max_score < 0.60:
    research_response, research_docs = self.ask_llm_to_research(message)
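The post doesn't show `ask_llm_to_research`, but the ReAct pattern it describes boils down to a loop: ask the model, run whatever tool it requests, feed the observation back, repeat until a final answer appears. This toy sketch is our own illustration; the `TOOL:`/`FINAL:` protocol and the stub Wikipedia lookup are invented, not the project's real implementation:

```python
def lookup_wikipedia(topic: str) -> str:
    """Stand-in for a real Wikipedia fetch; returns canned text."""
    return f"Summary of {topic} (stub)."


TOOLS = {"lookup_wikipedia": lookup_wikipedia}


def run_react(question: str, call_model, max_steps: int = 5) -> str:
    """ReAct-style loop: the model either requests a tool or emits a final answer."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        reply = call_model("\n".join(transcript))
        if reply.startswith("FINAL:"):
            return reply[len("FINAL:"):].strip()
        if reply.startswith("TOOL:"):
            name, _, arg = reply[len("TOOL:"):].strip().partition(" ")
            observation = TOOLS[name](arg) if name in TOOLS else "unknown tool"
            transcript.append(f"Observation: {observation}")
    return "No answer within step budget."
```

In the real agent, `call_model` would be a Gemini call with function declarations; the loop structure is the same.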

By layering these tiers—raw search, query rewriting, quote relevance, and only then full research—AI Karl Popper strikes a balance between efficiency and depth. Most well‑phrased questions end in Tier 1 or Tier 2, saving precious API calls and tokens. Messier queries get gently guided back on track, and only the truly stubborn cases marshal the agent’s full scholarly might. This tiered approach ensures that every question gets the right level of attention, no more and no less.
