AI Tutorial: Function Calling for a ReAct Agent


In a previous post, we looked at how to give your Large Language Model (LLM) a bit of agency. In that post, we only allowed it to take the user’s query and make it concise so that it had a realistic chance of finding a relevant document in its datastore. That gave the model a tiny bit of agency, but can we take that idea and expand on it?

In fact, we can! In this post we’ll start to look at a (once) popular form of AI Agent built with an LLM called the ReAct (Reason + Act) framework. The ReAct framework comes from a famous 2022 paper. It turns your LLM into a true agent that can make choices and even call functions to accomplish something. For my demo, this will be querying Wikipedia for more specific information to find an answer to the user’s query. But it could also call an API to get traffic or weather information or even to automatically turn something on or off.

For this post, we’ll give an overview of the ReAct framework and explain how it works and then go over how to do function calling, where we allow the LLM to call functions we give it. In the next post we’ll build an actual ReAct agent using what we learned.

A Brief Overview of ReAct

In a nutshell, ReAct works by telling the LLM that it has a set of options available to it and instructing it how to utilize the knowledge it obtains via its actions. It might be easier to show an example of how it works.

Suppose you wanted to ask your LLM “Who was the mayor of Reykjavik in 2015 and what political party did they represent?” Even if your LLM doesn’t know the answer to that question (it might!), that information should be available on Wikipedia. We’ll instruct the LLM to alternate between Thoughts (where it thinks about how to approach the problem), Actions (where it calls a function to take an action), and Observations (where it summarizes what it found via the action in preparation for the next Thought).

The LLM has been given 3 functions it can call as an Action: <search>, <lookup>, and <answer>.

Search tries to find a specific Wikipedia page. If it fails to find that page, it returns a list of similar pages. Lookup allows the LLM to search through a page for specific keywords. Answer is the LLM’s way of giving the answer back to the user.
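To make those three actions concrete, here is a minimal sketch that runs against a tiny in-memory dictionary standing in for Wikipedia. The `FAKE_WIKI` data, the `ToyWikiTools` class, and the use of the standard library’s `difflib.get_close_matches` to produce “similar pages” are all assumptions for illustration, not the agent’s real implementation.

```python
import difflib
from typing import Dict, List, Optional

# Hypothetical stand-in for Wikipedia: page title -> page text.
FAKE_WIKI: Dict[str, str] = {
    "Reykjavík": "Reykjavík is the capital of, and largest city in Iceland.",
    "Dagur B. Eggertsson": (
        "Dagur B. Eggertsson is an Icelandic politician. "
        "He was Mayor of Reykjavík from 2014 to 2024."
    ),
}


class ToyWikiTools:
    """Toy versions of search, lookup and answer over an in-memory page store."""

    def __init__(self, wiki: Dict[str, str]) -> None:
        self.wiki = wiki
        self.page: Optional[str] = None  # text of the most recently found page
        self.sources: List[str] = []     # titles of pages found so far

    def search(self, title: str) -> str:
        """Return the page text, or a list of similarly titled pages."""
        if title in self.wiki:
            self.page = self.wiki[title]
            self.sources.append(title)
            return self.page
        similar = difflib.get_close_matches(title, list(self.wiki), n=5, cutoff=0.3)
        return f"Could not find [{title!r}]. Similar: {similar}"

    def lookup(self, keyword: str, width: int = 60) -> str:
        """Return a snippet of the current page around the first keyword match."""
        if self.page is None:
            return "No page loaded yet; search for a page first."
        i = self.page.lower().find(keyword.lower())
        if i == -1:
            return f"No results for {keyword!r} on this page."
        return self.page[max(0, i - width): i + len(keyword) + width]

    def answer(self, final_answer: str) -> str:
        """Hand the final answer back together with the pages used."""
        return f"{final_answer} (sources: {', '.join(self.sources) or 'None'})"
```

Note that lookup only works on the page the last successful search loaded, which is why the agent in the transcript below has to search for a page before it can look anything up in it.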

So here is how the LLM might go about discovering the desired information:

Thought 1: Okay, here's my plan to answer the question:
1.  **Search for "Mayor of Reykjavik 2015"** to find the relevant Wikipedia page.
2.  **Lookup the name of the mayor** within the search result.
3.  **Lookup the political party** of the mayor.
4.  **Answer** with the mayor's name and political party.
Now, let's execute the plan:

Action 1: <search>Mayor of Reykjavik 2015</search>

Observation 1: Could not find ["Mayor of Reykjavik 2015"]. Similar: ['Reykjavík', 'Timeline of Reykjavík', 'Jón Gnarr', 'Heiða Björg Hilmisdóttir', 'Reykjavík Mosque', 'Árni Sigfússon', 'Hanna Birna Kristjánsdóttir', 'Einar Örn Benediktsson', 'List of people from Reykjavík', 'The Settlement Exhibition']. You should search for one of those instead.

So, for the first iteration, it came up with a plan, tried to search based on its plan, and failed to find what it was hoping for. But Wikipedia did give back a list of similar pages to search. The LLM then continues:

Thought 2: Okay, the search for "Mayor of Reykjavik 2015" was not successful. I will try searching for "Reykjavík" and then look for information about the mayor in 2015.

Action 2: <search>Reykjavík</search>
Information Source: https://en.wikipedia.org/wiki/Reykjav%C3%ADk

Observation 2: Reykjavík is the capital of, and largest city in Iceland. It is located in southwestern Iceland, on the southern shore of Faxaflói Bay. With a latitude of 64°08′ N, the city is the world's northernmost capital of a sovereign state.

The ReAct Agent decided to look up the city itself on Wikipedia and managed to find the page. Now it needs to find the required information on the page. This takes a few tries:

Thought 3: Okay, I have the page for Reykjavik. Now I need to find who was the mayor in 2015.

Action 3: <lookup>mayor</lookup>
Result: ields the executive rights along with the City Mayor. The City Mayor is the senior public official and also the director of city operations. Other public officials control city institutions under the mayor's authority. Thus, the administration consists of two different parts:  The political power of City Council cascading down to other boards Public officials under the authority of the city mayor who a
Information Source: https://en.wikipedia.org/wiki/Reykjav%C3%ADk

Observation 3: ields the executive rights along with the City Mayor. The City Mayor is the senior public official and also the director of city operations. Other public officials control city institutions under the mayor's authority. Thus, the administration consists of two different parts:  The political power of City Council cascading down to other boards Public officials under the authority of the city mayor who a


Thought 4: The lookup for "mayor" is not giving me the specific name and year I need. I will try a more specific search using the year "2015".

Action 4: <lookup>2015</lookup>
Result: Reykjavík is the capital of, and largest city in Iceland. It is located in southwestern Iceland, on the southern shore of Faxaflói Bay. With a latitude of 64°08′ N, the city is the world's northernmost c
Information Source: https://en.wikipedia.org/wiki/Reykjav%C3%ADk

Observation 4: Reykjavík is the capital of, and largest city in Iceland. It is located in southwestern Iceland, on the southern shore of Faxaflói Bay. With a latitude of 64°08′ N, the city is the world's northernmost c

That was a dead end, so the ReAct agent tries another approach:

Thought 5: Still no luck. I'll try searching for a list of mayors of Reykjavik.

Action 5: <search>List of mayors of Reykjavík</search>
Information Source: https://en.wikipedia.org/wiki/Mayor_of_Reykjav%C3%ADk

Observation 5: The post of Mayor of Reykjavík (Icelandic: Borgarstjóri Reykjavíkur) was created in 1907 and advertised in 1908. Páll Einarsson and Knud Zimsen applied for the job and Páll got the position for a period of six years, at the end of which he did not wish to renew his tenure.

Thought 6: Okay, I have the page for the list of mayors. Now I need to find the mayor in 2015.

Action 6: <lookup>2015</lookup>
Result: The post of Mayor of Reykjavík (Icelandic: Borgarstjóri Reykjavíkur) was created in 1907 and advertised in 1908. Páll Einarsson and Knud Zimsen applied for the job and Páll got the position for a period 
Information Source: https://en.wikipedia.org/wiki/Mayor_of_Reykjav%C3%ADk

Observation 6: The post of Mayor of Reykjavík (Icelandic: Borgarstjóri Reykjavíkur) was created in 1907 and advertised in 1908. Páll Einarsson and Knud Zimsen applied for the job and Páll got the position for a period 

Still nothing. So the LLM tries a name from its history (although that name didn’t appear in any of the Thoughts shown above):

Thought 7: Still not providing the answer. I will try searching for the mayor's name directly, assuming it might be Dagur B. Eggertsson based on a previous search.

Action 7: <search>Dagur B. Eggertsson</search>
Information Source: https://en.wikipedia.org/wiki/Dagur_B._Eggertsson

Observation 7: Here are the first two sentences from the provided text:
Dagur Bergþóruson Eggertsson (born 19 June 1972) is an Icelandic politician who was the Mayor of Reykjavík from 2007 to 2008 and again from 2014 to 2024. He was the vice-chairman of the Social Democratic Alliance from 2009 until 2013.

At last, we found the answer! Now the LLM is ready to give the answer:

Thought 8: Okay, the search result confirms that Dagur Bergþóruson Eggertsson was the Mayor of Reykjavík in 2015. The text also states that he was associated with the Social Democratic Alliance.

Action 8: <answer>Dagur B. Eggertsson was the mayor of Reykjavik in 2015. He represented the Social Democratic Alliance.</answer>
Information Sources: ['https://en.wikipedia.org/wiki/Reykjav%C3%ADk', 'https://en.wikipedia.org/wiki/Mayor_of_Reykjav%C3%ADk', 'https://en.wikipedia.org/wiki/Dagur_B._Eggertsson']

Final Answer: Dagur B. Eggertsson was the mayor of Reykjavik in 2015. He represented the Social Democratic Alliance.

That is the correct answer! So, the ReAct agent was able to track down the answer and return it to us using Wikipedia as a resource.

Notice how it alternated between a Thought, an Action, and then an Observation of the information that came back from the Action. Note also how it tracks the Wikipedia pages it used so that it can cite its sources.
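The control flow behind that transcript can be sketched as a loop. In this minimal sketch the LLM is replaced by a hypothetical `make_scripted_model` that replays canned replies, and the tools are simple lambdas, so the loop runs without a model or network access. The tag format mirrors the transcript above; the names `react_loop` and `ACTION_RE` are illustrative only.

```python
import re
from typing import Callable, Dict, List, Tuple

# Matches an action such as <search>Reykjavík</search>.
ACTION_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def make_scripted_model(replies: List[str]) -> Callable[[List[str]], str]:
    """Hypothetical stand-in for an LLM that replays canned replies in order."""
    remaining = iter(replies)

    def model(transcript: List[str]) -> str:
        # A real LLM would condition on the transcript; this fake ignores it.
        return next(remaining, "")

    return model


def react_loop(model: Callable[[List[str]], str],
               tools: Dict[str, Callable[[str], str]],
               max_steps: int = 8) -> Tuple[str, List[str]]:
    """Alternate Thought/Action (from the model) with Observation (from a tool)."""
    transcript: List[str] = []
    for step in range(1, max_steps + 1):
        reply = model(transcript)           # Thought + Action
        transcript.append(reply)
        match = ACTION_RE.search(reply)
        if match is None:
            transcript.append(f"Observation {step}: No action found.")
            continue
        name, arg = match.group(1), match.group(2).strip()
        if name == "answer":                # the model has decided it is done
            return arg, transcript
        tool = tools.get(name)
        observation = tool(arg) if tool else f"Unknown function: {name}"
        transcript.append(f"Observation {step}: {observation}")
    return "No answer within the step limit.", transcript


SCRIPT = [
    "Thought 1: I should find the city page.\nAction 1: <search>Reykjavík</search>",
    "Thought 2: I can answer now.\nAction 2: <answer>Reykjavík is the capital of Iceland.</answer>",
]
answer, transcript = react_loop(make_scripted_model(SCRIPT),
                                {"search": lambda q: f"Found page: {q}"})
print(answer)
```

The `max_steps` cap is a design choice worth keeping in a real agent: without it, a model that never calls answer would loop forever.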

What is Function Calling?

Giving the ReAct agent the ability to take actions requires that we make functions available for it to call. How do we do that? Isn’t an LLM just a text generation engine? How can it call a function?

In the early days of LLMs, models could not call functions. But you could tell them to write out a specific sort of tag that you would then parse yourself. That tag might contain the name of the function to call, and you were then responsible for actually calling the function with the parameters the LLM specified. I wrote some code the old-style way so that you can see how this was done. Here is the relevant bit:

import re

# Extract the function name (e.g., search, lookup, answer) from the response.
cmd = re.search(r'<(\w+)>', response_cmd).group(1)
# Extract the parameter that follows the function tag.
query = response_cmd.split(f'<{cmd}>')[-1].strip()

# Dynamically call the method whose name matches the extracted command.
observation = getattr(self, cmd)(query)

Note how I take the response from the LLM (response_cmd), search it for a tag enclosed in < and >, and slice out the tag’s name. Next, we grab the parameter that follows the tag. Finally, we dynamically call the method with that name.
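Here is a standalone sketch of that old-style parsing that you can run on its own. The `parse_action` and `dispatch` names are hypothetical. One deliberate change from the snippet above: instead of looking the name up on `self`, it dispatches through an explicit dictionary, so the model can only trigger functions you have listed. It also drops a closing tag such as `</search>` if the model happened to emit one.

```python
import re
from typing import Callable, Dict, Tuple

# Matches an opening tag such as <search> and captures its name.
TAG_RE = re.compile(r"<(\w+)>")


def parse_action(response_cmd: str) -> Tuple[str, str]:
    """Split a reply like 'Action 2: <search>Reykjavík</search>' into (name, parameter)."""
    match = TAG_RE.search(response_cmd)
    if match is None:
        raise ValueError(f"No action tag found in: {response_cmd!r}")
    cmd = match.group(1)
    # Keep the text after the opening tag and drop a closing tag if one is present.
    query = response_cmd.split(f"<{cmd}>")[-1].split(f"</{cmd}>")[0].strip()
    return cmd, query


def dispatch(response_cmd: str, tools: Dict[str, Callable[[str], str]]) -> str:
    """Call only functions that appear in the explicit allow-list."""
    cmd, query = parse_action(response_cmd)
    if cmd not in tools:
        return f"Unknown function: {cmd}"
    return tools[cmd](query)
```

For example, `parse_action("Action 2: <search>Reykjavík</search>")` returns `("search", "Reykjavík")`.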

Function Calling (That Actually Works) for Gemini

But this approach is not especially convenient, so model providers started giving LLMs a built-in ability to call functions that you pass to them. Google’s Gemini includes built-in function calling, so let’s use that instead. (Note that even what I’m about to show you is already out of date. This is a fast-moving field.)

Unfortunately (and this seems to be a recurring theme for my blog posts), Google’s instructions for how to do function calling are out of date and don’t work. But no worries: I figured out the (current) best way to make it work.

Let’s take the ‘answer’ function as a simple example:

from typing import Dict

def answer(self, final_answer: str) -> Dict[str, str]:
    # Signal the ReAct loop to stop prompting the model.
    self.should_continue_prompting = False
    # Build a comma-separated list of the Wikipedia pages consulted.
    sources: str = ", ".join(self._search_urls) if self._search_urls else "None"
    result: str = f"Final Answer: {final_answer}\nInformation Sources: {sources}"
    print(f"Information Sources: {self._search_urls}")
    return {"result": result}

This function lets the ReAct class know that it has given an answer and needs to exit (should_continue_prompting = False), then creates a final answer string (including a list of sources) and returns it.
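To see what answer does in isolation, here is a minimal harness. `MiniReAct` is a hypothetical class that holds only the two attributes the method touches; the method body repeats the logic above, minus the debugging print.

```python
from typing import Dict, List


class MiniReAct:
    """Hypothetical minimal harness holding only the state that answer() touches."""

    def __init__(self) -> None:
        self.should_continue_prompting: bool = True
        self._search_urls: List[str] = []

    def answer(self, final_answer: str) -> Dict[str, str]:
        # Same logic as the answer() method above: stop the loop and attach sources.
        self.should_continue_prompting = False
        sources: str = ", ".join(self._search_urls) if self._search_urls else "None"
        result: str = f"Final Answer: {final_answer}\nInformation Sources: {sources}"
        return {"result": result}


agent = MiniReAct()
agent._search_urls.append("https://en.wikipedia.org/wiki/Dagur_B._Eggertsson")
out = agent.answer("Dagur B. Eggertsson")
print(out["result"])
print(agent.should_continue_prompting)  # False: the loop should now stop
```

Returning a dictionary rather than a bare string keeps answer consistent with the other tools, whose results are also wrapped as `{"result": ...}`.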

But how do we get Gemini to call it? We need to create a function declaration like this:

from typing import Any, Dict

# Function declaration for providing the final answer.
answer_declaration: Dict[str, Any] = {
    "name": "answer",
    "description": (
        "Provides the final answer to the question and ends the conversation, while listing "
        "all the used information sources."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "final_answer": {
                "type": "string",
                "description": "The final answer produced by the model."
            }
        },
        "required": ["final_answer"],
    },
}

This is a dictionary that describes the function. Gemini uses these instructions to decide when and how to call the function. Gemini doesn’t run the function itself; it ‘calls’ it by returning a function call (the function’s name plus its arguments) as part of the response object.
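The post doesn’t show `search_declaration` or `lookup_declaration`, so the following is only a guess at what they might look like, following the same pattern as `answer_declaration`. The parameter names `query` and `keyword`, and the hypothetical `check_args` helper, are assumptions. The helper shows how the `required` list and `"type": "string"` entries constrain a call the model proposes, which can be worth checking before you execute anything.

```python
from typing import Any, Dict, List

# Hypothetical declarations for the other two tools; the parameter names
# ("query", "keyword") are assumptions, not taken from the post's code.
search_declaration: Dict[str, Any] = {
    "name": "search",
    "description": ("Searches Wikipedia for a page title and returns its opening text, "
                    "or a list of similar page titles if no exact match exists."),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The page title to search for."}
        },
        "required": ["query"],
    },
}

lookup_declaration: Dict[str, Any] = {
    "name": "lookup",
    "description": "Looks up a keyword in the most recently found Wikipedia page.",
    "parameters": {
        "type": "object",
        "properties": {
            "keyword": {"type": "string", "description": "The word or phrase to find."}
        },
        "required": ["keyword"],
    },
}


def check_args(declaration: Dict[str, Any], args: Dict[str, Any]) -> List[str]:
    """Return a list of problems with a model-proposed call (empty list means OK)."""
    params = declaration["parameters"]
    problems = [f"missing required argument: {r}"
                for r in params.get("required", []) if r not in args]
    for key, value in args.items():
        spec = params["properties"].get(key)
        if spec is None:
            problems.append(f"unexpected argument: {key}")
        elif spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"argument {key} should be a string")
    return problems
```

The `description` strings matter as much as the schema: they are what the model reads when deciding which tool fits the current step.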

We then need to create a set of tools for Gemini to use:

# Define the tools with our function declarations
self.tools = [
    Tool(
        function_declarations=[
            FunctionDeclaration(**search_declaration),
            FunctionDeclaration(**lookup_declaration),
            FunctionDeclaration(**answer_declaration),
        ]
    )
]

Then we will pass the tools in as part of the Gemini send_message() function call.

return self.chat.send_message(message,
                              generation_config=config,
                              tools=self.tools)

That will return a response that we’ll then handle like this:

candidate = response.candidates[0]
text_parts: List[str] = []
func_call = None

for part in candidate.content.parts:
    if getattr(part, 'text', None):
        text_parts.append(part.text)
    if getattr(part, 'function_call', None):
        func_call = part.function_call

if text_parts:
    print(f"\nThought {iteration}: {' '.join(text_parts)}")

if func_call:
    name: str = func_call.name
    args: Dict[str, Any] = func_call.args or {}
    print(f"\nAction {iteration}: <{name}>{list(args.values())[0] if args else ''}</{name}>")

    # Look up the method without masking AttributeErrors raised inside it.
    method = getattr(self, name, None)
    if method is None:
        result = {"result": f"Unknown function: {name}"}
    else:
        # Dynamically call the method with the provided arguments
        result: Dict[str, str] = method(**args)
        if name == "answer":
            return result['result']

Note that ‘response’ has a ‘candidates’ attribute, and the first candidate is what we need. From there we take ‘content’ and then ‘parts’. We iterate over the parts, each of which holds either ‘text’ or a ‘function_call’. The function call contains the name of the function Gemini wants to call along with its ‘args’. Finally, we call the function dynamically:

result: Dict[str, str] = method(**args)

When Gemini asks for ‘answer’, that line invokes the answer method we wrote above.
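To try the part-handling logic without calling Gemini, here is a sketch that fakes a response with `types.SimpleNamespace` objects shaped like `response.candidates[0].content.parts`. The fake objects and the `handle_response` function are assumptions for illustration; real SDK response objects carry more fields. As in the loop above, if several function calls appear, the last one wins. Unlike that loop, it looks tools up in a dictionary rather than on `self`.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, List, Optional, Tuple


def handle_response(response: Any, tools: Dict[str, Callable[..., Dict[str, str]]]
                    ) -> Tuple[str, Optional[Dict[str, str]]]:
    """Join the text parts into a Thought and run the function call, if any."""
    candidate = response.candidates[0]
    text_parts: List[str] = []
    func_call = None
    for part in candidate.content.parts:
        if getattr(part, "text", None):
            text_parts.append(part.text)
        if getattr(part, "function_call", None):
            func_call = part.function_call
    thought = " ".join(text_parts)
    if func_call is None:
        return thought, None
    method = tools.get(func_call.name)
    if method is None:
        return thought, {"result": f"Unknown function: {func_call.name}"}
    return thought, method(**dict(func_call.args or {}))


# A fake response: one text part (the Thought) and one function-call part.
fake_response = SimpleNamespace(candidates=[SimpleNamespace(content=SimpleNamespace(parts=[
    SimpleNamespace(text="I have enough information to answer."),
    SimpleNamespace(function_call=SimpleNamespace(
        name="answer", args={"final_answer": "Dagur B. Eggertsson"})),
]))])

tools = {"answer": lambda final_answer: {"result": f"Final Answer: {final_answer}"}}
thought, result = handle_response(fake_response, tools)
print(thought)  # I have enough information to answer.
print(result)   # {'result': 'Final Answer: Dagur B. Eggertsson'}
```

Faking the response this way makes it easy to unit-test the dispatch logic without spending API calls.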

What is particularly cool about this is that we don’t have to give Gemini detailed instructions in our prompt, because a description of what the answer function does already exists in the function declaration dictionary we created.

You can find my full code here.

Conclusions

So that is, at a high level, what a ReAct agent is and how to do function calling for Gemini. In the next post we’ll see how to use these ideas to create the actual ReAct agent.
