Google AI Integration with Haystack
- By Bruce Nielson
- ML & AI Specialist
Artificial Intelligence and Large Language Models like ChatGPT can be quite expensive: even when you are paying a small amount per token, it can add up to big dollars very quickly. Mindfire Technology is committed to providing cost-efficient Artificial Intelligence solutions to our customers, and in today’s blog we’ll explore one way to get into AI without breaking the bank. The solution: Google Gemini.
Google’s Gemini
Google Gemini is Google’s competitor to ChatGPT. One very nice aspect of Gemini is that it (at least as of this writing) has a rate-limited free tier. I don’t know how long Google will keep the free tier around, but it makes Gemini an important tool in the toolbox for those who need cost-efficient AI solutions.
There is a good chance you are already familiar with Gemini because it is integrated into Google’s search engine. What you may not know is that Google’s Gemini Large Language Model (LLM) has an API that allows you to use Gemini in your own applications. (See Google’s blog post introducing the Gemini models here.)
You can find Google’s API reference for Gemini here. Instructions for how to get an API key for Gemini are found here. Getting started docs for Gemini are found here.
Gemini is a set of models that are multi-modal, meaning they can (collectively) handle both text and images. The performance of Gemini has been comparable to that of ChatGPT.
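To give a feel for how simple the API is, here is a minimal sketch of calling Gemini directly with Google’s google-generativeai Python package (pip install google-generativeai). The model name and prompt here are just illustrations, and this is separate from the Haystack integration we’ll use below:

import google.generativeai as genai

# Configure the client with an API key from Google AI Studio.
genai.configure(api_key="YOUR_API_KEY")

# Ask a Gemini model a question and print the text of its reply.
model = genai.GenerativeModel("gemini-pro")
response = model.generate_content("Explain Retrieval Augmented Generation in one sentence.")
print(response.text)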
Haystack’s Problematic Integration with Google’s Gemini
Unfortunately, as far back as April 2024 I wrote a blog post reporting that there was a problem with the Google AI integration for Haystack: the GoogleAIGeminiGenerator component simply didn’t work correctly, even if you copied and pasted the sample code from Haystack’s own Google AI documentation. I mentioned that I had posted on Stack Overflow to see if anyone else could get the Haystack code samples to work, and I was told by someone from Deepset (the makers of Haystack) that there was a bug in the GoogleAIGeminiGenerator component.
Today, however, I’m happy to report that this bug has been fixed! Or, at least, fixed to the point where I can now run the basic examples in their documentation after correcting some remaining bugs in the sample code. (More on this below.)
Testing (and fixing) the Haystack Code Samples for the GoogleAIGeminiGenerator Component
Sadly, the examples still have bugs in them that keep some of the code samples from working. But those are minor problems that are easy enough to fix. I created a Google Colab that goes over the Haystack Gemini code samples in both of the pages that describe how to use GoogleAIGeminiGenerator (as linked above). I’ve fixed each of these code examples to work and proved the original bug is now fixed.
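For reference, here is a minimal sketch of the kind of basic example that now runs for me. It assumes your Gemini key is in the GOOGLE_API_KEY environment variable, which the component reads by default; the prompt is just an illustration:

from haystack_integrations.components.generators.google_ai import GoogleAIGeminiGenerator

# The generator reads GOOGLE_API_KEY from the environment by default.
gemini = GoogleAIGeminiGenerator(model="gemini-pro")

result = gemini.run(parts=["What is the most interesting thing you know?"])
for reply in result["replies"]:
    print(reply)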
Implementing a Gemini Large Language Model for RAG
Now that the GoogleAIGeminiGenerator is working, it’s time to give the Gemini integration with Haystack another test run as a cheaper way of running the Retrieval Augmented Generation (RAG) example code we built in our past posts.
If you need to set up your environment for this example, this post explains how to do that. The starter code we’ll be adding to is found here. This whole post is built by modifying that starter code. The previous two posts explained how the starter code (the HaystackPgvector class) works. (Part 2).
Let’s start by installing the Google AI module for Haystack:
pip install google-ai-haystack
This is, of course, on top of everything else installed in the environment setup post. I tested to be sure that this does not cause a problem for the Hugging Face generator component that we’re already using. (I.e., you should be able to run the HaystackPgvector class with both Gemini and Hugging Face after the modifications I describe.)
I wrote the code for the HaystackPgvector class under the assumption that I would be using the Hugging Face ecosystem with it, so the parameters for the class sort of assume that is the case. This means we really ought to rewrite my code to abstract away the details of the Large Language Model (LLM) so that a Hugging Face LLM and Gemini look identical from the point of view of the HaystackPgvector class. But that would be a lot more complicated than we want for a short blog post like this, so I’m going to just kludge it for today. (I’ll likely rewrite this in a better way later.)
Updating the HaystackPgvector Class to Use Gemini
Let’s go over the (minimal) changes I’ve made to the HaystackPgvector class to get it to work with Gemini instead of a Hugging Face model. Here is the revised RAG pipeline we’re going to build, starting with the imports.
First, adjust this import to include Union, because we’re going to need it:
from typing import List, Optional, Dict, Any, Tuple, Union
I decided to rename the ‘Hugging Face password’ parameter (i.e. hf_password) to llm_password. I did this to signal that, going forward, the password may or may not be specific to Hugging Face.
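Conceptually, the signature change looks something like this (a sketch with the other constructor parameters omitted; only llm_password is the point here):

def __init__(self,
             table_name: str,
             llm_model_name: str,
             llm_password: Optional[str] = None,  # formerly hf_password
             # ...the remaining parameters are unchanged...
             ) -> None:
    ...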
Next, we need to adjust the Hugging Face login so that it won’t be called if we’re using Google’s Gemini:
if llm_password is not None and llm_model_name != "google-gemini":
    hf_hub.login(llm_password, add_to_git_credential=False)
You can probably see that I’m going to specify that I want Google Gemini by passing in a model name of “google-gemini”. This is basically a ‘magic string’, but let’s not worry about that for this simple demo. I’ll improve it later.
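(As a sketch of that later improvement: a module-level constant would remove the magic string from the comparison sites. This isn’t in the current code; it’s just the idea:)

# Single definition of the Gemini sentinel value.
GOOGLE_GEMINI: str = "google-gemini"

if llm_password is not None and llm_model_name != GOOGLE_GEMINI:
    hf_hub.login(llm_password, add_to_git_credential=False)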
Next, we’re going to instantiate the Google Gemini generator instead of the Hugging Face generator if the user specifies Gemini mode (i.e. “google-gemini”). Note the two new imports at the top of the snippet:
# New imports needed for the Gemini generator (these go at the top of the file):
from haystack.utils import Secret
from haystack_integrations.components.generators.google_ai import GoogleAIGeminiGenerator

if self._llm_model_name == "google-gemini":
    self._llm_generator: GoogleAIGeminiGenerator = GoogleAIGeminiGenerator(
        model="gemini-pro",
        api_key=Secret.from_token(llm_password)
    )
else:
    self._llm_generator: HuggingFaceLocalGenerator = HuggingFaceLocalGenerator(
        model=self._llm_model_name,
        task="text-generation",
        device=ComponentDevice(self._component_device),
        generation_kwargs={
            "max_new_tokens": self._max_new_tokens,
            "temperature": self._temperature,
            "do_sample": True,
        })
    # Only the local Hugging Face generator needs to be warmed up.
    self._llm_generator.warm_up()
The llm_context_length and llm_embed_dims properties will not work properly with Gemini, so we’ll just have them return None if we’re using Gemini:
@property
def llm_context_length(self) -> Optional[int]:
    if self._llm_model_name == "google-gemini":
        return None
    return HaystackPgvector._get_context_length(self._llm_model_name)

@property
def llm_embed_dims(self) -> Optional[int]:
    if self._llm_model_name == "google-gemini":
        return None
    return HaystackPgvector._get_embedding_dimensions(self._llm_model_name)
Now comes the tricky part: Gemini doesn’t return results in the same format as Hugging Face, so we need to improve our custom merger component to handle both the Hugging Face format and Gemini’s. Change it to look like this:
@component
class _MergeResults:
    @component.output_types(merged_results=Dict[str, Any])
    def run(self, documents: List[Document], replies: List[Union[str, Dict[str, str]]]) -> Dict[str, Dict[str, Any]]:
        # Bundle the retrieved documents and the LLM's replies into one
        # output so both survive to the final pipeline result.
        return {
            "merged_results": {
                "documents": documents,
                "replies": replies
            }
        }
In this change, ‘replies’ went from List[str] to List[Union[str, Dict[str, str]]], which allows the merger to work with either type of generator.
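To make the output shape concrete, here is a hedged sketch of pulling the merged results out of a pipeline run. The input names for query_embedder and prompt_builder are assumptions based on how the starter code wires its prompt template, so adjust them to your template’s variables:

result = rag_pipeline.run({
    "query_embedder": {"text": query},
    "prompt_builder": {"query": query},
})
merged = result["merger"]["merged_results"]
print(merged["replies"][0])      # the generator's answer
print(len(merged["documents"]))  # how many documents the retriever supplied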
Next, we have to adjust the RAG pipeline’s connections to match how the Gemini generator works:
def _create_rag_pipeline(self) -> None:
    prompt_builder: PromptBuilder = PromptBuilder(template=self._prompt_template)

    rag_pipeline: Pipeline = Pipeline()
    # Use CUDA if possible
    rag_pipeline.add_component("query_embedder", SentenceTransformersTextEmbedder())
    rag_pipeline.add_component("retriever", PgvectorEmbeddingRetriever(document_store=self._document_store, top_k=5))
    rag_pipeline.add_component("prompt_builder", prompt_builder)
    rag_pipeline.add_component("llm", self._llm_generator)
    # Add a new component to merge results
    rag_pipeline.add_component("merger", self._MergeResults())

    rag_pipeline.connect("query_embedder.embedding", "retriever.query_embedding")
    rag_pipeline.connect("retriever.documents", "prompt_builder.documents")
    if self._llm_model_name == "google-gemini":
        rag_pipeline.connect("prompt_builder", "llm")
    else:
        rag_pipeline.connect("prompt_builder.prompt", "llm.prompt")
    # Connect the retriever and llm to the merger
    rag_pipeline.connect("retriever.documents", "merger.documents")
    rag_pipeline.connect("llm.replies", "merger.replies")

    self._rag_pipeline = rag_pipeline
Here is the main change:
if self._llm_model_name == "google-gemini":
    rag_pipeline.connect("prompt_builder", "llm")
else:
    rag_pipeline.connect("prompt_builder.prompt", "llm.prompt")
The Gemini generator doesn’t take an input named ‘prompt’ the way the Hugging Face generator does, so connecting “prompt_builder” to “llm” without naming the sockets lets Haystack figure out the right connection on its own.
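(If you’d rather keep the connection explicit, my understanding is that the Gemini generator’s input socket is currently named parts, so something like the following should be equivalent; treat the socket name as an assumption and verify it against your installed version:)

rag_pipeline.connect("prompt_builder.prompt", "llm.parts")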
And, finally, I rewrote ‘main’ as follows:
def main() -> None:
    # secret: str = HaystackPgvector.get_secret(r'D:\Documents\Secrets\huggingface_secret.txt')
    secret: str = HaystackPgvector.get_secret(r'D:\Documents\Secrets\gemini_secret.txt')

    epub_file_path: str = "Federalist Papers.epub"
    rag_processor: HaystackPgvector = HaystackPgvector(table_name="federalist_papers",
                                                       recreate_table=False,
                                                       book_file_path=epub_file_path,
                                                       llm_model_name="google-gemini",
                                                       llm_password=secret)

    # Draw images of the pipelines
    rag_processor.draw_pipelines()
    print("LLM Embedder Dims: " + str(rag_processor.llm_embed_dims))
    print("LLM Context Length: " + str(rag_processor.llm_context_length))
    print("Sentence Embedder Dims: " + str(rag_processor.sentence_embed_dims))
    print("Sentence Embedder Context Length: " + str(rag_processor.sentence_context_length))

    query: str = "What is the difference between a republic and a democracy?"
    rag_processor.generate_response(query)
I’m now pulling the secret password for Google Gemini from a different file than the Hugging Face one. And, obviously, I’m specifying “google-gemini” as the model name, which, as you just saw, switches the code over to working with Gemini instead.
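(And to confirm the Hugging Face path still works, you can swap back by uncommenting the Hugging Face secret line and passing a real Hugging Face model name. The model name below is only an illustration, not a recommendation:)

secret = HaystackPgvector.get_secret(r'D:\Documents\Secrets\huggingface_secret.txt')
rag_processor = HaystackPgvector(table_name="federalist_papers",
                                 recreate_table=False,
                                 book_file_path="Federalist Papers.epub",
                                 llm_model_name="google/gemma-1.1-2b-it",  # illustrative model name
                                 llm_password=secret)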
That’s really all there is to getting my HaystackPgvector class to work with Gemini! You can find a completed version of the code from this blog post in my GitHub repo. And I keep the most up-to-date version of the free HaystackPgvector class in this repo found here.
Results
Let’s run our RAG query on the Federalist Papers using Gemini as the LLM and see how it performs. Here is the LLM’s response from one of my runs:
“The main difference between a republic and a democracy is the method of exercising government power. In a democracy, citizens directly participate in decision-making and exercise government power, while in a republic, citizens elect representatives to exercise government power on their behalf.<end_of_turn>”
This is a pretty good response, and you may also notice that it came back much faster than with a local Hugging Face model. Or at least it did on my not-very-good laptop; if you have a powerful GPU the speeds may be comparable.
Conclusions
Google’s Gemini is an important tool in our AI toolbox, especially if you are cost-constrained. At Mindfire Technology we have used Google’s Gemini as an LLM in some of our small-scale products, but it was inconvenient that there was no working Haystack integration. We’re excited that Deepset has now fixed the problem. In this post, we updated our free HaystackPgvector class to demonstrate how to use Gemini in a RAG pipeline.