Getting Started with the Google Gemini API

Gemini: Using LLMs Through an API

In our last post, we talked about using the Hugging Face ecosystem. As good as the open-source Hugging Face models are, some of the most capable models come from OpenAI (ChatGPT) and Google (Gemini). Since OpenAI costs money and Gemini has a free tier, let’s use Gemini today to see how to utilize a Transformer Large Language Model (LLM) via an API.

You can follow along with this post in this Google Colab Notebook.

Obtaining an API Key

Before you can use Google’s Gemini, you must obtain an API key. If you don’t have one yet, go to makersuite to create one. If you already have one, you can find it in the Google AI Studio found here.

Once you have the API key, you must put it into this notebook. Look for the ‘key’ option in the left sidebar that looks like this:

A screenshot of a website page, showcasing a Google Gemini Example. There is a symbol on the left bar that looks like a key and is highlighted in the image.

For this tutorial, give it the ‘Name’ of 'GeminiSecret'. The ‘Value’ will be the API key you obtained.

First, we must import a few libraries:

import google.generativeai as genai
from google.generativeai.types import HarmCategory, HarmBlockThreshold
import textwrap
from google.colab import userdata

How to Login

If you set up the API key as ‘GeminiSecret’ then it is now easy to log in using the Gemini API:

GOOGLE_API_KEY = userdata.get('GeminiSecret')
genai.configure(api_key=GOOGLE_API_KEY)

To test if we're properly logged in, let's get a list of all the models available to us:

models = genai.list_models()
for model in models:
    print(model.name)

There are several models here, but the ones we really care about are ‘gemini-pro’, a model optimized for text generation, and ‘gemini-pro-vision’, a multi-modal model that handles both text and images. However, it is not as optimized for text as ‘gemini-pro’.

If you want to learn more about Gemini's models, see this link.

A Large Language Model Dungeon Master

Just for fun, let’s build a chatbot that will be a Dungeon Master for a role-playing game. Let's create a 'prompt' that explains to Gemini what we want it to do. This prompt will be inserted in front of each prompt from the user to keep the chatbot reminded of what it is doing.

repeating_prompt = ("You are a Dungeon Master. You play fantasy role playing games with the user. Create vivid descriptions of "
         "the world. Include interesting characters and plot twists. Include romantic sub plots, epic battles, moral "
         "dilemmas, combat, puzzles, and humor. Write one paragraph of text then stop and the user will respond with "
         "their actions. Continue the story based on their actions.")

The Chat Loop

Gemini’s API includes some functions specifically meant to act like a chat session. In reality, Gemini (and all LLMs!) is stateless: it doesn’t recall what you previously said. So you constantly have to feed the chat history back to it to mimic the experience of chatting with a person. Transformer models like this simply predict the next word in a sequence, so they aren’t really chatbots out of the box. But no worries, Google’s API makes this easy.
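To make that statelessness concrete, here is a toy sketch of the bookkeeping a chat helper does. This is purely illustrative (the names here are made up, not the SDK’s real internals): every turn gets appended to a history list, and the whole list would be resent to the model on each call.

```python
# Toy sketch of chat-session bookkeeping: the model itself is stateless,
# so the helper keeps a history list and resends all of it every turn.
history = []

def send_turn(history, user_text):
    history.append({"role": "user", "parts": user_text})
    # A real helper would send the ENTIRE history to the LLM here;
    # we fake the model's reply to keep the sketch self-contained.
    reply = f"(model reply to: {user_text})"
    history.append({"role": "model", "parts": reply})
    return reply

send_turn(history, "Hello!")
send_turn(history, "What did I just say?")
print(len(history))  # 4 entries: two user turns, two model turns
```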

The first thing we need to do is create our chat loop. But to do that we’ll need a way to stream the responses Gemini gives us with a proper text wrap to make it readable. Here is the stream_response function we’ll use to stream and format text as it comes back from Gemini:

def stream_response(response):
    for chunk in response:
        # Split the chunk on newlines so each line can be wrapped separately
        lines = chunk.text.split('\n')
        # Wrap each line to 80 characters and print it
        for line in lines:
            print(textwrap.fill(line, width=80))
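You can see what the wrapping does without calling the API by feeding stream_response a stand-in for Gemini’s chunks. Here SimpleNamespace is just a hypothetical substitute for the real chunk objects, which expose a .text attribute:

```python
import textwrap
from types import SimpleNamespace

def stream_response(response):
    for chunk in response:
        for line in chunk.text.split('\n'):
            print(textwrap.fill(line, width=80))

# Fake chunks standing in for Gemini's streamed objects
fake_stream = [
    SimpleNamespace(text="The dragon rears up, wings blotting out the sun, "
                         "as the villagers scatter.\nWhat do you do?"),
]
stream_response(fake_stream)
```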

When I say ‘stream’ I mean literally the text will stream as Gemini creates it. Though my experience is that Gemini is so fast that it might as well be just displaying the whole thing at once.

Now we’re ready to write the main chat loop. It wasn’t really necessary, but I included a few extras just to show off a few of Gemini’s features. One you’ll notice is that I set the safety settings to only block content that is highly probable to be harmful. I figured this might be necessary in case Gemini gets too sensitive about RPG-style combat. But really this is more to just show you how to set safety settings:

def gemini_chat_loop(repeating_prompt):
    safety_config = {HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_ONLY_HIGH,
                     HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_ONLY_HIGH,
                     HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_ONLY_HIGH,
                     HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_ONLY_HIGH}
    model = genai.GenerativeModel('gemini-pro')
    chat = model.start_chat()

    response = chat.send_message(repeating_prompt, stream=True, safety_settings=safety_config)
    stream_response(response)

    while True:
        # Stop and wait for user input
        prompt = input("What do you do next? ")
        if prompt.lower().strip() == 'exit':
            break
        # Re-insert the Dungeon Master instructions ahead of the user's typed action
        prompt = repeating_prompt + "\n User Action: " + prompt + "\n"
        try:
            response = chat.send_message(prompt, stream=True, safety_settings=safety_config)
            stream_response(response)
        except genai.types.generation_types.BlockedPromptException as e:
            print(f"Prompt blocked due to: {e}")

    return (chat, model)

More on Safety Settings

I haven’t had the best luck with safety settings. The classes appear to offer numerous settings that don’t actually exist and throw an error if you use them. And I’ve noticed that the way I coded it above may not work quite right. You may have better luck with this format for your safety settings instead:

safety_config = [
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"}
]

Here I didn’t use the enums available for safety settings and used strings instead. This seems to work a bit better. The only real gotcha is that you need to put this version inside of a list for some reason to make it work.

Now let’s run the chat and play an RPG adventure with Gemini as our Dungeon Master:

chat, model = gemini_chat_loop(repeating_prompt)

Play as long as you like and then we’ll continue with learning the Gemini API.

Chat Sessions

Gemini’s “chat = model.start_chat()” helper automatically keeps a running chat session for you so that the normally stateless Gemini LLM acts like a true chatbot. Let’s take a peek inside and see how that works. Try this to take a look at the chat history:

chat.history

You will see the chat history for your particular adventure, but here is part of mine. First, we can see the initial prompt and the LLMs first response:

parts {
  text: "You are a Dungeon Master. You play fantasy role playing games with the user. Create vivid descriptions of the world. Include interesting characters and plot twists. Include romantic sub plots, epic battles, moral dilemmas, combat, puzzles, and humor. Write one paragraph of text then stop and the user will respond with their actions. Continue the story based on their actions."
}
role: "user"

parts {
  text: "Welcome adventurers to the realm of Eldoria! You stand before the towering gates of the ancient city of Veridian, its walls adorned with intricate carvings that depict tales of valor and legend. The city bustles with activity as merchants hawk their wares, bards strum their lutes, and the laughter of children fills the air. You are on a quest to retrieve the fabled Amulet of Souls from the clutches of the malevolent sorcerer, Maldred the Accursed. As you enter the city, you encounter a group of weary travelers who share rumors of a grand tournament taking place in the city square. The prize? A legendary sword said to possess the power to cut through any barrier. As you stroll through the bustling streets, you feel the eyes of a mysterious cloaked figure following you, their presence sending shivers down your spine."
}
role: "model"

Notice how role = ‘user’ for what you told the LLM and role = ‘model’ for its responses. Because I wanted the model to not suddenly forget it is a Dungeon Master, I inserted the initial prompt with the instructions to be the Dungeon Master into every single user prompt and just appended the user’s actual typed in actions at the end. For example, you might see something like this for your second user prompt:

parts {
  text: "You are a Dungeon Master. You play fantasy role playing games with the user. Create vivid descriptions of the world. Include interesting characters and plot twists. Include romantic sub plots, epic battles, moral dilemmas, combat, puzzles, and humor. Write one paragraph of text then stop and the user will respond with their actions. Continue the story based on their actions.\n User Action: I confront the cloaked figure\n"
}
role: "user"

Notice how we repeat the initial prompt and just tack on the user’s actual actions at the end. (User Action: I confront the cloaked figure\n)
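This repeated construction is just string concatenation. A minimal sketch, with the long Dungeon Master instructions shortened to keep the example readable:

```python
# Shortened stand-in for the full Dungeon Master instructions above
repeating_prompt = "You are a Dungeon Master. ..."
user_action = "I confront the cloaked figure"

# Prepend the standing instructions, then append the user's typed action
full_prompt = repeating_prompt + "\n User Action: " + user_action + "\n"
print(full_prompt)
```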


We can also count how many tokens were used so far for the adventure by doing this:

model.count_tokens(chat.history)

Here is what I see as the result:

total_tokens: 1070

A token is part of a word, or possibly a whole word if the word is very common. LLMs use tokens instead of characters or words because that is a tradeoff between the problems of having a limited vocabulary and the problems of individual characters not really containing any real meaning on their own. Gemini documentation claims a token is on average about 4 characters.

To give a very simple example, take the word ‘dog’ vs ‘dogs’. To a human, those are more or less a single word but the ‘s’ tells you it is plural. If we trained the model on characters it would have no way of knowing ‘dog’ and ‘dogs’ are really the same meaning. If we trained on words then ‘dog’ and ‘dogs’ would not be the same word at all as far as the model was concerned. But if we make ‘dog’ one token and ‘s’ a separate token then the model can learn that appending an ‘s’ to the end of a word makes it a plural.
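As a toy illustration of that idea (this is emphatically not Gemini’s real tokenizer, which learns its splits statistically from data):

```python
# Toy 'tokenizer' that splits a trailing plural 's' into its own token,
# so 'dog' and 'dogs' share the same stem token. Real tokenizers (BPE,
# SentencePiece, etc.) learn splits like this from data rather than rules.
def toy_tokenize(word):
    if word.endswith('s') and len(word) > 1:
        return [word[:-1], 's']
    return [word]

print(toy_tokenize('dog'))   # ['dog']
print(toy_tokenize('dogs'))  # ['dog', 's']
```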


We are not going to get too far into embeddings in this post, but I should note that Gemini comes with a built-in ability to create embeddings, not unlike what the Hugging Face models could do. Let’s just briefly take a look at how to do that for our chat history. Try out this code:

result = genai.embed_content(
    model = 'models/embedding-001',
    task_type = 'semantic_similarity',
    content = chat.history)

# 1 input > 1 vector output
for v in result['embedding']:
    print(str(v)[:50], '... TRIMMED...')

That will just show a bunch of vectors with numbers. Those are the embeddings. I would note that we actually have several options for what type of task we’re performing. I choose to specify ‘semantic_similarity’ but that wasn’t the only possible action. Here are some other options available:

RETRIEVAL_QUERY: Specifies the given text is a query in a search/retrieval setting.
RETRIEVAL_DOCUMENT: Specifies the given text is a document in a search/retrieval setting (using this task type requires a title).
SEMANTIC_SIMILARITY: Specifies the given text will be used for Semantic Textual Similarity (STS).
CLASSIFICATION: Specifies that the embeddings will be used for classification.
CLUSTERING: Specifies that the embeddings will be used for clustering.

You can learn more about how to use Gemini API for embeddings here.

Next up, we’re going to continue from here and use Google Gemini to work with text on Wikipedia. But we’ll save that for the next blog post. (Though you can get a preview inside the Google Colab notebook link.)

Other Resources:

Stay tuned for Part 2 by following us on LinkedIn! In the meantime, be sure to check out some of the other great articles here on the Mindfire Blog.

