Installing PostgreSQL in Preparation for Retrieval Augmented Generation

Installing PostgreSQL in Preparation for Retrieval Augmented Generation

In previous posts, we’ve learned about cosine similarity as a way to compare embeddings of text, actually putting that idea to work doing an actual semantic search on the Federalist papers, and later looked at using Haystack as a tool to build an Artificial Intelligence (AI) pipeline for use with a Large Language Model (LLM).

One big problem with a Cosine Similarity search is how computationally intensive it is. You can embed the text all in advance, but you still need to embed the query and then compare the results (using the Cosine Similarity algorithm) to each and every text entry. This quickly gets too slow to be useful. Moreover, the current implementation I built involved storing everything in memory, which quickly became unworkable.

What we really need is a database that will allow us to store the embeddings and run the cosine similarities for us.

Luckily, plain old PostgreSQL Database has an extension, called Pgvector, that does nearly everything we need! Pgvector is an extension for PostgreSQL that enhances its capabilities with vector similarity search. It allows you to create database tables that contain vector data and to query your vector data quickly. In plain English, Pgvector adds vector search capabilities to PostgreSQL which is what we need to be able to use PostgreSQL as a document store to store the embeddings we created out of our documents.

Better yet, Pgvector has built-in support in Haystack! So, we can create a document store in Haystack and use it in a Haystack pipeline while all the data is stored in PostgreSQL and Pgvector does the Cosine Similarity search. Better yet, the Cosine Similarity search will be much faster than the one I wrote because Pgvector doesn’t do its cosine similarity search against everything in the database but instead speeds the search up using a nearest neighbor search (using HSNW) first. This drastically improves the retrieval speed. I’ll do future blog posts explaining nearest neighbor search and HSNW. For now, all you need to know is that it is much faster than if you compare every single vector like I did in my previous posts.

In this post we’re going to install the PostgreSQL database for Windows. I’ll also give links for how to do this for other operating systems or using Docker. In a future post we’ll then install Pgvector and utilize it with Haystack to do Semantic Searches using the built-in Cosine Similarity capabilities.

Installing PostgreSQL for Windows

PostgreSQL is an open-source object-relational database. You can find the page to download and install PostgreSQL for any operating system on this link. For Windows, we’ll use this link here. Click “Download the Installer.” I’ll be using this link to install version 16.3, which is (for me at the time of this writing) the most up-to-date version: (Found here)

Screenshot of the download site, the Windows x86-64 download button is circled. In the list, it is the top option, the button is five collumns to the right.

After the download, run the installer and go through the install process:

Screenshot of the installer or setup wizard.

I took the default install folder of “C:\Program Files\PostgreSQL\16”

You definitely want to install pgAdmin4, but I went ahead and installed everything available:

The wizard has an option to select which components to install. PostgreSQL Server, pgAdmin 4, Stack Builder, and Command Line Tools are selected. These are the only options available.

I took the default data directory as well: “C:\Program Files\PostgreSQL\16\data”

You will then need to select a password:

The wizard has a box for a password and a second box to retype the password.

I also took the default port of: 5432. And I selected English, United States as the Locale.

I then went on to let the install Stack Builder:

The wizard has a drop-down menu, the option PostgreSQL 16 (x64) on port 5432 is selected.

I went ahead and installed some additional utilities and drivers, but this should be optional:

The wizard has a list of applications with checkboxes next to them to choose which to install. pgAgent (64 bit) for PsotgreSQL 16 v4.2.2-1, pgBouncer v 1.22.1-1, Npgsql v3.2.6-3 (installed), pgJDBC v42.7.2-1, pgsqlODBC (32 bit) v13.02.0000.1, and psqlODBC (64 bit) v13.02.0000.1 are selected.

I won’t go over the details of the rest since it was optional anyhow. Your installation of PostgreSQL should now be done.

Installing PostgreSQL for Other Operating Systems

If you need to install PostgreSQL for another operating system you may find this link helpful.

Alternatively, you could just install PostgreSQL using Docker. The Haystack Pgvector document store install guide suggests the following:

docker run -d -p 5432:5432 -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres -e POSTGRES_DB=postgres ankane/pgvector
Testing Out PostgreSQL Using pgAdmin

You can test that PostgreSQL was properly installed by running pgAdmin which should have been installed with the PostGreSQL installation. For Windows search for “pgAdmin4.”

Windows search bar is open, pgAdmin 4 is being searched for and appears in the list of installed apps.

Note: I seem to have two versions of each app due to having previously installed an older version of PostgreSQL in the past. You’ll probably just have one.

Then click on “pgAdmin4” to run and you’ll see something like this:

The pgAdmin 4 main tab or screen is shown. It has a drop-down list on the left starting with Servers. On the right and center there is a tab that shows a general welcome page, a Quick Links page, and a Getting Started page. On the top left there are generic File, Object, Tools, and Help tabs.

Click the arrow next to “Servers” and then select the database you just installed:

The servers drop-down is opened showing PostgreSQL 13 and PostgreSQL 16. PostgreSQL 16 has an arrow pointing at it.

Enter the password you created:

A pop-up menu with the option to insert a password There is also a check box to save password.

You should then be able to navigate to the tables (under ‘Schema’). At this point you will not have any tables yet:

A page is loaded on the right, chosen under Servers, PostgreSQL 16, Databases (1), postgres, Schemas, public, Tables. The page shows various tables, including Database sessions, Transactions per second, Tuples in, Tuples out, Block I/O, and Database activity.

Multiple Ways to Check Your Version

Let’s go over several ways to check your version. It isn’t that we care that much about the version (though this is a great way to be sure things are installed correctly) so much as I want to introduce to you several useful tools.

First, let’s check version using the PSQL Tool built into pgAdmin. You can check this by going to Tools -> PSQL Tool and the PSQL tool will come up in a tab. Enter:

select version();

The PSQL tool page/tab is open showing a console like page. The version of PostgreSQL is shown, it is 16.3 compiled by Visual C++ 1938, 64-bit.

It should show the version you just installed.

Another way to accomplish this is to run PSQL Shell:

The windows search tab is open, psql is being searched for and SQL Shell (psql) is listed under installed apps. It is circled in red.

Once you’ve run the shell you’ll need to login. For most of the options you can take the default by hitting Enter. For password, enter your password:

SQL Shell appears like a console, showing information such as Server, Database, Port, and Username. It requires a password.

Now type select version();

The version is shown, it is 16.3 compiled by Visual C++ build 1938 64-bit.

It should show the correct version that you just installed.

Or you can accomplish the same thing by just using the pgAdmin Query Tool (under Tools -> Query Tool):

pgAdmin is back and on the postgres page/tab, select version(); is circled in red, the play button on the tools tab above the page is circled in red, and the version near the bottom is circled in red. There are instructions that tell you to follow those steps in order of putting in select version, clicking the play button, and then reviewing the resulting version.

Enter “select version();”, then click the little arrow for the run button to run the query. You should get back the version as the result.

All three of these are useful ways to interact with the PostgreSQL server you’ve created.

Note: If you don’t have a default ‘postgres’ database, this tutorial will explain how to create it.

Creating A Table and Inserting Data

Now let’s follow this little tutorial on how to create a table in PostgreSQL. You should already have the Query tool up, but recall you can get there by Tools -> Query Tool. In the query tool type:

CREATE TABLE tutorials (id int, tutorial_name text);

Then, click the Run button. Now you’ll need to refresh your view to see the new table by right mouse clicking on your Server:

The drop-down option of Postgres 16 is right clicked showing a pop-up menu, Refresh is the third option and is circled in red.

You should now be able to see the new table under ‘Schema’ -> ‘Tables’:

Schemas drop-down option is circled in red and is opened, public below it is opened, the option Tables is circled in red and opened, tutorials is opened and circled in red.

There is now a ‘tutorials’ table that you have created. Let’s try to insert some data into it and then query it back following this tutorial. First run this command in the query tool:

INSERT INTO tutorials(id, tutorial_name) VALUES (1, 'postgre');

Then select it back out of the table with this command:

SELECT * FROM tutorials

You should see this result:

The public drop-down option is selected and on the right page at the bottom under Data Output, the id is 1 and the tutorial_name text shows postgre. These two latter items are circled in red.

Congratulations! You have a working PostgreSQL database running in Windows (or other operating systems).

In future posts we’ll go over tutorials of how to install Pgvector, how to query using vector tables, how to integrate with Haystack, and how to build a semantic search that is much faster than the one we previously built.

Links

SHARE


comments powered by Disqus

Follow Us

Latest Posts

subscribe to our newsletter