Category/Tag: Tutorials

Pulling It All Together: Docling for Loading PDFs

Over the last few weeks, we have published several articles on Docling and how it can be used to improve your AI tools. In this article, we will put together everything we have learned so far so you can start converting and loading your PDFs into machine-readable text.


Finding Paragraphs in PDFs - Using IBM’s Docling

IBM's Docling is a fantastic tool that makes it easier to convert PDF documents into machine-readable text. However, it can sometimes fall short when it comes to grouping text into paragraphs. In this article, we'll discuss how to improve our text conversion process so we can keep our paragraphs together.


IBM’s Docling for Superior Text Loading from PDFs

Converting documents such as PDFs into clean and accurate machine-readable text can be more difficult than expected. This is why we use IBM's Docling to help with our PDF-to-Markdown conversions. In today's article, we'll talk more about how to use Docling and why it deserves a place in your AI toolbox.


Using NLTK to Improve RAG (Retrieval Augmented Generation) Text Quality

Last week, we discussed Docling and how it can easily allow you to turn your PDFs and other documents into machine-readable text. Sometimes, this conversion can result in errors like broken lines or misplaced hyphens. In today's tutorial, Bruce Nielson will walk us through NLTK, a library and tool that will help us automatically fix those mistakes.


Docling for PDF to Markdown Conversion

Docling is IBM's open-source library for reading popular document formats (such as PDF) and exporting them into Markdown. In this article, we'll be looking at how to use it and how it performs compared to other options.


Llama.cpp for Large Language Models

In a previous article, our in-house AI expert, Bruce Nielson, explained how to set up an LLM using Ollama. This week, we'll be going over Llama.cpp, a system similar to Ollama but written in C++ and using some very efficient techniques to improve performance.


Explaining DeepSeek R1 (and How to Use It)

China's new AI, DeepSeek R1, is causing all sorts of upset in the AI market right now, and so we've decided to break down exactly what it is, why it's causing such an uproar, and how you can try it out for yourself.


Installing Ollama for Large Language Models (LLM) in Windows

Interested in getting your own LLM up and running? Ollama, an alternative to software like LM Studio, is a relatively easy and simple tool to get started with. In this article, we'll be going through how to get it set up on Windows.


Reranking Documents Using Cross-Encoders for Retrieval Augmented Generation (RAG)

It's important to ensure that your AI presents the best possible results for your searches. In today's AI/ML tutorial by Bruce Nielson, we'll be looking at something called a reranker, which will allow us to turn good results into the best results.


Using Neo4j Graph Database for Retrieval Augmented Generation (RAG)

In two previous posts, we talked about how to install Neo4j Graph Database for use with the Book Search Archive, our sample project using the Mindfire Technology open-source AI stack. In this post, we'll cover the code necessary to use Neo4j as a document store for Retrieval Augmented Generation (RAG).