Building a Local DeepSeek R1 Chatbot with Chainlit


In the previous post we used Streamlit and Ollama to build a local DeepSeek R1 chatbot. Let's now build the same thing using Chainlit as our UI. Chainlit provides an elegant, real-time chat interface out of the box, and it works beautifully with models running through Ollama.

Just like before, DeepSeek R1 will stream both its thinking process and its final answer. Chainlit is dedicated specifically to building chatbots and doesn't seem quite as flexible as Streamlit. This led to at least one problem I'll explain below.

The full working code is included in this post and in my GitHub repo.

What You'll Need

  • Python 3.11+ installed
  • Ollama installed (see this post)
  • DeepSeek R1 pulled locally:
    ollama pull deepseek-r1:1.5b
  • Chainlit:
    pip install chainlit
  • Ollama Python library:
    pip install ollama

The Complete Code

Below is the full app.py file we'll walk through:

import chainlit as cl
import ollama


def convert_latex_delimiters(text):
    """Convert LaTeX delimiters from backslash-bracket to dollar signs"""
    if not text:
        return text
    # Replace display math delimiters
    text = text.replace(r'\[', '$$')
    text = text.replace(r'\]', '$$')
    # Replace inline math delimiters
    text = text.replace(r'\(', '$')
    text = text.replace(r'\)', '$')
    return text


@cl.on_message
async def on_message(message: cl.Message):
    """
    Handles incoming messages from the user, sends them to the LLM,
    and streams the response back to the Chainlit interface with thinking process.
    """
    # System prompt for the AI
    system_message = {
        "role": "system",
        "content": """You are an advanced AI assistant powered by the deepseek-r1 model.

Guidelines:
- If you're uncertain about something, acknowledge it rather than making up information
- Format your responses with markdown when it improves readability
"""
    }

    # Create a step for thinking and messages for streaming
    thinking_step = cl.Step(name="💭 Thinking", type="tool")
    final_answer = cl.Message(content="")

    accumulated_thinking = ""
    accumulated_answer = ""

    try:
        # Request completion from the model with streaming and thinking enabled
        stream = ollama.chat(
            model="deepseek-r1:1.5b",
            messages=[
                system_message,
                {"role": "user", "content": message.content}
            ],
            stream=True,
            think=True,  # This is the critical parameter for Ollama native API
        )

        thinking_started = False
        answer_started = False
        answer_buffer = ""

        # Stream the response to the UI
        for chunk in stream:
            chunk_msg = chunk.get("message", {})

            # Handle thinking content
            if chunk_msg.get("thinking"):
                if not thinking_started:
                    thinking_started = True
                    await thinking_step.send()

                thinking_text = chunk_msg["thinking"]
                accumulated_thinking += thinking_text
                thinking_step.output = convert_latex_delimiters(accumulated_thinking)
                await thinking_step.update()

            # Handle answer content
            if chunk_msg.get("content"):
                if not answer_started:
                    answer_started = True
                    if thinking_started:
                        # Finalize the thinking step
                        await thinking_step.update()
                    await final_answer.send()

                answer_text = chunk_msg["content"]
                accumulated_answer += answer_text
                answer_buffer += answer_text

                # Only update every 10 characters or so to avoid overwhelming the socket
                if len(answer_buffer) >= 10:
                    await final_answer.stream_token(answer_buffer)
                    answer_buffer = ""

        # Send any remaining buffered content
        if answer_buffer:
            await final_answer.stream_token(answer_buffer)

        # Update final answer with LaTeX conversion
        final_answer.content = convert_latex_delimiters(accumulated_answer)
        await final_answer.update()

    except Exception as e:
        error_msg = cl.Message(content=f"❌ Error generating response: {str(e)}")
        await error_msg.send()


@cl.on_chat_start
async def start():
    """
    Sends a welcome message when the chat starts.
    """
    await cl.Message(
        content="👋 Hello! I'm powered by **DeepSeek R1**. I'll show you my thinking process before answering.\n\n"
                "Try asking me a math problem or reasoning question!"
    ).send()

Breaking Down the Code

Chainlit Event Hooks

Chainlit uses decorators such as:

@cl.on_message
@cl.on_chat_start

These act as event listeners.
- on_chat_start fires once when the UI loads.
- on_message fires every time the user sends a message.

LaTeX Conversion Helper

DeepSeek R1 frequently uses LaTeX but with delimiters that Chainlit doesn't render by default. We fix that:

def convert_latex_delimiters(text):
    text = text.replace(r'\[', '$$')
    text = text.replace(r'\]', '$$')
    text = text.replace(r'\(', '$')
    text = text.replace(r'\)', '$')
    return text
Same logic as the Streamlit version, just applied before displaying anything.
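To see the conversion in action, here's a quick standalone check (the function is repeated here so the snippet runs on its own):

```python
def convert_latex_delimiters(text):
    """Convert LaTeX delimiters from backslash-bracket to dollar signs."""
    if not text:
        return text
    text = text.replace(r'\[', '$$')   # display math open
    text = text.replace(r'\]', '$$')   # display math close
    text = text.replace(r'\(', '$')    # inline math open
    text = text.replace(r'\)', '$')    # inline math close
    return text

print(convert_latex_delimiters(r"The roots are \(x = 1\) and \[x^2 - 1 = 0\]"))
# → The roots are $x = 1$ and $$x^2 - 1 = 0$$
```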

Thinking Step

Chainlit allows you to create "steps" that appear in the UI:

thinking_step = cl.Step(name="💭 Thinking", type="tool")

As the model streams reasoning content, we update this step live. The user can watch DeepSeek R1 deliberate.

Unfortunately, I couldn't figure out (in time for this post) how to keep the thinking step pinned above the answer. If you're watching the model think, the spot where the final answer appears scrolls off the top of the screen. I tried several fixes and none worked. For a framework whose one job is chat UIs, Chainlit doesn't make this easy.

Streaming with Ollama

The heart of the system:

stream = ollama.chat(
    model="deepseek-r1:1.5b",
    messages=[...],
    stream=True,
    think=True,
)
  • stream=True → tokens arrive as a generator
  • think=True → reasoning is delivered separately from the final answer
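With think=True, each streamed chunk carries either a "thinking" field or a "content" field inside its "message". The sketch below uses hand-written stand-in chunks of the same shape the generator yields, so you can see the separation logic without a running Ollama server:

```python
# Stand-in chunks mimicking what ollama.chat(..., stream=True, think=True)
# yields; the real generator produces messages of the same shape.
fake_stream = [
    {"message": {"thinking": "First, factor the expression. "}},
    {"message": {"thinking": "Both terms share x."}},
    {"message": {"content": "The answer is "}},
    {"message": {"content": "x(x + 1)."}},
]

thinking, answer = "", ""
for chunk in fake_stream:
    msg = chunk.get("message", {})
    if msg.get("thinking"):
        thinking += msg["thinking"]   # reasoning tokens
    if msg.get("content"):
        answer += msg["content"]      # final-answer tokens

print(thinking)  # First, factor the expression. Both terms share x.
print(answer)    # The answer is x(x + 1).
```

This is exactly the split the app exploits: reasoning goes to the thinking step, content goes to the chat message.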

Streaming Back to Chainlit

We listen for both thinking and final answer tokens:

if chunk_msg.get("thinking"):
    ...
if chunk_msg.get("content"):
    ...

Thinking updates go to thinking_step.update(), while answer tokens are streamed via:

await final_answer.stream_token(...)

This gives a smooth, real-time chat experience.
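The answer tokens aren't sent one at a time: they accumulate in a buffer that's flushed once it reaches roughly 10 characters, so the websocket sees fewer, larger updates. A minimal sketch of that buffering logic, where flush() stands in for `await final_answer.stream_token(...)`:

```python
sent = []

def flush(buf):
    # Stands in for: await final_answer.stream_token(buf)
    sent.append(buf)

answer_buffer = ""
for token in ["The ", "quick ", "brown ", "fox ", "jumps."]:
    answer_buffer += token
    if len(answer_buffer) >= 10:   # flush every ~10 characters
        flush(answer_buffer)
        answer_buffer = ""

if answer_buffer:                  # send whatever remains at the end
    flush(answer_buffer)

print(sent)  # ['The quick ', 'brown fox ', 'jumps.']
```

The final `if answer_buffer` check matters: without it, a short tail of the answer would never reach the UI.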

Welcome Message

When the chat starts, we show a friendly introduction:

await cl.Message(
    content="👋 Hello! I'm powered by **DeepSeek R1**..."
).send()

Error Handling

Any unexpected issues are passed back to the UI:

await cl.Message(content=f"❌ Error generating response: {str(e)}").send()

Running the Application

Save the file as app.py and run:

chainlit run app.py -w

Your browser will open automatically with the Chainlit interface. Try a math or logic problem, and you'll see DeepSeek R1 stream its thoughts step-by-step before answering.

Final Result: (screenshot of the running Chainlit interface)

Key Features

  • Runs entirely locally—no external API calls
  • Chainlit displays the reasoning process as a dedicated live-updating step
  • Smooth streaming output
  • Proper math rendering
  • Clean and modern chat UI with minimal setup

Concerns

I never did get Chainlit to work as well as Streamlit. It tended to fail with 'too many tokens' errors, and the LaTeX support was hit-and-miss at best. I'll have to circle back and see if I can improve it.

Conclusion

Using Chainlit with DeepSeek R1 makes it incredibly simple to build a polished chatbot interface. With around 100 lines of code, you get live thinking visualization, Markdown rendering, history management, and a production-ready UI—all running locally via Ollama.

Feel free to adapt this example and extend it for your own experiments!
