Building a Local DeepSeek R1 Chatbot with Chainlit
- By Bruce Nielson
- ML & AI Specialist
In the previous post we used Streamlit and Ollama to build a local DeepSeek R1 chatbot. Now let's build the same thing using Chainlit as our UI. Chainlit provides an elegant, real-time chat interface out of the box, and it works beautifully with models running through Ollama.
Just like before, DeepSeek R1 will stream both its thinking process and its final answer. Chainlit is dedicated specifically to building chatbots, though, and doesn't seem quite as flexible as Streamlit, which led to at least one problem I'll explain below.
The full working code is included in this post and in my GitHub repo.
What You'll Need
- Python 3.11+ installed
- Ollama installed (see this post)
- DeepSeek R1 pulled locally: ollama pull deepseek-r1:1.5b
- Chainlit: pip install chainlit
- Ollama Python library: pip install ollama
The Complete Code
Below is the full app.py file we'll walk through:
import chainlit as cl
import ollama


def convert_latex_delimiters(text):
    """Convert LaTeX delimiters from backslash-bracket to dollar signs"""
    if not text:
        return text
    # Replace display math delimiters
    text = text.replace(r'\[', '$$')
    text = text.replace(r'\]', '$$')
    # Replace inline math delimiters
    text = text.replace(r'\(', '$')
    text = text.replace(r'\)', '$')
    return text


@cl.on_message
async def on_message(message: cl.Message):
    """
    Handles incoming messages from the user, sends them to the LLM,
    and streams the response back to the Chainlit interface with thinking process.
    """
    # System prompt for the AI
    system_message = {
        "role": "system",
        "content": """You are an advanced AI assistant powered by the deepseek-r1 model.
Guidelines:
- If you're uncertain about something, acknowledge it rather than making up information
- Format your responses with markdown when it improves readability
"""
    }

    # Create a step for thinking and messages for streaming
    thinking_step = cl.Step(name="💭 Thinking", type="tool")
    final_answer = cl.Message(content="")

    accumulated_thinking = ""
    accumulated_answer = ""

    try:
        # Request completion from the model with streaming and thinking enabled
        stream = ollama.chat(
            model="deepseek-r1:1.5b",
            messages=[
                system_message,
                {"role": "user", "content": message.content}
            ],
            stream=True,
            think=True,  # This is the critical parameter for Ollama's native API
        )

        thinking_started = False
        answer_started = False
        answer_buffer = ""

        # Stream the response to the UI
        for chunk in stream:
            chunk_msg = chunk.get("message", {})

            # Handle thinking content
            if chunk_msg.get("thinking"):
                if not thinking_started:
                    thinking_started = True
                    await thinking_step.send()
                thinking_text = chunk_msg["thinking"]
                accumulated_thinking += thinking_text
                thinking_step.output = convert_latex_delimiters(accumulated_thinking)
                await thinking_step.update()

            # Handle answer content
            if chunk_msg.get("content"):
                if not answer_started:
                    answer_started = True
                    if thinking_started:
                        # Finalize the thinking step
                        await thinking_step.update()
                    await final_answer.send()
                answer_text = chunk_msg["content"]
                accumulated_answer += answer_text
                answer_buffer += answer_text
                # Only update every 10 characters or so to avoid overwhelming the socket
                if len(answer_buffer) >= 10:
                    await final_answer.stream_token(answer_buffer)
                    answer_buffer = ""

        # Send any remaining buffered content
        if answer_buffer:
            await final_answer.stream_token(answer_buffer)

        # Update final answer with LaTeX conversion
        final_answer.content = convert_latex_delimiters(accumulated_answer)
        await final_answer.update()

    except Exception as e:
        error_msg = cl.Message(content=f"❌ Error generating response: {str(e)}")
        await error_msg.send()


@cl.on_chat_start
async def start():
    """
    Sends a welcome message when the chat starts.
    """
    await cl.Message(
        content="👋 Hello! I'm powered by **DeepSeek R1**. I'll show you my thinking process before answering.\n\n"
                "Try asking me a math problem or reasoning question!"
    ).send()
Breaking Down the Code
Chainlit Event Hooks
Chainlit uses decorators such as:
@cl.on_message
@cl.on_chat_start
These act as event listeners.
- on_chat_start fires once when the UI loads.
- on_message fires every time the user sends a message.
LaTeX Conversion Helper
DeepSeek R1 frequently uses LaTeX but with delimiters that Chainlit doesn't render by default. We fix that:
def convert_latex_delimiters(text):
    text = text.replace(r'\[', '$$')
    text = text.replace(r'\]', '$$')
    text = text.replace(r'\(', '$')
    text = text.replace(r'\)', '$')
    return text
Same logic as the Streamlit version, just applied before displaying anything.
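To see the conversion in action, here's a quick standalone demo of the helper; the sample string is just an illustration of the delimiter style DeepSeek R1 tends to emit:

```python
def convert_latex_delimiters(text):
    """Swap LaTeX backslash delimiters for the dollar-sign style Chainlit renders."""
    if not text:
        return text
    text = text.replace(r'\[', '$$')  # display math open
    text = text.replace(r'\]', '$$')  # display math close
    text = text.replace(r'\(', '$')   # inline math open
    text = text.replace(r'\)', '$')   # inline math close
    return text


sample = r"The answer is \(x = 2\), since \[x^2 = 4\]."
print(convert_latex_delimiters(sample))
# → The answer is $x = 2$, since $$x^2 = 4$$.
```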
Thinking Step
Chainlit allows you to create "steps" that appear in the UI:
thinking_step = cl.Step(name="💭 Thinking", type="tool")
As the model streams reasoning content, we update this step live. The user can watch DeepSeek R1 deliberate.
Unfortunately, I couldn't figure out (in time for this post) how to get the thinking step to sit above the final answer. So if you're watching it think, the spot where the final answer goes scrolls off the top of the screen. I tried several fixes and none worked. Good job, Chainlit: your app has only one job, and it doesn't quite do it right. (At least not easily.)
Streaming with Ollama
The heart of the system:
stream = ollama.chat(
    model="deepseek-r1:1.5b",
    messages=[...],
    stream=True,
    think=True,
)
- stream=True → tokens arrive as a generator
- think=True → reasoning is delivered separately from the final answer
Streaming Back to Chainlit
We listen for both thinking and final answer tokens:
if chunk_msg.get("thinking"):
    ...
if chunk_msg.get("content"):
    ...
Thinking updates go to thinking_step.update(), while answer tokens are streamed via:
await final_answer.stream_token(...)
This gives a smooth, real-time chat experience.
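To see how the ~10-character buffering behaves, here's a minimal sketch that feeds hypothetical mock chunks (shaped like Ollama's streaming dicts) through the same thinking/content/buffer logic, collecting flushes in a list instead of streaming them to a UI:

```python
# Hypothetical mock chunks shaped like Ollama's streaming responses.
mock_stream = [
    {"message": {"thinking": "Let me consider x = 2... "}},
    {"message": {"thinking": "2 squared is 4."}},
    {"message": {"content": "The answer "}},
    {"message": {"content": "is **4**."}},
]

accumulated_thinking = ""
accumulated_answer = ""
answer_buffer = ""
flushed = []  # stands in for final_answer.stream_token() calls

for chunk in mock_stream:
    chunk_msg = chunk.get("message", {})
    if chunk_msg.get("thinking"):
        accumulated_thinking += chunk_msg["thinking"]
    if chunk_msg.get("content"):
        accumulated_answer += chunk_msg["content"]
        answer_buffer += chunk_msg["content"]
        if len(answer_buffer) >= 10:  # flush in ~10-character batches
            flushed.append(answer_buffer)
            answer_buffer = ""

if answer_buffer:  # flush whatever is left over
    flushed.append(answer_buffer)

print(accumulated_thinking)  # → Let me consider x = 2... 2 squared is 4.
print("".join(flushed))      # → The answer is **4**.
```

The buffering matters because Chainlit updates travel over a websocket; flushing every single token can overwhelm the connection, while batching keeps the stream smooth.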
Welcome Message
When the chat starts, we show a friendly introduction:
await cl.Message(
content="👋 Hello! I'm powered by **DeepSeek R1**..."
).send()
Error Handling
Any unexpected issues are passed back to the UI:
await cl.Message(content=f"❌ Error generating response: {str(e)}").send()
Running the Application
Save the file as app.py and run:
chainlit run app.py -w
Your browser will open automatically with the Chainlit interface. Try a math or logic problem, and you'll see DeepSeek R1 stream its thoughts step-by-step before answering.
Key Features
- Runs entirely locally—no external API calls
- Chainlit displays the reasoning process as a dedicated live-updating step
- Smooth streaming output
- Proper math rendering
- Clean and modern chat UI with minimal setup
Concerns
I never did get Chainlit to work as well as Streamlit. It tended to error out with 'too many tokens' errors, and the LaTeX support was hit and miss at best. I'll have to circle back and see if I can improve it.
Conclusion
Using Chainlit with DeepSeek R1 makes it remarkably simple to build a polished chatbot interface. With around 100 lines of code, you get live thinking visualization, Markdown rendering, and a clean, modern chat UI, all running locally via Ollama.
Feel free to adapt this example and extend it for your own experiments!