How DSPy Builds Prompts
- By Bruce Nielson
- ML & AI Specialist
In previous posts, we introduced DSPy, a tool that lets you treat Large Language Model (LLM) interactions as functions written in code rather than as hand-crafted prompts. In a follow-on post, we saw examples of how to use it out of the box.
Okay, that is all nice and good, but how exactly does it work? I mean ultimately you interact with an LLM via prompts, right? So what is all this code doing?
How Classify Works
In this post, we're going to answer that question. You can find the code for this post in my GitHub repo. Let's start with the same Classify signature from our previous post:
from typing import Literal

import dspy

class Classify(dspy.Signature):
    """Classify sentiment of a given sentence."""
    sentence: str = dspy.InputField()
    sentiment: Literal["positive", "negative", "neutral"] = dspy.OutputField()
    confidence: float = dspy.OutputField()
We then turn this into a real Classify function:
model = setup_model()
classify = dspy.Predict(Classify)
result = classify(
    sentence="This book was super fun to read, though not the last chapter."
)
Recall that the LLM produces a structured output like this:
Prediction(
    sentiment='neutral',
    confidence=0.8
)
How it Works
But how does it do it? Obviously it is building a prompt behind the scenes. Can we see what it is doing? Yes, by inspecting the history:
print("\n--- Prompt Built by DSPy ---")
print(model.inspect_history(n=1))
Where:
- .inspect_history() prints the record of requests DSPy has sent to the model, in the order they were made.
- The parameter n=1 means "show me the most recent interaction."
Here is the (somewhat long) result:
Classification Result:
Prediction(
    sentiment='neutral',
    confidence=0.8
)
--- Prompt Built by DSPy ---
[2025-11-20T10:39:12.956671]
System message:
Your input fields are:
1. `sentence` (str):
Your output fields are:
1. `sentiment` (Literal['positive', 'negative', 'neutral']):
2. `confidence` (float):
All interactions will be structured in the following way, with the appropriate values filled in.
[[ ## sentence ## ]]
{sentence}
[[ ## sentiment ## ]]
{sentiment} # note: the value you produce must exactly match (no extra characters) one of: positive; negative; neutral
[[ ## confidence ## ]]
{confidence} # note: the value you produce must be a single float value
[[ ## completed ## ]]
In adhering to this structure, your objective is:
Classify sentiment of a given sentence.
User message:
[[ ## sentence ## ]]
This book was super fun to read, though not the last chapter.
Respond with the corresponding output fields, starting with the field `[[ ## sentiment ## ]]` (must be formatted as a valid Python Literal['positive', 'negative', 'neutral']), then `[[ ## confidence ## ]]` (must be formatted as a valid Python float), and then ending with the marker for `[[ ## completed ## ]]`.
Response:
[[ ## sentiment ## ]]
neutral
[[ ## confidence ## ]]
0.8
[[ ## completed ## ]]
What we see here illustrates how DSPy builds structured prompts:
- System message: DSPy tells the model exactly what the input and output fields are, and how the response should be formatted.
- Field markers ([[ ## sentence ## ]], etc.): These placeholders map your Python class fields to text in the prompt. The LLM fills them in with actual values (the sentence input; the sentiment and confidence outputs).
[[ ## sentence ## ]]
{sentence}
[[ ## sentiment ## ]]
{sentiment} # note: the value you produce must exactly match (no extra characters) one of: positive; negative; neutral
- Docstring usage: The docstring from Classify ("Classify sentiment of a given sentence.") becomes part of the instructions, guiding the model on the task.
[[ ## completed ## ]]
In adhering to this structure, your objective is:
Classify sentiment of a given sentence.
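To make the mechanics concrete, here is a deliberately simplified, hypothetical sketch (not DSPy's actual implementation) of how a class's annotations and docstring can be turned into a system message like the one above:

```python
from typing import Literal, get_args, get_origin, get_type_hints

# A plain class standing in for a dspy.Signature; only the annotations
# and the docstring matter for this sketch.
class Classify:
    """Classify sentiment of a given sentence."""
    sentence: str
    sentiment: Literal["positive", "negative", "neutral"]
    confidence: float

def build_system_message(cls, input_names):
    """Render a field listing and objective from a class's metadata."""
    hints = get_type_hints(cls)
    inputs = [n for n in hints if n in input_names]
    outputs = [n for n in hints if n not in input_names]

    def fmt(name):
        hint = hints[name]
        if get_origin(hint) is Literal:  # enumerate allowed values
            return f"`{name}` (Literal{list(get_args(hint))})"
        return f"`{name}` ({hint.__name__})"

    lines = ["Your input fields are:"]
    lines += [f"{i}. {fmt(n)}:" for i, n in enumerate(inputs, 1)]
    lines.append("Your output fields are:")
    lines += [f"{i}. {fmt(n)}:" for i, n in enumerate(outputs, 1)]
    lines.append("In adhering to this structure, your objective is:")
    lines.append(cls.__doc__)
    return "\n".join(lines)

msg = build_system_message(Classify, {"sentence"})
print(msg)
```

The printed message mirrors the field listing and "your objective is" section of the real prompt, which is the core of the trick: everything the model sees is derived mechanically from the class.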
This structure ensures the LLM returns a predictable, typed response — which is exactly why we get Prediction(sentiment='neutral', confidence=0.8) instead of an unstructured string.
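How does the typed response come back out? Here is a rough sketch (again, not DSPy's real parser) of how text in the [[ ## field ## ]] format can be split back into named fields and coerced to the declared types:

```python
import re

# A model response in the marker format shown above.
response = """[[ ## sentiment ## ]]
neutral

[[ ## confidence ## ]]
0.8

[[ ## completed ## ]]"""

# Capture each "[[ ## name ## ]]" header and the text that follows it,
# stopping at the next header or the end of the string.
pattern = re.compile(
    r"\[\[ ## (\w+) ## \]\]\s*(.*?)\s*(?=\[\[ ## |\Z)", re.DOTALL
)
fields = dict(pattern.findall(response))
fields.pop("completed", None)  # the end marker carries no value

sentiment = fields["sentiment"]           # 'neutral'
confidence = float(fields["confidence"])  # coerced to the declared type
print(sentiment, confidence)
```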
Conclusion
In this post, we saw how DSPy automatically converts a Python class into a fully structured prompt for Gemini. By inspecting the history, we can see exactly how the class name, fields, and docstring are used to generate instructions that guide the model’s response.
This approach gives you predictable, type-safe outputs without manually writing prompts. You can experiment by tweaking input/output fields or the docstring and immediately see how the prompt — and the model’s behavior — changes.
Ultimately, DSPy lets you treat LLMs like functions in your code, while still giving full transparency into the underlying prompts. It’s a powerful way to combine the flexibility of LLMs with the reliability of typed, structured programming.