DSPy: A Powerful (But Sometimes Dangerous) Prompting Tool
- By Bruce Nielson
- ML & AI Specialist
In our last post, we introduced DSPy, a tool that lets you treat prompt building for your Large Language Model (LLM) like a Python function. This time we're going to use another modified example from their 'getting started' page, play with it a little, and get a feel for how DSPy works.
Writing a "Classify" Function
Let's use DSPy to write a sentiment classification function. Here is the code (found in this GitHub repo):
from typing import Literal

import dspy

class Classify(dspy.Signature):
    """Classify sentiment of a given sentence."""
    sentence: str = dspy.InputField()
    sentiment: Literal["positive", "negative", "neutral"] = dspy.OutputField()
    confidence: float = dspy.OutputField()
Explanation of the Classify class, line by line:
Let's analyze what is going on here.
- class Classify(dspy.Signature):
  This defines a new DSPy Signature class named Classify. By subclassing dspy.Signature, you declare that this class specifies the input/output contract for a DSPy module. (DSPy signatures docs)
- """Classify sentiment of a given sentence."""
  A docstring that gives a human-readable description of the task: sentiment classification of a sentence.
- sentence: str = dspy.InputField()
  Declares an input field named sentence of type str. This tells the module it will receive a sentence as input.
- sentiment: Literal["positive", "negative", "neutral"] = dspy.OutputField()
  Declares an output field called sentiment whose type is a Literal limited to the values "positive", "negative", or "neutral". This instructs DSPy to parse and return one of those labels.
- confidence: float = dspy.OutputField()
  Declares another output field named confidence of type float. This tells DSPy to return a numeric confidence score alongside the label.
What this signature does overall:
Together, the fields form a function-like contract: given a sentence string, the module will return a sentiment label and a confidence float. DSPy uses this signature to (1) build the underlying prompt, (2) call the LLM, and (3) parse & cast the model output into the declared typed fields — returning a structured result instead of a raw string.
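The parse-and-cast step in (3) can be sketched in plain Python. This is only an illustration of the idea, not DSPy's actual code; Sentiment and parse_result are hypothetical names I've made up for the sketch:

```python
from typing import Literal, get_args

# Hypothetical names for illustration -- not part of DSPy's API.
Sentiment = Literal["positive", "negative", "neutral"]

def parse_result(raw_label: str, raw_confidence: str) -> dict:
    """Mimic the parse-and-cast step: check the label against the
    Literal's allowed values and cast the confidence to a float."""
    if raw_label not in get_args(Sentiment):
        raise ValueError(f"unexpected sentiment label: {raw_label!r}")
    return {"sentiment": raw_label, "confidence": float(raw_confidence)}

print(parse_result("neutral", "0.8"))
```

The Literal type is doing real work here: it gives the framework a closed set of valid labels to validate the model's raw text against, rather than accepting whatever string comes back.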
Running the Code
Now we'll use this function:
if __name__ == "__main__":
    set_model()      # helper defined in the repo; it selects the LLM (Gemini here)
    dspy.configure()
    classify = dspy.Predict(Classify)
    result = classify(sentence="This book was super fun to read, though not the last chapter.")
    print("\nClassification Example:")
    print(result)
We first pass our Classify class to the DSPy Predict module, which gives us back a new classify function:
classify = dspy.Predict(Classify)
Now we can call classify, passing in a sentence parameter, just as we specified in our class.
result = classify(sentence="This book was super fun to read, though not the last chapter.")
The sentence we're trying to classify is "This book was super fun to read, though not the last chapter." This is positive for the first half and negative for the second half. So, unsurprisingly, when I run this code I get back a result like this:
Classification Example:
Prediction(
sentiment='neutral',
confidence=0.8
)
Gemini is 80% confident that this is a neutral sentence.
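The Prediction that comes back behaves like a typed record, so you can read the individual fields by attribute instead of parsing the printed output. A minimal stand-in (plain Python, not dspy.Prediction, just to show the access pattern) looks like this:

```python
from dataclasses import dataclass

@dataclass
class FakePrediction:
    """Stand-in for dspy.Prediction, for illustration only."""
    sentiment: str
    confidence: float

result = FakePrediction(sentiment="neutral", confidence=0.8)
print(result.sentiment)    # attribute access, like result.sentiment in DSPy
print(result.confidence)
```

Getting typed fields back, instead of a raw string you have to parse yourself, is a big part of DSPy's appeal.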
However, let's play around with this a bit and we'll see the dangers of not fully controlling your prompts. Let's intentionally try to screw things up a bit by redoing Classify like this:
class Classify(dspy.Signature):
    """Classify sentiment of a given sentence."""
    sentence: str = dspy.InputField()
    sentiment: Literal["positive", "negative", "neutral", "boo!"] = dspy.OutputField()
    confidence: float = dspy.OutputField()
I added "boo!" as a possible sentiment, which makes no real sense. Now run the code again and we get:
Classification Example:
Prediction(
sentiment='positive',
confidence=0.75
)
By adding "boo!" as an option, we went from 80% confident this was a neutral statement to 75% confident it is positive. Why? I have no idea and I doubt the LLM does either.
How Prompts Are Built
You might, at this point, be wondering how this class is turned into a prompt that is sent to Gemini. The answer is that DSPy uses the name of the class, the class properties, and even the docstring to build the prompt. To prove this, let's change the Classify class to instead be:
class Classify(dspy.Signature):
    """Return "boo!" as sentiment every time"""
    sentence: str = dspy.InputField()
    sentiment: Literal["positive", "negative", "neutral", "boo!"] = dspy.OutputField()
    confidence: float = dspy.OutputField()
Notice that all I did was change the docstring to tell it to return "boo!" every time. And we now get:
Classification Example:
Prediction(
sentiment='boo!',
confidence=1.0
)
Nice, eh? The docstring isn't just a comment anymore; it's part of how you code the function!
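The idea of assembling a prompt from class metadata can be sketched in plain Python. To be clear, this is not DSPy's actual implementation, and build_prompt is an illustrative name I've invented; it just shows how a class name, docstring, and type annotations could be turned into prompt text:

```python
class Classify:
    """Classify sentiment of a given sentence."""
    sentence: str
    sentiment: str
    confidence: float

def build_prompt(sig_class, **inputs) -> str:
    """Toy prompt builder: combine the class name, docstring,
    supplied inputs, and remaining annotated fields as outputs."""
    lines = [f"Task ({sig_class.__name__}): {sig_class.__doc__}"]
    for name, value in inputs.items():
        lines.append(f"{name}: {value}")
    output_names = [n for n in sig_class.__annotations__ if n not in inputs]
    lines.append("Respond with fields: " + ", ".join(output_names))
    return "\n".join(lines)

print(build_prompt(Classify, sentence="Great book!"))
```

This makes the earlier experiment unsurprising: since the docstring lands directly in the prompt, rewriting it to say "Return "boo!" as sentiment every time" changes the model's instructions.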
Conclusion
Hopefully this short demo helps you understand how DSPy turns its classes and functions into prompts. There is clearly a lot of power here, but also some danger. The best reason to work this way is if you plan to use DSPy's optimizers to let your software discover the best prompts, as measured by your own tests. Imagine how you might simply unplug one model, plug in another, and rerun the optimizer. You could easily move from one LLM to another using the same code base!