Machine Learning 101: The Key Concepts Behind Every Learning Algorithm

Machine Learning 101: The Key Concepts Behind Every Learning Algorithm

Machine learning textbooks have their own vocabulary. But behind the jargon lies a process that would be deeply familiar to Karl Popper: conjecture and refutation. This post is a short reference guide to the key terms from Tom Mitchell's Machine Learning -- a foundational textbook that I also draw on in my post on the futility of unbiased learning. For each term, I will give Mitchell's definition and then show how it maps onto Karl Popper's logic of scientific discovery.

To make this concrete, imagine a 19th-century physician trying to figure out what causes a mysterious illness sweeping through a city. Patients come in with various combinations of age, diet, water source, neighborhood, and occupation. Some get sick, others do not. The physician is trying to discover the underlying rule from these observations.

Here is the truth the physician does not yet know: the illness strikes patients who drink from the river and live in the low-lying district near the tannery due to contamination from the tannery. Both conditions must be present -- river drinkers in the hills stay healthy, and tannery district residents who drink from wells stay healthy. Only the combination is deadly.

This is, as Mitchell would put it, a concept learning task (Mitchell, p. 22). The physician's job is to discover that two-attribute rule from a handful of patients -- without knowing in advance how many attributes matter or which ones.

Instance Space

In machine learning, the instance space (denoted X) is the set of all possible examples the learner could encounter (Mitchell, p. 22). In our medical example, it is the set of all possible patients -- every combination of age, diet, water source, neighborhood, occupation, and so on. A young well-drinking hillside baker. An old river-drinking tannery district laborer. Every combination, whether or not the physician has actually seen such a patient.

In Popper's framework, the instance space is the set of all possible observations or experiments. It defines the scope of what the theory is about. Most of these possible observations will never actually be made. But they all matter, because a good theory must make predictions about all of them -- not just the ones we happen to have seen.

Target Concept

The target concept (denoted c) is the true rule the learner is trying to discover (Mitchell, p. 22). It is a function that correctly classifies every instance. In our example, the target concept is: a patient gets sick if and only if they drink from the river AND live in the tannery district. The target concept is unknown to the physician. The whole point of learning is to figure out what it is.

In Popper's terms, the target concept is the law of nature we are searching for. We do not have direct access to it. We can only approach it indirectly through conjecture and refutation -- proposing theories and testing them against observations. We may never know for certain that we have found it, but we can know when we have not found it, because our conjecture will be refuted by the evidence.

Hypothesis

A hypothesis (denoted h) is a candidate theory -- one possible answer to the question "what is the target concept?" (Mitchell, p. 23). The physician might conjecture "patients who drink from the river get sick" or "patients who live in the tannery district get sick" or "old patients get sick." Each of these is a hypothesis. Some are too broad, some are too narrow, and one -- "river drinkers in the tannery district get sick" -- happens to be correct. But the physician does not know that yet.

A hypothesis is exactly what Popper calls a conjecture. It is a bold guess about the structure of reality. It may be right or wrong. What matters is that it is testable -- it makes predictions that can be checked against observations.

Hypothesis Space

The hypothesis space (denoted H) is the set of all hypotheses the learner is willing to consider (Mitchell, p. 23). This is not the set of all possible explanations -- it is the set of explanations the learner can express given its representation.

It is intractable to consider every possible hypothesis, as this would be an infinite set. But this is unnecessary. We have a lot of background knowledge that lets us constrain the possible hypotheses we will consider. So typically we start with an already partially constrained hypothesis space based on our background knowledge. This is part of our own human "inductive bias."

But suppose our physician only considers single-attribute hypotheses -- "it is the water source" or "it is the neighborhood" -- then the correct two-attribute answer is not even in the hypothesis space. The physician could examine every patient in the city and still never find the answer, because the target concept cannot be expressed within the hypothesis space he is considering. The choice of hypothesis space determines what the learner can and cannot discover.

In Popper's framework, the hypothesis space corresponds to the theoretical framework within which the scientist operates. No scientist considers every conceivable theory. They work within a tradition, a paradigm, or a set of background assumptions that constrain which conjectures are even formulable. As I discussed in my post on [inductive bias][2], this constraint is not a flaw -- it is a prerequisite for learning anything at all. But it must be the right constraint, or the truth will be invisible to you.

Training Examples

Training examples (denoted D) are the actual observations available to the learner (Mitchell, p. 23). Each training example is an instance paired with its correct classification. A positive example is a patient who got sick. A negative example is a patient who stayed healthy.

Suppose the physician has seen five patients so far:

  • Old, river, tannery district, laborer -- sick
  • Young, river, tannery district, baker -- sick
  • Old, river, hillside, farmer -- healthy
  • Young, well, tannery district, tanner -- healthy
  • Old, well, hillside, laborer -- healthy

In Popper's framework, these are the observations against which we test our conjectures. The third patient -- a river drinker who stayed healthy -- is a falsification of the hypothesis "river water causes the illness." The fourth patient -- a tannery district resident who stayed healthy -- falsifies "living in the tannery district causes the illness." And the fifth patient -- an old laborer who stayed healthy -- falsifies "being a laborer causes the illness." Only the conjunction "river AND tannery district" survives all five observations.

As Popper argued, falsifications are where the real action is. A thousand sick river-drinking tannery residents do not prove that the combination is the cause -- they are merely consistent with it. But each negative example eliminates entire families of wrong theories in one stroke.

Concept Learning

Mitchell defines concept learning as "inferring a boolean-valued function from training examples of its input and output" (Mitchell, p. 21). That sounds narrow, but it is surprisingly general. Our physician is doing concept learning: he has patients (instances), he knows which ones got sick and which did not (positive and negative examples), and he is searching for the rule that separates them. But this is also what scientists do all the time. Any time we seek a causal explanation -- what causes this disease, what makes this material brittle, why do some stars explode and others do not -- we are doing concept learning. We have observations, we have outcomes, and we are searching for the rule.

But here is the key reframing that Popper would appreciate: Mitchell treats concept learning as a search problem. The learner is searching through a hypothesis space for a hypothesis consistent with the training data (Mitchell, p. 23). Our physician is not staring at patients and waiting for a pattern to emerge. He is starting with a space of possible theories and eliminating the ones that fail. The hypothesis "it is the water" fails when he sees a healthy river drinker in the hills. The hypothesis "it is the neighborhood" fails when he sees a healthy well drinker in the tannery district. What remains is not what the data induced but what the data failed to refute. This is exactly what Popper and Donald Campbell called "evolutionary epistemology" -- knowledge grows not by accumulation but by variation and selection. You generate candidate theories, test them against reality, and the ones that survive criticism are what remain. The data does not build the theory. The data selects among the theories.

Version Space

The version space is the set of all hypotheses in H that are consistent with the training data observed so far (Mitchell, p. 31). As training examples come in, hypotheses that are contradicted by the data get eliminated and the version space shrinks.

After the physician's first two patients (both sick, both river-drinking tannery district residents), many hypotheses are still alive: maybe it is the river, maybe it is the district, maybe it is the combination, maybe it is being a laborer. The version space is large. But after the third patient -- a healthy river drinker in the hills -- every hypothesis that blames the river alone is eliminated. After the fourth -- a healthy well drinker in the tannery district -- every hypothesis that blames the district alone is eliminated. The version space is closing in on the truth.

In Popper's terms, the version space is the set of conjectures that have survived all attempts at refutation so far. Each new observation either leaves it unchanged or shrinks it. The goal is to shrink it until only one hypothesis remains.

Consistent Hypothesis

A consistent hypothesis is one that correctly classifies every training example seen so far (Mitchell, p. 23). It has not been falsified by any observation.

After all five patients, the hypothesis "river AND tannery district" is consistent. But so are other, more specific hypotheses. Consider: we have not yet seen a river-drinking tannery district farmer. So the hypothesis "river AND tannery district AND NOT farmer" is equally consistent with everything we have observed -- maybe farmers are somehow immune. Likewise, both sick patients happened to be either laborers or bakers, so "river AND tannery district AND (laborer OR baker)" also fits the data. We simply have not seen enough patients to distinguish these hypotheses from each other.

So "consistent" does not mean "correct." It means "not yet refuted." This is precisely Popper's point about corroboration. A theory that has survived testing is corroborated, not confirmed. It has proven its mettle so far, but it remains permanently open to future refutation. The physician would need to find a river-drinking tannery district farmer to tell these hypotheses apart.

The General-to-Specific Ordering

Mitchell observes that hypotheses can be naturally ordered from general to specific (Mitchell, p. 24). A more general hypothesis classifies more instances as positive. The hypothesis "anyone who drinks from the river gets sick" is more general than "river drinkers in the tannery district get sick" -- the first covers a superset of the cases covered by the second.

This ordering has a direct Popperian interpretation. Popper argued that more general theories are more falsifiable -- they make bolder claims about the world, and therefore there are more ways they could be wrong. "All river drinkers get sick" is easier to refute than "river drinkers in the tannery district get sick," because the first makes predictions about a far wider range of patients. Popper considered this a virtue. Bolder theories, when they survive testing, tell us more about the world.

The Candidate-Elimination algorithm exploits this ordering to efficiently search the hypothesis space -- tracking the most general and most specific surviving hypotheses and using each new observation to tighten the bounds. As I described in my post on the futility of unbiased learning, this is falsification implemented as a computer program.

The Takeaway

Machine learning is not induction. It is search -- a search through a space of conjectures, guided by observations, eliminating the hypotheses that fail. The vocabulary is different, but the logic is the same logic Karl Popper described: bold conjectures, tested against experience, with the wrong ones ruthlessly discarded.

The key terms from Mitchell map cleanly onto this process:

  • The instance space is the domain of possible observations
  • The target concept is the unknown law we seek
  • A hypothesis is a conjecture
  • The hypothesis space is the set of conjectures we are willing to consider
  • Training examples are the observations that test our conjectures
  • Concept learning is the search for surviving conjectures
  • The version space is the set of conjectures not yet refuted
  • A consistent hypothesis is a conjecture that has survived all tests so far
  • The general-to-specific ordering reflects Popper's insight that bolder theories are more falsifiable

All page references to Mitchell are from Machine Learning (McGraw-Hill, 1997).

If you need help with your Artificial Intelligence solutions, we're here to help.

SHARE


comments powered by Disqus

Follow Us

Latest Posts

subscribe to our newsletter