Consider trying to determine the probability of getting a hit when randomly selecting between two batters, A and B.  Each has 100 at bats; A has 26 hits and B has 36 hits.  In probability notation,

P(A gets a hit) = 26/100 = 0.260

P(B gets a hit) = 36/100 = 0.360

and the probability that a random selection between the batters produces a hit is

P(hit) = (1/2)(0.260) + (1/2)(0.360) = 0.310.

But suppose each batter’s performance against left-handed and right-handed pitchers is as follows.

batter    against left-handers    against right-handers    overall
A         24/80 = 0.300           2/20 = 0.100             26/100 = 0.260
B         12/50 = 0.240           24/50 = 0.480            36/100 = 0.360

Now the probabilities that a random selection between the batters, conditioned on the throwing hand of the pitcher, produces a hit are

P(hit|left-hander) = (1/2)(0.300) + (1/2)(0.240) = 0.270

P(hit|right-hander) = (1/2)(0.100) + (1/2)(0.480) = 0.290

How can the unconditional probability of getting a hit be larger than both of the conditional probabilities?  That is, how can the probability of getting a hit be 0.310 when the throwing hand of the opposing pitcher is not known but drop to 0.270 if the pitcher is known to be left-handed and drop to 0.290 if the pitcher is known to be right-handed?
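The three figures in question can be reproduced directly from the table.  The following is a minimal Python sketch; the names RECORD, average, overall and coin_flip are chosen here for illustration and are not from the original discussion.  It uses exact fractions so that no rounding enters the comparison.

```python
from fractions import Fraction

# (hits, at bats) for each batter against each type of pitcher, from the table.
RECORD = {
    "A": {"left": (24, 80), "right": (2, 20)},
    "B": {"left": (12, 50), "right": (24, 50)},
}

def average(hits, at_bats):
    """Batting average as an exact fraction."""
    return Fraction(hits, at_bats)

def overall(batter):
    """Pooled average: total hits divided by total at bats."""
    hits = sum(h for h, _ in RECORD[batter].values())
    at_bats = sum(n for _, n in RECORD[batter].values())
    return average(hits, at_bats)

def coin_flip(values):
    """Average over an equally likely choice between the batters."""
    values = list(values)
    return sum(values) / len(values)

p_hit = coin_flip(overall(b) for b in RECORD)
p_hit_left = coin_flip(average(*RECORD[b]["left"]) for b in RECORD)
p_hit_right = coin_flip(average(*RECORD[b]["right"]) for b in RECORD)

print(float(p_hit), float(p_hit_left), float(p_hit_right))
```

The printed values are 0.31, 0.27 and 0.29, matching the probabilities above.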

(1) If the throwing hand of the opposing pitcher is not known, or is not considered relevant, then P(hit) is properly 0.310.

(2) If the throwing hand of the opposing pitcher is known, and considered relevant, then P(hit) is properly either 0.270 or 0.290.

The probabilities in (1) and (2) are obtained under two different assumptions and consequently cannot be directly compared.
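One way to see the difference in assumptions is to note that the 0.310 figure implicitly weights each batter's split averages by his own mix of opposing pitchers (80 of 100 at bats against left-handers for A, 50 of 100 for B).  The sketch below is an illustration under that reading, not a method taken from the CHANCE article; the function p_hit and the dictionary share_left are names introduced here.  It also shows the hypothetical case in which both batters face left-handers in the same proportion.

```python
from fractions import Fraction

# Split batting averages from the table, as exact fractions.
AVG = {
    "A": {"left": Fraction(24, 80), "right": Fraction(2, 20)},
    "B": {"left": Fraction(12, 50), "right": Fraction(24, 50)},
}

def p_hit(share_left):
    """P(hit) for an equally likely choice of batter, where share_left[b]
    is the fraction of batter b's at bats that come against left-handers."""
    total = Fraction(0)
    for batter, avg in AVG.items():
        s = share_left[batter]
        total += Fraction(1, 2) * (s * avg["left"] + (1 - s) * avg["right"])
    return total

# Shares actually observed in the table: A 80/100, B 50/100.
observed = p_hit({"A": Fraction(80, 100), "B": Fraction(50, 100)})

# Hypothetical: both batters face left-handers in the same proportion s.
common = [p_hit({"A": s, "B": s})
          for s in (Fraction(0), Fraction(1, 2), Fraction(1))]

print(float(observed), [float(c) for c in common])
```

With the observed shares the result is 0.310.  With any common share it is a weighted average of 0.270 and 0.290 and so cannot fall outside that range.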

This is an example of how pooling results from two fundamentally different data sets (left-handed and right-handed pitchers) to create a single data set (overall performance) can be artificial and give inappropriate results.  It is not correct to say, for example, that A is a 0.260 hitter: he is either a 0.300 hitter or a 0.100 hitter, depending on whether he is facing a left-hander or a right-hander, and his “overall” performance can be made to take any value between 0.100 and 0.300 by changing how many of his at bats come against each type of pitcher.
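The dependence of A's pooled average on his mix of at bats can be sketched as follows.  The function overall_average is a name introduced here, and the sketch assumes that A's split averages stay fixed at 0.300 and 0.100 while only the number of at bats against each type of pitcher changes.

```python
from fractions import Fraction

AVG_LEFT = Fraction(24, 80)   # A against left-handers, 0.300
AVG_RIGHT = Fraction(2, 20)   # A against right-handers, 0.100

def overall_average(at_bats_left, at_bats_right):
    """A's pooled average, assuming his split averages stay fixed while
    the number of at bats against each type of pitcher varies."""
    expected_hits = AVG_LEFT * at_bats_left + AVG_RIGHT * at_bats_right
    return expected_hits / (at_bats_left + at_bats_right)

# A few mixes of 100 at bats, from all left-handers to all right-handers.
for left, right in [(100, 0), (80, 20), (50, 50), (20, 80), (0, 100)]:
    print(left, right, float(overall_average(left, right)))
```

The mix of 80 and 20 at bats reproduces the 0.260 in the table; the other mixes give values ranging from 0.300 down to 0.100.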

Another example of this principle is the relationship between human weight and time to run a 100-yard dash.  For 100 college students selected at random there will likely be a significant negative correlation between weight and time (i.e., the heavier students will have the shorter times).  But when the males and females are considered separately, there will likely be a significant positive correlation between weight and time for each gender (i.e., the heavier students within each gender will have the longer times).  Because males tend to be heavier than females and because males tend to be faster than females, pooling the two genders into a single data set gives artificial and inappropriate results.
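The sign reversal can be illustrated with a small sketch.  The weights (in pounds) and times (in seconds) below are invented purely for illustration and are not measurements of any real students; the pearson helper is written out by hand so that the example needs nothing beyond the standard library.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented data: within each group, heavier runners are slower.
female_weight = [110, 120, 130, 140, 150]
female_time = [15.0, 15.2, 15.4, 15.6, 15.8]
male_weight = [150, 160, 170, 180, 190]
male_time = [12.0, 12.2, 12.4, 12.6, 12.8]

r_female = pearson(female_weight, female_time)
r_male = pearson(male_weight, male_time)
r_pooled = pearson(female_weight + male_weight, female_time + male_time)

print(r_female, r_male, r_pooled)
```

For these made-up numbers the correlation is positive within each group and negative once the two groups are pooled.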

Loyer’s Paradox and its resolution are discussed at length in the article “Can the Probability of an Event Be Larger or Smaller Than Each of Its Component Conditional Probabilities?” by Loyer and Sprechini in CHANCE Vol. 24, No. 1 [Winter 2011] pages 44-53.  CHANCE is a magazine of the American Statistical Association.

Note: Simpson’s Paradox also involves the pooling of data sets to produce a seeming inconsistency.  The CHANCE article demonstrates that Loyer’s Paradox is conceptually different from Simpson’s Paradox, and that no set of data can exhibit both paradoxes simultaneously.