See Train Ch.3 and PyMC example. Thanks to (mostly) Paula Navarrete Díaz for writing these notes.
setup
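Decision maker $n$ chooses among $J$ alternatives. Alternative $j$ gives utility $U_{nj} = V_{nj} + \varepsilon_{nj}$, and $n$ chooses $i$ iff $U_{ni} > U_{nj}$ for all $j \neq i$.
$V_{nj}$: representative utility (observed, parameterised: $V_{nj} = \beta^\top x_{nj}$). $\varepsilon_{nj}$: unobserved (random from the researcher's perspective).
Only the difference in observed utility, $V_{ni} - V_{nj}$, enters the choice probability:
$$P_{ni} = P\left(\varepsilon_{nj} - \varepsilon_{ni} < V_{ni} - V_{nj} \ \ \forall j \neq i\right).$$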
logit
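Assume $\varepsilon_{nj} \sim$ i.i.d. Gumbel (type I extreme value), with density $f(\varepsilon) = e^{-\varepsilon} e^{-e^{-\varepsilon}}$ and CDF $F(\varepsilon) = e^{-e^{-\varepsilon}}$.
Gumbel is an extreme value distribution, natural for argmax (largest of many unobserved factors). The difference of two independent Gumbels, $\varepsilon^* = \varepsilon_{nj} - \varepsilon_{ni}$, is logistic:
$$F(\varepsilon^*) = \frac{e^{\varepsilon^*}}{1 + e^{\varepsilon^*}},$$
which is the standard logistic CDF. This gives closed-form choice probabilities, unlike normal errors (probit, whose choice probabilities have no closed form and generally need simulation).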
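The claim that the difference of two Gumbels is logistic can be checked numerically. The following is a minimal sketch using only the standard library; the function names, the seed and the number of draws are illustrative choices, not part of the original notes.

```python
import math
import random

def sample_gumbel(rng):
    """Draw one standard Gumbel variate by inverse-CDF sampling."""
    u = rng.random()
    while u == 0.0:  # random() may return 0.0, where log(u) is undefined
        u = rng.random()
    return -math.log(-math.log(u))

def logistic_cdf(x):
    """Standard logistic CDF."""
    return 1.0 / (1.0 + math.exp(-x))

def empirical_cdf_of_difference(x, n_draws=20000, seed=0):
    """Fraction of simulated Gumbel differences that fall at or below x."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        diff = sample_gumbel(rng) - sample_gumbel(rng)
        if diff <= x:
            hits += 1
    return hits / n_draws

for x in (-1.0, 0.0, 1.5):
    print(x, empirical_cdf_of_difference(x), logistic_cdf(x))
```

The two printed columns should agree up to Monte Carlo noise, which shrinks as `n_draws` grows.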
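Derivation of $P_{ni}$
Conditional on $\varepsilon_{ni}$, alternative $i$ is chosen iff $\varepsilon_{nj} < \varepsilon_{ni} + V_{ni} - V_{nj}$ for every $j \neq i$. Integrate (Train 3.10) over all $\varepsilon_{ni}$, weighted by the Gumbel density:
$$P_{ni} = \int_{-\infty}^{\infty} \left( \prod_{j \neq i} e^{-e^{-(\varepsilon_{ni} + V_{ni} - V_{nj})}} \right) e^{-\varepsilon_{ni}} e^{-e^{-\varepsilon_{ni}}} \, d\varepsilon_{ni} = \frac{e^{V_{ni}}}{\sum_{j} e^{V_{nj}}},$$
i.e. softmax over representative utilities.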
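To make the softmax result concrete, the sketch below compares the closed-form probabilities with simulated choice shares obtained by adding Gumbel noise to each representative utility and taking the argmax. The utilities `v`, the seed and the helper names are assumptions made for illustration.

```python
import math
import random

def sample_gumbel(rng):
    """Standard Gumbel draw via inverse CDF: -log(-log(u)), u in (0, 1)."""
    u = rng.random()
    while u == 0.0:  # random() may return 0.0, where log(u) is undefined
        u = rng.random()
    return -math.log(-math.log(u))

def choice_probabilities(v):
    """Closed-form logit probabilities (softmax), max-shifted for stability."""
    v_max = max(v)
    weights = [math.exp(x - v_max) for x in v]
    total = sum(weights)
    return [w / total for w in weights]

def simulated_shares(v, n_draws=20000, seed=1):
    """Share of draws in which each alternative has the highest utility."""
    rng = random.Random(seed)
    counts = [0] * len(v)
    for _ in range(n_draws):
        utilities = [x + sample_gumbel(rng) for x in v]
        counts[utilities.index(max(utilities))] += 1
    return [c / n_draws for c in counts]

v = [1.0, 0.0, -0.5]  # illustrative representative utilities
exact = choice_probabilities(v)
approx = simulated_shares(v)
print(exact)
print(approx)
```

Subtracting `max(v)` before exponentiating leaves the probabilities unchanged (only utility differences matter) and avoids overflow for large utilities.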
IIA and limitations
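Independence of irrelevant alternatives. The ratio
$$\frac{P_{ni}}{P_{nk}} = \frac{e^{V_{ni}}}{e^{V_{nk}}} = e^{V_{ni} - V_{nk}}$$
depends only on alternatives $i$ and $k$, not on which other alternatives are available or on their attributes.
Follows directly from the i.i.d. assumption: the unobserved component of one alternative carries no information about that of any other.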
i.i.d. Gumbel fails with correlated alternatives (close substitutes), taste variation across individuals (random coefficients needed), and repeated choices / panel data (unobserved individual effects). These require extensions such as nested logit (correlated errors within nests) and mixed logit (random coefficients).
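A small numerical illustration of IIA, using the textbook red bus / blue bus setup as an assumed example (the alternative names and utilities are hypothetical): adding a perfect substitute leaves the car-to-bus odds untouched, which is exactly the behaviour that is implausible for close substitutes.

```python
import math

def choice_probabilities(v):
    """Logit probabilities; v maps alternative name -> representative utility."""
    v_max = max(v.values())
    weights = {j: math.exp(x - v_max) for j, x in v.items()}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

# Hypothetical utilities: car and blue bus are equally attractive.
before = choice_probabilities({"car": 0.0, "blue_bus": 0.0})

# Add a red bus that is identical to the blue bus in every observed respect.
after = choice_probabilities({"car": 0.0, "blue_bus": 0.0, "red_bus": 0.0})

ratio_before = before["car"] / before["blue_bus"]
ratio_after = after["car"] / after["blue_bus"]
print(ratio_before, ratio_after)  # the odds ratio is unchanged
print(before["car"], after["car"])  # the car share falls from 1/2 to 1/3
```

Intuitively the new bus should mostly split the existing bus share, leaving the car near one half. The logit model cannot express that, because the ratio of any two probabilities ignores the rest of the choice set.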
interpretation
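The marginal effect of an increase in $V_{ni}$ on the probability of choosing $i$ is
$$\frac{\partial P_{ni}}{\partial V_{ni}} = P_{ni}(1 - P_{ni}),$$
which is maximised at $P_{ni} = 1/2$: a change in representative utility moves the choice probability most when the decision maker is close to indifferent, and very little when $P_{ni}$ is near 0 or 1.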
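The sketch below checks this interpretation numerically, assuming the own-utility marginal effect of the logit model is $P(1-P)$. It compares a finite-difference derivative with that expression and then sweeps a binary logit to locate where the effect peaks. The utilities and grid are illustrative assumptions.

```python
import math

def choice_probabilities(v):
    """Logit probabilities (softmax) for a list of representative utilities."""
    v_max = max(v)
    weights = [math.exp(x - v_max) for x in v]
    total = sum(weights)
    return [w / total for w in weights]

def own_marginal_effect(v, i, h=1e-6):
    """Central finite difference of P_i with respect to V_i."""
    up = list(v)
    up[i] += h
    down = list(v)
    down[i] -= h
    return (choice_probabilities(up)[i] - choice_probabilities(down)[i]) / (2 * h)

v = [0.3, -0.2, 1.1]  # illustrative utilities
p = choice_probabilities(v)
numeric = own_marginal_effect(v, 0)
analytic = p[0] * (1.0 - p[0])

# Binary logit with the other utility fixed at 0: sweep V_1 and find the peak.
grid = [k / 10 for k in range(-40, 41)]

def effect_at(v1):
    p1 = choice_probabilities([v1, 0.0])[0]
    return p1 * (1.0 - p1)

peak_v1 = max(grid, key=effect_at)
peak_p = choice_probabilities([peak_v1, 0.0])[0]
print(numeric, analytic, peak_v1, peak_p)
```

In the sweep the peak sits where the two utilities are equal, so the choice probability there is one half.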
reconstructing the counterfactual
We only observe the decision maker’s choice from a constrained set but want counterfactual propensities over the full set. e.g. true choice set
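In general: full set $\mathcal{J}$, shown (constrained) set $\mathcal{S} \subseteq \mathcal{J}$, and observed choice $c \in \mathcal{S}$ (dropping the decision-maker index $n$).
The revealed-but-rejected alternatives $\mathcal{S} \setminus \{c\}$ each have lower utility than $c$, so none of them can be the best alternative in the full set.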
counterfactual posterior
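Define the ex-ante softmax (as above) over the full set $\mathcal{J}$:
$$p_j = \frac{e^{V_j}}{\sum_{k \in \mathcal{J}} e^{V_k}}, \qquad j \in \mathcal{J}.$$
Conditioning on the observed choice $c$ from the shown set $\mathcal{S} \subseteq \mathcal{J}$, and writing $j^* = \arg\max_{k \in \mathcal{J}} U_k$ for the best alternative in the full set:
$$P(j^* = j \mid c \text{ chosen from } \mathcal{S}) = \begin{cases} \sum_{k \in \mathcal{S}} p_k & j = c, \\ 0 & j \in \mathcal{S} \setminus \{c\}, \\ p_j & j \notin \mathcal{S}. \end{cases}$$
The shown-but-not-chosen alternatives drop to zero; their mass transfers entirely to $c$.
When $\mathcal{S} = \mathcal{J}$ the posterior is a point mass on $c$; when $\mathcal{S} = \{c\}$ it equals the ex-ante softmax.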
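A minimal sketch of the posterior as a function, assuming the rule described in this section: shown-but-not-chosen alternatives get zero, the chosen alternative takes the ex-ante mass of the whole shown set, and unshown alternatives keep their ex-ante probability. The function and argument names are hypothetical.

```python
import math

def softmax(v):
    """Ex-ante logit probabilities; v maps alternative -> representative utility."""
    v_max = max(v.values())
    weights = {j: math.exp(x - v_max) for j, x in v.items()}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

def counterfactual_posterior(v, shown, chosen):
    """Posterior over the full-set best alternative, given a choice from `shown`."""
    if chosen not in shown:
        raise ValueError("the chosen alternative must be in the shown set")
    if not set(shown) <= set(v):
        raise ValueError("the shown set must be a subset of the full set")
    prior = softmax(v)
    posterior = {}
    for j in v:
        if j == chosen:
            # The chosen alternative absorbs the mass of the whole shown set.
            posterior[j] = sum(prior[k] for k in shown)
        elif j in shown:
            # Shown but rejected: cannot be the best overall.
            posterior[j] = 0.0
        else:
            # Unshown: unaffected by the constrained observation.
            posterior[j] = prior[j]
    return posterior

v = {"a": 1.0, "b": 0.5, "c": 0.0, "d": -0.5}  # illustrative utilities
prior = softmax(v)
posterior = counterfactual_posterior(v, shown={"a", "b"}, chosen="b")
print(posterior)
```

Note that the chosen alternative need not be the one with the highest representative utility; here `b` was chosen over `a`, and `b` still collects the ex-ante mass of both.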
proof via Gumbel max-stability
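Intuitively: max-stability means the maximum of independent Gumbel variables is itself Gumbel. Splitting the choice set into the shown set $\mathcal{S}$ and the unshown set $\mathcal{J} \setminus \mathcal{S}$, the best utility within each group is a single Gumbel variable, so the comparison between groups reduces to a binary logit.
Fact 1 (Max-stability). If $U_k = V_k + \varepsilon_k$ with $\varepsilon_k$ i.i.d. standard Gumbel, then for any subset $A$ of alternatives
$$P\left(\max_{k \in A} U_k \le m\right) = \prod_{k \in A} e^{-e^{-(m - V_k)}} = \exp\left(-e^{-\left(m - \log \sum_{k \in A} e^{V_k}\right)}\right),$$
i.e. $\max_{k \in A} U_k$ is Gumbel with location $\log \sum_{k \in A} e^{V_k}$ and unit scale.
Fact 2 (Softmax). For $i \in A$,
$$P\left(\arg\max_{k \in A} U_k = i\right) = \frac{e^{V_i}}{\sum_{k \in A} e^{V_k}}.$$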
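Max-stability can be checked by simulation. The sketch below assumes the standard statement that the maximum of $V_k + \varepsilon_k$, with i.i.d. standard Gumbel $\varepsilon_k$, is Gumbel with location $\log \sum_k e^{V_k}$. The utilities, seed and helper names are illustrative.

```python
import math
import random

def sample_gumbel(rng):
    """Standard Gumbel draw via inverse CDF."""
    u = rng.random()
    while u == 0.0:  # random() may return 0.0, where log(u) is undefined
        u = rng.random()
    return -math.log(-math.log(u))

def log_sum_exp(v):
    """Numerically stable log of the sum of exponentials."""
    v_max = max(v)
    return v_max + math.log(sum(math.exp(x - v_max) for x in v))

def gumbel_cdf(x, location):
    """CDF of a Gumbel variable with the given location and unit scale."""
    return math.exp(-math.exp(-(x - location)))

def empirical_cdf_of_max(v, x, n_draws=20000, seed=2):
    """Fraction of draws in which the best utility is at or below x."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        best = max(vk + sample_gumbel(rng) for vk in v)
        if best <= x:
            hits += 1
    return hits / n_draws

v = [1.0, 0.0, -0.5]  # illustrative representative utilities
location = log_sum_exp(v)
for x in (1.0, 2.0, 3.0):
    print(x, empirical_cdf_of_max(v, x), gumbel_cdf(x, location))
```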
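Unshown best is independent of constrained best. Let $\mathcal{H} = \mathcal{J} \setminus \mathcal{S}$ be the unshown alternatives, $j^* = \arg\max_{k \in \mathcal{J}} U_k$ the best in the full set and $c^* = \arg\max_{k \in \mathcal{S}} U_k$ the best in the shown set. For $j \in \mathcal{H}$ and $c \in \mathcal{S}$,
$$P(j^* = j, \ c^* = c) = P(j^* = j) \, P(c^* = c).$$
Proof. The joint event is
$$\{U_c > U_k \ \forall k \in \mathcal{S} \setminus \{c\}\} \cap \{U_j > U_c\} \cap \{U_j > U_k \ \forall k \in \mathcal{H} \setminus \{j\}\}.$$
Condition on $U_j = u$ and write $F(x) = e^{-e^{-x}}$, $f(x) = e^{-x} e^{-e^{-x}}$, $M_{\mathcal{S}} = \max_{k \in \mathcal{S}} U_k$ and $A_{\mathcal{S}} = \sum_{k \in \mathcal{S}} e^{V_k}$. Because the $\varepsilon_k$ are independent across alternatives,
$$P(j^* = j, \ c^* = c \mid U_j = u) = \prod_{k \in \mathcal{H} \setminus \{j\}} F(u - V_k) \cdot P(c^* = c, \ M_{\mathcal{S}} < u),$$
with
$$P(c^* = c, \ M_{\mathcal{S}} < u) = \int_{-\infty}^{u} f(t - V_c) \prod_{k \in \mathcal{S} \setminus \{c\}} F(t - V_k) \, dt = \int_{-\infty}^{u} e^{V_c} e^{-t} e^{-A_{\mathcal{S}} e^{-t}} \, dt.$$
Substituting $s = e^{-t}$ gives
$$P(c^* = c, \ M_{\mathcal{S}} < u) = \frac{e^{V_c}}{A_{\mathcal{S}}} \, e^{-A_{\mathcal{S}} e^{-u}} = P(c^* = c) \, P(M_{\mathcal{S}} < u)$$
by Facts 1 and 2: the identity of the constrained best is independent of its utility level. Integrating over $u$ against the density $f(u - V_j)$, and using $P(M_{\mathcal{S}} < u) = \prod_{k \in \mathcal{S}} F(u - V_k)$,
$$P(j^* = j, \ c^* = c) = P(c^* = c) \int_{-\infty}^{\infty} f(u - V_j) \prod_{k \in \mathcal{J} \setminus \{j\}} F(u - V_k) \, du = P(c^* = c) \, P(j^* = j). \qquad \blacksquare$$
Deriving the counterfactual posterior. The independence result gives, for every unshown $j \in \mathcal{H}$,
$$P(j^* = j \mid c^* = c) = P(j^* = j) = p_j, \qquad p_j = \frac{e^{V_j}}{\sum_{k \in \mathcal{J}} e^{V_k}}.$$
So unshown alternatives are unaffected by the constrained observation.
Since every $k \in \mathcal{S} \setminus \{c\}$ satisfies $U_k < U_c$, none of them can be the full-set maximiser: $P(j^* = k \mid c^* = c) = 0$.
Finally, conditional probabilities over the full set sum to one:
$$\sum_{k \in \mathcal{J}} P(j^* = k \mid c^* = c) = 1.$$
Substituting the two results above,
$$P(j^* = c \mid c^* = c) = 1 - \sum_{j \in \mathcal{H}} p_j.$$
Partition $\mathcal{J} = \mathcal{S} \cup \mathcal{H}$ and use $\sum_{k \in \mathcal{J}} p_k = 1$:
$$P(j^* = c \mid c^* = c) = \sum_{k \in \mathcal{S}} p_k = p_c + \sum_{k \in \mathcal{S} \setminus \{c\}} p_k,$$
i.e. the chosen alternative absorbs all the mass from the shown-but-not-chosen alternatives.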
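As a sanity check on the result, the following sketch simulates Gumbel utilities, keeps only the draws in which the observed alternative wins within the shown set, and tallies which alternative is best over the full set. The utilities, the shown set, the chosen alternative, and the assumed closed form in `expected` are illustrative inputs, not values from the notes.

```python
import math
import random

def sample_gumbel(rng):
    """Standard Gumbel draw via inverse CDF."""
    u = rng.random()
    while u == 0.0:  # random() may return 0.0, where log(u) is undefined
        u = rng.random()
    return -math.log(-math.log(u))

def simulate_posterior(v, shown, chosen, n_draws=40000, seed=3):
    """Empirical distribution of the full-set best, given `chosen` wins in `shown`."""
    rng = random.Random(seed)
    names = sorted(v)
    counts = {j: 0 for j in names}
    kept = 0
    for _ in range(n_draws):
        u = {j: v[j] + sample_gumbel(rng) for j in names}
        if max(shown, key=u.get) != chosen:
            continue  # discard draws inconsistent with the observed choice
        kept += 1
        counts[max(names, key=u.get)] += 1
    return {j: counts[j] / kept for j in names}

v = {"a": 1.0, "b": 0.5, "c": 0.0, "d": -0.5}  # illustrative utilities
shown = {"a", "b"}
weights = {j: math.exp(x) for j, x in v.items()}
total = sum(weights.values())
prior = {j: w / total for j, w in weights.items()}

# Assumed closed form: chosen absorbs the shown mass, unshown keep their prior.
expected = {"a": 0.0, "b": prior["a"] + prior["b"], "c": prior["c"], "d": prior["d"]}
simulated = simulate_posterior(v, shown, chosen="b")
print(expected)
print(simulated)
```

The rejection step mirrors the conditioning event directly, so the simulated frequencies do not rely on the independence lemma and provide an independent check of it.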