See Train Ch.3 and PyMC example. Thanks to (mostly) Paula Navarrete Díaz for writing these notes.

setup

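Decision maker $n$ faces $J$ alternatives. Utility of alternative $j$ for person $n$: $U_{nj} = V_{nj} + \varepsilon_{nj}$, where

  • $V_{nj}$: representative utility (observed, parameterised: $V_{nj} = \beta' x_{nj}$)
  • $\varepsilon_{nj}$: unobserved (random from the researcher’s perspective)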

$n$ chooses $i$ iff $U_{ni} > U_{nj}$ for all $j \neq i$.

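Only the difference in observed utility, $V_{ni} - V_{nj}$, matters, not absolute values. The same holds for the unobserved part: only $\varepsilon_{nj} - \varepsilon_{ni}$ enters, since $P_{ni} = \Pr(\varepsilon_{nj} - \varepsilon_{ni} < V_{ni} - V_{nj}\ \ \forall j \neq i)$.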

logit

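Assume the $\varepsilon_{nj}$ are i.i.d. Gumbel (type I extreme value):

$$f(\varepsilon_{nj}) = e^{-\varepsilon_{nj}}\, e^{-e^{-\varepsilon_{nj}}}, \qquad F(\varepsilon_{nj}) = e^{-e^{-\varepsilon_{nj}}}.$$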

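Gumbel is an extreme value distribution, natural for argmax (largest of many unobserved factors). The difference of two independent Gumbels is logistic: with $\varepsilon^*_{nji} = \varepsilon_{nj} - \varepsilon_{ni}$,

$$F(\varepsilon^*_{nji}) = \frac{e^{\varepsilon^*_{nji}}}{1 + e^{\varepsilon^*_{nji}}},$$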

which is the standard logistic CDF. This gives closed-form choice probabilities, unlike normal errors (probit, which needs simulation).
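Here is a quick Monte Carlo sketch of the logistic claim, assuming NumPy is available. It draws two independent standard Gumbel samples with `numpy.random.Generator.gumbel`, takes their difference, and compares its empirical CDF with $e^x/(1+e^x)$ at a few points. The seed, sample size, and evaluation grid are arbitrary choices.

```python
import numpy as np

# Independent standard Gumbel (type I extreme value) draws for two alternatives.
rng = np.random.default_rng(0)
n_draws = 200_000
eps_i = rng.gumbel(loc=0.0, scale=1.0, size=n_draws)
eps_j = rng.gumbel(loc=0.0, scale=1.0, size=n_draws)
diff = eps_j - eps_i  # epsilon*_{nji}


def logistic_cdf(x):
    """Standard logistic CDF: e^x / (1 + e^x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + np.exp(-x))


# Empirical CDF of the difference vs. the logistic CDF on a small grid.
grid = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
empirical = np.array([(diff <= x).mean() for x in grid])
theoretical = logistic_cdf(grid)
max_abs_error = np.max(np.abs(empirical - theoretical))
print("empirical:  ", np.round(empirical, 3))
print("logistic:   ", np.round(theoretical, 3))
print("max |error|:", max_abs_error)
```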

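Derivation of $P_{ni}$. Conditional on $\varepsilon_{ni}$, $i$ is chosen if every other $\varepsilon_{nj} < \varepsilon_{ni} + V_{ni} - V_{nj}$. Since the $\varepsilon_{nj}$ are independent Gumbel:

$$P_{ni} \mid \varepsilon_{ni} = \prod_{j \neq i} e^{-e^{-(\varepsilon_{ni} + V_{ni} - V_{nj})}}.$$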

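Integrate (Train 3.10) over all $\varepsilon_{ni}$, weighted by its density:

$$P_{ni} = \int \Big(\prod_{j \neq i} e^{-e^{-(\varepsilon_{ni} + V_{ni} - V_{nj})}}\Big)\, e^{-\varepsilon_{ni}}\, e^{-e^{-\varepsilon_{ni}}}\, d\varepsilon_{ni} = \frac{e^{V_{ni}}}{\sum_j e^{V_{nj}}},$$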

i.e. softmax over representative utilities.
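The integral can also be checked numerically. This is a standard-library-only sketch (no SciPy). It approximates the integral over $\varepsilon_{ni}$ with the midpoint rule and compares the result with the softmax. The helper names, the integration window $[-10, 40]$, and the example utilities are illustrative assumptions. The window works because the Gumbel density is negligible outside it.

```python
import math


def integrand(e, v, i):
    """Gumbel density of eps_i at e, times Pr(eps_j < e + V_i - V_j for all j != i).

    Computed on the log scale so very negative e underflows to 0 instead of overflowing.
    """
    log_val = -e - math.exp(-e)  # log of the standard Gumbel density
    for j, v_j in enumerate(v):
        if j != i:
            log_val -= math.exp(-(e + v[i] - v_j))  # log of the Gumbel CDF term
    return math.exp(log_val)


def logit_prob_by_integration(v, i, lo=-10.0, hi=40.0, n=20_000):
    """Midpoint-rule approximation of P_i = integral of integrand over eps_i."""
    h = (hi - lo) / n
    return h * sum(integrand(lo + (k + 0.5) * h, v, i) for k in range(n))


def softmax(v):
    """Closed-form logit probabilities e^{V_i} / sum_j e^{V_j}."""
    m = max(v)
    z = [math.exp(x - m) for x in v]  # subtract the max for numerical stability
    total = sum(z)
    return [x / total for x in z]


v = [1.0, 0.5, -0.2]  # illustrative representative utilities
numeric = [logit_prob_by_integration(v, i) for i in range(len(v))]
closed_form = softmax(v)
for i, (a, b) in enumerate(zip(numeric, closed_form)):
    print(f"alternative {i}: integral {a:.6f}, softmax {b:.6f}")

# Only utility differences matter: shifting every V by a constant changes nothing.
shifted = softmax([x + 10.0 for x in v])
```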

IIA and limitations

Independence of irrelevant alternatives. The ratio $\dfrac{P_{ni}}{P_{nk}} = \dfrac{e^{V_{ni}}}{e^{V_{nk}}} = e^{V_{ni} - V_{nk}}$ depends only on $i$ and $k$, not on any other alternative.

This follows directly from the i.i.d. assumption: the $\varepsilon_{nj}$ have no correlation structure across alternatives.
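A small numerical illustration of IIA, assuming NumPy. It adds a third alternative whose representative utility equals the second's, in the spirit of the red bus / blue bus example in Train Ch. 3. The ratio $P_{n1}/P_{n2}$ stays at $e^{V_{n1} - V_{n2}}$, even though a near-duplicate should realistically take most of its share from the alternative it copies. The utility values are made up for illustration.

```python
import numpy as np


def logit_probs(v):
    """Logit (softmax) choice probabilities over representative utilities v."""
    v = np.asarray(v, dtype=float)
    z = np.exp(v - v.max())  # subtract the max for numerical stability
    return z / z.sum()


v_two = np.array([0.8, 0.3])          # alternatives 1 and 2
v_three = np.array([0.8, 0.3, 0.3])   # add alternative 3, a copy of alternative 2

p_two = logit_probs(v_two)
p_three = logit_probs(v_three)

ratio_two = p_two[0] / p_two[1]
ratio_three = p_three[0] / p_three[1]
print("P with two alternatives:  ", np.round(p_two, 3))
print("P with three alternatives:", np.round(p_three, 3))
print("ratio P1/P2:", ratio_two, ratio_three, "exp(V1 - V2):", np.exp(0.8 - 0.3))
```

Alternative 1's share still falls when the copy is added. Logit draws the new alternative's share proportionally from every existing alternative, rather than mostly from its close substitute.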

i.i.d. Gumbel fails with correlated alternatives (close substitutes), taste variation across individuals (random coefficients needed), and repeated choices / panel data (unobserved individual effects). These require nested logit (correlated errors within nests), mixed logit (random coefficients, $\beta_n$ varies across $n$), or normal errors (probit, multivariate normal $\varepsilon_n$).

interpretation

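An increase in $V_{ni}$ has the greatest effect on $P_{ni}$ when $P_{ni} = 0.5$. At high probabilities (and, symmetrically, at low ones), further increases in representative utility have little effect on the choice probability. Marginal effect:

$$\frac{\partial P_{ni}}{\partial V_{ni}} = P_{ni}(1 - P_{ni}),$$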

which is maximised at $P_{ni} = 0.5$ (where it equals $0.25$).
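This sketch checks the marginal-effect formula numerically, assuming NumPy. For simplicity it uses two alternatives with $V_{nj}$ fixed at 0; the formula $P_{ni}(1 - P_{ni})$ itself holds for any number of alternatives. It compares the analytic derivative with a finite-difference derivative from `np.gradient` and locates the maximum.

```python
import numpy as np

v_grid = np.linspace(-6.0, 6.0, 1201)    # values of V_ni; V_nj = 0 for the other alternative
p = 1.0 / (1.0 + np.exp(-v_grid))          # P_ni = e^{V_ni} / (e^{V_ni} + e^0)
analytic = p * (1.0 - p)                   # dP_ni/dV_ni = P_ni (1 - P_ni)
numeric = np.gradient(p, v_grid)           # finite-difference derivative

k = np.argmax(analytic)
print("V_ni at max effect:", v_grid[k])
print("P_ni at max effect:", p[k])
print("max marginal effect:", analytic[k])
print("max |analytic - numeric|:", np.max(np.abs(analytic - numeric)))
```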

reconstructing the counterfactual

We only observe the decision maker’s choice from a constrained set but want counterfactual propensities over the full set. For example, the true choice set contains more alternatives than were shown; only a subset was shown, and one of the shown alternatives was chosen. What are the propensities under the full set?

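In general: full set $C$; shown set $S \subseteq C$. Decision maker chooses $i \in S$. Unshown alternatives $\bar S = C \setminus S$. From here on the person subscript $n$ is dropped: $U_j = V_j + \varepsilon_j$.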

The revealed-but-rejected alternatives are like Monty Hall doors opened to reveal goats, except here the mass flows to the chosen door rather than away from it (Backwards Monty Hall). The chosen alternative $i$ absorbs all probability mass from the shown-but-not-chosen alternatives $S \setminus \{i\}$. The unshown alternatives $k \in \bar S$ retain their ex-ante probability. This is a consequence of Gumbel max-stability.

counterfactual posterior

Define the ex-ante softmax (as above) over the full set:

$$p_j = \frac{e^{V_j}}{\sum_{m \in C} e^{V_m}}, \qquad j \in C.$$

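Conditioning on the observed choice and on the shown-but-not-chosen alternatives being rejected, the posterior probability that $j$ would be best over the full set is:

$$q_j = \Pr\Big(j = \arg\max_{m \in C} U_m \;\Big|\; i = \arg\max_{m \in S} U_m\Big) = \begin{cases} p_i + \sum_{l \in S \setminus \{i\}} p_l & j = i \\ 0 & j \in S \setminus \{i\} \\ p_j & j \in \bar S \end{cases}$$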

The shown-but-not-chosen alternatives drop to zero; their mass transfers entirely to $i$. The unshown alternatives are unaffected: their conditional probability equals their ex-ante probability. This is a special property of the Gumbel. Maxima over the disjoint sets $S$ and $\bar S$ are independent, and within $S$ the identity of the maximiser is independent of the maximum's value (max-stability). The joint probability therefore splits into the product of marginals, so knowing that $i$ beat $S \setminus \{i\}$ tells you nothing about how an unshown $k \in \bar S$ compares to the best of $S$.
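Below is a minimal sketch of the posterior $q_j$ (the probability that $j$ would be best over the full set, given the observed choice), assuming NumPy. It includes a Monte Carlo check. The simulation draws $U_j = V_j + \varepsilon_j$, keeps only draws where the chosen alternative wins within the shown set, and tabulates which alternative is best over the full set. The function name `counterfactual_posterior`, the utilities, and the choice of shown set are hypothetical.

```python
import numpy as np


def counterfactual_posterior(v, shown, chosen):
    """Posterior q_j over the full set C, given `chosen` was best among `shown`.

    v: representative utilities V_j for every alternative in C (index = alternative).
    shown: indices of the shown set S. chosen: the index i in S that was chosen.
    """
    v = np.asarray(v, dtype=float)
    shown = np.asarray(shown)
    if chosen not in shown:
        raise ValueError("chosen must be one of the shown alternatives")
    p = np.exp(v - v.max())
    p /= p.sum()                    # ex-ante softmax p_j over C
    q = p.copy()                    # unshown alternatives keep p_k
    q[shown] = 0.0                  # shown-but-not-chosen get zero mass
    q[chosen] = p[shown].sum()      # chosen absorbs all of S's ex-ante mass
    return q


# Hypothetical example: C = {0, 1, 2, 3}, S = {0, 1}, alternative 0 chosen.
v = np.array([0.2, 1.0, -0.5, 0.4])
shown, chosen = [0, 1], 0
q = counterfactual_posterior(v, shown, chosen)

# Monte Carlo: condition on `chosen` winning within S, then find the overall winner.
rng = np.random.default_rng(2)
n_draws = 400_000
u = v + rng.gumbel(size=(n_draws, v.size))
keep = u[:, shown].argmax(axis=1) == shown.index(chosen)
overall = u[keep].argmax(axis=1)
mc = np.bincount(overall, minlength=v.size) / keep.sum()

print("analytic q:   ", np.round(q, 3))
print("Monte Carlo q:", np.round(mc, 3))
```

Passing every index as `shown` reproduces the unconstrained case: the chosen alternative gets probability 1.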

When $S = C$ (unconstrained), $\bar S = \varnothing$ and $q_i = \sum_{l \in C} p_l = 1$.

proof via Gumbel max-stability

Intuitively: max-stability means the maximum of independent Gumbel variables with a common scale is itself Gumbel. Split the choice set into $S$ and $\bar S$. The events “unshown $k$ is overall best” and “$i$ is best among shown” involve maxima over disjoint sets. Max-stability makes these factorise ($\Pr(\text{both}) = \Pr(\text{first})\,\Pr(\text{second})$, i.e. the events are independent).

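Setup: $U_j = V_j + \varepsilon_j$, $j \in C$, with $\varepsilon_j \sim \text{Gumbel}(0, 1)$ i.i.d., so $U_j \sim \text{Gumbel}(V_j, 1)$.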

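Fact 1 (Max-stability). $\max_{j \in D} U_j \sim \text{Gumbel}\big(\log \sum_{j \in D} e^{V_j},\, 1\big)$ for independent $U_j \sim \text{Gumbel}(V_j, 1)$ and any $D \subseteq C$.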
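Fact 1 can be checked by simulation. This sketch, assuming NumPy, draws independent $U_j \sim \text{Gumbel}(V_j, 1)$ for some illustrative $V_j$ and takes the maximum. It compares the maximum's empirical CDF with the Gumbel CDF $\exp(-e^{-(x - \mu)})$ at location $\mu = \log \sum_j e^{V_j}$. It also checks the mean against $\mu + \gamma$, where $\gamma$ is the Euler–Mascheroni constant (the mean of a standard Gumbel).

```python
import numpy as np

rng = np.random.default_rng(3)
v = np.array([0.3, -1.0, 1.2])                 # illustrative locations V_j
n_draws = 400_000
u = v + rng.gumbel(size=(n_draws, v.size))     # independent U_j ~ Gumbel(V_j, 1)
m = u.max(axis=1)                              # max_j U_j

mu = np.log(np.exp(v).sum())                   # Fact 1: location of the maximum
grid = np.array([-1.0, 0.5, 1.5, 3.0])
empirical = np.array([(m <= x).mean() for x in grid])
theoretical = np.exp(-np.exp(-(grid - mu)))    # Gumbel(mu, 1) CDF

print("empirical CDF:", np.round(empirical, 3))
print("Gumbel CDF:   ", np.round(theoretical, 3))
print("mean of max:", m.mean(), "vs mu + gamma:", mu + np.euler_gamma)
```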

Fact 2 (Softmax). $\Pr\big(j = \arg\max_{m \in D} U_m\big) = \dfrac{e^{V_j}}{\sum_{m \in D} e^{V_m}}$ for $j \in D$.

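Unshown best is independent of constrained best. For $k \in \bar S$ and $i \in S$, the events $A_k = \{k = \arg\max_{m \in C} U_m\}$ (unshown $k$ is overall best) and $B_i = \{i = \arg\max_{m \in S} U_m\}$ ($i$ is best among shown) are independent:

$$\Pr(A_k \cap B_i) = \Pr(A_k)\,\Pr(B_i) = p_k \cdot \frac{e^{V_i}}{\sum_{m \in S} e^{V_m}}.$$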

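Proof. The joint event is $\{U_k > M_S\} \cap \{U_k > M_R\} \cap B_i$, where $M_S = \max_{m \in S} U_m$ and $M_R = \max_{m \in \bar S \setminus \{k\}} U_m$ (take $M_R = -\infty$ if $\bar S = \{k\}$). By Fact 1, $M_S \sim \text{Gumbel}\big(\log \sum_{m \in S} e^{V_m}, 1\big)$ and $M_R \sim \text{Gumbel}\big(\log \sum_{m \in \bar S \setminus \{k\}} e^{V_m}, 1\big)$.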

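Condition on $U_k = u$. Then $M_R$ and $(M_S, B_i)$ depend on disjoint sets of independent utilities, so:

$$\Pr(A_k \cap B_i \mid U_k = u) = \Pr(M_R < u)\,\Pr(M_S < u,\ B_i).$$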

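Substituting $Z_S = \sum_{m \in S} e^{V_m}$, the Gumbel densities, and $\Pr(M_S < u) = e^{-Z_S e^{-u}}$ factors the integral. The integrand below is the density of $U_i$ at $t$ times $\Pr(U_j < t)$ for the other shown $j$:

$$\Pr(M_S < u,\ B_i) = \int_{-\infty}^{u} e^{V_i - t}\, e^{-Z_S e^{-t}}\, dt = \frac{e^{V_i}}{Z_S}\, e^{-Z_S e^{-u}} = \Pr(B_i)\,\Pr(M_S < u),$$

$$\Pr(A_k \cap B_i) = \Pr(B_i) \int f_k(u)\, \Pr(M_R < u)\, \Pr(M_S < u)\, du = \Pr(B_i)\, \Pr(A_k),$$

where $f_k$ is the density of $U_k$. The remaining integral is the probability that $k$ beats every other alternative in $C$.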
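The key step is that the best shown utility $M_S$ and the event $B_i$ (alternative $i$ wins within the shown set) factorise: which alternative wins is independent of how large the winning utility is. Below is a simulation sketch of that step, assuming NumPy, with illustrative utilities for a three-element shown set and $i$ taken as index 0. It compares the joint probability, the product of marginals, and the closed form $\frac{e^{V_i}}{Z_S} e^{-Z_S e^{-u}}$ at a few thresholds $u$.

```python
import numpy as np

rng = np.random.default_rng(4)
v_shown = np.array([0.5, -0.3, 0.9])          # illustrative V_j for j in S
n_draws = 400_000
u = v_shown + rng.gumbel(size=(n_draws, v_shown.size))
m_s = u.max(axis=1)                            # M_S, the best shown utility
winner = u.argmax(axis=1)                      # index of the best shown alternative

z_s = np.exp(v_shown).sum()                    # Z_S
p_b = np.exp(v_shown[0]) / z_s                 # Pr(B_i) for i = index 0 (Fact 2)

thresholds = np.array([0.0, 1.0, 2.0, 3.0])   # values of u
joint = np.array([np.mean((winner == 0) & (m_s < t)) for t in thresholds])
product = np.array([p_b * np.mean(m_s < t) for t in thresholds])
closed_form = p_b * np.exp(-z_s * np.exp(-thresholds))  # (e^{V_i}/Z_S) e^{-Z_S e^{-u}}

print("Pr(M_S < u, B_i):   ", np.round(joint, 4))
print("Pr(B_i) Pr(M_S < u):", np.round(product, 4))
print("closed form:        ", np.round(closed_form, 4))
```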

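Deriving the counterfactual posterior. The independence result gives $\Pr(A_k \cap B_i) = \Pr(A_k)\,\Pr(B_i)$ for $k \in \bar S$. By the definition of conditional probability:

$$q_k = \Pr(A_k \mid B_i) = \frac{\Pr(A_k \cap B_i)}{\Pr(B_i)} = \Pr(A_k) = p_k.$$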

So unshown alternatives are unaffected by the constrained observation.

Since $i$ was chosen over every other shown alternative, $U_i > U_l$ for all $l \in S \setminus \{i\}$. This gives $q_l = 0$: shown-but-not-chosen alternatives get zero mass.

Finally, conditional probabilities over the full set must sum to 1:

$$q_i + \sum_{l \in S \setminus \{i\}} q_l + \sum_{k \in \bar S} q_k = 1.$$

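Substituting $q_l = 0$ for $l \in S \setminus \{i\}$ and $q_k = p_k$ for $k \in \bar S$:

$$q_i = 1 - \sum_{k \in \bar S} p_k.$$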

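Partition $C$ into $S$ and $\bar S$: $\sum_{l \in S} p_l + \sum_{k \in \bar S} p_k = 1$ (one alternative in $C$ is chosen), so:

$$q_i = \sum_{l \in S} p_l = p_i + \sum_{l \in S \setminus \{i\}} p_l,$$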

i.e. the chosen alternative absorbs all the mass from the shown-but-not-chosen alternatives.