Two Very Different Questions


"Which make of car (Ford or Honda) is currently costing more to maintain?"
"Which make of car (Ford or Honda) costs more to maintain?"


There's a subtle difference between these two questions. Indeed, for the motorpool dataset, the answers are different – "Ford" for the first question, and "Honda" for the second.

One way to see the difference is to ask yourself how you'd approach each question: by comparing two distinct vehicles, or by taking a single vehicle and imagining a change in just one aspect of that vehicle.

For the first question, we'd want to compare the "typical" Ford in the motorpool with the "typical" Honda – two distinct vehicles, which differ in make, and possibly in other important ways as well. Indeed, the typical Ford is currently being driven several thousand miles farther in the course of a year than is the typical Honda. This extra mileage is increasing the maintenance expenses of Fords, to the point where those expenses are typically higher than for Hondas. We see this directly by making two predictions – one for a Ford, and the other for a Honda, knowing only the make of each car – and then comparing the predictions. The predictions come from a regression of Cost onto Make alone.
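As a rough illustration of this first kind of comparison, here is a minimal sketch in Python. The six-car fleet is made up for the example (it is not the motorpool dataset), and the variable names are our own. With Make as the only explanatory variable, the two predictions are simply the average cost of the Fords and the average cost of the Hondas.

```python
import numpy as np

# Hypothetical six-car fleet (made-up numbers, NOT the motorpool dataset).
is_honda = np.array([0, 0, 0, 1, 1, 1], dtype=float)  # 0 = Ford, 1 = Honda
cost = np.array([1300, 1600, 1600, 1150, 1150, 1450], dtype=float)  # annual cost

# Regress Cost onto Make alone: cost ~ b0 + b1 * is_honda.
X = np.column_stack([np.ones_like(is_honda), is_honda])
(b0, b1), *_ = np.linalg.lstsq(X, cost, rcond=None)

pred_ford = b0         # prediction knowing only that the car is a Ford
pred_honda = b0 + b1   # prediction knowing only that the car is a Honda
print(f"typical Ford: {pred_ford:.0f}, typical Honda: {pred_honda:.0f}")
```

In this invented fleet the Ford prediction comes out higher, but nothing in this regression says why: any difference in mileage or age between the two groups is folded into the Make coefficient.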

In contrast, for the second question we'd want to know the pure effect of make on annual maintenance costs. We can estimate this effect by taking a single hypothetical car – of a fixed age, driven a fixed number of miles – and then predicting the cost of maintaining this car for a year if it were a Ford, and if it were a Honda, and comparing the predictions. Both predictions would come from a regression of Cost onto all of the relevant explanatory variables, i.e., from the "most complete" model. In this way we can tweak just the make of this single car, while holding everything else about the car constant, and see the effect of the tweak.
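The second kind of comparison can be sketched the same way. Again the six cars below are invented for illustration (not the motorpool dataset), mileage is recorded in thousands of miles, and we assume that mileage and age are the only other relevant explanatory variables. The helper `predict` is our own name, not part of any library.

```python
import numpy as np

# Hypothetical six-car fleet (made-up numbers, NOT the motorpool dataset).
is_honda = np.array([0, 0, 0, 1, 1, 1], dtype=float)      # 0 = Ford, 1 = Honda
kmiles = np.array([18, 20, 22, 10, 12, 14], dtype=float)  # thousands of miles/year
age = np.array([2, 4, 3, 3, 2, 4], dtype=float)           # years
cost = np.array([1300, 1600, 1600, 1150, 1150, 1450], dtype=float)  # annual cost

# Regress Cost onto Make, Mileage and Age together (the "most complete" model here).
X = np.column_stack([np.ones_like(cost), is_honda, kmiles, age])
coef, *_ = np.linalg.lstsq(X, cost, rcond=None)

def predict(honda, kmiles_per_year, age_years):
    """Predicted annual cost for one hypothetical car."""
    return float(coef @ np.array([1.0, honda, kmiles_per_year, age_years]))

# One hypothetical car, 3 years old, driven 15 thousand miles a year:
# change only its make and compare the two predictions.
as_ford = predict(0, 15, 3)
as_honda = predict(1, 15, 3)
make_effect = as_honda - as_ford  # equals the coefficient on is_honda
print(f"as Ford: {as_ford:.0f}, as Honda: {as_honda:.0f}, effect of make: {make_effect:.0f}")
```

Because mileage and age are held fixed, the difference between the two predictions is exactly the Make coefficient from the fuller model. In this invented fleet it has the opposite sign from the Ford-versus-Honda gap one would get from a regression of Cost onto Make alone on the same six cars.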

It is of particular importance that a manager understand (and learn to "hear") the difference between these two questions, since typically only one of the two will be presented. Someone seeking to influence you to buy Hondas will raise the first question (and use the answer to argue against buying Fords). Someone seeking to influence you to buy Fords will raise the second question. [Assuming that, whatever cars are in the fleet, the same total mileage will be put on the cars in the course of a year, and that your primary goal is maintenance-cost minimization, it is the second question that should be raised (and Fords that should be favored).]