Abusing Correlations

Question
(received via email from a former student, all employer references removed)

“One of the pieces of the research is to identify key attributes that drive customers to choose a vendor for buying office products.

“The market research guy that we have hired (he is an MBA/PhD from Wharton) says the following:

“‘I can determine the relative importance of various attributes that drive overall satisfaction by running a correlation of each one of them against overall satisfaction score and then ranking them based on the coefficient scores.’

“I am not really certain if we can do that. I would tend to think we should run a regression to get relative weightage.”

Answer

I worked up an example by modifying the motorpool data. It examines how overall customer satisfaction (on a 100-point scale) varies with order leadtime (in days), product price, and availability of online order-tracking (0 = not available, 1 = available). Here are the correlations:

            satisfaction
leadtime      -0.76592
ol-track      -0.24214
price          0.09742

Each correlation shows only a two-dimensional shadow of the data. Read naively, the correlations "suggest" that online tracking is a bad thing, and that it is more important than price. They also suggest that price hardly matters at all and that, to the extent it does matter, higher prices lead to greater satisfaction.

And here's the full regression:

Regression: satisfaction

                    constant    leadtime       price    ol-track
coefficient          336.645    -6.88555    -14.4199     8.55990
std err of coef      31.1953     0.55354     2.50985     4.07288
t-ratio              10.7915    -12.4391     -5.7453      2.1017
significance         0.0000%     0.0000%     0.0000%     4.0092%
beta-weight                      -1.0879     -0.4571      0.1586

standard error of regression       13.929
coefficient of determination       75.03%
adjusted coef of determination     73.70%

The regression estimates show (very sensibly) that customers like short leadtimes and low prices, and that online tracking is a good thing (but the least important of the three explanatory variables in helping to explain the observed variance in customer satisfaction scores).
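
To make the contrast concrete, here is a minimal Python sketch that ranks the three attributes both ways, using only the figures reported above. It also rechecks that each t-ratio is the coefficient divided by its standard error. The variable names are illustrative and are not part of the original spreadsheet.

```python
# Figures copied from the correlation and regression output above.
correlation = {"leadtime": -0.76592, "ol-track": -0.24214, "price": 0.09742}
beta_weight = {"leadtime": -1.0879, "price": -0.4571, "ol-track": 0.1586}

coefficient = {"constant": 336.645, "leadtime": -6.88555,
               "price": -14.4199, "ol-track": 8.55990}
std_err = {"constant": 31.1953, "leadtime": 0.55354,
           "price": 2.50985, "ol-track": 4.07288}
reported_t = {"constant": 10.7915, "leadtime": -12.4391,
              "price": -5.7453, "ol-track": 2.1017}

# A t-ratio is the coefficient divided by its standard error.
t_ratio = {name: coefficient[name] / std_err[name] for name in coefficient}

# Rank attributes by absolute size, largest first.
by_correlation = sorted(correlation, key=lambda k: abs(correlation[k]), reverse=True)
by_beta = sorted(beta_weight, key=lambda k: abs(beta_weight[k]), reverse=True)

print("ranked by |correlation|:", by_correlation)
print("ranked by |beta-weight|:", by_beta)
for name in t_ratio:
    print(f"{name:>9}: computed t = {t_ratio[name]:.4f}, reported t = {reported_t[name]:.4f}")
```

The two rankings agree that leadtime comes first but swap price and online tracking, and the correlation ranking also carries the wrong sign for both of those attributes.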


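The "trick" to this data (not really a trick, since it is a natural phenomenon) is that leadtime and price are negatively correlated: generally, you have to pay more to get quicker delivery. Because leadtime has by far the strongest effect on satisfaction, a higher price tends to arrive together with a shorter leadtime and therefore with higher satisfaction, which hides the negative effect of price itself. The correlations between price and satisfaction, and between online order tracking and satisfaction, are both distorted by this masking effect of leadtime.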
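
The masking effect can be reproduced with made-up data. The sketch below is not the motorpool example. It simulates hypothetical customers under two stated assumptions: faster delivery costs more, and online tracking is more common among slower vendors. The "true" effects (-7, -14, +8.5) are chosen only to resemble the regression above, and every other number is invented. The helper functions `pearson` and `ols` are written out with the standard library so the block is self-contained.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ols(columns, y):
    """Least-squares coefficients (constant first) via the normal equations."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

random.seed(0)
n = 5000  # hypothetical sample size, large enough for stable estimates

price = [random.uniform(5, 10) for _ in range(n)]
track = [1.0 if random.random() < 0.5 else 0.0 for _ in range(n)]
# Assumptions: paying more buys quicker delivery; slower vendors offer tracking.
leadtime = [35 - 2.5 * p + 3 * t + random.gauss(0, 2)
            for p, t in zip(price, track)]
# Assumed true effects: leadtime and price hurt satisfaction, tracking helps.
satisfaction = [300 - 7 * l - 14 * p + 8.5 * t + random.gauss(0, 10)
                for l, p, t in zip(leadtime, price, track)]

corr = {"leadtime": pearson(leadtime, satisfaction),
        "price": pearson(price, satisfaction),
        "ol-track": pearson(track, satisfaction)}
constant, b_leadtime, b_price, b_track = ols([leadtime, price, track], satisfaction)
coef = {"leadtime": b_leadtime, "price": b_price, "ol-track": b_track}

for name in corr:
    print(f"{name:>9}: correlation = {corr[name]: .3f}, regression coefficient = {coef[name]: .3f}")
```

In this simulation the simple correlations make price look mildly helpful and tracking look harmful, while the regression, which holds leadtime fixed, recovers effects close to the ones built into the data.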