The Fundamental Concept

Consider a population of 70 individuals, for which you wish to estimate the mean annual income, μ. Assume that you have decided to use the following estimation procedure to make your estimate:

                  [Procedure: Choose five members of the population at random, learn the annual income of each, and write the average of the five incomes in a box.]

After you carry out this procedure, the box will contain a number, e.g., $47,530.

However, imagine yourself standing at a particular moment in time, after you have committed to carrying out the procedure but before it is actually implemented. Peer into your future, and look inside the box. What do you see?

You don't see a specific number, since the procedure hasn't yet been carried out. But you see more than an empty box, since you know that there will soon be a specific number there: You see the potential for a number to appear. Indeed, there are 70⁵ = 1,680,700,000 equally likely ways the procedure might eventually play out (each with probability 1/1,680,700,000), and each yields a specific number in the box. Therefore we can assert that the eventual content of the box has a definite probability of being each of many different values.
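To make this concrete, here is a minimal Python sketch of the procedure, using invented income figures purely for illustration (the helper name run_procedure and the specific numbers are not part of the discussion above). Each run deposits one number in the box; viewed before any run is made, the eventual content of the box is exactly the kind of potential described here.

    import random
    import statistics

    random.seed(1)

    # A made-up population of 70 annual incomes (illustrative values only).
    population = [round(random.gauss(50_000, 12_000)) for _ in range(70)]
    mu = statistics.mean(population)   # the population mean we are trying to estimate

    def run_procedure():
        """One run of the procedure: draw five individuals at random, with
        replacement (matching the 70**5 equally likely outcomes), and put
        the average of their incomes in the box."""
        sample = [random.choice(population) for _ in range(5)]
        return statistics.mean(sample)

    # Before the procedure is carried out, its end result is a random variable;
    # running it repeatedly shows several of the values that variable can take.
    box_contents = [run_procedure() for _ in range(10)]
    print(f"population mean mu = {mu:,.2f}")
    print("ten possible box contents:", [f"{x:,.0f}" for x in box_contents])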

Such a potential, having specific probabilities of yielding specific values, is called a random variable.


The Fundamental Concept (underlying all statistical analysis): Anything of interest that can be said about a statistical procedure is equivalent to some mathematical statement concerning the random variable which represents the end result of the procedure. In particular, our exposure to sampling error when using a procedure can be measured by studying the associated random variable.


At some point in your past, you have likely heard the fields of probability and statistics glued together, as “probabilityandstatistics.” The two fields are actually quite different: The domain of probability is the study of uncertainty, and the domain of statistics is the use of sample data to make inferences about a population. However, sampling involves randomness, i.e., uncertainty, and therefore the tools from probability can be applied to help us understand our exposure to sampling error.

Imagine, for example, that we had planned to make our estimate of the population mean annual income using the following alternative procedure:

                  [Alternative procedure: Choose five members of the population at random, learn the annual income of each, and write the smallest of the five incomes in a box.]

Undoubtedly you would be somewhat uncomfortable about using this estimation procedure. But why? It's not that the procedure will yield too low an estimate: If the sample happens, by chance, to consist of five individuals who all earn more than the actual population mean, this procedure will yield too high an estimate (and indeed will yield an estimate closer to the truth than the procedure listed earlier, which uses the sample mean as the estimate). Rather, your discomfort arises because you'd expect the procedure to yield an underestimate. Let Z be the random variable corresponding to the end result of the procedure. Applying the Fundamental Concept, we can state the flaw in this procedure very precisely: As long as there are at least two individuals in the population with different incomes, E[Z] < μ.
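To see this flaw in action, here is a small Python sketch (again with made-up incomes, and assuming the alternative procedure reports the smallest of the five sampled incomes, as described above): the long-run average of the procedure's result falls below μ, even though any single run can land above it.

    import random
    import statistics

    random.seed(2)

    # A made-up population of 70 annual incomes (illustrative values only).
    population = [round(random.gauss(50_000, 12_000)) for _ in range(70)]
    mu = statistics.mean(population)

    def run_alternative_procedure():
        """Draw five individuals at random (with replacement) and report the
        smallest of their incomes -- the end result called Z in the text."""
        sample = [random.choice(population) for _ in range(5)]
        return min(sample)

    # Averaging many runs approximates E[Z]; it comes out noticeably below mu.
    runs = [run_alternative_procedure() for _ in range(100_000)]
    print(f"mu               = {mu:,.2f}")
    print(f"estimate of E[Z] = {statistics.mean(runs):,.2f}")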

Call the end result of the original procedure X. In developing the language of estimation, we'll see that X has an expected value of precisely μ, no matter what the distribution of incomes across the population might be. The Fundamental Concept thus helps us to see when one statistical procedure might have more-desirable properties than some other procedure.
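A brief sketch of why this holds (assuming, as the count of 70⁵ outcomes suggests, that each of the five individuals is drawn at random from the entire population, so that each draw's income has expected value μ): writing Y1, ..., Y5 for the five sampled incomes,

    E[X] = E[(Y1 + Y2 + Y3 + Y4 + Y5)/5]
         = (E[Y1] + E[Y2] + E[Y3] + E[Y4] + E[Y5])/5
         = (μ + μ + μ + μ + μ)/5
         = μ.

Nothing about the shape of the income distribution enters the calculation; only the linearity of expectation is used.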

How subject are we to sampling error when we use the original procedure to make an estimate? We “expect” the procedure to give us the correct result. But any one time that the procedure is carried out, the actual result might differ somewhat from μ. The standard deviation of X, called the standard error of the estimate, measures how far from μ the procedure’s result will “typically” be, and hence is a direct measure of our exposure to sampling error.
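As a rough illustration of the standard error (again with invented incomes), the sketch below repeats the original procedure many times and measures the spread of the results. Assuming the five draws are made independently, with replacement, the standard error should also equal the population standard deviation divided by √5, and the simulation agrees with that figure.

    import random
    import statistics

    random.seed(3)

    # A made-up population of 70 annual incomes (illustrative values only).
    population = [round(random.gauss(50_000, 12_000)) for _ in range(70)]
    sigma = statistics.pstdev(population)   # population standard deviation

    def run_procedure():
        """Original procedure: the average of five incomes drawn at random, with replacement."""
        sample = [random.choice(population) for _ in range(5)]
        return statistics.mean(sample)

    # The standard error is the standard deviation of X, the procedure's end result.
    results = [run_procedure() for _ in range(100_000)]
    print(f"simulated standard deviation of X = {statistics.pstdev(results):,.2f}")
    print(f"sigma / sqrt(5)                   = {sigma / 5 ** 0.5:,.2f}")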
