Fundamentals of Statistics

Lesson Number One:

There are good statistics and bad statistics.

As with most things in life, what is bad can only be appreciated by
understanding what is good. An example of a good statistic is “the average
weight of the Raiders’ offensive line is 268 pounds”. This is known as a
Descriptive Statistic. It is a measurement that tells you something about a
group of objects or events.


I have always been amazed at how few good statistics there are. There is the
mean (very popular), the standard deviation (the root of all bad statistics),
and a couple of special ratios known as percentages and probabilities (useful,
but poorly understood).
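
To make the short list concrete, here is a minimal sketch in Python of the mean, the standard deviation, and a percentage for one small group. The six weights are invented for illustration (chosen only so that they average 268 pounds), and the variable names are my own.

```python
from statistics import mean, pstdev

# Hypothetical weights in pounds for six linemen. These numbers are
# invented for illustration and chosen so that they average 268.
weights = [255, 262, 268, 270, 275, 278]

# The mean: add them up and divide by how many there are.
average = mean(weights)

# The standard deviation. pstdev treats the list as the whole group,
# which fits here because every lineman was weighed and nobody was sampled.
spread = pstdev(weights)

# A percentage: the share of linemen heavier than 265 pounds.
over_265 = sum(w > 265 for w in weights)
percent_over_265 = 100 * over_265 / len(weights)

print(f"mean: {average} lb")
print(f"standard deviation: {spread:.1f} lb")
print(f"over 265 lb: {percent_over_265:.0f}%")
```

Every number printed describes these six weights and nothing else.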

This is certainly not the case with bad statistics (also known as Inferential
Statistics). There are hundreds of them, and more being invented every day. In
fact, in my youth, before I knew better, I devised several myself.

Like most good things, Descriptive Statistics are simple, straightforward and
inherently honest. Note:
- there is no sample and therefore no sample size, no sampling error, no
confidence intervals, or any of those other things you find in bad statistics.

There are only six guys on the Raiders’ offensive line. Why sample? Just
weigh them all.

- no universe to be concerned about either. There are other Raiders on
the team, and, of course, other teams with their own offensive lines, but that
is not my problem. Let somebody else weigh them. I am making no “inference”
beyond my six linemen.

- and finally, no guesses and no predictions. What did they use to
weigh? What will they weigh tomorrow? I don’t know. They averaged 268 pounds
when I weighed them, period.

Like most good things, Descriptive Statistics are also boring, of limited use,
and seldom get asked out.

Inferential Statistics, on the other hand, are wildly inaccurate, totally
misleading, and immensely popular — and the subject of Lesson
Number Two.

Lesson Number Two

I was about to introduce Statistical Inference. Before I can do so though, I
must define a few terms.
A statistic, I have already stated, is a measure which tells you something
about a group of objects or events.
A Sample is a portion of a larger group, usually selected to be
representative.
The larger group is known as the Universe. Just as a statistic is a measure of
the smaller group or of a sample, the measure of the universe is called a
Parameter.
Statistical Inference (which is the basis of market analysis) rests on a
construct known as the Central Limit Theorem, which I will loosely sum up,
simply but very powerfully, as
“a statistic is the best estimate of the parameter.”
It is the word “best” that causes all the trouble. It is a very old theory,
and “best” does not mean good or superior as we now use the word.
It means best as compared to all others — “closest estimate” would probably
be today’s way of stating it. Thus, even though a particular statistic might
be a terrible estimate of a parameter — completely wrong and misleading, it
is still the “best.”
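
The four terms, and the trouble with “best,” can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the universe is 1,000 invented weights, and `sample_statistic` is a hypothetical helper of my own, not anything standard.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical universe: 1,000 invented weights in pounds.
universe = [rng.gauss(250, 30) for _ in range(1000)]

# The parameter is the measure of the whole universe.
parameter = mean(universe)


def sample_statistic(size):
    """Draw a simple random sample of `size` items and return its mean."""
    sample = rng.sample(universe, size)
    return mean(sample)


# The statistic is the measure of the sample. Compare a few sample sizes
# to see how far each "best estimate" lands from the parameter.
for size in (5, 50, 500):
    statistic = sample_statistic(size)
    print(f"n={size:3d}  statistic={statistic:6.1f}  "
          f"parameter={parameter:6.1f}  miss={statistic - parameter:+.1f}")
```

Each statistic printed is the “best” estimate available from its own sample, whether the miss turns out to be small or large.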

Now why is this so important? It is because nobody cares about statistics by
themselves; they are next to useless. All they tell us about is the sample
or, in the case of the Raiders’ line, a non-generalizable group.

It is the parameters we are after.
But since universes and their parameters cannot be measured directly
(too big, too clumsy, too expensive, perhaps hypothetical and not even real,
perhaps in the future, as is often the case when predicting, or maybe lost in
the past, with no one having bothered to measure it before it disappeared),
all we have to work with are those erstwhile statistics.

It is this relationship between a statistic and a parameter that is the engine
behind all that we commonly call STATISTICS, including opinion polls, surveys,
market research, advertising claims, weather forecasts, test construction,
practically all medical research, most agricultural research, economic
forecasting: in short, every type of numerical analysis that examines a few
things and then draws conclusions about all things. Or, for your purposes,
examines yesterday’s market and predicts tomorrow’s.

Thus, the process of making statistical inferences (regardless of the
scientific discipline or the subject matter) consists of:
1. sampling (or examining or experimenting) as carefully as we can,
2. measuring something and calculating our statistic, and
3. determining the accuracy of our statistic (how good an estimate is
it of the parameter).

In a nutshell: we do all we can to make the “best estimate” better by
careful sampling, and then we calculate how good a job we did (that
is, how close we are to the parameter) by reference to probability
distributions.
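
The three steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the universe is invented data, the sample is a simple random sample of 100, and accuracy is gauged with the standard error of the mean and a 95% confidence interval taken from the normal distribution.

```python
import math
import random
from statistics import NormalDist, mean, stdev

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical universe that we pretend is too big to measure in full.
universe = [rng.gauss(250, 30) for _ in range(10_000)]

# 1. Sample as carefully as we can (here, a simple random sample).
sample = rng.sample(universe, 100)

# 2. Measure something and calculate the statistic.
statistic = mean(sample)

# 3. Gauge the accuracy of the statistic: the standard error of the mean,
#    then a 95% confidence interval based on the normal distribution.
standard_error = stdev(sample) / math.sqrt(len(sample))
z = NormalDist().inv_cdf(0.975)  # about 1.96
low = statistic - z * standard_error
high = statistic + z * standard_error

print(f"statistic: {statistic:.1f}")
print(f"standard error: {standard_error:.2f}")
print(f"95% confidence interval: {low:.1f} to {high:.1f}")
```

The interval is itself calculated from the sample, so it is only as trustworthy as the sampling in step 1.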

The rest is normal curves, confidence intervals, significance tests, regression
analysis, etc., etc.
