Unit 6
Data Analysis and Statistics

For most of us, the word “statistics” triggers a mental image of lots and lots of numbers. In a sense, the image is not far removed from reality. The first thing a statistician does in an effort to analyse a given problem is to collect lots and lots of numerical facts, called raw data. But the collection of raw data is only the beginning: numbers alone do not make statistics; it is what you do with them that matters. After collecting the raw data, a statistician must organize them in an orderly fashion and present them in a meaningful way, so that some coherent, relevant information about the problem can emerge. The process of collecting, organizing, and presenting data is called “descriptive statistics” and is the subject of this unit.

We will begin by discussing two major ways of presenting statistical data: in a table or graphically (e.g., in a line, bar, or pie chart). Next, we deal with measures that summarize and describe the properties of a given set of data. It is customary to provide two such measures: a measure of average and a measure of dispersion. There are three common measures of average, also called measures of centre: the mean, the median, and the mode. The simplest measures of dispersion are the range and the quartile deviation, which depend on only some of the data. Measures of dispersion that depend on all of the data are the average deviation and the standard deviation, of which the latter is the most commonly used. Finally, we will discuss frequency distributions and how to calculate the mean and standard deviation from a frequency distribution table.
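As a rough preview of the measures named above, the following Python sketch computes each of them for a small data set. The data values are hypothetical and chosen only for illustration. Two conventions are assumptions rather than something this unit has specified: the quartiles use the default method of Python's `statistics.quantiles` (textbooks differ on how quartiles are located), and the standard deviation is the population form, which divides by n rather than n - 1.

```python
import statistics

# Hypothetical raw data, chosen only for illustration.
data = [2, 4, 4, 5, 7, 8]

# Measures of centre.
mean = statistics.mean(data)      # sum of the values divided by their count
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequently occurring value

# Range: depends only on the largest and smallest values.
data_range = max(data) - min(data)

# Quartile deviation: half the distance between the first and third quartiles.
# Assumption: quartiles follow the default convention of statistics.quantiles.
q1, _q2, q3 = statistics.quantiles(data, n=4)
quartile_deviation = (q3 - q1) / 2

# Average deviation: mean absolute distance of every value from the mean.
average_deviation = sum(abs(x - mean) for x in data) / len(data)

# Standard deviation (assumption: population form, dividing by n).
standard_deviation = statistics.pstdev(data)

print("mean:", mean)
print("median:", median)
print("mode:", mode)
print("range:", data_range)
print("quartile deviation:", quartile_deviation)
print("average deviation:", round(average_deviation, 3))
print("standard deviation:", standard_deviation)
```

For these values the mean is 5, the median is 4.5, the mode is 4, and the range is 6. The range and the quartile deviation each use only a few of the values, while the average deviation and the standard deviation use every value.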

Objectives

After completing this unit, you should be able to perform the following tasks.

  1. Organize statistical data, and present them in the form of tables and graphs, including line, bar, and pie charts.
  2. Define and distinguish among the terms “mean,” “median,” and “mode,” and calculate each of these common measures of central tendency.
  3. Define the terms “range” and “standard deviation,” and compute each of these measures of dispersion.
  4. Prepare a frequency distribution table, and calculate the mean and standard deviation using such a table.
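
To give a sense of what the last objective involves, here is a minimal Python sketch that builds a frequency distribution table from ungrouped data and then computes the mean and standard deviation from the table alone. The scores are hypothetical, the variable names are illustrative, and the standard deviation is assumed to be the population form (dividing by the total frequency).

```python
import math
from collections import Counter

# Hypothetical raw scores, chosen only for illustration.
scores = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Frequency distribution table: each distinct value x with its frequency f.
table = sorted(Counter(scores).items())

# Total frequency n is the sum of the f column.
n = sum(f for _x, f in table)

# Mean from the table: sum of f * x divided by n.
freq_mean = sum(f * x for x, f in table) / n

# Standard deviation from the table (assumption: population form):
# square root of the sum of f * (x - mean)^2 divided by n.
freq_variance = sum(f * (x - freq_mean) ** 2 for x, f in table) / n
freq_std = math.sqrt(freq_variance)

print("x   f")
for x, f in table:
    print(f"{x}   {f}")
print("mean:", freq_mean)
print("standard deviation:", round(freq_std, 4))
```

Working from the table gives the same results as working from the raw scores, because each value x contributes f identical terms to the sums.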