A scatter diagram, or scattergram, is used to display the results when two sets of data are compared to see if there is a relationship between them.

Some examples:

The heights of people against their weights.
The size of icebergs against their distance from the South Pole.
The age of kiwifruit plants against the number of kiwifruit produced by the plants.

See if you can work out which value goes on which axis. Does it matter?

If all of the points lie on or near a straight line, there is said to be a linear correlation between the two sets of data.

The first graph above has a positive linear correlation: as one quantity increases, so does the other.
The second graph above has a negative linear correlation: as one quantity increases, the other decreases.

The line that best fits the points is called the trend line, the line of best fit or the regression line.

There are mathematical methods for drawing this line, but a simple approximate method is to draw it so that about half of the points lie above the line and half below it.
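One of the mathematical methods mentioned above is the method of least squares. The following is a minimal sketch of it in Python, not a prescribed method for this course. The function name least_squares_line and the five data points are made up for illustration; the points were chosen to lie exactly on the line y = 2x + 1 so that the answer is easy to check.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared deviations in x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative (made-up) points lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = least_squares_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

With real data the points will not lie exactly on a line, and the function returns the line that makes the sum of the squared vertical distances from the points as small as possible.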


Graphical calculators, such as the Casio CFX-9850G, shown in the picture, can plot a scatter diagram, draw the regression line, give its equation and calculate the correlation coefficient, r.

The closer r is to 1 or -1, the better the fit of the line to the data and the stronger the linear relationship between the two sets of data. A value of r near 0 indicates little or no linear relationship.
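For readers curious about what the calculator is doing, here is a rough sketch of how r (Pearson's correlation coefficient) can be computed. The function name pearson_r and all three small data sets are made up for illustration only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the correlation coefficient r for paired data xs, ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Illustrative (made-up) data sets.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # points on a rising straight line
falling = [10, 8, 6, 4, 2]   # points on a falling straight line
scattered = [3, 9, 1, 8, 4]  # no obvious pattern

print(pearson_r(x, rising))
print(pearson_r(x, falling))
print(pearson_r(x, scattered))
```

The rising data give r = 1, the falling data give r = -1, and the scattered data give a value close to 0.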

The formal study of correlation and regression is outside the scope of the Bursary Statistics course.

Plotting a Scatter Diagram

Suppose a teacher wants to see if there is a connection between the exam marks of students studying both mathematics and physics. The table shows the results of 20 students.

Student number     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Mathematics mark  45 67 93 45 56 67 68 34 54 89 59 60 43 90 41 30 56 76 89 65
Physics mark      56 69 89 39 52 61 69 43 59 94 60 52 41 84 41 39 60 73 92 62

These results could be put into two columns of a spreadsheet and the following scatter diagram would result.
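If a spreadsheet is not to hand, a rough scatter diagram can also be drawn with a few lines of Python. This is only a sketch: the function name text_scatter, the grid size and the axis range of 30 to 95 are choices made here for illustration, and a spreadsheet or graphics calculator will give a much better picture. Mathematics marks run along the horizontal axis and physics marks up the vertical axis.

```python
maths = [45, 67, 93, 45, 56, 67, 68, 34, 54, 89,
         59, 60, 43, 90, 41, 30, 56, 76, 89, 65]
physics = [56, 69, 89, 39, 52, 61, 69, 43, 59, 94,
           60, 52, 41, 84, 41, 39, 60, 73, 92, 62]


def text_scatter(xs, ys, width=36, height=15, lo=30, hi=95):
    """Return a rough character-grid scatter plot as a list of strings."""
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each mark from the range lo..hi onto a grid position.
        col = round((x - lo) / (hi - lo) * (width - 1))
        row = round((y - lo) / (hi - lo) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    lines = ["|" + "".join(row_chars) for row_chars in grid]
    lines.append("+" + "-" * width)  # horizontal axis
    return lines


for line in text_scatter(maths, physics):
    print(line)
```

Points that fall in the same grid cell are shown as a single star, so fewer than 20 stars may appear.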


From the graph there would appear to be a positive linear correlation. This means that from the data it looks like there is a relationship between the two sets of marks. A student scoring well in mathematics is also likely to score well in physics.

Note that when a scatter diagram is used only to look for a correlation, it does not matter which variable is placed on which axis.

Care must be taken when reaching conclusions from scatter diagrams. Reasons other than mathematical ones may need to be considered. For example, there may be a mathematical correlation between the number of road deaths and the number of four-wheel-drive vehicles on the road, but it would be dangerous to say that one causes the other.

Is the recent fall in the number of road fatalities because of the increase in the number of four-wheel-drive vehicles?