Many people require statistics for a wide range of purposes.
- A school is keen to know whether boys or girls do better in certain subjects.
- Doctors may need to know what causes a particular illness or disease.
- A conservationist wants to find out how many different types of whales there are in a certain location.
Designing a Statistical Survey
The first task is to plan the investigation. This will require thinking about lots of issues:
What are the aims of your investigation?
What questions is your investigation trying to answer?
How are you going to display the results?
Who are you surveying?
How will you collect the data?
Will your sample be random?
How big will your samples be?
Who will be using your final report?
Will your raw data be easy to process, display and analyse?
All of these questions have to be answered before starting an investigation.
The collection of data takes time and is therefore expensive. A balance must be found between having a sample large enough to be representative and the excessive cost of taking a large sample.
A population in statistics is the whole group being considered in an investigation or survey. Note that in statistics the term population does not necessarily mean the entire population of a town or country; it refers to the overall group being studied. This can be a small group, for example the students at a school, or a large group, such as all of the people in part of a country or state. The group being investigated is called the target population. Once the target population has been decided, the next task is to make sure that the results of the survey are as representative as possible of this population.
Two ways that data can be collected are by a questionnaire or an interview.
A questionnaire is a form with questions designed to obtain information. Careful preparation of questionnaires is essential and may require special training. Questions which are hard to understand, are ambiguous or lead the respondent to give a particular answer must be avoided. Questionnaires that are given or posted to people often result in low return rates.
An interview could be carried out by stopping people in the street or ringing people to ask the questions. Problems with this form of data collection include the reluctance of people to give up time to answer questions.
If questionnaires and interviews are not carefully designed and administered, they can often produce biased samples.
The New Zealand Census, which collects information about the entire population of the country, is held every five years, and thousands of people are employed to collect and analyse the data. The last New Zealand census was in 1996 and the next one will be in 2001. Governments use a census to help plan for the future.
In Australia, the Census is also held every five years and the last two were held in 1996 and 2001.
A sample is obtained when only part of the population is surveyed. Most statistical investigations do not set out to obtain data about every item in a population, as in a census, but rely on a sample from the population.
Examples It would not be practical to ask every person in a particular electorate how they intend to vote before an election.
A wine company could not taste every single bottle of wine!
Taking a sample is usually much cheaper, quicker and more convenient than a census.
The aim when taking a sample is to obtain information which is representative of the whole population. If it is not, then the sample is said to be biased.
Choosing a sample
- The first thing to decide is the size of the sample. This would need to be large enough to be truly representative but not too large as this would be too expensive and time-consuming.
- A sample should be evenly spread over the population.
- A sample should be as random as possible. In a random sample every member of the target population has an equal chance of being chosen. A calculator or a spreadsheet can produce random numbers and there are tables of random numbers.
There are several ways to obtain a random sample:
Draw names out of a hat or balls out of a barrel, like a Lotto draw.
Give every person or item a number and choose the numbers at random, using special tables, a computer or a spreadsheet.
Example In a school of 900 students, a sample of 20 has to be chosen. Allocate a number from 1 to 900 to each student from a list of students.
Random number tables These contain strings of random digits. Start anywhere and read off the digits in groups of three. If the number formed is 000 or above 900, or has already been chosen, discard it and move on to the next group.
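The random number table method can be sketched in a few lines of Python. This is only an illustration: the function name `pick_from_digit_table` and the row of digits are made up for the example and do not come from any real table.

```python
def pick_from_digit_table(digits, population_size, sample_size):
    """Read a string of random digits in groups of three and keep valid numbers."""
    chosen = []
    # Step through the digits three at a time, ignoring an incomplete final group.
    for start in range(0, len(digits) - 2, 3):
        number = int(digits[start:start + 3])
        # Discard 000, anything above the population size, and repeats.
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen


# Hypothetical row of digits, invented for illustration only.
table_row = "439017952120000439866305"
print(pick_from_digit_table(table_row, 900, 5))  # [439, 17, 120, 866, 305]
```

Here 952 is discarded because it is above 900, 000 is discarded because no student has that number, and the second 439 is discarded because that student has already been chosen.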
Calculators Most calculators have a RND# button. When pressed it gives a three digit decimal number such as 0.439. Multiplying these numbers by 1000 will produce whole numbers from 0 to 999. Again, discard 0 and any numbers over 900.
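Spreadsheets A spreadsheet has a built-in random number function, but the exact form depends on the program. Some accept a number in the brackets, e.g. RAND(900), and give random numbers from 1 up to that number. In others, such as Excel, RAND() takes nothing in the brackets and gives a random decimal between 0 and 1, and a separate function, RANDBETWEEN, gives random whole numbers in a chosen range.
e.g. RANDBETWEEN(1, 900) produces random whole numbers from 1 to 900.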
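A computer can also choose the whole sample in one step. The sketch below uses `random.sample` from Python's standard library to pick 20 different student numbers from 1 to 900; the function name `choose_random_sample` and the optional `seed` argument are assumptions added for the example.

```python
import random


def choose_random_sample(population_size, sample_size, seed=None):
    """Pick sample_size different numbers from 1 to population_size."""
    rng = random.Random(seed)  # a seed makes the draw repeatable; None gives a fresh draw
    # random.sample never returns the same number twice.
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


student_numbers = choose_random_sample(900, 20)
print(student_numbers)
```

Because every student number is equally likely to be drawn and no number can be drawn twice, nothing has to be discarded afterwards.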
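Choosing names at regular intervals, say every 100th name, from an alphabetical list such as a telephone book gives a sample that is quite representative of the people living in a certain area, and it is easy to carry out. However, it is not truly random: once the starting point is fixed, chance plays no further part in the selection, and people who do not own phones have no chance of being chosen.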
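Choosing at regular intervals could be sketched as follows. The list of names is hypothetical, and picking the starting position at random is an assumption added here (the text above only describes taking every 100th name).

```python
import random


def systematic_sample(names, interval, start=None):
    """Take every interval-th name from the list, beginning at position start."""
    if start is None:
        # Assumption: choose the starting position at random within the first interval.
        start = random.randrange(interval)
    return names[start::interval]


# Hypothetical alphabetical list of 1000 names.
names = [f"Person {i}" for i in range(1, 1001)]
picked = systematic_sample(names, 100)
print(len(picked))  # 10
print(picked)
```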
Stratified and Cluster Sampling
Sometimes, it is desirable to ensure that there is sufficient representation in a sample from different groups within a population.
Example In a school there may be a need to get representation across different form levels. The number of students chosen from each level should be proportional to the total number of students at that level.
e.g. In a school of 1000 students, there are 220 Year 9 students. In a stratified sample of 50 for the whole school, how many Year 9 students should there be?
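The Year 9 question can be worked out by taking the same fraction of the sample as Year 9 is of the school: 220 out of 1000, applied to a sample of 50. A short sketch of the calculation, with `stratum_sample_size` as a made-up helper name:

```python
def stratum_sample_size(stratum_size, population_size, sample_size):
    """Number to sample from one group, in proportion to the group's size."""
    return round(stratum_size * sample_size / population_size)


# 220 Year 9 students in a school of 1000, sample of 50.
print(stratum_sample_size(220, 1000, 50))  # 11
```

So 11 of the 50 students in the sample should come from Year 9. When the answer is not a whole number it has to be rounded, so the rounded numbers for all levels may need a small adjustment to add up to the intended sample size.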
Example When a newspaper carries out an opinion poll, it often asks people from a country's biggest cities, e.g. in New Zealand: Auckland, Hamilton, Wellington and Christchurch. In Australia it could be the largest city in each state. This is an example of cluster sampling. This is obviously not very random, as it leaves out all of the people living in smaller towns and cities and those living in rural areas. The number sampled in each city would be proportional to the population of that city and worked out in a manner similar to the school example above.
Random methods of selecting the sample should still be used within each stratum or cluster.
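Putting the two ideas together, a stratified sample first works out how many to take from each group and then chooses that many at random within the group. The sketch below is one possible way to do this. Only the Year 9 figure of 220 comes from the example above; the other year-level sizes are hypothetical, and the function name `stratified_sample` is made up.

```python
import random


def stratified_sample(strata, sample_size, seed=None):
    """Randomly choose from each group in proportion to the group's size."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    chosen = {}
    for name, members in strata.items():
        # Proportional share of the sample, rounded to a whole number.
        quota = round(len(members) * sample_size / total)
        # Random selection within the group.
        chosen[name] = rng.sample(members, quota)
    return chosen


# Hypothetical school of 1000: only the Year 9 size is from the example.
sizes = {"Year 9": 220, "Year 10": 200, "Year 11": 200, "Year 12": 200, "Year 13": 180}

# Give every student a number, level by level.
strata = {}
next_number = 1
for level, size in sizes.items():
    strata[level] = list(range(next_number, next_number + size))
    next_number += size

sample = stratified_sample(strata, 50)
for level, students in sample.items():
    print(level, len(students), sorted(students))
```

With these sizes the quotas are 11, 10, 10, 10 and 9, which add up to 50. With other sizes the rounded quotas may be one or two out and would need adjusting by hand.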
When the data has been collected, it is often summarised in tables and graphs.
Data can also be sorted using a stem and leaf diagram or using tallies in a frequency distribution and then displayed in a histogram or cumulative frequency graph. Statistical calculations can then be carried out to find information such as the mean, median and quartiles and these can then be shown on a box and whisker diagram. Calculators and computers can be used for calculating more complex statistics such as standard deviation.
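As a rough illustration of these calculations on a computer, the sketch below uses Python's standard `statistics` module on a small set of made-up values. The data is hypothetical, and note that different textbooks, calculators and programs use slightly different rules for quartiles, so the quartiles found here may not match a hand calculation exactly.

```python
import statistics
from collections import Counter

# Hypothetical data, invented for illustration only.
data = [3, 5, 7, 8, 8, 9, 11, 13, 15]

# Tally of how often each value occurs (a simple frequency distribution).
frequency = Counter(data)
for value in sorted(frequency):
    print(value, "|" * frequency[value])

mean = statistics.mean(data)
median = statistics.median(data)
# quantiles with n=4 returns the three cut points: lower quartile, median, upper quartile.
lower_quartile, middle, upper_quartile = statistics.quantiles(data, n=4)
# stdev gives the sample standard deviation.
spread = statistics.stdev(data)

print("mean:", mean)
print("median:", median)
print("quartiles:", lower_quartile, upper_quartile)
print("standard deviation:", round(spread, 2))
```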
These graphs and statistics are studied in more detail in the next few topics.
Displaying and Reporting the Investigation
Finally, and most importantly, the data, tables, graphs and statistics can be analysed and presented in a report. Pictographs, column and pie graphs are common ways of displaying the data along with those mentioned above.
Along with the data and analysis, the report should contain an introduction listing the objectives of the investigation, a summary of the findings, and conclusions, possibly with recommendations.