Dr. Mark Gardener 


Dr Mark Gardener Associate Lecturer Ecology and Environmental Science The Open University 
Choosing a stats test

Don't Panic: it's just a matter of following a simple flowchart

Which stats test should I use?

The decision about which stats test to use for your project or investigation can seem daunting as there are quite a few to choose from. However, by following a few simple steps you can pin down the correct one quite easily. It is important to make this decision at the planning stage of your investigation and not leave it until you have collected a pile of data!

It is easy to break down the decision into a series of steps: at each stage you answer a simple question about your investigation. To do this you need to understand a very small amount of terminology. If you are familiar with some of the terms then you can go straight to the decision tree right now. Otherwise you may wish to work through the following paragraphs to familiarize yourself with some of the terms.

Type of investigation:

The first decision is what sort of investigation you are dealing with. We can split investigations into two main types: differences or similarities.

Differences: In all cases you have measured or counted one thing and wish to compare this factor across two samples. It is possible to have more than two samples, e.g. you may have looked at how many bluebells there were in different woods (oak, ash and beech). In general it's best to stick with just two things. These sorts of investigation can be best summed up using a bar chart.

Similarities: Here you are looking for a link between things rather than a difference between samples. There are two kinds.

Correlation: You have two variables to relate to one another (you can draw a scatter graph).

Association: You are looking for a link between categories.

Number of samples:

Usually you will have two samples: you will be comparing two things. For example, you might be comparing the size of oak leaves from trees in a plantation to trees out in the open (by themselves), or you may be comparing the number of millipedes in leaf litter from conifer woods to deciduous woods.

Sometimes you may wish to look at more than two things. You may want to look at how the type of soil affects plant growth. You might have peat, coir, compost and regular soil: four things! It is perfectly possible to do this but there are pitfalls. Your final analysis will tell you if there is a difference among the four (in this case) but not necessarily which ones differ. You will have to do further analyses to pick out where the differences are. More importantly, your sampling effort will be spread out over four samples, so you will have fewer data in each sample than if you stuck to two things. It would be a lot better to collect 10 readings from two of the soil types than to get 5 from each of the four.

Matched pairs:

Sometimes the readings/data you collect form what are known as matched pairs. This is where each reading from one sample is matched up with one specific reading from the second sample. An example might be where you looked at the abundance of moss on trees: you determined how much moss was on the north-facing side and the south-facing side of a number of trees. The readings from each tree naturally form matching pairs. Perhaps you looked at the amount of lichen on the two sides of gravestones in a churchyard: the readings from the two sides are naturally closely connected.

You may have looked at the size of ivy leaves on two sides of a wall (north and south facing). In this case the two samples are not matched pairs! There is no reason why any particular measurement on the north side should be tied to a particular measurement on the south. If you had several walls and each pair was at the same position along the wall then you'd be safe to say they were matched. With a single wall you are on dodgy ground (although it's a perfectly good example of a "differences" investigation).

Sometimes you may look at a situation over time, and that may give matched pairs too. For example, you might wish to look at butterfly visits to patches of flowers at different times of day (morning and afternoon). If you had several patches of flowers then your "morning" and "afternoon" readings could be considered as matched pairs (same place, different time).

In general it's best to treat all situations as not matched unless you can be absolutely certain that your readings really "match up".
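To see why the matched/unmatched decision matters, here is a minimal sketch in Python using only the standard library. The moss figures are made up for illustration, and the function names (`paired_t`, `unpaired_t`) are my own rather than from any particular stats package. The same readings are analysed twice: once as matched pairs (north and south side of the same tree) and once as if they were two unrelated samples.

```python
import math
import statistics


def paired_t(sample_a, sample_b):
    """t statistic for matched pairs: work on the within-pair differences."""
    if len(sample_a) != len(sample_b):
        raise ValueError("matched pairs need the same number of readings")
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    std_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / std_error


def unpaired_t(sample_a, sample_b):
    """Student's t statistic for two unmatched samples (pooled variance)."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = (
        (n_a - 1) * statistics.variance(sample_a)
        + (n_b - 1) * statistics.variance(sample_b)
    ) / (n_a + n_b - 2)
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_error


# Hypothetical % moss cover on six trees (illustrative numbers only).
north = [12, 20, 31, 45, 52, 60]
south = [10, 17, 29, 41, 49, 58]

print("matched pairs t:", round(paired_t(north, south), 2))
print("unmatched t:    ", round(unpaired_t(north, south), 2))
```

In this made-up data set the trees differ a lot from one another, but the north side is consistently a little mossier than the south side of the same tree. Treating the readings as matched pairs picks up that consistent difference; treating them as unmatched buries it under the tree-to-tree variation. The reverse mistake (pairing readings that do not really match) is just as misleading, which is why the cautious default is "not matched".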

Type of data:

There are three main types of data (and you thought they were all just numbers!). They are: interval, ordinal and categorical.

Interval:

Ordinal: The bottom line is that you can put the data into size order but you cannot do proper maths on the data because the interval varies between the categories.

Categorical: What you have is categorical data: the birds are one category and the habitats are the other category. It's the arrangement that makes the data categorical.

Another example of this sort of arrangement comes when you are looking for the presence or absence of something. You may take a small square (a quadrat) and put it on the ground. There will be a number of plant species in the square: you mark a tick if a species is present and leave a blank if it's not. You do this lots more times. Later on you can determine how many quadrats contained each species. You can also determine how many quadrats contained two particular species at the same time (and how many contained neither). Your categories are now the presence and absence of species one and the presence and absence of species two.

Another sort of categorical experiment crops up in genetics. You might expect a certain pattern to occur in a hybrid of two different plants, for example. So, if you know the ratio of plant types (e.g. flower colour) expected by the genetics you can carry out a goodness of fit test on the actual ratio of observed plant types in your experiment. In this case the categories would be the plant types (there is only one category here as opposed to the previous examples).
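The two categorical set-ups described here (presence/absence of two species, and an expected genetic ratio) can be sketched in a few lines of Python. This is only an illustration: the counts are invented, the function names are my own, and the calculation is the plain chi-square statistic with no continuity correction.

```python
def chi_square_association(table):
    """Chi-square statistic for a table of counts (rows x columns).

    Expected counts come from the row and column totals.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_sq += (observed - expected) ** 2 / expected
    return chi_sq


def chi_square_goodness_of_fit(observed, expected_ratio):
    """Chi-square statistic comparing observed counts to an expected ratio."""
    total = sum(observed)
    ratio_total = sum(expected_ratio)
    chi_sq = 0.0
    for count, part in zip(observed, expected_ratio):
        expected = total * part / ratio_total
        chi_sq += (count - expected) ** 2 / expected
    return chi_sq


# Hypothetical quadrat counts (60 quadrats in total).
# Rows: species one present / absent. Columns: species two present / absent.
quadrats = [[20, 10],
            [10, 20]]
print("association:", round(chi_square_association(quadrats), 2))

# Hypothetical flower colours from a cross expected to give a 3:1 ratio.
flowers = [84, 36]
print("goodness of fit:", round(chi_square_goodness_of_fit(flowers, [3, 1]), 2))
```

Both of these examples have one degree of freedom, for which the usual 5% critical value is 3.84: a statistic above that suggests the pattern is unlikely to be down to chance alone.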

Distribution of data:

Whenever you collect some data you end up with a whole bunch of numbers. The distribution refers to how many times each number crops up: it's shorthand for frequency distribution. Often you don't count up how many times each individual number occurs but rather you create small categories or bins, where each bin contains a range of numbers (e.g. 0-3, 4-7, 8-11 and so on). The usual thing is to create at least 9-10 of these bins. If you draw a bar chart of the frequencies (properly called a histogram) you can see how many times each number (or rather range of numbers) crops up.

If you see a pattern where the middle of the graph shows a distinct hump, with the sides tailing off equally, you have probably got what is called a normal distribution. The features of this sort of distribution are well known mathematically and some statistics tests make use of this (such data are often called parametric, after the parametric tests that assume this distribution). Things that tend to be normally distributed are weight and height.

However, you may see that the hump where the greatest number of data items lie is not in the middle but rather nearer to one end. This sample is showing a skewed distribution. The numbers are not distributed symmetrically around the middle point but lumped towards one end. Data like this are often called nonparametric, because they call for nonparametric tests.

Of course you will not know in advance if your data are going to be normally distributed or not. However, it is possible to have a good guess. This is where a pilot study is useful as it can help to determine if your data are parametric or not.

Some sorts of data are not usually parametric. If you were comparing the numbers of freshwater shrimps in open water to shady sites your data are not likely to be parametric. Counts of animals rarely are: you tend to get sudden "lumps". If you are measuring the coverage of plant species and using the % of the ground occupied by each plant you are also unlikely to get normally distributed data. In this case the % can never be less than zero or much above 100% (you can get >100% with overlapping plants), which means that the ends of your distribution are "fixed". Ordinal data that have been converted to a rank order are also not normally distributed. You could convert an ACFOR scale to a numerical scale (0-5) with A = 5, C = 4 and so on but the data would not be parametric.

In summary: you should check the frequency distribution of your data. Interval data may be normally distributed (i.e. symmetrical around a central hump) or may be skewed. Ordinal data are always nonparametric.
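Checking a frequency distribution is easy to do by hand, but a short Python sketch shows the idea. The shrimp counts below are invented, the function name `frequency_table` is my own, and the bins are 4 wide (0-3, 4-7, 8-11 and so on) purely to keep the example small. Comparing the mean with the median is only a rough hint of skew, not a formal test.

```python
import statistics
from collections import Counter


def frequency_table(values, bin_width):
    """Count whole-number readings in bins of equal width.

    Returns a dict mapping the start of each bin to the number of readings
    in it, including empty bins between the lowest and highest reading.
    """
    counts = Counter((value // bin_width) * bin_width for value in values)
    first, last = min(counts), max(counts)
    return {
        start: counts.get(start, 0)
        for start in range(first, last + bin_width, bin_width)
    }


# Hypothetical counts of freshwater shrimps in 14 net sweeps.
shrimps = [0, 1, 1, 2, 2, 3, 3, 3, 4, 5, 6, 9, 14, 22]

bin_width = 4
table = frequency_table(shrimps, bin_width)

# A text histogram: one '#' per reading in the bin.
for start, count in table.items():
    print(f"{start:>2}-{start + bin_width - 1:<2} {'#' * count}")

# Rough hint only: a mean well above the median suggests a long right tail.
mean_is_above_median = statistics.mean(shrimps) > statistics.median(shrimps)
print("mean above median:", mean_is_above_median)
```

With these made-up counts most of the readings pile up in the lowest bin and a few large values trail off to the right, which is the lumpy, skewed pattern that counts of animals often show.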

Amount of data:

In general the more data you collect the better. However, in practice you have limited time to spend collecting data. Each statistical test has a set of requirements: some tests need >30 measurements, for example, so if you had fewer you would have to select a different analysis. It's a good idea to estimate how long it's going to take you to collect some data and then work out how much you can do in the time available. If you do a pilot study you will have a good idea whether your target is achievable.

Selecting the right statistical analysis: flowchart/decision tree

Start with the top line, where you will have a choice. Make your choice and go to the next line down. As you move down you will make more choices until you have settled on the best statistical approach for your investigation/project.

Start here: Is your analysis concerned with Differences or Similarities?

Differences: Do you have two samples to compare or more than two?
  More than two samples: Are your data normally distributed or nonparametric?
    Normally distributed: You need ANOVA, analysis of variance.
    Nonparametric: Use the Kruskal-Wallis test.
  Two samples only: Are your samples in matched pairs?
    Matched pairs: Do you have >25 pairs of data?
      >25 pairs: Use the z test for matched samples.
      <25 pairs: Are your data normally distributed or nonparametric?
        Normally distributed: Use the t test for matched pairs.
        Nonparametric: Use Wilcoxon matched pairs analysis.
    Not in matched pairs: Do you have >25 data items in each sample?
      >25 data items per sample: You need the z test for unmatched samples.
      <25 data items per sample: Are your data normally distributed or nonparametric?
        Normally distributed: Use the t test for unmatched samples.
        Nonparametric: Use the Mann-Whitney U test.

Similarities: Are your data categorical? Are you looking for an association between categories or do you have two factors (can you draw a scatter graph)?
  Categorical, association: Do you have two sets of categories or only one?
    Two: Use chi-square analysis to look for association between the various categories.
    One: Use a goodness of fit test to look for a fit between the expected ratio between the categories and the observed ratio.
  Two variables to correlate: Are both variables normally distributed or nonparametric?
    Normally distributed: Use regression analysis and Pearson's correlation coefficient to determine the strength of the relationship and also to be able to use one variable to predict the other.
    Nonparametric: Use Spearman rank correlation to determine the strength of the relationship.
