Dr. Mark Gardener

Using R for statistical analyses - Graphs 2

This page is intended to be a help in getting to grips with the powerful statistical program called R. It is not intended as a course in statistics (see here for details about those). If you have an analysis to perform I hope that you will be able to find the commands you need here and copy/paste them into R to get going.

I run training courses in data management, visualisation and analysis using Excel and R: The Statistical Programming Environment. From 2013 courses will be held at The Field Studies Council Field Centre at Slapton Ley in Devon. Alternatively I can come to you and provide the training at your workplace. See details on my Courses Page.

This page shows how to produce a range of graphs to illustrate your analyses, specifically scatter plots, stem-leaf plots and pie charts. To find out about bar charts, histograms and box-whisker plots go to the graphs1 page.


What is R?

R is an open-source (GPL) statistical environment modelled after S and S-Plus. The S language was first developed in the mid-1970s at Bell Laboratories (then part of AT&T). The R project was started by Robert Gentleman and Ross Ihaka (hence the name, R) of the Statistics Department of the University of Auckland in the early 1990s, and R was released under the GPL in 1995. It quickly gained a widespread audience. It is currently maintained by the R core-development team, a hard-working, international team of volunteer developers. The R project web page is the main site for information on R. At this site are directions for obtaining the software, accompanying packages and other sources of documentation.

R is a powerful statistical program but it is first and foremost a programming language. Many routines have been written for R by people all over the world and made freely available from the R project website as "packages". However, the basic installation (for Linux, Windows or Mac) contains a powerful set of tools for most purposes.

Because R is a programming language it can seem a bit daunting; you have to type in commands to get it to work. However, it does have a Graphical User Interface (GUI) to make things easier. You can also copy and paste text from other applications into it (e.g. word processors). So, if you have a library of these commands it is easy to pop in the ones you need for the task at hand. That is the purpose of this web page: to provide a library of basic commands that the user can copy and paste into R to perform a variety of statistical analyses.


Introduction to Graphing

R has great graphical power but it is not a point and click interface. This means that you must use typed commands to get it to produce the graphs you desire. This can be a bit tedious at first but once you have the hang of it you can save a list of useful commands as text that you can copy and paste into the R command line.


Scatter Plots

A scatter plot is used when you have two variables to plot against one another. R has a basic command to perform this task. The command is plot(). As usual with R there are many additional parameters that you can add to customise your plots.

The basic command is:

plot(x, y)

Where x is the name of your x-variable and y is the name of your y-variable. This is fine if you have two separate variables, but if they are columns in a bigger data set then R will not find them unless you first attach(data.file) your data set. A more powerful command is:

plot(y ~ x, data= your.data)

Note the use of the model syntax. This model syntax is used widely in R, for example when setting up ANOVA and regression analyses (see also its use in the box-whisker plot).

R comes with a number of data sets built-in; these are used in the examples and can be useful to 'play with'. For example the data set cars contains two variables speed and dist.

To see a basic scatter plot try the following:

plot(dist ~ speed, data= cars)
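As a rough illustration of the two calling styles described above, the following sketch draws the same scatter plot twice from the built-in cars data. The use of the $ operator in the first call is an alternative to attach() that is not mentioned in the original text; it is included here only as one possible way of reaching a column inside a data set.

```r
# Style 1: pass the two vectors directly.
# The $ operator picks a column out of the data frame, so attach() is not needed.
plot(cars$speed, cars$dist)

# Style 2: model (formula) syntax, written as y ~ x.
# The data set is named once and the variables are looked up inside it.
plot(dist ~ speed, data = cars)
```

Both calls plot the same points. The axis labels differ: the first call labels the axes with the full expressions (cars$speed and cars$dist), whereas the formula version uses the plain variable names.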

This basic scatter takes the axes labels from the variables and uses open circles as the plotting symbol. As usual with R we have a wealth of additional commands at our disposal to beef up the display. A useful additional command is to add a line of best-fit. This is a command that adds to the current plot (like the title() command). For the above example we'd type:

abline(lm(dist ~ speed, data= cars))

The basic command uses abline(a, b), where a= intercept and b= slope. Here we use a linear model command to calculate the best-fit equation for us (try typing the lm() command separately; you get the intercept and slope).
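To see how the two forms of abline() relate, the sketch below pulls the intercept and slope out of the linear model and passes them explicitly. The object names fit and coefs are chosen here for illustration only.

```r
# Fit the linear model and extract its coefficients
fit <- lm(dist ~ speed, data = cars)
coefs <- coef(fit)   # first element is the intercept, second is the slope
print(coefs)

# Draw the scatter plot, then add the line using explicit intercept and slope
plot(dist ~ speed, data = cars)
abline(a = coefs[1], b = coefs[2])
```

This should draw the same line as abline(lm(dist ~ speed, data= cars)); passing the model object directly is simply a shortcut.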

If we combine this with a couple of extra lines we can produce a better looking plot:

plot(dist ~ speed, data= cars, xlab="Speed", ylab="Distance", col= "blue")
title(main="Scatter plot with best-fit line", font.main= 4)
abline(lm(dist ~ speed, data= cars), col= "red")

 

This illustrates several of the additional commands. We have set the axis labels and the colour of the plotting symbols. Next we added a main title and set the font to bold italic (try other values). Finally we added the best-fit line and made it red.

We can alter the plotting symbol using the parameter pch= n, where n is a simple number. We can also alter the range of the x and y axes using xlim= c(lower, upper) and ylim= c(lower, upper). The size of the plotted points is manipulated using the cex= n parameter, where n = the 'magnification' factor. Here are some commands that illustrate these parameters:

plot(dist ~ speed, data= cars, pch= 19, xlim= c(0,25), ylim= c(-20, 120), cex= 2)
abline(lm(dist ~ speed, data= cars))
title(main="Scatter plot with altered y-axis")

Here the plotting symbol is set to 19 (a solid circle) and expanded by a factor of 2. Both x and y axes have been rescaled. The axis labels have not been specified, so they default to the names of the variables (which are taken from the data set).
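If you want to see what the different pch numbers look like before choosing one, a quick sketch such as the following plots symbols 1 to 25 in a row. The variable name pch.values and the axis settings are arbitrary choices made for this illustration.

```r
# Plot one point for each symbol number, so the symbols can be compared
pch.values <- 1:25
plot(pch.values, rep(1, length(pch.values)),
     pch = pch.values,        # each point uses its own symbol number
     cex = 2,                 # enlarge the symbols so they are easy to see
     xlab = "pch value", ylab = "", yaxt = "n")
title(main = "Plotting symbols 1 to 25")
```

Reading along the x-axis gives the number to use for each symbol; number 19 is the solid circle used in the example above.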


Stem and leaf plots

A very basic yet useful plot is a stem and leaf plot. It is a quick way to represent the distribution of a single sample. The basic command is:

stem(variable)

Here is a vector of numbers saved as the variable test.data:

[1] 2.1 2.6 2.7 3.2 4.1 4.3 5.2 5.1 4.8 1.8 1.4 2.5 2.7 3.1 2.6 2.8
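The text does not show how test.data was entered. One way to create it, assuming you type the values in directly with the combine command c(), is sketched here:

```r
# Enter the sixteen values by hand and store them as test.data
test.data <- c(2.1, 2.6, 2.7, 3.2, 4.1, 4.3, 5.2, 5.1,
               4.8, 1.8, 1.4, 2.5, 2.7, 3.1, 2.6, 2.8)

# Typing the name displays the contents
test.data
```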

To see the stem plot of these data we type:

stem(test.data)

The decimal point is at the |


1 | 48
2 | 1566778
3 | 12
4 | 138
5 | 12

The plot suggests that the data are skewed, with most values piled up in the 2s and a tail towards the higher values, so they do not look normally distributed. This is a useful command for moderately small samples as you can easily re-construct the original data from the plot. For larger samples the barplot function may be used to create a frequency plot. Alternatively a histogram may be more useful.
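As a small sketch of the alternatives, the commands below redraw the same data with a finer stem plot and then as a histogram. The scale= parameter of stem() and the labels passed to hist() are not part of the original example; they are shown only as options you might try.

```r
# The same sixteen values as in the example above
test.data <- c(2.1, 2.6, 2.7, 3.2, 4.1, 4.3, 5.2, 5.1,
               4.8, 1.8, 1.4, 2.5, 2.7, 3.1, 2.6, 2.8)

# A stem plot with the stems split more finely (scale = 2 roughly doubles the length)
stem(test.data, scale = 2)

# A histogram of the same sample; the result is saved so the counts can be inspected
h <- hist(test.data, main = "Histogram of test.data", xlab = "Value")
h$counts
```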


Pie charts

Pie charts are not necessarily the most useful way of displaying data but they remain popular. We can produce pie charts easily in R using the basic command pie()

To start with, get your data organised into a .CSV file. Give the file several columns, each with a title in the first row and a single value (to plot) in the second row. Here is a simple example file:

First,Second,Third,Fourth,Fifth,Sixth
12,16,25,11,6,4
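The page does not show how the file becomes the object pie.data. The sketch below is one possible route, under some assumptions: the file is written to a temporary location here only so that the example runs on its own (in practice you would give read.csv() the path to your own file), and the one-row data set is converted with unlist() because pie() expects a vector of numbers rather than a data frame.

```r
# Write the example file to a temporary location (stands in for your own .CSV file)
csv.file <- tempfile(fileext = ".csv")
writeLines(c("First,Second,Third,Fourth,Fifth,Sixth",
             "12,16,25,11,6,4"), csv.file)

# Read the file; the first row supplies the column names
pie.df <- read.csv(csv.file)

# Turn the single row into a named numeric vector, which is what pie() expects
pie.data <- unlist(pie.df[1, ])
pie.data
```

The names carried over from the column titles are what pie() uses to label the slices.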

To produce a simple pie chart we type the following:

pie(pie.data)

This is a basic chart; we can see that the names of the columns have been appended to each slice. We can add a title in the usual way using the title() command.

By default the slices are presented in anti-clockwise order; we can alter this by adding the simple parameter clockwise= TRUE

The colours are set to pastel shades by default. To alter them you can add a list of colours to the command line in the form col= c("col1", "col2", "col3"). Here is the finished article:

pie(pie.data, clockwise=TRUE, col= c("red", "orange", "yellow", "green", "blue", "purple"))
title(main="Clockwise Pie Chart with custom colours", font.main= 4)

Now we have clockwise slices with our own selection of colours. The title was set with a separate command and the font set to bold italic (try other values).


 