Dr. Mark Gardener



Using R for statistical analyses - Introduction

This page is intended to be a help in getting to grips with the powerful statistical program called R. It is not intended as a course in statistics (see here for details about those). If you have an analysis to perform I hope that you will be able to find the commands you need here and copy/paste them into R to get going.

I run training courses in data management, visualisation and analysis using Excel and R: The Statistical Programming Environment. From 2013 courses will be held at The Field Studies Council Field Centre at Slapton Ley in Devon. Alternatively I can come to you and provide the training at your workplace. See details on my Courses Page.

On this page learn how to create data files, read them into R and generally get ready to perform analyses. Also find out about getting further help and documentation.

See also: R Courses | R Tips, Tricks & Hints | MonogRaphs | Writer's bloc

My publications about R

See my books about R on my Publications page

Statistics for Ecologists | Beginning R | The Essential R Reference | Community Ecology | Managing Data


Statistics for Ecologists is available now from Pelagic Publishing. Get a 20% discount using the S4E20 code!
Beginning R is available from Wrox the publisher or see the entry on Amazon.co.uk.
The Essential R Reference is available from the publisher Wiley now (see the entry on Amazon.co.uk)!
Community Ecology is available now from Pelagic Publishing.

Managing Data Using Excel is available now from Pelagic Publishing. Get £5 discount using the MDUE20 code!

I have more projects in hand - visit my Publications page from time to time. You might also like my random essays on selected R topics in MonogRaphs. See also my Writer's Bloc page, details about my latest writing project including R scripts developed for the book.


R is Open Source

R is Free

Get R at the R Project Page

What is R?

R is an open-source (GPL) statistical environment modeled after S and S-Plus. The S language was developed at AT&T Bell Laboratories, starting in the mid-1970s. The R project was started by Robert Gentleman and Ross Ihaka (hence the name, R) of the Statistics Department of the University of Auckland in the early 1990s and was released as open-source software in 1995. It has quickly gained a widespread audience. It is currently maintained by the R core-development team, a hard-working, international team of volunteer developers. The R project web page is the main site for information on R. At this site are directions for obtaining the software, accompanying packages and other sources of documentation.

R is a powerful statistical program but it is first and foremost a programming language. Many routines have been written for R by people all over the world and made freely available from the R project website as "packages". However, the basic installation (for Linux, Windows or Mac) contains a powerful set of tools for most purposes.

Because R is a programming language it can seem a bit daunting; you have to type in commands to get it to work. However, it does have a Graphical User Interface (GUI) to make things easier. You can also copy and paste text from other applications into it (e.g. word processors). So, if you have a library of these commands it is easy to pop in the ones you need for the task at hand. That is the purpose of this web page: to provide a library of basic commands that you can copy and paste into R to perform a variety of statistical analyses.







R maintains a list of previous commands. Use the up and down arrows to scroll through them. You can then use the left and right arrows to edit and modify the command.


Once you have installed R and run the program you will see an opening window and a message along these lines:

R : Copyright 2006, The R Foundation for Statistical Computing
Version 2.3.1 (2006-06-01)
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Previously saved workspace restored]


The > is the "prompt"; this is the point where you type in commands (or paste them in from somewhere else). The window you see is part of the GUI and some operations are possible from the menus (including quit). When you quit you will generally be asked if you wish to save the workspace. R stores a list of commands and any data sets that are loaded, so it can be pretty useful to say "yes" and save the workspace. The command history is available by using the up and down arrows: you can scroll back through previous commands and edit them if needed. You can copy items from previous commands, or in fact from any window on the screen, and paste them into the current command line. You can also use the left and right arrow keys to move through the current command.



Data files

You are going to need some data to perform your analyses on. You can type your data into R directly but it is usually much better to use a separate program to hold the information. A spreadsheet is an invaluable tool for this as you can manipulate the data quite easily. R can read plain text files in various formats (e.g. tab delimited, space delimited, comma delimited) and most spreadsheets can save data in these ways. The most useful is comma delimited (.CSV), which R can handle quite easily.

The layout of the data file will depend upon the analysis you are going to run. You can create a CSV file in a spreadsheet or a word processor; a spreadsheet is the most useful tool as you can easily manipulate the data later on.

Multiple variables
In this case we have multiple variables arranged in columns. The rows are the replicates. This sort of arrangement is useful for analysis of variance and multiple regression. However, it can also be used for comparing just two factors (you don't need to use all the information) as in a t-test.

Rows and columns
In this case we have headings on both columns and rows (e.g. columns headed Site a, Site b, Site c and Site d, and a row labelled Spp 1). We have the same information as above and a bit extra. The data may be used for the same kinds of analysis as before but could also be used for tests of association (e.g. Chi-squared) or for ordination.

Simple two-factor
In this instance we have two columns (samples) but the number of replicates is different. R reads the file as a rectangular frame and blank cells are recorded as NA. This may have to be taken into account in some analyses but for now we can assume it is not a problem.

You may also have data merely as numbers without any labels at all. This is not really to be recommended, although R will assign row and column numbers to the data.
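As a rough illustration of these layouts, the sketch below builds two tiny data files as text and reads them with read.csv(). The data values and the names Site.a to Site.d, Spp1 and Spp2 are made up for the example, and the text= argument is used only so that the example runs without a file on disk.

```r
# Hypothetical data: two samples with unequal numbers of replicates.
# The empty field in the last row is what a spreadsheet writes for a blank cell.
two.sample.csv <- "Upper,Lower
3,5
4,6
8,"

# read.csv() normally takes a filename; text= passes the file contents directly
field <- read.csv(text = two.sample.csv)
print(field)                 # the blank cell is shown as NA
print(is.na(field$Lower))    # TRUE marks the missing replicate

# Hypothetical data with both column headings and row headings
sites.csv <- "Species,Site.a,Site.b,Site.c,Site.d
Spp1,1,0,4,2
Spp2,0,3,1,5"

# row.names = 1 tells R that the first column holds the row labels
sites <- read.csv(text = sites.csv, row.names = 1)
print(sites)
```

Functions such as mean() return NA when a column contains missing values unless you add na.rm = TRUE, which is one way the NA cells "may have to be taken into account".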




R stores everything as variables. Your variable names can contain letters and numbers but the only punctuation marks allowed are the full stop and the underscore.

Inputting data

The next step is to get your data into R. If you have saved your data in a .CSV file then you can use the read.csv(filename) command to get the information. You need to tell R where to store the data and to do this you assign it a name. All names should start with a letter (otherwise it is a number of course!). You can use a period or an underscore (e.g. test.1 or test_1) but no other punctuation marks. R is case sensitive so the variable test is different from Test or teSt.

What you need to do is to copy the appropriate command into the clipboard. Then paste into R at the > prompt. You can then edit the command as you like and when ready press the enter key.
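Here is a small sketch of assigning names and of case sensitivity. The values are arbitrary; make.names() is a standard R function that is included only to show how R would repair a name that is not allowed.

```r
# Assign values to names; both = and <- work for assignment at the prompt
test = 10
Test = 20
test.1 <- 30

# R is case sensitive, so test and Test are two separate variables
print(test)
print(Test)
print(test == Test)

# make.names() shows how R would alter an invalid name (one starting with a digit)
print(make.names("1test"))
```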

Reading data files

variable = read.csv(filename)
This command reads a .CSV file into R. You need to specify the exact filename.

variable = read.csv(file.choose())
This command reads a .CSV file but the file.choose() part opens up an explorer type window that allows you to select a file from your computer. By default R will take the first row as the variable names.

variable = read.csv(file.choose(), header=T)
This reads a .CSV file, allowing you to select the file; the header is set explicitly. If you change to header=F then the first row will be treated like the rest of the data and not as a label.

variable = read.csv(file.choose(), row.names=#)
In this case you can tell R that a specified column contains row names. This is likely to be the first so edit the # to 1.

To get a file into R with basic columns of data and their labels use:
variable = read.csv(file.choose(), header=T)

To get a file into R with column headings and row headings use:
variable = read.csv(file.choose(), row.names=1)

N.B. There are occasions when R won't like your data file. Check the file carefully. In some cases the addition of an extra linefeed at the end will sort out the issue. To do this open the file in a word processor and make sure that non-printing characters are displayed. Add the extra carriage return and save the file.
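If you want to try the reading commands without hunting for a file, something like the following sketch should work. It writes a small made-up data set to a temporary .CSV file and reads it back in three ways. The column names Site and Wood and all the values are invented for the example; file.choose() is left as a comment because it needs someone at the keyboard.

```r
# Write a small, made-up data set to a temporary .CSV file
csv.file <- tempfile(fileext = ".csv")
writeLines(c("Site,Hedge,Wood",
             "A,4,7",
             "B,6,2",
             "C,5,9"), csv.file)

# Column headings only: the first row supplies the variable names
bats <- read.csv(csv.file, header = TRUE)
print(bats)

# header = FALSE treats the first row as data, so the columns get default names
raw <- read.csv(csv.file, header = FALSE)
print(names(raw))

# Column headings and row headings: the first column supplies the row names
bats.rows <- read.csv(csv.file, row.names = 1)
print(bats.rows)

# Interactive version from the text (opens a file browser, so not run here):
# bats <- read.csv(file.choose(), header = TRUE)
```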

Seeing your data in R

Once you have persuaded R to read your data you will naturally want to check it is there! To view data stored in R you merely type the name of the variable that you stored it as.



When the data file had both row and column headers, typing the variable name shows the data framed by those labels.

When the file only had column headings, R adds a simple number to each row when the data are displayed.

If the file had neither row nor column headings then the columns are also given default labels (V1, V2 and so on when the file is read with read.csv and header=F).


If you wish to view only a single variable (i.e. column) from your data set then you can. Simply add the variable name to the end of the data name along with a dollar sign so: bats$Hedge or field$Upper might be examples from the above two data sets.
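A minimal sketch of viewing a whole data set and a single column follows. The data frame is typed in directly with made-up values, standing in for a file read with read.csv().

```r
# Made-up data standing in for a file read with read.csv()
field <- data.frame(Upper = c(3, 4, 8), Lower = c(5, 6, 2))

print(field)               # the whole data set, with numbered rows
print(field$Upper)         # a single column, returned as a plain vector
print(mean(field$Upper))   # the column can be used directly in a calculation
```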

It is not terribly convenient to have to append the $variable every time you want to do something on a data set. R provides a way to read these variables directly. Here is an example:

field = read.csv(file.choose())
This reads in a .CSV file and assigns it to the variable field. The header is set to "True" by default and we don't have row names so we can use this short version. The file.choose() part opens up an explorer type window and allows us to pick the file from our computer.

attach(field)
This looks at the data set field and reads the names of the variables. It then makes each one available as a variable in its own right. So in the example above we would now have additional variables called Upper, Lower, Old, New.

Now you can look at the overall data set e.g.
> field

You can look at a single factor e.g.
> Upper


So, it is a good habit to read in your data set and then use the attach(data) function immediately. Use meaningful factor names and avoid single letters (e.g. x, y). If you already have a variable of the same name in your workspace, that variable takes precedence and hides (masks) the attached one; R prints a message when this happens. You can avoid confusion by only working on one set of data at once.
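The sketch below shows attach() and its opposite, detach(), on a made-up data set. It also shows with(), a standard R function not otherwise covered on this page, as one possible alternative that gives access to the columns for a single command without attaching anything.

```r
# Made-up data standing in for a file read with read.csv()
field <- data.frame(Upper = c(3, 4, 8), Lower = c(5, 6, 2))

attach(field)     # the columns can now be used by name
print(Upper)
detach(field)     # undo the attach when you have finished

# with() evaluates one expression using the columns of the data set
lower.mean <- with(field, mean(Lower))
print(lower.mean)
```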



What data are loaded?

To see what data, variables etc. are loaded in R you can type a simple command:
> ls()

This lists the variables in memory.

In Windows you can list all the "objects" in memory from the Misc menu on the GUI toolbar.
On a Mac you can do something similar using the Workspace menu. The Mac version also has a "workspace browser". This shows all the variables and their properties (you can also view the items).

In both operating systems you can save the current workspace to a file (you can also read in a previously saved workspace). This will save any data and variables currently in memory (on Windows use the File menu and on the Mac use Workspace).

You can also get a list of the variables for each dataset by typing:

> names(dataset)
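For instance, with one made-up data set and one other variable in memory, a short sketch of these commands might look like this. The names field and count are only for illustration; str() is a standard R function added here as a compact way to see the structure of a data set.

```r
# Made-up objects so that there is something to list
field <- data.frame(Upper = c(3, 4, 8), Lower = c(5, 6, 2))
count <- 42

print(ls())           # names of the objects currently in memory
print(names(field))   # names of the variables (columns) in one data set
str(field)            # compact summary of the data set's structure
```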



Removing data sets

To remove a variable you can type a simple command:
> rm(variable)

This will remove the variable (in this case called variable) from the memory. Variables that were made available with attach(data) do not show up in the list given by ls(). The opposite of attach(data) is detach(data), which removes them again. Note that rm(data) on its own removes the data set but leaves the attached variables in place until you use detach(data).

In Windows you can remove all the "objects" in memory from the Misc menu on the GUI toolbar.
On a Mac you can do something similar using the Workspace menu.
This should be used with caution!
The Mac version also has a "workspace browser". This shows all the variables and their properties as well as allowing you to remove them.
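A short sketch of removing objects from the command line follows, using made-up objects called field and scratch. The rm(list = ls()) line clears everything in memory, so it is left as a comment.

```r
# Made-up objects to remove
field <- data.frame(Upper = c(3, 4, 8), Lower = c(5, 6, 2))
scratch <- 1:10

rm(scratch)                # remove one named object
print(exists("scratch"))   # check whether it is still there

# Detach an attached data set before removing it
attach(field)
detach(field)
rm(field)

# Removes every object in memory; use with caution!
# rm(list = ls())
```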



Help and Documentation

My Publications

See my books about R at my Publications Page:

Statistics for Ecologists using R and Excel. Published December 2011

Beginning R: The Statistical Programming Language. Available wherever great books are sold in June 2012

The Essential R Reference. Published November 2012


There are plenty of sources of help and information regarding R. Most are to be found on the R-Project website. Look under the 'Documentation' section. In the manuals section the "Introduction to R" document is a good start (available as HTML or a PDF). Also very good are:

“Using R for Data Analysis and Graphics - Introduction, Examples and Commentary” by John Maindonald [PDF].
“Simple R” by John Verzani [PDF]

These are available via the 'Contributed Documentation' section.


From 2009 I am going to be running a series of short courses in data analyses for conservation biologists. Some of these courses are based on use of R. The courses all run at the Preston Montford Field Centre in Shropshire, UK. More information can be found here or at the FSC website.

Help within R

The help system within R is comprehensive. There are several ways to access help:

Click on the 'Help' menu. There are a number of options available (depending upon your OS) but the main documentation is in the form of HTML.

If you want help on a specific command you can enter a search directly from the keyboard:

> help(keyword)

A shortcut is to type:

> ?keyword

This is fine if you know the command you want. If you are not sure of the command you can try the following:

> apropos("part.word")

You type in a part.word and R will list all commands that contain that string of letters. For example:

> apropos("rank")
[1] "count.rank" "dsignrank" "psignrank" "qsignrank" "rsignrank" "rank"

This shows that there are actually 6 commands containing "rank"; we can now type help() for any of those to get more detail.

If you run the HTML help you will see a heading entitled "Packages". This will list the packages that you have installed with R. The basic package is 'base' and comes with another called 'stats'. These two form the core of R. If you navigate to one of those you can browse through all the commands available.

R comes with a number of data sets. Many of the help topics come with examples. Usually these involve data sets that are already included. This allows you to copy and paste the commands into the console and see what happens.
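As a small sketch of these ideas, the lines below search for commands by part of their name and look at one of the data sets supplied with R. The choice of the cars data set is arbitrary. The help and example commands are shown as comments because they open the help viewer or print a lot of output; remove the # to try them.

```r
# Open the help page for a command you already know (two equivalent forms)
# help(mean)
# ?mean

# List the commands whose names contain a string of letters
hits <- apropos("rank")
print(hits)

# Run the examples from the bottom of a help page
# example(mean)

# cars is one of the data sets that comes with R
print(head(cars))
print(names(cars))
```

The list returned by apropos() depends on which packages you have loaded, so it may not match the one shown above exactly.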
