Hogweed and environmental conditions
These data show counts of the plant hogweed (Heracleum sphondylium) on a road verge in the UK. One sample is adjacent to the road and the other is beside a small brook. There are also two environmental measures: soil water content (%) and nitrate concentration (mg/l). Each sample has 15 observations.
Table 1. Hogweed counts (Hog), soil water content (H2O, %) and nitrate concentration (NO3, mg/l) from two samples (Site) at Preston Montford, UK.
H2O | NO3 | Hog | Site |
17.02 | 1.7 | 0 | road |
15.16 | 1 | 12 | road |
17.38 | 0.8 | 0 | road |
18.31 | 1.4 | 0 | road |
18.41 | 1 | 7 | road |
18.83 | 1.5 | 16 | road |
17.88 | 1 | 12 | road |
17.89 | 1.2 | 8 | road |
18.4 | 0.8 | 2 | road |
20.16 | 1.7 | 1 | road |
17.01 | 0.8 | 1 | road |
19.13 | 1.4 | 0 | road |
17.31 | 1.3 | 1 | road |
20.65 | 0.7 | 3 | road |
21.31 | 1 | 3 | road |
20.66 | 0.6 | 3 | brook |
20.07 | 0.8 | 24 | brook |
23.35 | 1.7 | 5 | brook |
24.7 | 1.5 | 8 | brook |
20.9 | 0.7 | 9 | brook |
26.33 | 0.9 | 23 | brook |
22.37 | 1.3 | 25 | brook |
20.33 | 1.9 | 8 | brook |
24.09 | 1.3 | 6 | brook |
30.24 | 1.2 | 3 | brook |
25.25 | 1.3 | 6 | brook |
24.47 | 1.7 | 3 | brook |
21.51 | 1.8 | 9 | brook |
25.88 | 1.8 | 12 | brook |
22.74 | 1.4 | 1 | brook |
Download
You can download the dataset as a TXT file using this link: <hogweed-count-env.txt>. The file is tab-separated and will open in a text editor or a spreadsheet. Alternatively, you can copy and paste the data from the preceding table.
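If you prefer to read the data with a script, the following minimal Python sketch parses tab-separated text into a list of records. It assumes the file has a header row with the same column names as Table 1 (H2O, NO3, Hog, Site), which you should check against the downloaded file. The function name parse_hogweed and the three-row sample string are illustrative only.

```python
import csv
import io


def parse_hogweed(text):
    """Parse tab-separated text with columns H2O, NO3, Hog and Site.

    Assumption: the header names match those shown in Table 1.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for rec in reader:
        rows.append({
            "H2O": float(rec["H2O"]),   # soil water content (%)
            "NO3": float(rec["NO3"]),   # nitrate concentration (mg/l)
            "Hog": int(rec["Hog"]),     # hogweed count
            "Site": rec["Site"].strip(),
        })
    return rows


# Three rows copied from Table 1, joined with tabs, as a stand-in for the file.
sample = (
    "H2O\tNO3\tHog\tSite\n"
    "17.02\t1.7\t0\troad\n"
    "15.16\t1\t12\troad\n"
    "20.66\t0.6\t3\tbrook\n"
)
rows = parse_hogweed(sample)
for row in rows:
    print(row)

# To read the downloaded file instead (path assumed to be the working directory):
# with open("hogweed-count-env.txt", encoding="utf-8") as f:
#     rows = parse_hogweed(f.read())
```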
Usage
You can use these data to practise or illustrate various topics:
- Data distribution (e.g. are the data normally distributed?).
- Pivot Tables.
- Summary statistics.
- Differences hypothesis test (e.g. Wilcoxon rank sum test).
- Correlation (e.g. are soil water and nitrate content related?).
- Graphics (e.g. box-whisker plot).
Keywords:
Plant, hogweed, soil moisture, soil nitrate, water, count, U test, differences, data distribution, correlation.
Examples
The following examples will give you a few ideas about how you might explore or use these data.
Data distribution
There are three response variables to explore: H2O, NO3 and Hog. You can use histograms to visualise the distribution of each variable in each of the two samples (road and brook). You can also use a Shapiro-Wilk test, whose null hypothesis is that a sample comes from a normal distribution, to check each sample for departures from normality.
Site | H2O | NO3 | Hog |
brook | 0.3168185 | 0.2836113 | 0.005554544 |
road | 0.7921649 | 0.1912800 | 0.004531264 |
The p-values from the Shapiro-Wilk tests show that the environmental variables do not differ significantly from a normal distribution at either site (all p > 0.05), but the hogweed counts do depart from normality (p < 0.01 at both sites).
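As a quick visual check of the count data, here is a rough sketch of a text histogram for the Hog variable at each site, using only the Python standard library. The bin width of 5 is an arbitrary assumption and binned_counts is a hypothetical helper name. The Shapiro-Wilk test itself is not reproduced here; a statistics library such as SciPy (scipy.stats.shapiro) would be needed for that.

```python
from collections import Counter

# Hogweed counts copied from Table 1.
hog = {
    "road": [0, 12, 0, 0, 7, 16, 12, 8, 2, 1, 1, 0, 1, 3, 3],
    "brook": [3, 24, 5, 8, 9, 23, 25, 8, 6, 3, 6, 3, 9, 12, 1],
}


def binned_counts(values, width=5):
    """Count values in bins [0, width), [width, 2 * width), and so on.

    Returns a dict mapping each non-empty bin's lower edge to its count.
    """
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


width = 5  # assumed bin width; change to taste
for site, values in hog.items():
    print(site)
    for start, n in binned_counts(values, width).items():
        print(f"  {start:2d}-{start + width - 1:2d} | {'#' * n}")
```

Most counts fall in the lowest bins with a long tail to the right, which is the kind of skew that tends to produce a small Shapiro-Wilk p-value.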
Pivot Tables
If you open the data using a spreadsheet you could use a Pivot Table to summarise the data. However, the summary statistics you can apply are limited in a Pivot Table.
Data summary
Each of the three response variables can be summarised across the two samples. You will need parametric summaries (e.g. mean and standard deviation) for H2O and NO3, but non-parametric summaries (e.g. median and inter-quartile range) for the Hog data.
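One way to produce these summaries is sketched below with Python's statistics module. The helper names parametric_summary and rank_summary are illustrative, only H2O and Hog are shown (NO3 would follow the H2O pattern), and the quartiles use the module's default method, which may differ slightly from the values a spreadsheet or R reports.

```python
import statistics

# Values copied from Table 1.
h2o = {
    "road": [17.02, 15.16, 17.38, 18.31, 18.41, 18.83, 17.88, 17.89,
             18.4, 20.16, 17.01, 19.13, 17.31, 20.65, 21.31],
    "brook": [20.66, 20.07, 23.35, 24.7, 20.9, 26.33, 22.37, 20.33,
              24.09, 30.24, 25.25, 24.47, 21.51, 25.88, 22.74],
}
hog = {
    "road": [0, 12, 0, 0, 7, 16, 12, 8, 2, 1, 1, 0, 1, 3, 3],
    "brook": [3, 24, 5, 8, 9, 23, 25, 8, 6, 3, 6, 3, 9, 12, 1],
}


def parametric_summary(values):
    """Mean and sample standard deviation, for roughly normal data."""
    return {"mean": statistics.mean(values), "sd": statistics.stdev(values)}


def rank_summary(values):
    """Median and quartiles, for data that depart from normality."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {"median": statistics.median(values), "q1": q1, "q3": q3}


for site in ("road", "brook"):
    print(site, "H2O", parametric_summary(h2o[site]))
    print(site, "Hog", rank_summary(hog[site]))
```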
Differences test
A simple differences test can be used to explore differences between samples. For H2O and NO3 the Student’s t-test is appropriate. For the Hog data you will need a non-parametric alternative, such as the U-test (Wilcoxon Rank Sum).
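To make the U-test concrete, the sketch below computes the Mann-Whitney U statistic for the Hog counts by direct pair comparison, counting ties as one half. This is an illustration of how the statistic is defined, not a full test: it does not give a p-value, for which you would use statistical software (for example scipy.stats.mannwhitneyu in Python). The function name u_statistic is hypothetical.

```python
# Hogweed counts copied from Table 1.
road = [0, 12, 0, 0, 7, 16, 12, 8, 2, 1, 1, 0, 1, 3, 3]
brook = [3, 24, 5, 8, 9, 23, 25, 8, 6, 3, 6, 3, 9, 12, 1]


def u_statistic(x, y):
    """Mann-Whitney U for sample x against sample y.

    Counts the pairs (a, b) with a > b, scoring tied pairs as 0.5.
    """
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


u_road = u_statistic(road, brook)
u_brook = u_statistic(brook, road)

# The two U values always sum to the number of pairs, n1 * n2.
print("U (road): ", u_road)
print("U (brook):", u_brook)
print("n1 * n2:  ", len(road) * len(brook))
```

The smaller of the two U values is the one usually compared against a critical value or converted to a p-value.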
Correlation
You might also consider looking at correlations between the variables. Is there a link between the two soil factors, for example?
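As a rough sketch of how a rank correlation between the two soil factors could be computed, the code below calculates Spearman's coefficient as the Pearson correlation of mid-ranks (tied values share their average rank). It pools all 30 observations from both sites, which is an assumption made here for illustration; you may prefer to analyse each site separately. The helper names are hypothetical and no p-value is calculated.

```python
import math

# H2O and NO3 values copied from Table 1 (road rows first, then brook).
h2o = [17.02, 15.16, 17.38, 18.31, 18.41, 18.83, 17.88, 17.89, 18.4, 20.16,
       17.01, 19.13, 17.31, 20.65, 21.31, 20.66, 20.07, 23.35, 24.7, 20.9,
       26.33, 22.37, 20.33, 24.09, 30.24, 25.25, 24.47, 21.51, 25.88, 22.74]
no3 = [1.7, 1, 0.8, 1.4, 1, 1.5, 1, 1.2, 0.8, 1.7,
       0.8, 1.4, 1.3, 0.7, 1, 0.6, 0.8, 1.7, 1.5, 0.7,
       0.9, 1.3, 1.9, 1.3, 1.2, 1.3, 1.7, 1.8, 1.8, 1.4]


def mid_ranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the mid-ranks."""
    return pearson(mid_ranks(x), mid_ranks(y))


print("Spearman r (H2O vs NO3, n = 30):", round(spearman(h2o, no3), 3))
```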
Graphics
There are several potentially useful graphics you could employ:
- Histogram for data distribution.
- Box-whisker plot for differences between samples.
- Scatter plots for correlations between variables.
The scatter plot matrix shows each pairwise scatter plot. The red lines show a polynomial scatterplot smoother to help visualise the relationships. The lower triangle shows the correlation statistics (r = Spearman correlation coefficient, p = p-value, n = sample size).
References
Undergraduate field project (2010). SXR216: Environmental Science, Open University.