Hogweed and environmental conditions

These data show counts of the plant hogweed (Heracleum sphondylium) on a road verge in the UK. One sample is adjacent to the road and the other is beside a small brook. There are also two environmental measures, the soil water content (%) and nitrate concentration (mg/l). Each sample has 15 observations.

Table 1. Hogweed and environmental data from two samples at Preston Montford, UK.

H2O NO3 Hog Site
17.02 1.7 0 road
15.16 1 12 road
17.38 0.8 0 road
18.31 1.4 0 road
18.41 1 7 road
18.83 1.5 16 road
17.88 1 12 road
17.89 1.2 8 road
18.4 0.8 2 road
20.16 1.7 1 road
17.01 0.8 1 road
19.13 1.4 0 road
17.31 1.3 1 road
20.65 0.7 3 road
21.31 1 3 road
20.66 0.6 3 brook
20.07 0.8 24 brook
23.35 1.7 5 brook
24.7 1.5 8 brook
20.9 0.7 9 brook
26.33 0.9 23 brook
22.37 1.3 25 brook
20.33 1.9 8 brook
24.09 1.3 6 brook
30.24 1.2 3 brook
25.25 1.3 6 brook
24.47 1.7 3 brook
21.51 1.8 9 brook
25.88 1.8 12 brook
22.74 1.4 1 brook

Download

You can download the dataset as a TXT file using this link: <hogweed-count-env.txt>. The file is tab-separated and will open in a text editor or a spreadsheet. Alternatively, you can copy and paste the data from the preceding table.
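If you prefer to work in code, a tab-separated file like this can be read with a few lines of Python. The following is a minimal sketch, assuming the file has the same header row as Table 1 (H2O, NO3, Hog, Site); the function name `read_hogweed` and the local file path are illustrative assumptions, not part of the dataset.

```python
import csv
import io


def read_hogweed(handle):
    """Parse tab-separated rows into a list of dicts with typed values.

    Assumes the header row is: H2O, NO3, Hog, Site (as in Table 1).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for rec in reader:
        rows.append({
            "H2O": float(rec["H2O"]),   # soil water content (%)
            "NO3": float(rec["NO3"]),   # nitrate concentration (mg/l)
            "Hog": int(rec["Hog"]),     # hogweed count
            "Site": rec["Site"],        # "road" or "brook"
        })
    return rows


# Stand-in for the downloaded file: three rows copied from Table 1.
sample = (
    "H2O\tNO3\tHog\tSite\n"
    "17.02\t1.7\t0\troad\n"
    "15.16\t1\t12\troad\n"
    "20.66\t0.6\t3\tbrook\n"
)
rows = read_hogweed(io.StringIO(sample))
print(len(rows), rows[0])

# With the real file (the path here is an assumption):
# with open("hogweed-count-env.txt", newline="") as fh:
#     rows = read_hogweed(fh)
```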

Usage

You can use these data to practice/illustrate various topics:

  • Data distribution (e.g. are the data normally distributed?).
  • Pivot Tables.
  • Summary statistics.
  • Differences hypothesis test (e.g. Wilcoxon rank sum test).
  • Correlation (e.g. are soil water and nitrate content related?).
  • Graphics (e.g. box-whisker plot).

Keywords:

Plant, hogweed, soil moisture, soil nitrate, water, count, U test, differences, data distribution, correlation.

Examples

The following examples will give you a few ideas about how you might explore or use these data.

Data distribution

There are three response variables to explore: H2O, NO3 and Hog. You can use histograms to visualise the distribution of each variable within each sample (there are two sample sites, road and brook). You can also use a Shapiro-Wilk test, whose null hypothesis is that the data come from a normal distribution, so a small p-value indicates a departure from normality.

   Site       H2O       NO3         Hog
1 brook 0.3168185 0.2836113 0.005554544
2  road 0.7921649 0.1912800 0.004531264

The p-values from the Shapiro-Wilk tests show that the environmental variables do not differ significantly from a normal distribution (all p > 0.05), but the counts of individual plants do depart from normality (p < 0.01 at both sites).
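To look at the shape of the count data without any plotting software, a rough text histogram is enough. The sketch below uses only the Python standard library; the bin width of 5 and the function name `text_histogram` are arbitrary choices for illustration. It does not carry out a Shapiro-Wilk test; for that you would need a statistics package (for example, `scipy.stats.shapiro` in Python).

```python
from collections import Counter

# Hogweed counts copied from Table 1.
hog = {
    "road": [0, 12, 0, 0, 7, 16, 12, 8, 2, 1, 1, 0, 1, 3, 3],
    "brook": [3, 24, 5, 8, 9, 23, 25, 8, 6, 3, 6, 3, 9, 12, 1],
}


def text_histogram(values, width=5):
    """Return one text line per bin, each '#' standing for one observation."""
    # Map every value to the lower edge of its bin, then count per bin.
    counts = Counter((v // width) * width for v in values)
    lines = []
    for start in range(0, max(counts) + width, width):
        label = f"{start:2d}-{start + width - 1:2d}"
        lines.append(f"{label} | " + "#" * counts[start])
    return lines


for site, values in hog.items():
    print(site)
    print("\n".join(text_histogram(values)))
```

A distribution piled up against zero with a long right-hand tail, which is common for count data, is a visual hint that a normal distribution is a poor description.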

Pivot Tables

If you open the data using a spreadsheet you could use a Pivot Table to summarise the data. However, the summary statistics you can apply are limited in a Pivot Table.

Data summary

Each of the three response variables can be summarised for each of the two samples. Use parametric summaries (e.g. mean and standard deviation) for H2O and NO3, but non-parametric summaries (e.g. median and inter-quartile range) for the Hog data.
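As an illustration, the summaries can be computed per site with Python's built-in `statistics` module. This is a sketch rather than a prescribed method: the data are copied from Table 1, the helper name `summarise` is made up for the example, and the quartiles use the "inclusive" method, which is only one of several quartile conventions (other software may give slightly different values).

```python
import statistics

# Data copied from Table 1, split by site.
h2o = {
    "road": [17.02, 15.16, 17.38, 18.31, 18.41, 18.83, 17.88, 17.89,
             18.4, 20.16, 17.01, 19.13, 17.31, 20.65, 21.31],
    "brook": [20.66, 20.07, 23.35, 24.7, 20.9, 26.33, 22.37, 20.33,
              24.09, 30.24, 25.25, 24.47, 21.51, 25.88, 22.74],
}
no3 = {
    "road": [1.7, 1.0, 0.8, 1.4, 1.0, 1.5, 1.0, 1.2,
             0.8, 1.7, 0.8, 1.4, 1.3, 0.7, 1.0],
    "brook": [0.6, 0.8, 1.7, 1.5, 0.7, 0.9, 1.3, 1.9,
              1.3, 1.2, 1.3, 1.7, 1.8, 1.8, 1.4],
}
hog = {
    "road": [0, 12, 0, 0, 7, 16, 12, 8, 2, 1, 1, 0, 1, 3, 3],
    "brook": [3, 24, 5, 8, 9, 23, 25, 8, 6, 3, 6, 3, 9, 12, 1],
}


def summarise(site):
    """Mean and SD for the soil variables; median and IQR for the counts."""
    q1, _, q3 = statistics.quantiles(hog[site], n=4, method="inclusive")
    return {
        "h2o_mean": statistics.mean(h2o[site]),
        "h2o_sd": statistics.stdev(h2o[site]),    # sample SD (n - 1)
        "no3_mean": statistics.mean(no3[site]),
        "no3_sd": statistics.stdev(no3[site]),
        "hog_median": statistics.median(hog[site]),
        "hog_iqr": q3 - q1,
    }


summary = {site: summarise(site) for site in ("road", "brook")}
for site, stats in summary.items():
    print(site, {name: round(value, 2) for name, value in stats.items()})
```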

Differences test

A simple differences test can be used to explore differences between samples. For H2O and NO3 the Student’s t-test is appropriate. For the Hog data you will need a non-parametric alternative, such as the U-test (Wilcoxon Rank Sum).
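The two test statistics can be computed by hand to see what each test measures. The sketch below, using only the Python standard library, calculates the pooled-variance Student's t statistic for H2O and the Mann-Whitney U statistic (the Wilcoxon rank sum test) for Hog, with tied values given their average rank. The function names are illustrative, and the code stops at the test statistics: converting them to p-values needs the t and U reference distributions, which a statistics package provides (for example `scipy.stats.ttest_ind` and `scipy.stats.mannwhitneyu`).

```python
import math
import statistics

# Data copied from Table 1.
h2o = {
    "road": [17.02, 15.16, 17.38, 18.31, 18.41, 18.83, 17.88, 17.89,
             18.4, 20.16, 17.01, 19.13, 17.31, 20.65, 21.31],
    "brook": [20.66, 20.07, 23.35, 24.7, 20.9, 26.33, 22.37, 20.33,
              24.09, 30.24, 25.25, 24.47, 21.51, 25.88, 22.74],
}
hog = {
    "road": [0, 12, 0, 0, 7, 16, 12, 8, 2, 1, 1, 0, 1, 3, 3],
    "brook": [3, 24, 5, 8, 9, 23, 25, 8, 6, 3, 6, 3, 9, 12, 1],
}


def student_t(x, y):
    """Pooled-variance (Student's) t statistic for two independent samples."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(x) - statistics.mean(y)) / std_err


def average_ranks(values):
    """Rank values from 1 upwards; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1   # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for x, U for y) from the rank sum of the pooled samples."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    return u_x, n1 * n2 - u_x


t_h2o = student_t(h2o["road"], h2o["brook"])
u_road, u_brook = mann_whitney_u(hog["road"], hog["brook"])
print("t (H2O, road vs brook):", round(t_h2o, 3))
print("U (Hog): road =", u_road, "brook =", u_brook)
```

The smaller of the two U values is the one usually compared against critical values; the two always sum to the product of the sample sizes (here 15 × 15 = 225).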

Correlation

You might also consider looking at correlations between the variables. For example, is there a link between the two soil factors?
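A rank-based (Spearman) correlation between soil water and nitrate can be sketched in plain Python: rank each variable, giving tied values their average rank, then take the ordinary Pearson correlation of the ranks. The function names are illustrative, and pooling all 30 observations from both sites is an assumption made for this example; you could equally analyse each site separately.

```python
import math

# H2O and NO3 copied from Table 1 (15 road rows, then 15 brook rows).
h2o = [17.02, 15.16, 17.38, 18.31, 18.41, 18.83, 17.88, 17.89,
       18.4, 20.16, 17.01, 19.13, 17.31, 20.65, 21.31,
       20.66, 20.07, 23.35, 24.7, 20.9, 26.33, 22.37, 20.33,
       24.09, 30.24, 25.25, 24.47, 21.51, 25.88, 22.74]
no3 = [1.7, 1.0, 0.8, 1.4, 1.0, 1.5, 1.0, 1.2,
       0.8, 1.7, 0.8, 1.4, 1.3, 0.7, 1.0,
       0.6, 0.8, 1.7, 1.5, 0.7, 0.9, 1.3, 1.9,
       1.3, 1.2, 1.3, 1.7, 1.8, 1.8, 1.4]


def average_ranks(values):
    """Rank values from 1 upwards; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1   # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


rho = spearman_rho(h2o, no3)
print("Spearman rho (H2O vs NO3, both sites pooled):", round(rho, 3))
```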

Graphics

There are several potentially useful graphics you could employ:

  • Histogram for data distribution.
  • Box-whisker plot for differences between samples.
  • Scatter plots for correlations between variables.

Scatter plot matrix of correlations (Spearman Rho).

The scatter plot matrix shows each pairwise scatter plot. The red lines show a polynomial scatterplot smoother to help visualise the relationships. The lower triangle shows the correlation statistics (r = Spearman correlation coefficient, p = p-value, n = sample size).

References

Undergraduate field project (2010). SXR216: Environmental Science, Open University.

Links

General data science articles:

  • DataAnalytics Knowledge Base. For general topics and articles about data science, including Learning R: the statistical programming language.
  • DataAnalytics Tips and Tricks. For articles covering a range of topics in data science, including Using R, Using Excel, quantitative data analysis, predictive data analysis and a lot more besides.

See our Publications Page for an overview of our book on Ecology, Environmental Science and R: the statistical programming language.