Bees and Visits to Garden Flowers

These data are from an undergraduate student project (Open University, Environmental Science) from 2011. They show counts of visits by various bee species to several different garden flowers. The observations were made over several days, at the same time of day (mid-morning).

 

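|         | B.ter | B.hor.pra | B.lap | A.mel | B.pas |
|---------|-------|-----------|-------|-------|-------|
| Cir.vul | 10    | 8         | 18    | 12    | 8     |
| Ech.vul | 1     | 3         | 9     | 13    | 27    |
| Koe.pan | 37    | 19        | 1     | 16    | 6     |
| Med.fal | 5     | 6         | 2     | 9     | 32    |
| Rub.fru | 12    | 4         | 4     | 10    | 23    |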

In the data the species names are abbreviations of their scientific binomial names. You should be able to find the full names quite easily.

Download

You can download the dataset as a CSV file using this link: bees-garden-flowers. Alternatively, you can copy the table to the clipboard and paste it into a spreadsheet.
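If you prefer to work in code rather than a spreadsheet, the sketch below reads the table with Python's standard `csv` module. The CSV layout is an assumption (bee names in the header row, plant names in the first column, with a hypothetical `Plant` label in the corner cell), and the text is supplied inline rather than read from the downloaded file, so you may need to adjust it to match the real file.

```python
import csv
import io

# Assumed layout: header row of bee names, first column of plant names.
# The corner label "Plant" is hypothetical; the real file may differ.
CSV_TEXT = """Plant,B.ter,B.hor.pra,B.lap,A.mel,B.pas
Cir.vul,10,8,18,12,8
Ech.vul,1,3,9,13,27
Koe.pan,37,19,1,16,6
Med.fal,5,6,2,9,32
Rub.fru,12,4,4,10,23
"""


def read_counts(handle):
    """Return (bee names, {plant: {bee: count}}) from a CSV file handle."""
    reader = csv.reader(handle)
    header = next(reader)
    bees = header[1:]
    counts = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        counts[row[0]] = dict(zip(bees, (int(v) for v in row[1:])))
    return bees, counts


bees, counts = read_counts(io.StringIO(CSV_TEXT))
print(bees)
print(counts["Koe.pan"]["B.ter"])
```

To read a file on disk instead, pass an open file handle (for example from `open(path, newline="")`) to `read_counts`.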

Usage

You can use these data to practise or illustrate various topics:

  • Simple summary statistics.
  • Graphical summary (e.g. bar chart).
  • Test of association (e.g. chi-squared test).

Keywords

Invertebrate, pollinator, bee, association, chi squared, graphics, bar chart.

Examples

The following examples will give you a few ideas about how you might explore these data.

Summary statistics

The raw data show counts of bee visits to various plant species. Averages are not really appropriate for a table of counts like this; the most useful summary is the set of row and column totals.

 

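|         | B.ter | B.hor.pra | B.lap | A.mel | B.pas | Sum |
|---------|-------|-----------|-------|-------|-------|-----|
| Cir.vul | 10    | 8         | 18    | 12    | 8     | 56  |
| Ech.vul | 1     | 3         | 9     | 13    | 27    | 53  |
| Koe.pan | 37    | 19        | 1     | 16    | 6     | 79  |
| Med.fal | 5     | 6         | 2     | 9     | 32    | 54  |
| Rub.fru | 12    | 4         | 4     | 10    | 23    | 53  |
| Sum     | 65    | 40        | 34    | 60    | 96    | 295 |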

You could also calculate the proportions of visits, by row, by column, or overall.
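The totals and proportions described above can be sketched in a few lines of plain Python. The variable names here are illustrative only; the counts are copied from the table.

```python
bees = ["B.ter", "B.hor.pra", "B.lap", "A.mel", "B.pas"]
plants = ["Cir.vul", "Ech.vul", "Koe.pan", "Med.fal", "Rub.fru"]

# Rows are plants, columns are bees (same order as the table).
counts = [
    [10, 8, 18, 12, 8],
    [1, 3, 9, 13, 27],
    [37, 19, 1, 16, 6],
    [5, 6, 2, 9, 32],
    [12, 4, 4, 10, 23],
]

row_totals = [sum(row) for row in counts]        # visits per plant
col_totals = [sum(col) for col in zip(*counts)]  # visits per bee
grand_total = sum(row_totals)

# Proportions by row (each row sums to 1), by column, and overall.
row_props = [[n / t for n in row] for row, t in zip(counts, row_totals)]
col_props = [[n / col_totals[j] for j, n in enumerate(row)] for row in counts]
overall_props = [[n / grand_total for n in row] for row in counts]

for plant, total in zip(plants, row_totals):
    print(f"{plant}: {total}")
for bee, total in zip(bees, col_totals):
    print(f"{bee}: {total}")
print(f"Total: {grand_total}")
```

Row proportions answer "which bees visit this flower?", whereas column proportions answer "which flowers does this bee visit?".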

Chi Squared test of association

These data are suitable for a test of association, and the Pearson chi-squared test is the usual choice. The overall result can be summarized as: X² = 120.65, df = 16, p < 0.001. This indicates that the visits are non-random: certain bee species show preferences, both positive and negative, for particular flowers. The Pearson residuals show which associations are positive and which are negative.

 

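|         | B.ter | B.hor.pra | B.lap | A.mel | B.pas |
|---------|-------|-----------|-------|-------|-------|
| Cir.vul | -0.67 | 0.15      | 4.54  | 0.18  | -2.39 |
| Ech.vul | -3.12 | -1.56     | 1.17  | 0.68  | 2.35  |
| Koe.pan | 4.70  | 2.53      | -2.69 | -0.02 | -3.89 |
| Med.fal | -2.00 | -0.49     | -1.69 | -0.60 | 3.44  |
| Rub.fru | 0.09  | -1.19     | -0.85 | -0.24 | 1.39  |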

Positive Pearson residuals indicate a positive association (the bees visit more often than expected), whilst negative values indicate a negative association (the bees visit less often than expected). However, only residuals with an absolute value greater than about 2 are statistically significant.
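As a rough check on these figures, here is a minimal sketch of the calculation in plain Python. Each expected count is (row total × column total) / grand total, each Pearson residual is (observed − expected) / √expected, and the test statistic is the sum of the squared residuals. The helper `chi2_sf_even_df` is not from any library: it is an illustrative function that uses the closed-form upper-tail probability available when the degrees of freedom are even, which happens to be the case here.

```python
import math

plants = ["Cir.vul", "Ech.vul", "Koe.pan", "Med.fal", "Rub.fru"]
bees = ["B.ter", "B.hor.pra", "B.lap", "A.mel", "B.pas"]
counts = [
    [10, 8, 18, 12, 8],
    [1, 3, 9, 13, 27],
    [37, 19, 1, 16, 6],
    [5, 6, 2, 9, 32],
    [12, 4, 4, 10, 23],
]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
n = sum(row_totals)

# Expected counts under the null hypothesis of no association.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson residuals: (observed - expected) / sqrt(expected).
residuals = [
    [(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(counts, expected)
]

chi_sq = sum(r * r for row in residuals for r in row)
df = (len(counts) - 1) * (len(counts[0]) - 1)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-squared probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut needs a positive, even df")
    half = x / 2
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


p_value = chi2_sf_even_df(chi_sq, df)
print(f"X2 = {chi_sq:.2f}, df = {df}, p = {p_value:.3g}")

# List the cells whose residuals exceed the rule-of-thumb threshold of 2.
for plant, row in zip(plants, residuals):
    for bee, r in zip(bees, row):
        if abs(r) > 2:
            print(f"{plant} x {bee}: {r:+.2f}")
```

In practice you would normally use a statistics package for the p-value; the closed form is used here only to keep the sketch free of external dependencies.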

You can see various interesting patterns in the results. For example, honeybees (A.mel) don’t seem to have any particular preferences, whereas all the bumblebees show at least some degree of choice.

Graphics

There are a number of ways to visualize count data. A bar chart (what Excel would call a column chart) is one sensible way. However, if there are large differences in count value, some bars end up very tall compared to others.

Figure 1. Bee visits to some garden flower species.

Rather than attempt to visualize the raw data, it may be better to look at the results of the test. The Pearson residuals are a good indicator of the various associations; they are also more likely to be on a similar scale.

Figure 2. Pearson residuals for bee visits to garden flowers.

The residual plot for the bee data shows the positive and negative associations. Residuals with an absolute value greater than about 2 are statistically significant. The plot here groups the data by bee species, but you could easily switch this around and group by plant species.
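If you want a quick look at the residuals without a plotting package, a text-based bar chart is one option. The sketch below is only an illustration and not how the figures above were produced: it uses the rounded residuals from the table, and the names `bar` and `residual_chart`, the scale, and the `*` marker for values beyond 2 are all arbitrary choices.

```python
bees = ["B.ter", "B.hor.pra", "B.lap", "A.mel", "B.pas"]
plants = ["Cir.vul", "Ech.vul", "Koe.pan", "Med.fal", "Rub.fru"]

# Pearson residuals copied from the table (rows = plants, columns = bees).
residuals = [
    [-0.67, 0.15, 4.54, 0.18, -2.39],
    [-3.12, -1.56, 1.17, 0.68, 2.35],
    [4.70, 2.53, -2.69, -0.02, -3.89],
    [-2.00, -0.49, -1.69, -0.60, 3.44],
    [0.09, -1.19, -0.85, -0.24, 1.39],
]

SCALE = 4   # characters per unit of residual (arbitrary)
WIDTH = 20  # characters available on each side of the zero line


def bar(value):
    """Draw one bar: negative values extend left of '|', positive to the right."""
    n = min(WIDTH, round(abs(value) * SCALE))
    if value < 0:
        return " " * (WIDTH - n) + "#" * n + "|" + " " * WIDTH
    return " " * WIDTH + "|" + ("#" * n).ljust(WIDTH)


def residual_chart(group_by_bee=True):
    """Return chart lines grouped by bee species (default) or by plant species."""
    outer, inner = (bees, plants) if group_by_bee else (plants, bees)
    lines = []
    for i, group in enumerate(outer):
        lines.append(group)
        for j, label in enumerate(inner):
            value = residuals[j][i] if group_by_bee else residuals[i][j]
            flag = " *" if abs(value) > 2 else ""
            lines.append(f"  {label:<10}{bar(value)} {value:+.2f}{flag}")
    return lines


print("\n".join(residual_chart()))
```

Calling `residual_chart(group_by_bee=False)` switches the grouping to plant species.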

References

Open University (2011), Environmental Science (S216). Undergraduate project.
