Abbreviate Species Names

This function allows you to create simple abbreviations from species names. The default parameters split a scientific binomial name by taking 4 characters from the genus and 4 from the trivial name. The result is an 8-character name for each species. The function also checks that each abbreviation is unique.

Keywords

Abbreviate, species names, scientific binomial, genus.

Download

You can download the function code using this link (click to view the text, right-click to select download): abbreviate-species-names.R. The file is an R source code file, which is readable by any text editor.

To get the function working for your copy of R you’ll need to use the source() function. Put the abbreviate-species-names.R file in your working directory and type:

source("abbreviate-species-names.R")

Alternatively, you can use:

source(file.choose())

This will open a file browser so you can find and select the file. If you are using RStudio you can use the menu: Code > Source File...

Description

This function allows you to make simple abbreviations for species names. The function splits each name at the space into its genus and trivial name components and takes a fixed number of characters from the start of each. These are joined to form the final abbreviation. The results are checked to ensure that they are all unique.
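
The steps in the description can be traced on a couple of names using base R calls only. This is a minimal sketch of the idea rather than the function itself; the object names (spp_demo, parts, chunks, abbr) are made up for illustration.

```r
# Two example names (illustrative only)
spp_demo <- c("Achillea millefolium", "Agrostis capillaris")

# 1. Split each name at the space into genus and trivial name (gives a list)
parts <- strsplit(spp_demo, split = " ")

# 2. Keep the first 4 characters of each component
chunks <- lapply(parts, substr, start = 1, stop = 4)

# 3. Join the components of each name back together
abbr <- vapply(chunks, paste, character(1), collapse = "")
abbr

# 4. Check for duplicates (FALSE means all abbreviations are unique)
any(duplicated(abbr))
```

The two names come out as "Achimill" and "Agrocapi", and the duplicate check returns FALSE.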

Usage

abbrv(x, split = 4, length = 8)

Arguments

There are three arguments:

x A vector of species names.
split The number of characters to take from each component of the name; the default is 4.
length The total length of the final abbreviation; the default is 8.

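The split argument controls the size of each of the components, whilst the length argument is a final “trim”. It is best if length is double the value of split: a smaller length truncates the abbreviation and a larger one pads it with trailing spaces. A name with a component shorter than split is also padded with spaces to reach length.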
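
The interaction between split and length can be seen by applying the same steps to a single name. The helper below, abbr_one(), is a hypothetical stand-in written for this illustration only; it assumes the split, join, trim and pad sequence described above.

```r
# Hypothetical helper for illustration: abbreviate ONE name
abbr_one <- function(name, split = 4, length = 8) {
  parts   <- strsplit(name, split = " ")[[1]]              # genus, trivial name
  joined  <- paste(substr(parts, 1, split), collapse = "") # chunks glued together
  trimmed <- substr(joined, 1, length)                     # final trim
  format(trimmed, width = length)                          # pad with spaces if short
}

abbr_one("Achillea millefolium")               # length is double split
abbr_one("Achillea millefolium", length = 6)   # shorter length: trivial part is cut
abbr_one("Achillea millefolium", length = 10)  # longer length: trailing spaces added
abbr_one("Poa pratensis")                      # genus shorter than split
```

The four calls give "Achimill", "Achimi", "Achimill  " and "Poaprat " respectively.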

Value

A named character vector. The names are the original species names and the values are the abbreviations. The function also prints the result of a duplicate check: TRUE means that at least two names share an abbreviation, FALSE means that all abbreviations are unique.
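
If the check reports duplicates, the entries involved can be picked out of the result with duplicated(). The sketch below builds a small named vector by hand to stand in for the output of abbrv(); the species and the object names (res, clash) are chosen purely for illustration.

```r
# Stand-in for a result from abbrv(): names are species, values are abbreviations
res <- c("Carex panicea"    = "Carepani",
         "Carex paniculata" = "Carepani",
         "Carex nigra"      = "Carenigr")

any(duplicated(res))   # at least one abbreviation is repeated

# Show every entry that shares its abbreviation with another entry
clash <- res[duplicated(res) | duplicated(res, fromLast = TRUE)]
clash
```

Here the two clashing trivial names differ only from their sixth character onwards, so split would need to be at least 6 (with length raised to match) to separate them.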

See Also

abbreviate(), which carries out more general abbreviation.

Code

Here is the code for the function in full.

## Abbreviations for species names V2
## Mark Gardener 2020
## www.dataanalytics.org.uk
abbrv <- function(x, split = 4, length = 8)
{
  # x = vector of names
  # split = size of chunks
  # length = final name size

  sp <- as.character(x) # make sure x is character vector
  sp.spl <- strsplit(sp, split = " ") # chop into genus + trivial (makes list)
  sp.ab <- lapply(sp.spl, FUN = substr, start = 1, stop = split) # cut down to chunks
  sp.abb <- lapply(sp.ab, paste, collapse = "") # combine genus+trivial
  sp.abs <- substr(sp.abb, start = 1, stop = length) # trim to final length
  sp.nam <- format(sp.abs, width = length) # make sure all same length
  names(sp.nam) <- x # add original species names as names attribute
  cat("Done. Duplicate names... ", any(duplicated(sp.nam)), "\n")
  return(sp.nam)
}
## END

Examples

Here are some examples of the function in operation.

head(spp)
[1] "Achillea millefolium"  "Aegopodium podagraris" "Agrostis capillaris" 
[4] "Agrostis stolonifera"  "Anthriscus sylvestris" "Arctium minus"       

A <- abbrv(spp, split = 4, length = 8)
Done. Duplicate names...  FALSE

head(A)
 Achillea millefolium Aegopodium podagraris   Agrostis capillaris
           "Achimill"            "Aegopoda"            "Agrocapi"
 Agrostis stolonifera Anthriscus sylvestris         Arctium minus
           "Agrostol"            "Anthsylv"            "Arctminu"

paste0(A, collapse = "")
[1] "AchimillAegopodaAgrocapiAgrostolAnthsylvArctminuArrhelatBidecernBracrutaBromhordCalysepiCapsbursCardpratCentnigrCerafontChamanguChenalbuCirsarveCirspaluCratmonoCynocrisDactglomDescflexElytrepeEpilhirsEpilmontFallconvFestrubrFiliulmaFraxexceGaliaparGaliveruGeracoluGeramollGlechedeHedeheliHerasphoHolclanaImpaglanJunceffuJuncinflLathpratLeucvulgLolipereLotucornPhalarunPlanlancPlanmajoPoaprat Poatriv PrunvulgQuerrobuQuerseedRanuacriRanurepeRubufrutRumeacetRumecrisRumeobtuRumesangSalicaprSalifragSiledioiSoncarveSoncaspeStacsylvSympoffiTanavulgTaraseedThuitamaTrifdubiTrifpratTrifrepeUrtidioiVeroarveVicihirs"

## Compare to abbreviate()
B <- abbreviate(spp, minlength = 8)
paste0(B, collapse = "")
[1] "AchllmllAgpdmpdgAgrstscpAgrstsstAnthrscsArctmmnsArrhntheBidnscrnBrchythrBrmshrdcClystgspCpslbrs-CrdmnprtCentrngrCrstmfntChmrnangChnpdmalCrsmarvnCrsmplstCrtgm(s)CynsrscrDctylsglDschmpsfElytrgrpEplbmhrsEplbmmntFllpcnvlFestcrbrFlpndlulFrxne(s)GalmaprnGalimvrmGrnmclmbGernmmllGlchmhdrHdrhl(g)HrclmsphHlcslntsImptnsglJncseffsJncsinflLthyrsprLcnthmmvLolmprnnLtscrnclPhlrsarnPlntglncPlntgmjrPpratnssPtrivilsPrnllvlgQrcsr(s)Qrsdlng/RnnclsacRnnclsrpRbsfa(g)RumxactsRmxcrspsRmxobtsfRmxsngnsSlxcp(s)Slxfr(s)SilendicSnchsarvSnchsaspStchyssySymphytoTnctmvlgTrsdlng/ThdmtmrsTrflmdbmTrflmprtTrflmrpnUrticdicVrncarvnVichirst"

This function is not intended as a complete replacement for abbreviate(), but it can produce more “intuitive” abbreviations for binomial names.

Links

General data science articles:

  • DataAnalytics Knowledge Base. For general topics and articles about data science, including Learning R: the statistical programming language
  • DataAnalytics Tips and Tricks. For articles covering a range of topics in data science, including Using R, Using Excel, quantitative data analysis, predictive data analysis and a lot more besides.

See our Publications Page for an overview of our book on Ecology, Environmental Science and R: the statistical programming language.