Abbreviate Species Names
This function allows you to create simple abbreviations from species names. The default parameters split a scientific binomial name by taking 4 characters from the genus and 4 from the trivial name. The result is an 8-character name for each species. The function also checks that each abbreviation is unique.
Keywords
Abbreviate, species names, scientific binomial, genus.
Download
You can download the function code using this link (click to view the text, right-click to select download): abbreviate-species-names.R. The file is an R source code file, which is readable by any text editor.
To get the function working for your copy of R you’ll need to use the source() function. Put the abbreviate-species-names.R file in your working directory and type:
source("abbreviate-species-names.R")
Alternatively, you can use:
source(file.choose())
This will open a file browser so you can find and select the file. If you are using RStudio you can use the menu: Code > Source File...
Description
This function allows you to make simple abbreviations for species names. The function splits a name into genus and trivial name components and returns a fixed number of elements from each. These are joined to form the final abbreviation. The results are checked to ensure that they are all unique.
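The steps described above can be traced by hand for a single name. The following is a minimal sketch using base R only; the variable names are illustrative and it assumes the default of 4 characters per component.

```r
# Hypothetical single name, traced through the same steps by hand.
name <- "Achillea millefolium"

parts  <- strsplit(name, split = " ")[[1]]     # split into genus and trivial name
chunks <- substr(parts, start = 1, stop = 4)   # first 4 characters of each part
abbr   <- paste(chunks, collapse = "")         # join the chunks together

print(parts)
print(chunks)
print(abbr)
```

The function applies the same idea to every element of a vector and then checks the results for duplicates.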
Usage
abbrv(x, split = 4, length = 8)
Arguments
There are three arguments:
x | A vector of species names.
split | The required number of characters for each element of the split; the default is 4.
length | The required total length; the default is 8.
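The split argument controls the size of each of the components, whilst the length argument is a final “trim”. It is best if length is double the value of split: a smaller length truncates the abbreviation and a larger length pads it with trailing spaces. A name with a genus or trivial name shorter than split gives a shorter abbreviation, which is also padded with trailing spaces (for example "Poaprat ").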
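To see what a mismatch between the two arguments does, here is a small sketch that applies the final trim and padding steps to an already joined abbreviation. The values are illustrative only, and "Poa pratensis" is an assumed input chosen because its genus is shorter than 4 characters.

```r
# Illustrative values only; "joined" stands for a genus + trivial name
# abbreviation that was built with split = 4.
joined <- "Agrocapi"

too_short <- format(substr(joined, 1, 6), width = 6)    # length = 6: truncated
too_long  <- format(substr(joined, 1, 10), width = 10)  # length = 10: padded

# A genus shorter than split leaves a gap that is padded at the end.
short_genus <- paste(substr(c("Poa", "pratensis"), 1, 4), collapse = "")
padded      <- format(short_genus, width = 8)

print(c(too_short, too_long, padded))
print(nchar(c(too_short, too_long, padded)))
```

If the trailing spaces are unwanted, trimws() in base R will remove them, at the cost of the abbreviations no longer being a uniform width.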
Value
A named character vector. The names are the original species names and the values are the abbreviations. A check for uniqueness is carried out and the result is displayed: TRUE means that there are duplicates, FALSE means that all the abbreviations are unique.
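The function only reports whether duplicates exist, not which names are involved. One possible way to find them afterwards is sketched below; the vector is a hand-made, hypothetical stand-in with the same shape as the function's result, not output from the example data.

```r
# Hypothetical result vector in the same shape that abbrv() returns:
# the names are the species, the values are the abbreviations.
ab <- c("Rumex acetosa"    = "Rumeacet",
        "Rumex acetosella" = "Rumeacet",
        "Rumex crispus"    = "Rumecris")

print(any(duplicated(ab)))   # the same check that the function reports

# Flag every member of a duplicated group, not only the later repeats.
clash <- duplicated(ab) | duplicated(ab, fromLast = TRUE)
print(names(ab)[clash])
```

Once the clashing names are known, increasing split (and length to match) for a re-run is one way to separate them.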
See Also
abbreviate(), which carries out more general abbreviation.
Code
Here is the code for the function in full.
## Abbreviations for species names V2
## Mark Gardener 2020
## www.dataanalytics.org.uk

abbrv <- function(x, split = 4, length = 8) {
  # x = vector of names
  # split = size of chunks
  # length = final name size

  sp <- as.character(x) # make sure x is character vector
  sp.spl <- strsplit(sp, split = " ") # chop into genus + trivial (makes list)
  sp.ab <- lapply(sp.spl, FUN = substr, start = 1, stop = split) # cut down to chunks
  sp.abb <- lapply(sp.ab, paste, collapse = "") # combine genus+trivial
  sp.abs <- substr(sp.abb, start = 1, stop = length) # trim to final length
  sp.nam <- format(sp.abs, width = length) # make sure all same length
  names(sp.nam) <- x # add original species names as names attribute
  cat("Done. Duplicate names... ", any(duplicated(sp.nam)), "\n")
  return(sp.nam)
}

## END
Examples
Here are some examples of the function in operation.
head(spp)
[1] "Achillea millefolium"  "Aegopodium podagraris" "Agrostis capillaris"
[4] "Agrostis stolonifera"  "Anthriscus sylvestris" "Arctium minus"

A <- abbrv(spp, split = 4, length = 8)
Done. Duplicate names... FALSE

head(A)
 Achillea millefolium Aegopodium podagraris   Agrostis capillaris
           "Achimill"            "Aegopoda"            "Agrocapi"
 Agrostis stolonifera Anthriscus sylvestris         Arctium minus
           "Agrostol"            "Anthsylv"            "Arctminu"

paste0(A, collapse = "")
[1] "AchimillAegopodaAgrocapiAgrostolAnthsylvArctminuArrhelatBidecernBracrutaBromhordCalysepiCapsbursCardpratCentnigrCerafontChamanguChenalbuCirsarveCirspaluCratmonoCynocrisDactglomDescflexElytrepeEpilhirsEpilmontFallconvFestrubrFiliulmaFraxexceGaliaparGaliveruGeracoluGeramollGlechedeHedeheliHerasphoHolclanaImpaglanJunceffuJuncinflLathpratLeucvulgLolipereLotucornPhalarunPlanlancPlanmajoPoaprat Poatriv PrunvulgQuerrobuQuerseedRanuacriRanurepeRubufrutRumeacetRumecrisRumeobtuRumesangSalicaprSalifragSiledioiSoncarveSoncaspeStacsylvSympoffiTanavulgTaraseedThuitamaTrifdubiTrifpratTrifrepeUrtidioiVeroarveVicihirs"

## Compare to abbreviate()
B <- abbreviate(spp, min = 8)
paste0(B, collapse = "")
[1] "AchllmllAgpdmpdgAgrstscpAgrstsstAnthrscsArctmmnsArrhntheBidnscrnBrchythrBrmshrdcClystgspCpslbrs-CrdmnprtCentrngrCrstmfntChmrnangChnpdmalCrsmarvnCrsmplstCrtgm(s)CynsrscrDctylsglDschmpsfElytrgrpEplbmhrsEplbmmntFllpcnvlFestcrbrFlpndlulFrxne(s)GalmaprnGalimvrmGrnmclmbGernmmllGlchmhdrHdrhl(g)HrclmsphHlcslntsImptnsglJncseffsJncsinflLthyrsprLcnthmmvLolmprnnLtscrnclPhlrsarnPlntglncPlntgmjrPpratnssPtrivilsPrnllvlgQrcsr(s)Qrsdlng/RnnclsacRnnclsrpRbsfa(g)RumxactsRmxcrspsRmxobtsfRmxsngnsSlxcp(s)Slxfr(s)SilendicSnchsarvSnchsaspStchyssySymphytoTnctmvlgTrsdlng/ThdmtmrsTrflmdbmTrflmprtTrflmrpnUrticdicVrncarvnVichirst"
This function is not intended to replace abbreviate() entirely, but it can produce more “intuitive” abbreviations.
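For a quick side-by-side on your own names, the following sketch compares a 4 + 4 abbreviation with the general-purpose abbreviate() from base R. The two-name vector is a hypothetical stand-in for your data, and the 4 + 4 step is rebuilt inline here rather than calling the function from this page. Note that minlength is the full name of the abbreviate() argument, which min = 8 matches partially.

```r
# Hypothetical two-name vector standing in for real data.
spp_demo <- c("Agrostis capillaris", "Agrostis stolonifera")

# 4 + 4 abbreviation, rebuilt inline: first 4 characters of each word, joined.
four_four <- vapply(
  strsplit(spp_demo, split = " "),
  function(p) paste(substr(p, 1, 4), collapse = ""),
  character(1)
)

# General-purpose abbreviation from base R.
general <- abbreviate(spp_demo, minlength = 8)

print(four_four)
print(general)
```

The 4 + 4 form keeps the start of both the genus and the trivial name, which is what tends to make it easier to read back.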
Links
Data examples:
- Statistics for Ecologists: support files and example data.
- Statistics for Ecologists: exercises and notes.
- Community Ecology: support files and notes.
- Managing Data using Excel: support files and example data.
Custom R functions:
- Community Ecology: custom R functions.
General data science articles:
- DataAnalytics Knowledge Base. For general topics and articles about data science, including Learning R: the statistical programming language.
- DataAnalytics Tips and Tricks. For articles covering a range of topics in data science, including Using R, Using Excel, quantitative data analysis, predictive data analysis and a lot more besides.
See our Publications Page for an overview of our book on Ecology, Environmental Science and R: the statistical programming language.