This file describes the steps required to map the data to Darwin Core Taxon.

1 Setup

Load libraries:

2 Read data

Define data types
Read taxon data
Read literature references

3 Map taxon core

Preview of the data:

Start with record-level terms which contain metadata about the dataset (which is generally the same for all records).

3.1 language

3.2 license

3.3 rightsHolder

3.4 datasetID

3.5 institutionCode

3.6 datasetName

The following terms contain information about the taxon:

3.7 taxonID

3.8 scientificName

The information in scientificName will be a compilation of several fields: sp_genus, sp_species, sp_authority, sp_subtaxon and sp_subtaxon_authority. We paste this information together to generate the field dwc_scientificName. Before we concatenate this information, we clean the authorship information a bit:

Clean authorships:
Paste information together
Remove all NA
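
The concatenation steps above can be sketched in Python as follows. This is a minimal illustration, not the script's actual code: the field order follows the list given earlier, the placeholder record and the helper names `clean_authority` and `build_scientific_name` are hypothetical, and treating "cleaning" as whitespace normalization is an assumption.

```python
import re


def clean_authority(value):
    """Collapse repeated whitespace and strip the ends; missing values stay None.

    Assumption: the real cleaning rules are not described in this file.
    """
    if value is None:
        return None
    cleaned = re.sub(r"\s+", " ", value).strip()
    return cleaned or None


def build_scientific_name(record):
    """Join the name parts in order, skipping missing (None or empty) parts."""
    parts = [
        record.get("sp_genus"),
        record.get("sp_species"),
        clean_authority(record.get("sp_authority")),
        record.get("sp_subtaxon"),
        clean_authority(record.get("sp_subtaxon_authority")),
    ]
    return " ".join(part for part in parts if part)


# Hypothetical placeholder record (not taken from the checklist)
example = {
    "sp_genus": "Aus",
    "sp_species": "bus",
    "sp_authority": " (Smith,   1900) ",
    "sp_subtaxon": None,
    "sp_subtaxon_authority": None,
}
dwc_scientificName = build_scientific_name(example)
```

Skipping missing parts while joining avoids having to strip literal "NA" strings from the result afterwards.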
We use the GBIF nameparser to retrieve nomenclatural information for the scientific names in the checklist.
Show scientific names with nomenclatural issues, i.e. names that are not of type = SCIENTIFIC or that could not be fully parsed (a fully parsed name has parsed = TRUE and parsedpartially = FALSE). Note: these are not necessarily incorrect:

Total number of scientific names with nomenclatural issues:

## [1] 401
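
The selection rule for names with nomenclatural issues can be expressed as a small filter. The sketch below is in Python and works on hypothetical parser output; the keys `type`, `parsed` and `parsedpartially` mirror the fields named above, while the example records and the `scientificname` key are assumptions.

```python
def has_nomenclatural_issue(parsed):
    """True when the name is not SCIENTIFIC or was not fully parsed."""
    fully_parsed = (
        parsed.get("parsed") is True and parsed.get("parsedpartially") is False
    )
    return parsed.get("type") != "SCIENTIFIC" or not fully_parsed


# Hypothetical parser output (placeholder names)
parsed_names = [
    {"scientificname": "Aus bus", "type": "SCIENTIFIC",
     "parsed": True, "parsedpartially": False},
    {"scientificname": "Aus sp.", "type": "INFORMAL",
     "parsed": True, "parsedpartially": False},
    {"scientificname": "Cus dus ???", "type": "SCIENTIFIC",
     "parsed": True, "parsedpartially": True},
]

issues = [p for p in parsed_names if has_nomenclatural_issue(p)]
issue_count = len(issues)
```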

Cleaning of taxa with nomenclatural issues is not within the scope of this mapping. However, we can perform some rough cleaning to eliminate the INFORMAL taxa, by removing sp.:
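
A minimal Python sketch of this rough cleaning, assuming that only a trailing "sp." needs to be removed (the exact pattern used in the mapping is not stated here; `strip_sp` is a hypothetical helper):

```python
import re


def strip_sp(name):
    """Remove a trailing ' sp.' so an informal name is reduced to its genus."""
    return re.sub(r"\s+sp\.$", "", name.strip())


cleaned = [strip_sp(n) for n in ["Aus sp.", "Aus bus", "Aus bus subsp."]]
```

The pattern requires whitespace directly before "sp.", so rank markers such as "subsp." are left untouched.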

Some other taxa need special inspection, especially the doubtful ones (probably caused by UTF-8 encoding issues):

All taxa should be unique. Here we scan for duplicated taxa:

Specify the scientificnames and associated values of idspecies to be removed from the taxon core
Link those with the replacement values for idspecies
Remove duplicated taxa from taxon core:
Save remove_taxa to scan other extension files for the presence of duplicated taxa
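
The duplicate handling above can be sketched in Python as follows. Note the assumption: this sketch automatically keeps the first idspecies per name, whereas the steps above describe specifying the taxa to remove explicitly. The records are placeholders.

```python
from collections import defaultdict

# Hypothetical taxon records
taxa = [
    {"idspecies": 1, "scientificName": "Aus bus"},
    {"idspecies": 2, "scientificName": "Cus dus"},
    {"idspecies": 3, "scientificName": "Aus bus"},
]

# Group idspecies values by scientific name to find duplicates
ids_by_name = defaultdict(list)
for taxon in taxa:
    ids_by_name[taxon["scientificName"]].append(taxon["idspecies"])

# Map each removed idspecies to the idspecies that replaces it
remove_taxa = {}
for ids in ids_by_name.values():
    for duplicate_id in ids[1:]:
        remove_taxa[duplicate_id] = ids[0]

# Drop the duplicated taxa from the taxon core
taxon_core = [t for t in taxa if t["idspecies"] not in remove_taxa]
```

Keeping `remove_taxa` as a lookup from removed to replacement idspecies makes it reusable when scanning the extension files.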

3.9 kingdom

No kingdom information is provided. This is not an obligatory field but strongly recommended. It can easily be derived from information in phylum:

However, 389 taxa have no phylum information. For these records, we try to derive phylum and kingdom information from class:

We complete the phylum information:

Some of these classes are not correct, e.g. Nematoda is a phylum, not a class. Cleaning this information is not within the scope of this mapping.

Not all phylum information is correct, e.g. Bacteria is a kingdom, not a phylum.

Trim whitespace in phylum_complete
Phylum Labyrinthista does not exist. This should be phylum Bigyra (for Labyrinthula zosterae).
Based on this information, map kingdom:
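
A minimal Python sketch of the kingdom derivation, under stated assumptions: the lookup tables contain only a few illustrative entries rather than the full mapping, and the helper names `complete_phylum` and `map_kingdom` are hypothetical. The Labyrinthista to Bigyra correction is the one described above.

```python
# Illustrative subsets only; the real mapping covers many more values
PHYLUM_FIXES = {"Labyrinthista": "Bigyra"}
CLASS_TO_PHYLUM = {"Insecta": "Arthropoda", "Mammalia": "Chordata"}
PHYLUM_TO_KINGDOM = {
    "Arthropoda": "Animalia",
    "Chordata": "Animalia",
    "Ascomycota": "Fungi",
    "Bigyra": "Chromista",
}


def complete_phylum(phylum, class_name=None):
    """Trim the phylum, fall back to the class when it is missing, apply fixes."""
    if phylum is not None and phylum.strip():
        phylum = phylum.strip()
    else:
        phylum = CLASS_TO_PHYLUM.get((class_name or "").strip())
    return PHYLUM_FIXES.get(phylum, phylum)


def map_kingdom(phylum, class_name=None):
    """Return the kingdom for a record, or None when it cannot be derived."""
    return PHYLUM_TO_KINGDOM.get(complete_phylum(phylum, class_name))


kingdoms = [
    map_kingdom(" Chordata "),
    map_kingdom(None, "Insecta"),
    map_kingdom("Labyrinthista"),
    map_kingdom(None, None),
]
```

Records that match neither table return None, which keeps unresolved taxa visible for later inspection.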

3.10 phylum

3.11 class

3.12 order

3.13 family

3.14 genus

3.15 specificEpithet

3.16 infraspecificEpithet

3.17 taxonRank

Information for taxonRank is provided in the field subtaxon_rank, but is only given for varieties, aggregates, hybrids, subspecies or forms. Taxon rank information can also be retrieved by the GBIF nameparser function. This information was retrieved earlier in this script, in the dataframe parsed_names. We add the information to taxon.

Inspect rankmarker values generated by the GBIF nameparser and compare them with the subtaxon_rank information from the DAISIE checklist:

We decided to use the information contained in rankmarker because GBIF rankmarker will provide cleaner information than subtaxon_rank, even if there might be some loss of information. However, rankmarker also contains NA. We inspect dwc_scientificName and subtaxon_rank for these values:

Concrete actions to undertake:

- scientific names without subtaxon_rank:
  - Acaena anserinifolia x inermis: species
  - Dahlia coccinea x pinnata: species
  - Geoplana (=Australoplana) sanguinea: species
  - Hyalomma Scupense "Delpy, 1946": species
  - Rosa Hollandica': species
  - Rest: genera
- scientific names with subtaxon_rank = agg.: genus
- scientific names with subtaxon_rank = hyb: species
- scientific name = Oidium Pseudoidium: wrong scientific name, refers to genus Oidium or Pseudoidium

Define taxa without subtaxon_rank which are in fact species
Map taxonRank
Summarize mapping:
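
The decision rules for taxonRank can be sketched in Python as follows. The fallback rules and the list of species names come from the actions listed above; the `RANKMARKER_TO_RANK` table is an assumed, partial translation of rank markers to rank names, and `map_taxon_rank` is a hypothetical helper.

```python
# Assumed, partial translation of rank markers to taxonRank values
RANKMARKER_TO_RANK = {
    "sp.": "species",
    "subsp.": "subspecies",
    "var.": "variety",
    "f.": "form",
}

# Names without subtaxon_rank that are in fact species (from the list above)
SPECIES_WITHOUT_RANK = {
    "Acaena anserinifolia x inermis",
    "Dahlia coccinea x pinnata",
    "Geoplana (=Australoplana) sanguinea",
    'Hyalomma Scupense "Delpy, 1946"',
    "Rosa Hollandica'",
}


def map_taxon_rank(scientific_name, rankmarker, subtaxon_rank):
    """Prefer rankmarker; otherwise apply the manual fallback rules."""
    if rankmarker is not None:
        return RANKMARKER_TO_RANK.get(rankmarker, rankmarker)
    if subtaxon_rank is None:
        if scientific_name in SPECIES_WITHOUT_RANK:
            return "species"
        return "genus"
    if subtaxon_rank == "agg.":
        return "genus"
    if subtaxon_rank == "hyb":
        return "species"
    return None  # unexpected combination, left for manual inspection


ranks = [
    map_taxon_rank("Aus bus", "sp.", None),
    map_taxon_rank("Dahlia coccinea x pinnata", None, None),
    map_taxon_rank("Aus", None, None),
    map_taxon_rank("Aus bus", None, "agg."),
    map_taxon_rank("Aus x bus", None, "hyb"),
]
```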

3.18 taxonRemarks

taxon includes a reference to the consulted source via sourceid. We map the sources under taxonRemarks.

Rename source to taxonRemarks:
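
A minimal Python sketch of this lookup, assuming the literature references provide a `source` text per `sourceid` (the records below are placeholders, not real references):

```python
# Hypothetical literature references and taxon records
literature = [
    {"sourceid": 10, "source": "Reference A"},
    {"sourceid": 11, "source": "Reference B"},
]
taxon = [
    {"idspecies": 1, "sourceid": 10},
    {"idspecies": 2, "sourceid": 99},  # no matching reference
]

# Build a lookup and attach the source text as dwc_taxonRemarks
source_by_id = {ref["sourceid"]: ref["source"] for ref in literature}
for record in taxon:
    record["dwc_taxonRemarks"] = source_by_id.get(record["sourceid"])
```

Taxa whose sourceid has no match keep an empty taxonRemarks, which corresponds to a left join on sourceid.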

4 Post-processing

Only keep the Darwin Core columns
Drop the dwc_ prefix
Sort on taxonID
Export all taxonIDs (required for filtering the records in the extensions):
Export core_taxa:
Preview data

Save to CSV:
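
The post-processing steps can be sketched in Python with the standard library only. The input rows are placeholders, the helpers `postprocess` and `to_csv` are hypothetical, and the CSV is written to an in-memory buffer rather than to a file because the output path is not given here.

```python
import csv
import io


def postprocess(rows):
    """Keep only dwc_ columns, drop the prefix, and sort on taxonID."""
    core = [
        {key[len("dwc_"):]: value
         for key, value in row.items() if key.startswith("dwc_")}
        for row in rows
    ]
    # Assumption: taxonID values are strings, so the sort is lexicographic
    return sorted(core, key=lambda row: row["taxonID"])


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical mapped records
rows = [
    {"idspecies": 2, "dwc_taxonID": "2", "dwc_scientificName": "Cus dus"},
    {"idspecies": 1, "dwc_taxonID": "1", "dwc_scientificName": "Aus bus"},
]
core_taxa = postprocess(rows)
csv_text = to_csv(core_taxa)
```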

Darwin Core mapping script for Taxon Core

For: Inventory of alien species in Europe (DAISIE)

Lien Reyserhove

David Roy

2021-03-19