Archive in biodiversity data platforms

Animal tracking can provide detailed information about the movements and behavior of individual animals. It is also a type of species occurrence data: information describing where and when species are located around the world. Darwin Core is the most widely used standard for storing and sharing information about biological diversity, including species occurrences. Data in Darwin Core format can be published to open-access repositories like the Ocean Biodiversity Information System (OBIS) and the Global Biodiversity Information Facility (GBIF). By integrating animal tracking data stored in Movebank with GBIF and OBIS, researchers and organizations can contribute to novel research and applications, typically at scales and for purposes that do not conflict with those of the original animal tracking studies. For example, biodiversity data aggregated through GBIF are commonly used for large-scale species distribution modeling and by fields within and beyond biology (Heberling et al., 2021). Making animal tracking data available in GBIF and OBIS could fill data gaps in these platforms by documenting remotely sensed occurrences in regions with few human observations or field sites. These contributions could inform predictions of future species distributions (e.g., Warren et al., 2018) and policy-relevant assessments such as those of the Intergovernmental Panel on Climate Change.

The steps and resources below offer guidance for data owners on publishing animal tracking data in Darwin Core format. Data can only be published to GBIF and OBIS by organizations registered with these platforms, and we advise publication of data from Movebank only by or in coordination with data owners or custodians, and in accordance with the general Movebank terms of use.

Preparing tracking datasets

Before tracking datasets are translated to Darwin Core, they should be prepared to maximize their usefulness and minimize possible misinterpretation in general biodiversity studies. Overall, we suggest that owners err on the side of a smaller, higher-quality set of species observations. Darwin Core is designed to describe organisms, the contexts in which they occur, and sampling or measurement events, with less emphasis on collection equipment and methods. With this in mind, we recommend the following steps:

  • Follow these general best-practice tips for preparing archive-quality studies in Movebank.
  • Liberally flag low-quality or questionable records as outliers.
  • Provide thorough capture and animal descriptions in the reference data.
  • Exclude flagged outliers and data not associated with an animal of a known species, such as testing or calibration data.
  • Exclude data describing animals that have been experimentally manipulated in ways that affect their typical behavior or distribution.
  • If needed, remove or reduce the precision of occurrences that could pose a threat to animals if exposed to the public (read more below).
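
The filtering and precision steps above can be sketched in Python as a pre-processing pass over Movebank-format records. This is a minimal sketch, not part of movepub or Movebank: it assumes the sensor and reference data have already been joined into one table, and it assumes Movebank-style column names (`visible`, `manually-marked-outlier`, `algorithm-marked-outlier`, `individual-taxon-canonical-name`, `individual-local-identifier`, `location-lat`, `location-long`) that you should check against your own download. The function names, the `excluded_animals` parameter for experimentally manipulated animals, and the sample records are hypothetical. Rounding coordinates is only one simple form of generalization.

```python
import csv
import io

# Movebank-format column names assumed here; check them against your own download.
OUTLIER_COLUMNS = ("manually-marked-outlier", "algorithm-marked-outlier")


def is_flagged(row):
    """Return True if the record is hidden or marked as an outlier."""
    if row.get("visible", "true").strip().lower() == "false":
        return True
    return any(row.get(col, "").strip().lower() == "true" for col in OUTLIER_COLUMNS)


def prepare_records(rows, excluded_animals=frozenset(), decimals=None):
    """Yield records fit for translation to Darwin Core.

    rows: dicts combining sensor and reference data (e.g. from csv.DictReader).
    excluded_animals: hypothetical set of individual IDs to drop, e.g. animals
        whose behavior or distribution was experimentally manipulated.
    decimals: if set, round coordinates to this many decimal places.
    """
    for row in rows:
        if is_flagged(row):
            continue  # exclude flagged outliers and hidden records
        if not row.get("individual-taxon-canonical-name", "").strip():
            continue  # no known species, e.g. test or calibration data
        if row.get("individual-local-identifier") in excluded_animals:
            continue  # manipulated animals listed by the data owner
        if not row.get("location-lat") or not row.get("location-long"):
            continue  # no position to report as an occurrence
        out = dict(row)
        if decimals is not None:
            # Reduce precision of potentially sensitive locations.
            for col in ("location-lat", "location-long"):
                out[col] = f"{float(row[col]):.{decimals}f}"
        yield out


if __name__ == "__main__":
    # Made-up sample: one valid record, one outlier, one test tag, one hidden record.
    sample = io.StringIO(
        "event-id,visible,manually-marked-outlier,location-lat,location-long,"
        "individual-taxon-canonical-name,individual-local-identifier\n"
        "1,true,,51.23456,4.56789,Larus fuscus,H903\n"
        "2,true,true,51.3,4.6,Larus fuscus,H903\n"
        "3,true,,51.1,4.4,,TEST-TAG\n"
        "4,false,,51.2,4.5,Larus fuscus,H903\n"
    )
    kept = list(prepare_records(csv.DictReader(sample), decimals=2))
    print([(r["event-id"], r["location-lat"], r["location-long"]) for r in kept])
    # [('1', '51.23', '4.57')]
```

If coordinates are generalized this way, the published occurrences should say so, for example in the Darwin Core `dataGeneralizations` term, so that users do not mistake rounded positions for full-precision GPS fixes.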

These recommendations are designed for GPS data in Movebank format (that is, tabular .csv sensor and reference data downloaded from Movebank) that have been published with a DOI or other persistent link to the files. Such data packages are published through the Movebank Data Repository, as well as by agencies such as the Research Institute for Nature and Forest in Belgium and the US Geological Survey Alaska Science Center. Animal tracking data with locations estimated using other methods, such as Argos Doppler or acoustic telemetry, could use these recommendations as a framework, but should consider what other information might need to be incorporated, particularly with regard to location accuracy.

Terms of use and attribution

The platforms we are aware of that publish data using Darwin Core are open access, meaning data are publicly available. While this ensures that data can be accessed by others, it can raise concerns about how data are used and how data owners are acknowledged for these uses. When publishing data in GBIF and OBIS, owners choose from Creative Commons licenses to define terms of use, and have the option to require attribution whenever data are used. Attribution can be given by citing the DOI for the original dataset in Movebank format, or a platform-assigned DOI, which can cover data from the many datasets included in a query result. The citation guidelines from GBIF and OBIS support attribution, and the GBIF infrastructure tracks citations of data from GBIF. For an example, see the query result drawing on many source datasets that is cited in Davidson and Ruhs (2021).

Technical implementation

The R package movepub (Desmet, 2022) offers functions to transform GPS tracking data from Movebank to Darwin Core; it was developed for high-resolution GPS avian tracking data. There are many possible approaches to this transformation. We chose a relatively simple data transformation that includes commonly available and biologically important details, in order to encourage wider adoption, enable an automated procedure, and provide a procedure that can be applied across most GPS tracking datasets. This also offers an easy-to-understand introduction for both movement ecologists new to Darwin Core and biodiversity experts new to working with tracking data. Key components of this approach include:

  • The original published data package and study in Movebank are referenced to link users to the complete version of the resource.
  • The preferred citation is for the Movebank-format version of the dataset, to better monitor data use.
  • The transformed Darwin Core Archive consists of a single occurrence table and an EML metadata file.
  • Data are reduced to hourly positions per animal. This is a somewhat arbitrary decision intended to reduce data volume in Darwin Core while keeping sufficient resolution for expected uses in biodiversity research.
  • The dwc_occurrence.sql file in the movepub package details the field mappings and additional data definitions.
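
The hourly reduction and the single occurrence table described above can be illustrated with a short Python sketch. This is not movepub's implementation (movepub is an R package, and its actual field mappings are defined in dwc_occurrence.sql); it is a simplified illustration under stated assumptions. It assumes Movebank-style column names (`event-id`, `timestamp`, `individual-local-identifier`, `individual-taxon-canonical-name`, `location-lat`, `location-long`), UTC timestamps formatted like `2021-05-01 13:05:00.000`, and WGS84 coordinates. The selection of Darwin Core terms, the use of `event-id` as `occurrenceID`, the use of the data package DOI as `datasetID`, and all function names are hypothetical choices, and the EML metadata file is not shown.

```python
import csv
from datetime import datetime

# Hypothetical column list for the occurrence table; movepub's real mapping
# (in dwc_occurrence.sql) differs and includes more terms.
DWC_FIELDS = [
    "occurrenceID", "basisOfRecord", "datasetID", "organismID",
    "scientificName", "eventDate", "decimalLatitude", "decimalLongitude",
    "geodeticDatum",
]


def parse_timestamp(value):
    """Parse an assumed Movebank-style UTC timestamp, e.g. '2021-05-01 13:05:00.000'."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")


def hourly_positions(rows):
    """Yield the earliest position per animal per clock hour."""
    seen = set()
    ordered = sorted(
        rows,
        key=lambda r: (r["individual-local-identifier"], parse_timestamp(r["timestamp"])),
    )
    for row in ordered:
        hour = parse_timestamp(row["timestamp"]).replace(minute=0, second=0, microsecond=0)
        key = (row["individual-local-identifier"], hour)
        if key not in seen:  # first (earliest) fix in this hour for this animal
            seen.add(key)
            yield row


def to_occurrence(row, dataset_doi):
    """Map one Movebank record to a simplified Darwin Core occurrence row."""
    return {
        "occurrenceID": row["event-id"],  # assumption: reuse Movebank's event ID
        "basisOfRecord": "MachineObservation",
        "datasetID": dataset_doi,  # assumption: DOI of the Movebank-format data package
        "organismID": row["individual-local-identifier"],
        "scientificName": row["individual-taxon-canonical-name"],
        "eventDate": parse_timestamp(row["timestamp"]).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "decimalLatitude": row["location-lat"],
        "decimalLongitude": row["location-long"],
        "geodeticDatum": "EPSG:4326",  # WGS84
    }


def write_occurrence_table(rows, dataset_doi, path):
    """Write the hourly-reduced records as a single occurrence CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=DWC_FIELDS)
        writer.writeheader()
        for row in hourly_positions(rows):
            writer.writerow(to_occurrence(row, dataset_doi))
```

Keeping the earliest fix in each clock hour is one simple way to reach roughly hourly resolution; another rule, such as keeping the fix closest to the top of the hour, would serve the same purpose.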

Several datasets published in GBIF using this method are described in van der Kolk et al. (2022).

Future work

We welcome feedback and future efforts to build on the best practices we describe here and to expand these recommendations along with data transformation tools. Possible next steps include (1) developing guidelines for animal tracking datasets including other species, methods and sensor types; (2) automating procedures for platforms to publish public data transformed to Darwin Core on GBIF; and (3) implementing generalizations such as those advised by GBIF (Chapman, 2020) to enable public archiving of sensitive species locations while maintaining restricted access to complete datasets.

Acknowledgements

The code and protocol were developed as part of the MOVE2GBIF project, funded by the Netherlands Biodiversity Information Facility. These best practices build on several efforts, including De Pooter et al. (2017), the International Bio-logging Society's Data Standardization Working Group, the TDWG Machine Observations Interest Group and the Standardizing Marine Biological Data Working Group.

