Best practices for study archival on Movebank

Movebank offers tools for collecting, organizing, and analyzing animal tracking and other bio-logging data from the start of data collection until publication and beyond. As part of our data policy, we strongly encourage data owners to eventually make their data public, or else have a long-term plan to address questions and requests for use. When it comes time to share a study with others or prepare it for long-term archiving, it is important to be sure that the study contains enough information for others to understand the data and use it appropriately. The following guidelines and examples describe how to make sure your study is complete, and cover common sharing and archiving strategies. For questions or assistance with archiving data through Movebank, contact support@movebank.org.

Data owners can choose whether and with whom to share their data stored in Movebank. Those who use others' data from Movebank must agree to the general Movebank terms of use and user agreement. Movebank's Permissions give the technical options for sharing with specific users or the public. If you create studies that you do not intend to share or archive beyond the end of your project, we recommend downloading a local backup and then deleting your study when you are done. To use Movebank for temporary purposes without intention to archive, please set the Study Type to "Test".

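Following the general tips below are more details for data owners who want to share in-progress studies, share data publicly, archive data formally in the Movebank Data Repository, archive data with controlled access, or rescue older data.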

We also provide guidance for some common questions about what to archive.

General best practice tips

The user manual provides detailed instructions for creating studies and adding data. The following additional steps are recommended to support study archiving plans, and will help make sure that all studies can be discovered and properly used when shared publicly or privately.

  • Make your study name and summary public. Studies intended for long-term storage on Movebank should have the Study Type set as Research and have the study name and summary visible to the public. This step allows the public, including other researchers and conservation groups, to view the written summary in your Study Details and contact you for more information.

  • Maintain access to your study. Add at least one co-owner or colleague as a Data Manager for your study who could take over as the study contact if needed. If your email address changes, make sure to update the email in your Movebank account.

  • Complete your Study Details. This text is included in searches on the Tracking Data Map, so filling out relevant details and incorporating keywords will help people discover your work. Provide an informative title, study summary and citations for any published work describing the study. Make sure that the people and organizations leading the work are listed. Consider that readers might include both scientists and non-scientists.

  • Check your imported data. Complete quality control to make sure your data and reference data are imported and organized correctly.

  • Provide sufficient reference data. While Movebank has few mandatory requirements for reference data, please add as much information as you’re able to: missing details, such as the distinction between adults and juveniles, can restrict or misinform future analyses using the data. Of course, what is “sufficient” will depend on what is available, for example what information was collected in the field, the equipment and methods used, and the original purpose of the study. Keep in mind that new analyses might make valuable use of information you collected that wasn't important to the original study. Download the reference data from the study by selecting Download > Download reference data and make sure it looks complete and correct. The following is a minimum suggested set of reference data attributes:
    animal ID: always include
    animal taxon: always include
    tag ID: always include
    deployment ID: always include
    deploy-on date and deploy-off date: include if known, required if pre- or post-deployment locations are present
    animal life stage: include if known
    animal sex: include if known
    attachment type: include if known
    deploy-on latitude and deploy-on longitude: recommended especially when using methods that result in no or low-accuracy location estimates
    deployment end type: include if known
    manipulation type: always include, to clarify whether data represent "natural" or experimentally manipulated animal movements
    study site: recommended especially when there are multiple groups of animals or deployments (e.g. deployment locations)
    tag manufacturer: include if known
    tag model: include if known
    tag readout method: include if known

Many other attributes are available. See a complete list of terms and definitions in the Movebank Attribute Dictionary. You can add and edit reference data by importing a reference data table or in the Deployment Manager.

  • Want help? For help adding or checking your data, send an email to support@movebank.org.
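The minimum set of reference data attributes listed above can be checked with a short script before a study is shared. The following is a minimal sketch in Python, not a Movebank tool. It assumes the downloaded reference data is a CSV file whose column headers are the hyphenated forms of the attribute names above (for example `animal-id`). The function name `check_reference_data`, the header spellings and the sample rows are assumptions made for illustration, so compare them with your own download and the Movebank Attribute Dictionary.

```python
import csv
import io

# Assumed column headers: hyphenated forms of the attribute names listed above.
# Check them against your own reference data download before relying on this.
ALWAYS_INCLUDE = [
    "animal-id", "animal-taxon", "tag-id", "deployment-id", "manipulation-type",
]
INCLUDE_IF_KNOWN = [
    "deploy-on-date", "deploy-off-date", "animal-life-stage", "animal-sex",
    "attachment-type", "deploy-on-latitude", "deploy-on-longitude",
    "deployment-end-type", "study-site", "tag-manufacturer", "tag-model",
    "tag-readout-method",
]


def check_reference_data(csv_file):
    """Report missing or empty recommended attributes in a reference data CSV.

    csv_file is any open text file object. Returns a dict with two lists:
    'errors' for "always include" attributes that are absent or blank in some
    row, and 'warnings' for other attributes that have no values at all.
    """
    reader = csv.DictReader(csv_file)
    rows = list(reader)
    columns = reader.fieldnames or []
    report = {"errors": [], "warnings": []}

    for name in ALWAYS_INCLUDE:
        if name not in columns:
            report["errors"].append(f"{name}: column missing")
            continue
        blank = sum(1 for row in rows if not (row[name] or "").strip())
        if blank:
            report["errors"].append(f"{name}: blank in {blank} of {len(rows)} rows")

    for name in INCLUDE_IF_KNOWN:
        has_value = name in columns and any((row[name] or "").strip() for row in rows)
        if not has_value:
            report["warnings"].append(f"{name}: no values provided")

    return report


# Made-up sample standing in for a downloaded reference data file.
sample = io.StringIO(
    "animal-id,animal-taxon,tag-id,deployment-id,manipulation-type,animal-sex\n"
    "A1,Ciconia ciconia,T1,D1,none,f\n"
    "A2,Ciconia ciconia,T2,,none,\n"
)
report = check_reference_data(sample)
for message in report["errors"]:
    print("ERROR:", message)
for message in report["warnings"]:
    print("WARNING:", message)
```

To run the check on a real download, open the file with `open(path, newline="", encoding="utf-8")` and pass the file object in place of `sample`. Warnings are only prompts to review: an attribute that is truly unknown can stay empty.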

Share in-progress studies

If you are collecting data through live feeds or over multiple field seasons, you may want to share it with the public or with specific people while data collection and analysis are still ongoing. Use your study's Permissions settings to add select users as Collaborators, or use a rolling embargo to share only older data with the public.
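The effect of a rolling embargo can be pictured as a moving cutoff date: records older than the embargo period are visible to the public and newer ones are not. The sketch below only illustrates that idea. It is not how Movebank implements embargoes, and the function name `publicly_visible`, the record layout and the 365-day period are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def publicly_visible(records, embargo_days, now=None):
    """Return the records whose timestamp is at or before the embargo cutoff."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=embargo_days)
    return [r for r in records if r["timestamp"] <= cutoff]


# Made-up example records for one tag.
records = [
    {"tag-id": "T1", "timestamp": datetime(2022, 5, 1, tzinfo=timezone.utc)},
    {"tag-id": "T1", "timestamp": datetime(2023, 5, 1, tzinfo=timezone.utc)},
    {"tag-id": "T1", "timestamp": datetime(2024, 5, 1, tzinfo=timezone.utc)},
]

# With a fixed "now", the two older records fall outside the embargo window.
visible = publicly_visible(
    records,
    embargo_days=365,
    now=datetime(2024, 6, 1, tzinfo=timezone.utc),
)
print(len(visible), "of", len(records), "records would be public")
```

Because the cutoff moves with the current date, the same call made a year later would also return the newest record without any change to the settings.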

Review the general tips above. As new data come in, you’ll need to periodically review the study; consider setting up calendar reminders to do this or adding it to your post-fieldwork protocol. Once you’re done with the study, be sure to see the options below for long-term archiving and publication to make sure your work isn’t lost. It is also possible to publish the current version of ongoing studies in the Movebank Data Repository as described below.

Public sharing

We strongly encourage all data owners to eventually make their data public, unless there are legal or conservation-related reasons for restricting access. Review the general tips above. When you are ready, you can allow public sharing in the Permissions for your study. The Permissions settings include data licenses and embargo options that give you control over when the public may download data. To get a DOI, persistent link and citation for your study, you can submit it to the Movebank Data Repository as described below.

Formal public archival

The best way to ensure that others will be able to discover, access and use your data far into the future is to publish it in the Movebank Data Repository. This also provides you with a DOI, license and a citation for the dataset.

Review the general tips above to prepare your study for submission. Alternatively, we can import data and prepare a study for you. To begin the submission process, email curator Sarah Davidson at sdavidson@ab.mpg.de including

  • the name of the Movebank study (if there is one),
  • a copy of any manuscripts (to be kept confidential), papers or reports describing the dataset, and
  • the author list you’d like to use for the published dataset.

Controlled-access archival

Research indicates that sharing with individuals by request is an unreliable way to ensure access to data (for example see Couture et al. 2018). However, it can be a good option if data collection is ongoing, or if making the data public could pose a threat to sensitive species. You can use Movebank to make your project publicly discoverable without showing any individual locations or allowing data download, and to easily make your data available to individual collaborators if and when you agree on the terms of a specific use. We encourage owners to use Movebank's new embargo options, which allow Data Managers to restrict access to more recent data or commit to making the data public in the future.

Review the general tips above to prepare your study for sharing with others. Be aware that Movebank cannot provide access to non-public data without the permission of a Data Manager for a study, and thus cannot guarantee the long-term accessibility of controlled-access studies. To discuss more formal arrangements for releasing data through Movebank or assigning Movebank as a custodian to address data-sharing requests, contact support@movebank.org.

Special cases

Below are discussion and guidance covering some common situations.

One vs multiple studies

For projects involving multiple species, funders, field seasons, or deployment sites, you will need to decide if and how to divide your data across studies in Movebank. Movebank's data model treats studies independently, and there is no automated way to split or merge studies. Things to consider:

  • How will you share the data? Data sharing is done at the study level, so if you will be sharing different subsets of data with different users, it is easiest to create a separate study for each user group you are sharing with.
  • How much data do you have? Users are successfully managing Movebank studies with over a thousand animals and hundreds of millions of data records. However, if you are working with hundreds of deployments or many millions of data records, multiple studies might be easier to manage.
  • How much technical expertise and support do you have? Movebank's REST API allows advanced users to monitor and collect data in Movebank and send it between databases or applications. If you will not depend on Movebank to view or share data, this can enable large data volumes or more complicated sharing of data within a single study.

What should I archive?

When preparing to add data to Movebank, or when planning to submit data for formal public archiving as described above, it is sometimes unclear what scope or processing level of data should be included. Movebank is flexible about what data owners store, and how and why they store it, and we offer the following general suggestions.

Just the subset used for an analysis, or the entire dataset?

When different subsets of the same dataset are used in multiple papers, our general recommendation is to publish the entire dataset and refer to that each time, describing as needed in the data and papers which subsets were used for a given analysis. Advantages to publishing one larger dataset are

  • more opportunities for data re-use for a wider variety of purposes,
  • reduced chance of mistakes in how data are re-used,
  • reduced time needed to prepare one dataset, and
  • likely higher citation rates for one comprehensive dataset.

We recognize this differs from common journal policies, which focus on ensuring that a specific published analysis is replicable. In our experience, however, there is a much greater demand to re-use data for different purposes than to replicate existing results. When multiple overlapping subsets from one original dataset are published, it can be difficult or impossible for a user to put those datasets back together to accurately recreate the complete dataset. For users combining datasets for larger meta-analyses, the chance of misunderstanding or failure to notice overlapping data increases. To meet journal needs, authors can enable replication using a subset of a larger study with sufficient methods and reference data that define which subset was analyzed.

On the other hand, we realize that it is not always possible to publish an entire dataset at once, such as for ongoing long-term studies. In these cases we recommend publishing in segments (e.g. one dataset for years 2010–2014 and one for years 2015–2019) or publishing a complete update that can supersede the existing dataset (e.g. one dataset for years 2010–2014 and an update with years 2010–2019). For studies published in the Movebank Data Repository, we can also offer a one-year embargo for published datasets. If formal archiving (including a DOI) is not needed, consider providing limited public access using an embargo.
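Anyone combining several published datasets, whether overlapping subsets or an earlier dataset plus a later complete update, has to detect records that appear in more than one of them. The sketch below shows one way this might be done, using tag ID and timestamp as the record key. It is an illustration under assumptions: the key choice, the field names, the sample values and the function `merge_subsets` are hypothetical, and the approach only works when identifiers are consistent across datasets.

```python
def merge_subsets(*subsets, key_fields=("tag-id", "timestamp")):
    """Combine record lists, keeping the first copy of each record key.

    Returns a tuple (merged_records, duplicate_count).
    """
    seen = set()
    merged = []
    duplicates = 0
    for subset in subsets:
        for record in subset:
            key = tuple(record[field] for field in key_fields)
            if key in seen:
                duplicates += 1
                continue
            seen.add(key)
            merged.append(record)
    return merged, duplicates


# Made-up subsets from two hypothetical papers that share one record.
paper_a = [
    {"tag-id": "T1", "timestamp": "2015-03-01 06:00:00", "lat": 47.7, "lon": 9.2},
    {"tag-id": "T1", "timestamp": "2015-03-01 12:00:00", "lat": 47.8, "lon": 9.3},
]
paper_b = [
    {"tag-id": "T1", "timestamp": "2015-03-01 12:00:00", "lat": 47.8, "lon": 9.3},
    {"tag-id": "T2", "timestamp": "2015-03-01 12:00:00", "lat": 48.1, "lon": 9.9},
]

merged, duplicates = merge_subsets(paper_a, paper_b)
print(f"{len(merged)} unique records, {duplicates} duplicate(s) dropped")
```

If two publications use different tag or animal identifiers for the same deployment, a key-based merge like this silently keeps both copies, which is one reason a single comprehensive dataset is easier to re-use correctly.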

Modelled or interpolated vs original datasets?

Some datasets consist of an "original" version and one or more other versions that have been processed to improve location estimates or to enable a specific analysis. Similar to the previous question, we generally recommend archiving data based on their most "original" version (after decoding and importing to Movebank). Potential data users can replicate processing or modelling steps with the original data, and they also have the opportunity to analyze the original data in different ways. Other things to consider:

  • You can store both original and processed versions of the data on Movebank; best practice is to identify interpolated or modelled event records using "modelled" = true and use the same tag, animal, and deployment IDs across studies if multiple studies are used.
  • You can identify and flag outliers directly in Movebank, either manually or using filters, allowing publication of all data collected while clearly identifying records that should be ignored.
  • A processed or reduced version of data for sensitive or threatened populations or species could allow public archiving without posing a conservation threat.
  • For low-accuracy data, a filtered or processed version of the dataset could make the data accessible for a wider group of users, for example conservation groups that want to aggregate information about migration corridors but don't have the capacity to redo data processing. (For geolocator data see the next item below.)
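The first point in the list above, keeping original and processed records distinguishable while sharing identifiers, might look like the following in practice. This is a minimal sketch: the attribute name `modelled` comes from the text above, while the other field names, the sample values and the function `combine_versions` are hypothetical.

```python
def combine_versions(original, processed):
    """Label records by version and check that processed records reuse known IDs.

    Original records get modelled=False and processed records modelled=True.
    Raises ValueError if a processed record has a tag, animal and deployment
    ID combination that does not occur in the original records.
    """
    id_fields = ("tag-id", "animal-id", "deployment-id")
    known_ids = {tuple(r[f] for f in id_fields) for r in original}

    combined = [dict(r, modelled=False) for r in original]
    for r in processed:
        ids = tuple(r[f] for f in id_fields)
        if ids not in known_ids:
            raise ValueError(f"processed record uses unknown IDs: {ids}")
        combined.append(dict(r, modelled=True))

    # Timestamps here are ISO-style strings, so they sort chronologically.
    return sorted(combined, key=lambda r: r["timestamp"])


# Made-up records: two observed fixes and one interpolated position between them.
ids = {"tag-id": "T1", "animal-id": "A1", "deployment-id": "D1"}
original = [
    dict(ids, timestamp="2020-01-01 00:00:00"),
    dict(ids, timestamp="2020-01-01 12:00:00"),
]
processed = [dict(ids, timestamp="2020-01-01 06:00:00")]

combined = combine_versions(original, processed)
for record in combined:
    print(record["timestamp"], "modelled =", record["modelled"])
```

Checking identifiers before combining catches the case where a processed version was given new tag or animal IDs, which would make it hard for later users to relate it to the original data.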

What to include when archiving geolocator studies?

The developers of the light-level analysis packages TwGeos, GeoLight, probGLS, SGAT and FlightR and the TAGS platform have defined guidelines for archiving geolocator datasets and worked with Movebank to support this archiving. In order to document existing analyses while allowing for potential future reanalysis using improved methods, they recommend storing raw light-level data, twilight selections, location estimates, and comprehensive deployment information, with the option to store additional files with relevant scripts or protocols. Details are provided in Chapter 9 of the Light-level geolocation analysis manual and in Lisovski et al. (2020).

How can I archive sensitive data?

Follow the instructions above for controlled-access sharing. Currently Movebank has no way to guarantee continued access to non-public studies over time, because access in these cases relies on the owners approving and providing access to others. The Movebank Data Repository is only for public data, because this is the only way we can ensure that data remain accessible beyond the career or life of the owners. For sensitive or legally-restricted data, this might not be an appropriate option. We are exploring more formal ways to offer long-term controlled-access archiving and welcome your input.

Rescue older data

Many of the oldest animal tracking datasets are still not stored in shared databases. With so many critical ecological questions and policy-relevant issues dependent on knowledge of changes over the past several decades, these data should be archived and made available for appropriate future use. If you are aware of data that are in danger of being lost, for example due to a retirement, the end of a project, or storage on devices and file formats that are becoming obsolete, please reach out to us at support@movebank.org. We can help communicate with data holders and facilitate data conversion and import to Movebank. Data can remain access-controlled if there is a point person who can respond to inquiries.