Movebank accommodates data from a wide range of file types and allows users to import custom file formats, so we cannot guarantee that every completed file import is error-free. We therefore suggest that you take some steps to verify that your data, or other people's data you plan to use, are organized correctly.
Taking these steps soon after importing data will make it easier to fix any problems and help to prevent confusion later. If you would like assistance please contact us at firstname.lastname@example.org. Also read our best practices for organizing studies for additional guidance on creating archive-quality studies.
Note that you must be a Data Manager for a study to add or change data. Users who are not Data Managers should see suggestions under working with other people’s data below.
Check the locations. View your tracks on the map and make sure each track is displayed. Use the option to draw lines for selected animals to easily identify some kinds of outliers and mistakes in mapping timestamps.
- The most common issues with location data are events that should not be considered part of an animal's track because (1) they were not collected when the tag was on the animal or (2) the location coordinates are not accurate (read more). Detailed steps for quality control of studies with many outliers are below.
- If the number of tracks is less than expected, it could be that deployment information is missing. To view undeployed locations, select Data > Show in map (include undeployed) from the Studies page.
- If locations are imported using the wrong coordinate reference system, the tracks will be slightly shifted from their actual locations.
- If the lines between locations are clearly wrong, check that the timestamps and tag IDs were imported correctly. You can select Sort by last update to see the last date for each deployment. Incorrectly mapped timestamps often result in timestamps in the future.
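A quick way to screen a downloaded file for timestamps in the future is sketched below in Python. It assumes the download is a CSV with a `timestamp` column in Movebank's yyyy-MM-dd HH:mm:ss.SSS format (UTC). The function name and the sample rows are made up for illustration and are not part of Movebank.

```python
from datetime import datetime, timezone

# Movebank's stated timestamp format, yyyy-MM-dd HH:mm:ss.SSS, always in UTC.
MOVEBANK_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def find_future_timestamps(rows, now=None):
    """Return the rows whose timestamp lies after `now` (default: current UTC time).

    `rows` is any iterable of dicts with a "timestamp" key, for example the
    output of csv.DictReader. The column name is an assumption about the file.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    future = []
    for row in rows:
        ts = datetime.strptime(row["timestamp"], MOVEBANK_FORMAT)
        ts = ts.replace(tzinfo=timezone.utc)  # stored timestamps are UTC
        if ts > now:
            future.append(row)
    return future


# Made-up rows: the second one carries an implausible year.
sample = [
    {"tag-local-identifier": "A1", "timestamp": "2008-04-07 21:02:56.000"},
    {"tag-local-identifier": "A1", "timestamp": "2108-04-07 21:02:56.000"},
]
suspect = find_future_timestamps(sample)
```

Any rows returned here are worth comparing against the original data file before deciding whether the timestamp mapping needs to be redone.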
Check attribute values. Download all or part of the imported data and compare attribute values for a few events to those in your original data file. In particular, check that the timestamps are correct and that any conversions or controlled-list mappings worked as expected. (If you select the wrong time zone when mapping your timestamp, there is no way for Movebank to know this is incorrect! Timestamps stored in Movebank use the format yyyy-MM-dd HH:mm:ss.SSS and are always in UTC.)
Use the Attribute Dictionary to confirm your values match Movebank’s definitions, especially for terms that have defined units. If additional information is needed to understand values for some data terms (such as values in generic attributes like “tag tech. spec.”), include this information in the reference data, Study Details or an extra file attachment.
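When comparing timestamps against your original file, it can help to compute what the UTC value should be. The sketch below is a minimal illustration that assumes the original timestamps were recorded at a fixed offset from UTC (it ignores daylight saving time). The function name, the local format string and the offset are hypothetical inputs that you would replace with your own.

```python
from datetime import datetime, timedelta, timezone


def to_movebank_utc(local_text, local_format, utc_offset_hours):
    """Convert a local timestamp string to yyyy-MM-dd HH:mm:ss.SSS in UTC.

    Assumes a fixed UTC offset; daylight saving changes are not handled.
    """
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    local_dt = datetime.strptime(local_text, local_format).replace(tzinfo=local_tz)
    utc_dt = local_dt.astimezone(timezone.utc)
    # strftime has no millisecond code, so build the .SSS part by hand.
    millis = utc_dt.microsecond // 1000
    return utc_dt.strftime("%Y-%m-%d %H:%M:%S.") + f"{millis:03d}"


# A fix logged as 07/04/2008 23:02:56 at UTC+2 should appear in the
# downloaded data as 2008-04-07 21:02:56.000.
expected = to_movebank_utc("07/04/2008 23:02:56", "%d/%m/%Y %H:%M:%S", 2)
```

If the value in the downloaded file differs from the computed one by a whole number of hours, the time zone chosen during import is the likely cause.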
Check study statistics. Are these what you expect? In a completed study, the statistics shown in the Study Details should look something like this:
Incomplete or incorrect statistics might indicate you haven't yet provided the information needed to properly assign data to animals. In this case you might see values of 0 or not set, like this:
If the Time of First and Last Deployed Location or Number of Deployed Locations are present but not what you expect, check that the timestamps were imported correctly. Incorrectly mapped timestamps often result in timestamps in the future, or can reduce the total number of locations, depending on how duplicates were treated during import.
Check for duplicates. In some cases, the presence of duplicates in the imported data can indicate mistakes in mapping IDs and timestamps or problems in the original data file. You can easily filter for duplicate records in the Event Editor and take a closer look at the results if needed.
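Duplicates can also be checked in a downloaded file. The following Python sketch groups records by tag and timestamp and reports any combination that occurs more than once. The column names `tag-local-identifier` and `timestamp` are assumptions about the downloaded CSV, and the helper is illustrative rather than a reproduction of the Event Editor's own filter.

```python
from collections import defaultdict


def find_duplicates(rows, keys=("tag-local-identifier", "timestamp")):
    """Group rows by the given key columns and return groups with more than one row.

    Returns a dict mapping each duplicated key tuple to its list of rows.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    return {key: group for key, group in groups.items() if len(group) > 1}


# Made-up records: the first two share a tag ID and timestamp.
records = [
    {"tag-local-identifier": "T1", "timestamp": "2008-04-07 21:02:56.000", "location-lat": "47.1"},
    {"tag-local-identifier": "T1", "timestamp": "2008-04-07 21:02:56.000", "location-lat": "47.9"},
    {"tag-local-identifier": "T2", "timestamp": "2008-04-07 21:02:56.000", "location-lat": "47.1"},
]
duplicates = find_duplicates(records)
```

Duplicated keys with differing coordinates, as in the made-up records here, are a hint that tag IDs or timestamps may have been mapped incorrectly.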
Check out the data in another program. If you commonly use R, ArcGIS or other programs for working with animal tracking data, check out your imported data there to make sure it looks and works as expected. You can do this by downloading the data or using the R package move or Movebank's REST API.
If the statistics in your Study Details say the taxa is not set, this means that you have not created animals in the study or that the animals have no taxon assigned.
If all animals in your study are the same species/taxon, you can quickly assign the taxon for all of them at once:
- From the Studies page, select Manage Deployments to open the Deployment Manager.
- Select Batch Edit > Animals.
- Check the box next to Taxon.
- Enter a taxon that complies with the ITIS taxonomy.
- Select Update ALL Selected Animals.
You can select or deselect animals on the list to assign different taxa to groups of animals in the study. For more complicated taxa assignments, or if your study does not yet contain animals, see fix incorrect deployments below. If your taxon is not accepted, see "Why is my taxon invalid?" in the FAQ.
Sometimes while importing or organizing data in your study, deployments are mistakenly assigned, or unwanted IDs are created. Use these steps to check your current deployment information and update if needed.
- From the Studies page, go to Download > Download Reference Data. This will give you a table of all the reference data (deployment, animal and tag information) that are currently in the study. This table contains one row for each deployment, and one row for each animal or tag ID without any associated deployment. In this table it is usually easy to see if unwanted tag or animal IDs have been added, or if tag or animal IDs are not linked together by deployments—these typically appear at the bottom of the file.
Based on what you find here, you can delete or edit the reference data in the Deployment Manager or by importing a new reference data file. If everything looks good, you're done! To completely remove and reimport all existing reference data in the study, follow these instructions:
Use Batch Edit to delete all animal IDs. This will remove all the animal and deployment reference data. If you delete tag IDs with associated data, those data will also be deleted. You will receive a warning if data will be deleted.
If you need to rename tag IDs that have associated data, you will need to rename each one individually from the Deployment Manager or Studies page, or delete the tags and data and then reimport the data with corrected tag IDs.
Edit your downloaded reference data file from step 1 to update the deployment assignments. Each deployment should have a row containing tag ID, animal ID and animal taxon. If pre- or post-deployment locations are included in the event data, deploy-on date and deploy-off date are required to exclude those. You can also add more reference data information—see our archiving best practices for a minimum suggested set of reference data attributes.
Now reimport the updated reference data file.
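Before reimporting, the edited reference data file can be screened for rows that lack a tag ID, animal ID or animal taxon. The Python sketch below is one way to do this. The column names `tag-id`, `animal-id` and `animal-taxon` are assumptions about the reference data table, and the sample content is invented for illustration.

```python
import csv
import io

# Columns each deployment row should contain (assumed header names).
REQUIRED = ("tag-id", "animal-id", "animal-taxon")


def check_reference_rows(rows):
    """Return (row_number, missing_columns) for every row lacking a required value.

    Row numbers count data rows from 1 and do not include the header line.
    """
    problems = []
    for number, row in enumerate(rows, start=1):
        missing = [col for col in REQUIRED if not (row.get(col) or "").strip()]
        if missing:
            problems.append((number, missing))
    return problems


# Made-up reference data: row 2 is a tag with no animal, row 3 an animal with no tag.
text = "\n".join([
    "tag-id,animal-id,animal-taxon,deploy-on-date",
    "T1,Bird1,Ciconia ciconia,2008-04-07 21:02:56.000",
    "T2,,,",
    ",Bird3,Ciconia ciconia,",
])
problems = check_reference_rows(csv.DictReader(io.StringIO(text)))
```

Rows reported here correspond to tag or animal IDs that are not linked by a deployment, or to deployments without a taxon.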
If you find that attributes were mapped incorrectly during import, you will typically need to delete and reimport the affected data files. The most common problems are timestamp formats and unit conversions for attributes with units.
- If needed, view details of individual tags or files within the Studies Page and Event Editor to assess which portions of the data are affected.
- Delete the affected file/s.
- Reimport the file, being sure to correct the mapping problem.
- It is sometimes easier to delete and recreate the study.
Data imported to Movebank will often contain locations that should be excluded from analysis and from tracks viewed in the Tracking Data Map. These unwanted locations generally fall into two categories:
- undeployed locations: locations collected when the tag was not attached to an animal, and
- location outliers: low-quality locations collected during the tag deployment.
Movebank is designed to manage both of these situations (read more). The following is a sample QC-checking procedure for such a dataset after the event and reference data have been imported. This example combines many of the general QC steps above.
Review what you’ve added. Start with a review of what you have imported to the study.
From the Studies page, go to Download > Download Reference Data.
Open the downloaded file so that you can refer to it during the next steps. Add a column to the table to note which tracks to inspect further.
Now select Data > Show in Map to view the data on the Tracking Data Map. On the left you will see a list of Animal IDs, with one row per deployment (if there have been multiple tag deployments for an animal you will see multiple rows for that animal).
Click Options and check the box next to Draw lines for selected animals.
Example: Here is a view of the example study after importing and prior to QC checking:
- Select the deployments/animals one at a time from the list to highlight them on the map to review each track for possible outliers or undeployed locations. Here are some of the things you might find to look at more closely.
Example: Here it looks like the start of the track (indicated by red cross) might be from prior to the actual deployment.
Example: A track with outliers. Note these outliers are only apparent if the single track is highlighted, since they occur in the midst of other valid occurrences.
Example: A track with a questionable movement pattern that should be looked at more closely.
Use the Event Editor to review and filter the data. Now use the Event Editor to have a closer look at any tracks that you flagged in the first steps.
- If the study contains fewer than 500,000 location events, you can open the entire study in the Event Editor. If the study contains more than 500,000 location events, or if it will be easier to check the tracks individually, you can open the Event Editor for individual tags, animals or deployments.
Example: Open a track (deployment) from the Studies page in order to view in the Event Editor.
Example: View of the deployment on the Studies page, and how to open in the Event Editor.
If there are only a few visually obvious outliers, it might be easiest to manually flag them in the Event Editor.
Example: Here is our deployed track with outliers in the Event Editor before changes.
Example: And here it is after flagging the outliers.
If there are many outliers, or if it is difficult to notice them visually, it might be best to decide on a set of filter parameters and apply them to the entire study. The result of filtering will look the same as for the manually flagged outliers above, except that the outliers will be flagged using the attribute algorithm marked outlier rather than manually marked outlier.
Note: What exactly is an outlier? Your desired filter parameters might depend on the current use of the study. For example, to share tracks with the public or to analyse local movements you might want to filter out even small outliers, at the expense of also filtering some good-quality locations, whereas for an analysis of long-distance movements, you may want to retain more locations even if some small outliers are present. Similarly, retaining duplicate tag-and-timestamp records might be fine for some purposes but cause problems for others. Because Movebank flags but does not delete outliers, you will not lose data and can update these settings anytime.
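To make the idea of a filter parameter concrete, here is a minimal Python sketch of a generic speed filter applied outside of Movebank. It is not Movebank's own filter implementation. It assumes the fixes of one track are sorted by time, treats the first fix as valid, and flags any fix that would require a speed above a chosen threshold to reach from the last accepted fix. The threshold of 30 m/s and the coordinates are arbitrary illustration values.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def flag_speed_outliers(fixes, max_speed_ms):
    """Return one boolean per fix; True means the fix is flagged as an outlier.

    `fixes` is a time-sorted list of (datetime, lat, lon) for a single track.
    The first fix is assumed to be valid. Fixes whose timestamp does not
    advance past the last accepted fix are also flagged.
    """
    flags = [False] * len(fixes)
    last_good = None
    for i, (t, lat, lon) in enumerate(fixes):
        if last_good is None:
            last_good = (t, lat, lon)
            continue
        dt = (t - last_good[0]).total_seconds()
        dist = haversine_m(last_good[1], last_good[2], lat, lon)
        if dt <= 0 or dist / dt > max_speed_ms:
            flags[i] = True
        else:
            last_good = (t, lat, lon)
    return flags


# Made-up track: the third fix jumps about 111 km in one minute.
t0 = datetime(2008, 4, 7, 21, 0, 0)
track = [
    (t0, 0.0, 0.0),
    (t0 + timedelta(seconds=60), 0.0, 0.001),
    (t0 + timedelta(seconds=120), 0.0, 1.0),
    (t0 + timedelta(seconds=180), 0.0, 0.002),
]
flags = flag_speed_outliers(track, max_speed_ms=30.0)
```

A lower threshold removes more small outliers at the cost of some good locations, which is the trade-off described in the note above. Because the first fix is trusted, a bad first location would cause later valid fixes to be flagged, so it is worth checking the start of each track visually.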
In some cases you will need to do something other than flag outliers.
If the outliers occur at the beginning or end of the track, evaluate whether the deploy-on or deploy-off dates need to be updated, and make the change in your downloaded reference data table or in the Deployment Manager.
Example: For this track, we can see that the locations are probably accurate (not outliers) but were collected before the tag was attached to the animal. We can fix this by changing the deploy-on date to 2008-04-07 21:02:56.
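The effect of a deploy-on date such as the one in this example can be previewed on a downloaded file before the reference data are changed. The Python sketch below splits records into those inside and outside a deployment window. It assumes a `timestamp` column in the yyyy-MM-dd HH:mm:ss.SSS format and treats both window boundaries as inclusive, which is an assumption and may differ from how Movebank handles events exactly at the boundary.

```python
from datetime import datetime

MOVEBANK_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # yyyy-MM-dd HH:mm:ss.SSS


def split_by_deployment(rows, deploy_on, deploy_off=None):
    """Split rows into (deployed, undeployed) using the deployment window.

    Boundaries are treated as inclusive (an assumption). If `deploy_off`
    is None, the deployment is considered open-ended.
    """
    deployed, undeployed = [], []
    for row in rows:
        ts = datetime.strptime(row["timestamp"], MOVEBANK_FORMAT)
        inside = ts >= deploy_on and (deploy_off is None or ts <= deploy_off)
        (deployed if inside else undeployed).append(row)
    return deployed, undeployed


# Made-up events around the deploy-on date from the example.
events = [
    {"timestamp": "2008-04-06 10:00:00.000"},
    {"timestamp": "2008-04-07 21:02:56.000"},
    {"timestamp": "2008-04-08 06:30:00.000"},
]
deploy_on = datetime(2008, 4, 7, 21, 2, 56)
deployed, undeployed = split_by_deployment(events, deploy_on)
```

Plotting only the `deployed` records in another program is a simple way to confirm that the chosen date removes the pre-deployment locations and nothing else.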
You might also identify a pattern that needs to be looked at more closely, possibly by returning to the original database or file from which the data were obtained.
Example: The pattern here could be caused by incorrect timestamps or tag IDs in the original data. This could be resolved by flagging the entire set of highlighted locations as outliers, or by deleting the file containing these records and importing a corrected file.
Update reference data.
- During the review process you might find that changes to deploy-on date and deploy-off date are needed, or notice changes or additions to make to other reference data. If you record this on your downloaded reference data table, you can save this as a .csv and import it to replace and update what is currently in the study. You can also make these changes in the Deployment Manager or from the Studies page.
Review your changes.
- View the data on the map again to review your changes and confirm that your QC checks are complete. Repeat steps as needed if you find additional changes to make. Keep in mind that the calculated statistics and maps can take some time (typically less than 30 minutes) to update, so outliers might continue to display on the map for a while after you have successfully flagged them.
If you are working with studies for which you are not a Data Manager, you can still follow many of the steps above to assess data prior to use. Here are two common situations related to QC checking:
You find outliers or questionable locations. In this case, you could use Movebank to identify things to check, and edit your downloaded data as needed with other tools. If possible, contact the data owner with questions in case they can clarify and/or possibly correct any errors using the steps described above.
You want to ignore filtering done within Movebank. In some cases you might want to apply your own filtering procedures outside of Movebank. For example, you might need to apply consistent procedures across many datasets, or want to use less restrictive settings to retain and use more locations than were desired by the owner for their original use. You might also want to see the full dataset with outlier-related attributes. In this case, you should download the data from the Studies page and check the box next to include points marked as outliers. Events with visible = false will indicate records that are considered outliers.
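Once the data have been downloaded with outliers included, the `visible` attribute can be used to separate the two sets. The Python sketch below assumes the download is a CSV in which `visible` holds the text values true or false. The sample content and the helper name are invented for illustration.

```python
import csv
import io


def split_by_visible(rows):
    """Split rows into (kept, outliers) based on the "visible" column.

    Rows with visible = false are treated as outliers; all others are kept.
    """
    kept, outliers = [], []
    for row in rows:
        if row["visible"].strip().lower() == "false":
            outliers.append(row)
        else:
            kept.append(row)
    return kept, outliers


# Made-up download containing one record marked as an outlier.
text = "\n".join([
    "event-id,visible,timestamp,location-long,location-lat",
    "1,true,2008-04-07 21:02:56.000,8.96,47.74",
    "2,false,2008-04-07 22:02:56.000,120.00,-10.00",
    "3,true,2008-04-07 23:02:56.000,8.97,47.75",
])
kept, outliers = split_by_visible(csv.DictReader(io.StringIO(text)))
```

To apply your own filtering instead, start from all rows (`kept + outliers`) and ignore the `visible` values.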
Quality control of uploaded data