Research data management using DataLad

"Learning the basics of the DataLad version control system for research data. DataLad is a community project built on top of git and git-annex and a critical tool for reproducible cognitive neuroscience."

Information

The estimated time to complete this training module is 3h.

The prerequisites to take this module are:

Contact Isil Bilgin if you have questions about this module, or if you want to check that you successfully completed all the exercises.

Resources

This module was presented by Adina Wagner during the HBM Brainhack in 2020.

The material of the tutorial is available here.

The video of her presentation is available below:

To install DataLad, please follow the instructions in the DataLad Handbook.

Exercise

  • Follow along with Adina's tutorial. You can copy and paste the commands from the DataLad handbook section linked above while following the video.
    • Warning 1: The URL for one of the books in the tutorial (byte-of-python.pdf) is broken, so the PDF is unreadable. This does not impact the tutorial, but don't be surprised if that document does not open. It also shows how important it is to create persistent URLs when you release material, such as those offered by platforms like Zenodo, OSF, or figshare.
    • Warning 2: While following the tutorial, you may need to install new command-line tools, such as tree.
    • Warning 3: To clone some of the repositories used in the hands-on parts of the lecture, you will need to create an SSH key and register it with your GitHub account. To create the key, follow the instructions from GitHub. Then, from the Git Bash terminal (a bash emulation that comes with the installation of Git), run cat ~/.ssh/id_rsa.pub to display the public key. It is a very long string of letters and numbers starting with the indicator ssh-rsa. Copy the whole key string, go to your GitHub account, and in the Settings > SSH and GPG keys menu click the New SSH key button. Paste the copied key into the Key text box, give the key a title such as home_laptop_github_key, and click the Add SSH key button to save it. Your SSH key is now set up for the current operating system environment, and you are ready to run the datalad clone command with the git@github.com:... links listed throughout the tutorial.
  • Check with Isil Bilgin to validate that the history of your DataLad repository includes all the steps of the tutorial.
  • 🎉 🎉 🎉 You completed this training module! 🎉 🎉 🎉
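
Two steps of the exercise, the SSH key setup from Warning 3 and the final history check, can be sketched in the shell as follows. This is a minimal sketch under stated assumptions, not part of the tutorial itself: the function names, the e-mail placeholder, and the default key path are hypothetical, and the history check relies on the fact that a DataLad dataset is a Git repository.

```shell
# Placeholders (assumptions, not values from the tutorial): adjust to your setup.
KEY_FILE="${KEY_FILE:-$HOME/.ssh/id_rsa}"
KEY_COMMENT="${KEY_COMMENT:-you@example.com}"

# Create an RSA key pair unless one already exists at KEY_FILE.
# ssh-keygen asks interactively for an optional passphrase.
make_ssh_key() {
    if [ -f "$KEY_FILE" ]; then
        echo "Key already exists: $KEY_FILE"
    else
        mkdir -p "$(dirname "$KEY_FILE")"
        ssh-keygen -t rsa -b 4096 -C "$KEY_COMMENT" -f "$KEY_FILE"
    fi
}

# Print the public half of the key; this is the text to paste into GitHub.
show_public_key() {
    cat "${KEY_FILE}.pub"
}

# List the commits of a dataset, one per line (defaults to the current directory).
dataset_history() {
    git -C "${1:-.}" log --oneline
}
```

A typical session would call make_ssh_key once, then show_public_key to get the string to paste into GitHub. After the key is registered, ssh -T git@github.com can be used to check the connection. At the end of the tutorial, dataset_history path/to/your/dataset prints the history to review with Isil Bilgin.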

More resources

If you want to learn more, check:

  • The DataLad handbook, which features a lot of additional resources as well!
  • The DataLad datasets GitHub organization, which provides easy access to a number of data resources. These DataLad repositories are the easiest way to get access to datasets.
  • The DataLad lecture series
  • The DataLad Course Material
  • Note that for the last part of the tutorial you will need to install Singularity and the datalad-container extension (installable through pip).
  • All of the OpenNeuro datasets, available in the OpenNeuro GitHub organization.
  • You can also read about the YODA principles for reproducible papers.
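
To illustrate why dataset repositories such as those in the GitHub organizations listed above are convenient, here is a hedged sketch of the usual two-step pattern: datalad clone retrieves the dataset's structure, and datalad get then downloads the content of only the files you ask for. The function name and its three arguments are hypothetical; substitute the URL of a real dataset repository and a path that exists inside it.

```shell
# Clone a dataset, then fetch the content of a single path inside it.
#   $1: URL of the dataset repository (placeholder, supply your own)
#   $2: local directory to clone into
#   $3: path inside the dataset whose content should be downloaded
fetch_from_dataset() {
    datalad clone "$1" "$2" &&
        (cd "$2" && datalad get "$3")
}
```

For example, fetch_from_dataset followed by a dataset URL, a target directory, and a file path would leave you with a local clone in which only that one file has been downloaded.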