This is an appendix to the main 2018 git-annex user survey, intended to find out more about how people are using git-annex with scientific data. Please also fill out the main survey.

This survey is open now through January 31st, 2019.

git-annex is increasingly being used by scientists to manage their data, and the rest of the questions in the poll are for them. If you're not using git-annex for science, you can stop here, and thanks for participating in the poll.

What field of science do you use git-annex for?

astronomy (9%)


bioinformatics (6%)


biology (0%)


chemistry (1%)


computer science (30%)


geography (0%)


machine learning/AI (4%)


meteorology (0%)


neuroscience (30%)


physics (6%)


statistics (1%)


social sciences (1%)


mathematics (3%)


education (1%)


linguistics (1%)


Write in:

Total votes: 62
Posted Fri Nov 23 16:25:47 2018

Do you use git-annex more for getting access to data from others, or for producing/publishing your own data?

consuming data from others (17%)


producing/publishing data (62%)


both equally (20%)


Write in:

Total votes: 58
Posted Fri Nov 23 16:46:42 2018

Do you use software layered on top of git-annex for generating, discovering, managing, or publishing data sets, etc.?

just git-annex (73%)


Datalad (http://datalad.org) (19%)


DataLad-HIRNI (https://pypi.org/project/datalad-hirni) (1%)


GIN (https://web.gin.g-node.org) (0%)


ReproIn/HeuDiConv (http://reproin.repronim.org) (1%)


our own custom software (4%)


Write in:

Total votes: 63
Posted Fri Nov 23 16:25:47 2018

If you use Datalad, is datalad run/rerun part of your workflow?

yes, I use datalad run/rerun (12%)


no, but I have heard about it (15%)


no (12%)


I don't use datalad (60%)


Write in:

Total votes: 58
Posted Fri Nov 23 16:25:47 2018

Finally, here you can provide any details you'd like about the research project(s) where you used git-annex. Please include supporting grant numbers, URLs, etc.

I use git-annex to store data related to my dissertation and other personal projects (in the field of historical linguistics). My main concerns are data backup and replication across the several machines I use to work on the projects. (25%)


Store/manage/sync hdf5 files produced by my solver code (https://github.com/aragilar/DiscSolver) for my PhD thesis. (2%)


Manage bioinformatics files related to viral genomics. (2%)


Medical image storage and research artifact archive. (2%)


Manage backups and copies of large working data (27%)


NSF 1429999 (DataLad project), NIH 1P41EB019936-01A1 (ReproNim), any neuroimaging study I touch gets annexed (5%)


Neuroimaging data repository https://OpenNeuro.org uses git-annex via datalad to manage dataset versions and export datasets to a versioned S3 bucket (5%)


Any project ends up in a repository with an annex (DataLad dataset) (15%)


Manage pdfs of research papers (12%)


Distribution and archival of build products (2%)


Write in:

Total votes: 40

That's the end of the survey. Thanks for your participation! Please invite your colleagues to fill out this survey too.

Posted Fri Nov 23 16:34:54 2018