This is an appendix to the main 2018 git-annex user survey to find out more about how people are using git-annex with scientific data.

This survey ran from December 1st 2018 to January 31st 2019.

git-annex is increasingly being used by scientists to manage their data, and the rest of the questions in the poll are for them. If you're not using git-annex for science, you can stop here, and thanks for participating in the poll.

What field of science do you use git-annex for?

astronomy (8%)


bioinformatics (4%)


biology (0%)


chemistry (2%)


computer science (32%)


geography (0%)


machine learning/AI (4%)


meteorology (0%)


neuroscience (19%)


physics (5%)


statistics (1%)


social sciences (1%)


mathematics (2%)


education (2%)


linguistics (2%)


biomedical engineering (2%)


EE (2%)


physiology (1%)


None of the above (13%)


Total votes: 100
Posted Fri Nov 23 16:25:47 2018

Do you use git-annex more for getting access to data from others, or for producing/publishing your own data?

consuming data from others (14%)


producing/publishing data (63%)


both equally (15%)


moving data around (6%)


Total votes: 91
Posted Fri Nov 23 16:46:42 2018

Do you use software layered on top of git-annex for generating, discovering, managing, or publishing data sets, etc.?

just git-annex (77%)


Datalad (http://datalad.org) (12%)


DataLad-HIRNI (https://pypi.org/project/datalad-hirni) (1%)


GIN (https://web.gin.g-node.org) (0%)


ReproIn/HeuDiConv (http://reproin.repronim.org) (1%)


our own custom software (8%)


Total votes: 98
Posted Fri Nov 23 16:25:47 2018

If you use Datalad, is datalad run/rerun part of your workflow?

yes, I use datalad run/rerun (7%)


no, but I have heard about it (10%)


no (10%)


I don't use datalad (70%)


Total votes: 93
Posted Fri Nov 23 16:25:47 2018

Finally, here you can provide any details you'd like about the research project(s) where you used git-annex. Please include grant numbers supporting it, URLs, etc.

I use git-annex to store data related to my dissertation and other personal projects (in the field of historical linguistics). My main concerns are data backup and replication across the several machines I use to work on the projects. (18%)


Store/manage/sync hdf5 files produced by my solver code (https://github.com/aragilar/DiscSolver) for my PhD thesis. (1%)


Manage bioinformatics files related to viral genomics. (1%)


Medical image storage and research artifact archive. (1%)


Manage backups and copies of large working data (25%)


NSF 1429999 (DataLad project), NIH 1P41EB019936-01A1 (ReproNim), any neuroimaging study I touch gets annexed (3%)


Neuroimaging data repository https://OpenNeuro.org uses git-annex via datalad to manage dataset versions and export datasets to a versioned S3 bucket (3%)


Any project ends up in a repository with an annex (DataLad dataset) (10%)


Manage pdfs of research papers (11%)


Distribution and archival of build products (6%)


Biomechanical testing and computer modeling of knee meniscus injury. Git annex was used to track experimental data and store snapshots of analysis. Regular git was used in the same repository to track code, notes, and drafts. Work supported by NIH grants R01AR050052, R01EB002425, R21AR070966, U54GM104941. (1%)


being able to access my archives from my 60GB notebook (11%)


Storing part of the pentago dataset (https://perfect-pentago.net) (1%)


Total votes: 59

That's the end of the survey. Thanks for your participation! Please invite your colleagues to fill out this survey too.

Posted Fri Nov 23 16:34:54 2018