This is an appendix to the main 2018 git-annex user survey, intended to find out more about how people are using git-annex with scientific data. Please also fill out the main survey.

This survey is open now through January 31st 2019.

git-annex is increasingly being used by scientists to manage their data, and the rest of the questions in the poll are for them. If you're not using git-annex for science, you can stop here, and thanks for participating in the poll.

What field of science do you use git-annex for?

astronomy (9%)

bioinformatics (6%)

biology (0%)

chemistry (1%)

computer science (30%)

geography (0%)

machine learning/AI (4%)

meteorology (0%)

neuroscience (30%)

physics (6%)

statistics (1%)

social sciences (1%)

mathematics (3%)

education (1%)

linguistics (1%)

Write in:

Total votes: 62
Posted Fri Nov 23 16:25:47 2018

Do you use git-annex more for getting access to data from others, or for producing/publishing your own data?

consuming data from others (17%)

producing/publishing data (62%)

both equally (20%)

Write in:

Total votes: 58
Posted Fri Nov 23 16:46:42 2018

Do you use software layered on top of git-annex for generating, discovering, managing, or publishing data sets, etc.?

just git-annex (73%)

Datalad (19%)

DataLad-HIRNI (1%)

GIN (0%)

ReproIn/HeuDiConv (1%)

our own custom software (4%)

Write in:

Total votes: 63
Posted Fri Nov 23 16:25:47 2018

If you use Datalad, is datalad run/rerun part of your workflow?

yes, I use datalad run/rerun (12%)

no, but I have heard about it (15%)

no (12%)

I don't use datalad (60%)

Write in:

Total votes: 58
Posted Fri Nov 23 16:25:47 2018

Finally, here you can provide any details you'd like about the research project(s) where you used git-annex. Please include supporting grant numbers, URLs, etc.

I use git-annex to store data related to my dissertation and other personal projects (in the field of historical linguistics). My main concerns are data backup and replication across the several machines I use to work on the projects. (25%)

Store/manage/sync hdf5 files produced by my solver code for my PhD thesis. (2%)

Manage bioinformatics files related to viral genomics. (2%)

Medical image storage and research artifact archive. (2%)

Manage backups and copies of large working data (27%)

NSF 1429999 (DataLad project), NIH 1P41EB019936-01A1 (ReproNim), any neuroimaging study I touch gets annexed (5%)

Neuroimagingjc data repository uses git-annex via datalad to manage dataset versions and export datasets to a versioned S3 bucket (5%)

Any project ends up in a repository with an annex (DataLad dataset) (15%)

Manage pdfs of research papers (12%)

Distribution and archival of build products (2%)

Write in:

Total votes: 40

That's the end of the survey. Thanks for your participation! Please invite your colleagues to fill out this survey too.

Posted Fri Nov 23 16:34:54 2018