Hiding in plain sight

The benefits of public sharing of (temporarily) encrypted, private data

Richard D. Morey
Aug 11, 2017

In discussions on data sharing in science, people sometimes tell me that they want to share their data, just not right now. Not on release of a preprint or on submission of a manuscript; but maybe just before publication or sometime after. There’s nothing wrong with this per se, as long as data is shared with reviewers, and as long as — if possible — data is shared with readers of the eventual publication. Not all data has to be shared all the time, of course.

However, by keeping your data and analysis scripts on your own computer, you are missing out on many benefits that open scientists can reap now. These benefits include:

  • A safe place to put your data, managed by professionals and difficult to mistakenly delete/overwrite
  • Easy sharing of code/data with colleagues on the Open Science Framework/GitHub; no passing around data files, scripts, or Dropbox folders where people can delete/screw things up
  • No mucking about with colleagues’ own computers’ paths in analysis scripts
  • Effortless eventual public sharing; just send a link or include it in the manuscript
  • Public code that can be checked by others, including reviewers, without revealing the data

The problem is that, given the way people normally share code, getting these benefits seems to require sharing the data publicly on the Open Science Framework or GitHub.

But…there is another way. First, install the sodium package in R, if you don’t already have it:

install.packages('sodium', dependencies = TRUE) # If needed

Now try running the following code in R (the “source” command reads and runs the R script at the given URL):

key_string = "This is a passphrase"
source("https://osf.io/73thx/download?version=3")

Didn’t work, did it? It should have given you an error: “Error in data_decrypt(cipher, key, nonce) : Failed to decrypt.” That’s because the data is stored in a public OSF repository, but it is encrypted (the file containing the encrypted data is here). If you have the wrong passphrase, you can’t open it.
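
To see why the wrong passphrase fails, here is a minimal local sketch that needs no download. It assumes the key is derived by hashing the passphrase with SHA-256; the script on the OSF may derive its key differently, and the variable names here are only illustrative.

```r
library(sodium)

# Illustrative stand-in for private data (not the real data set)
secret <- charToRaw("some private data")

# Assumption: a 32-byte key is derived from the passphrase via SHA-256
good_key <- sha256(charToRaw("This is a secret passphrase"))
bad_key  <- sha256(charToRaw("This is a passphrase"))

# Encrypt with the correct key and a random 24-byte nonce
nonce  <- random(24)
cipher <- data_encrypt(secret, good_key, nonce)

# Decrypting with the wrong key raises an error; catch it and keep the message
result <- tryCatch(
  rawToChar(data_decrypt(cipher, bad_key, nonce)),
  error = function(e) conditionMessage(e)
)
print(result)

# Decrypting with the correct key recovers the original text
recovered <- rawToChar(data_decrypt(cipher, good_key, nonce))
print(recovered)
```

Decryption is authenticated, so a wrong key produces an error rather than garbage output.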

Try this now, with the correct passphrase:

key_string = "This is a secret passphrase"
source("https://osf.io/73thx/download?version=3")

You will now have a data frame defined in your R session called “exp1”. You can see it by typing:

summary(exp1)

Of course, you’ll need to know something about the data set in order to analyse it, but those two lines of code are all you need to load the data in R. The data are stored publicly on the OSF, safe, backed up, and managed by professionals. You can access them anytime you like, from anywhere. You can send a link to colleagues. All you need is the secret passphrase.

I have built two examples, one for GitHub users and one for OSF users. In the repository, there is example code to encrypt a file and to load the encrypted data set off either GitHub or OSF.
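
As a rough idea of what such code might look like, here is a minimal round-trip sketch using the sodium package. It is not the code from the repository: the toy data frame, the SHA-256 key derivation, and the choice to save the ciphertext and nonce together in an .rds file are all assumptions made for illustration.

```r
library(sodium)

# Hypothetical toy data standing in for a real data set
exp1 <- data.frame(id = 1:3, rt = c(512, 430, 618))

# Assumption: derive a 32-byte key by hashing the passphrase with SHA-256
key_string <- "This is a secret passphrase"
key <- sha256(charToRaw(key_string))

# Serialize the data frame to raw bytes and encrypt it with a random nonce
msg    <- serialize(exp1, NULL)
nonce  <- random(24)
cipher <- data_encrypt(msg, key, nonce)

# Save ciphertext and nonce together; this file is what you would upload
enc_file <- tempfile(fileext = ".rds")
saveRDS(list(cipher = cipher, nonce = nonce), enc_file)

# Later, or on a colleague's machine: read, decrypt, and unserialize
enc      <- readRDS(enc_file)
restored <- unserialize(data_decrypt(enc$cipher, key, enc$nonce))
print(identical(restored, exp1))
```

The nonce is not secret, so it can sit next to the ciphertext in the public file; only the passphrase needs to stay private.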

The great thing about this is that when you’re preparing a paper, you can keep the data set safe in plain sight. Put the link in your manuscript, and when you’re ready to share, just add the passphrase to the repository! Then anyone can load the data. Safe sharing is easy.
