On 31 August 1869 in Birr (then Parsonstown), Ireland, Mary Ward met an untimely demise. She is regarded as the first recorded motor vehicle fatality: she fell from a steam-powered car built by a relative and died of a broken neck. But Ward was also an amateur scientist, and since she was relatively young when she died, she no doubt left investigations and projects unfinished.
“But,” I hear you say, “what on earth does a 19th-century Irish road accident have to do with science?” The answer is metadata.
I recently started a new position where I inherited some projects from my predecessor, a situation that is hardly unique. A post-doc starts a project or experiment and leaves before it’s finished, or a grad student builds on previous work in the lab, or uses data collected by a summer student. These are all common in the scientific world. And whenever you start to use data collected by someone else, chances are there will be problems.
On Twitter, this is often expressed using the #otherpeoplesdata hashtag, and Christine Bahlai even started a fantastic blog, Practical Data Management, about some of the data issues she’s encountered and how to avoid them.
At a team meeting today, we spent some time talking about the challenges of working with other people’s data, which is something we do quite frequently here. The conversation then turned to the difficulties I’ve had accessing, interpreting, and analyzing my predecessor’s data, and that’s when I thought of the Ward Test.
Take a look at a particular dataset (where “dataset” means everything you need to write a paper). If, like Mary Ward, you were killed in a car accident tomorrow, would someone else be able to access, interpret, and analyze your data? That is the Ward Test.
For many of us, the answer will be a resounding “NO!” The reasons are many and varied, and not something I’m going to get into here. What I do want to emphasize is the importance of metadata and organization.
Metadata are the bits of information that provide context for the data. In my line of work, this could be where a bird was banded, its band number, species, age, sex, measurements, and so on, all of which accompany the tracking data. Technically, the data are the positions recorded by the tracking device, but without the metadata they are utterly useless.
Even simple spreadsheets (do such things exist?!) could do with some metadata. Metadata also help when you go back to re-analyze data, reuse data for a different project, or collaborate with others. And as the culture of data sharing grows, this data about data (metadata!) will become increasingly important.
So my goal is to include metadata with each new dataset I generate. In a spreadsheet, this can go on an extra tab that defines what each column means. In other cases, it could be a text file explaining what certain files are and where the required information can be found. I’ve written before about the importance of having a plan for what happens to data when a scientist passes away; the trick is making sure the data pass the Ward Test.
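For data kept in CSV files rather than spreadsheet tabs, a small script can run a rough, partial version of the Ward Test on the column definitions. The sketch below is a minimal Python illustration, assuming the metadata are stored as a simple mapping from column name to definition; the function names, column names, and definitions are all hypothetical and not taken from any real project.

```python
import csv
import io

# Hypothetical metadata for a bird-tracking dataset: column name -> definition.
# Column names and wording are illustrative only.
DEFINITIONS = {
    "band_number": "Unique band number fitted to the bird",
    "species": "Species name",
    "age": "Age class at banding",
    "sex": "Sex of the bird (M, F, or U for unknown)",
    "latitude": "Latitude of the tracking fix, decimal degrees",
    "longitude": "Longitude of the tracking fix, decimal degrees",
}


def undocumented_columns(header, definitions):
    """Return the columns in the data header that have no definition."""
    return [col for col in header if col not in definitions]


def data_dictionary(header, definitions):
    """Build a plain-text data dictionary, one 'column: definition' line per column.

    Columns without a definition are flagged as UNDOCUMENTED.
    """
    lines = [f"{col}: {definitions.get(col, 'UNDOCUMENTED')}" for col in header]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # A tiny in-memory CSV standing in for a real data file (hypothetical values).
    sample = io.StringIO(
        "band_number,species,age,sex,latitude,longitude,fix_time\n"
        "1234-56789,Example species,AHY,F,45.10,-64.30,2015-06-01T12:00Z\n"
    )
    header = next(csv.reader(sample))
    print(data_dictionary(header, DEFINITIONS))
    missing = undocumented_columns(header, DEFINITIONS)
    if missing:
        print("Columns with no definition:", ", ".join(missing))
```

Checking that every column has a definition is only a first step: it cannot tell you whether the definitions are accurate, or whether the files a text README points to actually exist, so a human reader is still the real test.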
The same applies when sharing data: including metadata, and making sure the data pass the Ward Test, maximizes the chance that whoever tries to use them has what they need to succeed.