
The Lab and Field

~ Science, people, adventure

Tag Archives: data management

Manuscript necromancy: challenges of raising the dead

Sunday 15 March 2015

Posted by Alex Bond in how to

Tags

data management, manuscript necromancy, manuscripts, necromancy, publishing, students, writing

If you’ve been doing research for any length of time, you probably have data that aren’t doing anything but taking up space on your hard drive. Stick around a little longer, and you’ll eventually have entire projects with half-written (or even completely written) manuscripts that, for one reason or another (or indeed no reason at all), have fallen by the wayside. At some point, you’re organizing files or chatting with a colleague, and you suddenly think “Oh yeah. Whatever happened to that?” Or, if you’re a PI/manager, you’ve had students write theses/reports that could (or should) have become manuscripts, but haven’t.

I’ve dubbed the process of (re)discovering a dead manuscript and breathing new life into it “manuscript necromancy”. I think the comparison works. And like true resurrection from the dead*, manuscript necromancy isn’t without its challenges and limitations, and you might need some … interesting tools to get the job done.

I should point out that in a perfect world, necromancy wouldn’t be needed, and all data would be formatted beautifully, with wonderful metadata and reproducible analysis scripts. But this world is far from perfect. These are the scientific dark arts. Hold on to your tracked changes, boys and girls, we’re going in, and it could get ugly.

 

Slash and burn

Student theses are rarely written with tight language and good grammar, or in the style of a journal article. There’s frequently lots of exposition and background, a verbose writing style, mixed tenses, inconsistent formatting, … the list goes on. The first step is to go through the current draft with a take-no-prisoners edit to remove unneeded text, straighten out the grammar and style, and give yourself a general feel for the manuscript. This is often the most labour-intensive part of the job. A recent manuscript we resurrected took me 3 full days of editing, which ultimately reduced its length by almost half.

My next step is to tackle the references. Theses often cite everything under the sun (Smith et al. 1758), regardless of how useful it is (Jones 1877).** Nine (or ten) times out of ten, the references are incomplete or missing, and they almost certainly aren’t in your reference manager of choice (let alone in the journal’s style, but that’s another argument for another day). One trick is to look for references that are cited only once, and ask whether they are truly needed. If they are, keep them. If not, away they go.
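If the thesis uses author-year citations, a short script can do the first pass of the “cited only once” check for you. This is a minimal Python sketch, assuming citations sit inside parentheses in a form like (Smith et al. 1758) or (Jones 1877; Brown and Green 2001). The regular expressions and function names are my own and hypothetical, and they will miss or mangle other citation styles, so treat the output as a list of candidates to check by eye, not a verdict.

```python
import re
from collections import Counter

# Parenthetical groups, e.g. "(Jones 1877; Smith et al. 1758)".
PAREN = re.compile(r"\(([^()]+)\)")
# One author-year citation inside a group: "Jones 1877", "Smith et al. 1758",
# or "Brown and Green 2001". A hypothetical pattern; adjust for your thesis.
CITE = re.compile(r"([A-Z][A-Za-z'\-]+(?: et al\.| and [A-Z][A-Za-z'\-]+)?),? (\d{4}[a-z]?)")


def citation_counts(text):
    """Count how often each author-year citation appears in parentheses."""
    counts = Counter()
    for group in PAREN.findall(text):
        for author, year in CITE.findall(group):
            counts[f"{author} {year}"] += 1
    return counts


def cited_once(text):
    """Return citations that appear exactly once, sorted for easy review."""
    return sorted(c for c, n in citation_counts(text).items() if n == 1)


draft = (
    "Birds migrate (Smith et al. 1758). Some do not (Jones 1877; Smith et al. 1758). "
    "Others stay put (Brown and Green 2001)."
)
print(cited_once(draft))  # ['Brown and Green 2001', 'Jones 1877']
```

Anything in the cited-once list is a candidate for the chop; anything cited many times is probably holding up part of the argument.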

The last task in this first step is to look at the tables and figures. Are they all needed? Are they clear? Hopefully the answer is yes, or requires minimal changes (though see some spooky possibilities below).

Congratulations! You’re now a Level 1 Manuscript Necromancer (and are entitled to the post-nominals M.N. in certain circles).

 

The festering wound

But a manuscript can still be alive, though severely wounded. In some cases, you’ll discover (to your utter dismay) that you need to re-analyze data, or re-draw a figure. Both of these require necromancy of the most troubling form: data.

Data management has been improving as a whole***, but student thesis data are not known for being friendly for outsiders to wrangle. Just check out #otherpeoplesdata on Twitter to get a taste of the frustrations.

While your initial reaction might be to re-create the analyses done in the original draft and obtain the same results before moving on, I strongly recommend against it unless the data are well archived with appropriate metadata and explanations of the analysis (in the form of notes, an R script, etc.). You will not get the same results, and you will tear out your hair (and possibly your scalp) trying. The situation is already less than ideal, so cut your losses, and use what you have. By all means, cull anything that’s rubbish (and document it!), and then proceed with your analysis/graph.
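Documenting the cull can be as simple as keeping a log of every record you drop and why. Here is a minimal Python sketch, assuming the data are already loaded as a list of dictionaries (e.g. via csv.DictReader) that all share the same columns. The function name, the rule format, and the example columns (bird_id, mass_g) are hypothetical.

```python
import csv


def cull_with_log(rows, rules, log_path):
    """Split rows into kept and culled, logging every culled row and why.

    rows:  list of dicts that all share the same keys.
    rules: list of (reason, predicate) pairs, checked in order; a row is
           culled by the first predicate that returns True, so later
           predicates can assume earlier checks passed.
    """
    kept, culled = [], []
    for i, row in enumerate(rows):
        reason = next((r for r, test in rules if test(row)), None)
        if reason is None:
            kept.append(row)
        else:
            culled.append({"row_index": i, "reason": reason, **row})

    # Write the log even if nothing was culled, so there's a record either way.
    fieldnames = ["row_index", "reason"] + (list(rows[0]) if rows else [])
    with open(log_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(culled)
    return kept, culled


# Hypothetical example: body mass records from a thesis spreadsheet.
rows = [
    {"bird_id": "A1", "mass_g": "512"},
    {"bird_id": "A2", "mass_g": ""},
    {"bird_id": "A3", "mass_g": "-4"},
]
rules = [
    ("missing mass", lambda r: r["mass_g"] == ""),
    # Safe to call float() here: empty values were caught by the rule above.
    ("impossible mass (<= 0 g)", lambda r: float(r["mass_g"]) <= 0),
]
kept, culled = cull_with_log(rows, rules, "culled_records.csv")
print([r["bird_id"] for r in kept])  # ['A1']
```

The log file then travels with the manuscript, so the next necromancer (possibly you) can see exactly what was removed.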

Level 2 completed.

 

Communicating with the dead

One of the biggest challenges of necromancy comes in the final stages. You have a draft with the right analyses & figures, and you’re ready to submit. Assuming that someone else started this science (be they a student, technician, contractor, or sorcerer’s apprentice), I’d argue that there’s an obligation to include them as a coauthor. The exception might be if the end product bears no resemblance to the original, but that is less manuscript necromancy and more manuscript transfiguration (a topic for another post).

Make every effort to get in touch with the originator so they can a) see what changes you’ve made, b) approve of them, and c) know your plans for the paper. This means trying old email addresses, good old Google searching, contacts through third parties (e.g., friends of friends), and the like. And keep records of these attempts in case you can’t track them down. If you can’t, and have made every effort to find them, they should still be listed as a coauthor. Most journals require you to state that all authors have read and approved the submission, so in this case, my pragmatic argument is that, unless there were major changes to the conclusions, their first draft implies approval****. If there were major changes, you absolutely must track them down, or remove them from the authorship list.

 

Rest and recharge

Manuscript necromancy can be more work than writing a manuscript yourself, and it is certainly more exhausting. Don’t resurrect more than one manuscript at a time, and don’t do more than two or three in a row. You need time to recharge your mind, and too many resurrections in a short period can lead to botched necromancy (and no one wants that) because of reduced effort, particularly in the Slash & Burn phase.

 

Preventing (manuscript) death in the first place

The best solution is to avoid necromancy in the first place. This isn’t always possible, though, and just because something doesn’t get written up doesn’t make it any less Science. Some practices can vastly improve the chances of successful necromancy, and are good research practices to boot:

  • encourage good writing. This isn’t easy, and Terry McGlynn has some good thoughts on this issue more broadly.
  • give good, timely feedback (which increases the chance of a successful manuscript before it dies for the first time)
  • encourage good data management. The easier it is for someone else to piece together the analysis, the better the chances of successful necromancy, especially when deeper techniques of the academic dark arts are required.
  • encourage good data management.  Have I said this yet? It’s sort of important.

 

Glass houses, stones, and all that

One last note: manuscript necromancy need not apply only to someone else’s work; it is equally applicable to your own past work that’s being revisited. The same tools and techniques (and problems) apply. In this case, your familiarity with the manuscript may get in the way of your necromancy, so having an outsider read it over as a friendly reviewer is strongly recommended.

 

Wishing you all much success in your exploration of the scientific dark arts.

— — —

*well, not exactly “true”, sensu stricto, but more widely known

**see what I did there? Not exaggerating either.

***or at least I hope it is.

****ONLY in the absence of actually approving it, mind you, and as an absolute last resort.

The Ward Test, and the importance of metadata

Thursday 19 June 2014

Posted by Alex Bond in how to

Tags

data management, Ward Test

On 31 August 1869 in Birr (then Parsonstown), Ireland, Mary Ward met an untimely demise. She is the first known motor vehicle fatality, having fallen from a steam-powered car built by a relative and broken her neck. But Ward was also an amateur scientist, and, being relatively young when she died, she no doubt left investigations and projects unfinished.

“But!” I hear you saying, “what on earth does a 19th-century Irish road accident have to do with science?” The answer is metadata.

I recently started a new position where I inherited some projects from my predecessor, a situation that is hardly unique. A post-doc starts a project/experiment and leaves before it’s finished, or a grad student builds upon some previous work in the lab, or uses data collected by a summer student. These are all common phenomena in the scientific world. And whenever one starts to use data collected by someone else, chances are there will be problems.

On Twitter, this is often expressed using the #otherpeoplesdata hashtag, and Christine Bahlai even started a fantastic blog, Practical Data Management, about some of the data issues she’s encountered and how to avoid them.

At a team meeting today, we spent some time talking about the challenges of working with other people’s data, which is something we do quite frequently here. The conversation then turned to the challenges I’ve had accessing, interpreting, and analyzing data from my predecessor, and that’s when I thought of the Ward Test.

 

Take a look at a particular dataset (where “dataset” is defined as what you need to write a paper). If, like Mary Ward, you were hit by a car, would someone else be able to access, interpret, and analyze your data? This is the Ward Test.

 

For many of us, the answer will be a resounding “NO!”. The reasons for this are many and varied, and I’m not going to get into them here. But what I do want to emphasize is the importance of metadata and organization.

Metadata are the bits of information that provide context for the data. In my line of work, this could be where a bird was banded, its band number, species, age, sex, measurements, etc., which accompany some tracking data. Technically, the data are the positions recorded by the tracking device, but without the metadata, they’re utterly useless.
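To make the point concrete, here is a minimal Python sketch of tying tracking fixes back to the bird that carried the device. The device_id key, the field names, and the placeholder values are all hypothetical. The useful part is the list of orphans: positions that can’t be linked to any metadata and are, for practical purposes, useless.

```python
def annotate_fixes(fixes, deployments):
    """Attach deployment metadata to each tracking fix.

    fixes:       list of dicts, each with a "device_id" plus position fields.
    deployments: dict mapping device_id -> metadata dict for the bird that
                 carried that device (band number, species, age, sex, ...).
    Returns (annotated, orphans): fixes merged with their metadata, and fixes
    whose device has no metadata at all.
    """
    annotated, orphans = [], []
    for fix in fixes:
        meta = deployments.get(fix["device_id"])
        if meta is None:
            orphans.append(fix)  # a position we can't tie to any bird
        else:
            annotated.append({**meta, **fix})
    return annotated, orphans


# Placeholder values for illustration only, not real tracking data.
deployments = {
    "dev-01": {"band_number": "band-0001", "species": "species A", "age": "adult", "sex": "F"},
}
fixes = [
    {"device_id": "dev-01", "time": "t1", "lat": 0.0, "lon": 0.0},
    {"device_id": "dev-02", "time": "t1", "lat": 0.1, "lon": 0.1},
]
annotated, orphans = annotate_fixes(fixes, deployments)
print(len(annotated), len(orphans))  # 1 1
```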

Even simple spreadsheets (do such things exist?!) could do with some metadata. Metadata also help when you go back to re-analyze data, use data for a different project, or collaborate with others. And as the culture of data sharing grows, these data about data (metadata!) will become increasingly important.

 

So, my goal is to include metadata for each new dataset I generate. In a spreadsheet, this can be an extra tab with definitions of what the columns mean. In other cases, it would be a text file explaining what certain files are and where the required information can be found. I’ve written before about the importance of having a plan for what to do with data when a scientist passes away; the trick is making sure the data pass the Ward Test.
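If the spreadsheet-tab approach doesn’t suit (say the data live in plain CSV files), the same idea works as a separate data dictionary file, and it’s easy to check that every column is actually defined. This is a minimal Python sketch, assuming a hypothetical dictionary CSV with “column” and “definition” headers; the file names and columns in the example are made up for illustration.

```python
import csv


def undocumented_columns(data_path, dictionary_path):
    """Return data columns that have no non-empty definition in the dictionary.

    Assumes (hypothetically) a data dictionary CSV with at least the headers
    "column" and "definition", one row per column of the data file.
    """
    with open(data_path, newline="") as f:
        header = next(csv.reader(f))
    with open(dictionary_path, newline="") as f:
        documented = {
            row["column"] for row in csv.DictReader(f) if row["definition"].strip()
        }
    return [col for col in header if col not in documented]


# Made-up example files: one data file and its (incomplete) dictionary.
with open("tracking.csv", "w", newline="") as f:
    csv.writer(f).writerows([["band_number", "lat", "lon", "fix_quality"], ["x", "0", "0", "3"]])
with open("tracking_dictionary.csv", "w", newline="") as f:
    csv.writer(f).writerows([
        ["column", "definition", "units"],
        ["band_number", "Band number of the tracked bird", ""],
        ["lat", "Latitude of the fix", "decimal degrees"],
        ["lon", "Longitude of the fix", "decimal degrees"],
    ])
print(undocumented_columns("tracking.csv", "tracking_dictionary.csv"))  # ['fix_quality']
```

An empty list is a decent (though not sufficient) sign that a dataset would pass the Ward Test; anything it returns is a column someone else would have to guess at.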

The same applies when sharing data: including metadata, and having data that pass the Ward Test, will maximize the chance that whoever tries to use them has what they need to succeed.
