
Wow – ecologists sure like their writing programs.

After posting yesterday on why I’m reluctant to move out of my established Word/Endnote workflow, the post quickly became the most read in a single day on The Lab and Field (maybe I should have also put “Bayesian” or “frequentist” in the title!).

Folks brought up some great points in the comments, and I wanted to elaborate, and add some perspective.  I’m also not ashamed to say I was swayed by some arguments.

First, I don’t mean to imply that advocates for Markdown/LaTeX/R etc. think of themselves as superior, or look down on those who use Word/Endnote/SPSS.  When I was first learning R (a tool I now use daily, and have taught others to use), I had some pretty sour experiences dealing with more advanced useRs.  Yes, these are the exception to the rule, but their smugness left an off taste in my mouth.  And apparently, I’m not alone.

Second, there are non-monetary costs of using Word.  The format is proprietary, meaning you need a Word license to access the file (or convert it to another format in something like OpenOffice Writer).  For many of us, this isn’t an issue since Word is standard-issue on workplace machines.  Switching between operating systems (Mac to Windows, anything to Linux) can be fraught with formatting errors though.  Plain text formats (like Markdown and LaTeX) don’t suffer the same fate since the formatting is part of the actual document, which itself is plain text and readable in freely-available programs on any platform.  A first-order heading in Markdown, for example, looks like this:

# This is the title of an article
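
For a slightly fuller picture, here is a rough sketch of a manuscript fragment in Pandoc-flavoured Markdown.  The section name, sentence, and citation key (smith2010) are made up for illustration; the point is only that headings, emphasis, citations, and equations are all ordinary characters in the file.

```markdown
# This is the title of an article

## Methods

We fitted a *generalized* linear model to the count data [@smith2010].

$$ \log(\mu_i) = \beta_0 + \beta_1 x_i $$
```

Open that file in any text editor, on any operating system, and the markup is all still there.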

Markdown and LaTeX (and their ilk) are fantastic if you need to embed equations, code, etc., which is painful to do in Word.  This isn’t something I normally have to deal with (though when I do, I use LaTeX).  Most of the work in conservation biology / ecology / ornithology that I’ve done has used small datasets (<2000 data points), at most 1 or 2 simple equations in the manuscript, and relatively simple stats (general/generalized linear models).  The places where I’ve worked have had site licenses for Word and Endnote, so I’ve always had access to them.

The tools I’ve used have worked for the problems at hand.  But they may not for you.  You might be at a non-profit that can’t afford multiple licenses for MS Office or Endnote.  You might be doing lots of modeling and require multi-line equations in your manuscripts.

There is also a philosophical argument to be made, however.  When I move on to my next job, will I have access to the same tools? Will my previous work be lost to the Proprietary File Format gods?

I’m heading out of town for a week, but when I get back, I’m going to work with Andrew MacDonald and Gavin Simpson, two strong proponents of open science, to see if I can (and want to) make the switch to 100% open software (heck, if I can do it, I’m certain just about anyone can!).

Here are the features of my current workflow I’d require, in somewhat decreasing order of importance to me:

  • Integration with my reference database (currently ~3300 entries), ideally including something similar to Endnote’s “Cite While You Write” feature, and the ability to easily format references to a journal’s particular style.
  • Ability for coauthors to comment on and edit manuscript drafts easily and concurrently, and to track, accept/reject these changes.
  • Work across platforms (Windows & Mac), and not require an internet connection (rain days in the field are some of my more productive writing days, after all!).

Data from spreadsheets (the most common format of data in my life) can be stored in *.csv files, and analysed in R (if you use R, and haven’t tried the RStudio interface yet, give it a go).  If we can work this out, I’ll try it out for the fall semester to make a good comparison with my current system.  I’m already a half-convert (I use R, and have used LaTeX).  It might turn out to be something that works for me (which is the most important point), and I’ll keep everyone updated.

Why have I (and many others) been resistant to this change? Because it means admitting we’ve failed at something.  Confronting it is like going to the dentist – sure, you brush your teeth, perhaps floss occasionally, and for most of the year, that’s fine. But when you sit in that chair, and are told you need a filling, you feel like crap, and are embarrassed.  Dealing with “Open Science” questions of file format and of how repeatable our analyses are is awkward for most of us, because if we look critically at our own work, it’s often as closed as a nun’s habit on Sunday. Sure, we could (sometimes) repeat our analyses, but could anyone else if we sent them the data? But openness of data and repeatability in ecology is another post for another day.

Have I completely changed positions? No. Use what works for you. If you use Word to write your manuscript, keep data in Excel files, and run all your stats in SPSS, I couldn’t care less. If you use Markdown/Pandoc to submit your paper that was analysed in R, I care equally little.  Above all else, I care about good science. Can you do good science in Excel, Word, Endnote and SPSS? Sure. Can you do bad science using Markdown, LaTeX, Pandoc, and R? Absolutely.  But the reverse of both is equally true.  As I wrote in my very first post, one shouldn’t let the study site or focal species drive the research questions.  Pick the best system to answer your question.  Similarly, pick the best software to do what you need to do.  And if that’s Word, Markdown, or an XF stub nib for your fountain pen, that’s OK.