The Lab and Field

~ Science, people, adventure

Tag Archives: Markdown

Breaking down the Markdown Summit

01 Sunday Sep 2013

Posted by Alex Bond in how to

≈ 1 Comment

Tags

Markdown, open science, writing tools

Earlier this week, as promised, I had a chat with Andrew MacDonald about the recent back and forth and back again about the use of writing tools in ecology. In typical Canadian fashion, we both really enjoyed our chat, and learned something from each other in the process.

Absolutely fantastic chat with @polesasunder on Markdown, open science, & writing tools. Will digest some thoughts & post an update tomorrow

— Lab and Field (@thelabandfield) August 27, 2013

may you often have conversations that improve & change your thinking on a topic, like the one I just had with @thelabandfield — Andrew MacDonald (@polesasunder) August 27, 2013

Let me say that Markdown (and integrating it with R using the R package knitr) is REALLY NEAT. Andrew showed me an example of the departmental coffee co-op he runs where, after entering each month’s consumption, the document automatically produces the most recent stats, graphs, and customer balances. Previously, I would have entered the data, generated new figures, and pasted them into the document manually; the combination of Markdown and R/knitr does all of that in one step, which is fantastic.

In a more ecological application, consider an annual report where similar elements are required each year (e.g., temperature and precipitation plots, or a running graph of a bird population’s reproductive success). Just enter the data, “knit” the file, and you’re off to the races. Bloody brilliant if you ask me. Consider, too, a manuscript with a complex analysis that keeps getting tweaked. Instead of separating the stats/modeling, plotting, and writing, one could do them all in one self-updating document (well, except the writing. But if anyone figures that out, let me know).

In my original response, I highlighted three features I would require to break out of the Word paradigm that dominates ornithology like wetness dominates the ocean:

  1. Integration with a reference database
  2. Track changes made by coauthors
  3. Available offline
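As a rough illustration of the self-updating report idea described above, here is a minimal sketch in Python rather than R/knitr. It is not Andrew's actual document: the member names, column names, figures, and the price per cup are all invented for the example. The point is only that the report text is regenerated from the raw data every time, so nothing is pasted in by hand.

```python
import csv
import io
from collections import defaultdict

# Hypothetical coffee co-op log: one row per member per month.
# Column names and figures are invented for illustration.
COFFEE_LOG = """member,month,cups,paid
Alex,2013-07,12,5.00
Andrew,2013-07,20,10.00
Alex,2013-08,8,5.00
Andrew,2013-08,15,5.00
"""

PRICE_PER_CUP = 0.50  # assumed price, not taken from the post


def build_report(csv_text, price_per_cup=PRICE_PER_CUP):
    """Return a Markdown report recomputed from the raw log."""
    cups = defaultdict(int)
    paid = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        cups[row["member"]] += int(row["cups"])
        paid[row["member"]] += float(row["paid"])

    # Markdown heading and table, rebuilt from scratch on every run.
    lines = ["Coffee co-op report", "===================", ""]
    lines.append("| Member | Cups | Balance |")
    lines.append("|--------|------|---------|")
    for member in sorted(cups):
        balance = paid[member] - cups[member] * price_per_cup
        lines.append(f"| {member} | {cups[member]} | {balance:.2f} |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_report(COFFEE_LOG))
```

Adding a new month means appending rows to the log and re-running the script; the table and balances update themselves, which is the same workflow knitr offers with figures and statistics included.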

There’s a command in the program git that will compare two versions of a Markdown file, and highlight changed words in different colours. I haven’t played around with it, but Andrew sent an example:

here is an example of git diff --color-words @thelabandfield pic.twitter.com/TpmOWY0smi

— Andrew MacDonald (@polesasunder) August 31, 2013
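For readers who, like me, haven't seen a word-level comparison before, here is a small sketch of the idea in Python using the standard library's difflib. It is not how git itself is implemented; it only mimics the result, marking removed words as `[-word-]` and added words as `{+word+}` (the plain-text markers git uses for word diffs) instead of colouring them. The function name `word_diff` is made up for this example.

```python
import difflib


def word_diff(old, new):
    """Mark removed words as [-word-] and added words as {+word+}."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged words are passed through as-is.
            out.extend(old_words[i1:i2])
            continue
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(out)


if __name__ == "__main__":
    before = "Birds have feathers and can fly"
    after = "Birds have feathers and most can fly"
    print(word_diff(before, after))
```

Because the comparison is word by word rather than line by line, a coauthor's one-word change in a long paragraph shows up as one marked word, which is roughly what Track Changes gives you in Word.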

Markdown doesn’t handle citations natively, but integrating with the program pandoc can take advantage of the LaTeX reference format, and programs like JabRef can be used to facilitate inline citations. It’s not as slick as the Endnote/Word interface in my view (especially when dealing with two papers that could be cited as “Smith et al. 2013”), but it’s workable.

But what we agreed was the biggest hurdle is the cultural shift required, particularly among field ecologists. I still know folks who manage citations in their manuscripts by manually typing them in. Among the 15-20 manuscripts I’m asked to review by journals and colleagues each year, I’d say that half have errors in the literature cited that reveal such a manual strategy. So getting people comfortable with any sort of command line operation or mark-up coding will require a significant shift in a field not known for its shifting.

Another challenge is in the way that tools like R, Markdown, and the open science movement are proselytized*. As I pointed out earlier, when confronted by open science advocates, many of us feel like the kid who got caught by the dentist not flossing between his incisors, and is subsequently chided. The focus can sometimes be on why Word, Endnote, and SPSS are bad, not why Markdown and R are better options than Word and SPSS. I’ve also found that concepts that some people take for granted get turned into jargon that makes communicating this to non-users challenging. As an example, I finally had to ask Andrew what “git” actually was – a program, a built-in command, or a term of derision often preceded by “smarmy”. For the record, git is a program that, depending on one’s operating system, has several graphical user interfaces (GUIs) that avoid the need for command line typing.

*Obviously this isn’t a blanket statement, but it’s been mentioned to me by at least 2 other independent sources, so it must be a thing.

As Andrew and I wrapped up our 2-hour-long Skype/Google video chat, he asked me whether I would use Markdown et al. If I were writing a single-authored manuscript or report, I might. But right now, it will take some playing around to see if I can make it work. I’ll let you all know.

Postscript: I wrote the original text of the post in Markdown, and thanks to Andrew for looking it over this weekend.

Academic hipsters redux (and why open science is like going to the dentist)

09 Friday Aug 2013

Posted by Alex Bond in opinion

≈ 9 Comments

Tags

LaTeX, Markdown, open science, R, workflow

Wow – ecologists sure like their writing programs.

After posting yesterday on why I’m reluctant to move out of my established Word/Endnote workflow, the post quickly became the most read in a single day on The Lab and Field (maybe I should have also put “Bayesian” or “frequentist” in the title!).

Folks brought up some great points in the comments, and I wanted to elaborate, and add some perspective.  I’m also not ashamed to say I was swayed by some arguments.

First, I don’t mean to imply that advocates for Markdown/LaTeX/R etc. think of themselves as superior, or look down on those who use Word/Endnote/SPSS.  When I was first learning R (a tool I now use daily, and have taught others to use), I had some pretty sour experiences dealing with more advanced useRs.  Yes, these are the exception to the rule, but their smugness left an off taste in my mouth.  And apparently, I’m not alone:

@ibartomeus @ftmaestre @thelabandfield I’m not saying don’t be open to new things, but these things are a tool rather than a religion (1/2)

— Franciska de Vries (@frantecol) August 9, 2013

@ibartomeus @ftmaestre @thelabandfield And people who don’t use them are not 1) stupid 2) conservative 3) against open science (2/2)

— Franciska de Vries (@frantecol) August 9, 2013

Second, there are non-monetary costs of using Word.  The format is proprietary, meaning you need a Word license to access the file (or convert it to another format in something like OpenOffice Writer).  For many of us, this isn’t an issue since Word is standard-issue on workplace machines.  Switching between operating systems (Mac to Windows, anything to Linux) can be fraught with formatting errors though.  Plain text formats (like Markdown and LaTeX) don’t suffer the same fate since the formatting is part of the actual document, which itself is plain text and readable in freely-available programs on any platform.  A first-order heading in Markdown, for example, looks like this:

This is the title of an article
===============================

Markdown and LaTeX (and their ilk) are fantastic if you need to embed equations, code, etc, which is painfully awful to do in Word.  This isn’t something I normally have to deal with (though when I do, I use LaTeX).  Most of the work in conservation biology / ecology / ornithology that I’ve done has used small datasets (<2000 data points), at most 1 or 2 simple equations in the manuscript, and relatively simple stats (general/generalized linear models).  The places where I’ve worked have had site licenses for Word and Endnote, so I’ve always had access to them.

The tools I’ve used have worked for the problems at hand.  But they may not for you.  You might be at a non-profit that can’t afford multiple licenses for MS Office or Endnote.  You might be doing lots of modeling and require multi-line equations in your manuscripts.

There is also a philosophical argument to be made, however.  When I move on to my next job, will I have access to the same tools? Will my previous work be lost to the Proprietary File Format gods?

I’m heading out of town for a week, but when I get back, I’m going to work with Andrew MacDonald and Gavin Simpson, two strong proponents of open science, to see if I can (and want to) make the switch to 100% open software (heck, if I can do it, I’m certain just about anyone can!).

Here are the features of my current workflow I’d require, in somewhat decreasing order of importance to me:

  • Integration with my reference database (currently ~3300 entries), ideally including something similar to Endnote’s “Cite While You Write” feature, and the ability to easily format references to a journal’s particular style.
  • Ability for coauthors to comment on and edit manuscript drafts easily and concurrently, and to track, accept/reject these changes.
  • Work across platforms (Windows & Mac), and not require an internet connection (rain days in the field are some of my more productive writing days, after all!).

Data from spreadsheets (the most common format of data in my life) can be stored in *.csv files, and analysed in R (if you use R, and haven’t tried the RStudio interface yet, give it a go).  If we can work this out, I’ll try it out for the fall semester to make a good comparison with my current system.  I’m already a half-convert (I use R, and have used LaTeX).  It might turn out to be something that works for me (which is the most important point), and I’ll keep everyone updated.

Why have I (and many others) been resistant to this change? Because it means admitting we’ve failed at something.  Confronting it is like going to the dentist – sure, you brush your teeth, perhaps floss occasionally, and for most of the year, that’s fine. But when you sit in that chair, and are told you need a filling, you feel like crap, and are embarrassed.  Dealing with “Open Science” questions of file format, and how repeatable analyses are is awkward for most of us because if we look critically at our own work, it’s often as closed as a nun’s habit on Sunday. Sure, we could (sometimes) repeat analyses, but can anyone else if we send them the data? But openness of data and repeatability in ecology is another post for another day.

Have I completely changed positions? No. Use what works for you. If you use Word to write your manuscript, keep data in Excel files, and run all your stats in SPSS, I couldn’t care less. If you use Markdown/Pandoc to submit your paper that was analysed in R, I care equally little.  Above all else, I care about good science. Can you do good science in Excel, Word, Endnote and SPSS? Sure. Can you do bad science using Markdown, LaTeX, Pandoc, and R? Absolutely.  But the reverse of both is equally true.  As I wrote in my very first post, one shouldn’t let the study site or focal species drive the research questions.  Pick the best system to answer your question.  Similarly, pick the best software to do what you need to do.  And if that’s Word, Markdown, or an XF stub nib for your fountain pen, that’s OK.

Beware the academic hipster (or, use what works for you) UPDATED

08 Thursday Aug 2013

Posted by Alex Bond in opinion

≈ 34 Comments

Tags

advice, Github, Markdown, R, software, word processors, writing

UPDATE: Be sure to read the comments below, and my response

As a newly-minted PhD student, I was talking with a friend about writing papers.  “Use LaTeX”, he said.  I thought he meant the rubbery material commonly found in lab gloves.  But apparently not.  LaTeX (pronounced “lay-tech”) is typesetting software that he used for writing papers.

Eager to be on the cutting edge of scholarship, I spent a few days learning how LaTeX worked, how to insert symbols, figures, and tables.  I even produced my thesis proposal with it.  But my supervisor used Word exclusively, and I had no compelling reason to use LaTeX over Word, so I switched back.

Fast-forward a few years.  Now, everyone should be using Markdown in a plain text editor, doing statistics in R, uploading versions to GitHub or figshare, and managing citations with JabRef, BibTeX or Mendeley.  Apparently, Word, Excel, Endnote, and SPSS are things of the past.  Special sessions at the 2013 Ecological Society of America meeting seem to be the nail in the proverbial coffin.  Some are even calling these new tools essential pieces of software for students.

There is a movement afoot to move the process of writing science out of Microsoft Word, and into other “better” formats like LaTeX or Markdown, with the argument that “researchers shouldn’t waste time on formatting, just the text of what they’re writing”.  They can then keep version control using something like GitHub, and invite collaborators to do the same.  This also keeps science open, since scientists aren’t beholden to a proprietary file format.

But in my mind, there are two arguments: the practical (A is tangibly better than B), and the philosophical (A is better than B because of ethical, moral, or philosophical reasons).  These are both important discussions to have, but in this post, I’m going to focus on the first.

Learning Curve

I’ve used Word for my typing needs since about 1997 (prior to which, I used ClarisWorks and WordPerfect, two functionally similar programs).  I know how to easily insert commonly used non-Roman letter symbols (like β), and most of my work (>95%) doesn’t extend beyond simple mathematical symbols or diacritical marks (like ±, Σ, or é).  I use minimal formatting in Word (bold, italics, line numbers, maybe changing the font size of the title), and after almost 20 years, I’ve gotten pretty good at Ctrl-B (or, in the last 10 years, command-B).

Coauthor inertia

The vast majority of my work is collaborative to some degree.  Whether it’s a supervisor or boss, or a larger group of other researchers, someone’s going to read, comment on, revise, and critique any paper I write before it goes to the journal.  Word is ubiquitous, while these other methods are not.  And like me, my coauthors are most familiar with Word, and use its Track Changes feature to make suggestions, comment on text, and insert their own edits.

Reference integration

This is really the deal-breaker for me.  Since 2005, I’ve used Endnote to manage my reference papers, and I use the “Cite While You Write” feature in every paper.  Basically, this means I can write something like “Birds have feathers, and can fly (Gill 2007)”, and Endnote will drop the full citation (in the specified format) in the Literature Cited section.  How cool is that?  It also makes reformatting for different journals relatively easy.  Yes, there are other types of programs that can do that for you (e.g., BibTeX), but there’s a learning curve, and many hours updating citation keys so that there aren’t 4 “Jones2007”s.
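To be fair, the citation key chore is the sort of thing a short script can take off your hands. The sketch below, in Python, is only an illustration of the idea: the function name `disambiguate_keys` and the a/b/c suffix scheme are my own assumptions, not a feature of BibTeX or JabRef, and it works on a bare list of keys rather than a real reference database.

```python
from collections import Counter


def disambiguate_keys(keys):
    """Append a, b, c... to citation keys that occur more than once.

    Assumes fewer than 27 duplicates of any one key, and that no
    existing key already ends in the suffix being added.
    """
    totals = Counter(keys)   # how often each key appears overall
    seen = Counter()         # how many copies we have renamed so far
    result = []
    for key in keys:
        if totals[key] == 1:
            result.append(key)  # unique keys are left untouched
        else:
            suffix = chr(ord("a") + seen[key])
            seen[key] += 1
            result.append(key + suffix)
    return result


if __name__ == "__main__":
    print(disambiguate_keys(["Jones2007", "Gill2007", "Jones2007"]))
```

Unique keys such as "Gill2007" are left alone, so only the clashing entries need to be updated in the manuscript.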

Cost & Access

Word (and to a lesser extent, Endnote) are readily available at most research organizations, or are relatively cheaply obtained (let’s say a maximum of $200).  If you want to keep your projects private, GitHub will run you $7/month (or about $170 over 2 years), while the rest are free.  Word and Endnote are perpetual licenses.  True, universities and research organizations pay for these, but it’s unlikely that will change since the programs are used by non-academic staff, too.

Academic hipsters

The following was just tweeted from the 2013 ESA conference

Do it MT @ucfagls @recology_ : “throw away MS Word and pick up Markdown” – great advice in the reproducible research workshop #ESA2013

— Andrew MacDonald (@polesasunder) August 5, 2013

@thelabandfield But some of us want to have reproducible research so embed R or Python for the analysis in paper @polesasunder @recology_

— Gavin Simpson (@ucfagls) August 5, 2013

The implication, whether intended or not, is that those of us still using Word aren’t doing reproducible research.

Now before folks get their open sources all in a knot, I’m not just being a Luddite.  I use R regularly.  I’ve also used LaTeX for one manuscript.  I’m not advocating against using any of these tools if they’re the right tools for the job.  What I’m saying is don’t use them for the sake of using them–a form of what I could call academic hipsterism.

Feel like I should write an R package. I don’t have anything that needs doing, it just feels like it’s what all the cool kids are doing now.

— Steven Hamblin (@BehavEcology) August 5, 2013

Case in point.

My experience with other early-career researchers, collaborators, supervisors, and grad students is that 99% of them will keep their data in Excel, write the manuscript in Word, and some will integrate references using Endnote (important point: the same applies to non-Microsoft products like Apple’s Pages and Numbers, OpenOffice etc.).

And for a good chunk of the statistical analyses I do, or that are in papers I read, review, and co-author, it doesn’t matter if they were done in R, or SPSS, or SAS, or Minitab, or JMP, or many other common statistical programs.

Are there issues with all these pieces of software? Yes. Are there issues with any piece of software? Yes.  Has a manuscript in ecology/zoology been rejected because the authors used a particular program to compose their text? I don’t think so.

Jeremy Fox at Dynamic Ecology wrote about how he keeps on top of the literature.  His point was that his system works for him, and yes, there are other systems out there.  The interface that I set up on my computer between Word and Endnote when I started my MSc aeons ago still works for me.  It also works with my coauthors, all of whom use Word as a primary text editor for manuscripts, and it works for journals, all of which accept submissions in Word format, or the easily-generated PDF.

Are tools like Markdown, LaTeX, and GitHub useful? To some, they are.  But they’re not yet useful to me. If they look useful to you, check them out – they just may be. But don’t feel beholden to adopt the latest software trend.

30 years ago, John Wiens wrote in The Auk on the perils of word processors:

John Wiens on the perils of using word processors in an editorial in The Auk, 1983

Has word processing improved how science is disseminated? Of course.  Perhaps we could say the same for the current crop of new tools in manuscript writing and statistics. But not for me, at least not yet.

I’m not saying these new pieces of software are terrible and useless.  I’m saying that I’m not inclined to use them because I don’t see how they are materially better than my current system.  Sometimes, it seems like the argument from the non-Word proponents is that “our way is better than yours in every case” (see the quote tweets above), which isn’t the case.

For what it’s worth, I’m going to have a lengthy Skype chat with Andrew MacDonald later this month about the advantages of Markdown, and integrating it with BibTeX.  I might even try it.  I’ll let you all know how it goes.

— — —

As a quick note, I’m off to the Society of Canadian Ornithologists meeting in Winnipeg, and won’t be as quick to approve new commenters, or respond to comments. Thanks for your patience. -AB
