Using consistent R and LaTeX fonts in Org (or knitr, or Sweave)

I love good typography, even more so as Microsoft Word and PowerPoint have debased our standards. When I see a really fine piece of technical typesetting, it’s almost always done using TeX and friends. Beautiful LaTeX documents are easy to recognize. Beautiful R graphics are also easy to recognize. When literate programming systems like Sweave, Org mode, or knitr weave R graphics and LaTeX typesetting together, the beauty of both LaTeX and R is obvious, but documents can still look all wrong because of font clash.

Documents typeset purely in LaTeX can have a visual consistency that is hard to match. Take Kevin Lynagh’s beautifully typeset undergraduate thesis. Kevin obviously cares about typography, so much that he ended up making many of his plots using the LaTeX pgfplots package, based on the equally incredible PGF/TikZ. These are terrific packages, and ones that I use myself. But these are not replacements for R. To get R graphics output into LaTeX without any font clash, I need to either use something like the tikzDevice package, which has been dropped from CRAN and seems to have stalled, or generate PDFs and PNGs that complement my font choices in LaTeX. [edit: Kevin’s thesis source is available here.]

As wonderful as R is for plotting, changing the fonts in plots can be a bit cryptic. The base graphics package has methods, but lattice and ggplot2 are built on top of the grid package, which is another beast entirely. The extrafont package described by Winston Chang is a terrific option for individual plots, but at least for me it wasn’t clear how to change the fonts for an entire literate document in a single line.

An alternative is the Cairo package, which provides the ability to change fonts in any supported device. Cairo also provides its own drop-in replacements for the standard commands png(), pdf(), etc., which can be swapped in for a literate programming session. I’d be interested to know what limitations others have found in these replacements.

Most of the time I use the LaTeX mathdesign package with Charter BT fonts. But I’m fickle, and sometimes use urw-garamond. When preparing Beamer presentations at NUS I tend to use Verdana, because that’s the university’s standard. Since I’m almost always using Org, with R blocks evaluated in the Babel literate programming framework, I want a solution in which all the graphics generated by R will match the LaTeX main text font as closely as possible.  When I move an R code block from a beamer presentation to a manuscript draft, I don’t want to have to do anything special.  It should just work.

The solution using Cairo appears to be pretty simple. At the beginning of an Org mode document in which the LaTeX will be typeset in Garamond, I can put the following:

#+begin_src R :exports none :results silent :session
  library(Cairo)
  mainfont <- "Garamond"
  CairoFonts(regular = paste(mainfont,"style=Regular",sep=":"),
             bold = paste(mainfont,"style=Bold",sep=":"),
             italic = paste(mainfont,"style=Italic",sep=":"),
             bolditalic = paste(mainfont,"style=Bold Italic,BoldItalic",sep=":"))
  # Mask the standard devices with their Cairo equivalents for this session
  pdf <- CairoPDF
  png <- CairoPNG
#+end_src

With that in place, my fonts in exported PDF or PNG graphics from R will all use Garamond, largely in keeping with the LaTeX font. Strictly speaking, the urw-garamond in the LaTeX mathdesign package is not the same as the system font on Mac OS X that R will be using, but it’s pretty close. Note that this has to be done in each R session if an Org mode file is running multiple sessions.
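Since the setup has to be repeated per session, it can be wrapped in a small helper so that switching the font for a whole document really is a one-line change. This is only a sketch: the function names `cairo_font_patterns` and `use_plot_font` are made up for illustration, and the `requireNamespace()` guard is my assumption so the block still runs where Cairo is not installed.

```r
# Build the four fontconfig patterns that CairoFonts() takes for one family.
cairo_font_patterns <- function(family) {
  list(regular    = paste(family, "style=Regular", sep = ":"),
       bold       = paste(family, "style=Bold", sep = ":"),
       italic     = paste(family, "style=Italic", sep = ":"),
       bolditalic = paste(family, "style=Bold Italic,BoldItalic", sep = ":"))
}

# Hypothetical helper: apply the patterns when the Cairo package is available.
use_plot_font <- function(family) {
  patterns <- cairo_font_patterns(family)
  if (requireNamespace("Cairo", quietly = TRUE)) {
    do.call(Cairo::CairoFonts, patterns)
  }
  invisible(patterns)
}

# One line per document (or per session): swap "Garamond" for "Verdana", etc.
use_plot_font("Garamond")
```

The `pdf <- CairoPDF` and `png <- CairoPNG` assignments still need to be made in each session alongside this call.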

So for example, a code block like

#+begin_src R :exports results :results graphics :session :file histogram.png
  x <- rnorm(100)
  hist(x, main="This is a histogram using Garamond")
#+end_src

will result in a histogram like

The Cairo package is well documented, though I have to admit I found the other ways of handling graphics fonts confusing. Within the limits of the four font slots available in Cairo (regular, bold, italic, and bold italic), one can mix and match system fonts. If your locale supports UTF-8, you can do some crazy things. For example, you can redefine the italic slot to use a font that supports Chinese characters, and create something completely nonsensical, such as a ggplot histogram with text in a mix of xkcd and Chinese, e.g.

#+begin_src R :exports results :results graphics output :session :file chinese.png
  library(ggplot2)
  mainfont <- "xkcd"
  CairoFonts(regular = paste(mainfont,"style=Regular",sep=":"),
             bold = paste(mainfont,"style=Bold",sep=":"),
             # the italic slot is pointed at a font with Chinese glyphs
             italic = paste("SimSun","style=Regular",sep=":"),
             bolditalic = paste(mainfont,"style=Bold Italic,BoldItalic",sep=":"))
  pdf <- CairoPDF
  png <- CairoPNG
  qplot(x) + theme_bw() + ggtitle(expression(paste(italic("这是一个用"),"xkcd",italic("的直方图"))))
#+end_src

Alas, the XKCD font is not Unicode, so there are no Chinese xkcd characters.

In this case, what works for Org mode should work equally well for Sweave and knitr.
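For knitr, a sketch of the equivalent setup chunk might look like the following. It relies on knitr's `dev` chunk option, which accepts Cairo device names such as "CairoPNG" and "CairoPDF". That the `CairoFonts()` settings carry over to the devices knitr opens is my assumption from the two running in the same R session, not something tested here.

```r
library(knitr)
library(Cairo)

# Same font setup as in the Org example, for a document set in Garamond
mainfont <- "Garamond"
CairoFonts(regular = paste(mainfont, "style=Regular", sep = ":"),
           bold = paste(mainfont, "style=Bold", sep = ":"),
           italic = paste(mainfont, "style=Italic", sep = ":"),
           bolditalic = paste(mainfont, "style=Bold Italic,BoldItalic", sep = ":"))

# Ask knitr to draw every chunk's figures with the Cairo PNG device
opts_chunk$set(dev = "CairoPNG")
```

For PDF output from LaTeX, `dev = "CairoPDF"` would be the corresponding choice.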

LSM2241 Lecture 2: A temporarily Org-mode-less lecture

My feedback from students after the first week of LSM2241 (Introductory Bioinformatics) was generally quite positive. Many commented on the well organized slides (where Org mode is my secret weapon) and provided useful suggestions. So when I sat down to make my new slides, I was pretty excited, but as it turns out I may have a tough time adhering to some of their input.

The second week of LSM2241 is devoted to bioinformatics databases. There’s a high level of variability in student background: some students have not had any introduction to databases in general, so I have to cover that too. Compounding matters, this is also one of the lectures I didn’t give last term. I of course looked through slides from previous years for ideas, and also found presentations from other universities. Some clear themes emerged.

  • In previous years, lecturers had covered a lot of material in a short period
  • Screen shots, screen shots, screen shots. Wow, were there a lot of screen shots. This was true not just in previous years of LSM2241, but in other courses covering the topic.
  • It seemed really easy to inadvertently include a number of out-of-date or inaccessible databases
  • It seemed helpful to use a few related queries to drive most of the examples across different databases.

While I wanted to start my slides from scratch, I got behind in my preparations, and eventually realized that my best bet was to use the previous years’ material as a starting point and (uggh) use PowerPoint. My revisions are pretty extensive, but the basic structure is quite similar, which leaves a huge number of slides, about three times what I’d like to target! Ironically, part of the feedback from lecture 1 was to speak more slowly. This may pose a challenge….

Next time: update R graphs and deal with screenshots

I did make a few plots for the lecture using R, and have started putting data into Google Docs for charts I’ll have to update in future years. This should make for clean Org babel R source code blocks, like

library(RCurl)    # getURL()
library(ggplot2)

dblink <- ""
myCsv <- getURL(dblink)
dbs <- read.csv(textConnection(myCsv))
names(dbs) <- c("Year", "Databases")
dbs$Year <- as.factor(dbs$Year)
# theme() and element_text() replace the older opts() and theme_text()
ggplot(dbs, aes(x = Year, y = Databases)) +
  geom_col(fill = "gray") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1.2, size = 16),
        axis.text.y = element_text(size = 16),
        legend.position = "none") +
  ylab("Databases in NAR Database issue")

For screen shots, I’m looking at webkit2png. When it comes time to revise this lecture and put it into Org, I should be able to generate all the screen shots from Org babel code blocks. Why go through the trouble? For one thing, if I decide to change the example, I can regenerate the slide deck appropriately without manually cutting screen shots. If database interfaces change, I can see the results in the screen shots and revise just those sections. I only wish I had thought of it in time to make the slides that way this term.

Using Org mode for course development and presentations

For the last two semesters I’ve been teaching part of LSM2241, Introductory Bioinformatics at NUS. This is the first serious exposure to Bioinformatics for students in the Life Sciences at NUS, so it’s a great opportunity to help ~160 students appreciate the increasingly central role bioinformatics has in the practice of biology.

This coming term I’m giving all the lectures. I don’t consider myself a great lecturer – on the contrary, I consider this an opportunity to practice – but I think second year undergraduates don’t benefit much from team-taught lecture courses, so one lecturer of my quality is better than four lecturers of varying quality.

The workhorse of my course planning and preparation is Emacs Org mode. I use it for planning my own work, for making presentations and handouts (via Beamer and LaTeX export), for preparing exams, and for tracking my own development as a teacher.

Last term I found some huge benefits of Beamer over PowerPoint or Keynote. For example, when we were discussing dynamic programming algorithms for sequence alignment, it was pretty straightforward to write an alignment program that emitted TikZ diagrams animating the steps of filling in an alignment matrix. I ended up using this twice in the lecture: first as a worked example in the slide copies the students received, and second during the lecture itself, so we could walk through the whole classroom and perform an alignment via student participation.

The animation would have been unthinkable in PowerPoint, since it added the equivalent of ~80 slides to the deck. With Beamer I could (i) make the animation, (ii) distribute a slide deck with the filled matrix from the animation, thus satisfying the demands of today’s students to have slides ahead of time, while still minimizing excessive printing, and (iii) generate a second animation from an unseen alignment problem for the class to work through together during the lecture. Doing it this year will be as simple as changing the input to the program and regenerating the slides.
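To give a flavor of what such a program involves, here is a minimal sketch in R. It is not the program used for the lecture: the language, the function names `nw_matrix` and `nw_tikz`, the scoring values, and the one-cell-per-overlay layout are all assumptions made for illustration. It fills a Needleman-Wunsch global alignment matrix and emits one TikZ node per cell, each wrapped in a Beamer `\only` overlay so that the matrix fills in step by step.

```r
# Fill the global alignment (Needleman-Wunsch) score matrix.
# The scoring values are illustrative defaults, not those from the lecture.
nw_matrix <- function(a, b, match = 1, mismatch = -1, gap = -1) {
  a <- strsplit(a, "")[[1]]
  b <- strsplit(b, "")[[1]]
  m <- matrix(0, nrow = length(a) + 1, ncol = length(b) + 1,
              dimnames = list(c("-", a), c("-", b)))
  m[, 1] <- gap * (0:length(a))   # first column: leading gaps
  m[1, ] <- gap * (0:length(b))   # first row: leading gaps
  for (i in seq_along(a)) {
    for (j in seq_along(b)) {
      s <- if (a[i] == b[j]) match else mismatch
      m[i + 1, j + 1] <- max(m[i, j] + s,         # diagonal: align a[i] with b[j]
                             m[i, j + 1] + gap,   # from above: a[i] against a gap
                             m[i + 1, j] + gap)   # from the left: b[j] against a gap
    }
  }
  m
}

# Emit a tikzpicture in which cell k appears from Beamer overlay k onwards.
nw_tikz <- function(m) {
  cells <- character(0)
  step <- 0L
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) {
      step <- step + 1L
      cells <- c(cells, sprintf("\\only<%d->{\\node at (%d,%d) {%d};}",
                                step, j, -i, as.integer(m[i, j])))
    }
  }
  c("\\begin{tikzpicture}", cells, "\\end{tikzpicture}")
}

# Print the LaTeX for a small example
cat(nw_tikz(nw_matrix("GATTACA", "GCATGCU")), sep = "\n")
```

Changing the two input strings and re-running regenerates the whole animation, which is the point: the slides become a function of the alignment problem.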

This is all done via Org-babel, the miraculous multi-lingual literate programming environment supported by Org mode.

But before getting into that level of detail, I’ll mention one tweak I use in Org-beamer export. When exporting to LaTeX or HTML, Org mode knows to do the right thing for figures and tables. So, for example,

#+CAPTION: This is a caption

will create a figure with a caption when exported to any supported format, including LaTeX and HTML.

However, the LaTeX export uses \caption{}, which automatically adds a Figure 1 label to the caption of the first figure in the document. Likewise for tables. In normal LaTeX documents, that’s the right thing to do, but for Beamer, numbered figures and tables aren’t needed.

But I still want captions! To fix this, I make sure to include the caption package in the header with

#+LATEX_HEADER: \usepackage[justification=centering]{caption}

Then I add a hook to convert all my \caption to \caption* as the last step in my Org Beamer exporter.

(defun latex-buffer-caption-to-caption* ()
  "Rewrite \\caption as \\caption* throughout the buffer on Beamer export."
  (when org-beamer-export-is-beamer-p
    (replace-regexp "\\(\\\\caption\\)\\([[{]\\)" "\\1*\\2" nil
                    (point-min) (point-max))))

(add-hook 'org-export-latex-final-hook
          'latex-buffer-caption-to-caption* 'append)

The org-export-latex-final-hook captures all the hooks that run right before saving the generated LaTeX buffer, and org-beamer-export-is-beamer-p restricts the behavior to Beamer export.

Making interactive slides with Org mode and googleVis in R

There’s been a lot of justifiable excitement in the R community about Yihui Xie’s great work, and most recently the incorporation of his knitr package into the RStudio software. Knitr is seen, justifiably, as a worthy successor to Sweave for dynamic, beautiful report generation. It is all that, but as an Org mode user, I already have something better than Sweave for both reproducible research and literate programming, which works with more than 30 different computer languages, not just R. This is not to mention the astonishing amount of functionality that Org mode provides for any number of problems. I mean, really: it’s Emacs! (There are probably some great use cases for using knitr together with Org mode, but I haven’t come across any myself.)

But then Markus Gesmann wrote an interesting blog post about using knitr and the googleVis package to produce interactive HTML presentations by converting the knitr-produced markdown to Slidy, and I wanted to do the same in Org mode. Markus gamely provided the Rmd source for his own slide show in a GitHub gist, so with his permission I borrowed some of the same visualizations (not the whole thing, which would be shameless) in an Org mode demo.

Org mode can easily export to HTML, and there are several documented options for creating slide shows using HTML export or a variant of it. My favorite is relatively new, an outstanding ClojureScript (compiled to JavaScript) org-html-slideshow setup, which supports separate projector, notes, and presenter preview views. Unfortunately, while that works great for ordinary slideshows, I haven’t been able to get that to work with the googleVis package output.

So instead I’m using org-slidy, which exports to Slidy, the same format Markus used.

It’s easy if you already have Emacs, and pretty straightforward even if you don’t.

  1. Download org-slidy
  2. Put some files in your source directory (the .js, .css, and .org files), and make sure Emacs can find org-htmlslidy.el
  3. M-x load-library org-htmlslidy
  4. Put the following in your org file:
    #+BIND: org-export-html-preamble nil
    #+SETUPFILE: ~/Dropbox/_support/org/
  5. Export to HTML and open in your browser with C-c C-e b

Any R source code blocks can be written as usual. The googleVis package creates HTML code for embedding into web pages, so the way to specify this is with #+BEGIN_SRC R :results output html, which will capture the output of print() statements on R objects created by googleVis.
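As a minimal sketch, the body of such a block might look like the following, shown here as plain R. In the Org file it would sit under the #+BEGIN_SRC R :results output html header. The `sales` data frame is made up purely for illustration and is not taken from the example slideshow.

```r
library(googleVis)

# Made-up illustrative data, not from the original slideshow
sales <- data.frame(Year = c("2010", "2011", "2012"),
                    Apples = c(10, 12, 15))

# gvisColumnChart() returns a gvis object holding the chart's HTML and JavaScript
chart <- gvisColumnChart(sales, xvar = "Year", yvar = "Apples")

# tag = "chart" prints only the embeddable chart code rather than a full HTML page
print(chart, tag = "chart")
```

Printing with tag = "chart" matters for slides, since a complete HTML page nested inside the exported page would not be valid.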

An example slideshow using R source code blocks and googleVis is here (be sure to set your browser to full screen mode):


And you can get the actual Org mode file in a gist on GitHub.