Using consistent R and LaTeX fonts in Org (or knitr, or Sweave)

I love good typography, even more so as Microsoft Word and PowerPoint have debased our standards. When I see a really fine piece of technical typesetting, it’s almost always done using TeX and friends. Beautiful LaTeX documents are easy to recognize. Beautiful R graphics are also easy to recognize. When literate programming systems like Sweave, Org mode, or knitr weave R graphics and LaTeX typesetting together, the beauty of both LaTeX and R is obvious, but documents can still look all wrong because of font clash.

Documents typeset purely in LaTeX can have a visual consistency that is hard to match. Take Kevin Lynagh’s beautifully typeset undergraduate thesis. Kevin obviously cares about typography, so much that he ended up making many of his plots using the LaTeX pgfplots package, based on the equally incredible PGF/TikZ. These are terrific packages, and ones that I use myself. But these are not replacements for R. To get R graphics output into LaTeX without any font clash, I need to either use something like the tikzDevice package, which has been dropped from CRAN and seems to have stalled, or generate PDFs and PNGs that complement my font choices in LaTeX. [edit: Kevin's thesis source is available here.]

As wonderful as R is for plotting, changing the fonts in plots can be a bit cryptic. The base graphics package has methods, but lattice and ggplot2 are built on top of the grid package, which is another beast entirely. The extrafont package described by Winston Chang is a terrific option for individual plots, but at least for me it wasn’t clear how to change an entire literate document in a single line.

An alternative is the Cairo package, which lets you change fonts in any of its supported devices. Cairo also provides drop-in replacements for the standard png(), pdf(), etc. commands, which can be swapped in for a literate programming session. I’d be interested to know what limitations others have found in these replacements.

Most of the time I use the LaTeX mathdesign package with Charter BT fonts. But I’m fickle, and sometimes use urw-garamond. When preparing Beamer presentations at NUS I tend to use Verdana, because that’s the university’s standard. Since I’m almost always using Org, with R blocks evaluated in the Babel literate programming framework, I want a solution in which all the graphics generated by R will match the LaTeX main text font as closely as possible.  When I move an R code block from a beamer presentation to a manuscript draft, I don’t want to have to do anything special.  It should just work.

The solution using Cairo appears to be pretty simple. At the beginning of an Org mode document in which the LaTeX will be typeset in Garamond, I can put the following:

#+begin_src R :exports none :results silent :session
  library(Cairo)
  mainfont <- "Garamond"
  CairoFonts(regular = paste(mainfont,"style=Regular",sep=":"),
             bold = paste(mainfont,"style=Bold",sep=":"),
             italic = paste(mainfont,"style=Italic",sep=":"),
             bolditalic = paste(mainfont,"style=Bold Italic,BoldItalic",sep=":"))
  pdf <- CairoPDF
  png <- CairoPNG
#+end_src

With that in place, my fonts in exported PDF or PNG graphics from R will all use Garamond, largely in keeping with the LaTeX font. Strictly speaking, the urw-garamond in the LaTeX mathdesign package is not the same as the system font on MacOSX that R will be using, but it’s pretty close. Note this has to be done in each R session if an Org-mode file is running multiple sessions.
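Since the same four patterns get rebuilt for every font, the setup can be wrapped in a small helper. This is only a sketch: `cairo_font_patterns()` and `use_cairo_family()` are hypothetical names, not part of the Cairo package, and the style strings are simply the ones used in the block above. Whether a given family actually has all four styles depends on the fonts installed on your system.

```r
# Build the fontconfig pattern strings that CairoFonts() expects,
# e.g. "Garamond:style=Bold". Hypothetical helper, not part of Cairo.
cairo_font_patterns <- function(family) {
  styles <- c(regular    = "Regular",
              bold       = "Bold",
              italic     = "Italic",
              bolditalic = "Bold Italic,BoldItalic")
  patterns <- paste(family, paste0("style=", styles), sep = ":")
  names(patterns) <- names(styles)
  as.list(patterns)
}

# Point the Cairo devices in this R session at one font family.
# Hypothetical helper; it only calls CairoFonts() if Cairo is installed.
use_cairo_family <- function(family) {
  patterns <- cairo_font_patterns(family)
  if (requireNamespace("Cairo", quietly = TRUE)) {
    do.call(Cairo::CairoFonts, patterns)
  } else {
    warning("Cairo is not installed; fonts left unchanged")
  }
  invisible(patterns)
}

# Usage, once per session:
# use_cairo_family("Garamond")
```

The `pdf <- CairoPDF` and `png <- CairoPNG` lines are still needed after the call, since the helper only sets the fonts.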

So for example, a code block like

#+begin_src R :exports results :results graphics :session :file histogram.png 
x <- rnorm(100)
hist(x,main="This is a histogram using Garamond")
#+end_src

will result in a histogram like

http://tucker-kellogg.com/blog/wp-content/uploads/2012/10/wpid-histogram.png

The Cairo package is well documented, though I have to admit I found the other ways of handling graphics fonts confusing. But within the limits of the four choices available in Cairo, one can mix and match system fonts. If your locale supports UTF-8, you can do some crazy things. For example, you can redefine the italic family to a font that supports Chinese characters, and create something completely nonsensical, such as a ggplot histogram with text in a mix of xkcd and Chinese, e.g.

#+begin_src R :exports results  :results graphics output :session :file chinese.png
  Sys.setlocale("LC_CTYPE","en_US.UTF-8")
  library(Cairo)
  library(ggplot2)
  mainfont <- "xkcd"
  # the italic slot is borrowed for a font that has Chinese glyphs
  CairoFonts(regular = paste(mainfont,"style=Regular",sep=":"),
             bold = paste(mainfont,"style=Bold",sep=":"),
             italic = paste("SimSun","style=Regular",sep=":"),
             bolditalic = paste(mainfont,"style=Bold Italic,BoldItalic",sep=":"))
  pdf <- CairoPDF
  png <- CairoPNG
  # x is still defined from the histogram block above (same session)
  qplot(x) + theme_bw() + ggtitle(expression(paste(italic("这是一个用"),"xkcd",italic("的直方图"))))
#+end_src

http://tucker-kellogg.com/blog/wp-content/uploads/2012/10/wpid-chinese.png

Alas, the XKCD font is not Unicode, so there are no Chinese xkcd characters.

In this case, what works for Org mode should work equally well for Sweave and knitr.
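For knitr, a minimal sketch of the equivalent setup might look like the following, placed in the first chunk of the document. It assumes knitr's built-in "CairoPDF" and "CairoPNG" values for the `dev` chunk option; `cairo_chunk_options()` is a hypothetical helper, and I have not checked how the CairoFonts() settings interact with every knitr output format.

```r
# Pick the knitr graphics device that matches the output format.
# Hypothetical helper; "CairoPDF" and "CairoPNG" are knitr's names
# for the Cairo package devices.
cairo_chunk_options <- function(format = c("pdf", "png")) {
  format <- match.arg(format)
  list(dev = if (format == "pdf") "CairoPDF" else "CairoPNG")
}

# Only touch the session if both packages are installed.
if (requireNamespace("knitr", quietly = TRUE) &&
    requireNamespace("Cairo", quietly = TRUE)) {
  mainfont <- "Garamond"
  Cairo::CairoFonts(regular = paste(mainfont, "style=Regular", sep = ":"),
                    bold = paste(mainfont, "style=Bold", sep = ":"),
                    italic = paste(mainfont, "style=Italic", sep = ":"),
                    bolditalic = paste(mainfont, "style=Bold Italic,BoldItalic", sep = ":"))
  # Make every later chunk draw with the Cairo device
  do.call(knitr::opts_chunk$set, cairo_chunk_options("pdf"))
}
```

Setting the device through the chunk option replaces the `pdf <- CairoPDF` trick, because knitr opens the graphics device itself rather than calling pdf() or png() from user code.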

Thoughts after LSM2241 Lecture 2

I just gave the lecture this morning, and haven’t yet reviewed student feedback, but I’m expecting it to be worse than the first week. Using other people’s slides is almost always a mistake, especially when the slides are so different, stylistically, from my own. In the end, I wound up with 120 slides for 90 minutes of lecture time, which is just not reasonable for how I lecture.

Further, the content was a bit unbalanced for the course. I should have noticed this when I was updating the slides. In keeping with the previous term, the PDB was used as an example database; that would be OK except that we’ll spend some of the last portion of the course discussing structure, so why spend time on the PDB now? Using the PDB as an example now precluded discussion of other equally relevant databases.

I also kept some discussion of database APIs, all of which could have been dropped since the students won’t really have an opportunity to use them in this course.

The outcome was pretty predictable: I rushed through the end of the lecture, and skipped sections that should never have been in the slide deck in the first place. Overall, it was not the lecture I wanted to give.

LSM2241 Lecture 2: A temporarily Org-mode-less lecture

My feedback from students after the first week of LSM2241 (Introductory Bioinformatics) was generally quite positive. Many commented on the well organized slides (where Org mode is my secret weapon) and provided useful suggestions. So when I sat down to make my new slides, I was pretty excited, but as it turns out I may have a tough time adhering to some of their input.

The second week of LSM2241 is devoted to bioinformatics databases. There’s a high level of variability in student background: some students have not had any introduction to databases in general, so I have to cover that too. Compounding matters, this is also one of the lectures I didn’t give last term. I of course looked through slides from previous years for ideas, and also found presentations from other universities. Some clear themes emerged.

  • In previous years, lecturers had covered a lot of material in a short period
  • Screen shots, screen shots, screen shots. Wow, were there a lot of screen shots. This was true not just in previous years of LSM2241, but in other courses covering the topic.
  • It seems really easy to inadvertently include a number of out of date or inaccessible databases
  • It seemed helpful to use a few related queries to drive most of the examples across different databases.

While I wanted to start my slides from scratch, I got behind in my preparations, and eventually realized that my best bet was to use the previous years’ material as a starting point and (uggh) use PowerPoint. My revisions are pretty extensive, but the basic structure is quite similar, which leaves a huge number of slides, about three times what I’d like to target! Ironically, part of the feedback from lecture 1 was to speak more slowly. This may pose a challenge….

Next time: update R graphs and deal with screenshots

I did make a few plots for the lecture using R, and have started putting data into Google Docs for charts I’ll have to update in future years. This should make for clean Org babel R source code blocks, like

library(RCurl)    # getURL() fetches the published sheet over HTTPS
library(ggplot2)
dblink <- "https://docs.google.com/spreadsheet/pub?key=0Amd94LRhVxVWdElNYVdHblVLRjZKR1lwaFFFZHVyWUE&single=true&gid=0&output=csv"
myCsv <- getURL(dblink)
dbs <- read.csv(textConnection(myCsv))
names(dbs) <- c("Year","Databases")
dbs[[1]] <- as.factor(dbs[[1]])
# the sheet already holds one count per year, so plot the values as they are
p <- ggplot(dbs, aes(x=Year, y=Databases)) + geom_col(fill="gray")
# theme() and element_text() replace opts() and theme_text() in current ggplot2
p + theme_bw() +
  theme(axis.text.x = element_text(angle=90, hjust=1.2, size=16),
        axis.title.x = element_text(size=16),
        axis.title.y = element_text(size=16, angle=90),
        axis.text.y = element_text(size=16),
        legend.position = "none") +
  ylab("Databases in NAR Database issue")

For screen shots, I’m looking at webkit2png. When it comes time to revise this lecture and put it into Org, I should be able to generate all the screen shots from Org babel code blocks. Why go through the trouble? For one thing, if I decide to change the example, I can regenerate the slide deck appropriately without manually cutting screen shots. If database interfaces change, I can see the results in the screen shots and revise just those sections. I only wish I had thought of it in time to make the slides that way this term.

Using Org mode for course development and presentations

For the last two semesters I’ve been teaching part of LSM2241, Introductory Bioinformatics at NUS. This is the first serious exposure to Bioinformatics for students in the Life Sciences at NUS, so it’s a great opportunity to help ~160 students appreciate the increasingly central role bioinformatics has in the practice of biology.

This coming term I’m giving all the lectures. I don’t consider myself a great lecturer – on the contrary, I consider this an opportunity to practice – but I think second year undergraduates don’t benefit much from team-taught lecture courses, so one lecturer of my quality is better than four lecturers of varying quality.

The workhorse of my course planning and preparation is Emacs Org mode. I use it for planning my own work, for making presentations and handouts (via Beamer and LaTeX export), for preparing exams, and for tracking my own development as a teacher.

Last term I found some huge benefits of Beamer over PowerPoint or Keynote. For example, when we were discussing dynamic programming algorithms for sequence alignment, it was pretty straightforward to write an alignment program that emitted TikZ diagrams animating the steps of filling in an alignment matrix. I ended up using this twice in the lecture: first as a worked example in the slide copies the students received, and second during the lecture itself, so we could walk through the whole classroom and perform an alignment via student participation.

The animation would have been unthinkable in PowerPoint, since it added the equivalent of ~80 slides to the deck. With Beamer I could (i) make the animation, (ii) distribute a slide deck with the filled matrix from the animation, thus satisfying the demands of today’s students to have slides ahead of time, while still minimizing excessive printing, and (iii) generate a second animation from an unseen alignment problem for the class to work through together during the lecture. Doing it this year will be as simple as changing the input to the program and regenerating the slides.

This is all done via Org-babel, the miraculous multi-lingual literate programming environment supported by Org mode.

But before getting into that level of detail, I’ll mention one tweak I use in Org-beamer export. When exporting to LaTeX or HTML, Org mode knows to do the right thing for figures and tables. So, for example,

#+CAPTION: This is a caption
[[file:/media/foo.png]]

will create a figure with a caption when exported to any supported format, including LaTeX and HTML.

However, the LaTeX export uses \caption{}, which automatically adds a Figure 1 label to the caption of the first figure in the document. Likewise for tables. In normal LaTeX documents, that’s the right thing to do, but for Beamer numbered figures and tables aren’t needed.

But I still want captions! To fix this, I make sure to include the caption package in the header with

#+LATEX_HEADER: \usepackage[justification=centering]{caption}

Then I add a hook to convert all my \caption to \caption* as the last step in my Org Beamer exporter.

(defun latex-buffer-caption-to-caption* ()
  "Rewrite \\caption as \\caption* in the exported buffer, for Beamer export only."
  (when org-beamer-export-is-beamer-p
    (replace-regexp "\\(\\\\caption\\)\\([[{]\\)" "\\1*\\2" nil
                    (point-min) (point-max))))

(add-hook 'org-export-latex-final-hook
          'latex-buffer-caption-to-caption* 'append)

The org-export-latex-final-hook captures all the hooks that run right before saving the generated LaTeX buffer, and org-beamer-export-is-beamer-p restricts the behavior to Beamer export.

Computing with chromatin modifications

A few months ago my friend and former Millennium colleague Barb Bryant submitted a manuscript on “Chromatin Computing” for publication. She sent me a preprint, and we started thinking about what we could do together with the ideas she had put forward. Barb and I have since worked together on early versions of these problems, and today we (strictly speaking, Barb) gave a talk at ISMB on some of the results.

Graph with a Hamiltonian Path from 0 to 6

There is one path through this directed graph from node 0 to node 6 that goes through each node exactly once

What Barb did in her paper was very imaginative: she showed, formally, that modifications of chromatin could serve as a universal computing engine following a set of string rewriting rules. Seeing chromatin dynamics in this way is refreshing. One begins to think about what rules underlie chromatin mark changes that actually occur in cells, and how those rules affect biological outcome. In principle, chromatin states could potentially be engineered to solve real problems and thus form a novel type of synthetic biological computer.

There’s a lot more to it, but what we’ve been doing in the last few months is not building biological systems, but extending the ideas using in silico implementation of a chromatin computer. We have a simulator that we can use to understand how rule sets can be used to solve different problems. We’ve solved the original Hamiltonian path problem of Leonard Adleman (recapitulated in Barb’s paper) in a number of different ways. These include several non-deterministic solutions (my student Li Chenhao developed a compact and elegant representation) and a deterministic approach that performs a depth-first search of the graph but requires rules that operate on several regions of chromatin at once (like real complexes that form loops).

Barb’s talk was packed, and we both answered questions. We showed several animations of different solutions to the Hamiltonian path problem using a chromatin computer. The animations of the original and modified stochastic solutions are fun to watch, but the multi-site rule solution comes with a soundtrack if you use Google Chrome.

There are a bunch of ways we’d like to take this work. One clear challenge is to understand better how chromatin dynamics can be represented in a chromatin computer, starting with mining data available on real chromatin modifying complexes. Another is to implement learning systems using chromatin computers, and apply them to a range of problems. A third is to scour the details of chromatin dynamics for biological inspiration to real problems in computation (Ziv Bar-Joseph’s work is a good model for this).

Finally, there are obvious synthetic biology applications, such as building a biological chromatin computer to solve problems on a dish instead of on a chip. That’s a tall order, but something we can explore first by simulation.

Update: slides are posted on SlideShare

Making interactive slides with Org mode and googleVis in R

There’s been a lot of justifiable excitement in the R community about Yihui Xie’s great work, and most recently the incorporation of his knitr package into the RStudio software. Knitr is seen, justifiably, as a worthy successor to Sweave for dynamic, beautiful report generation. It is all that, but as an Org mode user, I already have something better than Sweave for both reproducible research and literate programming, which works with more than 30 different computer languages, not just R. This is not to mention the astonishing amount of functionality that Org mode provides for any number of problems. I mean, really: it’s Emacs! (There are probably some great use cases for using knitr together with Org mode, but I haven’t come across any myself.)

But then Markus Gesmann wrote an interesting blog post about using knitr and the googleVis package to produce interactive HTML presentations by converting the knit-produced markdown to Slidy, and I wanted to do the same in Org mode. Markus gamely provided the Rmd source for his own slide show in a GitHub gist, so with his permission I borrowed some of the same visualizations (not the whole thing, which would be shameless) in an Org mode demo.

Org mode can easily export to HTML, and there are several documented options for creating slide shows using HTML export or a variant of it. My favorite is relatively new, an outstanding ClojureScript (compiled to JavaScript) org-html-slideshow setup, which supports separate projector, notes, and presenter preview views. Unfortunately, while that works great for ordinary slideshows, I haven’t been able to get that to work with the googleVis package output.

So instead I’m using org-slidy, which exports to Slidy, the same format Markus used.

It’s easy if you already have Emacs, and pretty straightforward even if you don’t.

  1. Download org-slidy
  2. Put some files in your source directory (the .js, .css, and .org files), and make sure emacs can find org-htmlslidy.el
  3. M-x load-library org-htmlslidy
  4. Put the following in your org file:
    #+TITLE:
    #+AUTHOR:
    #+BIND: org-export-html-preamble nil
    #+SETUPFILE: ~/Dropbox/_support/org/htmlslidy.org
  5. Export to HTML and open in your browser with C-c C-e b

Any R source code blocks can be written as usual. The googleVis package creates HTML code for embedding into web pages, so the way to specify this is with #+BEGIN_SRC R :results output html, which will capture the output of print() statements on R objects created by googleVis.
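As a rough sketch of the body of such a block: the data frame below is made-up toy data for illustration only, and the choice of gvisColumnChart() is mine rather than taken from the demo slides. The `tag = "chart"` argument to print() asks googleVis for just the chart markup, without a surrounding HTML page, which is what an Org HTML export can embed.

```r
# Toy values, invented purely for illustration
db_counts <- data.frame(Year = c("2010", "2011", "2012"),
                        Databases = c(10, 20, 30))

# Only build the chart if googleVis is installed
if (requireNamespace("googleVis", quietly = TRUE)) {
  chart <- googleVis::gvisColumnChart(db_counts,
                                      xvar = "Year", yvar = "Databases")
  # print() writes the chart's HTML and JavaScript to standard output,
  # which ":results output html" captures into the exported page
  print(chart, tag = "chart")
}
```

Calling plot() on the chart object instead would open a browser window, which is not what you want inside an export.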

An example slideshow using R sourcecode blocks and googleVis is here (be sure to set your browser to full screen mode):

slideshow.jpg

And you can get the actual Org mode file in a gist on GitHub.

Support Sam Husseini, journalist

This is not a political blog, but it’s the space I have, so occasionally there will be rants on issues or people I feel strongly about.  This is one, about my friend Sam.

Sam Husseini and I went to college together back in the 1980s. I tried to teach him to play guitar, he tried to get me to read Chomsky. Sam grew up in New York. When Sam and his father became naturalized US citizens during Sam’s junior year, Osama Farid Husseini briefly became Samuel Frank Hennessy; we bought him a bottle of liquor and a book of Irish pub jokes so he could learn the heritage of his temporarily adopted surname.

After graduation, Sam, who majored in Applied Mathematics (Computer Science), worked at Moody’s, which he disliked, but rather than taking a job offer with JP Morgan he gave up his corporate career for independent journalism. It was a radical career shift but, characteristically for Sam, one made with reflection and thought. For Sam is, as much as anyone I know, a reasonable person. For about a year after that choice Sam stayed with me and some friends in New Haven, where he did some substitute teaching, and traveled back and forth to New York. During this time he was beginning his long work with FAIR, the persistent New York based media watchdog group. Eventually, Sam went to serve as Communications Director for the American-Arab Anti-Discrimination Committee, and then the Institute for Public Accuracy, which tries to provide alternative voices to the echo chamber of well funded think tanks inside the Beltway.

Sam was one of two groomsmen at my wedding, and we’ve remained close despite global movement. We don’t always agree on politics, but that’s at least partly because Sam is fearless. Being a liberal (as I am) is much easier than advocating radical alternatives (as Sam does), especially when your vocation is to speak truth to power. But, once again, Sam is a person of reason and hope, optimistic about individual ideals and imaginative about politics: see his cleverly conceived VotePact for an example.

Just over a week ago, Sam was expelled from the National Press Club, where he was a longstanding member, for agitating. He asked a question about the legitimacy of the Saudi government to the former head of Saudi intelligence. Imagine, a journalist asking an uncomfortable question! Now, I can’t say I would have asked the question Sam asked in exactly the way he asked it, but at heart it’s a damn good question. The context, as Sam introduced his question, was the legitimacy of the Syrian government. Many mainstream journalists are asking questions about the legitimacy of the Syrian government. Those questions are being asked now, and not earlier, because the governments that have been supporting Syria are only now unwilling to defend or ignore Syria’s actions, as it turns its guns on its own people and their aspirations. The guns fire, the governments withdraw their support, and journalists at the National Press Club are able to discuss whether Syria’s government ever had a basis for legitimacy.

If the Syrian government’s basis for legitimacy was always a fiction, what of the Saudis? In Chomsky’s anarchism, and perhaps Sam’s radicalism, the answer might be self-evident, but apparently the question itself was too much for the National Press Club, who want their luncheons digested undisturbed. Perhaps the National Press Club only wants their members to ask questions wrapped in shiny packages, with “pretty please”, the way we teach children to make requests of adults. That was my first thought. But they don’t. As Sam points out in his open letter to the Press Club, he had been at least as animated and vigorous when asking questions of the Austrian neo-Nazi Jörg Haider, with hearty support from the NPC moderator. So the problem for the NPC is clearly not who is doing the agitating, but who is being agitated.

Despite the title of this post, I’m not sure what ordinary non-journalists can do to support Sam, except make our voices heard. I’m open to suggestions.

The David Allen experience

I had never flown the now defunct America West before flying to Phoenix in the early fall of 1999, but I was aware of its nickname as “America’s Worst”. When I found my seat, I immediately noticed there was no overhead air vent: how cheap does a commercial airline have to be to save on basic ventilation? I was hating this trip before the plane pulled away from the gate.

My brooding was interrupted by the arrival of the person assigned to sit next to me, who was as sanguine and relaxed as I was stressed. I noticed his elegant leather flight bag and his practiced, effortless preparation for takeoff. I had traveled enough by then to know that experienced travelers didn’t usually spend cross-country flights in conversation with strangers, but five hours of complete silence could also be awkward. So when he took his seat, I just said “you must travel a lot”.

Singapore steps up on proxy gambling

The Singapore government is, encouragingly, keeping its punitive focus on local employers who use foreign workers as proxy gamblers. The strongly worded statements coming from several ministries leave no doubt about the government’s position: the fault lies with the employers, not the foreign workers.

The Straits Times itself has an equally strident editorial which goes one step further by noting the strangely amoral views of employers who described their exploitation of these workers as providing opportunities for wealth and lessons in “life skills”. But the employers are not only amoral, they also seem to believe these absurdities: one quoted in the original article proudly described how he sent multiple foreign workers to gamble as a form of investment diversification. Perhaps Singapore needs to introduce revised math education in addition to its new morals education; both could be funded with a portion of revenues from the gambling industry. Nobody should get through the PSLE without understanding that making bets against the house is a road to poverty, not riches.

The ST editorial laments that “careful crafting of regulations to minimise social harm could not have foreseen unlikely breaches”.  True, this exact scenario would have been hard to imagine, but discouraging Singaporeans and permanent residents from gambling while welcoming foreigners creates an economic system for breaches, at S$100 a breach. With hindsight, isn’t it inevitable that poor foreigners would have been used to circumvent the system?

Gambling by proxy

The Straits Times (Singapore’s main English language newspaper) published a very disturbing article this week about employers who send foreign workers with cash to gamble on the employer’s behalf at one of Singapore’s two casino resorts. If they win, they can keep some of the money. If they lose a little, the employer takes the loss, but “if they [lose] too much money, their pay would be docked.”

The exploitation should be obvious to anyone familiar with the foreign labor environment in Singapore. “Foreign workers” is a catch all term for foreigners at the lower end of the job market: these men and women work in a variety of industries for monthly wages often measured in hundreds of dollars, plus room and board. Live-in maids typically have a small room adjacent to the kitchen of their employer.  For men, rooming is often hostel living or, for construction sites, on-site temporary dormitories. Everything in their lives depends on their continued employment and good relations with their employer. Many of them remit most of their monthly salaries to families back home in India, China, the Philippines or Bangladesh, who depend on them to build a better life.
