computer science

  • Here’s my best advice to young computer science students, especially those who are interested in building systems: “I will engage in a heroic engineering effort and …” is always a far worse starting point for a course or long-term project than “I will engage in heroic system characterization and ….” The sooner you learn this, the happier you’ll be and the better work you’ll do.

    (I originally posted a shorter expression of this sentiment on Twitter.)

  • High praise for the Lua programming language from a respected PL and runtime-systems researcher. Ramsey’s observation that the design effort surrounding Lua “could not have been done at a North American university” rings especially true.

  • Here’s a good article by John Regehr on failures of peer review. This point, about the consequences of disciplinary conservatism, is pretty spot-on: “In computer systems, paper submissions from non-PhDs in industry are not uncommon. Usually these papers doesn’t quite look right and they often get rejected. In a number of cases I’ve seen reviews so blisteringly ugly that nobody in their right mind who received them would ever submit another paper to that community.”

  • John Carmack shares some thoughts on static program analysis for finding defects. The whole article is a good read, but I especially appreciated his tongue-in-cheek interpretation of Microsoft’s pricing model for their analysis tools (free in the Xbox developer kit, but very expensive in the Windows developer kit): “I read into this that Microsoft feels that game quality on the [Xbox] 360 impacts them more than application quality on Windows does.”

    Also, see this related and hilarious photo, especially if we went to grad school together. (Photo link via Pascal Cuoq.)


Research and development

April 18th, 2010  |  Tags: , ,  |  Leave a comment

Rob O’Callahan has done a lot of high-quality research and a lot of high-quality development; his thoughts on research (and, to a lesser extent, on development) are thus worth reading. The following passage should prove especially amusing if you’ve spent several years in some tiny subcorner of the discipline:

Since 1984 hundreds or thousands of papers have been written about program slicing, but in reality it is almost never used. I believe the research in this area has been completely pointless. Probably hundreds of millions of dollars have been wasted, not to mention enormous amounts of the time of very smart people. This is criminal misuse of resources.

(One wonders what current trend in programming language research we will have come to regard as “completely pointless” in 2020 or 2030.)

A memorial on Ada Lovelace Day

March 24th, 2010  |  Tags: ,  |  Leave a comment

Today is Ada Lovelace Day, an “international day of blogging to celebrate the achievements of women in technology and science.” I’ve been fortunate to work with many brilliant women (as classmates and as colleagues) and have no shortage of wonderful contributions to celebrate.

I’d like to focus this post on one person, though, and to share a brief memorial for Anne Moeller, who taught business and programming classes at my high school in Rockville, MD and who was willing to take a chance supervising an enthusiastic but green kid in an independent study term that wound up covering a lot of CS1 and CS2 material. She was energetic, extremely sharp, and did a great job of keeping me honest: if I started to get cocky after finishing a project quickly, she’d respond immediately with a much trickier problem. Over the course of that term, I learned a lot about inductively-defined data structures and perhaps even more about humility. (To the extent that I am able to use them, both skills have served me well since!)

We exchanged a few emails after I went to college, especially towards the end of my undergraduate career, when I decided that there might be something to this computing nonsense after all and wanted to thank her for the experience I’d had working with her. I didn’t send her a note when I finally finished my PhD, but I should have gotten in touch again when I had the chance; I just learned that Mrs. Moeller passed away suddenly last April.

Anne Moeller was a tireless and talented instructor who inspired, challenged, and amazed a generation of students. I’m thankful for her time, efforts, and encouragement, but most of all for her patience with a sixteen-year old kid’s tour around the outskirts of an absolutely fascinating discipline. May she rest in peace.

Making mazes

February 6th, 2010  |  Tags: , ,  |  Leave a comment

Part of a generated maze

When I was little, my dad would sometimes entertain me with miscellaneous software running on the VAX he had in his lab. One of the cooler things was a random maze generator, which probably came from the DECUS archives (we’ve both long since forgotten any details about the author or language) and would fill a whole page with twisting paths. This seemed pretty close to magic to me at the time.

Now that my son is old enough to enjoy mazes, I thought I’d replicate this old program so he could have random mazes of varying intricacy to solve. As far as I can tell, most common maze generation algorithms treat the grid of the maze as an undirected graph with edges between neighboring cells, build a spanning tree, and then place passageways between cells that are immediately connected in the spanning tree. The end result is guaranteed a path from the start to any other node in the maze, because the spanning tree must reach every node. (It’s a little sad that after spending more than a few years of grad school thinking about problems that boil down to graph reachability, this technique seems less like “magic” and more like “obvious.”)

I was able to write a program to build mazes very quickly, and have cleaned it up a bit for public release. You can download the gem package here (use sudo gem install willb-mazegen), or browse the source on github. It will make square-grid mazes to fill a sheet of letter paper, and can make a document consisting of one or many mazes. The code is fairly readable but not particularly fast, but you should be able to generate mazes at least as quickly as a small human can solve them. In the future, I would like to improve performance, optimize the generated PDF (it is currently pretty inefficient), and allow for mazes on non-square grids. (The technique itself is sufficiently general to allow for, e.g., hexagonal or triangular grid mazes, but other aspects of the code would have to change to enable this.)

If you’re just interested in seeing some mazes, here’s a set of twenty preschool-difficulty mazes to download: mazes.pdf. Or you can try this comically ridiculous example (but be warned that it is almost an 8mb file!): hugemaze.pdf.

The myriad delights of academic spam

July 16th, 2009  |  Tags: , ,  |  Leave a comment

If you look at the web pages of twenty American computer science professors, you will probably find at least twelve warnings indicating that one of the worst things a prospective student can do is send a form letter asking for an assistantship or for some aid in admission. Apparently, this sort of spam is very common and — as you might imagine, given the sort of person who is desperate enough to send out many impersonal, dishonest messages to strangers in the hope that some stranger will elect to stake some part of his or her professional reputation on the sender’s future academic work — typically fairly unsubtle. These warnings generally include the provision that the professor welcomes email from prospective students as long as it indicates that the student has some familiarity with the professor’s work and finds it to be a good fit for the student’s interests, aptitudes, and experience.

This sort of spam isn’t confined to professors, though. In fact, I got such a message this February:


I appreciate that this fellow — whose name I have redacted — has instructed his spamming script to mention something about some prose I’ve written, but “ACM Digital Library” is a web site, not a journal. In fact, the piece in question was published as a “Kernel Korner” article in the Linux Journal; I wrote it in 2001, as a very junior grad student. It is not a research article (and thus is unlikely to be “in accordance with” anyone’s research interests) and it is totally unrelated to any of the research I did for my dissertation. (I did briefly conduct some research tangentially related to that article before starting on my thesis work, but I never attempted to publish any scholarly papers on the matter.) Furthermore, anyone who read the article carefully would probably have noted that the technique I described for modifying kernel behavior via loadable modules no longer applied beginning with the 2.6 series of Linux kernels — that is, roughly since 2004. Finally, his script clearly needs a little work; note the funny spacing around the article title.

Unfortunately, I have no hiring authority, so I was forced to disregard this message. It’s a shame that I couldn’t pounce on the chance to hire someone who had almost completed his “B.Tech (Hons)” and clearly has at least rudimentary string-manipulation skills. I do wonder, though, if this young man got any responses. What would someone say? “Sure, kid, I can have you on the next plane from Kharagpur, INDIA. Don’t worry about sending your CV or any evidence of your eligibility to work in my country. I just hope you’re ready to do plenty of work on OS Security, Network Security, and time travel.”

Whereof one cannot speak

July 15th, 2009  |  Tags: , , ,  |  2 Comments

Almost a year ago, I posted a short note about how to set Polytonic Greek in the LaTeX typesetting system, inspired by my difficulties in typesetting a chapter epigraph for my dissertation, and included with it a challenge to my readers: namely, say something clever about the (unattributed) quote I had included as an example:

οὐ γὰρ μήποτε τοῦτο δαμῇ εἶναι μὴ ἐόντα·
ἀλλὰ σὺ τῆσδ’ ἀφ’ ὁδοῦ διζήσιός εἶργε νόημα

Today, someone did. Congratulations are due to D Jagannathan, who had something clever to say both about the quote itself — which is from the work of the presocratic philosopher Parmenides of Elea — and about the perils of using LaTeX to set texts in non-Latin alphabets. Fine work!

One (imperfect) way to render the quote into English, given by John Burnet in 1892, is “For this shall never be proved, that the things that are not are; and do thou restrain thy thought from this way of inquiry.” Parmenides is either my favorite or second-favorite presocratic, and I like this quote for two reasons: first, it is remarkably Tractarian for something written around two-and-a-half millennia before Wittgenstein destroyed positivism; second, it could be read to describe logic programming or mechanical proof search.

While I was unable to resist one of the basest impulses of computer scientists (viz., quoting philosophers in an effort to distance oneself from the fact that one is essentially in an engineering discipline; see also here), this quote at least was plausibly connected to the chapter it headed. I was, however, able to resist the temptation to include this quote in its original language in the final version of my dissertation. (Indeed, the only moment of weakness I had with regard to non-English languages was in the dedication, where it seems a little indulgence is justifiable.)

On dependent types

June 24th, 2009  |  Tags: , ,  |  Leave a comment

One of the perils of computing is this: one encounters a problem that has some unappealing solutions (or partial solutions) and one or more obvious workarounds. Nevertheless, one is compelled to think about the possibility of an elegant, general solution to a class of similar problems. That happened to me this morning, and I wrote it up at Chapeau.

Computer science and plumbing

May 12th, 2009  |  Tags: , , ,  |  1 Comment

I enjoyed this comment by Marc Hamann in a discussion of MIT’s switch away from SICP for undergraduate computer science education; the whole comment is thought-provoking, but this excerpt is especially delightful:

Somewhat more facetiously, I have to suggest that maybe they are just being merciful to their students, since it seems that many people, seduced by the excitement of SICP, go on to suffer miserably in their career as API and framework plumbers, wishing that being a programmer was actually the elegant and rational process that Abelson and Sussman had made it out to be.

SICP, if read carefully and properly, presents almost an entire undergraduate computer science curriculum in a semester. It’s a shame that MIT EECS students won’t be drinking from that firehose any more. Furthermore, I have long believed that Scheme is unbeatable as a language for teaching computer scientists rather than programmers. (Although “unbeatable” implies a partial ordering: I’d guess it’s possible to argue that Haskell or an ML-family language is equally suitable.)

Multicore Erlang

January 28th, 2009  |  Tags: , , , , , , ,  |  Leave a comment

I was glad to see Ulf Wiger’s slides and talk from his DAMP 2009 tutorial on Erlang programming for multicore processors. I would have been very interested in attending DAMP, but it conflicted with the last day of VMCAI (and thus with my paper), so I wasn’t able to go in person.

Jargon provenance

January 27th, 2009  |  Tags: , ,  |  2 Comments

In a question that bothered me every time my mind wandered yesterday, Alan Jacobs asked about nerd nomenclature on Twitter:

Why do geeks say “extensible” instead of “extendable”? It’s not like you can “extense” something. My theory: rms leads, others follow.

Specifically, Jacobs is referring to Richard Stallman’s characterization of the emacs editor (see here for a mention from 1981), which indeed struck me as a plausible candidate for a very early use of “extensible” in the jargon sense of “expanding the scope or functionality of a thing.” (Contrast this with the sense of extending the length of, for example, a telescoping mop handle.)

I checked the OED and was very surprised indeed to see that both “extensible” and “extendible” have a long history of established use. The first use the OED has for either is from 1477, in which “extendible” was used in the sense of increasing the physical extent of a thing (in this case, a scent). The first uses of each in the functional-extent or scope sense come from the mid-17th century.

So, while we can almost certainly blame Stallman for the currency of the functional sense of “extensible” in software and computer science jargon, at least I haven’t been using a nerd neologism all of these years. I’ve merely been living in a bizarro suburb of our language!

Buoyed by this modest etymological triumph, I decided to further justify my disciplinary jargon use through OED searches. Alas, the jargon-sense “orthogonal” (meaning roughly “independent” or “unrelated”), which has completely infected my technical writing and my technical thoughts, is completely unknown to the OED. I promise you, dear reader, my metanoia has already begun.

Mustard watches

January 23rd, 2009  |  Tags: , , ,  |  Leave a comment

It seems impossible that I haven’t already linked to Jean-Yves Girard’s pseudonymous masterwork “Mustard Watches: An Integrated Approach to Time and Food,” but it looks like I haven’t. Enjoy.

Appeals to authority

January 18th, 2009  |  Tags: , , ,  |  Leave a comment

I just won a months-old argument with a colleague. (I had, in fact forgotten that the argument had happened in the first place.) The specific term at the center of the argument and the identity of the colleague aren’t important, but the concession, which took the following form, was downright awesome: “We disagreed about the meaning of term X, but I have decided you are right because Patrick Cousot defined it the same way in his talk.”

Now that’s vindication.

Euclidean rhythms

January 15th, 2009  |  Tags: ,  |  Leave a comment

“The Euclidean Algorithm Generates Traditional Musical Rhythms” (pdf link) is a fascinating paper by Godfried Toussaint at McGill. Friends of this site will likely guess that I was powerless to resist reading the whole thing after glancing at the first paragraph:

What do African bell rhythms, spallation neutron source (SNS) accelerators in nuclear physics, Sturmian words and string theory (stringology) in computer science, Markoff numbers and two-distance sequences in number theory, drawing digital straight lines in computer graphics, calculating leap years in calendar design, and an ancient algorithm […] for computing the greatest common divisor of two numbers, originally described by Euclid, have in common? The short answer is: patterns distributed as evenly as possible. For the long answer please read on.

Touissant shows that the Euclidean greatest-common-divisor algorithm has the same structure as an algorithm, due to Bjorklund, for evenly scheduling n pulses in k units of time. (Bjorklund’s application was scheduling high-voltage power over intervals.) He then demonstrates that this algorithm can be used to generate musical rhythms that have appeared in music throughout recorded history; for example, scheduling three pulses over eight units of time results in the tresillo, or 3+3+2/8 rhythm. (This rhythm is familiar in African and Caribbean music; Touissant notes that it is also the rhythm of the bass part in Elvis Presley’s “Hound Dog.”)

This discussion alone would make for an excellent read, but it represents only the first fifth of the paper! Touissant also identifies a wide range of actual rhythms from world music that can be generated using this technique; compares rhythms generated using this technique to rhythms categorized in the aksak system of rhythms (which I first encountered, but not by that name, in the music of Bartók); and finally, makes convincing analogies with finding leap years, drawing straight lines on grids of pixels, and several other problem domains.

I haven’t had so much fun reading a research paper in ages. If you are reading this site, you will probably enjoy Touissant’s paper.

Rent-a-coder hilarity

November 26th, 2008  |  Tags: , , ,  |  1 Comment

This amazing “want-ad” for an impossible task is maybe the greatest thing I’ve seen on the internet this fiscal quarter:

The purpose of this project is to create a debugger program. This program will take as input the source code another program, and will analyze that other program and determine if it will run to completion, or have an error, or go into an infinite loop.

Predictably, many “bidding contractors” are in on the joke — “KurtG” and “GeorgeCantor” make appearances — but the best part are the completely earnest, form-letter replies from contractors who are willing to get this done on time and under budget, like this fellow (all errors are in the original):

Dear Sir/Madam, I looked at your bid request and I am here to tell you that we are really interested in this project and we are exacatly the coders you are looking for. I assure you that we can really do it the way you want.In 15 days we can bring you high quality results to your complete satisfaction, along with 90 days of warranty upon the delivery of the final version of our product. We’re looking forward to starting this project as soon as possible. Just give us a chance , you will never be disappointed. Its not about doing it , its about doing it professionally exactly according to the requirements.

Just awesome. One wonders what other undecidable problems could be contracted out for less than a grand. It would almost be worth the money to get one of these firms — especially those with an ironclad guarantee — to produce a deliverable.

UPDATE: all of the comments are now gone. Fortunately, I anticipated this and saved a screendump (pdf link).

(thanks to mef for the link)

Making a git mirror of Jikes RVM

October 4th, 2008  |  Tags: , , , ,  |  Leave a comment

I’ve been a longtime subversion user but have been switching some projects over to git lately. One major disadvantage that svn presented for my dissertation work arose because I was often interested in extending someone else’s code (like jikesrvm or soot): I’d want version control on my changes (and the opportunity to make branches, etc.), but I’d also want an easy way to track changes to the official project. Subversion does not, by itself, provide an easy way to do this.

Git makes it pretty easy to track changes to a repository that you don’t control while providing version control for local modifications and easy branching and merging. However, importing a large repository from svn can a really long time (almost a whole day for a medium-sized repository), and some svn servers (sourceforge in particular) are a bit flaky, which can lead to local repository corruption.

If you’re interested in having a local git mirror of a subversion repository, and the remote repository supports rsync, then it will be much faster to mirror the repository locally (via rsync) and then convert your local svn repository to a git repository. Then you have a local git repository, and you can just rsync and git svn fetch when it’s time to update from the original repository.

Here’s a quick snippet to show how it’s done for JikesRVM.

On another note, I couldn’t be more pleased with the GitHub hosting service! I’ve posted some code snippets on “gist,” their version-controlled pastebin. I am also using github to house some more substantial code — although the only public repository there right now is my LaTeX template for Wisconsin dissertations. Hopefully, I will be able to share more code in the future.

Sorted by irrelevance

August 21st, 2008  |  Tags: , , ,  |  Leave a comment

The ACM Digital Library should be useful. It is one of the largest single-site repositories of academic writing about computer science and holds almost every paper published within the last thirty years that bears an ACM copyright. Unfortunately, it does so behind an absurdly ridiculous interface.

Say you wanted to find Tom Knight’s classic 1986 paper “An architecture for mostly functional languages.” This paper appeared in an ACM conference (specifically, the ACM conference on Lisp and functional programming), so you’d be right to search the ACM DL for it. It would be reasonable to assume that searching for “knight mostly functional” would give you a good chance of finding the paper quickly.

If you did this, you’d be presented with a list of results “sorted by relevance.” The most “relevant” results, it appears, are a wide range of papers from the last ten years on speculative multithreading — from conferences, journals, and unrefereed newsletters — that cite Knight’s paper. The ACM’s search algorithm identifies Knight’s actual paper as the 46th most “relevant” search result for “knight mostly functional.” This would be risible if there were tens of thousands of search results for this string. Since there are only 48, it’s completely unconscionable.

Seek and ye shall find, I guess.