Archive

2009

Let me in my turn wish you all the best for 2009! I wish you a beautiful, entirely non-reproducible year with lots of great experiences!

2008 was the year in which this site got started, and to be honest, I am quite happy with the frequency at which I managed to post articles here. In its first year, the site also gained reasonably good visibility on Google, so nothing to complain about. It does remain largely a one-way communication, but as I hear from fellow bloggers, that is not uncommon. Let me invite you again at the start of this year: if you read this blog, and like or dislike something I write, please post a comment! It will encourage me to continue writing, and make me feel a bit less lost in the blogosphere.

And now, on to a wonderful 2009!

The volatility of URLs

I am getting worried these days about the volatility of URLs and web pages. I guess you all know the problem: it is very easy to create a web page, and hence many people do so. Great! However, after some years, only a few of those web pages are still available. Common reasons include people retiring or moving elsewhere, after which their web pages at their employer’s site disappear. Similarly, registering a domain name at some point in time does not mean you will keep on paying the yearly fees forever. And a complete redesign of a web site often results in broken URLs.

Why does this worry me so much?

Continue reading ‘The volatility of URLs’

Berlin 6 Open Access Conference

Last week, I attended the Berlin 6 Open Access Conference in Düsseldorf (Germany). It was an interesting conference on different aspects of Open Access: making publications freely available online. There was a wide variety of talks, ranging from publishers’ perspectives, through financial models for Open Access and open standards, to the benefits of Open Access for developed and developing countries.

One of the sessions was organized by Mark Liberman around the topic of reproducible research. I gave a talk there about my experiences with reproducible research, but that’s not what I want to talk about here. I found it very interesting to see the wide range of subjects and perspectives that Mark gathered in that session. Slides of the entire session are available here for those who are interested.

Continue reading ‘Berlin 6 Open Access Conference’

What’s in a name?

Reproducible research, literate programming, open science, and science 2.0. All different names, and (in my opinion) all covering largely the same topic: sharing code and/or data that complement a publication, as a presentation of your research work. While literate programming is more focused on adding documentation to code, and science 2.0 seems to include the assumption that you put work in progress online, there really seems to be a very large overlap between these topics.

This clearly shows that the same ideas pop up in various parts of the scientific community, in very different fields of science. That is a really exciting thing! At the same time, it also shows that there is a clear need for such open publication of a piece of research. And I think everyone will agree that there would be nothing nicer than being able to really start from the current state of the art when beginning research in a certain field.

Should all these efforts be merged under a single “label”? It would definitely be exciting. And it would create a huge impact, as a joint effort for “open science”, “reproducible research”, or whatever the name may be, would receive a lot of attention, and could no longer be overlooked by anyone. At the same time, every research domain has its own specific needs and requires its own fine-tuning, and it is not clear to me now what the “best” setup would be for the type of work I am doing now. So maybe we should let these variations co-exist for some more time, and see later which ones survive, which are the simplest to use, and which tools can be combined to create an optimal method for research.

But of course (if anyone is reading these posts), I would be very happy to hear your own opinion on this!

Scientific fraud

A few months ago, I read in a Belgian newspaper that 9% of the participants in a study among 2,000 American scientists said they had witnessed scientific fraud within the past three years. And it seems they were not talking about cases where people use Photoshop to crop an image or the like, but rather about inventing results or falsifying articles.

Although I wasn’t able to find this again on the web with Google, I am quite sure the original authors checked the number. Wikipedia reports on another study, where the number was 3%. Anyhow, whether it is 3 or 9 percent, this number is much too high. Let us hope it can be brought down by requiring higher reproducibility of our research work. I do realize that there will always be people cheating and falsifying results (Wikipedia even keeps a list of the most famous cases). But I also strongly believe that in the end, most researchers just want to do good work. And many of them perform non-reproducible work just because they don’t feel the need to make it reproducible (yet). Or they are too busy with their next piece of work to properly finish off the current one…

SIGGRAPH 2008

Impressive. Very impressive.

As you can see, I am again impressed by the annual SIGGRAPH conference, which took place last August and about which my colleagues reported. There were more than 28,000 participants, and the acceptance rate for the presented papers was below 20%. While the main focus of the conference is on computer graphics, it also includes a wide range of presentations on 3D, image and video enhancement, and image processing in general. In addition to these technical sessions, there are also movie screenings and a computer animation festival.

But apart from the high quality and interesting mix of topics, I also really like the way papers are presented, especially for people like me, who did not attend the conference. Each paper starts off (after the title and author list) with a “telling illustration” that graphically summarizes the paper. Really nice to get a quick idea about the paper. Moreover, for most of those papers, the authors also have a nice video presenting their paper on their website. I have no idea whether that is mandatory, or whether one could find all those presentation videos on the ACM website. My colleagues also told me that all the presentations from this year’s SIGGRAPH conference would be recorded and made available online. I am curious! It’s still not the same as actually going there, but it is as close as I can get. For now.

Data set competitions

One of the reproducibility problems with many current papers is that all authors apply their new algorithm to their own set of data. So did I in my super-resolution work, too. The problem with that is that it is very difficult to assess whether the data set is used (a) because that was the one the author had at hand, (b) because it was the most representative one, or (c) because the algorithm performed best on that data set.

To allow fairer comparisons, competitions are being set up in various fields. Often in the period before a conference, a competition is set up in which everyone can try their algorithm on a common dataset provided by the organizers.

Continue reading ‘Data set competitions’

Reproducible Research History (1)

To my knowledge, the reproducible research efforts in computational sciences were started by Jon Claerbout (who retired earlier this year) in the early 90s. In his Stanford Exploration Project at Stanford University, Claerbout and his colleagues (working in seismic imaging) developed a system using Makefiles that allows one to remove all figures and regenerate them using a single Unix command. This allows any person (with a Unix/Linux system) to reproduce all the results in their work. I think it is about as close to “one-click reproducibility” as one can get! Claerbout and his lab performed a lot of the pioneering work in promoting reproducible research, which later spread to various disciplines. A history by Claerbout himself is available here.

In their work, Claerbout and his colleagues make a distinction between three types of figures/results. First of all, and most common, there are easily reproducible results, which can be reproduced by a reader using the code and data contained in the electronic document. Secondly, conditionally reproducible results are results for which the commands and data are given, but which can only be reproduced if certain resources are available (such as Matlab or Mathematica), or which take more than 20 minutes to reproduce. And finally, non-reproducible results is the label used for results that cannot be reproduced, such as hand-drawn figures, scans, or images taken from other documents for comparison.
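The idea of removing every regenerable figure and rebuilding it with one command can be sketched in a few lines. This is only an illustration in Python, not Claerbout’s actual Makefile setup: the names `Level`, `Figure`, `burn`, `build`, and `make_squares` are made up for this example, and I am assuming that only the easily reproducible figures are removed and rebuilt.

```python
import tempfile
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Callable, List, Optional


class Level(Enum):
    """The three categories of results described above."""
    EASY = "easily reproducible"
    CONDITIONAL = "conditionally reproducible"
    NON = "non-reproducible"


@dataclass
class Figure:
    path: Path
    level: Level
    make: Optional[Callable[[Path], None]] = None  # regenerates the file


def burn(figures: List[Figure]) -> List[Path]:
    """Remove only the figures that can be regenerated automatically."""
    removed = []
    for fig in figures:
        if fig.level is Level.EASY and fig.path.exists():
            fig.path.unlink()
            removed.append(fig.path)
    return removed


def build(figures: List[Figure]) -> List[Path]:
    """Regenerate every missing, easily reproducible figure."""
    rebuilt = []
    for fig in figures:
        if fig.level is Level.EASY and fig.make and not fig.path.exists():
            fig.make(fig.path)
            rebuilt.append(fig.path)
    return rebuilt


def make_squares(path: Path) -> None:
    """Stand-in for a real plotting step: writes a small table of squares."""
    path.write_text("\n".join(f"{x} {x * x}" for x in range(5)))


def demo() -> bool:
    """Burn and rebuild in a scratch directory; True if the result is identical."""
    with tempfile.TemporaryDirectory() as tmp:
        plot = Figure(Path(tmp) / "squares.txt", Level.EASY, make_squares)
        scan = Figure(Path(tmp) / "scan.txt", Level.NON)
        scan.path.write_text("hand-drawn figure")
        make_squares(plot.path)
        before = plot.path.read_text()
        burn([plot, scan])
        build([plot, scan])
        return plot.path.read_text() == before and scan.path.exists()
```

Calling `demo()` removes the regenerable figure, rebuilds it, and checks that it comes back unchanged while the non-reproducible scan is left alone. In a real setup, `make_squares` would be replaced by the code that produces the actual figure.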

Their Makefile setup was recently developed further by Fomel et al. in the Madagascar project, using SCons, a build tool similar to Make, which should make reproducibility even simpler, and cross-platform! See their project page for more details.

Middlebury Stereo

An article close to my current work on 3D now:

D. Scharstein and R. Szeliski, A taxonomy and evaluation of dense two-frame stereo correspondence algorithms, International Journal of Computer Vision, 47(1/2/3), pp. 7-42, April-June 2002.

In their article, Scharstein and Szeliski compare stereo estimation algorithms. But they do not just offer this overview of algorithms. On their webpage, they also provide the source code and a widely used dataset of stereo images. They also invite other researchers to try their own algorithm on this dataset and upload the results. Over the years, this has resulted in a performance comparison of almost 50 stereo algorithms, nicely listed on their webpage.

A nice example of what reproducible research can do! I think we need a lot more of these comparisons on common (representative) datasets.
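Such a comparison on a common dataset can be sketched as follows. This is a toy illustration in Python, not the evaluation code from their webpage: the function names, the toy data, and the error measure (the fraction of values whose error exceeds a threshold of 1.0) are all assumptions I made for this example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Algorithm = Callable[[Sequence[float]], List[float]]
Dataset = List[Tuple[Sequence[float], Sequence[float]]]  # (input, ground truth)


def bad_pixel_rate(estimate: Sequence[float], truth: Sequence[float],
                   threshold: float = 1.0) -> float:
    """Fraction of values whose absolute error exceeds the threshold."""
    if len(estimate) != len(truth):
        raise ValueError("estimate and ground truth differ in length")
    bad = sum(abs(e - t) > threshold for e, t in zip(estimate, truth))
    return bad / len(truth)


def rank(algorithms: Dict[str, Algorithm],
         dataset: Dataset) -> List[Tuple[str, float]]:
    """Run every algorithm on every (input, truth) pair; sort by mean error."""
    scores = []
    for name, algo in algorithms.items():
        rates = [bad_pixel_rate(algo(x), truth) for x, truth in dataset]
        scores.append((name, sum(rates) / len(rates)))
    return sorted(scores, key=lambda item: item[1])


# Toy data: the input is a 1-D signal, the ground truth is the signal plus 2.
dataset = [([0.0, 1.0, 2.0, 3.0], [2.0, 3.0, 4.0, 5.0]),
           ([5.0, 5.0, 5.0, 5.0], [7.0, 7.0, 7.0, 7.0])]

# Two hypothetical "submissions" evaluated on the same data.
algorithms = {
    "identity": lambda x: list(x),                  # always off by 2
    "shift_by_2": lambda x: [v + 2.0 for v in x],   # matches the truth
}

table = rank(algorithms, dataset)
```

The point is that every submission is scored by the same function on the same data, so the resulting `table` is a ranking everyone can check and extend with their own algorithm.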

Reproducible Research in Medicine

I just read the following article:

C. Laine, S. N. Goodman, M. E. Griswold, and H. C. Sox, Reproducible Research: Moving toward Research the Public Can Really Trust, Annals of Internal Medicine, Vol. 146, No. 6, pp. 450-453, 2007.

A very interesting article, about how the journal “Annals of Internal Medicine” is promoting reproducible research. They do not require that all papers are reproducible, but they do ask the authors of each paper whether theirs is reproducible or not. If it is reproducible, they provide links to the protocol, data, or statistical code that was used.

While, certainly in medicine, this still does not guarantee that the entire research work is reproducible, it does give a lot of additional information (and credibility) about the presented work. I (as an ignorant researcher) also found it very interesting to read the description of the thorough editorial process that each paper undergoes. I have put an overview of reproducible research initiatives by journals on our RR links page. That is, the initiatives I know about of course. Feel free to let me know if you know other examples!

This initiative was prompted, among other things, by an article on this topic by Peng et al. It would be great if other journals followed these examples, and reproducible research became the ‘default’ for a paper…