Earlier this week I got a pointer to a New York Times article about R. It is a very interesting article about the use of R in scientific communities and industrial research, mainly for statistical analysis. R is open source software, so it is free and has already benefited from contributions made by various authors. And (although I haven’t used it myself yet) it is a great tool for reproducible research. Using the Sweave package, authors can write a single document containing both their article and the R code that reproduces the results and inserts them into the text. This ensures that all the material is in a single place.
It also shows something about the amazing power of open source software developed by a community of authors (and typically users at the same time).
I seem to be spending quite some time on the web lately… After my post about the lifetime of URLs, here’s one about domain names and reproducibility. While looking around recently, I noticed that there are quite a few websites and domain names related to reproducible research.
reproducibleresearch.org is an overview website by John D. Cook containing links to reproducible research projects, articles about the topic, and relevant tools. It also contains a blog about reproducible ideas.
reproducibleresearch.com is owned by the people at Blue Reference, who created Inference for Office, a commercial tool to perform reproducible research from within Microsoft Office.
reproducibility.org is used by Sergey Fomel and his colleagues as the home of Madagascar, their open source package for reproducible research experiments.
reproducible.org is a reproducible research archive maintained by R. Peng at Johns Hopkins University, whose goal is to host reproducible research packages.
That is quite a range of domain names containing the word “reproducible” (or a derivative), if you ask me! And I haven’t even started on the Open Research or Research 2.0 sites. Let’s hope this also means that research itself will soon see a big boost in reproducibility!
Let me, in turn, wish you all the best for 2009! I wish you a beautiful, entirely non-reproducible year with lots of great experiences!
2008 was the year in which this site got started, and to be honest, I am quite happy with the frequency at which I managed to post articles here. In its first year, the site also gained reasonably good visibility on Google, so there is nothing to complain about. It does remain largely a one-way communication, but as I hear from fellow bloggers, that is not uncommon. So let me invite you again at the start of this year: if you read this blog, and like or dislike something I write, please post a comment! It will encourage me to continue writing, and make me feel a bit less lost in the blogosphere.
And now, on to a wonderful 2009!
I am getting worried these days about the volatility of URLs and web pages. I guess you all know the problem: it is very easy to create a web page, and hence many people do so. Great! However, after some years, only a few of those web pages are still available. A common reason is that people retire or move elsewhere, so that their pages on their employer’s site disappear. Similarly, registering a domain name at some point in time does not mean you will keep paying the yearly fees forever. And a complete redesign of a website often results in broken URLs.
Why does this worry me so much?
Reproducible research, literate programming, open science, and science 2.0: all different names, and (in my opinion) all covering largely the same topic, namely sharing the code and/or data that complement a publication as part of presenting your research work. While literate programming is more focused on adding documentation to code, and science 2.0 seems to assume that you put work in progress online, there really seems to be a very large overlap between these topics.
This clearly shows that the same ideas pop up on various sides of the scientific community, in very different fields of science. That is a really exciting thing! At the same time, it also shows that there is a clear need for such open publication of research. And I think everyone will agree that there would be nothing nicer than being able to truly start from the current state of the art when beginning research in a certain field.
Should all these efforts be merged under a single “label”? It would definitely be exciting. And it would have a huge impact, as a joint effort for “open science”, “reproducible research”, or whatever the name may be, would receive a lot of attention and could no longer be overlooked by anyone. At the same time, every research domain has its own specifics and needs some fine-tuning, and it is not clear to me yet what the “best” setup would be for the type of work I am doing now. So maybe we should let these variations co-exist for some more time, and see later which ones survive, which are the simplest to use, and which tools can be combined into an optimal method for research.
But of course (if anyone is reading these posts), I would be very happy to hear your own opinion on this!
A few months ago, I read in a Belgian newspaper that 9% of the participants in a study among 2,000 American scientists said they had witnessed scientific fraud within the past three years. And it seems they were not talking about cases where people use Photoshop to crop an image or the like, but rather about inventing fake results or falsifying articles.
Although I wasn’t able to find this again on the web with Google, I am quite sure the original authors checked the number. Wikipedia reports on another study, where the actual number was 3%. Anyhow, whether it is 3 or 9 percent, this number is much too high. Let us hope it can be brought down by requiring higher reproducibility of our research work. I do realize that there will always be people cheating and falsifying results (Wikipedia even keeps a list of the most famous cases). But I also strongly believe that in the end, most researchers just want to do good work. And many of them perform non-reproducible work simply because they don’t feel the need to make it reproducible (yet). Or they are too busy with their next piece of work to properly finish off the current one…
Welcome to my personal blog!
On these pages, I plan to post thoughts and ideas on reproducible research, image processing research, or other things I find interesting enough to share with “the world” (that means you). It is also meant for experimenting with this medium, so it is still a bit unclear to me what and how often I will post here. I guess that will also depend on your feedback…
To be honest, it’s not my first attempt at blogging. When I was still at EPFL, I started a blog on reproducible research, but somehow I never managed to publish regularly enough there. So this time I’ll try to keep it a bit broader, and write a bit more regularly.