The most important element of a "scientific" statement/result is that others should be able to falsify it. The tsunami of data that has engulfed astronomers in the last two decades, combined with faster processors and faster internet connections, has made it much easier to obtain a result. However, these factors have also increased the complexity of a scientific analysis, such that it is no longer possible to describe all the steps of an analysis in the published paper. Citing this difficulty, many authors content themselves with describing the generalities of their analysis in their papers.
However, it is impossible to study the validity of a result if you cannot reproduce it. The complexity of modern science makes it vitally important to reproduce the final result exactly, because even a small deviation can be due to many different parts of an analysis. Nature is already a black box which we are trying so hard to comprehend. Not letting other scientists see the exact steps taken to reach a result, or not allowing them to modify it (do experiments on it), creates a self-imposed black box, which only exacerbates our ignorance.
To better highlight the importance of reproducibility, consider an analogy: the English style of scientific papers. Why do non-English-speaking researchers have to invest so much time and energy in mastering English to a sufficiently high level to publish their exciting results? An online translator would be enough to convey the absolute final result of their analysis, for example that galaxies grow as a specific function of the age of the universe. Everyone would get the ultimate point they want to make, even through a poor translation. Then why do journals require that the paper be written in very good English?
It is because journals do not publish raw results. Understanding the method by which the result was obtained is more important than the result itself. Good language (English in this case) standardizes the reading and allows readers to understand the method more easily and focus on its details. Otherwise, readers have to waste a lot of mental energy on what a misspelled word or a poorly written sentence (bad grammar/style) may mean when interpreting the result.
Exactly the same logic applies to reproducibility: without sufficient standards, readers cannot focus on the details, and will make different interpretations. Just as non-English speakers are forced to master English, people that are not trained in software MUST invest the time and energy to do so. You hear this statement a lot from many scientists: "software is not my specialty, I am not a software engineer, so the quality of my code/processing doesn't matter. Why should I master good coding style (or release my code), when I am hired to do Astronomy/Biology?". This is akin to a French scientist saying that "English is not my language, I am not Shakespeare. So the quality of my English writing doesn't matter. Why should I master good English style, when I am hired to do Astronomy/Biology?".
Other scientists should be able to reproduce, check and experiment on the results of anything that is to carry the "scientific" label. Any result that is not reproducible (due to incomplete information from the author) is not scientific: readers have to have faith in the authors' subjective experience regarding the very important choice of configuration values and order of operations, which is contrary to the definition of science.
This topic has recently been gaining attention in the community. Nature's editorial board recently announced a new policy regarding software and methods in their published papers, along with a Code and software submission checklist. Here is an excerpt from the new policy:
Authors must make available upon request, to editors and reviewers, any previously unreported custom computer code or algorithm used to generate results that are reported in the paper and central to its main claims. Any reason that would preclude the need for code or algorithm sharing will be evaluated by the editors who reserve the right to decline the paper if important code is unavailable.
For some recent discussions in the astronomical community in particular, please see Shamir et al. (2018), Oishi et al. (2018) and Allen et al. (2018). Here, I propose my complete and working solution for doing research in a fully reproducible manner and go on to discuss how useful this proposed system can be during the research and also after its publication. This system has been implemented and is evolving in my own research. The GNU Astronomy Utilities (a collection of software for astronomical data analysis) was also created (and improved) in parallel with my research to provide the low-level analysis tools that were necessary for maximal modularity/reproducibility. Please see the Science and its tools section of the Gnuastro book for further discussion on the importance of free software and reproducibility.
Some slides are also available with figures to help demonstrate the concept more clearly and supplement this page.
All the software used is free software, enabling any curious scientist to easily study and experiment on the source code of the software producing the results. All the processing steps in the proposed reproduction pipeline are managed through Makefiles. Makefiles are arguably the simplest way to define dependencies between various steps and to run independent steps in parallel when necessary (to improve speed and thus allow more creativity).
When batch processing is necessary (no manual intervention, as in a reproduction pipeline), shell scripts usually come to mind. However, the problem with scripts for a scientific reproduction pipeline is complexity. A script starts from the top every time it is run. So if you have gone through 90% of a research project and want to run the remaining 10% that you have newly added, you have to run the whole script from the start again and wait until you see the effects of the last few steps (to check for possible errors, better solutions, etc.).
The Make paradigm, on the other hand, starts from the end: the final target. It builds a dependency tree to find where it should actually start each time it is run. Therefore, in the scenario above, a researcher who has just added the final 10% of steps of her research to her Makefile will only have to run those extra steps. This greatly speeds up the processing (enabling creative changes), while keeping all the dependencies clearly documented (as part of the Make language) and, most importantly, enabling full reproducibility. Since the dependencies are clearly demarcated, Make can also identify independent steps and run them in parallel (further speeding up the process). Make was designed for this purpose, and it is how huge projects like Unix-like operating systems (including GNU/Linux and Mac OS) and their core components are built.
The output of a research project is either some numbers, a plot, or more formally, a report/paper. Therefore the output (final Makefile target) of the reproduction pipeline described here is a PDF created by LaTeX. Each step stores the numbers, tables, or images that need to go into the final report in separate files. In particular, any processed number that must be included within the text of the final PDF is actually a LaTeX macro. Therefore, when any step of the processing is updated/changed, the numbers and plots within the text will also change correspondingly.
The whole reproduction pipeline (Makefiles, all the configuration files of the various software and also the necessary source files) consists of plain text files, so it does not take much space (usually less than 1 megabyte). Therefore the whole pipeline can be packaged with the LaTeX source of the paper and uploaded to arXiv when the paper is published. arXiv is arguably one of the most important repositories for papers in many fields (not just astronomy), with many mirrors around the world. Therefore, engraving the whole processing in arXiv along with the paper is arguably one of the best ways to ensure that it is kept for the future.
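To illustrate what "independent steps" means once dependencies are written down, here is a minimal Python sketch (not Make, and not the author's pipeline) that uses the standard library's `graphlib.TopologicalSorter` to group steps into batches. Every step in a batch depends only on steps in earlier batches, so each batch could run in parallel. The step names (`image.fits`, `catalog.fits`, etc.) are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def parallel_batches(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group steps into batches: each step depends only on steps in earlier
    batches, so all steps within one batch could run at the same time."""
    ts = TopologicalSorter(deps)  # maps each step to the steps it needs
    ts.prepare()                  # also raises CycleError on circular dependencies
    batches = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # steps whose prerequisites are all done
        batches.append(ready)
        ts.done(*ready)                 # mark them finished to unlock the next batch
    return batches


# Hypothetical pipeline: a catalog is made from an image; a plot and the
# LaTeX macros are both made from the catalog; the paper needs both.
deps = {
    "paper.pdf": {"macros.tex", "plot.pdf"},
    "macros.tex": {"catalog.fits"},
    "plot.pdf": {"catalog.fits"},
    "catalog.fits": {"image.fits"},
}
print(parallel_batches(deps))
```

Here `macros.tex` and `plot.pdf` land in the same batch because neither needs the other. This level-by-level grouping is a simplification: Make's own parallel mode (`make -j`) schedules each step as soon as its own prerequisites are finished, rather than waiting for a whole batch.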
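The "start from the final target" idea can be sketched in a few lines. The Python below is a toy model written for illustration, not Make itself. The rule format, the file names and the `build`/`run_demo` functions are all hypothetical. Starting from the final target, it recurses back through the prerequisites and reruns a step only if its output is missing or older than a prerequisite. As a simplifying assumption, it also reruns a step if one of its prerequisites was rebuilt during the same run.

```python
import os
import tempfile
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical rule format (not Make syntax): target -> (prerequisites, recipe).
Rule = Tuple[List[str], Callable[[str], None]]


def build(target: str, rules: Dict[str, Rule], log: List[str]) -> bool:
    """Bring `target` up to date by walking back from it through its
    prerequisites. Returns True if `target` was rebuilt in this call."""
    if target not in rules:
        # A source file: it has no recipe and must already exist.
        if not os.path.exists(target):
            raise FileNotFoundError(target)
        return False
    prereqs, recipe = rules[target]
    # Update the prerequisites first (a list, so every one is visited).
    rebuilt = [build(p, rules, log) for p in prereqs]
    stale = (
        not os.path.exists(target)
        or any(rebuilt)
        or any(os.path.getmtime(p) > os.path.getmtime(target) for p in prereqs)
    )
    if stale:
        recipe(target)
        log.append(os.path.basename(target))
    return stale


def run_demo() -> Tuple[List[List[str]], str]:
    """Build twice, then edit the input and build again; return the logs."""
    with tempfile.TemporaryDirectory() as d:
        raw, mean, report = (os.path.join(d, n) for n in ("raw.txt", "mean.txt", "report.txt"))

        def make_mean(out: str) -> None:
            with open(raw) as f:
                vals = [float(x) for x in f.read().split()]
            with open(out, "w") as f:
                f.write(f"{sum(vals) / len(vals):.2f}\n")

        def make_report(out: str) -> None:
            with open(mean) as f:
                m = f.read().strip()
            with open(out, "w") as f:
                f.write(f"Mean value: {m}\n")

        rules = {mean: ([raw], make_mean), report: ([mean], make_report)}
        with open(raw, "w") as f:
            f.write("1 2 3\n")

        logs = []
        for step in range(3):
            if step == 2:
                # Pretend the earlier build happened an hour ago, then edit the input.
                old = time.time() - 3600
                for p in (mean, report):
                    os.utime(p, (old, old))
                with open(raw, "w") as f:
                    f.write("4 5 6\n")
            log: List[str] = []
            build(report, rules, log)
            logs.append(log)
        with open(report) as f:
            final = f.read()
    return logs, final
```

Calling `run_demo()` shows the behaviour described above. The first run executes both steps. The second run executes nothing, because everything is up to date. After the input is edited, the third run executes only the steps downstream of it. Real Make adds much more on top of this, such as pattern rules and parallel execution.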
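As an illustration of how a processing step might hand its numbers to the paper, the sketch below turns a dictionary of results into LaTeX `\newcommand` definitions. The paper's source can then `\input` the resulting file. The function name, the macro names and the file name `macros.tex` are hypothetical. This is one possible approach, not necessarily the one used in the author's pipeline.

```python
import re
from typing import Dict, Union


def latex_macros(values: Dict[str, Union[int, float]], precision: int = 2) -> str:
    """Return one \\newcommand line per value, sorted by macro name."""
    lines = []
    for name, value in sorted(values.items()):
        # Standard LaTeX command names consist of letters only.
        if not re.fullmatch(r"[A-Za-z]+", name):
            raise ValueError(f"invalid LaTeX macro name: {name!r}")
        # Integers are written as-is; floats are rounded to `precision` digits.
        text = str(value) if isinstance(value, int) else f"{value:.{precision}f}"
        lines.append(f"\\newcommand{{\\{name}}}{{{text}}}")
    return "\n".join(lines) + "\n"


# Hypothetical results of an analysis step, written to a file for the paper.
with open("macros.tex", "w") as f:
    f.write(latex_macros({"numgalaxies": 1234, "meanflux": 5.0}))
```

After `\input{macros.tex}` in the paper's preamble, a sentence such as `We detected \numgalaxies{} galaxies` always shows the value from the latest run. In a Make-based pipeline, `macros.tex` would be a target whose prerequisites are the analysis outputs, so changing any upstream step regenerates it.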
A working template based on the proposed method above has been defined, which can be easily configured for any research project. The … file in that template gives full information on how to configure and adapt the pipeline to your research needs. Please try it out and share your thoughts to make it work more robustly and integrate better in the …
Reproducibility is the main reason this system was designed. However, a reproduction pipeline like this also has many practical benefits, which are listed below. Of course, for all the situations below to be maximally effective, the scripts have to be thoroughly commented for easy human (not just computer) readability.
While doing the research
- Other team members in a research project can easily run/check/understand all the steps written by other members and find possibly better ways of reaching the result or implement their part of the research in a better fashion.
- During the research project, the team might decide to change one of the parameters, or a new version of some of the software used might be released. With this system, updating all the numbers and plots in the paper is as simple as running a make command, and the authors don't have to worry about one part of the paper reflecting the old configuration and another part the new one. Manually trying to change everything in the text is prone to errors.
- If the referee asks for another set of parameters, they can be immediately replaced and all the plots and numbers in the paper will be correspondingly updated.
- Enabling version control (for example with Git) on the contents of this reproduction pipeline makes it very simple to revert everything back to a previous state. This enables researchers to experiment more with alternative methods and new ideas, even in the middle of an ongoing research project. GitLab offers free private repositories, which are very useful for collaborations to privately share their work prior to its publication.
- The authors can allow themselves to forget the details and keep their mind open to new possibilities. In any situation they can simply refer back to these scripts and see exactly what they did. This will enable researchers to be more open to learning/trying new methods without worrying about losing/forgetting the details of their previous work.
After publication
- Other scientists can modify the parameters or the steps in order to check the effect of those changes on the plots and reported numbers and possibly find enhancements/problems in the result.
- It serves as an excellent repository for students, or scientists with different specialties to master the art of data processing and analysis in this particular sub-field. By removing this barrier, it will enable the mixture of the experiences of the different fields, potentially leading to new insights and thus discoveries.
- By changing the basic input parameters, the readers can try the exact same steps on other data-sets and check the result on the same text that they have read and have become familiar with.
Fortunately other astronomers have also made similar attempts at reproduction. A list of papers that I have seen so far is available below. With Nature's new policy regarding software and code in their papers, hopefully this list will greatly expand in the near future. If you know of any other attempts, please let me know so I can update the list.
- Essential skills for reproducible research computing: Contents of a workshop on reproducible research, some slides are also provided, along with a nice literature review and a Manifesto. In general Lorena Barba (author of the links here) and her team are making some great progress in this regard.
- Paxton et al. (2018, arXiv:1710.08424). Astrophysical Journal Supplement Series, 234, 34. Their reproduction system is described in Appendix D6, as part of a larger system for all research using the MESA software.
- Parviainen et al. (2016, arXiv:1510.04988). Astronomy & Astrophysics, 585, A114. The reproduction scripts are available on GitHub.
- Moravveji et al. (2016, arXiv:1509.08652). Monthly Notices of the Royal Astronomical Society, 455, L67. The reproduction information is available on Bitbucket.
- Robitaille et al. (2012, arXiv:1208.4606). Astronomy & Astrophysics, 545, A39. The reproduction scripts are available on GitHub.
Since there is no adopted/suggested standard yet, each follows a different method which is not exactly like this paper's reproduction pipeline. This is still a new concept, so such different approaches help make it more robust. Besides the style suggested here, please have a look at these methods too, adopt your own style (what you find best in each) and share it.
Unfortunately not all researchers have the view described above on scientific methodology. In their view of science, only results are important. Therefore, a vague description (in the text of the paper) is enough and the exact method can be kept as a trade secret. To this class of researchers, doing science is similar to doing magic tricks (where the magician's methods are his/her trade secrets, and the audience only wants results/entertainment). Here is a list of such papers that I have come across so far:
- Zhang et al. 2018 (Nature, June 4th, 2018, arXiv:1806.01280). In the "Code availability" section close to the end (which they had to add due to Nature's new policy on Availability of computer code and algorithm), they blatantly write: "We opt not to make the code used for the chemical evolution modeling publicly available because it is an important asset of the researchers’ toolkits".
Mohammad-reza Khellat and Alan Lefor kindly provided very useful comments during the creation of this reproduction system. Mosè Giordano, Gérard Massacrier, Ehsan Moravveji, Peter Mitchell, and Paul Wilson (in alphabetical order) kindly informed me of some of the links mentioned here.