CRG: New method to ensure reproducibility in computational experiments

News from CRG

To make sense of genomic data, scientists increasingly rely on pipelines: chains of software tools that process raw data and deliver analytical results, such as estimates of genetic risk. Unfortunately, the results of these pipelines are not always reproducible.

Now a team of researchers at the CRG, led by Cedric Notredame, has developed a workflow management system called Nextflow that ensures reproducibility in computational experiments. The system is described in the current issue of Nature Biotechnology.

The main source of irreproducibility is the complexity of modern computers. With all the libraries and software they contain, computers are like machines made of billions of moving parts. Even when the same pipeline is run on the same data, slight differences between computers can lead to different results.

The solution is to distribute not only the data and the software but also the complete, pre-configured execution environment, packaged with a new generation of virtualization technology called containers. The CRG team built Nextflow as a tool that uses these containers to manage a computational workflow together with its dependencies. “It is like freezing the experiment, so everyone aiming at reproducing it can do it the same way without having to manually re-introduce complex configurations. This way of doing things guarantees that the same dataset will produce the same results anywhere,” explain the authors.
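To make the idea concrete, here is a minimal Python sketch of pairing each pipeline step with a container image pinned by its content digest, so that the software environment becomes part of the run's description. It is not Nextflow code (Nextflow pipelines are written in Nextflow's own Groovy-based language) and it does not claim to show how Nextflow works internally. The `Step` class, the `docker_command` and `fingerprint` helpers, the image names, digests and checksums are all hypothetical. The code only builds `docker run` command lines and never executes them.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One pipeline step: a command run inside a container image.

    Hypothetical structure for illustration; this is not a Nextflow API.
    """
    name: str
    image: str                # pinned by digest: "repo@sha256:<hex>"
    command: tuple[str, ...]  # program and its arguments


def docker_command(step: Step, workdir: str) -> list[str]:
    """Build (but do not run) a `docker run` invocation for one step."""
    return [
        "docker", "run", "--rm",   # remove the container when it exits
        "-v", f"{workdir}:/data",  # mount the host data directory at /data
        "-w", "/data",             # run the command from /data
        step.image,
        *step.command,
    ]


def fingerprint(steps: list[Step], input_checksums: dict[str, str]) -> str:
    """Hash everything that defines a run: steps, images, and input checksums.

    Two runs with the same fingerprint used the same commands, the same
    pinned images, and inputs with the same checksums.
    """
    payload = {
        "steps": [[s.name, s.image, list(s.command)] for s in steps],
        "inputs": input_checksums,
    }
    # sort_keys makes the serialization independent of dict insertion order
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


# Hypothetical two-step pipeline; the digests are placeholders, not real images.
steps = [
    Step("align", "example/aligner@sha256:" + "1" * 64,
         ("aligner", "reads.fq", "ref.fa")),
    Step("call", "example/caller@sha256:" + "2" * 64,
         ("caller", "aligned.bam")),
]

cmd = docker_command(steps[0], "/tmp/run1")
print(" ".join(cmd))

# Key order of the (hypothetical) input checksums does not affect the fingerprint.
fp1 = fingerprint(steps, {"reads.fq": "abc123", "ref.fa": "def456"})
fp2 = fingerprint(steps, {"ref.fa": "def456", "reads.fq": "abc123"})
print(fp1 == fp2)
```

Pinning images by digest rather than by tag is the design choice that matters here: a tag such as `latest` can later be moved to a different image, whereas a digest always refers to the same image content.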

Reference article:
Di Tommaso et al. Nextflow enables reproducible computational workflows. Nature Biotechnology 35, 316-319 (2017). doi: 10.1038/nbt.3820