News from CRG
To make sense of genomic data, scientists increasingly rely on combinations of software tools known as pipelines. These pipelines process data and deliver analytical results, such as estimates of genetic risk. Unfortunately, the results of these pipelines are not always reproducible.
Now, a team of researchers at CRG, led by Cedric Notredame, has developed a workflow management system that ensures reproducibility in computational experiments. The system, named Nextflow, is described in the current issue of Nature Biotechnology.
The main reason for irreproducibility is the complexity of modern computers. With all the libraries and software they contain, computers are like machines made of billions of moving parts. Even when exactly the same pipeline and the same data are used, slight variations between computers can lead to different results. The solution to this problem is to provide not only the data and the software, but also the complete pre-configured execution environment, packaged using a new generation of virtualization technology known as containers. The CRG team implemented Nextflow as a tool that manages a computational workflow together with its dependencies by using these containers. “It is like freezing the experiment, so everyone aiming at reproducing it can do it the same way without having to manually re-introduce complex configurations. This way of doing things guarantees that the same dataset will produce the same results anywhere,” explain the authors.
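The idea of "freezing" an experiment can be illustrated with a small Python sketch. It is not Nextflow code and does not use any container technology. It only assumes, for illustration, that an experiment is described by three things: the pipeline, the data and the execution environment. The function `fingerprint`, the tool names and the version numbers are all hypothetical.

```python
import hashlib
import json


def fingerprint(pipeline: str, data: bytes, environment: dict) -> str:
    """Return a SHA-256 digest covering pipeline, data and environment.

    Hypothetical helper for illustration only; not part of Nextflow.
    """
    h = hashlib.sha256()
    h.update(pipeline.encode("utf-8"))
    h.update(b"\0")  # separator between the three parts
    h.update(data)
    h.update(b"\0")
    # Sort keys so the digest does not depend on dictionary ordering.
    h.update(json.dumps(environment, sort_keys=True).encode("utf-8"))
    return h.hexdigest()


# Made-up pipeline description, data and library versions.
pipeline = "align reads; call variants"
data = b"ACGTACGT"

# Two laboratories with the same pipeline and data, but one library differs.
lab_a = {"aligner": "1.0.2", "libc": "2.27"}
lab_b = {"aligner": "1.0.3", "libc": "2.27"}

# A frozen environment that is shipped together with the workflow.
frozen = {"aligner": "1.0.2", "libc": "2.27"}

# Without a shared environment, the two experiments are not the same experiment.
drifted = fingerprint(pipeline, data, lab_a) != fingerprint(pipeline, data, lab_b)

# With the frozen environment, every site runs an identical experiment.
same_everywhere = (
    fingerprint(pipeline, data, frozen) == fingerprint(pipeline, data, dict(frozen))
)

print("environments drifted:", drifted)
print("frozen environment matches everywhere:", same_everywhere)
```

In this sketch a single changed library version is enough to make the two laboratories' experiments differ, even though the pipeline and the data are identical. Shipping the frozen environment with the workflow removes that source of variation.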
Di Tommaso et al. Nextflow enables reproducible computational workflows. Nature Biotechnology 35, 316-319 (2017). doi: 10.1038/nbt.3820