Connecting the dots with combined data

Biologist Bas Heijmans (LUMC) is the leader of the Biobank-based Integrative Omics Study (BIOS) Consortium, one of BBMRI-NL’s Rainbow projects. Together with other Principal Investigators – Joyce van Meurs (Erasmus MC), Rick Janssen (VUmc), Peter-Bram ’t Hoen (LUMC) and Lude Franke (UMCG) – he developed a large-scale omics data infrastructure. The infrastructure is the first of its kind in the Netherlands, and probably worldwide. “When we presented it to NIH representatives in the US, they were quite impressed.”

Heijmans initiated the BIOS project six years ago in close collaboration with Lude Franke, under BBMRI-NL 1.0. The goal of the project was to develop a research infrastructure that could help explain the pathway from genes to disease development. “We know a lot about genetic variation, but we still don’t understand how genetic risk factors lead to diseases. Bringing all available data together could contribute to a better understanding and eventually to better treatment.” This month two papers about this infrastructure were published in the renowned journal Nature Genetics (see press release).

Understanding the pathway
BIOS brings together different types of omics data, such as genome, gene expression, DNA regulation and methylation data, from over 4000 individuals in Dutch population studies. Heijmans: “We started with the idea that we wanted to understand the mechanism by which gene-related factors contribute to the development of diseases.” The role of the environment is increasingly acknowledged. “Now we are going towards a broader vision. We want to understand the various causes of diseases in groups of people, looking at both genetic and environmental factors.” Although Heijmans states there is still a long way to go, the end goal is clear. “In the long term, we hope that omics data will enable us to distinguish between different causes of the same disease and develop personalized treatments for them.”

A logistic process
The BIOS omics infrastructure is based on six large Dutch population studies: LifeLines (Groningen), Leiden Longevity Study (Leiden), CODAM Study (Maastricht), Rotterdam Study, Prospective ALS study the Netherlands (Utrecht) and Netherlands Twin Register (Amsterdam). “These six population studies not only had data on genetics and many disease characteristics but also rooms full of freezers filled with DNA and RNA samples isolated from blood. The DNA and RNA were put in big boxes with dry ice and sent to Rotterdam to generate new omics data. The RNA was sequenced to obtain gene expression data and the DNA was put on a chip to get genome-wide data on DNA methylation, an important epigenetic mark. The resulting terabytes of data were sent to SURFsara through an extra-fast lightpath connection. The largest task actually was to harmonize, clean and link the multiple data types of all study participants. But now the BIOS omics data infrastructure is there and researchers can apply for access and analyse the data using the fast computers at SURFsara.”

Scientific output
The first freeze of the data set has been open to researchers since 2014. Over 20 papers have already been published. BIOS, however, is more than just a data set. “Equally important is the leading expertise we built up in the Netherlands and the opportunities the data provide for young scientists to advance their careers. The great scientific output shows that it works,” Heijmans says. “It saves researchers a lot of time: the data is there, ready to use. All hurdles have been cleared; all you have to do as a scientist is focus on your research questions and scientific creativity.” In addition to the accessible data, BIOS also delivers methodological standardization, atlases, and reference data with open-access results.

Open for researchers
The BIOS data infrastructure is open to all researchers working on omics studies. They can submit a project proposal that is evaluated by the management team of BIOS and by the principal investigators of the population cohort studies. So far there have been close to 50 applications to use the BIOS data. Not enough, according to Heijmans: “Every now and then I come across researchers who are working with only tens of samples. But for reliable results and for answering more interesting questions you will need the thousands of samples we offer. I can’t stress it enough: every PhD student, every postdoc who works on omics studies should have a look at this infrastructure!”

Increasing the data sets
Heijmans has big plans for the future. “I hope that in the future we can include more longitudinal studies, with a stronger focus on environment. It would be even better if we could combine this with data from hospitals; that would be a great combination. But before we can take that step we have to solve many technical and privacy issues.”

As a biologist, Heijmans is really excited about all the developments within BIOS: “Instead of doing experiments in a lab, understanding how diseases develop in different people by means of calculations with data. That is what I find really intriguing!”

Newsletter December 2016