Case Study

Synthetic data generation for data sharing with Lifelines

About the client

Lifelines has conducted a multigenerational cohort study since 2006, with over 167,000 participants, to collect relevant data and biosamples. The data relates to lifestyle, health, personality, BMI, blood pressure, cognitive abilities, and more. Lifelines makes this valuable data available, which makes it an essential resource for national and international researchers, organizations, policymakers, and other stakeholders that typically focus on preventing, predicting, diagnosing, and treating diseases.

The situation

As a biobank on a mission to make its data more accessible for researchers, organizations, policymakers, and other stakeholders, Lifelines needs strategic solutions in place to safeguard the privacy of its participants. Hence, Lifelines partners with Syntho to synthesize the data, enhancing its accessibility while preserving the privacy of participants. As an alternative to using real data, everyone now has the possibility to work with synthetic data. Anyone interested in the data is encouraged to reach out for further information and support.

The solution

Before adopting a new solution, Lifelines wanted to evaluate synthetic data and Syntho in practice through an initial evaluation study. In this study, it assessed and approved synthetic data from Syntho on accuracy, privacy, and ease of use, in comparison to open-source and commercial solutions. For this dataset, geographical location and longitudinal data are crucial. As a sneak preview, we can see the distributions of postal codes of participants for the real data, the synthetic data, and a comparison graph between real data and synthetic data. Because the graphs overlap closely, Lifelines concluded that fidelity and accuracy are preserved. This is only one element of the evaluation; other results are available on request.
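The visual comparison of postal code distributions described above can also be expressed as a single number. The sketch below is one possible way to do that, not the method used in the Lifelines evaluation: it computes the total variation distance between the category shares of a real and a synthetic column. The function names and the sample values are hypothetical and chosen only for illustration.

```python
from collections import Counter


def category_shares(values):
    """Return the relative frequency of each category in `values`."""
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def total_variation_distance(real_values, synthetic_values):
    """Half the summed absolute difference between the two share tables.

    0.0 means identical distributions, 1.0 means no overlap at all.
    """
    real = category_shares(real_values)
    synthetic = category_shares(synthetic_values)
    categories = set(real) | set(synthetic)
    return 0.5 * sum(
        abs(real.get(c, 0.0) - synthetic.get(c, 0.0)) for c in categories
    )


# Toy, made-up postal code regions for illustration; not Lifelines data.
real_regions = ["97", "97", "98", "99", "97", "98"]
synthetic_regions = ["97", "98", "98", "99", "99", "97"]

tvd = total_variation_distance(real_regions, synthetic_regions)
print(f"Total variation distance: {tvd:.3f}")
```

A value close to 0 corresponds to graphs that overlap closely. Such a score covers only one column at a time, so it would complement, not replace, the privacy and longitudinal checks mentioned in the evaluation.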


Researchers, organizations, policymakers, and other stakeholders now have the opportunity to receive synthetic datasets

This successful evaluation of synthetic data generated by Syntho marks a significant step forward for Lifelines in leveraging new solutions to make its data more accessible while preserving the privacy of participants. Lifelines now uses synthetic data generation to create artificial datasets that mirror the statistical properties of the real data without compromising participant privacy. Consequently, researchers, organizations, policymakers, and other stakeholders with an interest in this data now have the opportunity to receive customized synthetic datasets, generated in collaboration with Syntho. By embracing synthetic data, Lifelines boosts access to data and accelerates research while maintaining a high level of privacy protection for its participants. This underlines its commitment to both scientific advancement and privacy preservation.

The benefits

Faster access to data

Synthetic data allows for faster access to data by minimizing compliance paperwork and procedures. This enables data users to start their analysis sooner, test hypotheses faster, and reach results earlier.

Preserve the privacy of participants

With synthetic data, participant information remains secure and sensitive details stay protected. Privacy-enhancing techniques such as synthetic data give participants more confidence that their data is protected, encouraging their active participation in research projects. This fosters trust in the biobank as a reliable resource and further strengthens participant engagement.

Increased accessibility of data

Synthetic data opens new possibilities for sharing information with organizations that prefer not to access real data or that have access to only minimal data. This approach increases data accessibility while mitigating the risks associated with sharing actual data.

Preview data before buying with a data catalog

With data commercialization, potential buyers often prefer to preview the data, for example in a sandbox environment, before making a purchase. However, using real data for previews is problematic because of compliance paperwork requirements and the risk of devaluing the data if it is exchanged beforehand. A synthetic data catalog overcomes these challenges by allowing prospective buyers to preview data conveniently, which supports the commercialization process.

Organization: Lifelines

Location: The Netherlands

Industry: Healthcare

Size: 100+ employees

Use case: Analytics

Target data: Healthcare historical data 

Website: On Request
