Synthetic data generation for data sharing with Lifelines

Details
Organization

Lifelines

Location

The Netherlands

Industry

Healthcare

Size

100+ employees

Use case

Analytics

Target data

Healthcare historical data

About the client

Lifelines has conducted a multigenerational cohort study since 2006, with over 167,000 participants, to collect relevant data and biosamples. The data covers lifestyle, health, personality, BMI, blood pressure, cognitive abilities, and more. Lifelines makes this data available, which makes it an essential resource for national and international researchers, organizations, policymakers, and other stakeholders who typically focus on preventing, predicting, diagnosing, and treating diseases.

The situation

As a biobank on a mission to make its data more accessible to researchers, organizations, policymakers, and other stakeholders, Lifelines needs strategic solutions that safeguard the privacy of its participants. Lifelines therefore partners with Syntho to synthesize the data, enhancing its accessibility while preserving participant privacy. As an alternative to using real data, everyone now has the possibility to work with synthetic data. Anyone interested in the data is encouraged to reach out for further information and support.

The solution

Before adopting a new solution, Lifelines wanted to evaluate synthetic data and Syntho in practice through an initial evaluation study. In this study, it approved synthetic data from Syntho on accuracy, privacy, and ease of use, in comparison with open-source and commercial solutions. For this dataset, geographical location and longitudinal data are crucial. As a sneak preview, the evaluation plotted the distribution of participants' postal codes for the real data and for the synthetic data, together with a graph comparing the two. Because the graphs overlap closely, Lifelines concluded that fidelity and accuracy are preserved. This is only one element of the evaluation; other results are available on request.
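
To make the postal code comparison more concrete, here is a minimal sketch of one common way to put a number on how closely two categorical distributions overlap: the total variation distance. This is an illustration only, not the method used by Lifelines or Syntho. The function names and the postal code values below are hypothetical and made up for the example.

```python
from collections import Counter


def category_shares(values):
    """Return the relative frequency of each category in `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def total_variation_distance(real_values, synthetic_values):
    """Half the summed absolute difference between two categorical distributions.

    0.0 means the category shares are identical; 1.0 means they do not overlap.
    """
    real = category_shares(real_values)
    synthetic = category_shares(synthetic_values)
    categories = set(real) | set(synthetic)
    return 0.5 * sum(
        abs(real.get(c, 0.0) - synthetic.get(c, 0.0)) for c in categories
    )


# Made-up postal code prefixes, purely for illustration (not Lifelines data).
real_codes = ["9711", "9711", "9712", "8911", "8911", "8911", "7811", "9401"]
synthetic_codes = ["9711", "9712", "9712", "8911", "8911", "8911", "7811", "9401"]

distance = total_variation_distance(real_codes, synthetic_codes)
print(f"Total variation distance: {distance:.3f}")
```

A value close to 0 corresponds to graphs that overlap closely. A single number like this complements a visual comparison but does not replace it, and it says nothing about privacy.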

Researchers, organizations, policymakers, and other stakeholders now have the opportunity to receive synthetic datasets

This successful evaluation of synthetic data generated by Syntho marks a significant step forward for Lifelines in using new solutions to make its data more accessible while preserving the privacy of participants. Lifelines now uses synthetic data to create artificial datasets that mirror the statistical properties of real data without compromising participant privacy. Consequently, researchers, organizations, policymakers, and other stakeholders with an interest in this data now have the opportunity to receive customized synthetic datasets, generated in collaboration with Syntho. By embracing synthetic data, Lifelines boosts access to data and accelerates research while maintaining the highest level of privacy protection for its participants. This underlines its commitment to both scientific advancement and privacy preservation.

The benefits

Faster access to data

Synthetic data allows for faster access to data by minimizing compliance paperwork and procedures. This lets data users run analyses, test hypotheses, and reach results sooner.

Preserve the privacy of participants

With synthetic data, participant information remains secure and sensitive details stay protected. Privacy-enhancing techniques such as synthetic data give participants more confidence that their data is protected, which encourages their active participation in research projects. This fosters trust in the biobank as a reliable resource and further strengthens participant engagement.

Increased accessibility of data

Synthetic data opens new possibilities for sharing information with organizations that prefer not to access real data or that have access to only minimal data. This approach increases data accessibility while mitigating the risks associated with sharing actual data.

Preview data before buying with a data catalog

With data commercialization, potential buyers often prefer to preview the data before making a purchase, for example in a sandbox environment. However, using real data for previews is problematic because of compliance paperwork requirements and the risk of devaluing the data if it is exchanged beforehand. A synthetic data catalog can overcome these challenges by letting prospective buyers preview data conveniently, which supports the commercialization process.
