Potential fallout from the recent UK Biobank data exposure – and similar cases – highlights that governance cannot be treated as an administrative add-on. It is foundational infrastructure on which the Health Data Research Service (HDRS) must build.
A few weeks ago, UK Biobank was at the centre of controversy following reports that its volunteers’ data had been made available for sale on the Chinese retail platform Alibaba. Although UK Biobank stated that the data were not provided by the organisation itself and that action was taken once the listings were identified, the incident further complicated an already fragile public narrative. In response, all access to its research platform has been suspended while additional security measures, including a restriction on the size of files researchers can export, are put in place.
These concerns built on issues raised just over a month earlier, when a report from the Guardian revealed that confidential health data from UK Biobank, one of the world’s most celebrated biomedical research resources, had been repeatedly exposed online after being inadvertently uploaded by approved researchers. While no widespread re-identification has been demonstrated, the episode has reignited concerns about the security of health data once they leave controlled environments and about the adequacy of governance mechanisms designed to protect public trust.
For many in the health data community, these incidents are not entirely surprising, because they reflect well‑known risks inherent in the de-identification and sharing approach to data access. As critics have long argued, once data leave their original controlled environment there is little effective oversight: copies can be made, combined, and reused beyond the reach of technical controls, real‑time monitoring, or meaningful enforcement.
For the public, however, they are yet more examples in which health data provided altruistically for public benefit appear to have been mishandled. As the UK prepares to operationalise the Health Data Research Service, the question is not whether incidents like these matter, but whether their lessons are being taken seriously enough.
A public informed through controversy
The UK public is well engaged and informed when it comes to health data use, and is generally supportive of it. Yet much of this familiarity has been shaped not by proactive communication or consistent transparency but by high-profile controversies surrounding government data initiatives, such as care.data or the GP Data for Planning and Research (GPDPR), which came to be widely referred to as a “data grab”.
Each case is distinct, yet together they have had a cumulative effect. They reinforce a perception that ambitious health data initiatives are launched before governance arrangements are adequately tested or clearly explained. By the time weaknesses emerge, considerable public trust and institutional credibility may already have been lost. In this context, assurances that no breach occurred or that data were de-identified do little to address the underlying concern. Trust cannot be built retrospectively, nor maintained through technical explanations delivered only after problems arise.
What actually happened
Neither of the recent UK Biobank data exposures was caused by a cyber attack or by malicious intent. The earlier exposure arose because approved researchers are permitted to download datasets to local systems. A small number of these researchers unintentionally included portions of those datasets when sharing analytical code online.
The subsequent Alibaba incident appears to reflect a different but related vulnerability. Once data have been shared, opportunities for redistribution increase significantly even where terms of use prohibit this. These episodes illustrate how quickly oversight can weaken once data move outside controlled environments.
Together they highlight a structural vulnerability rather than an isolated failure. When sensitive health data are allowed to move beyond secure processing environments, even with safeguards, control is partially relinquished. This also exposes the limits of relying on de-identification as a primary safeguard at a time when data linkage techniques and external information sources are rapidly expanding.
Why this matters for the Health Data Research Service (HDRS)
The HDRS has been positioned as a secure, streamlined route to data access in which analysis takes place within trusted research environments. The £500 million commitment by the UK government, alongside a £100 million pledge from Wellcome, signals the priority being placed on health data as national infrastructure and reflects the high expectations attached to the success of the service.
But success will not be determined by technical capability alone. The HDRS will be judged on whether it can convincingly demonstrate that it will avoid repeating known failures. This requires engaging directly with incidents like the UK Biobank exposure rather than treating them as historical anomalies. They should be approached as real-world stress tests that reveal where governance systems, assumptions, and incentives may be misaligned.
Importantly, claims that data are pseudonymised or that re-identification is unlikely are increasingly insufficient as a basis for reassurance. Public awareness of data linkage capabilities continues to grow, including the ability to combine health records with other datasets such as genomic data, administrative records, or commercially available information to infer identities or sensitive attributes. Confidence therefore depends on governance that explicitly acknowledges and addresses this reality.
Additionally, as public awareness of artificial intelligence (AI) grows, the HDRS needs a clear and credible strategy for governing AI development within secure environments. This includes addressing how models are trained, audited, and evaluated, and how risks such as memorisation are mitigated in practice. Simply asserting that analysis occurs in a secure setting is not enough. The limits of what those environments can safely support must be openly communicated.
Governance as infrastructure
Perhaps the most important lesson from these episodes is that governance cannot be treated as an administrative add-on. It is foundational infrastructure.
In other jurisdictions this has been made explicit. For example, the European Health Data Space (EHDS), the EU’s recently adopted regulation governing the primary and secondary use of health data, mandates safeguards for secondary use, including the routine use of secure processing environments to access anonymised data wherever possible and justification when more identifiable data are required. These are applied as system-wide standards rather than optional best practice.
The HDRS presents an opportunity to establish similar expectations across the UK health data ecosystem, not simply as a new service but as a mechanism for setting consistent governance requirements across data holders. Additionally, as the use of emerging technologies such as AI gives rise to novel threats, those requirements will need to be actively maintained, reviewed, and adapted rather than treated as a one‑off exercise. Without this, the risk is that fragmentation will persist and that weaknesses will re-emerge elsewhere in the system.
Too much to lose
Too much has been invested in the HDRS, financially and politically, and too much is now expected of it by the research community, for it to be undermined by avoidable governance failures. But the consequences of failure are not only institutional. Failure would also be a breach of duty to the individuals whose data the system is entrusted to protect. Getting this wrong would shape public attitudes to health data use for years to come, affecting research, service planning, and innovation across the system.
Trust does not depend on the absence of risk. It depends on honesty about where risks exist, clarity about how they are mitigated, and accountability when safeguards fail. If the HDRS is to succeed, it must confront uncomfortable precedents openly and use them to inform stronger system design.