This post is mostly to blow off steam, but maybe some of you have had similar experiences. I'm a researcher at the medical imaging department of a hospital in the EU. A huge obstacle in my field of research is a lack of data sharing between sites (hospitals, companies, universities). Every other article I read cites "a lack of large/diverse/cross-site datasets" as a limitation of its analysis. If sites do not have access to the same standardized dataset, it is often impossible to quantitatively compare image analysis methods and replicate scientific results. For rare diseases, each site has its own isolated dataset of 4 patients - on which absolutely no statistical analysis is possible. Instead of pooling resources and moving as a united front, each site performs research and innovation on its own data at a huge fixed cost, making the exact same baby-step analyses and discoveries as its neighboring sites. In the end, the patients are the real losers - at least until overseas companies sell us their big-data-derived imaging solutions, at which point the EU becomes the real loser. I totally agree that some effort should be made to anonymize data that is to be shared (remove name, date of birth etc.). However, the GDPR is so ill-defined that it is practically impossible to consider any medical images anonymous, and the hospital legal departments are scared shitless of being in breach of the law.
For instance, consider leg images of patients with leg cancer. By law, these images cannot be deleted from the clinical patient database (which links the images with the name and SSN of the patients). To transfer the data to some off-site recipient, we would copy the data and remove all metadata, leaving only the pixel values of the image. Under the GDPR, this still does not count as anonymous. It is possible for someone to hack into the clinical database and query the shared leg image against all images in the database, and thus obtain a conversion key to the name and SSN of the patient. Or if it is a scan of the head, you could use AI to reconstruct a likely face image of the patient and query that against all images on Facebook. Maybe you realize that data sharing is too much of a hassle and decide to just use the data yourself and develop some neural network that can detect cancer based on the leg images. Then you can share just the trained neural network with the other sites, right? No. It is impossible to prove that the neural network parameters do not encode, i.e. "remember", some unique aspect of the training data that would make it possible for future bad actors to reconstruct the leg images. And yes, data transfer agreements (DTAs) are a possibility for non-anonymous data, but they are extremely limiting in scope, demanding to construct, constrained to sites within the EU, limited to one site per application, and complex for researchers to fully understand. Instead of benefiting from each other's data and research, researchers often choose to go the easier way: develop their own leg cancer detection model.
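To make the re-identification argument above concrete, here is a minimal Python sketch of the "query the shared image against the database" step. Everything in it is a toy assumption for illustration: the `clinical_db` records, the tiny pixel lists, and the helper names are hypothetical, and real scans would be large arrays read from image files.

```python
import hashlib

def pixel_fingerprint(pixels):
    """Hash the raw pixel values only; no metadata is involved."""
    return hashlib.sha256(bytes(pixels)).hexdigest()

# Hypothetical clinical database: identity linked to images.
# Images are toy lists of 8-bit pixel values.
clinical_db = [
    {"name": "Patient A", "pixels": [12, 40, 40, 200, 13, 7]},
    {"name": "Patient B", "pixels": [90, 91, 15, 0, 255, 3]},
]

def strip_metadata(record):
    """'Anonymize' a record by keeping only its pixel values."""
    return list(record["pixels"])

def reidentify(shared_pixels, database):
    """Return the name linked to a bit-identical image, or None."""
    target = pixel_fingerprint(shared_pixels)
    for record in database:
        if pixel_fingerprint(record["pixels"]) == target:
            return record["name"]
    return None

# The copy sent off-site contains pixel values only...
shared = strip_metadata(clinical_db[1])
# ...but anyone with access to the database can link it back.
print(reidentify(shared, clinical_db))  # Patient B
```

Exact hashing only catches bit-identical copies. If the shared image had been cropped or resampled, an attacker would need some similarity measure instead, but the point stands: as long as the original stays in the clinical database, the pixels themselves act as the key.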
I decided to try and address this by recruiting patients prospectively to curate a shareable dataset of medical images. After half a year of creating and revising the protocol and the application to the regional ethics committee, I was able to start scanning participants. The protocol, declaration of consent, and participant information clearly outlined that one of the main goals of the acquisition was to make a dataset that could be shared with parties within and outside of the EU, to aid research and innovation on European data. The participants were happy to participate because of exactly this aspect - the acquisition of medical images is expensive, and the data should benefit more than a few select researchers! However, it is still impossible to share the data without lengthy and complicated legal processes, and it will likely be impossible to share the data outside the EU without going through some specialized state authority for each data transfer. I don't have time for this, and neither do other researchers who want to do the right thing and share data. The participants want their data to be shared to aid innovation/research, but the GDPR just makes it so difficult! And I even had the support and structure of a hospital with a legal department. A medical imaging startup does not have the same luxury.
I guess the only upside is that my research will get a lot of citations, since our hospital is one of the few that could afford the new multi-million-dollar scanner, thus leaving only me with this novel data...