The New England Journal of Medicine (hereafter NEJM) landed in hot water in January over an editorial it published on data sharing. The authors of the piece are concerned that data may be misused or misinterpreted when put to secondary uses beyond the original intent of its collection and, perhaps more pertinently, that a ‘new class’ of researcher will emerge who is not at the coal face of data collection:
people who had nothing to do with the design and execution of the study but use another group’s data for their own ends, possibly stealing from the research productivity planned by the data gatherers, or even use the data to try to disprove what the original investigators had posited. There is concern among some front-line researchers that the system will be taken over by what some researchers have characterized as “research parasites.”
As anyone with a competent understanding of human communication might have predicted, many people took umbrage at that last word: #researchparasites became a reasonably notable trend on Twitter and garnered its own amusing account. The editorial was deeply flawed, not only for spending its opening paragraph on ‘research methods: secondary data 101’, but also for its pleasant but somewhat wishful conclusion. Of course researchers should acknowledge the origin of their data, but always proposing a collaboration with those who originally gathered it is easy to say and much harder to do. Lastly, as the @dataparasite Twitter account points out, on what basis does a researcher claim ownership of, and rights to, data from research that is not privately funded (particularly in medicine)?
What is particularly interesting about this article and the debate it ignited (or soured, depending on how you look at it) is that it seems blind to the increasingly interdisciplinary trend in data science, of which medicine is an obvious part, for fresh eyes to find new ways to understand a problem. This is exemplified by the winners of Kaggle’s Data Science Bowl competition, Tencia Lee and Qi Liu, who created a neural network algorithm to identify heart disease from MRI scans. Lee and Liu’s backgrounds are in the financial industry as ‘quants’, a name as dated as the braces you still spot on certain members of the City who did not grasp that Gordon Gekko was not a heroic figure. I cannot think of a field further away from medicine than finance, and had this been outside the context of a Kaggle competition, people like Lee and Liu might well have been in the line of fire as ‘data parasites’. Any application of their algorithm is still a few years of strenuous testing away from deployment in healthcare, but it reflects a more general trend in the field towards algorithmic pattern recognition of pathologies, such as cancers and brain disease, and even towards improving the scanners themselves. It has already been well stated that the NEJM’s article was not a step in the right direction, to the extent of retraction, but perhaps we are better off emphasising this interdisciplinary approach itself, instead of trying to rebuild, or reinforce, ivory towers.