Open Humans: A platform for participant-centered research and personal data exploration

Abstract

Background

Many aspects of our lives are now digitized and connected to the internet. As a result, individuals are now creating and collecting more personal data than ever before. This offers an unprecedented chance for human-participant research ranging from the social sciences to precision medicine. With this potential wealth of data come practical problems (such as how to merge data streams from various sources), as well as ethical problems (such as how to best balance risks and benefits when enabling personal data sharing by individuals).

Results

To begin to address these problems in real time, we present Open Humans, a community-based platform that enables personal data collections across data streams, giving individuals more personal data access and control of sharing authorizations, and enabling academic research as well as patient-led projects. We showcase data streams that Open Humans combines (e.g. personal genetic data, wearable activity monitors, GPS location records and continuous glucose monitor data), along with use cases of how the data facilitates various projects.

Conclusions

Open Humans highlights how a community-centric ecosystem can be used to aggregate personal data from various sources as well as how these data can be used by academic and citizen scientists through practical, iterative approaches to sharing that strive to balance considerations with participant autonomy, inclusion, and privacy.

Article activity feed

  1. Abstract

    **Reviewer 1. Joon-Ho Yu** Thank you for the opportunity to review this manuscript. Overall, I appreciate this argument for and description of Open Humans. Broadly, the manuscript would benefit from greater attention to writing and organization. As my comments below describe, the "ethical analysis" offered is narrowly focused and appears to serve as a justification for the resource; in its current state, I think the ethical analysis should either be removed or expanded. Ideally, the manuscript would be strengthened by a deepening and broadening of ethical considerations. Note that I use P(page)C(column)L(lines) to locate my comments for the authors.

    1. Abstract P1L36-37. I am struck by the framing of this ethical problem as the responsibility of data subjects. I assume this is intentional and would appreciate a little more, perhaps in the introduction, as to what is entailed in this responsibility.
    2. Abstract P1L42-43. I am not sure if the framing of the ethical problem is resolved by the description of the utility of Open Humans. While overall I suggest deepening the ethical problems presented, an alternative is to leave the ethical framing out altogether.
    3. P2C2L6-9. It would help me if parties were more clearly stated.  I think you mean researchers not research and it isn't clear to me that commercial data sources have interests but rather the companies that hold these resources do, right?
    4. P2C2 Participant Involvement.  It is unclear to me what the purpose of this section is.
    5. P2C1 Data Silos. Most of the descriptive language is written in the passive voice, which I understand may be the norm, but in my opinion it unintentionally highlights how interests and responsibilities are dissociated or dis-located from stakeholders. For instance, in the section on Data Silos, it remains unclear for whom Data Silos are a problem and whose interests have created and maintained these silos. Again, this sort of analysis might help identify or locate solutions rather than only set up a problem that Open Humans solves. My point here is that the developers of Open Humans need not rely on a somewhat limited ethical analysis to justify its existence and argue for its utility.
    6. P2C1L44-49. While I agree this is an accurate reflection of the scope of the literature, the issues raised by "big data" research now extend far beyond the common risks relayed in a consent process.
    7. P2C1L49-51. I agree that this is an important issue but this single statement citing Barbara Evans sounds a little like a strawman.  My sense is that through the efforts of many patient-driven organizations, patient and participant-driven research has increased a great deal in the past decade or so.  Perhaps this ought to be recognized especially given that many of the authors have been critical to the development of this movement.  Also, the next section on participant involvement seems at odds with the argument so some clarification might help readers understand the nuances.
    8. P2C2L53-61.  While I totally agree and appreciate these key points to the participant-centered approach to research, in all honesty, I did not come to these conclusions based on the above exposition.  I suggest moving this up as the scaffold for the introduction and reorganize based on these points.
    9. P3C1L30-36. These are the main points I think readers need in the introduction to help us understand the need for Open Humans.  I suggest you spend more time explaining these points and characterizing the evidence of these important assertions.
    10. P3C2L46-50. Could you explain the rationale behind this feature and briefly describe if more detailed information is conveyed about the IRB approval or review/determination?
    11. P4C2L25-27. This is an important statement, at least to me, but it would be helpful to reiterate how privacy is maintained. I'm assuming it is because it's pseudonymous?
    12. P4C2L27-30. Again, what are the simple requirements?
    13. P5C1L58-C2L59. So what are the ethical implications of this use case? I think an important point to highlight is that privacy may be a nominal issue for members of efforts like Open Humans, as they often have a greater than average interest in research benefits than in maintaining individual privacy. Further, I'm under the impression that personal privacy is less of a concern for many, or rather that our sense of what is private is changing. Assuming I'm understanding the argument, what I'm confused about is that the ethical analysis presented in the background assumes that privacy is of central, perhaps even sole, concern. Also, there are many other ethical issues that Open Humans both addresses, possibly in a positive way, and potentially raises as risks to members and even society. So, I would welcome that analysis alongside this nice introduction to the platform, or I would not rest the argument for the platform on a relatively narrow ethical frame.
    14. P6C2L16-21. Do you mean the public data are being used as training sets for the algorithms?  Are there any risks of bias based on these sorts of uses?
    15. P6C1L44-45. So are there any ethical issues related to the application of OAuth2 to these particular use cases or overall?  This isn't a trick question, I have no idea but would encourage the authors to consider based on their expertise.
    16. P7C2L9-11. Agreed, but does it also make it harder for bad actors to use these data?  It would be great if the authors could help us think about this potential trade off.
    17. P7C1 Discussion. I would like the authors to consider the following in the discussion and possibly the introduction. (1) Given that most people who engage in citizen science in the biomedical research space are likely to subscribe to the value of openness and sharing of samples, data, tools, etc., I wonder if focusing on privacy as the key ethical barrier is on target and sufficient. For instance, many of the challenges to genomic research articulated by historically vulnerable populations have to do with offensive data uses, lack of control, lack of direct benefit, differential benefit based on SES, risks to groups, etc. Again, a critical analysis of how this resource might increase or decrease such risks involved in citizen science would contribute to the larger project of extending citizen science or patient-led research to community-led research. Of course, I understand this might be outside the bounds of this manuscript, but that need not preclude some consideration. (2) I very much appreciate Open Humans as a tool that addresses the practical problem of bridging/linking/aggregating. I have no problems with this argument, yet I wonder if it is somewhat naive to assume that bridging as a practical benefit does not also risk other ethical challenges. For example, the ease of bridging to pre-selected resources blurs the line between simply linking resources and advancing particular interpretations of the data, in fact, one's own data. If I understand Open Humans, it is a tool that automates protocols for linking and sharing intended to facilitate citizen science and patient-led research. The practical benefits are clear. But what are the risks associated with more automated linking and sharing?
    18. P7C2 Enabling individual-centric research and citizen science. This section is very helpful and references a number of mechanisms that begin to address, at least on an individual level, issues such as "to what uses", "control", "governance", etc. I would love to see this description expanded and moved up into the initial description of the resource (maybe before or around P2C2L57) and/or these functional benefits better incorporated and explicated in the use cases.
    19. P8C1L13-16. It is unclear to me how it is "an ethical way", especially as it isn't clear to me what an "unethical way" would entail. I think some pieces are presented, but this argument could be much stronger and clearer. I get that the benefits are assumed here to some extent (I've been in the same place when engaging in resource development), but perhaps a greater consideration of potential benefits and harms might help balance the focus on privacy and individual control. Generally, when we conduct ethical analysis we consider autonomy (where privacy sits), risks (as potential harms and, increasingly, benefits), and justice. Notably, others might argue for other principles and values. While such a comprehensive analysis isn't the focus of this manuscript, incorporating the insights of such an analysis would, in my opinion, strengthen the argument for Open Humans and signal/evidence robust consideration by its designers and authors.
  2. Background

    **Reviewer 2. Birgit Wouters.**

    In this paper, the authors have presented an innovative solution to the complex and multi-faceted problem of sharing personal (health) data. Open Humans, a community-based platform, serves multiple aims: (1) to be ethically justifiable: a. by focusing upon granular, individual consent for each single project, thereby avoiding the issue of compatible purposes for secondary/tertiary/... processing; b. by putting individuals in control of their personal dataset; and c. by involving them in the governance of the ecosystem; (2) to enable both academic and citizen-led research; and (3) to break open existing data silos and allow for the merging of datasets. Serving these aims simultaneously is undoubtedly ambitious. Yet, the authors have demonstrated how Open Humans is designed to do just that. The community-based platform has clearly been carefully designed, and the presentation of the design and the use cases is clear, well-written and easy to follow.

    Whilst Open Humans is an interesting and promising project, my comments center around the ethical justifiability of this community-based platform. Further clarification and/or elaboration on these comments is strongly recommended.

    One important goal of Open Humans is for research to be driven by the individuals the data come from by putting them in control of their data. The level of control is described as 'full control'. In addition, putting the participant in control of their data is regarded as important given the more sensitive context of precision medicine. Under "Data Silos", the authors also mention that, next to other legislation, the General Data Protection Regulation is applicable and that the right of data portability has the potential to break open these silos. My main critique is that the article insufficiently takes into account the particularities of the General Data Protection Regulation.

    WHAT CONSTITUTES CONTROL?

    Firstly, under the General Data Protection Regulation, the individual has the following rights: right to be informed, right of access, right to rectification, right to be forgotten, right to restriction of processing, right to data portability, the right to object and, albeit less relevant in this context, rights in relation to automated decision-making. Yet, in relation to scientific research, most Member States of the European Union allow for the right of access, the right to rectification, and the right to restriction of processing to be denied. The article very briefly mentions data access, within the context of human subjects research, to be recommended but not legally required. However, it does not make mention of the other two deniable rights (right to rectification + right to restrict processing). It leads to the first main question: what exactly constitutes control? How does Open Humans define control?

    The article mentions and describes a granular consent and privacy model. However, consent is important, but merely a legal basis for processing. How does Open Humans guarantee the other individual rights as granted by the General Data Protection Regulation? The right to information is briefly described on page 7, and so is the right of data portability, but, if full control is the desirable route, it means guaranteeing all rights granted. However, in the context of reproducibility of scientific research, granting all rights does not seem feasible. In particular, the right of rectification and the right to restrict processing seem problematic. Further clarification/elaboration on this issue is required. Is full control the route Open Humans wants to take, or is Open Humans implementing a limited control for the individual? Apart from granular consent, what other forms of control does Open Humans offer?

    GRANULAR CONSENT IS DIFFERENT FROM SPECIFIC CONSENT

    The GDPR requires consent to be freely given, specific, informed and unambiguous (see article 7 and recital 32). Granular consent is needed when one service is involved with multiple processing operations for multiple purposes. In such a case, consent is required for every purpose of processing. This is referred to as granular consent. Whilst closely related, granular consent is therefore different from specific consent. However, in the context of Open Humans, it is doubtful that a situation will arise where one research project will process data for more than one purpose, and thus require granular consent. Research projects work on the basis of a specific research question and/or purpose.

    RIGHT TO DATA PORTABILITY IS LIMITED TO DATA PROVIDED BY THE INDIVIDUAL

    The right to data portability is regarded to have the potential to boost the adoption of a system where individuals can recollect and integrate their personal data from different sources, 'as it guarantees individuals in the European Union a right to export their personal data in electronic and other useful formats'. However, Article 20 of the GDPR limits the right to data portability to the personal data that the data subject himself/herself has provided to the controller. Data provided by the data controller do not fall under the scope of the right to data portability. The argument that the right to data portability can lead to the breaking up of the different data silos is therefore less convincing.