Monday, March 10, 2025
"fun" with Box AI information leakage
Our university IT folks encourage employees to use their Box accounts for data storage, including for sensitive files (human subjects research data, medical records, etc.). I wasn't pleased to see the Box AI button appear, and asked our IT department what exactly it does and how it affects file privacy.
We went through several rounds of messages, including these responses: "Yes, the HIPAA protections are still in place with the BOX-AI application. Box AI securely engages an external AI provider to compute embeddings from these text chunks and the user’s question. Advanced embeddings models, such as Azure OpenAI’s ada-02, are utilized for this purpose." and "Box does not allow any access to data to outside vendors other than the isolated environment used to process the data. No retained data is being allowed. The data is processed in a bubble, then the bubble is destroyed when completed essentially."
Thursday, June 8, 2023
US researchers: use extra caution with participant gender-related info
Last summer I wrote several times about pregnancy-related data suddenly becoming much more sensitive and potentially damaging to the participant if released. Unfortunately, now we must add transgender status, biological sex at birth (required, e.g., for GUID creation), gender identity, relationship status, and more to the list of data requiring extra care.
The NIH Certificate of Confidentiality thankfully means that researchers can't be required to release data on something like someone's abortion or transgender history for criminal or other proceedings. We are responsible for ensuring that our data is handled and stored securely, so that it isn't accidentally (or purposely) shared, possibly causing the participant harm. I suggest researchers review their data collection with an eye towards information that may be more sensitive now than it was in the past; is this information required? If so, consider how to store and use it securely and safely.
Also consider what you ask potential participants during screening (and how screening is done in general): people may not wish to answer screening questions about things like pregnancy or hormone treatments. If these possibly-sensitive questions must be asked, consider how to do so while minimizing potential discomfort or risk.
For example, one of our studies can't include pregnant people, so we must ask about pregnancy during the initial phone screenings. We used to ask potential participants about pregnancy separately, but then changed the script so that this sensitive question is in a list, and the participant is asked if any apply. This way, the participant doesn't have to state explicitly that they are pregnant, and the researcher doesn't need to take any specific notes, or even respond in a specific way (e.g., we don't want them to say something like "sorry Jane Doe, but you can't be in our study now because you're pregnant").
Here's the relevant part of the new screening script:
In order to determine your eligibility for our study, I need to ask you some questions. This will take about 15 minutes.
Before we collect your demographic information, I will read you a list of four exclusionary criteria. If any of these describe you, please answer yes after I read them all; if none apply, please answer no. You do not need to say which ones apply.
- You are a non-native English speaker (you learned to speak English as an adult);
- You are over the age of 45 or under the age of 18;
- You are pregnant or breastfeeding;
- You were born prematurely (before 37 weeks, or if twin, before 34 weeks)
Do any of these describe you? Yes (I am sorry, you do not qualify to be in our study) or No (continue with questions)
Friday, September 9, 2022
Update: US researchers CAN guarantee privacy post-Dobbs
The two previous posts described my concerns about the NIH Certificate of Confidentiality exceptions post-Dobbs; the vagueness of the "federal, state, or local laws" "limited circumstances" formulation is troubling, since it seems that it could apply to something like a state-level prosecution for pregnancy termination.
I am happy to relay that the "federal, state, or local laws" exemption is clarified in the "When can Information or Biospecimens Protected by a Certificate of Confidentiality be Disclosed?" section of the NIH page "What is a Certificate of Confidentiality?" (grants.nih.gov):
[update 7 June 2023: the NIH website has changed a bit, but the key text below is still present, now under section 4.1.4 Confidentiality > 4.1.4.1 Certificates of Confidentiality]
"Disclosure is permitted only when:
- Required by Federal, State, or local laws (e.g., as required by the Federal Food, Drug, and Cosmetic Act, or state laws requiring the reporting of communicable diseases to State and local health departments), excluding instances of disclosure in any Federal, State, or local civil, criminal, administrative, legislative, or other proceeding; [emphasis mine]
- Necessary for the medical treatment of the individual to whom the information, document, or biospecimen pertains and made with the consent of such individual;
- Made with the consent of the individual to whom the information, document, or biospecimen pertains; or
- Made for the purposes of other scientific research that is in compliance with applicable Federal regulations governing the protection of human subjects in research."
The highlighted clause is the key clarification: the "federal, state, or local laws" exemption would not apply to something like a state-level prosecution for pregnancy termination, because that would be a criminal proceeding. And our data isn't only protected from criminal proceedings, but from civil, administrative, legislative, and others as well.
I am relieved by this exclusion, and encourage all universities and groups covered by the Certificate to include it in their descriptions of the protections, rather than giving only the shorter "*Disclosure of identifiable, sensitive information (i.e., information, physical documents, or biospecimens) protected by a Certificate of Confidentiality must be done when such disclosure is required by other applicable Federal, State, or local laws." formulation.
While I am relieved by this exclusion and find it sufficient guarantee that our participants' data is protected from disclosure, we will continue to minimize the amount of pregnancy-related information we collect, and use indirect phrasing in our screening questions whenever possible. Privacy and sensitivity are always important, but are especially critical now in the United States and when reproduction is involved.
UPDATE 16 September 2022: Many universities already use the longer explanation (with the exclusion) on their HRPO websites when describing the Certificate of Confidentiality protections. A Google search for "excluding instances of disclosure in any Federal, State" found many, including Michigan State University, the University of Pittsburgh, the University of Washington, Virginia Commonwealth University, and North Dakota State University. Hopefully these examples can serve as templates for other institutions.
The NIH's example consent language also includes it: "This research is covered by a Certificate of Confidentiality from the National Institutes of Health. This means that the researchers cannot release or use information, documents, or samples that may identify you in any action or suit unless you say it is okay. They also cannot provide them as evidence unless you have agreed. This protection includes federal, state, or local civil, criminal, administrative, legislative, or other proceedings. An example would be a court subpoena."
UPDATE 28 September 2022: Washington University in St. Louis changed the Certificate of Confidentiality description to include the exclusion.
Friday, August 5, 2022
Tracking US universities' post-Dobbs research privacy guarantees
UPDATE 9 September 2022: Good news! The "federal, state, or local laws" exemption is clarified in the "When can Information or Biospecimens Protected by a Certificate of Confidentiality be Disclosed?" section of the NIH page "What is a Certificate of Confidentiality?" (grants.nih.gov).
This post is now less relevant, so I put it below the "jump".
Friday, July 15, 2022
research in the United States after the fall of Roe v. Wade
UPDATE 9 September 2022: Good news! The "federal, state, or local laws" exemption is clarified in the "When can Information or Biospecimens Protected by a Certificate of Confidentiality be Disclosed?" section of the NIH page "What is a Certificate of Confidentiality?" (grants.nih.gov).
The NIH Certificate of Confidentiality is sufficient to protect researchers from being forced to release data if one of our participants is charged with abortion, which is great news. However, there are still many ethical concerns about collecting sensitive data unnecessarily, and I believe it is prudent to be extra aware of how pregnancy-related questions are being asked (e.g., in a phone screen), and minimize direct questions whenever possible.
Previous post:
This post is an essay-style, expanded version of messages I’ve posted on Twitter (@JosetAEtzel) over the last few weeks, responding to the Dobbs v. Jackson Women's Health Organization decision overturning Roe v. Wade in the United States, and Missouri’s subsequent trigger law outlawing abortion except in dire emergency. I hoped these issues would rapidly become outdated, but unfortunately that is not the case; if anything they are compounding, and I very much fear no end is in sight. I am not willing to be silent on the topic of protecting participants, or university ethics more generally.
I am a staff scientist at Washington University in St. Louis, Missouri, USA, and have been here twelve years now. It’s been a good place to do research, and I have great colleagues. I work with data collected on humans, mostly task fMRI. I generally spend my time at work on analysis and hunting for missing or weird images in our datasets, but the last few weeks I’ve spent a substantial amount of time hunting for pregnancy-related information in our procedures and datasets, and seeking answers to how the legal changes affect us and our participants.
Our fMRI consenting protocols require the use of screening forms that ask whether the participant is currently pregnant; high-risk studies (PET-MR) require that a pregnancy test be performed immediately before the scan. These signed and dated screening forms are retained indefinitely by the imaging center at the hospital and/or our lab. Imaging studies routinely include pregnancy questions in the phone screening to determine eligibility.
An additional source of pregnancy information in our datasets is studies using passive sensing data collection (e.g., via an app installed on participants’ phones). These can include GPS and other forms of tracking, which could, for example, show whether the participant spent time at a place where abortions are possible or searched for abortion information. Previous data breaches have happened with this type of research software, and the collection of any GPS or other tracking information raises serious privacy concerns, but my focus here is the security of this data after it is in the researchers’ hands.
We need guarantees that we will never be asked to release this data, even in the (appalling but not totally unprecedented) case that someone is charged with abortion and we are asked by a court to disclose whether the participant said that yes, they were pregnant on a particular date.
NIH Certificates of Confidentiality protect participants’ information from disclosure, but have exceptions in “limited circumstances”. “*Disclosure of identifiable, sensitive information (i.e., information, physical documents, or biospecimens) protected by a Certificate of Confidentiality must be done when such disclosure is required by other applicable Federal, State, or local laws.” At Washington University in St. Louis (as of 11 July 2022) we are being told it might not be sufficient to rely upon the Certificate of Confidentiality; that it is not "bulletproof" for state-level abortion-related lawsuits. University counsel here is still investigating, as I assume are those elsewhere.
I have been hoping that Washington University in St. Louis and other research universities would promise to protect participant (and patient) pregnancy-related information; announcing that they would fight attempts to force disclosure in any abortion-related lawsuits. So far, this has not occurred. Universities often have strong law departments and a pronounced influence on their communities, both as large employers and venerable, respected institutions. Ethics-based statements that some laws will not be complied with could have an outsized influence, and serve as a brake on those pushing enforcement and passing of ever more extreme abortion-related laws.
Since we currently lack pregnancy-related data confidentiality guarantees, in our group we have begun efforts to lessen the chances of our participants incurring extra risks from being in our studies – or even from being *asked* to be in our studies. Reducing our collection of potentially sensitive information to the absolute minimum is one step: even if subpoenaed or otherwise requested, we will not have potentially harmful records to disclose. Concretely, we have changed our screening procedures, such that the participant is asked if any of a group of several exclusion criteria apply, only one of which is pregnancy (rather than asking about pregnancy separately). The participant then does not have to verbally state that they are pregnant, nor does the experimenter have to note which of the exclusion criteria was met.
Participants will still need to complete the screening form immediately before scanning, but presumably anyone who reaches this stage will respond that they are not pregnant; if they are pregnant, the scan is cancelled and the screening form destroyed. This procedure reduces risk if we assume that recording “no, not pregnant” on a particular date carries less potential legal trouble for the participant than a “yes, pregnant” response, which hopefully is the case. However, it is not unimaginable that an abortion lawsuit could have proof from elsewhere that the participant was pregnant on a particular date before the experiment, in which case their statement (or test result, in the case of studies requiring one) of not being pregnant on the experiment date could be relevant and damaging. At this time we can’t avoid using the forms with the pregnancy questions, but may start warning participants in advance that they will have to respond to a pregnancy question, and that we can’t guarantee their response will be kept private and only used in the context of the experiment.
The impact of the Dobbs decision (and in our case, Missouri state abortion trigger laws) on non-reproduction-related human subjects research is only a small subset of the harm from these laws, of course, but it is a new risk US-based researchers should consider. Human subjects protections are not trivial and must not be brushed aside, even if we hope no more abortion-related legal actions will occur. As scientists, our ethics, honor, and integrity require us to follow not just the letter but also the spirit of guidelines like the Declaration of Helsinki; we must work towards the absolutely best and strongest participant protections.
I hope that this essay has caused you to consider what data you are collecting, whether it puts your participants at new legal risk, and what you can do to minimize such risk in the short and long terms. Immediate actions such as changing how pregnancy is asked about or stopping collection of especially sensitive information seem to me the minimum ethically appropriate action; stronger, legally-binding guarantees of confidentiality may be needed soon for many types of human subjects research to continue responsibly in the United States.
UPDATE 3 August 2022: Our screening changes were approved, so I edited the relevant text and added a link to the tweet showing the approved version.
Yesterday I tweeted that a source I trust (and in a position to know) told me that Washington University in St. Louis counsel/administration told them that pregnancy-related research records will not be released in an abortion-related lawsuit, regardless of whether the NIH Certificate of Confidentiality is sufficient protection. That is good news, but I am troubled that it came via word of mouth; my source said I shouldn’t “hold out” for an official statement. It is hard to be confident without something concrete; even a technically-phrased memo or HRPO website note would be encouraging. It seems that we are being asked to act as if nothing has changed post-Dobbs, and trust that everything will be fine, but that's an awfully big ask for issues this consequential.
UPDATE 17 August 2022: Last month Jeannie Baumann wrote an article at Bloomberg Law discussing questions about Certificates of Confidentiality protections, including, "It’s unclear how the state law versus certificates would play out because it hasn’t been tested in court."
UPDATE 26 August 2022: Tamara J. Sussman and David Pagliaccio published "Pregnancy testing before MRI for neuroimaging research: Balancing risks to fetuses with risks to youth and adult participants".