This post is the second in a series introducing "just enough" BIDS; please start with the first. As explained there (and in this talk), organizing a dataset into six top-level directories following the BIDS standard can simplify and improve its sharing, storage, and management.
When introducing each of the six directories I described their contents only in general terms: each should hold specific types of information, but the files don't need to be organized and named in a particular way ... except in rawdata. As I put it in that first post,
/rawdata/ is for data files that have been rearranged, renamed, and probably converted to a different format than they were collected (and stored in sourcedata) in, but not fundamentally transformed. I suggest that the rearranging and renaming in rawdata be to follow the BIDS standard more closely, as will be explained here (and in the recorded talk).
For "just enough" BIDS, I suggest starting with the standard's file naming and organization scheme (i.e., subdirectory structure) in rawdata, then adding key .json files (or their equivalent), and finally considering converting all files to the specified types (e.g., .tsv instead of .csv). I'll explain each of these pieces in turn. (Warning: if you'll be running fMRIPrep or another BIDS App on the rawdata it's not safe to disregard any part of the BIDS standard!)
The screen captures in this post are from various datasets, but always of rawdata. I hope that as you read the explanations you will start to recognize the datasets' consistent structure and logic, and so be able to more easily find the information you're looking for in any BIDS dataset.
BIDS file naming
BIDS file names are usually (very) long, but systematic and interpretable. For example, from the names in the screen capture below we know that these files contain behavioral (_beh) data for participant (sub-) f1027ao's (_ses-) wave1bas lab visit, during which they did the Axcpt, Cuedts, Stern, and Stroop tasks (_task-) twice each (_run-).
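Because the names are systematic, they can also be read by a program. Here is a minimal sketch of splitting a name into its key-value pieces; the function name `parse_bids_name` is made up for this illustration, and the example file name is a hypothetical one modeled on the description above (the run label and .tsv extension are assumptions), not copied from the dataset.

```python
from pathlib import Path


def parse_bids_name(filename):
    """Split a BIDS-style file name into its entities, suffix, and extension."""
    name = Path(filename).name
    # Split at the first dot so multi-part extensions (e.g., "nii.gz") stay together.
    stem, _, extension = name.partition(".")
    parts = stem.split("_")
    entities = {}
    # Every piece except the last is a key-value pair such as "sub-f1027ao".
    for part in parts[:-1]:
        key, _, value = part.partition("-")
        entities[key] = value
    # The last piece is the data type suffix (here, "beh").
    return entities, parts[-1], extension


# Hypothetical file name, for illustration only.
entities, suffix, extension = parse_bids_name(
    "sub-f1027ao_ses-wave1bas_task-Stroop_run-1_beh.tsv"
)
print(entities)
print(suffix, extension)
```

This only splits on underscores and hyphens; it does not check that the keys or their order are valid BIDS.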
The documentation goes into a lot of detail on the abbreviations, keys, ordering, etc. to use in BIDS filenames. I suggest following its entity definitions as closely as possible (and throughout the experiment); consistent use of even just the sub-, _ses-, _task-, and the data type suffix (here, _beh) makes it much easier to understand (and navigate) the contents of a dataset. People have all sorts of naming conventions, but I think the clarity and ease of communication of a universal standard outweigh most other considerations (e.g., "visit" instead of "session", aesthetic dislike of long file names, capitalization preferences).
For "just enough" BIDS I recommend closely following the file naming definitions, but relaxing some aspects when reasonable. For example, imagine an online behavioral study in which data was collected from hundreds of individuals in a single day. If the experimenters downloaded that data as a large spreadsheet, reformatting it into many hundreds of separate files (one per person per task, named sub-1_task-A_beh.tsv, sub-1_task-B_beh.tsv, sub-2_task-A_beh.tsv, ...) may seem ridiculous and a waste of time. If so, I'd recommend creating two large files instead: sub-all_task-A_beh.tsv and sub-all_task-B_beh.tsv. sub-all is not valid BIDS (sub- should refer to a single participant), but in my opinion it does follow the logic and spirit of BIDS, makes the dataset easier to use and understand, and is far preferable to arbitrary file names and formats (e.g., download.xlsx).
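As a rough sketch of the two-large-files idea, the code below groups rows from a combined download by task and writes one tab-delimited file per task. Everything specific here is an assumption made for illustration: the function name `write_task_files`, the column names (`participant`, `task`, `rt`), and the toy rows, which are not real data.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def write_task_files(rows, out_dir, task_column="task"):
    """Write one .tsv per task, named sub-all_task-<label>_beh.tsv."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Group the rows by their task label.
    by_task = defaultdict(list)
    for row in rows:
        by_task[row[task_column]].append(row)
    written = []
    for task, task_rows in sorted(by_task.items()):
        path = out_dir / f"sub-all_task-{task}_beh.tsv"
        with open(path, "w", newline="") as handle:
            writer = csv.DictWriter(
                handle, fieldnames=list(task_rows[0].keys()), delimiter="\t"
            )
            writer.writeheader()
            writer.writerows(task_rows)
        written.append(path.name)
    return written


# Toy rows standing in for the downloaded spreadsheet (hypothetical columns).
rows = [
    {"participant": "1", "task": "A", "rt": "512"},
    {"participant": "1", "task": "B", "rt": "640"},
    {"participant": "2", "task": "A", "rt": "498"},
]
with tempfile.TemporaryDirectory() as tmp:
    names = write_task_files(rows, tmp)
print(names)
```

Keeping a participant column inside each file preserves who contributed each row, which is the information the per-person sub- labels would otherwise carry.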
BIDS subdirectories
The BIDS standard is for each person to have their own rawdata subdirectory (named sub- then the subject ID), which in turn has subdirectories for each session (ses-) and/or data type. In the example above, we can see from the subdirectories that participant f1027ao (sub-) came to the lab three times, completing the wave1bas, wave1pro, and wave1rea sessions (ses-), while f1342ku (below) has three additional (wave2) sessions.
Sessions often correspond to separate lab visits, but not always (as put in the BIDS standard, a "session" is "A logical grouping of neuroimaging and behavioral data consistent across subjects."). If each participant only interacted with the study once (like in the online study example), the study doesn't have sessions, and so participants don't need ses- subdirectories.
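The subject, optional session, and data type levels can be sketched as a small path-building helper. This is only an illustration: the function name `rawdata_dir` is made up, and the `beh` data type directory is assumed from the behavioral examples in this post.

```python
from pathlib import PurePosixPath


def rawdata_dir(subject, datatype, session=None, root="rawdata"):
    """Build the directory for one participant's files of one data type."""
    path = PurePosixPath(root) / f"sub-{subject}"
    # Studies without sessions simply skip the ses- level.
    if session is not None:
        path = path / f"ses-{session}"
    return path / datatype


# A study with sessions, and one (like the online example) without.
with_session = rawdata_dir("f1027ao", "beh", session="wave1bas")
without_session = rawdata_dir("1", "beh")
print(with_session)
print(without_session)
```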
In the MLX study (below right) participants came to the lab twice for data collection: the "pre" and "post" sessions. But there is a third, "ses-ema", subdirectory for participant 8, because participants were asked to complete surveys on some days between the pre and post sessions. The survey responses weren't collected in the lab nor on a single day, but they are a consistent type of behavioral data collected during a specific time frame, so we decided it would be clearest to group them into a single "session".
Deciding how to group the data can be difficult; there's not always a single correct answer. For example, suppose that the MLX researchers designed the surveys not as a unique part of the overall experiment, but instead as a continuation of the pre session (e.g., the surveys were training to reinforce what participants learned during the first session). In such a study it could be more logical to store the survey files in ses-pre and not have a separate ses-ema subdirectory; the experiment design and planned analyses inform how the dataset should be structured.
participants.tsv & participants.json
musings
* A valid BIDS dataset would require (at minimum) two files I didn't discuss above, rawdata/dataset_description.json and rawdata/README. These are important when sharing datasets and I have no objection to them in principle, but we've found a combined description document (.docx, .odt) in the root easier to maintain and more likely for collaborators to read.
below the jump, plain text directory tree versions of the images




