[ENH] Allow participants.tsv to contain a superset of subject directories and subjects listed in phenotype files #2044
The participants schema description now contains the comprehensive superset rule from bids-standard#914.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files

    @@           Coverage Diff           @@
    ##           master    #2044   +/-   ##
    =======================================
      Coverage   82.44%   82.44%
    =======================================
      Files          17       17
      Lines        1504     1504
    =======================================
      Hits         1240     1240
      Misses        264      264

☔ View full report in Codecov by Sentry.
Committing the good suggestion. Co-authored-by: Chris Markiewicz <[email protected]>
Yes, that looks like it satisfies our need. Thanks for the suggestion @effigies!
@rwblair We pre-load all phenotype files at the beginning of the run in order to populate

    RuleName:
      selectors:
        - datatype == 'phenotype'
        - extension == '.tsv'
      checks:
        - |
          allequal(
            sorted(intersects(dataset.subjects.participant_id, columns.participant_id)),
            sorted(columns.participant_id)
          )

I'm curious which one would be more inefficient:
It would also be worth considering which one could be optimized under the hood. While it is simplest if the context continues to be serializable to a JSON object, we could consider set-like structures that make it more efficient to run
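To make the efficiency question concrete, here is a minimal Python sketch of the check in the rule above, written two ways: once with list operations that mirror the expression, and once with sets. The helper names `intersects`, `rule_passes_list_based`, and `rule_passes_set_based` are hypothetical and are not the validator's implementation. The behaviour of `intersects` (shared values, deduplicated) is an assumption about the expression language.

```python
def intersects(a, b):
    """Hypothetical stand-in for the schema function of the same name.

    Assumption: returns the values of `a` that also occur in `b`,
    deduplicated and in first-seen order.
    """
    lookup = set(b)
    return list(dict.fromkeys(x for x in a if x in lookup))


def rule_passes_list_based(dataset_ids, column_ids):
    # Mirrors allequal(sorted(intersects(...)), sorted(columns.participant_id)).
    return sorted(intersects(dataset_ids, column_ids)) == sorted(column_ids)


def rule_passes_set_based(dataset_ids, column_ids):
    # Set-like alternative: every ID in the phenotype file must be known
    # to the dataset. No sorting is needed.
    return set(column_ids) <= set(dataset_ids)


dataset_ids = ["sub-01", "sub-02", "sub-03"]
ok_column = ["sub-02", "sub-01"]
bad_column = ["sub-01", "sub-04"]  # sub-04 is unknown to the dataset

print(rule_passes_list_based(dataset_ids, ok_column))
print(rule_passes_set_based(dataset_ids, ok_column))
print(rule_passes_list_based(dataset_ids, bad_column))
print(rule_passes_set_based(dataset_ids, bad_column))
```

Under these assumptions the two variants agree whenever the phenotype column has no repeated IDs. With repeats, the list-based form fails (the deduplicated intersection no longer equals the sorted column) while the set-based form still passes, so the choice is not purely about speed.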
My post above overlapped with @effigies actually providing the "howto" ;)
I crossed out my prior note, but reflecting on the rule by @effigies above, do we already provide top level directory ATM no rule mentions it as a datatype, here is the list/counts

    ❯ git grep -h 'datatype ==' | sed -e 's,^ *,,g' | sort | uniq -c | sort -n
          1 - datatype == 'fmap'
          2 - datatype == "beh"
          2 - datatype == "dwi"
          2 - datatype == "mrs"
          3 - datatype == "anat"
          6 - datatype == "motion"
          7 - datatype == "micr"
          9 - datatype == "fmap"
          9 - datatype == "func"
         13 - datatype == "ieeg"
         17 - datatype == "eeg"
         18 - datatype == "pet"
         20 - datatype == "perf"
         21 - datatype == "meg"
         24 - datatype == "nirs"

Would we similarly define
I did pragmatically use it as a datatype in #1672. I don't think there's a call to make stimuli that, as long as there is no constraint on the contents of the stimuli directory. My understanding was your preference was to classify stimuli as a new dataset type and validate its contents separately?
@ericearl I took a quick pass at updating the schema. Would you mind putting together a small example for bids-examples? Maybe one with
@effigies I made our draft PR ready for review over on bids-examples at bids-standard/bids-examples#465. You'll want pheno004 for the example you're asking for.
@effigies What else needs to happen next to finish off this PR? I know there's got to be the two reviews that aren't done by you or me.
We need to get the examples validating.
@effigies All 4 or just
I guess just 004 for this, but if the others aren't going to be fixed, it probably makes sense to pull out into its own PR.
I added just the
2 independent reviews and more than a week since substantive changes. Merging.
The participants description in src/schema/objects/files.yaml now contains the comprehensive superset rule from #914. This change allows phenotype-only participant_ids (participants not present in the sub-XX folders) to be included in the participants.tsv file. I believe @effigies has a plan to integrate this change into the next BIDS release for the BIDS schema validator.
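As an illustration of the superset rule described above, the following Python sketch reports which IDs are missing from participants.tsv, given the subject directories and the participant_ids found in phenotype files. The function name `missing_from_participants` and the example IDs are hypothetical. This is only a sketch of the rule as stated here, not the schema validator's code.

```python
def missing_from_participants(participants_ids, subject_dirs, phenotype_ids):
    """Return IDs that participants.tsv would need to list but does not.

    Assumption: all three inputs are iterables of strings such as "sub-01".
    participants.tsv may list extra IDs; it must not omit any ID that
    appears as a subject directory or in a phenotype file.
    """
    required = set(subject_dirs) | set(phenotype_ids)
    return sorted(required - set(participants_ids))


# sub-03 is phenotype-only: it has no sub-03 directory but is still listed.
participants = ["sub-01", "sub-02", "sub-03"]
subject_dirs = ["sub-01", "sub-02"]
phenotype_ids = ["sub-02", "sub-03"]

print(missing_from_participants(participants, subject_dirs, phenotype_ids))
```

An empty result means participants.tsv is a superset of both sources. Any returned ID points at a subject directory or phenotype row that participants.tsv does not cover.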