-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Batch edit for multiple trees #6196
base: production
Are you sure you want to change the base?
Conversation
The query builder currently only shows rank names to filter on even when you have multiple trees. If you have multiple trees with the same rank name and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RE #6196 (comment).
Yeah this is the correct idea! But, shouldn't be needing the params at all. I'm not sure why it was included in the initial query builder by Ben.
RE: #6196 (comment). Yeah, that's what I was stuck on before and was a hard problem. Actually, there are atleast two ways of approaching it: First way:Allow query builder to support specific trees (related: #5251). This would involve a bit of frontend change, although, I've no memory / idea of how hard that will be, best person might actually be Jason. On the backend (assuming treedef is available in queryfieldspec) a. We still need to make the worst case assumption that the user selects a rank from any tree (so we still need to take the maximum of treedefitems). This part is already done. See specify7/specifyweb/stored_queries/query_construct.py Lines 48 to 52 in 7e478b9
b. Need to restrict this to just one tree. specify7/specifyweb/stored_queries/query_construct.py Lines 68 to 71 in 7e478b9
Simplest way will be to add a treeid member field in TreeRankQuery when parsing. Somewhere here: specify7/specifyweb/stored_queries/queryfieldspec.py Lines 189 to 194 in 7e478b9
c. Now, when say a new queryfieldspec with a treeid is encountered, the case-when block will select the correct case where the treedefitem_id is encountered. In the current implementation if say ranks from treedefs 5 and 6 are searched, then both are searched. This is because of the following two places: specify7/specifyweb/stored_queries/query_construct.py Lines 92 to 94 in befd6a6
specify7/specifyweb/stored_queries/query_construct.py Lines 139 to 142 in befd6a6
(The or clause makes the magic. Although, the comment in second is outdated --- I removed model param because it was buggy). d. By the time you get here, the tree query will work for a specific tree. Next is to make it work with batch-edit. All we need to do is make a format like this (taken from #5091 (comment)).
Note: I'm pretty pretty sure treeId was not necessary anymore (double check this, it makes things more simpler. We can still make things work though.). e. Add a function like
f. Replace
by node.name.lower() if not isinstance(node, TreeRankQuery) else node.get_wb_name() .
And that should be all. This is because the relationship name (and hence the rank) will contain treedefid. If the "treeId" prop is required (like in here:
g. You'd not need to make adjustments to parsing of batch-edit pack. h. Things should just work out of the box by this time now. Second way:a. Don't make any changes to the frontend. specify7/specifyweb/stored_queries/batch_edit.py Line 1034 in 7e478b9
That is, suppose you have two trees Fossil and Rocks and both contain rank species. And user's query was (Determination -> Taxon -> Species -> Name), then rewrite this to contain two query fields: (Determination -> Taxon -> Species (Fossil) -> Name) So, each such field will get diverged into however many trees contain it. I can look over the changes in TreeRecord.py more deeply later (on Tuesday/Thursday morning/evening). Querying a specific tree support on the backend is needed regardless. Definitely see if you have a better idea. On the frontend btw, it'll be probably be beneficial to add the treedef name in the list in BatchEdit.tsx that asks you to select extra fields. Also, if you feel things can be simplified in batch-edit in general (even for simple fields PR), definitely do that (and lmk! only fair since I called other code bloated). |
specifyweb/frontend/js_src/lib/components/WbPlanView/navigatorSpecs.ts
Outdated
Show resolved
Hide resolved
Vinnie
You are keeping all this stuff in your head? I'm worried about you.
James H. Beach
Specify Collections Consortium
Biodiversity Institute
University of Kansas
1345 Jayhawk Boulevard
Lawrence, KS 66045, USA
***@***.***
www.specifysoftware.org
Office: +1 785.864.4645
Cell: +1 785.331.8508
…On Sun, Feb 9, 2025 at 12:51 AM Vinayak Jha ***@***.***> wrote:
RE: #6196 (comment)
<#6196 (comment)>.
Yeah, that's what I was stuck on before and was a hard problem. Actually,
there are atleast two ways of approaching it:
First way:
Allow query builder to support specific trees (related: #5251
<#5251>). This would involve a
bit of frontend change, although, I've no memory / idea of how hard that
will be, best person might actually be Jason. On the backend (assuming
treedef is available in queryfieldspec)
a. We still need to make the worst case assumption that the user selects a
rank from any tree (so we still need to take the maximum of treedefitems).
This part is already done. See
https://github.com/specify/specify7/blob/7e478b967a4b2be2e5cfbc868535a574f8535c88/specifyweb/stored_queries/query_construct.py#L48-L52
b. Need to restrict this to just one tree.
https://github.com/specify/specify7/blob/7e478b967a4b2be2e5cfbc868535a574f8535c88/specifyweb/stored_queries/query_construct.py#L68-L71
Simplest way will be to add a treeid member field in TreeRankQuery when
parsing. Somewhere here:
https://github.com/specify/specify7/blob/7e478b967a4b2be2e5cfbc868535a574f8535c88/specifyweb/stored_queries/queryfieldspec.py#L189-L194
c. Now, when say a new queryfieldspec with a treeid is encountered, the
case-when block will select the correct case where the treedefitem_id is
encountered. In the current implementation if say ranks from treedefs 5 and
6 are searched, then *both* are searched. This is because of the
following two places:
https://github.com/specify/specify7/blob/befd6a625df3ef0999d5df8bc14b4aea4302acd8/specifyweb/stored_queries/query_construct.py#L92-L94
https://github.com/specify/specify7/blob/befd6a625df3ef0999d5df8bc14b4aea4302acd8/specifyweb/stored_queries/query_construct.py#L139-L142
(The *or* clause makes the magic. Although, the comment in second is
outdated --- I removed model param because it was buggy).
d. By the time you get here, the tree query will work for a specific tree.
Next is to make it work with batch-edit. *All* we need to do is make a
format like this (taken from #5091 (comment)
<#5091 (comment)>).
"mustMatchTreeRecord": {
"ranks": {
"SomeTreeName~>Class": {
"treeNodeCols": {
"name": {
"matchBehavior": "ignoreAlways",
"nullAllowed": true,
"default": null,
"column": "Class"
}
},
"treeId": 1
},
}
Note: I'm pretty pretty sure treeId was not necessary anymore (double
check this, it makes things more simpler. We can still make things work
though.).
e. Add a function like get_wb_name to TreeRankQuery. Assuming that there
is a treeId member field, it'll be something like (in pseudocode). *Yes,
getting self.treeId (so the ID) instead of the treeName -- not a mistake!
-- it makes things simpler later*
def get_wb_name(self): return self.treeId + "~>" + self.name # Please use the constant for "~>" though.
f. Replace
https://github.com/specify/specify7/blob/7e478b967a4b2be2e5cfbc868535a574f8535c88/specifyweb/stored_queries/batch_edit.py#L399
by node.name.lower() if not isinstance(node, TreeRankQuery) else
node.get_wb_name().
And that should be all. This is because the relationship name (and hence
the rank) will contain treedefid. If the "treeId" prop is required (like in
here:
https://github.com/specify/specify7/blob/eb3c8200bad92af9f43c063655ba47f8a02badf2/specifyweb/frontend/js_src/lib/tests/fixtures/uploadplan.1.json#L319).
Trace the code out in TreeRecord.py (as introduced in #5091
<#5091>) and see if it actually
is needed. I can spend time for a more in-depth review laer. That file
(after 5091) is waaay too bloated (and I intended on refactoring as part of
the merge of production and 4929). If it looks like it is actually needed,
construct TreeRankRecord itself. To do that, extract the id from the key
here (which was added by get_wb_name, and then later get the name of the
treedef). The key, in question, is here:
https://github.com/specify/specify7/blob/7e478b967a4b2be2e5cfbc868535a574f8535c88/specifyweb/stored_queries/batch_edit.py#L934.
Yeah, quite hacky, but things are unnecessarily complicated from bloating
in TreeRecord.py.
g. You'd not need to make adjustments to parsing of batch-edit pack.
h. Things should just work out of the box by this time now.
Second way:
a. Don't make any changes to the frontend.
b. Before getting row plan map in batch-edit query below, augment the
query such that if a rank appears, then you add all trees that contains
that rank (in that discipline). So, essentially, you are making the choice
for the user here (that's why no frontend changes).
https://github.com/specify/specify7/blob/7e478b967a4b2be2e5cfbc868535a574f8535c88/specifyweb/stored_queries/batch_edit.py#L1034
That is, suppose you have two trees Fossil and Rocks and both contain rank
species. And user's query was (Determination -> Taxon -> Species -> Name),
then rewrite this to contain two query fields:
(Determination -> Taxon -> Species (Fossil) -> Name)
(Determination -> Taxon -> Species (Rocks) -> Name)
So, each such field will get diverged into however many trees contain it.
If you have some kind of filter, things become more complicated (in the
worst case, you'd need to generate four fields instead of two -- the extra
two will be for OR null).
This approach is quite complicated to implement correctly. However, the
all the backend changes from method 1 will still work here (so those are
common). Maybe ask UX to see what is acceptable?
I can look over the changes in TreeRecord.py more deeply later (on
Tuesday/Thursday morning/evening). Querying a specific tree support on the
backend is needed regardless. Definitely see if you have a better idea. On
the frontend btw, it'll be probably be beneficial to add the treedef name
in the list in BatchEdit.tsx that asks you to select extra fields. Also, if
you feel things can be simplified in batch-edit in general (even for simple
fields PR), definitely do that.
—
Reply to this email directly, view it on GitHub
<#6196 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACTDB4RLNEIZZB2DRCWN5ZT2O33FPAVCNFSM6AAAAABWRIGB6SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMNBWGA4TOMZSHA>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
Eh, just wanted to give Sharad more context about this - I recall how hard it was to fix bugs in xml-editor, and that's after we had videos from Max! If it makes you feel any better, I don't remember much beyond changes I did in batch-edit and nested-to-manys. |
huh that is weird. By the way, is the Deltodus duplicated? What does the query look like? |
Weird, turns out one of the Deltodus actually belongs to Species instead: https://calvertmarinemuseum20250206-production.test.specifysystems.org/specify/view/taxon/38726/ |
Ah that's fine (repeat of that taxon). Curious what the sql query itself looks like on this branch for that query not working. It should be log level debug |
@realVinayak Prod: https://pastebin.com/imefgzi0 Here's a diffcheck in case pastebin doesn't like me: https://www.diffchecker.com/PxUxsDFd/ Prod looks at the treedefid as well in the case whens. Looks like I missed something somewhere |
that's perfect thanks. I can look at that in an hour or so. I did see the treedefid discrepancy in the code, but that would not matter (so yes, the code from production is redundant). |
This is a wacky bug, and it also happens in issue-5413 (so batch edit too) but I think you already knew that. The diffs helped. Specifically, I noticed this: specify7/specifyweb/stored_queries/execution.py Lines 882 to 889 in 4814f11
The reason is because TreeRankQuery instances are dynamically constructed. The proper way to fix this would be use NamedTuples for fields and relationships. See #628. I guess a decent way could be to
EDIT: I jumped the gun on the explanation. The bug originates here:
The way ORs work is by constructing the same queryfieldspec but implicitly ORing them. Above, the fs.fieldspecs are all different in the cases where TreeRankQuery occurs in the join_path (because the class instances are dynamic). Not a problem for regular fields and relationships because they are created just once at the beginning and are then "static". |
NOTE:
|
@realVinayak Since the frontend queries are not to be changed and we want to add missing fields to the dataset directly, I think the second way you mentioned sounds appropriate. Can you clarify what you mean by adding two extra fields for Also, I tried hardcoding some things around to get a feel for what changes need to be made. When I make a change to a Taxon and commit, it gets highlighted as a new cell and a new node gets uploaded to the tree. The behavior is the same in #4929 before the prod merge (commit 2d86ba4). Is this expected? |
Sorry for the delay. Question 1
Note here that the OR is needed. Otherwise, you'd be constructing
which can never be true. So, I thought this could be fixed by making the below (which is what the message above meant)
But the last one is actually wrong because then the cases where both are null will also be added. In conclusion, the first variant is correct. Since only one of the OR conditions can be true at a time anyways, it's good. Question 2 And you bring it in batch-edit, like below and change the name (the yellow), it'll try cloning all the attributes of the previous "TestSpeciesOther", and try inserting like workbench. So, you can never actually do batch-edit on tree ranks. This might seem like there's then no need of supporting tree ranks, but, bulk changing determinations can be done this way (the changed node will be matched and created if not found). Does this make sense? I'd recommend asking grant if he has sometime to discuss this, I did demo this quite a bit before. BTW, here is the corresponding query (for the above screenshots) You can access the query at http://localhost:5050/specify/query/261/. Lmk if that link doesn't work /j. Misc Why do we care about the "highest" rank? Highest means from the bottom. Lower rankid, more higher in the tree. There is no harm in always showing all the ranks to the user. That is, every time a relationship to tree (with ranks) appears, we can show all the ranks for all the trees. This simplifies implementation quite a bit (all that findMissingRank stuff won't be needed). However, this is bad for usability because, you know, some have like 27 ranks + multiple trees can't make it better. So, we show the least ranks in least trees needed to get stuff done. For a single tree, it'll be all the ranks below the highest. But, wait -- do we always need that? If the user selects "Species" and "Genus", are there cases where we can show just those two ranks? Yes, in fact, if taxon is the base table, you can do that. But, if you are something like CO->det->taxon, we need to show all the ranks below the highest (genus). (however, what about below the lowest rank, so instead of everything below genus, what about everything below species? why would that not work? -- think about this! there are cases where this won't work) In the case of multiple tree, we can do the exact same thing, but for all the trees the selected ranks can be part of. In fact, we can only show just those trees that the user selected via ranks (rather than showing all trees). Need to be careful here, there can be more than one path to the exact same tree. Think about "CO->CollectingEvent->CEA->taxon" and "CO->det->taxon". misc again Batch-edit is two steps as you know:
I actually got the first part done for multiple trees yesterday (sunday) - I was curious about it. It does follow the requirements that were mentioned. I don't expect you to use it (that's why I have not made a PR on sp7 repo). However, if you're curious about it, here is the PR equivalent on my fork of sp7 repo (comparing my changes to this branch): https://github.com/realVinayak/specify7/pull/1. The main function you'd be tracing is this. The above screenshots are actually from that branch. You might recognize the different tree names. Here are some brief feature notes: Good things
Bad things:
Reflections |
@realVinayak Thank you! Your approach is a lot cleaner than what I was trying to do. I'm gonna use your code if that's okay. I'm not sure TreeRankRecord is needed for the upload plan. It seems upload works the same as long as the key follows the format |
Yeah that's fine - it's open source and public. Still, if you feel you can simplify logic more, please do so!
Yeah, I ended up making a conservative assumption that I need to use TreeRankRecord. I think the apply_batch_edit_pack for TreeRecord would need some adjustement (since, internally, the batch_edit.py uses the key of format EDIT: I just looked at the treerecord.py file. The changes necessary (from what it seems like) should be here (and not in apply_batch_edit_pack, spoke too fast). specify7/specifyweb/workbench/upload/treerecord.py Lines 1077 to 1081 in aa81431
Specifically, just format the key to be something like
I haven't tested above but that's the idea. Still need to trace out later to see if there are any gotchas from the prod code changes. |
Triggered by 9407376 on branch refs/heads/issue-6127
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that many of the hidden relationship fields (i.e. Created By Agent > First name) that were added for #6011 throw errors if you query on them. Should that be addressed in this PR, or should I write it up separately?
Specify 7 Crash Report - 2025-03-10T18_51_50.087Z.txt

I don't agree with ranks from other trees being carried over because it feels inconsistent and confusing. I could kind of understand lower ranks from the same tree being necessary but adding ranks from other trees feels like it would complicate things. This is just an example so not necessarily realistic, but this db has 3 different trees with overlapping ranks and even though I just wanted to map family, genus, species, and subspecies, it ends up adding 17 columns. 03-10_14.31.mp4Aside from narrowing down the query by looking for specific taxa (which could still overlap between trees), could we add another way so you could select what trees you do or don't want before batch editing? I know some work was done on #5251 about being able to select the specific tree and then select the rank for each tree (like how workbench is), is there any plan to implement that or can we make a plan for it? |
@lexiclevenger I think a separate issue for that would be great |
@emenslin That's a very fair concern. We actually discussed this issue in a dev meeting before and came to this conclusion (#6196 (comment)) because we need the ability to query over any rank across all trees in the db - which means we cannot really restrict queries to a specific tree and need to add extra fields directly from the backend instead when batch editing. One potential alternative we discussed was to have tree names in the query (like in the WB) along with an option I'm happy to have this in a separate issue as an improvement though. We will have to implement it if we decide to go ahead with #5251 |
This sounds like the ideal solution in my opinion, I understand that it would likely take a while, but this seems a lot better usability wise than the current implementation. I know that not all users will have multiple trees, but the current implementation would be bad for users who do.
Some other solutions until we could implement that could be allowing users to collapse columns they don’t want to use, a dialog after pressing batch edit that could let users select which trees they want included, having the data set ordered by tree then rank instead of rank than tree. Using my original video for an example: show ranks for paleo taxonomy tree from family -> subspecies, then family -> subspecies for mammalogy, etc., instead of the current implementation where it shows each trees family first, then genus, then species, etc. There's probably some better solutions out there but these were just some that I could think of. Any thoughts @specify/ux-testing? |
@emenslin I like the idea of selecting trees from a dialog. It would add more development time but it would simplify batch edit datasets in #6126. Currently in #6126 when editing CO -> determinations you can end up with a ridiculously large number of tree rank columns because each missing rank from each tree will get added for each determination that exists in a CO. Reordering the extra columns doesn't seem straightforward. I'll look it into it but we may have to push that to a different issue for later Any thoughts on this? @grantfitzsimmons |
To me, it seems that this is a necessary extension if we want to maximize the usefulness of batch edit in any collection with multiple trees / types. Since I believe that's what we want to do, I think it justifies the complexity.
|
Refactored a bunch of stuff to group missing ranks by tree name: The plan is to add checkboxes next to tree name if the user has multiple trees. If there is only a single tree in the db, the dialog will only notify of missing ranks being added for that tree (and no checkbox) Let me know if you have any suggestions @emenslin @grantfitzsimmons |
Triggered by 7f2149d on branch refs/heads/issue-6127
- Adds checkboxes to tree names in missing ranks dialog - Splits the main batch edit file into 4 smaller files
Triggered by 1c29ec9 on branch refs/heads/issue-6127
Fixes #6127
Fixes #6011
Fixes #6315
Warning
This PR does not directly contain a migration but since it's based on #5417, make sure to use a dataset where #5417 was tested.
This PR enables batch editing on Tree tables and enables support for querying on all tree fields when choosing a specific rank (no longer limited to Author and Full Name). Functionally, this PR will work the same as the workbench when uploading to a tree rank.
Batch editing with tree as base table
name
field is required when querying on a tree rankany rank
), Batch Edit expects to have all ranks lower than the chosen rank in the queryTree 1
andTree 2
:Tree 1 -> Species
andTree 1 -> Subspecies
Tree 2 -> Species
(lowest rank)Tree 1 -> Species -> name
Tree 1 -> Species -> name
Tree 1 -> Subspecies -> name
Tree 2 -> Species -> name
Checklist
self-explanatory (or properly documented)
Testing instructions
Warning
NOTE: This PR will have the same capabilities as in the workbench when uploading to a Tree table if you edit the name field (i.e: Edited records will not be considered 'updated' cells but rather be a new upload to the tree).
However, other simple fields of the Tree can still be edited and will highlight as updated cell.
Also, Workbench only allows entering data for one rank in a row. Batch edit will have the same behavior.
The main benefits of this PR (matching a tree record when editing) will be tested in #6283
Test #6011
Test batch edit with a single tree in db
Test batch edit with multiple trees