Batch edit for multiple trees #6196

sharadsw · 2025-02-05T14:48:39Z

Warning

This PR does not directly contain a migration but since it's based on #5417, make sure to use a dataset where #5417 was tested.

This PR enables batch editing on Tree tables and enables support for querying on all tree fields when choosing a specific rank (no longer limited to Author and Full Name). Functionally, this PR will work the same as the workbench when uploading to a tree rank.

Batch editing with tree as base table

The name field is required when querying on a tree rank
When choosing a specific rank (and not any rank), Batch Edit expects to have all ranks lower than the chosen rank in the query
- If the user does not add the "missing" fields, it will be added to the dataset implicitly after notifying the user
- Example: Say you have the following ranks for trees Tree 1 and Tree 2:
  - Tree 1 -> Species and Tree 1 -> Subspecies
  - Tree 2 -> Species (lowest rank)
- And you make a query with:
  - Tree 1 -> Species -> name
- Batch edit will implicitly add missing ranks and augment them to the dataset. The dataset will then look like
  - Tree 1 -> Species -> name
  - Tree 1 -> Subspecies -> name
  - Tree 2 -> Species -> name
When using a Tree as a base table with Batch Edit, the records are not edited directly. Rather the record is cloned and uploaded to the tree, which means functionally it would work the same as the workbench. The implementation in this PR will be more apparent in Batch edit for relationships #6283 when trying to edit tree records through a relationship. eg: Determinations (more info here: Batch edit for multiple trees #6196 (comment))
To-many relationships in trees are not allowed for batch editing and will lead to an error
- For example a query like this has a to-many children field
- Batch editing would lead to an error like this

Checklist

Self-review the PR after opening it to make sure the changes look good and
self-explanatory (or properly documented)
Add automated tests
Add relevant issue to release milestone
Add relevant documentation (Tester - Dev)
Add a reverse migration if a migration is present in the PR

Testing instructions

Warning

NOTE: This PR will have the same capabilities as in the workbench when uploading to a Tree table if you edit the name field (i.e: Edited records will not be considered 'updated' cells but rather be a new upload to the tree).

However, other simple fields of the Tree can still be edited and will highlight as updated cell.

Also, Workbench only allows entering data for one rank in a row. Batch edit will have the same behavior.

The main benefits of this PR (matching a tree record when editing) will be tested in #6283

Test #6011

Create a query with a Tree base table (eg: Taxon, Storage, Geography, Lithostrat, Chronostrat, Tectonic Unit)
Verify all tree fields can be queried with

Test batch edit with a single tree in db

Make a tree query
Create a batch edit dataset
Make changes
Test commit
Test rollback

Test batch edit with multiple trees

Make a tree query with a rank that exists in multiple trees
Verify there is a missing rank error dialog
Click on continue
Verify missing fields were added to the batch edit dataset (ordering of new columns is expected to vary)
Test commit
Test rollback

specifyweb/stored_queries/query_construct.py

sharadsw · 2025-02-07T22:19:17Z

The query builder currently only shows rank names to filter on even when you have multiple trees. If you have multiple trees with the same rank name and rankId, that rank will only have one entry in the QB and it will refer to both trees. I'm not sure what we can do to avoid duplicate names or the long list of names in the error here unless we make the QB mapper have the same behavior as the Workbench mapper where you can specify which tree you want to query on. (see: #6127 (comment))

realVinayak

RE #6196 (comment).
Yeah this is the correct idea! But, shouldn't be needing the params at all. I'm not sure why it was included in the initial query builder by Ben.

specifyweb/stored_queries/query_construct.py

realVinayak · 2025-02-09T06:50:41Z

RE: #6196 (comment).

Yeah, that's what I was stuck on before and was a hard problem. Actually, there are atleast two ways of approaching it:

First way:

Allow query builder to support specific trees (related: #5251). This would involve a bit of frontend change, although, I've no memory / idea of how hard that will be, best person might actually be Jason. On the backend (assuming treedef is available in queryfieldspec)

a. We still need to make the worst case assumption that the user selects a rank from any tree (so we still need to take the maximum of treedefitems). This part is already done. See

specify7/specifyweb/stored_queries/query_construct.py

Lines 48 to 52 in 7e478b9

    
           treedefs = get_treedefs(query.collection, table.name) 
        
           # We need to take the max here. Otherwise, it is possible that the same rank 
        
           # name may not occur at the same level across tree defs. 
        
           max_depth = max(depth for _, depth in treedefs)

b. Need to restrict this to just one tree.

specify7/specifyweb/stored_queries/query_construct.py

Lines 68 to 71 in 7e478b9

    
           treedefs_with_ranks: List[Tuple[int, int]] = [tup for tup in [ 
        
               (treedef_id, _safe_filter(item_model.objects.filter(treedef_id=treedef_id, name=tree_rank).values_list('id', flat=True))) 
        
               for treedef_id, _ in treedefs 
        
               ] if tup[1] is not None]

Simplest way will be to add a treeid member field in TreeRankQuery when parsing. Somewhere here:

specify7/specifyweb/stored_queries/queryfieldspec.py

Lines 189 to 194 in 7e478b9

    
           tree_rank = TreeRankQuery( 
        
               name=tree_rank_name,  
        
               relatedModelName=node.name,  
        
               type="many-to-one", 
        
               column=node.idField.column 
        
           )

c. Now, when say a new queryfieldspec with a treeid is encountered, the case-when block will select the correct case where the treedefitem_id is encountered. In the current implementation if say ranks from treedefs 5 and 6 are searched, then both are searched. This is because of the following two places:

specify7/specifyweb/stored_queries/query_construct.py

Lines 92 to 94 in befd6a6

    
           # We don't want to include treedef if the rank is not present. 
        
           new_filters = [*query.internal_filters, getattr(node, treedef_column).in_(defs_to_filter_on)] 
        
           query = query._replace(internal_filters=new_filters)

specify7/specifyweb/stored_queries/query_construct.py

Lines 139 to 142 in befd6a6

    
           # To make things "simpler", it doesn't apply any filters, but returns a single predicate 
        
           # @model is an input parameter, because cannot guess if it is aliased or not (callers are supposed to know that) 
        
           def get_internal_filters(self): 
        
               return sql.or_(*self.internal_filters)

(The or clause makes the magic. Although, the comment in second is outdated --- I removed model param because it was buggy).

d. By the time you get here, the tree query will work for a specific tree. Next is to make it work with batch-edit. All we need to do is make a format like this (taken from #5091 (comment)).

"mustMatchTreeRecord": { 
   "ranks": { 
     "SomeTreeName~>Class": { 
       "treeNodeCols": { 
         "name": { 
           "matchBehavior": "ignoreAlways", 
           "nullAllowed": true, 
           "default": null, 
           "column": "Class" 
         } 
       }, 
       "treeId": 1 
     }, 
 }

Note: I'm pretty pretty sure treeId was not necessary anymore (double check this, it makes things more simpler. We can still make things work though.).

e. Add a function like get_wb_name to TreeRankQuery. Assuming that there is a treeId member field, it'll be something like (in pseudocode). Yes, getting self.treeId (so the ID) instead of the treeName -- not a mistake! -- it makes things simpler later

def get_wb_name(self): return self.treeId + "~>"  + self.name # Please use the constant for "~>" though.

f. Replace

specify7/specifyweb/stored_queries/batch_edit.py

Line 399 in 7e478b9

node.name.lower() if not isinstance(node, TreeRankQuery) else node.name

by node.name.lower() if not isinstance(node, TreeRankQuery) else node.get_wb_name().

And that should be all. This is because the relationship name (and hence the rank) will contain treedefid. If the "treeId" prop is required (like in here:

specify7/specifyweb/frontend/js_src/lib/tests/fixtures/uploadplan.1.json

Line 319 in eb3c820

"treeId": 1

). Trace the code out in TreeRecord.py (as introduced in #5091) and see if it actually is needed. I can spend time for a more in-depth review laer. That file (after 5091) is waaay too bloated (and I intended on refactoring as part of the merge of production and 4929). If it looks like it is actually needed, construct TreeRankRecord itself. To do that, extract the id from the key here (which was added by get_wb_name, and then later get the name of the treedef). The key, in question, is here:

specify7/specifyweb/stored_queries/batch_edit.py

Line 934 in 7e478b9

for (key, upload_table) in to_one_upload_tables.items()

. Yeah, quite hacky, but things are unnecessarily complicated from bloating in TreeRecord.py.

g. You'd not need to make adjustments to parsing of batch-edit pack.

h. Things should just work out of the box by this time now.

Second way:

a. Don't make any changes to the frontend.
b. Before getting row plan map in batch-edit query below, augment the query such that if a rank appears, then you add all trees that contains that rank (in that discipline). So, essentially, you are making the choice for the user here (that's why no frontend changes).

specify7/specifyweb/stored_queries/batch_edit.py

Line 1034 in 7e478b9

row_plan = RowPlanMap.get_row_plan(visible_fields)

That is, suppose you have two trees Fossil and Rocks and both contain rank species. And user's query was (Determination -> Taxon -> Species -> Name), then rewrite this to contain two query fields:

(Determination -> Taxon -> Species (Fossil) -> Name)
(Determination -> Taxon -> Species (Rocks) -> Name)

So, each such field will get diverged into however many trees contain it.
If you have some kind of filter, things become more complicated (in the worst case, you'd need to generate four fields instead of two -- the extra two will be for OR null).
This approach is quite complicated to implement correctly. However, the all the backend changes from method 1 will still work here (so those are common). Maybe ask UX to see what is acceptable?

I can look over the changes in TreeRecord.py more deeply later (on Tuesday/Thursday morning/evening). Querying a specific tree support on the backend is needed regardless. Definitely see if you have a better idea. On the frontend btw, it'll be probably be beneficial to add the treedef name in the list in BatchEdit.tsx that asks you to select extra fields. Also, if you feel things can be simplified in batch-edit in general (even for simple fields PR), definitely do that (and lmk! only fair since I called other code bloated).

specifyweb/frontend/js_src/lib/components/WbPlanView/navigatorSpecs.ts

beach53 · 2025-02-09T22:09:27Z

Vinnie You are keeping all this stuff in your head? I'm worried about you. James H. Beach Specify Collections Consortium Biodiversity Institute University of Kansas 1345 Jayhawk Boulevard Lawrence, KS 66045, USA ***@***.*** www.specifysoftware.org Office: +1 785.864.4645 Cell: +1 785.331.8508

…

On Sun, Feb 9, 2025 at 12:51 AM Vinayak Jha ***@***.***> wrote: RE: #6196 (comment) <#6196 (comment)>. Yeah, that's what I was stuck on before and was a hard problem. Actually, there are atleast two ways of approaching it: First way: Allow query builder to support specific trees (related: #5251 <#5251>). This would involve a bit of frontend change, although, I've no memory / idea of how hard that will be, best person might actually be Jason. On the backend (assuming treedef is available in queryfieldspec) a. We still need to make the worst case assumption that the user selects a rank from any tree (so we still need to take the maximum of treedefitems). This part is already done. See https://github.com/specify/specify7/blob/7e478b967a4b2be2e5cfbc868535a574f8535c88/specifyweb/stored_queries/query_construct.py#L48-L52 b. Need to restrict this to just one tree. https://github.com/specify/specify7/blob/7e478b967a4b2be2e5cfbc868535a574f8535c88/specifyweb/stored_queries/query_construct.py#L68-L71 Simplest way will be to add a treeid member field in TreeRankQuery when parsing. Somewhere here: https://github.com/specify/specify7/blob/7e478b967a4b2be2e5cfbc868535a574f8535c88/specifyweb/stored_queries/queryfieldspec.py#L189-L194 c. Now, when say a new queryfieldspec with a treeid is encountered, the case-when block will select the correct case where the treedefitem_id is encountered. In the current implementation if say ranks from treedefs 5 and 6 are searched, then *both* are searched. This is because of the following two places: https://github.com/specify/specify7/blob/befd6a625df3ef0999d5df8bc14b4aea4302acd8/specifyweb/stored_queries/query_construct.py#L92-L94 https://github.com/specify/specify7/blob/befd6a625df3ef0999d5df8bc14b4aea4302acd8/specifyweb/stored_queries/query_construct.py#L139-L142 (The *or* clause makes the magic. Although, the comment in second is outdated --- I removed model param because it was buggy). d. By the time you get here, the tree query will work for a specific tree. Next is to make it work with batch-edit. *All* we need to do is make a format like this (taken from #5091 (comment) <#5091 (comment)>). "mustMatchTreeRecord": { "ranks": { "SomeTreeName~>Class": { "treeNodeCols": { "name": { "matchBehavior": "ignoreAlways", "nullAllowed": true, "default": null, "column": "Class" } }, "treeId": 1 }, } Note: I'm pretty pretty sure treeId was not necessary anymore (double check this, it makes things more simpler. We can still make things work though.). e. Add a function like get_wb_name to TreeRankQuery. Assuming that there is a treeId member field, it'll be something like (in pseudocode). *Yes, getting self.treeId (so the ID) instead of the treeName -- not a mistake! -- it makes things simpler later* def get_wb_name(self): return self.treeId + "~>" + self.name # Please use the constant for "~>" though. f. Replace https://github.com/specify/specify7/blob/7e478b967a4b2be2e5cfbc868535a574f8535c88/specifyweb/stored_queries/batch_edit.py#L399 by node.name.lower() if not isinstance(node, TreeRankQuery) else node.get_wb_name(). And that should be all. This is because the relationship name (and hence the rank) will contain treedefid. If the "treeId" prop is required (like in here: https://github.com/specify/specify7/blob/eb3c8200bad92af9f43c063655ba47f8a02badf2/specifyweb/frontend/js_src/lib/tests/fixtures/uploadplan.1.json#L319). Trace the code out in TreeRecord.py (as introduced in #5091 <#5091>) and see if it actually is needed. I can spend time for a more in-depth review laer. That file (after 5091) is waaay too bloated (and I intended on refactoring as part of the merge of production and 4929). If it looks like it is actually needed, construct TreeRankRecord itself. To do that, extract the id from the key here (which was added by get_wb_name, and then later get the name of the treedef). The key, in question, is here: https://github.com/specify/specify7/blob/7e478b967a4b2be2e5cfbc868535a574f8535c88/specifyweb/stored_queries/batch_edit.py#L934. Yeah, quite hacky, but things are unnecessarily complicated from bloating in TreeRecord.py. g. You'd not need to make adjustments to parsing of batch-edit pack. h. Things should just work out of the box by this time now. Second way: a. Don't make any changes to the frontend. b. Before getting row plan map in batch-edit query below, augment the query such that if a rank appears, then you add all trees that contains that rank (in that discipline). So, essentially, you are making the choice for the user here (that's why no frontend changes). https://github.com/specify/specify7/blob/7e478b967a4b2be2e5cfbc868535a574f8535c88/specifyweb/stored_queries/batch_edit.py#L1034 That is, suppose you have two trees Fossil and Rocks and both contain rank species. And user's query was (Determination -> Taxon -> Species -> Name), then rewrite this to contain two query fields: (Determination -> Taxon -> Species (Fossil) -> Name) (Determination -> Taxon -> Species (Rocks) -> Name) So, each such field will get diverged into however many trees contain it. If you have some kind of filter, things become more complicated (in the worst case, you'd need to generate four fields instead of two -- the extra two will be for OR null). This approach is quite complicated to implement correctly. However, the all the backend changes from method 1 will still work here (so those are common). Maybe ask UX to see what is acceptable? I can look over the changes in TreeRecord.py more deeply later (on Tuesday/Thursday morning/evening). Querying a specific tree support on the backend is needed regardless. Definitely see if you have a better idea. On the frontend btw, it'll be probably be beneficial to add the treedef name in the list in BatchEdit.tsx that asks you to select extra fields. Also, if you feel things can be simplified in batch-edit in general (even for simple fields PR), definitely do that. — Reply to this email directly, view it on GitHub <#6196 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ACTDB4RLNEIZZB2DRCWN5ZT2O33FPAVCNFSM6AAAAABWRIGB6SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMNBWGA4TOMZSHA> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

realVinayak · 2025-02-10T02:55:22Z

Eh, just wanted to give Sharad more context about this - I recall how hard it was to fix bugs in xml-editor, and that's after we had videos from Max! If it makes you feel any better, I don't remember much beyond changes I did in batch-edit and nested-to-manys.

sharadsw · 2025-02-18T22:05:17Z

NOTE: Querying on Taxon names that involve different trees don't work correctly in this PR.
In prod:

Deltodus and Test Genus belong to different trees and the results work correctly. In this branch, a query like this returns no result at all.

realVinayak · 2025-02-19T16:29:32Z

huh that is weird. By the way, is the Deltodus duplicated? What does the query look like?

sharadsw · 2025-02-19T16:45:34Z

Weird, turns out one of the Deltodus actually belongs to Species instead: https://calvertmarinemuseum20250206-production.test.specifysystems.org/specify/view/taxon/38726/

realVinayak · 2025-02-19T16:57:15Z

Ah that's fine (repeat of that taxon). Curious what the sql query itself looks like on this branch for that query not working. It should be log level debug

sharadsw · 2025-02-19T17:20:36Z

@realVinayak Prod: https://pastebin.com/imefgzi0
Batch edit: https://pastebin.com/ADt4HrGx

Here's a diffcheck in case pastebin doesn't like me: https://www.diffchecker.com/PxUxsDFd/

Prod looks at the treedefid as well in the case whens. Looks like I missed something somewhere

realVinayak · 2025-02-19T17:50:29Z

that's perfect thanks. I can look at that in an hour or so. I did see the treedefid discrepancy in the code, but that would not matter (so yes, the code from production is redundant).

realVinayak · 2025-02-19T21:01:09Z

This is a wacky bug, and it also happens in issue-5413 (so batch edit too) but I think you already knew that. The diffs helped. Specifically, I noticed this:

That is, bug had to be here:

specify7/specifyweb/stored_queries/execution.py

Lines 882 to 889 in 4814f11

    
           if props.implicit_or: 
        
               implicit_ors = [ 
        
                   reduce(sql.or_, ps) for ps in predicates_by_field.values() if ps 
        
               ] 
        
               if implicit_ors: 
        
                   where = reduce(sql.and_, implicit_ors) 
        
                   query = query.filter(where)

The reason is because TreeRankQuery instances are dynamically constructed. The proper way to fix this would be use NamedTuples for fields and relationships. See #628.

I guess a decent way could be to

class TreeRankQuery(Relationship):
    # FUTURE: used to remember what the previous value was. Useless after 6 retires
    original_field: str

    def __hash__(self):
        return hash((TreeRankQuery, self.relatedModelName, self.name))

    def __eq__(self, value):
        return (
            isinstance(value, TreeRankQuery)
            and value.name == self.name
            and value.relatedModelName == self.relatedModelName
        )

EDIT: I jumped the gun on the explanation. The bug originates here:

specify7/specifyweb/stored_queries/execution.py

Line 880 in 38a632d

predicates_by_field[fs.fieldspec].append(predicate)

The way ORs work is by constructing the same queryfieldspec but implicitly ORing them. Above, the fs.fieldspecs are all different in the cases where TreeRankQuery occurs in the join_path (because the class instances are dynamic). Not a problem for regular fields and relationships because they are created just once at the beginning and are then "static".

- see: #6196 (comment)

sharadsw · 2025-02-20T22:02:25Z

NOTE:
After internal discussion, the conclusion that was reached for batch editing with multiple trees was:

No change to frontend queries as we need the ability to search on ranks across all trees
The error dialog (Batch edit for multiple trees #6196 (comment)) will instead be a prompt that notifies the users what fields are missing
The missing fields will then be added on the backend when creating the batch edit dataset

sharadsw · 2025-02-20T22:30:08Z

@realVinayak Since the frontend queries are not to be changed and we want to add missing fields to the dataset directly, I think the second way you mentioned sounds appropriate. Can you clarify what you mean by adding two extra fields for OR null?

Also, I tried hardcoding some things around to get a feel for what changes need to be made. When I make a change to a Taxon and commit, it gets highlighted as a new cell and a new node gets uploaded to the tree. The behavior is the same in #4929 before the prod merge (commit 2d86ba4). Is this expected?

realVinayak · 2025-02-24T07:07:10Z

Sorry for the delay.

Question 1
Yeah, the second one seems like the way to go. Actually, I tried implementing it myself, and realized I misspoke when I said adding two extra fields for the null. My initial thought process was to make it so the user makes a query like (Species->name == "whatever"), then this needs to be broken into (assuming species occurs in treedefs 1 and 2)

(Species->name == "whatever" and treedef_id == 1) OR (Species->name == "whatever" and treedef_id == 2)

Note here that the OR is needed. Otherwise, you'd be constructing

(Species->name == "whatever" and treedef_id == 1) AND (Species->name == "whatever" and treedef_id == 2)

which can never be true. So, I thought this could be fixed by making the below (which is what the message above meant)

([Species->name == "whatever" OR Species->name is null] and treedef_id == 1)  AND ([Species->name == "whatever" OR Species->name is null] and treedef_id == 2)

But the last one is actually wrong because then the cases where both are null will also be added.

In conclusion, the first variant is correct. Since only one of the OR conditions can be true at a time anyways, it's good.

Question 2
Yes, that's "expected". Actually, we can't sanely do batch-edit on tree ranks (notice the specificity), it's all a lie. There's just isn't a clear definition for it, so I just made it exactly like workbench, but that record data is cloned. That is, say you have something like

And you bring it in batch-edit, like below and change the name (the yellow), it'll try cloning all the attributes of the previous "TestSpeciesOther", and try inserting like workbench. So, you can never actually do batch-edit on tree ranks.

This might seem like there's then no need of supporting tree ranks, but, bulk changing determinations can be done this way (the changed node will be matched and created if not found). Does this make sense? I'd recommend asking grant if he has sometime to discuss this, I did demo this quite a bit before.

BTW, here is the corresponding query (for the above screenshots)

You can access the query at http://localhost:5050/specify/query/261/. Lmk if that link doesn't work /j.

Misc

Why do we care about the "highest" rank?
Just to be on the same page about trees.

Highest means from the bottom. Lower rankid, more higher in the tree.

There is no harm in always showing all the ranks to the user. That is, every time a relationship to tree (with ranks) appears, we can show all the ranks for all the trees. This simplifies implementation quite a bit (all that findMissingRank stuff won't be needed). However, this is bad for usability because, you know, some have like 27 ranks + multiple trees can't make it better. So, we show the least ranks in least trees needed to get stuff done. For a single tree, it'll be all the ranks below the highest.

But, wait -- do we always need that? If the user selects "Species" and "Genus", are there cases where we can show just those two ranks? Yes, in fact, if taxon is the base table, you can do that. But, if you are something like CO->det->taxon, we need to show all the ranks below the highest (genus).

(however, what about below the lowest rank, so instead of everything below genus, what about everything below species? why would that not work? -- think about this! there are cases where this won't work)

In the case of multiple tree, we can do the exact same thing, but for all the trees the selected ranks can be part of. In fact, we can only show just those trees that the user selected via ranks (rather than showing all trees).

Need to be careful here, there can be more than one path to the exact same tree. Think about "CO->CollectingEvent->CEA->taxon" and "CO->det->taxon".

misc again

Batch-edit is two steps as you know:

Query to dataset
Batch-edit within workbench framework

I actually got the first part done for multiple trees yesterday (sunday) - I was curious about it. It does follow the requirements that were mentioned. I don't expect you to use it (that's why I have not made a PR on sp7 repo). However, if you're curious about it, here is the PR equivalent on my fork of sp7 repo (comparing my changes to this branch): https://github.com/realVinayak/specify7/pull/1. The main function you'd be tracing is this. The above screenshots are actually from that branch. You might recognize the different tree names. Here are some brief feature notes:

Good things

Fields are added on the backend. Currently, they are not localized. However, the treedef name are added to the column name (to better disambiguate) -- you can see that in the above workbench screenshot (that's from the above PR), where "botany" and "paleo taxonomy tree" is added.
Only the most required fields, for the least ranks, in the least trees are shown. That is, if you selected Genus, and there happens to be ["Species"] and ["Species", "Subspecies"] below in two different trees, it'll just show those three. For example, in the above screenshot, the other ranks from the different trees were automatically added.
If you select "Species", and there is another tree with ranks ["rank1", "rank2"], that later tree won't be included (because ranks aren't there)
Filters are automatically, and correctly handled. The OR trick did it.
Currently, it acts on the RowPlanMap, so if there are more than 1 path to the same tree, they are still correctly handled.
It's generic for all trees

Bad things:

Doesn't show missing fields on the frontend. Currently, if just directly constructs the dataset from the query (without a popup). However, this can still be done -- it's more simpler than backend logic.
Doesn't do any localizing -- needs hacks.
This is just the first part of batch-edit. haven't spent more than 5 minutes on the workbench/upload/tree_record.py file when coding this, I don't know if the unit tests run correctly (probably need changes) and if the workbench part (component 2) of batch-edit works or not (probably doesn't). However (this might help you in your implementation) -- batch edit for treerecord is significantly more simpler than for uploadtables (because we can't have in-place edits, so all we're doing is just clone). I'd recommend going back to 4929 before the prod merge and checking it out for yourself how less complicated it is, compared to uploadtables (non-tree relationship) edits.
It doesn't necessarily order the columns correctly -- that's complicated.
It's quite complicated in general, will require unit tests
Might make future changes (if sp7 supports multiple treedef query) harder, although, it's quite considerate towards it.

Reflections
I did think about different variations -- doing insertions of fields on the frontend, vs on the backend. Since all the query construction happens on the backend, it does make things a bit easier -- at the cost of localizing, but that can be fixed with hacks... BUT, any of this is not supposed to mean there isn't a better and simpler implementation than what I did in that PR.

sharadsw · 2025-02-24T18:47:25Z

@realVinayak Thank you! Your approach is a lot cleaner than what I was trying to do. I'm gonna use your code if that's okay.

I'm not sure TreeRankRecord is needed for the upload plan. It seems upload works the same as long as the key follows the format <treename>~><rankname> but might as well keep TreeRankRecord just to keep it consistent between WB and Batch Edit. Upload will use the same code regardless

realVinayak · 2025-02-24T20:39:33Z

I'm gonna use your code if that's okay.

Yeah that's fine - it's open source and public. Still, if you feel you can simplify logic more, please do so!

It seems upload works the same as long as the key follows the format......

Yeah, I ended up making a conservative assumption that I need to use TreeRankRecord. I think the apply_batch_edit_pack for TreeRecord would need some adjustement (since, internally, the batch_edit.py uses the key of format <treename>~><rankname>~><treedef_id> but that won't be a hard change (just would require change to apply_batch_edit_pack).

EDIT: I just looked at the treerecord.py file. The changes necessary (from what it seems like) should be here (and not in apply_batch_edit_pack, spoke too fast).

specify7/specifyweb/workbench/upload/treerecord.py

Lines 1077 to 1081 in aa81431

    
           if tdi.name not in self.batch_edit_pack: 
        
               continue 
        
           columns = [pr.column for pr in self.parsedFields[tdi.name]] 
        
           info = ReportInfo(tableName=self.name, columns=columns, treeInfo=None) 
        
           pack = self.batch_edit_pack[tdi.name]

Specifically, just format the key to be something like <treename>~><rankname>~><treedef_id> format. Seems like the only change necessary here (to get it working, not necessarily bug free) would be to make those lines to be

ref_key = f"{tdi.treedef.name}~>{tdi.name}~>{tdi.treedef_id}"
tree_rank_record = TreeRankRecord(tdi.name, tdi.treedef_id)
if ref_key not in self.batch_edit_pack:
    continue
columns = [pr.column for pr in self.parsedFields[tree_rank_record]]
info = ReportInfo(tableName=self.name, columns=columns, treeInfo=None)
pack = self.batch_edit_pack[ref_key]

I haven't tested above but that's the idea. Still need to trace out later to see if there are any gotchas from the prod code changes.

Triggered by 9407376 on branch refs/heads/issue-6127

lexiclevenger

I noticed that many of the hidden relationship fields (i.e. Created By Agent > First name) that were added for #6011 throw errors if you query on them. Should that be addressed in this PR, or should I write it up separately?

Specify 7 Crash Report - 2025-03-10T18_51_50.087Z.txt

emenslin · 2025-03-10T19:32:34Z

I don't agree with ranks from other trees being carried over because it feels inconsistent and confusing. I could kind of understand lower ranks from the same tree being necessary but adding ranks from other trees feels like it would complicate things. This is just an example so not necessarily realistic, but this db has 3 different trees with overlapping ranks and even though I just wanted to map family, genus, species, and subspecies, it ends up adding 17 columns.

03-10_14.31.mp4

Aside from narrowing down the query by looking for specific taxa (which could still overlap between trees), could we add another way so you could select what trees you do or don't want before batch editing? I know some work was done on #5251 about being able to select the specific tree and then select the rank for each tree (like how workbench is), is there any plan to implement that or can we make a plan for it?

sharadsw · 2025-03-10T19:40:55Z

@lexiclevenger I think a separate issue for that would be great

sharadsw · 2025-03-10T20:12:45Z

@emenslin That's a very fair concern. We actually discussed this issue in a dev meeting before and came to this conclusion (#6196 (comment)) because we need the ability to query over any rank across all trees in the db - which means we cannot really restrict queries to a specific tree and need to add extra fields directly from the backend instead when batch editing.

One potential alternative we discussed was to have tree names in the query (like in the WB) along with an option (any tree) similar to (any rank) which would allow us to still have the ability to query across any rank+tree. However, we decided against it because it would be big in scope as it has implications in Queries as well as the Workbench.

I'm happy to have this in a separate issue as an improvement though. We will have to implement it if we decide to go ahead with #5251

emenslin · 2025-03-11T15:34:15Z

This sounds like the ideal solution in my opinion, I understand that it would likely take a while, but this seems a lot better usability wise than the current implementation. I know that not all users will have multiple trees, but the current implementation would be bad for users who do.

One potential alternative we discussed was to have tree names in the query (like in the WB) along with an option (any tree) similar to (any rank) which would allow us to still have the ability to query across any rank+tree

Some other solutions until we could implement that could be allowing users to collapse columns they don’t want to use, a dialog after pressing batch edit that could let users select which trees they want included, having the data set ordered by tree then rank instead of rank than tree. Using my original video for an example: show ranks for paleo taxonomy tree from family -> subspecies, then family -> subspecies for mammalogy, etc., instead of the current implementation where it shows each trees family first, then genus, then species, etc.

There's probably some better solutions out there but these were just some that I could think of. Any thoughts @specify/ux-testing?

sharadsw · 2025-03-11T21:32:50Z

@emenslin I like the idea of selecting trees from a dialog. It would add more development time but it would simplify batch edit datasets in #6126. Currently in #6126 when editing CO -> determinations you can end up with a ridiculously large number of tree rank columns because each missing rank from each tree will get added for each determination that exists in a CO.

Reordering the extra columns doesn't seem straightforward. I'll look it into it but we may have to push that to a different issue for later

Any thoughts on this? @grantfitzsimmons

grantfitzsimmons · 2025-03-12T01:08:16Z

To me, it seems that this is a necessary extension if we want to maximize the usefulness of batch edit in any collection with multiple trees / types. Since I believe that's what we want to do, I think it justifies the complexity.

@emenslin I like the idea of selecting trees from a dialog. It would add more development time but it would simplify batch edit datasets in #6126. Currently in #6126 when editing CO -> determinations you can end up with a ridiculously large number of tree rank columns because each missing rank from each tree will get added for each determination that exists in a CO.

sharadsw · 2025-03-12T20:55:19Z

Refactored a bunch of stuff to group missing ranks by tree name:

The plan is to add checkboxes next to tree name if the user has multiple trees. If there is only a single tree in the db, the dialog will only notify of missing ranks being added for that tree (and no checkbox)

Let me know if you have any suggestions @emenslin @grantfitzsimmons

Triggered by 7f2149d on branch refs/heads/issue-6127

- Adds checkboxes to tree names in missing ranks dialog - Splits the main batch edit file into 4 smaller files

sharadsw · 2025-03-13T19:54:52Z

This is what I have now

Users can pick which trees they want in case of conflicting ranks
Batch edit will only add ranks from selected trees
If no trees are selected, batch edit will use ALL trees like before

When selected ranks do not conflict but there are missing ranks:

Triggered by 1c29ec9 on branch refs/heads/issue-6127

Enable trees in queries

419e37c

sharadsw changed the base branch from production to issue-5413 February 5, 2025 14:48

sharadsw and others added 3 commits February 7, 2025 16:23

Use query construct code from #4929

93e2a0a

Update unit test

cff6f56

Merge branch 'issue-5413' into issue-6127

7e478b9

sharadsw commented Feb 7, 2025

View reviewed changes

specifyweb/stored_queries/query_construct.py Outdated Show resolved Hide resolved

realVinayak reviewed Feb 9, 2025

View reviewed changes

specifyweb/stored_queries/query_construct.py Outdated Show resolved Hide resolved

specifyweb/stored_queries/query_construct.py Outdated Show resolved Hide resolved

realVinayak reviewed Feb 9, 2025

View reviewed changes

specifyweb/frontend/js_src/lib/components/WbPlanView/navigatorSpecs.ts Outdated Show resolved Hide resolved

sharadsw added 3 commits February 13, 2025 17:40

Remove param_count

7d7cdd0

Display tree name in query error

b389372

Merge remote-tracking branch 'origin/issue-5413' into issue-6127

536fe0c

Update TreeRankQuery to fix implicit ORs

aa81431

- see: #6196 (comment)

Allow removing last row in Batch Edit

52e312f

Make a missing rank info dialog which proceeds to dataset creation

444eb32

sharadsw and others added 6 commits March 4, 2025 14:17

Fix frontend to many tree error

9407376

Lint code with ESLint and Prettier

937e865

Triggered by 9407376 on branch refs/heads/issue-6127

Restrict to manys only for tree fields

ebd827f

Merge remote-tracking branch 'origin/issue-6127' into issue-6127

f666936

Merge remote-tracking branch 'origin/issue-5413' into issue-6127

246d394

Merge branch 'issue-5413' into issue-6127

5e7d844

Base automatically changed from issue-5413 to production March 10, 2025 16:24

lexiclevenger reviewed Mar 10, 2025

View reviewed changes

lexiclevenger mentioned this pull request Mar 11, 2025

Batch Edit: Some tree rank relationship fields throw errors in the QB #6315

Open

sharadsw added 2 commits March 11, 2025 17:54

Fix multiple rank in row error

bf153f9

Merge remote-tracking branch 'origin/issue-6127' into issue-6127

1550bb8

sharadsw modified the milestones: 7.10.2, 7.10.3 Mar 11, 2025

sharadsw added 3 commits March 12, 2025 11:45

Fix navigator

ae42fdd

Fix tests

2835687

Group missing ranks by tree

7f2149d

sharadsw added 3 commits March 12, 2025 20:56

Lint code with ESLint and Prettier

9c46e15

Triggered by 7f2149d on branch refs/heads/issue-6127

Pass filtered treedef ids to the backend

b1173ff

- Adds checkboxes to tree names in missing ranks dialog - Splits the main batch edit file into 4 smaller files

Merge remote-tracking branch 'origin/issue-6127' into issue-6127

1c29ec9

sharadsw added 3 commits March 13, 2025 19:55

Lint code with ESLint and Prettier

d09d325

Triggered by 1c29ec9 on branch refs/heads/issue-6127

Filter trees used when rewriting batch edit dataset

e8d138e

Fix tests

1589ed8

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Batch edit for multiple trees #6196

Batch edit for multiple trees #6196

sharadsw commented Feb 5, 2025 •

edited

Loading

sharadsw commented Feb 7, 2025 •

edited

Loading

realVinayak left a comment •

edited

Loading

realVinayak commented Feb 9, 2025 •

edited

Loading

beach53 commented Feb 9, 2025 via email

realVinayak commented Feb 10, 2025

sharadsw commented Feb 18, 2025

realVinayak commented Feb 19, 2025

sharadsw commented Feb 19, 2025

realVinayak commented Feb 19, 2025 •

edited

Loading

sharadsw commented Feb 19, 2025 •

edited

Loading

realVinayak commented Feb 19, 2025

realVinayak commented Feb 19, 2025 •

edited

Loading

sharadsw commented Feb 20, 2025 •

edited

Loading

sharadsw commented Feb 20, 2025

realVinayak commented Feb 24, 2025 •

edited

Loading

sharadsw commented Feb 24, 2025

realVinayak commented Feb 24, 2025 •

edited

Loading

lexiclevenger left a comment

emenslin commented Mar 10, 2025

sharadsw commented Mar 10, 2025

sharadsw commented Mar 10, 2025

emenslin commented Mar 11, 2025

sharadsw commented Mar 11, 2025 •

edited

Loading

grantfitzsimmons commented Mar 12, 2025

sharadsw commented Mar 12, 2025 •

edited

Loading

sharadsw commented Mar 13, 2025

Batch edit for multiple trees #6196

Are you sure you want to change the base?

Batch edit for multiple trees #6196

Conversation

sharadsw commented Feb 5, 2025 • edited Loading

Batch editing with tree as base table

Checklist

Testing instructions

Test #6011

Test batch edit with a single tree in db

Test batch edit with multiple trees

sharadsw commented Feb 7, 2025 • edited Loading

realVinayak left a comment • edited Loading

Choose a reason for hiding this comment

realVinayak commented Feb 9, 2025 • edited Loading

First way:

Second way:

beach53 commented Feb 9, 2025 via email

realVinayak commented Feb 10, 2025

sharadsw commented Feb 18, 2025

realVinayak commented Feb 19, 2025

sharadsw commented Feb 19, 2025

realVinayak commented Feb 19, 2025 • edited Loading

sharadsw commented Feb 19, 2025 • edited Loading

realVinayak commented Feb 19, 2025

realVinayak commented Feb 19, 2025 • edited Loading

sharadsw commented Feb 20, 2025 • edited Loading

sharadsw commented Feb 20, 2025

realVinayak commented Feb 24, 2025 • edited Loading

sharadsw commented Feb 24, 2025

realVinayak commented Feb 24, 2025 • edited Loading

lexiclevenger left a comment

Choose a reason for hiding this comment

emenslin commented Mar 10, 2025

sharadsw commented Mar 10, 2025

sharadsw commented Mar 10, 2025

emenslin commented Mar 11, 2025

sharadsw commented Mar 11, 2025 • edited Loading

grantfitzsimmons commented Mar 12, 2025

sharadsw commented Mar 12, 2025 • edited Loading

sharadsw commented Mar 13, 2025

sharadsw commented Feb 5, 2025 •

edited

Loading

sharadsw commented Feb 7, 2025 •

edited

Loading

realVinayak left a comment •

edited

Loading

realVinayak commented Feb 9, 2025 •

edited

Loading

realVinayak commented Feb 19, 2025 •

edited

Loading

sharadsw commented Feb 19, 2025 •

edited

Loading

realVinayak commented Feb 19, 2025 •

edited

Loading

sharadsw commented Feb 20, 2025 •

edited

Loading

realVinayak commented Feb 24, 2025 •

edited

Loading

realVinayak commented Feb 24, 2025 •

edited

Loading

sharadsw commented Mar 11, 2025 •

edited

Loading

sharadsw commented Mar 12, 2025 •

edited

Loading