Datamodel defaults not being respected in Workbench Tree Matching #6319

Open
melton-jason opened this issue Mar 13, 2025 · 3 comments
Labels
2 - WorkBench Issues that are related to the WorkBench

Comments

@melton-jason
Contributor

melton-jason commented Mar 13, 2025

Describe the bug
If a tree field that has a default value defined in the datamodel (like isAccepted or isHybrid) is explicitly included in a WorkBench Data Set and does not contain any value for a row (i.e., is blank/null), Specify still uses a NULL value, rather than the datamodel default, for searching and matching purposes.

For a concrete example, consider three columns for any Tree rank in a Data Set: Genus -> name, Species -> name, and Species -> isHybrid, where Species -> isHybrid has the default matching behavior (Never Ignore, Allow Null Values, and Don't use a Default Value):

| Genus Name | Species Name | Species isHybrid |
|------------|--------------|------------------|
| TestGenus  | TestSpecies  |                  |
| TestGenus  | TestSpecies  |                  |

In a simplified explanation that demonstrates the Issue: on the first row, Specify searches for an existing Genus record with the name TestGenus. If one exists, Specify matches to that record; otherwise it creates one. Specify then searches for a Species record which has the name TestSpecies, has the TestGenus parent from the previous step, and has an empty (NULL) isHybrid, matching to the record if it exists or creating it otherwise. This search will always result in creating a new node, as there cannot be a Taxon record without an isHybrid value.
When Specify creates the TestSpecies node, it "passes through" the datamodel and sees that isHybrid has a default value defined: false; Specify replaces the empty isHybrid value with false before sending it to the database.

The process then repeats for the second row, with the exception that Specify will always match TestGenus to an existing Taxon record (it matches to the TestGenus of the previous row if that row created it).
Specify will not match TestSpecies, because it searches for Taxon records with an empty (NULL) isHybrid, whereas the TestSpecies record created from the first row was stored with isHybrid set to false.
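
The behavior described above can be reproduced in miniature. The following is a minimal sketch, not Specify's actual code: `ToyTaxonTable`, `match_or_create`, and `DATAMODEL_DEFAULTS` are hypothetical stand-ins, and the only detail taken from the report is that isHybrid defaults to false at creation time while matching uses the blank (NULL) value.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the datamodel default mentioned in the report.
DATAMODEL_DEFAULTS = {"isHybrid": False}


@dataclass
class ToyTaxonTable:
    """In-memory stand-in for the Taxon table (not Specify's real model)."""
    rows: list = field(default_factory=list)

    def match(self, **filters):
        # A filter value of None only matches rows whose stored value is None.
        return [r for r in self.rows
                if all(r.get(k) == v for k, v in filters.items())]

    def create(self, **values):
        # Defaults replace blank values only when the record is created.
        row = {k: (DATAMODEL_DEFAULTS.get(k) if v is None else v)
               for k, v in values.items()}
        self.rows.append(row)
        return row


def match_or_create(table, **values):
    found = table.match(**values)
    return found[0] if found else table.create(**values)


table = ToyTaxonTable()
for _ in range(2):  # two identical Data Set rows with a blank isHybrid
    match_or_create(table, name="TestSpecies", isHybrid=None)

print(len(table.rows))  # 2: the second row did not match the first
```

Because the stored row has isHybrid set to false and the search asks for an empty value, the second row never finds the record created by the first.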

duplicate_ranks.mov

To Reproduce
Steps to reproduce the behavior:

  1. Create a Data Set which minimally contains two columns (there can be other columns), each mapping to a specific Tree's rank (i.e., both columns map to one of Species, Genus, Family, etc.): one mapping to an identifying/text field (like Name, Author, etc.), and the other mapping to isHybrid or isAccepted
  2. Populate the Data Set with data
    a. Ensure all rows have identical information (same name, author, etc.) for the Tree record mapped with the isHybrid/isAccepted columns
    b. Leave the isHybrid or isAccepted columns blank
  3. Validate and/or upload the Data Set and observe that rows which should have matched with the Tree record instead are created as new, duplicate records

Expected behavior
Specify should do one or more of the following:

  • For matching purposes, if a field with a default has a NULL value, replace it with the field's default value
  • For fields with a default value, automatically have the Use Default Value option checked and filled with the field's default value
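
The first option could look roughly like the sketch below. It is an illustration under assumptions, not a proposed patch: `apply_datamodel_defaults` is a hypothetical helper, and the `defaults` mapping is assumed to be available from the datamodel (only the isHybrid default of false comes from this report).

```python
def apply_datamodel_defaults(filters, defaults):
    """Return a copy of `filters` where NULL values are replaced by defaults.

    Fields without a known default keep their NULL value.
    """
    return {
        name: defaults[name] if value is None and name in defaults else value
        for name, value in filters.items()
    }


# Assumed inputs: a blank isHybrid and a blank author on the same row.
raw_filters = {"name": "TestSpecies", "ishybrid": None, "author": None}
defaults = {"ishybrid": False}  # default for isHybrid as described in the report

print(apply_datamodel_defaults(raw_filters, defaults))
```

Only fields that actually define a default are rewritten, so columns where NULL is a meaningful search value (author here) keep their current matching behavior.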

Current Workarounds

  • Use the Ignore When Blank matching behavior for columns mapped to fields with default values
  • Use the Use Default Value option and explicitly define a default for the column
  • Modify the data in the Data Set such that all rows have data in the isHybrid and isAccepted columns
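
For the last workaround, blank cells can be filled before the file is imported. This is a minimal sketch assuming the Data Set starts as a CSV file with a column named "Species isHybrid" (as in the example above); `fill_blank_cells` is a hypothetical helper, and the literal "false" is an assumption, so use whatever boolean spelling your Data Set already uses.

```python
import csv
import io


def fill_blank_cells(csv_text, fill_values):
    """Return CSV text where blank cells in the given columns are filled in."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for column, value in fill_values.items():
            # Treat missing, empty, and whitespace-only cells as blank.
            if column in row and not (row[column] or "").strip():
                row[column] = value
        writer.writerow(row)
    return out.getvalue()


source = (
    "Genus Name,Species Name,Species isHybrid\n"
    "TestGenus,TestSpecies,\n"
    "TestGenus,TestSpecies,\n"
)
print(fill_blank_cells(source, {"Species isHybrid": "false"}))
```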

Please fill out the following information manually:

  • OS: macOS Sonoma (14.3)
  • Browser: Google Chrome 134.0.6998.45 (Official Build) (arm64)
  • Specify 7 Version: Reproduced in v7.9.6.2 and v7.10.0

Reported By
Fedor Steeman from the Natural History Museum of Denmark via the Speciforum: https://discourse.specifysoftware.org/t/workbench-wont-match-with-known-species/2398

@melton-jason melton-jason added the 2 - WorkBench Issues that are related to the WorkBench label Mar 13, 2025
@github-project-automation github-project-automation bot moved this to 📋 Backlog in Back-End Backlog Mar 13, 2025
@melton-jason
Contributor Author

For developer reference, the query for matching tree records is constructed in the lines below
(a snippet from v7.9.6.2; the code in production seems similar):

matches = list(model.objects.filter(
    definitionitem_id=to_match.treedefitem.id,
    **filters,
    **({'__'.join(["parent_id"]*(d+1)): parent['id']} if parent is not None else {})
).values('id', 'name', 'definitionitem__name', 'definitionitem__rankid')[:10])

At a low level, the direct cause of the Issue lies largely in the construction of the filters dictionary (which matches the field against the "incorrect" NULL value):

filters = {field: value for r in to_match.results for field, value in r.filter_on.items()}
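
To show what that comprehension produces for a row with a blank isHybrid, here is a small standalone sketch. The `SimpleNamespace` objects are hypothetical stand-ins for the real parse results; only the `results` and `filter_on` attribute names and the comprehension itself come from the snippet above.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the parsed column results of one Data Set row.
results = [
    SimpleNamespace(filter_on={"name": "TestSpecies"}),
    SimpleNamespace(filter_on={"ishybrid": None}),  # blank cell, no default applied
]
to_match = SimpleNamespace(results=results)

# Same comprehension as in the snippet above: flatten every filter_on mapping.
filters = {field: value for r in to_match.results for field, value in r.filter_on.items()}

print(filters)  # {'name': 'TestSpecies', 'ishybrid': None}
```

When a dictionary like this is unpacked into a Django `filter()` call, a value of None is interpreted as SQL NULL, so the query looks for records whose isHybrid is NULL rather than false.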

I'm not familiar enough with the upcoming changes to say whether the WorkBench changes for BatchEdit will resolve this Issue, or whether a fix still needs to be implemented separately.

@specifysoftware

This issue has been mentioned on Specify Community Forum. There might be relevant details there:

https://discourse.specifysoftware.org/t/workbench-wont-match-with-known-species/2398/5

@grantfitzsimmons grantfitzsimmons added regression This is behavior that once worked that has broken. Must be resolved before the next release. and removed regression This is behavior that once worked that has broken. Must be resolved before the next release. labels Mar 13, 2025
@sharadsw
Contributor

Potentially fixed already by the batch edit changes, but needs more testing.

