run_key_migration_functions django command #6308

Open
wants to merge 29 commits into
base: production

Member

@acwhite211 acwhite211 commented Mar 7, 2025

Fixes #6266
Fixes #6265
Fixes #6264
Fixes #6263
Fixes #6298

Creates a Django command to re-run the key functions from the Django migration process associated with the v7.10 release.

Checklist

  • Self-review the PR after opening it to make sure the changes look good and
    self-explanatory (or properly documented)
  • Add relevant issue to release milestone

Pre-Testing Instructions

  • Download the database from the test panel, set it up in your local docker db, restart docker, then run the django command, then test out Specify
  • Load up Specify 7 with this database sp6_new_div and run docker exec -it specify7-specify7-1 ve/bin/python manage.py run_key_migration_functions
  • To retry, reload the database before trying the command again by running docker exec -it specify7-mariadb-1 mysql -uroot -proot -e "drop database sp6_new_div; create database sp6_new_div;"; docker exec -it specify7-mariadb-1 sh -c 'mysql -uroot -proot sp6_new_div < /docker-entrypoint-initdb.d/sp6_new_div.sql';
  • To do a full test on a new database, you'll need to take a database that has gone through all the Django migration steps, then open that DB in Specify 6, then create a new collection or division or discipline, then open up the database again in Specify 7, then run the Django command, then test.
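
The retry steps above (drop and reload the database, then re-run the command) can be collected in one place. This is a minimal sketch that only builds the command lines quoted in these instructions and prints them for review. The helper name `retry_commands` is hypothetical, and the container names and root credentials are simply the ones quoted above, so they are assumptions about a local docker setup.

```python
import shlex

# Container names as quoted in the instructions above; they are
# assumptions about a local docker-compose setup.
APP_CONTAINER = "specify7-specify7-1"
DB_CONTAINER = "specify7-mariadb-1"


def retry_commands(db_name: str = "sp6_new_div") -> list[list[str]]:
    """Return argv lists that reload the test database and re-run the command."""
    recreate_sql = f"drop database {db_name}; create database {db_name};"
    reload_dump = (
        f"mysql -uroot -proot {db_name} "
        f"< /docker-entrypoint-initdb.d/{db_name}.sql"
    )
    return [
        ["docker", "exec", "-it", DB_CONTAINER,
         "mysql", "-uroot", "-proot", "-e", recreate_sql],
        ["docker", "exec", "-it", DB_CONTAINER, "sh", "-c", reload_dump],
        ["docker", "exec", "-it", APP_CONTAINER,
         "ve/bin/python", "manage.py", "run_key_migration_functions"],
    ]


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for argv in retry_commands():
        print(shlex.join(argv))
```

Printing rather than executing keeps the sketch safe to run anywhere; paste the output into a terminal once the names match your environment.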

I added sp6_new_div to the test-panel.

  • For UX-testing, please perform general testing, with a focus on geo features. Use the sp6_new_div after it has been fixed by the Django command.
  • Make sure each discipline and taxon tree has an associated cog type.
  • Check the correctness of the schema-config for collection, division, discipline, cot, and cog. (more details in testing instructions)
  • Check that permissions for each of the users works as intended.

For testing context, here are related PRs regarding migration functions:

specify schema config migration step functions involved:

business rule migration step functions involved:

permissions migration step functions involved:

Testing Instructions

Schema config

  • Open up the schema config page.
  • Check that the following schema config table pages load without error, and the field captions and descriptions start with a capital letter:
    • CollectionObjectType
    • CollectionObjectGroupType
    • CollectionObjectGroup
    • CollectionObjectGroupJoin
    • SpUserExternalId
    • SpAttachmentDataSet
    • UniquenessRule
    • UniquenessRuleField
    • Message
    • SpMerging
    • UserPolicy
    • UserRole
    • Role
    • RolePolicy
    • LibraryRole
    • LibraryRolePolicy
    • SpDataSet
    • AbsoluteAge
    • RelativeAge
    • TectonicUnitTreeDef
    • TectonicUnitTreeDefItem
    • TectonicUnit
    • RelativeAgeCitation
    • RelativeAgeAttachment
    • AbsoluteAgeCitation
    • AbsoluteAgeAttachment
    • Collectionobject
    • Collection
    • Geographytreedef
    • Geologictimeperiodtreedef
    • Lithostrattreedef
    • Storage
  • Verify that the fields described in this file appear correctly in the schema config field caption and description https://github.com/specify/specify7/blob/production/specifyweb/specify/migration_utils/sp7_schemaconfig.py
| Table Name | Field Name | Type | Value |
| --- | --- | --- | --- |
| SpLocalContainer | table | Description | Changes relating to local container tables in migrations |
| SpUserExternalId | table | Description | Stores provider identifiers and tokens for users who sign in using Single Sign On (SSO). |
| CollectionObject | relativeAges | field | relativeAges |
| CollectionObject | absoluteAges | field | absoluteAges |
| CollectionObject | cojo | field | cojo |
| CollectionObjectGroup | children | field | children |
| CollectionObjectGroup | parentCog | field | parentCog |
| PaleoContext | tectonicUnit | field | tectonicUnit |
| CollectionObjectGroup | guid | field | GUID for Collection Object Group |
| CollectionObjectGroup | cojo | field | Parent COG; connects a Collection Object Group to its parent Collection Object Group for managing hierarchy |
| CollectionObjectGroup | cogType | Caption | Type |
| CollectionObjectGroup | cogType | Description | Determines the logic Specify should use when managing the children within that COG |
| CollectionObjectGroup | isPrimary | field | Designates the primary Collection Object within a Collection Object Group |
| CollectionObjectGroup | isSubstrate | field | Indicates the Collection Object serving as the physical base for other items within the Collection Object Group |
| CollectionObjectGroup | igsn | field | International Generic Sample Number (IGSN) for unique identification of physical samples |
| CollectionObjectGroup | yesno2 | field | yesno2 |
| CollectionObjectGroup | yesno3 | field | yesno3 |
| SpUserExternalId | table | Description | Stores provider identifiers and tokens for users who sign in using Single Sign On (SSO). |
| UniquenessRule | table | Description | Stores table names in the data model that have uniqueness rules configured for each discipline |
| UniquenessRuleField | field | Description | Stores field names linked to UniquenessRule records for uniqueness rule configuration |
| AbsoluteAge | yesno2 | field | yesno2 |
| RelativeAge | yesno2 | field | yesno2 |
| CollectionObjectType | cogTypeId | field | Collection Object Group Type ID |
| CollectionObjectType | taxonTreeDef | field | Taxon Tree associated with this Collection Object Type |
| TectonicUnit | guid | field | GUID for Tectonic Unit |
| TectonicUnit | tectonicUnitId | field | Tectonic Unit ID |
| TectonicUnitTreeDefItem | createdbyagent | field | Created By Agent |
| TectonicUnitTreeDefItem | rankId | field | Rank ID |
| LibraryRole | table | Description | Stores names and descriptions of default roles available to any collection |
| LibraryRolePolicy | field | Description | Stores resource and action permissions for default roles applicable to collections |
| Role | table | Description | Stores user-created role names, descriptions, and collection associations |
| RolePolicy | field | Description | Permissions for resources and actions for user-created roles |
| CollectionObjectGroup | collection | field | Collection |
| CollectionObjectType | collectionObjectTypeId | field | Collection Object Type ID |
| StorageTreeDef | institution | field | institution |
| Storage | uniqueIdentifier | field | uniqueIdentifier |
| CollectionObjectGroupJoin | childCo | field | Child Collection Object |
| CollectionObjectGroupJoin | childCog | field | Child Collection Object Group |
| CollectionObjectGroupJoin | ParentCog | field | Parent Collection Object Group |
| CollectionObjectGroupJoin | isPrimary | field | Indicates the primary Collection Object in a Consolidated COG |
| CollectionObjectGroupJoin | isSubstrate | field | Indicates substrate Collection Object within a COG |
| AbsoluteAgeCitation | absoluteAgeCitationId | field | absoluteAgeCitationId |
| RelativeAgeCitation | relativeAgeCitationId | field | relativeAgeCitationId |
  • Repeat checking the schema config for each collection
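
The "start with a capital letter" check is tedious to do by eye across every table and collection. The following is a minimal sketch of how it could be automated once the captions and descriptions have been exported by whatever means is convenient. The `SchemaEntry` type, the `uncapitalized` helper, and the sample rows are all hypothetical and are not part of the Specify code base.

```python
from typing import NamedTuple


class SchemaEntry(NamedTuple):
    """One exported schema config item (hypothetical shape)."""
    table: str
    field: str
    caption: str
    description: str


def uncapitalized(entries: list[SchemaEntry]) -> list[tuple[str, str, str]]:
    """Return (table, field, kind) for texts that do not start with a capital."""
    problems = []
    for entry in entries:
        for kind, text in (("caption", entry.caption),
                           ("description", entry.description)):
            stripped = text.strip()
            # Empty text is skipped here; treat it as a separate check if needed.
            if stripped and not stripped[0].isupper():
                problems.append((entry.table, entry.field, kind))
    return problems


if __name__ == "__main__":
    # Illustrative rows only, not real schema config data.
    sample = [
        SchemaEntry("ExampleTable", "fieldOne", "Field one", "Describes field one"),
        SchemaEntry("ExampleTable", "fieldTwo", "field two", "Describes field two"),
    ]
    for table, field, kind in uncapitalized(sample):
        print(f"{table}.{field}: {kind} does not start with a capital letter")
```

Running the same check for each collection gives a list of offenders to paste into a review comment.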

Business Rules

  • Click on Trees
  • Verify Tectonic Unit is now an option (NOTE: If you click on Tectonic Unit you'll get a blank page. See: Make tectonic tree datamodel consistent with other trees #5318 (comment))
  • Go to the SchemaConfig page for the CollectionObject table
  • Ensure the "CatalogNumber must be unique to Collection" Uniqueness Rule which used to be readonly is now editable
  • Modify or delete the Uniqueness Rule
  • Save a CollectionObject with a duplicate catalognumber in the collection

Permissions

  • Test that permissions for attachment dataset are enforced for batch image upload.
  • If you don't have create permission, you cannot create it.
  • If you don't have upload permission, you cannot save it.
  • If you don't have upload permission, you cannot upload it.
  • If you don't have rollback permission, you cannot roll it back.
  • If you don't have delete permission, you cannot delete it.
  • Test that assigning the Batch Attachment Import role correctly assigns those permissions.
  • Use a Specify user that doesn't have read / update permission for collection object, read / create / delete permission for collection object attachment, or read / create / delete permission for attachment.
  • First test that you cannot use batch import to upload attachments (It should say permission error in progress column)
  • Assign Batch Attachment Import role, and make sure you can upload attachments

@acwhite211 acwhite211 added this to the 7.10.1 milestone Mar 7, 2025
@acwhite211 acwhite211 self-assigned this Mar 7, 2025
@acwhite211 acwhite211 changed the title run_key_migrations_functions django command run_key_migration_functions django command Mar 7, 2025
@acwhite211
Member Author

Working on getting the permission operations logged in the audit log table.

@acwhite211 acwhite211 marked this pull request as ready for review March 10, 2025 14:10
Triggered by ec58255 on branch refs/heads/issue-6266
@grantfitzsimmons grantfitzsimmons modified the milestones: 7.10.1, 7.10.2 Mar 10, 2025
@acwhite211
Member Author

@specify/dev-testing testing can be performed. @grantfitzsimmons can you use your database from sp6 for testing?

@acwhite211
Member Author

acwhite211 commented Mar 10, 2025

Related PRs regarding migration functions

specify schema config migration step functions involved:

business rule migration step functions involved:

permissions migration step functions involved:

Member

@grantfitzsimmons grantfitzsimmons left a comment

The more I look into it, the scope of this issue is really big.

It fixes 5 large PRs and combines all of the logic into a single Django command. This updates a huge number of critical migrations.

I really think we need to test this thoroughly with a number of databases in various states:

  1. On 7.10 database with all migrations
    • Unchanged
    • With a new division, discipline, and collection
    • With only a new discipline & collection
    • With only a new collection
  2. On 7.9.6.2 database with all migrations
    • Unchanged
    • With a new division, discipline, and collection
    • With only a new discipline & collection
    • With only a new collection
  3. On old databases with older migrations (>7.8)
  4. On a database freshly created with Specify 6
    • Unchanged
    • With a new division, discipline, and collection
    • With only a new discipline & collection
    • With only a new collection

Also, we'll want to prioritize databases that have 5-10+ disciplines already to make sure duplicate records are not created. It is not immediately obvious if we duplicate splocale* records or if we reassign permissions, especially if we only test using the admin account. We need to create new accounts in 7 and 6 to verify the behavior is consistent/desirable.

We'll even need to:

  • Delete the default COT for a collection and verify it is recreated upon restart
  • Remove all COT assignments for COs and verify they are set correctly
  • Make sure permissions are not re-assigned based on Specify 6 permissions when the container is restarted if they have been changed in 7
    • Create a new user in Specify 7, assign a Specify 6 user group, and save. Verify that permissions are not set based on that user group.
    • Verify that Specify 7 permissions are not overridden by the migration
    • Create a new user in Specify 6, assign a set of permissions, and save. Verify that permissions are created automatically for that user.

In short, we need really comprehensive testing instructions for this. There may be some things mentioned above that are not handled in this PR, but in any case, we need to understand what the ramifications of these changes are and have detailed instructions about what to look out for.

Summary

  1. We need to make sure, above all else, that permissions are not being reassigned or granted when that is undesirable.
  2. User permissions need to be set so they can work in the database.
  3. We need to make sure duplicate records are not introduced (in the splocale* tables, COTs, uniqueness rules, etc.).
  4. We need to make sure that migrations work correctly in a number of environments, with databases of varying complexity (both databases with 1 div, 1 disc, and 1 coll and databases with dozens of disciplines, collections, etc. and custom configurations, various states of being updated)
  5. We need to make sure that new collections and disciplines added in Specify 6 have the appropriate defaults set for them
  6. When a collection is missing a default COT, this needs to be set to the default tree for the discipline.
  7. This needs to be documented behavior. For instance, if we are adding a new COT for each taxon tree in a discipline, that needs to be documented. This won't occur until the container restarts, introducing a discrepancy between the process in the UI when the user creates a new tree and the automatic actions that take place when the container starts.
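
Point 3 of the summary (no duplicate records) lends itself to a mechanical check before and after running the command. The following is a minimal sketch, assuming the relevant rows have already been reduced to a hashable key such as (discipline, table, field). The `find_duplicates` helper and the sample keys are hypothetical and not taken from the PR.

```python
from collections import Counter
from collections.abc import Hashable, Iterable


def find_duplicates(keys: Iterable[Hashable]) -> dict[Hashable, int]:
    """Return each key that occurs more than once, with its count."""
    counts = Counter(keys)
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical (discipline id, table, field) keys for schema config items.
    rows = [
        (3, "collectionobject", "cojo"),
        (3, "collectionobject", "cojo"),
        (3, "collectionobject", "catalogNumber"),
    ]
    for key, count in find_duplicates(rows).items():
        print(f"{key} appears {count} times")
```

Comparing the result before and after a container restart shows whether the command introduced new duplicates or only surfaced ones that already existed.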

@CarolineDenis CarolineDenis requested a review from a team March 11, 2025 14:18
@acwhite211
Member Author

I added the Django command to the docker-entrypoint file, so the Django command will now run when the instance starts up.

Collaborator

@lexiclevenger lexiclevenger left a comment

Schema config

  • Open up the schema config page.
  • Check that the following schema config table pages load without error, and the field captions and descriptions start with a capital letter:
    • CollectionObjectType
    • CollectionObjectGroupType
    • CollectionObjectGroup
    • CollectionObjectGroupJoin
    • SpUserExternalId
    • SpAttachmentDataSet
    • UniquenessRule
    • UniquenessRuleField
    • Message
    • SpMerging
    • UserPolicy
    • UserRole
    • Role
    • RolePolicy
    • LibraryRole
    • LibraryRolePolicy
    • SpDataSet
    • AbsoluteAge
    • RelativeAge
    • TectonicUnitTreeDef
    • TectonicUnitTreeDefItem
    • TectonicUnit
    • RelativeAgeCitation
    • RelativeAgeAttachment
    • AbsoluteAgeCitation
    • AbsoluteAgeAttachment
    • Collectionobject
    • Collection
    • Geographytreedef
    • Geologictimeperiodtreedef
    • Lithostrattreedef
    • Storage
  • Verify that the fields described in this file appear correctly in the schema config field caption and description https://github.com/specify/specify7/blob/production/specifyweb/specify/migration_utils/sp7_schemaconfig.py

Only tested the Schema Config so far. I tested on the InvertPaleo collection on the sp6_new_div database. Based on the table provided, here are the issues I found.

  1. AbsoluteAges, RelativeAges, and cojo are duplicated in the Schema Config CO table.
  2. CollectionObjectGroup table has cojo field instead of parentCOG
  3. CollectionObjectGroup has duplicates of nearly all fields
  4. CollectionObjectGroup guid description is "GUID"
  5. CO and CollectionObjectGroup cojo description is "This connects a Collection Object Group to its parent Collection Object Group, which is used for managing a hierarchy."
  6. CollectionObjectGroup cogType caption is "Type"
  7. CollectionObjectGroup cogType description is "Cog Type"
  8. CollectionObjectGroup has no isPrimary or isSubstrate
  9. CollectionObjectGroup igsn description is "An International Generic Sample Number (IGSN) provides an unambiguous globally unique and persistent identifier for physical samples."
  10. CollectionObjectGroup and AbsoluteAge yesNo fields have different capitalization patterns, but this is also present in production.
  11. UniquenessRuleField description is "Stores field names in the data model that have uniqueness rules configured for each discipline, linked to UniquenessRule records."
  12. CollectionObjectType has several duplicate fields
  13. CollectionObjectType has no cogTypeID field
  14. CollectionObjectType taxonTreeDef description is "CollectionObjectType."
  15. TectonicUnit guid description is "GUID"
  16. LibraryRole description is "Stores names and descriptions of default roles that can be added to any collection."
  17. LibraryRolePolicy description is "Stores resource and action permissions for library roles within a collection."
  18. Role description is "Stores names, descriptions, and collection information for user-created roles."
  19. RolePolicy description is "Stores resource and action permissions for user-created roles within a collection."
  20. Storage has no uniqueIdentifier field
  21. Collectionobjectgroupjoin childCO description is "Child Co"
  22. Collectionobjectgroupjoin childCOG description is "Child Cog"
  23. Collectionobjectgroupjoin parentCOG description is "Parent Cog"
  24. Collectionobjectgroupjoin isPrimary description is "Is Primary"
  25. Collectionobjectgroupjoin isSubstrate description is "Is Substrate"

@lexiclevenger lexiclevenger requested a review from a team March 11, 2025 19:48
@CarolineDenis
Contributor

Here is a link to the updated caption and description in schema config https://docs.google.com/spreadsheets/d/1mOXnCpCrwc2X-Sl2MeMmiN7dihCVpQVpQ9n5LiD_cIY/edit?gid=2096956804#gid=2096956804

@CarolineDenis
Contributor

CarolineDenis commented Mar 11, 2025

NOTES:

  • Some tables have all their fields duplicated in schema config
  • Captions and descriptions are correct in the collections I opened
  • Some collections have the cat num uniqueness rule twice
  • Cannot verify COGType default from instructions
  • Cannot create new COGType
  • When creating a new COType, then using QB to search for a taxon tree there is no taxon tree returned. Same if directly query on TTD table in QB
  • In tectonic unit data entry form, cannot use the magnifying glass to open search for parent
  • Tectonic unit tree is a blank page but shouldn't be as of: Add tectonic default ranks #5316
  • There is 0 form def in the app resources, is that expected?

@acwhite211
Copy link
Member Author

Found an issue where duplicate schema config fields were being created in the update_hidden_prop function from the 0023 migration. Pushed a fix. Looking into creating some other fixes as well.

acwhite211 and others added 7 commits March 12, 2025 01:05
Previously, CO rules containing catalogNumber as a field and
collection as a scope were being considered as candidates.

e.g., CO catalogNumber and text1 must be unique in collection
would be a candidate and returned.

We're only interested in the exact rule: CollectionObject
catalogNumber must be unique in Collection
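
The difference between the looser and the exact candidate check can be illustrated with a small stand-alone sketch. The `Rule` dataclass and both predicate names are hypothetical simplifications, not the actual uniqueness rule models or the functions changed in this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Simplified stand-in for a uniqueness rule (not the real model)."""
    table: str
    fields: tuple[str, ...]
    scopes: tuple[str, ...]


def is_catnum_rule_loose(rule: Rule) -> bool:
    """Looser check: the rule merely contains the field and the scope."""
    return (
        rule.table.lower() == "collectionobject"
        and "catalognumber" in [f.lower() for f in rule.fields]
        and "collection" in [s.lower() for s in rule.scopes]
    )


def is_catnum_rule(rule: Rule) -> bool:
    """Exact check: catalogNumber is the only field, collection the only scope."""
    return (
        rule.table.lower() == "collectionobject"
        and [f.lower() for f in rule.fields] == ["catalognumber"]
        and [s.lower() for s in rule.scopes] == ["collection"]
    )


exact = Rule("CollectionObject", ("catalogNumber",), ("collection",))
compound = Rule("CollectionObject", ("catalogNumber", "text1"), ("collection",))

if __name__ == "__main__":
    for rule in (exact, compound):
        print(rule.fields, is_catnum_rule_loose(rule), is_catnum_rule(rule))
```

The compound rule passes the loose check but not the exact one, which is the case the commit message describes.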
Contributor

@melton-jason melton-jason left a comment

Wowieee what a big PR! A big PR for a big problem I suppose!
I haven't been through all of the changes yet (I've looked through the business rule and permission related functionality), but I figured I'd leave the review so discussion and feedback on the parts that have comments can commence.

I did push some changes, most of which have only been code-cleanup and refactoring (but there were some big fixes as well).
While some of the changes have been addressed in the comments of this review, a complete overview of the commits I have pushed can be found at: https://github.com/specify/specify7/pull/6308/files/5a1875d8c839240220726d28072a776c02e693fd..44fd03e186ade8b759f2b61c761fa7b6b7da3c93

I'm really looking forward to seeing how this PR turns out!
Might not seem like it now, but I imagine it's going to save a lot of hours and headaches for us and eventually users 🏆

Contributor

I made some changes in ecbb2df to both:

  • move the catnum_rule_editable and catnum_rule_uneditable to a migration related file instead of a uniquenessrule related file
    • Just personal preference here. Specifically, these should only ever be called in the context of migrations. I would argue that these functions are more migration-related than they are uniquenessrule-related.
    • Let me know what you think of the move!
  • fix a potential bug where rules only containing catalogNumber as a field and collection as scope would be considered as candidate rules. More details in the commit description!

        collection=None,
        specifyuser_id=user.id,
        resource="%",
        action="%",
    )
    if is_new:
        auditlog.insert(user_policy, None)

Contributor

In 11498e2 (#6308) I went through the usages of the audit log in the permission migration and removed the use of the Specifyuser (user) as the second argument in the auditlog calls.
This argument is supposed to represent the Agent the auditing changes are being attributed to (specifically the createdByAgent and modifiedByAgent of the Spauditlog records).

We could instead follow something similar to the changes prior to 11498e2 and attribute the changes to the Agent (if applicable) of the User whose permission we just modified/created.

Personally, I think it makes the most sense to not attribute these migration changes to any Agent. An Agent did not explicitly make the changes, and I think it's intuitive that if a change has a NULL Agent it could have been done by the system.

Although, I am not too familiar with the use case defined in #6265, so whichever approach would be the most beneficial!

For reference, here is the "sink" of the auditing functions:

agent_id = agent if isinstance(agent, int) else (agent and agent.id)

createdbyagent_id=agent_id,
modifiedbyagent_id=agent_id)
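
To make the behavior of that expression concrete, here is a minimal sketch that wraps the quoted line in a function and feeds it the three kinds of input discussed above. The function name `resolve_agent_id` is hypothetical, and `SimpleNamespace` stands in for an Agent-like object.

```python
from types import SimpleNamespace
from typing import Any, Optional


def resolve_agent_id(agent: Any) -> Optional[int]:
    """Same expression as the quoted line: accept an id, an object with .id, or None."""
    return agent if isinstance(agent, int) else (agent and agent.id)


if __name__ == "__main__":
    print(resolve_agent_id(None))                   # no agent attributed
    print(resolve_agent_id(42))                     # already an id
    print(resolve_agent_id(SimpleNamespace(id=7)))  # agent-like object
```

Any object with an `id` attribute is accepted, so passing a user object would record the user's id as if it were an agent id. Passing `None` leaves both agent columns NULL.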

Contributor

Are we planning on adding more auditing to the permission migrations?
I guess: how much auditing coverage do we need to meet the requirements of #6265?

I know the Issue states:

The command must log all changes made during the migration process.

and if this is the case, then we'll need to also log every SpUserRole, LibraryRole, and all of their policies which get deleted (if we do want to wipe all permission) and created.

This sounds like it would create a LOT of audit log entries, relatively speaking.
Currently the system to automatically clean the AuditLog is fragile and dependent on the user having an AUDIT_LIFESPAN_MONTHS global preference: if one is not defined, then the AuditLog is never cleaned.

match = re.search(r'AUDIT_LIFESPAN_MONTHS=(.+)', get_global_prefs())

Space might not be a concern for some institutions, just as long as we're considerate of setups with potentially limited storage (it could be us one day! 😅)
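
The dependency on the preference can be sketched as follows. Only the regular expression comes from the quoted line; the function name `audit_lifespan_months`, the integer conversion, and the fallback to `None` are assumptions made for illustration, since the code that consumes the match is not shown in this thread.

```python
import re
from typing import Optional


def audit_lifespan_months(global_prefs: str) -> Optional[int]:
    """Return the configured lifespan in months, or None when it is not set."""
    match = re.search(r"AUDIT_LIFESPAN_MONTHS=(.+)", global_prefs)
    if match is None:
        # No preference defined: nothing tells the cleanup how far back to prune.
        return None
    try:
        return int(match.group(1).strip())
    except ValueError:
        # Present but not an integer: treat as unset rather than guessing.
        return None


if __name__ == "__main__":
    print(audit_lifespan_months("ui.theme=dark\nAUDIT_LIFESPAN_MONTHS=24\n"))
    print(audit_lifespan_months("ui.theme=dark\n"))
```

A caller that skips cleanup whenever the result is `None` reproduces the "never cleaned" behavior described above.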

            f"Field does not exist in latest state of the datamodel, skipping Schema Config entry for: {table_name} -> {field_name}"
        )
        return
    except AttributeError:

Contributor

(note)

Just curious, and I can check if you don't recall, but do you remember which fields were giving you an AttributeError error here?
I assume the error was caused because some field in Table.all_fields or Table.virtual_fields was None?

def get_field_strict(self, fieldname: str) -> Union["Field", "Relationship"]:
    fieldname = fieldname.lower()
    for field in self.all_fields:
        if field.name.lower() == fieldname:
            return field
    for field in self.virtual_fields:
        if field.name.lower() == fieldname:
            return field
    # if self.table == 'collectionobject' and fieldname == 'age': # TODO: This is temporary for testing, more comprehensive solution to come.
    #     return Field(name='age', column='age', indexed=False, unique=False, required=False, type='java.lang.Integer', length=0)
    raise FieldDoesNotExistError(_("Field %(field_name)s not in table %(table_name)s. ") % {'field_name':fieldname, 'table_name':self.name} +
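
The hypothesis in the question above (a `None` entry in the field list) can be reproduced in isolation. This is a minimal sketch with the lookup reduced to a plain list; `find_field` is a hypothetical stand-in for the loop in `get_field_strict`, and it does not establish which field actually triggered the error in the migration.

```python
from types import SimpleNamespace


def find_field(fields, fieldname):
    """Same lookup shape as the quoted loop, reduced to a plain list."""
    fieldname = fieldname.lower()
    for field in fields:
        if field.name.lower() == fieldname:
            return field
    raise LookupError(f"Field {fieldname} not found")


fields_with_gap = [SimpleNamespace(name="catalogNumber"), None]

try:
    find_field(fields_with_gap, "age")
except AttributeError as error:
    # Reached because None has no .name attribute.
    print(f"AttributeError: {error}")
```

If the real cause is different (for example a field object missing a `name`), the same `except AttributeError` branch would still be reached, so the branch alone does not tell the two cases apart.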

Comment on lines 45 to 48
def fix_permissions():
    initialize(True, apps)
    add_permission(apps)
    add_stats_edit_permission(apps)

Contributor

Regarding the initialize function, I don't think we want to wipe all Specify 7 permission related information.

def wipe_permissions(apps = apps) -> None:
    RolePolicy = apps.get_model('permissions', 'RolePolicy')
    UserRole = apps.get_model('permissions', 'UserRole')
    Role = apps.get_model('permissions', 'Role')
    LibraryRolePolicy = apps.get_model('permissions', 'LibraryRolePolicy')
    LibraryRole = apps.get_model('permissions', 'LibraryRole')
    UserPolicy = apps.get_model('permissions', 'UserPolicy')
    RolePolicy.objects.all().delete()
    UserRole.objects.all().delete()
    Role.objects.all().delete()
    LibraryRolePolicy.objects.all().delete()
    LibraryRole.objects.all().delete()
    UserPolicy.objects.all().delete()

The wipe is completely global, and all Specify 7 related permission information will be removed (and then reset to defaults) for every Division, Discipline, Collection, etc.


And in a similar vein to the above comment about uniqueness rules, (especially once the wipe is removed from the initialize function) a lot of the operations in these functions will create duplicates for already existing permission information.
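
One way to avoid such duplicates without wiping anything is to make each creation step idempotent. The following is a minimal sketch of that idea using an in-memory set; the `Policy` tuple and the `ensure_policy` helper are hypothetical stand-ins, not the real permission models. In Django the same pattern is available through `QuerySet.get_or_create`, which returns the object together with a flag saying whether it was created.

```python
from typing import NamedTuple, Optional


class Policy(NamedTuple):
    """Simplified stand-in for a user policy row (not the real model)."""
    collection_id: Optional[int]
    specifyuser_id: int
    resource: str
    action: str


def ensure_policy(existing: set[Policy], policy: Policy) -> bool:
    """Add the policy only if it is missing; return True when it was created."""
    if policy in existing:
        return False
    existing.add(policy)
    return True


if __name__ == "__main__":
    store: set[Policy] = set()
    admin_policy = Policy(None, 1, "%", "%")
    print(ensure_policy(store, admin_policy))  # first run creates the row
    print(ensure_policy(store, admin_policy))  # second run is a no-op
    print(len(store))
```

Because the second call changes nothing, the command could run on every container start without growing the tables, and only the rows reported as created would need an audit log entry.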

Labels
None yet
Projects
Status: Dev Attention Needed
5 participants