-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
run_key_migration_functions django command #6308
base: production
Are you sure you want to change the base?
Conversation
Working on getting the permission operations logged in the autolog table. |
Triggered by ec58255 on branch refs/heads/issue-6266
Triggered by fd1f76e on branch refs/heads/issue-6266
@specify/dev-testing testing can be performed. @grantfitzsimmons can you use your database from sp6 for testing? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The more I look into it, the scope of this issue is really big.
It fixes 5 large PRs and combines all of the logic into a single Django command. This updates a huge number of critical migrations.
I really think we need to test this thoroughly with a number of databases in various states:
- On 7.10 database with all migrations
- Unchanged
- With a new division, discipline, and collection
- With only a new discipline & collection
- With only a new collection
- On 7.9.6.2 database with all migrations
- Unchanged
- With a new division, discipline, and collection
- With only a new discipline & collection
- With only a new collection
- On old databases with older migrations (>7.8)
- On a database freshly created with Specify 6
- Unchanged
- With a new division, discipline, and collection
- With only a new discipline & collection
- With only a new collection
Also, we'll want to prioritize databases that have 5-10+ disciplines already to make sure duplicate records are not created. It is not immediately obvious if we duplicate splocale*
records or if we reassign permissions, especially if we only test using the admin account. We need to create new accounts in 7 and 6 to verify the behavior is consistent/desirable.
We'll even need to:
- Delete the default COT for a collection and verify it is recreated upon restart
- Remove all COT assignments for COs and verify they are set correctly
- Make sure permissions are not re-assigned based on Specify 6 permissions when the container is restarted if they have been changed in 7
- Create a new user in Specify 7, assign a Specify 6 user group, and save. Verify that permissions are not set based on that user group.
- Verify that Specify 7 permissions are not overridden by the migration
- Create a new user in Specify 6, assign a set of permissions, and save. Verify that permissions are created automatically for that user.
In short, we need really comprehensive testing instructions for this. There may be some things mentioned above that are not handled in this PR, but in any case, we need to understand what the ramifications of these changes are and have detailed instructions about what to look out for.
Summary
- We need to make sure, above all else, that permissions are not being reassigned or granted when that is undesirable.
- User permissions need to be set so they can work in the database.
- We need to make sure duplicate records are not introduced (in the
splocale*
tables, COTs, uniqueness rules, etc.). - We need to make sure that migrations work correctly in a number of environments, with databases of varying complexity (both databases with 1 div, 1 disc, and 1 coll and databases with dozens of disciplines, collections, etc. and custom configurations, various states of being updated)
- We need to make sure that new collections and disciplines added in Specify 6 have the appropriate defaults set for them
- When a collection is missing a default COT, this needs to be set to the default tree for the discipline.
- This needs to be documented behavior– for instance, if we are adding a new COT for each taxon tree in a discipline, that needs to be documented. This won't occur until the container restarts, introducing a discrepancy between the process in the UI when the user creates a new tree and the automatic actions that take place when the container starts.
I added the Django command to the docker-entypoint file. So, the Django command will now run when starting up the instance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Schema config
- Open up the schema config page.
- Check that the following schema config table pages load without error, and the field captions and descriptions start with a capital letter:
- CollectionObjectType
- CollectionObjectGroupType
- CollectionObjectGroup
- CollectionObjectGroupJoin
- SpUserExternalId
- SpAttachmentDataSet
- UniquenessRule
- UniquenessRuleField
- Message
- SpMerging
- UserPolicy
- UserRole
- Role
- RolePolicy
- LibraryRole
- LibraryRolePolicy
- SpDataSet
- AbsoluteAge
- RelativeAge
- TectonicUnitTreeDef
- TectonicUnitTreeDefItem
- TectonicUnit
- RelativeAgeCitation
- RelativeAgeAttachment
- AbsoluteAgeCitation
- AbsoluteAgeAttachment
- Collectionobject
- Collection
- Geographytreedef
- Geologictimeperiodtreedef
- Lithostrattreedef
- Storage
- Verify that the fields described in this file appear correctly in the schema config field caption and description https://github.com/specify/specify7/blob/production/specifyweb/specify/migration_utils/sp7_schemaconfig.py
Only tested the Schema Config so far. I tested on the InvertPaleo collection on the sp6_new_div database. Based on the table provided, here are the issues I found.
- AbsoluteAges, RelativeAges, and cojo are duplicated in the Schema Config CO table.
- CollectionObjectGroup table has cojo field instead of parentCOG
- CollectionObjectGroup has duplicates of nearly all fields
- CollectionObjectGroup guid description is "GUID"
- CO and CollectionObjectGroup cojo description is "This connects a Collection Object Group to its parent Collection Object Group, which is used for managing a hierarchy."
- CollectionObjectGroup cogType caption is "Type"
- CollectionObjectGroup cogType description is "Cog Type"
- CollectionObjectGroup has no isPrimary or isSubstrate
- CollectionObjectGroup igsn description is "An International Generic Sample Number (IGSN) provides an unambiguous globally unique and persistent identifier for physical samples."
- CollectionObjectGroup and AbsoluteAge yesNo fields have different capitalization patterns, but this is also present in production.
- UniquenessRuleField description is "Stores field names in the data model that have uniqueness rules configured for each discipline, linked to UniquenessRule records."
- CollectionObjectType has several duplicate fields
- CollectionObjectType has no cogTypeID field
- CollectionObjectType taxonTreeDef description is "CollectionObjectType."
- TectonicUnit guid description is "GUID"
- LibraryRole description is "Stores names and descriptions of default roles that can be added to any collection."
- LibraryRolePolicy description is "Stores resource and action permissions for library roles within a collection."
- Role description is "Stores names, descriptions, and collection information for user-created roles."
- RolePolicy description is "Stores resource and action permissions for user-created roles within a collection."
- Storage has no uniqueIdentifier field
- Collectionobjectgroupjoin childCO description is "Child Co"
- Collectionobjectgroupjoin childCOG description is "Child Cog"
- Collectionobjectgroupjoin parentCOG description is "Parent Cog"
- Collectionobjectgroupjoin isPrimary description is "Is Primary"
- Collectionobjectgroupjoin isSubstrate description is "Is Substrate"
Here is a link to the updated caption and description in schema config https://docs.google.com/spreadsheets/d/1mOXnCpCrwc2X-Sl2MeMmiN7dihCVpQVpQ9n5LiD_cIY/edit?gid=2096956804#gid=2096956804 |
NOTES:
|
Found an issue where duplicate schema config fields were being creating in the update_hidden_prop function from the 0023 migration. Pushed a fix. Looking into creating some other fixes as well. |
Previously, CO rules containing catalogNumber as a field and collection as a scope were being considered as candidates. e.g., CO catalogNumber and text1 must be unique in collection would be a candidate and returned. We're only interested in the exact rule: CollectionObject catalogNumber must be unique in Collection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wowieee what a big PR! A big PR for a big problem I suppose!
I haven't been through all of the changes yet (I've looked through the business rule and permission related functionality), but I figured I'd leave the review so discussion and feedback on the parts that have comments can commence.
I did push some changes, most of which have only been code-cleanup and refactoring (but there were some big fixes as well).
While some of the changes have been addressed in the comments of this review a complete overview of the commits I have pushed can be found at: https://github.com/specify/specify7/pull/6308/files/5a1875d8c839240220726d28072a776c02e693fd..44fd03e186ade8b759f2b61c761fa7b6b7da3c93
I'm really looking forward to seeing how this PR turns out!
Might not seem like it now, but I imagine it's going to save a lot of hours and headaches for us and eventually users 🏆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made some changes in ecbb2df to both:
- move the
catnum_rule_editable
andcatnum_rule_uneditable
to a migration related file instead of a uniquenessrule related file- Just personal preference here. Specifically, these should only ever be called in the context of migrations. I would argue that these functions are more migration-related than they are uniquenessrule-related.
- Let me know what you think of the move!
- fix a potential bug where rules only containing catalogNumber as a field and collection as scope would be considered as candidate rules. More details in the commit description!
collection=None, | ||
specifyuser_id=user.id, | ||
resource="%", | ||
action="%", | ||
) | ||
if is_new: | ||
auditlog.insert(user_policy, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In 11498e2
(#6308) I went through the usages of the audit log in the permission migration and removed the usage of the Specifyusers' (user
) as the second argument in the auditlog calls.
This argument is supposed to represent the Agent the auditing changes are being attributed to (specifically the createdByAgent and modifiedByAgent of the Spauditlog records).
We could instead follow something familiar to the changes prior to 11498e2 and attribute the changes to the Agent (if applicable) of the User whose permission we just modified/created.
Personally, I think it makes the most sense to not attribute these migration changes to any Agent. An Agent did not explicitly make the changes, and I think it's intuitive that if a change has a NULL Agent it could have been done by the system.
Although, I am not too familiar with the use case defined in #6265, so whichever approach would be the most beneficial!
For reference, here is the "sink" of the auditing functions:
specify7/specifyweb/specify/auditlog.py
Line 98 in fe8397d
agent_id = agent if isinstance(agent, int) else (agent and agent.id) |
specify7/specifyweb/specify/auditlog.py
Lines 123 to 124 in fe8397d
createdbyagent_id=agent_id, | |
modifiedbyagent_id=agent_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we planning on adding more auditing to the permission migrations?
I guess: how much auditing coverage do we need to meet the requirements of #6265?
I know the Issue states:
The command must log all changes made during the migration process.
and if this is the case, then we'll need to also log every SpUserRole, LibraryRole, and all of their policies which get deleted (if we do want to wipe all permission) and created.
This sounds like it would create relatively a LOT of audit log entries.
Currently the system to automatically clean the AuditLog is fragile and dependent on the user having a AUDIT_LIFESPAN_MONTHS
global preference: if one is not defined, then the AuditLog is never cleaned.
specify7/specifyweb/specify/auditlog.py
Line 144 in fe8397d
match = re.search(r'AUDIT_LIFESPAN_MONTHS=(.+)', get_global_prefs()) |
Space might not be a concern for some institutions, just as long as we're considerate of setups with potentially limited storage (it could be us one day! 😅)
f"Field does not exist in latest state of the datamodel, skipping Schema Config entry for: {table_name} -> {field_name}" | ||
) | ||
return | ||
except AttributeError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(note)
Just curious, and I can check if you don't recall, but do you remember which fields were giving you an AttributeError error here?
I assume the error was caused because some field
in Table.all_fields
or Table.virtual_fields
was None?
specify7/specifyweb/specify/load_datamodel.py
Lines 171 to 181 in fe8397d
def get_field_strict(self, fieldname: str) -> Union["Field", "Relationship"]: | |
fieldname = fieldname.lower() | |
for field in self.all_fields: | |
if field.name.lower() == fieldname: | |
return field | |
for field in self.virtual_fields: | |
if field.name.lower() == fieldname: | |
return field | |
# if self.table == 'collectionobject' and fieldname == 'age': # TODO: This is temporary for testing, more conprehensive solution to come. | |
# return Field(name='age', column='age', indexed=False, unique=False, required=False, type='java.lang.Integer', length=0) | |
raise FieldDoesNotExistError(_("Field %(field_name)s not in table %(table_name)s. ") % {'field_name':fieldname, 'table_name':self.name} + |
specifyweb/specify/management/commands/run_key_migration_functions.py
Outdated
Show resolved
Hide resolved
def fix_permissions(): | ||
initialize(True, apps) | ||
add_permission(apps) | ||
add_stats_edit_permission(apps) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the initialize
function, I don't think we want to wipe all Specify 7 permission related information.
specify7/specifyweb/permissions/initialize.py
Lines 10 to 23 in fe8397d
def wipe_permissions(apps = apps) -> None: | |
RolePolicy = apps.get_model('permissions', 'RolePolicy') | |
UserRole = apps.get_model('permissions', 'UserRole') | |
Role = apps.get_model('permissions', 'Role') | |
LibraryRolePolicy = apps.get_model('permissions', 'LibraryRolePolicy') | |
LibraryRole = apps.get_model('permissions', 'LibraryRole') | |
UserPolicy = apps.get_model('permissions', 'UserPolicy') | |
RolePolicy.objects.all().delete() | |
UserRole.objects.all().delete() | |
Role.objects.all().delete() | |
LibraryRolePolicy.objects.all().delete() | |
LibraryRole.objects.all().delete() | |
UserPolicy.objects.all().delete() |
The wipe is completely global, and all Specify 7 related permission information will be removed (and then reset to defaults) for every Division, Discipline, Collection, etc.
And in a similar vein to the above comment about uniqueness rules, (especially once the wipe is removed from the initialize
function) a lot of the operations in these functions will create duplicates for already existing permission information.
Fixes #6266
Fixes #6265
Fixes #6264
Fixes #6263
Fixes #6298
Creates a Django command to re-run the key functions from the Django migration process associated with the v7.10 release.
Checklist
self-explanatory (or properly documented)
Pre-Testing Instructions
sp6_new_div
and rundocker exec -it specify7-specify7-1 ve/bin/python manage.py run_key_migration_functions
docker exec -it specify7-mariadb-1 mysql -uroot -proot -e "drop database sp6_new_div; create database sp6_new_div;"; docker exec -it specify7-mariadb-1 sh -c 'mysql -uroot -proot sp6_new_div < /docker-entrypoint-initdb.d/sp6_new_div.sql';
I added
sp6_new_div
to the test-panel.sp6_new_div
after it has been fixed by the Django command.For testing context, here are related PRs regarding migration functions:
specify schema config migration step functions involved:
SpLocaleContainerItem
migration for collectionObjectType #5191isPrimary
andisSubstrate
#5275 Add cojo to Schema Config #5388business rule migration step functions involved:
permissions migration step functions involved:
Testing Instructions
Schema config
Business Rules
Permissions
Batch Attachment Import
role correctly assigns that permissions.Batch Attachment Import
role, and make sure you can upload attachments