-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Image can loose the "is_section_image" status #312
Comments
I know I've seen this issue before in production (which I thought was patched) and I think it's somewhat related to: #232 Upon some cursory research on the subject, it is possible that at scale the document's state isn't consistent across workloads based on the information that it gets from Elasticsearch (especially if a lot of operations are happening to the document at a given point in time): https://stackoverflow.com/questions/25030726/elasticsearch-does-not-return-existing-document-sometimes We could set the refresh settings to be much shorter or force a refresh upon index ( There are suggestions to use upserts or Upon inspecting the index settings, I noticed that we don't set anything for the refresh interval: https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#dynamic-index-settings So in theory even if the shard was idle, it should've performed a refresh because we fetch the document, if it exists, to compare the One thing we could do at the API-level is perform a search across the Result index to see if the file in question has been identified as being a section image (on the condition that the file document indicates that it isn't a section image) and patch the document. ie. if not file_obj['type'].startswith("image/"):
return make_api_response({}, "This file is not allowed to be downloaded as a datastream.", 403)
else:
if not file_obj.get('is_section_image', False):
# Are we sure this file isn't section image?
if STORAGE.result.search(
query=f"response.supplementary.sha256:{sha256} AND response.supplementary.is_section_image:true",
rows=0, track_total_hits=True,
as_obj=False)['total']:
# Patch file object with the correct information
file_obj["is_section_image"] = True
STORAGE.file.save(sha256, file_obj)
# If the file has been confirmed to not be a section image at this time, block download
if not file_obj.get('is_section_image', False):
return make_api_response({}, "This file is not allowed to be downloaded as a datastream.", 403) To mind, I don't think there would be an service that would drop an image as supplementary file and for that file to not be a section image in the same result record. For example, if I wanted to check if {
"response": {
"supplementary": [
{"sha256": "hash1", "is_image_section": true},
{"sha256": "hash2", "is_image_section": false},
]
}
} In which case, there would have to be more inspection involved for confirmation: if not file_obj['type'].startswith("image/"):
return make_api_response({}, "This file is not allowed to be downloaded as a datastream.", 403)
else:
if not file_obj.get('is_section_image', False):
# Are we sure this file isn't section image?
for result in STORAGE.result.search(
query=f"response.supplementary.sha256:{sha256} AND response.supplementary.is_section_image:true",
fl="response.supplementary.sha256,response.supplementary.is_section_image",
as_obj=False)['items']:
# Confirm that the file mentioned is a section image by inspecting the results
for file in result["response"]["supplmentary"]:
if file["sha256"] == sha256 and file["is_section_image"] == True:
# Patch file object with the correct information
file_obj["is_section_image"] = True
STORAGE.file.save(sha256, file_obj)
break
if file_obj["is_section_image"] == True:
# File has been updated, no need to iterate over search results anymore
break
# If the file has been confirmed to not be a section image at this time, block download
if not file_obj.get('is_section_image', False):
return make_api_response({}, "This file is not allowed to be downloaded as a datastream.", 403) @cccs-douglass thoughts? |
URLDownloader is the service that comes to mind. It will add all images (non-svg) as supplementary files, but not to be used as a section image. |
We'll work on a patch that states once a file has been identified as a section image, it will remain a section image forever (or until it's been expired). |
This should be resolved in the 4.5.0.74 release along with services rebuilt under this release. |
Describe the bug
I'm struggling with an issue that I see quite often "broken" image instead of the image previews (in all services generating previews - including document previews, images seen in PE files etc.). It doesn't happen always, and I've noticed I see it also with images that (I'm almost sure) were previously rendered correctly.
I thought that images were just gone from the datastore or so, but I've noticed HTTP 403 errors: This file is not allowed to be downloaded as a datastream. This is generated only here: https://github.com/CybercentreCanada/assemblyline-ui/blob/167c2001833fbc1b9629c57d5db248b576f398ef/assemblyline_ui/api/v4/file.py#L499 if the image doesn't have the
is_section_image
flag.Digging into it, in the
save_or_freshen_file
method (https://github.com/CybercentreCanada/assemblyline-base/blob/be3bd33c3f2d22032c9e06aa0eafdbb1f2b56a48/assemblyline/datastore/helper.py#L1186), in the loop in L1208, most file attributes are updated to the latest state. If the newerfileinfo
definition hasis_section_image
set tofalse
, it will overwrite the earlier state.I've found so far one (veeery rare) edge case when it happens: if you upload the generated preview picture for the analysis. I don't believe it's the issue I see, but it proves that something can happen.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Old preview stays.
Screenshots
If applicable, add screenshots to help explain your problem.
Environment (please complete the following information if pertinent):
Additional context
This is really an edge case. I'm still looking into why it happens at scale, I have some suspicious that maybe e.g. expiration process somehow overwrites the flag, but I haven't found any real way why yet.
The text was updated successfully, but these errors were encountered: