Signature update services may not expose new signatures for workers immediately #232
Labels: accepted, bug, pending review, service-base
Describe the bug
I originally observed this when testing my own service, but I was also able to reproduce it with the YARA service. When updating signatures from a source, the updater service (base from the AL library) sends the new data to Elasticsearch (do_source_update), then notifies another thread to fetch a new signature package from Elasticsearch (do_local_update), which is finally served to the worker services. When uploading the data to Elastic, the updater (through the signature client) makes a synchronous call, but does not ask Elastic to wait until the shards are refreshed. By default, Elastic completes the bulk request independently of the refresh. So if Elastic isn't quick enough, or the updater isn't slow enough, the new signatures are not yet visible when the updater asks for a new signature package. As a result, the new updates are not downloaded back to the updater and are not exposed to the workers until another update of the local files (worst case: on the next scheduled update, e.g. the next day).
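To illustrate the Elasticsearch behaviour, here is a standalone sketch with elasticsearch-py (8.x style); the endpoint and index names are placeholders, not the Assemblyline code path:

```python
# Standalone sketch (not Assemblyline code) of the Elasticsearch behaviour
# described above, using elasticsearch-py 8.x. Endpoint and index names are
# placeholders.
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

docs = [{
    "_index": "signatures-demo",
    "_id": "example.rule",
    "_source": {"name": "example.rule", "last_modified": "2024-06-01T12:00:00Z"},
}]

# Default behaviour (what the updater effectively does today): the bulk call
# returns once the documents are written, *before* they are searchable.
helpers.bulk(es, docs)

# A query issued immediately afterwards (the do_local_update side) can still
# miss the new documents, because no refresh has happened yet.
es.search(index="signatures-demo", query={"match_all": {}})

# With refresh="wait_for", the bulk call only returns once the documents are
# visible to search, which removes the race.
helpers.bulk(es, docs, refresh="wait_for")
```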
To Reproduce
Steps to reproduce the behavior:
Trigger a signature update from a source. The updater logs a successful import (Imported 1/1 signatures from example into Assemblyline), but also logs No signature updates available. shortly after.
Expected behavior
After a successful update from a source, the data are available for download by the workers immediately or within a short time.
The bulk API from Elastic exposes a refresh parameter, which can request Elastic to e.g. wait for the refresh; by default it does not wait. I tested this by hardcoding the wait_for value in the bulk() method in datastore/collection.py from the assemblyline_base repo, and it fixed the problem. However, I don't know whether the parameter was intentionally left unset, or whether setting it has side effects somewhere else.
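For illustration, this is roughly the shape of the change I tested; the function structure and variable names here are simplified and do not match the real collection.py code:

```python
# Simplified sketch of the tested change -- not the actual
# assemblyline_base/datastore/collection.py code. `client` stands for the
# underlying elasticsearch-py 8.x client, `operations` for the prepared
# bulk body.
def bulk(client, operations, index):
    return client.bulk(
        index=index,
        operations=operations,
        # Hardcoded for the test; the default is to not wait, so the
        # documents may not yet be searchable when the call returns.
        refresh="wait_for",
    )
```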
Additional context
During debugging, I confirmed that Elasticsearch is simply returning the old last-modified timestamp; the issue is not in the service itself. This is a race condition, and I'm aware it may be harder to spot with more sources or a different Elastic configuration. I believe the behaviour should generally be consistent, and unless it's just that Elasticsearch in my setup is slow, the impact can sometimes be rather big (e.g. YARA rules only being used a day later than expected).
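A small standalone illustration of why the search still showed old data even though the write had succeeded (a realtime GET sees the document, but a search only does after a refresh); the index and field names are made up for the demo:

```python
# Standalone illustration, not Assemblyline code: a GET by id is realtime and
# sees the freshly written document, but a search only reflects it after a
# refresh -- which is why the last-modified timestamp looked stale.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

es.index(index="signatures-demo", id="sig-1",
         document={"last_modified": "2024-06-01T12:00:00Z"})

es.get(index="signatures-demo", id="sig-1")   # realtime: sees the new document

# An immediate search may still return the previous version (old timestamp).
es.search(index="signatures-demo", query={"match_all": {}})

# After an explicit refresh (or indexing with refresh="wait_for"),
# the search reflects the update.
es.indices.refresh(index="signatures-demo")
es.search(index="signatures-demo", query={"match_all": {}})
```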