Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Mandiant] fix: infinitely crashes if the state is empty #3018

Merged
merged 5 commits into from
Nov 27, 2024

Conversation

flavienSindou
Copy link
Contributor

@flavienSindou flavienSindou commented Nov 22, 2024

Proposed changes

  • Implement an Exception, handling connector state external reset.

Related issues

Checklist

  • I consider the submitted work as finished
  • I tested the code for its functionality using different use cases
  • I added/update the relevant documentation (either on github or on notion)
  • Where necessary I refactored code to improve the overall quality

Further comments

Investigation page

Simplified behavior changes

stateDiagram-v2
    classDef RED fill:red
    classDef ORANGE fill:orange, color:black
    classDef NEW stroke:yellow
    classDef YELLOW fill: yellow, color:black

    [*] --> ContextSetup
    ContextSetup --> ConnectorInit 
    ConnectorInit --> WaitingForNewScheduledJob : Initial State Registered
    WaitingForNewScheduledJob --> Processing : New Job Detected
    Processing --> AttemptReadState 

    AttemptReadState --> StatePresent : STATE Found
    AttemptReadState --> StateNull : STATE Null

    StatePresent --> SuccessCase : Data Received
    SuccessCase --> UpdateState : Update STATE (on scheduler ping)
    UpdateState:::ORANGE --> WaitingForNewScheduledJob : State Updated

    StateNull --> Error : PREVOUSLY - "TypeError NoneType object is not subscriptable"
    Error:::RED--> WaitingForNewScheduledJob : STATE stays null
    
    StateNull:::ORANGE -->HandledError : UPDATED -  "STATE is Null, Reseting Connector to Basic config"
    HandledError:::YELLOW --> ConnectorInit 
    ConnectorInit:::NEW --> WaitingForNewScheduledJob

    WaitingForNewScheduledJob --> ManuallyReset : STATE Manually Reset
    ManuallyReset --> ResetState : Send State Reset Ping
    ResetState --> WaitingForNewScheduledJob : STATE Reset Completed
Loading

@flavienSindou flavienSindou self-assigned this Nov 22, 2024
@flavienSindou flavienSindou linked an issue Nov 22, 2024 that may be closed by this pull request
Comment on lines 572 to 630
else:
# Fix problem when end state is in the future
if (
Timestamp.from_iso(
self.get_state_value(
collection_name=collection, state_key=STATE_END
)
).value
> Timestamp.now().value
):
self.set_state_value(
collection_name=collection, state_key=STATE_END, value=None
)
end_short_format = Timestamp.from_iso(
self.get_state_value(
collection_name=collection, state_key=STATE_END
)
).short_format

if (
first_run is False
and date_now_value - collection_interval < last_run_value
):
diff_time = round(
((date_now_value - last_run_value).total_seconds()) / 60
# Additional information for the "work" depending on the collection (offset, epoch)
start_work = (
start_short_format
if collection in collection_with_start_epoch
else start_offset
)
remaining_time = round(
(
end_work = (
end_short_format
if collection in collection_with_start_epoch
else end_offset
)

import_start_date = (
self.mandiant_indicator_import_start_date
if collection == "indicators"
else self.mandiant_import_start_date
)

if collection in collection_with_start_epoch:
first_run = (
self.get_state_value(
collection_name=collection, state_key=STATE_START
)
== Timestamp.from_iso(import_start_date).iso_format
)
else:
first_run = start_offset == 0

"""
We check that after each API call the collection respects the interval,
either the default or the one specified in the config.
If it does not, we terminate the job and move on to the next collection.
"""

if (
first_run is False
and date_now_value - collection_interval < last_run_value
):
diff_time = round(
((date_now_value - last_run_value).total_seconds()) / 60
)
remaining_time = round(
(
(
collection_interval - timedelta(minutes=diff_time)
).total_seconds()
(
collection_interval - timedelta(minutes=diff_time)
).total_seconds()
)
/ 60
)
/ 60
)
self.helper.connector_logger.info(
f"Ignore the '{collection}' collection because the collection interval in the config is '{collection_interval}', the remaining time until the next collection pull: {remaining_time} min"
)
continue

work_id = self.helper.api.work.initiate_work(
self.helper.connect_id,
f"{collection.title()} {start_work} - {end_work}",
)
self.helper.connector_logger.info(
f"Ignore the '{collection}' collection because the collection interval in the config is '{collection_interval}', the remaining time until the next collection pull: {remaining_time} min"
)
continue

work_id = self.helper.api.work.initiate_work(
self.helper.connect_id,
f"{collection.title()} {start_work} - {end_work}",
)
try:
Copy link
Contributor Author

@flavienSindou flavienSindou Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for the reviewer : Re indented to be in the try: except clause.

collection_interval = getattr(self, f"mandiant_{collection}_interval")

last_run_value = Timestamp.from_iso(
self.get_state_value(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use of safe get_state_value responsible for handling the Error rather than direct key access

# API types related to simple offset
collection_with_offset = ["malwares", "actors", "campaigns"]
# Start and End, Offset
start_offset = self.get_state_value(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use of safe get_state_value responsible for handling the Error rather than direct key access

]
# Start and End, Timestamp short format
start_date = Timestamp.from_iso(
self.get_state_value(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use of safe get_state_value responsible for handling the Error rather than direct key access

"""
# If no end date, put the proper period using delta
if (
self.get_state_value(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use of safe get_state_value responsible for handling the Error rather than direct key access

collection_name=collection, state_key=STATE_END, value=None
)
end_short_format = Timestamp.from_iso(
self.get_state_value(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use of safe get_state_value responsible for handling the Error rather than direct key access

).value
> Timestamp.now().value
):
self.set_state_value(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use of safe set_state_value method,responsible for handling the Error, rather than direct key access


if collection in collection_with_start_epoch:
first_run = (
self.get_state_value(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use of safe get_state_value responsible for handling the Error rather than direct key access

@@ -646,13 +737,21 @@ def process_message(self):
self.helper.connector_logger.info("Connector stop")
sys.exit(0)

except StateError as err:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Core of the PR

@@ -6,7 +6,7 @@ class Timestamp:
format = "%Y-%m-%dT%H:%M:%S.%fZ"

def __init__(self, value):
if type(value) == datetime:
if isinstance(value, datetime):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside: fix E721

@flavienSindou flavienSindou changed the title Issue/3001 mandiant crash if the state is empty [Mandiant] fix: infinitely crashes if the state is empty Nov 22, 2024
@flavienSindou flavienSindou added bug use for describing something not working as expected filigran team use to identify PR from the Filigran team filigran support [optional] use to identify an issue related to feature developed & maintained by Filigran. labels Nov 22, 2024
@flavienSindou flavienSindou removed bug use for describing something not working as expected filigran support [optional] use to identify an issue related to feature developed & maintained by Filigran. labels Nov 22, 2024
@@ -646,13 +737,21 @@ def process_message(self):
self.helper.connector_logger.info("Connector stop")
sys.exit(0)

except StateError as err:
self.helper.connector_logger.error(str(err))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be interesting to clarify the message ?

Suggested change
self.helper.connector_logger.error(str(err))
self.helper.connector_logger.error(
"Failed du to connector state error", {"error": str(err)}
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your suggestion !

if work_id is not None:
self.helper.api.work.to_processed(
work_id, "Failed du to connector state error", in_error=True
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if work_id is not None:
    self.helper.api.work.to_processed(
        work_id, "Failed du to connector state error", in_error=True
    )
    work_id = None
    break
  • Why suggest "work_id = None" :
    When we intercept the error (stateError) and the work_id exists, we perform a to_processed (and indeed, we see a return of this information on the front end!). However, because of the finally block, a second to_processed is executed, as at this point the work_id still exists. This is why we're considering setting the work_id to None before entering the finally block, to avoid this behavior.

  • Why suggest "break" :
    If we don't add a break when a StateError is triggered, then the same error message will be generated for each of the “active” collections. Since a problem at state level affects all collections, it would be appropriate to exit the loop (for collection in self.mandiant_collections:) and wait for the next connector schedule.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your suggestion !

Copy link
Member

@Megafredo Megafredo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything seems ok ;)

@flavienSindou flavienSindou merged commit 436ceaf into master Nov 27, 2024
4 checks passed
@flavienSindou flavienSindou deleted the issue/3001-mandiant-crash-if-the-state-is-empty branch November 28, 2024 08:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
filigran team use to identify PR from the Filigran team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Mandiant] Crash if the state is empty
2 participants