Added enhanced support for "Auto extraction" with observables for the emlparser analyzer #399

Closed
wants to merge 8 commits into from

jeffrey-e

To improve the workflow of analysts who analyse scam e-mails, we wanted to make it possible to automatically extract observables from the data generated by the emlparser (that is, the header, body and so on).
As we developed it specifically for this analyzer, I imported the class responsible for this feature and modified it as required.
I created the merge request as there are a few other teams that might benefit from this enhancement.

The regexes seem to be pretty solid after a few weeks of testing (haven't heard any complaints from the team).

@jeffrey-e
Author

I should probably use a better regex for the IP address extraction. Something like this: (25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|[1-9])(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)){3}
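
A minimal sketch of how a pattern like this could be tried out in Python. The names `FIRST`, `REST`, `IPV4` and `find_ipv4` are made up for illustration, and the surrounding lookarounds are an added assumption (not part of the regex above) so that fragments of longer dotted numbers are not picked up:

```python
import re

# Alternatives run from most to least specific so "255" is not cut short.
# The first octet may not be a bare 0; the other three may.
FIRST = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|[1-9])"
REST = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)"

# The dot is escaped so it only matches a literal ".".
# Lookarounds (an assumption for this sketch) reject matches that are
# glued to more digits or to a further ".<digit>" group.
IPV4 = re.compile(r"(?<![\d.])" + FIRST + r"(?:\." + REST + r"){3}(?!\.?\d)")


def find_ipv4(text):
    """Return every dotted-quad IPv4 address found in text, in order."""
    return IPV4.findall(text)


sample = "from 192.168.1.10 by 10.0.0.1. Ignore 999.1.1.1 and 1.2.3.4.5"
print(find_ipv4(sample))  # ['192.168.1.10', '10.0.0.1']
```

Without the escaped dot, something like 192x168y1z10 would also match, which is probably not what an observable extractor wants.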

@3c7
Contributor

3c7 commented Jan 3, 2019

You could also add + to the first part of the e-mail regex to properly parse e-mail addresses including filters.

@3c7 3c7 added category:enhancement Issue is related to an existing feature to improve scope:analyzer Issue is analyzer related status:pr-submitted status:needs-review labels Jan 3, 2019
@jeffrey-e
Author

@3c7 Can you give an example? There is already a + present, so I am having some trouble understanding what you mean.

@3c7
Contributor

3c7 commented Jan 3, 2019

@gekkeharry13 The addresses can look like this https://regex101.com/r/kPRWoJ/1 so I added \+ to the first character list.

@jeffrey-e
Author

Oohh, I did not know this syntax (or mail filters in general) existed. I will add it and update my fork.

@3c7
Contributor

3c7 commented Jan 3, 2019

In general, these characters can be used in the local part of an e-mail address:

  • uppercase and lowercase Latin letters A to Z and a to z
  • digits 0 to 9
  • printable characters other than letters and digits: !#$%&'*+-/=?^_`{|}~

The + is not a filter in general, but some mail servers use it as one. I use that quite often. :)
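
As a rough illustration of the character list above, an e-mail regex could be put together like this in Python. The names `LOCAL`, `DOMAIN`, `EMAIL` and `find_emails` are hypothetical, and the domain part is a simplified assumption (dot-separated labels ending in an alphabetic TLD), not the pattern used in the PR:

```python
import re

# One "atom" of the local part: letters, digits and the printable
# characters listed above. "-" sits last in the class so it is literal.
ATOM = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"

# Atoms may be joined by single dots (no leading, trailing or double dots).
LOCAL = ATOM + r"(?:\." + ATOM + r")*"

# Simplified domain: assumption made for this sketch only.
DOMAIN = r"(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}"

EMAIL = re.compile(LOCAL + "@" + DOMAIN)


def find_emails(text):
    """Return every e-mail address found in text, in order."""
    return EMAIL.findall(text)


sample = "Contact jane.doe+invoices@example.com or bob@mail.example.org."
print(find_emails(sample))
# ['jane.doe+invoices@example.com', 'bob@mail.example.org']
```

With + in the character class, the tagged address is kept whole instead of being truncated to invoices@example.com.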

@jeffrey-e
Author

Thanks for the info @3c7!

Jeffrey Everling added 2 commits February 14, 2019 17:40
@jeffrey-e
Author

Hi all, I am having some issues caused by my git stupidity. I made these changes in master, and now I have some other analyzers and responders that I am building, so I am kinda stuck here.

Do you guys like this proposal? If so, I can tidy it up for the merge; if not, let me know. I think this needs to be merged or removed before I can open my next pull request (without it being dirty).

@nadouani
Contributor

nadouani commented Feb 18, 2019

Hi @gekkeharry13 I like the changes you added to the extractor, but I don't think it's a good idea to have a custom extractor within the analyzer.

What would be great is to try to add your changes to https://github.com/TheHive-Project/Cortex-Analyzers/blob/master/contrib/cortexutils/extractor.py which is part of cortexutils, so that your options can be shared with the other analyzers.

That being said, I don't know what the CustomExtractor brings.

@jeffrey-e
Author

Ah yes, that would be great indeed. It should work, as I literally copied that class.
The other thing is that, in order to extract more observables, I was thinking of creating some specific regexes that work on the mail header. This would let us extract hostnames without false positives, whereas a more generic regex would generate them. That was the reason behind adding the class to the analyzer.
For these regexes I can create a PR for cortexutils. I will let you know when it's done and then you can close this one.
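
To make the header-specific idea a bit more concrete, here is a small sketch using only the Python standard library. It is not the code from the PR: `RECEIVED_HOST` and `received_hosts` are hypothetical names, and restricting the search to the from/by clauses of Received headers is just one assumed way to scope a hostname regex:

```python
import re
from email import message_from_string

# Only hostnames that directly follow "from" or "by" are considered,
# so a generic "word.word" string elsewhere is never looked at.
RECEIVED_HOST = re.compile(
    r"\b(?:from|by)\s+((?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,})",
    re.IGNORECASE,
)


def received_hosts(raw_message):
    """Return unique lower-cased hostnames from the Received headers."""
    message = message_from_string(raw_message)
    hosts = []
    for header in message.get_all("Received", []):
        for host in RECEIVED_HOST.findall(header):
            host = host.lower()
            if host not in hosts:
                hosts.append(host)
    return hosts


sample = (
    "Received: from mail.example.org (mail.example.org [192.0.2.10])\n"
    "\tby mx.example.net with ESMTP id abc123;\n"
    "\tMon, 18 Feb 2019 10:00:00 +0000\n"
    "Subject: see report.pdf for details\n"
    "\n"
    "Body mentioning notes.txt\n"
)
hosts = received_hosts(sample)
print(hosts)  # ['mail.example.org', 'mx.example.net']
```

A generic hostname regex run over the whole message would also flag report.pdf and notes.txt; scoping it to one header avoids that.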

@nadouani
Contributor

I think this is a good idea.

@3c7 what do you think about that?

@3c7
Contributor

3c7 commented Feb 18, 2019

Yeah, I like that, too.

@nadouani
Contributor

Instead of customizing the extractor with just a regex, I would customize it with a dataType + an extraction function. The function could be a regex test.
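
One way to read this suggestion, sketched in Python. `PluggableExtractor`, `register`, `identify` and `extract` are hypothetical names invented for this example and are not the cortexutils Extractor API; the output dictionary keys are likewise only an assumption:

```python
import ipaddress
import re


class PluggableExtractor:
    """Hypothetical extractor driven by (dataType, function) pairs."""

    def __init__(self):
        self._checks = []  # ordered list of (data_type, predicate) pairs

    def register(self, data_type, predicate):
        """Register a callable returning True when a string is of data_type."""
        self._checks.append((data_type, predicate))

    def identify(self, value):
        """Return the first matching dataType for value, or None."""
        for data_type, predicate in self._checks:
            if predicate(value):
                return data_type
        return None

    def extract(self, data):
        """Walk nested dicts and lists and collect every recognised string."""
        found = []
        if isinstance(data, str):
            data_type = self.identify(data)
            if data_type is not None:
                found.append({"dataType": data_type, "data": data})
        elif isinstance(data, dict):
            for value in data.values():
                found.extend(self.extract(value))
        elif isinstance(data, (list, tuple)):
            for item in data:
                found.extend(self.extract(item))
        return found


def is_ipv4(value):
    """Extraction function that is not a regex: defer to the stdlib parser."""
    try:
        ipaddress.IPv4Address(value)
        return True
    except ValueError:
        return False


MD5 = re.compile(r"[0-9a-fA-F]{32}")

extractor = PluggableExtractor()
extractor.register("ip", is_ipv4)
# The function can of course still be a plain regex test.
extractor.register("hash", lambda value: MD5.fullmatch(value) is not None)

report = {
    "headers": {"x-originating-ip": "203.0.113.7"},
    "attachments": [{"md5": "d41d8cd98f00b204e9800998ecf8427e"}],
    "subject": "hello",
}
observables = extractor.extract(report)
print(observables)
```

The point of passing a function rather than a pattern is that an analyzer can plug in any check (a parser, a lookup, a regex) under the dataType it wants reported.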

@jeffrey-e
Author

Yeah I agree. Something that extends the functionality of the automated extractor. I will change my repo and let you know when I think I have a good solution :)

@jeffrey-e
Author

jeffrey-e commented Feb 19, 2019

Hi guys, so I have been working on the new approach today. The cortexutils fork can be found here:
https://github.com/gekkeharry13/cortexutils/tree/extractor-improvement/cortexutils

I am a bit stuck at how I can properly inherit the class (or make it a function) so that "def artifacts" isn't required in the analyzer itself. Maybe some of you are a bit more experienced and have a solution for this?

The latest changes I added moved most of the code out of the analyzer and back into cortexutils.

@nadouani
Contributor

@gekkeharry13 you need to fork this entire repo and then update the files under contrib/cortexutils so that we can see the diff introduced by your PR.

If you want to create another PR for that, you can; it will be easier for both of us.

Thanks

@jeffrey-e
Author

Hereby the PR 👍
TheHive-Project/cortexutils#1

@ninSmith
Contributor

Popping into the conversation: someone just asked me about attachments.
For an eml with an attachment, it would be nice to add the attachment as a file observable to the case.

What do you guys think?

@nadouani
Contributor

This is not possible for the moment, but it can be considered for the next version of cortex

@jeffrey-e
Author

jeffrey-e commented Feb 20, 2019

Agreed. I guess the functionality could be taken from Synapse, as we pass our mails through Synapse and files are extracted just fine. When Cortex 3 is released we should check :)

@ninSmith
Contributor

That's what I thought (and said).
Thanks for confirming.

@nadouani
Contributor

nadouani commented Apr 4, 2019

This is now handled in the dedicated cortexutils repository.

4 participants