
[NEW] Jupyter analyzer+responder for Cortex #1199

Merged

Conversation

@LetMeR00t (Contributor) commented Jul 22, 2023

Hello everyone,

This is a new analyzer+responder for Cortex which is able to execute notebooks in Jupyter.

Please, let me know if I have to change something but this analyzer seems to be a must have for me.

Thank you

@LetMeR00t changed the title from "[NEW] Jupyter analyzer for Cortex" to "[NEW] Jupyter analyzer+responder for Cortex" on Jul 28, 2023
@jeromeleonard self-requested a review on August 1, 2023
@jeromeleonard added the labels scope:analyzer, scope:responder, category:new-analyzer, category:new-responder and priotiry:high on Aug 1, 2023
@jeromeleonard added this to the 3.3.0 milestone on Aug 1, 2023
@LetMeR00t (Contributor, Author) commented Aug 8, 2023

@jeromeleonard
Please note that I've pushed a quick fix for the issue you encountered.
I'm assuming that you had set the input or output notebook configuration to something that does not start with "http" or "https". Papermill then does not treat the path as an HTTP handler, and within my code this caused an empty structure to be returned where it should have been initialized.
You should get a different error now. If you want to use HTTP/HTTPS as recommended, please make sure the hostname starts with "http" or "https".
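For illustration, here is a minimal sketch (not Papermill's or the analyzer's actual code) of the behaviour described above: only paths starting with the scheme are routed to the HTTP handler, everything else falls back to local file handling.

def is_http_path(path: str) -> bool:
    # Mimics the prefix check that decides whether a notebook path is fetched over HTTP
    return path.startswith(("http://", "https://"))

is_http_path("https://jupyter.example.org/api/contents/input.ipynb")  # True  -> HTTP handler
is_http_path("jupyter.example.org/input.ipynb")                       # False -> treated as a local file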
Thank you

EDIT: I've added a screenshot of an example configuration to the READMEs in case you need it.

@jeromeleonard (Contributor)

@LetMeR00t thanks for the update.
I've updated the papermill code so it runs with the Analyzer/Responder:
[Screenshot: 2023-08-09_10-30-32]

but I can't find the menu in JupyterHub to add tags to cells:
[Screenshot: 2023-08-09_10-32-47]

and I still get this error message:

Input notebook does not contain a cell with tag 'parameters'
/usr/local/lib/python3.9/dist-packages/IPython/paths.py:69: UserWarning: IPython parent '/home/cortex' is not a writable location, using a temp directory.
  warn("IPython parent '{0}' is not a writable location,"
Traceback (most recent call last):
  File "/opt/Cortex-Analyzers/analyzers/Jupyter_Analyzer/jupyter.py", line 625, in <module>
    Jupyter().run()
  File "/opt/Cortex-Analyzers/analyzers/Jupyter_Analyzer/jupyter.py", line 566, in run
    nb_output = pm.execute_notebook(
  File "/usr/local/lib/python3.9/dist-packages/papermill/execute.py", line 131, in execute_notebook
    write_ipynb(nb, output_path)
  File "/usr/local/lib/python3.9/dist-packages/papermill/iorw.py", line 502, in write_ipynb
    papermill_io.write(nbformat.writes(nb), path)
  File "/usr/local/lib/python3.9/dist-packages/papermill/iorw.py", line 106, in write
    return self.get_handler(path, extensions).write(buf, path)
  File "/usr/local/lib/python3.9/dist-packages/papermill/iorw.py", line 194, in write
    result = requests.put(path, json=json.loads(payload))
  File "/usr/lib/python3.9/json/__init__.py", line 339, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not dict

@LetMeR00t (Contributor, Author) commented Aug 9, 2023

Hi @jeromeleonard
The fix seems to behave as expected, so no issue on that side.
Your instance doesn't look like the JupyterHub I have on my side (installed with Docker); could you clarify how you installed it, please?
It might help me identify what is wrong on my side, because even if it isn't JupyterHub, it means it's a case I'm not covering.
My guess for now is that you have a Jupyter Server instance instead of JupyterHub. In that case you should not apply the fix and should specify in the configuration that you're not running on a JupyterHub instance, but this is just an assumption for now.
Keep me posted, please.

@jeromeleonard (Contributor) commented Aug 9, 2023

I tried to install and launch JupyterHub both as a process and with Docker, with the same results. I followed both guides:

@LetMeR00t (Contributor, Author)

Hi @jeromeleonard,
I looked into the code to find the issue.
Given the steps you followed, I assume you did install JupyterHub as expected.
The error is raised while the notebook is being written, i.e. at the output step, so I assume the notebook itself executes as expected.

My first remark: the output folder setting you sent me by email doesn't have the slashes mentioned in the documentation. Please review it and check the sample screenshot in the doc if you need an example. My current assumption is that the output URL isn't built as expected because the folder is missing the slashes where they are needed. That could make Jupyter return an error (the URL being invalid, the path doesn't exist), and you should see this error in the Jupyter logs.
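To illustrate (the names and values below are an assumption for the example, not the analyzer's actual code), the pieces are concatenated as-is, so a folder written without its own slashes produces a malformed contents URL:

# Purely illustrative sketch of the URL concatenation described above
server = "http://jupyter.example.org:8000/api/contents"
folder = "/analyzer-test/"   # leading and trailing slash, as in the doc screenshot
notebook = "output.ipynb"

output_url = server + folder + notebook
# With folder = "analyzer-test" (no slashes) this would become
# ".../api/contentsanalyzer-testoutput.ipynb", which Jupyter rejects because the path does not exist.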

My second remark is to enable the analyzer's remote execution setting. It uses a different code path and lets you execute the notebook on the Jupyter instance itself by talking to the kernel directly (which also avoids having to install all the libraries required by the notebook on the Cortex instance as requirements).
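To give an idea of what remote execution means (a rough sketch; the base URL and token are placeholders and this is not the analyzer's actual code), the notebook is run by asking the Jupyter server to start a kernel over its REST API and then talking to that kernel directly:

import requests

base = "http://jupyter.example.org:8000/user/<user>/<server-name>"   # Hub-spawned server, placeholders
headers = {"Authorization": "token <api-token>"}

# Start a kernel on the Jupyter server (201 Created on success)
kernel = requests.post(base + "/api/kernels", headers=headers).json()

# The code cells are then sent as execute_request messages over the kernel's
# channels websocket instead of being executed on the Cortex host:
ws_url = base.replace("http://", "ws://", 1) + "/api/kernels/" + kernel["id"] + "/channels"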

Let me know if this helps.

@jeromeleonard (Contributor)

I updated the configuration.
I have this in the JupyterHub logs:

[I 2023-08-09 17:08:17.450 JupyterHub log:191] 200 GET /hub/api/users/jerome ([email protected]) 93.01ms
[I 2023-08-09 17:08:17.457 JupyterHub users:725] Server jerome:cortex_job is already started
[I 2023-08-09 17:08:17.457 JupyterHub log:191] 200 GET /hub/api/users/jerome/servers/cortex_job/progress ([email protected]) 4.69ms
[I 2023-08-09 17:08:17.464 JupyterHub log:191] 200 GET /hub/api/users/jerome ([email protected]) 5.45ms
[I 2023-08-09 17:08:17.471 JupyterHub users:725] Server jerome:cortex_job is already started
[I 2023-08-09 17:08:17.472 JupyterHub log:191] 200 GET /hub/api/users/jerome/servers/cortex_job/progress ([email protected]) 4.74ms
[I 2023-08-09 17:08:17.499 ServerApp] Kernel started: d9fb0e67-65b9-4a77-98dd-7599f2941f8c
[I 2023-08-09 17:08:17.500 ServerApp] 201 POST /user/jerome/cortex_job/api/kernels ([email protected]) 21.66ms
[W 2023-08-09 17:08:17.504 ServerApp] No session ID specified
[I 2023-08-09 17:08:18.145 ServerApp] 101 GET /user/jerome/cortex_job/api/kernels/d9fb0e67-65b9-4a77-98dd-7599f2941f8c/channels ([email protected]) 642.61ms
[I 2023-08-09 17:08:18.146 ServerApp] Connecting to kernel d9fb0e67-65b9-4a77-98dd-7599f2941f8c.
[I 2023-08-09 17:08:18.460 ServerApp] 200 GET /user/jerome/cortex_job/api/contents/analyzer-test.ipynb?token=[secret] ([email protected]) 308.76ms
[I 2023-08-09 17:08:19.023 ServerApp] Starting buffering for d9fb0e67-65b9-4a77-98dd-7599f2941f8c:3babc272-91a3c4f71dfc9aac97c48457

I don't know if this line means anything: [W 2023-08-09 17:08:17.504 ServerApp] No session ID specified

and a new error message from the analyzer:

Traceback (most recent call last):
  File "/opt/Cortex-Analyzers/analyzers/Jupyter_Analyzer/jupyter.py", line 625, in <module>
    Jupyter().run()
  File "/opt/Cortex-Analyzers/analyzers/Jupyter_Analyzer/jupyter.py", line 507, in run
    results = IOLoop.current().run_sync(self.execute_notebook_remotely)
  File "/usr/local/lib/python3.9/dist-packages/tornado/ioloop.py", line 527, in run_sync
    return future_cell[0].result()
  File "/usr/local/lib/python3.9/dist-packages/tornado/gen.py", line 786, in run
    yielded = self.gen.send(value)
  File "/opt/Cortex-Analyzers/analyzers/Jupyter_Analyzer/jupyter.py", line 474, in execute_notebook_remotely
    pm.iorw.write_ipynb(nb_output, output_path)
  File "/usr/local/lib/python3.9/dist-packages/papermill/iorw.py", line 501, in write_ipynb
    papermill_io.write(nbformat.writes(nb), path)
  File "/usr/local/lib/python3.9/dist-packages/papermill/iorw.py", line 106, in write
    return self.get_handler(path, extensions).write(buf, path)
  File "/usr/local/lib/python3.9/dist-packages/papermill/iorw.py", line 193, in write
    result = requests.put(path, json=json.loads(payload))
  File "/usr/lib/python3.9/json/__init__.py", line 339, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not dict

@LetMeR00t (Contributor, Author) commented Aug 9, 2023

It's not an issue to have no session ID; a new one will be created.
Activating the remote execution works here, but the issue is still linked to the same piece of code.
Did you update the output folder too?
If so, does the input path start with a slash too?
If not, please try this.
If it still doesn't succeed, please share the Cortex connector settings again.

If the issue comes from the missing slashes, I'll probably review the way I'm building the URL.

@LetMeR00t (Contributor, Author)

Please also note that the output path should end with a slash too (as in the screenshot sample).
This is very likely the issue here, and I'll review it if so.

@jeromeleonard (Contributor)

[Screenshot: 2023-08-10_07-41-04]

@jeromeleonard (Contributor)

I reviewed the README several times and honestly don't know what I've missed.
I'm also wondering whether my JupyterHub setup is really fine...

@LetMeR00t (Contributor, Author) commented Aug 10, 2023

Still the same issue even with those changes, then?
Let's try to debug it.
Could you add, just before this line:

pm.iorw.write_ipynb(nb_output, output_path)

a simple:

self.error(str(output_path))

and see how the URL is built?
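Put together, the debug change would look like this (just a sketch; error() from cortexutils is expected to report the message and stop the job, so the built URL shows up in the job's error output):

self.error(str(output_path))                 # reports the built URL to Cortex and exits
pm.iorw.write_ipynb(nb_output, output_path)  # not reached while the debug line is in place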

Another point: does the output folder already exist?

Thank you, and sorry for the inconvenience caused by these issues.

@jeromeleonard (Contributor)

http://10.10.0.165:8000/user/jerome/cortex_job/api/contents/analyzer-test/2023-08-10-1.1.1.1-analyzer-test.ipynb?token=66175423f43a4fe3ece18eebebf55879

I have already created the folder analyzer-test/ at the root of the JupyterHub file view, the same place where I have the file analyzer-test.ipynb.

@LetMeR00t (Contributor, Author)

Hi @jeromeleonard
Everything seems fine, so I went back to the original error.
I can't check my own code right now, but what is weird is that a json.loads is being applied to a dictionary when it should probably be json.dumps instead. Could you try json.dumps instead of json.loads at the line mentioned by the error, please?
I'll check as soon as I can whether my own code differs from what is proposed in the README for the patch.
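For reference, the difference in plain Python (a generic example, not the analyzer's code):

import json

payload = {"type": "notebook", "format": "json"}
json.dumps(payload)   # dict -> JSON string: serialisation, which is what is needed here
json.loads(payload)   # raises TypeError: the JSON object must be str, bytes or bytearray, not dict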

@LetMeR00t (Contributor, Author)

Another possibility is simply that I made a mistake about where the json.loads belongs.

Since the incoming "buf" variable is the fully serialised notebook (a string), I now remember that the json.loads should rather be applied to buf.

So if the first solution doesn't work, try replacing "json=json.loads(payload)" with "json=payload", and "payload["content"] = buf" with "payload["content"] = json.loads(buf)".
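Putting both replacements together, the patched HttpHandler.write would look roughly like this (a sketch to be double-checked against the README patch):

@classmethod
def write(cls, buf, path):
    # buf is the serialised notebook (a str), so json.loads() belongs here
    payload = {"type": "notebook", "format": "json", "path": path}
    payload["content"] = json.loads(buf)
    result = requests.put(path, json=payload)
    result.raise_for_status()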

Let me know if this helps.

@jeromeleonard (Contributor) commented Aug 10, 2023

I made the changes. New behaviour:

Traceback (most recent call last):
  File "/opt/Cortex-Analyzers/analyzers/Jupyter_Analyzer/jupyter.py", line 626, in <module>
    Jupyter().run()
  File "/opt/Cortex-Analyzers/analyzers/Jupyter_Analyzer/jupyter.py", line 622, in run
    self.report(report_results)
  File "/usr/local/lib/python3.9/dist-packages/cortexutils/analyzer.py", line 113, in report
    'artifacts': self.artifacts(full_report),
  File "/opt/Cortex-Analyzers/analyzers/Jupyter_Analyzer/jupyter.py", line 201, in artifacts
    for cell in notebook["cells"]:
KeyError: 'cells'

I printed notebook.keys(): dict_keys(['name', 'duration', 'output_notebook', 'html'])

@jeromeleonard (Contributor) commented Aug 10, 2023

This time I get the output notebook in the output folder, but I still get this error message regarding the cells key.

@jeromeleonard (Contributor)

I managed to make it work 🎉 I had forgotten to set the option any_only_html to False.

@LetMeR00t (Contributor, Author) commented Aug 10, 2023

Hi @jeromeleonard,
Good to know that you got it working at last!
But this little experience calls for some changes ;)

First, I've checked my patch code and indeed, it was the version I mentioned with the json.loads in the wrong place. I've updated it accordingly in the READMEs and in the issue I created.

Second, I renamed the parameter "any_only_html" to "any_generate_html" to avoid any confusion. Within the script we generate a "beautiful render" of the notebooks, but it makes the connector's response heavy (as it embeds the HTML code). To reduce that, we no longer force the HTML generation if users don't want it.

Third, I reviewed the way I'm building the URLs to make it easier. Additionally, I've added support for a datetime format within the output folder name. An explanation is provided in the documentation.
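As an illustration of the datetime support (the placeholder below is a strftime-style assumption; the exact syntax is described in the documentation):

from datetime import datetime

folder_pattern = "/results-%Y-%m-%d/"               # hypothetical output folder setting
folder = datetime.now().strftime(folder_pattern)    # e.g. "/results-2023-08-10/"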

I'll let you review/check those changes again, hoping they make your life easier :) Be aware that your current parameters for the Cortex analyzer/responder aren't valid anymore; see the screenshot sample for help.

Thank you for the review. If you need anything else from me, let me know.

@jeromeleonard (Contributor)

Thank you for the update. Tested, and everything works fine.
I will push an update to chmod +x jupyter.py; the permissions need to be updated for the program to work.
I am also working on getting it to run with Docker; I still have issues at the moment.

My Dockerfile:

FROM python:3.9-slim
WORKDIR /worker
COPY . Jupyter_Analyzer
RUN test ! -e Jupyter_Analyzer/requirements.txt || pip install --no-cache-dir -r Jupyter_Analyzer/requirements.txt
COPY <<-"EOT" /Jupyter_Analyzer/papermill_iorw.patch
--- iorw.py       2023-08-11 05:49:49.302149767 +0000
+++ iorw.py     2023-08-11 05:48:38.553642098 +0000
@@ -180,7 +180,7 @@
 class HttpHandler(object):
     @classmethod
     def read(cls, path):
-        return requests.get(path, headers={'Accept': 'application/json'}).text
+        return json.dumps(requests.get(path, headers={'Accept': 'application/json'}).json()["content"])
 
     @classmethod
     def listdir(cls, path):
@@ -188,7 +188,9 @@
 
     @classmethod
     def write(cls, buf, path):
-        result = requests.put(path, json=json.loads(buf))
+        payload = {"type": "notebook", "format": "json", "path": path}
+        payload["content"] = json.loads(buf)
+        result = requests.put(path, json=payload)
         result.raise_for_status()
 
     @classmethod
EOT
RUN pip install papermill
RUN apt-get update && apt-get install -y patch
RUN patch $(python3 -c "from papermill import iorw; print(iorw.__file__)") /Jupyter_Analyzer/papermill_iorw.patch 
ENTRYPOINT Jupyter_Analyzer/jupyter.py

The container runs, but I get this error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/requests/models.py", line 960, in json
    return complexjson.loads(self.content.decode(encoding), **kwargs)
  File "/usr/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/Cortex-Analyzers/analyzers/Jupyter_Analyzer/jupyter.py", line 614, in <module>
    Jupyter().run()
  File "/opt/Cortex-Analyzers/analyzers/Jupyter_Analyzer/jupyter.py", line 27, in __init__
    self.input_configuration = self.initialize_path(
  File "/opt/Cortex-Analyzers/analyzers/Jupyter_Analyzer/jupyter.py", line 148, in initialize_path
    result["server"] = self.start_server(
  File "/opt/Cortex-Analyzers/analyzers/Jupyter_Analyzer/jupyter.py", line 270, in start_server
    user_model = r.json()
  File "/usr/local/lib/python3.9/dist-packages/requests/models.py", line 968, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

I am currently investigating

@LetMeR00t (Contributor, Author)

Hi,
Given where it happens, it seems to be the first call made through the API. This is weird, but I assume there is no issue with your environment itself, since you mentioned you tested yesterday's patches successfully.
If you can log the response as text from that request, it might help identify the issue, because it's not even certain that the response is JSON.
Let me know if I can help with anything.
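For example, a minimal debug aid around the failing call (assuming r is the response whose r.json() fails in start_server, as shown in the traceback):

if "application/json" not in r.headers.get("Content-Type", ""):
    # error() reports the raw body back to Cortex, showing what the server actually returned
    self.error("Unexpected response ({}): {}".format(r.status_code, r.text[:500]))
user_model = r.json()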

@LetMeR00t (Contributor, Author)

How do you configure your analyzer with such a Dockerfile?

@jeromeleonard (Contributor) commented Aug 11, 2023

I'm working on running mitmproxy to log everything.
To use the Docker image, you have to update the JSON file and replace the "command" key with "dockerImage": "jupyter_run_notebook_analyzer:devel".

Then you just have to disable the analyzer, hit "Refresh", and enable it again.
You should see this in the list of Analyzers:

[Screenshot: 2023-08-11_13-34-48]

I use the same configuration.

@LetMeR00t (Contributor, Author)

Perfect
Do you need anything else from me?
Thank you

@jeromeleonard (Contributor) commented Aug 16, 2023

I had some issues using it with a Docker image and spent some time finding the reason: a missing Python requirement, black. Now the analyzer works both as a process and with Docker. For me, this is ready to be released.

@jeromeleonard merged commit 787e62a into TheHive-Project:develop on Aug 16, 2023
@LetMeR00t (Contributor, Author) commented Aug 16, 2023

Thank you
It's weird for black, as it should only be used for code formatting…
Anyway, if you have any issue with the connector, please let me know.
Thank you again
