
[NEW] Jupyter analyzer+responder for Cortex #1199

Merged

Conversation

@LetMeR00t (Contributor) commented Jul 22, 2023

Hello everyone,

This is a new analyzer+responder for Cortex which is able to execute notebooks in Jupyter.

Please, let me know if I have to change something but this analyzer seems to be a must have for me.

Thank you

@LetMeR00t changed the title from "[NEW] Jupyter analyzer for Cortex" to "[NEW] Jupyter analyzer+responder for Cortex" on Jul 28, 2023
@jeromeleonard self-requested a review on August 1, 2023
@jeromeleonard added the labels scope:analyzer, scope:responder, category:new-analyzer, category:new-responder and priotiry:high on Aug 1, 2023
@jeromeleonard added this to the 3.3.0 milestone on Aug 1, 2023
@LetMeR00t (Contributor, Author) commented Aug 8, 2023

@jeromeleonard
Please note that I've pushed a quick fix for the issue you encountered.
I'm assuming that you had set the input or output notebook configuration to something that does not start with "http" or "https". Papermill then does not treat the path as an HTTP handler, and within my code this caused an empty structure to be returned where it should have been initialized.
You should get a different error now. If you want to use HTTP/HTTPS as recommended, please make sure the hostname starts with "http" or "https".
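For illustration, here is a minimal sketch (not Papermill's or the analyzer's actual code) of the behaviour described above: only paths starting with the scheme are routed to the HTTP handler, everything else falls back to local file handling.

def is_http_path(path: str) -> bool:
    # Mimics the prefix check that decides whether a notebook path is fetched over HTTP
    return path.startswith(("http://", "https://"))

is_http_path("https://jupyter.example.org/api/contents/input.ipynb")  # True  -> HTTP handler
is_http_path("jupyter.example.org/input.ipynb")                       # False -> treated as a local file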
Thank you

EDIT: I've added a screenshot of an example configuration to the READMEs in case you need it.

@jeromeleonard (Contributor)

@LetMeR00t thanks for the update.
I've updated the papermill code so it runs with the Analyzer/Responder:
[Screenshot: 2023-08-09_10-30-32]

but I can't find the menu in JupyterHub to add tags to cells:
[Screenshot: 2023-08-09_10-32-47]

and I still get this error message:

Input notebook does not contain a cell with tag 'parameters'
/usr/local/lib/python3.9/dist-packages/IPython/paths.py:69: UserWarning: IPython parent '/home/cortex' is not a writable location, using a temp directory.
  warn("IPython parent '{0}' is not a writable location,"
Traceback (most recent call last):
  File "/opt/Cortex-Analyzers/analyzers/Jupyter_Analyzer/jupyter.py", line 625, in <module>
    Jupyter().run()
  File "/opt/Cortex-Analyzers/analyzers/Jupyter_Analyzer/jupyter.py", line 566, in run
    nb_output = pm.execute_notebook(
  File "/usr/local/lib/python3.9/dist-packages/papermill/execute.py", line 131, in execute_notebook
    write_ipynb(nb, output_path)
  File "/usr/local/lib/python3.9/dist-packages/papermill/iorw.py", line 502, in write_ipynb
    papermill_io.write(nbformat.writes(nb), path)
  File "/usr/local/lib/python3.9/dist-packages/papermill/iorw.py", line 106, in write
    return self.get_handler(path, extensions).write(buf, path)
  File "/usr/local/lib/python3.9/dist-packages/papermill/iorw.py", line 194, in write
    result = requests.put(path, json=json.loads(payload))
  File "/usr/lib/python3.9/json/__init__.py", line 339, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not dict

@LetMeR00t (Contributor, Author) commented Aug 9, 2023

Hi @jeromeleonard
The fix seems to behave as expected, so no issue on that side.
Your instance doesn't look like the JupyterHub I have on my side (installed with Docker); could you clarify how you installed it, please?
It might help me identify what is wrong on my side, because even if it isn't JupyterHub, it means it's a case I'm not covering.
My guess for now is that you have a Jupyter Server instance instead of JupyterHub. In that case you should not apply the fix and should specify in the configuration that you're not running on a JupyterHub instance, but this is just an assumption for now.
Keep me posted, please.

@jeromeleonard (Contributor) commented Aug 9, 2023

I tried to install and launch JupyterHub both as a process and with Docker, with the same results. I followed both guides:

@LetMeR00t (Contributor, Author)

Hi @jeromeleonard,
I looked into the code to find the issue.
Given the steps you followed, I assume you did install JupyterHub as expected.
The error is raised while the notebook is being written, i.e. at the output step, so I assume the notebook itself executes as expected.

My first remark: the output folder setting you sent me by email doesn't have the slashes mentioned in the documentation. Please review it and check the sample screenshot in the doc if you need an example. My current assumption is that the output URL isn't built as expected because the folder is missing the slashes where they are needed. That could make Jupyter return an error (the URL being invalid, the path doesn't exist), and you should see this error in the Jupyter logs.
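To illustrate (the names and values below are an assumption for the example, not the analyzer's actual code), the pieces are concatenated as-is, so a folder written without its own slashes produces a malformed contents URL:

# Purely illustrative sketch of the URL concatenation described above
server = "http://jupyter.example.org:8000/api/contents"
folder = "/analyzer-test/"   # leading and trailing slash, as in the doc screenshot
notebook = "output.ipynb"

output_url = server + folder + notebook
# With folder = "analyzer-test" (no slashes) this would become
# ".../api/contentsanalyzer-testoutput.ipynb", which Jupyter rejects because the path does not exist.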

My second remark is to enable the analyzer's remote execution setting. It uses a different code path and lets you execute the notebook on the Jupyter instance itself by talking to the kernel directly (which also avoids having to install all the libraries required by the notebook on the Cortex instance as requirements).
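To give an idea of what remote execution means (a rough sketch; the base URL and token are placeholders and this is not the analyzer's actual code), the notebook is run by asking the Jupyter server to start a kernel over its REST API and then talking to that kernel directly:

import requests

base = "http://jupyter.example.org:8000/user/<user>/<server-name>"   # Hub-spawned server, placeholders
headers = {"Authorization": "token <api-token>"}

# Start a kernel on the Jupyter server (201 Created on success)
kernel = requests.post(base + "/api/kernels", headers=headers).json()

# The code cells are then sent as execute_request messages over the kernel's
# channels websocket instead of being executed on the Cortex host:
ws_url = base.replace("http://", "ws://", 1) + "/api/kernels/" + kernel["id"] + "/channels"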

Let me know if this helps.

@jeromeleonard (Contributor)

I updated the configuration.
I have this in the JupyterHub logs:

[I 2023-08-09 17:08:17.450 JupyterHub log:191] 200 GET /hub/api/users/jerome ([email protected]) 93.01ms
[I 2023-08-09 17:08:17.457 JupyterHub users:725] Server jerome:cortex_job is already started
[I 2023-08-09 17:08:17.457 JupyterHub log:191] 200 GET /hub/api/users/jerome/servers/cortex_job/progress ([email protected]) 4.69ms
[I 2023-08-09 17:08:17.464 JupyterHub log:191] 200 GET /hub/api/users/jerome ([email protected]) 5.45ms
[I 2023-08-09 17:08:17.471 JupyterHub users:725] Server jerome:cortex_job is already started
[I 2023-08-09 17:08:17.472 JupyterHub log:191] 200 GET /hub/api/users/jerome/servers/cortex_job/progress ([email protected]) 4.74ms
[I 2023-08-09 17:08:17.499 ServerApp] Kernel started: d9fb0e67-65b9-4a77-98dd-7599f2941f8c
[I 2023-08-09 17:08:17.500 ServerApp] 201 POST /user/jerome/cortex_job/api/kernels ([email protected]) 21.66ms
[W 2023-08-09 17:08:17.504 ServerApp] No session ID specified
[I 2023-08-09 17:08:18.145 ServerApp] 101 GET /user/jerome/cortex_job/api/kernels/d9fb0e67-65b9-4a77-98dd-7599f2941f8c/channels ([email protected]) 642.61ms
[I 2023-08-09 17:08:18.146 ServerApp] Connecting to kernel d9fb0e67-65b9-4a77-98dd-7599f2941f8c.
[I 2023-08-09 17:08:18.460 ServerApp] 200 GET /user/jerome/cortex_job/api/contents/analyzer-test.ipynb?token=[secret] ([email protected]) 308.76ms
[I 2023-08-09 17:08:19.023 ServerApp] Starting buffering for d9fb0e67-65b9-4a77-98dd-7599f2941f8c:3babc272-91a3c4f71dfc9aac97c48457

I don't know if this line means anything: [W 2023-08-09 17:08:17.504 ServerApp] No session ID specified

and a new error message from the analyzer:

Traceback (most recent call last):
  File "/opt/Cortex-Analyzers/analyzers/Jupyter_Analyzer/jupyter.py", line 625, in <module>
    Jupyter().run()
  File "/opt/Cortex-Analyzers/analyzers/Jupyter_Analyzer/jupyter.py", line 507, in run
    results = IOLoop.current().run_sync(self.execute_notebook_remotely)
  File "/usr/local/lib/python3.9/dist-packages/tornado/ioloop.py", line 527, in run_sync
    return future_cell[0].result()
  File "/usr/local/lib/python3.9/dist-packages/tornado/gen.py", line 786, in run
    yielded = self.gen.send(value)
  File "/opt/Cortex-Analyzers/analyzers/Jupyter_Analyzer/jupyter.py", line 474, in execute_notebook_remotely
    pm.iorw.write_ipynb(nb_output, output_path)
  File "/usr/local/lib/python3.9/dist-packages/papermill/iorw.py", line 501, in write_ipynb
    papermill_io.write(nbformat.writes(nb), path)
  File "/usr/local/lib/python3.9/dist-packages/papermill/iorw.py", line 106, in write
    return self.get_handler(path, extensions).write(buf, path)
  File "/usr/local/lib/python3.9/dist-packages/papermill/iorw.py", line 193, in write
    result = requests.put(path, json=json.loads(payload))
  File "/usr/lib/python3.9/json/__init__.py", line 339, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not dict

@LetMeR00t (Contributor, Author) commented Aug 9, 2023

It's not an issue to have no session ID; a new one will be created.
Activating the remote execution works here, but the issue is still linked to the same piece of code.
Did you update the output folder too?
If so, does the input path start with a slash too?
If not, please try this.
If it still doesn't succeed, please share the Cortex connector settings again.

If the issue comes from the missing slashes, I'll probably review the way I'm building the URL.

@LetMeR00t (Contributor, Author)

Please also note that the output path should end with a slash too (as in the screenshot sample).
This is very likely the issue here, and I'll review it if so.

@jeromeleonard (Contributor)

[Screenshot: 2023-08-10_07-41-04]

@jeromeleonard (Contributor)

I reviewed the README several times and honestly don't know what I've missed.
I'm also wondering whether my JupyterHub setup is really fine...

@LetMeR00t (Contributor, Author) commented Aug 10, 2023

Still the same issue even with those changes, then?
Let's try to debug it.
Could you add, just before this line:

pm.iorw.write_ipynb(nb_output, output_path)

a simple:

self.error(str(output_path))

and see how the URL is built?
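Put together, the debug change would look like this (just a sketch; error() from cortexutils is expected to report the message and stop the job, so the built URL shows up in the job's error output):

self.error(str(output_path))                 # reports the built URL to Cortex and exits
pm.iorw.write_ipynb(nb_output, output_path)  # not reached while the debug line is in place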

Another point: does the output folder already exist?

Thank you, and sorry for the inconvenience caused by these issues.

@jeromeleonard (Contributor)

http://10.10.0.165:8000/user/jerome/cortex_job/api/contents/analyzer-test/2023-08-10-1.1.1.1-analyzer-test.ipynb?token=66175423f43a4fe3ece18eebebf55879

I have already created the folder analyzer-test/ at the root of the JupyterHub file view, the same place where I have the file analyzer-test.ipynb.

@LetMeR00t (Contributor, Author)

Hi @jeromeleonard
Everything seems fine, so I went back to the original error.
I can't check my own code right now, but what is weird is that a json.loads is being applied to a dictionary when it should probably be json.dumps instead. Could you try json.dumps instead of json.loads at the line mentioned by the error, please?
I'll check as soon as I can whether my own code differs from what is proposed in the README for the patch.
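For reference, the difference in plain Python (a generic example, not the analyzer's code):

import json

payload = {"type": "notebook", "format": "json"}
json.dumps(payload)   # dict -> JSON string: serialisation, which is what is needed here
json.loads(payload)   # raises TypeError: the JSON object must be str, bytes or bytearray, not dict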

@LetMeR00t (Contributor, Author)

Another possibility is simply that I made a mistake about where the json.loads belongs.

Since the incoming "buf" variable is the fully serialised notebook (a string), I now remember that the json.loads should rather be applied to buf.

So if the first solution doesn't work, try replacing "json=json.loads(payload)" with "json=payload", and "payload["content"] = buf" with "payload["content"] = json.loads(buf)".
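Putting both replacements together, the patched HttpHandler.write would look roughly like this (a sketch to be double-checked against the README patch):

@classmethod
def write(cls, buf, path):
    # buf is the serialised notebook (a str), so json.loads() belongs here
    payload = {"type": "notebook", "format": "json", "path": path}
    payload["content"] = json.loads(buf)
    result = requests.put(path, json=payload)
    result.raise_for_status()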

Let me know if this helps.

@jeromeleonard (Contributor) commented Aug 10, 2023

I made the changes. New behaviour:

Traceback (most recent call last):
  File "/opt/Cortex-Analyzers/analyzers/Jupyter_Analyzer/jupyter.py", line 626, in <module>
    Jupyter().run()
  File "/opt/Cortex-Analyzers/analyzers/Jupyter_Analyzer/jupyter.py", line 622, in run
    self.report(report_results)
  File "/usr/local/lib/python3.9/dist-packages/cortexutils/analyzer.py", line 113, in report
    'artifacts': self.artifacts(full_report),
  File "/opt/Cortex-Analyzers/analyzers/Jupyter_Analyzer/jupyter.py", line 201, in artifacts
    for cell in notebook["cells"]:
KeyError: 'cells'

I printed notebook.keys(): dict_keys(['name', 'duration', 'output_notebook', 'html'])

@jeromeleonard (Contributor) commented Aug 10, 2023

This time I get the output notebook in the output folder, but I still get this error message regarding the cells key.

@jeromeleonard (Contributor)

I managed to make it work 🎉 I had forgotten to set the option any_only_html to False.

@LetMeR00t (Contributor, Author) commented Aug 10, 2023

Hi @jeromeleonard,
Good to know that you got it working at last!
But this little experience calls for some changes ;)

First, I've checked my patch code and indeed, it was the version I mentioned with the json.loads in the wrong place. I've updated it accordingly in the READMEs and in the issue I created.

Second, I renamed the parameter "any_only_html" to "any_generate_html" to avoid any confusion. Within the script we generate a "beautiful render" of the notebooks, but it makes the connector's response heavy (as it embeds the HTML code). To reduce that, we no longer force the HTML generation if users don't want it.

Third, I reviewed the way I'm building the URLs to make it easier. Additionally, I've added support for a datetime format within the output folder name. An explanation is provided in the documentation.
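As an illustration of the datetime support (the placeholder below is a strftime-style assumption; the exact syntax is described in the documentation):

from datetime import datetime

folder_pattern = "/results-%Y-%m-%d/"               # hypothetical output folder setting
folder = datetime.now().strftime(folder_pattern)    # e.g. "/results-2023-08-10/"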

I'll let you review/check those changes again, hoping they make your life easier :) Be aware that your current parameters for the Cortex analyzer/responder aren't valid anymore; see the screenshot sample for help.

Thank you for the review. If you need anything else from me, let me know.

@jeromeleonard (Contributor)

Thank you for the update. Tested, and everything works fine.
I will push an update to chmod +x jupyter.py; the permissions need to be updated for the program to work.
I am also working on getting it to run with Docker; I still have issues at the moment.

My Dockerfile:

FROM python:3.9-slim
WORKDIR /worker
COPY . Jupyter_Analyzer
RUN test ! -e Jupyter_Analyzer/requirements.txt || pip install --no-cache-dir -r Jupyter_Analyzer/requirements.txt
COPY <<-"EOT" /Jupyter_Analyzer/papermill_iorw.patch
--- iorw.py       2023-08-11 05:49:49.302149767 +0000
+++ iorw.py     2023-08-11 05:48:38.553642098 +0000
@@ -180,7 +180,7 @@
 class HttpHandler(object):
     @classmethod
     def read(cls, path):
-        return requests.get(path, headers={'Accept': 'application/json'}).text
+        return json.dumps(requests.get(path, headers={'Accept': 'application/json'}).json()["content"])
 
     @classmethod
     def listdir(cls, path):
@@ -188,7 +188,9 @@
 
     @classmethod
     def write(cls, buf, path):
-        result = requests.put(path, json=json.loads(buf))
+        payload = {"type": "notebook", "format": "json", "path": path}
+        payload["content"] = json.loads(buf)
+        result = requests.put(path, json=payload)
         result.raise_for_status()
 
     @classmethod
EOT
RUN pip install papermill
RUN apt-get update && apt-get install -y patch
RUN patch $(python3 -c "from papermill import iorw; print(iorw.__file__)") /Jupyter_Analyzer/papermill_iorw.patch 
ENTRYPOINT Jupyter_Analyzer/jupyter.py

The container runs, but I get this error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/requests/models.py", line 960, in json
    return complexjson.loads(self.content.decode(encoding), **kwargs)
  File "/usr/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/Cortex-Analyzers/analyzers/Jupyter_Analyzer/jupyter.py", line 614, in <module>
    Jupyter().run()
  File "/opt/Cortex-Analyzers/analyzers/Jupyter_Analyzer/jupyter.py", line 27, in __init__
    self.input_configuration = self.initialize_path(
  File "/opt/Cortex-Analyzers/analyzers/Jupyter_Analyzer/jupyter.py", line 148, in initialize_path
    result["server"] = self.start_server(
  File "/opt/Cortex-Analyzers/analyzers/Jupyter_Analyzer/jupyter.py", line 270, in start_server
    user_model = r.json()
  File "/usr/local/lib/python3.9/dist-packages/requests/models.py", line 968, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

I am currently investigating

@LetMeR00t (Contributor, Author)

Hi,
Given where it happens, it seems to be the first call made through the API. This is weird, but I assume there is no issue with your environment itself, since you mentioned you tested yesterday's patches successfully.
If you can log the response as text from that request, it might help identify the issue, because it's not even certain that the response is JSON.
Let me know if I can help with anything.
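For example, a minimal debug aid around the failing call (assuming r is the response whose r.json() fails in start_server, as shown in the traceback):

if "application/json" not in r.headers.get("Content-Type", ""):
    # error() reports the raw body back to Cortex, showing what the server actually returned
    self.error("Unexpected response ({}): {}".format(r.status_code, r.text[:500]))
user_model = r.json()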

@LetMeR00t (Contributor, Author)

How do you configure your analyzer with such a Dockerfile?

@jeromeleonard (Contributor) commented Aug 11, 2023

I'm working on running mitmproxy to log everything.
To use the Docker image, you have to update the JSON file and replace the "command" key with "dockerImage": "jupyter_run_notebook_analyzer:devel".

Then you just have to disable the analyzer, hit "Refresh", and enable it again.
You should see this in the list of Analyzers:

[Screenshot: 2023-08-11_13-34-48]

I use the same configuration.

@LetMeR00t (Contributor, Author)

Perfect
Do you need anything else from me?
Thank you

@jeromeleonard (Contributor) commented Aug 16, 2023

I had some issues using it with a Docker image and spent some time finding the reason: a missing Python requirement, black. Now the analyzer works both as a process and with Docker. For me, this is ready to be released.

@jeromeleonard merged commit 787e62a into TheHive-Project:develop on Aug 16, 2023
@LetMeR00t (Contributor, Author) commented Aug 16, 2023

Thank you
It's weird for black, as it should only be used for code formatting…
Anyway, if you have any issue with the connector, please let me know.
Thank you again
