Thehive4 with Cassandra very slow compared to TH3.4 on the same data #1341
Comments
I have the same experience on my single-host system, but I'm still feeding data into TheHive, so I wasn't sure if this is the root cause.
On a single-node cluster with 32 GB RAM and 2 CPUs at 2.2 GHz it takes about 6 seconds to mark an alert as unread. The CPU is running 2 processes at 100% :-(
Oh yes, now I also checked the CPU load. During a "UI operation" in TH4, both Java processes (thehive and cassandra) go as high as 300%-400% CPU usage, meaning each of them consumes 3-4 CPU cores at 100%. This goes on for a few seconds, until the loading of data is done. By comparison, the Elasticsearch Java process on the prod instance rarely goes over 15-30% CPU during normal TheHive UI operations.
TheHive 4 is slower than TheHive 3: it must manage the relationships between entities and must check permissions and visibility. Moreover, the database is now transactional. But it should be only slightly slower, not as much as you describe. To get more detailed logs, add the indicated line to your logback configuration:
<?xml version="1.0" encoding="UTF-8"?>
<configuration debug="false">
<!-- [...] -->
<logger name="org.thp" level="TRACE"/> <!-- add this line -->
<root level="INFO">
<appender-ref ref="ASYNCFILE"/>
<appender-ref ref="ASYNCSTDOUT"/>
</root>
</configuration>
I have enabled the TRACE level.
@martinr103 you can create a gist to share the logs.
Oh, I just found the hint at the bottom of this message box - that you can actually drag & drop files into it to create attachments. Should have seen it earlier :) OK, so attached is the requested log from the operation "mark alert as read". I haven't measured the time, but it took several seconds - from the click on the envelope icon until the screen refreshed.
These logs come from the console, not from the log file (/var/log/thehive/application.log). They don't contain timestamps. Nevertheless, I can see:
38ms is not long.
There is heavy processing behind these requests because the user permissions must be checked for each alert and because they still use the TheHive3-compatible query format. I'll try to optimize them.
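(For anyone who wants to reproduce these timings outside the UI, here is a minimal Python sketch that times one of the legacy search requests visible in the access log further down. The base URL, the API key, and the empty "match everything" body are assumptions for illustration only, not values taken from this thread; the path and query string are copied from the log excerpt.)

```python
# Minimal timing sketch, assuming a reachable TheHive 4 instance and an API key.
import time
import requests

THEHIVE_URL = "http://localhost:9000"    # assumption: local instance on the default port
API_KEY = "REPLACE_WITH_AN_API_KEY"      # assumption: API-key authentication

def time_search(path: str, params: dict, payload: dict) -> None:
    """POST a TheHive3-compatible search request and print the round-trip time."""
    start = time.monotonic()
    resp = requests.post(
        f"{THEHIVE_URL}{path}",
        params=params,
        json=payload,
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    elapsed_ms = (time.monotonic() - start) * 1000
    print(f"{path}: HTTP {resp.status_code}, {len(resp.content)} bytes, {elapsed_ms:.0f} ms")

# Same request the case list page issues, as seen in the log excerpt further down.
time_search(
    "/api/case/_search",
    {"range": "0-15", "sort": ["-flag", "-startDate"], "nstats": "true"},
    {"query": {}},  # assumption: empty query matching everything; adjust to your filters
)
```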
Ah yes, sorry, below are the same type of logs, but now with timestamps. Maybe it helps a little more. We have ~4700 alerts in this database. Basically I can confirm... yes, all the searches take long (e.g. search for all open cases, search for all closed cases, search for alerts, search for the artifacts of one case, etc.).
I'd like to check if the TheHive3 query format is the real reason for the performance problem. Can you execute the following command:
Replace
Getting an error from the described command:
Here are some results from my installation for comparison.
I have similar results. I've been running the migration since 20 May and it's still running (after several resumes, due to errors).
@martinr103 the attribute @crackytsi @mamoedo thank you. There is an important difference.
@crackytsi I'm surprised by your results. 3 seconds to retrieve the first 15 cases is very long. I think your platform is overloaded.
Thanks @To-om. With the modified command
Out of curiosity I also tried one of @crackytsi's searches, and got these numbers:
On a second attempt, the same query:
So yeah... it's consistently slow: 4, 5, 7 seconds for the alert search.
@To-om: Yes, the platform might be overloaded. Actually, it is the same host that runs TH3.4 at 2.2 GHz with 6000 cases without any bigger CPU spikes. For the Cassandra update I added an additional CPU, but the system is still under heavy load.
Nevertheless, browsing through cases is still slow (it takes around 5-10 seconds per page load).
This issue has been fixed with #1731, included in release 4.1.0.
Bug?
Work Environment
Problem Description
I observe a massive performance degradation with TH4/Cassandra compared to TH3.4/Elasticsearch, and I would like to ask here if other people have similar experiences, or whether I am "alone" with this issue.
We run the test instance of Cassandra and TH4 as Docker containers; the data has been imported from the production environment (TH3.4) using the provided documentation.
There are a little less than 5000 cases, and each case has a "handful" of observables (like 1, 2, 5) -- none of the cases has more than 20 observables.
So all in all, this is in my opinion a rather small amount of data to be handled by a database.
In fact, our production instance (TH3.4) runs and "feels" very fast in the UI. Loading the list of all cases (or say, by quick filters, "only closed cases"... "only open cases") is instant. No delay whatsoever.
Also opening a single case, checking its details, checking its observables - goes instantly.
Compared to this, the TH4 UI "feels" terribly slow. Loading the list of cases takes 3+ seconds.
Applying any filters on the case list - the same.
Opening a single case's details and switching to the Observables tab takes "ages" before the UI actually displays the observables (5-6 seconds or more).
This makes TH4 actually unusable as it currently is.
Has anybody observed similar slowness?
Or is it super fast for most users? (Which would indicate that something is pretty wrong in our local setup.)
Complementary information
Here are a few lines of the TheHive log, filtered to show the times the searches take (look at the case/_search and case/artifact/_search numbers). A small script for pulling out these timings automatically follows the excerpt.
[info] o.t.s.AccessLogFilter - 10.91.128.2 POST /api/case/task/log/_search?range=0-100&nparent=1 took 4305ms and returned 200 2 bytes
[info] o.t.s.AccessLogFilter - 10.91.128.2 POST /api/case/task/_stats took 9293ms and returned 200 11 bytes
[info] o.t.s.AccessLogFilter - 10.91.128.2 POST /api/case/artifact/_stats took 13057ms and returned 200 28 bytes
[info] o.t.s.AccessLogFilter - 10.91.128.2 POST /api/case/artifact/_stats took 12622ms and returned 200 92 bytes
[info] o.t.s.AccessLogFilter - 10.91.128.2 POST /api/case/artifact/_stats took 12626ms and returned 200 28 bytes
[info] o.t.s.AccessLogFilter - 10.91.128.2 POST /api/case/artifact/_stats took 12634ms and returned 200 31 bytes
[info] o.t.s.AccessLogFilter - 10.91.128.2 POST /api/case/artifact/_search?range=all&sort=-startDate&nstats=true took 12713ms and returned 200 645 bytes
[info] o.t.s.AccessLogFilter - 10.91.128.2 POST /api/stream took 3ms and returned 200 20 bytes
[info] o.t.s.AccessLogFilter - 10.91.128.2 POST /api/v0/query took 402ms and returned 200 125878 bytes
[info] o.t.s.AccessLogFilter - 10.91.128.2 POST /api/alert/_stats took 2798ms and returned 200 40 bytes
[info] o.t.s.AccessLogFilter - 10.91.128.2 POST /api/case/_stats took 2947ms and returned 200 703 bytes
[info] o.t.s.AccessLogFilter - 10.91.128.2 POST /api/case/_stats took 2973ms and returned 200 61 bytes
[info] o.t.s.AccessLogFilter - 10.91.128.2 POST /api/case/_search?range=0-15&sort=-flag&sort=-startDate&nstats=true took 4621ms and returned 200 31861 bytes
[info] o.t.s.AccessLogFilter - 10.91.128.2 POST /api/case/_stats took 2159ms and returned 200 14 bytes
[info] o.t.s.AccessLogFilter - 10.91.128.2 POST /api/case/task/_stats took 5206ms and returned 200 39 bytes
[info] o.t.s.AccessLogFilter - 10.91.128.2 POST /api/case/task/_stats took 5671ms and returned 200 11 bytes
[info] o.t.s.AccessLogFilter - 10.91.128.2 POST /api/case/_stats took 2832ms and returned 200 61 bytes
[info] o.t.s.AccessLogFilter - 10.91.128.2 POST /api/case/_search?range=0-15&sort=-flag&sort=-startDate&nstats=true took 3062ms and returned 200 31358 bytes
[info] o.t.s.AccessLogFilter - 10.91.128.2 POST /api/case/_search?range=0-15&sort=-flag&sort=-startDate&nstats=true took 3133ms and returned 200 30828 bytes
[info] o.t.s.AccessLogFilter - 10.91.128.2 POST /api/connector/cortex/action/_search?range=0-100&sort=-startDate took 13ms and returned 200 2 bytes
[info] o.t.s.AccessLogFilter - 10.91.128.2 POST /api/alert/_stats took 1252ms and returned 200 31 bytes
[info] o.t.s.AccessLogFilter - 10.91.128.2 POST /api/case/task/log/_search?range=0-100&nparent=1 took 4100ms and returned 200 2 bytes
[info] o.t.s.AccessLogFilter - 10.91.128.2 POST /api/case/task/_stats took 8976ms and returned 200 11 bytes
[info] o.t.s.AccessLogFilter - 10.91.128.2 POST /api/case/artifact/_stats took 12228ms and returned 200 28 bytes
[info] o.t.s.AccessLogFilter - 10.91.128.2 POST /api/case/artifact/_stats took 12120ms and returned 200 61 bytes
[info] o.t.s.AccessLogFilter - 10.91.128.2 POST /api/case/artifact/_stats took 12099ms and returned 200 30 bytes
[info] o.t.s.AccessLogFilter - 10.91.128.2 POST /api/case/artifact/_stats took 12101ms and returned 200 31 bytes
[info] o.t.s.AccessLogFilter - 10.91.128.2 POST /api/case/artifact/_search?range=all&sort=-startDate&nstats=true took 12153ms and returned 200 317 bytes
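As mentioned above, scanning these lines by hand gets tedious for longer captures; a minimal sketch like the following extracts the slow requests automatically. It only assumes the AccessLogFilter line format shown in the excerpt; the 1000 ms threshold and reading from stdin are arbitrary choices, not part of the original report.

```python
# Minimal sketch: extract slow requests from TheHive AccessLogFilter lines.
import re
import sys

# Matches the "took <n>ms and returned <status>" pattern seen in the log excerpt above.
LINE_RE = re.compile(
    r"o\.t\.s\.AccessLogFilter - (?P<ip>\S+) (?P<method>\S+) (?P<path>\S+) "
    r"took (?P<ms>\d+)ms and returned (?P<status>\d+)"
)

def slow_requests(lines, threshold_ms=1000):
    """Yield (milliseconds, method, path) for each request slower than the threshold."""
    for line in lines:
        m = LINE_RE.search(line)
        if m and int(m.group("ms")) >= threshold_ms:
            yield int(m.group("ms")), m.group("method"), m.group("path")

if __name__ == "__main__":
    # Example: python slow_requests.py < /var/log/thehive/application.log
    for ms, method, path in sorted(slow_requests(sys.stdin), reverse=True):
        print(f"{ms:>7} ms  {method} {path}")
```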