[Bug] Search queries are slow with large datasets #2116

mamoedo · 2021-07-14T09:40:11Z

Request Type

Bug

Work Environment

Question	Answer
TheHive version / git hash	4.1.7
Package Type	Binary
Database	Cassandra
Index type	Elasticsearch
Attachments storage	Local

Problem Description

When I make a query, TH4 takes 4 minutes to respond, so many of my custom analyzers and responders that use these queries are blocked. On my dev environment, which holds little data, the query works fine, but on my production environment, which holds large amounts of data, it takes at least 4 minutes.

Steps to Reproduce

Customize the following code and run it

from datetime import datetime

from thehive4py.api import TheHiveApi
from thehive4py.query import And, Child, Eq

# Replace the URL and API key with the values for your own instance
api = TheHiveApi("http://localhost:9000", "apikey")

# Open cases that have an observable of type "ip" with the value "8.8.8.8"
query = And(Child('case_artifact', And(Eq('dataType', "ip"), Eq('data', "8.8.8.8"))), Eq('status', 'Open'))

# Time the search request
t0 = datetime.now()
response = api.find_cases(query=query, range='all', sort=['-updatedAt'])
t1 = datetime.now()
print(t1 - t0)


nadouani · 2021-07-26T05:08:22Z

TheHive4py doesn't support optimised queries for now; it relies on the _search APIs.

This sample has to scan the whole database to find Case objects and, for each one, scan all of its observables to find the ones that correspond to the query. This operation is almost certainly not using the index.
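To illustrate why this kind of query gets slower as the data grows, here is a minimal sketch in plain Python. It is not TheHive's actual implementation: the `cases` list, the helper functions, and the in-memory index are all hypothetical stand-ins. It only contrasts visiting every case and every observable with looking the observable up first.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for cases and their observables.
cases = [
    {"id": 1, "status": "Open", "observables": [("ip", "8.8.8.8"), ("domain", "example.org")]},
    {"id": 2, "status": "Open", "observables": [("ip", "1.1.1.1")]},
    {"id": 3, "status": "Resolved", "observables": [("ip", "8.8.8.8")]},
]


def find_by_scan(cases, data_type, data):
    """Visit every case and every observable: cost grows with total data size."""
    return [
        case["id"]
        for case in cases
        if case["status"] == "Open"
        and any(obs == (data_type, data) for obs in case["observables"])
    ]


def build_index(cases):
    """Map each (dataType, data) pair to the IDs of the cases that contain it."""
    index = defaultdict(set)
    for case in cases:
        for obs in case["observables"]:
            index[obs].add(case["id"])
    return index


def find_by_index(cases_by_id, index, data_type, data):
    """Look up the observable first, then check only the matching cases."""
    return sorted(
        case_id
        for case_id in index.get((data_type, data), ())
        if cases_by_id[case_id]["status"] == "Open"
    )


cases_by_id = {case["id"]: case for case in cases}
index = build_index(cases)

scan_result = find_by_scan(cases, "ip", "8.8.8.8")
index_result = find_by_index(cases_by_id, index, "ip", "8.8.8.8")
print(scan_result, index_result)
```

Both functions return the same case IDs. The scan touches every observable of every case, while the indexed lookup touches only the cases that contain the requested observable.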

mamoedo · 2021-07-26T07:44:38Z

Thanks for the answer.

I'm using this query to extract the custom fields of a case when running an analyzer. Is there any optimised way of retrieving them right now? All my analyzers that rely on custom fields are blocked by this.

mamoedo · 2021-10-04T10:32:18Z

Hi @nadouani, I don't see the issue assigned to a milestone. Will this be planned for 4.2.0?

mamoedo added the bug and TheHive4 labels Jul 14, 2021

nadouani added the scope:performance label Jul 26, 2021

mamoedo mentioned this issue Oct 21, 2021

[Bug] Slow getting case observables via API get_case_observables #2218

Closed

mamoedo mentioned this issue Jan 5, 2022

[Bug] Performance issue using search function #2312

Open
