[Bug] Search queries are slow with large datasets #2116

Open
mamoedo opened this issue Jul 14, 2021 · 3 comments

mamoedo commented Jul 14, 2021

Request Type

Bug

Work Environment

| Question | Answer |
|---|---|
| TheHive version / git hash | 4.1.7 |
| Package Type | Binary |
| Database | Cassandra |
| Index type | Elasticsearch |
| Attachments storage | Local |

Problem Description

When I run a search query, TheHive 4 takes at least 4 minutes to answer, so many of my custom analyzers and responders that use these queries are blocked. In my dev environment, which holds little data, the same query works fine. In my production environment, which holds a large amount of data, it takes at least 4 minutes.

Steps to Reproduce

  • Customize the following code and run it
```python
from datetime import datetime
from thehive4py.api import TheHiveApi
from thehive4py.query import And, Child, Eq

# Replace the URL and API key with the values for your own instance.
api = TheHiveApi("http://localhost:9000", "apikey")

# Open cases that have an "ip" observable whose value is 8.8.8.8.
query = And(Child('case_artifact', And(Eq('dataType', "ip"), Eq('data', "8.8.8.8"))), Eq('status', 'Open'))

# Time the search request.
t0 = datetime.now()
response = api.find_cases(query=query, range='all', sort=['-updatedAt'])
t1 = datetime.now()
print(t1 - t0)
```
mamoedo added the bug and TheHive4 labels on Jul 14, 2021
nadouani (Contributor) commented

TheHive4py doesn't support optimised queries for now; it relies on the _search APIs.

This sample has to scan the whole database to find Case objects and then, for each one, scan all of its observables to find the ones that correspond to the query. This operation is almost certainly not using the index.
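
To make the cost difference concrete, here is a toy in-memory model of the two access patterns described above. It is only a sketch under stated assumptions, not TheHive's actual implementation: the data layout and the helper names (`make_cases`, `scan_all`, `build_index`, `lookup`) are hypothetical and exist purely for illustration. The parent-first scan visits every observable of every case, while the lookup keyed on the observable value only touches the entries that match.

```python
from collections import defaultdict


def make_cases(n):
    """Build n toy cases; the case id equals its position in the list."""
    cases = []
    for i in range(n):
        cases.append({
            "id": i,
            "status": "Open" if i % 2 == 0 else "Resolved",
            "observables": [("ip", f"10.0.{i // 256 % 256}.{i % 256}")],
        })
    # Plant the observable we will search for on one open case.
    cases[0]["observables"].append(("ip", "8.8.8.8"))
    return cases


def scan_all(cases, data_type, value):
    """Parent-first: walk every case, then the observables under it."""
    visited = 0
    hits = []
    for case in cases:
        for obs in case["observables"]:
            visited += 1
            if obs == (data_type, value) and case["status"] == "Open":
                hits.append(case["id"])
                break
    return hits, visited


def build_index(cases):
    """Hypothetical inverted index: (dataType, data) -> list of case ids."""
    index = defaultdict(list)
    for case in cases:
        for obs in case["observables"]:
            index[obs].append(case["id"])
    return index


def lookup(cases, index, data_type, value):
    """Child-first: jump to matching observables, then check their cases."""
    candidate_ids = index.get((data_type, value), [])
    hits = [cid for cid in candidate_ids if cases[cid]["status"] == "Open"]
    return hits, len(candidate_ids)


cases = make_cases(10_000)
index = build_index(cases)
scan_hits, scan_visited = scan_all(cases, "ip", "8.8.8.8")
idx_hits, idx_visited = lookup(cases, index, "ip", "8.8.8.8")
print("same result:", scan_hits == idx_hits)
print("observables visited by scan:", scan_visited)
print("index entries visited by lookup:", idx_visited)
```

Both functions return the same cases, but the work done by the scan grows with the total number of cases and observables, which would be consistent with the query being fast on a small dev dataset and slow in production.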

mamoedo commented Jul 26, 2021

> TheHive4py doesn't support optimised queries for now; it relies on the _search APIs.
>
> This sample has to scan the whole database to find Case objects and then, for each one, scan all of its observables to find the ones that correspond to the query. This operation is almost certainly not using the index.

Thanks for the answer.

I'm using this query to extract the custom fields of a case when running an analyzer. Is there an optimised way of retrieving them right now? All my analyzers that rely on custom fields are blocked by this.

mamoedo commented Oct 4, 2021

Hi @nadouani, I don't see the issue assigned to a milestone. Will this be planned for 4.2.0?
