
Migration fails from 4.0.2-1 to 4.1 #2216

Open
marpoe opened this issue Oct 14, 2021 · 1 comment
Labels: bug, core, TheHive4

marpoe commented Oct 14, 2021

Request Type

Bug

Work Environment

OS version (server): RedHat
OS version (client): 10
TheHive version: 4.1.10
Package Type: RPM
Database: Cassandra
Index type: Lucene
Attachments storage: Local
Browser type & version: Edge

Problem Description

The reindexation step fails when upgrading from TheHive 4.0.2-1.
In my opinion it is similar to the closed issue #1861; the only difference is that our problem is related to the size of the "data" field (see the log excerpt below). We use this field to store SIEM data within our "SIEM <> TheHive" integration; for more details on our workflow and use case, see the explanation below.

TheHive application.log

2021-10-11 12:05:57,917 [ERROR] from org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScannerExecutor in Thread-66 - Unexpected error processing data: {}
java.lang.IllegalArgumentException: Document contains at least one immense term in field="data" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[104, 116, 116, 112, 58, 47, 47, 122, 99, 114, 109, 115, 116, 97, 116, 105, 99, 45, 97, 46, 97, 107, 97, 109, 97, 105, 104, 100, 46, 110]...', original message: bytes can be at most 32766 in length; got 77530
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:853)
at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:430)
at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:394)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:251)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1616)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1608)
at org.janusgraph.diskstorage.lucene.LuceneIndex.restore(LuceneIndex.java:305)
at org.janusgraph.diskstorage.indexing.IndexTransaction.restore(IndexTransaction.java:128)
at org.janusgraph.graphdb.olap.job.IndexRepairJob.workerIterationEnd(IndexRepairJob.java:201)
at org.janusgraph.graphdb.olap.VertexJobConverter.workerIterationEnd(VertexJobConverter.java:118)
at org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScannerExecutor$Processor.run(StandardScannerExecutor.java:285)
Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 77530
at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:265)
at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:151)
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:843)
... 11 common frames omitted
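For reference, the term prefix in the error message is a list of UTF-8 byte values; decoding it shows which observable value triggered the failure. A quick Python check:

```python
# Decode the term prefix that Lucene reports as a list of UTF-8 byte values.
prefix = [104, 116, 116, 112, 58, 47, 47, 122, 99, 114, 109, 115, 116, 97,
          116, 105, 99, 45, 97, 46, 97, 107, 97, 109, 97, 105, 104, 100,
          46, 110]
print(bytes(prefix).decode("utf-8"))  # -> http://zcrmstatic-a.akamaihd.n
```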

Some points about our workflow

Within our SIEM <> TheHive integration, we create TheHive alerts from triggered SIEM alerts and map the SIEM fields to observables, e.g. the raw event field from the SIEM. This raw field contains the complete event data and lets the analyst start further research without going back to the SIEM. In some rare cases this field carries a large amount of data, and as a consequence so does the observable. Another reason is that we want everything documented on the alert/case.
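As a stopgap on our side, we are considering truncating the raw event before attaching it as an observable. A minimal sketch, assuming a thehive4py-based integration (the endpoint, API key, and fetch_raw_event_from_siem helper are hypothetical; 32766 is the per-term limit from the log above):

```python
from thehive4py.api import TheHiveApi
from thehive4py.models import Alert, AlertArtifact

LUCENE_MAX_TERM_BYTES = 32766  # per-term limit reported in the log above

def truncate_to_bytes(value: str, limit: int = LUCENE_MAX_TERM_BYTES) -> str:
    """Truncate a string so its UTF-8 encoding stays within `limit` bytes."""
    encoded = value.encode("utf-8")
    if len(encoded) <= limit:
        return value
    # Drop any partial multi-byte sequence left at the cut point.
    return encoded[:limit].decode("utf-8", errors="ignore")

api = TheHiveApi("http://thehive.example:9000", "API_KEY")  # hypothetical endpoint/key

raw_event = fetch_raw_event_from_siem()  # hypothetical SIEM lookup
alert = Alert(
    title="SIEM alert",
    type="external",
    source="siem",
    sourceRef="alert-1234",
    description="Alert raised by the SIEM integration",
    artifacts=[AlertArtifact(dataType="other", data=truncate_to_bytes(raw_event))],
)
api.create_alert(alert)
```

This only limits new data, of course; it does not help with the observables that are already in the database.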

Possible Solutions

Adjust the indexing process for the "data" field.

Complementary information

We have a TheHive instance with around 8,500 alerts and 3,100 cases. With our current version of TheHive we are facing more and more performance issues, so the update to 4.1 is a must for us.

If there is no way to adjust the indexing process, we will have to start TheHive 4.1 on a fresh database, change our workflow, and keep a legacy system for accessing our old data. I would be very grateful for any help.
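To scope the problem before retrying the migration, a script along these lines could list the observables whose data field Lucene would reject (again a sketch assuming thehive4py; I am not certain an empty find_observables query is the most efficient way to walk all observables):

```python
from thehive4py.api import TheHiveApi

LUCENE_MAX_TERM_BYTES = 32766  # per-term limit from the log above

api = TheHiveApi("http://thehive.example:9000", "API_KEY")  # hypothetical endpoint/key

# Walk all observables and report those whose data exceeds Lucene's limit.
response = api.find_observables(query={}, range="all")
for obs in response.json():
    data = obs.get("data") or ""
    size = len(data.encode("utf-8"))
    if size > LUCENE_MAX_TERM_BYTES:
        print(f"{obs['id']}: data is {size} bytes")
```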

marpoe added the bug and TheHive4 labels on Oct 14, 2021
nadouani (Contributor) commented

Hello @marpoe, we will take a look and check for the best possible solution to this issue.
