The reindexation step fails when upgrading from TheHive 4.0.2-1.
This looks similar to the closed issue #1861. The only difference is that our problem is related to the size of the "data" field (see the log below). We use this field to store SIEM data as part of our "SIEM <> TheHive" integration; see the explanation below for more details on our workflow and use case.
TheHive application.log
2021-10-11 12:05:57,917 [ERROR] from org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScannerExecutor in Thread-66 - Unexpected error processing data: {}
java.lang.IllegalArgumentException: Document contains at least one immense term in field="data" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[104, 116, 116, 112, 58, 47, 47, 122, 99, 114, 109, 115, 116, 97, 116, 105, 99, 45, 97, 46, 97, 107, 97, 109, 97, 105, 104, 100, 46, 110]...', original message: bytes can be at most 32766 in length; got 77530
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:853)
at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:430)
at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:394)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:251)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1616)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1608)
at org.janusgraph.diskstorage.lucene.LuceneIndex.restore(LuceneIndex.java:305)
at org.janusgraph.diskstorage.indexing.IndexTransaction.restore(IndexTransaction.java:128)
at org.janusgraph.graphdb.olap.job.IndexRepairJob.workerIterationEnd(IndexRepairJob.java:201)
at org.janusgraph.graphdb.olap.VertexJobConverter.workerIterationEnd(VertexJobConverter.java:118)
at org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScannerExecutor$Processor.run(StandardScannerExecutor.java:285)
Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 77530
at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:265)
at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:151)
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:843)
... 11 common frames omitted
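As a quick diagnostic, the byte list in the exception message ("the prefix of the first immense term") can be decoded to identify which observable value triggered the error; the bytes below are copied verbatim from the log above:

```python
# Decode the "prefix of the first immense term" reported by Lucene.
# The byte values are copied verbatim from the exception message.
prefix = bytes([104, 116, 116, 112, 58, 47, 47, 122, 99, 114, 109,
                115, 116, 97, 116, 105, 99, 45, 97, 46, 97, 107, 97,
                109, 97, 105, 104, 100, 46, 110])
print(prefix.decode("utf-8"))  # prints: http://zcrmstatic-a.akamaihd.n
```

So the offending value starts with a URL; the full value is 77,530 bytes, well over Lucene's 32,766-byte limit for a single indexed term.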
Some points about our workflow
Within our SIEM <> TheHive integration, we create TheHive alerts from triggered SIEM alerts and map SIEM fields to observables - e.g. the raw event field from the SIEM. This raw field contains all event data and lets the analyst start investigating without going back to the SIEM. In some rare cases this field grows very large, and so does the resulting observable. Another reason is that we want everything documented on the alert/case.
Possible Solutions
Adjust the indexing process for the "data" field so that values exceeding Lucene's 32,766-byte term limit do not abort the reindexation.
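If the indexing process cannot be changed, a possible workaround on the integration side would be to truncate the raw event value before it becomes an observable's "data" field. A minimal sketch, assuming a pre-processing step in our SIEM -> TheHive script (the helper name is hypothetical, not part of TheHive's API):

```python
# Lucene rejects indexed terms whose UTF-8 encoding exceeds this size
# (see "bytes can be at most 32766 in length" in the log above).
LUCENE_MAX_TERM_BYTES = 32766

def truncate_utf8(value: str, max_bytes: int = LUCENE_MAX_TERM_BYTES) -> str:
    """Cut `value` so its UTF-8 encoding fits in `max_bytes`,
    without splitting a multi-byte character at the boundary."""
    encoded = value.encode("utf-8")
    if len(encoded) <= max_bytes:
        return value
    # Cut at the byte limit, dropping a possibly split trailing character.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")

# Example: a raw SIEM event far larger than the limit.
raw_event = "http://example.invalid/" + "x" * 80000
safe_data = truncate_utf8(raw_event)
assert len(safe_data.encode("utf-8")) <= LUCENE_MAX_TERM_BYTES
```

The full raw event could then be attached to the case some other way (e.g. as a file attachment) so no data is lost, but this changes our workflow, which is why a server-side fix would be preferable.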
Complementary information
We have a TheHive instance with around 8,500 alerts and 3,100 cases. With our current version of TheHive we are facing more and more performance issues, so the upgrade to 4.1 is a must for us.
If there is no way to adjust the indexing process, we will have to start TheHive 4.1 with a new database, change our workflow, and keep a legacy system for accessing our old data. We would be very grateful for your help.