
[Bug] Migration issues from ES to Cassandra #1340

Closed
crackytsi opened this issue May 17, 2020 · 27 comments

@crackytsi

Request Type

Bug

Work Environment

Question                     Answer
OS version (server)          Debian
OS version (client)          10
TheHive version / git hash   4.0 RC2
Package Type                 DEB
Elasticsearch                5.6.10

Problem Description

I have several problems with the migration.
I haven't made any modifications in Elasticsearch, so this is a plain TheHive system that has simply been updated several times, from version 2 up to the latest version 3.

Apart from the migration being very slow (see the other issue; the data is about 250 MB), there are several failures.
After 4 days with just some demo data, the migration stopped with this error message:

[error] o.t.t.m.t.DBConfiguration - ElasticSearch request failure: POST:/_search/scroll?
StringEntity({"scroll":"36000000ms","scroll_id":"DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAMErFi1NWmZ5RGVRVFItTWF4SHZHWl9qaWcAAAAAAADBLBYtTVpmeURlUVRSLU1heEh2R1pfamlnAAAAAAAAwS0WLU1aZnlEZVFUUi1NYXhIdkdaX2ppZwAAAAAAAMEuFi1NWmZ5RGVRVFItTWF4SHZHWl9qaWcAAAAAAADBLxYtTVpmeURlUVRSLU1heEh2R1pfamln"},Some(application/json))
 => ElasticError(search_phase_execution_exception,all shards failed,None,None,None,List(ElasticError(search_context_missing_exception,No search context found for id [49455],None,None,None,null,None), ElasticError(search_context_missing_exception,No search context found for id [49452],None,None,None,null,None), ElasticError(search_context_missing_exception,No search context found for id [49451],None,None,None,null,None), ElasticError(search_context_missing_exception,No search context found for id [49454],None,None,None,null,None), ElasticError(search_context_missing_exception,No search context found for id [49453],None,None,None,null,None)),Some(CausedBy(search_context_missing_exception,No search context found for id [49453],Map())))
[warn] o.t.t.m.t.SearchWithScroll - Search error
org.thp.scalligraph.SearchError: all shards failed
        at org.thp.thehive.migration.th3.DBConfiguration.$anonfun$execute$2(DBConfiguration.scala:147)
        at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:307)
        at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:41)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
        at org.thp.scalligraph.ContextPropagatingDisptacher$$anon$1.$anonfun$execute$2(ContextPropagatingDisptacher.scala:57)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.thp.scalligraph.DiagnosticContext$.$anonfun$withDiagnosticContext$2(ContextPropagatingDisptacher.scala:92)
        at org.thp.scalligraph.DiagnosticContext$.saveDiagnosticContext(ContextPropagatingDisptacher.scala:107)
        at org.thp.scalligraph.DiagnosticContext$.withDiagnosticContext(ContextPropagatingDisptacher.scala:90)
        at org.thp.scalligraph.DiagnosticContext$$anon$2.withContext(ContextPropagatingDisptacher.scala:75)
[error] o.t.t.m.t.DBConfiguration - ElasticSearch request failure: DELETE:/_search/scroll/?
StringEntity({"scroll_id":["DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAMEmFi1NWmZ5RGVRVFItTWF4SHZHWl9qaWcAAAAAAADBKBYtTVpmeURlUVRSLU1heEh2R1pfamlnAAAAAAAAwSoWLU1aZnlEZVFUUi1NYXhIdkdaX2ppZwAAAAAAAMEnFi1NWmZ5RGVRVFItTWF4SHZHWl9qaWcAAAAAAADBKRYtTVpmeURlUVRSLU1heEh2R1pfamln"]},Some(application/json))
 => ElasticError(404,404,None,None,None,List(),None)
[error] o.t.s.u.Retry - uncaught error, not retrying
java.lang.IllegalStateException: Cannot access element because its enclosing transaction is closed and unbound
        at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.getNextTx(StandardJanusGraphTx.java:305)
        at org.janusgraph.graphdb.vertices.AbstractVertex.it(AbstractVertex.java:53)
        at org.janusgraph.graphdb.vertices.AbstractVertex.it(AbstractVertex.java:37)
        at org.janusgraph.graphdb.internal.AbstractElement.isRemoved(AbstractElement.java:141)
        at org.janusgraph.graphdb.transaction.StandardJanusGraphTx$3.isDeleted(StandardJanusGraphTx.java:1261)
        at org.janusgraph.graphdb.transaction.StandardJanusGraphTx$3.isDeleted(StandardJanusGraphTx.java:1188)
        at org.janusgraph.graphdb.query.QueryProcessor.lambda$getFilterIterator$2(QueryProcessor.java:133)
        at com.google.common.collect.Iterators$5.computeNext(Iterators.java:637)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
[error] o.t.s.m.Database - Exception raised, rollback (Cannot access element because its enclosing transaction is closed and unbound)
[error] o.t.t.m.Migrate - InputObservable creation failure
java.lang.IllegalStateException: Cannot access element because its enclosing transaction is closed and unbound
        at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.getNextTx(StandardJanusGraphTx.java:305)
        at org.janusgraph.graphdb.vertices.AbstractVertex.it(AbstractVertex.java:53)
        at org.janusgraph.graphdb.vertices.AbstractVertex.it(AbstractVertex.java:37)
        at org.janusgraph.graphdb.internal.AbstractElement.isRemoved(AbstractElement.java:141)
        at org.janusgraph.graphdb.transaction.StandardJanusGraphTx$3.isDeleted(StandardJanusGraphTx.java:1261)
        at org.janusgraph.graphdb.transaction.StandardJanusGraphTx$3.isDeleted(StandardJanusGraphTx.java:1188)
        at org.janusgraph.graphdb.query.QueryProcessor.lambda$getFilterIterator$2(QueryProcessor.java:133)
        at com.google.common.collect.Iterators$5.computeNext(Iterators.java:637)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
[info] o.t.t.m.t.Output - Create observable %UserProfile%\AppData\Roaming\[victim_id].pek in alert 207847504
[warn] o.j.d.i.ConsistentKeyIDAuthority - Temporary storage exception while acquiring id block - retrying in PT0.6S: {}
org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend
        at io.vavr.API$Match$Case0.apply(API.java:3174)
        at io.vavr.API$Match.of(API.java:3137)
        at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.lambda$static$0(CQLKeyColumnValueStore.java:123)
        at io.vavr.control.Try.getOrElseThrow(Try.java:671)
        at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.getSlice(CQLKeyColumnValueStore.java:290)
        at org.janusgraph.diskstorage.idmanagement.ConsistentKeyIDAuthority.lambda$getCurrentID$0(ConsistentKeyIDAuthority.java:172)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:147)
        at org.janusgraph.diskstorage.idmanagement.ConsistentKeyIDAuthority.getCurrentID(ConsistentKeyIDAuthority.java:171)
        at org.janusgraph.diskstorage.idmanagement.ConsistentKeyIDAuthority.getIDBlock(ConsistentKeyIDAuthority.java:232)
        at org.janusgraph.graphdb.database.idassigner.StandardIDPool$IDBlockGetter.call(StandardIDPool.java:288)
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Could not send request, session is closed
        at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:552)
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:513)
        at io.vavr.control.Try.of(Try.java:62)
        at io.vavr.concurrent.FutureImpl.lambda$run$2(FutureImpl.java:199)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        ... 4 common frames omitted
Caused by: java.lang.IllegalStateException: Could not send request, session is closed
        at com.datastax.driver.core.SessionManager.execute(SessionManager.java:693)
        at com.datastax.driver.core.SessionManager.executeAsync(SessionManager.java:143)
        at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.getSlice(CQLKeyColumnValueStore.java:282)
        at org.janusgraph.diskstorage.idmanagement.ConsistentKeyIDAuthority.lambda$getCurrentID$0(ConsistentKeyIDAuthority.java:172)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:147)
        at org.janusgraph.diskstorage.idmanagement.ConsistentKeyIDAuthority.getCurrentID(ConsistentKeyIDAuthority.java:171)
        at org.janusgraph.diskstorage.idmanagement.ConsistentKeyIDAuthority.getIDBlock(ConsistentKeyIDAuthority.java:232)
        at org.janusgraph.graphdb.database.idassigner.StandardIDPool$IDBlockGetter.call(StandardIDPool.java:288)
        at org.janusgraph.graphdb.database.idassigner.StandardIDPool$IDBlockGetter.call(StandardIDPool.java:255)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[warn] o.j.d.i.ConsistentKeyIDAuthority - Temporary storage exception while acquiring id block - retrying in PT1.2S: {}
org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend
        at io.vavr.API$Match$Case0.apply(API.java:3174)
        at io.vavr.API$Match.of(API.java:3137)
        at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.lambda$static$0(CQLKeyColumnValueStore.java:123)
        at io.vavr.control.Try.getOrElseThrow(Try.java:671)
        at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.getSlice(CQLKeyColumnValueStore.java:290)
        at org.janusgraph.diskstorage.idmanagement.ConsistentKeyIDAuthority.lambda$getCurrentID$0(ConsistentKeyIDAuthority.java:172)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:147)
        at org.janusgraph.diskstorage.idmanagement.ConsistentKeyIDAuthority.getCurrentID(ConsistentKeyIDAuthority.java:171)
        at org.janusgraph.diskstorage.idmanagement.ConsistentKeyIDAuthority.getIDBlock(ConsistentKeyIDAuthority.java:232)
        at org.janusgraph.graphdb.database.idassigner.StandardIDPool$IDBlockGetter.call(StandardIDPool.java:288)
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Could not send request, session is closed
        at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:552)
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:513)
        at io.vavr.control.Try.of(Try.java:62)
        at io.vavr.concurrent.FutureImpl.lambda$run$2(FutureImpl.java:199)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        ... 4 common frames omitted
Caused by: java.lang.IllegalStateException: Could not send request, session is closed
        at com.datastax.driver.core.SessionManager.execute(SessionManager.java:693)
        at com.datastax.driver.core.SessionManager.executeAsync(SessionManager.java:143)
        at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.getSlice(CQLKeyColumnValueStore.java:282)
        at org.janusgraph.diskstorage.idmanagement.ConsistentKeyIDAuthority.lambda$getCurrentID$0(ConsistentKeyIDAuthority.java:172)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:147)
        at org.janusgraph.diskstorage.idmanagement.ConsistentKeyIDAuthority.getCurrentID(ConsistentKeyIDAuthority.java:171)
        at org.janusgraph.diskstorage.idmanagement.ConsistentKeyIDAuthority.getIDBlock(ConsistentKeyIDAuthority.java:232)
        at org.janusgraph.graphdb.database.idassigner.StandardIDPool$IDBlockGetter.call(StandardIDPool.java:288)
        at org.janusgraph.graphdb.database.idassigner.StandardIDPool$IDBlockGetter.call(StandardIDPool.java:255)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[warn] o.j.d.i.ConsistentKeyIDAuthority - Temporary storage exception while acquiring id block - retrying in PT2.4S: {}
org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend
        at io.vavr.API$Match$Case0.apply(API.java:3174)
        at io.vavr.API$Match.of(API.java:3137)
        at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.lambda$static$0(CQLKeyColumnValueStore.java:123)
        at io.vavr.control.Try.getOrElseThrow(Try.java:671)
        at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.getSlice(CQLKeyColumnValueStore.java:290)
        at org.janusgraph.diskstorage.idmanagement.ConsistentKeyIDAuthority.lambda$getCurrentID$0(ConsistentKeyIDAuthority.java:172)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:147)
        at org.janusgraph.diskstorage.idmanagement.ConsistentKeyIDAuthority.getCurrentID(ConsistentKeyIDAuthority.java:171)
        at org.janusgraph.diskstorage.idmanagement.ConsistentKeyIDAuthority.getIDBlock(ConsistentKeyIDAuthority.java:232)
        at org.janusgraph.graphdb.database.idassigner.StandardIDPool$IDBlockGetter.call(StandardIDPool.java:288)
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Could not send request, session is closed
        at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:552)
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:513)
        at io.vavr.control.Try.of(Try.java:62)
        at io.vavr.concurrent.FutureImpl.lambda$run$2(FutureImpl.java:199)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        ... 4 common frames omitted
Caused by: java.lang.IllegalStateException: Could not send request, session is closed
        at com.datastax.driver.core.SessionManager.execute(SessionManager.java:693)
        at com.datastax.driver.core.SessionManager.executeAsync(SessionManager.java:143)
        at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.getSlice(CQLKeyColumnValueStore.java:282)
        at org.janusgraph.diskstorage.idmanagement.ConsistentKeyIDAuthority.lambda$getCurrentID$0(ConsistentKeyIDAuthority.java:172)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:147)
        at org.janusgraph.diskstorage.idmanagement.ConsistentKeyIDAuthority.getCurrentID(ConsistentKeyIDAuthority.java:171)
        at org.janusgraph.diskstorage.idmanagement.ConsistentKeyIDAuthority.getIDBlock(ConsistentKeyIDAuthority.java:232)
        at org.janusgraph.graphdb.database.idassigner.StandardIDPool$IDBlockGetter.call(StandardIDPool.java:288)
        at org.janusgraph.graphdb.database.idassigner.StandardIDPool$IDBlockGetter.call(StandardIDPool.java:255)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
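The retry delays in the warnings above (PT0.6S, PT1.2S, PT2.4S) show JanusGraph's ID authority backing off by doubling its wait after each TemporaryBackendException. A minimal sketch of that doubling backoff (the constants mirror the log; the function name is hypothetical, not JanusGraph's API):

```python
def backoff_delays(initial: float = 0.6, factor: float = 2.0, retries: int = 3):
    """Yield the doubling retry delays seen in the log: 0.6 s, 1.2 s, 2.4 s."""
    delay = initial
    for _ in range(retries):
        yield delay
        delay *= factor

print(list(backoff_delays()))  # [0.6, 1.2, 2.4]
```

As long as the Cassandra session stays closed, every retry hits the same "Could not send request, session is closed" error, so the backoff only delays the eventual failure.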

During the import, errors like the following appeared several times:

[error] o.t.t.m.t.Input - Alert observable read failure:List(
 - : JsonValidationError(List('attachment' is undefined on object: {"tlp":2,"remoteAttachment":{"filename":"20171114-ELF-2.png","reference":"194081","type":"attachment"},"tags":["MISP:type=attachment","MISP:category=External analysis","src:misp-malware-sharing-prod","misp"],"message":"An overview of the malware’s operation.","dataType":"file","startDate":1512310223000}),WrappedArray()))

[error] o.t.t.m.t.Input - Alert observable read failure:List(
 - : JsonValidationError(List('attachment' is undefined on object: {"tlp":2,"remoteAttachment":{"filename":"tizi1.png","reference":"194158","type":"attachment"},"tags":["MISP:type=attachment","MISP:category=Payload delivery","src:misp-malware-sharing-prod","misp"],"message":"Here is an example social media post promoting a Tizi-infected app","dataType":"file","startDate":1512299720000}),WrappedArray()))
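For reference, the failing observables carry the legacy `remoteAttachment` field instead of the `attachment` field the migrator expects. A small sketch (helper name hypothetical) that detects this legacy MISP format, using the JSON shape from the log above:

```python
import json

# Example observable reduced from the error log above: it uses the legacy
# "remoteAttachment" field instead of the "attachment" field the
# TheHive 4 migrator expects.
legacy_observable = json.loads("""
{"tlp": 2,
 "remoteAttachment": {"filename": "20171114-ELF-2.png",
                      "reference": "194081",
                      "type": "attachment"},
 "dataType": "file",
 "startDate": 1512310223000}
""")

def is_legacy_misp_attachment(obs: dict) -> bool:
    """Hypothetical helper: a 'file' observable that has a
    remoteAttachment but no attachment was never downloaded from MISP
    and triggers the JsonValidationError above."""
    return (obs.get("dataType") == "file"
            and "remoteAttachment" in obs
            and "attachment" not in obs)

print(is_legacy_misp_attachment(legacy_observable))  # True
```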

Can you please help here?

@crackytsi crackytsi added TheHive4 TheHive4 related issues bug labels May 17, 2020
@To-om To-om added this to the 4.0.0-RC3 milestone May 17, 2020
@To-om To-om self-assigned this May 17, 2020
To-om added a commit that referenced this issue May 22, 2020
@To-om
Contributor

To-om commented May 22, 2020

This commit should fix your general problem and improve the performance of the migration.
Regarding the missing attachment: this is an old observable format from MISP alerts (before TheHive 2.13.0).
This kind of attachment is not downloaded from MISP during alert creation, but only when the alert is imported into a case.
I'm sorry to say that I can't migrate this kind of observable into TheHive 4.

@To-om To-om closed this as completed May 22, 2020
To-om added a commit that referenced this issue May 25, 2020
To-om added a commit that referenced this issue May 25, 2020
@mamoedo

mamoedo commented Jun 1, 2020

It keeps failing on RC3

Question                     Answer
OS version (server)          Ubuntu
OS version (client)          18.04
TheHive version / git hash   4.0 RC3
Package Type                 DEB
Elasticsearch                6.7.0
[info] o.t.t.m.t.Output - Create task [1234] Sensitive info in case 177868816
[error] o.t.s.u.Retry - uncaught error, not retrying
org.thp.scalligraph.NotFoundError: User [email protected] not found
        at org.thp.scalligraph.services.VertexSrv.$anonfun$getOrFail$1(VertexSrv.scala:33)
        at scala.Option.fold(Option.scala:251)
        at org.thp.scalligraph.services.VertexSrv.getOrFail(VertexSrv.scala:33)
        at org.thp.thehive.migration.th4.Output.$anonfun$createCaseTask$4(Output.scala:337)
        at scala.Option.map(Option.scala:230)
        at org.thp.thehive.migration.th4.Output.$anonfun$createCaseTask$2(Output.scala:337)
        at org.thp.thehive.migration.th4.Output.$anonfun$authTransaction$1(Output.scala:161)
        at org.thp.scalligraph.janus.JanusDatabase.$anonfun$tryTransaction$10(JanusDatabase.scala:160)
        at scala.util.Try$.apply(Try.scala:213)
        at org.thp.scalligraph.janus.JanusDatabase.$anonfun$tryTransaction$8(JanusDatabase.scala:160)
[error] o.t.s.m.Database - Exception raised, rollback (User [email protected] not found)
[error] o.t.t.m.Migrate - Case/Task creation failure: org.thp.scalligraph.NotFoundError: User [email protected] not found
[error] o.t.t.m.t.DBConfiguration - ElasticSearch request failure: POST:/_search/scroll?
StringEntity({"scroll":"82800000ms","scroll_id":"DnF1ZXJ5VGhlbkZldGNoBAAAAAAAAACXFjdzaTEtXzY1UVgyZFN6SGdqTnBKVncAAAAAAAAAmBY3c2kxLV82NVFYMmRTekhnak5wSlZ3AAAAAAAAAJkWN3NpMS1fNjVRWDJkU3pIZ2pOcEpWdwAAAAAAAACaFjdzaTEtXzY1UVgyZFN6SGdqTnBKVnc="},Some(application/json))
 => ElasticError(search_phase_execution_exception,all shards failed,None,None,None,List(ElasticError(search_context_missing_exception,No search context found for id [151],None,None,None,null,None), ElasticError(search_context_missing_exception,No search context found for id [152],None,None,None,null,None), ElasticError(search_context_missing_exception,No search context found for id [153],None,None,None,null,None), ElasticError(search_context_missing_exception,No search context found for id [154],None,None,None,null,None)),Some(CausedBy(search_context_missing_exception,No search context found for id [154],Map())))
[warn] o.t.t.m.t.SearchWithScroll - Search error
org.thp.scalligraph.SearchError: all shards failed
        at org.thp.thehive.migration.th3.DBConfiguration.$anonfun$execute$2(DBConfiguration.scala:147)
        at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:307)
        at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:41)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
        at org.thp.scalligraph.ContextPropagatingDisptacher$$anon$1.$anonfun$execute$2(ContextPropagatingDisptacher.scala:57)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.thp.scalligraph.DiagnosticContext$.$anonfun$withDiagnosticContext$2(ContextPropagatingDisptacher.scala:92)
        at org.thp.scalligraph.DiagnosticContext$.saveDiagnosticContext(ContextPropagatingDisptacher.scala:107)
        at org.thp.scalligraph.DiagnosticContext$.withDiagnosticContext(ContextPropagatingDisptacher.scala:90)
        at org.thp.scalligraph.DiagnosticContext$$anon$2.withContext(ContextPropagatingDisptacher.scala:75)
Exception in thread "main" org.thp.scalligraph.SearchError: Request terminated early or timed out
        at org.thp.thehive.migration.th3.SearchWithScroll$$anon$1$$anon$2.$anonfun$onPull$1(DBFind.scala:197)
        at org.thp.thehive.migration.th3.SearchWithScroll$$anon$1$$anon$2.$anonfun$onPull$1$adapted(DBFind.scala:177)
        at akka.stream.impl.fusing.GraphInterpreter.runAsyncInput(GraphInterpreter.scala:466)
        at akka.stream.impl.fusing.GraphInterpreterShell$AsyncInput.execute(ActorGraphInterpreter.scala:497)
        at akka.stream.impl.fusing.GraphInterpreterShell.processEvent(ActorGraphInterpreter.scala:599)
        at akka.stream.impl.fusing.ActorGraphInterpreter.akka$stream$impl$fusing$ActorGraphInterpreter$$processEvent(ActorGraphInterpreter.scala:768)
        at akka.stream.impl.fusing.ActorGraphInterpreter$$anonfun$receive$1.applyOrElse(ActorGraphInterpreter.scala:783)
        at akka.actor.Actor.aroundReceive(Actor.scala:534)
        at akka.actor.Actor.aroundReceive$(Actor.scala:532)
        at akka.stream.impl.fusing.ActorGraphInterpreter.aroundReceive(ActorGraphInterpreter.scala:690)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:573)
        at akka.actor.ActorCell.invoke(ActorCell.scala:543)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:269)
        at akka.dispatch.Mailbox.run(Mailbox.scala:230)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:242)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
[warn] o.t.t.m.t.Output - User [email protected] not found, use system user
[info] o.t.t.m.t.Output - Create audit update on Some(Case) Some(177868816)

@To-om To-om reopened this Jun 1, 2020
@To-om To-om modified the milestones: 4.0.0-RC3, 4.0.0-RC4 Jun 1, 2020
@To-om
Contributor

To-om commented Jun 1, 2020

The first error (User [email protected] not found) is partially fixed by #1374.
In addition, I'll prevent the migration from failing when a user is not found.
I'll also investigate the "Search error".

@To-om
Contributor

To-om commented Jun 1, 2020

Can you provide the same logs from /var/log/thehive/application.log (with timing information)? I'd like to know whether the two errors are related.
The error "No search context found" occurs when the time between two Elasticsearch requests is too long (more than 24 h, which doesn't seem probable) or when the search context has been explicitly removed (which happens only at the end of the search).
How long does it take before this problem appears?
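The scroll expiry described above can be illustrated with a toy model: each scroll request must reach Elasticsearch before the previous keep-alive expires, otherwise the server discards the context and subsequent requests fail with search_context_missing_exception. This sketch models that behaviour in plain Python (it is a simplified simulation, not the Elasticsearch implementation):

```python
class ScrollContextStore:
    """Toy model of Elasticsearch scroll contexts: each scroll request
    renews the keep-alive; once it expires the context is discarded and
    the next request fails, like 'No search context found for id [...]'."""

    def __init__(self):
        self._expiry = {}  # scroll_id -> absolute expiry time (seconds)

    def open_scroll(self, scroll_id: str, keep_alive_s: float, now: float):
        self._expiry[scroll_id] = now + keep_alive_s

    def scroll(self, scroll_id: str, keep_alive_s: float, now: float):
        expiry = self._expiry.get(scroll_id)
        if expiry is None or now > expiry:
            self._expiry.pop(scroll_id, None)  # context is gone for good
            raise LookupError(f"No search context found for id [{scroll_id}]")
        self._expiry[scroll_id] = now + keep_alive_s  # renew keep-alive

store = ScrollContextStore()
store.open_scroll("ctx-1", keep_alive_s=10.0, now=0.0)
store.scroll("ctx-1", keep_alive_s=10.0, now=5.0)       # within keep-alive: ok
try:
    store.scroll("ctx-1", keep_alive_s=10.0, now=20.0)  # past expiry: fails
except LookupError as e:
    print(e)  # No search context found for id [ctx-1]
```

The migration requests a keep-alive of 82800000 ms (23 h) per the log, so an expiry mid-run would indeed mean an unusually long gap between consecutive scroll requests.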

@mamoedo

mamoedo commented Jun 1, 2020

It was running for 4 days when the crash happened. It crashed frequently before RC3.

I'm attaching the requested logs:

2020-05-31 14:08:52,762 [ERROR] from org.thp.scalligraph.utils.Retry in TheHiveMigration-akka.actor.default-dispatcher-26 - uncaught error, not retrying
org.thp.scalligraph.NotFoundError: org.thp.thehive.models.ObservableType not found
        at org.thp.scalligraph.steps.StepsOps$TraversalOps.$anonfun$getOrFail$1(StepsOps.scala:315)
        at scala.Option.fold(Option.scala:251)
        at org.thp.scalligraph.steps.StepsOps$TraversalOps.getOrFail(StepsOps.scala:315)
        at org.thp.scalligraph.steps.StepsOps$TraversalOps.getOrFail(StepsOps.scala:311)
        at org.thp.thehive.migration.th4.Output.$anonfun$getObservableType$1(Output.scala:363)
        at play.api.cache.ehcache.SyncEhCacheApi.getOrElseUpdate(EhCacheApi.scala:213)
        at org.thp.thehive.migration.th4.Output.getObservableType(Output.scala:362)
        at org.thp.thehive.migration.th4.Output.$anonfun$createAlertObservable$2(Output.scala:459)
        at org.thp.thehive.migration.th4.Output.$anonfun$authTransaction$1(Output.scala:161)
        at org.thp.scalligraph.janus.JanusDatabase.$anonfun$tryTransaction$10(JanusDatabase.scala:160)
        at scala.util.Try$.apply(Try.scala:213)
        at org.thp.scalligraph.janus.JanusDatabase.$anonfun$tryTransaction$8(JanusDatabase.scala:160)
        at scala.util.Try$.apply(Try.scala:213)
        at org.thp.scalligraph.utils.DelayRetry.withTry(Retry.scala:88)
        at org.thp.scalligraph.janus.JanusDatabase.tryTransaction(JanusDatabase.scala:154)
        at org.thp.thehive.migration.th4.Output.authTransaction(Output.scala:160)
        at org.thp.thehive.migration.th4.Output.createAlertObservable(Output.scala:456)
        at org.thp.thehive.migration.MigrationOps.$anonfun$migrateAWholeAlert$1(MigrationOps.scala:213)
        at org.thp.thehive.migration.MigrationOps.$anonfun$migrateWithParent$2(MigrationOps.scala:91)
        at scala.util.Success.flatMap(Try.scala:251)
        at org.thp.thehive.migration.MigrationOps.$anonfun$migrateWithParent$1(MigrationOps.scala:91)
        at akka.stream.impl.fusing.Map$$anon$1.onPush(Ops.scala:54)
        at akka.stream.impl.fusing.GraphInterpreter.processPush(GraphInterpreter.scala:541)
        at akka.stream.impl.fusing.GraphInterpreter.processEvent(GraphInterpreter.scala:495)
        at akka.stream.impl.fusing.GraphInterpreter.execute(GraphInterpreter.scala:390)
        at akka.stream.impl.fusing.GraphInterpreterShell.runBatch(ActorGraphInterpreter.scala:624)
        at akka.stream.impl.fusing.GraphInterpreterShell$AsyncInput.execute(ActorGraphInterpreter.scala:501)
        at akka.stream.impl.fusing.GraphInterpreterShell.processEvent(ActorGraphInterpreter.scala:599)
        at akka.stream.impl.fusing.ActorGraphInterpreter.akka$stream$impl$fusing$ActorGraphInterpreter$$processEvent(ActorGraphInterpreter.scala:768)
        at akka.stream.impl.fusing.ActorGraphInterpreter$$anonfun$receive$1.applyOrElse(ActorGraphInterpreter.scala:783)
        at akka.actor.Actor.aroundReceive(Actor.scala:534)
        at akka.actor.Actor.aroundReceive$(Actor.scala:532)
        at akka.stream.impl.fusing.ActorGraphInterpreter.aroundReceive(ActorGraphInterpreter.scala:690)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:573)
        at akka.actor.ActorCell.invoke(ActorCell.scala:543)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:269)
        at akka.dispatch.Mailbox.run(Mailbox.scala:230)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:242)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
2020-05-31 14:08:52,762 [ERROR] from org.thp.scalligraph.models.Database in TheHiveMigration-akka.actor.default-dispatcher-26 - Exception raised, rollback (org.thp.thehive.models.ObservableType not found)
2020-05-31 14:08:52,762 [ERROR] from org.thp.thehive.migration.Migrate in TheHiveMigration-akka.actor.default-dispatcher-26 - Alert/Observable creation failure: org.thp.scalligraph.NotFoundError: org.thp.thehive.models.ObservableType not found
2020-05-31 14:08:52,762 [INFO] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-26 - Create observable 123.123.123.123 in alert 260292648
2020-05-31 14:09:12,676 [INFO] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-20 - Create audit create on Some(Alert) Some(260292648)
2020-05-31 14:09:12,683 [INFO] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-20 - Create audit create on Some(Alert) Some(260292648)
2020-05-31 14:09:12,691 [INFO] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-26 - Create audit create on Some(Alert) Some(260292648)
2020-05-31 14:09:12,698 [INFO] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - Create audit create on Some(Alert) Some(260292648)
2020-05-31 14:09:12,706 [INFO] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-5 - Create audit create on Some(Alert) Some(260292648)
2020-05-31 14:09:12,724 [WARN] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - User [email protected] not found, use system user
2020-05-31 14:09:12,724 [INFO] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - Create case #924
2020-05-31 14:09:12,754 [WARN] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - User [email protected] not found, use system user
2020-05-31 14:09:12,754 [INFO] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - Create task [1234] Sensitive info 177868816
2020-05-31 14:09:12,757 [ERROR] from org.thp.scalligraph.utils.Retry in TheHiveMigration-akka.actor.default-dispatcher-21 - uncaught error, not retrying
org.thp.scalligraph.NotFoundError: User [email protected] not found
        at org.thp.scalligraph.services.VertexSrv.$anonfun$getOrFail$1(VertexSrv.scala:33)
        at scala.Option.fold(Option.scala:251)
        at org.thp.scalligraph.services.VertexSrv.getOrFail(VertexSrv.scala:33)
        at org.thp.thehive.migration.th4.Output.$anonfun$createCaseTask$4(Output.scala:337)
        at scala.Option.map(Option.scala:230)
        at org.thp.thehive.migration.th4.Output.$anonfun$createCaseTask$2(Output.scala:337)
        at org.thp.thehive.migration.th4.Output.$anonfun$authTransaction$1(Output.scala:161)
        at org.thp.scalligraph.janus.JanusDatabase.$anonfun$tryTransaction$10(JanusDatabase.scala:160)
        at scala.util.Try$.apply(Try.scala:213)
        at org.thp.scalligraph.janus.JanusDatabase.$anonfun$tryTransaction$8(JanusDatabase.scala:160)
        at scala.util.Try$.apply(Try.scala:213)
        at org.thp.scalligraph.utils.DelayRetry.withTry(Retry.scala:88)
        at org.thp.scalligraph.janus.JanusDatabase.tryTransaction(JanusDatabase.scala:154)
        at org.thp.thehive.migration.th4.Output.authTransaction(Output.scala:160)
        at org.thp.thehive.migration.th4.Output.createCaseTask(Output.scala:334)
        at org.thp.thehive.migration.MigrationOps.$anonfun$migrateAWholeCase$1(MigrationOps.scala:163)
        at org.thp.thehive.migration.MigrationOps.$anonfun$migrateWithParent$2(MigrationOps.scala:91)
        at scala.util.Success.flatMap(Try.scala:251)
        at org.thp.thehive.migration.MigrationOps.$anonfun$migrateWithParent$1(MigrationOps.scala:91)
        at akka.stream.impl.fusing.Map$$anon$1.onPush(Ops.scala:54)
        at akka.stream.impl.fusing.GraphInterpreter.processPush(GraphInterpreter.scala:541)
        at akka.stream.impl.fusing.GraphInterpreter.execute(GraphInterpreter.scala:423)
        at akka.stream.impl.fusing.GraphInterpreterShell.runBatch(ActorGraphInterpreter.scala:624)
        at akka.stream.impl.fusing.GraphInterpreterShell$AsyncInput.execute(ActorGraphInterpreter.scala:501)
        at akka.stream.impl.fusing.GraphInterpreterShell.processEvent(ActorGraphInterpreter.scala:599)
        at akka.stream.impl.fusing.ActorGraphInterpreter.akka$stream$impl$fusing$ActorGraphInterpreter$$processEvent(ActorGraphInterpreter.scala:768)
        at akka.stream.impl.fusing.ActorGraphInterpreter$$anonfun$receive$1.applyOrElse(ActorGraphInterpreter.scala:783)
        at akka.actor.Actor.aroundReceive(Actor.scala:534)
        at akka.actor.Actor.aroundReceive$(Actor.scala:532)
        at akka.stream.impl.fusing.ActorGraphInterpreter.aroundReceive(ActorGraphInterpreter.scala:690)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:573)
        at akka.actor.ActorCell.invoke(ActorCell.scala:543)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:269)
        at akka.dispatch.Mailbox.run(Mailbox.scala:230)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:242)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
2020-05-31 14:09:12,757 [ERROR] from org.thp.scalligraph.models.Database in TheHiveMigration-akka.actor.default-dispatcher-21 - Exception raised, rollback (User [email protected] not found)
2020-05-31 14:09:12,757 [ERROR] from org.thp.thehive.migration.Migrate in TheHiveMigration-akka.actor.default-dispatcher-21 - Case/Task creation failure: org.thp.scalligraph.NotFoundError: User [email protected] not found
2020-05-31 14:09:12,878 [ERROR] from org.thp.thehive.migration.th3.DBConfiguration in TheHiveMigration-akka.actor.default-dispatcher-93 - ElasticSearch request failure: POST:/_search/scroll?
StringEntity({"scroll":"82800000ms","scroll_id":"DnF1ZXJ5VGhlbkZldGNoBAAAAAAAAACXFjdzaTEtXzY1UVgyZFN6SGdqTnBKVncAAAAAAAAAmBY3c2kxLV82NVFYMmRTekhnak5wSlZ3AAAAAAAAAJkWN3NpMS1fNjVRWDJkU3pIZ2pOcEpWdwAAAAAAAACaFjdzaTEtXzY1UVgyZFN6SGdqTnBKVnc="},Some(application/json))
 => ElasticError(search_phase_execution_exception,all shards failed,None,None,None,List(ElasticError(search_context_missing_exception,No search context found for id [151],None,None,None,null,None), ElasticError(search_context_missing_exception,No search context found for id [152],None,None,None,null,None), ElasticError(search_context_missing_exception,No search context found for id [153],None,None,None,null,None), ElasticError(search_context_missing_exception,No search context found for id [154],None,None,None,null,None)),Some(CausedBy(search_context_missing_exception,No search context found for id [154],Map())))
2020-05-31 14:09:12,885 [WARN] from org.thp.thehive.migration.th3.SearchWithScroll in TheHiveMigration-akka.actor.default-dispatcher-19 - Search error
org.thp.scalligraph.SearchError: all shards failed
        at org.thp.thehive.migration.th3.DBConfiguration.$anonfun$execute$2(DBConfiguration.scala:147)
        at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:307)
        at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:41)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
        at org.thp.scalligraph.ContextPropagatingDisptacher$$anon$1.$anonfun$execute$2(ContextPropagatingDisptacher.scala:57)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.thp.scalligraph.DiagnosticContext$.$anonfun$withDiagnosticContext$2(ContextPropagatingDisptacher.scala:92)
        at org.thp.scalligraph.DiagnosticContext$.saveDiagnosticContext(ContextPropagatingDisptacher.scala:107)
        at org.thp.scalligraph.DiagnosticContext$.withDiagnosticContext(ContextPropagatingDisptacher.scala:90)
        at org.thp.scalligraph.DiagnosticContext$$anon$2.withContext(ContextPropagatingDisptacher.scala:75)
        at org.thp.scalligraph.ContextPropagatingDisptacher$$anon$1.$anonfun$execute$1(ContextPropagatingDisptacher.scala:57)
        at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:47)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:47)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
2020-05-31 14:09:12,948 [WARN] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - User [email protected] not found, use system user
2020-05-31 14:09:12,948 [INFO] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - Create audit update on Some(Case) Some(177868816)
2020-05-31 14:09:12,962 [WARN] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - User [email protected] not found, use system user
2020-05-31 14:09:12,962 [INFO] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - Create audit update on Some(Case) Some(177868816)
2020-05-31 14:09:12,974 [WARN] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - User [email protected] not found, use system user
2020-05-31 14:09:12,974 [INFO] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - Create audit update on Some(Case) Some(177868816)
2020-05-31 14:09:12,985 [WARN] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - User [email protected] not found, use system user
2020-05-31 14:09:12,985 [INFO] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - Create audit update on Some(Case) Some(177868816)
2020-05-31 14:09:12,996 [WARN] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - User [email protected] not found, use system user
2020-05-31 14:09:12,996 [INFO] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - Create audit update on Some(Case) Some(177868816)
2020-05-31 14:09:13,008 [WARN] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - User [email protected] not found, use system user
2020-05-31 14:09:13,008 [INFO] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - Create audit update on Some(Case) Some(177868816)
2020-05-31 14:09:13,021 [WARN] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - User [email protected] not found, use system user
2020-05-31 14:09:13,021 [INFO] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - Create audit update on Some(Case) Some(177868816)
2020-05-31 14:09:13,032 [WARN] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - User [email protected] not found, use system user
2020-05-31 14:09:13,032 [INFO] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - Create audit update on Some(Case) Some(177868816)
2020-05-31 14:09:13,042 [WARN] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - User [email protected] not found, use system user
2020-05-31 14:09:13,043 [INFO] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - Create audit update on Some(Case) Some(177868816)
2020-05-31 14:09:13,054 [WARN] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - User [email protected] not found, use system user
2020-05-31 14:09:13,054 [INFO] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - Create audit update on Some(Case) Some(177868816)
2020-05-31 14:09:13,069 [WARN] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - User [email protected] not found, use system user
2020-05-31 14:09:13,069 [INFO] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - Create audit update on Some(Case) Some(177868816)
2020-05-31 14:09:13,080 [WARN] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - User [email protected] not found, use system user
2020-05-31 14:09:13,080 [INFO] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - Create audit update on Some(Case) Some(177868816)
2020-05-31 14:09:13,093 [WARN] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - User [email protected] not found, use system user
2020-05-31 14:09:13,093 [INFO] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - Create audit update on Some(Case) Some(177868816)
2020-05-31 14:09:13,104 [WARN] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - User [email protected] not found, use system user
2020-05-31 14:09:13,104 [INFO] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - Create audit update on Some(Case) Some(177868816)
2020-05-31 14:09:13,116 [WARN] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - User [email protected] not found, use system user
2020-05-31 14:09:13,116 [INFO] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - Create audit update on Some(Case) Some(177868816)
2020-05-31 14:09:13,127 [WARN] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - User [email protected] not found, use system user
2020-05-31 14:09:13,127 [INFO] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - Create audit update on Some(Case) Some(177868816)
2020-05-31 14:09:13,139 [WARN] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - User [email protected] not found, use system user
2020-05-31 14:09:13,139 [INFO] from org.thp.thehive.migration.th4.Output in TheHiveMigration-akka.actor.default-dispatcher-21 - Create audit create on Some(Case) Some(177868816)

@To-om
Contributor

To-om commented Jun 1, 2020

Unless you have several terabytes of data, the RC3 migration should not take 4 days.
Did the RC3 migration start from an empty database? The structure of the database has changed in order to speed up data creation.
Drop your data (on the Cassandra host, run cqlsh --execute "DROP KEYSPACE thehive") or change the keyspace (db.janusgraph.storage.cql.keyspace in application.conf) so that the migration starts from an empty database. This will improve performance.
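For reference, switching to a fresh keyspace instead of dropping the old one would look roughly like this in application.conf (the keyspace name thehive_migration is just an example):

```
db.janusgraph {
  storage.cql.keyspace: thehive_migration
}
```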

@mamoedo

mamoedo commented Jun 1, 2020

Thanks for the tip, I didn't know that I had to start from scratch.

It's been running for 3 hours now. It started really fast, but it gets slower and slower as time passes. The speed is now 3 seconds per observable. That's still faster than the previous migration (8 seconds per observable), but I'm afraid it will keep getting slower.

I'm not migrating terabytes, just about 20GB. Also, my host's specs are not bad at all.

@mamoedo

mamoedo commented Jun 2, 2020

After 19 hours, 202 alerts and 11 cases have been migrated. The performance is bad at the moment, so I'm resuming the previous migration.

@To-om
Contributor

To-om commented Jun 2, 2020

How many observables do you have ?

@mamoedo

mamoedo commented Jun 2, 2020

I don't know exactly, but I think I have about 200K - 300K

@To-om
Contributor

To-om commented Jun 3, 2020

You can improve performance by adding the following to application.conf:

  janusgraph {
    [...]
    ids.block-size: 1000000
    storage.batch-loading: true
  }

Both settings should be set only for the migration.
batch-loading can be enabled only if the database schema has already been created (i.e. on a resume, or after TheHive has been started once).

@crackytsi
Author

Both settings should be set only for the migration.
batch-loading can be enabled only if the database schema has already been created (i.e. on a resume, or after TheHive has been started once).

So is it enough to create an additional superuser, or do I also have to create the organisation so that the database schema is prepared?

@crackytsi
Author

My migration has also been running for more than 4 days, with usually "only" around 30-40 observables per case. Currently, 1600 of 7000 cases are imported.

@To-om
Contributor

To-om commented Jun 3, 2020

The RC3 version of the migration tool increases migration speed if the database schema has been created by it (i.e. it is not a resume from a previous migration version).
If your database has already been created, you can start TheHive; it will update the db schema (if needed). Then stop TheHive and resume the migration.
The database schema is the structure of the database (entity types, properties); it is different from the data itself.

@crackytsi
Author

Ok, thanks :). So I'm already using the new database schema. At first the migration was much faster, but later it got slower and slower.

@mamoedo

mamoedo commented Jun 3, 2020

You can improve performance by adding the following to application.conf:

  janusgraph {
    [...]
    ids.block-size: 1000000
    storage.batch-loading: true
  }

Both settings should be set only for the migration.
batch-loading can be enabled only if the database schema has already been created (i.e. on a resume, or after TheHive has been started once).

Do you know how to properly set this value?

[warn] o.j.d.c.b.ReadConfigurationBuilder - Local setting ids.block-size=1000000 (Type: GLOBAL_OFFLINE) is overridden by globally managed value (10000). Use the ManagementSystem interface instead of the local configuration to control this setting.

To-om added a commit that referenced this issue Jun 8, 2020
To-om added a commit that referenced this issue Jun 8, 2020
To-om added a commit that referenced this issue Jun 8, 2020
@To-om
Contributor

To-om commented Jun 8, 2020

@mamoedo In fact, you can't change ids.block-size by updating the configuration. This setting is fixed when the database is created and can only be updated programmatically.

@mamoedo

mamoedo commented Jun 8, 2020

@To-om I see that you made some changes to improve the migration. Will they be available soon in an RC so we can all speed up our migrations?

@mamoedo

mamoedo commented Jun 8, 2020

Also, I've been testing performance and some alternatives to speed up the migration. By creating alerts with thehive4py 1.5.3, I found that the performance problem seems to be in the observable upload. Does it check whether the observable already exists in Cassandra before creating it? That could explain the performance degradation over time.

A simple alert, with 7 observables, takes 1 minute.

user@machine:~/TheHive4$ time python3 testAPI.py
real    1m1.071s
user    0m0.256s
sys     0m0.023s

Here you can see the script used:

from thehive4py.api import TheHiveApi
from thehive4py.models import Alert, AlertArtifact
import uuid

api_local = TheHiveApi("ip", "apikey", cert=False)
artifacts = [
    AlertArtifact(dataType='ip', data='8.8.8.8'),
    AlertArtifact(dataType='ip', data='8.8.8.9'),
    AlertArtifact(dataType='ip', data='8.8.9.8'),
    AlertArtifact(dataType='ip', data='8.9.8.8'),
    AlertArtifact(dataType='domain', data='google.com'),
    AlertArtifact(dataType='domain', data='google.br'),
    AlertArtifact(dataType='domain', data='google.uk'),
]
sourceRef = str(uuid.uuid4())[0:6]
alert = Alert(
    title = "Test",
    description = "Test desc",
    tlp = 1,
    severity = 1,
    type = "test",
    source = "test",
    artifacts=artifacts,
    sourceRef = sourceRef
)
api_local.create_alert(alert)

@mamoedo

mamoedo commented Jun 12, 2020

Last week's speed was about 7-10 seconds per observable. Today it's taking 30 seconds per observable.

I don't think this will finish anytime soon :(

@To-om
Contributor

To-om commented Jun 12, 2020

The next release will improve the migration process (not yet committed). On my test database, observable creation goes from 275ms (average over the whole migration) down to 7ms, and the time stays stable during the migration.

@crackytsi
Author

@To-om Using these patches (yes, really!) ;), the import still seems to get slower and slower.
Today I stopped the import after 2 days with 916 cases imported; it seems it got stuck. The last lines repeat (I guess) endlessly.

Here is some error log; it seems there is also a scroll issue with ES that causes problems:

[^[[37minfo^[[0m] o.t.t.m.Migrate - CaseTemplate/Task:42 Organisation:1/1 Case/Task:1288/11658(11188ms) Case:916/8575 Job:651/5546 ObservableType:4/18 Alert:9/16 Audit:29553/310256 CaseTemplate:21/21 Alert/Observable:26 Case/Observable:3895/46450 User:70/70 CustomField:9/9 Case/Task/Log:1401/18951(6ms)
[^[[37minfo^[[0m] o.t.t.m.Migrate - CaseTemplate/Task:42 Organisation:1/1 Case/Task:1288/11658 Case:916/8575 Job:651/5546 ObservableType:4/18 Alert:9/16 Audit:29553/310256 CaseTemplate:21/21 Alert/Observable:26 Case/Observable:3895/46450 User:70/70 CustomField:9/9 Case/Task/Log:1401/18951
[^[[37minfo^[[0m] o.t.t.m.Migrate - CaseTemplate/Task:42 Organisation:1/1 Case/Task:1288/11658 Case:916/8575 Job:651/5546 ObservableType:4/18 Alert:9/16 Audit:29553/310256 CaseTemplate:21/21 Alert/Observable:26 Case/Observable:3895/46450 User:70/70 CustomField:9/9 Case/Task/Log:1401/18951
[^[[33mwarn^[[0m] o.j.g.t.StandardJanusGraphTx - Query requires iterating over all vertices [(_label = Tag AND namespace = Threshold AND predicate = 54)]. For better performance, use indexes
[^[[37minfo^[[0m] o.t.t.m.Migrate - CaseTemplate/Task:42 Organisation:1/1 Case/Task:1288/11658 Case:916/8575 Job:651/5546 ObservableType:4/18 Alert:9/16 Audit:29553/310256 CaseTemplate:21/21 Alert/Observable:26 Case/Observable:3895/46450 User:70/70 CustomField:9/9 Case/Task/Log:1401/18951
[^[[37minfo^[[0m] o.t.t.m.Migrate - CaseTemplate/Task:42 Organisation:1/1 Case/Task:1288/11658 Case:916/8575 Job:651/5546 ObservableType:4/18 Alert:9/16 Audit:29553/310256 CaseTemplate:21/21 Alert/Observable:26 Case/Observable:3895/46450 User:70/70 CustomField:9/9 Case/Task/Log:1401/18951
[^[[33mwarn^[[0m] o.j.g.t.StandardJanusGraphTx - Query requires iterating over all vertices [(_label = Tag AND namespace = user AND predicate = 57339)]. For better performance, use indexes
[^[[37minfo^[[0m] o.t.t.m.Migrate - CaseTemplate/Task:42 Organisation:1/1 Case/Task:1288/11658 Case:916/8575 Job:651/5546 ObservableType:4/18 Alert:9/16 Audit:29553/310256 CaseTemplate:21/21 Alert/Observable:26 Case/Observable:3895/46450 User:70/70 CustomField:9/9 Case/Task/Log:1401/18951
[^[[37minfo^[[0m] o.t.t.m.Migrate - CaseTemplate/Task:42 Organisation:1/1 Case/Task:1288/11658 Case:916/8575 Job:651/5546 ObservableType:4/18 Alert:9/16 Audit:29553/310256 CaseTemplate:21/21 Alert/Observable:26 Case/Observable:3895/46450 User:70/70 CustomField:9/9 Case/Task/Log:1401/18951
[^[[33mwarn^[[0m] o.j.g.t.StandardJanusGraphTx - Query requires iterating over all vertices [(_label = CaseTemplate AND name = AlertImport)]. For better performance, use indexes
[^[[37minfo^[[0m] o.t.t.m.Migrate - CaseTemplate/Task:42 Organisation:1/1 Case/Task:1288/11658 Case:916/8575 Job:651/5546 ObservableType:4/18 Alert:9/16 Audit:29569/310256(2ms) CaseTemplate:21/21 Alert/Observable:26 Case/Observable:3896/46450(66023ms) User:70/70 CustomField:9/9 Case/Task/Log:1401/18951
[^[[31merror^[[0m] o.t.t.m.t.DBConfiguration - ElasticSearch request failure: POST:/_search/scroll?
StringEntity({"scroll":"36000000ms","scroll_id":"DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAKeGFi1NWmZ5RGVRVFItTWF4SHZHWl9qaWcAAAAAAACniRYtTVpmeURlUVRSLU1heEh2R1pfamlnAAAAAAAAp4cWLU1aZnlEZVFUUi1NYXhIdkdaX2ppZwAAAAAAAKeKFi1NWmZ5RGVRVFItTWF4SHZHWl9qaWcAAAAAAACniBYtTVpmeURlUVRSLU1heEh2R1pfamln"},Some(application/json))
 => ElasticError(search_phase_execution_exception,all shards failed,None,None,None,List(ElasticError(search_context_missing_exception,No search context found for id [42888],None,None,None,null,None), ElasticError(search_context_missing_exception,No search context found for id [42887],None,None,None,null,None), ElasticError(search_context_missing_exception,No search context found for id [42890],None,None,None,null,None), ElasticError(search_context_missing_exception,No search context found for id [42889],None,None,None,null,None), ElasticError(search_context_missing_exception,No search context found for id [42886],None,None,None,null,None)),Some(CausedBy(search_context_missing_exception,No search context found for id [42886],Map())))
[^[[33mwarn^[[0m] o.t.t.m.t.SearchWithScroll - Search error
org.thp.scalligraph.SearchError: all shards failed
        at org.thp.thehive.migration.th3.DBConfiguration.$anonfun$execute$2(DBConfiguration.scala:147)
        at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:307)
        at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:41)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
        at org.thp.scalligraph.ContextPropagatingDisptacher$$anon$1.$anonfun$execute$2(ContextPropagatingDisptacher.scala:57)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.thp.scalligraph.DiagnosticContext$.$anonfun$withDiagnosticContext$2(ContextPropagatingDisptacher.scala:92)
        at org.thp.scalligraph.DiagnosticContext$.saveDiagnosticContext(ContextPropagatingDisptacher.scala:107)
        at org.thp.scalligraph.DiagnosticContext$.withDiagnosticContext(ContextPropagatingDisptacher.scala:90)
        at org.thp.scalligraph.DiagnosticContext$$anon$2.withContext(ContextPropagatingDisptacher.scala:75)
[^[[37minfo^[[0m] o.t.t.m.Migrate - CaseTemplate/Task:42 Organisation:1/1 Case/Task:1288/11658 Case:916/8575 Job:651/5546 ObservableType:4/18 Alert:10/16(13531ms) Audit:29573/310256(5ms) CaseTemplate:21/21 Alert/Observable:27(27ms) Case/Observable:3896/46450 User:70/70 CustomField:9/9 Case/Task/Log:1401/18951
[^[[37minfo^[[0m] o.t.t.m.Migrate - CaseTemplate/Task:42 Organisation:1/1 Case/Task:1288/11658 Case:916/8575 Job:651/5546 ObservableType:4/18 Alert:10/16 Audit:29573/310256 CaseTemplate:21/21 Alert/Observable:27 Case/Observable:3896/46450 User:70/70 CustomField:9/9 Case/Task/Log:1401/18951
[^[[37minfo^[[0m] o.t.t.m.Migrate - CaseTemplate/Task:42 Organisation:1/1 Case/Task:1288/11658 Case:916/8575 Job:651/5546 ObservableType:4/18 Alert:10/16 Audit:29573/310256 CaseTemplate:21/21 Alert/Observable:27 Case/Observable:3896/46450 User:70/70 CustomField:9/9 Case/Task/Log:1401/18951
[^[[37minfo^[[0m] o.t.t.m.Migrate - CaseTemplate/Task:42 Organisation:1/1 Case/Task:1288/11658 Case:916/8575 Job:651/5546 ObservableType:4/18 Alert:10/16 Audit:29573/310256 CaseTemplate:21/21 Alert/Observable:27 Case/Observable:3896/46450 User:70/70 CustomField:9/9 Case/Task/Log:1401/18951
[^[[37minfo^[[0m] o.t.t.m.Migrate - CaseTemplate/Task:42 Organisation:1/1 Case/Task:1288/11658 Case:916/8575 Job:651/5546 ObservableType:4/18 Alert:10/16 Audit:29573/310256 CaseTemplate:21/21 Alert/Observable:27 Case/Observable:3896/46450 User:70/70 CustomField:9/9 Case/Task/Log:1401/18951
[^[[37minfo^[[0m] o.t.t.m.Migrate - CaseTemplate/Task:42 Organisation:1/1 Case/Task:1288/11658 Case:916/8575 Job:651/5546 ObservableType:4/18 Alert:10/16 Audit:29573/310256 CaseTemplate:21/21 Alert/Observable:27 Case/Observable:3896/46450 User:70/70 CustomField:9/9 Case/Task/Log:1401/18951
[^[[37minfo^[[0m] o.t.t.m.Migrate - CaseTemplate/Task:42 Organisation:1/1 Case/Task:1288/11658 Case:916/8575 Job:651/5546 ObservableType:4/18 Alert:10/16 Audit:29573/310256 CaseTemplate:21/21 Alert/Observable:27 Case/Observable:3896/46450 User:70/70 CustomField:9/9 Case/Task/Log:1401/18951

@mamoedo

mamoedo commented Jun 16, 2020

I have the same error almost every week. Now it takes 7 hours to resume the migration and start importing observables after a crash.

@To-om
Contributor

To-om commented Jun 17, 2020

In order to increase write performance, I removed the indexes. But without an index, searching is very slow; that's why I keep the identifiers of certain objects (case models and tags, for example) in memory during the migration. When the migration resumes, the identifiers of the objects already migrated are not known and must be searched for in the database (hence the warning "Query requires iterating over all vertices").

I will prevent all searches during the migration: I can create a new object instead of retrieving the existing one, and the migration process will then remove duplicates once the indexes are operational.

@mamoedo

mamoedo commented Jul 30, 2020

Using the new 4.0.0 release (https://download.thehive-project.org/thehive4-4.0.0-1.zip), I'm also experiencing performance degradation since starting the migration from scratch. It has been running for 4 days and it's not over yet.

Here are some logs showing the performance issue, especially on alert creation:

cat migration.log | cut -d " " -f11 | grep ms


Alert:178(86ms)
Alert:187(90ms)
Alert:189(88ms)
Alert:190(89ms)
Alert:192(98ms)
Alert:194(96ms)
Alert:225(104ms)

...

Alert:5635(2738ms)
Alert:5639(2739ms)
Alert:5642(2759ms)

...

Alert:29405(16658ms)
Alert:29406(16626ms)
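The slowdown in those lines can be quantified with a small parser; this is a hypothetical helper (the `Alert:N(Mms)` format is taken from the log excerpt above), not part of the migration tool:

```python
import re

def alert_timings(lines):
    """Extract (alert count, creation time in ms) pairs from
    migration log fragments like 'Alert:178(86ms)'."""
    pat = re.compile(r"Alert:(\d+)\((\d+)ms\)")
    return [(int(n), int(ms)) for line in lines for n, ms in pat.findall(line)]

sample = ["Alert:178(86ms)", "Alert:5635(2738ms)", "Alert:29405(16658ms)"]
print(alert_timings(sample))  # [(178, 86), (5635, 2738), (29405, 16658)]
```

Plotting the second value against the first makes the roughly linear per-alert slowdown visible.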

@mamoedo

mamoedo commented Aug 10, 2020

Migration finished after 12 days (too long compared to other migrations with less data: #1465). It showed some strange errors at the end, though. Is that something to worry about, @To-om? Also, do you know a way to skip the audits before migrating? They seem to be the biggest and least important thing for me to migrate.

[info] [Migrate cases and alerts] CaseTemplate/Task:77 Case/Task:1059 Case:887 Job:125746 ObservableType:46 Alert:41532 Audit:16233165 CaseTemplate:15 Case/Observable:126274 Alert/Observable:1487747 User:2 CustomField:2 Case/Task/Log:379
[error] ElasticSearch request failure: POST:/_search/scroll?
StringEntity({"scroll":"82800000ms","scroll_id":"DnF1ZXJ5VGhlbkZldGNoBQAAAAAAl_n0FjdzaTEtXzY1UVgyZFN6SGdqTnBKVncAAAAAAJf59RY3c2kxLV82NVFYMmRTekhnak5wSlZ3AAAAAACX-fYWN3NpMS1fNjVRWDJkU3pIZ2pOcEpWdwAAAAAAl_n4FjdzaTEtXzY1UVgyZFN6SGdqTnBKVncAAAAAAJf59xY3c2kxLV82NVFYMmRTekhnak5wSlZ3"},Some(application/json))
 => ElasticError(search_phase_execution_exception,all shards failed,None,None,None,List(ElasticError(search_context_missing_exception,No search context found for id [9959924],None,None,None,null,None), ElasticError(search_context_missing_exception,No search context found for id [9959925],None,None,None,null,None), ElasticError(search_context_missing_exception,No search context found for id [9959926],None,None,None,null,None), ElasticError(search_context_missing_exception,No search context found for id [9959928],None,None,None,null,None), ElasticError(search_context_missing_exception,No search context found for id [9959927],None,None,None,null,None)),Some(CausedBy(search_context_missing_exception,No search context found for id [9959927],Map())))
[warn] Search error
org.thp.scalligraph.SearchError: all shards failed
        at org.thp.thehive.migration.th3.DBConfiguration.$anonfun$execute$2(DBConfiguration.scala:146)
        at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:307)
        at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:41)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
        at org.thp.scalligraph.ContextPropagatingDisptacher$$anon$1.$anonfun$execute$2(ContextPropagatingDisptacher.scala:56)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.thp.scalligraph.DiagnosticContext$.$anonfun$withDiagnosticContext$2(ContextPropagatingDisptacher.scala:91)
        at org.thp.scalligraph.DiagnosticContext$.saveDiagnosticContext(ContextPropagatingDisptacher.scala:106)
        at org.thp.scalligraph.DiagnosticContext$.withDiagnosticContext(ContextPropagatingDisptacher.scala:89)
        at org.thp.scalligraph.DiagnosticContext$$anon$2.withContext(ContextPropagatingDisptacher.scala:74)
[error] ElasticSearch request failure: DELETE:/_search/scroll/?
StringEntity({"scroll_id":["DnF1ZXJ5VGhlbkZldGNoBQAAAAAAl_nvFjdzaTEtXzY1UVgyZFN6SGdqTnBKVncAAAAAAJf58BY3c2kxLV82NVFYMmRTekhnak5wSlZ3AAAAAACX-fEWN3NpMS1fNjVRWDJkU3pIZ2pOcEpWdwAAAAAAl_nyFjdzaTEtXzY1UVgyZFN6SGdqTnBKVncAAAAAAJf58xY3c2kxLV82NVFYMmRTekhnak5wSlZ3"]},Some(application/json))
 => ElasticError(404,404,None,None,None,List(),None)
[error] Migration failed
org.thp.scalligraph.SearchError: Request terminated early or timed out
        at org.thp.thehive.migration.th3.SearchWithScroll$$anon$1$$anon$2.$anonfun$onPull$1(DBFind.scala:196)
        at org.thp.thehive.migration.th3.SearchWithScroll$$anon$1$$anon$2.$anonfun$onPull$1$adapted(DBFind.scala:176)
        at akka.stream.impl.fusing.GraphInterpreter.runAsyncInput(GraphInterpreter.scala:466)
        at akka.stream.impl.fusing.GraphInterpreterShell$AsyncInput.execute(ActorGraphInterpreter.scala:498)
        at akka.stream.impl.fusing.GraphInterpreterShell.processEvent(ActorGraphInterpreter.scala:600)
        at akka.stream.impl.fusing.ActorGraphInterpreter.akka$stream$impl$fusing$ActorGraphInterpreter$$processEvent(ActorGraphInterpreter.scala:769)
        at akka.stream.impl.fusing.ActorGraphInterpreter$$anonfun$receive$1.applyOrElse(ActorGraphInterpreter.scala:784)
        at akka.actor.Actor.aroundReceive(Actor.scala:535)
        at akka.actor.Actor.aroundReceive$(Actor.scala:533)
        at akka.stream.impl.fusing.ActorGraphInterpreter.aroundReceive(ActorGraphInterpreter.scala:691)
[info] Stage: Migrate cases and alerts
Alert/Observable: 1488966/1489029 (63 failures) avg:9ms
Alert: 41533/41533 avg:10967ms
Audit: 16233165/16233165 avg:15ms
Case/Observable: 126272/126274 (2 failures) avg:20ms
Case/Task/Log: 379/379 avg:232ms
Case/Task: 1059/1059 avg:7ms
Case: 887/887 avg:10ms
CaseTemplate/Task: 75/77 (2 failures) avg:7ms
CaseTemplate: 15/15 avg:22ms
CustomField: 2/2 avg:6ms
ImpactStatus: 0/0
Job: 125746/125746 avg:4ms
ObservableType: 46/46 (14 exists) avg:110ms
Organisation: 0/1 (1 exists)
Profile: 0/0
ResolutionStatus: 0/0
User: 2/2 avg:329ms
 

@mamoedo

mamoedo commented Aug 10, 2020

Also, does anybody know how to use the --case-from-date param in the migration tool? It fails:

Error: Option --case-from-date expects a number but was given '20200727'
Try --help for more information.

And I tried it both with and without quotes: ./migrate --case-from-date "20200727" ...
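The option apparently expects a raw number rather than a formatted date. Assuming it wants an epoch timestamp in milliseconds (an assumption worth confirming with --help), a YYYYMMDD date can be converted like this:

```python
from datetime import datetime, timezone

def to_epoch_millis(yyyymmdd: str) -> int:
    """Convert a YYYYMMDD date string to epoch milliseconds at UTC midnight."""
    dt = datetime.strptime(yyyymmdd, "%Y%m%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)

print(to_epoch_millis("20200727"))  # 1595808000000
```

The resulting number would then be passed instead of the date string, e.g. ./migrate --case-from-date 1595808000000 (hypothetical, untested).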
