Intermittently losing Cortex #739

Closed
gonrada opened this issue Oct 1, 2018 · 17 comments

@gonrada

gonrada commented Oct 1, 2018

Request Type

Bug

Work Environment

| Question | Answer |
|---|---|
| OS version (server) | Ubuntu 16.04 |
| OS version (client) | Windows 7x64 |
| TheHive version / git hash | 3.1.0-1 |
| Package Type | Docker |
| Cortex version | 2.1.0-0.1RC1 |

Problem Description

TheHive intermittently loses its connection to Cortex. After a few minutes I see another message saying the connection is back up. While the connection between TheHive and Cortex is down, I can still log in to Cortex via the web GUI and run jobs. Both TheHive and Cortex run in Docker containers on the same machine. The CPU load is not high when this happens. I've checked the logs for the TheHive container and I'm not seeing any errors. I'm not sure where to look for more information to debug this.

[Screenshot: cortex_flapping]

@nadouani
Contributor

nadouani commented Oct 2, 2018

Well, these notifications are displayed when TheHive sees a status change for one of the configured Cortex instances. This status is polled every minute.

Can you provide the result of

curl -H 'Authorization: Bearer THEHIVE-API-KEY' 'http://THEHIVE-SERVER:THEHIVE-PORT/api/status'

@gonrada
Author

gonrada commented Oct 2, 2018

I ran that and got the following:

{
    "config": {
        "authType": [
            "key",
            "local",
            "ad"
        ],
        "capabilities": [
            "authByKey",
            "changePassword",
            "setPassword"
        ],
        "protectDownloadsWith": "malware",
        "ssoAutoLogin": false
    },
    "connectors": {
        "cortex": {
            "enabled": true,
            "servers": [
                {
                    "name": "cortex1",
                    "status": "OK",
                    "version": "2.1.0-RC1"
                }
            ],
            "status": "OK"
        },
        "misp": {
            "enabled": true,
            "servers": [
                {
                    "name": "misp",
                    "purpose": "ExportOnly",
                    "status": "ERROR",
                    "version": ""
                }
            ],
            "status": "ERROR"
        }
    },
    "health": {
        "elasticsearch": "WARNING"
    },
    "versions": {
        "Elastic4Play": "1.6.2",
        "Elastic4s": "5.6.6",
        "ElasticSearch": "5.6.9",
        "Play": "2.6.18",
        "TheHive": "3.1.0"
    }
}

I then waited a couple of minutes and ran it again:

{
    "config": {
        "authType": [
            "key",
            "local",
            "ad"
        ],
        "capabilities": [
            "authByKey",
            "changePassword",
            "setPassword"
        ],
        "protectDownloadsWith": "malware",
        "ssoAutoLogin": false
    },
    "connectors": {
        "cortex": {
            "enabled": true,
            "servers": [
                {
                    "name": "cortex1",
                    "status": "ERROR",
                    "version": ""
                }
            ],
            "status": "ERROR"
        },
        "misp": {
            "enabled": true,
            "servers": [
                {
                    "name": "misp",
                    "purpose": "ExportOnly",
                    "status": "ERROR",
                    "version": ""
                }
            ],
            "status": "ERROR"
        }
    },
    "health": {
        "elasticsearch": "WARNING"
    },
    "versions": {
        "Elastic4Play": "1.6.2",
        "Elastic4s": "5.6.6",
        "ElasticSearch": "5.6.9",
        "Play": "2.6.18",
        "TheHive": "3.1.0"
    }
}
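
The two responses differ only in the Cortex connector fields, which can be pulled out directly rather than compared by eye. A minimal sketch, assuming jq is installed; the helper name `cortex_servers` is made up for illustration, and the placeholders are the same ones used in the curl command earlier in the thread.

```shell
# Print one line per configured Cortex server: name, status, version.
# Reads an /api/status JSON document on stdin. Requires jq.
# cortex_servers is an illustrative helper name, not part of TheHive.
cortex_servers() {
    jq -r '.connectors.cortex.servers[] | "\(.name) \(.status) \(.version)"'
}

# Example usage (placeholders as in the earlier curl command):
#   curl -s -H 'Authorization: Bearer THEHIVE-API-KEY' \
#       'http://THEHIVE-SERVER:THEHIVE-PORT/api/status' | cortex_servers
```

In the failing response the version field is empty, so the line for that server ends right after the status.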

@nadouani
Contributor

nadouani commented Oct 2, 2018

OK, so the UI is behaving as expected. Now the question is why TheHive can only intermittently reach your Cortex.

Do you have any logs in /var/log/thehive/application.log?

@To-om
Contributor

To-om commented Oct 2, 2018

@gonrada Do you use docker-compose? If so, what does your docker-compose file look like?

@gonrada
Author

gonrada commented Oct 2, 2018

@To-om

version: "2"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.6.0
    restart: always
    volumes:
      - /srv/thehive/elasticsearch/data:/usr/share/elasticsearch/data
      - /srv/thehive/elasticsearch/backup:/backup
    environment:
      - http.host=0.0.0.0
      - transport.host=0.0.0.0
      - xpack.security.enabled=false
      - cluster.name=hive
      - script.inline=true
      - thread_pool.index.queue_size=100000
      - thread_pool.search.queue_size=100000
      - thread_pool.bulk.queue_size=100000
      - path.repo=/backup
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
    ports:
      - "9200:9200"
  cortex:
    container_name: thecortex
    restart: always
    image: thehiveproject/cortex:latest
    depends_on:
      - elasticsearch
    volumes:
      - /srv/thehive/cortex/application.conf:/etc/cortex/application.conf
      - /srv/thehive/cortex/Cortex-Analyzers:/opt/Cortex-Analyzers
    ports:
      - "9001:9001"
  thehive:
    container_name: thehive
    restart: always
    image: thehiveproject/thehive:3.1.0-1
    volumes:
      - /srv/thehive/keystore.jks:/etc/thehive/keystore.jks
      - /srv/thehive/application.conf:/etc/thehive/application.conf
    depends_on:
      - elasticsearch
      - cortex
    ports:
      - "9443:9443"

@nadouani I was running a tail -f /var/log/thehive/application.log inside the container. There aren't any updates to the log that occur when I get the error. Even while updating this ticket I've seen the error a couple of times and there aren't any new lines in the log file.

@rayschippers

This probably doesn't help a lot, but I'm using Docker as well and also get this error constantly. Happy to provide any logs or data that might help with troubleshooting.

@sprungknoedl

We are losing the Cortex connection as well. TheHive 3.1.0 and Cortex 2.1 installed as a "normal" service on 2 Ubuntu servers.

@nadouani
Contributor

nadouani commented Oct 9, 2018

> We are losing the Cortex connection as well. TheHive 3.1.0 and Cortex 2.1 installed as a "normal" service on 2 Ubuntu servers.

Hello, I'm curious about what "losing connection" means: are you seeing TheHive's UI show the Cortex connection as red, or are you getting broken connections when calling Cortex APIs from TheHive?

@rayschippers

In my case, yes: we constantly get the UI pop-up saying Cortex has disconnected and then that it's back, and we also get analyzer failures.

@nadouani
Contributor

nadouani commented Oct 9, 2018

@rayschippers and @secdecompiled can you please run a script like this one to poll the status API:

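# Poll TheHive's status API every 30 seconds and print the Cortex connector status.
# Stop it manually with Ctrl+C.
while true
do
    curl -s -H 'Authorization: Bearer API_KEY' 'http://THEHIVE:9000/api/status' | jq .connectors.cortex.status
    sleep 30
done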

This calls the API, waits 30 seconds, and repeats until you stop it manually.
Here I use jq to extract the Cortex status from the API response.
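
A possible variant of the polling loop above prints a timestamped line only when the Cortex status changes, which makes it easier to line up flaps with other logs. This is a sketch, not something from TheHive itself: the helper names and the `THEHIVE_URL` and `API_KEY` variables are placeholders chosen for illustration.

```shell
# Print "<timestamp> <old> -> <new>" when the status differs from the previous one.
# report_change and poll_cortex_status are illustrative helper names.
report_change() {
    old=$1
    new=$2
    if [ "$old" != "$new" ]; then
        printf '%s %s -> %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$old" "$new"
    fi
}

# Poll /api/status every INTERVAL seconds (default 30) until interrupted.
# Expects THEHIVE_URL (e.g. http://THEHIVE:9000) and API_KEY to be set.
poll_cortex_status() {
    interval=${1:-30}
    prev=UNKNOWN
    while true; do
        curr=$(curl -s -H "Authorization: Bearer $API_KEY" "$THEHIVE_URL/api/status" \
            | jq -r '.connectors.cortex.status' 2>/dev/null)
        # An empty result (curl failed or the response was not JSON)
        # is recorded as UNREACHABLE.
        [ -n "$curr" ] || curr=UNREACHABLE
        report_change "$prev" "$curr"
        prev=$curr
        sleep "$interval"
    done
}

# Example usage:
#   THEHIVE_URL=http://THEHIVE:9000 API_KEY=... poll_cortex_status 30
```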

@gonrada
Author

gonrada commented Oct 16, 2018

@nadouani is there any further information I can provide?

@nadouani
Contributor

nadouani commented Oct 16, 2018

I need the result for my last question, to see how TheHive's polling of the Cortex connection behaves.

@rayschippers

Hi @nadouani, I spent today upgrading everything to the latest versions to see if there were any improvements. It's happening less often but still happening. Output when it's broken:
"cortex":{"enabled":true,"servers":[{"name":"XXCORTEX","version":"","status":"ERROR"}]

and when it's back

"cortex":{"enabled":true,"servers":[{"name":"XXCORTEX","version":"2.1.2","status":"OK"}]

@nadouani
Contributor

Well, again, I need to know whether the status polling works. Unless that script is run for a few minutes, I cannot investigate. Thanks.

@rayschippers

I ran it for a few minutes. Here is the output for the Cortex status:
OK
OK
ERROR
ERROR
ERROR
ERROR
ERROR
ERROR
ERROR
ERROR
OK
OK
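
Output like this can be turned into a rough outage duration by counting consecutive non-OK samples and multiplying by the polling interval. A minimal sketch, assuming one status per line and the 30-second interval of the polling script; `summarize_outages` is an illustrative name, and the result is only an estimate because the true start and end fall somewhere between samples.

```shell
# Summarize runs of consecutive non-OK samples read from stdin
# (one status per line). The optional argument is the polling
# interval in seconds (default 30).
# summarize_outages is an illustrative helper name.
summarize_outages() {
    awk -v interval="${1:-30}" '
        function print_run() {
            printf "outage: %d samples, roughly %d seconds\n", run, run * interval
        }
        { gsub(/"/, "") }                        # jq without -r prints quoted strings
        $0 != "OK" { run++; next }               # extend the current outage
        run > 0    { print_run(); run = 0 }      # an OK sample ends the outage
        END        { if (run > 0) print_run() }  # outage still open at end of input
    '
}

# Example usage, with the samples saved one per line in status.log:
#   summarize_outages 30 < status.log
```

Applied to the twelve samples above, the eight consecutive non-OK lines would be reported as a single outage of roughly 240 seconds.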

@nadouani
Contributor

Well, this looks like a bug in the status polling, which has a very small timeout.

It will be fixed in the next hotfix.
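
Until a fix is available, one way to check whether slow responses are plausible is to time requests to Cortex from the TheHive host. The sketch below makes several assumptions: `CORTEX_URL` is a placeholder, the `/api/status` path on Cortex is assumed rather than taken from this thread, and the 1-second threshold is arbitrary because the actual polling timeout is not stated here. The helper names are illustrative.

```shell
# Time a single HTTP request and print "<http_code> <seconds>".
# Uses curl's --write-out variables http_code and time_total.
time_request() {
    curl -s -o /dev/null --max-time 10 -w '%{http_code} %{time_total}\n' "$1"
}

# Succeed (exit 0) when a measured time in seconds exceeds a threshold.
# The threshold is whatever timeout you want to compare against; the
# value actually used by TheHive is not given in this thread.
is_slower_than() {
    awk -v t="$1" -v limit="$2" 'BEGIN { exit !(t + 0 > limit + 0) }'
}

# Example usage (CORTEX_URL is a placeholder, e.g. http://localhost:9001):
#   set -- $(time_request "$CORTEX_URL/api/status")
#   is_slower_than "$2" 1 && echo "slow response: $2 s (HTTP $1)"
```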

@To-om To-om added this to the 3.1.3 milestone Oct 17, 2018
@To-om To-om closed this as completed Nov 5, 2018
@To-om To-om modified the milestones: 3.1.3, 3.2.0 (Cerana 2) Nov 15, 2018
@BrijJhala

We have been running Cortex for almost 3 hours with 200 users in loops of 5, submitting 1000 samples continuously. Once we reach about 12k Cortex jobs, our code can no longer reach Cortex at /api/run or /api/<<jobid//results. We use the axios HTTP client to communicate with the Cortex endpoint, and it stops responding. We thought it was an issue on the axios side, but after we restarted Cortex, communication between our service (the axios client) and Cortex worked again, so it sounds like Cortex is holding on to client connections.

Very important: we can run scans from the UI without any issue, just not from our HTTP client. Restarting Cortex fixes the issue, but restarting the pod of our own service does not restore communication with Cortex. We really need some insight into this issue.
