Slow performance on latest release in Docker #2098

Closed
Narzhan opened this issue Sep 3, 2021 · 7 comments
Labels: bug (indicates an unexpected problem or unintended behavior), component: core, performance

Narzhan (Contributor) commented Sep 3, 2021

I've tried to upgrade our intelmq setup to the major version 3.0.x and found that the whole processing was significantly slower. During troubleshooting I found that starting a bot and using the intelmqctl utility take longer than on the previous version.

  1. The throughput of a bot was on average half that of the previously used version. Of all the bots, the experts and outputs were the slowest compared to the previous version; collectors and parsers ran at a similar speed.
  2. The initialization in particular takes a lot longer than before.
  3. I've tried both version 3.0.0 and the newly released 3.0.1.
  4. intelmq was running in Docker in all cases.

Setup:

  • Base layer Ubuntu 20.04
  • Python 3.8.5
  • Baseline intelmq version: 2.2.3
  • VM used for tests: 4x CPU, 16 GB RAM
  • All tested versions ran the same sets of bots.

I've tried to profile the runs of two commands (all output below is sorted by descending cumulative time; a sketch of the profiling setup follows this list):

  • listing non-empty queues
  • running a bot via intelmqctl (the deduplicator was chosen) and consuming fewer than 100 messages
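
For context, output of this shape comes from Python's built-in cProfile module. A minimal sketch of such a profiling run (the exact invocation is assumed; the issue does not state it):

    import cProfile
    import pstats

    profiler = cProfile.Profile()
    profiler.enable()
    # ... run the command under test here, e.g. the intelmqctl entry point ...
    profiler.disable()

    stats = pstats.Stats(profiler)
    stats.sort_stats('cumulative')   # matches "Ordered by: cumulative time"
    stats.print_stats(15)            # top 15 rows, like the excerpts below

Equivalently, python -m cProfile -s cumtime ... on the command line.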

For v2.2.3:

Listing queues

         405990 function calls (396799 primitive calls) in 10.960 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    339/1    0.174    0.001   10.992   10.992 {built-in method builtins.exec}
        1    0.000    0.000   10.992   10.992 intelmqctl.py:3(<module>)
   352/18    0.039    0.000    9.399    0.522 <frozen importlib._bootstrap>:986(_find_and_load)
   351/18    0.036    0.000    9.398    0.522 <frozen importlib._bootstrap>:956(_find_and_load_unlocked)
   325/21    0.035    0.000    9.110    0.434 <frozen importlib._bootstrap>:650(_load_unlocked)
   273/20    0.029    0.000    9.068    0.453 <frozen importlib._bootstrap_external>:777(exec_module)
   398/20    0.001    0.000    8.928    0.446 <frozen importlib._bootstrap>:211(_call_with_frames_removed)
    80/31    0.005    0.000    5.312    0.171 {built-in method builtins.__import__}
   118/37    0.006    0.000    3.673    0.099 <frozen importlib._bootstrap>:1017(_handle_fromlist)
        4    0.002    0.001    3.375    0.844 __init__.py:2(<module>)
        1    0.002    0.002    2.782    2.782 utils.py:2(<module>)
       24    0.001    0.000    2.546    0.106 __init__.py:1(<module>)
        1    0.000    0.000    2.479    2.479 __init__.py:8(<module>)

Deduplicator

         611665 function calls (592855 primitive calls) in 19.499 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    347/1    0.066    0.000   19.499   19.499 {built-in method builtins.exec}
        1    0.001    0.001   19.499   19.499 intelmqctl.py:3(<module>)
        1    0.000    0.000   13.268   13.268 intelmqctl.py:1865(main)
        1    0.000    0.000   12.755   12.755 intelmqctl.py:1009(run)
        1    0.000    0.000   12.754   12.754 intelmqctl.py:1028(bot_run)
        1    0.000    0.000   12.754   12.754 intelmqctl.py:152(bot_run)
        1    0.000    0.000   10.082   10.082 bot_debugger.py:35(__init__)
        1    0.004    0.004    9.739    9.739 bot.py:242(start)
      116    0.023    0.000    9.618    0.083 expert.py:45(process)
  702/697    0.029    0.000    6.627    0.010 client.py:894(execute_command)
      116    0.019    0.000    6.310    0.054 bot.py:574(receive_message)
   361/20    0.020    0.000    6.241    0.312 <frozen importlib._bootstrap>:986(_find_and_load)
   360/20    0.004    0.000    6.206    0.310 <frozen importlib._bootstrap>:956(_find_and_load_unlocked)
   334/26    0.027    0.000    6.069    0.233 <frozen importlib._bootstrap>:650(_load_unlocked)
   281/25    0.002    0.000    6.016    0.241 <frozen importlib._bootstrap_external>:777(exec_module)
   412/23    0.001    0.000    5.920    0.257 <frozen importlib._bootstrap>:211(_call_with_frames_removed)
  702/697    0.048    0.000    5.449    0.008 client.py:912(parse_response)
1408/1398    0.171    0.000    5.261    0.004 connection.py:185(_read_from_socket)

For v3.0.0:

Listing queues

         9739079 function calls (9685621 primitive calls) in 288.335 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    373/1    0.430    0.001  288.336  288.336 {built-in method builtins.exec}
        1    0.000    0.000  288.336  288.336 intelmqctl.py:6(<module>)
        1    0.000    0.000  265.167  265.167 intelmqctl.py:1901(main)
        1    0.005    0.005  260.925  260.925 intelmqctl.py:681(__init__)
        2    0.000    0.000  259.938  129.969 utils.py:201(load_configuration)
        2    0.233    0.116  259.937  129.969 main.py:419(load)
        2    0.000    0.000  259.704  129.852 constructor.py:117(get_single_data)
        2    0.000    0.000  252.715  126.357 composer.py:68(get_single_node)
        2    0.000    0.000  252.714  126.357 composer.py:93(compose_document)
  21304/2    5.401    0.000  252.714  126.357 composer.py:111(compose_node)
   1482/2    0.701    0.000  252.714  126.357 composer.py:199(compose_mapping_node)
    60626    3.273    0.000  196.940    0.003 parser.py:141(check_event)
   172596   11.306    0.000  148.415    0.001 scanner.py:171(check_token)
        1    0.142    0.142  135.889  135.889 intelmqctl.py:1016(load_defaults_configuration)
        1    0.000    0.000  135.747  135.747 utils.py:882(get_global_settings)
        1    0.000    0.000  135.747  135.747 utils.py:878(get_runtime)

Deduplicator

         19416412 function calls (19308243 primitive calls) in 534.028 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    381/1    0.175    0.000  534.028  534.028 {built-in method builtins.exec}
        1    0.207    0.207  534.028  534.028 intelmqctl.py:6(<module>)
        1    0.000    0.000  515.948  515.948 intelmqctl.py:1901(main)
        5    0.000    0.000  505.013  101.003 utils.py:201(load_configuration)
        5    0.735    0.147  505.012  101.002 main.py:419(load)
        5    0.003    0.001  504.259  100.852 constructor.py:117(get_single_data)
        5    0.000    0.000  491.445   98.289 composer.py:68(get_single_node)
        5    0.000    0.000  491.430   98.286 composer.py:93(compose_document)
  43333/5    9.946    0.000  491.430   98.286 composer.py:111(compose_node)
   3075/5    3.583    0.001  491.429   98.286 composer.py:199(compose_mapping_node)
   123399    5.342    0.000  372.997    0.003 parser.py:141(check_event)
   350491   22.127    0.000  282.354    0.001 scanner.py:171(check_token)
        1    0.000    0.000  266.593  266.593 intelmqctl.py:1029(run)
        1    0.000    0.000  266.592  266.592 intelmqctl.py:1048(bot_run)
        1    0.000    0.000  266.592  266.592 intelmqctl.py:158(bot_run)
        1    0.000    0.000  257.218  257.218 bot_debugger.py:43(__init__)
        1    0.012    0.012  257.214  257.214 bot.py:112(__init__)
        2    0.008    0.004  252.400  126.200 utils.py:882(get_global_settings)
        2    0.000    0.000  252.392  126.196 utils.py:878(get_runtime)
        1    0.011    0.011  249.355  249.355 intelmqctl.py:681(__init__)
    65642   13.783    0.000  221.566    0.003 scanner.py:220(fetch_more_tokens)
    17196    5.545    0.000  139.221    0.008 parser.py:624(parse_block_mapping_value)
    37310    2.745    0.000  137.594    0.004 scanner.py:745(fetch_plain)
    20160    3.904    0.000  131.456    0.007 parser.py:595(parse_block_mapping_key)
        1    0.000    0.000  129.203  129.203 bot.py:781(__load_defaults_configuration)
        1    0.018    0.018  123.215  123.215 intelmqctl.py:1016(load_defaults_configuration)
    37310   37.263    0.001  120.878    0.003 scanner.py:1540(scan_plain)
        1    0.039    0.039  118.809  118.809 bot.py:792(__load_runtime_configuration)
      864    0.665    0.001  112.698    0.130 composer.py:166(compose_sequence_node)

For v3.0.1:

The profiling had similar output, but the times were generally better than for v3.0.0.

Issue

In its current state, intelmq is not usable for us and the production deployment needs to be delayed. I wasn't able to pinpoint where the slowdown happens. I hope this information helps; if not, I can provide more.

ghost commented Sep 3, 2021

Hi @Narzhan, thanks for the report. Which Docker images do you use in particular? Is "only" intelmqctl affected, or is the performance of the bots during their processing also affected? I think we also need to differentiate between the startup of the processes and their "runtime".

waldbauer-certat (Contributor) commented:

Hey @Narzhan :)

Thanks for the report! We've updated the Docker image (3.0.1 / latest): we're now using ruamel.yaml from apt instead of the pip package. We're still looking into the details to find further optimizations.
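
One difference worth checking between the apt and pip packages is whether ruamel.yaml's C extension (shipped separately as ruamel.yaml.clib) is present, since the pure-Python parser is substantially slower. A quick diagnostic sketch, assuming only ruamel.yaml's documented loader types:

    import time
    from ruamel.yaml import YAML

    try:
        import _ruamel_yaml  # C extension provided by ruamel.yaml.clib
        print('C extension available')
    except ImportError:
        print('pure-Python parser only')

    # Parse a synthetic 5000-key document with both loader types.
    doc = '\n'.join('key-%d: value-%d' % (i, i) for i in range(5000))
    for name, typ in (('round-trip', 'rt'), ('safe', 'safe')):
        start = time.perf_counter()
        YAML(typ=typ).load(doc)
        print('%s loader: %.4fs' % (name, time.perf_counter() - start))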

Narzhan (Contributor, Author) commented Sep 6, 2021

Hi,
we're building our own custom Docker image, in which we just install all the dependencies and build intelmq from source. The botnet is started via intelmqctl; no other black magic is present.

When starting the bot via intelmqctl run 'bot', the initialization of the bot took the longest; processing started instantly after the init was done. I can only guess that the situation is the same for queue listing.

The processing of messages (the 'runtime') was on average about half as fast as in the base version (2.2.3) under 3.0.0; under 3.0.1 the difference was smaller.

waldbauer-certat (Contributor) commented:

Hey, it's me again :)

Over the past couple of days, we investigated and hunted down the performance issue. The outcome so far is described below.

Startup performance
Startup time has increased a bit because ruamel.yaml (a pyyaml fork) is used instead of json. The difference is very small, so there is no real impact. (A rough micro-benchmark of the json-versus-YAML loading cost follows this comment.)

Runtime performance
We've noticed that the runtime has degraded a lot, so we started investigating further on that one, because runtime is more important than startup. As a result of our investigation, it seems that there is an APICollector bug which nearly eats up the CPU.

To answer your question, @Narzhan:
I've tried to reproduce your performance problem on my machine, but I can't. Could you provide a Dockerfile?
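
To make the json-versus-YAML cost concrete, here is a rough micro-benchmark (an illustrative sketch, not IntelMQ code) that loads the same configuration-shaped document with both parsers:

    import io
    import json
    import time
    from ruamel.yaml import YAML

    # A config-like document with a few hundred entries.
    config = {'bot-%d' % i: {'parameters': {'key': 'value'}} for i in range(500)}
    json_text = json.dumps(config)

    yaml = YAML()  # default round-trip loader
    buf = io.StringIO()
    yaml.dump(config, buf)
    yaml_text = buf.getvalue()

    start = time.perf_counter()
    for _ in range(10):
        json.loads(json_text)
    print('json:        %.4fs' % (time.perf_counter() - start))

    start = time.perf_counter()
    for _ in range(10):
        yaml.load(yaml_text)
    print('ruamel.yaml: %.4fs' % (time.perf_counter() - start))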

waldbauer-certat added a commit that referenced this issue Sep 9, 2021
The harmonization file had been loaded within each new report;
this nearly kills the CPU, because it has to open the file,
parse the file and load it into memory.

Fixes #2098

Signed-off-by: Sebastian Waldbauer <[email protected]>
waldbauer-certat added a commit that referenced this issue Sep 9, 2021
As of now, each new message re-loads the harmonization configuration,
which unfortunately results in a high CPU drain.

This commit adds the harmonization parameter to the function call,
so the harmonization is loaded from memory already allocated in the
Python process and no more file reads have to be done.

Fixes #2098

Signed-off-by: Sebastian Waldbauer <[email protected]>
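
In outline, the fix looks roughly like this (an illustrative sketch; the class, function and path names are hypothetical, not IntelMQ's actual API):

    import json

    def load_harmonization(path='/etc/intelmq/harmonization.conf'):
        # The harmonization file is JSON; open and parse it.
        with open(path) as f:
            return json.load(f)

    class Bot:
        def __init__(self):
            # Parsed once at bot startup and kept in memory.
            self.harmonization = load_harmonization()

        def new_report(self):
            # Pass the already-parsed dict down instead of letting the
            # message re-read the file for every single report.
            return Report(harmonization=self.harmonization)

    class Report(dict):
        def __init__(self, harmonization=None):
            super().__init__()
            # Before the fix this was effectively load_harmonization() on
            # every instantiation; now the caller-supplied dict is reused.
            self.harmonization = harmonization or load_harmonization()
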
ghost added this to the 3.0.2 milestone Sep 9, 2021
ghost added the bug label Sep 9, 2021
ghost assigned waldbauer-certat Sep 9, 2021
ghost commented Sep 9, 2021

We identified two issues:

  • The collector bots loaded the harmonization file for every newly created report. That bug had existed for 5 years, but now we load the file with ruamel, which is much slower than loading it with json, and that turned it into a performance issue.
  • The statistics were enabled by default.

We'll do a hotfix 3.0.2 release, hopefully tomorrow.

waldbauer-certat (Contributor) commented:

To see the difference:

Before the #2106 merge:
(screenshot: shodan_before)

After the #2106 merge:
(screenshot: shodan_after)

Narzhan (Contributor, Author) commented Sep 10, 2021

I'll test version 3.0.2 and get back to you with the results.
If the situation is more or less the same, I'll also post the Dockerfile for @waldbauer-certat.

ghost closed this as completed in 644b3c5 on Sep 10, 2021