ElasticAPM crash because of a missing context since version 6.3.0 #1188

Closed
cccs-sgaron opened this issue Jul 9, 2021 · 5 comments · Fixed by #1190

@cccs-sgaron

Describe the bug:

Since ElasticAPM release 6.3.0, our application has started to see a large number of crashes that we believe are caused by a recent change in ElasticAPM. The elasticsearch instrumentation package tries to assign the DB type "elasticsearch" to a context that is None, which causes it to crash.

Here's a snippet of a stack trace from our app:

...
  File "/var/lib/assemblyline/.local/lib/python3.7/site-packages/elasticsearch/client/utils.py", line 168, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "/var/lib/assemblyline/.local/lib/python3.7/site-packages/elasticsearch/client/__init__.py", line 1026, in get
    "GET", _make_path(index, doc_type, id), params=params, headers=headers
  File "/var/lib/assemblyline/.local/lib/python3.7/site-packages/elasticapm/instrumentation/packages/base.py", line 210, in call_if_sampling
    return self.call(module, method, wrapped, instance, args, kwargs)
  File "/var/lib/assemblyline/.local/lib/python3.7/site-packages/elasticapm/instrumentation/packages/elasticsearch.py", line 113, in call
    result_data = wrapped(*args, **kwargs)
  File "/var/lib/assemblyline/.local/lib/python3.7/site-packages/elasticsearch/transport.py", line 388, in perform_request
    timeout=timeout,
  File "/var/lib/assemblyline/.local/lib/python3.7/site-packages/elasticapm/instrumentation/packages/base.py", line 210, in call_if_sampling
    return self.call(module, method, wrapped, instance, args, kwargs)
  File "/var/lib/assemblyline/.local/lib/python3.7/site-packages/elasticapm/instrumentation/packages/elasticsearch.py", line 56, in call
    self._update_context_by_request_data(span.context, instance, args, kwargs)
  File "/var/lib/assemblyline/.local/lib/python3.7/site-packages/elasticapm/instrumentation/packages/elasticsearch.py", line 72, in _update_context_by_request_data
    context["db"] = {"type": "elasticsearch"}
TypeError: 'NoneType' object does not support item assignment

I believe this crash is related to a recent change in this commit: ee75cb8#diff-c8fb731f92134757656c157f5c3175bcb62e131c1fed1aec5041367603c204d0L62

As you can see there, the context was previously assigned its DB type in a way that still worked even if the context was None, but the code now assumes the context is a dictionary. I'm not creating a PR to fix this because I'm not 100% sure whether the old way was changed for a reason.
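The failure at the bottom of the trace can be reproduced in isolation. This is a minimal sketch that assumes only what the traceback shows, namely that the function receives None where it expects a dict. `update_context` and `reproduce` are hypothetical stand-ins, not the agent's actual code.

```python
def update_context(context):
    # Mirrors the failing line from the traceback: item assignment on `context`.
    context["db"] = {"type": "elasticsearch"}
    return context


def reproduce():
    """Return the error message raised when the context is None."""
    try:
        update_context(None)
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(reproduce())
```

With a dict argument the same function works as expected, which is consistent with the crash appearing only on some requests.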

Possible fix
I have a very limited understanding of what that context should be before reaching this function, but possible fixes include:

  1. Revert to the old way of assigning the DB type.

  2. Test for a None context before assigning a type:

     if context is None:
         context = {}

  3. Make sure the default value of span.context is an empty dict instead of None.
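Options 2 and 3 could look roughly like the following. This is a sketch under assumptions: `update_context_guarded` and `SpanStandIn` are hypothetical names used for illustration, and neither is taken from the agent's source.

```python
def update_context_guarded(context):
    """Option 2: skip the update when there is no context to write to."""
    if context is None:
        # Nothing to record on; returning early avoids the TypeError.
        return None
    context["db"] = {"type": "elasticsearch"}
    return context


class SpanStandIn:
    """Option 3: a hypothetical span whose context defaults to a dict."""

    def __init__(self, context=None):
        # Normalise None to a fresh dict so callers can always assign keys.
        self.context = {} if context is None else context
```

One caveat about option 2: rebinding `context = {}` inside the function avoids the crash, but the new dict is local to the function, so whatever is written to it never reaches the span. Returning early, as in the sketch, makes that explicit.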

To Reproduce

I have no easy way to reproduce this crash because it does not happen all the time.

Environment (please complete the following information)

  • OS: Linux
  • Python version: 3.7.11
  • Framework and version [e.g. Django 2.1]: Flask 2.0.1
  • APM Server version: 7.12
  • Agent version: 6.3.0+

Additional Information

Our app is launched in Gunicorn using gevent workers.

@esseti

esseti commented Jul 12, 2021

Same problem here

@beniwohli beniwohli self-assigned this Jul 12, 2021
beniwohli added a commit to beniwohli/apm-agent-python that referenced this issue Jul 12, 2021
…rrectly

If the elasticsearch span is dropped for some reason, the context object
is None, which, if unhandled, leads to an exception

fixes elastic#1188
@beniwohli
Contributor

@cccs-sgaron thanks for opening the issue! I went through the code, and the only way I can currently see this happening is if the Elasticsearch span is a DroppedSpan. Spans are dropped in two situations:

  1. If the transaction has more spans than the maximum, defined in transaction_max_spans
  2. If it is a child of a leaf span (this should not be the case here)

I opened #1190 to address the issue. The test case in that pull request triggers the same stack trace as the one you added to this issue.
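The first situation can be modelled with a small simulation. The classes below are hypothetical stand-ins that borrow names from this thread; they are not the agent's implementation, and the guard in `record_db_type` is only one way to handle a missing context, not necessarily what #1190 does.

```python
class Span:
    """Stand-in for a recorded span with a writable context."""

    def __init__(self, name):
        self.name = name
        self.context = {}


class DroppedSpan:
    """Stand-in for a dropped span; as described above, its context is None."""

    context = None

    def __init__(self, name):
        self.name = name


class Transaction:
    """Stand-in transaction that drops spans beyond `max_spans`."""

    def __init__(self, max_spans):
        self.max_spans = max_spans
        self.started = 0
        self.dropped = 0

    def begin_span(self, name):
        if self.started >= self.max_spans:
            # Over the limit: hand back a placeholder instead of a real span.
            self.dropped += 1
            return DroppedSpan(name)
        self.started += 1
        return Span(name)


def record_db_type(span):
    """Write the DB type if the span has a context; report whether it did."""
    if span.context is None:
        return False
    span.context["db"] = {"type": "elasticsearch"}
    return True


if __name__ == "__main__":
    tx = Transaction(max_spans=2)
    spans = [tx.begin_span("es-call-%d" % i) for i in range(3)]
    print([record_db_type(s) for s in spans], tx.dropped)
```

Without the `None` check in `record_db_type`, the third call would raise the same TypeError as in the traceback at the top of this issue.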

@esseti

esseti commented Jul 12, 2021

My case is 99% likely to be situation 1. In the past we had a note in Kibana saying that some spans were dropped because there were too many, but it was working without problems.

beniwohli added a commit that referenced this issue Jul 12, 2021
…rrectly (#1190)

If the elasticsearch span is dropped for some reason, the context object
is None, which, if unhandled, leads to an exception

fixes #1188
@cccs-sgaron
Author

Our issue was most likely due to hitting transaction_max_spans as well, since the API that we were monitoring may loop recursively and issue DB calls to Elasticsearch, each of which creates a new span. Thanks for the fix; we will report back if the issue comes up again.

@beniwohli
Contributor

@cccs-sgaron @esseti Awesome, thanks for the feedback. I just pushed version 6.3.3 of the agent with the fix from #1190
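For anyone checking whether a pinned agent version falls in the affected range, a small helper along these lines may be useful. It is a sketch that assumes the range implied by this thread (introduced in 6.3.0, fixed in 6.3.3) and plain dotted-integer version strings; `parse_version` and `is_affected` are hypothetical names.

```python
def parse_version(text):
    """Turn '6.3.3' into (6, 3, 3); assumes plain dotted integers."""
    return tuple(int(part) for part in text.split("."))


def is_affected(version_text):
    """True if the version is at least 6.3.0 and older than 6.3.3."""
    return (6, 3, 0) <= parse_version(version_text) < (6, 3, 3)


if __name__ == "__main__":
    for candidate in ("6.2.3", "6.3.0", "6.3.3"):
        print(candidate, is_affected(candidate))
```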

beniwohli added a commit to beniwohli/apm-agent-python that referenced this issue Sep 14, 2021
…rrectly (elastic#1190)

If the elasticsearch span is dropped for some reason, the context object
is None, which, if unhandled, leads to an exception

fixes elastic#1188