Timeouts for some collectors using "requests" #859
Signed-off-by: Sebastian Wagner <[email protected]> Conflicts: docs/Bots.md intelmq/bots/collectors/http/collector_http.py intelmq/bots/collectors/mail/collector_mail_url.py
Codecov Report
@@            Coverage Diff             @@
##           master     #859      +/-   ##
==========================================
- Coverage   78.21%   77.98%   -0.24%
==========================================
  Files         221      221
  Lines        9015     9047      +32
==========================================
+ Hits         7051     7055       +4
- Misses       1964     1992      +28
Signed-off-by: Sebastian Wagner <[email protected]>
timeoutretries += 1
self.logger.warn("Timeout whilst downloading the report.")

if timeoutretries >= 3:
Do we want to have this configurable?
Would be reasonable.
# The download timed out too often, leave the Loop.
continue

self.logger.debug(resp.content)
Is this left from debugging?
uuuuh yes.
To merge this PR, intelmq/bots/collectors/http/collector_http_stream.py and intelmq/bots/collectors/rt/collector_rt.py need to be adapted to the new parameters.
http_timeout_max_tries is not documented
no changelog entry
timeoutretries = 0
resp = None

while timeoutretries < self.http_timeout_max_tries and resp is None:
@wagner-certat can you validate this with me:
Are the Error Handling features documented here capable of doing the same thing this pull request wants to do? I'm not 100% sure, but it seems the same.
Yeah, except that this is specific to timeouts. However, as error_max_retries can be set to 3 for both of these collectors, I actually think that keeping the number of configuration parameters low is more important than handling the errors differently. So maybe just let requests.get raise the timeout error?
+1. I agree that it's good practice to keep the number of config parameters low, if possible.
@dmth to fix the timeout issue, only 3 lines need to be added as far as I can see:
try:
    resp = requests.get(url=self.parameters.http_url, auth=self.auth,
                        proxies=self.proxy, headers=self.http_header,
                        verify=self.http_verify_cert,
                        cert=self.ssl_client_cert,
                        timeout=self.http_timeout)
except requests.exceptions.Timeout:
    raise requests.exceptions.Timeout('Timeout whilst downloading the report.')
As a user, you go to the configuration and change it as you want, according to this Error Handling documentation.
Let me know if there is anything I missed in my analysis of the problem. Thank you for raising this one. :)
Why do you catch and raise the exact same error at all?
:D:D:D:D Heh, I only added that to have a specific message, in case that's important. In practice I agree with you, @wagner-certat: there is no need for that.
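The alternative discussed above, letting the timeout propagate and relying on a generic retry setting such as error_max_retries, could look roughly like the sketch below. This is not IntelMQ's actual Bot class: `SketchBot`, `run_once`, and the `error_retry_delay` attribute are hypothetical names used only for illustration, and the built-in `TimeoutError` stands in for `requests.exceptions.Timeout` so that the example runs without network access.

```python
import time


class SketchBot:
    """Hypothetical stand-in for a bot; NOT IntelMQ's real Bot class."""

    def __init__(self, process, error_max_retries=3, error_retry_delay=0.0):
        self.process = process                      # callable doing the actual work
        self.error_max_retries = error_max_retries  # retries after the first failure
        self.error_retry_delay = error_retry_delay  # assumed name; seconds between tries
        self.failures = 0

    def run_once(self):
        """Call process(); on any exception retry up to error_max_retries times."""
        retries = 0
        while True:
            try:
                return self.process()
            except Exception:
                self.failures += 1
                if retries >= self.error_max_retries:
                    raise  # retries exhausted: surface the original error
                retries += 1
                time.sleep(self.error_retry_delay)


calls = []


def download_report():
    """Simulated download: times out twice, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise TimeoutError("Timeout whilst downloading the report.")
    return "report body"


bot = SketchBot(download_report, error_max_retries=3)
result = bot.run_once()
```

In this sketch the collector code itself needs no timeout-specific loop. The trade-off raised in the thread is that every error type then shares one retry budget.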
From my point of view it's better to have an explicit timeout behaviour than to use the built-in error handling. In my experience, users and administrators alike might want to differentiate this behaviour. What might be a step to simplify configuration? BTW: The guide at https://github.com/certtools/intelmq/blob/master/docs/User-Guide.md#error-handling is buggy:
But:
there's not retry mode, that's standard behavior
Thanks @dmth #859 (comment)
related: #859
Signed-off-by: Sebastian Wagner <[email protected]>
Thanks, fixed: 1f51da9
FWIW I'm with @dmth here:
In addition: DOC: Added http_timeout_max_tries to documentation
I've updated the documentation for the http_max_retries parameter. In addition, I renamed the parameter for the stream collector in order to use the same name.
This PR now proposes a mix of http_timeout and http_timeout_sec:
> grep http_timeout * -r
bots/collectors/http/collector_http_stream.py:http_timeout: tuple of two floats or float
bots/collectors/http/collector_http_stream.py: timeout=self.http_timeout)
bots/collectors/http/collector_http.py:http_timeout: tuple of two floats or float
bots/collectors/http/collector_http.py: timeout=self.http_timeout_sec)
Binary file bots/collectors/http/__pycache__/collector_http.cpython-34.pyc matches
Binary file bots/collectors/http/__pycache__/collector_http_stream.cpython-34.pyc matches
Binary file bots/collectors/rt/__pycache__/collector_rt.cpython-34.pyc matches
bots/collectors/rt/collector_rt.py: timeout=self.http_timeout)
Binary file bots/collectors/mail/__pycache__/collector_mail_url.cpython-34.pyc matches
bots/collectors/mail/collector_mail_url.py: timeout=self.http_timeout_sec)
etc/defaults.conf: "http_timeout_sec": 30,
lib/bot.py: self.http_timeout_sec = getattr(self.parameters, 'http_timeout_sec', None)
more clear parameter http_timeout_sec instead of http_timeout
The mix is not intentional. I updated the RT collector and documentation correspondingly.
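For reference, the `getattr(self.parameters, 'http_timeout_sec', None)` pattern from the grep output above can be sketched in isolation as follows. The helper name `read_http_options` and the default of 1 for `http_timeout_max_tries` are assumptions made for this example, not values taken from the PR; `SimpleNamespace` stands in for the bot's parameters object.

```python
from types import SimpleNamespace


def read_http_options(parameters):
    """Read the HTTP options, falling back to defaults when they are unset.

    A timeout of None means "no timeout" when passed on to requests.
    The default of 1 try is an assumption for this sketch.
    """
    timeout = getattr(parameters, "http_timeout_sec", None)
    max_tries = getattr(parameters, "http_timeout_max_tries", 1)
    return timeout, max_tries


# Parameters as they might come from a configuration with both options set.
configured = SimpleNamespace(http_timeout_sec=30, http_timeout_max_tries=3)
# Parameters from a configuration that sets neither option.
unconfigured = SimpleNamespace()

configured_options = read_http_options(configured)
unconfigured_options = read_http_options(unconfigured)
```

Reading both options in one place under a single name is one way to avoid the http_timeout / http_timeout_sec mix that the grep output reveals.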
@aaronkaplan I really wonder why this cannot be merged...
From reviewing the latest commits, it looks like
Would be great if someone from @Intevation could check my changes. I'll then merge it.
55535ff to 002654a
002654a to 9ba056e
Thanks for enhancing the code. Looks good, please go ahead.
This PR adds the capability of setting a timeout to collector_http and collector_mail_url.
If a timeout occurs, the request is retried up to two more times. If the request fails three times in a row, the collector starts processing again once the rate_limit interval has elapsed.
This PR is in conflict with #835 (comment).
This PR fixes #854 and #666.
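The retry behaviour described above can be sketched as a small standalone function. This is a minimal illustration, not the code merged in this PR: `fetch_with_timeout_retries` and `flaky_download` are hypothetical names, and the built-in `TimeoutError` stands in for `requests.exceptions.Timeout`, which a real collector would pass as `timeout_error`.

```python
from typing import Callable, Optional, Type, TypeVar

T = TypeVar("T")


def fetch_with_timeout_retries(
        fetch: Callable[[], T],
        max_tries: int = 3,
        timeout_error: Type[BaseException] = TimeoutError) -> Optional[T]:
    """Call fetch() until it succeeds or has timed out max_tries times.

    Returns the result of fetch(), or None if every attempt timed out.
    """
    tries = 0
    while tries < max_tries:
        try:
            return fetch()
        except timeout_error:
            tries += 1
            print("Timeout whilst downloading the report "
                  "(attempt %d of %d)." % (tries, max_tries))
    # Timed out too often: give up for this run, so the caller can wait
    # for the next rate_limit interval before trying again.
    return None


attempts = []


def flaky_download() -> str:
    """Simulated download: times out twice, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError("simulated timeout")
    return "report body"


result = fetch_with_timeout_retries(flaky_download, max_tries=3)
```

With `max_tries=3` the simulated download succeeds on the third attempt. A download that always times out would yield `None` instead of raising.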