Looping on mailbox URL messages with no content #988

chorsley · 2017-06-01T13:07:44Z

IntelMQ newbie here: wanted to get feedback on the right way to do this rather than submit a pull request just yet.

I had an issue with an email report containing a URL, where the URL content is 0 bytes. This caused lib/message.py:212 to raise InvalidValue, and the bot exits and restarts.

2017-05-30 23:00:01,926 - shadowserver-url-collector - ERROR - Bot has found a problem.
Traceback (most recent call last):
  File "/opt/intelmq/venv/lib/python3.5/site-packages/intelmq-1.0.0.dev7-py3.5.egg/intelmq/lib/bot.py", line 150, in start
    self.process()
  File "/opt/intelmq/venv/lib/python3.5/site-packages/intelmq-1.0.0.dev7-py3.5.egg/intelmq/bots/collectors/mail/collector_mail_url.py", line 87, in process
    self.chunk_replicate_header):
  File "/opt/intelmq/venv/lib/python3.5/site-packages/intelmq-1.0.0.dev7-py3.5.egg/intelmq/lib/splitreports.py", line 152, in generate_reports
    report.add("raw", infile.read(), force=True)
  File "/opt/intelmq/venv/lib/python3.5/site-packages/intelmq-1.0.0.dev7-py3.5.egg/intelmq/lib/message.py", line 212, in add
    raise exceptions.InvalidValue(key, value, reason=valid_value[1])
intelmq.lib.exceptions.InvalidValue: invalid value '' (<class 'str'>) for key 'raw' is_valid returned False.

Since the problem report didn't get marked as read before the bot restart, the collector_mail_url.py bot would get permanently stuck in the mailbox trying and failing to process that one report.

A quick fix was modifying this code in collector_mail_url.py:

                        self.logger.info("Report downloaded.")

                        if len(resp.content) > 0:
                            template = self.new_report()

                            for report in generate_reports(template, io.BytesIO(resp.content),
                                                           self.chunk_size,
                                                           self.chunk_replicate_header):
                                self.send_message(report)
                        else:
                            self.logger.error("Report for {} was empty".format(url))

                        # Only mark read if message relevant to this instance,
                        # so other instances watching this mailbox will still
                        # check it.
                        mailbox.mark_seen(uid)

I'd imagine there would be more situations where we have a permanent failure in a report URL, but we never mark the problem mail as read. Perhaps we need to specify other conditions between temporary failures and a permanent inability to process a report - I'm wondering if this is the right place to do it.

The text was updated successfully, but these errors were encountered:

ghost · 2017-06-02T10:08:52Z

Hi,

Thanks for reporting that.
I think that we can treat these cases as "success" - in terms of tried to fetch - and thus mark the email as read, as we don't expect a different result for another run:

empty webpages
HTTP status codes 4xx (permanent errors)

I also think that these cases should be logged as they indicate some problem - either the configuration, on the network part or the source of the webpage. Thus warning could be appropriate - as it's an error that we can handle.
@dmth what's your opinion?

dmth · 2017-06-06T10:36:07Z

Sorry for my late reply.
@chorsley thank your for reporting this issue and providing a workaround.

Marking the mail as sent is a quick fix, but risks losing reports.
We've already seen a some shadowserver reports without content in one (or more) requests, but with content in another one.

So, contrary to @wagner-certat, I do expect a different result on another run.
We maybe want a retry-mechanism for this case.

Proposal:
Do not mark the mail as sent but write it's message-id / uuid (or another identifier the IMAP protocol is providing) and the timestamp of the last try to a list in the bot and retry to download it's contents later. If the content could not be fetched after a certain time, for instance six hours, log an ERROR which identifies the e-mail and maybe the url that was tried, mark the e-mail as read and give up.

ghost · 2017-11-15T15:23:17Z

Marking the mail as sent is a quick fix,

read, not sent

but risks losing reports.

That risk is communicated to the user by logging a warning.

### Core - `lib/message`: `Report()` can now create a Report instance from Event instances (#1225). - `lib/bot`: * The first word in the log line `Processed ... messages since last logging.` is now adaptible and set to `Forwarded` in the existing filtering bots (#1237). * Kills oneself again after proper shutdown if the bot is XMPP collector or output (#970). Previously these two bots needed two stop commands to get actually stopped. - `lib/utils`: log: set the name of the `py.warnings` logger to the bot name (#1184). ### Bots #### Collectors - `bots.collectors.mail.collector_mail_url`: handle empty downloaded reports (#988). - `bots.collectos.file.collector_file`: handle empty files (#1244). #### Parsers - Shadowserver parser: * SSL FREAK: Remove optional column `device_serial` and add several new ones. * Fixed HTTP URL parsing for multiple feeds (#1243). - Spamhaus CERT parser: * add support for `smtpauth`, `l_spamlink`, `pop`, `imap`, `rdp`, `smb`, `iotscan`, `proxyget`, `iotmicrosoftds`, `automatedtest`, `ioturl`, `iotmirai`, `iotcmd`, `iotlogin` and `iotuser` (#1254). * fix `extra.destination.local_port` -> `extra.source.local_port`. #### Experts - `bots.experts.filter`: Pre-compile regex at bot initialization. ### Tests - Ensure that the bots did process all messages (#291). ### Tools - `intelmqctl`: * `intelmqctl run` has a new parameter `-l` `--loglevel` to overwrite the log level for the run (#1075). * `intelmqctl run [bot-id] mesage send` can now send report messages (#1077). - `intelmqdump`: * has now command completion for bot names, actions and queue names in interacive console. * automatically converts messages from events to reports if the queue the message is being restored to is the source queue of a parser (#1225). * is now capable to read messages in dumps that are dictionaries as opposed to serialized dicts as strings and does not convert them in the show command (#1256). * truncated messages are no longer used/saved to the file after being shown (#1255). * now again denies recovery of dumps if the corresponding bot is running. The check was broken (#1258). * now sorts the dump by the time of the dump. Previously, the list was in random order (#1020). ### Known issues no known issues

1.0.5 ### Core - `lib/message`: `Report()` can now create a Report instance from Event instances (certtools#1225). - `lib/bot`: * The first word in the log line `Processed ... messages since last logging.` is now adaptible and set to `Forwarded` in the existing filtering bots (certtools#1237). * Kills oneself again after proper shutdown if the bot is XMPP collector or output (certtools#970). Previously these two bots needed two stop commands to get actually stopped. - `lib/utils`: log: set the name of the `py.warnings` logger to the bot name (certtools#1184). ### Bots #### Collectors - `bots.collectors.mail.collector_mail_url`: handle empty downloaded reports (certtools#988). - `bots.collectos.file.collector_file`: handle empty files (certtools#1244). #### Parsers - Shadowserver parser: * SSL FREAK: Remove optional column `device_serial` and add several new ones. * Fixed HTTP URL parsing for multiple feeds (certtools#1243). - Spamhaus CERT parser: * add support for `smtpauth`, `l_spamlink`, `pop`, `imap`, `rdp`, `smb`, `iotscan`, `proxyget`, `iotmicrosoftds`, `automatedtest`, `ioturl`, `iotmirai`, `iotcmd`, `iotlogin` and `iotuser` (certtools#1254). * fix `extra.destination.local_port` -> `extra.source.local_port`. #### Experts - `bots.experts.filter`: Pre-compile regex at bot initialization. ### Tests - Ensure that the bots did process all messages (certtools#291). ### Tools - `intelmqctl`: * `intelmqctl run` has a new parameter `-l` `--loglevel` to overwrite the log level for the run (certtools#1075). * `intelmqctl run [bot-id] mesage send` can now send report messages (certtools#1077). - `intelmqdump`: * has now command completion for bot names, actions and queue names in interacive console. * automatically converts messages from events to reports if the queue the message is being restored to is the source queue of a parser (certtools#1225). * is now capable to read messages in dumps that are dictionaries as opposed to serialized dicts as strings and does not convert them in the show command (certtools#1256). * truncated messages are no longer used/saved to the file after being shown (certtools#1255). * now again denies recovery of dumps if the corresponding bot is running. The check was broken (certtools#1258). * now sorts the dump by the time of the dump. Previously, the list was in random order (certtools#1020). ### Known issues no known issues

ghost added this to the v1.1 Feature release milestone Jun 2, 2017

ghost added bug Indicates an unexpected problem or unintended behavior component: bots needs: discussion labels Jun 2, 2017

ghost modified the milestones: 1.1.0, 1.0.4 Feb 5, 2018

ghost modified the milestones: 1.0.4, 1.0.5 Apr 20, 2018

ghost self-assigned this May 14, 2018

ghost closed this as completed in 3b0c336 May 14, 2018

ghost assigned ghost and unassigned ghost Jun 14, 2018

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Looping on mailbox URL messages with no content #988

Looping on mailbox URL messages with no content #988

chorsley commented Jun 1, 2017

ghost commented Jun 2, 2017

dmth commented Jun 6, 2017

ghost commented Nov 15, 2017

Looping on mailbox URL messages with no content #988

Looping on mailbox URL messages with no content #988

Comments

chorsley commented Jun 1, 2017

ghost commented Jun 2, 2017

dmth commented Jun 6, 2017

ghost commented Nov 15, 2017