Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Looping on mailbox URL messages with no content #988

Closed
chorsley opened this issue Jun 1, 2017 · 3 comments
Closed

Looping on mailbox URL messages with no content #988

chorsley opened this issue Jun 1, 2017 · 3 comments
Labels
bug Indicates an unexpected problem or unintended behavior component: bots needs: discussion
Milestone

Comments

@chorsley
Copy link
Collaborator

chorsley commented Jun 1, 2017

IntelMQ newbie here: wanted to get feedback on the right way to do this rather than submit a pull request just yet.

I had an issue with an email report containing a URL, where the URL content is 0 bytes. This caused lib/message.py:212 to raise InvalidValue, and the bot exits and restarts.

2017-05-30 23:00:01,926 - shadowserver-url-collector - ERROR - Bot has found a problem.
Traceback (most recent call last):
  File "/opt/intelmq/venv/lib/python3.5/site-packages/intelmq-1.0.0.dev7-py3.5.egg/intelmq/lib/bot.py", line 150, in start
    self.process()
  File "/opt/intelmq/venv/lib/python3.5/site-packages/intelmq-1.0.0.dev7-py3.5.egg/intelmq/bots/collectors/mail/collector_mail_url.py", line 87, in process
    self.chunk_replicate_header):
  File "/opt/intelmq/venv/lib/python3.5/site-packages/intelmq-1.0.0.dev7-py3.5.egg/intelmq/lib/splitreports.py", line 152, in generate_reports
    report.add("raw", infile.read(), force=True)
  File "/opt/intelmq/venv/lib/python3.5/site-packages/intelmq-1.0.0.dev7-py3.5.egg/intelmq/lib/message.py", line 212, in add
    raise exceptions.InvalidValue(key, value, reason=valid_value[1])
intelmq.lib.exceptions.InvalidValue: invalid value '' (<class 'str'>) for key 'raw' is_valid returned False.

Since the problem report didn't get marked as read before the bot restart, the collector_mail_url.py bot would get permanently stuck in the mailbox trying and failing to process that one report.

A quick fix was modifying this code in collector_mail_url.py:

                        self.logger.info("Report downloaded.")

                        if len(resp.content) > 0:
                            template = self.new_report()

                            for report in generate_reports(template, io.BytesIO(resp.content),
                                                           self.chunk_size,
                                                           self.chunk_replicate_header):
                                self.send_message(report)
                        else:
                            self.logger.error("Report for {} was empty".format(url))

                        # Only mark read if message relevant to this instance,
                        # so other instances watching this mailbox will still
                        # check it.
                        mailbox.mark_seen(uid)

I'd imagine there would be more situations where we have a permanent failure in a report URL, but we never mark the problem mail as read. Perhaps we need to specify other conditions between temporary failures and a permanent inability to process a report - I'm wondering if this is the right place to do it.

@ghost
Copy link

ghost commented Jun 2, 2017

Hi,

Thanks for reporting that.
I think that we can treat these cases as "success" - in terms of tried to fetch - and thus mark the email as read, as we don't expect a different result for another run:

  • empty webpages
  • HTTP status codes 4xx (permanent errors)

I also think that these cases should be logged as they indicate some problem - either the configuration, on the network part or the source of the webpage. Thus warning could be appropriate - as it's an error that we can handle.
@dmth what's your opinion?

@ghost ghost added this to the v1.1 Feature release milestone Jun 2, 2017
@ghost ghost added bug Indicates an unexpected problem or unintended behavior component: bots needs: discussion labels Jun 2, 2017
@dmth
Copy link
Contributor

dmth commented Jun 6, 2017

Sorry for my late reply.
@chorsley thank your for reporting this issue and providing a workaround.

Marking the mail as sent is a quick fix, but risks losing reports.
We've already seen a some shadowserver reports without content in one (or more) requests, but with content in another one.

So, contrary to @wagner-certat, I do expect a different result on another run.
We maybe want a retry-mechanism for this case.

Proposal:
Do not mark the mail as sent but write it's message-id / uuid (or another identifier the IMAP protocol is providing) and the timestamp of the last try to a list in the bot and retry to download it's contents later. If the content could not be fetched after a certain time, for instance six hours, log an ERROR which identifies the e-mail and maybe the url that was tried, mark the e-mail as read and give up.

@ghost
Copy link

ghost commented Nov 15, 2017

Marking the mail as sent is a quick fix,

read, not sent

but risks losing reports.

That risk is communicated to the user by logging a warning.

@ghost ghost modified the milestones: 1.1.0, 1.0.4 Feb 5, 2018
@ghost ghost modified the milestones: 1.0.4, 1.0.5 Apr 20, 2018
@ghost ghost self-assigned this May 14, 2018
@ghost ghost closed this as completed in 3b0c336 May 14, 2018
@ghost ghost assigned ghost and unassigned ghost Jun 14, 2018
ghost pushed a commit that referenced this issue Jun 21, 2018
 ### Core
- `lib/message`: `Report()` can now create a Report instance from Event instances (#1225).
- `lib/bot`:
  * The first word in the log line `Processed ... messages since last logging.` is now adaptible and set to `Forwarded` in the existing filtering bots (#1237).
  * Kills oneself again after proper shutdown if the bot is XMPP collector or output (#970). Previously these two bots needed two stop commands to get actually stopped.
- `lib/utils`: log: set the name of the `py.warnings` logger to the bot name (#1184).

 ### Bots
 #### Collectors
- `bots.collectors.mail.collector_mail_url`: handle empty downloaded reports (#988).
- `bots.collectos.file.collector_file`: handle empty files (#1244).

 #### Parsers
- Shadowserver parser:
  * SSL FREAK: Remove optional column `device_serial` and add several new ones.
  * Fixed HTTP URL parsing for multiple feeds (#1243).
- Spamhaus CERT parser:
  * add support for `smtpauth`, `l_spamlink`, `pop`, `imap`, `rdp`, `smb`, `iotscan`, `proxyget`, `iotmicrosoftds`, `automatedtest`, `ioturl`, `iotmirai`, `iotcmd`, `iotlogin` and `iotuser` (#1254).
  * fix `extra.destination.local_port` -> `extra.source.local_port`.

 #### Experts
- `bots.experts.filter`: Pre-compile regex at bot initialization.

 ### Tests
- Ensure that the bots did process all messages (#291).

 ### Tools
- `intelmqctl`:
  * `intelmqctl run` has a new parameter `-l` `--loglevel` to overwrite the log level for the run (#1075).
  * `intelmqctl run [bot-id] mesage send` can now send report messages (#1077).
- `intelmqdump`:
  * has now command completion for bot names, actions and queue names in interacive console.
  * automatically converts messages from events to reports if the queue the message is being restored to is the source queue of a parser (#1225).
  * is now capable to read messages in dumps that are dictionaries as opposed to serialized dicts as strings and does not convert them in the show command (#1256).
  * truncated messages are no longer used/saved to the file after being shown (#1255).
  * now again denies recovery of dumps if the corresponding bot is running. The check was broken (#1258).
  * now sorts the dump by the time of the dump. Previously, the list was in random order (#1020).

 ### Known issues
no known issues
chorsley pushed a commit to chorsley/intelmq that referenced this issue Jul 14, 2021
1.0.5

 ### Core
- `lib/message`: `Report()` can now create a Report instance from Event instances (certtools#1225).
- `lib/bot`:
  * The first word in the log line `Processed ... messages since last logging.` is now adaptible and set to `Forwarded` in the existing filtering bots (certtools#1237).
  * Kills oneself again after proper shutdown if the bot is XMPP collector or output (certtools#970). Previously these two bots needed two stop commands to get actually stopped.
- `lib/utils`: log: set the name of the `py.warnings` logger to the bot name (certtools#1184).

 ### Bots
 #### Collectors
- `bots.collectors.mail.collector_mail_url`: handle empty downloaded reports (certtools#988).
- `bots.collectos.file.collector_file`: handle empty files (certtools#1244).

 #### Parsers
- Shadowserver parser:
  * SSL FREAK: Remove optional column `device_serial` and add several new ones.
  * Fixed HTTP URL parsing for multiple feeds (certtools#1243).
- Spamhaus CERT parser:
  * add support for `smtpauth`, `l_spamlink`, `pop`, `imap`, `rdp`, `smb`, `iotscan`, `proxyget`, `iotmicrosoftds`, `automatedtest`, `ioturl`, `iotmirai`, `iotcmd`, `iotlogin` and `iotuser` (certtools#1254).
  * fix `extra.destination.local_port` -> `extra.source.local_port`.

 #### Experts
- `bots.experts.filter`: Pre-compile regex at bot initialization.

 ### Tests
- Ensure that the bots did process all messages (certtools#291).

 ### Tools
- `intelmqctl`:
  * `intelmqctl run` has a new parameter `-l` `--loglevel` to overwrite the log level for the run (certtools#1075).
  * `intelmqctl run [bot-id] mesage send` can now send report messages (certtools#1077).
- `intelmqdump`:
  * has now command completion for bot names, actions and queue names in interacive console.
  * automatically converts messages from events to reports if the queue the message is being restored to is the source queue of a parser (certtools#1225).
  * is now capable to read messages in dumps that are dictionaries as opposed to serialized dicts as strings and does not convert them in the show command (certtools#1256).
  * truncated messages are no longer used/saved to the file after being shown (certtools#1255).
  * now again denies recovery of dumps if the corresponding bot is running. The check was broken (certtools#1258).
  * now sorts the dump by the time of the dump. Previously, the list was in random order (certtools#1020).

 ### Known issues
no known issues
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior component: bots needs: discussion
Projects
None yet
Development

No branches or pull requests

2 participants