Fix splitting inside parsed header extraction. #962

ch0wm3in · 2021-03-25T11:36:32Z

I don't know what originally caused this (apart from the comment about the body also being parsed). In my testing with a few different .eml files I could not reproduce that behaviour; the original developer was probably using an older version of the library, and later commits carried the regression forward. Previous commits show that the headers used to be split on \r\n\r\n or \n\n.

The email.parser library already parses the whole header, so there is no need for further splitting inside the extracted headers.
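
As a rough illustration of what the standard-library parser returns, the sketch below feeds a small made-up message to `email.parser.HeaderParser`. The sample message and all names in it are assumptions for illustration; they are not taken from the analyzer's `parse.py` or from a real Office 365 mail.

```python
from email.parser import HeaderParser

# Hypothetical sample message; addresses and values are made up.
raw_message = (
    "ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed;\r\n"
    " d=microsoft.com; s=example;\r\n"
    " h=From:Date:Subject:Message-ID:Content-Type:MIME-Version;\r\n"
    "From: sender@example.com\r\n"
    "Subject: test\r\n"
    "Content-Type: text/plain; charset=utf-8\r\n"
    "\r\n"
    "Hello\r\n"
)

# HeaderParser parses the header block and keeps the rest as an unparsed payload.
message = HeaderParser().parsestr(raw_message)

# Collect the header names in the order they appear in the message.
header_names = [name for name, _ in message.items()]
print(header_names)
```

In this sample each header comes back as its own name/value pair, and the folded ARC header stays in one piece, including the `Content-Type:` entry inside its `h=` list.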

Currently, splitting on 'content-type:' causes mails from Office 365 to be parsed incorrectly, because the same string also appears inside the ARC headers:
arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed;
d=microsoft.com; s=[REDACTED];
h=From:Date:Subject:Message-ID:------->Content-Type:<----------MIME-Version:X-MS-Exchange-SenderADCheck; (the arrows are not part of the header; they only highlight the match)

The excerpt above is 8 lines into the header, whereas the original header is 224 lines long.
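
The effect can be reproduced with a minimal sketch. It assumes the extraction cuts the header text at the first case-insensitive match of `content-type:`; the helper `truncate_at` and the sample headers are hypothetical and are not the analyzer's actual code.

```python
# Hypothetical header block; values are made up for illustration.
raw_headers = (
    "ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed;\r\n"
    " d=microsoft.com; s=example;\r\n"
    " h=From:Date:Subject:Message-ID:Content-Type:MIME-Version;\r\n"
    "From: sender@example.com\r\n"
    "Subject: test\r\n"
    "Content-Type: text/plain; charset=utf-8\r\n"
)


def truncate_at(headers: str, marker: str) -> str:
    """Cut the header text at the first case-insensitive match of marker."""
    index = headers.lower().find(marker)
    return headers if index == -1 else headers[:index]


# The first match sits inside the ARC "h=" list, not at the real header.
truncated = truncate_at(raw_headers, "content-type:")
print(truncated)
```

Here the cut lands in the middle of the ARC signature, so the `From` and `Subject` headers that follow it are lost.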

ch0wm3in · 2021-06-14T14:58:10Z

I've investigated this further, and the above is not true: the HeaderParser will sometimes include body content along with the headers, for unknown reasons.
The better solution would be to add a trailing space to the 'content-type:' search string, making it 'content-type: '.
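
A small sketch of why the trailing space helps, again assuming the cut happens at the first case-insensitive match of the marker. The helper `cut_before` and the sample headers are hypothetical.

```python
# Hypothetical header block; values are made up for illustration.
raw_headers = (
    "ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed;\r\n"
    " d=microsoft.com; s=example;\r\n"
    " h=From:Date:Subject:Message-ID:Content-Type:MIME-Version;\r\n"
    "From: sender@example.com\r\n"
    "Subject: test\r\n"
    "Content-Type: text/plain; charset=utf-8\r\n"
)


def cut_before(headers: str, marker: str) -> str:
    """Return the text before the first case-insensitive match of marker."""
    index = headers.lower().find(marker)
    return headers if index == -1 else headers[:index]


# Old marker: matches inside the ARC "h=" list and cuts too early.
old_cut = cut_before(raw_headers, "content-type:")

# Proposed marker: "Content-Type:MIME-Version" has no space after the colon,
# so the first match is the real Content-Type header.
new_cut = cut_before(raw_headers, "content-type: ")

print(len(old_cut), len(new_cut))
```

This relies on the signer not writing a space after the colon inside the `h=` list; a signature that did so would still trigger the early cut.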

ch0wm3in mentioned this pull request May 26, 2021

[Bug][EMLParser] incomplete headers #976

Closed

Update parse.py

ac926b8

ch0wm3in changed the title ~~Fix unnecessary splitting inside an already parsed header extraction.~~ Fix splitting inside parsed header extraction. Jun 14, 2021

ch0wm3in closed this Jun 14, 2021
