Fix splitting inside parsed header extraction. #962

Closed
wants to merge 2 commits into from

@ch0wm3in ch0wm3in commented Mar 25, 2021

I don't know what originally caused this. The code comment says the body was also being parsed, but in my testing with a few different eml files I could not reproduce that. The original dev was probably using an old version, and later commits carried the regression forward. In previous commits I can see the split was done on \r\n\r\n or \n\n.

The email.parser lib already parses the whole header so there is no need for splitting inside the extracted headers.
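As a rough illustration of that claim, here is a minimal sketch using the standard library's `email.parser.HeaderParser`. The message text is made up for the example (it is not taken from a real mail), and this is not the project's actual code:

```python
from email.parser import HeaderParser

# Hypothetical message, modeled loosely on an Office 365 mail.
raw_message = (
    "ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed;\r\n"
    " d=microsoft.com; s=[REDACTED];\r\n"
    " h=From:Date:Subject:Message-ID:Content-Type:MIME-Version;\r\n"
    "From: sender@example.invalid\r\n"
    "Subject: hello\r\n"
    "Content-Type: text/plain\r\n"
    "\r\n"
    "body text\r\n"
)

# HeaderParser stops structured parsing after the header block.
msg = HeaderParser().parsestr(raw_message)

# Each header is available by name; folded lines stay part of one value.
arc_value = msg["ARC-Message-Signature"]
subject = msg["Subject"]

# The unparsed body is still kept on the message object as its payload.
body = msg.get_payload()
```

Headers are looked up by name here, so a header name that appears inside another header's value (like `Content-Type` inside the `h=` tag) does not get confused with the real header. Note that the message object still carries the body as its payload.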

Currently the 'content-type:' split causes mail from Office 365 to be parsed incorrectly by the program, as this ARC header shows (the arrows are not actual syntax, they only mark the match):
arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed;
d=microsoft.com; s=[REDACTED];
h=From:Date:Subject:Message-ID:------->Content-Type:<----------MIME-Version:X-MS-Exchange-SenderADCheck;

The match above is 8 lines into the header, whereas the original header is 224 lines long, so most of the header is cut off.
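The truncation can be reproduced with a small sketch. The helper `split_on_marker` and the header text are hypothetical; they only assume the split is a case-insensitive search for the marker string, which may differ from what the project does:

```python
# Hypothetical header block, modeled on the ARC header quoted above.
raw_headers = (
    "Received: from example.invalid\r\n"
    "ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed;\r\n"
    " d=microsoft.com; s=[REDACTED];\r\n"
    " h=From:Date:Subject:Message-ID:Content-Type:MIME-Version;\r\n"
    "From: sender@example.invalid\r\n"
    "Subject: hello\r\n"
    "Content-Type: text/plain\r\n"
)


def split_on_marker(headers: str, marker: str) -> str:
    """Return the text before the first case-insensitive match of marker.

    Hypothetical helper, not the project's function.
    """
    index = headers.lower().find(marker)
    return headers if index == -1 else headers[:index]


# The first match is inside the ARC "h=" tag, not the real header.
truncated = split_on_marker(raw_headers, "content-type:")
```

In this sketch `truncated` ends in the middle of the ARC header, so the `From:` and `Subject:` lines that follow it are lost.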

@ch0wm3in
Contributor Author

I've investigated this further and the above is not true: the header parser will sometimes take content along with it, for unknown reasons.
A better solution would be to add a trailing space to the 'content-type:' search string, making it 'content-type: '.
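A minimal sketch of that idea, under the same assumption that the split is a case-insensitive search for a marker string. `split_on_marker` and the header text are hypothetical and not taken from the project:

```python
# Hypothetical header block, modeled on the ARC header from the description.
raw_headers = (
    "ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed;\r\n"
    " d=microsoft.com; s=[REDACTED];\r\n"
    " h=From:Date:Subject:Message-ID:Content-Type:MIME-Version;\r\n"
    "From: sender@example.invalid\r\n"
    "Subject: hello\r\n"
    "Content-Type: text/plain\r\n"
)


def split_on_marker(headers: str, marker: str) -> str:
    """Return the text before the first case-insensitive match of marker.

    Hypothetical helper, not the project's function.
    """
    index = headers.lower().find(marker)
    return headers if index == -1 else headers[:index]


# With the trailing space, "Content-Type:MIME-Version" in the "h=" tag
# no longer matches; the first match is the real "Content-Type: " header.
kept = split_on_marker(raw_headers, "content-type: ")
```

This keeps the ARC header and the lines after it intact in the sketch. It is still a heuristic: a mail that writes the header as `Content-Type:text/plain` with no space would not match at all.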

@ch0wm3in ch0wm3in changed the title from "Fix unnecessary splitting inside an already parsed header extraction." to "Fix splitting inside parsed header extraction." Jun 14, 2021
@ch0wm3in ch0wm3in closed this Jun 14, 2021