Fix splitting inside parsed header extraction. #962
Closed
I don't know what originally caused this, apart from the comment about the body also being parsed. In my testing with a few different .eml files I could not reproduce that; the original dev was probably using an older version, and later commits carried the regression forward. I can see in previous commits that the split was originally done on \r\n\r\n or \n\n.
The email.parser lib already parses the whole header so there is no need for splitting inside the extracted headers.
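As a rough illustration of that point, here is a minimal sketch using only the standard library. The sample message is made up for the example and is not taken from a real Office 365 mail; it only shows that the parser separates headers from the body on its own and keeps a folded ARC header intact.

```python
from email import message_from_string

# Made-up sample message; the ARC header is folded and mentions
# "Content-Type:" inside its h= tag.
raw = (
    "From: alice@example.com\r\n"
    "Subject: Test\r\n"
    "ARC-Message-Signature: i=1; a=rsa-sha256;\r\n"
    "\th=From:Date:Subject:Content-Type:MIME-Version;\r\n"
    "Content-Type: text/plain\r\n"
    "\r\n"
    "Body text\r\n"
)

msg = message_from_string(raw)

# items() returns every header as a (name, value) pair, in order.
header_names = [name for name, _ in msg.items()]
arc_value = msg["ARC-Message-Signature"]
content_type = msg["Content-Type"]
```

No manual splitting is involved: the "Content-Type:" text inside the ARC header stays part of that header's value, and the real Content-Type header is still found.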
Currently, splitting on 'content-type:' causes mail from Office 365 to be parsed incorrectly by the program, as seen with ARC headers:
arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed;
d=microsoft.com; s=[REDACTED];
h=From:Date:Subject:Message-ID:------->Content-Type:<----------MIME-Version:X-MS-Exchange-SenderADCheck; (the -------> and <---------- arrows are not actual syntax, they only highlight the match)
The match above is 8 lines into the header, whereas the original header is 224 lines long.
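To make the failure mode concrete, here is a small sketch of what a case-insensitive split on 'content-type:' does to a header block like the one above. The `naive_headers` helper is hypothetical and only stands in for the kind of splitting this PR removes; it is not the project's actual code, and the sample headers (including the `selector1` value) are placeholders.

```python
import re

# Placeholder header block shaped like the Office 365 example above.
raw_headers = (
    "Received: from mail.example.com\n"
    "ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed;\n"
    " d=microsoft.com; s=selector1;\n"
    " h=From:Date:Subject:Message-ID:Content-Type:MIME-Version;\n"
    "From: alice@example.com\n"
    "Subject: Test\n"
    "Content-Type: text/plain\n"
    "\n"
)

def naive_headers(text):
    # Hypothetical helper: keep everything before the first
    # "content-type:" match, ignoring case.
    return re.split(r"content-type:", text, maxsplit=1, flags=re.IGNORECASE)[0]

truncated = naive_headers(raw_headers)

# Count the newlines that survive the split versus the full block.
kept_newlines = truncated.count("\n")
total_newlines = raw_headers.count("\n")
```

In this sketch the split stops inside the h= tag of the ARC header, so the From and Subject headers that follow it never reach the caller.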