specification #5

Open
williballenthin opened this issue Jan 24, 2019 · 3 comments

williballenthin commented Jan 24, 2019

having been privy to a lot of the imphash development over recent years, a point of friction was that the initial research and discussion centered on a specific python implementation. its usage gained momentum before a clear specification of how the imphash should be computed was published. this made it difficult to compute an imphash from another language (you have to adhere to python's idiosyncrasies if you want to match across vendors).

with this in mind, I'd like to recommend that a specification for the RichPE hash be written down before it becomes too late. specifically, lines 38-48 rely on the way python formats integers. could we define this in a way that's language independent?

one option might be to hash the raw bytes that back the flags/fields. alternatively, maybe provide concrete format specifiers that can be passed to sprintf/etc. across languages.
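To make the two options concrete, here is a minimal sketch. The helper names `hash_packed` and `hash_formatted` and the sample values are hypothetical and not taken from the RichPE code; the `%08x` specifier is only one possible choice of pinned format.

```python
import hashlib
from struct import pack


def hash_packed(values):
    """Option 1 (sketch): hash the raw bytes backing each field.

    Each value is serialized as a 4-byte little-endian unsigned integer,
    which any language can reproduce without string formatting.
    """
    md5 = hashlib.md5()
    for v in values:
        md5.update(pack("<L", v))
    return md5.hexdigest()


def hash_formatted(values):
    """Option 2 (sketch): hash text produced by a pinned format specifier.

    "%08x" is assumed here for illustration; the same specifier can be
    handed to sprintf-style functions in other languages.
    """
    md5 = hashlib.md5()
    for v in values:
        md5.update(("%08x" % v).encode("ascii"))
    return md5.hexdigest()


# hypothetical sample fields, only to exercise the two helpers
sample = [0x14C, 0x0102, 3]
print(hash_packed(sample))
print(hash_formatted(sample))
```

Either approach removes the dependence on one language's default integer formatting. The two helpers produce different digests from each other, so a specification would need to pick exactly one.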

fyi, the motivation here was that we'd been discussing internally how to use RichPE across our datasets. one option would be to extend yara to compute the RichPE (sharing upstream, of course); however, this would require doing the implementation in C, which brings up this issue...

@joyce8 joyce8 self-assigned this Jan 28, 2019
Collaborator

joyce8 commented Jan 30, 2019

Is this what you had in mind?

```python
from struct import pack

...

def get_richpe(file_path=None, data=None):

    ...

    md5 = hashlib.md5()
    # rich_fields is consumed two values at a time: compid, then count
    while len(rich_fields):
        compid = rich_fields.pop(0)
        count = rich_fields.pop(0)
        # set the low (bit_length // 2 + 1) bits of count
        mask = 2 ** (count.bit_length() // 2 + 1) - 1
        count |= mask
        # "<L": 4-byte little-endian unsigned integer
        md5.update(pack("<L", compid))
        md5.update(pack("<L", count))

    md5.update(pack("<L", pe.FILE_HEADER.Machine))
    md5.update(pack("<L", pe.FILE_HEADER.Characteristics))
    md5.update(pack("<L", pe.OPTIONAL_HEADER.Subsystem))
    # "<B": single unsigned byte
    md5.update(pack("<B", pe.OPTIONAL_HEADER.MajorLinkerVersion))
    md5.update(pack("<B", pe.OPTIONAL_HEADER.MinorLinkerVersion))
    md5.update(pack("<L", pe.OPTIONAL_HEADER.MajorOperatingSystemVersion))
    md5.update(pack("<L", pe.OPTIONAL_HEADER.MinorOperatingSystemVersion))
    md5.update(pack("<L", pe.OPTIONAL_HEADER.MajorImageVersion))
    md5.update(pack("<L", pe.OPTIONAL_HEADER.MinorImageVersion))
    md5.update(pack("<L", pe.OPTIONAL_HEADER.MajorSubsystemVersion))
    md5.update(pack("<L", pe.OPTIONAL_HEADER.MinorSubsystemVersion))

    pe.close()
    return md5.hexdigest()
```
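
For anyone porting the count handling to another language, the masking step from the snippet above can be isolated and checked on its own. This is only a sketch: the name `blur_count` is hypothetical, and the sample counts are arbitrary.

```python
from struct import pack


def blur_count(count):
    """Set the low (bit_length // 2 + 1) bits of count.

    Same arithmetic as the mask lines in the snippet above; the function
    name is made up for this example.
    """
    mask = 2 ** (count.bit_length() // 2 + 1) - 1
    return count | mask


def packed_count(count):
    """Serialize the masked count as 4 little-endian bytes, as pack("<L", ...) does."""
    return pack("<L", blur_count(count))


for c in (0, 5, 96, 100, 1000):
    print(c, "->", blur_count(c), packed_count(c).hex())
```

Counts that differ only in their low bits collapse to the same value: 96 and 100 both become 111, and 1000 becomes 1023. A port can reuse these values as test vectors, provided it computes the bit length the same way (the number of bits needed to represent the value, with 0 having bit length 0).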

@williballenthin
Author

this seems reasonable. when i have a chance (eta probably 1-2 weeks), I'll attempt to re-implement the RichPE hash in another language and report back on the experience.

Collaborator

joyce8 commented Jan 31, 2019

Thank you! I've pushed changes but will leave this issue open in the meantime.
