Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add four additional performance rules #691

Merged
merged 22 commits into from
Oct 25, 2022

Conversation

dmachaj
Copy link
Collaborator

@dmachaj dmachaj commented Aug 8, 2022

Closes #677
Closes #679
Closes #680
Closes #681

Why is this change being made?

This change is continuing the pattern established with rule BA6001. Four additional performance rules have been added, all of which are at the Warning level.

Briefly summarize what changed

The following new rules were added with this change. BA6003 is a bit more controversial so I am saving it for a future PR. The pattern established by BA6001 was followed and the changes were mostly contained within the new rule implementations. The other changes were largely in the Compiler/Linker command line parser objects.

I also refactored the "Is compiled with MSVC?" detection out of the SourceLink rule and into a shared location so that all of the perf rules can reference it. The compiler/linker options are MSVC specific so it doesn't make sense to flag them for other compilers.

BA6002.EliminateDuplicateStrings

Applies to both Release and Debug. Looks for the /GF compiler option.

BA6004.EnableCOMDATFolding

Applies to Release builds. Warns if it is not enabled. Also has a bonus check for Debug builds that enable COMDAT folding (which makes debugging harder).

BA6005.EnableOptimizeReferences

Applies to both Release and Debug. Looks for /opt:ref linker option.

BA6006.EnableLinkTimeCodeGeneration

Applies to Release builds.

How was this change tested?

This change was tested with a combination of unit and functional tests. The linker and compiler command line parsing are finnicky, and I kept having subtle errors crop up during local testing, so I added unit test coverage to these classes. They turn a string into some bools so they are amenable to good unit testing.

The functional tests were expanded with a Pass/Fail/NotApplicable for each new rule. Adding those tests found a few more subtle problems that were fixed. Yay for testing.

Comment on lines 971 to 980
foreach (DisposableEnumerableView<Symbol> omView in pdb.CreateObjectModuleIterator())
{
Symbol om = omView.Value;
if (om.GetObjectModuleDetails().COMDATFoldingEnabled)
{
// There should be exactly one symbol in the binary containing knowledge about the linker command-line. If that one symbol
// says that OPT:ICF was enablelsd then the whole PE had it enabled too.
return true;
}
}

Check notice

Code scanning / CodeQL

Missed opportunity to use Select

This foreach loop immediately maps its iteration variable to another variable [here](1) - consider mapping the sequence explicitly using '.Select(...)'.
Comment on lines +986 to +995
foreach (DisposableEnumerableView<Symbol> omView in pdb.CreateObjectModuleIterator())
{
Symbol om = omView.Value;
if (om.GetObjectModuleDetails().OptimizeReferencesEnabled)
{
// There should be exactly one symbol in the binary containing knowledge about the linker command-line. If that one symbol
// says that OPT:REF was enabled then the whole PE had it enabled too.
return true;
}
}

Check notice

Code scanning / CodeQL

Missed opportunity to use Select

This foreach loop immediately maps its iteration variable to another variable [here](1) - consider mapping the sequence explicitly using '.Select(...)'.
Comment on lines 74 to 113
foreach (DisposableEnumerableView<Symbol> omView in pdb.CreateObjectModuleIterator())
{
Symbol om = omView.Value;
ObjectModuleDetails omDetails = om.GetObjectModuleDetails();

if (omDetails.Language != Language.C &&
omDetails.Language != Language.Cxx &&
omDetails.Language != Language.MASM)
{
continue;
}

if (!omDetails.HasDebugInfo)
{
continue;
}

bool isMSVC = (omDetails.WellKnownCompiler == WellKnownCompilers.MicrosoftC ||
omDetails.WellKnownCompiler == WellKnownCompilers.MicrosoftCxx);
if (isMSVC)
{
if (!omDetails.EliminateDuplicateStringsEnabled)
{
CompilandRecord record = om.CreateCompilandRecord();
if (!string.IsNullOrEmpty(record.Library))
{
compilandsLibraryWithoutStringPooling.Add(omDetails);
}
else
{
compilandsBinaryWithoutStringPooling.Add(omDetails);
}
}
else
{
int i = 0;
i++;
}
}
}

Check notice

Code scanning / CodeQL

Missed opportunity to use Select

This foreach loop immediately maps its iteration variable to another variable [here](1) - consider mapping the sequence explicitly using '.Select(...)'.
@@ -919,6 +919,14 @@ public bool IsMostlyOptimized(Pdb pdb)
Symbol om = omView.Value;
ObjectModuleDetails moduleDetails = om.GetObjectModuleDetails();

// Many non-code-file modules will pass through here. Referenced libraries will show up as LINK, for example. Exclude those
// from consideration to reduce skew.
if ((moduleDetails.Language != Language.C) &&
Copy link
Collaborator Author

@dmachaj dmachaj Aug 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change greatly improved the accuracy of the "is optimized?" check. At least for a small sample project the majority of modules were libraries and those do not have relevant compiler options. #Resolved

if (ArgumentStartsWith(argument, "debug"))
// There are multiple /debug options so use StartsWith. There is also /debugtype: so we must be careful not
// to over-match against that.
if (ArgumentStartsWith(argument, "debug:") || ArgumentEquals(argument, "debug"))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes a pre-existing but where /debugtype over-matched.

{
optLbr = true;
// "The /OPT arguments may be specified together, separated by commas. For example, instead of
// /OPT:REF /OPT:NOICF, you can specify /OPT:REF,NOICF."
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a bug that this parsing did not already support specifying multiple /OPT options with commas.

@lgtm-com
Copy link

lgtm-com bot commented Aug 12, 2022

This pull request introduces 1 alert when merging 027e37b into 257e9cc - view on LGTM.com

new alerts:

  • 1 for Dereferenced variable may be null

}

return;
}
Copy link
Contributor

@marmegh marmegh Sep 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to nest these if statements? The outer if doesn't check anything additional to the inner ones. #WontFix

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I referenced a pre-existing rule that calls out objs/libs by name and it had this same logic (BA2004). The return statement does prevent a Pass from being logged so this does seem to be meaningful. The if blocks are independent so if/elseif/else is not appropriate.

@dmachaj
Copy link
Collaborator Author

dmachaj commented Sep 19, 2022

@marmegh @michaelcfanning it is now hackathon week so I have some time to revisit these changes and make any necessary edits to get them ready for merging. Please let me know if there is anything that you'd like to see. Thanks.

@marmegh
Copy link
Contributor

marmegh commented Sep 19, 2022

I've added a handful of comments, mostly around cleaning up comments and user facing strings. Next step for me is to enlist in your branch and step through the code. Then, I'll ask Michael to give a final review before we merge. This looks about 99% done, minor cleanups and we should be good to merge. @dmachaj


In reply to: 1251303961

marmegh
marmegh previously approved these changes Sep 22, 2022
Copy link
Contributor

@marmegh marmegh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@marmegh marmegh dismissed their stale review September 22, 2022 23:28

revoking review

/// <remarks>
/// This isn't as simple as looking for the well known compiler
/// values because both rust and clang binaries can link with
/// C runtime libraries compiled with MSVC.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MSVC

ouch

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it is worth this whole block was cut/paste from BA2027.EnableSourceLink.cs. I needed the same answer and a valid implementation existed and could be made more widely visible.


private const string AnalyzerName = RuleIds.EliminateDuplicateStrings + "." + nameof(EliminateDuplicateStrings);

public override AnalysisApplicability CanAnalyzePE(PEBinary target, Sarif.PropertiesDictionary policy, out string reasonForNotAnalyzing)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CanAnalyzePE

So, is CanAnalyzePE a shared implementation for all the perf rules?

If so, maybe create a base class with a shared implementation.

Named something like

MsvcCompiledBinaryAndPdbBase
MicrosoftNativeCompilerGeneratedBinaryAndPdbBase

according to the rename choice you make.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are not all identical (although there is a lot of overlap). The notable difference is that some rules only apply to Release builds whereas others apply to both Release and Debug.

For what it is worth I would much prefer a flat helper instead of any sort of class hierarchy for this sort of thing. Some sort of CanAnalyzePEForPerformance(bool allowDebug) helper would do the trick. I just need a reasonable place to put it.

{
if (compilandsLibraryWithoutStringPooling.Count > 0)
{
GenerateCompilandsAndLog(context, compilandsLibraryWithoutStringPooling);
Copy link
Member

@michaelcfanning michaelcfanning Oct 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GenerateCompilandsAndLog

There's nothing wrong with this pattern but I note that currently we can only configure a single severity for this check. A common pattern is for results that are directly relevant to a binary are 'errors' (because these are immediately fixable/under control of the user) while library dependencies remain as 'warnings' (because it is harder to solicit fixes for dependencies).

Note action here, just a thought to keep in mind... #Pending

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree that the compiland list is a tricky one. I tried doing the same thing for another rule (LTCG and the /GL option?) but the spew for things outside your control was too large so the signal/noise ratio was not good enough.

Copy link
Member

@michaelcfanning michaelcfanning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@@ -5,6 +5,8 @@
* BUGFIX: Fix `error ERR997.ExceptionLoadingPdb : '[filename]' was not evaluated because its PDB could not be loaded (E_PDB_NOT_FOUND).` when reading PE file built with `PDBPageSize:8192` or greater, by upgrading msdia140.dll from `14.27.28826.96` to `14.32.31326.0`. [685](https://github.com/microsoft/binskim/pull/685)
* FEATURE: Add BA3031.EnableClangSafeStack, rename BA3030.UseCheckedFunctionsWithGcc to BA3030.UseGccCheckedFunctions [#663](https://github.com/microsoft/binskim/pull/663)
* Bump Sarif.Sdk by updating submodule from [fc9a9df to 698adb6](https://github.com/microsoft/sarif-sdk/compare/fc9a9dfb865096b5aaa9fa3651854670940f7459...698adb6365a242c6bb75adde56e3bd4be39c21d7). [#674](https://github.com/microsoft/binskim/pull/674)
* Introduce first performance rule BA6001.DisableIncrementalLinkingInReleaseBuilds [#667](https://github.com/microsoft/binskim/pull/667)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduce

Fantastic changes, David. I've marked this approved but will wait until you have a chance to address my renaming/other minor suggestions before merging. Give me an ack when we're ready to go.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. I'm just getting back from a short vacation so I'll need some time to look through the feedback in detail. In the meantime I've merged the head of main into my branch so it is at least up to date in that regard.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, besides a minor point on switching to a different logging overload for failure cases (that specifies the severity explicitly), my feedback is mostly around renaming. If you're under the gun and want me to simply go in and try to address my concerns, let me know, it wouldn't take long.

Of course, if you want time/inclination to discuss naming conventions, let's go. :) It's surprising how much thinking has to go into this topic.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no particular hurry on my side. I would like to get to it today it will just be a few hours because I have other catching up to do.

Naming things is always a challenge :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I think I addressed all of your feedback. The one exception is whether there should be more code sharing for the "CanAnalyzePE" logic in the performance rules. We can discuss that one further and decide if further changes are appropriate. Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, two small nits, update the rules metadata and expand the rule ids in the release notes to meet our standard, encapsulated in back-ticks, both the opaque and readable rule names are explicit. Once this is done, we'll merge.

We can go tinker in future with various things as required, based on testing, output review, etc.

@michaelcfanning michaelcfanning enabled auto-merge (squash) October 25, 2022 17:14
@michaelcfanning michaelcfanning merged commit ea0daa8 into main Oct 25, 2022
@michaelcfanning michaelcfanning deleted the user/dmachaj/mo-rules-mo-problems branch October 25, 2022 17:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants