Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide necessary support for SpotBugs SARIF #1952

Open
michaelcfanning opened this issue Jun 30, 2020 · 21 comments
Open

Provide necessary support for SpotBugs SARIF #1952

michaelcfanning opened this issue Jun 30, 2020 · 21 comments

Comments

@michaelcfanning
Copy link
Member

e.g., can we help provide a schema-driven Java OM?

spotbugs/discuss#95

@KengoTODA
Copy link

Biggest problem is that our descriptor is written in HTML like this so we cannot provide it in TEXT and MD.

It's also nice if we can generate necessary classes/interfaces from JSON schema.
But it's not mandatory; we can code classes manually.

@KengoTODA
Copy link

Biggest problem is that our descriptor is written in HTML like this so we cannot provide it in TEXT and MD.

I remember that markdown can contain HTML elements, I'll directly put our descriptors into text.markdown property as an experiment.

@KengoTODA
Copy link

You can find the .sarif report generated by the latest SpotBugs (not released yet) at here:
spotbugs/spotbugs#1184 (comment)

When you find problems, please come spotbugs/spotbugs#1185 to comment, or mention me at here. Thanks!

@KengoTODA
Copy link

I remember that markdown can contain HTML elements, I'll directly put our descriptors into text.markdown property as an experiment.

I want to confirm this really work or not. However, even when I put fullDescription.markdown property, the SARIF viewer plugin for VS Code doesn't display it. How I can verify that the current JSON file is valid or not?

Here is viewers' screenshot from my local:

Screen Shot 2020-07-08 at 18 40 23

Here is the JSON data in tool.driver.rules:

{"messageStrings":{"default":{"text":"{0} is Serializable; consider declaring a serialVersionUID"}},"id":"SE_NO_SERIALVERSIONID","shortDescription":{"text":"Class is Serializable, but doesn't define serialVersionUID"},"fullDescription":{"markdown":"\n\n  <p> This class implements the <code>Serializable<\/code> interface, but does\n  not define a <code>serialVersionUID<\/code> field.&nbsp;\n  A change as simple as adding a reference to a .class object\n    will add synthetic fields to the class,\n   which will unfortunately change the implicit\n   serialVersionUID (e.g., adding a reference to <code>String.class<\/code>\n   will generate a static field <code>class$java$lang$String<\/code>).\n   Also, different source code to bytecode compilers may use different\n   naming conventions for synthetic variables generated for\n   references to class objects or inner classes.\n   To ensure interoperability of Serializable across versions,\n   consider adding an explicit serialVersionUID.<\/p>\n\n    "},"helpUri":"https://spotbugs.readthedocs.io/en/latest/bugDescriptions.html#SE_NO_SERIALVERSIONID","properties":{"tags":["BAD_PRACTICE"]}}

Or you may check the while main.sarif.txt.
Thanks in advance!

@KengoTODA
Copy link

What is the difference between tool.driver.notifications and invocations.toolConfigurationNotifications?
I'm using driver.notifications to put errors and exceptions like below, is it OK?

{
  "version":"2.1.0",
  "$schema":"https://schemastore.azurewebsites.net/schemas/json/sarif-2.1.0-rtm.4.json",
  "runs":[
    {
      "tool":{
        "driver":{
          "name":"SpotBugs",
          "version":"Development",
          "language":"en",
          "notifications":[
            {
              "level":"error",
              "descriptor":{
                "id":"spotbugs-missing-classes"
              },
              "message":{
                "text":"Classes needed for analysis were missing: [com.github.spotbugs.MissingClass]"
              }
            }
          ]
        }
      }
    }
  ]
}

@michaelcfanning
Copy link
Member Author

michaelcfanning commented Jul 12, 2020

tool.driver.notifications are a set of reporting descriptors that provide metadata on the notifications a component might raise.

invocations.toolConfigurationNotifications and invocations.toolExecutionNotifications are instances of those notifications that actually occurred at runtime. We call the analysis equivalents rules and results. It's confusing here because we use the same term for notification metadata as well as the notifications.

toolConfigurationNotifications are errors that resolve to some sort of configuration problem. That is, they reflect a problem that can be resolved by the user properly configuring the tool (in your example, by ensuring that all classes relevant to analysis are present. toolExecutionNotifications are errors in the engine itself (i.e., bugs that need to be addressed by the tool developer). You'd use this property to store an unhandled exception or other engine correctness problem.

@KengoTODA
Copy link

OK thanks. I'm still not sure when I need to use tool.driver.notifications, I know that toolConfigurationNotifications is better place to put information about missing classes.

@michaelcfanning
Copy link
Member Author

When you want to report an actual notification that occurred to a user, such as when classes are missing, use toolConfigurationNotifications to send those messages. tool.driver.notifications is a place to add metadata about each class of notification, for example, this is where you would store a full description of the notification.

@lgolding, you are an effective explainer of the standard, want to jump in here? :)

@ghost
Copy link

ghost commented Jul 13, 2020

@KengoTODA, @michaelcfanning is correct (except that, looking at your file, I see that you have toolExecutionNotifications, not toolConfigurationNotifications.)

The SARIF Tutorials has a section on Notifications that might help. Here is an example (which I will add to the tutorial):

{
  "version": "2.1.0",
  "runs": [
    {                                          // A run object.
      "tool": {
        ...
      },
      "results": [
        ...
      ],
      "invocations": [
        {                                      // An invocation object
          "toolExecutionNotifications": [
            {                                  // A notification object.
              "level": "error",
              "descriptor": {
                "id": "spotbugs-missing-classes"
              },
              "message": {
                "text": "Classes needed for analysis were missing: [apply, iterator, ...]"
              }
            },
          ]
          "toolConfigurationNotifications": [
            {                                  // A notification object.
              ...
            },
          ]
        }
      ]
    }
  ]
}

Now, the SARIF toolComponent object does have a notification property, but it doesn't hold notifications: it holds descriptions of (metadata about) notifications. So if, for example, you wanted to provide a full description of what the notification with id NOT10001 meant, you could put it in tool.driver.notifications.

The property tool.driver.notifications used to be called ...notificationDescriptors. Towards the end of the standardization process, we went through an exercise of shortening many of the property names. In this case, it seems that the shortened name caused confusion.

@ghost
Copy link

ghost commented Jul 13, 2020

@KengoTODA I added a Notifications sample to the SARIF Tutorials. I hope you find it helpful.

I'm going through your sample SARIF output in detail and will provide more feedback later today. Thank you for doing this!

@KengoTODA
Copy link

Thanks. This is the latest example spotbugs.sarif, that was made for this PR. At least it passed the SARIF validator.

I will check your comments carefully, to try to understand which property I should use for this case.

@ghost
Copy link

ghost commented Jul 13, 2020

Kengo, thank you for the latest spotbugs.sarif. It saved me a lot of work :-)

This file is almost valid according to the schema. The only error is that the location.logicalLocation property is actually named logicalLocations (plural). Note that its value is an array.

The spec (§3.28.4) explains the reason for this:

NOTE: logicalLocations is an array because some logical locations can be expressed in more than one way. For example, the logical location of an element in an HTML document might be expressed by an XML Path expression such as /html/body/img[1] or by a CSS selector such as #logo.

After I fixed that, the latest version of the SARIF validator tool (which we will publish very soon) finds only two problems:

  1. The $schema property should have the value https://schemastore.azurewebsites.net/schemas/json/sarif-2.1.0-rtm.5.json, which is the final official version. The only difference between -rtm.4 and -rtm.5 was to correct a typo in the name of one property that you are not using.

  2. Rule messages should be expressed in complete sentences, so they should end with a period (for example, "{0} may expose internal representation by returning {1}."

I will now look by hand to see if I can make any more recommendations.

@ghost
Copy link

ghost commented Jul 14, 2020

Hi @KengoTODA, here are some more comments on the SpotBugs SARIF output.

NOTE: Many of the things that I mention here, we will enhance our validator to report automatically.

  • Do you plan to include codeFlows in your output? spotbugs/discuss#69 mentioned potential improvements to SpotBugs output related to code flows, and SARIF's code flow feature would make it easier to do.

  • In your previous sample output, you included tool notifications, including exception objects. That was very helpful. Do you plan to put them back?

  • Your logicalLocations.fullyQualifiedName (for example, edu/umd/cs/findbugs/workflow/RejarClassesForAnalysis.java:[line 220]) is actually a physical location: a file name plus a line number. You already have that information in physicalLocation. The spec defines "logical location" like this:

    logical location

    location specified by reference to a programmatic construct, without specifying the artifact within which that construct occurs

    Example: A class name, a method name, a namespace.

    The idea of logicalLocation.fullyQualifiedName is to have (for example) a namespace + a class + a method name, like System.Collections.Generic.Dictionary.Keys. In that case, you might use Keys as the name property, and the whole string as the fullyQualifiedName property.

  • The spec describes a rule id as a "stable", "opaque" identifier (see §3.49.3):

    In the case of a rule, id SHALL contain a stable identifier for the rule and SHOULD be opaque.

    By "opaque" the spec means a purely symbolic name, like SPOT1001. We suggest you assign ids such as these to your rules, and place your existing human-readable rule names in the name property:

    "tool": {
        "driver": {
            ...
            "rules": [
                {
                    "id": "SPOT1001",
                    "name": "EI_EXPOSE_REP",
                    ...
                }
            ]
        }
    }
    
  • The property names in originalUriBaseIds are numbers. The spec (§3.3.4) does say that this is legal:

    The uriBaseId property can be any string; it does not need to have any particular syntax or follow any particular naming convention.

    However, it is helpful to use a symbolic name that is meaningful to the reader. In examples, we frequently use names like SOURCE_ROOT, BIN_ROOT, or REPO_ROOT.

  • Since tool.driver.notifications is an empty array, you can omit it.

  • In a region object, if endLine is the same as startLine, you don't have to mention endLine (it defaults to startLine).

  • We recommend enclosing "dynamic content" (the placeholders like {0} in your message strings) in single quotes. The quotes make the variable content easier to spot, and we've found single quotes give a nice clean appearance. To be clear, the spec does not require this, but we did add a rule to our SARIF validator tool to recommend it.

@ghost
Copy link

ghost commented Jul 15, 2020

@KengoTODA, I want to say a little more about SARIF "opaque" rule ids. When the spec says that a rule id must be "opaque", it does not mean that it must be incomprehensible to human readers, like "SPOT1001". It just means that a generic SARIF consumer -- that is, a consumer with no special knowledge of the tool -- is not allowed to assume anything about the internal structure of the identifier. For example, a generic SARIF consumer is not allowed to assume that because the rule id starts with "EI_", it has to do with "exposing internal representation" (as I see from the documentation of EI_EXPOSE_REP, EI_EXPOSE_REP2, etc.)

SARIF opaque rule identifiers should be short (to take up less UI space) and search-engine-friendly (to make it easy for users to learn more about the rule). Your strings like EI_EXPOSE_REP are fine in those regards. I should not have encouraged you to invent additional "purely symbolic" identifiers like "SPOT1001".

If your rules have additional, human-readable names, you could put them in the rule's name property, for example:

"rules": [
  {
    "id": "EI_EXPOSE_REP",
    "name": "ExposesInternalRepresentation",

@michaelcfanning FYI

@ghost
Copy link

ghost commented Jul 26, 2020

@KengoTODA, we have now published the latest SARIF validator, which you can use to assess your SARIF output. To use the validator:

  1. Install the .NET Core SDK 3.1: https://dotnet.microsoft.com/download/dotnet-core/3.1

    This adds the .NET Core command line tool dotnet to your path.

  2. In a new terminal window, run the command:

    dotnet tool install --global Sarif.Multitool
    

    This adds the SARIF Multitool command line tool sarif to your path.

  3. In a new terminal window, run the validate command on your SpotBugs SARIF output:

    sarif validate SpotBugs.sarif
    

    This produces a list of results on the console. You can save those results as a SARIF log file:

    sarif validate SpotBugs.sarif --output SpotBugs-validation.sarif
    

The SARIF validator checks for:

  • JSON syntax errors.
  • Violations of the SARIF schema.
  • Violations of certain conditions that can't be expressed in the schema, including:
    • Actual spec requirements.
    • Recommended best practices.

We continue to add new rules to the validator and to refine existing rules. To get the latest version:

dotnet tool update --global Sarif.Multitool

Hope this helps!

@KengoTODA
Copy link

Thanks, I've tried it. It seems that it needs not SDK 3.1 but SDK 2.1, so I installed it from https://dotnet.microsoft.com/download/dotnet-core/2.1

Here is the log which was reported when I run sarif tool with SDK 3.1:

$ sarif validate Desktop/spotbugs.sarif                                                                                                    
It was not possible to find any compatible framework version
The framework 'Microsoft.NETCore.App', version '2.1.0' was not found.
  - The following frameworks were found:
      3.1.6 at [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]

You can resolve the problem by installing the specified framework and/or SDK.

The specified framework can be found at:
  - https://aka.ms/dotnet-core-applaunch?framework=Microsoft.NETCore.App&framework_version=2.1.0&arch=x64&rid=osx.10.15-x64

KengoTODA added a commit to spotbugs/spotbugs that referenced this issue Jul 30, 2020
* fix: fix JSON1005 reported by SARIF validator

error JSON1005: at "runs[0].results[0].locations[0].logicalLocation":
The schema does not define a property named "logicalLocation", and
the schema does not permit additional properties.

* fix: fix SARIF1011 reported by SARIF validator

$schema: The  property value https://schemastore.azurewebsites.net/schemas/json/sarif-2.1.0-rtm.4.json
does not refer to the final version of the SARIF 2.1.0 schema. If you are using an earlier version
of the SARIF format, consider upgrading your analysis tool to produce the final version.
If this file does in fact conform to the final version of the schema, upgrade the tool
to populate the  property with a URL that refers to the final version of the schema.

* refactor: extract the logic to decide exit code, to reuse from the SARIF reporter

* docs: note that the SARIF support is experimental

* fix: move notifications from tool.driver to invocations

microsoft/sarif-sdk#1952 (comment)
@KengoTODA
Copy link

Sorry for my lazy reply, I'll check your comments one by one.

Do you plan to include codeFlows in your output? spotbugs/discuss#69 mentioned potential improvements to SpotBugs output related to code flows, and SARIF's code flow feature would make it easier to do.

Issued as spotbugs/spotbugs#1241

In your previous sample output, you included tool notifications, including exception objects. That was very helpful. Do you plan to put them back?

It should exist even in the current version (SpotBugs 4.1.1). I need to improve the sample output to cover features in the format.

@KengoTODA
Copy link

KengoTODA commented Sep 5, 2020

Trying to generate POJO from JSON schema. It marks fields in MessageStrings object 'additional properties' so the generated JSON file keeps nothing. So we cannot keep the message template with placeholder explained as 3.49.11. Here is generated Java code:
https://gist.github.com/KengoTODA/357e483984c8ed5442da7ed4b7331907

Still not sure where this problem comes from.

@eddynaka
Copy link
Collaborator

@michaelcfanning , i think we can close this, since we were able to solve the issues in the sarif version from spotbugs. What do you think?

@eddynaka eddynaka closed this as completed Dec 1, 2020
@KengoTODA
Copy link

@eddynaka in my case, in which project I should make an issue? is it https://github.com/oasis-tcs/sarif-spec/ ?

@eddynaka
Copy link
Collaborator

eddynaka commented Dec 2, 2020

@KengoTODA I thought this was the issue that we closed in spotbugs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants