Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow fetching of licenses (cdxgen -t dotnet) #352

Closed
andreycha opened this issue Jun 13, 2023 · 9 comments · Fixed by #358
Closed

Slow fetching of licenses (cdxgen -t dotnet) #352

andreycha opened this issue Jun 13, 2023 · 9 comments · Fixed by #358

Comments

@andreycha
Copy link

andreycha commented Jun 13, 2023

Hi,

I'm considering to switch from cyclonedx-dotnet to cdxgen when generating BOM for .NET applications which also contain frontend dependencies. With cyclonedx-dotnet I'm specifying Github username and token to include package information and licenses in particular into resulting BOM file.

So first I've tried to generate BOM for .NET dependencies using cdxgen: I set FETCH_LICENSE and GITHUB_TOKEN variables and ran cdxgen -t dotnet -o .... And it runs very slowly, at least x10 comparing to cyclonedx-dotnet.

I've set SCAN_DEBUG_MODE=debug and noticed few things comparing to cyclonedx-dotnet output:

  1. cdxgen seems to fetch package information for every project. In many cases there are lots of duplicated packages between projects which leads to querying same package multiple times. cyclonedx-dotnet first gathers all packages, deduplicates them, and then fetches licenses.
  2. cdxgen fetches information for much more packages/libraries comparing to cyclonedx-dotnet. Looks like cyclonedx-dotnet somehow aggregates packages into groups and then fetches information for the whole group, e.g. Retrieving GitHub license for repository dotnetcore/NPOI and ref master - URL: https://api.github.com/repos/dotnetcore/NPOI/license?ref=master while cdxgen does it for every single package, e.g.:

Querying nuget for DotNetCore.NPOI
Querying nuget for DotNetCore.NPOI.Core
Querying nuget for DotNetCore.NPOI.OpenXml4Net
Querying nuget for DotNetCore.NPOI.OpenXmlFormats

And then it gets multiplied by a number of projects where they are referenced.

  1. cdxgen also queries information for every single system dependency, while cyclonedx-dotnet doesn't seem to do that (not sure which behavior is more correct here), e.g.:

Querying nuget for System.Linq
Querying nuget for System.Linq.Expressions
Querying nuget for System.Net.Http
Querying nuget for System.Net.Primitives
Querying nuget for System.Net.Sockets
Querying nuget for System.ObjectModel
Querying nuget for System.Reflection

It again gets multiplied by a number of projects where they are referenced.

  1. cdxgen tries to query information also for internal packages which doesn't make sense. cyclonedx-dotnet doesn't do that, but I guess it's related to the previous observation. Anyway, it would be nice to have an ability to exclude some packages from fetching, e.g. by name like MyCompany.*.

I initially thought that cdxgen just serves like a proxy to a set of concrete tools, but looks like it has its own implementation at least when it comes to fetching package information. Would it be possible to improve the performance here?

Meanwhile I guess I'd be better off with running cyclonedx-dotnet to obtain a BOM file with .NET deps, then cdxgen to obtain a BOM file with frontend only dependencies, and then merge them.

Thank you!

@andreycha andreycha changed the title Slow fetching of licenses Slow fetching of licenses (cdxgen -t dotnet) Jun 13, 2023
@prabhu
Copy link
Collaborator

prabhu commented Jun 13, 2023

@andreycha Great ticket. Will definitely look into this. Do you happen to have any sample project to test this?

@andreycha
Copy link
Author

cyclonedx-dotnet repo can be used as a sample project, there you already see all the items I've described.

One more note: while the ticket is described as .NET specific, it looks to me like some of these items might benefit all project types.

@prabhu
Copy link
Collaborator

prabhu commented Jun 19, 2023

@andreycha do you have a project where cdxgen is slower than cyclonedx-dotnet? Looking at the code, cdx-dotnet is first calling nuget to get the github license url, then queries github to get the spdx id. cdxgen in contrast is only querying nuget and then sets the license name as CUSTOM.

https://github.com/CycloneDX/cyclonedx-dotnet/blob/master/CycloneDX/Services/NugetV3Service.cs#L209

Also, the grouping feature you noticed is merely multiple libraries specifying the same vcs url. Let me see if there is a way to group and reduce lookups without losing accuracy.

@andreycha
Copy link
Author

@prabhu cyclonedx-dotnet is such a project :). When I run it over itself, it runs several times faster than cdxgen (for bigger project difference is even bigger).

@prabhu
Copy link
Collaborator

prabhu commented Jun 19, 2023

@andreycha Sorry didn't build the project. Got the issue now.

@prabhu
Copy link
Collaborator

prabhu commented Jun 19, 2023

@andreycha PR 358 is ready for your review. If you could test the branch with multiple repos and compare the result with cyclonedx-dotnet, that would be awesome!

#358

prabhu added a commit that referenced this issue Jun 19, 2023
@andreycha
Copy link
Author

andreycha commented Jun 20, 2023

@prabhu thanks for a rapid fix! Changes looks good to me. What's the best way to compile and run it locally? Docker Compose?

@prabhu
Copy link
Collaborator

prabhu commented Jun 20, 2023

@andreycha, Yes docker compose is a good idea. Alternatively, you can git clone cdxgen repo and switch to the branch fix/issue-352 . Or use github cli as shown

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally?platform=linux&tool=cli

Then npm install followed by invoking cdxgen using bin/cdxgen -t csharp ...

@prabhu
Copy link
Collaborator

prabhu commented Jun 21, 2023

@andreycha, any luck with the testing?

prabhu added a commit that referenced this issue Jun 21, 2023
prabhu added a commit that referenced this issue Jun 21, 2023
prabhu added a commit that referenced this issue Jun 22, 2023
prabhu added a commit that referenced this issue Jun 22, 2023
prabhu added a commit that referenced this issue Jun 26, 2023
prabhu added a commit that referenced this issue Jun 26, 2023
prabhu added a commit that referenced this issue Jun 26, 2023
* Fixes #352. Cache nuget metadata lookup with some safe defaults

Signed-off-by: Prabhu Subramanian <[email protected]>

---------

Signed-off-by: Prabhu Subramanian <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants