feat: (multiple) attachments for license texts of type "expression" #554

Open
jkowalleck opened this issue Dec 14, 2024 · 16 comments · May be fixed by #599

@jkowalleck
Member

jkowalleck commented Dec 14, 2024

Currently, we allow one text attachment per "named"/"spdx" license,
but we don't allow any text attachments for an SPDX license expression.

Request

allow multiple license text attachments per SPDX license expression

Discussion

why though?

short: not all SPDX licenses are templates.

Not all SPDX licenses are templates; some have "placeholders" that need to be filled in by whoever applies them.
Therefore, it is important to carry the actual declared license texts of a component, even when using an SPDX license expression (like MIT or GPL-3.0-or-later).
And even for template texts (like Apache-2.0), it might be required to carry license amendment texts (like a NOTICE file for Apache-2.0).

This is why license texts are needed for SPDX expressions.

why multiple texts, why not a single text?

short: expressions might consist of multiple different licenses, each having its own text

expected outcome: the specification

Have an option to carry the text for each SPDX license-id and SPDX license-ref in an SPDX license expression

intended proposed implementation

use the existing structure of an attachment, but also have a field to tell which SPDX id or ref-name it applies to.
The SPDX id MUST use the existing enum that the CycloneDX spec uses for that matter. -- https://cyclonedx.org/docs/1.6/json/#components_items_licenses_oneOf_i0_items_license_id
Name is free text -- https://cyclonedx.org/docs/1.6/json/#components_items_licenses_oneOf_i0_items_license_name
Like with existing license spec -- EITHER name OR id (XSD <xs:choice> / JSON-schema oneOf - one, not both)
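
As a rough illustration of the "EITHER name OR id" rule for the proposed text entries, here is a minimal Python sketch. The function name validate_text_entry is hypothetical, the field names follow the example in this request, and the check of "id" against the CycloneDX SPDX enum is deliberately left out, so this is not the schema itself.

```python
def validate_text_entry(entry: dict) -> None:
    """Raise ValueError unless the entry has exactly one of "id"/"name" plus string "content".

    Hypothetical helper: it does NOT verify that "id" is a member of the
    SPDX license id enum used by CycloneDX.
    """
    has_id = "id" in entry
    has_name = "name" in entry
    # oneOf / xs:choice semantics: one of the two keys, never both, never neither.
    if has_id == has_name:
        raise ValueError('exactly one of "id" or "name" is required')
    if not isinstance(entry.get("content"), str):
        raise ValueError('"content" must be a string')
```

An entry carrying both "id" and "name", or neither, is rejected; an entry with only one of them and a string "content" passes silently.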

possible results

{
  "expression": "MIT OR GPL-3.0-or-later OR LicenseRef-.amazon.com.-AmznSL-1.0",
  "acknowledgement": "declared",
  "texts": [
    {
      "id": "MIT",
      "content": "Copyright (c) 1984 Example org\n\n\nPermission is hereby granted, free of charge, to any person obtaining a copy of this software\n[...]"
    },
    {
      "id": "GPL-3.0-or-later",
      "content": "Example project\nCopyright (C) 1984 Example org\n\nThis program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.[...]"
    },
    {
      "name": "LicenseRef-.amazon.com.-AmznSL-1.0",
      "content": "Amazon Software License 1.0\n\n\nThis Amazon Software License (\"License\") governs your use, reproduction, and\n\ndistribution of the accompanying software as specified below.\n\n\n## 1. Definitions\n\n\n  \"Licensor\" means any person or entity that distributes its Work.\n[...]"
    }
  ]
}

original story:

Hi @jkowalleck ,

My impression is that with v1.5 we have a significant design flaw.

@Joerki , could you give a practical example for something that is not possible with today's design?

Regarding the separation between a license list and a single expression, I see the following issues:

With the expression, I don't see how to include a license text for a certain item of the expression.
Licenses from the SPDX license list whose text has no placeholders are not a problem.
For a standard license, this might be a problem if the license definition has placeholders in the text, e.g. for authors or a company. My colleague who deals with legal aspects says that the use of such a "template" is not sufficient as a reference; we need a "verbatim" copy of the license text (which is stored in the public repo of the component). In an attribution report (that we generate from the SBOM) we must have texts that satisfy these legal requirements, so the text must be contained in the SBOM.
This could already be a problem with 1.4 if such a license is referenced in an expression.
SPDX allows creating a custom ID (LicenseRef-*). This is also declared as an "expression", like the compound expression given as an example in the CycloneDX spec. And again: where can I specify the license text that belongs to the non-standard ID?

Example: https://metadata.ftp-master.debian.org/changelogs//main/o/openssl/openssl_3.0.15-1~deb12u1_copyright
Please note that in these files you do not find standardized IDs. Therefore you have both IDs and texts. Texts might appear in a "Files" stanza or a dedicated "License" stanza (which makes sense when licenses appear multiple times). So I don't need a reference to content outside the copyright file.
I convert the IDs to SPDX IDs of standard licenses; in the past this allowed me to end up with a proper SPDX expression. I use the aboutcode.org license list repo and extend it for us.

My conclusion:
With the license list I have the chance to provide (almost) full information when several licenses need to be considered at the same time including license texts (X AND Y).

CycloneDX limits the use of SPDX expressions to cases where the creator has to make a conclusion for a multi-licensed component where he can choose between licenses (X OR Y) that have a known, standardized text that can be taken 1:1 from its original definition.

Originally posted by @Joerki in #349

@jkowalleck
Member Author

jkowalleck commented Dec 14, 2024

I will work on a solution, planned for milestone 1.7.
All discussion and any help are welcome 👋

@jkowalleck jkowalleck self-assigned this Dec 14, 2024
@Joerki

Joerki commented Dec 16, 2024

Hi @jkowalleck,

it's great to see this item for planning in the 1.7 milestone!

To prevent confusion, the example should show "GPL-3.0-or-later" (which is the proper SPDX identifier) instead of "GPL3+" (an ID you find in Debian copyright files).

I don't know what you (CycloneDX creators) had in mind with "SPDX expression" in the CycloneDX spec context. My impression is that, in contrast to the license ID/name list, you wanted to have a counterpart that supports a compound expression, which appears in declared licenses.

The definition of "SPDX expression" by SPDX is broader and covers "simple" and "compound" expressions, e.g. user-defined licenses with "LicenseRef-[idstring]" (also named "license-ref", in contrast to "license-ids", the items in the official license list), see

https://spdx.github.io/spdx-spec/v2.3/SPDX-license-expressions/
https://spdx.github.io/spdx-spec/v3.0.1/annexes/spdx-license-expressions/#overview

To make CycloneDX usable for license compliance (including OSS), I see the need to support SPDX expressions that fully follow SPDX's definition.

A further suggestion:

To simplify attribution reports for humans that are generated from a CycloneDX SBOM, I suggest also adding an optional "name" field like we have for the "named" license. The license identifiers we have in the SPDX License List don't require it; they already have a full name. But if the full "SPDX expression" definition is considered, including "license-refs", a name should be available in the specification to be consistent with the SPDX ID/name list for reports.

For compatibility and readability I suggest sticking with the "license ID/name list" and "SPDX expression" approach, but - as said - without a usage restriction.

BR,
Jörg

@jkowalleck
Member Author

To prevent confusion, the example should show "GPL-3.0-or-later" (which is the proper SPDX identifier) instead of "GPL3+" (an ID you find in Debian copyright files).

Thank you for pointing that out. I've edited my original feature request and changed GPL3+ to the correct identifier GPL-3.0-or-later.

@jkowalleck
Member Author

To simplify attribution reports for humans that are generated from a CycloneDX SBOM, I suggest also adding an optional "name" field like we have for the "named" license. The license identifiers we have in the SPDX License List don't require it; they already have a full name. But if the full "SPDX expression" definition is considered, including "license-refs", a name should be available in the specification to be consistent with the SPDX ID/name list for reports.

Great call! I've edited the original feature request example to reflect this.
Please review.

@Joerki

Joerki commented Dec 19, 2024

Hi,

my example looks like this:

{
  "expression": "MIT OR GPL-3.0-or-later OR LicenseRef-.amazon.com.-AmznSL-1.0",
  "acknowledgement": "declared",
  "texts": [
    {
      "license-identifier": "MIT",
      "text": {
        "content": "Copyright (c) 1984 Example org\n\n\nPermission is hereby granted, free of charge, to any person obtaining a copy of this software\n[...]"
      }
    },
    {
      "license-identifier": "GPL-3.0-or-later",
      "text": {
        "content": "Example project\nCopyright (C) 1984 Example org\n\nThis program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.[...]"
      }
    },
    {
      "license-identifier": "LicenseRef-.amazon.com.-AmznSL-1.0",
      "name": "Amazon Software License",
      "text": {
        "content": "Amazon Software License 1.0\n\n\nThis Amazon Software License (\"License\") governs your use, reproduction, and\n\ndistribution of the accompanying software as specified below.\n\n\n## 1. Definitions\n\n\n  \"Licensor\" means any person or entity that distributes its Work.\n[...]"
      }
    }
  ]
}
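
With a structure like this, a consumer could check that every license named in the expression actually comes with a text. Below is a minimal Python sketch, assuming the "license-identifier" field from the example above. The helper names simple_expressions and missing_texts are hypothetical, and the tokenizer only splits on whitespace and parentheses instead of implementing the full SPDX grammar; an identifier with a trailing "+" is treated as its own item.

```python
import re

# Operator keywords as allowed by the SPDX ABNF (all upper or all lower case).
OPERATORS = {"AND", "and", "OR", "or", "WITH", "with"}


def simple_expressions(expression: str) -> list:
    """Return license ids / license refs of an expression, in order of appearance.

    Tokens following WITH are exceptions, not licenses, and are skipped.
    """
    result = []
    previous = None
    for token in re.findall(r"[^\s()]+", expression):
        if token not in OPERATORS and previous not in ("WITH", "with"):
            result.append(token)
        previous = token
    return result


def missing_texts(license_entry: dict) -> list:
    """List the items of the expression that have no entry in "texts"."""
    covered = {t["license-identifier"] for t in license_entry.get("texts", [])}
    return [s for s in simple_expressions(license_entry["expression"]) if s not in covered]
```

For an attribution report, an empty result would mean that every license in the expression has a verbatim text available inside the SBOM.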

Another significant reference:
BSI-TR-03183-2 Version 2.0.0 (10.10.2024)
Federal Office for Information Security (BSI)
https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Publications/TechGuidelines/TR03183/BSI-TR-03183-2-2_0_0.html
Chapter 6.1 (License identifiers and expressions)
They refer to the SPDX annexes about usage, and they also recommend using the ScanCode LicenseDB from AboutCode!

My conclusion:

  • Expressions and IDs (SPDX IDs, custom IDs) should not be mixed up with names and text that give additional context.
  • (Human readable) names give additional help for identification of a license inside the SBOM (to have it in the external DB like Aboutcode is not enough)
  • (human readable) names get more helpful when no reference (LicenseRef-xyz) to another source of information exists (like with SPDX ID official list or Aboutcode DB with LicenseRef-scancode-* entries that come with human readable license names)
  • The BSI gives strict rules about the identification of licenses, which means that the .licenses.license lists with SPDX ID/names are not considered applicable anymore for companies that decide to implement and reference the BSI TR.
  • License information must be understandable for humans, supporting different concepts of license declaration
    • the license declaration from authors and the output of current tools is very often not sufficient for license compliance and requires manual effort for identification and conclusion
    • the transfer of copyright and licensing information we find in different ecosystems into CycloneDX needs to be manageable by software tools and humans in a flexible fashion
      • metadata contained in packages distributed by platforms (npm, pypi, nuget etc.)
      • copyright data distributed by Linux distributions (Debian machine-readable/not m.-r. copyright data)

@jkowalleck
Member Author

jkowalleck commented Dec 19, 2024

re: #554 (comment)

your suggestion

    {
      "id": "LicenseRef-.amazon.com.-AmznSL-1.0",
      "name": "Amazon Software License",
      "content": "..."
    }

The "id" is planned to be the usual CycloneDX enum, see https://cyclonedx.org/docs/1.6/json/#components_items_licenses_oneOf_i0_items_license_id
Therefore, it is not possible to use LicenseRef-* here. This is why my example looked this way.
Changing the spec, so id may be either enum value or an arbitrary string following [%s"DocumentRef-"(idstring)":"]%s"LicenseRef-"(idstring), is not in the scope of this very ticket. Please open an extra ticket for this matter, if needed.

Current spec allows either id or name for licenses. see https://cyclonedx.org/docs/1.6/json/#components_items_licenses_oneOf_i0_items_license - OneOf/Choice -> not both
Changing the spec, so name and id may exist at the same time, is not in the scope of this very ticket. Please open an extra ticket for this matter, if needed.

@Joerki

Joerki commented Dec 20, 2024

re: #554 (comment)

your suggestion

{
  "id": "LicenseRef-.amazon.com.-AmznSL-1.0",
  "name": "Amazon Software License",
  "content": "..."
}

The "id" is planned to be the usual CycloneDX enum, see https://cyclonedx.org/docs/1.6/json/#components_items_licenses_oneOf_i0_items_license_id Therefore, it is not possible to use LicenseRef-* here.

Based on SPDX and BSI documentation referenced above, please give a proof of how CycloneDX including this approach fulfills license declaration based on legal license compliance requirements.
I do not see it with the CycloneDX spec and this enhancement.

@jkowalleck
Member Author

Based on SPDX and BSI documentation referenced above, please give a proof of how CycloneDX including this approach fulfills license declaration based on legal license compliance requirements.
I do not see it with the CycloneDX spec and this enhancement.

which aspects do you see as not feasible?

@Joerki

Joerki commented Jan 8, 2025

We are extending the existing CycloneDX specification and should take care that element names do not have different meanings depending on their context.

We (already) have:

  • "id": An SPDX-License-Identifier of a license that is present in the official SPDX license list, e.g. "0BSD"
  • "name": The name of a license, e.g. "Acme Software License"
  • "expression": An SPDX expression (currently seen as SPDX compound expression)
  • A license identified by "id" and "name" may come with text that identifies the (exact) license text
  • No license texts can be associated to expression items

What we need additionally:

a) An attribute that holds the license texts for items that are used in an expression context
b) An identifier attribute representing an SPDX identifier that is not limited to the official "SPDX license list", but complies with the "simple-expression" and "simple-expression ( %s"WITH" / %s"with" ) addition-expression" definitions of the "SPDX license expression" annex
c) The attributes' names shall not overload the meaning of an already existing one in the spec, to prevent confusion
d) Dedicated counterparts to "id", "name" and "text" are meaningful and should exist to achieve consistency with the existing definition

Point a)
The suggestion is to have a section "texts" having a "content" item representing the text.
We already have a "text" definition containing "contentType", "encoding" and "content".
I expect that a new element's context needs to be compatible with it.
An item of the list should contain the "text" in the same fashion as at other locations in the spec.

Point b)
I suggest "license-identifier", which is compatible with the current SPDX wording.

Point c)
In the "licenses.license" context, the name is a human-readable label of the license (CycloneDX's example: "Acme Software License"). Mixing this with an item that is ultimately an ID causes inconsistency and therefore carries potential risks for automated processing.

Point d)
Having a "name" makes sense for labelling a license, as with the existing licenses.license.name item.
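
To make Point a) more concrete, here is a minimal Python sketch of one list item that wraps the text in the existing attachment shape ("contentType", "encoding", "content"). The function name make_text_entry is hypothetical, the "license-identifier" key is only the name suggested in Point b), and "text/plain" is an assumed content type; none of this is settled spec.

```python
import base64


def make_text_entry(identifier: str, license_text: str, use_base64: bool = False) -> dict:
    """Build one hypothetical "texts" item that reuses the attachment structure."""
    attachment = {"contentType": "text/plain", "content": license_text}
    if use_base64:
        # Same three fields the existing "text" definition has; content is base64-encoded.
        encoded = base64.b64encode(license_text.encode("utf-8")).decode("ascii")
        attachment = {
            "contentType": "text/plain",
            "encoding": "base64",
            "content": encoded,
        }
    return {"license-identifier": identifier, "text": attachment}
```

Reusing the attachment shape would let existing tooling that already reads license texts handle the new items without a second code path.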

@Joerki

Joerki commented Jan 9, 2025

Beyond the text, why shouldn't we support all other attributes that licenses.license has, to give a true alternative to licenses.license?

@jkowalleck
Member Author

jkowalleck commented Jan 9, 2025

Beyond the text, why shouldn't we support all other attributes that licenses.license has, to give a true alternative to licenses.license?

I am not against such a proposal per se 😄 ,
but please keep the scope of this very ticket: this ticket is about licence text attachments - and nothing more

@jkowalleck jkowalleck changed the title feat: multiple attachments for licenses of type "expression" feat: (multiple) attachments for license texts of type "expression" Jan 9, 2025
@Joerki

Joerki commented Jan 10, 2025

Beyond the text, why shouldn't we support all other attributes that licenses.license has, to give a true alternative to licenses.license?

I am not against such a proposal per se 😄 , but please keep the scope of this very ticket: this ticket is about licence text attachments - and nothing more

Yes, but an alternative for "texts" is feasible to be more generic when extending the spec.

@jkowalleck
Member Author

We are extending the existing CycloneDX specification and should take care that element names do not have different meanings depending on their context.

i gave this some thought and i think you are completely right.
instead of re-using existing field names "id" and "name", we could go with a name the official SPDX license expression documents use.

https://spdx.github.io/spdx-spec/v3.0.1/annexes/spdx-license-expressions/

idstring = 1*(ALPHA / DIGIT / "-" / "." )

license-id = <short form license identifier from SPDX License List>

license-exception-id = <short form license exception identifier from SPDX License List>

license-ref = [%s"DocumentRef-"(idstring)":"]%s"LicenseRef-"(idstring)

addition-ref = [%s"DocumentRef-"(idstring)":"]%s"AdditionRef-"(idstring)

simple-expression = license-id / license-id"+" / license-ref

addition-expression = license-exception-id / addition-ref

compound-expression = (simple-expression /
  simple-expression ( %s"WITH" / %s"with" ) addition-expression /
  compound-expression ( %s"AND" / %s"and" ) compound-expression /
  compound-expression ( %s"OR" / %s"or" ) compound-expression /
  "(" compound-expression ")" )

license-expression = (simple-expression / compound-expression)

be aware: per SPDX definition, each "simple-expression" is also a "compound-expression"...

so from original ABNF, there is no name that comes to my head.
All comes down to "license-id" and "license-ref" and "license-exception-id" and "addition-ref".

I suggest "license-identifier", which is compatible with the current SPDX wording

I was unable to find this word anywhere.

How about the name "expression-atom"?
What do you think?

@Joerki

Joerki commented Feb 10, 2025

We are extending the existing CycloneDX specification and should take care that element names do not have different meanings depending on their context.

i gave this some thought and i think you are completely right. instead of re-using existing field names "id" and "name", we could go with a name the official SPDX license expression documents use.
...

simple-expression = license-id / license-id"+" / license-ref

addition-expression = license-exception-id / addition-ref

compound-expression = (simple-expression /
simple-expression ( %s"WITH" / %s"with" ) addition-expression /
compound-expression ( %s"AND" / %s"and" ) compound-expression /
compound-expression ( %s"OR" / %s"or" ) compound-expression /
"(" compound-expression ")" )

license-expression = (simple-expression / compound-expression)

be aware: per SPDX definition, each "simple-expression" is also a "compound-expression"...

Interesting thought, but I think this interpretation is misleading: A compound expression must contain simple expressions that form a compound element. So it has to be present in the definition.

But then, in license-expression = (simple-expression / compound-expression), simple-expression would be redundant.
If simple-expression were a compound-expression, license-expression = compound-expression would be sufficient. So, I see a clear distinction here.

See also the following text (with "either").

A valid <license-expression> string consists of either:

(i) a simple license expression, such as a single license identifier; or

(ii) a more complex expression constructed by combining smaller valid expressions using Boolean license operators.

so from original ABNF, there is no name that comes to my head. All comes down to "license-id" and "license-ref" and "license-exception-id" and "addition-ref".

I suggest "license-identifier", which is compatible with the current SPDX wording

I was unable to find this word anywhere.

How about the name "expression-atom"? What do you think?

The SPDX documentation uses "license identifier" in many places (also on this page), but distinguishes between license identifiers and license references in one sentence. So, simple-expression would be correct. The term "atom" does not appear in SPDX's specification, so it may lead to confusion.

@jkowalleck
Member Author

jkowalleck commented Feb 11, 2025

The suggested "license identifier" is just one of the atoms of an expression; using it here might confuse people, too.

According to the official docs, "license identifier" is not the term that fits here, I think.

https://spdx.github.io/spdx-spec/v3.0.1/annexes/spdx-license-expressions/

A license expression could be a single license identifier found on the SPDX License List; a user defined license reference denoted by the LicenseRef-[idString]; a license identifier combined with an SPDX exception; or some combination of license identifiers, license references and exceptions constructed using a small set of defined operators (e.g., AND, OR, WITH and +).

the atoms of "license-expression" are

  • license identifier
  • license reference
  • SPDX exception

these 3 things are the smallest part of an expression, right?

SPDX exception must be combined with a license identifier.
And this result is also defined to be a valid compound-expression, which itself is a license-expression.

A license identifier and a license reference are each defined as a simple-expression and are also a compound-expression, which itself is a license-expression.

This makes me think that the term license-expression is probably the best fit.

Of course, we might come up with some longer text explaining what the thing is for and how to use it - I would leave this to a pull request showcasing a possible solution.
I will draft a pull request soon, so we can annotate and discuss things right in there.
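
For illustration only, the atoms listed above can be told apart mechanically. This is a rough Python sketch based on the ABNF quoted earlier in this thread. The function name classify_atoms is hypothetical, the kind labels are taken from the ABNF rule names, and nothing is checked against the SPDX license or exception lists, so any token that is not a reference is simply assumed to be a license-id (a trailing "+" stays part of the token).

```python
import re

# idstring = 1*(ALPHA / DIGIT / "-" / ".")
_IDSTRING = r"[A-Za-z0-9.-]+"
_LICENSE_REF = re.compile(rf"(?:DocumentRef-{_IDSTRING}:)?LicenseRef-{_IDSTRING}")
_ADDITION_REF = re.compile(rf"(?:DocumentRef-{_IDSTRING}:)?AdditionRef-{_IDSTRING}")
_OPERATORS = {"AND", "and", "OR", "or", "WITH", "with"}


def classify_atoms(expression: str) -> list:
    """Return (kind, token) pairs for every non-operator token of the expression."""
    atoms = []
    after_with = False
    for token in re.findall(r"[^\s()]+", expression):
        if token in _OPERATORS:
            after_with = token in ("WITH", "with")
            continue
        if after_with:
            # The right-hand side of WITH is an addition-expression.
            kind = "addition-ref" if _ADDITION_REF.fullmatch(token) else "license-exception-id"
        elif _LICENSE_REF.fullmatch(token):
            kind = "license-ref"
        else:
            kind = "license-id"
        atoms.append((kind, token))
        after_with = False
    return atoms
```

Whatever the field ends up being called, a text attachment would presumably be keyed by tokens of the "license-id" or "license-ref" kind.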

@jkowalleck
Member Author

jkowalleck commented Feb 20, 2025

based on current discussion and ideas, I've drafted #599.
Please review, and suggest changes and improvements, so we can collaborate on something that solves this issue.
