Adding SPDX 3.0 Support
A Google Summer of Code (GSoC) 2025 report, Arthit Suriyawongkul
- The design is based on the existing architecture of ntia-conformance-checker to keep backward compatibility. The program is used by other workflows, including SPDX Online Tools at https://tools.spdx.org/app/ntia_checker/, so breaking changes are not desirable.
- Parsing is done by external libraries
- For SPDX 2 files, use spdx-tools https://github.com/spdx/tools-python
- For SPDX 3 files, use spdx-python-model https://github.com/spdx/spdx-python-model
- The main responsibility of the BaseChecker class is to walk the model and extract the information needed for conformance testing in the next step.
- The conformance testing logic is not inside BaseChecker but is implemented in its subclasses (NTIAChecker and FSCT3Checker). This separates the information extraction code from the conformance testing code, and the separation also allowed us to reuse a large chunk of code.
- The differences between the SPDX 2 and SPDX 3 models, and the related information extraction tasks (like graph traversal), are also shielded from the subclasses, as they are all handled in BaseChecker.
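The split between extraction and conformance testing can be illustrated with a minimal sketch. The class names BaseCheckerSketch and MinimumElementsCheckerSketch, the get_components_without() helper, and the dict-based component records are hypothetical simplifications, not the actual ntia-conformance-checker API. Only the idea of a base class that extracts information and subclasses that implement check_compliance() comes from the design described above.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class BaseCheckerSketch(ABC):
    """Hypothetical stand-in for BaseChecker: extraction only, no verdicts."""

    def __init__(self, sbom_name: str, components: List[Dict[str, str]]) -> None:
        self.sbom_name = sbom_name
        # Extraction runs once here; subclasses never touch the raw model.
        self.components_without_names = self.get_components_without("name", components)
        self.components_without_versions = self.get_components_without("version", components)

    @staticmethod
    def get_components_without(field: str, components: List[Dict[str, str]]) -> List[str]:
        """Return the id of every component whose `field` is missing or empty."""
        return [c.get("id", "?") for c in components if not c.get(field)]

    @abstractmethod
    def check_compliance(self) -> bool:
        """Each subclass decides what counts as conformant."""


class MinimumElementsCheckerSketch(BaseCheckerSketch):
    """Hypothetical subclass in the role of NTIAChecker or FSCT3Checker."""

    def check_compliance(self) -> bool:
        # Conformance logic reads only the attributes extracted by the base class.
        return not self.components_without_names and not self.components_without_versions


checker = MinimumElementsCheckerSketch(
    "example",
    [{"id": "pkg-a", "name": "a", "version": "1.0"}, {"id": "pkg-b", "name": "b"}],
)
print(checker.components_without_versions)  # ['pkg-b']
print(checker.check_compliance())  # False
```

Adding another standard would then mean adding another subclass with its own check_compliance(), while the SPDX 2/SPDX 3 handling stays in one place.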
The main pieces of information extracted by BaseChecker are:
- sbom_name: str
- doc_version: bool (Does the SBOM contain an SPDX specification version?)
- doc_author: bool (Does the SBOM contain an author name?)
- doc_timestamp: bool (Does the SBOM contain a creation timestamp?)
- dependency_relationships: bool (Does the SBOM contain a DESCRIBES relationship?)
  - Note that in the SPDX 3 implementation, we assume that if a rootElement of /Core/Bom is an object of type /Software/Package, it counts as the DESCRIBES relationship we are looking for.
- components_without_names: List[str]
- components_without_versions: List[str]
- components_without_suppliers: List[str]
- components_without_identifiers: List[str]
- components_without_concluded_licenses: List[str]
- components_without_copyright_texts: List[str]
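The SPDX 3 DESCRIBES rule above, and one of the components_without_* lists, might be computed roughly as follows over raw SPDX 3 JSON-LD. This is a sketch under several assumptions: it reads plain dicts rather than spdx-python-model objects; it assumes the compact type and property names of the SPDX 3.0.1 JSON-LD context (Bom, software_Sbom, software_Package, software_packageVersion); it treats software_Sbom as a Bom because it is a subclass of /Core/Bom; and it assumes rootElement holds spdxId strings. The function names are hypothetical, and the sample is trimmed, not a valid SPDX document.

```python
from typing import Any, Dict, List

# Assumed compact type names from the SPDX 3.0.1 JSON-LD context.
BOM_TYPES = {"Bom", "software_Sbom"}  # software_Sbom is a subclass of Bom
PACKAGE_TYPE = "software_Package"


def index_graph(doc: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Map spdxId -> element for every element in the @graph that has one."""
    return {e["spdxId"]: e for e in doc.get("@graph", []) if "spdxId" in e}


def has_describes_equivalent(doc: Dict[str, Any]) -> bool:
    """True if some Bom's rootElement points at a software_Package."""
    elements = index_graph(doc)
    for element in elements.values():
        if element.get("type") in BOM_TYPES:
            for root_id in element.get("rootElement", []):
                if elements.get(root_id, {}).get("type") == PACKAGE_TYPE:
                    return True
    return False


def get_packages_without_versions(doc: Dict[str, Any]) -> List[str]:
    """spdxIds of packages with a missing or empty software_packageVersion."""
    return [
        spdx_id
        for spdx_id, e in index_graph(doc).items()
        if e.get("type") == PACKAGE_TYPE and not e.get("software_packageVersion")
    ]


sample = {
    "@context": "https://spdx.org/rdf/3.0.1/spdx-context.jsonld",
    "@graph": [
        {"type": "software_Sbom", "spdxId": "urn:ex:sbom", "rootElement": ["urn:ex:pkg"]},
        {"type": "software_Package", "spdxId": "urn:ex:pkg", "name": "example"},
    ],
}
print(has_describes_equivalent(sample))  # True
print(get_packages_without_versions(sample))  # ['urn:ex:pkg']
```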
These extracted pieces of information will be consumed by the check_compliance() method in the subclasses, where the conformance testing logic resides.
For the extraction code, see the methods in BaseChecker whose names start with “check_” (which return a boolean) or “get_” (which return a value).
Note that some of these information extraction methods may never actually be executed in practice. Some fields are mandatory per the specification (for example, an identifier (spdxId) is mandatory in SPDX 3), so the parser or deserialiser may raise an error and terminate the process when they are missing; in that case, the subsequent information extraction never starts.
To capture missing mandatory fields or relationships, we may need a specialised parser that can process as much as possible of an invalid or incomplete document. This could be a project for next year.
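One possible starting point for such a tolerant parser is to record problems and keep going instead of raising on the first one. The sketch below is hypothetical (tolerant_scan is not part of any existing library) and only checks for a missing spdxId in raw SPDX 3 JSON-LD. It skips CreationInfo objects on the assumption that they are blank nodes identified by @id rather than by spdxId.

```python
import json
from typing import Any, Dict, List, Tuple


def tolerant_scan(text: str) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Collect usable elements and problems instead of stopping at the first error."""
    problems: List[str] = []
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as err:
        return [], [f"not valid JSON: {err}"]
    if not isinstance(doc, dict):
        return [], ["top-level value is not a JSON object"]

    elements: List[Dict[str, Any]] = []
    for position, element in enumerate(doc.get("@graph", [])):
        if not isinstance(element, dict):
            problems.append(f"@graph[{position}] is not an object")
            continue
        # Assumption: CreationInfo is a blank node and carries no spdxId.
        if "spdxId" not in element and element.get("type") != "CreationInfo":
            problems.append(f"@graph[{position}] ({element.get('type', '?')}) has no spdxId")
        elements.append(element)  # keep it anyway so later checks can still run
    return elements, problems


elements, problems = tolerant_scan('{"@graph": [{"type": "software_Package", "name": "a"}]}')
print(len(elements), problems)  # 1 ['@graph[0] (software_Package) has no spdxId']
```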
A number of unit tests and related test data have been introduced. The test data are adapted from the examples in the spdx-examples repo at https://github.com/spdx/spdx-examples and kept in the tests/data/spdx3 directory with a README.
New command line options and shortcuts are introduced while strictly keeping backward compatibility. Existing scripts should not be affected by this update.
The --sbom-spec option is introduced to tell the sbomcheck CLI that it is working with an SPDX 3 file, as the file extension alone may not be enough to distinguish the specification version (for example, both SPDX 2 and SPDX 3 files can have a .json extension). To use SPDX 3, add --sbom-spec spdx3 to the command. Detecting the SPDX version from the file content is a possible area for further work.
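Content-based detection could look at markers each format carries. A minimal sketch, assuming JSON input only: SPDX 2 JSON documents have a top-level spdxVersion field such as "SPDX-2.3", while SPDX 3 JSON-LD documents reference a context URL under https://spdx.org/rdf/3.x/. The function detect_spdx_spec and its "spdx2" return label are hypothetical; only "spdx3" corresponds to the --sbom-spec value described above. Other SPDX 2 serialisations (tag-value, YAML, RDF/XML) are not handled.

```python
import json
from typing import Optional


def detect_spdx_spec(text: str) -> Optional[str]:
    """Guess the SPDX major version of a JSON document from its content.

    Returns "spdx2", "spdx3", or None when the content gives no clear signal.
    """
    try:
        doc = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(doc, dict):
        return None
    # SPDX 2.x JSON carries a top-level "spdxVersion" such as "SPDX-2.3".
    if str(doc.get("spdxVersion", "")).startswith("SPDX-2."):
        return "spdx2"
    # SPDX 3.x JSON-LD references a context under spdx.org/rdf/3.
    context = doc.get("@context", "")
    contexts = context if isinstance(context, list) else [context]
    if any(isinstance(c, str) and "spdx.org/rdf/3." in c for c in contexts):
        return "spdx3"
    return None
```

Returning None rather than guessing keeps the explicit --sbom-spec option as the fallback when the content is ambiguous.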
Web interfaces, like the one at SPDX Online Tools, need to be updated to allow users to select an SPDX 3 file as input.
During the preparation before contributing directly to ntia-conformance-checker, the author spent a considerable amount of time learning how to use the spdx-python-model library, as its documentation was quite limited at the time.
Because of the limited documentation, the author chose to read the library's source code. It turns out that most of the code in spdx-python-model is generated by a tool called shacl2code https://github.com/jpewdev/shacl2code/. The tool is developed by Joshua Watt, an active SPDX contributor and also the developer of the spdx-python-model Python binding.
This preparation and learning process eventually led to two indirect contributions to the SPDX ecosystem in general:
- A Python notebook “Using SPDX 3 in Python with spdx-python-model” as a short tutorial to help people get started with SPDX 3 development in Python https://gist.github.com/bact/7227ad858500c2097a25344a4af015d6
- Improved the generated Python binding code from shacl2code. With guidance from Joshua, the author added type annotations and defensive dynamic type checks to the generated code. Static type analysis tests using widely used type checkers (mypy, Pyrefly, pyright, and pytype) were also added to the pytest suite of shacl2code to ensure better type consistency in the generated Python code going forward. The PR is here: https://github.com/JPEWdev/shacl2code/pull/50
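To illustrate what "type annotations plus defensive dynamic type checks" means in practice, here is a hand-written, hypothetical class. It is not shacl2code output and does not mirror the generated API; it only shows a property annotated for static checkers whose setter also rejects wrong types at runtime.

```python
from typing import Optional


class PackageSketch:
    """Hypothetical analogue of a generated model class (not shacl2code output)."""

    def __init__(self) -> None:
        self._package_version: Optional[str] = None

    @property
    def package_version(self) -> Optional[str]:
        return self._package_version

    @package_version.setter
    def package_version(self, value: Optional[str]) -> None:
        # Static checkers (mypy, pyright, ...) flag wrong types before running;
        # this runtime check catches values that slip through anyway.
        if value is not None and not isinstance(value, str):
            raise TypeError(f"package_version must be str or None, not {type(value).__name__}")
        self._package_version = value


pkg = PackageSketch()
pkg.package_version = "1.2.3"
try:
    pkg.package_version = 123  # a static type checker would also report this line
except TypeError as err:
    print(err)  # package_version must be str or None, not int
```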
At the end of the preparation process, the author had a prototype demonstrating the general idea of how to manipulate SPDX 3 JSON. The code is at https://github.com/bact/spdx3reader.
- The EU AI Act documentation test is not implemented yet. The author intends to continue working on this feature.
- A more detailed mapping between SPDX fields and the regulation's information obligations is still under development. The author is looking forward to implementing a basic conformance test for EU AI Act documentation, for example based on this mapping, which has been partly reviewed by the SPDX AI working group: https://docs.google.com/spreadsheets/d/15M-y6ibBeg1NtYCyNCtrVdj9fi4aINhA-aUr-ql5Y-A/edit?usp=sharing
Add SPDX 3.0 support
- #277 https://github.com/spdx/ntia-conformance-checker/pull/277
- #282 https://github.com/spdx/ntia-conformance-checker/pull/282
- #284 https://github.com/spdx/ntia-conformance-checker/pull/284
- #290 https://github.com/spdx/ntia-conformance-checker/pull/290
Improve validation message printout
- #274 https://github.com/spdx/ntia-conformance-checker/pull/274
- #276 https://github.com/spdx/ntia-conformance-checker/pull/276
Fix FSCT3 output to include copyright texts and concluded licenses information
Improve type hints
- #272 https://github.com/spdx/ntia-conformance-checker/pull/272
- #281 https://github.com/spdx/ntia-conformance-checker/pull/281
Mapping of NTIA and FSCT3 data fields to SPDX 3.0
Update license metadata in pyproject.toml
spdx-python-model tutorial
- Tutorial (Python Notebook) https://gist.github.com/bact/7227ad858500c2097a25344a4af015d6
Update license metadata in pyproject.toml
Add Python type hints to generated model code
The author would like to thank John Speed Meyers and Gary O'Neall for their mentorship; Joshua Watt for spdx-python-model, shacl2code, and the code reviews; the teams behind ntia-conformance-checker and spdx-tools (past and present) for the foundation to build upon; and the Google Summer of Code program, the Linux Foundation, the SPDX community, and the ADAPT Centre at Trinity College Dublin for the opportunity.
Most of the author's contributions during GSoC 2025 were made during conferences and travels in Bangkok, Chiang Mai, and Kuala Lumpur, as the author enjoyed the freedom of movement during his Ph.D. “breaks”. The author extends solidarity to those affected by forced emigration or those confined in occupied areas.