Releases: aboutcode-org/scancode-toolkit
v31.0.1
This is a major release with important bug and security fixes, new and improved
features and API changes.
Note that we no longer support Python 3.6. Use Python 3.7+ instead.
Important API changes:
- The data structure of the JSON output has changed for copyrights, authors
  and holders. We now use a proper name for attributes and not a generic "value".
- The data structure of the JSON output has changed for packages. We now
  return "package_data" package information at the manifest file-level
  rather than "packages". This has all the data attributes of a "package_data"
  field plus others: "package_uuid", "package_data_files" and "files".
  - There is a new top-level "packages" attribute that contains package
    instances that can aggregate data from multiple manifests.
  - There is a new top-level "dependencies" attribute that contains each
    dependency instance; these can be standalone or related to a package.
    These contain a new "extra_data" object.
  - There is a new resource-level attribute "for_packages" which refers to
    packages through package_uuids (pURL + uuid string).
- The data structure for HTML output has been changed to include emails and
  urls under the "infos" object. The HTML template displays holders, authors,
  emails, and urls in separate tables, like the "licenses" and "copyrights" tables.
- The data structure for CSV output has been changed to rename the Resource
  column to "path". "copyright_holder" has been renamed to "holder". The CSV
  output is deprecated and will be replaced in the future by an improved tabular
  format.
- The license clarity scoring plugin has been overhauled to show new license
  clarity criteria. More details of the new scoring criteria are provided below.
- The functionality of the summary plugin has been improved to provide declared
  origin and license information for the codebase being scanned. The previous
  summary plugin functionality has been preserved in the new "tallies" plugin.
  More details are provided below.
- ScanCode has adopted the new code skeleton (https://github.com/nexB/skeleton).
  The key change is the location of the virtual environment. It used to be
  created at the root of the scancode-toolkit directory. It is now created
  under the "venv" subdirectory. You must be aware of this if you use ScanCode
  from a git clone.
- DatafileHandler.assemble(), DatafileHandler.assemble_from_many(), and the
  assemble() methods of the other Package handlers in packagedcode have been
  updated to yield Package items before Dependency or Resource items. This is
  particularly important when calling assemble() outside of the
  scancode-toolkit context, where we need to ensure that a Package exists
  before we associate a Resource or Dependency to it.
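To illustrate why the yield order matters, here is a minimal sketch built on hypothetical stand-ins: the Package, Dependency and Resource dataclasses and the assemble_sketch() and consume() functions are invented for this example and are not the packagedcode API. A consumer that stores items as they arrive can only link a Dependency or Resource to a Package it has already seen.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Union


# Hypothetical stand-ins for illustration only; not the packagedcode classes.
@dataclass
class Package:
    package_uid: str


@dataclass
class Dependency:
    purl: str
    for_package_uid: str


@dataclass
class Resource:
    path: str
    for_packages: List[str] = field(default_factory=list)


Item = Union[Package, Dependency, Resource]


def assemble_sketch(package_uid: str, paths: List[str]) -> Iterator[Item]:
    """Yield the Package first, then the items that refer to it."""
    yield Package(package_uid=package_uid)
    yield Dependency(purl="pkg:pypi/example-dep", for_package_uid=package_uid)
    for path in paths:
        yield Resource(path=path, for_packages=[package_uid])


def consume(items: Iterable[Item]) -> Dict[str, object]:
    """Store items in arrival order, rejecting references to unseen packages."""
    packages: Dict[str, Package] = {}
    dependencies: List[Dependency] = []
    resources: List[Resource] = []
    for item in items:
        if isinstance(item, Package):
            packages[item.package_uid] = item
        elif isinstance(item, Dependency):
            if item.for_package_uid not in packages:
                raise ValueError(f"unknown package: {item.for_package_uid}")
            dependencies.append(item)
        else:
            missing = [uid for uid in item.for_packages if uid not in packages]
            if missing:
                raise ValueError(f"unknown packages: {missing}")
            resources.append(item)
    return {"packages": packages, "dependencies": dependencies, "resources": resources}
```

With the Package yielded first, consume(assemble_sketch("pkg:pypi/example@1.0", ["setup.py"])) succeeds. Feeding the same items in reverse order raises ValueError, which is the failure mode the new ordering is meant to avoid.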
Copyright detection:
- The data structure in the JSON now uses consistently named attributes as
  opposed to plain values.
- Several copyright detection bugs have been fixed.
- French and German copyright detection is improved.
- Some spurious trailing dots in holders are now stripped.
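Code that reads scans produced by different versions may need to handle both entry shapes. The sketch below assumes that each entry in "copyrights", "holders" and "authors" now carries a "copyright", "holder" or "author" key respectively, and falls back to the older generic "value" key. These attribute names are an assumption based on the description above, and detected_texts() is a hypothetical helper; verify the names against your own JSON output.

```python
from typing import Dict, List, Optional

# Assumed mapping from each list to the attribute holding its text in v31 JSON.
ATTRIBUTE_NAMES = {"copyrights": "copyright", "holders": "holder", "authors": "author"}


def detected_texts(resource: Dict) -> Dict[str, List[Optional[str]]]:
    """Return copyright, holder and author strings for one file-level resource.

    Prefers the named attribute and falls back to the pre-v31 "value" key.
    """
    texts: Dict[str, List[Optional[str]]] = {}
    for list_key, attribute in ATTRIBUTE_NAMES.items():
        entries = resource.get(list_key, [])
        texts[list_key] = [entry.get(attribute, entry.get("value")) for entry in entries]
    return texts
```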
License detection:
- There have been significant updates to licenses and license detection rules:
  - 107 new licenses have been added (total is now 1954)
  - 6780 new license detection rules have been added (total is now 32259)
  - 6753 existing false positive license rules have been removed (see below).
  - The SPDX license list has been updated to the latest v3.17
- The rule attribute "only_known_words" has been renamed to "is_continuous" and its
  meaning has been updated and expanded. A rule tagged as "is_continuous" can only
  be matched if there are no gaps between matched words, be they stopwords, extra
  unknown or known words. This fixes several false positive license detections.
  The processing for "is_continuous" has been merged into the "key phrases"
  processing described below.
- Key phrases can now be defined in a RULE text by surrounding one or more words
  with double curly braces {{ and }}. When key phrases are defined, a RULE will
  only match when the key phrases match exactly. When all the text of a rule is a
  "key phrase", this is the same as being "is_continuous".
- The "--unknown-licenses" option now also detects unknown licenses using a
  simple and effective ngrams-based matching in areas that are not matched or
  weakly matched. This helps detect things that look like a license but are not
  yet known as licenses.
- False positive detection of "license lists", like the lists seen in license and
  package management tools, has been entirely reworked. Rather than using
  thousands of small false positive rules, there is a new filter to detect a
  long run of license references and tags that is typical of license lists.
  As a result, thousands of rules have been replaced by a simpler filter, and
  license detection is more accurate, faster and has fewer false positives.
- The new license flag "is_generic" tags licenses that are "generic" licenses
  such as "other-permissive" or "other-copyleft". This is not yet
  returned in the JSON API.
- When scanning binary files, the detection of single-word rules is filtered when
  surrounded by gibberish or mixed case. For instance, $#%$GpL$ is a false
  positive and is no longer reported.
- Several rules that were incorrectly tagged as is_license_notice but were
  references have been requalified as is_license_reference. All rules made of a
  single word have been requalified as is_license_reference if they were not
  already qualified this way.
- Matches to small license rules (with small defined as under 15 words)
  that are scattered over too many lines are now filtered as false matches.
- Small, two-word matches that overlap the previous or next match by
  the word "license" or similar words are now filtered as false matches.
- The new --licenses-reference option adds a new "licenses_reference" top
  level attribute to a scan when using the JSON and YAML outputs. This contains
  all the details and the full text of every license seen in a file or
  package license expression of a scan. This can be added after the fact
  using the --from-json option.
- New experimental support for non-English licenses. Use the command
  ./scancode --reindex-licenses-for-all-languages to index all known non-English
  licenses and rules. From that point on, they will be detected. Because of this,
  some licenses that were not tagged with their languages are now correctly
  tagged, and they may not be detected unless you activate this new indexing
  feature.
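The key phrase constraint can be illustrated with a deliberately simplified sketch: it pulls the {{...}} spans out of a rule text and checks that each one appears in the scanned text as an unbroken run of words. The lowercase word tokenizer and all helper names are assumptions for this example. ScanCode's actual matcher works on its own token index, and per the description above the key phrase check is an extra condition on a rule match rather than a matcher on its own.

```python
import re
from typing import List

# Text between double curly braces, e.g. "{{GNU General Public License}}".
KEY_PHRASE = re.compile(r"\{\{(.+?)\}\}", re.DOTALL)


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric word tokens (a simplification for this sketch)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def key_phrases(rule_text: str) -> List[List[str]]:
    """Return the token sequence of each {{...}} key phrase in a rule text."""
    return [tokenize(span) for span in KEY_PHRASE.findall(rule_text)]


def contains_run(tokens: List[str], phrase: List[str]) -> bool:
    """True if phrase occurs in tokens as contiguous words with no gaps."""
    size = len(phrase)
    return any(tokens[i:i + size] == phrase for i in range(len(tokens) - size + 1))


def key_phrases_satisfied(rule_text: str, scanned_text: str) -> bool:
    """True if every key phrase of the rule appears unbroken in the scanned text."""
    tokens = tokenize(scanned_text)
    return all(contains_run(tokens, phrase) for phrase in key_phrases(rule_text))
```

For a rule text such as "Licensed under the {{GNU General Public License}} version 2", the scanned text "Licensed under the GNU Lesser General Public License" fails the check because the extra word "Lesser" breaks the run, which is the kind of false positive key phrases are meant to prevent.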
Package detection:
- Major changes in package detection and reporting: a codebase-level attribute
  "packages" is now reported, with one or more "package_data" entries and the
  files for each package. The specific changes made are:
  - The resource-level attribute "packages" has been renamed to "package_data",
    as these are really package data that are being detected, such as manifests,
    lockfiles or other package data. This has the data attributes of a
    "package_data" field plus others: "package_uuid", "package_data_files" and
    "files".
  - A new top-level attribute "packages" has been added which contains package
    instances created from "package_data" detected in the codebase.
  - A new codebase-level attribute "dependencies" has been added which contains
    dependency instances created from lockfiles detected in the codebase.
  - The package attribute "root_path" has been deleted from "package_data" in
    favour of the new format where there is no root conceptually, just a list of
    files for each package.
  - There is a new resource-level attribute "for_packages" which refers to
    packages through package_uids (pURL + uuid string). A "package_adder"
    function is now used to associate a Package to a Resource that is part of
    it. This gives us the flexibility to use the packagedcode Package handlers
    in other contexts where "for_packages" on Resource is not implemented in the
    same way as in scancode-toolkit.
  - The package_data attribute "dependencies" (which is a list of
    DependentPackages) now has a new attribute "resolved_package" with a package
    data mapping. Also, the "requirement" attribute is renamed to
    "extracted_requirement". There is a new "extra_data" attribute to collect
    extra data as needed.
- For PyPI packages, python_requires is treated as a package dependency.
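As a rough illustration of how the "packages", "dependencies", "files", "package_data" and "for_packages" attributes relate, the sketch below walks a trimmed, made-up scan result and groups file paths by package. The "package_uid" key name, the uid format, the helper names and all sample values are assumptions for illustration; check them against a real v31 JSON output before relying on them.

```python
from collections import defaultdict
from typing import Dict, List

# A trimmed, hypothetical scan result; key names and values are assumptions.
SAMPLE_SCAN = {
    "packages": [
        {"purl": "pkg:npm/demo@1.0.0",
         "package_uid": "pkg:npm/demo@1.0.0?uuid=0000-demo"},
    ],
    "dependencies": [
        {"purl": "pkg:npm/left-pad@1.3.0",
         "extracted_requirement": "^1.3.0",
         "extra_data": {}},
    ],
    "files": [
        {"path": "demo/package.json", "package_data": [{"type": "npm"}],
         "for_packages": ["pkg:npm/demo@1.0.0?uuid=0000-demo"]},
        {"path": "demo/index.js", "package_data": [],
         "for_packages": ["pkg:npm/demo@1.0.0?uuid=0000-demo"]},
        {"path": "README.md", "package_data": [], "for_packages": []},
    ],
}


def files_by_package(scan: Dict) -> Dict[str, List[str]]:
    """Group resource paths under the purl of each package they belong to."""
    purl_by_uid = {p["package_uid"]: p["purl"] for p in scan.get("packages", [])}
    grouped: Dict[str, List[str]] = defaultdict(list)
    for resource in scan.get("files", []):
        for uid in resource.get("for_packages", []):
            # Fall back to the raw uid if no matching package is listed.
            grouped[purl_by_uid.get(uid, uid)].append(resource["path"])
    return dict(grouped)


def manifest_paths(scan: Dict) -> List[str]:
    """Paths of files where package_data (manifests, lockfiles...) was detected."""
    return [r["path"] for r in scan.get("files", []) if r.get("package_data")]
```

Because membership is recorded on each file through "for_packages" rather than through a package "root_path", a file outside any package directory simply has an empty list, like README.md here.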
License Clarity Scoring Update:
- We are moving away from the original license clarity scoring designed for
  ClearlyDefined in the license clarity score plugin. The previous scoring
  logic could be misleading, as its stringent criteria often produced a low
  score. We are now using more general criteria to get a sense of what
  provenance information has been provided and whether or not there is a
  conflict in licensing between the licenses declared in the top-level key
  files and the licenses detected in the files under the top-level.
- The license clarity score is a value from 0-100 calculated by combining the
  weighted values determined for each of the scoring elements:
  - Declared license:
    - When true, indicates that the software package licensing is documented at
      top-level or well-known locations in the software project, typically in a
      package manifest, NOTICE, LICENSE, COPYING or README file.
    - Scoring Weight = 40
  - Identification precision:
    - Indicates how well the license statement(s) of the software identify known
      licenses that can be designated by precise keys (identifiers) as provided in
      a...
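The combination step can be sketched as a weighted sum. Only the Declared license weight (40) appears in these notes; summing the weights of present elements, clamping to 0-100, the clarity_score() helper and the placeholder element below are all assumptions, not ScanCode's implementation.

```python
from typing import Dict, Tuple


def clarity_score(elements: Dict[str, Tuple[bool, int]]) -> int:
    """Sum the weights of the elements that are present, clamped to 0-100.

    elements maps an element name to a (present, weight) pair.
    """
    total = sum(weight for present, weight in elements.values() if present)
    return max(0, min(100, total))


# Declared license weight from the release notes; the other entry is a
# hypothetical placeholder, not a real ScanCode scoring element or weight.
example = {
    "declared_license": (True, 40),
    "hypothetical_element": (False, 25),
}
score = clarity_score(example)
```

Here only the declared license element is present, so the score is 40. The clamp keeps the result inside the documented 0-100 range even if the weights sum to more.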
v31.0.0rc5
This is one of the last release candidates for the upcoming 31 release.
v31 is a major release with many new features, and several bug fixes and
improvements including major updates to the package and dependency collection and to the license detection.
Several bugs have been fixed compared with 31.0.0rc3; in particular, licenses are now properly reported in system package scans.
See https://github.com/nexB/scancode-toolkit/blob/v31.0.0rc5/CHANGELOG.rst for an overview of the changes in v31 compared to v30.
Please try this release and report any installation issues so we can work towards a stable 31.
Thank you!
What's Changed since 31 rc3
- Release 31 rc4 prep by @pombredanne in #3036
- Add package_adder argument to assemble() #3034 by @JonoYang in #3035
Full Changelog: v31.0.0rc3...v31.0.0rc5
v31.0.0rc3
This is a penultimate release candidate for the upcoming 31 release.
v31 is a major release with many new features, and several bug fixes and
improvements including major updates to the package and dependency collection and to the license detection.
Several bugs have been fixed when compared with 31.0.0rc2.
See https://github.com/nexB/scancode-toolkit/blob/v31.0.0rc3/CHANGELOG.rst for an overview of the changes in v31 compared to v30.
Please try this release and report any installation issues so we can work towards a stable 31.
Thank you!
What's Changed
- Do not fail without packages in cyclonedx #2987 by @AyanSinhaMahapatra in #3005
- Fix relaunching scancode on Apple silicon using Rosetta 2 emulation #2835 by @MarcelBochtler in #3018
- Clarify unknown license keys #2827 by @AyanSinhaMahapatra in #3023
- Yield Packages before other yieldables #3028 by @pombredanne in #3031
- Prepare Release 31.0.0rc3 by @pombredanne in #3029
New Contributors
- @MarcelBochtler made their first contribution in #3018
Full Changelog: v31.0.0rc2...v31.0.0rc3
v31.0.0rc2
This is a release candidate for the upcoming 31 release.
v31 is a major release with many new features, and several bug fixes and
improvements including major updates to the package and dependency collection and to the license detection.
Several bugs have been fixed when compared with 31.0.0rc1.
See https://github.com/nexB/scancode-toolkit/blob/v31.0.0rc2/CHANGELOG.rst for an overview of the changes in v31 compared to v30.
Please try this release and report any installation issues so we can work towards a stable 31.
Thank you!
What's Changed
- Improve npm package processing by @pombredanne in #2997
- Update license detection by @pombredanne in #2998
- Add new license rules and license - Early summer 2022 by @pombredanne in #2999
- Bump version to 31.0.0rc2 by @JonoYang in #3000
Full Changelog: v31.0.0rc1...v31.0.0rc2
v31.0.0rc1
This is a release candidate for the upcoming 31 release.
v31 is a major release with many new features, and several bug fixes and
improvements including major updates to the package and dependency collection and to the license detection.
Several bugs have been fixed when compared with 31.0.0b5.
See https://github.com/nexB/scancode-toolkit/blob/v31.0.0rc1/CHANGELOG.rst for an overview of the changes in v31 compared to v30.
Please try this release and report any installation issues so we can work towards a stable 31.
Thank you!
What's Changed
- Add black and isort as testing dependencies #2969 by @johnmhoran in #2970
- Rename precise_license_detection field #2967 by @JonoYang in #2968
- Convert package data dict to PackageData #2971 by @JonoYang in #2973
- Update extractcode --shallow option description by @lf32 in #2959
- Support shortcut flags for cli by @lf32 in #2951
- Consider only copyrights in summary #2972 by @JonoYang in #2974
- Reimplement get installed packages by @JonoYang in #2988
- Report extracted_requirement correctly by @TG1999 in #2984
- Improve packagecode and other release prep by @pombredanne in #2992
New Contributors
Full Changelog: v31.0.0b5...v31.0.0rc1
v31.0.0b5
This is a beta release for the upcoming 31 release.
v31 is a major release with many new features, and several bug fixes and
improvements including major updates to the package and dependency collection and to the license detection.
Several bugs have been fixed compared with b4.
See https://github.com/nexB/scancode-toolkit/blob/v31.0.0b5/CHANGELOG.rst for an overview of the changes in v31 compared to v30.
Please try this release and report any installation issues so we can work towards a stable 31.
Thank you!
What's Changed
- Add link to scancode-toolkit-reference-scans by @AyanSinhaMahapatra in #2952
- Modify pypi PKG-INFO parse by @AyanSinhaMahapatra in #2953
- Prepare Release 31.b5 by @pombredanne in #2962
Full Changelog: v31.0.0b4...v31.0.0b5
v31.0.0b4
This is a beta release for the upcoming 31 release.
v31 is a major release with many new features, and several bug fixes and
improvements including major updates to the package and dependency collection and to the license detection.
Several bugs have been fixed compared with b3.
See https://github.com/nexB/scancode-toolkit/blob/v31.0.0b4/CHANGELOG.rst for an overview of the changes.
Please try this release and report any installation issues so we can work towards a stable 31.
Thank you!
What's Changed
- Populate for_packages field correctly #2929 by @JonoYang in #2939
- Prepare Release 31b4 by @pombredanne in #2941
- Duplicated dependencies package results by @JonoYang in #2944
- Prepare Release 31b4 by @pombredanne in #2947
Full Changelog: v31.0.0b3...v31.0.0b4
v31.0.0b3 - 2022-04-30
This is a beta release for the upcoming 31 release.
v31 is a major release with many new features, and several bug fixes and
improvements including major updates to the package and dependency collection and to the license detection.
See https://github.com/nexB/scancode-toolkit/blob/v31.0.0b3/CHANGELOG.rst for an overview of the changes.
Please try this release and report any installation issues so we can work towards a stable 31.
Thank you!
What's Changed
- Report packages at top level with file level package_manifests by @AyanSinhaMahapatra in #2710
- Updated install.rst by @beastrun12j in #2722
- Omnibus fall license improvements by @pombredanne in #2706
- Improve license detection by @pombredanne in #2737
- api.get_licenses: clarify and improve docstring for "min_score" argument by @zacchiro in #2763
- rules with "unqualified" license names are references, not notices by @petergardfjall in #2759
- Fix invalid license yaml files by resolving duplicated keys by @fangxlmr in #2776
- Fix azure pipeline vmimage deprecations by @AyanSinhaMahapatra in #2775
- Allow license rules to require the presence of certain defining keywords by @mrombout in #2773
- Add first draft ROADMAP by @pombredanne in #2736
- Add CycloneDx output option by @agschrei in #2698
- Remove regular expression futurewarning by @soimkim in #2788
- fix docstring in debian_copyright.py by @adii21-Ux in #2786
- fixes missing whitespace in prerequisites list by @altsalt in #2778
- Add PackageManifest Class by @AyanSinhaMahapatra in #2748
- Add new licenses and new detection rules by @pombredanne in #2765
- Rename first column of csv output to "path" by @JRavi2 in #2016
- Detect unknown licenses #1675 by @akugarg in #2592
- Improve copyright handling #2350 by @pombredanne in #2791
- Fixing OSI identifier for BSD-3-Clause; see also SPDX license metadata by @karsten-klein in #2797
- Fix GPL license detection false positive #2793 by @KevinJi22 in #2799
- 2789 inconsistent doc html app by @kunalchhabra37 in #2795
- Fixed inconsistency in --html-app FILE in cli-reference by @maynaS in #2790
- Replace freenode references with libera chat by @purna135 in #2816
- Adopt nexB/skeleton and bump dependencies by @pombredanne in #2818
- Fix bug recognizing license as license_notice instead of license_text by @adii21-Ux in #2817
- Fix incorrect license detection #2777 by @KevinJi22 in #2811
- Remove skeleton from docs by @AyanSinhaMahapatra in #2830
- Detect SPDX-FileContributor tags as authors by @pombredanne in #2838
- New license and copyright rule by @adii21-Ux in #2837
- Add key phrase tags to GPL detection rule by @pombredanne in #2821
- Make --version output valid YAML for parsing #2856 by @KevinJi22 in #2858
- Add Direct Note for Windows Users (New Comers) by @OsmiumOP in #2857
- Fixed Typo in Documentation by @OsmiumOP in #2862
- Remove version check locally by @adii21-Ux in #2860
- License improvement winter 2022 by @pombredanne in #2828
- Update link to documentation by @AyanSinhaMahapatra in #2867
- Improve license detection by @pombredanne in #2871
- Detect dependencies from build.gradle files by @pombredanne in #2822
- Fix small typo inside notes snippet by @Harshil-Jani in #2829
- Add Package Instances #2691 by @AyanSinhaMahapatra in #2825
- Improve license clarity scoring by @pombredanne in #2875
- Do not raise exception on package data mismatch #2886 by @AyanSinhaMahapatra in #2887
- Release 31 by @pombredanne in #2888
- Add primary license in summary by @JonoYang in #2884
- Remove usage of get_terminal_size in click by @AyanSinhaMahapatra in #2916
- Fix doc builds by @AyanSinhaMahapatra in #2896
- Update summary plugin by @JonoYang in #2914
- Shorten long file names by @pombredanne in #2918
- Added new copyright test cases by @abhishak3 in #2891
- Add system packages support in the new packages model by @AyanSinhaMahapatra in #2909
- Fix typo in summary: ambigous->ambiguous by @pombredanne in #2922
- Add system environment to scan headers by @pombredanne in #2923
- Update METADATA.bzl parser by @JonoYang in #2924
- Spring 2022 license updates by @pombredanne in #2921
- Process single package data file correctly by @pombredanne in #2933
- Fix package/dependency creation bugs by @AyanSinhaMahapatra in #2932
New Contributors
- @beastrun12j made their first contribution in #2722
- @zacchiro made their first contribution in #2763
- @fangxlmr made their first contribution in #2776
- @mrombout made their first contribution in #2773
- @agschrei made their first contribution in #2698
- @soimkim made their first contribution in #2788
- @adii21-Ux made their first contribution in #2786
- @altsalt made their first contribution in #2778
- @karsten-klein made their first contribution in #2797
- @KevinJi22 made their first contribution in #2799
- @kunalchhabra37 made their first contribution in #2795
- @maynaS made their first contribution in #2790
- @purna135 made their first contribution in #2816
- @OsmiumOP made their first contribution in #2857
- @Harshil-Jani made their first contribution in #2829
- @abhishak3 made their first contribution in #2891
Full Changelog: v30.1.0...v31.0.0b3
v30.1.0 - 2021-09-25
This is a bug fix release for these bugs:
We now return the package in the summaries as before.
There is also a minor API change: we no longer return a count of "null" empty
values in the summaries for license, copyrights, etc.
Thank you to:
- Thomas Druez @tdruez
See also https://github.com/nexB/scancode-toolkit/tree/v30.0.0 for details on the main changes in v30.0.x
What's Changed
- Prepare bugfix release 30.0.1 #2713 by @pombredanne in #2715
- Return package details in summary #2717 by @pombredanne in #2718
Full Changelog: v30.0.1...v30.1.0
v30.0.1 - 2021-09-24
This is a minor bug fix release for these bugs:
We now correctly work with all supported Click versions.
Thank you to:
See also https://github.com/nexB/scancode-toolkit/tree/v30.0.0 for details on the main changes in v30.0.x
Full Changelog: v30.0.0...v30.0.1