Labels
- ✨ goal: improvement (Improvement to an existing feature)
- 💻 aspect: code (Concerns the software code in the repository)
- 🚦 status: awaiting triage (Has not been triaged & therefore, not ready for work)
- 🟩 priority: low (Low priority and doesn't need to be rushed)
Description
Problem
Zenodo is a major repository for open access research outputs with 5.5M+ records, but is not currently included in our commons quantification project. Adding Zenodo would significantly expand our coverage of Creative Commons licensed content, particularly in academic and research domains.
Description
Implement data collection from Zenodo using their REST API to gather license information for quantifying the commons. This involves:
- Fetching records with structured license metadata
- Classifying Creative Commons and other open licenses
- Generating reports by year, resource type, and language
- Handling API rate limiting and pagination
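As a rough illustration of the classification and reporting steps listed above, here is a minimal sketch. The bucket names (`cc-license`, `cc-public-domain`, `other`, `unknown`) and the helper names `classify_license` and `count_by_year` are hypothetical and only for illustration. It assumes Creative Commons license ids begin with `cc-` or `cc0`, which would need checking against the ids Zenodo actually returns.

```python
from collections import Counter


def classify_license(license_id):
    """Map a license id to a coarse bucket (bucket names are hypothetical)."""
    if not license_id:
        return "unknown"
    lid = license_id.strip().lower()
    # Assumption: CC0 ids start with "cc0", other CC ids start with "cc-".
    if lid.startswith("cc0"):
        return "cc-public-domain"
    if lid.startswith("cc-"):
        return "cc-license"
    return "other"


def count_by_year(rows):
    """Tally (year, bucket) pairs from (license_id, publication_date) rows."""
    counts = Counter()
    for license_id, pub_date in rows:
        # ISO dates start with the four-digit year.
        year = pub_date[:4] if pub_date else "unknown"
        counts[(year, classify_license(license_id))] += 1
    return counts
```

The same tally could be keyed by resource type or language instead of year by swapping the second element of each row.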
Zenodo Useful Links
Official Documentation
- REST API Documentation: https://developers.zenodo.org/
- API Reference: https://zenodo.org/api/docs
- General Documentation: https://help.zenodo.org/
- Developer Documentation: https://developers.zenodo.org/
- Search Guide: https://help.zenodo.org/guides/search/
- Zenodo Homepage: https://zenodo.org/
API Endpoints
- Base URL: https://zenodo.org/api/records
- Records Search: https://zenodo.org/api/records
- Single Record: https://zenodo.org/api/records/{id}
- Communities: https://zenodo.org/api/communities
Technical Details
Query Strategy
GET https://zenodo.org/api/records?q=*&size=100&page=1&sort=bestmatch
Parameters:
- q: Query string (use * for all records)
- size: Records per page (300 as an implementation choice; the example request above uses 100)
- page: Page number for pagination
- sort: Sorting method (bestmatch recommended)
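A minimal sketch of how the paginated query could be driven, using only the standard library. The helper names (`build_url`, `fetch_json`, `iter_records`), the fixed one-second delay between pages, and the `hits.hits` response shape are assumptions for illustration and should be checked against the Zenodo API documentation. Handling of rate-limit responses is left out.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://zenodo.org/api/records"


def build_url(page, size=100, query="*", sort="bestmatch"):
    """Build the search URL for one page of results."""
    params = {"q": query, "size": size, "page": page, "sort": sort}
    # safe="*" keeps the wildcard query readable instead of encoding it as %2A.
    return f"{BASE_URL}?{urlencode(params, safe='*')}"


def fetch_json(url):
    """Fetch a URL and parse the JSON body (performs a real network call)."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_records(fetch_page=fetch_json, size=100, max_pages=None,
                 delay=1.0, sleep=time.sleep):
    """Yield records page by page until a short or empty page is seen."""
    page = 1
    while max_pages is None or page <= max_pages:
        payload = fetch_page(build_url(page, size=size))
        # Assumption: results are nested under payload["hits"]["hits"].
        hits = payload.get("hits", {}).get("hits", [])
        if not hits:
            break
        yield from hits
        if len(hits) < size:
            break  # last page reached
        page += 1
        sleep(delay)  # crude pause between requests to stay under rate limits
```

Passing `fetch_page` and `sleep` as arguments keeps the loop testable without network access; a call such as `iter_records(max_pages=2)` would fetch at most two pages from the live API.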
API Types Available
- REST API (Recommended)
  - Format: JSON
  - Authentication: None required for public records
  - Structured license data: metadata.license.id
- OAI-PMH (Not recommended)
  - Format: XML Dublin Core
  - Unreliable license parsing from free-text fields (dc:rights)
Key Metadata Fields
- License: metadata.license.id (structured, e.g., "cc-by-4.0")
- Access Rights: metadata.access_right ("open", "restricted", "embargoed")
- Publication Date: metadata.publication_date (ISO format)
- Resource Type: metadata.resource_type.title
- Language: metadata.language (ISO codes)
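The fields above could be pulled out of a single record roughly as follows. This is a sketch: `extract_fields` and the sample record are hypothetical, shaped only after the field paths listed in this issue, and missing fields are returned as `None` rather than raising.

```python
def extract_fields(record):
    """Pull the license-related fields out of one record dict."""
    meta = record.get("metadata") or {}
    license_info = meta.get("license")
    resource_type = meta.get("resource_type")
    # Guard against missing or non-dict values for the nested fields.
    if not isinstance(license_info, dict):
        license_info = {}
    if not isinstance(resource_type, dict):
        resource_type = {}
    return {
        "license": license_info.get("id"),
        "access_right": meta.get("access_right"),
        "publication_date": meta.get("publication_date"),
        "resource_type": resource_type.get("title"),
        "language": meta.get("language"),
    }


# Hypothetical record for illustration, not real Zenodo data.
sample = {
    "metadata": {
        "license": {"id": "cc-by-4.0"},
        "access_right": "open",
        "publication_date": "2021-03-01",
        "resource_type": {"title": "Dataset"},
        "language": "eng",
    }
}
fields = extract_fields(sample)
```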
Implementation
- I would be interested in implementing this feature.