Open
Labels
- good first issue (New-contributor friendly)
- help wanted (Open to participation from the community)
- ✨ goal: improvement (Improvement to an existing feature)
- 🌟 goal: addition (Addition of new feature)
- 🏷 status: label work required (Needs proper labelling before it can be worked on)
- 💬 talk: discussion (Open for discussions and feedback)
- 💻 aspect: code (Concerns the software code in the repository)
- 🟨 priority: medium (Not blocking but should be fixed soon)
- 🤖 aspect: dx (Concerns developers' experience with the codebase)
Description
Overview
First
- Read the documentation on Welcome — Creative Commons Open Source
Ways to Contribute
- Pull Requests (PRs)
  - Contribute a PR for an existing issue per Contribution Guidelines — Creative Commons Open Source
  - Help review PRs
- Issues
  - Create a new issue recommending a data source that should be included. Include information like:
    - quantity of records
    - types of metadata available
    - API documentation link
    - API requirements and limitations
    - Also see sources.md (Sources | Openverse); any of the listed sources are potential sources for this project
  - Create a new issue related to a single script or data source:
    - Scripts should use .env (theskumar/python-dotenv) and not query_secrets.py or similar
    - Scripts mustn't be monolithic; each should be limited to a single phase (e.g., query, process, or report; see [Feature] Automate Data Gathering and Analysis/Rendering #22)
    - Scripts must be designed to be run from the repository root via pipenv (e.g., pipenv run PATH/SCRIPT.PY)
      - Each script should determine its own path and set appropriate global variables (e.g., DIR_ROOT, DIR_SCRIPT)
    - Scripts have a lot of duplication between them. Begin a shared library (remember to keep issues as small and discrete as possible: limit each issue/PR to a single script or data source).
    - Scripts should use retries with exponential backoff (e.g., [Feature] Use requests included exponential backoff #2)
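The script conventions above (resolve paths from the script's own location, use retries with exponential backoff) can be sketched as a minimal skeleton. It assumes the script sits two directories below the repository root, as scripts/1-fetch/wikipedia_fetch.py does, and that requests is installed. The get_session helper name, retry count, backoff factor, and status codes are illustrative assumptions, not values taken from the project; .env loading via python-dotenv is left out of this sketch.

```python
# Hypothetical skeleton for a single-phase (fetch) script; names and values
# below are illustrative assumptions, not the project's actual code.
import os.path

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Resolve paths from this file's location instead of the current working
# directory, so the script behaves the same when run via pipenv from the root.
# Assumes the script lives two levels below the repository root.
DIR_SCRIPT = os.path.dirname(os.path.abspath(__file__))
DIR_ROOT = os.path.dirname(os.path.dirname(DIR_SCRIPT))


def get_session(total=5, backoff_factor=10):
    """Return a requests.Session that retries failed requests.

    urllib3 waits exponentially longer between successive retries, scaled
    by backoff_factor, and also retries on the listed HTTP status codes.
    """
    retry = Retry(
        total=total,
        backoff_factor=backoff_factor,
        status_forcelist=(408, 429, 500, 502, 503, 504),
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Mount the same retrying adapter for both schemes.
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

Keeping the retry policy in a single helper like this is also a natural first candidate for the shared library mentioned above.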
Tips
Conventions and best practices
- Always sort data and lists (both implicit and explicit) naturally (Natural sort order - Wikipedia)
- Example: sorted constants
quantifying/scripts/1-fetch/wikipedia_fetch.py
Lines 32 to 44 in 46dcd3f
# Constants
FILE_LANGUAGES = os.path.join(
    PATHS["data_phase"], "wikipedia_count_by_languages.csv"
)
HEADER_LANGUAGES = [
    "LANGUAGE_CODE",
    "LANGUAGE_NAME_EN",
    "LANGUAGE_NAME",
    "COUNT",
]
QUARTER = os.path.basename(PATHS["data_quarter"])
WIKIPEDIA_BASE_URL = "https://en.wikipedia.org/w/api.php"
WIKIPEDIA_MATRIX_URL = "https://meta.wikimedia.org/w/api.php"
- Example: sorted data
from operator import itemgetter
data.sort(key=itemgetter("TOOL_IDENTIFIER", "CATEGORY_CODE"))
- [Feature] Use requests included exponential backoff #2
- Consistent encoding (UTF-8) and newlines (Unix) across platforms #217
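The natural-sort and encoding conventions above can be sketched with only the standard library. Note that itemgetter on its own sorts string values lexicographically (so "B10" sorts before "B2"); the sketch below swaps in a natural-sort key instead. natural_key, sort_rows, and write_csv are hypothetical helper names, not existing project functions.

```python
import csv
import re


def natural_key(value):
    # re.split with a capturing group alternates text and digit chunks, so
    # odd positions are always digit runs: compare those as integers and
    # the text chunks case-insensitively.
    chunks = re.split(r"(\d+)", str(value))
    return [
        int(chunk) if index % 2 else chunk.casefold()
        for index, chunk in enumerate(chunks)
    ]


def sort_rows(rows, fields):
    # Natural-sort dict rows on several columns, like itemgetter(*fields)
    # but with "B2" ordered before "B10".
    return sorted(rows, key=lambda row: [natural_key(row[f]) for f in fields])


def write_csv(path, rows, fieldnames):
    # UTF-8 and "\n" on every platform: newline="" stops Python from
    # translating line endings, and lineterminator="\n" overrides the csv
    # module's default "\r\n".
    with open(path, "w", encoding="utf-8", newline="") as file_obj:
        writer = csv.DictWriter(
            file_obj, fieldnames=fieldnames, lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
```

For example, sort_rows(data, ("TOOL_IDENTIFIER", "CATEGORY_CODE")) would replace the data.sort(key=itemgetter(...)) call shown above when the values contain embedded numbers.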
Plot process data
Any plots in phase 3-report should graph data without significantly modifying it. This means that development of phase 2-process and phase 3-report usually needs to be done at the same time.
Put another way, there should be a 1:1 relationship between phase 3-report plots and phase 2-process CSV files.
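One way to make the 1:1 relationship between phase 2-process CSV files and phase 3-report plots checkable is to derive each plot's file name from its CSV. This is a minimal sketch under stated assumptions: the helper names, the .png extension, and the directory arguments are hypothetical, and it assumes processed CSV file names are unique.

```python
import os.path


def plot_path_for(csv_path, report_dir):
    # One plot per processed CSV: reuse the CSV's base name so the pairing
    # is obvious from the file system alone.
    name = os.path.splitext(os.path.basename(csv_path))[0]
    return os.path.join(report_dir, f"{name}.png")


def check_one_to_one(csv_paths, plot_paths):
    # Return (CSVs without a plot, plots without a CSV), compared by
    # base name with the extension removed.
    def stems(paths):
        return {os.path.splitext(os.path.basename(p))[0] for p in paths}

    expected = stems(csv_paths)
    actual = stems(plot_paths)
    return sorted(expected - actual), sorted(actual - expected)
```

A check like this could run at the end of phase 3-report and fail loudly when a process CSV was added without a matching plot, or vice versa.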
sumanbalayar08
Metadata
Projects
Status: Backlog