Skip to content

Conversation

naveensrinivasan
Copy link
Member

  • Please check if the PR fulfills these requirements
  • Tests for the changes have been added (for bug fixes/features)

https://github.com/google/go-github supports Conditional requests
https://github.com/google/go-github#conditional-requests

As we are scaling more and more projects this would add a lot of value.

Initial run fetches information using httpcache as a middleware,
which caches the HTTP response initially in a large disk (PVC),
probably move to Redis later as a cache instead of disk.

Subsequent cron runs will utilize the httpcache for checking content modification and
load it from the cache if it isn't modified, which reduces the hitting the
Rate Limit of the GitHub API.

Also fixed the golang-ci warnings.

  • Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
    None

  • Other information:

Subsequent cache runs on 50 repositories takes about 18 minutes with 3 GitHub tokens

❯ time ./cron.sh
github.com/18F/identity-idp
github.com/1N3/Sn1per
github.com/24pullrequests/24pullrequests
github.com/AElfProject/AElf
github.com/Activiti/Activiti
github.com/AdguardTeam/AdguardFilters
github.com/AlchemyCMS/alchemy_cms
github.com/AliasIO/wappalyzer
github.com/Amanieu/parking_lot
github.com/abonas/kubeclient
github.com/abpframework/abp
github.com/acassen/keepalived
github.com/acmesh-official/acme.sh
github.com/actions/runner
github.com/activeadmin/activeadmin
github.com/activemerchant/active_merchant
github.com/activerecord-hackery/ransack
github.com/activescaffold/active_scaffold
github.com/actix/actix
github.com/actix/actix-extras
github.com/actix/actix-net
github.com/actix/actix-web
github.com/adafruit/Adafruit_CircuitPython_Bundle
github.com/adempiere/adempiere
github.com/afragen/github-updater
github.com/aframevr/aframe
github.com/ai/autoprefixer-rails
github.com/aio-libs/aiohttp
github.com/airbnb/javascript
github.com/airbrake/airbrake
github.com/akaunting/akaunting
github.com/akeneo/pim-community-dev
github.com/akka/akka
github.com/akka/akka-http
github.com/akkadotnet/akka.net
github.com/alacritty/alacritty
github.com/alanmcgovern/monotorrent
github.com/alexcrichton/cc-rs
github.com/alexcrichton/curl-rust
github.com/alexcrichton/proc-macro2
github.com/alexcrichton/toml-rs
github.com/alexreisner/geocoder
github.com/alibaba/dragonwell8
github.com/alibaba/druid
github.com/alibaba/fastjson
github.com/alibaba/nacos
github.com/allenai/allennlp
github.com/allinurl/goaccess
github.com/alphagov/whitehall
github.com/alpinelinux/aports
./cron.sh  46.68s user 19.95s system 6% cpu 18:26.70 total

Folder size

du -sh ./cache
801M	

Files in the folder

find ./cache -type f | wc -l
10347

@@ -7,10 +7,11 @@ require (
github.com/golang/protobuf v1.4.3 // indirect
github.com/google/go-github/v32 v32.1.0
github.com/kr/text v0.2.0 // indirect
github.com/naveensrinivasan/httpcache v1.2.1
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fork of gregjones/httpcache#104 this PR.

@github-actions
Copy link

Integration tests failure for 35f3b88dbb4e5dad84ca2916dc6db3d8e7e32d05

@github-actions
Copy link

Integration tests success for 35f3b88dbb4e5dad84ca2916dc6db3d8e7e32d05

@github-actions
Copy link

Integration tests success for 73ebc1a54963240d4ff9241dce169df0f5131478

@github-actions
Copy link

Integration tests failure for e5b609b096fc67b393e928b3245a122f3919ef31

@github-actions
Copy link

Integration tests success for e5b609b096fc67b393e928b3245a122f3919ef31

Copy link
Contributor

@dlorenc dlorenc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple tiny nits, LGTM!

@@ -153,6 +152,20 @@ Signed-Releases: Fail 0
Signed-Tags: Fail 10
```

### Caching

Scorecard uses `httpcache` with <https://docs.github.com/en/rest/overview/resources-in-the-rest-api#conditional-requests> for caching httpresponse. The default cache is in-memory.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe link to the etags stuff in GitHub - the real benefit is avoiding the API quota

// shouldUseDiskCache checks the env variables USE_DISK_CACHE and DISK_CACHE_PATH to determine if
// disk should be used for caching.
func shouldUseDiskCache() (string, bool) {
if isDiskCache := os.Getenv(UseDiskCache); isDiskCache != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think you can avoid this if statement and go straight into ParseBool since "" parses as false.

Copy link
Contributor

@inferno-chromium inferno-chromium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot! This is very exciting.

.gitignore Outdated
@@ -23,3 +23,6 @@ results.json

# tools
bin

#temp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this.

README.md Outdated

To use disk cache two env variables have to be set `USE_DISK_CACHE=true` and `DISK_CACHE_PATH=./cache`.

There are not TTL on cache.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"There is no"

}
}
}
return "", false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe nil instead of ""

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't do nil for a string in go

t.Parallel()
tests := []struct {
name string
want string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/want/diskCachePath
s/want1/useDiskCache

The GitHub API supports conditional requests
https://docs.github.com/en/rest/overview/resources-in-the-rest-api#conditional-requests

https://github.com/google/go-github supports Conditional requests
https://github.com/google/go-github#conditional-requests

As we are scaling more and more projects this would add a lot of value.

Initial run fetches information using `httpcache` as a middleware,
which caches the HTTP response initially in a large disk (PVC),
probably move to Redis later as a cache instead of disk.

Subsequent `cron runs` will utilize the `httpcache` for checking content modification and
load it from the cache if it isn't modified, which reduces the hitting the
Rate Limit of the GitHub API.
@github-actions
Copy link

Integration tests success for 9645825ac6716a3a988b682ddc00145bb62695df

@naveensrinivasan naveensrinivasan merged commit db81680 into main Feb 22, 2021
@naveensrinivasan naveensrinivasan deleted the feat/caching branch January 14, 2022 22:52
justaugustus pushed a commit to justaugustus/scorecard that referenced this pull request May 25, 2022
* set GITHUB_TOKEN as default token

* updates

* Update doc

* Update doc

* updates

* updates

* update

* update

* update

* update

* updates
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants