Skip to content

Conversation

azeemshaikh38
Copy link
Contributor

  • Please check if the PR fulfills these requirements
  • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
    Adds framework/skeleton code for setting up PubSub batch job as discussed in Scale scorecard from 2K to 100K to a million repos #318 (comment). This PR also includes code for Master server which publishes data to the repo_request topic.

  • What is the current behavior? (You can also link to an open issue here)
    N/A

  • What is the new behavior (if this is a feature change)?
    No new behaviour added yet in this PR.

  • Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
    No.

@github-actions
Copy link

Integration tests success for ad1438e92e7317bcb6098618f397543bb5f4481a

Copy link
Member

@naveensrinivasan naveensrinivasan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a large PR! For large it is recommended to open a WIP, to get early feedback.

@azeemshaikh38 azeemshaikh38 changed the title 🌱 Setup PubSub framework code. 🌱 Setup PubSub framework code May 10, 2021
@naveensrinivasan
Copy link
Member

@azeemshaikh38 Some documentation on a large PR would help with the review. The approach and your thought process.

For example here is HTTP caching PR #203 and some related docs https://github.com/ossf/scorecard#caching
#80 (comment)

I would suggest that some docs be included for review as well as others to contribute in the future.

@azeemshaikh38
Copy link
Contributor Author

Re: large PR comment. Each individual piece didn't make sense on its own, so had to wait to send this out only after all parts were in order.

Re: documentation. The PR is based on this comment - #318 (comment). What this is PR trying to achieve is - given a list of repos, "master/main.go" will send out requests to a PubSub topic. To do this, it follows these steps:

  1. Uses "data/reader.go" to read from the input file. "data/reader.go" abstracts away the format of the file itself and provides a simple HasNext(), GetNext() type interface.
  2. Uses "data/request.proto" to create a request proto. Each request corresponds to a single JSON shard file in Scale scorecard from 2K to 100K to a million repos #318 (comment).
  3. Uses "data/publisher.go" to publish to a given topic.
  4. Uses "data/blob.go" to write some metadata once all requests are successfully sent.

I have tried to be as generic as possible to avoid GoogleCloud specific logic. So as much as possible, logic is encapsulated inside gocloud.dev packages. The config/config.go serves a single place to update and use different cloud services if need be.

@naveensrinivasan
Copy link
Member

Re: large PR comment. Each individual piece didn't make sense on its own, so had to wait to send this out only after all parts were in order.

Re: documentation. The PR is based on this comment - #318 (comment). What this is PR trying to achieve is - given a list of repos, "master/main.go" will send out requests to a PubSub topic. To do this, it follows these steps:

  1. Uses "data/reader.go" to read from the input file. "data/reader.go" abstracts away the format of the file itself and provides a simple HasNext(), GetNext() type interface.
  2. Uses "data/request.proto" to create a request proto. Each request corresponds to a single JSON shard file in #318 (comment).
  3. Uses "data/publisher.go" to publish to a given topic.
  4. Uses "data/blob.go" to write some metadata once all requests are successfully sent.

I have tried to be as generic as possible to avoid GoogleCloud specific logic. So as much as possible, logic is encapsulated inside gocloud.dev packages. The config/config.go serves a single place to update and use different cloud services if need be.

Thanks this helps!

Copy link
Member

@naveensrinivasan naveensrinivasan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the basic structure. Expect few things that aren't idiomatic go and few nits.

Copy link
Contributor

@oliverchang oliverchang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! some cursory comments. will take a closer look tomorrow.

@github-actions
Copy link

Integration tests success for bcabfe7adf47012dd6b67ecee1e165c6ed036a12

@azeemshaikh38 azeemshaikh38 force-pushed the azeems/pubsub branch 2 times, most recently from abba692 to ac4e29b Compare May 11, 2021 15:05
@github-actions
Copy link

Integration tests success for abba69221dfd6cb60636ce55ad3803929b661a03

@github-actions
Copy link

Integration tests success for ac4e29b049ec43a67be9b1f127e81412878e8a20

oliverchang
oliverchang previously approved these changes May 12, 2021
Copy link
Contributor

@oliverchang oliverchang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

@github-actions
Copy link

Integration tests success for 0f01b1501a5f4dad1ef50a5f79edba08e0a82753

@naveensrinivasan
Copy link
Member

There is a binary file in this commit https://github.com/ossf/scorecard/blob/0f01b1501a5f4dad1ef50a5f79edba08e0a82753/cron/cron which could be an accidental commit.

@github-actions
Copy link

Integration tests success for 3830a1089a884df8acb1107d0f71a33cd2fef17d

Copy link
Contributor

@inferno-chromium inferno-chromium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

putting approval back as a setting change had reset it.

@github-actions
Copy link

Integration tests success for 53cdbc71e51e7e7186cd4c89bd9e8d828d03bd87

@azeemshaikh38
Copy link
Contributor Author

There is a binary file in this commit https://github.com/ossf/scorecard/blob/0f01b1501a5f4dad1ef50a5f79edba08e0a82753/cron/cron which could be an accidental commit.

Deleted.

@github-actions
Copy link

Integration tests success for 5d3ee501d9198d8f93d086a2d56893df8fc9b032

@github-actions
Copy link

Integration tests success for c892d11ce339a47b9d6ad04327268d9af4659952

@github-actions
Copy link

Integration tests success for a13ab0e1e39368c0e4e9a1129ace61fdb98ad356

@azeemshaikh38 azeemshaikh38 merged commit 6437c93 into main May 14, 2021
@azeemshaikh38 azeemshaikh38 deleted the azeems/pubsub branch May 14, 2021 21:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants