Skip to content

Conversation

VoLKyyyOG
Copy link

@VoLKyyyOG VoLKyyyOG commented Jun 14, 2022

Summary of Changes

Hey team, thanks for reviewing this PR!

Whilst I have heavily modified Amundsen's logic in our company repository (Afterpay, as part of Square/Block), I figured I'd add some of these generalised changes directly into the core repo as they greatly helped us make Amundsen more flexible and deliver our vision.

The motivation originates from #1828 which makes the original code quite convoluted and creates duplicated nodes/records. However, with #1877 being an epic speed-up to the original Neo4JCSV process (hopefully the PR is merged soon), the original naive method of duplicating TableMetadata for multiple programmatic descriptions does not work.

The changes here aim to mitigate this issue and allow multiple programmatic descriptions to be added and also benefit from the significantly reduced runtime using APOC.

Change Log:

  • Added support Table level Badges
  • Added support for multiple programmatic descriptions to reduce duplicate calls to TableMetadata
  • Added a sample script under example/scripts/sample_base_postgres_metadata_extractor for examples
  • Added some extra comments that I felt were useful when reading through the code (they were quite sparse)

Example Changes:
(all sensitive info has been removed)

  • Table Badges

image

  • Programmatic Descriptions

image

Tests

The code has been tested and deployed in our prod environment and I am more than happy to have a quick knowledge share on my changes / tests. This consists of over 40 thousand tables and 500,000 thousand columns.

However testing from the core repo side would be appreciated as our Amundsen is greatly modified and as such, our environment will not capture all test cases.

CheckList

Make sure you have checked all steps below to ensure a timely review.

  • PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"
  • PR includes a summary of changes.
  • PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.
  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does

@VoLKyyyOG VoLKyyyOG requested a review from a team as a code owner June 14, 2022 07:09
@boring-cyborg boring-cyborg bot added area:databuilder From databuilder folder category:models labels Jun 14, 2022
@boring-cyborg
Copy link

boring-cyborg bot commented Jun 14, 2022

Congratulations on your first Pull Request and welcome to Amundsen community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/amundsen-io/amundsen/blob/main/CONTRIBUTING.md)

@VoLKyyyOG
Copy link
Author

Damn can't get this sign-off to work, would be appreciated if a core repo owner can help me resolve it

@VoLKyyyOG
Copy link
Author

@chonyy would appreciate if you can take a look through this PR as well!

@chonyy
Copy link
Contributor

chonyy commented Jun 14, 2022

Hey @akiratwang , you will have to sign-off every commit in order to pass the DCO 😅
And then you will need a maintainer's approval to get the unit test workflow started.

@feng-tao feng-tao requested a review from allisonsuarez June 14, 2022 16:04
@feng-tao feng-tao added the keep fresh Disables stalebot from closing an issue label Jun 14, 2022
@VoLKyyyOG
Copy link
Author

Hey @akiratwang , you will have to sign-off every commit in order to pass the DCO 😅 And then you will need a maintainer's approval to get the unit test workflow started.

Wait rip as hahaha. Would it be better off to just create a new branch? I'll resolve the flake8 issues as well in the new PR then.

@VoLKyyyOG VoLKyyyOG closed this Jun 15, 2022
@VoLKyyyOG VoLKyyyOG reopened this Jun 15, 2022
@VoLKyyyOG VoLKyyyOG closed this Jun 15, 2022
@VoLKyyyOG VoLKyyyOG deleted the feature/add_multi_programmatic_descriptions branch June 15, 2022 00:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:databuilder From databuilder folder keep fresh Disables stalebot from closing an issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants