Issue 520 #632
Someone is attempting to deploy a commit to the Clean and Green Philly Team on Vercel. A member of the Team first needs to authorize it.
… confusing extra info from the output
Moving this conversation about primary keys here. @nlebovits, is it correct to say that …
@zigouras, can you document how the tests work and how to run them?
@brandonfcohen1 the tests are commented to indicate what each one does. Some of them are POCs that just run code so you can see the output; others have assertions. I would not run them as part of CI/CD, but they are a good place for people to look to understand the underlying code.
What does the …
@zigouras I'm having a hard time following what is happening in your code. As far as I can tell, script.py has all of the actual data loading commented out, and it has been replaced elsewhere.
Can you explain what is happening here in the README and/or the PR description?
@brandonfcohen1 you are probably looking at an earlier commit in this PR. Please look at the latest commit; the backend setup doc describes the new backup and diff process.
OK, then we have to figure out how to enforce a primary key on it.
Got this error running the new script for the first time:
RuntimeError: data-diff command did not exit with success. Traceback (most recent call last):
File "/root/.local/share/virtualenvs/app-lp47FrbD/bin/data-diff", line 8, in <module>
sys.exit(main())
^^^^^^
File "/root/.local/share/virtualenvs/app-lp47FrbD/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.local/share/virtualenvs/app-lp47FrbD/lib/python3.11/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/root/.local/share/virtualenvs/app-lp47FrbD/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.local/share/virtualenvs/app-lp47FrbD/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.local/share/virtualenvs/app-lp47FrbD/lib/python3.11/site-packages/data_diff/__main__.py", line 342, in main
_data_diff(dbt_project_dir=project_dir_override, dbt_profiles_dir=profiles_dir_override, state=state, **kw)
File "/root/.local/share/virtualenvs/app-lp47FrbD/lib/python3.11/site-packages/data_diff/__main__.py", line 611, in _data_diff
_print_result(stats, json_output, diff_iter)
File "/root/.local/share/virtualenvs/app-lp47FrbD/lib/python3.11/site-packages/data_diff/__main__.py", line 423, in _print_result
rich.print(diff_iter.get_stats_string())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.local/share/virtualenvs/app-lp47FrbD/lib/python3.11/site-packages/data_diff/diff_tables.py", line 139, in get_stats_string
diff_stats = self._get_stats(is_dbt)
^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.local/share/virtualenvs/app-lp47FrbD/lib/python3.11/site-packages/data_diff/diff_tables.py", line 100, in _get_stats
list(self) # Consume the iterator into result_list, if we haven't already
^^^^^^^^^^
File "/root/.local/share/virtualenvs/app-lp47FrbD/lib/python3.11/site-packages/data_diff/diff_tables.py", line 95, in __iter__
for i in self.diff:
File "/root/.local/share/virtualenvs/app-lp47FrbD/lib/python3.11/site-packages/data_diff/diff_tables.py", line 266, in _diff_tables_wrapper
raise error
File "/root/.local/share/virtualenvs/app-lp47FrbD/lib/python3.11/site-packages/data_diff/diff_tables.py", line 239, in _diff_tables_wrapper
yield from self._diff_tables_root(table1, table2, info_tree)
File "/root/.local/share/virtualenvs/app-lp47FrbD/lib/python3.11/site-packages/data_diff/joindiff_tables.py", line 164, in _diff_tables_root
yield from self._bisect_and_diff_tables(table1, table2, info_tree)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.local/share/virtualenvs/app-lp47FrbD/lib/python3.11/site-packages/data_diff/diff_tables.py", line 299, in _bisect_and_diff_tables
raise NotImplementedError(f"Cannot use a column of type {kt} as a key")
NotImplementedError: Cannot use a column of type Text(_notes=[], collation=None) as a key
This happens the first time you run it because there are null opa_id values in the vacant_properties table. To rerun, drop the backup_ schema, …
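Before running the diff (or trying to enforce a primary key), it can help to check the candidate key column first. A minimal sketch in plain Python, assuming the rows have already been fetched as dictionaries (for example from a database cursor); the function name key_problems is hypothetical and not part of the project's code.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def key_problems(rows: Iterable[Mapping[str, Any]], key: str) -> dict:
    """Report NULL (None) values and duplicate values in a candidate key column.

    A column can only serve as a primary key if both results are empty.
    The SQL equivalent of the NULL check would be something like:
        SELECT count(*) FROM vacant_properties WHERE opa_id IS NULL;
    """
    values = [row.get(key) for row in rows]
    null_count = sum(1 for v in values if v is None)
    # Counter keeps first-seen order, so duplicates are listed in that order.
    counts = Counter(v for v in values if v is not None)
    duplicates = [v for v, n in counts.items() if n > 1]
    return {"null_count": null_count, "duplicates": duplicates}
```

For example, rows with opa_id values "1", None, "2", "2" would report one NULL and the duplicate "2" (these values are made up for illustration).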
can you include instructions on how to run
Got it. Can you include that in the documentation?
Can you include more details on how to set up and test email/Slack locally?
Per the backend doc:
The unit tests …
I am still waiting on @nlebovits to get the API key to write to our clean-and-green-philly-back-end Slack channel. In the meantime, you could create your own private Slack channel and an app API key, and test writing to it per these instructions: …
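To try the private-channel approach before the shared API key is available, here is a minimal sketch that calls Slack's chat.postMessage Web API method using only the Python standard library. It assumes a bot token (xoxb-…) with the chat:write scope; the environment variable names SLACK_BOT_TOKEN and SLACK_TEST_CHANNEL and the function names are hypothetical, not taken from the project.

```python
import json
import os
import urllib.request

SLACK_POST_MESSAGE_URL = "https://slack.com/api/chat.postMessage"


def build_slack_request(token: str, channel: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a chat.postMessage request authenticated with a bot token."""
    payload = json.dumps({"channel": channel, "text": text}).encode("utf-8")
    return urllib.request.Request(
        SLACK_POST_MESSAGE_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


def send_slack_message(token: str, channel: str, text: str) -> dict:
    """Send the message and return Slack's JSON response; raise if Slack reports ok=false."""
    request = build_slack_request(token, channel, text)
    with urllib.request.urlopen(request, timeout=10) as response:
        body = json.loads(response.read().decode("utf-8"))
    if not body.get("ok"):
        raise RuntimeError(f"Slack API error: {body.get('error')}")
    return body


if __name__ == "__main__":
    # Hypothetical variable names; nothing is sent unless a token is set.
    token = os.environ.get("SLACK_BOT_TOKEN")
    if token:
        channel = os.environ.get("SLACK_TEST_CHANNEL", "#my-test-channel")
        send_slack_message(token, channel, "Test message from the backend diff report")
```

For a private channel, the app generally has to be added to the channel before it can post there. Slack returns HTTP 200 even for API-level errors, which is why the sketch checks the ok field in the response body.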
I added a section to the backend doc for this.
…red with the -w flag in data-diff
@brandonfcohen1 I changed how the data-diff is done so that it no longer creates primary key constraints in the Postgres database or needs any preliminary SQL cleanup of the data. This leaves your existing script logic and feature layer classes unchanged and is a more extensible, loosely coupled solution. I used the -w flag in data-diff to apply a WHERE clause that limits the records being compared. Please look at my latest commit to this PR. You should be able to run the …
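A minimal sketch of how a script might invoke data-diff with a key column and a WHERE clause via subprocess. It relies on data-diff's -k/--key-columns and -w/--where options; the specific WHERE expression (excluding NULL opa_id rows), the connection string, the schema and table names, and the function names are all assumptions for illustration, not the PR's actual values. The RuntimeError message mirrors the one in the traceback above.

```python
import shlex
import subprocess
from typing import List, Optional


def build_data_diff_command(
    db_uri: str,
    table1: str,
    table2: str,
    key_column: str,
    where: Optional[str] = None,
) -> List[str]:
    """Build a data-diff CLI invocation comparing two tables in the same database."""
    cmd = ["data-diff", db_uri, table1, db_uri, table2, "-k", key_column]
    if where:
        # -w/--where adds a WHERE expression that restricts which rows are compared,
        # so no primary key constraint or data cleanup is needed beforehand.
        cmd += ["-w", where]
    return cmd


def run_data_diff(cmd: List[str]) -> str:
    """Run data-diff and return its stdout, raising if it exits with an error."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"data-diff command did not exit with success. {result.stderr}")
    return result.stdout


if __name__ == "__main__":
    # Hypothetical connection string and table names; this only prints the command.
    example = build_data_diff_command(
        "postgresql://user:password@localhost:5432/mydb",
        "my_backup_schema.vacant_properties",
        "public.vacant_properties",
        key_column="opa_id",
        where="opa_id IS NOT NULL",
    )
    print(shlex.join(example))
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems with the WHERE expression, which contains spaces.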
@nlebovits is this something you can look at if Brandon can't? |
Implementation of issue #520. Backup of database, archiving and diff reporting.