-
Notifications
You must be signed in to change notification settings - Fork 15
Add primary key recommendation for tables with UNIQUE NOT NULL columns #2985
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
var issues []QueryIssue | ||
|
||
// Skip for partitioned or inherited tables, as PK rules differ | ||
shouldSkip := func(table string) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's add a TODO for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a ticket and TODO for this: https://yugabyte.atlassian.net/browse/DB-18077
return false | ||
} | ||
|
||
for table, uLists := range p.tableUniqueConstraints { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, ideally, we should be looking at unique indexes as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a TODO and created a ticket for this too for now since the refactor is a big change:
https://yugabyte.atlassian.net/browse/DB-18078
// Primary key columns by table (qualified table name) | ||
tablePrimaryKeys map[string][]string | ||
// Unique constraint columns by table (list of column sets) | ||
tableUniqueConstraints map[string][][]string | ||
// NOT NULL columns by table | ||
tableNotNullColumns map[string]map[string]bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, let's create a TableMetadata struct that stores all the relevant information for issue detection rather than creating a map for every use-case.
rough notes from discussion:
map[table_name]Table
struct Table:
hasPk : true/false
constraints: []
indexes: []
columns:
column1:
datatype: varchar
IsNotNull: true
constraints: []
getPkRecommendation:
for table in tables:
if not table.hasPk:
candidatePKs := []
for index in table.indexes:
// check if unique index
// check if all are not null.
*/
// type TableMetadata struct {
// HasPk bool
// PkColumnNames []string
// Columns []ColumnMetadata
// Constraints []ConstraintMetadata
// NotNullColumns []string
// }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done. I have consolidated a lot of variables in TableMetadata or am deriving them using info stored in TableMetadata
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
High level looks good! minor comments
@priyanshi-yb can you also take a look? I skipped the part in ddl_processor.go
|
||
// Recommend PK when UNIQUE + all NOT NULL but no PK exists | ||
MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL = "MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL" | ||
MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL_ISSUE_NAME = "Recommend adding primary key on unique NOT NULL column(s)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's call this "Missing Primary Key For Table". The issue name field should ideally descirbe the issue, not the recommendation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL_ISSUE_NAME = "Missing primary key for table when unique and not null columns exist"
Made this the issue name. Does this look fine?
// Recommend PK when UNIQUE + all NOT NULL but no PK exists | ||
MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL = "MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL" | ||
MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL_ISSUE_NAME = "Recommend adding primary key on unique NOT NULL column(s)" | ||
MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL_DESCRIPTION = "Table has a UNIQUE constraint on column(s) that are all NOT NULL but does not define a PRIMARY KEY. Consider creating a primary key on these column(s) to improve data integrity and performance." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's be more informative here:
(taken from https://docs.yugabyte.com/preview/api/ysql/the-sql-language/statements/ddl_create_table/#primary-key:~:text=a%20primary%20key-,PostgreSQL%27s%20table%20storage%20is%20heap%2Doriented%E2%80%94so%20a%20table%20with%20no,.,-Foreign%20key)
PostgreSQL's table storage is heap-oriented—so a table with no primary key is viable. However YugabyteDB's table storage is index-oriented (see DocDB Persistence), so a table isn't viable without a primary key.
Therefore, if you don't specify a primary key at table-creation time, YugabyteDB will use the internal ybrowid column as PRIMARY KEY and the table will be sharded on ybrowid HASH.
However, if there are candidate primary keys (unique keys + not NULL), it is recommended to define them as a Primary Key, so that a secondary index structure can be avoided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have written this since most of seemed to add to the description:
MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL_DESCRIPTION = "PostgreSQL's table storage is heap-oriented, however, YugabyteDB's table storage is index-oriented. So, a table without a primary key is viable in PostgreSQL but isn't in YugabyteDB. If you don't specify a primary key, YugabyteDB will use the internal ybrowid column as PRIMARY KEY and the table will be sharded on ybrowid HASH. However, if there are candidate primary keys (unique + NOT NULL), it is recommended to define them as a Primary Key to avoid secondary index structures."
Let me know if we should shorten this or remove some of it.
tableIndexes map[string][]*queryparser.Index | ||
// Table metadata consolidated into a single structure | ||
// Key is qualified table name (schema.table), value is TableMetadata | ||
tableMetadata map[string]*TableMetadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: tablesMetadata
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
// list of foreign key constraints in the exported schema | ||
foreignKeyConstraints []ForeignKeyConstraint | ||
// Helper methods for ParserIssueDetector to work with TableMetadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we put the struct methods below the struct definition just for better organization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I moved the ParserIssueDetector methods below the struct definition
// key is table name, value is map of column name to ColumnMetadata | ||
columnMetadata map[string]map[string]*ColumnMetadata | ||
// TableMetadata stores all relevant information for a table needed for issue detection | ||
type TableMetadata struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Looks much more organized now! cc: @priyanshi-yb
} | ||
} | ||
} | ||
if len(alter.DropNotNullColumns) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is just for safety, correct? ideally we wouldn't expect DROP statements in the pg_dump.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep this is what I was discussing with you regarding whether we should be parsing things at both TABLE and ALTER levels etc. We won't see them since these will be baked into the column definitions only in the dump
} | ||
if len(alter.DropNotNullColumns) > 0 { | ||
qualifiedTable := alter.GetObjectName() | ||
if tm := p.getTableMetadata(qualifiedTable); tm != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious: why getTableMetadata
, and not getOrCreateTableMetadata
here like in the other cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the DropNotNullColumns case, we only modify existing columns. So if the table doesn't exist, there's nothing to modify, so getTableMetadata felt more appropriate.
But yeah two things here. First, we should probably never encounter a case where the table metadata is not defined while reading alter statements since they are all after the table definitions in the dump.
And secondly even if this case happens maybe its better to track the columns which haven't been created in the metadata.
I have switched to using getOrCreateTableMetadata. Either way I don't think there will be much problem because of the first point.
for _, constraint := range table.Constraints { | ||
if constraint.ConstraintType != queryparser.FOREIGN_CONSTR_TYPE { | ||
continue | ||
switch constraint.ConstraintType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you could add helper method(s) to TableMetadata to addConstraint(type, name, columns)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah makes sense
return q.ObjectName + "|" + joined | ||
} | ||
|
||
// Helper function to sort QueryIssues for consistent comparison |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check out assert.ElementsMatch()
. I think that'd avoid having to do all of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think elements match works well on shallow objects. But in this we have nested lists etc like column options which will still require some sort of sorting to check properly.
I have simplified the checking logic a little more though
expected: nil, | ||
}, | ||
{ | ||
name: "PKREC: No recommendation if PK exists", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one more case comes to mind:
PK (a), unique(b)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with one minor comment!
Pls get the alter logic reviewed by Priyanshi
MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL_ISSUE_NAME = "Recommend adding primary key on unique NOT NULL column(s)" | ||
MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL_DESCRIPTION = "Table has a UNIQUE constraint on column(s) that are all NOT NULL but does not define a PRIMARY KEY. Consider creating a primary key on these column(s) to improve data integrity and performance." | ||
MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL_ISSUE_NAME = "Missing primary key for table when unique and not null columns exist" | ||
MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL_DESCRIPTION = "PostgreSQL's table storage is heap-oriented, however, YugabyteDB's table storage is index-oriented. So, a table without a primary key is viable in PostgreSQL but isn't in YugabyteDB. If you don't specify a primary key, YugabyteDB will use the internal ybrowid column as PRIMARY KEY and the table will be sharded on ybrowid HASH. However, if there are candidate primary keys (unique + NOT NULL), it is recommended to define them as a Primary Key to avoid secondary index structures." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe remove the Postgres-related stuff. Because our issue description should not assume the source DB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for ddl_processor changes
@@ -862,6 +886,9 @@ type AlterTable struct { | |||
IsDeferrable bool | |||
ConstraintColumns []string | |||
PartitionedChild string // In case this is a partitioned table | |||
// In case the ALTER TABLE contains multiple subcommands, collect all SET/DROP NOT NULL columns | |||
SetNotNullColumns string | |||
DropNotNullColumns string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: SetNotNullColumn
, DropNotNullColumn
Describe the changes in this pull request
This PR adds a new performance issue detection that recommends adding primary keys to tables with UNIQUE constraints on NOT NULL columns but no primary key defined.
The system now detects when a table has UNIQUE constraints on NOT NULL columns without a primary key and generates a performance recommendation to add a primary key on those columns.
Changes include:
Describe if there are any user-facing changes
New MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL performance issues will be reported in the assessment report.
How was this pull request tested?
Manually
Does your PR have changes in callhome/yugabyted payloads? If so, is the payload version incremented?
Yeah payload for the new issue:
{\"category\":\"performance_optimizations\",\"category_description\":\"Recommendations to source schema or queries to optimize performance on YugabyteDB.\",\"type\":\"MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL\",\"name\":\"Recommend adding primary key on unique NOT NULL column(s)\",\"impact\":\"LEVEL_1\",\"object_type\":\"TABLE\",\"object_name\":\"\",\"details\":{\"PrimaryKeyColumnOptions\":[[\"schema_c5fe28962969d6b5.table_686a32fed783ef0a.col_6f5d29e3c494e279\"]]}}
Does your PR have changes that can cause upgrade issues?