Skip to content

Conversation

ShivanshGahlot
Copy link
Collaborator

@ShivanshGahlot ShivanshGahlot commented Aug 21, 2025

Describe the changes in this pull request

This PR adds a new performance issue detection that recommends adding primary keys to tables with UNIQUE constraints on NOT NULL columns but no primary key defined.

The system now detects when a table has UNIQUE constraints on NOT NULL columns without a primary key and generates a performance recommendation to add a primary key on those columns.

Changes include:

  • Enhanced DDL parser to track NOT NULL constraints from CREATE TABLE and ALTER TABLE
  • Added logic to analyze UNIQUE constraints and their column nullability
  • Implemented PK recommendation engine for qualifying UNIQUE NOT NULL columns
  • Added comprehensive unit tests for various scenarios
image

Describe if there are any user-facing changes

New MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL performance issues will be reported in the assessment report.

How was this pull request tested?

Manually

Does your PR have changes in callhome/yugabyted payloads? If so, is the payload version incremented?

Yeah payload for the new issue:

{\"category\":\"performance_optimizations\",\"category_description\":\"Recommendations to source schema or queries to optimize performance on YugabyteDB.\",\"type\":\"MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL\",\"name\":\"Recommend adding primary key on unique NOT NULL column(s)\",\"impact\":\"LEVEL_1\",\"object_type\":\"TABLE\",\"object_name\":\"\",\"details\":{\"PrimaryKeyColumnOptions\":[[\"schema_c5fe28962969d6b5.table_686a32fed783ef0a.col_6f5d29e3c494e279\"]]}}

Does your PR have changes that can cause upgrade issues?

Component Breaking changes?
MetaDB No
Name registry json No
Data File Descriptor Json No
Export Snapshot Status Json No
Import Data State No
Export Status Json No
Data .sql files of tables No
Export and import data queue No
Schema Dump No
AssessmentDB No
Sizing DB No
Migration Assessment Report Json No
Callhome Json No
YugabyteD Tables No
TargetDB Metadata Tables No

@ShivanshGahlot ShivanshGahlot changed the title Shivansh/pk if unique key Add primary key recommendation for tables with UNIQUE NOT NULL columns Aug 21, 2025
@ShivanshGahlot ShivanshGahlot marked this pull request as ready for review August 22, 2025 11:22
var issues []QueryIssue

// Skip for partitioned or inherited tables, as PK rules differ
shouldSkip := func(table string) bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add a TODO for this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a ticket and TODO for this: https://yugabyte.atlassian.net/browse/DB-18077

return false
}

for table, uLists := range p.tableUniqueConstraints {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, ideally, we should be looking at unique indexes as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a TODO and created a ticket for this too for now since the refactor is a big change:
https://yugabyte.atlassian.net/browse/DB-18078

Comment on lines 135 to 140
// Primary key columns by table (qualified table name)
tablePrimaryKeys map[string][]string
// Unique constraint columns by table (list of column sets)
tableUniqueConstraints map[string][][]string
// NOT NULL columns by table
tableNotNullColumns map[string]map[string]bool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, let's create a TableMetadata struct that stores all the relevant information for issue detection rather than creating a map for every use-case.

rough notes from discussion:

map[table_name]Table

struct Table:
	hasPk : true/false
	constraints: []
	indexes: []
	columns: 
		column1:
			datatype: varchar
			IsNotNull: true
	constraints: []

getPkRecommendation:
	for table in tables:
		if not table.hasPk:
			candidatePKs := []
			for index in table.indexes:
				// check if unique index
				// check if all are  not null. 

*/
// type TableMetadata struct {
// 	HasPk bool
// 	PkColumnNames []string
// 	Columns []ColumnMetadata
// 	Constraints []ConstraintMetadata
// 	NotNullColumns []string
// }

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done. I have consolidated a lot of variables in TableMetadata or am deriving them using info stored in TableMetadata

Copy link
Collaborator

@makalaaneesh makalaaneesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High level looks good! minor comments
@priyanshi-yb can you also take a look? I skipped the part in ddl_processor.go


// Recommend PK when UNIQUE + all NOT NULL but no PK exists
MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL = "MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL"
MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL_ISSUE_NAME = "Recommend adding primary key on unique NOT NULL column(s)"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call this "Missing Primary Key For Table". The issue name field should ideally descirbe the issue, not the recommendation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL_ISSUE_NAME = "Missing primary key for table when unique and not null columns exist"

Made this the issue name. Does this look fine?

// Recommend PK when UNIQUE + all NOT NULL but no PK exists
MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL = "MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL"
MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL_ISSUE_NAME = "Recommend adding primary key on unique NOT NULL column(s)"
MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL_DESCRIPTION = "Table has a UNIQUE constraint on column(s) that are all NOT NULL but does not define a PRIMARY KEY. Consider creating a primary key on these column(s) to improve data integrity and performance."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's be more informative here:
(taken from https://docs.yugabyte.com/preview/api/ysql/the-sql-language/statements/ddl_create_table/#primary-key:~:text=a%20primary%20key-,PostgreSQL%27s%20table%20storage%20is%20heap%2Doriented%E2%80%94so%20a%20table%20with%20no,.,-Foreign%20key)

PostgreSQL's table storage is heap-oriented—so a table with no primary key is viable. However YugabyteDB's table storage is index-oriented (see DocDB Persistence), so a table isn't viable without a primary key.

Therefore, if you don't specify a primary key at table-creation time, YugabyteDB will use the internal ybrowid column as PRIMARY KEY and the table will be sharded on ybrowid HASH.
However, if there are candidate primary keys (unique keys + not NULL), it is recommended to define them as a Primary Key, so that a secondary index structure can be avoided.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have written this since most of seemed to add to the description:

MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL_DESCRIPTION = "PostgreSQL's table storage is heap-oriented, however, YugabyteDB's table storage is index-oriented. So, a table without a primary key is viable in PostgreSQL but isn't in YugabyteDB. If you don't specify a primary key, YugabyteDB will use the internal ybrowid column as PRIMARY KEY and the table will be sharded on ybrowid HASH. However, if there are candidate primary keys (unique + NOT NULL), it is recommended to define them as a Primary Key to avoid secondary index structures."

Let me know if we should shorten this or remove some of it.

tableIndexes map[string][]*queryparser.Index
// Table metadata consolidated into a single structure
// Key is qualified table name (schema.table), value is TableMetadata
tableMetadata map[string]*TableMetadata
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: tablesMetadata

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


// list of foreign key constraints in the exported schema
foreignKeyConstraints []ForeignKeyConstraint
// Helper methods for ParserIssueDetector to work with TableMetadata
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we put the struct methods below the struct definition just for better organization?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I moved the ParserIssueDetector methods below the struct definition

// key is table name, value is map of column name to ColumnMetadata
columnMetadata map[string]map[string]*ColumnMetadata
// TableMetadata stores all relevant information for a table needed for issue detection
type TableMetadata struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks much more organized now! cc: @priyanshi-yb

}
}
}
if len(alter.DropNotNullColumns) > 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is just for safety, correct? ideally we wouldn't expect DROP statements in the pg_dump.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep this is what I was discussing with you regarding whether we should be parsing things at both TABLE and ALTER levels etc. We won't see them since these will be baked into the column definitions only in the dump

}
if len(alter.DropNotNullColumns) > 0 {
qualifiedTable := alter.GetObjectName()
if tm := p.getTableMetadata(qualifiedTable); tm != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious: why getTableMetadata, and not getOrCreateTableMetadata here like in the other cases?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the DropNotNullColumns case, we only modify existing columns. So if the table doesn't exist, there's nothing to modify, so getTableMetadata felt more appropriate.

But yeah two things here. First, we should probably never encounter a case where the table metadata is not defined while reading alter statements since they are all after the table definitions in the dump.
And secondly even if this case happens maybe its better to track the columns which haven't been created in the metadata.

I have switched to using getOrCreateTableMetadata. Either way I don't think there will be much problem because of the first point.

for _, constraint := range table.Constraints {
if constraint.ConstraintType != queryparser.FOREIGN_CONSTR_TYPE {
continue
switch constraint.ConstraintType {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you could add helper method(s) to TableMetadata to addConstraint(type, name, columns)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah makes sense

return q.ObjectName + "|" + joined
}

// Helper function to sort QueryIssues for consistent comparison
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check out assert.ElementsMatch() . I think that'd avoid having to do all of this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think elements match works well on shallow objects. But in this we have nested lists etc like column options which will still require some sort of sorting to check properly.

I have simplified the checking logic a little more though

expected: nil,
},
{
name: "PKREC: No recommendation if PK exists",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one more case comes to mind:
PK (a), unique(b)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

Copy link
Collaborator

@makalaaneesh makalaaneesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one minor comment!
Pls get the alter logic reviewed by Priyanshi

MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL_ISSUE_NAME = "Recommend adding primary key on unique NOT NULL column(s)"
MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL_DESCRIPTION = "Table has a UNIQUE constraint on column(s) that are all NOT NULL but does not define a PRIMARY KEY. Consider creating a primary key on these column(s) to improve data integrity and performance."
MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL_ISSUE_NAME = "Missing primary key for table when unique and not null columns exist"
MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL_DESCRIPTION = "PostgreSQL's table storage is heap-oriented, however, YugabyteDB's table storage is index-oriented. So, a table without a primary key is viable in PostgreSQL but isn't in YugabyteDB. If you don't specify a primary key, YugabyteDB will use the internal ybrowid column as PRIMARY KEY and the table will be sharded on ybrowid HASH. However, if there are candidate primary keys (unique + NOT NULL), it is recommended to define them as a Primary Key to avoid secondary index structures."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe remove the Postgres-related stuff. Because our issue description should not assume the source DB

Copy link
Contributor

@priyanshi-yb priyanshi-yb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for ddl_processor changes

@@ -862,6 +886,9 @@ type AlterTable struct {
IsDeferrable bool
ConstraintColumns []string
PartitionedChild string // In case this is a partitioned table
// In case the ALTER TABLE contains multiple subcommands, collect all SET/DROP NOT NULL columns
SetNotNullColumns string
DropNotNullColumns string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: SetNotNullColumn, DropNotNullColumn

@ShivanshGahlot ShivanshGahlot merged commit 96a2804 into main Aug 28, 2025
78 checks passed
@ShivanshGahlot ShivanshGahlot deleted the shivansh/pk-if-unique-key branch August 28, 2025 13:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants