Add primary key recommendation for tables with UNIQUE NOT NULL columns #2985

ShivanshGahlot · 2025-08-21T11:24:41Z

Describe the changes in this pull request

This PR adds a new performance issue detection. When a table has a UNIQUE constraint whose columns are all NOT NULL but defines no primary key, the assessment now generates a performance recommendation to add a primary key on those columns.

Changes include:

Enhanced DDL parser to track NOT NULL constraints from CREATE TABLE and ALTER TABLE
Added logic to analyze UNIQUE constraints and their column nullability
Implemented PK recommendation engine for qualifying UNIQUE NOT NULL columns
Added comprehensive unit tests for various scenarios

Describe if there are any user-facing changes

New MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL performance issues will be reported in the assessment report.

How was this pull request tested?

Manually

Does your PR have changes in callhome/yugabyted payloads? If so, is the payload version incremented?

Yeah payload for the new issue:

{"category":"performance_optimizations","category_description":"Recommendations to source schema or queries to optimize performance on YugabyteDB.","type":"MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL","name":"Recommend adding primary key on unique NOT NULL column(s)","impact":"LEVEL_1","object_type":"TABLE","object_name":"","details":{"PrimaryKeyColumnOptions":[["schema_c5fe28962969d6b5.table_686a32fed783ef0a.col_6f5d29e3c494e279"]]}}

Does your PR have changes that can cause upgrade issues?

Component	Breaking changes?
MetaDB	No
Name registry json	No
Data File Descriptor Json	No
Export Snapshot Status Json	No
Import Data State	No
Export Status Json	No
Data .sql files of tables	No
Export and import data queue	No
Schema Dump	No
AssessmentDB	No
Sizing DB	No
Migration Assessment Report Json	No
Callhome Json	No
YugabyteD Tables	No
TargetDB Metadata Tables	No

makalaaneesh · 2025-08-26T08:48:51Z

yb-voyager/src/query/queryissue/parser_issue_detector.go

+	var issues []QueryIssue
+
+	// Skip for partitioned or inherited tables, as PK rules differ
+	shouldSkip := func(table string) bool {


let's add a TODO for this.

Added a ticket and TODO for this: https://yugabyte.atlassian.net/browse/DB-18077

makalaaneesh · 2025-08-26T08:49:06Z

yb-voyager/src/query/queryissue/parser_issue_detector.go

+		return false
+	}
+
+	for table, uLists := range p.tableUniqueConstraints {


As discussed, ideally, we should be looking at unique indexes as well.

Added a TODO and created a ticket for this too for now since the refactor is a big change:
https://yugabyte.atlassian.net/browse/DB-18078

makalaaneesh · 2025-08-26T08:50:42Z

yb-voyager/src/query/queryissue/parser_issue_detector.go

+	// Primary key columns by table (qualified table name)
+	tablePrimaryKeys map[string][]string
+	// Unique constraint columns by table (list of column sets)
+	tableUniqueConstraints map[string][][]string
+	// NOT NULL columns by table
+	tableNotNullColumns map[string]map[string]bool


As discussed, let's create a TableMetadata struct that stores all the relevant information for issue detection rather than creating a map for every use-case.

rough notes from discussion:

    map[table_name]Table

    struct Table:
        hasPk: true/false
        constraints: []
        indexes: []
        columns:
            column1:
                datatype: varchar
                IsNotNull: true
                constraints: []

    getPkRecommendation:
        for table in tables:
            if not table.hasPk:
                candidatePKs := []
                for index in table.indexes:
                    // check if unique index
                    // check if all are not null.

    // type TableMetadata struct {
    //     HasPk          bool
    //     PkColumnNames  []string
    //     Columns        []ColumnMetadata
    //     Constraints    []ConstraintMetadata
    //     NotNullColumns []string
    // }

This is done. I have consolidated a lot of variables in TableMetadata or am deriving them using info stored in TableMetadata
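
For readers following the thread, here is a minimal sketch of how the rough notes above could translate into Go. It is not the code merged in this PR: the type shapes, the string constraint types ("PRIMARY", "UNIQUE"), and the method names hasPK, allNotNull, and pkCandidates are all assumptions made for illustration. It also looks only at UNIQUE constraints, not unique indexes.

```go
package main

import "fmt"

// ConstraintMetadata is a simplified, hypothetical constraint record.
type ConstraintMetadata struct {
	Type    string // illustrative values: "PRIMARY", "UNIQUE", "FOREIGN"
	Name    string
	Columns []string
}

// ColumnMetadata is a simplified, hypothetical column record.
type ColumnMetadata struct {
	DataType  string
	IsNotNull bool
}

// TableMetadata groups what the detector needs to know about one table.
type TableMetadata struct {
	Columns     map[string]*ColumnMetadata
	Constraints []ConstraintMetadata
}

// hasPK reports whether a PRIMARY constraint is recorded for the table.
func (tm *TableMetadata) hasPK() bool {
	for _, c := range tm.Constraints {
		if c.Type == "PRIMARY" {
			return true
		}
	}
	return false
}

// allNotNull reports whether every listed column is known and NOT NULL.
func (tm *TableMetadata) allNotNull(cols []string) bool {
	for _, name := range cols {
		col, ok := tm.Columns[name]
		if !ok || !col.IsNotNull {
			return false
		}
	}
	return true
}

// pkCandidates returns every UNIQUE column set whose columns are all
// NOT NULL, or nil when the table already has a primary key.
func (tm *TableMetadata) pkCandidates() [][]string {
	if tm.hasPK() {
		return nil
	}
	var out [][]string
	for _, c := range tm.Constraints {
		if c.Type == "UNIQUE" && len(c.Columns) > 0 && tm.allNotNull(c.Columns) {
			out = append(out, c.Columns)
		}
	}
	return out
}

func main() {
	users := &TableMetadata{
		Columns: map[string]*ColumnMetadata{
			"email": {DataType: "varchar", IsNotNull: true},
			"nick":  {DataType: "varchar", IsNotNull: false},
		},
		Constraints: []ConstraintMetadata{
			{Type: "UNIQUE", Name: "users_email_key", Columns: []string{"email"}},
			{Type: "UNIQUE", Name: "users_nick_key", Columns: []string{"nick"}},
		},
	}
	// Only the UNIQUE constraint on the NOT NULL column qualifies.
	fmt.Println(users.pkCandidates())
}
```

Keeping the nullability check on the column and the uniqueness check on the constraint means a single pass over one struct replaces lookups across three separate maps.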

makalaaneesh

High level looks good! minor comments
@priyanshi-yb can you also take a look? I skipped the part in ddl_processor.go

makalaaneesh · 2025-08-27T17:10:14Z

yb-voyager/src/query/queryissue/constants.go

+
+	// Recommend PK when UNIQUE + all NOT NULL but no PK exists
+	MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL             = "MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL"
+	MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL_ISSUE_NAME  = "Recommend adding primary key on unique NOT NULL column(s)"


Let's call this "Missing Primary Key For Table". The issue name field should ideally describe the issue, not the recommendation.

MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL_ISSUE_NAME = "Missing primary key for table when unique and not null columns exist"

Made this the issue name. Does this look fine?

makalaaneesh · 2025-08-27T17:15:54Z

yb-voyager/src/query/queryissue/constants.go

+	// Recommend PK when UNIQUE + all NOT NULL but no PK exists
+	MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL             = "MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL"
+	MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL_ISSUE_NAME  = "Recommend adding primary key on unique NOT NULL column(s)"
+	MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL_DESCRIPTION = "Table has a UNIQUE constraint on column(s) that are all NOT NULL but does not define a PRIMARY KEY. Consider creating a primary key on these column(s) to improve data integrity and performance."


Let's be more informative here:
(taken from https://docs.yugabyte.com/preview/api/ysql/the-sql-language/statements/ddl_create_table/#primary-key:~:text=a%20primary%20key-,PostgreSQL%27s%20table%20storage%20is%20heap%2Doriented%E2%80%94so%20a%20table%20with%20no,.,-Foreign%20key)

PostgreSQL's table storage is heap-oriented—so a table with no primary key is viable. However YugabyteDB's table storage is index-oriented (see DocDB Persistence), so a table isn't viable without a primary key.

Therefore, if you don't specify a primary key at table-creation time, YugabyteDB will use the internal ybrowid column as PRIMARY KEY and the table will be sharded on ybrowid HASH.
However, if there are candidate primary keys (unique keys + not NULL), it is recommended to define them as a Primary Key, so that a secondary index structure can be avoided.

I have written this, since most of it seemed worth adding to the description:

MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL_DESCRIPTION = "PostgreSQL's table storage is heap-oriented, however, YugabyteDB's table storage is index-oriented. So, a table without a primary key is viable in PostgreSQL but isn't in YugabyteDB. If you don't specify a primary key, YugabyteDB will use the internal ybrowid column as PRIMARY KEY and the table will be sharded on ybrowid HASH. However, if there are candidate primary keys (unique + NOT NULL), it is recommended to define them as a Primary Key to avoid secondary index structures."

Let me know if we should shorten this or remove some of it.

makalaaneesh · 2025-08-27T17:19:27Z

yb-voyager/src/query/queryissue/parser_issue_detector.go

-	tableIndexes map[string][]*queryparser.Index
+	// Table metadata consolidated into a single structure
+	// Key is qualified table name (schema.table), value is TableMetadata
+	tableMetadata map[string]*TableMetadata


nit: tablesMetadata

makalaaneesh · 2025-08-27T17:20:22Z

yb-voyager/src/query/queryissue/parser_issue_detector.go


-	// list of foreign key constraints in the exported schema
-	foreignKeyConstraints []ForeignKeyConstraint
+// Helper methods for ParserIssueDetector to work with TableMetadata


nit: can we put the struct methods below the struct definition just for better organization?

Yeah I moved the ParserIssueDetector methods below the struct definition

makalaaneesh · 2025-08-27T17:20:41Z

yb-voyager/src/query/queryissue/parser_issue_detector.go

-	// key is table name, value is map of column name to ColumnMetadata
-	columnMetadata map[string]map[string]*ColumnMetadata
+// TableMetadata stores all relevant information for a table needed for issue detection
+type TableMetadata struct {


Thanks! Looks much more organized now! cc: @priyanshi-yb

makalaaneesh · 2025-08-28T05:07:59Z

yb-voyager/src/query/queryissue/parser_issue_detector.go

+				}
+			}
+		}
+		if len(alter.DropNotNullColumns) > 0 {


this is just for safety, correct? ideally we wouldn't expect DROP statements in the pg_dump.

Yep this is what I was discussing with you regarding whether we should be parsing things at both TABLE and ALTER levels etc. We won't see them since these will be baked into the column definitions only in the dump

makalaaneesh · 2025-08-28T05:08:32Z

yb-voyager/src/query/queryissue/parser_issue_detector.go

+		}
+		if len(alter.DropNotNullColumns) > 0 {
+			qualifiedTable := alter.GetObjectName()
+			if tm := p.getTableMetadata(qualifiedTable); tm != nil {


curious: why getTableMetadata, and not getOrCreateTableMetadata here like in the other cases?

In the DropNotNullColumns case, we only modify existing columns. So if the table doesn't exist, there's nothing to modify, so getTableMetadata felt more appropriate.

But yeah two things here. First, we should probably never encounter a case where the table metadata is not defined while reading alter statements since they are all after the table definitions in the dump.
And secondly, even if this case happens, maybe it's better to track the columns which haven't been created in the metadata.

I have switched to using getOrCreateTableMetadata. Either way I don't think there will be much problem because of the first point.
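
To make the difference between the two lookups concrete, here is a small sketch. The detector type, the NotNull map, and dropNotNull are hypothetical stand-ins rather than the PR's actual definitions; only the two accessor names come from the discussion above.

```go
package main

import "fmt"

// TableMetadata is a pared-down, hypothetical stand-in for the PR's struct.
type TableMetadata struct {
	NotNull map[string]bool // column name -> NOT NULL flag
}

// detector is a hypothetical stand-in for ParserIssueDetector.
type detector struct {
	tablesMetadata map[string]*TableMetadata
}

// getTableMetadata returns nil when the table has not been seen yet.
func (d *detector) getTableMetadata(table string) *TableMetadata {
	return d.tablesMetadata[table]
}

// getOrCreateTableMetadata registers the table on first use, so callers
// never have to handle a nil result.
func (d *detector) getOrCreateTableMetadata(table string) *TableMetadata {
	if tm, ok := d.tablesMetadata[table]; ok {
		return tm
	}
	tm := &TableMetadata{NotNull: map[string]bool{}}
	d.tablesMetadata[table] = tm
	return tm
}

// dropNotNull records a DROP NOT NULL even if the table was not seen before.
func (d *detector) dropNotNull(table, column string) {
	d.getOrCreateTableMetadata(table).NotNull[column] = false
}

func main() {
	d := &detector{tablesMetadata: map[string]*TableMetadata{}}
	fmt.Println(d.getTableMetadata("public.t") == nil) // table unknown so far
	d.dropNotNull("public.t", "c")
	fmt.Println(d.getTableMetadata("public.t").NotNull)
}
```

With the get-or-create variant, an ALTER that arrives before its CREATE TABLE is still recorded instead of being silently skipped, which matches the second point raised above.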

makalaaneesh · 2025-08-28T05:09:44Z

yb-voyager/src/query/queryissue/parser_issue_detector.go

 		for _, constraint := range table.Constraints {
-			if constraint.ConstraintType != queryparser.FOREIGN_CONSTR_TYPE {
-				continue
+			switch constraint.ConstraintType {


nit: you could add helper method(s) to TableMetadata to addConstraint(type, name, columns)

Yeah makes sense
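
One possible shape for such a helper is sketched below. The struct fields, the string constraint types, and the constraintsOfType accessor are assumptions for illustration; the parameter is named ctype because type is a reserved word in Go.

```go
package main

import "fmt"

// ConstraintMetadata is a simplified, hypothetical constraint record.
type ConstraintMetadata struct {
	Type    string
	Name    string
	Columns []string
}

// TableMetadata is a pared-down, hypothetical stand-in for the PR's struct.
type TableMetadata struct {
	Constraints []ConstraintMetadata
}

// addConstraint appends a constraint. The column slice is copied so that
// later changes by the caller do not alias the stored metadata.
func (tm *TableMetadata) addConstraint(ctype, name string, columns []string) {
	cols := append([]string(nil), columns...)
	tm.Constraints = append(tm.Constraints, ConstraintMetadata{
		Type:    ctype,
		Name:    name,
		Columns: cols,
	})
}

// constraintsOfType returns the recorded constraints of the given type.
func (tm *TableMetadata) constraintsOfType(ctype string) []ConstraintMetadata {
	var out []ConstraintMetadata
	for _, c := range tm.Constraints {
		if c.Type == ctype {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	tm := &TableMetadata{}
	tm.addConstraint("UNIQUE", "t_a_key", []string{"a"})
	tm.addConstraint("FOREIGN", "t_b_fkey", []string{"b"})
	fmt.Println(len(tm.constraintsOfType("UNIQUE")))
}
```

Routing every write through one method keeps the switch on constraint type in the caller short and gives a single place to add validation later.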

makalaaneesh · 2025-08-28T05:13:42Z

yb-voyager/src/query/queryissue/parser_issue_detector_test.go

+		return q.ObjectName + "|" + joined
+	}
+
+	// Helper function to sort QueryIssues for consistent comparison


check out assert.ElementsMatch() . I think that'd avoid having to do all of this.

I think elements match works well on shallow objects. But in this we have nested lists etc like column options which will still require some sort of sorting to check properly.

I have simplified the checking logic a little more though
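
The point about nested lists can be illustrated with a standard-library sketch: an order-insensitive match on the top-level slice still compares each element deeply, so the inner list of column options has to be normalized first. The issue type and helper names here are hypothetical, and the sketch assumes the order of options is irrelevant while the column order inside each option is significant.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
	"strings"
)

// issue is a simplified, hypothetical stand-in for QueryIssue.
type issue struct {
	ObjectName    string
	ColumnOptions [][]string
}

// key builds a stable sort key from the object name and its options.
func key(q issue) string {
	parts := make([]string, len(q.ColumnOptions))
	for i, o := range q.ColumnOptions {
		parts[i] = strings.Join(o, ",")
	}
	return q.ObjectName + "|" + strings.Join(parts, ";")
}

// normalize returns a copy in which each issue's options are sorted and the
// issues themselves are sorted, leaving the input untouched.
func normalize(in []issue) []issue {
	out := make([]issue, len(in))
	for i, q := range in {
		opts := make([][]string, len(q.ColumnOptions))
		for j, o := range q.ColumnOptions {
			opts[j] = append([]string(nil), o...)
		}
		sort.Slice(opts, func(a, b int) bool {
			return strings.Join(opts[a], ",") < strings.Join(opts[b], ",")
		})
		out[i] = issue{ObjectName: q.ObjectName, ColumnOptions: opts}
	}
	sort.Slice(out, func(a, b int) bool { return key(out[a]) < key(out[b]) })
	return out
}

// sameIssues compares two issue lists, ignoring issue and option order.
func sameIssues(a, b []issue) bool {
	return reflect.DeepEqual(normalize(a), normalize(b))
}

func main() {
	a := []issue{{ObjectName: "t1", ColumnOptions: [][]string{{"x"}, {"y", "z"}}}}
	b := []issue{{ObjectName: "t1", ColumnOptions: [][]string{{"y", "z"}, {"x"}}}}
	fmt.Println(sameIssues(a, b))
}
```

After normalization a plain deep-equality check is enough, so the test body itself stays a single comparison.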

makalaaneesh · 2025-08-28T05:14:55Z

yb-voyager/src/query/queryissue/parser_issue_detector_test.go

+			expected: nil,
+		},
+		{
+			name:     "PKREC: No recommendation if PK exists",


one more case comes to mind:
PK (a), unique(b)

makalaaneesh

LGTM with one minor comment!
Pls get the alter logic reviewed by Priyanshi

makalaaneesh · 2025-08-28T10:09:39Z

yb-voyager/src/query/queryissue/constants.go

-	MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL_ISSUE_NAME  = "Recommend adding primary key on unique NOT NULL column(s)"
-	MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL_DESCRIPTION = "Table has a UNIQUE constraint on column(s) that are all NOT NULL but does not define a PRIMARY KEY. Consider creating a primary key on these column(s) to improve data integrity and performance."
+	MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL_ISSUE_NAME  = "Missing primary key for table when unique and not null columns exist"
+	MISSING_PRIMARY_KEY_WHEN_UNIQUE_NOT_NULL_DESCRIPTION = "PostgreSQL's table storage is heap-oriented, however, YugabyteDB's table storage is index-oriented. So, a table without a primary key is viable in PostgreSQL but isn't in YugabyteDB. If you don't specify a primary key, YugabyteDB will use the internal ybrowid column as PRIMARY KEY and the table will be sharded on ybrowid HASH. However, if there are candidate primary keys (unique + NOT NULL), it is recommended to define them as a Primary Key to avoid secondary index structures."


Maybe remove the Postgres-related stuff. Because our issue description should not assume the source DB

priyanshi-yb

LGTM for ddl_processor changes

priyanshi-yb · 2025-08-28T12:46:48Z

yb-voyager/src/query/queryparser/ddl_processor.go

@@ -862,6 +886,9 @@ type AlterTable struct {
 	IsDeferrable                bool
 	ConstraintColumns           []string
 	PartitionedChild            string // In case this is a partitioned table
+	// In case the ALTER TABLE contains multiple subcommands, collect all SET/DROP NOT NULL columns
+	SetNotNullColumns  string
+	DropNotNullColumns string


nit: SetNotNullColumn, DropNotNullColumn

ShivanshGahlot added 3 commits August 19, 2025 10:17

Added detection of Unique not null keys without primary key initital

7497aa3

Updated code to report all possible options

ead4e72

Added a few comments

fe410bf

ShivanshGahlot changed the title ~~Shivansh/pk if unique key~~ Add primary key recommendation for tables with UNIQUE NOT NULL columns Aug 21, 2025

Processing callhome data

d8701e6

ShivanshGahlot marked this pull request as ready for review August 22, 2025 11:22

ShivanshGahlot requested review from sanyamsinghal, priyanshi-yb and makalaaneesh and removed request for sanyamsinghal and priyanshi-yb August 22, 2025 11:22

Removed debug output from tests

c1e9e8a

makalaaneesh reviewed Aug 26, 2025

ShivanshGahlot added 2 commits August 26, 2025 13:42

Refactored the code to use tableMetadata

e29cd31

Constraint slice stores all information about FK,PK etc

41820f5

ShivanshGahlot requested a review from makalaaneesh August 26, 2025 15:16

Added todos

03c7dd1

makalaaneesh reviewed Aug 28, 2025

Made suggested changes

0771295

makalaaneesh approved these changes Aug 28, 2025

ShivanshGahlot added 2 commits August 28, 2025 11:01

Made suggested changes

50546b4

Made suggested changes

2896603

priyanshi-yb reviewed Aug 28, 2025

Made suggested changes

304fc53

ShivanshGahlot merged commit 96a2804 into main Aug 28, 2025
78 checks passed

ShivanshGahlot deleted the shivansh/pk-if-unique-key branch August 28, 2025 13:28
