BrennanColberg/semantic-diff

semantic-diff

BrennanColberg/semantic-diff is a library designed to describe, as accurately and humanly as possible, the difference between two strings. It was initially developed as a core logic engine for Quotid, the world's best app for long-form memorization, where a key feature is showing users precisely which mistakes they made when reciting a passage.

This ability comes naturally to humans ("oh, you moved X to Y, and said Z instead of W!"), but is surprisingly difficult to algorithmically identify. Even LLMs struggle to intuit which mistakes/differences were made (as of writing in Q2 2025; eventually they will likely solve this with high reliability).

To this end, the package has five important parts:

  1. a system of abstraction to represent semantic diffs appropriately (types)
  2. a harness/protocol for semantic diff generation in other programs (harness)
  3. a test suite that captures many humanly-obvious mistake patterns (tests)
  4. a deterministic implementation that tries to meet the tests (algorithm)
  5. [TODO] an LLM prompt/harness for hooking OpenRouter models up as differs (prompt)
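
To make the first part more concrete, here is a minimal sketch of what a semantic-diff abstraction could look like. It is not the package's actual API: `Change`, `SemanticDiff`, `describeChange`, and every field name below are hypothetical, chosen only to mirror the kinds of mistakes mentioned above (moving X to Y, saying Z instead of W).

```typescript
// Hypothetical types for illustration; none of these names are taken
// from the semantic-diff package itself.
type Change =
  | { kind: "substitute"; expected: string; actual: string }
  | { kind: "omit"; expected: string }
  | { kind: "insert"; actual: string }
  | { kind: "move"; text: string; fromIndex: number; toIndex: number };

// A semantic diff is modeled here as an ordered list of changes.
type SemanticDiff = Change[];

// Render one change the way a human might say it out loud.
function describeChange(change: Change): string {
  switch (change.kind) {
    case "substitute":
      return `said "${change.actual}" instead of "${change.expected}"`;
    case "omit":
      return `left out "${change.expected}"`;
    case "insert":
      return `added "${change.actual}"`;
    case "move":
      return `moved "${change.text}" from word ${change.fromIndex} to word ${change.toIndex}`;
  }
}

// Join the descriptions of every change in a diff into one sentence.
function describeDiff(diff: SemanticDiff): string {
  return diff.map(describeChange).join(", and ");
}
```

A tagged union like this keeps each kind of mistake explicit, so code that consumes a diff has to handle moves and substitutions as distinct cases rather than as pairs of unrelated insertions and deletions.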

As of writing, the deterministic system still performs better than LLMs, and is obviously much lower-latency and lower-cost. To reiterate from above, though, I expect that LLMs will eventually "solve" this problem. Since this package is designed with solutions in mind, it supports both approaches, which will (1) make it possible to verify when LLMs get good at this, and (2) enable an elegant switchover once that happens.
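
One way to picture the "verify, then switch over" idea is a shared test suite that scores any differ, deterministic or LLM-backed, through the same interface. The sketch below rests on assumptions: `Differ`, `TestCase`, `passRate`, and `pickDiffer` are hypothetical names, a diff is treated as an opaque list of records, and results are compared by JSON equality, which may not match how the real tests judge correctness.

```typescript
// Hypothetical harness sketch; not the package's actual protocol.
// A diff is an opaque list of change records.
type Diff = ReadonlyArray<Record<string, unknown>>;

// Async so that an LLM-backed differ (a network call) and a local
// deterministic differ can share one signature.
type Differ = (expected: string, actual: string) => Promise<Diff>;

interface TestCase {
  expected: string; // the passage as written
  actual: string;   // what the user recited
  want: Diff;       // the humanly-obvious diff
}

// Fraction of cases where the differ reproduces the wanted diff exactly.
// JSON equality is an assumption and is sensitive to key order.
async function passRate(differ: Differ, cases: TestCase[]): Promise<number> {
  let passed = 0;
  for (const c of cases) {
    const got = await differ(c.expected, c.actual);
    if (JSON.stringify(got) === JSON.stringify(c.want)) passed++;
  }
  return cases.length === 0 ? 0 : passed / cases.length;
}

// Return the name of the best-scoring differ. Ties go to the earlier
// entry, so listing the deterministic differ first keeps it the default
// until another candidate strictly beats it.
async function pickDiffer(
  candidates: Array<[string, Differ]>,
  cases: TestCase[],
): Promise<string> {
  let best = { name: "", rate: -1 };
  for (const [name, differ] of candidates) {
    const rate = await passRate(differ, cases);
    if (rate > best.rate) best = { name, rate };
  }
  return best.name;
}
```

Under these assumptions, switching over is just a matter of which name `pickDiffer` returns once an LLM-backed candidate outscores the deterministic one on the same cases.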
