A few days ago, Meta released a novel architecture they call Large Concept Models (LCM). Unlike typical Transformer-based models, LCM leverages only a small portion of the Transformer machinery. Everything else in LCM is a fresh twist on how we usually think about language modeling, focusing on sentence-level embeddings in a hyperbolic representation space.
This README will walk you through:
- What LCM is
- How it differs from Transformers
- Key takeaways from the original paper
- Experiments and enhancements
- Notable results
Table of Contents
- Introduction
- Understanding the LCM Architecture
- Why It’s Different from Transformers
- Enhancing the Baseline
- Taking It to the Next Level
- Benchmark Results
- Conclusions and Future Work
- How to Cite
- License
Large Concept Models represent an experimental approach to language modeling in a sentence representation space, as opposed to the more common token-level (word- or subword-level) approaches we see in Transformers. Instead of dealing with tokens step-by-step, LCM can take an entire sentence, embed it, perform all of its transformations in a hyperbolic latent space, then map it back out to a sentence embedding.
In short, LCM is a big leap in how we conceptualize language modeling:
- It’s not purely Transformer-based.
- It auto-regressively processes sentence embeddings, not word tokens.
- Its hidden dimension can be treated as a probabilistic “box”, allowing for unique geometric transformations within the architecture.
LCM doesn’t operate on tokens the way Transformers do. Instead, each sentence is represented by a single embedding vector. For instance, you can feed in embeddings derived from GloVe (like glove.6B.300d) or other sentence-level embeddings; a minimal sketch of this preparation step follows the list below. These embeddings serve as the inputs to LCM.
- Sentence Embedding – Each input sentence is mapped to a high-dimensional embedding vector.
- No Standard Tokenization – LCM relies on entire sentence embeddings, skipping the usual subword or token-based approach.
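As a rough illustration (not the repo’s actual preprocessing), here is one way such single-vector inputs could be prepared: average glove.6B.300d word vectors per sentence. The file name, averaging scheme, and example sentences are placeholders, not details from the paper.

```python
import numpy as np

def load_glove(path="glove.6B.300d.txt"):
    """Load GloVe word vectors into a dict: token -> 300-d numpy array."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            token, *values = line.rstrip().split(" ")
            vectors[token] = np.asarray(values, dtype=np.float32)
    return vectors

def sentence_embedding(sentence, vectors, dim=300):
    """Average a sentence's word vectors into one fixed-size sentence vector."""
    hits = [vectors[t] for t in sentence.lower().split() if t in vectors]
    return np.mean(hits, axis=0) if hits else np.zeros(dim, dtype=np.float32)

# Each sentence becomes a single vector; a batch of sentences is a (num_sentences, 300) array.
glove = load_glove()
batch = np.stack([sentence_embedding(s, glove) for s in
                  ["the cat sat on the mat", "concept models embed whole sentences"]])
```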
The Hidden Hyperbolic Space
The big twist in LCM is that it uses a hyperbolic space to hold intermediate representations. You can think of it as a “3D (or multi-dimensional) digital box” that introduces a probabilistic or “quantum-inspired” aspect. Instead of your usual hidden layers, the architecture uses:
- Linear layers for direct transformations.
- Probabilistic ‘Box’ for the hidden dimension.
This hidden hyperbolic representation is somewhat analogous to putting network weights in “Schrodinger’s box.” In a typical neural net, you have a straightforward chain of linear and nonlinear transformations. In LCM, you still get the linear transformations, but they’re passed through this mysterious hyperbolic subspace which becomes the source of the model’s “probabilistic magic.”
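To make the idea a bit more concrete, here is a minimal PyTorch sketch of one possible reading of “linear layers plus a probabilistic box”: the linear map produces the mean of a Gaussian from which the hidden state is sampled. The Gaussian interpretation, layer sizes, and class name are assumptions for illustration, not confirmed details of the architecture.

```python
import torch
import torch.nn as nn

class ProbabilisticBox(nn.Module):
    """A linear map whose output is treated as the mean of a Gaussian 'box';
    sampling from that box gives the stochastic hidden representation."""
    def __init__(self, embed_dim=300, hidden_dim=512):
        super().__init__()
        self.to_hidden = nn.Linear(embed_dim, hidden_dim)        # direct linear transformation
        self.log_sigma = nn.Parameter(torch.zeros(hidden_dim))   # learned spread of the box

    def forward(self, x):
        mu = self.to_hidden(x)
        if self.training:                       # sample inside the "box" while training
            return mu + torch.randn_like(mu) * self.log_sigma.exp()
        return mu                               # collapse to the mean at inference time
```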
LCM retains a Transformer decoder component to handle its auto-regressive generation. However, this decoder is simpler than typical Transformer blocks because it’s only one part of the overall architecture. Here’s how it plugs into the pipeline:
- Map input sentence embeddings to the model's hidden dimension (hyperbolic space).
- Pass those representations through a small Transformer decoder block for auto-regressive generation.
- Project outputs back to the embedding space.
After the decoder, LCM projects the hidden representation back to a sentence embedding. Because LCM is trained on sentence-level embeddings, the output also corresponds to a sentence-level vector, effectively capturing an entire sentence in a single shot.
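Here is a hedged sketch of how those three steps could be wired together in PyTorch. The decoder is stood in for by a single causally masked self-attention layer (nn.TransformerEncoderLayer), and the dimensions are illustrative rather than the paper’s actual configuration.

```python
import torch
import torch.nn as nn

class MiniLCM(nn.Module):
    """Minimal sketch of the pipeline: sentence embeddings -> hidden space ->
    causally masked attention block -> projection back to embedding space."""
    def __init__(self, embed_dim=300, hidden_dim=512, num_heads=8):
        super().__init__()
        self.to_hidden = nn.Linear(embed_dim, hidden_dim)        # 1. map into the hidden space
        self.decoder = nn.TransformerEncoderLayer(               # 2. small decoder-style block
            d_model=hidden_dim, nhead=num_heads, batch_first=True)
        self.to_embed = nn.Linear(hidden_dim, embed_dim)         # 3. project back to embeddings

    def forward(self, sent_embs):
        # sent_embs: (batch, num_sentences, embed_dim) -- one vector per sentence
        h = self.to_hidden(sent_embs)
        n = h.size(1)
        causal = torch.triu(torch.full((n, n), float("-inf"), device=h.device), diagonal=1)
        h = self.decoder(h, src_mask=causal)   # each position attends only to earlier sentences
        return self.to_embed(h)                # predicted embedding for the next sentence
```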
- Sentence-Level vs. Token-Level: LCM ingests entire sentences as single vectors, whereas Transformers typically handle sequences of tokens.
- Hyperbolic Subspace: LCM places the hidden dimension inside a hyperbolic or curved manifold, unlike Transformers, which generally use Euclidean geometry throughout.
- Probabilistic “Box”: Instead of the standard “matrix multiply -> activation -> skip-connection” pattern, LCM has a specialized subspace that acts as a probabilistic container (or “Schrodinger’s box”) for each hidden vector.
When experimenting with the initial LCM layout (embeddings + hyperbolic hidden space + Transformer decoder + final projection), there are a few immediate enhancements that can significantly boost performance.
One of the first experiments involved two changes (a code sketch follows the list):
- Introducing Curvature: Mapping the hidden dimension to a Poincaré Ball (a specific model of hyperbolic geometry), effectively bending the space for better representation power.
- Injecting Noise: Adding Gaussian noise to embeddings during training to toughen the model against overfitting and make the learned representations more robust.
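A minimal sketch of how these two tweaks might look in PyTorch, using the standard exponential map at the origin of the Poincaré ball. The noise level, curvature value, and the exact placement of the noise are illustrative assumptions.

```python
import torch
import torch.nn as nn

def expmap0(v, c=1.0, eps=1e-6):
    """Exponential map at the origin of a Poincaré ball with curvature c:
    bends a Euclidean vector v onto the curved manifold."""
    sqrt_c = c ** 0.5
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

class NoisyHyperbolicProjection(nn.Module):
    """Add Gaussian noise to the incoming embeddings (training only), then map
    the linearly transformed hidden state onto the Poincaré ball."""
    def __init__(self, embed_dim=300, hidden_dim=512, noise_std=0.01, curvature=1.0):
        super().__init__()
        self.to_hidden = nn.Linear(embed_dim, hidden_dim)
        self.noise_std = noise_std
        self.curvature = curvature

    def forward(self, x):
        if self.training and self.noise_std > 0:
            x = x + torch.randn_like(x) * self.noise_std   # robustness / anti-overfitting noise
        return expmap0(self.to_hidden(x), c=self.curvature)
```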
Switching the attention scores from the typical dot product to cosine similarity improved alignment between embeddings in the hidden space. This helps the model measure how close two sentence embeddings are by angle rather than raw magnitude, as sketched below.
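In code, the swap might look like this minimal sketch; the temperature scaling is my own addition to keep the softmax over [-1, 1] scores from being too flat, not something taken from the paper.

```python
import torch.nn.functional as F

def cosine_attention(q, k, v, temperature=10.0):
    """Attention whose scores are cosine similarities between queries and keys
    (direction only), instead of raw dot products (direction times magnitude)."""
    q = F.normalize(q, dim=-1)                        # unit-length queries
    k = F.normalize(k, dim=-1)                        # unit-length keys
    scores = q @ k.transpose(-2, -1) * temperature    # cosine in [-1, 1], scaled before softmax
    return scores.softmax(dim=-1) @ v
```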
Results with these enhancements:
- The model quickly reached near 100% accuracy on small-scale tasks.
- Loss dropped significantly, reflecting more stable training.
To push LCM off the rails (in a good way), the next experiment replaced the hidden “box” with a 3D fractal shape—specifically a cube made out of pyramids. This choice effectively fractalizes the hyperbolic dimension:
- Shape: The hidden dimension is now a 3D fractal (pyramids fused into a cube).
- Fractal Probabilistic Box: The network can roam around in a fractal manifold, combining the notion of “Schrodinger’s box” with self-similar geometry.
The intuition: If the hyperbolic dimension is the place where the model’s “probabilistic magic” occurs, then giving it a rich, fractal geometry might further boost representational capacity.
A final layer of complexity was added by letting the curvature of the hyperbolic space be self-adjusting. Rather than a fixed curvature of ~0.55, the model can tweak it up or down at each epoch. Experimental runs showed that LCM consistently dialed the curvature upward, hinting at deeper potential for optimization.
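One simple way to make the curvature trainable is to register it as a parameter and keep it positive with a softplus. The initialization below is only meant to land near the ~0.5 starting value mentioned in the results, and the exact parameterization is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableCurvatureBall(nn.Module):
    """Poincaré-ball projection whose curvature is a trainable parameter,
    so the optimizer can nudge the geometry up or down during training."""
    def __init__(self, init_raw=-0.43):                    # softplus(-0.43) ~= 0.5 (illustrative)
        super().__init__()
        self.raw_c = nn.Parameter(torch.tensor(init_raw))  # unconstrained underlying parameter

    def curvature(self):
        return F.softplus(self.raw_c)                      # keep the effective curvature positive

    def forward(self, v, eps=1e-6):
        c = self.curvature()
        sqrt_c = c.sqrt()
        norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
        return torch.tanh(sqrt_c * norm) * v / (sqrt_c * norm)   # exp map at the origin
```

Registered this way, the curvature is updated by the optimizer like any other weight, which is how the per-epoch adjustments reported below come about.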
- Loss started around 1.0 and quickly dropped to 0.86 in just 5 epochs.
- Accuracy soared to 100% (with small fluctuations) after a handful of epochs.
- Initial Loss began an order of magnitude lower (~0.10 vs. 1.0).
- Ended near 0.09, which is extremely close to perfect reconstruction.
- Accuracy locked in at 100% by around 14 epochs and stayed there.
- The model spontaneously adjusted the curvature from ~0.51 to ~0.53 each epoch, consistently pushing the geometry in ways that improved performance.
These benchmarks already surpass the typical performance curves you’d see from a similarly sized Transformer on the same data.
LCM is a remarkably fresh approach to language modeling. It’s simultaneously:
- Intuitive: The architecture is still linear transformations plus a “probabilistic hidden space.”
- Unique: It treats entire sentences, rather than token sequences, and uses a hyperbolic manifold.
- Powerful: Early experiments, even those done with simple tweaks, show near 100% accuracy and extremely low loss.
Potential avenues for further research:
- Larger-Scale Models: The largest LCM from Meta is ~7B parameters, but there’s no reason not to go bigger.
- Curvature Exploration: Automated curvature tuning might yield further gains.
- Alternative Geometry: Fractal polyhedra, higher-dimensional fractals, or other exotic shapes for the hidden dimension.
- Integration with Other Embeddings: Testing BERT-based or sentence-transformer embeddings as input seeds.
Overall, LCM promises a new frontier in how we conceptualize “hidden layers” and “latent spaces” for language modeling. If you enjoy playing with architecture-level changes in NLP systems, LCM is definitely one to watch...
If you use or reference LCM in your own work, please cite the original paper from Meta. (Once they provide a formal citation format, we’ll include it here.)
```bibtex
@misc{metaLCM,
  author       = {Meta AI Researchers},
  title        = {Large Concept Models},
  year         = {2024},
  howpublished = {\url{https://github.com/meta-ai/LCM}}
}
```