This is an evolving GitHub repository for the paper From Turn-Taking to Synchronous Dialogue: A Survey of Full-Duplex Spoken Language Models, currently under review at ICASSP 2026. In this paper, we survey the field of Full-Duplex Spoken Language Models (FD-SLMs), which enable synchronous human–AI dialogue through simultaneous speaking and listening, achieving a more realistic human-computer interaction experience.
This is a survey, but more than a survey: due to ICASSP's page limit, we omitted or abbreviated many technical details in the paper that are highly valuable for guiding a practical, production-ready implementation of a Full-Duplex Spoken Language Model. We will therefore continue to update the relevant content at this link.
If you find any mistakes, please don't hesitate to open an issue or contact [email protected] directly.
Note: A modular implementation does not necessarily imply plug-and-play compatibility with other SLMs. VITA-1.5 and Freeze-Omni, for example, are end-to-end models and can only be integrated as a whole.
In this section, we list all existing papers on full-duplex SLMs, covering both models and benchmarks.
We define non-independent models as prior or subsequent works by the same author team as an existing model, or as fine-tuned variants built upon existing full-duplex models.
Next, let us set aside the question of the Transformer itself, however it might be implemented, whether as a dual-tower architecture (dGSLM) or via token interleaving (NTPP). In an end-to-end implementation, we must still answer another fundamental question: what serves as the system's clock for perceiving the external world?
Some may ask: traditional SLMs don't incorporate clocks, yet they still function properly. In fact, it is not that they lack this ability, but that they rely on a subtler mechanism: the familiar turn-taking of conversation, in which the end of the user's turn itself acts as the timing signal that tells the model when to act. A full-duplex model has no such boundary to wait for, so it needs an explicit clock, as sketched below.
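To make the contrast concrete, here is a minimal sketch (our own illustration, not any specific model's code) of what an explicit clock looks like in an end-to-end full-duplex loop: a fixed-rate frame tick drives the model, which must listen and emit an output frame, possibly silence, on every tick, so the passage of time is encoded directly in the stream. The frame rate and the `EchoModel` / frame callbacks are hypothetical placeholders.

```python
import time

FRAME_SEC = 0.08  # e.g. 12.5 Hz frames; the exact rate is model-specific

class EchoModel:
    """Stand-in for a real FD-SLM: here it simply echoes the input frame."""
    def step(self, user_frame):
        return user_frame  # a real model would emit speech tokens or silence

def duplex_loop(model, read_frame, play_frame, num_ticks=10):
    """Fixed-rate loop: listen and emit on every tick, silence included."""
    next_tick = time.monotonic()
    for _ in range(num_ticks):
        out = model.step(read_frame())   # always listen...
        play_frame(out)                  # ...and always produce output
        next_tick += FRAME_SEC           # advance the clock; immune to drift
        time.sleep(max(0.0, next_tick - time.monotonic()))

# Toy usage with byte-string "frames" standing in for audio:
duplex_loop(EchoModel(), read_frame=lambda: b"\x00" * 1280,
            play_frame=lambda f: None)
```

Under this view, a turn-taking SLM is event-driven (it waits for end-of-turn), whereas a full-duplex SLM is clock-driven (it must decide something every frame).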
We have compiled as comprehensive a list as possible of all existing datasets available for full-duplex training and provided the methods for obtaining them; a channel-splitting sketch follows the table.
| Dataset | Lang | Scene | Access | License | Channels | Hours | Reference |
|---|---|---|---|---|---|---|---|
| AMI Meeting Corpus | EN | meeting | Free | CC BY 4.0 | 8 | ~100 | AMI (Univ. of Edinburgh) |
| ICSI Meeting Corpus | EN | meeting | Free | CC BY 4.0 | ~6 | ~70 | ICSI (Edinburgh portal) |
| ISL Meeting Speech Part 1 | EN | meeting | Paid | LDC EULA | 8 | ~10 | LDC2004S05 |
| LibriCSS | EN | meeting | Free | — | 7 | 10 | LibriCSS (GitHub) |
| Fisher English | EN | phone | Paid | LDC EULA | 2 | ~1,960 | LDC2004S13 / LDC2005S13 |
| SEAME (Mandarin–English CS) | EN+ZH | interview | Paid | LDC EULA | 2 | ~192 | LDC2015S04 |
| HKUST Mandarin Telephone | ZH | phone | Paid | LDC EULA | 2 | ~149 | LDC2005S15 |
| NIST Meeting Pilot | EN | meeting | Paid | LDC EULA | ~16 | ~15 | LDC2004S09 |
| CHiME‑6 | EN | dinner‑party | Free | CC BY‑SA 4.0 | 16 | 50+ | OpenSLR SLR150 |
| DiPCo (Dinner Party Corpus) | EN | dinner‑party | Free | CDLA‑Permissive‑1.0 | 35 | ~5 | Zenodo DOI |
| AliMeeting (M2MeT) | ZH | meeting | Free | CC BY‑SA 4.0 | 8 | 118.75 | OpenSLR SLR119 |
| AISHELL‑4 | ZH | meeting | Free | — | 8 | ~120 | OpenSLR SLR111 |
| MISP‑Meeting | ZH | meeting | Application | — | 8 | 125 | MISP 2025 Data |
| AISHELL‑5 | ZH | in‑car | Free | CC BY‑SA 4.0 | 8 | 100+ | OpenSLR SLR159 |
| Switchboard‑1 Release 2 | EN | phone | Paid | LDC EULA | 2 | ~260 | LDC97S62 |
| Fisher Spanish Speech | ES | phone | Paid | LDC EULA | 2 | ~163 | LDC2010S01 |
| Fisher Levantine Arabic CTS | AR | phone | Paid | LDC EULA | 2 | ~45 | LDC2007S02 |
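The two-channel telephone corpora above (Fisher, Switchboard, HKUST) store each speaker on a separate channel, which maps directly onto full-duplex training pairs. Below is a minimal sketch of splitting such a recording, assuming it has already been converted from LDC's SPH format to WAV; the file name is hypothetical.

```python
# pip install soundfile
import soundfile as sf

def split_duplex_channels(wav_path: str):
    """Return (speaker_a, speaker_b, sample_rate) from a 2-channel recording."""
    audio, sr = sf.read(wav_path, always_2d=True)  # shape: (num_samples, channels)
    assert audio.shape[1] == 2, "expected one speaker per channel"
    return audio[:, 0], audio[:, 1], sr

# Usage (hypothetical file name):
# a, b, sr = split_duplex_channels("fisher_call_00001.wav")
```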
Building on FD-Bench and Full-Duplex-Bench (v1.5), especially the latter, for which we extend special thanks to Professor Hung-yi Lee, we have developed an even more convenient benchmark based on the engineering details of the ICASSP HumDial Challenge. Our goal is to make evaluating your model as close to one-click as possible and ultimately to provide a quantifiable score. We name this benchmark Badcat. For details, please refer to Badcat-Benchmark/README.md.
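As a toy illustration of the kind of quantifiable score such benchmarks report (not Badcat's actual implementation), consider mean response latency: the gap between the end of the user's utterance and the onset of the model's reply, averaged over a test set. Timestamps are assumed to be in seconds.

```python
def mean_response_latency(turns):
    """turns: list of (user_speech_end, model_speech_start) timestamp pairs."""
    gaps = [start - end for end, start in turns]
    return sum(gaps) / len(gaps) if gaps else float("nan")

# Example: three exchanges with 200 ms, 350 ms and 500 ms latencies.
print(mean_response_latency([(1.0, 1.2), (4.0, 4.35), (9.0, 9.5)]))  # 0.35
```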
We believe that in the age of AI, it is more important than ever to honor the foundational work of our predecessors—whose ideas can be revitalized and find new life in the era of large language models, much like how LSTM once revolutionized NLP. In the pre-LLM era, the Spoken Dialogue Systems (SDS) community had long been exploring full-duplex interaction. Therefore, we list a selection of representative works from this line of research and provide brief summaries. We encourage readers to consult the original papers to fully grasp the authors’ ideas.
If you find our survey useful for your research, please 📚cite📚 the following paper:
@article{chen2025FD-SLMs,
  title={From Turn-Taking to Synchronous Dialogue: A Survey of Full-Duplex Spoken Language Models},
  author={Chen, Yuxuan and Yu, Haoyuan},
  journal={arXiv preprint arXiv:2509.14515},
  year={2025}
}
Update (October 31)
This repository will now receive regular updates to maintain the most current survey content.
I recently participated in the ICASSP HumDial Challenge, which temporarily delayed updates to this repository. However, this experience provided valuable insights into full-duplex implementation, particularly regarding modular approaches.
Additionally, several recent full-duplex papers have been published, such as FLM-Audio [arXiv:2509.02521]; these will be incorporated into upcoming updates.
