- Fine-tuned a pre-trained language model (LLaMA-2-7b-hf) on the full script dataset of the Friends TV show (223 episodes).
- Generated new episode dialogues in the same style and structure as the original scripts.
- To explore the capability of LLMs in mimicking character-specific dialogue and scene-driven scripts.
- To experiment with lightweight fine-tuning techniques (LoRA + 8-bit quantization) under limited hardware resources.
- Collected, cleaned, and structured the entire Friends script dataset (223 episodes).
- Converted full scripts into episode-wise text segments with consistent formatting.
- Chunked the scripts to fit the model's maximum context size using a sliding-window approach (see the sketch after this list).
- Added `[End of episode]` as an end-of-episode marker to each episode's data.
- Removed metadata (e.g., writer/transcriber notes) using a custom `clean_script()` function.
- Retained scene descriptions like `[Scene: ...]` and structured character dialogue as `Character: Line`.
- Saved each episode as a clean, prompt-ready sample.
- Clean formatting improves the learning of dialogue structure and flow.
- Scene and character cues provide valuable context for style-specific generation.
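A minimal sketch of the cleaning and chunking steps above. The metadata regex, window size (1024 tokens), and stride are illustrative assumptions rather than the project's exact settings.

```python
import re
from transformers import AutoTokenizer

def clean_script(raw_text: str) -> str:
    """Remove writer/transcriber metadata while keeping [Scene: ...] cues
    and `Character: Line` dialogue. The filter pattern below is an assumed
    example, not the project's exact rule set."""
    cleaned_lines = []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop metadata such as "Written by: ..." or "Transcribed by: ..."
        if re.match(r"(?i)^(written|transcribed|adjustments) by\b", line):
            continue
        cleaned_lines.append(line)
    # Mark the end of the episode so the model learns episode boundaries.
    cleaned_lines.append("[End of episode]")
    return "\n".join(cleaned_lines)

def chunk_episode(text: str, tokenizer, max_len: int = 1024, stride: int = 256):
    """Slice a cleaned episode into overlapping windows that fit the
    model's context size (sliding-window chunking)."""
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    chunks = []
    for start in range(0, len(ids), max_len - stride):
        window = ids[start:start + max_len]
        chunks.append(tokenizer.decode(window))
        if start + max_len >= len(ids):
            break
    return chunks

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
# episode_text = open("episode_001.txt").read()
# samples = chunk_episode(clean_script(episode_text), tokenizer)
```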
- Used the `meta-llama/Llama-2-7b-hf` model as the backbone.
- Applied LoRA and a BitsAndBytes config for low-memory fine-tuning, reducing memory usage from over 40 GB to 30.9 GB on a Google Colab A100 GPU (a setup sketch follows this list).
- Loaded the model in 8-bit using `bitsandbytes` to reduce GPU memory usage.
- Applied LoRA with target modules `["q_proj", "v_proj"]`, rank=8, alpha=32.
- Constructed training samples from the cleaned scripts via tokenizer chunking.
- LLaMA-2 is powerful and open-access.
- LoRA and 8-bit quantization reduce memory load while maintaining performance.
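A minimal sketch of the low-memory setup described above, assuming the standard `transformers` + `peft` + `bitsandbytes` stack; values not listed above (e.g., LoRA dropout) are illustrative assumptions.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"

# Load the backbone in 8-bit to cut GPU memory usage.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach LoRA adapters to the attention projections only.
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,      # assumed value, not stated in the write-up
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA weights are trainable
```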
- Designed a flexible prompt generator to trigger script creation.
- Controlled role, scene, and number of lines in each generated sample.
- Seed prompt includes a `[Scene: ...]` cue plus 1–2 initial lines of dialogue.
- Generated samples using `generate()` with `max_new_tokens` and temperature sampling (see the generation sketch below).
- Post-processed outputs to preserve dialogue structure.
- Prompt engineering is critical to maintain structure and character consistency.
- Flexibility in prompt design enables diverse generation scenarios.
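A sketch of the prompt construction and sampling call, reusing the fine-tuned `model` and `tokenizer` from the setup above; the `build_prompt` helper, the example scene, and the sampling values are illustrative assumptions.

```python
def build_prompt(scene: str, opening_lines: list[str]) -> str:
    """Seed prompt: a [Scene: ...] cue plus one or two opening lines,
    matching the format the model was fine-tuned on."""
    return f"[Scene: {scene}]\n" + "\n".join(opening_lines) + "\n"

prompt = build_prompt(
    "Central Perk, everyone is there.",
    ["Chandler: Could this coffee BE any colder?"],
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=256,     # controls how many lines are generated
    do_sample=True,
    temperature=0.8,        # assumed default; tuned per experiment
    top_p=0.95,
)
script = tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Post-process: keep only lines that look like dialogue or scene cues.
lines = [l for l in script.splitlines() if ":" in l or l.startswith("[Scene")]
print("\n".join(lines))
```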
- Evaluated model performance using Perplexity and ROUGE.
- Used `Trainer.evaluate()` and `math.exp(eval_loss)` for Perplexity (see the evaluation sketch below).
- Computed ROUGE using the Hugging Face `datasets` metrics.
- Perplexity shows language modeling quality.
- ROUGE provides quantitative feedback on surface similarity to the original scripts.
- Perplexity and ROUGE do not evaluate this kind of causal LLM well; one reason is that generated text can convey a similar meaning while using different words.
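A sketch of the evaluation flow: perplexity from the Trainer's eval loss, and ROUGE via the Hugging Face `evaluate` library (the current home of the `datasets` metrics mentioned above). The `trainer`, `generated_scripts`, and `reference_scripts` names are assumed placeholders.

```python
import math
import evaluate

# Perplexity from the fine-tuning Trainer's evaluation loss.
eval_results = trainer.evaluate()                 # `trainer` is the Trainer used for fine-tuning
perplexity = math.exp(eval_results["eval_loss"])
print(f"Perplexity: {perplexity:.2f}")

# ROUGE between generated scripts and reference episodes.
rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=generated_scripts,                # list[str] of model outputs
    references=reference_scripts,                 # list[str] of original episodes
)
print(scores)
```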
- Compared generated dialogues under different prompt and sampling settings (see the sketch after this list).
- Assessed quality via structure, tone, and coherence.
- Measured style similarity with original scripts.
- Verified that character tone matches (e.g., Chandler’s sarcasm).
- Tested generation length, diversity, and repetition control.
- Structural and stylistic alignment is key to believable TV scripts.
- Prompt setup and temperature impact the creativity and control of generated lines.
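A small sketch of a prompt/sampling sweep like the one described above, reusing the `model`, `tokenizer`, and `prompt` from earlier sketches; the specific temperature, top-p, and repetition-penalty values are illustrative assumptions, not the exact grid used.

```python
# Compare the same seed prompt under different sampling settings to judge
# structure, tone, and repetition (the settings below are illustrative).
settings = [
    {"temperature": 0.7, "top_p": 0.90, "repetition_penalty": 1.1},
    {"temperature": 1.0, "top_p": 0.95, "repetition_penalty": 1.2},
]

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
for cfg in settings:
    out = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,
        **cfg,
    )
    print(f"--- {cfg} ---")
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```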
- Successfully fine-tuned a powerful LLM to generate new TV script dialogues.
- Built a robust pipeline for cleaning, training, generating, and evaluating.
- Demonstrated style-aware, character-consistent generation results.