
CodyQin/ChatDKU-Advising-RAGAS


ChatDKU Advising FAQ Evaluation with RAGAS

This project evaluates the accuracy and relevance of responses generated by ChatDKU Advising using RAGAS. The evaluation is conducted by comparing ChatDKU Advising's answers to the official FAQ responses provided by the Academic Advising Office at Duke Kunshan University (DKU).

About ChatDKU

ChatDKU is a RAG-based AI chatbot designed to enhance campus interaction by providing academic advising and administrative support. The system integrates multi-source retrieval, query optimization, and context-aware prompt engineering to deliver high-quality responses.

⚠️ Due to its potential future role as an official DKU resource, the complete project code remains private. However, this evaluation project provides insight into its effectiveness.

🔹 ChatDKU Introduction Video

Campus LLM ChatDKU Launches at Duke Kunshan University (Bilibili)

🔹 Live Demo

ChatDKU-Advising
(Requires DKU VPN & NetID Login to access)

Features

  • Uses ragas.dataset_schema.SingleTurnSample to structure FAQ data.
  • Evaluates generated responses using RAGAS metrics:
    • BLEU Score: Measures n-gram overlap between generated and reference responses.
    • ROUGE Score: Measures recall-oriented n-gram overlap, i.e. how much of the reference response is covered by the generated response.
    • Non-LLM String Similarity & Distance Measure: Evaluates lexical similarity between responses.
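
To make the three metric families concrete, here is a minimal pure-Python sketch of the quantities they are built on. These are simplified illustrations, not the RAGAS implementations: the function names are hypothetical, tokenization is plain whitespace splitting, and the BLEU-style function omits the brevity penalty and the averaging over several n-gram orders.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precision(response, reference, n=1):
    """Clipped n-gram precision, the core quantity behind BLEU."""
    resp = Counter(ngrams(response.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    total = sum(resp.values())
    if total == 0:
        return 0.0
    # Each response n-gram counts at most as often as it occurs in the reference.
    overlap = sum(min(count, ref[gram]) for gram, count in resp.items())
    return overlap / total


def unigram_recall(response, reference):
    """ROUGE-1-style recall: share of reference words found in the response."""
    resp = Counter(response.lower().split())
    ref = Counter(reference.lower().split())
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, resp[word]) for word, count in ref.items())
    return overlap / total


def levenshtein_similarity(a, b):
    """1 - (edit distance / longer length), so identical strings score 1.0."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return 1.0 - prev[-1] / max(len(a), len(b))
```

Precision is normalized by the length of the generated response and recall by the length of the reference, which is why a short but correct answer can score high on the first and low on the second.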

Dataset

The FAQ dataset consists of officially provided advising questions and answers, covering topics such as:

  • Academic Honors
  • Academic Standing
  • CR/NC Grading
  • Course Load
  • Course Registration
  • Course Repeat
  • Course Withdrawal
  • Credits Transfer
  • Global Education
  • Graduation
  • Incomplete Grade
  • Leave of Absence
  • PE & NSPHST

Usage

Run the script to test ChatDKU Advising's responses against reference answers and obtain performance metrics.
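
The evaluation script itself is not reproduced here, so the following is only a sketch of what such a script might look like, assuming a recent ragas release in which `BleuScore`, `RougeScore`, and `NonLLMStringSimilarity` are importable from `ragas.metrics` and expose `single_turn_ascore`. The names `FAQ_ITEMS`, `score_item`, and `score_all` are hypothetical, the FAQ entry is a placeholder, and the metrics may need extra packages installed alongside ragas.

```python
import asyncio

# Hypothetical FAQ item; the question and both answers are placeholders.
FAQ_ITEMS = [
    {
        "user_input": "placeholder question",
        "response": "placeholder answer generated by the chatbot",
        "reference": "placeholder answer from the official FAQ",
    },
]


async def score_item(item):
    """Score one FAQ item with the three non-LLM metrics."""
    # Imported here so the module can be loaded before ragas is installed.
    from ragas.dataset_schema import SingleTurnSample
    from ragas.metrics import BleuScore, NonLLMStringSimilarity, RougeScore

    sample = SingleTurnSample(**item)
    metrics = {
        "bleu": BleuScore(),
        "rouge": RougeScore(),
        "string_similarity": NonLLMStringSimilarity(),
    }
    return {name: await m.single_turn_ascore(sample) for name, m in metrics.items()}


async def score_all(items):
    """Score every item concurrently and return one result dict per item."""
    return await asyncio.gather(*(score_item(item) for item in items))
```

With ragas installed, `asyncio.run(score_all(FAQ_ITEMS))` would return one dictionary of metric scores per FAQ item.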

Dependencies

  • ragas
  • asyncio (part of the Python standard library, no separate installation needed)

Purpose

This project is part of my signature work and graduation project at DKU. It aims to assess the reliability of ChatDKU Advising in providing accurate academic guidance, ensuring alignment with DKU’s official advising policies.

📊 Evaluation Summary of ChatDKU-Advising

This section provides visualizations of the evaluation results, including average scores, radar charts, score distributions, and bar charts.


📌 Average Scores Report

[Figure: Average Scores Report]

📌 Corrected Radar Chart

[Figure: Corrected Radar Chart]

📊 Score Distributions & Bar Chart

[Figure: Score Distributions Histogram]
[Figure: Bar Chart of Average Scores]

ChatDKU was evaluated using 104 questions categorized into 13 domains from Duke Kunshan University’s official FAQ documents. The system’s responses were compared against reference answers using three metrics: Levenshtein (textual similarity), BLEU (phrase-level precision), and ROUGE (content recall). Key findings include:

  • Overall Performance: ChatDKU achieved strong results in Levenshtein (avg. 0.7877) and ROUGE (avg. 0.8111), indicating high structural and content-level alignment with official answers. However, BLEU scores (avg. 0.6018) highlighted inconsistencies in exact phrase matching.
  • Top Categories: Categories like Course Repeat (Levenshtein: 0.9377) and Credits Transfer (ROUGE: 0.9600) excelled due to standardized responses.
  • Challenges: Categories like Leave of Absence (Levenshtein: 0.6143) and Graduation (BLEU: 0.4739) showed lower scores, suggesting structural or content gaps.
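
The overall and per-category averages quoted above can be produced by a simple aggregation over per-question scores. The sketch below shows one way to do it, assuming each question is stored as a record with a `category` and a `scores` dictionary. The function name `average_by_category` and the record layout are hypothetical, and the sample values are made up for illustration, not taken from the evaluation.

```python
from collections import defaultdict
from statistics import mean


def average_by_category(records):
    """Return {category: {metric: mean score}} plus an "Overall" entry."""
    grouped = defaultdict(lambda: defaultdict(list))
    for record in records:
        for metric, score in record["scores"].items():
            grouped[record["category"]][metric].append(score)
            grouped["Overall"][metric].append(score)
    return {
        category: {metric: round(mean(values), 4) for metric, values in metrics.items()}
        for category, metrics in grouped.items()
    }


# Made-up scores for illustration only; these are not results from the evaluation.
records = [
    {"category": "Graduation", "scores": {"bleu": 0.40, "rouge": 0.70}},
    {"category": "Graduation", "scores": {"bleu": 0.60, "rouge": 0.90}},
    {"category": "Course Load", "scores": {"bleu": 0.80, "rouge": 1.00}},
]
summary = average_by_category(records)
```

Note that the "Overall" entry averages over questions, not over categories, so categories with more questions carry more weight.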

Strengths & Limitations

  • Strengths: ChatDKU demonstrates robust retrieval capabilities and context-aware response generation, particularly for structured queries.
  • Limitations: Phrase-level precision and handling open-ended questions remain areas for improvement. Traditional metrics like BLEU may understate the value of semantically accurate but paraphrased answers.

Future Directions

Future work includes enhancing prompt engineering, expanding dataset coverage, and integrating advanced evaluation methods.
