This project evaluates the accuracy and relevance of responses generated by ChatDKU Advising using RAGAS. The evaluation is conducted by comparing ChatDKU Advising's answers to the official FAQ responses provided by the Academic Advising Office at Duke Kunshan University (DKU).
ChatDKU is a RAG-based AI chatbot designed to enhance campus interaction by providing academic advising and administrative support. The system integrates multi-source retrieval, query optimization, and context-aware prompt engineering to deliver high-quality responses.
⚠️ Due to its potential future role as an official DKU resource, the complete project code remains private. However, this evaluation project provides insight into its effectiveness.
- Campus LLM ChatDKU launches at Duke Kunshan University (Bilibili)
- ChatDKU-Advising (requires DKU VPN & NetID login to access)
- Uses `ragas.dataset_schema.SingleTurnSample` to structure FAQ data.
- Evaluates generated responses using RAGAS metrics (a scoring sketch follows this list):
  - BLEU Score: measures n-gram overlap between generated and reference responses.
  - ROUGE Score: measures recall-oriented textual overlap with the reference.
  - Non-LLM String Similarity (Levenshtein distance): measures character-level lexical similarity between responses, without an LLM judge.
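A minimal sketch of scoring one FAQ pair with these metrics, assuming a recent ragas release (0.2 or later) where `SingleTurnSample` and the metrics are importable as shown; the question and answer strings are placeholders, not actual FAQ content, and BLEU/ROUGE may additionally require the `sacrebleu` and `rouge_score` packages:

```python
import asyncio

from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import BleuScore, NonLLMStringSimilarity, RougeScore

async def main():
    # One FAQ pair: the chatbot's generated answer vs. the official answer.
    # The strings below are placeholders, not real advising content.
    sample = SingleTurnSample(
        user_input="Can I repeat a course I passed?",
        response="Yes, but only the most recent grade counts toward your GPA.",
        reference="Students may repeat a course; only the latest grade is counted.",
    )

    metrics = {
        "BLEU": BleuScore(),
        "ROUGE": RougeScore(),
        "String Similarity": NonLLMStringSimilarity(),  # Levenshtein by default
    }
    for name, metric in metrics.items():
        score = await metric.single_turn_ascore(sample)
        print(f"{name}: {score:.4f}")

asyncio.run(main())
```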
The FAQ dataset consists of officially provided advising questions and answers (a hypothetical record layout is sketched after the list), covering topics such as:
- Academic Honors
- Academic Standing
- CR/NC Grading
- Course Load
- Course Registration
- Course Repeat
- Course Withdrawal
- Credits Transfer
- Global Education
- Graduation
- Incomplete Grade
- Leave of Absence
- PE & NSPHST
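The exact file layout of the FAQ dataset is not shown in this repository; the record below is purely illustrative, and every field name is an assumption:

```python
# Hypothetical shape of one FAQ record; the real dataset's schema may differ.
faq_record = {
    "category": "Course Repeat",
    "question": "Can I repeat a course I have already passed?",
    "reference_answer": "Students may repeat a course; only the latest grade is counted.",
}
```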
Run the script to test ChatDKU Advising's responses against reference answers and obtain performance metrics.
Dependencies:
- `ragas`
- `asyncio` (Python standard library)
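Since `asyncio` is listed alongside `ragas`, per-sample scoring can be run concurrently across the whole FAQ set. A sketch under those assumptions (`score_all` and its inputs are illustrative names, not the project's actual code):

```python
import asyncio

from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import NonLLMStringSimilarity

async def score_all(pairs):
    """Average string-similarity score over (response, reference) pairs."""
    metric = NonLLMStringSimilarity()  # Levenshtein distance by default
    samples = [SingleTurnSample(response=r, reference=ref) for r, ref in pairs]
    # Score every sample concurrently.
    scores = await asyncio.gather(*(metric.single_turn_ascore(s) for s in samples))
    return sum(scores) / len(scores)

# Illustrative pairs only; the real run uses the 104 official FAQ items.
pairs = [
    ("You may take up to 18 credits.", "Students may register for at most 18 credits."),
    ("CR/NC must be declared by the deadline.", "Declare CR/NC before the posted deadline."),
]
print(f"Average similarity: {asyncio.run(score_all(pairs)):.4f}")
```

Concurrency matters little for these CPU-bound string metrics, but the same pattern scales if LLM-based metrics are added later.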
This project is part of my signature work and graduation project at DKU. It aims to assess the reliability of ChatDKU Advising in providing accurate academic guidance, ensuring alignment with DKU’s official advising policies.
This section provides visualizations of the evaluation results, including average scores, radar charts, score distributions, and bar charts.


[Figure: Score Distributions Histogram]
[Figure: Bar Chart of Average Scores]
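The bar chart can be reproduced from the average scores reported below; a minimal matplotlib sketch (figure styling and the output file name are arbitrary choices, not from the project):

```python
import matplotlib.pyplot as plt

# Average scores reported in the evaluation summary below.
metrics = ["Levenshtein", "ROUGE", "BLEU"]
averages = [0.7877, 0.8111, 0.6018]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(metrics, averages)
ax.set_ylim(0, 1)
ax.set_ylabel("Average score")
ax.set_title("ChatDKU Advising: average evaluation scores")
for i, v in enumerate(averages):
    ax.text(i, v + 0.02, f"{v:.4f}", ha="center")  # annotate each bar
fig.tight_layout()
fig.savefig("average_scores.png")
```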
ChatDKU was evaluated using 104 questions categorized into 13 domains from Duke Kunshan University’s official FAQ documents. The system’s responses were compared against reference answers using three metrics: Levenshtein (textual similarity), BLEU (phrase-level precision), and ROUGE (content recall). Key findings include:
- Overall Performance: ChatDKU achieved strong results in Levenshtein (avg. 0.7877) and ROUGE (avg. 0.8111), indicating high structural and content-level alignment with official answers. However, BLEU scores (avg. 0.6018) highlighted inconsistencies in exact phrase matching.
- Top Categories: categories such as Course Repeat (Levenshtein: 0.9377) and Credits Transfer (ROUGE: 0.9600) excelled due to standardized responses.
- Challenges: Categories like Leave of Absence (Levenshtein: 0.6143) and Graduation (BLEU: 0.4739) showed lower scores, suggesting structural or content gaps.
- Strengths: ChatDKU demonstrates robust retrieval capabilities and context-aware response generation, particularly for structured queries.
- Limitations: Phrase-level precision and handling open-ended questions remain areas for improvement. Traditional metrics like BLEU may understate the value of semantically accurate but paraphrased answers.
Future work includes enhancing prompt engineering, expanding dataset coverage, and integrating more advanced evaluation methods.