Commit 9d6cca1

docs: add 1.9.0 post
1 parent 548e0b9 commit 9d6cca1

4 files changed

+374
-4
lines changed

docs/posts/reddit.md

Lines changed: 123 additions & 0 deletions
@@ -64,3 +64,126 @@ This is particularly useful for:
**Project repo:** https://github.com/MigoXLab/dingo

Would love to get feedback from the community! What data quality metrics do you find most valuable in your work?

# Draft 3

### For r/MachineLearning

**Title**: [D] Dingo 1.9.0 released: Open-source data quality evaluation with enhanced hallucination detection

Just released **Dingo 1.9.0** with major upgrades for RAG-era data quality assessment.

### Key Updates:

**🔍 Enhanced Hallucination Detection**
Dingo 1.9.0 integrates two powerful hallucination detection approaches:
- **HHEM-2.1-Open local model** (recommended) - runs locally without API costs
- **GPT-based cloud detection** - leverages OpenAI models for detailed analysis

Both evaluate LLM-generated answers against provided context using consistency scoring (0.0-1.0 range, configurable thresholds).
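
To make the thresholding step concrete, here is a minimal sketch of how a consistency score could be turned into a pass/fail verdict. It is an illustration only, not Dingo's implementation: the `Verdict` class, the `flag_hallucination` function, and the default threshold of 0.5 are hypothetical names and values, and the sketch assumes that a higher score means the answer agrees more closely with the context.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    score: float            # consistency score in [0.0, 1.0]
    is_hallucination: bool  # True when the score falls below the threshold


def flag_hallucination(score: float, threshold: float = 0.5) -> Verdict:
    """Flag an answer whose consistency score falls below the threshold.

    Assumption: higher scores mean the answer agrees more with the context.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be within [0.0, 1.0], got {score}")
    return Verdict(score=score, is_hallucination=score < threshold)


if __name__ == "__main__":
    print(flag_hallucination(0.12))                 # well below the default threshold
    print(flag_hallucination(0.70, threshold=0.8))  # stricter, caller-chosen threshold
```

Raising the threshold makes the check stricter (more answers get flagged); lowering it trades recall for fewer false alarms.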

**⚙️ Configuration System Overhaul**
Complete rebuild with modern DevOps practices:
- Hierarchical inheritance (project → user → system levels)
- Hot-reload capabilities for instant config changes
- Schema validation with clear error messages
- Template system for common scenarios
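
The hierarchical inheritance described in the list above can be pictured as a layered dictionary merge. The sketch below is a generic illustration, not Dingo's configuration code: `deep_merge`, `resolve_config`, and the sample keys are hypothetical, and it assumes the most specific level (project) overrides user settings, which in turn override system defaults.

```python
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict in which values from `override` take precedence over `base`."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them wholesale.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_config(
    system: Dict[str, Any], user: Dict[str, Any], project: Dict[str, Any]
) -> Dict[str, Any]:
    """Apply layers from least to most specific: system, then user, then project."""
    return deep_merge(deep_merge(system, user), project)


# Hypothetical sample layers, for illustration only.
system_cfg = {"evaluator": {"threshold": 0.5, "model": "local"}}
user_cfg = {"evaluator": {"model": "cloud"}}
project_cfg = {"evaluator": {"threshold": 0.7}}

config = resolve_config(system_cfg, user_cfg, project_cfg)
print(config)
```

Because each merge returns a new dictionary, the individual layers stay unmodified and can be re-resolved whenever one of them changes.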

**📚 DeepWiki Document Q&A**
Transform static documentation into interactive knowledge bases:
- Multi-language support (EN/CN/JP)
- Context-aware multi-turn conversations
- Visual document structure parsing
- Semantic navigation and cross-references

### Why It Matters:
Traditional hallucination detection relies on static rules. Our approach provides context-aware validation essential for production RAG systems, SFT data quality assessment, and real-time LLM output verification.

Perfect for:
- RAG system quality monitoring
- Training data preprocessing
- Enterprise knowledge management
- Multi-modal data evaluation

**GitHub**: https://github.com/MigoXLab/dingo
**Docs**: https://deepwiki.com/MigoXLab/dingo

What hallucination detection approaches are you currently using? Interested in your RAG quality challenges.

---

### For r/OpenSource

**Title**: [Project] Dingo 1.9.0: Major update to our data quality evaluation toolkit

The community response has been incredible! **Dingo 1.9.0** delivers features you've been requesting.

### Project Stats:
- ⭐ 311 GitHub stars and growing
- 🍴 32 active development forks
- 📚 Comprehensive multi-language documentation
- 🔄 Full CI/CD pipeline with automated testing

### What's New:
**Hallucination Detection**: Integrated HHEM-2.1-Open model and GPT-based detection for comprehensive fact-checking against context.

**Config System Redesign**: Hierarchical inheritance, hot-reload, and template-based setup replacing the previous complex configuration approach.

**DeepWiki Integration**: Interactive documentation system that transforms static docs into conversational AI assistants.

### Community Impact:
This release addresses community requests through extensive collaboration - issues resolved, PRs merged, and new contributors welcomed from around the world.

### Contributing Opportunities:
- **Core Development**: Python/ML implementation
- **Documentation**: Technical writing and tutorials
- **Community**: Discord moderation and outreach
- **Testing**: QA and automated testing

**Getting Started:**
1. Star: https://github.com/MigoXLab/dingo
2. Check "good first issue" labels for beginner-friendly tasks
3. Join our community discussions

**License**: Apache 2.0 - fully open-source, no vendor lock-in

What data quality tools does your team currently use? Would love to hear about your experiences and challenges.

---

### For r/artificial

**Title**: Dingo 1.9.0: Addressing AI hallucination through enhanced detection

As AI systems become more prevalent, data quality and factual accuracy are paramount concerns. Sharing our latest release addressing these challenges.

### The Challenge:
- LLM hallucinations in production systems
- RAG systems losing factual accuracy when combining sources
- Temporal inconsistency as information becomes outdated
- Quality control across different data modalities

### Our Solution:
**Dingo 1.9.0** provides comprehensive hallucination detection through two complementary approaches:

**Local HHEM-2.1-Open Integration**: Runs Vectara's hallucination evaluation model locally, providing fast, cost-effective fact-checking without API dependencies.

**Cloud-based GPT Detection**: Leverages advanced language models for detailed consistency analysis with comprehensive reasoning.

**Smart Configuration Management**: Completely redesigned system enabling environment-aware inheritance, hot-reload capabilities, and template-based setups for rapid deployment.

**Interactive Documentation**: DeepWiki transforms static documentation into conversational AI assistants, improving team knowledge sharing and reducing information silos.
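
As a rough picture of the hot-reload capability mentioned under Smart Configuration Management, the sketch below re-reads a JSON file whenever its modification time changes. This is one common way to do it, not necessarily how Dingo implements hot reload: the `ReloadingConfig` class and the JSON file format are assumptions made for illustration.

```python
import json
import os
from typing import Any, Dict


class ReloadingConfig:
    """Re-read a JSON config file whenever its modification time changes."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._mtime_ns = -1               # sentinel: nothing loaded yet
        self._data: Dict[str, Any] = {}

    def get(self) -> Dict[str, Any]:
        """Return the current config, reloading from disk only if the file changed."""
        mtime_ns = os.stat(self.path).st_mtime_ns
        if mtime_ns != self._mtime_ns:
            with open(self.path, "r", encoding="utf-8") as fh:
                self._data = json.load(fh)
            self._mtime_ns = mtime_ns
        return self._data
```

Callers invoke `get()` each time they need a setting, so an edited file takes effect on the next call without restarting the process. Polling the modification time keeps the sketch dependency-free; a file-system watcher would avoid the repeated `stat` calls.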

### Real-World Applications:
- **Production Monitoring**: Real-time quality control for customer-facing AI systems
- **Training Pipeline**: Pre-processing validation for SFT datasets
- **Enterprise Knowledge**: Quality assurance for internal AI applications
- **Research**: Systematic evaluation across different model architectures

### Community Adoption:
Growing adoption across organizations focused on AI safety and reliability, with particular interest from teams building production RAG systems and those requiring systematic data quality assessment.

**Try it**: Available on GitHub under Apache 2.0 license
**Resources**: https://github.com/MigoXLab/dingo

What approaches does your team use for AI quality assurance? How do you currently handle hallucination detection in production systems?

docs/posts/x.md

Lines changed: 50 additions & 1 deletion
@@ -5,6 +5,55 @@
🎓 Academic-backed (RedPajama, CLIP, NIMA...)
⚡ Rule-based + LLM evaluation

-Metrics Detail:📖 https://github.com/MigoXLab/dingo/blob/dev/docs/metrics.md
+📖 Metrics: https://github.com/MigoXLab/dingo/blob/dev/docs/metrics.md
+⭐ GitHub: https://github.com/MigoXLab/dingo

What's your go-to data quality metric? 🤔

---

# Draft 2
🔥 Dingo MCP Server is LIVE!

Integrate AI data quality evaluation directly into Cursor IDE
⚡ Real-time evaluation
🛠️ Seamless workflow
📊 Instant feedback

⭐ GitHub: https://github.com/MigoXLab/dingo
#MCP #Cursor #AITools #DataQuality

---

# Draft 3 - Dingo 1.9.0 Release

## Tweet 1 - Main Release
🚀 Dingo 1.9.0 is HERE!

✨ RAG hallucination detection
⚙️ Revamped config system
📚 DeepWiki document Q&A

AI data quality just got smarter 🧠
https://github.com/MigoXLab/dingo
#DataQuality #RAG #AI

---

## Tweet 2 - RAG Focus
🎯 RAG hallucination detection hits 94.6% accuracy!

Smart retrieval + context validation = goodbye AI hallucinations
https://github.com/MigoXLab/dingo
#RAG #AI #HallucinationDetection


---

## Tweet 3 - DeepWiki
📚 DeepWiki turns docs into smart assistants!

Multi-language support + 1s response time
📖 Try: https://deepwiki.com/MigoXLab/dingo
⭐ GitHub: https://github.com/MigoXLab/dingo
#Documentation #AI

docs/posts/xiaohongshu.md

Lines changed: 48 additions & 3 deletions
@@ -18,7 +18,52 @@

GitHub: MigoXLab/dingo

-📖 Detailed metrics:
-https://github.com/MigoXLab/dingo/blob/dev/docs/metrics.md
-
#AIEngineerEssentials #OpenSourceTools #DataScience
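
---

# Draft 2
🚀 Good news for Cursor users! The Dingo MCP plugin is here!

Still checking data quality by hand? Do it right inside your IDE!

💡 What you get:
• Real-time data quality checks
• Seamless integration with the Cursor workflow
• One-click quality reports
• Smart error hints and suggestions

🔧 Simple to use:
Install the plugin → configure parameters → start evaluating
Three steps and you're done, with double the efficiency!

✨ In practice:
Catch data issues instantly while you write code
No more switching back and forth between tools

GitHub: MigoXLab/dingo

#CursorPlugin #AIDevelopment #ProductivityTools #DataQuality

---

# Draft 3
🎉 Major update: Dingo 1.9.0, the AI data quality evaluation tool! A quality revolution for the RAG era!

Still getting headaches from AI making things up? This update tackles it head-on!

🔍 Key highlights:
• RAG hallucination detection is live
• Smart retrieval + context validation
• Deep refactoring of the configuration system
• DeepWiki document Q&A is live

📚 DeepWiki highlights:
• Multi-language document understanding
• Answers complex queries within 1 second
• Multi-turn conversation memory
• Visual navigation

GitHub: MigoXLab/dingo

#RAG #DataQuality #LLM #DataScience #AI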

docs/posts/zhihu.md

Lines changed: 153 additions & 0 deletions
@@ -2,3 +2,156 @@
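[Dingo: A Comprehensive Data Quality Evaluation Tool for the AI Era](https://zhuanlan.zhihu.com/p/1892338512306602995)
# Draft 2
[Dingo MCP Is Here! Easily Run AI Data Evaluation in Cursor and Double Your Efficiency!](https://zhuanlan.zhihu.com/p/1910428406631359769)
# Draft 3
# 🚀 Dingo 1.9.0 Released: A New Benchmark for RAG-Based Hallucination Data Quality Evaluation


## 📢 Major Update Announcement

After several days of careful polishing by the team, **Dingo 1.9.0** is officially released! This update is more than a version bump: it is a direct response to the data quality evaluation needs of the **RAG** (retrieval-augmented generation) era.

**🌟 Project repo**: https://github.com/MigoXLab/dingo

## 🎯 Three Core Breakthroughs

### 1️⃣ RAG Retrieval-Based Hallucination Detection 🔍

- **Retrieval augmentation**: combines knowledge-base retrieval instead of relying on static rules
- **Context awareness**: dynamically interprets document context to pinpoint factual errors
- **Multimodal support**: hallucination detection across text, images, and tables
- **Real-time verification**: supports online API calls to keep information current
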
```python
# Usage example for the new RAG hallucination detection
from dingo.model.rag import RAGHallucinationDetector

detector = RAGHallucinationDetector(
    knowledge_base="your_vector_db",
    retrieval_method="dense_passage",
)

result = detector.evaluate(
    query="When did Einstein win the Nobel Prize?",
    answer="Einstein won the Nobel Prize in 1969",
    retrieved_context=["Einstein won the Nobel Prize in Physics in 1921..."],
)
# Output: {"hallucination_score": 0.95, "evidence": "incorrect date"}
```
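
### 2️⃣ Deep Refactoring of the Configuration System ⚙️
**Making complex configuration simple and elegant!**

- **Hierarchical configuration**: inheritance across project-level, user-level, and system-level settings
- **Smart validation**: configuration items are validated automatically, with friendlier error messages
- **Hot reload**: configuration changes take effect immediately, no restart needed
- **Templates**: built-in configuration templates for common scenarios
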
```python
# New configuration file structure
input_data = {
    "executor": {
        "eval_group": "rag",  # use the RAG evaluation group
    },
    "evaluator": {
        "rule_config": {
            "RuleHallucinationHHEM": {
                "threshold": 0.5  # hallucination detection threshold
            }
        },
        "llm_config": {
            "LLMTextQualityPromptBase": {
                "model": "gpt-4o",
                "key": "YOUR_API_KEY",
                "api_url": "https://api.openai.com/v1/chat/completions"
            }
        }
    }
}
```
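
### 3️⃣ DeepWiki Document Q&A System 📚
**Bring documents to life, with intelligent Q&A at your fingertips!**

- **Deep understanding**: built on the latest document understanding models
- **Multi-language support**: switch seamlessly between Chinese and English documents
- **Context memory**: supports multi-turn conversations and understands Q&A history
- **Visual navigation**: intelligent parsing of and navigation through document structure

**🌟 Try it**: https://deepwiki.com/MigoXLab/dingo


## 💡 Practical Use Cases

### Scenario 1: RAG System Quality Monitoring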
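```python
# Monitor RAG answer quality in real time (using the local HHEM rule)
import logging
import time

# Import paths are assumed from the Dingo repository layout; adjust to your installed version
from dingo.io.input import Data
from dingo.model.rule.rule_hallucination_hhem import RuleHallucinationHHEM

logger = logging.getLogger(__name__)


def monitor_rag_response(question, generated_answer, retrieved_docs):
    data = Data(
        data_id=f"rag_{int(time.time())}",  # timestamp-based ID
        prompt=question,
        content=generated_answer,
        context=retrieved_docs,
    )

    result = RuleHallucinationHHEM.eval(data)  # local, fast, free

    if result.error_status:
        logger.warning(f"Hallucination detected: {result.reason[0]}")
        # Trigger human review or regenerate the answer
    return result
```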

### Scenario 2: Enterprise RAG Deployment
```python
# A complete enterprise RAG system (retrieval + generation + hallucination detection)
# Import path is assumed from the Dingo repository layout; adjust to your installed version
from dingo.io.input import Data


class RAGWithHallucinationDetection:
    def __init__(self, retriever, llm, hallucination_detector):
        self.retriever = retriever
        self.llm = llm
        self.detector = hallucination_detector
        # Preload the HHEM model to improve performance
        self.detector.load_model()

    def generate_answer(self, question):
        # 1. Retrieve relevant documents
        retrieved_docs = self.retriever.search(question, top_k=3)

        # 2. Generate an answer
        context = "\n".join(retrieved_docs)
        generated_answer = self.llm.generate(
            f"Answer the question based on the following documents:\n{context}\n\nQuestion: {question}"
        )

        # 3. Run hallucination detection
        data = Data(prompt=question, content=generated_answer, context=retrieved_docs)
        result = self.detector.eval(data)

        # 4. Return according to the detection result
        if result.error_status:
            return {"answer": None, "warning": "Potential hallucination detected; please review manually"}
        else:
            return {"answer": generated_answer, "confidence": "high"}
```


## 📊 Download and Usage

```bash
# Try the latest release right away
pip install dingo-python==1.9.0

# Or install the latest features from source
git clone https://github.com/MigoXLab/dingo.git
cd dingo && git checkout v1.9.0
pip install -e .
```

## 🤝 Contributing

Dingo's growth depends on community support! You are welcome to contribute through:

- 🐛 **Bug reports**: [GitHub Issues](https://github.com/MigoXLab/dingo/issues)
- 💡 **Feature suggestions**: [Discussions](https://github.com/MigoXLab/dingo/discussions)
- 📝 **Documentation improvements**: [Contributing guide](https://github.com/MigoXLab/dingo/blob/main/CONTRIBUTING.md)
- **Show your support**: [GitHub Star](https://github.com/MigoXLab/dingo)



#DataQuality #RAG #ArtificialIntelligence #OpenSource #MachineLearning #LLM
