Description
Search before asking
- I have searched the HUB issues and discussions and found no similar questions.
 
Question
I’m currently working on an AI-based symbol recognition project focused on ancient Chinese Oracle Bone Characters (甲骨文). The system is designed to recognize and classify these ancient symbols while distinguishing authentic Oracle Bone characters from mimicked or artificially generated (fake) ones that visually resemble the originals.
The current dataset contains 30 symbols in total: 15 authentic Oracle Bone Characters and 15 mimicked (false) symbols intentionally designed to resemble the real ones. The objective is to train and evaluate an AI model capable of learning subtle differences in stroke structure, curvature, and spatial composition between true and false symbols.
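To make the setup concrete, here is a minimal sketch of how the 30 classes could be laid out so that one label carries both the symbol identity and its authenticity. The names `SymbolClass`, `build_classes`, `authenticity_of`, and the placeholder class names are hypothetical and only for illustration; they are not the actual dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SymbolClass:
    class_id: int    # 0..29, the recognition target
    name: str        # placeholder name, not a real character label
    authentic: bool  # True = authentic Oracle Bone character, False = mimic


def build_classes(n_authentic: int = 15, n_mimic: int = 15) -> list[SymbolClass]:
    """Build the class table: authentic symbols first, then mimicked ones."""
    classes = [
        SymbolClass(i, f"authentic_{i:02d}", True) for i in range(n_authentic)
    ]
    classes += [
        SymbolClass(n_authentic + j, f"mimic_{j:02d}", False) for j in range(n_mimic)
    ]
    return classes


CLASSES = build_classes()


def authenticity_of(class_id: int) -> bool:
    """Derive the true/false (authentic/mimic) label from a symbol class id."""
    return CLASSES[class_id].authentic
```

With this layout, a single 30-way classifier yields both outputs: the predicted class id is the recognized symbol, and `authenticity_of(class_id)` gives the authenticity decision.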
I would like to seek guidance or insights on the following aspects:
- Which AI architecture or model type (e.g., CNNs, Vision Transformers, multimodal models, etc.) would be best suited for symbol recognition and authenticity differentiation in ancient scripts?
- What are the best practices for dataset preparation and annotation when dealing with a limited number of symbols and stylistic irregularities found in historical scripts?
- How can the model be trained to identify authenticity features (such as original stroke weight, spacing, or engraving patterns) that distinguish true Oracle Bone symbols from mimicked ones?
- Are there any publicly available datasets or references containing verified Oracle Bone Characters that could help expand or validate my dataset?
- What evaluation metrics or benchmarking methods would be most effective for comparing recognition accuracy and authenticity detection in such a small-scale symbolic dataset?
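For the evaluation question, this is a rough sketch of the kind of metrics I have in mind, written in plain Python so it does not depend on any particular framework. The function names `symbol_accuracy` and `authenticity_metrics` are hypothetical, and it assumes each test sample has a true and a predicted symbol id plus a true and a predicted authenticity flag (`True` = authentic). Balanced accuracy is included because it averages the recall of the authentic and mimicked groups, so neither group can dominate the score.

```python
def symbol_accuracy(true_ids, pred_ids):
    """Fraction of samples whose predicted symbol class matches the label."""
    if not true_ids:
        return 0.0
    return sum(t == p for t, p in zip(true_ids, pred_ids)) / len(true_ids)


def authenticity_metrics(true_auth, pred_auth):
    """Binary metrics for authentic (True) vs. mimicked (False) symbols."""
    pairs = list(zip(true_auth, pred_auth))
    tp = sum(1 for t, p in pairs if t and p)          # authentic, kept
    tn = sum(1 for t, p in pairs if not t and not p)  # mimic, rejected
    fp = sum(1 for t, p in pairs if not t and p)      # mimic accepted as authentic
    fn = sum(1 for t, p in pairs if t and not p)      # authentic rejected as mimic

    recall_authentic = tp / (tp + fn) if (tp + fn) else 0.0
    recall_mimic = tn / (tn + fp) if (tn + fp) else 0.0
    precision_authentic = tp / (tp + fp) if (tp + fp) else 0.0

    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision_authentic": precision_authentic,
        "recall_authentic": recall_authentic,
        "recall_mimic": recall_mimic,
        "balanced_accuracy": (recall_authentic + recall_mimic) / 2,
    }
```

Reporting the symbol accuracy and the authenticity metrics separately would show whether errors come from confusing two symbols or from accepting a mimic as authentic.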
This project aims to enhance the accuracy of AI-based ancient script recognition and explore methods to prevent confusion between authentic historical symbols and synthetic or misleading reproductions.
Any technical guidance, research direction, or references related to model training or dataset expansion would be greatly appreciated.
Thank you for your time and support.
Additional
No response