
Query Regarding AI Model for Oracle Bone Script (甲骨文) Recognition and Differentiation from Mimicked Symbols #1215

@Team-Q-Botix



Question

I’m currently working on an AI-based symbol recognition project focused on ancient Chinese Oracle Bone Characters (甲骨文). The system is designed to recognize and classify these ancient symbols while distinguishing authentic Oracle Bone characters from mimicked or artificially generated (fake) ones that visually resemble the originals.

The current dataset contains 30 symbols in total: 15 authentic Oracle Bone Characters and 15 mimicked (false) symbols intentionally designed to resemble the real ones. The objective is to train and evaluate an AI model capable of learning the subtle differences in stroke structure, curvature, and spatial composition between true and false symbols.
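One way to organize the setup described above is a labeled record list with a grouped cross-validation split. This is a minimal sketch that assumes, hypothetically, that each mimicked symbol imitates one specific authentic symbol (the issue does not say so). The names `GlyphRecord`, `pair_id`, `build_records`, and `grouped_folds` are made up for illustration, and the file paths are placeholders. Keeping a real symbol and its imitation in the same fold means every test fold contains only pairs the model never saw during training, which guards against an over-optimistic score on a 30-symbol set.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GlyphRecord:
    path: str        # placeholder image path (hypothetical layout)
    pair_id: int     # authentic/mimic pair this glyph belongs to (assumed pairing)
    authentic: bool  # True for a genuine Oracle Bone character, False for a mimic


def build_records(num_pairs: int = 15) -> list[GlyphRecord]:
    """Create one authentic and one mimicked record per pair."""
    records = []
    for pid in range(num_pairs):
        records.append(GlyphRecord(f"authentic/{pid:02d}.png", pid, True))
        records.append(GlyphRecord(f"mimic/{pid:02d}.png", pid, False))
    return records


def grouped_folds(records: list[GlyphRecord], k: int) -> list[list[GlyphRecord]]:
    """Split records into k folds so that all records of a pair land in the same fold."""
    by_pair = defaultdict(list)
    for r in records:
        by_pair[r.pair_id].append(r)
    folds = [[] for _ in range(k)]
    for i, pid in enumerate(sorted(by_pair)):
        folds[i % k].extend(by_pair[pid])  # round-robin assignment of whole pairs
    return folds


if __name__ == "__main__":
    records = build_records()
    folds = grouped_folds(records, k=5)
    for i, test_fold in enumerate(folds):
        train = [r for j, f in enumerate(folds) if j != i for r in f]
        print(f"fold {i}: train={len(train)} test={len(test_fold)}")
```

With real data there would usually be several scans per symbol; grouping by symbol (or by pair) still applies, so no image of a test symbol leaks into training.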

I would like to seek guidance or insights on the following aspects:

Which AI architecture or model type (e.g., CNNs, Vision Transformers, or multimodal models) would be best suited for symbol recognition and authenticity differentiation in ancient scripts?

What are the best practices for dataset preparation and annotation when dealing with a limited number of symbols and stylistic irregularities found in historical scripts?

How can the model be trained to identify authenticity features — such as original stroke weight, spacing, or engraving patterns — that distinguish true Oracle Bone symbols from mimicked ones?

Are there any publicly available datasets or references containing verified Oracle Bone Characters that could help expand or validate my dataset?

What evaluation metrics or benchmarking methods would be most effective for comparing recognition accuracy and authenticity detection in such a small-scale symbolic dataset?
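For the evaluation question, the usual binary metrics for the authenticity task can be computed as in the minimal sketch below, alongside plain top-1 accuracy for symbol recognition. It treats "mimic" as the positive class, which is an assumption: it presumes a mimic accepted as authentic is the costlier error. Balanced accuracy is included because it stays meaningful if the authentic/mimic ratio changes as the dataset grows. The function names are hypothetical; scikit-learn's `precision_recall_fscore_support` and `balanced_accuracy_score` compute the same quantities.

```python
def authenticity_metrics(y_true, y_pred, positive="mimic"):
    """Binary metrics with `positive` (default "mimic") as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # mimic correctly flagged
            else:
                fp += 1  # authentic glyph wrongly flagged as mimic
        elif t == positive:
            fn += 1      # mimic accepted as authentic
        else:
            tn += 1      # authentic glyph correctly accepted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0       # share of mimics caught
    specificity = tn / (tn + fp) if tn + fp else 0.0  # share of authentic glyphs accepted
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "balanced_accuracy": (recall + specificity) / 2,
    }


def top1_accuracy(true_ids, pred_ids):
    """Fraction of glyphs whose predicted symbol identity matches the true one."""
    if len(true_ids) != len(pred_ids) or not true_ids:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_ids, pred_ids))
    return correct / len(true_ids)


if __name__ == "__main__":
    y_true = ["mimic", "mimic", "authentic", "authentic", "mimic", "authentic"]
    y_pred = ["mimic", "authentic", "authentic", "mimic", "mimic", "authentic"]
    m = authenticity_metrics(y_true, y_pred)
    print(m["confusion"])  # {'tp': 2, 'fp': 1, 'fn': 1, 'tn': 2}
```

With only a handful of test glyphs per fold these numbers move in large steps (one error out of 6 glyphs is about 17 percentage points), so reporting the mean and spread across cross-validation folds is more informative than a single split.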

This project aims to enhance the accuracy of AI-based ancient script recognition and explore methods to prevent confusion between authentic historical symbols and synthetic or misleading reproductions.

Any technical guidance, research direction, or references related to model training or dataset expansion would be greatly appreciated.

Thank you for your time and support.

[Image attached]

Additional

No response

Labels: classify (Image Classification issues, PR's); question (Further information is requested)