Asiatic Society launches AI initiative to decipher ancient manuscripts in archives
Times of India | 11 April 2025
Kolkata: The Asiatic Society is embarking on a groundbreaking initiative to unlock the vast repository of ancient wisdom hidden within its archives. With over 52,000 manuscripts, many undeciphered for centuries, the institution is now leveraging transcription technology and machine learning to make these texts accessible to scholars worldwide.
India boasts the largest known collection of ancient and medieval manuscripts, yet many remain unread due to the painstaking manual labour required for transcription. Automating this process presents a monumental opportunity to uncover invaluable knowledge about the past. Most of these documents are written in diverse languages and scripts, with Sanskrit being the most prevalent, said Asiatic Society administrator Lt. Col. Anant Singh, who conceived the project.
In collaboration with the Centre for Development of Advanced Computing (CDAC), the Asiatic Society has recently launched Project Vidhvanika (meaning decoding of knowledge), an ambitious effort to develop language models for ancient scripts. According to Singh, the project will first assess existing resources so that work is focused on scripts that lack a functional language model. "There is no point in developing a language model if one already exists. The real challenge is that experts who understand and interpret these scripts are dwindling," he said.
To train and validate these models, researchers are using digitised handwritten manuscripts in Sanskrit and other languages, some of which date back many centuries BC. Scholars manually transcribe lines of text into specialised software, and the time each transcription takes is recorded to gauge its difficulty. This data helps refine the algorithm, reducing errors and improving accuracy, said a CDAC scientist.
"Serious efforts to create a fully functional model for these ancient scripts have not been undertaken. In most cases, there is simply no software available," Singh added.
The project envisions a National Language Learning Initiative, aimed at developing a machine learning-based system capable of reading, transcribing, and translating manuscripts into multiple world languages. Optical Character Recognition (OCR) technology will play a pivotal role by analysing script depth in rock edicts and inscriptions, potentially yielding even greater transcription accuracy, said a CDAC scientist.
Researchers are also exploring quantum computing to enhance transcription capabilities.