Overview

Prime Tech developed a suite of AI-driven language processing tools designed to improve communication, automate transcription, and enhance text processing in Bengali. These solutions addressed challenges in speech recognition, text-to-speech synthesis, punctuation correction, and handwriting recognition, catering to industries such as customer support, legal documentation, education, and financial services.

Problem Statement

The lack of robust AI models for the Bengali language resulted in inefficient manual transcription, poor accessibility for visually impaired users, and difficulties in processing unstructured handwritten data. Organizations faced challenges in automating these processes, leading to increased operational costs and reduced efficiency.

Solution

  • Bangla Speech-to-Text System: We implemented this solution using Kaldi and Vosk-API to process local Bengali dialects and transcribe spoken words accurately.
  • Bangla Text-to-Speech: We developed this model with Tacotron and MelGAN to convert Bengali text into natural-sounding speech.
  • Bangla Punctuation Model: We trained a BERT-based model to intelligently punctuate Bengali text for readability.
  • ICR (Intelligent Character Recognition) / Handwriting Recognition: We used TensorFlow and custom OCR engines to read handwritten Bangla forms, particularly those used in banking.
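
To make the punctuation step more concrete, here is a minimal sketch of how per-word predictions could be turned back into punctuated text. It assumes the punctuation model is framed as token classification, where each word receives a label naming the mark that should follow it. The case study does not describe the model's actual output format, so the label names, the LABEL_TO_MARK table, and the apply_punctuation helper are all hypothetical, and the labels in the example are hand-written rather than produced by a model.

```python
from typing import List

# Hypothetical label set (an assumption, not taken from the case study).
# "O" means no punctuation mark follows the word.
LABEL_TO_MARK = {
    "O": "",
    "COMMA": ",",
    "DANDA": "\u0964",  # Bengali full stop (danda)
    "QUESTION": "?",
}


def apply_punctuation(words: List[str], labels: List[str]) -> str:
    """Attach the mark named by each label to its word and join with spaces."""
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")
    pieces = [word + LABEL_TO_MARK[label] for word, label in zip(words, labels)]
    return " ".join(pieces)


# Hand-written labels standing in for model predictions.
words = ["আমি", "ভালো", "আছি", "তুমি", "কেমন", "আছো"]
labels = ["O", "O", "DANDA", "O", "O", "QUESTION"]
print(apply_punctuation(words, labels))
```

In a real pipeline the labels would come from the trained classifier, and subword tokens would need to be merged back into whole words before this step.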

Technology Stack

Python, Kaldi, Vosk-API, Tacotron, MelGAN, BERT, TensorFlow, NLTK, spaCy, Docker, AWS, CUDA