Language Detector

Automatically detect the language of a string, text or document in seconds.
CONTACT US
Or give us a call on
+34-96 333 6 333
Processing multilingual information has become ever more relevant in today’s digital world. Pangea’s Language Detector identifies the language and character encoding of incoming documents. It supports more than 84 languages, including major Western and Eastern European, Semitic and Central Asian languages, as well as Turkish, Japanese and Chinese, among others.

Pangea Language Detector can be successfully used:

As a pre-processing step before machine translation;

To pre-filter text and improve the quality of input data when training algorithms (most natural language processing algorithms are trained on monolingual texts, and text in other languages can degrade the performance of document management systems);

To organize data (speech-to-text transcripts, documents, etc.) before further processing;

To mine bilingual texts from online resources for machine translation;

For retrieving, grouping and understanding relevant information (users’ texts, emails, etc.) in multilingual environments.

Pangea Language Detector accurately determines not only the language of the whole document, but also the language of each snippet, paragraph or fragment.
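Segment-level detection can be pictured as splitting a document into fragments and running a detector on each one. The sketch below assumes paragraphs are separated by blank lines, and it uses a hypothetical stopword-counting `detect_language` purely as a stand-in; it illustrates only the per-fragment loop, not Pangea's detector or its accuracy.

```python
import re

# Hypothetical toy detector: counts common function words per language.
# A stand-in for a real language-identification model, for illustration only.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "with"},
    "es": {"el", "la", "los", "y", "es", "de", "en", "con"},
    "fr": {"le", "la", "les", "et", "est", "de", "dans", "avec"},
}


def detect_language(text: str) -> str:
    """Return the language whose stopwords occur most often (ties: first listed)."""
    words = re.findall(r"\w+", text.lower())
    scores = {lang: sum(w in stop for w in words) for lang, stop in STOPWORDS.items()}
    return max(scores, key=scores.get)


def detect_per_paragraph(document: str) -> list[tuple[str, str]]:
    """Split the document on blank lines and label each paragraph separately."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]
    return [(detect_language(p), p) for p in paragraphs]


doc = (
    "The report is ready and the team is happy with the results.\n\n"
    "El informe está listo y el equipo está contento con los resultados.\n\n"
    "Le rapport est prêt et l'équipe est contente des résultats."
)
for lang, paragraph in detect_per_paragraph(doc):
    print(lang, "|", paragraph)
```

Because the document-level answer and the per-paragraph answers come from the same detector, a mixed-language document yields one label per fragment instead of a single, misleading label for the whole text.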

Our Language Detector combines statistical and neural technologies to achieve the highest possible recognition accuracy. Our proprietary language detection algorithm is based on a mathematical vector space model: we scan the document contents to build a multidimensional space of vectors, using N-gram frequencies as the vector components. The algorithm then compares the positions of these vectors in the space to determine their similarity. Finally, the combined results are corrected using special linguistic rules developed by our language team.
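A minimal sketch of the general vector-space technique described above, assuming character trigrams as the N-grams and cosine similarity as the similarity measure (neither choice is specified in the description). It is not Pangea's proprietary algorithm: it omits the neural component and the linguistic correction rules, and the tiny `SAMPLES` training texts are hypothetical.

```python
import math
from collections import Counter


def ngram_vector(text: str, n: int = 3) -> Counter:
    """Map text to a sparse vector of character N-gram frequencies."""
    # Lowercase, collapse whitespace and pad with spaces so word boundaries form N-grams.
    padded = f" {' '.join(text.lower().split())} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())  # missing grams count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical training texts; a real system would use large corpora per language.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat is sleeping "
          "in the house with the children",
    "es": "el perro come la comida en la casa y los niños juegan en el parque "
          "con sus amigos todos los días",
    "fr": "le chat mange la nourriture dans la maison et les enfants jouent "
          "dans le parc avec leurs amis tous les jours",
}
PROFILES = {lang: ngram_vector(text) for lang, text in SAMPLES.items()}


def detect(text: str, profiles: dict[str, Counter]) -> str:
    """Return the language whose profile vector is most similar to the text's vector."""
    query = ngram_vector(text)
    return max(profiles, key=lambda lang: cosine(query, profiles[lang]))


print(detect("los niños juegan con el perro", PROFILES))
```

Cosine similarity compares the direction of the vectors rather than their length, so a short snippet and a long document in the same language can still score as similar.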

For evaluation purposes, we have created a demo page that detects the most popular languages, achieving language identification accuracy of 95% to 99% (typical competitors’ results: 86% to 96%). The average processing speed was over 8,000 KB/s.
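Both figures are simple ratios: accuracy is the number of correctly identified samples divided by the total, and throughput is input size divided by processing time. The sketch below shows one way to compute them; the labeled samples, the stand-in detector and the 1 KB = 1024 bytes convention are assumptions, not a description of how the demo was benchmarked.

```python
import time
from typing import Callable


def evaluate(detect: Callable[[str], str],
             samples: list[tuple[str, str]]) -> tuple[float, float]:
    """Return (accuracy in percent, throughput in KB/s) for labeled samples.

    Each sample is a (text, expected_language) pair. Throughput counts the
    UTF-8 size of the input, with 1 KB assumed to be 1024 bytes.
    """
    if not samples:
        raise ValueError("need at least one labeled sample")
    total_bytes = sum(len(text.encode("utf-8")) for text, _ in samples)
    start = time.perf_counter()
    correct = sum(detect(text) == expected for text, expected in samples)
    elapsed = time.perf_counter() - start
    accuracy = 100.0 * correct / len(samples)
    # Guard against a zero timer reading on extremely small inputs.
    throughput = (total_bytes / 1024) / elapsed if elapsed > 0 else float("inf")
    return accuracy, throughput


# Usage with a hypothetical detector that always answers "en":
samples = [("hello world", "en"), ("hola mundo", "es"),
           ("good morning", "en"), ("bonjour", "fr")]
accuracy, throughput = evaluate(lambda text: "en", samples)
print(f"accuracy: {accuracy:.1f}%, throughput: {throughput:.0f} KB/s")
```

Timing only the detection loop, not the byte counting, keeps the throughput figure focused on the detector itself.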
