LANGUAGE DETECTOR

Automatically detect the language of a sequence, text or document in a matter of seconds

Do you need to identify the language and character encoding of your documents?

Our language detector can be used to:

Process text before machine translation

Clean up text and improve the quality of incoming data before training algorithms

Organize data (speech to text, documents, etc.) prior to other processes

Extract bilingual texts from online resources for machine translation

Retrieve, group, and understand relevant information (user texts, e-mails, etc.) in a multilingual environment

Pangeanic's language detector accurately determines both the language of the entire document and the language of each fragment, paragraph or section

Our language detector combines statistical and neural technologies to obtain the best recognition results. Our own algorithm is based on the mathematically sound vector space model.

We represent the content of each document as a vector in a multidimensional space, using n-gram frequencies as the coordinates. The algorithm then compares the positions of these vectors in that space to determine their similarity.
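To make the vector space idea concrete, here is a minimal Python sketch. It is not Pangeanic's implementation: it assumes character trigrams as the n-grams, cosine similarity as the measure of closeness, and three tiny hand-written sample texts as language profiles. The names `ngram_vector`, `cosine_similarity`, `SAMPLES`, `PROFILES`, and `detect_language` are all hypothetical and exist only for this illustration.

```python
from collections import Counter
from math import sqrt


def ngram_vector(text, n=3):
    """Return a sparse vector (Counter) of character n-gram frequencies."""
    # Lowercase, collapse whitespace, and pad with spaces so that
    # word boundaries also produce n-grams.
    padded = f" {' '.join(text.lower().split())} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse frequency vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Assumption: toy sample texts standing in for real training corpora.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is "
          "the language that we want to detect",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y este "
          "es el idioma que queremos detectar",
    "fr": "le renard brun rapide saute par-dessus le chien paresseux et "
          "c'est la langue que nous voulons détecter",
}
PROFILES = {lang: ngram_vector(text) for lang, text in SAMPLES.items()}


def detect_language(text):
    """Return the best-matching language code and all similarity scores."""
    vector = ngram_vector(text)
    scores = {lang: cosine_similarity(vector, profile)
              for lang, profile in PROFILES.items()}
    return max(scores, key=scores.get), scores


best, scores = detect_language("this is the dog")
print(best, scores)
```

Calling `detect_language` on each paragraph separately, as well as on the full text, would give both per-fragment and whole-document results. A real system would build its profiles from large corpora rather than a single sentence per language.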

Finally, the combined results of the algorithm are corrected using special linguistic rules developed by our team of expert linguists.

For evaluation purposes, we created a demo page that detects the most popular languages. It achieved a language identification accuracy of 95% to 99% (typical competitor results: 86% to 96%), with an average processing speed of over 8000 KB/s.
