PangeaMT offers general baseline engines in several domains, such as technical, legal, software and electronics. These are general engines: they contain no client-specific data and use general terminology. They are trained on EU and UN data and on data PangeaMT has gathered from different sources. Our parent company, Pangeanic, is a full member of TAUS (Translation Automation User Society). We have also been able to gather a lot of clean data from donors like Microsoft, Adobe, Dell and Molina Healthcare, to name a few.

Our baseline engines are general enough to cover most language areas and can provide good results in some language pairs. Their BLEU scores are between 50% and 60%. To obtain these scores, we hold back a set of data, typically 20,000 words, from training so that the engines have had no chance to learn it. We then ask the engine to translate that set and compare its output with the existing human translations. As BLEU is an automatic score and not a human evaluation, it is only an indication that our engines are quite fit for use.
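For readers who want to see what such a held-out evaluation looks like, here is a minimal sketch in Python. It is not PangeaMT's evaluation code: the function names `hold_out` and `corpus_bleu` are our own for illustration, tokenization is plain whitespace splitting, and the score is a simplified single-reference BLEU without smoothing. Published scores are normally computed with an established tool.

```python
import math
import random
from collections import Counter


def hold_out(pairs, test_size, seed=0):
    """Split (source, target) pairs into a training set and an unseen test set."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    return shuffled[test_size:], shuffled[:test_size]


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU (one reference per segment, no smoothing)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped matches: an n-gram counts at most as often as in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: output shorter than the reference is penalized.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    reference = ["the cat sat on the mat"]
    print(corpus_bleu(["the cat sat on a mat"], reference))
```

The score comes out between 0 and 1; multiplying by 100 gives the percentage figures quoted above. The important point is that the test set returned by `hold_out` must never be used for training.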

Adding specific client terminology and bilingual sets will help us “tune” the engine to your style, expressions and intended use. It will “speak” more like what you really need. This is why PangeaMT systems are like no other: the platform learns as soon as you add new material, either your own post-edited material or material you gather from other sources. If you think a bilingual or monolingual set is related to your area of knowledge or intended machine translation application… upload it!

Remember that all data will be filtered through a cleaning process, and that you should also pre-filter any material that is poorly translated or otherwise unsuitable for machine learning. Introducing “noise” can have really detrimental effects on good data, as it dilutes the statistics the engine learns from.
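If you want to pre-filter your own bilingual material before uploading, a few simple checks already catch a lot of noise. The sketch below is only an illustration and not the cleaning process PangeaMT runs: the function name `prefilter` and the `max_ratio` threshold are our own assumptions. It drops empty segments, untranslated segments where source and target are identical, pairs whose lengths differ implausibly, and exact duplicates.

```python
def prefilter(pairs, max_ratio=3.0):
    """Keep only (source, target) pairs that pass basic sanity checks.

    max_ratio is an assumed threshold: a pair is dropped when one side
    has more than max_ratio times as many words as the other.
    """
    seen = set()
    kept = []
    for source, target in pairs:
        source, target = source.strip(), target.strip()
        if not source or not target:
            continue  # one side is empty
        if source == target:
            continue  # segment was probably left untranslated
        s_len, t_len = len(source.split()), len(target.split())
        if max(s_len, t_len) / min(s_len, t_len) > max_ratio:
            continue  # lengths differ too much to be a faithful translation
        if (source, target) in seen:
            continue  # exact duplicate
        seen.add((source, target))
        kept.append((source, target))
    return kept


if __name__ == "__main__":
    sample = [
        ("Press the button.", "Pulse el botón."),
        ("Press the button.", "Pulse el botón."),
        ("Model X200", "Model X200"),
        ("OK", ""),
    ]
    print(prefilter(sample))
```

Checks like these cannot judge translation quality, so material you know to be badly translated still needs to be removed by hand.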

As part of our involvement in European Union research, we have applied search-engine-style translation memory recall before machine translation requests reach our engines. This technique uses fast multi-entry recall and is based on Elasticsearch. We call it ElasticTM. It helps produce better-flowing sentences when material shares certain similarities, as it applies a hybrid approach that combines translation memory recall with powerful statistical machine translation and strong pre-processing modules. It is not the best of both worlds… it is the best of 4 worlds!
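To make the idea of “translation memory first, machine translation second” concrete, here is a toy sketch of the routing step. It makes several assumptions: it uses Python's standard `difflib.SequenceMatcher` as a stand-in for the Elasticsearch-based recall, the names `best_tm_match` and `translate` and the 0.85 threshold are hypothetical, and `mt_engine` is any function you supply. It does not show how ElasticTM itself works.

```python
from difflib import SequenceMatcher


def best_tm_match(segment, memory):
    """Return (similarity, stored translation) for the closest source in memory."""
    best_score, best_target = 0.0, None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_score, best_target = score, target
    return best_score, best_target


def translate(segment, memory, mt_engine, threshold=0.85):
    """Reuse a close translation memory match, otherwise call the MT engine."""
    score, target = best_tm_match(segment, memory)
    if target is not None and score >= threshold:
        return target, "tm"
    return mt_engine(segment), "mt"


if __name__ == "__main__":
    memory = {"Press the power button.": "Pulse el botón de encendido."}
    fake_engine = lambda text: "[MT] " + text  # placeholder, not a real engine
    print(translate("Press the power button.", memory, fake_engine))
    print(translate("Remove the battery cover.", memory, fake_engine))
```

A linear scan like this only works for a small memory; a search engine index is what makes recall fast over millions of segments. A real hybrid system would also do more with a fuzzy match than return it unchanged.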

So, enjoy the free trial period and PangeaMT engines!
