Text Classification / Categorizer

Automatically categorize documents according to knowledge classifiers.
Pangea automatic text classification and categorization consists of a collection of modules that implement common classification and categorization tasks. The modules can work together with Text Classification or operate separately, and at a high level the system also defines a set of relationships among them.
The various details are flexible – for example, you can choose what categorization algorithm to use, what features (words or otherwise) of the documents should be used (or how to automatically choose these features), what format the documents are in, and so on.
Customizing this module typically involves obtaining a collection of pre-categorized documents from the organization. Pangea trains its deep neural networks to recognize the features of each document and how it differs from other documents. This creates a “knowledge graph” representation and trains a categorizer to recognize a particular knowledge set. The trained set is saved, and queries can then be run against it.

There are several ways to carry out the queries. The top-level Text Classification and Categorizer module provides an umbrella class for top-level category classifier operations, but you may also use the interfaces of the individual classes in each module.

Our semantic tool automatically classifies documents by content and organizes them within general categories such as Eurovoc, or it can be customized to your organization’s structure, terminology and processes. Categories can include Legal, Compliance, Human Resources, Research and Development, Accounts and Finance, Reports (Sales, Management, etc.), Customer Feedback, Newsletters, and many more. Users are free to define their own categories; the choice is not restricted by the categorization algorithms.

7.1 Text Classification / Categorizer accuracy

Classifying and categorizing documents is often difficult even for humans well trained in the particular domain of knowledge, and a human would consider many things that none of these algorithms consider. One document, for example, may belong to more than one category. Our use cases include previous applications in Fintech with over 90% accuracy in defined domains. Some human supervision may still be needed for unexpected or new types of documents.
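As a rough illustration of the two points above (a document may belong to several categories, and unfamiliar documents may need a human look), the sketch below assigns every category whose score clears a threshold and flags the document for review when none does. It is not Pangea's implementation: the function names, the 0.5 threshold, and the example scores are all assumptions made for illustration, and exact-match accuracy is only one of several ways accuracy could be measured.

```python
def assign_categories(scores, threshold=0.5):
    """Return every category whose score clears the threshold, best first.

    `scores` maps category name -> relevance score; the 0.5 default is an
    illustrative assumption, not a documented Pangea setting.
    """
    accepted = [c for c, s in scores.items() if s >= threshold]
    return sorted(accepted, key=lambda c: scores[c], reverse=True)


def route(scores, threshold=0.5):
    """Assign categories, or flag the document for human review if none fit."""
    labels = assign_categories(scores, threshold)
    return {"labels": labels, "needs_review": not labels}


def accuracy(predicted, expected):
    """Share of documents whose predicted category set matches exactly."""
    hits = sum(set(p) == set(e) for p, e in zip(predicted, expected))
    return hits / len(expected)


# Hypothetical scores for two documents.
multi_label_doc = route({"Legal": 0.82, "Compliance": 0.64, "Human Resources": 0.10})
unfamiliar_doc = route({"Legal": 0.21, "Accounts and Finance": 0.18})
```

Here the first document is placed in two categories, while the second clears no threshold and is handed to a reviewer.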

The Pangea Text Classification / Categorizer is an ideal solution for:

Enterprise content / Knowledge management;

Financial documentation categorization;

Insurance document pre-classification;

Evaluation of new trends in business, science and technology;

Business information management;

Patent prior art search and analysis;

Automated helpdesk systems.

The Pangea Categorizer is available as a server application for on-premises or SaaS deployment.

7.2 Categorization technology

The Pangea Categorizer algorithms are based on deep machine learning techniques. Our approach to document categorization runs in two stages: the training stage and the prediction stage.

At the training stage, the Pangea Categorizer builds a classifier by learning from a set of model documents for each category. Its learning algorithm uses a wide range of semantic features extracted from document texts:

Words with part of speech tags;

Noun phrases and syntactic dependencies between them;

Complex semantic relations detected by our Linguistic Processor.

This training process creates models that, at the prediction stage, use the vector space model to categorize documents. Each input text is compared with the semantic features of each category model, and the degree of proximity between them is calculated. The document is assigned to the category with the maximum relevance value.
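The two stages can be sketched in a few lines of Python. This is a minimal illustration of the vector space idea only, under simplifying assumptions: features are plain lowercase words (not the part-of-speech tags, noun phrases, or semantic relations listed above), proximity is cosine similarity, and every function name and sample document is hypothetical rather than part of the Pangea Categorizer.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn text into a sparse term-frequency vector (word -> count)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train(labelled_docs):
    """Training stage: sum the feature vectors of the model documents per category."""
    models = {}
    for category, text in labelled_docs:
        models.setdefault(category, Counter()).update(vectorize(text))
    return models


def categorize(text, models):
    """Prediction stage: score the text against each category, pick the maximum."""
    vec = vectorize(text)
    scores = {category: cosine(vec, model) for category, model in models.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical model documents for two categories.
models = train([
    ("Legal", "contract clause liability agreement"),
    ("Legal", "court agreement liability"),
    ("Finance", "invoice payment balance account"),
    ("Finance", "quarterly account balance report"),
])
best, scores = categorize("liability clause in the agreement", models)
```

In this toy run the query shares words only with the Legal model documents, so Legal receives the highest relevance value and is returned as the category.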