Text Classification / Categorizer
Automatically categorize documents according to knowledge classifiers.
Pangea automatic text classification and categorization consists of a collection of modules that implement common classification and categorization tasks. These modules can be used through the top-level Text Classification module or operate separately. At a higher level, the system also finds a set of defined relationships among them.
The details are flexible. For example, you can choose which categorization algorithm to use, which features of the documents to use (words or otherwise) or how to select these features automatically, what format the documents are in, and so on.
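The choice of features can be pictured as a registry of interchangeable extractors selected by name. The sketch below is an illustration only. The extractor names (`words`, `char3`) and the `extract` helper are hypothetical and are not Pangea's API. Word unigrams and character trigrams are simply two common feature choices.

```python
from collections import Counter
from typing import Callable, Dict


def word_features(text: str) -> Counter:
    """Lower-cased word unigrams, split on whitespace."""
    return Counter(text.lower().split())


def char_ngram_features(text: str, n: int = 3) -> Counter:
    """Overlapping character n-grams (trigrams by default), spaces included."""
    t = text.lower()
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))


# Hypothetical registry: the configuration names a feature method and the
# pipeline looks it up here instead of hard-coding one extractor.
FEATURE_EXTRACTORS: Dict[str, Callable[[str], Counter]] = {
    "words": word_features,
    "char3": char_ngram_features,
}


def extract(text: str, method: str = "words") -> Counter:
    """Apply the extractor configured under `method`."""
    try:
        extractor = FEATURE_EXTRACTORS[method]
    except KeyError:
        raise ValueError(f"unknown feature method: {method!r}") from None
    return extractor(text)


if __name__ == "__main__":
    print(extract("Invoice for invoice payment", "words"))
    print(extract("abcd", "char3"))
```

With this design, adding a new feature type means registering one more function, and the rest of the pipeline does not change.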
Customizing this module typically starts with obtaining a collection of pre-categorized documents from the organization. Pangea trains its deep neural networks to recognize the features of each document and how it differs from other documents. This produces a "knowledge graph" representation and trains a categorizer to recognize a particular knowledge set. The trained model is saved, and queries can then be run against it.
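The train, save, and query workflow can be sketched in a few functions. The model here is a deliberately simple stand-in: per-category word counts scored with multinomial Naive Bayes and add-one smoothing. It is not Pangea's deep neural network or knowledge graph. The function names and the JSON file layout are hypothetical.

```python
import json
import math
from collections import Counter, defaultdict
from pathlib import Path


def train(labelled_docs):
    """Learn per-category word counts from (text, category) pairs."""
    counts = defaultdict(Counter)
    doc_totals = Counter()
    for text, category in labelled_docs:
        counts[category].update(text.lower().split())
        doc_totals[category] += 1
    return {"word_counts": {c: dict(wc) for c, wc in counts.items()},
            "doc_totals": dict(doc_totals)}


def save(model, path):
    """Persist the trained model (hypothetical JSON layout)."""
    Path(path).write_text(json.dumps(model), encoding="utf-8")


def load(path):
    return json.loads(Path(path).read_text(encoding="utf-8"))


def predict(model, text):
    """Query: multinomial Naive Bayes with add-one (Laplace) smoothing."""
    word_counts, doc_totals = model["word_counts"], model["doc_totals"]
    vocab = {w for wc in word_counts.values() for w in wc}
    n_docs = sum(doc_totals.values())
    best, best_score = None, -math.inf
    for category, wc in word_counts.items():
        total = sum(wc.values())
        score = math.log(doc_totals[category] / n_docs)  # class prior
        for word in text.lower().split():
            score += math.log((wc.get(word, 0) + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = category, score
    return best


if __name__ == "__main__":
    import tempfile

    docs = [("quarterly revenue and invoice totals", "Finance"),
            ("invoice payment overdue", "Finance"),
            ("new employee onboarding and leave policy", "Human Resources"),
            ("annual leave request for employee", "Human Resources")]
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "model.json"
        save(train(docs), path)
        print(predict(load(path), "overdue invoice"))  # prints Finance
```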
There are several ways to run queries. The top-level Text Classification and Categorizer module provides an umbrella class for top-level category classification operations, but you can also use the interfaces of the individual classes directly.
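The relationship between the umbrella class and the individual classes resembles a facade. The sketch below shows that shape. All class and method names are hypothetical, and the keyword scorer is a toy placeholder for a real classifier. Callers can use `Categorizer.categorize` for the one-call path, or call `FeatureExtractor` and `KeywordScorer` themselves when they need intermediate results.

```python
from collections import Counter
from typing import Dict, List


class FeatureExtractor:
    """Individual component: turns text into lower-cased word counts."""

    def extract(self, text: str) -> Counter:
        return Counter(text.lower().split())


class KeywordScorer:
    """Individual component: scores features against per-category keywords."""

    def __init__(self, keywords: Dict[str, List[str]]):
        self.keywords = {c: set(ws) for c, ws in keywords.items()}

    def score(self, features: Counter) -> Dict[str, int]:
        # Counter returns 0 for absent words, so missing keywords add nothing.
        return {c: sum(features[w] for w in ws)
                for c, ws in self.keywords.items()}


class Categorizer:
    """Umbrella class: wires the components together behind one call."""

    def __init__(self, keywords: Dict[str, List[str]]):
        self.extractor = FeatureExtractor()
        self.scorer = KeywordScorer(keywords)

    def categorize(self, text: str) -> str:
        scores = self.scorer.score(self.extractor.extract(text))
        return max(scores, key=scores.get)  # ties go to the first category


if __name__ == "__main__":
    kw = {"Legal": ["contract", "clause"], "Finance": ["invoice", "payment"]}
    print(Categorizer(kw).categorize("payment due on invoice"))
    print(KeywordScorer(kw).score(FeatureExtractor().extract("contract clause")))
```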
Our semantic tool automatically classifies documents by content and organizes them within general-purpose category schemes such as the EuroVoc thesaurus. It can also be customized to your organization's structure, terminology and processes. Categories can include Legal, Compliance, Human Resources, Research and Development, Accounts and Finance, Reports (Sales, Management, etc.), Customer Feedback, Newsletters, and many more. Users are free to define their own categories, and this choice is not restricted by the categorization algorithms.
7.1 Text Classification / Categorizer accuracy
Classifying and categorizing documents is often difficult even for people well trained in the relevant domain, and humans weigh many considerations that categorization algorithms do not. For example, a single document may belong to more than one category. Our use cases include previous Fintech applications that reached over 90% accuracy in defined domains. Some human supervision may still be needed for unexpected or new types of documents.
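The two caveats above (multiple categories per document and residual human review) can be handled at the decision step. This sketch assumes per-category relevance scores in [0, 1] and applies two thresholds. The threshold values (0.5 and 0.7) and the names `assign_categories` and `Decision` are hypothetical, and in practice the thresholds would be tuned on validation data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Decision:
    categories: List[str] = field(default_factory=list)
    needs_review: bool = False


def assign_categories(scores: Dict[str, float],
                      threshold: float = 0.5,
                      review_below: float = 0.7) -> Decision:
    """Multi-label assignment with a human-review flag.

    scores: relevance per category, assumed to lie in [0, 1].
    threshold: every category scoring at least this is assigned, so a
        document can receive several categories.
    review_below: flag for human review if the best score is under this,
        or if no category passed the threshold.
    """
    chosen = sorted((c for c, s in scores.items() if s >= threshold),
                    key=lambda c: scores[c], reverse=True)
    best = max(scores.values(), default=0.0)
    return Decision(categories=chosen,
                    needs_review=best < review_below or not chosen)


if __name__ == "__main__":
    print(assign_categories({"Legal": 0.82, "Compliance": 0.64, "Newsletters": 0.05}))
    print(assign_categories({"Legal": 0.41, "Reports": 0.38}))
```

The first call assigns both Legal and Compliance without review. The second call assigns nothing and routes the document to a person.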
The Pangea Text Classification / Categorizer is an ideal solution for:
The Pangea Categorizer is available as a server application for on-premises or SaaS deployment.
7.2 Categorization technology
The Pangea Categorizer algorithms are based on deep machine learning techniques. Our approach to document categorization runs in two stages: training and prediction.
At the training stage, the Pangea Categorizer builds a classifier by learning from a set of model documents for each category. Its learning algorithm uses a wide range of semantic features extracted from document texts:
The training process produces models that, in the prediction stage, use the vector space model to categorize documents. Each input text is compared with the semantic features of each category model, and the degree of proximity between them is calculated. The document is assigned to the category with the maximum relevance value.
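A minimal sketch of vector-space categorization follows. Plain term-frequency vectors stand in for Pangea's semantic features, and this stand-in is an assumption. Each category model is the average (centroid) of its training documents' vectors. Proximity is measured as cosine similarity, cos(d, c) = d·c / (‖d‖ ‖c‖), and the document goes to the category with the highest value. Function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def vectorize(text: str) -> Vector:
    """Term-frequency vector (stand-in for semantic features)."""
    return dict(Counter(text.lower().split()))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train(examples: List[Tuple[str, str]]) -> Dict[str, Vector]:
    """Training stage: average the vectors of each category's model documents."""
    sums: Dict[str, Counter] = defaultdict(Counter)
    sizes: Counter = Counter()
    for text, category in examples:
        sums[category].update(vectorize(text))
        sizes[category] += 1
    return {c: {t: w / sizes[c] for t, w in s.items()} for c, s in sums.items()}


def predict(centroids: Dict[str, Vector], text: str) -> Tuple[str, Dict[str, float]]:
    """Prediction stage: score every category, return the most relevant one."""
    doc = vectorize(text)
    relevance = {c: cosine(doc, v) for c, v in centroids.items()}
    return max(relevance, key=relevance.get), relevance


if __name__ == "__main__":
    model = train([("contract clause liability", "Legal"),
                   ("breach of contract", "Legal"),
                   ("invoice payment due", "Accounts and Finance"),
                   ("quarterly invoice totals", "Accounts and Finance")])
    label, scores = predict(model, "late invoice payment")
    print(label)  # prints Accounts and Finance
```

Cosine similarity ignores vector length, so a long document and a short one on the same topic score similarly against a category.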