Skim Engine Updates: Entity Extraction

We’ve been upgrading our Skim Engine and expanding its existing data extraction capabilities.

The Skim Engine transforms unstructured data into structured, machine-readable data. Once structured, that data can be processed with standard tools to retrieve information.

For this entity extraction feature, we focused on occupational roles found in unstructured text. Each role is normalised to an O*NET Standard Occupational Classification (SOC) code and comes with a general job category description.

Updates with our customers in mind

Users often need a list of the Job Roles mentioned in the text of a page so that those roles can be linked to Person entities. This makes it possible, for example, to track the movements of executives or other staff to and from your competitors, offering immediate and structured business intelligence.

Technical approach

The entity extractor uses a Neural Network model that was developed and trained in-house. Training had two phases.

First, we showed the model terms from the O*NET database in the context of news and company team pages, to give it a basic understanding of Job Titles.

In the second phase, our team of annotators taught the model further by correcting the predictions it was least certain about.
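The second phase resembles what is often called uncertainty sampling in active learning. The sketch below is only an illustration of that general idea, not our production code: the function name `least_confident`, the probability format and the sample predictions are all assumptions made for the example. It ranks candidate spans by the model's confidence in its most likely label and hands the least certain ones to annotators.

```python
from typing import Dict, List, Tuple


def least_confident(
    predictions: Dict[str, List[float]], budget: int
) -> List[Tuple[str, float]]:
    """Return the `budget` candidates whose top predicted probability is lowest."""
    # Confidence of a candidate = probability of its most likely label.
    scored = [(text, max(probs)) for text, probs in predictions.items()]
    # Sort ascending so the least certain candidates come first.
    scored.sort(key=lambda item: item[1])
    return scored[:budget]


# Hypothetical model outputs per candidate span:
# [P(not a job title), P(job title)].
predictions = {
    "Chief Executive": [0.02, 0.98],
    "Head of Growth": [0.45, 0.55],
    "London": [0.95, 0.05],
    "Founding Partner": [0.40, 0.60],
}

# Pick the two candidates an annotator should review next.
to_annotate = least_confident(predictions, budget=2)
for text, confidence in to_annotate:
    print(f"{text}: top-label confidence {confidence:.2f}")
```

Sending only the low-confidence cases to annotators is what keeps the amount of manual labelling small: clear-cut predictions are left alone.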

With the model trained, we also implemented Job Title normalisation: once a Job Title is extracted, it is mapped back to its entry in the O*NET database, which provides additional information about the role found, such as its SOC code and general category.
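To make the normalisation step concrete, here is a minimal sketch under stated assumptions. The lookup table, the `Occupation` type, the `normalise` function and the fuzzy-match cutoff are hypothetical and chosen for illustration. The real mapping uses the full O*NET database, and the codes shown should be checked against it.

```python
import difflib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Occupation:
    soc_code: str   # O*NET-SOC code
    category: str   # general job category description


# Tiny illustrative lookup (assumed): surface form -> occupation entry.
LOOKUP = {
    "chief executive officer": Occupation("11-1011.00", "Chief Executives"),
    "ceo": Occupation("11-1011.00", "Chief Executives"),
    "software developer": Occupation("15-1252.00", "Software Developers"),
    "software engineer": Occupation("15-1252.00", "Software Developers"),
}


def normalise(job_title: str, cutoff: float = 0.8) -> Optional[Occupation]:
    """Map an extracted Job Title to an occupation entry, or None if no match."""
    # Lowercase and collapse repeated whitespace before looking up.
    key = " ".join(job_title.lower().split())
    if key in LOOKUP:
        return LOOKUP[key]
    # Fall back to the closest known surface form, if it is similar enough.
    close = difflib.get_close_matches(key, list(LOOKUP), n=1, cutoff=cutoff)
    return LOOKUP[close[0]] if close else None


for title in ["CEO", "Software  Developers", "Gardener"]:
    print(title, "->", normalise(title))
```

Titles with no sufficiently close entry return `None` rather than a guess, so downstream consumers can tell unmapped roles apart from mapped ones.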

Overall, the process worked very well: with a minimal amount of manual labelling, we produced a high-performance model for the product.

Solve a problem

Understanding the roles mentioned in web pages has a number of applications in Business Intelligence. For example, a company can monitor news, competitors’ pages and social media for trends in staff from the same general employment category (e.g. Chief Executives, Software Developers) joining or leaving a competitor.

Such trends provide valuable intelligence: they can signal that the competitor is in trouble, undergoing a takeover, moving into a new area and so on.
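As a rough illustration of this kind of monitoring, the sketch below counts net joiners and leavers per job category for one company. Everything in it is assumed for the example: the `RoleEvent` record, the `net_movement` function, the company names and the sample events are hypothetical, and a real pipeline would build such events from extracted Job Titles linked to Person entities.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class RoleEvent(NamedTuple):
    company: str
    category: str  # general job category of the normalised role
    change: str    # "joined" or "left"


def net_movement(events: Iterable[RoleEvent], company: str) -> Counter:
    """Net joiners minus leavers per job category for one company."""
    net: Counter = Counter()
    for event in events:
        if event.company != company:
            continue
        if event.change == "joined":
            net[event.category] += 1
        elif event.change == "left":
            net[event.category] -= 1
    return net


# Hypothetical events for illustration only.
events = [
    RoleEvent("Acme Ltd", "Software Developers", "left"),
    RoleEvent("Acme Ltd", "Software Developers", "left"),
    RoleEvent("Acme Ltd", "Chief Executives", "joined"),
    RoleEvent("Other Co", "Software Developers", "joined"),
]

for category, delta in sorted(net_movement(events, "Acme Ltd").items()):
    print(f"{category}: {delta:+d}")
```

A sustained negative count in one category is the sort of pattern an analyst might then investigate further.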

Further, global HR managers can extract job information across all regional offices and, by combining external and internal data, check that the records in their own systems are accurate.

What else can be done?

We work closely with businesses to help them achieve their goals.

If you are wondering how we can help your organisation in its next phase, let’s talk!

We are organising exclusive AI Innovation Workshops in which our Data Scientists will share insights on AI in your industry, along with their knowledge and creative thinking, and help you build an AI roadmap tailored to your goals.

Our mission

Skim’s mission is to empower people to use data more effectively and to demystify artificial intelligence. Rather than holding up the common narrative of machines replacing humans, we see how machines can help humans to have easier lives and better businesses.
