Data Science

A Benchmark Comparison Of Content Extraction From HTML Pages

Introduction: Content extraction is the task of separating boilerplate, such as comments, navigation bars, social media links, and ads, from the main body of text of an article formatted as HTML. The main content typically accounts for only a small portion of a page’s source code (highlighted in red in the image below). Extraction is…


Summarising lists, a popular web content type.

One of the biggest challenges we’re trying to tackle at Skim Technologies is how to extract the most important information from a web page, in order to vastly improve the way people consume information in a mobile-first world… This post details some of the work we’ve been doing to automatically summarise lists. We…


A benchmark comparison of extractive summarisation systems

In this post, we report the results of a comparative evaluation of our Skim API against similar commercial and open-source extractive summarisation systems. Results indicate that our summarisation system consistently outperforms the benchmarked systems in terms of ROUGE-N. The Information Age we are living in, fuelled by the advent of the World Wide Web…


Our mission

Skim’s mission is to empower people to use data more effectively and to demystify artificial intelligence. Rather than upholding the common narrative of machines replacing humans, we focus on how machines can help people lead easier lives and run better businesses.


Contact

London office
27 Finsbury Circus,
London EC2M 5NT

+44 207 129 7497

sales@skimtechnologies.com

Portugal office
R. de Cândido dos Reis 81,
4050-152 Porto, Portugal
