Extracting Media from HTML
Over the last decade, the web has shifted rapidly from static pages with little interactive content to media-rich, dynamic pages. This can be attributed to new web technologies that make embedding multimedia content incredibly simple. For example, adding a video in HTML5 is as easy as wrapping it in <video> tags.…
A Benchmark Comparison Of Content Extraction From HTML Pages
Content extraction is the task of separating boilerplate, such as comments, navigation bars, social media links and ads, from the main body of text of an article formatted as HTML. The main content typically accounts for only a small portion of a page’s source code (highlighted in red in the image below). Extraction is…
A benchmark comparison of extractive summarisation systems
In this post, we report the results of a comparative evaluation of our Skim API against similar commercial and open-source extractive summarisation systems. Results indicate that our summarisation system consistently outperforms the analysed benchmarks in terms of ROUGE-N. The Information Age we are living in, fuelled by the advent of the World Wide Web…
Why we’ve changed from Scrum to Kanban…
In the two short years we’ve been developing software at Skim.it, we’ve always taken an agile approach to our work. It’s not always been easy, and now more so than ever – like true Agile ninjas – we’re questioning, testing and improving our processes. There’s a misguided belief among tech startups that the only way…
A brief introduction to summarisation
Our chief data scientist shares her thoughts on summarisation, by answering some of the most burning questions on this exciting discipline of information management.
Why you should stop running your own node server
In this post, Florian – our full-stack developer – reveals why he thinks the future is serverless…