A Benchmark Comparison Of Content Extraction From HTML Pages
I just published a post about one of the projects I have been involved with at work. It is aimed at developers with some understanding of machine learning, so it is not as technical as I would have liked, but hey! Many thanks to everyone else who worked on this: Chris Charlton, Marcia Oliveira and Maria Lehl.
Full text here.