Aug 2, 2017
A Benchmark Comparison Of Content Extraction From HTML Pages
I just published a post about one of the projects I have been involved with at work. It is aimed at developers with some understanding of machine learning, so it is not as technical as I would have liked, but hey! Many thanks to everyone else who worked on this: Chris Charlton, Marcia Oliveira and Maria Lehl.
Jun 19, 2017
Gold standard data: lessons from the trenches
This article is a draft of a talk I am giving at PyData Berlin in July 2017. It is intended for a non-technical audience, but I plan to expand it into a more technical piece soon™.
Oct 10, 2014
An exploration of scipy sparse matrices
My colleague Matti Lyra recently faced an interesting computational problem. He wanted to see how quickly a stream of temporally ordered documents evolves, and he chose to do it by looking at how often new words appear in the stream. This post is about how to do this efficiently in Python.
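To give a flavour of the problem, here is a minimal sketch (not the code from the post) that counts how many words appear for the first time in each document. It assumes the documents are already in temporal order and stored as a sparse document-term matrix, one row per document. Converting to CSC format groups the stored entries by column (word), so the first stored row index of each column is the document in which that word first appears; `np.bincount` then tallies these per document. The helper names `term_matrix` and `new_words_per_document` are hypothetical, and the whitespace tokenizer is a deliberate simplification.

```python
import numpy as np
from scipy.sparse import csr_matrix


def term_matrix(docs):
    """Build a sparse document-term count matrix (rows = documents, in order)."""
    vocab = {}
    rows, cols = [], []
    for i, doc in enumerate(docs):
        for word in doc.split():  # naive whitespace tokenizer, for illustration only
            j = vocab.setdefault(word, len(vocab))  # assign the next free column to unseen words
            rows.append(i)
            cols.append(j)
    data = np.ones(len(rows), dtype=np.int64)
    # Duplicate (row, col) pairs are summed when the COO input is converted to CSR.
    return csr_matrix((data, (rows, cols)), shape=(len(docs), len(vocab)))


def new_words_per_document(X):
    """For each row, count the columns whose first nonzero entry is in that row."""
    X = csr_matrix(X).tocsc()  # column-major: entries grouped by word
    X.eliminate_zeros()        # explicitly stored zeros are not occurrences
    X.sort_indices()           # row indices ascending within each column
    nnz_per_col = np.diff(X.indptr)
    starts = X.indptr[:-1][nnz_per_col > 0]  # skip words that never occur
    first_rows = X.indices[starts]           # earliest document for each word
    return np.bincount(first_rows, minlength=X.shape[0])


docs = ["the cat sat", "the dog sat", "a cat ran"]
print(new_words_per_document(term_matrix(docs)))  # [3 1 2]
```

The whole computation stays inside vectorised NumPy operations on the sparse matrix's internal arrays, which avoids a Python-level loop over every word in the vocabulary.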
Jul 14, 2014
This article explains the basics of profiling Python code. The hardest part is installing all the great tools that make it trivial to find the bottleneck in your code.
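As a minimal illustration of where profiling usually starts, the sketch below uses `cProfile` and `pstats`, which ship with the Python standard library and so need no installation. It is not necessarily the tooling the article covers; the functions `slow_sum` and `main` are hypothetical stand-ins for code you want to profile.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately loop-heavy function so it shows up in the profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def main():
    return sum(slow_sum(10_000) for _ in range(50))


profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Write the five entries with the highest cumulative time into a string buffer.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
print(stream.getvalue())
```

For a whole script, `python -m cProfile -s cumulative script.py` gives a similar report without modifying the code.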