NYT Madison and Hive

Madison is a public-facing website that incentivizes readers to help identify ads from The New York Times archives. Ads are notoriously difficult to decipher programmatically: their creative and inconsistent layouts and typefaces defeat the tools that work well for other newspaper content, such as articles. So we created Madison, which gives our audience a few ways to contribute to this effort, depending on their interest and the amount of free time they have.

Madison runs on an open source platform I created called Hive. Hive allows developers to produce crowdsourcing applications for a variety of contexts. Informed by our work on Streamtools, Hive’s technical architecture takes advantage of Go’s efficiency in parsing and transmitting JSON, along with its straightforward interface to Elasticsearch. Combining the speed of a compiled language with the flexibility of a search engine means Hive can handle a wide variety of user-submitted contributions across diverse sets of tasks.
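
To make the JSON-handling point concrete, here is a minimal sketch of how a Go service might parse a reader's submission and re-serialize it as a document for a search index. It is an illustration only, not Hive's actual code: the `Contribution` type, its field names, the `find` task name, and both helper functions are hypothetical, and the step that sends the document to Elasticsearch is left out.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Contribution is a hypothetical shape for one reader submission.
// It is an assumption for this sketch, not Hive's real schema.
type Contribution struct {
	TaskID  string `json:"task_id"`
	AssetID string `json:"asset_id"`
	UserID  string `json:"user_id"`
	// Value stays as raw JSON so each task type can define its own payload.
	Value json.RawMessage `json:"value"`
}

// decodeContribution parses a submitted JSON body and checks that the
// fields this sketch treats as required are present.
func decodeContribution(body []byte) (Contribution, error) {
	var c Contribution
	if err := json.Unmarshal(body, &c); err != nil {
		return Contribution{}, fmt.Errorf("decode contribution: %w", err)
	}
	if c.TaskID == "" || c.AssetID == "" {
		return Contribution{}, errors.New("contribution missing task_id or asset_id")
	}
	return c, nil
}

// encodeForIndex serializes the contribution as the JSON document that
// would be handed to a search index.
func encodeForIndex(c Contribution) ([]byte, error) {
	return json.Marshal(c)
}

func main() {
	body := []byte(`{"task_id":"find","asset_id":"ad-123","user_id":"u1","value":{"is_ad":true}}`)

	c, err := decodeContribution(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	doc, err := encodeForIndex(c)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(doc))
}
```

Keeping the task-specific payload as `json.RawMessage` is one way to let a single type carry contributions for very different tasks, since the payload is passed through without being decoded into a fixed structure.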