What Is a DeepDive Analysis?

DeepDive is a trained data analysis system developed at Stanford University. Rather than relying only on hand-written logic, it uses machine learning to make predictions from data, which lets developers analyze data at a deeper level than many other systems allow. According to the DeepDive website, the system is aimed at developers who are already familiar with Python and SQL.

Unlike many traditional systems, DeepDive copes well with noisy or imprecise data, such as data containing misspellings and other human errors. It can also extract information at large scale from many kinds of sources, including millions of PDF files, web pages and other documents. Developers write simple rules, based on their knowledge of a domain, that guide the system’s learning process. Because DeepDive learns “distantly” (a technique known as distant supervision), developers do not need to supply training labels for every individual prediction. According to Stanford’s DeepDive website, the system is built on a scalable, high-performance learning and inference engine.
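To make the idea of distant supervision and domain rules more concrete, here is a minimal, generic Python sketch. It is not DeepDive's own API or rule syntax. The `Candidate` class, the `distant_label`, `keyword_rule` and `apply_rule` functions, and the tiny knowledge base are all hypothetical illustrations. The sketch labels candidate "spouse" mentions by looking up entity pairs in an existing set of known facts. It then falls back to a hand-written keyword rule, so no one has to label each example by hand.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple

Pair = Tuple[str, str]


@dataclass
class Candidate:
    """A candidate relation mention: two entity names found in one sentence."""
    sentence: str
    entity1: str
    entity2: str
    label: Optional[bool] = None  # None means "no evidence either way"


def distant_label(candidates: List[Candidate],
                  positives: Set[Pair],
                  negatives: Set[Pair]) -> List[Candidate]:
    """Label candidates by looking up their entity pair in a hypothetical knowledge base.

    Pairs are treated as unordered, so (a, b) and (b, a) match the same fact.
    """
    for c in candidates:
        pair = (c.entity1, c.entity2)
        flipped = (c.entity2, c.entity1)
        if pair in positives or flipped in positives:
            c.label = True
        elif pair in negatives or flipped in negatives:
            c.label = False
    return candidates


def keyword_rule(c: Candidate) -> Optional[bool]:
    """A hypothetical domain rule: the word 'married' suggests a spouse relation."""
    return True if "married" in c.sentence.lower() else None


def apply_rule(candidates: List[Candidate],
               rule: Callable[[Candidate], Optional[bool]]) -> List[Candidate]:
    """Apply a rule only to candidates the knowledge base left unlabeled."""
    for c in candidates:
        if c.label is None:
            c.label = rule(c)
    return candidates


# Hypothetical knowledge base: known spouses (positive) and known non-spouses (negative).
known_spouses = {("Marie Curie", "Pierre Curie")}
known_non_spouses = {("Orville Wright", "Wilbur Wright")}  # brothers, not spouses

candidates = [
    Candidate("Pierre Curie and Marie Curie shared the 1903 prize.", "Pierre Curie", "Marie Curie"),
    Candidate("Wilbur Wright and Orville Wright built an airplane.", "Wilbur Wright", "Orville Wright"),
    Candidate("Ada and Bob married in 1990.", "Ada", "Bob"),
    Candidate("Alice met Carol at a conference.", "Alice", "Carol"),
]

distant_label(candidates, known_spouses, known_non_spouses)
apply_rule(candidates, keyword_rule)

for c in candidates:
    print(c.entity1, "/", c.entity2, "->", c.label)
```

The design point is that the labels come from facts and rules the developer already has, not from per-example annotation. In a real system the resulting labels would be noisy. Handling that noise statistically is the job of the learning and inference engine, which this sketch does not attempt to model.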

DeepDive has been applied to both broad and narrow domains, and it is particularly effective in scientific applications such as geology and paleobiology. DeepDive has also been used to build Wisci, a project that enriches Wikipedia with structured data. According to the official website, the DeepDive project is led by Christopher Ré of Stanford University and his team, and several alpha versions had been released as of September 2014.