The OutWit platform is a Web harvester and download management environment developed by OutWit Technologies and first released as a public beta in May 2008.
The central module of the platform, the OutWit Kernel, includes a library of recognition and extraction functions and is packaged as a free extension for Mozilla Firefox. Specific applications can be built around the Kernel using its application programming interface (API). The platform's license allows advanced users to build and distribute their own original tools, called outfits, which take advantage of the Kernel's features for specific applications. Each outfit is a small XUL extension with its own components, including a user interface, features, scripts, scrapers and a directory of Web sources.
The technology is presented as a step towards a semantic browser that will recognize data and media elements, using metadata when present and inferring semantic information when possible. The software automatically browses through Web sources to harvest information objects and organize them into reusable and sharable collections or mashups.
OutWit Hub is the first tool based on the OutWit platform. The beta version gathers a series of features to ease Web searches and organize collections. By breaking down the elements of a Web page into different types of data, such as images, links, email addresses, text and tables, the program allows users to work with only the desired data and use it in a variety of applications. The application automatically browses through Web sources in full screen, analyzing each page’s navigation links and guessing the most pertinent next page URL. This way, with or without programming skills or technical knowledge, users can create automatic agents and scrapers to gather and format the information they seek.
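The idea of breaking a page down into typed data can be illustrated with a short sketch. The Python below is not OutWit code and does not use the OutWit API; the names `PageSorter` and `sort_page` are hypothetical, and the email pattern is a deliberately simple approximation. It only shows, using the standard library, how the elements of a page might be sorted into links, images, email addresses and text.

```python
import re
from html.parser import HTMLParser

# Simplified email pattern (an assumption for illustration, not a full validator).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class PageSorter(HTMLParser):
    """Hypothetical helper: sorts page elements into buckets by data type."""

    def __init__(self):
        super().__init__()
        self.buckets = {"links": [], "images": [], "emails": [], "text": []}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            href = attrs["href"]
            if href.startswith("mailto:"):
                # A mailto link is treated as an email address, not a page link.
                self.buckets["emails"].append(href[len("mailto:"):])
            else:
                self.buckets["links"].append(href)
        elif tag == "img" and attrs.get("src"):
            self.buckets["images"].append(attrs["src"])

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.buckets["text"].append(text)
            # Email addresses written in plain text are collected as well.
            self.buckets["emails"].extend(EMAIL_RE.findall(text))


def sort_page(html):
    """Parse an HTML string and return its elements grouped by type."""
    parser = PageSorter()
    parser.feed(html)
    parser.close()
    # Remove duplicate addresses while keeping first-seen order.
    parser.buckets["emails"] = list(dict.fromkeys(parser.buckets["emails"]))
    return parser.buckets
```

A user interested only in images would then read `sort_page(html)["images"]` and ignore the other buckets, which mirrors the "only the desired data" principle described above.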
While some of the data extraction functions are traditional web/screen scraping features, requiring the creation of specific extraction masks for a page, others act more as intelligent filters, eliminating all data not specifically requested.
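The difference between the two approaches can be sketched as follows. This is an illustrative Python example under stated assumptions, not OutWit's implementation: here a "mask" is taken to mean a pair of literal markers that surround the wanted value on a given page, and a "filter" is a pattern that discards everything else. The function names `apply_mask` and `keep_only` are hypothetical.

```python
import re


def apply_mask(source, before, after):
    """Mask approach: return every string found between two literal markers.

    The markers are specific to one page layout, so a new mask is
    needed whenever the layout differs.
    """
    pattern = re.compile(
        re.escape(before) + r"(.*?)" + re.escape(after), re.DOTALL
    )
    return [match.strip() for match in pattern.findall(source)]


def keep_only(items, wanted):
    """Filter approach: keep the items matching a pattern, drop the rest."""
    return [item for item in items if re.search(wanted, item)]
```

For instance, `apply_mask(page, '<td class="price">', '</td>')` would pull the contents of price cells out of one particular table layout, whereas `keep_only(links, r"\.pdf$")` would retain only PDF links regardless of how the page is laid out.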