The presentation will give a short overview of the architecture of search engines and how machine learning can help improving search engines. In addition some projects you can take part in will be briefly introduced. Developers of search engines today do not only face technical problems such as designing an efficient crawler or distributing search requests among servers. Search has become a problem of identifying reliable information in an adversarial environment. Since the web is used for purposes as diverse as trade, communication, and advertisement search engines need to be able to distinguish different types of web pages. In this paper we describe some common properties of the WWW and social networks. We show one possibility of exploiting these properties for classifying web pages.
Secdocs is a project aimed to index high-quality IT security and hacking documents. These are fetched from multiple data sources: events, conferences and generally from interwebs.
Serving 8166 documents and 531.0 GB of hacking knowledge, indexed from 2419 authors from 163 security conferences.