Last lecture we discussed the major topics in machine learning and an important classification algorithm called the Support Vector Machine (SVM). Modern search engines combine these fundamental techniques to estimate the relevance of documents with respect to a given search query.
Papers to read: Standard SVM [Cortes and Vapnik, 1995]
We also worked through a two-dimensional example and showed how the kernel trick lets a learning problem be mapped implicitly into a higher-dimensional, even infinite-dimensional, feature space.
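To make that concrete, here is a minimal sketch (not from the lecture; the toy data is made up) of the kernel idea using scikit-learn: an SVM with an RBF kernel, whose implicit feature space is infinite-dimensional, can separate a ring-shaped class that no linear boundary in two dimensions can.

```python
# Minimal sketch: SVMs on 2-D toy data. The RBF kernel implicitly maps
# points into an infinite-dimensional feature space, which is the
# "expansion" idea discussed in class.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Class 0: points near the origin; class 1: points on a surrounding ring.
X0 = rng.normal(scale=0.5, size=(50, 2))
angles = rng.uniform(0, 2 * np.pi, size=50)
X1 = np.column_stack([3 * np.cos(angles), 3 * np.sin(angles)])
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

# A linear boundary cannot separate a ring from its center;
# the RBF kernel's implicit feature space can.
for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X, y)
    print(kernel, "training accuracy:", clf.score(X, y))
```

On data like this the linear kernel typically scores near chance, while the RBF kernel fits the training set almost perfectly.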
Lecture 8 covered probabilistic retrieval models and a review of basic probability theory. We also covered some of the early retrieval models: vector space models and cosine similarity.
Here are the slides for Lecture 8.
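As a quick illustration of the vector space model and cosine similarity, here is a minimal sketch (the documents and query below are made up): documents and the query are represented as term-frequency vectors, and documents are ranked by the cosine of the angle between their vector and the query's.

```python
# Minimal sketch of the vector space model: represent documents and a
# query as term-frequency vectors and rank documents by cosine similarity.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    # Dot product over shared terms, divided by the product of the norms.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

docs = {
    "d1": "the quick brown fox",
    "d2": "search engines rank documents by relevance",
    "d3": "relevance ranking with vector space models",
}
query = Counter("vector space relevance".split())

for name, text in docs.items():
    print(name, round(cosine(Counter(text.split()), query), 3))
```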
Today we discussed the fundamentals of text statistics and how to calculate the probabilities of n-grams appearing in a document. Here are the slides for Lecture 7.
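For a concrete version of that calculation, here is a minimal sketch (the sample text is made up) of the maximum-likelihood estimate: an n-gram's probability in a document is its count divided by the total number of n-grams of that order.

```python
# Minimal sketch: maximum-likelihood n-gram probabilities for a document,
# i.e. count of each n-gram divided by the total number of n-grams.
from collections import Counter

def ngram_probs(tokens: list[str], n: int) -> dict[tuple[str, ...], float]:
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(grams)
    total = len(grams)
    return {g: c / total for g, c in counts.items()}

doc = "to be or not to be that is the question".split()
for gram, p in sorted(ngram_probs(doc, 2).items(), key=lambda kv: -kv[1]):
    print(" ".join(gram), round(p, 3))
```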
Some useful links.
Google’s n-grams data
Text REtrieval Conference (TREC)
A corpus of Twitter data: Tweets2011
Lecture 6 covered crawling issues, indexing, Google’s BigTable, and detecting duplicate and near-duplicate content.
Also, the next paper was handed out.
R. Song et al., “Learning Block Importance Models for Web Pages”
Here are the slides from lecture 6.
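On the near-duplicate detection topic from this lecture, here is a minimal sketch of one standard approach, w-shingling with Jaccard similarity (the example pages and the 0.5 threshold are illustrative assumptions, not from the lecture):

```python
# Minimal sketch of near-duplicate detection via w-shingling: two pages
# are flagged as near duplicates when the Jaccard similarity of their
# word-shingle sets exceeds a threshold (0.5 here, chosen for illustration).
def shingles(text: str, w: int = 3) -> set[tuple[str, ...]]:
    tokens = text.lower().split()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

page1 = "search engines crawl the web and must detect duplicate and near duplicate content before indexing"
page2 = "search engines crawl the web and should detect duplicate and near duplicate content before indexing"

sim = jaccard(shingles(page1), shingles(page2))
print("Jaccard similarity:", round(sim, 3))
print("near duplicate" if sim > 0.5 else "distinct")
```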
Today we went through Google’s original PageRank algorithm and then discussed important modifications. We showed how to derive the “Google Matrix” and how it relates to Markov chains. Here are the slides from lecture 4.
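Here is a minimal sketch of that derivation in code (the four-page link graph is made up; d = 0.85 is the standard damping factor): power iteration on the Google matrix converges to the stationary distribution of the corresponding Markov chain, which is the PageRank vector.

```python
# Minimal sketch of PageRank via power iteration on a toy link graph.
# The Google matrix is G = d*S + (1-d)/N, where S is the column-stochastic
# link matrix; the PageRank vector is the stationary distribution of the
# Markov chain defined by G.
import numpy as np

# Toy graph (made up): links[i] lists the pages that page i links to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
N = len(links)

# Column-stochastic link matrix S: S[j, i] = 1/outdeg(i) if i links to j.
S = np.zeros((N, N))
for i, outs in links.items():
    for j in outs:
        S[j, i] = 1.0 / len(outs)

d = 0.85
G = d * S + (1 - d) / N  # Google matrix: dense and strictly positive

r = np.full(N, 1.0 / N)  # start from the uniform distribution
for _ in range(100):     # power iteration converges quickly
    r = G @ r
print("PageRank:", np.round(r / r.sum(), 3))
```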