Today we discussed the fundamentals of text statistics and how to calculate the probability of n-grams appearing in a document. Here are the slides for Lecture 7.
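Here is a minimal Python sketch of the n-gram probabilities discussed today. It assumes maximum-likelihood estimates (plain relative frequencies), whitespace tokenization, and no smoothing. The function names are hypothetical, and the slides may use a different estimator.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the n-grams (as tuples) of a token sequence, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def unigram_prob(tokens, word):
    """MLE estimate P(word) = count(word) / total number of tokens."""
    return Counter(tokens)[word] / len(tokens)

def bigram_prob(tokens, prev, word):
    """MLE estimate P(word | prev) = count(prev, word) / count(prev as a bigram start)."""
    bigrams = Counter(ngrams(tokens, 2))
    # Count prev only where it starts a bigram, so P(. | prev) sums to 1.
    starts = Counter(tokens[:-1])
    if starts[prev] == 0:
        return 0.0  # unseen history: no estimate without smoothing
    return bigrams[(prev, word)] / starts[prev]

tokens = "the cat sat on the mat the cat ran".split()
print(unigram_prob(tokens, "the"))       # 3 of the 9 tokens are "the"
print(bigram_prob(tokens, "the", "cat"))  # 2 of the 3 bigrams starting with "the"
```

Without smoothing, any n-gram that never occurs in the document gets probability 0, which is why smoothed estimates are usually preferred in practice.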
Lecture 6 covered crawling issues, indexing, Google’s BigTable, and detecting duplicate and near-duplicate content.
Also, the next paper was handed out.
R. Song et al., “Learning Block Importance Models for Web Pages”
Here are the slides from Lecture 6.
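To make duplicate versus near-duplicate detection concrete, here is a minimal Python sketch. It assumes exact duplicates are caught by hashing normalized text, and near-duplicates by word-level k-shingles compared with Jaccard similarity. The lecture may have used other techniques (for example MinHash or SimHash), and the helper names and the 0.9 threshold are hypothetical choices for illustration.

```python
import hashlib

def fingerprint(text):
    """Exact-duplicate check: SHA-1 of the lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

def shingles(text, k=3):
    """Set of word-level k-shingles (contiguous k-word sequences)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def is_near_duplicate(doc1, doc2, k=3, threshold=0.9):
    """Hypothetical rule: near-duplicates share at least `threshold` of their shingles."""
    return jaccard(shingles(doc1, k), shingles(doc2, k)) >= threshold

d1 = "the quick brown fox jumps over the lazy dog"
d2 = "the quick brown fox jumped over the lazy dog"
print(fingerprint(d1) == fingerprint(d2))        # False: not exact duplicates
print(jaccard(shingles(d1), shingles(d2)))       # 4 shared shingles of 10 distinct -> 0.4
```

Changing a single word alters up to k shingles, so on very short texts like this one the similarity drops sharply. On full web pages the same edit barely moves the score.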
Today we went through Google’s original PageRank algorithm and then discussed important modifications. We showed how to derive the “Google Matrix” and how it relates to Markov chains. Here are the slides from Lecture 4.
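Below is a minimal pure-Python sketch of the Google matrix and of PageRank computed as the stationary distribution of the resulting Markov chain. It assumes the common formulation G = αS + (1 − α)(1/n)eeᵀ. Here S is the hyperlink matrix H (H[i][j] = 1/outdeg(i) when page i links to page j), with dangling rows replaced by the uniform row 1/n. The value α = 0.85 is a commonly used choice. The lecture’s notation and modifications may differ, and the function names are hypothetical.

```python
def google_matrix(links, n, alpha=0.85):
    """Build the n x n row-stochastic Google matrix G = alpha*S + (1-alpha)/n * ones.

    links maps a page index to the list of pages it links to.
    """
    G = []
    for i in range(n):
        out = links.get(i, [])
        if out:
            row = [0.0] * n
            for j in out:
                row[j] += 1.0 / len(out)   # H[i][j] = 1/outdeg(i)
        else:
            row = [1.0 / n] * n            # dangling node: jump uniformly (H -> S)
        # Mix in random teleportation so the chain is irreducible and aperiodic.
        G.append([alpha * s + (1 - alpha) / n for s in row])
    return G

def pagerank(G, tol=1e-12, max_iter=1000):
    """Power iteration pi^T <- pi^T G, starting from the uniform distribution."""
    n = len(G)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * G[i][j] for i in range(n)) for j in range(n)]
        if sum(abs(a - b) for a, b in zip(new, pi)) < tol:  # L1 convergence test
            return new
        pi = new
    return pi

# Page 3 has no outlinks (dangling) and no inlinks.
links = {0: [1, 2], 1: [2], 2: [0], 3: []}
pi = pagerank(google_matrix(links, 4))
print([round(p, 4) for p in pi])
```

Because every row of G sums to 1 and every entry is positive, G is the transition matrix of an irreducible, aperiodic Markov chain. The PageRank vector is therefore its unique stationary distribution, and power iteration converges to it at a rate governed by α.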