Lecture 7 - Text Statistics & Document Parsing

February 22nd, 2012

Today we discussed the fundamentals of text statistics and how to calculate the probability of an n-gram appearing in a document. Here are the slides for lecture 7.

Some useful links:
Google’s n-grams data
Text REtrieval Conference (TREC)
A corpus of Twitter data: Tweets2011
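
As a rough illustration of the n-gram probability calculation mentioned in this post, here is a minimal sketch that estimates a bigram probability by simple counting (a maximum-likelihood estimate with no smoothing). The function names, the whitespace tokenization, and the toy sentence are assumptions made for this example and are not taken from the slides.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every n-gram in the token sequence as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bigram_probability(tokens, first, second):
    """Estimate P(second | first) as count(first, second) / count(first, *).

    This is the unsmoothed maximum-likelihood estimate; unseen contexts
    return 0.0 instead of raising an error.
    """
    bigram_counts = Counter(ngrams(tokens, 2))
    # Number of bigrams that start with `first`.
    context_count = sum(
        count for (w1, _), count in bigram_counts.items() if w1 == first
    )
    if context_count == 0:
        return 0.0
    return bigram_counts[(first, second)] / context_count


# Toy document (hypothetical), tokenized on whitespace.
tokens = "the cat sat on the mat the cat ran".split()
p_cat_given_the = bigram_probability(tokens, "the", "cat")
print(p_cat_given_the)
```

In the toy document, "the" starts three bigrams and two of them are "the cat", so the estimate is 2/3. A real corpus would usually call for smoothing so that unseen n-grams do not get probability zero.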

Information Retrieval, Indexing, BigTable Lect 6 CSCI 494

February 15th, 2012

Lecture 6 covered crawling issues, indexing, Google’s BigTable, and detecting duplicate and near-duplicate content.
Also, the next paper was handed out:
R. Song et al., “Learning Block Importance Models for Web Pages”

Here are the slides from lecture 6.
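
To make the near-duplicate detection topic a little more concrete, here is a minimal sketch of one common approach: comparing sets of word shingles with Jaccard similarity. This post does not say which technique the lecture used, so the choice of shingling, the shingle size `k`, the `threshold` value, and all function names are assumptions for illustration only.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (as tuples) in a text."""
    words = text.lower().split()
    if len(words) < k:
        # Text shorter than k words: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets."""
    if not a and not b:
        return 1.0  # two empty documents are treated as identical
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a, doc_b, k=3, threshold=0.5):
    """Flag two documents as near duplicates when their shingle sets overlap enough.

    The threshold is a hypothetical value chosen for this example.
    """
    return jaccard(shingles(doc_a, k), shingles(doc_b, k)) >= threshold


doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox jumps over the lazy cat"
similarity = jaccard(shingles(doc_a), shingles(doc_b))
print(similarity, is_near_duplicate(doc_a, doc_b))
```

The two example sentences share six of their eight distinct 3-word shingles, giving a similarity of 0.75. Comparing every pair of pages this way does not scale to a full crawl, which is why large systems typically compare compact fingerprints of the shingle sets instead.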

Deriving The Google Matrix

February 1st, 2012

Today we went through Google’s original PageRank algorithm and then discussed important modifications. We showed how to derive the “Google Matrix” and how it relates to Markov chains. Here are the slides from lecture 4.
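
As a small illustration of the derivation described above, here is a minimal sketch that builds a Google matrix in its commonly stated form, G = alpha * S + (1 - alpha) * (1/n) * E, and finds its stationary distribution by power iteration. Here S is the row-stochastic link matrix with dangling pages replaced by uniform rows, and E is the all-ones matrix. The damping factor 0.85, the function names, and the toy link graph are assumptions for this example; the slides may use different notation or conventions.

```python
def google_matrix(links, alpha=0.85):
    """Build the Google matrix as a list of rows.

    links[i] is the list of page indices that page i links to.
    alpha is the damping factor (0.85 is an assumed, commonly cited value).
    """
    n = len(links)
    matrix = []
    for out in links:
        if out:
            # Spread probability evenly over the page's outgoing links.
            row = [0.0] * n
            for j in out:
                row[j] += 1.0 / len(out)
        else:
            # Dangling page: jump to any page uniformly at random.
            row = [1.0 / n] * n
        # Mix in the uniform teleportation term so every entry is positive.
        matrix.append([alpha * s + (1.0 - alpha) / n for s in row])
    return matrix


def pagerank(matrix, iterations=100):
    """Approximate the stationary distribution by power iteration."""
    n = len(matrix)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # One Markov chain step: new_rank = rank * matrix (row vector times matrix).
        rank = [sum(rank[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return rank


# Toy web graph (hypothetical): page 3 has no outgoing links.
links = [[1, 2], [2], [0], []]
G = google_matrix(links)
ranks = pagerank(G)
print(ranks)
```

Every row of G sums to 1 and every entry is positive, so G describes an irreducible, aperiodic Markov chain with a unique stationary distribution. That property is what lets power iteration converge from any starting vector.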

