Archive for February, 2012

Lecture 7 - Text Statistics & Document Parsing

February 22nd, 2012 admin No comments

Today we discussed the fundamentals behind text statistics and how to calculate probabilities of n-grams appearing in a document. Here are the slides for Lect. 7.
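
As a rough illustration of the n-gram probabilities discussed above, here is a minimal sketch that estimates a bigram probability from raw counts (a maximum-likelihood estimate with no smoothing). The function names `ngrams` and `bigram_probability` and the toy sentence are made up for this example; they are not taken from the slides.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bigram_probability(tokens, first, second):
    """Estimate P(second | first) as count(first, second) / count(first)."""
    # Only count occurrences of a word that are followed by another word,
    # so the conditional probabilities for each word sum to 1.
    first_counts = Counter(tokens[:-1])
    bigram_counts = Counter(ngrams(tokens, 2))
    if first_counts[first] == 0:
        return 0.0  # unseen context: no estimate without smoothing
    return bigram_counts[(first, second)] / first_counts[first]

# Toy corpus (hypothetical), tokenized on whitespace.
tokens = "the cat sat on the mat the cat ran".split()
p_cat_given_the = bigram_probability(tokens, "the", "cat")
```

In this toy corpus "the" is followed by "cat" twice and by "mat" once, so the estimate is 2/3. Real collections need smoothing to handle n-grams that never appear in the training text.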

Some useful links.
Google’s n-grams data
Text REtrieval Conference (TREC)
A corpus of Twitter data: Tweets2011

Categories: Uncategorized Tags:

Information Retrieval, Indexing, BigTable Lect 6 CSCI 494

February 15th, 2012 admin No comments

Lecture 6 covered crawling issues, indexing, Google’s BigTable, and detecting duplicate and near-duplicate content.
Also, the next paper was handed out.
R. Song et al., “Learning Block Importance Models for Web Pages”

Here are the slides from lecture 6.
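
For readers who want something concrete on near-duplicate detection, one common approach is to compare documents by the overlap of their word shingles. The sketch below is only an illustration of that idea, not necessarily the method from the slides; the shingle size `k=3`, the function names, and the two sample sentences are all assumptions made for this example.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (as tuples) in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0  # treat two empty documents as identical
    return len(a & b) / len(a | b)

# Two hypothetical documents that differ in a single word.
doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox leaps over the lazy dog"
similarity = jaccard(shingles(doc_a), shingles(doc_b))
```

A single changed word alters three of the seven shingles in each document, so the similarity here is 4/10. A crawler would flag pairs above some chosen threshold as near duplicates, and at web scale the sets are usually replaced by compact sketches rather than compared directly.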

Categories: Uncategorized Tags:

Deriving The Google Matrix

February 1st, 2012 admin No comments

Today we went through Google’s original PageRank algorithm and then discussed important modifications. We showed how to derive the “Google Matrix” and how it relates to Markov chains. Here are the slides from lecture 4.
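
To make the derivation a bit more tangible, here is a small sketch that builds a Google matrix for a toy link graph and finds its stationary distribution by power iteration. It assumes the row-stochastic convention, a damping factor of 0.85, and uniform teleportation; the slides may use a different convention or value. The four-page graph and the function names are invented for this example.

```python
def google_matrix(links, alpha=0.85):
    """Build G = alpha * S + (1 - alpha) / n * (all-ones matrix).

    links[i] lists the pages that page i links to. S is the row-stochastic
    link matrix, with dangling rows (no out-links) replaced by uniform rows.
    """
    n = len(links)
    G = []
    for out in links:
        if out:
            row = [0.0] * n
            for j in out:
                row[j] += 1.0 / len(out)
        else:
            row = [1.0 / n] * n  # dangling page: jump anywhere
        G.append([alpha * s + (1 - alpha) / n for s in row])
    return G

def pagerank(G, iterations=100):
    """Power iteration: repeatedly multiply the rank row vector by G."""
    n = len(G)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        rank = [sum(rank[i] * G[i][j] for i in range(n)) for j in range(n)]
    return rank

# Toy graph (hypothetical): page 3 has no in-links and no out-links.
links = [[1, 2], [2], [0], []]
G = google_matrix(links)
ranks = pagerank(G)
```

Every row of `G` sums to 1 and every entry is positive, which is what makes the corresponding Markov chain irreducible and aperiodic, so the power iteration converges to a unique rank vector.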

Categories: Lectures