Archive for January, 2012

Lecture 3 - CSCI494 - Anatomy of Search Engine (Coding a Basic Crawler)

January 26th, 2012 admin

CSCI494 Lect. 3 Jan. 25 2012 (Slides)

Last lecture (1/25/12) we discussed the elements of a search engine including crawlers, spiders, indexers, repositories, lexicons, ranking modules, and query processors. We also talked about fetching URLs and reviewed initial code for a crawler for assignment 2.

Here are the slides:
Anatomy of A Search Engine. Assignment 2 Building a Basic Crawler.

Next week we will derive “The Google Matrix” formula.

It’s Nonsense to Claim that Google + is the Fastest Growing Social Network in History.

January 19th, 2012 admin

Summary
Google + announced that it has hit 90 million users. By setting reasonable “initial conditions” we can show that there is not sufficient evidence to conclude that Google + is “the fastest growing social network in history”. If we use Facebook’s published historical growth rate as a simple predictive model for true “viral adoption rates”, we would expect Google + to announce a user base of approximately 165 million by the end of March 2012. Data for Google + adoption is skewed by a forced usage approach, free advertising, and an initial seeding from a large database that was not available to other established social networks when they started. Until we have more official data points there is no evidence that a viral (exponential) adoption rate, comparable to Facebook’s, is really taking place. Data from Google Insights for Facebook and Google + show large discrepancies.

Google’s Announcement
Last week Google announced their fourth quarter earnings. Included in their quarterly report was a statement about Google + adoption rates. Larry Page, CEO of Google, said: “I am super excited about the growth of Android, Gmail, and Google+, which now has 90 million users globally – well over double what I announced just three months ago.”

In other words, the user count more than doubled in the three months since the previous announcement. Google + was announced about eight months ago, and since then there has been a lot of discussion about how many people have signed up for the new service. Many articles use headlines with phrases like “explosive growth” and “the fastest growing social network in history”. The numbers are large, but these claims rest on absolute numbers with no reference to any expected growth rate. Absolute numbers by themselves are not very interesting; they only acquire meaning relative to something else. Google has now published two official data points. To test the claim (hype) that Google + is the fastest growing social network, we can compare data points where the “critical masses” of the user bases are the same. We now have a few valid historical models against which to compare the viral growth rates of social networks. In addition, the initial hype surrounding the announcement of Google + has worn off.

The initial claims that Google + was “the fastest growing network” are nonsense for a couple of reasons. The first is that Google + started with a “seed set” of subscribers who are willing to sign up for any new Google product. The conversion rate from users of existing Google products to any new product is high and can be approximated from the adoption rates of past product launches. Facebook and Twitter did not have this advantage when they launched. The second is that, so far, we have only seen a doubling in the number of users in the last six months. If true viral adoption of Google + were occurring, we would expect growth of an exponential form similar to 2^t. The difficult question is “When can we reasonably expect exponential growth to form?” Google also has other factors, like forced adoption, that are skewing the data.

Exponential Curves and Social Networks
The beautiful thing about social networks is their ability to create exponential curves (similar to 2^t). It is difficult to produce exponential curves in other types of marketing. The effect is caused by one simple mechanism: your connections have connections of their own, and those connections can observe your interactions. Without that mechanism, growth generally looks more linear, which is how marketing in other media usually appears. Of course, the type of curve we can generate depends on many variables, including how long a brand or product has been in existence. For example, you will not typically see exponential curves in a market where the product is a commodity. In the case of Google + we are talking about the viral adoption of “the next greatest social network”. Yes, we have been told that it is now “not a social network” and that it is “so much more”. This may be the long-term intent. However, for it to become what “it is now not”, and to provide the kind of personalized search Google wants to provide, people have to supply social data to this network on a consistent basis. So we need to call it what it is: Google + is a social network with some neat features.

Thus far we have heard a lot of big numbers but little discussion about the shape of the graphs or numbers relative to something else. Shown below is a graph of the estimated growth rate of Google + over the last six months. Paul Allen lists his estimated statistics but there is no analysis of what this data means. There are really only two confirmed data points. The red line, below, represents linear growth. From this graph we can’t make a claim that Google + is going to grow faster than Facebook. We also can’t claim that it will grow at the exponential rate we would expect for true viral adoption. A forced adoption rate would start to appear more linear and begin to slow down.
[Figure: linear growth of Google +]
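To illustrate why two data points cannot separate the two hypotheses, here is a minimal Python sketch that fits both a straight line and an exponential exactly through the same two points and then extrapolates. The 90 million figure and the three-month spacing come from the announcement quoted above. The earlier user count (`EARLIER_USERS`) is a hypothetical placeholder chosen only to be consistent with “well over double”; it is not an official figure from this post.

```python
import math

def linear_through(t0, n0, t1, n1):
    """Return f(t) for the straight line through (t0, n0) and (t1, n1)."""
    slope = (n1 - n0) / (t1 - t0)
    return lambda t: n0 + slope * (t - t0)

def exponential_through(t0, n0, t1, n1):
    """Return f(t) = n0 * exp(k * (t - t0)) passing through both points."""
    k = math.log(n1 / n0) / (t1 - t0)
    return lambda t: n0 * math.exp(k * (t - t0))

# Time in months since the earlier announcement; users in millions.
# ASSUMPTION: EARLIER_USERS is an illustrative placeholder, not sourced data.
EARLIER_USERS = 40.0
t0, n0 = 0.0, EARLIER_USERS
t1, n1 = 3.0, 90.0  # 90 million, three months later (from the announcement)

linear = linear_through(t0, n0, t1, n1)
exponential = exponential_through(t0, n0, t1, n1)

# Both models agree at the two known points and diverge only afterwards.
for t in (0.0, 3.0, 6.0):
    print(f"month {t:4.1f}: linear {linear(t):6.1f}  exponential {exponential(t):6.1f}")
```

Both curves reproduce the two known points exactly, so only a third data point can show which shape the growth is actually following.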

The only exponential curve I can find is the relative search interest in Google + (below), and it has moved in the wrong direction. It is not what we see for Facebook.
[Figure: search interest trend for Google +]

Notice that search volume for Facebook grew in proportion to the adoption rate and only recently leveled off (see below). Its search volume has also never shown an exponential decline. Twitter shows an interest graph similar to Facebook’s.
[Figure: search interest trend for Facebook]

If we take a look at adoption rates on Facebook and Twitter we see exponential curves. This means that the growth rate depends on the current value of the function, or the “system”. The rate of growth is not a constant. For a social network this means that as the number of users increases, we expect the adoption rate to grow because more people are using it. This will happen if the viral mechanisms are working correctly. This new announcement from Google + is only our second data point.
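As a short worked form of that statement, the simplest model in which the growth rate is proportional to the current number of users is written below. The symbols N(t) (users at time t), N_0 (users at t = 0), k (growth constant), c (constant rate), and T_double (doubling time) are notation introduced here for illustration, not taken from the slides or the data.

```latex
% Exponential (viral) growth: the rate is proportional to the current value
\frac{dN}{dt} = k\,N(t)
\quad\Longrightarrow\quad
N(t) = N_0\,e^{k t} = N_0\,2^{\,t/T_{\text{double}}},
\qquad
T_{\text{double}} = \frac{\ln 2}{k}

% Linear growth, for contrast: the rate is a constant
\frac{dN}{dt} = c
\quad\Longrightarrow\quad
N(t) = N_0 + c\,t
```

Under the exponential model the doubling time stays fixed as the user base grows, while under the linear model each successive doubling takes longer than the one before.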

Shown below is a plot I created of Facebook’s and Google +’s growth since their launches. The red graph is Facebook’s growth since launch. The blue graph shows Google +’s growth superimposed on Facebook’s curve at the point where Facebook hit a user base of 90 million. The time axis is in months. The growth for Google + was shifted to the right to line up with Facebook’s growth after it hit 90 million users. This ensures the curves have similar “initial conditions”.

[Figure: Facebook growth since launch, with Google + growth overlaid from 90 million users]
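The alignment step described above can be sketched in Python as follows. This is only an illustration of the shifting technique: both series below are made-up placeholder numbers, not real Facebook or Google + figures, and the function names are hypothetical.

```python
def time_at_threshold(times, users, threshold):
    """Linearly interpolate the time at which `users` first reaches `threshold`."""
    points = list(zip(times, users))
    for (t_a, u_a), (t_b, u_b) in zip(points, points[1:]):
        if u_a <= threshold <= u_b and u_b != u_a:
            return t_a + (threshold - u_a) * (t_b - t_a) / (u_b - u_a)
    raise ValueError("threshold not reached in the series")

def shift_times(times, offset):
    """Shift a time axis by a constant offset (in months)."""
    return [t + offset for t in times]

# ASSUMPTION: placeholder series (months since launch, users in millions).
reference_times = [0, 12, 24, 36, 48]
reference_users = [0, 5, 20, 60, 120]
newcomer_times = [0, 4, 7]
newcomer_users = [0, 40, 90]

THRESHOLD = 90.0  # align both curves where they hit 90 million users

ref_t = time_at_threshold(reference_times, reference_users, THRESHOLD)
new_t = time_at_threshold(newcomer_times, newcomer_users, THRESHOLD)

# Shift the newer network to the right so both curves cross 90 million together.
offset = ref_t - new_t
aligned_times = shift_times(newcomer_times, offset)
print(f"offset: {offset} months, aligned times: {aligned_times}")
```

After the shift, any later data point for the newer network can be compared directly against the reference curve from the same starting size.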

At this point we need to see the next announcement from Google at the end of Q1 2012 (March). If we are going to claim that Google + is growing at a faster rate than Facebook we would expect Google + to hit ~165 million users by the end of March 2012. Until we have more official data points there is no evidence that a viral adoption, comparable to Facebook, is really taking place.
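To make the benchmark concrete, here is a rough Python sketch of the constant monthly growth factor implied by going from 90 million to 165 million users. The two user counts come from the text above. The 2.5-month horizon (roughly mid-January to the end of March) is an assumption made for this example, and the function names are hypothetical.

```python
import math

def implied_monthly_factor(start_users, target_users, months):
    """Constant per-month growth factor needed to go from start to target."""
    return (target_users / start_users) ** (1.0 / months)

def doubling_time_months(monthly_factor):
    """Months needed to double at a constant per-month growth factor."""
    return math.log(2) / math.log(monthly_factor)

START = 90.0    # millions of users, from the announcement
TARGET = 165.0  # millions of users, the benchmark stated in this post
MONTHS = 2.5    # ASSUMPTION: approximate time from mid-January to end of March

factor = implied_monthly_factor(START, TARGET, MONTHS)
doubling = doubling_time_months(factor)

print(f"implied growth: {100 * (factor - 1):.1f}% per month")
print(f"implied doubling time: {doubling:.2f} months")
```

Changing `MONTHS` shows how sensitive the implied rate is to the exact dates of the two announcements.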

Categories: Social Media, Uncategorized Tags:

Computer Science 494 Search Engines & Social Networks MSU- Spring 2012

January 12th, 2012 admin

I was invited by MSU to teach CSCI 494-01 as an adjunct professor. The course is a senior-level course on search engines & social networks for computer science majors. Here is a description of the course and the topics to be covered:

Syllabus- CSCI 494 01 Spring 2012 - Search Engines and Social Networks

Description:
This course will cover important topics related to search engines, social networks, information retrieval, and data science. Students will study papers, patents, and algorithms written by search engineers and computer scientists. At the end of the course students will understand the algorithms and technology behind modern search engines and social networks.

Prerequisites
Calculus, completion of at least one course in a programming language (Java/C++), and HTML/CSS.

Date Lecture Topic
1.11 Introduction & Overview
1.18 History of Search
1.25 Overview of Search Marketing
2.01 Organic Search: Search Algorithms
2.08 Information Retrieval
2.15 Patents/Papers: Organic Search
2.22 Papers: Organic Search
2.29 Paid Search Algorithms
3.07 Social Networks & Algorithms
3.14 Spring Break – No Class
3.21 Social Networks & Algorithms (paper titles for presentation due; groups of three)
3.28 Analytics, Metrics, and Data Science
4.04 Semantic Web Technology (TBD)
4.11 Presentations 20 min TBD
4.18 Presentations 20 min TBD
4.25 Presentations 20 min TBD

Textbook
• Required papers and patents to be handed out during lecture.

Course Outcomes
• Understand algorithms behind modern search engines.
• Understand paid search marketing algorithms.
• Understand the history of search engines.
• Understand important patents related to search engines.
• Be prepared for basic interview questions from companies that develop search engines and/or social networks.
• Discover potential thesis topics for graduate school.

Grading
Attendance at regular lectures and at the final presentations counts for 25% of a student’s grade. The remaining 75% comes from the student’s final presentation and paper.
Final presentations will be graded as follows:
50% on the group’s 20-minute final presentation
50% on the individual’s 1-2 page paper

At the end of the semester, grades will be determined based on your class average as follows:
• 93+: A
• 90+: A-
• 87+: B+
• 83+: B
• 80+: B-
• 77+: C+
• 73+: C
• 70+: C-
• 67+: D+
• 63+: D
• 60+: D-

Categories: Uncategorized Tags: