Understanding PageRank
By GrrrOwl
Google assigns every web page on the internet a number, its PageRank, between zero and ten. But where does this number come from, and how exactly is it calculated?
On the back of Larry Page's program BackRub, which analysed pages on the web according to their backlinks, Larry Page and Sergey Brin co-authored a paper entitled “The Anatomy of a Large-Scale Hypertextual Web Search Engine” shortly before putting their Ph.D. studies at Stanford University on hold to work on Google full time. The algorithm they used was loosely based on citation analysis: the examination of the frequency, patterns and graphs of citations in articles and books, which is sometimes used to predict Nobel Prize winners.
A simplistic version of PageRank would work as follows. Imagine a small network of five pages, which we will call A, B, C, D and E, each with a starting rank of 2. Page A ends up with a rank of 10, because B, C, D and E all have links pointing to A, and each of those links adds 2 to its original rank of 2. Pages B and C each have a rank of 7, because A links to both of them, so they share page A's value of 10 (5 each) in addition to their starting rank of 2. Pages D and E link to one another, giving them each a rank of 4.
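The arithmetic in this toy example can be reproduced in a few lines of Python. This is only a sketch of the simplified bookkeeping described above, not Google's actual formula; the function name toy_ranks and the variable names are made up for illustration.

```python
def toy_ranks(start=2):
    """Reproduce the simplified five-page example (illustrative only)."""
    # A keeps its starting rank and gains the starting rank of each of
    # the four pages (B, C, D, E) that link to it.
    rank_a = start + 4 * start
    # A links to B and C, so its total is split evenly between them,
    # on top of their own starting rank.
    rank_b = rank_c = start + rank_a / 2
    # D and E link to one another, so each gains the other's starting rank.
    rank_d = rank_e = start + start
    return {"A": rank_a, "B": rank_b, "C": rank_c, "D": rank_d, "E": rank_e}


if __name__ == "__main__":
    for page, rank in toy_ranks().items():
        print(page, rank)
```

With the starting rank of 2 this gives 10 for A, 7 for B and C, and 4 for D and E.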
Essentially, then, what PageRank does is increase the 'importance' of a given page based on the 'importance' of its backlinks (inbound links, or 'citations'). Where PageRank analysis differs from citation analysis is that the printed word is usually vetted for quality through various channels before being published, whereas the quality of pages on the web varies widely and is often subjective. In a paper entitled “The PageRank Citation Ranking: Bringing Order to the Web”, Sergey Brin and Larry Page say that
“simple backlink counts have a number of problems on the web. Some of these problems have to do with characteristics of the web which are not present in normal academic citation databases.”
With this in mind, a number of other factors are taken into account besides the raw number of backlinks a page has, such as the quality of the pages linking to it. Google has not, of course, shared these factors with the general public, for reasons Brin and Page themselves spell out:
“Because the web environment contains competing profit seeking ventures, attention getting strategies evolve in response to search engine algorithms. For this reason, any evaluation strategy which counts replicable features of web pages is prone to manipulation.”
One of the factors involved in PageRank is the damping factor, which comes from the 'Random Surfer Model'. The model imagines a 'random surfer' who keeps clicking on successive links at random, and it takes into account how unlikely it is that they would keep doing so indefinitely: periodically the surfer gets bored and jumps to another page altogether. This, coupled with the 'Intentional Surfer Model' (based on browsing habits gathered from the Google Toolbar), in which real users follow links according to their interests and intentions, forms the basis for a page's rank in the SERPs (search engine results pages).
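The random surfer idea can be sketched in code. The version below uses the formula given in “The Anatomy of a Large-Scale Hypertextual Web Search Engine”, PR(A) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where d is the damping factor (the paper suggests 0.85) and C(T) is the number of links going out of page T. The five-page network reuses the links from the earlier example. The function name pagerank, the iteration count and the starting value of 1.0 are assumptions made for illustration, and whatever Google actually runs is certainly more involved.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over all pages.

    links maps each page to the list of pages it links to.
    Assumes every page has at least one outgoing link.
    """
    ranks = {page: 1.0 for page in links}  # assumed starting value
    for _ in range(iterations):
        new_ranks = {}
        for page in links:
            # Each page T linking here passes on its rank divided by
            # the number of links leaving T.
            incoming = sum(
                ranks[src] / len(outs)
                for src, outs in links.items()
                if page in outs
            )
            # (1 - d) models the bored surfer jumping to a random page.
            new_ranks[page] = (1 - d) + d * incoming
        ranks = new_ranks
    return ranks


# Links from the five-page example: B, C, D and E link to A,
# A links to B and C, and D and E link to one another.
example_links = {
    "A": ["B", "C"],
    "B": ["A"],
    "C": ["A"],
    "D": ["A", "E"],
    "E": ["A", "D"],
}

if __name__ == "__main__":
    result = pagerank(example_links)
    for page in sorted(result, key=result.get, reverse=True):
        print(page, round(result[page], 3))
```

In this sketch page A comes out on top, B and C tie for second, and D and E tie for last, which matches the ordering in the simple example, although the numbers themselves are on a different scale.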