LinkRanks are our way of measuring the strength, persistence, and vitality of links appearing in weblogs. When PubSub reads a new weblog entry, we pull out any URIs we find and attach them to the entry in a separate field. This allows our users to include domain names or linked file types when creating subscriptions.
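As a rough illustration of the extraction step, the sketch below pulls URIs out of an entry's text and stores them in separate fields. The regular expression, the helper names `extract_uris` and `domains_of`, and the dictionary layout are assumptions made for this example, not a description of PubSub's actual code.

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern: an http(s) URI runs until whitespace, a quote, or an angle bracket.
URI_PATTERN = re.compile(r"https?://[^\s\"'<>]+")

def extract_uris(entry_text):
    """Return the URIs found in an entry, in order of first appearance, without duplicates."""
    seen = []
    for uri in URI_PATTERN.findall(entry_text):
        if uri not in seen:
            seen.append(uri)
    return seen

def domains_of(uris):
    """Return the sorted set of host names that appear in a list of URIs."""
    hosts = {urlparse(uri).hostname for uri in uris}
    return sorted(host for host in hosts if host)

# Attach the extracted URIs and domains to the entry as separate fields.
entry = {"text": 'See <a href="http://example.org/a.pdf">this</a> and http://example.com/b'}
entry["uris"] = extract_uris(entry["text"])
entry["domains"] = domains_of(entry["uris"])
```

Keeping the URIs in their own field is what would let a subscription match on a domain name or a file extension without re-parsing the entry text.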
From this set of URIs, it’s easy to find the most popular domains. LinkRanks take one more step and calculate scores for each linking site; domains are then scored based on the values of the sites that link to them. The theory is that these are the links you would be most likely to click if you read a weblog at random.
Unlike Google’s PageRank system, LinkRanks are not iterative. Rather, we base LinkRanks on a simple formula that only looks at local links – links that are within one or two steps of any target site. It’s also important to note that we only look at links that appear in weblog entries – we don’t read any of the other links on the page, such as those in sidebars or blogrolls.
The intent of this system is not to measure the strength of any particular domain, but rather the relative likelihood that you’d find and follow a link to that domain. As such, the links are what’s really important, not the pages themselves.
To calculate LinkRanks, we generate a link score for each domain. Link scores are calculated in three steps: first, we find a point value for every site that links to other sites. Second, we use the point values to generate link scores for each domain. Finally, we weight the daily scores over a fixed period to arrive at an aggregate score for the domain – this ensures that more recent links are given more value than links from several days ago.
The first part of LinkRanks is generating a link score for each target domain.
1. Maps of Linking Sites
The first step is to build a list of all the sites that include a link to a particular domain. This list is built as we read new weblog entries. For a given link “target” T, we find each site S that has a link to T in a weblog entry.
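A minimal sketch of this mapping step follows. It assumes each weblog entry has already been reduced to a pair of (linking site, list of link targets). The function name `build_link_map` and the sample sites are hypothetical.

```python
from collections import defaultdict

def build_link_map(entries):
    """Map each target T to the set of sites S that link to T in some entry."""
    linkers = defaultdict(set)
    for site, targets in entries:
        for target in targets:
            linkers[target].add(site)
    return linkers

# Each tuple is one weblog entry: (site S, targets T linked from that entry).
entries = [
    ("blog-a.example", ["news.example", "blog-b.example"]),
    ("blog-b.example", ["news.example"]),
    ("blog-a.example", ["news.example"]),
]
link_map = build_link_map(entries)
```

Because the map stores sets, a site that links to the same target in several entries is still listed once.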
2. Point Values for Each Linked Site
The next step is to assign a point value to each target domain T, based on the total number of linking sites and the number of inbound links from each site. We also take into account the total number of outbound links from each site S. The rationale here is that if you were to visit site S randomly, you would have a pretty small chance of clicking a link to site T if there were thousands of other links to choose from; if there were just a few, it would be more likely that you’d follow the link to T.
For a given target site T and linking site S, T receives a number of points P that depends on two quantities: SO, the total number of outbound links from site S, and ST, the number of links from S to T. The total point value for site T is then the sum of the point values assigned by each linking site.
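The exact formula for P is not reproduced in this text. The sketch below assumes the simplest ratio consistent with the rationale given for this step, P = ST / SO, that is, the share of S's outbound links that point to T. That ratio, the function name `point_values`, and the sample data are assumptions for illustration only.

```python
from collections import Counter, defaultdict

def point_values(outbound):
    """Compute a total point value for every target T.

    outbound maps each site S to the list of targets it links to, one item per link.
    ASSUMPTION: S gives T the points ST / SO (links from S to T over all links from S).
    """
    totals = defaultdict(float)
    for site, targets in outbound.items():
        so = len(targets)  # SO: total outbound links from S
        for target, st in Counter(targets).items():  # ST: links from S to T
            totals[target] += st / so
    return dict(totals)

outbound = {
    "blog-a.example": ["t.example", "t.example", "u.example", "v.example"],
    "blog-b.example": ["t.example"],
}
points = point_values(outbound)
```

Under this assumption the single link from blog-b.example counts for more than the two links from blog-a.example, because blog-b.example offers no other links to follow.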
3. Link Scores for Each Domain
The actual scores for each domain are based not on the point values of the domains themselves, but on the aggregate point values of sites linking to the domain. For a given domain D, the link score LSD is the sum of the point values of all sites T that link to D.
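The summation in this step can be sketched as follows. It assumes the point values and the map of linking sites are already available as dictionaries, and it treats a linking site with no point value of its own as contributing zero. Both choices, along with the name `link_scores`, are assumptions for this example.

```python
def link_scores(points, links_to):
    """Return the link score of each domain D.

    points maps a site T to its point value.
    links_to maps a domain D to the set of sites T that link to D.
    ASSUMPTION: a linking site with no recorded point value contributes 0.
    """
    return {
        domain: sum(points.get(site, 0.0) for site in sites)
        for domain, sites in links_to.items()
    }

points = {"blog-a.example": 1.5, "blog-b.example": 0.25}
links_to = {
    "news.example": {"blog-a.example", "blog-b.example", "blog-c.example"},
    "other.example": {"blog-b.example"},
}
scores = link_scores(points, links_to)
```

Here news.example scores higher than other.example because it is linked from blog-a.example, which itself carries a larger point value.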
4. Time-based Weighting
To calculate the aggregate link score for a domain, we weight the trailing ten days’ link scores by factors of 2, so that the most recent score has twice the weight of the previous day’s score, and so on, over ten days.
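One way to read this weighting is that a score that is k days old is divided by 2 to the power k. The sketch below follows that reading. Whether the weights are normalized is not stated, so this example leaves them unnormalized; the function name `aggregate_score` is hypothetical.

```python
def aggregate_score(daily_scores, days=10):
    """Combine daily link scores, most recent first, halving the weight for each day of age.

    ASSUMPTION: weights are 1, 1/2, 1/4, ... and are not normalized.
    Scores older than `days` days are ignored.
    """
    recent = daily_scores[:days]
    return sum(score / (2 ** age) for age, score in enumerate(recent))

# Today's score is 4.0, yesterday's 2.0, the day before 8.0.
total = aggregate_score([4.0, 2.0, 8.0])
```

With these inputs the total is 4.0 + 2.0/2 + 8.0/4 = 7.0, so a large score from two days ago adds less than today's smaller one.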
When we have the link scores for each domain, we create a list of LinkRanks by ordering the scores for all domains. LinkRanks are generated once a day, based on the previous day’s data. The generation date is indicated on the top of the page.
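The ordering step could look like the sketch below. Breaking ties alphabetically by domain name is an assumption, since the text does not say how equal scores are handled, and the name `link_ranks` is hypothetical.

```python
def link_ranks(scores):
    """Order domains by score, highest first, and number them from 1.

    ASSUMPTION: ties are broken alphabetically by domain name.
    """
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, domain, score) for rank, (domain, score) in enumerate(ordered, start=1)]

ranks = link_ranks({"a.example": 1.0, "b.example": 3.0, "c.example": 1.0})
```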
This is just a test! We need help refining this formula to make it more representative of weblogs and the web in general. Since this is new, we’re liable to change the ranking scheme at any time; the ranks are therefore likely to be pretty volatile. Please don’t feel bad if your domain has a low ranking – no doubt this is the fault of the formula (and not a reflection on the quality of your site).
If you have any ideas on how to develop LinkRanks or how we can improve the formula, please let us know at firstname.lastname@example.org or just write about LinkRanks in your blog.
If you comment on LinkRanks in your blog, we’d like you to try an experiment with us. In weblog entries that talk about LinkRanks, include this URN somewhere:
This can be in the text of the entry, or in an anchor tag, or anywhere else. It doesn’t have to be visible, or linked. We want to see if we can construct a conversation thread around the topic by using a common URN. For reference, the URN form is based on NewsML (more specifically the URN namespace for NewsML resources), and PSI stands for “published subject indicator” (see XML Topic Maps [topicsmaps.org] and Published Subjects [OASIS]). If this works, we want to try to use this kind of URN to bring conversations together around various topics.
We hope that you find LinkRanks useful, or at least interesting, and if so please comment in one of the ways listed above! We’re always happy to hear comments or criticism. And please remember, as always, no wagering.