Monday, January 08, 2007

More on Microsoft's Live Search

Over the weekend, I spent some time digging a bit deeper into the seeming anomalies with Microsoft Live search results that I discussed last week.

My original intent was to take a purely quantitative approach. I planned to search using the names of a few people who I knew would have quite a few articles, blog posts, and/or quotes that would mention Microsoft--as well as many that didn't. And then I would simply summarize the results of those searches run using Live, Google, and Ask, and thereby see if there was a pattern of differences--and specifically if "Microsoft" continued to appear more frequently in the Live results than in the others.
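
The tally described above could be sketched roughly as follows. This is only a minimal illustration, assuming the text of each result has already been collected by hand. The function name `mention_rate`, the dictionary layout, and the sample result titles are all hypothetical placeholders, not real search results.

```python
from typing import Dict, List


def mention_rate(results: Dict[str, List[str]], term: str = "Microsoft") -> Dict[str, float]:
    """Return, per engine, the fraction of result texts that mention `term`.

    `results` maps an engine name to the list of result texts (titles or
    snippets) gathered for one query. Matching is a case-insensitive
    substring test.
    """
    needle = term.lower()
    rates: Dict[str, float] = {}
    for engine, texts in results.items():
        if not texts:
            # No results collected for this engine: report 0.0 rather than divide by zero.
            rates[engine] = 0.0
            continue
        hits = sum(1 for text in texts if needle in text.lower())
        rates[engine] = hits / len(texts)
    return rates


# Made-up placeholder data, NOT actual results from any search engine.
sample = {
    "Live": ["Microsoft ships update", "Linux server roundup"],
    "Google": ["Linux server roundup", "Open source licensing notes"],
    "Ask": ["Virtualization trends", "Microsoft and Novell deal",
            "Blade servers", "Grid computing"],
}

rates = mention_rate(sample)
for engine, rate in rates.items():
    print(f"{engine}: {rate:.0%} of results mention Microsoft")
```

A simple count like this treats every result as equally informative and equally relevant.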

This methodology turned out to be problematic. One problem is that many of the results are "dynamic content." For example, a CNET page might list a changing set of stories on the side of the page, which could include stories with a Microsoft-related headline. Who knows what specific contents were present when the page was spidered by a given search engine? A similar issue occurs with dynamic blogrolls. In addition, some searches pointed to the main page for a blog, which has many posts and is therefore much more likely to contain any given search term. It doesn't seem that this should really be "scored" the same way as a search result that returns a specific post. Finally, not all the search results were relevant (i.e., some pointed to different people); the engines varied in this respect, which would have added further noise to data based solely on counting occurrences of "Microsoft." Thus, I went back to a more impressionistic approach.

The results for my own name are probably still the oddest. I discussed the image search in last week's posting. In Web search, it's the two top results that are the real outliers. Microsoft is prominently mentioned in the #1 and #2 results for Gordon Haff. These are individual posts, and neither is returned anywhere near the top by either Google or Ask. Other Live results that include mention of Microsoft (e.g., email-collaboration.html) are also returned by at least one other search engine.

I also ran searches using a couple of other names. A search on Stephen Shankland (all searches were run at about 5PM on January 6) also returned Microsoft in the top two results, including in a headline. However, Google also returned a number of high-ranked hits that included "Microsoft" (starting in the #3 position).

My conclusion after re-examining the results I described last week, as augmented by these other search results? There are still puzzling oddities, although nothing that I would call a smoking gun that proves systematic bias toward search results containing "Microsoft." The results produced by the search on my own name continue to be particularly hard to explain, given that they point to specific posts about Microsoft that are not highly ranked by any other search engine that I examined. However, it's also true that my admittedly limited testing using other names didn't turn up other examples that were anywhere near as compelling. I'll be interested to see if anything else turns up on this, but for now I'm just moving it to the "that's weird" bucket.
