Fishing Around: WSJ, despised teams, and misquoting sources

Today, David Biderman of the Wall Street Journal published an article that “examines” which team is the most loathed in baseball. One of the people Biderman contacted, and quoted, was FanSided’s own Ed Carroll. You may know Ed from his excellent work over on our Cleveland Indians site, Deep Left Field. If you aren’t aware of the site, I strongly encourage you to check it out.

Ed published an article today reacting to the WSJ story. Not only does Ed correctly question the methodology of the “internet algorithm” that spit out the results, but it turns out he was also misquoted. […]

This concept (measuring which major league organization is the most despised or loathed) is an interesting one to pursue, but I’m not sure how an algorithm based on keywords can possibly come to an accurate conclusion. Based on the limited details the story provides, we really don’t gain any true insight along those lines.

Among other things, I wonder:

Do the Royals get negative marks when I call into question the decisions of Dayton Moore?

Do the Phillies get negative marks when they inexplicably give Ryan Howard a five-year, $125 million extension and the online writing community tears into the organization as a result?

How inclusive is Nielsen’s sentiment scale, and does it factor in the sources of said sentiments?

The Yankees blogosphere, for example, is much larger than any other on the internet. Not knowing how this algorithm measures and compiles data, and not knowing what sources it draws from, how do we know there isn’t inherent organizational bias in the data? For example, there are 1,028 team-based baseball blogs listed on Blogged.com. Of those 1,028 blogs, 16.5% (170) deal, in some fashion, with the Yankees. On the other end of the spectrum, only 0.6% (6) deal with the San Diego Padres.

It doesn’t take a member of Mensa to figure out that there is a lot less Padres chatter on the internet than there is Yankees chatter. The more chatter that is out there to be measured, the tougher it is for one opinion to impact the general consensus or “sentiment” relative to a specific team. If 2 of 6 Padres sites criticize the trade of Jake Peavy, that’s 33% of the sources that are against the move. If 20 of 170 Yankees sites criticize letting Hideki Matsui become a member of the Angels, that’s only 11.8% of sources. To state it another way, the smaller the pool of data, the more easily the results are skewed.
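For anyone who wants to check the arithmetic, here is a minimal Python sketch of the percentages above. The blog counts are the Blogged.com figures already quoted; the function names are my own, and the “one more site turns critical” comparison is only an illustration of the small-pool problem, not a description of how Nielsen’s algorithm works.

```python
# Blog counts quoted above from Blogged.com.
TOTAL_BLOGS = 1028
YANKEES_BLOGS = 170
PADRES_BLOGS = 6


def share(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


def swing_from_one_site(pool_size):
    """Percentage points added to the critical share when one more
    site in a pool of this size turns critical (illustrative only)."""
    return share(1, pool_size)


# Each team's slice of the team-based blogosphere.
print(share(YANKEES_BLOGS, TOTAL_BLOGS))  # 16.5
print(share(PADRES_BLOGS, TOTAL_BLOGS))   # 0.6

# Share of each team's sites criticizing a move, per the examples above.
print(share(2, PADRES_BLOGS))    # 33.3
print(share(20, YANKEES_BLOGS))  # 11.8

# How far a single extra critical site moves the needle in each pool.
print(swing_from_one_site(PADRES_BLOGS))   # 16.7
print(swing_from_one_site(YANKEES_BLOGS))  # 0.6
```

One more unhappy Padres blogger shifts that team’s number by nearly 17 percentage points; one more unhappy Yankees blogger barely registers.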

Without knowing the specific methodology or standards used to measure sentiments as they relate to specific teams, the results need to be viewed with a great deal of skepticism. I could assume that the Nielsen Co. internet algorithm involved is relatively complex, thorough, and all-inclusive, but I don’t know that to be the case. Further, I’m pretty sure that the algorithm in question was not built to be used in this fashion. It is a pretty big leap to go from measuring positive and negative reactions to measuring which of the 30 MLB teams is the most “despised.” Having a negative reaction to something is far different than despising a team or even loathing a specific event.

I’m sure in many walks of life this standard of measurement can slide by, but we’re baseball fans. We live for and love a sport that has statistics and analysis of data at the core of its being. As such, we expect more.

(You can stay current on all the Call to the Pen content and news by following us on Twitter, Facebook, or by way of our RSS feed)