When I saw Opinion Space I immediately latched on to using the same concept for book recommendations. I spent a few weeks searching for someone else who had already done this and came up with nothing.
I wanted TagShadow to be heavily user-based and was immediately struck by the chicken-and-egg problem: I needed data to process so that potential users could see what I was trying to do. I decided to test my code on a version that used data from Amazon.
The first thing I realized when I started gathering data on Amazon was that tag usage was rather chaotic. You see this everywhere. One person labels science fiction with the tag "sciFi" whereas another person uses "science fiction." Some people additionally tag all science fiction as "fantasy." Some just settle for "sff" or "speculative fiction." I immediately set about dealing with this issue.
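To make the mess concrete, here is a minimal Python sketch. The book titles and tag lists are invented for illustration (they are not real Amazon data), and `count_raw_tags` is a hypothetical helper, not code from TagShadow. It simply counts every tag exactly as typed, so each spelling variant shows up as its own entry.

```python
from collections import Counter

# Hypothetical books and tags, invented for illustration; not real Amazon data.
book_tags = {
    "Book A": ["sciFi", "fantasy"],
    "Book B": ["science fiction", "sff"],
    "Book C": ["science fiction", "speculative fiction"],
}


def count_raw_tags(tags_by_book):
    """Count each tag exactly as typed, with no merging of variants."""
    counts = Counter()
    for tags in tags_by_book.values():
        counts.update(tags)
    return counts


tag_counts = count_raw_tags(book_tags)

# Print tags from most to least used.
for tag, n in tag_counts.most_common():
    print(f"{tag}: {n}")
```

Three books about roughly the same genre end up spread across five different tags, and only "science fiction" is used more than once.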
And then I read an article that eased my mind greatly: Clay Shirky's "Ontology is Overrated: Categories, Links, and Tags." I particularly enjoyed the comparison of Yahoo versus Google, but this is the chunk that really stuck with me:
This looks relatively simple with the Apple/Mac/OSX example, but when we start to expand to other groups of related words, like movies, film, and cinema, the case for the thesaurus becomes much less clear. I learned this from Brad Fitzpatrick's design for LiveJournal, which allows user to list their own interests. LiveJournal makes absolutely no attempt to enforce solidarity or a thesaurus or a minimal set of terms, no check-box, no drop-box, just free-text typing. Some people say they're interested in movies. Some people say they're interested in film. Some people say they're interested in cinema.
The cataloguers first reaction to that is, "Oh my god, that means you won't be introducing the movies people to the cinema people!" To which the obvious answer is "Good. The movie people don't want to hang out with the cinema people." Those terms actually encode different things, and the assertion that restricting vocabularies improves signal assumes that that there's no signal in the difference itself, and no value in protecting the user from too many matches.
Once I decided to just work with whatever input I was given, everything kind of fell into place. As of this writing, I have a version of the Amazon-backed TagShadow with most of the display functionality that I envisioned. Check out this alternate history visualization.