I’ve always been vaguely uncomfortable with folksonomies. There is something about the concept that just doesn’t sit right with me. Every time I hear people wax on about them, I fidget in my seat; I feel kind of itchy and unsettled at the same time. Perhaps it’s my latent, leftover librarian-like nature.

It took me a while to put my finger on what bothered me. What was making me uncomfortable became clear when I stumbled across Elaine Peterson’s commentary from D-Lib Magazine this last November — a great article entitled “Beneath the Metadata – Some philosophical problems with Folksonomy.” I recommend it.

[I highly recommend the online 'zine too, by the way. If you're struggling with classifications, DRM, database design, information organization and retrieval you may just discover that your friendly neighborhood librarians have been struggling with those same issues for years; in some cases they've developed some pretty well tested tools and concepts. ]

Reading her commentary helped me define my discomfort with folksonomies — they may be “good enough” for some things, but they introduce inaccuracy and irrelevancy in the form of meta-noise. She notes:

…”Folksonomy is a scheme based on philosophical relativism, and therefore it will always include the failings of relativism. A traditional classification scheme will consistently provide better results to information seekers…”

Beautifully put. Traditional classifications (née taxonomies) provide better results. I’d go further and say that folksonomies tend to provide inaccurate or even downright bad results. What folksonomies do have going for them is that they are “approachable.” By that I mean that using them doesn’t take much intellectual investment on the part of the user.

Let’s take a case in point – I saw a post on a friend’s blog the other day entitled “Why is this cacti tagged NPTech?” The post got me thinking about many things:

  1. Why was a photo of a cactus tagged using the “NPTech” tag?
  2. What the hell is the “NPTech” tag and how would one use it anyway?
  3. What purpose do folksonomies serve? How are they different from taxonomies?
  4. What’s good or useful about user-generated content and what’s just entertaining or cute?

[Admittedly user generated stuff can be both good and entertaining, e.g., YouTube in some cases, but lots of times it's just more noise. On the other hand, talking cats are, of course, cute, and "Mr. Macaca" probably swung the Virginia senate race and, with that, the Senate itself. That's probably worth $1.65 billion right there.]

A quick glance at things tagged “NPTech” on Flickr turned up lots of pictures of people in meetings, me included, and a nice picture of a cactus.

A quick scan of things tagged NPTech on Delicious turned up a bunch of stuff – all pretty much undifferentiated. It seems to me that the tag is not that useful — either as a category of stuff or as a retrieval tool.

Classified under “NPTech” were saved bookmarks to job postings at nonprofit agencies (all now probably hopelessly out of date), some bookmarks to SourceForge software projects, some predictable chest beating about Microsoft doing this or that, and lots of other stuff that, in someone’s mind, must have related to Nonprofits and technology – but to me the connection was not always clear, and definitely not that useful.

Admittedly, here, part of the problem is in the definition of a nonprofit – it’s a tax status, after all. Under that same status there is a whole world of organizations – many of which have nothing whatsoever in common, in structure, organization, or mission. Moreover, modifying “technology” with the word “nonprofit” rarely seems to make sense to me. I think it’s related more to what one does with the technology (applied to nonprofit ends) than the nature of the technology itself.

Regardless, as a method of finding stuff, this folksonomic approach left a lot to be desired. Tags (or taxonomies) are all about finding stuff, cataloging things so that you or someone else might find them again. Bottom line: when I am looking for red wagons, or for accounting systems that do fund balances, I don’t want to find a picture of a cactus, or a rant about how Microsoft is attempting to patent human reproduction, fjords, and every word that rhymes with Zune.

The beauty of a taxonomy (sometimes known as a “controlled vocabulary”) is that it is a controlled vocabulary. Folksonomies are inherently, by definition, un-controlled. And, while I am madly and unrestrainedly in favor of individual freedom, the creativity of chaos, and the like, uncontrolled cataloging is usually a disaster.

Moreover, sometimes crowds are just not that wise and they are definitely not very speedy!

Controlling the vocabulary limits the introduction of noise — or in this case, “metanoise” — into the system, and, hence, increases the accuracy. Taxonomies are all about accuracy because taxonomies are all about finding (retrieving) information. IMHO, taxonomies and folksonomies are only as good as their retrieval accuracy, and folksonomies are piss-poor at it.
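To make “retrieval accuracy” a bit more concrete, here is a minimal Python sketch of the two standard measures, precision (how much of what you got back was what you wanted) and recall (how much of what you wanted you actually got back). The item names and the list of what counts as relevant are made-up toy data, not real search results, and the function name is only for illustration.

```python
def precision_and_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: items the search returned (e.g. everything carrying a tag)
    relevant:  items the searcher actually wanted
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy data: what a search on one tag returns...
tagged_nptech = [
    "donor-database-review",
    "cactus-photo",
    "microsoft-rant",
    "fund-accounting-guide",
]
# ...versus what the searcher was really after.
actually_relevant = [
    "donor-database-review",
    "fund-accounting-guide",
    "crm-comparison",
]

precision, recall = precision_and_recall(tagged_nptech, actually_relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case half of what comes back is noise (the cactus and the rant), and one thing I wanted never shows up at all because nobody tagged it.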

Standards are only good when, well… when they’re standardized, when they’re consistent.

[Admittedly, folksonomies do have a terrific branding – what a great name. Who wouldn't like something called a folksonomy? It sounds so friendly, so .. ah.. folksy.]

There are those that argue that a folksonomic approach is “good enough.” Good enough for whom? To me, a folksonomy is only acceptable, only useful, when the following is true:

  • The stuff you’re cataloging is only for you and perhaps a few others.
  • The content itself, the retrieval accuracy, or both are just not that important.

I’ll say that again: Folksonomies are only good when retrieval accuracy is not important. It all goes back to the reason you might want to catalog [i.e., "tag"] things.

Why tag? There is only one reason: so you, or someone else, can find it again. The goal is to organize stuff into logical groupings. If the stuff is not that important and/or retrieval accuracy is not that important, well, then a folksy approach is just fine. Or if it’s just for you and organized according to your own charming or twisted idiosyncratic logic, that’s fine too.

Take my Flickr photos for example. A folksonomy is just fine for my photos, or for your photos. In the grand scheme of things, the accuracy is just not that important. The fact that I tagged one set of photos “friends” doesn’t make a lick of difference in the world. Fact is, they aren’t your friends, they’re my friends – hence the tag is only useful to me, and wholly (most likely) inaccurate to you. Or, take the case where I tagged a few photos “Irland” (misspelling Ireland). In the general scheme of things, no one is hurt, the data is not that important.

Here’s another example. In Delicious, I tag a bunch of stuff using the “ToRead” tag. Utterly useless to anybody but me; but tremendously useful to me. It means I can tag stuff I want to read when I run across it, and then read it later when slouching on my couch, tablet PC in hand, at home. Otherwise I’d never find it again. It’s totally useless as anything other than a personal taxonomy. [Strangely, Delicious is now occasionally suggesting "ToRead" as a tag when I hit the "remember this" button.]

On the other hand, imagine your emergency room surgeon is consulting a wiki of medical protocols that used a folksonomic classification system…

[Let's see… How about Dr. Gesund tags this protocol under "brain surgery" and Dr. Krank tags it under "trepanning" and then we watch them race each other for the Black and Decker.]

Well, you get my point. When it’s important, folksy just don’t cut it.

I posed this question the other day over coffee (actually green tea for me) with two friends in a San Francisco coffee shop. Their response was “Why would you ever look to a wiki for medical procedures?” I mumbled agreement. Of course that would be stupid, but I have yet to think of a better analogy.

Bright fellows both, they argued that folksonomies were a great way to collaboratively develop a real taxonomy. With that I had to somewhat agree, but not totally.

While using a folksonomy might be a great way to collaboratively develop a real taxonomy, I’d argue that there are much more efficient and effective ways. Rather than the wisdom of a crowd, I’d recommend the wisdom of a few experts within that crowd. In the end you’d end up with a more accurate and useful taxonomy, with half of the wasted bandwidth, and in probably a tenth of the time.

Actually, I’d argue that your first step should be to check if someone hasn’t already developed a workable taxonomy. There are hundreds of them out there – used to catalog everything from graphic images to books to types of nonprofit organizations, and to the subject or type of a nonprofit program.

In fact, it is my experience that many a collaboration consists of 97 people arguing about “how” to do something and 3 people actually doing it.

Face it: if your end goal is a real taxonomy, using a folksonomic process is pretty inefficient. It’s slow and you’ll probably end up (painfully) recreating a wheel or two. I’ve been through the taxonomy birthing more times than I can count, and trust me, if there is an existing standard vocabulary, you’d be wiser to go with the standard, even an imperfect one, than (re)create one.

14 Responses to “Return to Beneath the Valley of the Metadata”

  1. on 02 Jan 2007 at 9:41 am Laura Quinn

    Bravo, Gavin! I couldn’t agree more (see my own rant on this subject at http://www.idealware.org/blog/2006/10/taxonomy-is-dead-long-live-taxonomy.html). I will say, though, that folksonomies have one substantial advantage over taxonomies - as you mention, they’re more approachable, and that makes them much more scalable. If you’re looking to classify, say, a million documents, you’re much more likely to get *some* order out of a folksonomy. It wouldn’t be as robust for the searcher as if all the documents were classified using a traditional taxonomy, but the investment required to classify all those documents using a traditional taxonomy becomes pretty overwhelming.

    I’m also personally pretty excited about using folksonomies and taxonomies *together*. I don’t see many barriers to combining the approaches to get a best-of-both-worlds outcome. So for instance, because it’s so easy for users to tag things with free-form keywords, let them do so. A traditional rigorous taxonomy scheme includes a “synonym ring” - basically, just a bunch of synonyms mapped together - so why not use that to standardize the tags (i.e. “nptech” = “nonprofit tech”) and potentially aggregate them into buckets or paths that make them more browsable? This is also a great way to test and refine a taxonomy over time - an administrator can plan to look at the tags that don’t map into anything in the taxonomy and decide if the taxonomy should be modified to include them.
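The synonym-ring idea in this comment can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the ring contents are taken from terms mentioned in this thread, the preferred labels are arbitrary choices, and the function names are hypothetical rather than part of any real tagging service.

```python
# Each preferred term maps to the free-form variants that should collapse into it.
# The preferred labels here are an assumption made for the example.
SYNONYM_RINGS = {
    "nonprofit technology": {"nptech", "nonprofit tech"},
    "open source": {"floss", "foss", "open source software"},
}


def build_lookup(rings):
    """Flatten the rings into a variant -> preferred-term dictionary."""
    lookup = {}
    for preferred, variants in rings.items():
        lookup[preferred] = preferred
        for variant in variants:
            lookup[variant] = preferred
    return lookup


def normalize_tags(tags, lookup):
    """Split free-form tags into (mapped preferred terms, unmapped leftovers)."""
    mapped, unmapped = set(), set()
    for tag in tags:
        key = " ".join(tag.lower().split())  # fold case and stray whitespace
        if key in lookup:
            mapped.add(lookup[key])
        else:
            unmapped.add(key)
    return mapped, unmapped


LOOKUP = build_lookup(SYNONYM_RINGS)
mapped, unmapped = normalize_tags(["NPTech", "FOSS", "cactus"], LOOKUP)
print(sorted(mapped), sorted(unmapped))
```

The unmapped set is the part an administrator would review: tags that keep showing up there are candidates for a new ring or a new taxonomy term.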

  2. on 03 Jan 2007 at 12:50 am Beth

    Gavin,

    Thank you for your brilliant reflection on taxonomy and folksonomy — very useful to have the perspective of a librarian. You have clearly answered one of your own questions for us: What purpose do folksonomies serve? How are they different from taxonomies?

    BTW, the article in D-Lib caused a lot of the “you say taxonomy, I say folksonomy” and “tagging is wonderful, tagging is crap” debate on the Museum Computer Network list a few months back when it was first published. There was also a post from David Weinberger -
    http://www.hyperorg.com/blogger/mtarchive/beneath_the_metadata_a_reply.html
    giving another side to some of the arguments made in the d-lib article — whether or not you agree - but another perspective.

    But anyway, I digress. You raised two really important questions about the context, usefulness, and purpose of the NpTech Tag and about the quality, or lack of quality, of “user-generated content.”

    # What the hell is the “NPTech” tag and how would one use it anyway?

    # What’s good or useful about user-generated content and what’s just entertaining or cute.

    These are excellent questions. What’s the best forum to talk about this?

    Some context. The NpTech tag experiment was started a while back — some of the early discussions can be found here: http://h2obeta.law.harvard.edu/59925

    I don’t believe it was started with the intent of being a formal taxonomy. I think it was an experiment in community tagging: if you find a resource of interest to others in nonprofit technology that relates to the emerging web technologies, tag it with nptech. Marnie Webb could speak to that.

    About a year ago, Jilliane Smith wrote a great post about tagging and the usefulness of the nptech tag over at netsquared:
    http://www.netsquared.org/blog/jillaine/tagging-for-nonprofits

    More recently, Allan Benamer made some good points about how the use of a Google CSE might be the best approach:
    http://www.netsquared.org/blog/kanter/nptech-google-cse-versus-socialbookmarking-rss-feeds-what-do-you-think

    Having followed the NpTech Tag stream closely now for several months in order to write a summary, I go back and forth between tagging is crap, tagging is wonderful myself.

    There is a huge stream of resources that come chugging through the NpTech Tag stream, particularly if you consume the aggregated feeds from social bookmarking services, technorati, and others. It is disorganized, there are repeats, and you have to scan it looking for patterns. It is messy. I also do a lot of retagging of items I find so I can refind them for myself later.

    If you read the nptech blogosphere, listservs, and web sites, not everyone in the nptech space is using the tag. I often come across great blog posts from nonprofit and nonprofit technology folks, but they aren’t tagged with nptech. And, there is also a lot of really useful information that is put out there on various listservs that doesn’t make it into the stream. So, if you just followed the aggregated stream, you wouldn’t find them. (When I find great stuff that hasn’t been tagged and I think other people might be interested, I tag it with nptech in my delicious account and retag it for myself so I can find it later …)

    I find the NpTech Tag stream useful to get a quick scan of what the conversation is on blogs and what people are reading and thinking about in the nptech space. It helps me think about my work; for example, the time I’ve spent reading your post and thinking about the nptech tag, taxonomy, folksonomy, etc.

    One thing that I value about following particular tag streams, and it sort of gets to your point about “Rather than the wisdom of a crowd, I’d recommend the wisdom of a few experts within that crowd,” is the fact that you can identify and follow a subject matter expert’s tag stream. I have found, for example, that reading the items in Robin Goode’s delicious account tagged, say, online_collaboration is extremely useful compared with following items tagged by many people with that tag.

    That whole question “Why is a Cactus in the NpTech Tag Stream?”
    was raised here:
    http://www.netsquared.org/blog/kanter/nptechtag-summary-why-does-that-cactus-have-a-nptech-tag

    This kind of gets back to your other question about user-generated content - what’s the value or is it just cute. I ask - does it make you think?

    Did it inspire your thoughtful post?

    Thank you for blogging this!

  3. on 03 Jan 2007 at 1:35 am Beth's Blog

    NpTechTag MetaFeed: 2007 Version 1 - Feed Fixed, Tag Still Broken?

    Allan Benamer gave a shout that the NpTech Meta Feed was broken. The NpTech Meta Feed has been revised and moved to here: http://feeds.feedburner.com/Nptech_Tag_MetaFeed_07 I’ve asked Marshall to put a forward on the old feed. But is it fixed? Gavin’s

  4. on 05 Jan 2007 at 11:44 am Marnie Webb

    I’m with Laura: the real power is combining the folksonomies and taxonomies. Tagging, of course, has a personal importance and use (and is the reason people are finally willing to add metadata). It doesn’t have the strict definitions of taxonomy — but that may not be the worst thing. I’ve often had a hard time inside a taxonomonical (is that a word?) system because I didn’t know the *right* word and couldn’t find what I was looking for.

    When the nptech tag started one of the ideas was to gather enough data to look and see what words people were using to describe, say, open source (open source, floss, foss, open source software) and then use those words to inform a taxonomy. It’s taken a long time but I bet there’s enough data in the nptech tag on a combination of bookmarking systems to do a little crunching and get at some of those commonly used terms. Sort of an emergent taxonomy…
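The “little crunching” described here might look something like the Python sketch below: count which other tags people use alongside a given tag, then eyeball the most common ones as taxonomy candidates. The bookmark data is invented for the example and the function name is hypothetical; real data would have to be exported from whichever bookmarking services are involved.

```python
from collections import Counter


def co_tag_counts(bookmarks, anchor_tag):
    """Count the tags that appear on the same bookmark as anchor_tag."""
    counts = Counter()
    for tags in bookmarks:
        normalized = {t.lower() for t in tags}  # fold case, drop duplicates
        if anchor_tag in normalized:
            counts.update(normalized - {anchor_tag})
    return counts


# Hypothetical toy data: one list of tags per saved bookmark.
bookmarks = [
    ["nptech", "foss", "cms"],
    ["NPTech", "open source", "cms"],
    ["nptech", "floss"],
    ["recipes", "cms"],
]

counts = co_tag_counts(bookmarks, "nptech")
for tag, n in counts.most_common():
    print(tag, n)
```

Even in this tiny sample, “foss”, “floss”, and “open source” each show up once as separate terms, which is exactly the kind of cluster a human would then merge into a single taxonomy entry.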

    On a day to day level, I do find the tag useful when I combine it w/ other searches. But mostly because the tag is a proxy for “people in the know”. So I search on cms or outreach or events or jobs in combination with the nptech tag and find that I often get useful results. And, certainly, it’s helped me as a news feed to keep abreast of the flow.

    FWIW, I find the technorati tags less useful in this way than the social bookmarking tags.

  5. on 05 Jan 2007 at 6:12 pm Holly

    And for a completely emotional response…I love the sheer joy of discovery that folksonomies contribute. Was just reading an article on Slate that articulated perfectly what I love about folksonomies:

    Google Video lets you google videos (of course) by their titles and a brief description of each. Each page links to other matches. That’s OK, but predictable. YouTube lets posters tag each clip themselves. For example, I tagged this clip of my 12 seconds on Good Morning America with “boutin wired slate gma.” Whenever you play a YouTube clip, the page shows a half-dozen potential matches. A tag like “slate” could mean all sorts of things, so each page mixes perfect matches with what-the-huh results. A documentary on Scientology links to a South Park episode, which links to comedian Pablo Francisco. A few clicks later I’m watching some merry prankster get an unexpected smackdown. In Web 2.0-speak, this is a “folksonomy.” In English, it means YouTube is a mix of every video genre imaginable.

  6. on 06 Jan 2007 at 3:33 pm Alf Gracombe

    There are no doubt limitations with folksonomies, just as there are with taxonomies. As consumers and organizers of content, I think we have to manage our own expectations about the value that can be had from either. Folksonomies are far from a silver bullet, but I would argue counter to Gavin’s point a bit and suggest taking a longer view here.

    While at iapps, we developed a web development and hosting platform that relies heavily on the traditional taxonomic model - that of the content experts (the owners of the content of a given web property) providing the categories to which content could be attributed. This is, essentially, tagging of content with pre-defined folksonomies. We deliberately took this approach because, as web site and application designers (and admittedly, having some control-freakish tendencies), we didn’t want the content to get too dispersed and encounter problems where, say, a list of categories would get too long and unwieldy, or a horizontal navigation bar would run off the page and blow the page layout away. I think this speaks to one of Gavin’s points, and in that context, I certainly agree.

    But the types of web sites we were building, mostly public facing web sites for foundations and nonprofits, serve a very different purpose than some of the social software applications today that use folksonomies. Folksonomies are certainly sloppy. But a lot of this sloppiness and the inefficiencies in finding relevant content can be attributed to the embryonic stage of social content organization and consumption that the Internet is currently in. But as the semantic web continues to emerge and the user interface designers and information architects adapt to these new paradigms, I anticipate many improvements in how content is “organized” and consumed on the Web.

    The other point I want to make is this: the big social sites (My Space, Flickr, del.icio.us, etc.) are not tailored to any one specific group of people. They’re for anyone and everyone. The only way to target content to yourself in these environments is through your social connections on these sites and folksonomies/tagging. And as Gavin and others have pointed out, it’s messy and not particularly targeted in many cases. But as more “hyper local”, or location-based, sites and communities of interest/practice sites come online, I foresee a dramatic shift in the ability to catch and consume more relevant content (a good example of this can be found at http://outside.in). And it’s in these environments, where the content is at the outset more targeted, and there’s more content and human organization at a local or topic-based level, that I anticipate deriving more benefits from the folksonomy model.

  7. on 06 Jan 2007 at 8:10 pm Beth

    Gee Gavin, I wonder how all these people found their way to this post on your blog and left a comment? Looks like a few people have linked to it too:
    http://www.technorati.com/search/digitaldiner.typepad.com%2F
    Have your other posts had this many comments? Do you think, gasp, that has anything to do with tagging it NPTechTag?

  8. on 07 Jan 2007 at 9:14 am Kevin

    Is there any living, breathing example of a taxonomic approach working (scaling) to keep up with the hyper-efficiency we see in peer-production systems? I’m being quite serious here. Can you point me to a working model?

  9. on 07 Jan 2007 at 1:41 pm Beth's Blog

    NpTech Tag Cross Blog Discussion: What do those guidelines look like?

    The Cross Blog Discussion of the NpTechTag has generated some comments and blog posts that I’ve summarized below. Let’s begin with the big picture question that Gavin raised: What purpose do folksonomies serve? How are they different from taxonomies? Gavin…

  10. on 08 Jan 2007 at 9:09 am Laura Quinn

    To Kevin’s question, about examples of large scale taxonomies…. This all depends on how large is large in your mind, and how efficient is efficient. There’s no question in my mind that a taxonomy can’t effectively scale to categorize, say, the entire internet, and that it can’t beat the efficiency of having a bunch of people tag things for free. These qualities are a tradeoff, though, against *comprehensiveness* and *accuracy*. Take for instance, http://www.gettyimages.com, my favorite taxonomy example (it’s a stock photo site). They have an incredibly successful taxonomy, working against a collection of at least several hundred thousand photos. Do some searches (say “family playing without dad”) - it’s truly an amazingly effective tool, geared towards people who need to find exact and effective things quickly. Expensive to maintain? No doubt. But they’ve prioritized effectiveness.

    In my mind, tagging is great for browsing and for scalability, and it’s much easier to implement. But (good) taxonomies are far superior when people need to quickly see comprehensive results, and can be really helpful in situations where you don’t have critical mass for tagging.

    To Marnie’s comment about taxonomies being hard to use because you don’t know the “right” word to use - that to me is a downside of a *bad* taxonomy, not the approach in general. To my mind, a huge upside of a good taxonomy (carefully created and maintained over time) is that it will map like terms together so you can use the terms that make sense to you rather than having to channel the tagger or taxonomist in order to find things.

  11. on 08 Jan 2007 at 11:20 am Marnie Webb

    @Laura: I agree that not knowing the right word is a problem of a bad taxonomy (and one not maintained over time), but it’s tough to have a really good and thoroughly indexed taxonomy if you don’t have a very high level of skill. I think, though, that this kind of maintenance is something a folksonomy can help with. The tools aren’t there yet but it’s possible to imagine tagging systems as data sources that help to create indices, merging different phrases, etc.

  12. on 09 Jan 2007 at 9:26 am Reed Stockman

    My comments are here
    http://afprc11.blogspot.com/2007/01/beths-blog-cross-blog-discussion.html
    Beth Kanter has some interesting discussion from a variety of folks using the NPTECH tag on her blog.
    My short answers to her questions (see below) are as follows
    1. We use the NPTech tag to highlight news articles that we’ve seen that would also fit the evolving criteria for our AFP Nonprofit Technology News Blog.
    2. Yes we subscribe to the feed to find resources and also to network. We would not however post the feed itself but instead use it as a winnowing device via our aggregator with the additional step of “human” filter.

    3. Yes we read the summaries as well as selected posts. To date we skim the postings as opposed to sub searching (not sure if this is a real term) the tag.

    4. We tag for both promotion and outreach. As a more general comment we also tag to categorize for future retrieval (cataloging) of the material by topic. We find ourselves regularly challenged by the quandary of “broad” vs. “specific” tagging.

  13. on 10 Jan 2007 at 7:13 am Kevin

    Thank you Laura for pointing me to the Getty site. That is an amazing search. After trying your suggested search I tried “peeled oranges”. Very nice.

    My point was not that taxonomies don’t work. The particular problem that I am addressing actually has an existing thesaurus, and even a hierarchical tool for determining the correct terms. It’s very usable by professionals. It is totally unusable by authors or their peers. Could the tool be better? I’d like to think so, but that was my original question. I’d like to see one. I’d love nothing more than to slip behind the curtain and see what Getty is using.

    BTW, I have nothing against using professional catalogers, but modern KM screams for disintermediation. As soon as you put middle-people in the process you add incredible cost and unacceptable lag. Folksonomies and user generated tagging, while not perfect, squarely address these two issues. Is it good enough? That remains to be seen.

    Would the ESP Game put the Getty catalogers out of a job? http://jonschull.blogspot.com/2006/12/human-computation-google-video.html

  14. on 18 Jan 2007 at 4:41 pm Margaret Rouse

    Gavin,
    I LOVE this analogy. It completely cracked me up. May I quote you and link to this article?

    “On the other hand, imagine your emergency room surgeon is consulting a wiki of medical protocols that used a folksonomic classification system…”
