Alex Barnett blog

Stuff

The Long Tail of Tags

I made an observation the other day that then led me to another and then another. Perhaps these are entirely obvious to you, but I hadn't previously made the connection between the tags I use, their frequency in my tagcloud and Chris Anderson's 'Long Tail' theory. A quick search on the web didn't turn up anything specific to this topic, so I thought I'd share what I found.

First, let's start with the classic tagcloud. Here's a pic of all the tags I've used at del.icio.us:

As per the standard tagcloud visual representation, the size of each tag represents the relative frequency of the tags I've used - the larger the size of the tag, the more I have used that tag relative to another tag in my 'tagcloud'. Sized tagclouds can be a helpful navigational device, providing a view into the distribution of 'interest' about things. So if you look at my tagcloud, you can get a feel of what interests me.

What I had a hunch about, and confirmed via the graphing, is that my interests seem to follow the classic Long Tail / powercurve.

My Long Tail of Tags

On to the data...

  • I have tagged 681 'articles' (URLs)
  • I have used 386 tags
  • The most used tag is 'RSS'.
    • I've tagged 139 'articles' with the 'RSS' tag, around 20% of the 681.
  • I've used the Atom tag 4 times, or under 1% (btw, I expect to tag more stuff with Atom as my interest in that topic is on the increase)
  • I have tagged 25 articles with the tag 'microformats'

So, I threw my del.icio.us tag data into a spreadsheet - all the tags I have used and their frequency (shown in the tagcloud above) - and then sorted the list in descending order (most used tags at the top), charted it and added a logarithmic trendline. This is what I saw:

Each tag is listed along the horizontal axis and their frequency is represented along the vertical, so the tags most used are on the left ('RSS' tag starts the series).

Lo and behold, the Long Tail appears once more!

This is more than a variation on the theme of the Long Tails of language (Zipf's observation that the frequency of words used in the English language followed a powerlaw distribution) and words - this is the Long Tail of my interests as represented by tags.

I tag stuff of interest to me > my tags express my interests > the distribution of my tags expresses the distribution of my interests > My mind is a powerlaw!

And if you tag a lot, yours probably is too...try it.
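If you'd rather try it without a spreadsheet, the sort-and-trendline step can be sketched in a few lines of Python. This is only a rough sketch, not the spreadsheet's own calculation: it assumes the logarithmic trendline is an ordinary least-squares fit of count = a * ln(rank) + b. The function names are mine, and only the 'RSS', 'microformats' and 'Atom' counts come from my data above; the other tags and counts are made-up placeholders.

```python
import math

def rank_tags(tag_counts):
    """Return (tag, count) pairs sorted by descending count."""
    return sorted(tag_counts.items(), key=lambda kv: kv[1], reverse=True)

def log_trendline(counts):
    """Least-squares fit of count = a * ln(rank) + b, with ranks starting at 1.

    Needs at least two counts, otherwise the slope is undefined.
    """
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    n = len(counts)
    mean_x = sum(xs) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, counts))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Only RSS, microformats and Atom are real counts from the post;
# the remaining entries are hypothetical placeholders.
tag_counts = {
    "RSS": 139,
    "microformats": 25,
    "Atom": 4,
    "blogging": 60,
    "search": 12,
    "maps": 2,
}

ranked = rank_tags(tag_counts)
a, b = log_trendline([count for _, count in ranked])
```

A negative `a` means frequency falls as rank increases. Whether the drop really follows a powerlaw is a separate question that this fit alone doesn't settle.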

Laws of the Long Tail of Tags

So based on the above, I propose the first two Laws of the Long Tail of Tags:

1. the frequency of tags used by any user who is not required to follow a formalized taxonomy will follow a Long Tail powercurve distribution

and therefore

2. any tag tagged by a user has an >80% chance of being in that user's 'head' of their tagcloud Long Tail

Kind of obvious if you think about it, I suppose. But it hadn't occurred to me until I thought about tags in Long Tail terms.
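One way to check the second law against a real tagcloud is to ask how many of the most-used tags it takes to cover 80% of all tag uses. The sketch below rests on that assumption: it reads the 'head' as the smallest set of top tags covering a given share of uses, which is only one possible reading of the law. The function name and the sample counts are hypothetical.

```python
def head_size(counts, share=0.8):
    """Smallest number of top tags whose uses cover at least `share` of all uses."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    running = 0
    for i, count in enumerate(ordered, start=1):
        running += count
        if running >= share * total:
            return i
    return len(ordered)

# Hypothetical tag counts: the top two tags cover 85 of 100 uses.
example_counts = [60, 25, 10, 5]
tags_in_head = head_size(example_counts)
```

If a small fraction of a user's tags makes up the head, a randomly picked tag use is very likely to come from it, which is the intuition behind the second law.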

Now for another Long Tail in tagspace...

Let's take the Long Tail article published in Wired. Around 1,500 users have bookmarked the article in del.icio.us using all sorts of tags. I looked at the number of times each tag was used by all the users who tagged the article, to see if there was a powerlaw there too.

Unfortunately I haven't found a way to find all the tags used by all users for the article - I can only get the top 25 (the limit defined by del.icio.us...if anyone knows how to get the rest of the tags please let me know):

I graphed the above data and included some hypothetical data:

If there is a Long Tail here too, and I'm sure there is, what would that mean? And how does an item's tag distribution relate to the behavior of its taggers' own tagclouds?

We already know that when lots of people tag stuff, a collective agreement emerges about how things should be tagged. The popular tags used to categorize an article live at the 'head of the tail'. We can also assume that the tags in the 'tail' of the Long Tail show how an article means different things to different people. But what of their relevance in terms of 'importance' to those taggers?

Looking through the data relating to the entire bookmarking history of the Wired article by all users on del.icio.us, there are tags at the 'tail' that are not listed in the top 25 tags. Examples are

  • 'collaboration'
  • 'strategy'
  • 'amazon'
  • 'retail'

I chose to follow the link of one of the users who tagged the article with the 'collaboration' tag and went to their tagspace on del.icio.us. And there it was...The 'Collaboration' tag was the most used tag by the user called David Kato, almost the only one who tagged the article with the 'collaboration' tag. So I threw David's tagcloud into my spreadsheet...

Below is David's Long Tail of Tags. I'll point out here that he has tagged 168 items, using 72 tags, so it's not a large data set and we're not seeing a very smooth curve here. However, I propose that over time his tag distribution will look more like the classic Long Tail shape we're looking for:

Recommendation Networks 

So what does that mean? Again, this may be quite obvious to you, but it seems pretty interesting to me. What it says to me at least is that these Long Tails of individual minds are strongly and potentially algorithmically correlated to the Long Tails of taggers' collective efforts.

Looking at the tagging data in this way (and without any use of fancy algorithms) we can see the inherent potential of using tagging as a basis for collaborative filtering and recommendation systems. Based on the simple and unscientific analysis I've done here, it appears that the world of tagging holds related Long Tail networks everywhere.

In other words, tagware = natural Recommendation Networks
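To make the collaborative filtering idea a little more concrete, here is a minimal sketch of one common way to compare two taggers: treat each tagcloud as a vector of tag counts and take the cosine similarity. Nothing like this was used in the analysis above, which was done by eye in a spreadsheet. The function name, the profiles and their counts are all hypothetical.

```python
import math

def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two tag profiles given as {tag: count} dicts."""
    shared = set(profile_a) & set(profile_b)
    dot = sum(profile_a[tag] * profile_b[tag] for tag in shared)
    norm_a = math.sqrt(sum(v * v for v in profile_a.values()))
    norm_b = math.sqrt(sum(v * v for v in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty tagcloud is similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical tag profiles; the counts are invented for illustration.
user_a = {"rss": 10, "collaboration": 2, "longtail": 3}
user_b = {"collaboration": 12, "longtail": 4, "wiki": 5}
user_c = {"cooking": 7, "travel": 3}

sim_ab = cosine_similarity(user_a, user_b)  # share some tags, so above 0
sim_ac = cosine_similarity(user_a, user_c)  # share no tags, so exactly 0.0
```

Taggers with a high similarity score could then be used as sources of recommendations for each other, with the bookmarks one has and the other lacks as candidates.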

Comments

Hashim said:

VERY interesting.

This shows that there is a long tail of interest in every particular item. Most are interested in the Wired Long Tail article for "business" and "collaboration" but there is also a long tail of interest, that even stretches to collaboration.

How should that affect marketing of products? A horror movie is typically marketed to fans of the genre, actors, and director, but what about the long tail of people interested in the movie because of the location, the supporting actors, the extras, down to the costume designer, and best boy grip.

This isn't theory. I once saw a movie I had little interest in (that is, no interest in the genre, actors, or director) because my girlfriend was the assistant costume designer. Any recommendation system that is aware of the long tail of interests should tell me about all the other movies that she works on. There are only a dozen people who would see the movie for that reason, but it is enough to make it appear in the long tail.

All creators need to make as much information about their products available as possible so that a long tail of interests aggregator can make it available to the right audience.

Thanks so much for this analysis. My mind is brimming with ideas about this right now.

# September 17, 2006 12:13 AM

taylor parsons said:

This is a pretty interesting observation on your part Alex.  Thanks for taking the time to explain it to us.  

# September 17, 2006 11:13 PM

Korby Parnell's CodePlex Wunderkammer said:

In his recent post, The Long Tail of Tags, my compadre Alex Barnett contends that, "Looking...

# September 19, 2006 12:19 AM

TrackBack said:

# September 22, 2006 12:59 PM

TrackBack said:

These themes keep on popping up -- niche vs the whole; individual vs group...
# September 22, 2006 8:28 PM

TrackBack said:

Fascinating post on tags and how it applies to some of the concepts behind the Long Tail. 
# September 23, 2006 12:28 AM

BillyG said:

Nice post Alex. I started on this a few months ago but never got as far as you did, I need to get back to this... very nice!

# September 26, 2006 5:03 AM