Hey all - since tag clouds are so popular these days, I thought I'd make a quick post about the algorithm I'm using to generate my tag cloud. I found a few sample algorithms already out there, but many of them seemed either inefficient, or didn't return the size that I was expecting.
Considering that the algorithm I'm using for my blog is running on the client side, I wanted to make sure that it was especially efficient. Here's what I ended up coming up with.
First, let's define the variables that are being used:
Before I show some sample code for this, let's look at the actual algorithm for calculating the size:
var size = minPercent + ((max-(max-(count-min)))*(maxPercent-minPercent)/(max-min));
Since count is the only variable that changes during each iteration, we can extract the expression (maxPercent-minPercent)/(max-min) into its own constant and define it outside of the loop. This will save us two subtractions and a division operation for every loop. This now yields the following algorithm:
var multiplier = (maxPercent-minPercent)/(max-min); var size = minPercent + ((max-(max-(count-min)))*multiplier);
The mathematical brilliance (sorry) behind this simple function isn't immediately obvious by looking at it, but the size returned will adhere to the following rules:
- The least occuring tag(s) will have a font-size of minPercent.
- The most occuring tag(s) will have a font-size of maxPercent.
- Tags with occurence counts in the middle will be scaled linearly.
Ok, let's look at this algorithm in action. The sample below uses a few jQuery methods (you all saw it coming), but this algorithm can certainly survive apart from any JS library. Notice that in the code below, I don't know min or max at the start. This means I have to loop through all of the tags twice, calculating these statistics the first time.
To give this code a real life context, I am generating a tag cloud out of the categories list that blogger generates. The markup that I am looping through is in the following format:
<div class="widget-content" id="categories"> <ul> <li> <a dir="ltr" href="http://jmar777.blogspot.com/search/label/jQuery">jQuery</a> (<span dir="ltr">6</span>) </li> <li> <a dir="ltr" href="http://jmar777.blogspot.com/search/label/jTruncate">jTruncate</a> (<span dir="ltr">2</span>) </li> [etc...] </ul> </div>
And here's the heavily commented javascript:
$().ready(function() { var catContainer = $('div#categories'); // get an array of all the <li>'s var categories = catContainer.find('ul li'); var cloudMarkup = ''; // set maxPercent/minPercent preferences var maxPercent = 150, minPercent = 100; // note that max is initialized to a number that I know is lower than the max count // and that min is set to a number larger than the known min count var max = 1, min = 999, count = 0; // loop through each li and calculate statistics categories.each(function(i) { count = parseInt($(this).find('span').text()); max = (count > max ? count : max); min = (min > count ? count : min); }); var total, link, size; var multiplier = (maxPercent-minPercent)/(max-min); // loop through each li and generate the markup for the new tag cloud categories.each(function(i) { count = parseInt($(this).find('span').text()); link = $(this).find('a'); size = minPercent + ((max-(max-(count-min)))*multiplier) + '%'; cloudMarkup += '<a class="cloud_link" style="font-size:' + size + '" href="' + link.attr('href') + '">' + link.text() + '</a> '; }); // replace the html content of the parent container with the new tag cloud catContainer.html(cloudMarkup); });
For you jQuery users out there - I am considering turning this into a plugin. I would like to hear from you what the expected markup should look like. For example, <a href="#" rel="[count]">tag</a> or <a href="#">tag ([count])</a>. Hopefully we can come up with something that doesn't completely violate the purpose of the rel tag, like that example! Thanks for your input!
20 comments:
Wonderful and a jquery plugin would be awesome.I personally feel you can choose any markup that makes things simple for you.
Why give it to the plugin users choice and let him define via plugin constructor the way he wants his tag to be.
Just to think for. It could not be this vast effort to implement it this way.
Best regards,
Skyborg
I agree that it would be ideal to let the user determine the markup - just not sure yet on how to let them specify that in a flexible way...
While you're optimising your loop why don't you simplify
((max-(max-(count-min)))
to just
(count - min)?
Or do you have a typo in there somewhere?
@Anonymous
Since count is updated during each iteration, I can't extract any expressions that contain count outside of the loop.
I would be very happy, if you could add an alternative routine to get the data for the cloud.
Like a text-file or data from JSON. The reason is, that not all people who use jQuery use it for HTML ;-)
Personally I use it for XUL (Firefox extension, etc.) as well.
Right now I am trying to create a GTD add-on for an application based on Mozilla and thus I am in need for a tag-clound jQuery plugin, that reads the tags from a database and then creates the tag-cloud from it.
Jeremy--
Thanks for this post. I had the same thought as the March 21 anonymous regarding the ((max-(max-(count-min))). You said efficiency was a concern, so here's my two bits on optimzing the size calculation in the loop:
Just distribute the minuses out to the terms in the parenthesis, and you'll see it works out, even though "count" changes on each iteration. Here it is step-by-step (perhaps a little more verbose than needed).
a) ((max - (max - (count - min)))
b) ((max - (max - count + min))
c) (max - max + count - min)
"C" will give you the exact same result as "A", and since the "max-max" cancels out, you can further simiplify it to be:
d) count - min
That leaves your code as the following:
var multiplier = (maxPercent-minPercent)/(max-min);
var size = minPercent + (count-min)*multiplier;
The nice thing about the above code is that math-minded folks will instantly recognize you're doing a linear interpolation of "count" between two numbers ("maxPercent" and "minPercent"). That was a bit obscured in the original code.
With a bit of explanation to someone of what interpolation is, you can even easily visualize in your minds-eye what is happening in the above code.
This is the easiest to understand version of the code, but you can introduce another variable optimize the loop further:
a) var size = minPercent + (count-min)*multiplier;
b) var size = minPercent + count*multiplier - min*multiplier;
c) var size = (minPercent - min*multiplier) + count*multiplier;
d) var offset = minPercent - min*multiplier; var size = offset + count*multiplier;
with "offset" being calculated outside of the loop (note that no terms in "offset" use "count" so this is safe to do). This means that inside of the loop we have 1 multiplication and 1 addition, whereas it used to be 4 additions/subtractions and 1 multiplication. Notice that the only things we're pulling out of the loop are the constants. Anything involving "count" stays in the loop.
This leaves your code as the following
var multiplier = (maxPercent-minPercent)/(max-min);
var offset = minPercent - min*multiplier;
var size = count*multiplier + offset;
A little harder to visualize what's going on in this version, but it is fine with a comment.
Try this; it will work. What your algorithm is basically doing is a linear interpolation.
@Ken
Thanks for that very thorough explanation! I guess I was too hasty in my response to the 3/21 comment. Didn't we learn that stuff in like 5th grade or something...? I'll add this to my growing list of updates - too many smart people correcting me, not enough time...
Your algorithm has a problem if all of the tags have exactly the same number of occurrences. In this case max=min and you would get a divide by zero error.
@Jacob
Uh... so it does. I swear, this post is taking me to school... Thanks for pointing that out!
This is great. Playing with it now. As for markup, maybe something simple like class="cloudable" rel="rank"?
I Personally prefer the rank/weight indicator to be in an attrib.
Thank you very much for the article. It took me about 25 mins to apply it, not on the client side but server side using php. Your article is the simplest tutorial that I have found on the net. Special thanks go to Ken, as I used his refined version.
Just to think for. It could not be this vast effort to implement it this way.
I liked the idea, but I have a large set of data, the distribution is such that I have a lot of rare tags and a few very common ones - with hardly anything in between - I'm looking at product sales from a large inventory.
Some products sell 1 unit, others 50, 100, 500, 1000, etc... but one product sells 250,000 which means your algorithm gives me a few enormous tag and maps everything else down to the mininium size.
I need to be able to pull out the detail at both ends of the scale.
Using Log10(Count) worked really well and produces really compelling sales charts.
Thanks for the idea - 10 minute metrics - Mwhahahaha!
Here is a mootools/functional programming variant. Tidies away all the variables into its own namespace:
var TagCloud = {
apply: function(selector, options) {
options = options || {};
var maxPercent = options['maxPercent'] || 150;
var minPercent = options['minPercent'] || 100;
var retrieveCount = options['retrieveCount'] || function(element){element.getAttribute('count').toInt()};
var apply = options['apply'] || function(element, size){element.setStyle('font-size', size + '%')};
var max = null;
var min = null;
var tagElements = $$(selector);
tagElements.each(function(element)
{
count = retrieveCount(element);
max = (max == null || count > max ? count : max);
min = (min == null || min > count ? count : min);
});
var multiplier = (maxPercent-minPercent)/(max-min);
tagElements.each(function(element)
{
count = retrieveCount(count);
size = (minPercent + (count - min) * multiplier);
apply(element, size);
});
}
};
TagCloud.apply('ul#tags > li');
Whoops...
make that:
var retrieveCount = options['retrieveCount'] || function(element){return element.getProperty('count').toInt()};
Keep forgetting that the return statement is required in JavaScript. Lazy!
big think it was exactly what I searched, with a min and max!
I did it in PHP but will perhaps look after to make in JQuery because it's display work
think again!
Nice Plug-in and demonstration!
great demonstration... can i use this to my premium template?
Thanks! I have been trying to come up with a good way to do this for a long time.
Post a Comment