jMar"s Blog DevSmash Developer Portal

Friday, March 7, 2008

Efficient Tag Cloud Algorithm

Hey all - since tag clouds are so popular these days, I thought I'd make a quick post about the algorithm I'm using to generate my tag cloud. I found a few sample algorithms already out there, but many of them seemed either inefficient, or didn't return the size that I was expecting.

Considering that the algorithm I'm using for my blog is running on the client side, I wanted to make sure that it was especially efficient. Here's what I ended up coming up with.

First, let's define the variables that are being used:

maxPercent
The font size is set as a percentage. This is the font-size percentage that the largest (most frequent) tag should be set to.
minPercent
This is the font-size percentage that the smallest (least frequent) tag should be set to.
max
This is the number of occurences for the most frequent tag.
min
This is the number of occurences for the least frequent tag.
count
This variable should be set inside of the link iterator. It refers to the number of occurences for the current tag.

Before I show some sample code for this, let's look at the actual algorithm for calculating the size:

var size = minPercent + ((max-(max-(count-min)))*(maxPercent-minPercent)/(max-min));

Since count is the only variable that changes during each iteration, we can extract the expression (maxPercent-minPercent)/(max-min) into its own constant and define it outside of the loop. This will save us two subtractions and a division operation for every loop. This now yields the following algorithm:

var multiplier = (maxPercent-minPercent)/(max-min);
var size = minPercent + ((max-(max-(count-min)))*multiplier);

The mathematical brilliance (sorry) behind this simple function isn't immediately obvious by looking at it, but the size returned will adhere to the following rules:

  1. The least occuring tag(s) will have a font-size of minPercent.
  2. The most occuring tag(s) will have a font-size of maxPercent.
  3. Tags with occurence counts in the middle will be scaled linearly.

Ok, let's look at this algorithm in action. The sample below uses a few jQuery methods (you all saw it coming), but this algorithm can certainly survive apart from any JS library. Notice that in the code below, I don't know min or max at the start. This means I have to loop through all of the tags twice, calculating these statistics the first time.

To give this code a real life context, I am generating a tag cloud out of the categories list that blogger generates. The markup that I am looping through is in the following format:

<div class="widget-content" id="categories">
 <ul>
  <li>
   <a dir="ltr" href="http://jmar777.blogspot.com/search/label/jQuery">jQuery</a>
   (<span dir="ltr">6</span>)
  </li>
  <li>
   <a dir="ltr" href="http://jmar777.blogspot.com/search/label/jTruncate">jTruncate</a>
   (<span dir="ltr">2</span>)
  </li>
  [etc...]
 </ul>
</div>

And here's the heavily commented javascript:

$().ready(function() {
 var catContainer = $('div#categories');
 // get an array of all the <li>'s
 var categories = catContainer.find('ul li');
 var cloudMarkup = '';
 // set maxPercent/minPercent preferences
 var maxPercent = 150, minPercent = 100;
 // note that max is initialized to a number that I know is lower than the max count
 // and that min is set to a number larger than the known min count
 var max = 1, min = 999, count = 0;
 // loop through each li and calculate statistics
 categories.each(function(i) {
  count = parseInt($(this).find('span').text());
  max = (count > max ? count : max);
  min = (min > count ? count : min);
 });
 var total, link, size;
 var multiplier = (maxPercent-minPercent)/(max-min);
 // loop through each li and generate the markup for the new tag cloud
 categories.each(function(i) {
  count = parseInt($(this).find('span').text());
  link = $(this).find('a');
  size = minPercent + ((max-(max-(count-min)))*multiplier) + '%';
  cloudMarkup += '<a class="cloud_link" style="font-size:' + size + '" href="' + link.attr('href') + '">' + link.text() + '</a> ';
 });
 // replace the html content of the parent container with the new tag cloud
 catContainer.html(cloudMarkup);
});

For you jQuery users out there - I am considering turning this into a plugin. I would like to hear from you what the expected markup should look like. For example, <a href="#" rel="[count]">tag</a> or <a href="#">tag ([count])</a>. Hopefully we can come up with something that doesn't completely violate the purpose of the rel tag, like that example! Thanks for your input!



42 comments:

Anonymous said...

Wonderful and a jquery plugin would be awesome.I personally feel you can choose any markup that makes things simple for you.

Anonymous said...

Why give it to the plugin users choice and let him define via plugin constructor the way he wants his tag to be.

Just to think for. It could not be this vast effort to implement it this way.

Best regards,
Skyborg

Jeremy Martin said...

I agree that it would be ideal to let the user determine the markup - just not sure yet on how to let them specify that in a flexible way...

Anonymous said...

While you're optimising your loop why don't you simplify
((max-(max-(count-min)))

to just

(count - min)?

Or do you have a typo in there somewhere?

Jeremy Martin said...

@Anonymous
Since count is updated during each iteration, I can't extract any expressions that contain count outside of the loop.

Anonymous said...

I would be very happy, if you could add an alternative routine to get the data for the cloud.

Like a text-file or data from JSON. The reason is, that not all people who use jQuery use it for HTML ;-)
Personally I use it for XUL (Firefox extension, etc.) as well.

Right now I am trying to create a GTD add-on for an application based on Mozilla and thus I am in need for a tag-clound jQuery plugin, that reads the tags from a database and then creates the tag-cloud from it.

Ken said...

Jeremy--

Thanks for this post. I had the same thought as the March 21 anonymous regarding the ((max-(max-(count-min))). You said efficiency was a concern, so here's my two bits on optimzing the size calculation in the loop:

Just distribute the minuses out to the terms in the parenthesis, and you'll see it works out, even though "count" changes on each iteration. Here it is step-by-step (perhaps a little more verbose than needed).

a) ((max - (max - (count - min)))
b) ((max - (max - count + min))
c) (max - max + count - min)

"C" will give you the exact same result as "A", and since the "max-max" cancels out, you can further simiplify it to be:

d) count - min

That leaves your code as the following:

var multiplier = (maxPercent-minPercent)/(max-min);
var size = minPercent + (count-min)*multiplier;

The nice thing about the above code is that math-minded folks will instantly recognize you're doing a linear interpolation of "count" between two numbers ("maxPercent" and "minPercent"). That was a bit obscured in the original code.

With a bit of explanation to someone of what interpolation is, you can even easily visualize in your minds-eye what is happening in the above code.

This is the easiest to understand version of the code, but you can introduce another variable optimize the loop further:

a) var size = minPercent + (count-min)*multiplier;
b) var size = minPercent + count*multiplier - min*multiplier;
c) var size = (minPercent - min*multiplier) + count*multiplier;
d) var offset = minPercent - min*multiplier; var size = offset + count*multiplier;

with "offset" being calculated outside of the loop (note that no terms in "offset" use "count" so this is safe to do). This means that inside of the loop we have 1 multiplication and 1 addition, whereas it used to be 4 additions/subtractions and 1 multiplication. Notice that the only things we're pulling out of the loop are the constants. Anything involving "count" stays in the loop.

This leaves your code as the following

var multiplier = (maxPercent-minPercent)/(max-min);
var offset = minPercent - min*multiplier;
var size = count*multiplier + offset;


A little harder to visualize what's going on in this version, but it is fine with a comment.

Try this; it will work. What your algorithm is basically doing is a linear interpolation.

Jeremy Martin said...

@Ken
Thanks for that very thorough explanation! I guess I was too hasty in my response to the 3/21 comment. Didn't we learn that stuff in like 5th grade or something...? I'll add this to my growing list of updates - too many smart people correcting me, not enough time...

jacob said...

Your algorithm has a problem if all of the tags have exactly the same number of occurrences. In this case max=min and you would get a divide by zero error.

Jeremy Martin said...

@Jacob
Uh... so it does. I swear, this post is taking me to school... Thanks for pointing that out!

rapin said...

This is great. Playing with it now. As for markup, maybe something simple like class="cloudable" rel="rank"?

I Personally prefer the rank/weight indicator to be in an attrib.

ege said...

Thank you very much for the article. It took me about 25 mins to apply it, not on the client side but server side using php. Your article is the simplest tutorial that I have found on the net. Special thanks go to Ken, as I used his refined version.

terminals-blocks said...

Just to think for. It could not be this vast effort to implement it this way.

webbod said...

I liked the idea, but I have a large set of data, the distribution is such that I have a lot of rare tags and a few very common ones - with hardly anything in between - I'm looking at product sales from a large inventory.

Some products sell 1 unit, others 50, 100, 500, 1000, etc... but one product sells 250,000 which means your algorithm gives me a few enormous tag and maps everything else down to the mininium size.

I need to be able to pull out the detail at both ends of the scale.

Using Log10(Count) worked really well and produces really compelling sales charts.

Thanks for the idea - 10 minute metrics - Mwhahahaha!

toothygoose said...

Here is a mootools/functional programming variant. Tidies away all the variables into its own namespace:


var TagCloud = {
apply: function(selector, options) {
options = options || {};

var maxPercent = options['maxPercent'] || 150;
var minPercent = options['minPercent'] || 100;
var retrieveCount = options['retrieveCount'] || function(element){element.getAttribute('count').toInt()};
var apply = options['apply'] || function(element, size){element.setStyle('font-size', size + '%')};

var max = null;
var min = null;

var tagElements = $$(selector);

tagElements.each(function(element)
{
count = retrieveCount(element);
max = (max == null || count > max ? count : max);
min = (min == null || min > count ? count : min);
});

var multiplier = (maxPercent-minPercent)/(max-min);

tagElements.each(function(element)
{
count = retrieveCount(count);
size = (minPercent + (count - min) * multiplier);
apply(element, size);
});
}
};

TagCloud.apply('ul#tags > li');

toothygoose said...

Whoops...

make that:

var retrieveCount = options['retrieveCount'] || function(element){return element.getProperty('count').toInt()};

Keep forgetting that the return statement is required in JavaScript. Lazy!

Anonymous said...

big think it was exactly what I searched, with a min and max!

I did it in PHP but will perhaps look after to make in JQuery because it's display work

think again!

nodejs said...

Nice Plug-in and demonstration!

Joseph Buarao said...

great demonstration... can i use this to my premium template?

Jonas said...

Thanks! I have been trying to come up with a good way to do this for a long time.

Jerry M. Mingus said...

Excellent information to many people like to read articles to learn about these issues of great interest.Oups... my blog disappeared from Les Explorers list ; )So many people leave a comment in your blog, For this we can know that your blog is very Wonderful,Cheers for your performance, I hope you continue to post such good articles.
clipping path

digital signature said...

Excellent! Thanks for this - I've been looking at this feature for ages. Followed your instructions and it works a treat!

Martin001 said...

Nonetheless, display systems designed for ordinary text message aren't perfect for tags, since font measurements can vary commonly on a collection. Also, the normal structure doesn't account for romantic relationships which might be acknowledged involving tags. This kind of papers reveals products and algorithms to boost the particular display involving label confuses that incorporate in-line HTML, together with algorithms that utilize nested dining tables to attain a more standard 2-dimensional structure through which label from write my paper for cheap romantic relationships are considered. The primary algorithms influence before perform with typesetting and rectangle packaging, in contrast to the second band of algorithms influence before perform with Electric Style Automation.

thesis writing services said...

i agree with you since i had such problem while installing some plugin..

best essay writing services said...

wow excellent..i'm a fresher in html and development field.. custom essay writing services

Stephanie said...

Your blog has been a great help to me moving towards a paperless lifestyle. I’m still refining and trying to get it just right but I feel I’m making progress. I just wanted to let you know about my services I’ve been using that have been helping me with this endeavor.

Working at best online mba programs

Jennifer said...


My compliment about this blog is very positive I visit this blog first time and impress by this stuff nice work. Great Post Keep Posting such a good information.

college paper writing service

Forte School said...

Forte schools offer a FREE trial lessons for classes and some offer a Free Trial private lesson.

aliya seen said...

Math is always my favorite subjects becasue of excellen teachers i have. The teaching statement math is also very important for students nd teachers also.

Badan Singh said...

סדנת מכירות קורס מכירות וסדנת מכירות להכפלת המכירות עם אימון מכירות מקצועי קורס/הדרכת/סדנת מכירות המבוסס על ידע של 100 אנשי המכירות הטובים בעולם, קורס אומנות המכירה המעשי להזנקה מיידית של המכירות.

Badan Singh said...

Escort Geneva Genève Sex Switzerland Escort service agency, find nice and beautiful girl in Geneva!

Badan Singh said...

deutsch porno Die kleine Schlampe lässt sich kostenlos auf dem Sofa durchficken und das Beste daran ist, dass du keinen Cent dafür bezahlen musst um dir das heisse kostenlose deutsch porno anzuschauen.

turkpornotv said...

Deutsche Porno, Klicken Sie sich durch unsere riesigen Archive mit kostenlosen Pornos und gratis Pornofilme. Sofort Pornofilme gratis ansehen.

klicke hier : deutsche porno

turkpornotv said...

HD ve mobil uyumlu binlerce porno ve sikiş izleten site, tek kalite altında bu kadar videoya ulaşmak artık daha kolay.. porno izle


writing essays for cheap said...

Really great, jquery plugin is just awesome. And expecting more from you, and waiting for that. Thank you for sharing :)

Shiny Elena said...

Wow, you do understand this and you wrote very nice post. I got used to read reviews about the best US resume companies and to tell the truth that is the thing I understand better than the topic of your post. Anyway, thanks for great content.

Badan Singh said...

gigolo Elite High-class Escorts and Boys are waiting to be contacted by you. Find your ideal VIP Escort date here and enjoy unforgettable moments. Busty luxury and classy escorts and models for Gentlemen that will surpass your high expectations. Sex-Escort is an international directory of Independent Girls and Ladies and Escort Agencies, only elite companions who are perfect for any social engagement public or private, from blondes to brunette, Asiatic or Italian, French or Russian. You can select any exciting city in the World and find the ideal female or male independent escort, be that Paris, Rome, New York, London or Tokyo.

Badan Singh said...

umproducts If you are serious about Penis Enlargement, then the new Hydromax is the pump for you. 15% more powerful than the original hercules! The X30 Xtreme takes your hydrotherapy workout to a whole new level. It's our most powerful pump to date with the same unique globally patented design, and comes with many additions. The bathmate Hydromax X30 is our best seller at 20-30% off Retail price, you cant beat it. Order Today and gain 1 inch in 1 Week with the bathmate hydromax x30, the hercules original or the new bathmate Xtreme ! USE CODE "10OFFMYORDER" and save $10 !!

Edna Shirley said...

Writing services place an unavoidable role in every student's life. They are preparing almost writing tasks of the students from various level. Not only essays they also offering case study essay writing service at affordable price. So that students can easily submit their case studies effectively at cheap rate.

essay writing service USA said...

I am so glad to be given a chance to read your wonderful article. I am looking forward to read more of your works and posts. keep on sharing.

David Clarke said...


When you get a college task that you cannot complete by oneself it seems to be really frustrating! Don't panic! Perfect essays online service gives you a cool chance to remove your college stress! Our writers will complete any type of the college task really quickly and you won't be sad!

Admission essay writing service said...

Good to read! Thanks for describing the topic in an impressive and convincing manner. Please keep on sharing!