jMar"s Blog DevSmash Developer Portal

Friday, March 7, 2008

Efficient Tag Cloud Algorithm

Hey all - since tag clouds are so popular these days, I thought I'd make a quick post about the algorithm I'm using to generate my tag cloud. I found a few sample algorithms already out there, but many of them seemed either inefficient, or didn't return the size that I was expecting.

Considering that the algorithm I'm using for my blog is running on the client side, I wanted to make sure that it was especially efficient. Here's what I ended up coming up with.

First, let's define the variables that are being used:

maxPercent
The font size is set as a percentage. This is the font-size percentage that the largest (most frequent) tag should be set to.
minPercent
This is the font-size percentage that the smallest (least frequent) tag should be set to.
max
This is the number of occurences for the most frequent tag.
min
This is the number of occurences for the least frequent tag.
count
This variable should be set inside of the link iterator. It refers to the number of occurences for the current tag.

Before I show some sample code for this, let's look at the actual algorithm for calculating the size:

var size = minPercent + ((max-(max-(count-min)))*(maxPercent-minPercent)/(max-min));

Since count is the only variable that changes during each iteration, we can extract the expression (maxPercent-minPercent)/(max-min) into its own constant and define it outside of the loop. This will save us two subtractions and a division operation for every loop. This now yields the following algorithm:

var multiplier = (maxPercent-minPercent)/(max-min);
var size = minPercent + ((max-(max-(count-min)))*multiplier);

The mathematical brilliance (sorry) behind this simple function isn't immediately obvious by looking at it, but the size returned will adhere to the following rules:

  1. The least occuring tag(s) will have a font-size of minPercent.
  2. The most occuring tag(s) will have a font-size of maxPercent.
  3. Tags with occurence counts in the middle will be scaled linearly.

Ok, let's look at this algorithm in action. The sample below uses a few jQuery methods (you all saw it coming), but this algorithm can certainly survive apart from any JS library. Notice that in the code below, I don't know min or max at the start. This means I have to loop through all of the tags twice, calculating these statistics the first time.

To give this code a real life context, I am generating a tag cloud out of the categories list that blogger generates. The markup that I am looping through is in the following format:

<div class="widget-content" id="categories">
 <ul>
  <li>
   <a dir="ltr" href="http://jmar777.blogspot.com/search/label/jQuery">jQuery</a>
   (<span dir="ltr">6</span>)
  </li>
  <li>
   <a dir="ltr" href="http://jmar777.blogspot.com/search/label/jTruncate">jTruncate</a>
   (<span dir="ltr">2</span>)
  </li>
  [etc...]
 </ul>
</div>

And here's the heavily commented javascript:

$().ready(function() {
 var catContainer = $('div#categories');
 // get an array of all the <li>'s
 var categories = catContainer.find('ul li');
 var cloudMarkup = '';
 // set maxPercent/minPercent preferences
 var maxPercent = 150, minPercent = 100;
 // note that max is initialized to a number that I know is lower than the max count
 // and that min is set to a number larger than the known min count
 var max = 1, min = 999, count = 0;
 // loop through each li and calculate statistics
 categories.each(function(i) {
  count = parseInt($(this).find('span').text());
  max = (count > max ? count : max);
  min = (min > count ? count : min);
 });
 var total, link, size;
 var multiplier = (maxPercent-minPercent)/(max-min);
 // loop through each li and generate the markup for the new tag cloud
 categories.each(function(i) {
  count = parseInt($(this).find('span').text());
  link = $(this).find('a');
  size = minPercent + ((max-(max-(count-min)))*multiplier) + '%';
  cloudMarkup += '<a class="cloud_link" style="font-size:' + size + '" href="' + link.attr('href') + '">' + link.text() + '</a> ';
 });
 // replace the html content of the parent container with the new tag cloud
 catContainer.html(cloudMarkup);
});

For you jQuery users out there - I am considering turning this into a plugin. I would like to hear from you what the expected markup should look like. For example, <a href="#" rel="[count]">tag</a> or <a href="#">tag ([count])</a>. Hopefully we can come up with something that doesn't completely violate the purpose of the rel tag, like that example! Thanks for your input!



69 comments:

Anonymous said...

Wonderful and a jquery plugin would be awesome.I personally feel you can choose any markup that makes things simple for you.

Anonymous said...

Why give it to the plugin users choice and let him define via plugin constructor the way he wants his tag to be.

Just to think for. It could not be this vast effort to implement it this way.

Best regards,
Skyborg

Jeremy Martin said...

I agree that it would be ideal to let the user determine the markup - just not sure yet on how to let them specify that in a flexible way...

Anonymous said...

While you're optimising your loop why don't you simplify
((max-(max-(count-min)))

to just

(count - min)?

Or do you have a typo in there somewhere?

Jeremy Martin said...

@Anonymous
Since count is updated during each iteration, I can't extract any expressions that contain count outside of the loop.

Anonymous said...

I would be very happy, if you could add an alternative routine to get the data for the cloud.

Like a text-file or data from JSON. The reason is, that not all people who use jQuery use it for HTML ;-)
Personally I use it for XUL (Firefox extension, etc.) as well.

Right now I am trying to create a GTD add-on for an application based on Mozilla and thus I am in need for a tag-clound jQuery plugin, that reads the tags from a database and then creates the tag-cloud from it.

Ken said...

Jeremy--

Thanks for this post. I had the same thought as the March 21 anonymous regarding the ((max-(max-(count-min))). You said efficiency was a concern, so here's my two bits on optimzing the size calculation in the loop:

Just distribute the minuses out to the terms in the parenthesis, and you'll see it works out, even though "count" changes on each iteration. Here it is step-by-step (perhaps a little more verbose than needed).

a) ((max - (max - (count - min)))
b) ((max - (max - count + min))
c) (max - max + count - min)

"C" will give you the exact same result as "A", and since the "max-max" cancels out, you can further simiplify it to be:

d) count - min

That leaves your code as the following:

var multiplier = (maxPercent-minPercent)/(max-min);
var size = minPercent + (count-min)*multiplier;

The nice thing about the above code is that math-minded folks will instantly recognize you're doing a linear interpolation of "count" between two numbers ("maxPercent" and "minPercent"). That was a bit obscured in the original code.

With a bit of explanation to someone of what interpolation is, you can even easily visualize in your minds-eye what is happening in the above code.

This is the easiest to understand version of the code, but you can introduce another variable optimize the loop further:

a) var size = minPercent + (count-min)*multiplier;
b) var size = minPercent + count*multiplier - min*multiplier;
c) var size = (minPercent - min*multiplier) + count*multiplier;
d) var offset = minPercent - min*multiplier; var size = offset + count*multiplier;

with "offset" being calculated outside of the loop (note that no terms in "offset" use "count" so this is safe to do). This means that inside of the loop we have 1 multiplication and 1 addition, whereas it used to be 4 additions/subtractions and 1 multiplication. Notice that the only things we're pulling out of the loop are the constants. Anything involving "count" stays in the loop.

This leaves your code as the following

var multiplier = (maxPercent-minPercent)/(max-min);
var offset = minPercent - min*multiplier;
var size = count*multiplier + offset;


A little harder to visualize what's going on in this version, but it is fine with a comment.

Try this; it will work. What your algorithm is basically doing is a linear interpolation.

Jeremy Martin said...

@Ken
Thanks for that very thorough explanation! I guess I was too hasty in my response to the 3/21 comment. Didn't we learn that stuff in like 5th grade or something...? I'll add this to my growing list of updates - too many smart people correcting me, not enough time...

jacob said...

Your algorithm has a problem if all of the tags have exactly the same number of occurrences. In this case max=min and you would get a divide by zero error.

Jeremy Martin said...

@Jacob
Uh... so it does. I swear, this post is taking me to school... Thanks for pointing that out!

rapin said...

This is great. Playing with it now. As for markup, maybe something simple like class="cloudable" rel="rank"?

I Personally prefer the rank/weight indicator to be in an attrib.

ege said...

Thank you very much for the article. It took me about 25 mins to apply it, not on the client side but server side using php. Your article is the simplest tutorial that I have found on the net. Special thanks go to Ken, as I used his refined version.

terminals-blocks said...

Just to think for. It could not be this vast effort to implement it this way.

webbod said...

I liked the idea, but I have a large set of data, the distribution is such that I have a lot of rare tags and a few very common ones - with hardly anything in between - I'm looking at product sales from a large inventory.

Some products sell 1 unit, others 50, 100, 500, 1000, etc... but one product sells 250,000 which means your algorithm gives me a few enormous tag and maps everything else down to the mininium size.

I need to be able to pull out the detail at both ends of the scale.

Using Log10(Count) worked really well and produces really compelling sales charts.

Thanks for the idea - 10 minute metrics - Mwhahahaha!

toothygoose said...

Here is a mootools/functional programming variant. Tidies away all the variables into its own namespace:


var TagCloud = {
apply: function(selector, options) {
options = options || {};

var maxPercent = options['maxPercent'] || 150;
var minPercent = options['minPercent'] || 100;
var retrieveCount = options['retrieveCount'] || function(element){element.getAttribute('count').toInt()};
var apply = options['apply'] || function(element, size){element.setStyle('font-size', size + '%')};

var max = null;
var min = null;

var tagElements = $$(selector);

tagElements.each(function(element)
{
count = retrieveCount(element);
max = (max == null || count > max ? count : max);
min = (min == null || min > count ? count : min);
});

var multiplier = (maxPercent-minPercent)/(max-min);

tagElements.each(function(element)
{
count = retrieveCount(count);
size = (minPercent + (count - min) * multiplier);
apply(element, size);
});
}
};

TagCloud.apply('ul#tags > li');

toothygoose said...

Whoops...

make that:

var retrieveCount = options['retrieveCount'] || function(element){return element.getProperty('count').toInt()};

Keep forgetting that the return statement is required in JavaScript. Lazy!

Anonymous said...

big think it was exactly what I searched, with a min and max!

I did it in PHP but will perhaps look after to make in JQuery because it's display work

think again!

nodejs said...

Nice Plug-in and demonstration!

Joseph Buarao said...

great demonstration... can i use this to my premium template?

Jonas said...

Thanks! I have been trying to come up with a good way to do this for a long time.

marko said...

It is constantly difficult to admit to budgetary issues, and many individuals let their obligation get totally wild before they look for offer assistance. Your family and companions will no doubt bail you out with your money related issues. They will presumably be more understanding than you anticipate.cash advance chicago

Alamin Miah said...

Its like you read my mind! You appear to know so much about this, like you wrote the book in it or something. I think that you could do with some pics to drive the message home a little bit, but instead of that, this is magnificent blog. A fantastic read. I’ll certainly be back. My Blog http://blockbtheatre.myfreesites.net/

Mirabbos Mirfayozov said...

I really impressed after read this because of some quality work and informative thoughts . I just wanna say thanks for the writer and wish you all the best for coming!
downlodable serial numbers

rustam said...

I think this is an informative post and it is very useful and knowledgeable. therefore, I would like to thank you for the efforts you have made in writing this article.

link for you

Syaniezt Barbie said...

The Article is very interesting and I like it. Agen jual fiforlif Balikpapan , Harga Fiforlif di Balikpapan , Jual Fiforlif Murah di Balikpapan , Manfaat Fiforlif , Distributor Fiforlif di Balikpapan

Badan Singh said...

Homes for sale in gastonia NC There’s no better time and no better place to start your online home search than right here on this listing search page. We’re providing access to listings for participating brokers in the Belmont, Bessemer City, Cramerton, Gastonia, Kings Mountain, Lincolnton, Mount Holly, and Charlotte real estate markets. You won’t miss a single listing.

Muhammad Danial said...

Absolutely fantastic posting! Lots of useful information and inspiration, both of which we all need!Relay appreciate your work. 3d architectural animation companies in vietnam

Selena Gomez - Revival Tour said...

I've been using this tool since ancient times, and I've known all that changed, both positive and negative
nội tiết tố nữ

Badan Singh said...

modeloslux ModelosLux es la agencia de escorts prepagos mas vip de Medellin, Colombia. Contamos con un servicio de primera calidad donde el cliente no tiene que preocuparse de una unicamente escoger a su Modelo de preferencia. En Modelos Lux contamos con una basta experiencia que ponemos a toda su disposicion ademas de contar con las mejores escorts o prepagos de la ciudad de Medellin renovando nuestro catalogo periodicamente para asi ofrecerles nuesvas experiencias a nuestros clientes recurentes. No olvide en la agencia modeloslux hacemos volar su imainacion.

Badan Singh said...

WrestleMania 2017 WrestleMania 33 live stream free, matches, time, TV channel, full show, rumors, results info. WWE WrestleMania 33, WrestleMania 2017, how to watch online.

Lê Trang said...

It seems a bit hard for me, since I have not used this app ever. Can you talk in more detail than me?
nội tiết tố nữ , nội tiết tố 33+ , nội tiết tố là gì , estrogen

Mona afrin said...

Hey there, You’ve done an excellent job. I will certainly digg it and personally suggest to my friends. I’m sure they will be benefited from this website. My Blog http://subscene.pen.io/

Badan Singh said...

Virtual Reality Sex The newest thing to come to the porn industry is virtual reality. The AVN show which is the largest Adult Entertainment expo in the world has even added two new industry award categories.

Alex Conner said...

This site and the resources you provide is really nice keep it up.
Auditing Project help

SPSS Assignment Help said...

SPSS Assignment HelpMy friend recommended this blog and he was totally right keep up the fantastic work!

Mona afrin said...

You'll be directed a 147 guide on Forex Trading plus you'll receive one-part everyday of the 7 portion Forex Trading Program for the next seven days. My Blog http://www.checkmycloset.com/members/profile/4254/blog-view/blog_53596.htm

rustam said...

Thanks for taking the time to discuss this, I feel strongly about it and love learning more on this topic. If possible, as you gain expertise, would you mind updating your blog with extra information? It is extremely helpful for me.
thefreewarearchive.blogspot.com

Joanne Criss said...

Very Informative post you gave to us, really helpful I would like to share something really helpful for education. Assignment Paper Help

Alexander Sara said...

Thanks for sharing! If have a long time than visit to:
super smash flash | strike force kitty 2 | super smash flash 2 unblocked | super smash flash | strike force kitty | super smash flash 2

Wilson Larrie said...

Thanx for sharing such useful post keep it up :)online assignment help

Ales Noe said...

Things are very open and intensely clear explanation of issues. was truly information. Your website is very beneficial.
MBA Writing Service

Michal Jhonson said...

I’m really impressed with your article, such great & usefull knowledge you mentioned here
Help With Biology Assignment

Terry Bogard said...

Pretty helpful material, much thanks for this article
Do My C Project

Brian Joseph said...

by visiting this site I found cool stuff here keep it up.
Content Creation Service

Edward Mark said...

only professional writers can make this kind of material, cheers
custom coursework help

Finance Assignments said...

I appreciate this work amazing post for us I like it.
Finance Assignments Help Online

Java Assignment Provider said...

I loved the way you discuss the topic great work thanks for the share.
online java help

Ales Noe said...

I'm getting excited about this kind of beneficial information of your stuff in the future
Law Project Help Service

Wilson Larrie said...

This is really a great stuff for sharing. Thanks for sharing.
Psychology Project Help Service

pauledwards124 said...

The case study sample times and analyze the situation thoroughly. Make notes, look up for the important facts and highlight the crucial points.
Analysis Case Study

lisa roberts said...

Our website is number 1 in Assignment Provider, Assignment Help for University Students in UK, USA, Australia, and Malaysia. You Can Ask Every Thing You Want. Our Team Work 24/7.Affordable Assignment Help

Badan Singh said...

magie voyance Magic remains in this world. Fortune-telling and divinatory arts permit to have an experience of true magie voyance. You want to know your future ? You can both visit the website voyance-amour-eternel.com or contact a true clairvoyante fortune-teller able to deliver a voyance magie. You can discover tarot cards, astrology, numerology, kabbale. All these ancients arts allow you to step on the futur and see through your fate.

Badan Singh said...

Free Double Penetration Videos If you want latest double penetration videos then visit our site. For all latest sexy porn videos this is a must see.

Ann Lily said...

Great post. I was checking continuously this blog and I am impressed!
Very useful info specifically the last part :) I care for such info much.
I was looking for this certain information for a very long time.
Thank you and best of luck word cookies answers | word cookies game | word cookies game answers | hotmail email login | hotmail account login

Charlie said...

Free MapSure map improvements and updates keep your CoPilot current with changes in the road network. ownbig.ru From Niles Technology Group: Achievers Writing Center is a full-service writing center in an app.

dhow cruise in marina said...

It's contains very useful data which i need and i want to see more quality posts in this blog so please update your blog. Thanks for sharing.

Nathan said...

From Pike Street 3: Free Tip of the Day for iPhone makes you an iOS expert, one tip at a time. custom-samara.ru When planning your course, consider the skill level of your friends and yourself, safety equipment should be worn and is recommended.

pauledwards124 said...

Fascinating HR assignment help offered by our certified experts is it true that you are confronting issue finishing your human resource assignment.
HR Assignment Help Online

lisa roberts said...

Our website is number 1 in Assignment Provider, Assignment Help for University Students in UK, USA, Australia, and Malaysia. You Can Ask Every Thing You Want. Our Team Work 24/7.
assignment providers help

الاسمر المصري said...

شركة تسليك مجارى بالدمام

Badan Singh said...

Chatter dating Instarcams.com , free live cams & free live chat with Live cam girls , adult chat cams, be a live cam girl,

Badan Singh said...

sodomisée Plus d'1 million de vidéos pornos gratuites réunies sur un seul site. Wet4it est le 1er moteur de recherche porno 100% français (sexe, adulte, amateur, cul, masturbation, brunes, blondes, fetish, sodomie, ...).

Badan Singh said...

cougar The most diverse and complete collection of porn movies with the most sexy porn stars , sexy girls,busty milf with teen lesbian, gay or transexuals. Here are also the most beautiful models of webchat in the world, the most sexy escorts in the world and many more.Hard videos with assfuck,deepthroat,creampie,wet pussy,wet kissing,dirty mature and street hooker.

bwcajerky said...

Hello, i read your blog from time to time and i own a similar
one and i was just curious if you get a lot of spam remarks?
If so how do you prevent it, any plugin or anything you can suggest? visit my website
I get so much lately it’s driving me insane so any support is very much appreciated.

Samuel said...

It is recommended to complete itYou can set the touch sensitivity to control your snake from the game settings. kuznici.ru From Iceberg Reader: Knitting Under the Influence Claire LaZebnik 5 Spot November 15, 2008

Hoang kim said...

TRUNG TÂM BẢO HÀNH SỬA CHỮA PHÚC THỊNH
Xin chân thành cám ơn quý khách hàng đã ủng hộ và tín nhiệm Phúc Thịnh trong suốt thời gian qua. Chúng tôi luôn mong muốn sẽ là sự lựa chọn số một của khách hàng. Để đáp lại sự tin yêu và tín nhiệm của Quý Khách hàng chúng tôi sẽ nổ lực nâng cao chất lượng dịch vụ phục vụ khách hàng ngày một tốt hơn. Một lần nữa xin cám ơn và chúc cho Quý Khách Hàng luôn gặt hái những thành công vượt bậc trong lĩnh vực của mình.
Dịch vụ Công Ty Phúc Thịnh cung cấp:
1. Bảo hành máy giặt
2. Bảo hành máy nước nóng
3. Bảo hành lò viba vi song
4. Bảo hành tủ lạnh
5. Bảo hành tivi lcd led
6. Bảo hành máy lạnh

Anonymous said...

rainiertamayo.cc

Badan Singh said...

anal porno izle Xflix de amatör ve anal porno izle. Seksi sıcak Türk kadınlarının çıplak videolarını izle. Bu güzel kadınlar seksi ve anal seviyorlar.

SandraRoper said...

Could take you with the process of sideloading PPSSPP http://ppssppgold.org been showcased on numerous blogs.