This version of the page http://www.panoramio.com/blog/it-cant-be-so-hard/ (0.0.0.0) stored by archive.org.ua. It represents a snapshot of the page as of 2008-08-17. The original page over time could change.
It can’t be so hard… at Panoramio

Panoramio's Blog


It can’t be so hard…

December 27th, 2005 by Joaquín Cuenca Abela

Do you remember my last post about my javascript compressor?

Óscar Frías, of Trabber fame, asked me for my opinion on the dojo javascript compressor, and my reply was along the lines of: “slightly better than mine, as it compresses inner variable / function names”. As this seemed a trivial difference to overcome, I coded this compression into my as yet unnamed compressor.

There are only two cases that I have not handled:

  1. If you have variables named _1, _2, … things will break, quickly. This is a pretty trivial bug that I have to fix asap. I guess that dojo gets that right.
  2. If you have “with” statements, things may break.
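For the first case, one possible fix is to make the renamer skip any generated name the program already uses. This is a minimal sketch, assuming the compressor has already collected the identifiers in scope; `makeNameGenerator` and the sample names are hypothetical and not part of the compressor described here:

```javascript
// Hypothetical helper: yields _1, _2, ... but skips names the program already uses.
function makeNameGenerator(usedNames) {
  const taken = new Set(usedNames);
  let counter = 0;
  return function nextName() {
    let candidate;
    do {
      counter += 1;
      candidate = "_" + counter;
    } while (taken.has(candidate));
    taken.add(candidate); // never hand out the same name twice
    return candidate;
  };
}

// The source already declares _1 and _3, so those must not be reused.
const nextName = makeNameGenerator(["_1", "total", "_3"]);
const generated = [nextName(), nextName(), nextName()];
console.log(generated); // [ '_2', '_4', '_5' ]
```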

I wonder if dojo handles that second point. Let me elaborate with an example.


function a()
{
    function getElementById(a) { return "hi"; }
    with (document) alert(getElementById("xxx"));
}
a();

This code should show an alert dialog box with “null”. If we compress it, renaming inner variables / functions, we get:


function a()
{
    function _1(a) { return "hi"; }
    with (document) alert(_1("xxx"));
}
a();

(Note the compressor doesn’t know that the getElementById inside the with statement was in fact document.getElementById, and thus it changes it to _1.)

This new code will show “hi” instead of “null”.
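The difference can be reproduced outside a browser. The sketch below is an illustration under assumptions: `fakeDocument` is a hypothetical stand-in for the browser's `document` whose `getElementById` returns null for a missing id, and `new Function` is used only so that the bodies run in non-strict mode, where `with` is allowed.

```javascript
// Hypothetical stand-in for the browser's document object.
const fakeDocument = { getElementById: function (id) { return null; } };

// Original code: inside `with`, the name resolves to document.getElementById.
const original = new Function("document", `
  function getElementById(a) { return "hi"; }
  with (document) return getElementById("xxx");
`);

// Renamed code: document has no _1 property, so the inner function is called.
const renamed = new Function("document", `
  function _1(a) { return "hi"; }
  with (document) return _1("xxx");
`);

const originalResult = original(fakeDocument);
const renamedResult = renamed(fakeDocument);
console.log(originalResult, renamedResult); // null hi
```

A renamer that wants to stay safe could simply refuse to rename any identifier that is referenced inside a `with` block.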

Despite this corner case, the new compression is well worth having.

But I didn’t build a new compressor just to be a shadow of dojo’s. I want to build the best compressor ever, and such a compressor should fix what I consider a growing problem in the javascript community.

Javascript frameworks are growing in number / quality / size. Dojo, prototype / scriptaculous, rico. You name it!

Some frameworks already weigh in at megabytes, and to keep the download manageable for users, they split the framework into several files. The developer is then supposed to pick the javascript files containing the code that he will actually use. But it’s not working.

If I want to use a simple smooth blink effect with scriptaculous, I have to bring in the whole effects.js file and the whole prototype.js file, even though I’m obviously only using a small part of these files. Bigger pages very quickly end up using all the scriptaculous files, even when they are “only” using an Ajax.Request here and a little effect there.

One approach to cutting the bloat is to do manual surgery on these files to build a new page with the minimum needed to run, for instance, some effects (as moo.fx does). But this is time consuming, requires a careful human expert to cut down the bloat and still get something useful, and has to be done on a page-by-page basis. In the end, it boils down to copying and pasting parts of your framework into a separate file. That is too much work just to use a framework without putting an unacceptable burden on the shoulders of our users.

The approach that I have in mind is akin to the previous one, but done automatically. A program parses your pages and works out exactly the minimum javascript needed for what you are doing. It then rewrites your page / javascript to use this minimum.

So, in short, my goal is to prune dead code from javascript files, as linkers do in other languages.

Unfortunately, due to the highly dynamic nature of javascript, this is impossible to do without building a full javascript engine. Remember that functions are also objects in javascript, and thus they can be freely copied into variables. When you do “a()”, you have to evaluate “a” to know what is getting called.
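To make the point concrete, here is a small sketch (the function names and the `effects` table are made up for illustration) where the call target is only known at run time:

```javascript
function blink() { return "blink"; }
function fade() { return "fade"; }

// Functions are values: they can be stored in a table and looked up by a computed key.
const effects = { blink: blink, fade: fade };

function run(name) {
  const a = effects[name]; // what "a" refers to depends on runtime data
  return a();
}

const called = run("fa" + "de");
// A purely textual scan of run() never sees the target's name, yet it was just called.
const mentionsTarget = run.toString().includes("fade");
console.log(called, mentionsTarget); // fade false
```

A pruner that only looks for textual call sites would wrongly conclude that one of these functions is dead code.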

At this point some part of my brain started to think “well, it’s just a javascript engine, it can’t be so hard…” and somewhere, a few days ago, it convinced the other part of my brain. So here I am, with a javascript engine that evaluates all the javascript statements but for’s, switches, try / catches and functions, and all the javascript expressions but constructors, function calls, arrays and objects. You can call it the most overengineered calculator, ever. But I felt really great when I wrote 1 + 1, and it replied: 2.

The really big missing part in this engine is the ECMA object system, which I failed to implement correctly on my first try. I will still need a full day or two to finish it, and then I hope the remaining parts will just fall into place.

When I finish the javascript engine, I will have to figure out how to do dead code pruning with it. I’m thinking of using a garbage-collector-like algorithm, but doing it without falling into an exponential explosion of cases seems non-trivial. Well, I will cross that bridge when I get to it, I guess.
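One way to picture a garbage-collector-like pruner is a mark phase over a call graph: start from the functions the page calls directly, mark everything reachable, and treat the rest as dead. The sketch below assumes the call graph has already been computed, which is precisely the hard part that needs the engine; `callGraph`, `reachableFrom`, and the function names are hypothetical.

```javascript
// Hypothetical call graph: each function name maps to the functions it may call.
const callGraph = {
  pageInit: ["blink"],
  blink: ["setOpacity"],
  setOpacity: [],
  dragDrop: ["setOpacity", "trackMouse"],
  trackMouse: [],
};

// Mark phase: collect every function reachable from the roots.
function reachableFrom(roots, graph) {
  const marked = new Set();
  const worklist = [...roots];
  while (worklist.length > 0) {
    const name = worklist.pop();
    if (marked.has(name)) continue; // already visited
    marked.add(name);
    for (const callee of graph[name] || []) worklist.push(callee);
  }
  return marked;
}

// Sweep phase: anything not marked can be pruned.
const live = reachableFrom(["pageInit"], callGraph);
const dead = Object.keys(callGraph).filter((name) => !live.has(name));
console.log(dead); // [ 'dragDrop', 'trackMouse' ]
```

With a fixed graph each function is visited once, so the traversal itself is cheap; the explosion of cases comes from deciding what each dynamic call may target.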

Oh, btw, MERRY CHRISTMAS!!!


10 Responses to “It can’t be so hard…”  

  1. 1 Johan Sundström

    gzip(1) is the best javascript compressor available, putting the HTTP standardized, content preserving, Content-Encoding:gzip functionality to excellent use. If it is compression you’re after, anyway, that is the best route to go. Innerjoin has a nice tutorial on methods of setting up Apache with mod_gunzip. (If it is actually the code obfuscation and “raising the bar on external code reuse” you seek, it would be very pleasing on the eye to see it named/admitted as such.)

    That said, I still find your work on a javascript linker theoretically interesting, even if it would need to be very advanced to do true dependency analysis on code putting the dynamic capabilities of ecmascript to good use. I believe most frameworks today are rather naïvely coded, though, by people who write code in the “smallest common denominator” (between PHP and javascript, for instance) fashion, so it might be of some use.

    If you ever plan on releasing it, I hope any variable, property, object and function renaming will be optional.

  2. 2 Joaquín Cuenca Abela

    Hi Johan,

    As you can see, we are using mod_deflate (or mod_gunzip, I don’t remember) right now.
    I see that you have now seen my reply to your previous comment, so that more or less answers your question about the goal of this work.

    I have several goals, and each stage brings a little tool useful by itself. I did not want to attack the big problem of doing a javascript linker or a javascript leak detector first, because they are quite daunting, and I want to have something to show if my motivation fails.

    These are the tools I’m thinking of:

    1. With a parser, I can compress the output a bit, so far in a fully recoverable way. The new renaming is not fully recoverable, and thus it will be optional. Actually, even the “compress spaces” step will be optional, leaving the choice of pretty printer / compressor to the user. (In the future I will myself use it to compress spaces, leave newlines in place, and not rename.)

    Given that I have not even compressed Panoramio’s code (except by using mod_deflate), even though my tool can correctly parse and rewrite both Panoramio’s code and Google Maps’ code, it should be pretty clear that I’m not trying to obfuscate anything. (But as I said, when I finish the pretty printer I will remove extra spaces.)

    Btw, I have discovered an ecmascript conformance bug in Google Maps that every browser parses and executes as Google Maps’ authors expected.

    2. With a javascript engine, I can do extra tests on the code. For instance, I would love it if it could tell me “hey, here you are using foo, and that’s not supported in IE5. Do bar to make it compatible.” Unfortunately I will have to clone the host objects to get useful warnings, and that will take a bit of time.

    It’s also mandatory for a leak detector, but again, I will need to mirror host objects. The only tool to help diagnose leaks that I am aware of is Joel’s, and it crashed when I tried to use it on Panoramio.

    3. The third tool will probably be the linker. I think the community really has a problem here. I’m always copying little functions like prototype’s $ to avoid imposing the burden of prototype on light pages. Or cutting debug code that can be statically proven unreachable. But as you say, it’s not gonna be easy.

    It is also an enjoyable learning experience. The last time I coded in O’Caml was years ago, and I’m rediscovering the language with this project. And even though I thought I had mastered javascript quite well, I now have a much deeper understanding of it.

    I hope this clears up the “what are you trying to do here” question.

    Cheers,

  3. 3 pablete

    You are trying to solve the wrong problem. The javascript files you load on a page are cached by the browser, so there is no need to constantly reload them. You need speed for the first download, so use gzip with apache or lighttpd; this will give you 10x compression on your javascript files (plain text files). That’s everything you need.
    With all due respect, you are solving a problem only because of the beauty of it.
    I think you need to rest a little bit, and take some days off. I am not being ironic or worse. It’s just a piece of advice. I have been following your development since jose florido told me to, a while ago, and you have done impressive work with panoramio.

    By the way, your idea of making stripped versions of the libraries you use for different cases (when you need only a few functions) can be done offline just once, for particular cases, with some other simple language like ruby or perl. It’s a very good idea actually.

    Cheers,

  4. 4 Johan Sundström

    It certainly does, and with lots of eager anticipation of any other bits or pieces to come out of either project.

    I would have lots of use for a tool to do just custom formatting of code (and am a bit too lazy to hack any of the C parsers on offer today). An Ocaml first encounter for that would probably be lots of fun; it seems a very nice language for writing really efficient code (my experience with the language mostly consists of the perspective given by Doug’s Great Computer Language Shootout, where the Ocaml implementations on several benchmarks beat most other languages by a margin, including Pike, which is my home environment).

    You piqued my curiosity about the ecmascript conformance bug you found; any more detailry available on that topic?

    2. also sounds an amazingly useful prospect. I hope you’ll post some notice about it, should you ever want co-developers on either of these “spin-off” projects.

  5. 5 Joaquín Cuenca Abela

    Hi Pablete & Johan,

    You are slightly overestimating gzip’s abilities, but otherwise I mostly agree. Again, a compressor / pretty printer is the only thing I could do with a javascript parser, and I need it anyway for other things. It also yields a 2x compression, which is not entirely irrelevant.

    But, if we forget about the compressor, I hope we will agree the linker / memory leak detector / “issue warnings on compatibility problems” projects are worth it. All of them need a javascript parser first.

    You said that stripped versions of libraries can be done offline. That’s the whole point (I don’t want to do it online), but to do it *correctly* you need a javascript engine.

    You can do some quick hack in ruby, perl (or caml, btw) that will work most of the time, in the same sense that you can use some regular expressions to do a quick compressor. It will not however cover all cases, and as today’s toolkits keep growing, your chances of hitting a bug when trying to prune dead code in each toolkit will grow.

    For things like a memory leak detector, there is no way around doing a full javascript engine. A quick & dirty perl script will not cut it. And I don’t know about you, but I’m using a personalized version of the Google Maps code due to all the leaks in the original version. Maybe they have improved in .28 / .29, but anyway, if Google is having problems attacking this with traditional tools, then it means everybody has a problem. (And unfortunately it is not as easy as just disconnecting all your event handlers on unload.)

    Johan, the ecmascript conformance bug in google maps is that they put a function declaration (two, in fact, I think) inside an if block, something like:

    if (foo) { function bar() { return "hi" } alert(bar()) }

    Function declarations are only legal at top level and directly in the body of another function, so this is illegal. As everybody else handles this case, I also added it to my parser. (Btw, I have already pointed out this problem to the Google Maps guys.)

    Oh, and I may even follow Pablete’s advice and take some rest.

    Happy holidays guys!

  6. 6 Johan Sundström

    Your work has lots of merit, and the automated leak detection is certainly another dazzling prospect. Thanks for the ecmascript hint; while I would not likely write that kind of code myself, it’s a useful bit of knowledge to possess.

    And happy holidays to you, too!

  7. 7 Joaquín Cuenca Abela

    For people following this, I completed the evaluation of all the statements and expressions, except for function declarations / calls and constructors.

    I rewrote the object framework, only to discover that I had made it not generic enough for host objects. So in short I screwed it up. *Again!* It seems that I will need to use O’Caml objects after all.

    At least I was able to implement the basic Object and Function objects (with their prototypes and constructors), so I will maybe be able to implement function calls and object constructors with the current framework.

    Everything else seems to be working, with caveats. For instance, I convert exception objects to javascript strings, as I have not yet implemented these objects (but they are correctly thrown / propagated / caught).

    In the meantime, I managed to discover two bugs in SpiderMonkey. Brendan Eich, being the incredible hacker he is, has already fixed one of them, and the other one seems to be in the pipeline.

    I also discovered three bugs in the grammar published in mozilla’s site. I have yet to email Waldemar about them.

    I think that I will take a rest for a few days to work on some Panoramio features. And maybe even take a rest to do nothing… you know, to rest.

    Cheers,

  8. 8 Brad Neuberg

    If you layer a packaging system over JavaScript, as Dojo has with its dojo.provide and dojo.require methods, then the problem you are trying to solve becomes much easier.

  9. 9 Joaquín Cuenca Abela

    Yes, but you still have to manually annotate what you’re going to use, and what you provide in each file.

    And besides that, it will only work on those frameworks that provide such functions.

