Automatic Invariant Detection in Dynamic Web Applications

For the last year I have been working on my master's project, and two weeks ago I finally graduated. I did the project at Tam Tam, a full-service internet agency. It was a nice place to work, and if I did not have the opportunity to expand my own company, I would have applied for a job there.

The project was about automatically finding invariants in web applications. The initial focus was on finding invariants in the JavaScript parts, but later on we extended the scope a bit and also included invariants over the DOM. While most of the techniques I developed can be used in a very generic way, my implementation depends on Crawljax. I developed plugins for Crawljax, under the name InvarScope, that can automatically find these invariants and use them for regression testing.
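To give a rough idea of what "finding an invariant" means, here is a minimal sketch in Java. It is not the InvarScope code: the class name, the method names and the choice of a simple numeric range invariant are all assumptions made for illustration. The idea is to observe the values a variable takes in a known-good version, infer a property that held for all of them, and then flag a later run that violates it.

```java
import java.util.List;

// Hypothetical illustration only; none of these names come from InvarScope or Crawljax.
public class InvariantSketch {

    /** A candidate invariant over one numeric variable: lo <= value <= hi. */
    static final class Range {
        final double lo;
        final double hi;

        Range(double lo, double hi) {
            this.lo = lo;
            this.hi = hi;
        }

        /** Returns true if the given value satisfies the invariant. */
        boolean holds(double value) {
            return value >= lo && value <= hi;
        }
    }

    /** Infers the tightest range that every observed value satisfies. */
    static Range inferRange(List<Double> observed) {
        if (observed.isEmpty()) {
            throw new IllegalArgumentException("need at least one observation");
        }
        double lo = Double.POSITIVE_INFINITY;
        double hi = Double.NEGATIVE_INFINITY;
        for (double value : observed) {
            lo = Math.min(lo, value);
            hi = Math.max(hi, value);
        }
        return new Range(lo, hi);
    }

    public static void main(String[] args) {
        // Values seen for some variable in a known-good version (made-up data).
        Range invariant = inferRange(List.of(0.0, 3.0, 7.0));

        // Regression check against values seen in a later version.
        System.out.println(invariant.holds(5.0));  // prints "true"
        System.out.println(invariant.holds(12.0)); // prints "false"
    }
}
```

A real detector would track many variables and many kinds of properties at once; the range check is just the simplest case that shows the infer-then-check cycle.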

We submitted a paper based on my work to ICSE'11, and until it was finished I was not allowed to blog about or publish any of my work. Well, we made the deadline, so I can now release all of the code, my thesis and the paper itself.

The code I wrote is available in a subdirectory of the Crawljax plugins Google Code project. We're currently in the process of fixing all Maven dependencies, cleaning up some code and making it all work with the current Crawljax trunk version, so expect a binary release in a few days.

Don't hesitate to contact me if you have any questions!

Symphony CMS; the Best CMS?

I've been looking for a good Content Management System (CMS) for the last couple of days, after a colleague and I had a discussion about which CMS to use for our clients. Sometimes we have clients with specific needs that are difficult to fulfill using WordPress. Our usual solution was either to build some plugins or to use our custom-developed CMS. However, neither is a great solution. WordPress can be complicated for novice computer users and has a messy code base, and our own CMS is not really user-friendly either.

My colleague decided to try out ExpressionEngine. He bought the freelancer edition and has been trying things out. So far it all seems to work quite well, although the back-end can still be too complicated for our clients. Also, I hate the fact that you have to pay 300 dollars to use ExpressionEngine commercially. That's an added fee some customers would rather spend on other things.

So, I started to search for open-source CMSes myself and made a list of requirements.

  • It should not be page-based; it should allow you to model your own content. If you use a CMS that supports types/entities/resources/sections/whatever, you can create your own page type, but you can also create more advanced things like portfolio items, projects or products (yes, even a simple web shop is possible then).
  • The back-end should be as simple as possible.
  • It should be written in PHP, object-oriented if possible, and use MySQL for storage.
  • There should be a good, flexible templating engine for the views.
  • It should have a good plugin API.

Well, using this list it was a lot easier to search for the most fitting CMS, as quite a lot of CMSes are only page- or post-based. The list of possible candidates shrank by more than 75%. Eventually I found a CMS I had never heard of, but which seemed to have all the things we were looking for: Symphony CMS.

I've been trying it out for the last few days and I still haven't found any deal-breakers. Symphony CMS has a great website, a friendly community (because it's still small, I think), great features, a simple back-end and a small code base, and it can easily be extended by writing extensions.

Some things might cause problems for specific clients though: multi-file upload is non-existent (there's one extension, but it doesn't do what it should) and the WYSIWYG editor extensions, with support for placing images etc., don't seem to be integrated well enough with Symphony CMS yet. Well, maybe I'll just fix those two myself and contribute the fixes upstream. That is, if I have some spare time... :)

Dropbox on Your Own Server

I've always liked Dropbox, except for one thing: I don't trust them with my data. Also, it seems wrong to pay $10.00 for 50 GB of storage when you have your own server with much more storage on a fast network.

Well, finally there is a solution. It's called SparkleShare; it's completely open source and uses Git as a back-end. Today they released a very early alpha version and I tried it out immediately. After some trouble with the interface (you need to enter <username>/<reponame> in the folder input box if you use GitHub), everything worked great. However, I don't advise anybody to use it in production. It's still in development and may contain serious bugs. I can't wait until it matures and is ready for production use!

Crawljax 1.9 Released

We just released Crawljax 1.9, the project I'm working on for my master's thesis. It's mostly a "bug fix and clean up" release, but some important changes were made as well.

Crawljax 1.8 released

I'm working on a cool project for my master's thesis: Crawljax. Crawljax is a website crawler that supports JavaScript. It does this by opening a real browser such as Firefox and controlling it via WebDriver. The core of Crawljax does only that: crawling websites. However, there is a very flexible plugin system that allows you to do all kinds of cool things, such as creating a static mirror of an AJAX website or creating test suites for your AJAX web applications.

Download Crawljax now and give it a try!

To get a better grasp of what is possible, have a look at the Google Tech Talk a colleague of mine gave.

WordPress 2.9 released

WordPress 2.9 was just released. All in all this seems to be a great release again, including some features I had been looking forward to:

  • Easier bulk plugin upgrades
  • In-browser image editing
  • A trash for posts you remove, so you can undo the removal if necessary

Upgrade now!

Speeding up your website

A few days ago I stumbled upon the new Google "Let's make the web faster" page. I found some useful tips there. Some of them are common knowledge, for example that using echo in PHP is faster than print, that single quotes are faster than double quotes, and that echo accepts any number of arguments, which is faster than string concatenation (so you write echo 'this', $is, 'faster than', $concatenating, 'the string' instead of echo 'this' . $is . 'slower').
However, there's a nice article about omitting HTML tags, which had some tips I did not know. If you're using HTML instead of XHTML, you're allowed to omit more tags than most people know. For example, you don't need to close a paragraph; you can just start a new one. For the complete list, have a look at the articles section of the site.

My first Cherokee patch

I've been playing with Cherokee (the lightweight web server) for a while now. I really like the way its configuration file can be managed with cherokee-admin. This is basically a secured web page that provides a convenient interface to all of Cherokee's settings.
Although Cherokee looks great, I can't switch the Ivaldi web server to it because of a few problems:

  • No support for authentication against SHA1-hashed passwords from a MySQL database
  • No support for WebDAV/SVN; we currently use Apache's mod_subversion with authentication against a MySQL database.

The first point didn't seem so hard to fix, so I submitted a patch to the Cherokee project. The maintainer got in contact with me and had me sign a contributor agreement. I think this means that the code can be committed to their Subversion repository now.
This still leaves one problem before I can switch: WebDAV/SVN. I don't think I have enough knowledge to fix that. I might try to switch all our current sites to Cherokee though, and keep a lightweight, trimmed-down Apache for WebDAV/SVN.

Google Analytics API launched!

Google just announced the public availability of the Google Analytics API. This is great news, because you can now write your own (web) applications that use the data from your analytics account. I might integrate this with the CMS we wrote for my company.
The announcement is available here.

Magic C-style comments

A friend/colleague of mine had a really cool idea about C-style comments. C-style comments are multi-line comments that start with /* and end with */. They can be used in C, C++, C#, PHP, Java and a number of other languages. While debugging some code, this friend found himself switching blocks of code on and off all the time. He came up with a way to switch between two blocks of code by just removing or adding one character!
See for yourself at: http://blog.mycroes.nl/2009/04/magic-comments.html
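For readers who don't want to click through, here is a minimal sketch in Java of one well-known variant of this trick. I'm assuming it matches the idea in the linked post, which may differ in its details; the class and method names are made up for the example.

```java
// Hypothetical example; the class and method names are not from the linked post.
public class MagicComments {

    /** Returns the name of whichever block is currently switched on. */
    static String activeBlock() {
        String active;
        // Add one slash in front of the next line to switch blocks.
        /*
        active = "first";
        /*/
        active = "second";
        //*/
        return active;
    }

    public static void main(String[] args) {
        System.out.println(activeBlock()); // prints "second"
    }
}
```

As written, the block comment opened on the first marker line swallows the first assignment and is closed by the last two characters of the middle marker, so the second assignment is live. Put one extra slash in front of the first marker and it turns into a line comment: the first assignment becomes live, the middle marker now opens a block comment that hides the second assignment, and the last marker closes it, so the method returns "first".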