Please join me at my new location bryankyle.com

Wednesday, January 28, 2009

Java's Got it All Wrong

When I first started working on computers I was sold on the idea that Windows was the only way to go. However, as time went on I found that there was a lot I was missing out on. One of those things was Unix. I've learned a lot about it since my days as a hard-core Windows user, and the thing that has really stuck with me is the elegance of its design, in particular its approach of favouring processes over threads. That choice has huge implications that you never really understand until you run into a system that does things the other way around.

Recently I've been spending a lot of time thinking about web applications, and I keep coming back to the thought that the current design of Java application servers is flawed. Typically, an application server runs as a single process with many threads servicing requests. While this approach has been proven to work, I'm not convinced it's the right one. The real problem, as I see it, is the lack of process isolation, which becomes painful when many applications are deployed to the same server. To illustrate my point, let's look at an analogy: operating systems and processes.

One of the things operating systems do very well is manage processes. Each process gets its own little sandbox to play in, which lets the operating system keep track of everything the process is doing: what files it has open, how much memory it has allocated, and so on. That information is also made available to the users of the system, so they can monitor each process and terminate any that aren't working correctly. By exposing process information to the user, the operating system is essentially saying "I'm a stupid machine and I only know how to handle obvious problems." An intelligent user can see that one process is using too much memory or CPU and kill it so that all the other processes can keep on working away.
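To make the operating-system side of the analogy concrete, here is a minimal sketch, assuming Java 9 or later (which added `ProcessHandle`), that asks the OS which visible processes have used the most CPU. The class `ProcessInspector` and its methods are hypothetical names made up for this illustration. `ProcessHandle` does not report memory, so this covers CPU time only; tools like `ps` or `top` give the fuller picture.

```java
import java.time.Duration;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: asks the OS, via ProcessHandle, which processes have used the most CPU.
public class ProcessInspector {

    // Returns up to n processes visible to this user, highest total CPU time first.
    public static List<ProcessHandle> topByCpu(int n) {
        return ProcessHandle.allProcesses()
                .sorted(Comparator.comparing(ProcessInspector::cpuTime, Comparator.reverseOrder()))
                .limit(n)
                .collect(Collectors.toList());
    }

    // CPU time the OS reports for a process; treated as zero when the OS won't tell us.
    static Duration cpuTime(ProcessHandle p) {
        return p.info().totalCpuDuration().orElse(Duration.ZERO);
    }

    public static void main(String[] args) {
        for (ProcessHandle p : topByCpu(5)) {
            String cmd = p.info().command().orElse("?");
            System.out.printf("%8d  %-10s  %s%n", p.pid(), cpuTime(p), cmd);
        }
        // A user or watchdog could call destroy() on any one of these handles
        // to stop that process without touching the others.
    }
}
```

Given one of these handles, `p.destroy()` asks the OS to terminate that single process and leaves every other process running, which is roughly the per-application kill switch a single-process server lacks.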

Java application servers, on the other hand, present only a single opaque process. There's no way for the operating system, or an administrator using its tools, to tell which application is consuming all of the CPU cycles or sucking up all of the memory available to the VM. And if one application fails badly enough, for example by exhausting the shared heap, it can take down the entire server with it. All of the applications within a server run in the same process, so they aren't isolated from each other in any meaningful way.
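For contrast, this sketch (the class `JvmWideStats` is a hypothetical name) shows what code running inside a JVM can learn about its own process through the standard `java.lang.management` API. Heap usage comes back as one number for the whole JVM, and CPU time is reported per thread. Assuming the container serves requests from a thread pool shared by all deployed applications, as is common, neither figure says which application is responsible.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: everything here is reported for the JVM as a whole or per thread,
// never per deployed application.
public class JvmWideStats {

    // Heap currently in use across the entire JVM.
    public static long usedHeapBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage().getUsed();
    }

    // CPU time in nanoseconds for each live thread, keyed by "name#id".
    public static Map<String, Long> cpuNanosByThread() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<String, Long> result = new LinkedHashMap<>();
        if (!threads.isThreadCpuTimeSupported()) {
            return result; // this JVM cannot measure per-thread CPU time
        }
        for (long id : threads.getAllThreadIds()) {
            long nanos = threads.getThreadCpuTime(id); // -1 if measurement is disabled or the thread died
            ThreadInfo info = threads.getThreadInfo(id); // null if the thread is no longer alive
            if (nanos >= 0 && info != null) {
                result.put(info.getThreadName() + "#" + id, nanos);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("heap used: " + usedHeapBytes() + " bytes");
        cpuNanosByThread().forEach((thread, ns) -> System.out.println(thread + ": " + ns + " ns"));
    }
}
```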

A better approach might be to follow the lead of the operating system: each application would run as a separate process, or tree of processes, tracked independently by the operating system, and the web server would run isolated from the applications it serves. The operating system already provides great tools for managing processes, so why not use them? This is by no means a new concept. If the application process speaks HTTP, the web server can set up a reverse proxy to map the application into its URL space. Alternatively, the application could just as easily speak FastCGI, a protocol that lets applications respond to requests from a web server. FastCGI is similar to CGI, but avoids the overhead of starting a new process to service each request.
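As a rough sketch of the one-application-per-process model, the program below runs a single application as its own JVM process and speaks HTTP on a loopback port, using the JDK's built-in `com.sun.net.httpserver` package. The class name `StandaloneApp` and the default port 8081 are assumptions for illustration. A front-end server such as nginx (with `proxy_pass`) or Apache httpd (with `mod_proxy`) would then map a URL prefix onto that port.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical example: one application per process, reachable only through a reverse proxy.
public class StandaloneApp {

    public static HttpServer start(int port) throws IOException {
        // Bind to loopback only; the front-end web server is the intended client.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/", exchange -> {
            // Report our own PID to show this app is a separate OS process.
            byte[] body = ("hello from pid " + ProcessHandle.current().pid() + "\n")
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8081; // hypothetical default port
        start(port);
        System.out.println("listening on 127.0.0.1:" + port);
    }
}
```

Because each application is now an ordinary process, `ps`, `top`, and `kill` work on it directly, and a crash in one application leaves the front-end web server and the other applications running.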

Unfortunately, I don't see anything changing anytime soon.