<H1>Jetty Case Studies</H1>
<H2>A Shared Jetty Server in a College Environment</H2>
<b>by James Robinson</b> <BR>April 2004
<P> <i> When I was systems administrator for the University of North Carolina at Charlotte department of Computer Science, I saw the need to establish an environment for our students to experiment with servlets. The goal was to provide a centralized system on which our students could deploy their servlets / web-applications without having to shoulder the additional burden of administering their own web / servlet containers. At the time, the department of Computer Science was a partner in a heavily centralized workstation-based computing system modeled after MIT's Project Athena, using AFS as an enterprise filesystem. Having such a filesystem solved the problem of getting the student's code and web-application data to the servlet container -- just save it in a readable portion of the student's AFS home directory, and the servlet server can pick it up and go.
<P> I developed a workable solution with Jetty version 3. I found the internal architecture of Jetty to follow the 'as simple as things should be, but no simpler' rule, allowing me, a time-constrained sys admin with Java skills, to finish the project in the time allowed so that courses could begin to use the service. Jetty's native extensibility let me easily extend its functionality so that students could remotely deploy and administer their own servlet / JSP code on the machine via webspace gestures. </i>
<h3>Implementation</h3>
The core of this was a new HttpHandler implementation which acted as the main controller. In Jetty 3, HttpHandlers are stacked within HandlerContext objects, which are mapped to URI patterns within the Jetty server itself. The HandlerContext most closely matching the URI named in an HTTP request is asked to handle the request, which it does by iterating over each of its contained HttpHandlers.
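<p>To make the "most closely matching" rule concrete, here is a minimal sketch in plain Java. It is not Jetty 3's actual matching code: it assumes that patterns ending in "*" are prefix patterns and that the longest matching prefix wins. The class name ContextMatcher, its methods, and the example patterns are all hypothetical.</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; Jetty's real pattern matching is richer than this.
final class ContextMatcher {
    // URI pattern -> name of the context registered for it
    private final Map<String, String> contexts = new LinkedHashMap<>();

    void map(String pattern, String contextName) {
        contexts.put(pattern, contextName);
    }

    // Returns the context whose pattern matches the URI most specifically, or null.
    String match(String uri) {
        String best = null;
        int bestLength = -1;
        for (Map.Entry<String, String> entry : contexts.entrySet()) {
            String pattern = entry.getKey();
            boolean isPrefixPattern = pattern.endsWith("*");
            // Assumption: a trailing "*" means "anything starting with this prefix".
            String prefix = isPrefixPattern
                    ? pattern.substring(0, pattern.length() - 1)
                    : pattern;
            boolean matches = isPrefixPattern ? uri.startsWith(prefix) : uri.equals(pattern);
            // A longer matching prefix is treated as the more specific match.
            if (matches && prefix.length() > bestLength) {
                best = entry.getValue();
                bestLength = prefix.length();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        ContextMatcher matcher = new ContextMatcher();
        matcher.map("/*", "root");
        matcher.map("/docs/*", "docs");
        System.out.println(matcher.match("/docs/index.html")); // prints "docs"
    }
}
```

<p>With both "/*" and "/docs/*" mapped, a request for "/docs/index.html" resolves to the "/docs/*" entry because its prefix is longer, while any other path falls back to "/*".</p>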
<P> The HandlerContext containing this HttpHandler implementation was mapped to URI "/~*", so that this handler would be considered for any request to "/~username/...". The handler's initial responsibilities were to:
<ol>
<li> Dynamically build contexts for users on demand, upon first reference to their own webspace on the machine, such as the first hit to "/~username/*". This handler would look up the user's homedir in a UNIX passwd file containing only students (no system accounts), and then create a new HandlerContext to serve static content and JSPs out of ~username/public_html in filespace, and dynamically mapped servlets from ~username/public_html/servlets in filespace. The ability to lazily deploy a user's personal context was paramount, since possibly only 20 out of many thousands of possible students would use the server on any given day. The newly-created HandlerContext would be preferred by Jetty to serve requests to "/~username/*" over this handler's context, since the match to "/~username/" was more specific. </li>
<BR>
<li> Reload any one of the already-deployed user contexts, so that Jetty would reload any class files that had been recompiled. This was done by simply stopping, removing, and destroying the user context in question (easy, since the HttpHandler implementation maintained a Map of username -&gt; created context). After removal of the old context, a new context would be lazily initialized upon the next request to a resource in that context, via step 1. This action was triggered through a custom servlet in the same package, which maintained a reference to the HttpHandler via the singleton design pattern. This servlet, when asked via a webspace gesture, would make a protected method call into the HttpHandler to perform this step on user foo's context.
</li>
</ol>
<P> As time went on, additional features were added:
<ul>
<li> <b>Web applications.</b> Students could deploy web-applications, either in expanded format or jar'd up, into a subordinate webspace of their own choosing beneath their personal webspace (e.g. /~username/mywebapp/*). They could then choose to undeploy or redeploy a web-application, or to view the logs generated by its servlet / JSP code (via hooks to a personal log sink per web-application). I chose to have the deployed web-applications be 'sticky', living through a server reset. This was accomplished by serializing the Map of loaded web-applications to disk whenever it changed, and replaying it as a log upon server startup. In hindsight, I should have deferred the full reload of a known web-application until a resource within it was actually referenced, reducing the memory footprint of the server as well as greatly reducing the server startup time (150 webapps can contain quite a lot of XML to parse). </li>
<BR>
<li> <b>User authentication realms.</b> Users could configure simple Jetty HashUserRealms by indicating where in their filespace to load the data for the realm. Realms defined by students in this way were forced to be named relative to their own username, such as 'joe:realm'. A student's web-applications could then contain security constraints referencing user / role mappings of their own choosing. </li>
</ul>
<h3>Security</h3>
Security is an issue for any resource shared by students. The servlet allowing users to remote-control their own resources was ultimately made available through SSL, locked down via an HTTP basic security realm. That realm was backed by interprocess communication to C code which performed the AFS/Kerberos authentication checks for a given username / password, allowing the server to trust gestures controlling a given user's resources on the server.
A Java security policy was installed in the JVM running Jetty, limiting filespace access as well as disallowing calls to System.exit() and other obvious baddies, as I quickly found out that the SQLException handler in students' JDBC code was often a call to System.exit(). Unfortunately, the Java SecurityManager model cannot protect against many types of 'attacks' brought on by untrusted student code, such as CPU-hogging broken loops, thread-bombs, and the like. A babysitting parent process was quickly written to restart the servlet server if it ever went down, and to bounce the server if it had consumed more CPU than it should have (probably a student-code busy-loop). Restarting the server daily acted as the ultimate garbage collection.
<p> AFS supports ACLs on directories, and instead of requiring all servlet-related files to be flagged as world-readable, the servlet server process ran authenticated to AFS as a particular entity to which students could grant access. This reduced students' ability to simply steal their classmates' code using filesystem-based methods, but they could conceivably just write a servlet to do the same thing. Possibly deeper insight into the Java security model could have corrected this.
<p> The RequestDispatcher API was another thorn in the side of security, allowing any single servlet to tunnel through a context / web-application barrier to any other URI valid on the machine, conceivably allowing a nefarious student to snarf up content served by another student's servlets, even if that student had wrapped the servlet in a security constraint.
<p> Thefts based on symbolic-link misdirection were not considered at all.
<p> Ultimately, students were warned many times that this was a shared server running their peers' untrusted code, and that they should use it only for their coursework and explorations into the servlet world. Nothing requiring true tamper-proof security should be deployed on this box.
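<p>The babysitting parent process mentioned earlier in this section can be sketched roughly as follows. This is a minimal illustration in Java, not the original babysitter: it only restarts the server after it exits and omits the CPU-usage check. The names Babysitter, Worker, forCommand, and supervise are hypothetical, and the launch limit exists only to keep the sketch bounded.</p>

```java
import java.util.List;

// Hypothetical sketch of a restart-on-exit supervisor; the CPU-usage check is omitted.
final class Babysitter {
    /** One run of the supervised server; returns its exit code once it stops. */
    interface Worker {
        int runUntilExit() throws Exception;
    }

    /** Wraps an external command, such as the JVM that runs the servlet server. */
    static Worker forCommand(List<String> command) {
        // Start the child process, share our stdin/stdout/stderr, and block until it exits.
        return () -> new ProcessBuilder(command).inheritIO().start().waitFor();
    }

    /**
     * Launches the worker again each time it exits or fails, at most maxLaunches times,
     * pausing pauseMillis between launches. Returns the number of launches performed.
     */
    static int supervise(Worker worker, int maxLaunches, long pauseMillis) {
        int launches = 0;
        while (launches < maxLaunches) {
            launches++;
            try {
                int exitCode = worker.runUntilExit();
                System.err.println("launch " + launches + " exited with code " + exitCode);
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                // Someone asked the babysitter itself to stop: restore the flag and give up.
                Thread.currentThread().interrupt();
                break;
            } catch (Exception e) {
                System.err.println("launch " + launches + " failed: " + e);
            }
        }
        return launches;
    }
}
```

<p>A call such as <code>Babysitter.supervise(Babysitter.forCommand(command), 5, 1000L)</code>, where <code>command</code> is whatever command line starts the server, would relaunch it up to five times with a one-second pause between attempts. A real babysitter would more likely loop indefinitely.</p>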
<h3>Lessons Learned</h3>
As the service became more and more popular, I wished that I had been able to move it to a bigger box, something other than a non-dedicated Sun Ultra 5 with 256 MB of RAM. Having more memory available to the JVM would have greatly helped when a 30-student section all tried to deploy SOAP-based applications, each using their own jars for Apache Axis, Xalan, etc.
<p> Using an inverse-proxy (reverse proxy) front-end to the system would have allowed splitting the users across multiple JVM / Jetty instances, improving uptime as seen from an individual's perspective: a server kick to clear out a runaway thread would cause downtime for, say, 50% or 33% of the user base, as opposed to 100%. It would also have allowed me to have the service facade running at port 80 without the truly evil JNI hack I had to use to have Jetty start up as root, bind to port 80, then setuid() away its rights N seconds after startup. Way ugly. After the setuid() call was made, a latch was set in the custom HttpHandler, allowing it to begin servicing user contexts. However, having more than one Jetty instance would have complicated the implementation of the controlling servlet, requiring it to perform RMI when asked to control a non-local user context. This pattern could have been scaled down to one user per JVM, with the inverse-proxy forking / exec'ing the JVM for the user upon demand, especially with Jetty now having HTTP proxy capability. That would probably be overkill for a student service, but having a static inverse-proxy with a fixed mapping to 2 or 3 Jetty instances (possibly running on distinct machines) would have been a relatively attractive performance and reliability enhancement for the effort.
<p> Impressions from the users were mixed.
When all of the code being run on the machine was benign, neither memory- nor CPU-hoggish, all was well and the students were generally ambivalent -- this service was something that they had to use to pass their coursework, servlet coding / debugging was slower and more cumbersome than 'regular' coding, etc. Having a hot-redeploy-capable container didn't seem whiz-bang to them because they had no other experience in servlet-land. When the machine was unhappy, such as at 3 AM on the night before a project was due, when one student's code decided to allocate and leak way more memory than it had any right to, causing the others to begin getting OutOfMemoryErrors left and right, then they were (rightly) annoyed and let me hear about it the next day.
<p> If I were to re-solve the problem today, I would:
<ul>
<li> Use some sort of inverse-proxy to smear the load over more than one JVM for higher availability, allowing the Jetty instances to bind to an unprivileged port. </li>
<li> Use JDK 1.4's internal Kerberos client implementation to authenticate the campus users. Together with the previous step, this would eliminate all C code from the system. </li>
<li> Run on at least one bigger dedicated machine. </li>
<li> Encourage the faculty to work with me to ensure that their central APIs can be loaded by the system classloader, as opposed to their students' web-application loaders, so we don't end up with 30 copies of XSLT or SOAP implementations all at once. </li>
<li> Lazy-load web-applications and auth realms upon first demand instead of at server startup. </li>
<li> Age out defined web-applications and auth realms if they have not been referenced in the past X days, so that they'll eventually be forgotten about completely once a student has finished the course. </li>
</ul>
<p> <font size="smaller">[Copyright James Robinson 2004]</font>
<p> <p>
<center> <font size=-1>Click to find out <A href="index.html">how to contribute a Case Study</a>.</font> </center>