Sophie: jetty5-manual-0:5.1.15-1.5.2mdv2010.0 noarch

jetty5-manual-5.1.15-1.5.2mdv2010.0.noarch.rpm

<H1>Jetty Case Studies</H1>

<H2>A Shared Jetty Server in a College Environment</H2>
<b>by James Robinson</b> 
<BR>April 2004
<P>
<i>
When I was systems administrator for the University of North Carolina at
Charlotte department of Computer Science, I saw the need to establish an
environment for our students to experiment with servlets. 
The goal was to provide a centralized system for our students to
deploy their servlets / web-applications on without having to shoulder
the additional burden of administrating their own web / servlet containers.
At the time, the department of computer science was a partner in a heavily
centralized workstation-based computing system modeled after MIT's Project
Athena, using AFS as an enterprise filesystem. Having such a filesystem
solved the problem of getting the student's code + web-application data
to the servlet container -- just save it in a readable portion of the
student's AFS home directory, and the servlet server can pick it up
and go.
<P>
I developed a workable solution with Jetty version 3. I found the internal
architecture of Jetty to follow the 'as simple as things should be, but
no simpler' rule, allowing me, a time-constrained sys admin with Java
	skills, to squeeze off the project in the time allowed so that courses could begin to utilize the service. Jetty's native extensibility allowed me
to easily extend its functionality, allowing the students to remote-deploy
and administrate their own servlet / JSP code on the machine via webspace
gestures.
</i>

<h3>Implementation</h3>
The core bits of this was a new HttpHandler implementation which acted as
the main controller. In Jetty 3, HttpHandlers are stacked within
HandlerContext objects, which are mapped to URI patterns within the
Jetty server itself. The HandlerContext most closely matching the
URI named in a HTTP request is asked to handle the request, which it
does so through iterating over each its contained HttpHandlers.
The HandlerContext containing this HttpHandler implementation was mapped to URI "/~*", so that this handler would be considered to handle a request to "/~username/...". The handler's initial
responsibilities were to:
<ol>
	<li> Dynamically build contexts for users on demand by first
reference to their own webspace on the machine, such as the first
hit to "/~username/*". This handler would look up the user's homedir
in a UNIX passwd file containing only students (no system accounts),
and then create a new HandlerContext to serve out static content and JSPs
out of ~username/public_html in filespace, and dynamically
mapped servlets from ~username/public_html/servlets in filespace. The
ability to lazily deploy a user's personal context was paramount, since
possibly only 20 out of many thousands of possible students would use
the server any given day. The newly-created HandlerContext would be
preferred by Jetty to serve out requests to "/~username/*" over this
handler's context, since the match to "/~username/" was more specific.
        </li>
<BR>
	<li> Reload any one of the already-deployed user contexts, so that
Jetty would reload any class files that had been recompiled. This was
done through merely stopping, removing, and destroying the user context
in question (easy, since the HttpHandler implementation maintained a
Map of username -> created context). After removal of the old context,
we would lazily initialize a new context upon next request to a resource
in that context via step 1. This action was done through a custom servlet
in the same package which maintained a reference to the HttpHandler via the
singleton design pattern. This servlet, when asked via a webspace gesture,
would make a protected method call into the HttpHandler to perform this
step to user foo's context.
        </li>
</ol>
<P>
As time went on, additional features were added:
<ul>
<li> <b>Web applications.</b> Students could deploy web-applications, either
in expanded format or jar'd up into a subordinate webspace of their
personal webspace of their own choosing (i.e /~username/mywebapp/*). They
could then choose to undeploy, redeploy, or to view the logs generated
by this web-application's servlet / JSP code (hooks to personal log sinks
per each webapplication). I chose to have the deployed web-applications
be 'sticky', living through a server reset. This was accomplished by
serializing the Map of loaded web-applications to disk whenever it changed,
and to replay it as a log upon server startup. In hindsight, I should have
deferred the full reload of a known web-application until a resource within
the web-application was actually referenced, reducing the memory footprint
of the server, as well as greatly reducing the server startup time (150
webapps can contain quite a lot of XML to parse).
</li>
<BR>
<li> <b>User authentication realms.</b> Users could configure simple
Jetty HashUserRealms via indicating where in their filespace to load in
the data for the realm. Realms defined by students in this way were
forced to be named relative to their own username, such as 'joe:realm'.
The student's web-applications could then contain security constraints
referencing user / role mappings of their own choosing.
</li>
</ul>
<h3>Security</h3>
Security is an issue for any resource shared by students. The servlet
allowing users to remote-control their own resources was ultimately
made available through SSL, locked down via a HTTP basic security
realm backed by interprocess communication to C code to perform the
AFS/Kerberos authentication checks given a username / password, allowing
the server to accurately trust gestures controlling a given user's
resources on the server. A java security policy was installed
in the JVM running Jetty, limiting filespace access, as well as disallowing
calls to System.exit() and other obvious baddies, as I quickly found out
that their JDBC code's SQLException handler often was System.exit().
Unfortunately, the java SecurityManager model cannot protect against
many types of 'attacks' brought on by untrusted student code, such as
CPU-hogging broken loops, thread-bombs, and the like. A babysitting
parent process was quickly written to restart the servlet server if it
ever went down, as well as would bounce the server if it had
consumed more CPU than it should have (probable student-code busy-loop).
Daily restarting the server acted as ultimate garbage collection.
<p>
AFS supports ACLs on directories, and instead of requiring all
servlet-related files to be flagged as world-readable, the servlet
server process ran authenticated to AFS as a particular entity which
students could grant access to. This reduced the capability of just
stealing their classmates code using filesystem-based methods, but
they could conceivably just write a servlet to do the same thing.
Possibly deeper insight into the java security model could have
corrected this.
<p>
The RequestDispatcher API was another thorn in the side of security,
allowing any single servlet to tunnel through a context / web-application
barrier to any other URI valid on the machine, conceivably allowing
a nefarious student to snarf up content served by another student's
servlets, even if that student had wrapped the servlet in a security
constraint.
<p>
Symbolic-link misdirection based thefts were not considered at all.
<p>
Ultimately, students were warned many times up and down that this
was a shared server running your peer's untrustable code, and that
you should only be using it for your coursework and explorations into
the servlet world. Nothing requiring true tamper-proof security should
be deployed on this box.

<h3>Lessons Learned</h3>

As the service became more and more popular, I wish that I had been able
to move it to a bigger box, something other than a non-dedicated Sun Ultra
5 with 256M RAM. Having more memory available to the JVM would have greatly
helped out when a 30 student section all tried to deploy SOAP-based
application, each using their own jars for apache axis, xalan, etc.
<p>
Using an inverse-proxy front-end to the system would have allowed splitting
the users across multiple JVM / Jetty instances, allowing gains on
uptimes (as seen from an individual's perspective, since a server kick
to clear out a runaway thread would cause downtime for, say, 50% or 33%
of the entire user base, as opposed to 100%). It would also have allowed
me to have the service facade running at port 80, as opposed to the truly
evil JNI hack I had to do to have Jetty start up as root, bind to port
80, then setuid() away its rights N seconds after startup. Way ugly.
After the setuid() call was made, a latch was set in the custom
HttpHandler, allowing it to begin servicing user contexts. However, having
more than one Jetty instance would have complicated the implementation of
the controlling servlet, requiring it to perform RMI when asked to
control a non-local user context. This pattern could have been used
to scale down to one user per JVM, with the inverse-proxy being able
to fork / exec the JVM for the user upon demand, especially with
Jetty now having HTTP proxy capability. That would probably be overkill
for a student service, but having a static inverse-proxy with a fixed
mapping to 2 or 3 Jetty instances (possibly running on distinct
machines) would have been a relatively attractive performance and
reliability enhancer per the effort.
<p>
Impressions from the users were mixed. When all of the code being run on
the machine was benign, not memory nor CPU-hoggish, all was well and
the students were generally ambivalent -- this service was something that
they had to use to pass their coursework, servlet coding / debugging was
slower and more cumbersome than 'regular' coding, etc. Having a hot-redeploy-capable container didn't seem whiz-bang to them because they
had no other experience in servlet-land. When the machine was unhappy,
such as if it was 3 AM on the night before the project was due and
one student's code decided to allocate and leak way more memory than
it had any right doing, causing the others to begin to get OutOfMemory
exceptions left and right, then they were (rightly) annoyed and let
me hear about it the next day.
<p>
If I were to re-solve the problem today, I would:
<ul>
	<li> Use some sort of inverse-proxy to smear the load over more
than one JVM for higher-availablity, allowing the Jetty instances to bind to an unprivileged port.
        </li>

	<li> Use the JDK 1.4's internal Kerberos client implementation to
authenticate the campus users. Both of these steps would eliminate all
C code from the system.
        </li>

	<li> Run on at least one bigger dedicated machine.
	</li>

	<li> Encourage the faculty to work with me to ensure that their
central APIs can be loaded by the system classloader as opposed to their
student's web-application loader so we don't end up with 30 copies of
XSTL or SOAP implementations all at once.
        </li>

	<li> Lazy-load web-applications and auth realms upon first demand
instead of at server startup.
        </li>

	<li>Age-out defined web-applications and auth realms if they
have not been referenced in the past X days, so that they'll eventually
be forgotten about completely when a student finished the course.
        </li>
</ul>
<p>
<font size="smaller">[Copyright James Robinson 2004]</font>
<p>
<p>
<center>
<font size=-1>Click to find out <A href="index.html">how to contribute a Case Study</a>.</font>
</center>