<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html> <head> <title>Jetty Optimization Guide</title> <link rel="stylesheet" href="jetty.css" type="text/css"/> <meta name="generator" content="DocBook XSL Stylesheets V1.62.0"/> </head> <body> <h1 class="title">Jetty Optimization Guide</h1> <h1>Introduction</h1> <p>This guide describes techniques for optimizing a deployment of the Jetty HTTP server and servlet container. While some of the techniques described here are particular to the Jetty server, many are generally applicable to any similar servlet server. Note that for a J2EE application server, it is often the web tier that controls the vast majority of requests entering the server. Thus optimization of the web tier is key to the optimization of the entire container.</p> <p>Optimization is more of an art than a science, so this document does not present a specific solution. Instead the issues and parameters that need to be considered are discussed and "rules of thumb" are given where appropriate.</p> <h1>Optimization Overview</h1> <h2>HTTP Traffic Profile</h2> <p>In order to optimize a servlet container it is important to understand how requests are delivered to the container and what resources are used to handle them.</p> <h3>Browser Connection Handling</h3> <p>Each user connecting to the webapp container will be using a browser or other HTTP client application. How that client connects to the server greatly affects the optimization process. Historically, browsers would send only a single HTTP request over a TCP connection, which meant that each HTTP request incurred the latency and resource costs of establishing a connection to the server. In order to quickly render a page with many images, each requiring a request, browsers could open up to 8 connections to the server so that multiple requests could be outstanding at once. 
In some specific circumstances with HTTP/1.0 browsers, multiple requests could be sent over a single connection.</p> <p>Modern browsers mostly use HTTP/1.1 persistent connections that allow multiple requests per connection in almost all circumstances. Thus browsers now typically open only 1 or 2 connections to each server and send many requests over those connections. Browsers are increasingly using request pipelining so that multiple requests may be outstanding on a single connection, thus decreasing request latency and reducing the need for multiple connections.</p> <p>This situation results in a near linear relationship between the number of server connections and the number of simultaneous users of the server:</p> <pre>SimultaneousUsers * ConnectionsPerUser == SimultaneousConnections</pre> <h3>Server Connection Handling</h3> <p>For Jetty and almost all Java HTTP servers, each connection accepted by the server is allocated a thread to listen for requests and to handle those requests. While non-blocking solutions are available to avoid this allocation of a thread per connection, the blocking nature of the servlet API prevents these from being used efficiently with a servlet container.</p> <pre>SimultaneousConnections <= Threads</pre> <h3>Persistent Connections</h3> <p>Persistent connections are supported by the HTTP/1.1 protocol and to a lesser extent by the HTTP/1.0 protocol. The duration of these connections and how they interact with a webapp can greatly affect the optimization of the server and webapp.</p> <p>A typical webapp consists of a dynamically generated page with many static components such as style sheets and/or images. Thus to display a page a cluster of requests is sent for the main page and for the resources that it uses. 
It is highly desirable for persistent connections to be held at least long enough for all the requests of a single page view to be completed.</p> <p>After a page is served to a user, there is typically a delay while the user reads or interacts with the page, after which another request cluster is sent in order to obtain the next page of the webapp. The delay between request clusters can be anything from seconds to minutes. It is desirable for the persistent connections to be held longer than this delay in order to improve the responsiveness of the webapp and to reduce the costs of new connections. However, the cost of this may be many idle connections on the server which consume resources without contributing to server throughput.</p> <p>The duration that persistent connections are held is under the control of both the client and the server, either of which can close a connection at any time. The browser's cache settings may also greatly affect the use of persistent connections, as many requests for resources on a page may not be issued or may be handled with a simple 304-NotModified response.</p> <h2>Optimization Objectives</h2> <p>There are several key objectives when optimizing a webapp container. Unfortunately not all of them are compatible and you are often faced with a trade-off between two or more objectives.</p> <h3>Maximize Throughput</h3> <p>Throughput is the primary measure used to rate the performance of a web container and it is mostly measured in requests per second. Your efforts in optimizing the container will mainly be aimed at maximizing the request rate or at least ensuring a minimal rate is achievable. However, you must remember that request rate is an imperfect measure, as not all requests are the same and it is simple to measure a request rate for a load that is unlike a real load. Specifically:</p> <ul> <li>Containers will be more efficient handling high request rates from a few long held persistent connections. 
Unfortunately this is often not a real traffic profile and requests more often come in from many connections which are mostly idle and/or short held. Thus it is key to also consider connection rate or at least the number of simultaneous connections when considering the meaning of a request rate figure.</li> <li>Requests with content or large responses take more time to package and process and may be exposed to more network inefficiencies. Thus request rates of realistically sized requests must be considered and in some circumstances it is useful to consider data rate.</li> <li>There are several different ways that a webapp may serve a request and features that may be applied that will affect throughput, e.g. static versus dynamic content, fixed versus variable length, or security. The complexity of the requests must be considered when measuring throughput.</li> </ul> <h3>Minimize Latency</h3> <p>Latency is a delay in the processing of requests and it is desirable to reduce latency so that web applications appear responsive to the users. There are two key sources of latency to consider:</p> <ul> <li>The latency between when a request is initiated and when the handling of that request starts. This latency is affected by the time taken to establish a connection and the scheduling of threads within the server.</li> <li>The latency between requests in a request cluster. This latency can be large if the response for a previous request must complete before the next request can be issued. Browsers reduce this latency by using multiple connections or pipelining requests over a single connection.</li> </ul> <p>While latency is not directly related to throughput, there is often a trade-off to be made between reducing latency and increasing throughput. 
Server resources that are allocated to idle connections may be better deployed handling actual requests.</p> <h3>Minimize Resources</h3> <p>The processing of each request consumes server resources in the form of memory, CPU and time. Memory is used for buffers, program stack space and application objects. Keeping memory usage within a server's available physical memory is important for maximum throughput. Conversely, using a server's virtual memory may allow increased simultaneous users and can also decrease latency.</p> <p>Servers will have one or more CPUs available to process requests. It is important that the scheduling of these processors is done in such a way that they spend more time handling requests and less time organizing and switching between tasks.</p> <p>Servers often allocate resources based on time and it is important to tune timeouts so that those resources have a high probability of being productively used.</p> <h3>Graceful degradation</h3> <p>Much of optimization is focused on providing maximum throughput under average or high offered load rates. However, for many systems that wish to offer high availability and high quality of service, it is important to optimize the performance under extreme offered load, either to continue providing reasonable service to some of the offered load or to gracefully degrade service to all of the offered load.</p> <h1>Analyzing Traffic</h1> <p>Before beginning to optimize the configuration of your HTTP server and servlet container, it is fundamental that you analyse the profile of the traffic you expect your server to handle. This can be estimated or measured from an actual live server. 
The type of information that is useful to gather includes:</p> <table border="1"> <tr> <th class="attribute">Attribute</th> <th class="variations">Variations</th> <th>Comment</th> </tr> <tr> <td>Request rate</td> <td>average, peak</td> <td>The number of requests per second</td> </tr> <tr> <td>Connection rate</td> <td>average, peak</td> <td>The number of new connections established with the server per second</td> </tr> <tr> <td>Simultaneous Users</td> <td>average, peak</td> <td>The number of users that are simultaneously interacting with the server.</td> </tr> <tr> <td>Requests per page</td> <td>average</td> <td>The number of requests that is required to render a page of the webapp. Includes images and style sheets, but may be affected by client caching.</td> </tr> <tr> <td>Page view time</td> <td>average</td> <td>The period of time that a typical user will view a page before requesting another from the webapp.</td> </tr> <tr> <td>Session duration</td> <td>average</td> <td>The period of time that an average user will remain in contact with the server. This can be used to estimate session and memory requirements</td> </tr> </table> <h2>Measuring Traffic</h2> <p>The most accurate way to measure the attributes listed above is to measure them on a live server that is handling real traffic for the webapp that you are trying to optimize. Statistics and log analysis can then be used to derive the information above.</p> <p>Jetty supports statistics collection at both the server and context level. The following configuration excerpt shows how to turn on statistics for the server and for a particular web application:</p> <pre><Configure class="org.mortbay.jetty.Server"> ... <Call name="addWebApplication"> <Arg>/myapp</Arg> <Arg>./webapps/myapp</Arg> <Set name="statsOn">true</Set> </Call> ... <Set name="statsOn">true</Set> ... </Configure> </pre> <p>While statistics can be enabled as above, it is probably just as convenient to turn them on using a JMX agent to access the Jetty MBeans. 
If Jetty is run with JBoss or within a JMX server, then a JMX agent can be used to configure and view statistics collection.</p> <h3>Jetty HttpServer Statistics</h3> <p>The following statistics attributes are available on the org.mortbay.http.HttpServer class or via the associated MBean which is normally named like "org.mortbay:Jetty=0":</p> <table border="1"> <tr> <th>Attribute</th> <th>Comment</th> </tr> <tr> <td>statsOn</td> <td>True if statistics collection is turned on.</td> </tr> <tr> <td>statsOnMs</td> <td>Time in milliseconds for which stats have been collected</td> </tr> <tr> <td>statsReset()</td> <td>Reset statistics</td> </tr> <tr> <td>connections</td> <td>Number of connections accepted by the server since statsReset() called</td> </tr> <tr> <td>connectionsOpen</td> <td>Number of connections currently open that were opened since statsReset() called</td> </tr> <tr> <td>connectionsOpenMax</td> <td>Maximum number of connections opened simultaneously since statsReset() called</td> </tr> <tr> <td>connectionsDurationAve</td> <td>Sliding average duration in milliseconds of open connections since statsReset() called</td> </tr> <tr> <td>connectionsDurationMax</td> <td>Maximum duration in milliseconds of an open connection since statsReset() called</td> </tr> <tr> <td>connectionsRequestsAve</td> <td>Sliding average number of requests per connection since statsReset() called</td> </tr> <tr> <td>connectionsRequestsMax</td> <td>Maximum number of requests per connection since statsReset() called</td> </tr> <tr> <td>errors</td> <td>Number of errors since statsReset() called. 
An error is a request that resulted in an exception being thrown by the handler</td> </tr> <tr> <td>requests</td> <td>Number of requests since statsReset() called</td> </tr> <tr> <td>requestsActive</td> <td>Number of requests currently active</td> </tr> <tr> <td>requestsActiveMax</td> <td>Maximum number of active requests since statsReset() called</td> </tr> <tr> <td>requestsDurationAve</td> <td>Average duration of request handling in milliseconds since statsReset() called</td> </tr> <tr> <td>requestsDurationMax</td> <td>Maximum duration in milliseconds of request handling since statsReset() called.</td> </tr> </table> <h3>Jetty HttpContext Statistics</h3> <p>The following statistics attributes are available on the org.mortbay.http.HttpContext class or via the associated MBean which is normally named like "org.mortbay:Jetty=0,HttpContext=0,context=/myapp":</p> <table border="1"> <tr> <th>Attribute</th> <th>Comment</th> </tr> <tr> <td>statsOn</td> <td>True if statistics collection is turned on</td> </tr> <tr> <td>statsOnMs</td> <td>Time in milliseconds for which stats have been collected</td> </tr> <tr> <td>statsReset()</td> <td>Reset statistics</td> </tr> <tr> <td>requests</td> <td>Number of requests since statsReset() called</td> </tr> <tr> <td>requestsActive</td> <td>Number of requests currently active</td> </tr> <tr> <td>requestsActiveMax</td> <td>Maximum number of active requests since statsReset() called</td> </tr> <tr> <td>responses1xx</td> <td>Number of responses with 1xx status (Informational) since statsReset() called</td> </tr> <tr> <td>responses2xx</td> <td>Number of responses with 2xx status (Success) since statsReset() called</td> </tr> <tr> <td>responses3xx</td> <td>Number of responses with 3xx status (Redirection) since statsReset() called</td> </tr> <tr> <td>responses4xx</td> <td>Number of responses with 4xx status (Client Error) since statsReset() called</td> </tr> <tr> <td>responses5xx</td> <td>Number of responses with 5xx status (Server Error) 
since statsReset() called</td> </tr> </table> <h2>Estimating Traffic</h2> <p>It may not be possible to measure actual live traffic of a deployment to be optimized. In this case estimates must be made to obtain a traffic profile on which to base your optimization. The following worksheet gives an example of how this may be done:</p> <table border="1"> <tr> <th>Attribute</th> <th>Formula</th> <th>Example</th> <th>Comment</th> </tr> <tr> <td>SimultaneousUsers</td> <td>-</td> <td>1000</td> <td>Estimated from marketing or other sources.</td> </tr> <tr> <td>UserSessionDuration</td> <td>-</td> <td>180 seconds</td> <td>Time a single user spends interacting with the webapp. Estimated from marketing, usage trials or other sources.</td> </tr> <tr> <td>AvePageViewTime</td> <td>-</td> <td>30 seconds</td> <td>Time between page requests from a single user. Estimated from marketing, usage trials or other sources.</td> </tr> <tr> <td>PagesPerUserSession</td> <td>UserSessionDuration/AvePageViewTime</td> <td>6</td> <td> </td> </tr> <tr> <td>RequestsPerPageNoCache</td> <td>-</td> <td>12</td> <td>Calculated from inspection of HTML</td> </tr> <tr> <td>RequestsPerPageCache</td> <td>-</td> <td>3</td> <td>Calculated from inspection of HTML and usage trials.</td> </tr> <tr> <td>RequestsPerUserSession</td> <td>RequestsPerPageNoCache+ (RequestsPerPageCache* (PagesPerUserSession-1))</td> <td>27</td> <td> </td> </tr> <tr> <td>RequestsPerSecPerUser</td> <td>RequestsPerUserSession/ UserSessionDuration</td> <td>0.15</td> <td> </td> </tr> <tr> <td>RequestsPerSec</td> <td>SimultaneousUsers* RequestsPerSecPerUser</td> <td>150</td> <td> </td> </tr> <tr> <td>ConnectionsPerUser</td> <td>-</td> <td>2.5</td> <td>Measured from usage trials with estimated browser mix.</td> </tr> <tr> <td>AverageConnections</td> <td>SimultaneousUsers* ConnectionsPerUser</td> <td>2500</td> <td></td> </tr> <tr> <td>ConnectionsPerSecond</td> <td>ConnectionsPerUser* SimultaneousUsers/ UserSessionDuration</td> <td>13.89</td> 
<td>Assuming persistent connections that will span the entire user session. If connections will not span the session, then multiply by PagesPerUserSession</td> </tr> <tr> <td>PeakRequestsPerSecond</td> <td>2*RequestsPerSec + (RequestsPerPageNoCache- RequestsPerPageCache) * SimultaneousUsers/ UserSessionDuration</td> <td>350</td> <td>Based on SimultaneousUsers doubling in UserSessionDuration. The formula represents double the normal request rate, plus the additional load of the new users loading the initial page with no cache.</td> </tr> </table> <p>This worksheet is only indicative of an estimation process that can be used, especially the method for determining the peak request rate. If possible, several estimation techniques should be used and the worst case numbers assumed.</p> <h2>Clustered Traffic</h2> <p>When running a cluster of application servers, it is often desirable to be able to handle the maximum expected load in the event of a node failure. Thus once the single node traffic has been estimated or measured, the traffic loads for failure modes can be calculated:</p> <table border="1"> <tr> <th>Nodes in Cluster</th> <th>Failed Nodes</th> <th>Load</th> </tr> <tr> <td>2</td> <td>1</td> <td>200%</td> </tr> <tr> <td>3</td> <td>1</td> <td>150%</td> </tr> <tr> <td>3</td> <td>2</td> <td>300%</td> </tr> <tr> <td>4</td> <td>1</td> <td>133%</td> </tr> <tr> <td>4</td> <td>2</td> <td>200%</td> </tr> </table> <h2>Generating Traffic</h2> <p>Once the expected traffic profile has been analysed, a test client can be used to generate load on the server that reflects realistic load. It is important to make sure that the test client used is generating realistic load:</p> <ul> <li>Are persistent connections supported? Persistent connections are much more efficient than non-persistent connections and a realistic mix should be used to represent the expected browser population using the server.</li> <li>Are connections held idle for realistic times? 
Idle connections reduce latency for individual users at the expense of server resources. A test client that does not idle connections will not test the server's ability to balance these competing resource requirements.</li> <li>Does the test client account for client caching and if-modified-since headers? Most pages of a webapp are rendered from a cluster of requests for the initial page and its included resources such as images and style sheets. Most client browsers will cache many of the included resources and may often issue no requests for them or a request with an if-modified-since header that can be responded to with a simple 304-Not-Modified response. Test clients that do not model client caching will be measuring an unlikely worst case scenario.</li> <li>Is the test client run on a different machine to the server? Local networking has different characteristics to remote networking and a local test client will consume resources that could have been used by the server.</li> </ul> <h1>Optimizing Jetty</h1> <p>Jetty has a few features that have been deprecated or that are particularly resource hungry. Before optimizing the more conventional attributes it is worthwhile to make sure that these features are turned off or minimally configured.</p> <h2>Request Log Buffering</h2> <p>The Jetty request log mechanism has the ability to buffer its output in memory before writing it to a file, which was intended to reduce synchronization load on the server. Unfortunately, analysis of actual performance shows that a server with buffering turned on has around 5% lower maximum throughput. Prior to Jetty release 4.2.9 log buffering was turned on by default. This should be turned off:</p> <pre><Configure class="org.mortbay.jetty.Server"> ... 
<Set name="RequestLog"> <New class="org.mortbay.http.NCSARequestLog"> <Arg><SystemProperty name="jetty.home" default="."/>/logs/yyyy_mm_dd.request.log</Arg> <Set name="retainDays">90</Set> <Set name="append">true</Set> <Set name="extended">false</Set> <Set name="buffered">false</Set> <Set name="logTimeZone">GMT</Set> </New> </Set> ... </Configure> </pre> <h2>Statistics</h2> <p>The Jetty server supports statistics collection at the server and at the context level. While stats collection itself does not involve significant work load, it does require synchronization in order to correctly count some statistics. On a multi-CPU machine, this extra synchronization could significantly affect the performance of the server, thus statistics should be turned off while optimizing the server. Note that this is somewhat counterproductive, as the statistics are very useful for measuring the results of optimization. Thus the recommended use of the server statistics is to measure the profile of real load being handled by the server. This profile can then be used in generating test load, hopefully from a test client which itself can generate statistics which can be used to evaluate optimizations.</p> <h2>NIO SocketChannelListener</h2> <p>Jetty releases from release 4.0.0 to 4.2.9 contained the SocketChannelListener implementation of the HttpListener interface. This implementation used the features of the Java 1.4 NIO library to use non-blocking sockets for idle connections. The intent was to avoid allocating a Java thread to idle connections. Unfortunately, due to the nature of the servlet API, the sockets had to be returned to blocking mode before control was passed to a servlet. The resulting constant changing of the NIO select sets proved to consume significantly more system resources than was saved by reducing the required number of threads. 
The SocketChannelListener has been deprecated since 4.2.10 and should not be used in any release except for experimental purposes.</p> <h2>Max Read Time</h2> <p>The Jetty HTTP Listeners in versions prior to 4.1.1 had a parameter called maxReadTime, which was used to limit the time a request handler would wait for request content (e.g. on a form POST). This parameter, like maxIdleTime, was used to set the SO timeout value on the underlying connection socket. Unfortunately, if the maxReadTime value was different to the maxIdleTime value, then the SO timeout value was changed twice for every request. This proved to cause a significant reduction in the throughput of the server, on the order of 10%. Thus for Jetty versions prior to 4.1.1 it is important to set the maxIdleTime and maxReadTime parameters to the same value:</p> <pre><Configure class="org.mortbay.jetty.Server"> <Call name="addListener"> <Arg> <New class="org.mortbay.http.SocketListener"> <Set name="port">8080</Set> <Set name="minThreads">25</Set> <Set name="maxThreads">255</Set> <Set name="maxIdleTimeMs">60000</Set> <Set name="maxReadTimeMs">60000</Set> </New> </Arg> </Call> ...</pre> <p>For Jetty versions 4.1.1 or later, maxReadTime should not be set as it is ignored and produces a warning.</p> <h1>Optimizing Memory</h1> <p>Memory is a key resource that must be managed in any optimization of a web container. 
The procedure is to:</p> <ol> <li>Measure the static and dynamic memory requirements of your application.</li> <li>Configure the JVM's memory limits.</li> <li>Adjust the thread pool to constrain dynamic memory use.</li> <li>Tune garbage collection.</li> </ol> <p>To tune memory usage, a profiling tool like OptimizeIt or JProbe can be very useful. However, it can also be done simply by monitoring the memory allocated to the process by the operating system.</p> <h2>Measuring memory usage</h2> <p>Running a webapp can consume memory for:</p> <ul> <li>Statically allocated memory during initialization.</li> <li>Heap space allocated for Session objects per user of the webapp.</li> <li>Stack space allocated per thread.</li> <li>Heap space allocated for objects created during the processing of requests.</li> </ul> <h3>Check for memory leaks</h3> <p>Before optimizing your memory, it is important to establish that your webapp does not have any memory leaks. That is to say, no memory is allocated when processing requests that cannot be freed when the server returns to idle. This can be determined by running the application with a constant low to medium load and monitoring the memory usage. The memory allocated should increase to a level and then stabilize. If the memory continues to grow and/or an OutOfMemoryError is eventually thrown, then the application has an object/memory leak. Such an application will not be able to run long term and the leak should be fixed before optimizing or deploying the webapp.</p> <p>Note that application data caches or poor garbage collection (GC) behaviour may appear as a memory leak. If possible, disable application caches or configure them to a small size in order to test the application's underlying memory requirements. The JVM may be forced to perform a GC after a fixed number of requests by the requestsPerGC attribute of HttpServer. 
This can be set to a low value to avoid large fluctuations in memory usage during this measurement phase: </p> <pre><Configure class="org.mortbay.jetty.Server"> ... <Set name="requestsPerGC">100</Set> ... </Configure> </pre> <h3>Stack space Usage</h3> <p>JVMs allocate a fixed amount of stack space per thread created. The stack space is used for storing parameters, local variables and other data associated with a method call. The more nested method calls that your application requires (deeper stacks), the more stack space is required. Typically the default stack settings for JVMs are rather generous and are allocated per thread, thus significant savings can be made by tuning this allocation.</p> <p>For many JVMs, the stack space allocation is controlled with the -Xss option and the following command runs Jetty with 96kb allocated per stack:</p> <pre>java -Xss96k -jar start.jar</pre> <p>The simplest way to measure your stack requirements is to reduce the stack allocated until complex requests fail with a StackOverflowError. You then need to increase your stack allocation with a good safety margin, the size of which will depend greatly on your application as some may have large variation in stack usage, especially those that use recursion. </p> <h3>Static & Dynamic Memory Usage</h3> <p>An estimate of the static and dynamic heap space usage is needed to optimize the memory allocation. This is best done by measuring memory usage under realistic steady load at several load levels. The Jetty HttpListener should be configured to have a low minThreads setting, so that idle threads do not affect the measurements. 
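</p> <p>The incremental cost per connection can be derived mechanically from such a measurement run. The following is a minimal sketch and not part of Jetty: the class and method names are hypothetical, and the sample values are the process sizes reported in the measurement table of this guide. It computes the additional resident memory per connection between successive load levels:</p>

```java
// Hypothetical helper, not part of Jetty: derives the incremental memory
// cost per connection from process sizes sampled at increasing load levels.
class MemoryIncrements {

    // connections[i] is the number of active connections for sample i and
    // processKb[i] the resident process size in kb, in ascending load order.
    static int[] perConnectionKb(int[] connections, int[] processKb) {
        int[] increments = new int[connections.length - 1];
        for (int i = 1; i < connections.length; i++) {
            int extraKb = processKb[i] - processKb[i - 1];
            int extraConnections = connections[i] - connections[i - 1];
            // Integer division: fractions of a kb are not significant here.
            increments[i - 1] = extraKb / extraConnections;
        }
        return increments;
    }

    public static void main(String[] args) {
        // Sample figures taken from the measurement table in this guide.
        int[] connections = {0, 20, 40, 60, 100, 150};
        int[] processKb = {23076, 27540, 29352, 31868, 33852, 38264};
        int[] increments = perConnectionKb(connections, processKb);
        for (int i = 0; i < increments.length; i++) {
            System.out.println(connections[i] + " to " + connections[i + 1]
                    + " connections: " + increments[i] + " kb per connection");
        }
    }
}
```

<p>The largest increment gives a conservative per-thread figure, while the process size at zero connections approximates the static memory of the process.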
</p> <p>The following table shows some results for a simple test for memory usage using the unix ps command to determine the resident memory set size:</p> <table border="1"> <tr> <th>Active connections/threads</th> <th>Process size in kb</th> <th>kb per connection</th> </tr> <tr> <td>0</td> <td>23076</td> <td></td> </tr> <tr> <td>20</td> <td>27540</td> <td>223</td> </tr> <tr> <td>40</td> <td>29352</td> <td>90</td> </tr> <tr> <td>60</td> <td>31868</td> <td>125</td> </tr> <tr> <td>100</td> <td>33852</td> <td>49</td> </tr> <tr> <td>150</td> <td>38264</td> <td>88</td> </tr> </table> <p>Extrapolating from this table gives the following approximate formula for memory usage for this webapp:</p> <pre>memoryRequired = 23Mb + threads * 200kb</pre> <p>Ideally this formula should be tested with direct measurement under all load levels.</p> <p>This formula can now be used to calculate the memory requirements for your system and the JVM parameters should be set to ensure that enough memory is available when the maximum number of threads are in use. For the above example, if a maximum of 500 threads are required (see below) and a 128k stack size is used, then approximately 120MB of memory is required and the JVM's memory parameters should be configured as follows:</p> <pre>java -Xss128k -Xms120m -jar start.jar</pre> <p>Alternately, the memory formula can be used in reverse. If a known amount of physical or virtual memory is available and must not be exceeded, then the maximum number of threads can be determined. </p> <h2>Clustered Memory Usage</h2> <p>Memory usage for a node in a cluster cannot be measured by looking at a single node. If distributed sessions or EJBs are being used, then memory used on one node may be replicated on all nodes. For example, with distributed HTTP sessions, each node must have capacity to store all the sessions for all the nodes in the cluster.</p> <p>For this reason, it is often desirable to not have large homogeneous clusters. 
Rather, a cluster-of-clusters topology can reduce the memory and failure contingency load on each node.</p> <h1>Optimizing Threads</h1> <p>Once you have determined your traffic profile and your memory profile, it is possible to tune your server by adjusting the parameters of the thread pool. Each Jetty HttpListener has a pool of threads that is used to allocate threads to accepted connections. The following parameters can be used to configure the thread pool of each listener:</p> <table border="1"> <tr> <th>Parameter</th> <th>Comment</th> </tr> <tr> <td>maxThreads</td> <td>The limit on the number of threads that can be allocated to connections for that HTTP listener. This will effectively limit the number of simultaneous users of the server as well as the maximum memory usage. </td> </tr> <tr> <td>minThreads</td> <td>The minimum number of unused threads to keep within the thread pool. A large number of unused threads allows a server to respond to a sudden increase in load with little latency. More importantly, an HTTP listener is considered to be low on resources once its pool cannot allocate minThreads unused threads without exceeding maxThreads.</td> </tr> <tr> <td>maxIdleTimeMs</td> <td>The maximum time in milliseconds that a thread can be allocated to a connection without a request being received. This limits the duration of idle persistent connections.</td> </tr> <tr> <td>lowResourcePersistTimeMs</td> <td>An alternative value for maxIdleTimeMs to be used when the listener is low on resources (see minThreads).</td> </tr> <tr> <td>poolName</td> <td>If multiple HTTP Listeners are used, those with the same pool name will share the same thread pool. This avoids one listener running low on threads while another has idle threads.</td> </tr> </table> <h2>Setting maxThreads</h2> <p>The primary objective of the maxThreads setting is to protect the server from excess resource utilization from high connection or request rates. 
Without a limit to the maximum threads, it would be possible for arbitrarily high load to be accepted by the server, which would eventually lead to one of the following failure modes:</p> <ul> <li>Out of memory. Each accepted connection/thread consumes memory and unlimited threads will eventually result in an OutOfMemoryError. Note that the memory allocated to the JVM can be increased to avoid this limit, but at some level physical memory will be exceeded and the server performance will decline. Eventually virtual memory can be exhausted. </li> <li>Out of threads. Threads are normally implemented by the host operating system and are a finite resource that can be exhausted. The OS can normally be tuned to increase this limit, but not indefinitely as system performance will eventually degrade.</li> <li>Out of file descriptors. TCP/IP connections are implemented by most operating systems using file descriptors and are a finite resource that can be exhausted. The OS can normally be tuned to increase this limit, but not indefinitely as system performance will eventually degrade.</li> <li>100% CPU. Each connection accepted allows a flow of requests into the system, each of which takes CPU time to process. Once 100% CPU has been reached, any additional connections accepted just increase latency for all connections and eventually reduce total throughput.</li> </ul> <p>There are two main approaches to setting maxThreads:</p> <ol> <li>If a good estimate or measurement of the maximum load is known, then maxThreads is set high enough to handle this and the system is then tested to verify that none of the failure modes are reached. This approach results in a server that is good enough for the webapp and can leave server resources available for other uses.</li> <li>Various maxThreads values are tested with a test client generating a load of approximately the same value. 
The tested maxThreads value is increased until one of the failure modes above is detected or the measured throughput starts to decrease. This approach results in a server that uses all the system resources and requires a dedicated machine.</li> </ol> <p>If, with either of these approaches, the estimated, measured or required maximum load requires a maxThreads value that exhausts the system memory, CPU, connections or other resources, then the machine is not sufficient for that webapp. In this case, additional server resources (memory, CPU, kernel configuration) are required or a clustering solution can be considered.</p> <p>Once a server has reached its maximum number of threads, any new connection attempts are held by the operating system until they time out or a thread becomes available to accept them; once the operating system queue is full, further connection attempts are refused.</p> <h2>Setting minThreads</h2> <p>The minThreads value is used to control how a server degrades under extreme load. Once there are fewer than minThreads threads available in the thread pool, the lowResourcePersistTimeMs parameter can be used to free up other idle threads.</p> <p>If a good estimate or measurement of the average and maximum load is known, then the minThreads value can be set to half the difference between the maximum and the average.</p> <pre>minThreads == (maxThreads - averageConnections) / 2</pre> <p>Thus if maxThreads is 3000 and averageConnections is 2500, then minThreads could be set to 250, so that low resource timeouts will be applied once the number of actual connections exceeds 2750.</p> <p>Alternatively, minThreads may be set to protect against excess memory usage. If maxThreads requires more memory than is physically available, then minThreads can be set to free resources once physical memory is exceeded. 
Using the memory formula example from above, if 47Mb of physical memory is available on the system (with the OS running), then for maxThreads == 200:</p> <pre>minThreads == maxThreads - ( ( 47Mb - 23Mb ) / 200kb ) == 80</pre> <h2>Setting maxIdleTimeMs</h2> <p>The idle time of a thread is used to limit the time that a persistent connection can be idle. Higher values are desirable to reduce latency for a user and avoid the expense of recreating TCP/IP connections. However, if the value is set too high, many connections will be left open when the user is no longer browsing the webapp, and the resources allocated to them are effectively wasted for a long period of time.</p> <p>A good value to use for maxIdleTimeMs is slightly longer than the average page view time for the application, so that persistent connections are held long enough to span the time between page requests for an average user.</p> <h2>Setting lowResourcePersistTimeMs</h2> <p>An HTTP listener is considered low on resources if there are fewer than minThreads threads available in the thread pool. In that state, lowResourcePersistTimeMs can be set to replace maxIdleTimeMs so that idle connections are released sooner and their threads freed for other connections. The reasoning is that once a server is low on resources, there is little benefit in keeping resources allocated to idle connections in the hope that new requests will come from them.</p> <p>With a low lowResourcePersistTimeMs value set, performance will degrade more gracefully as maxThreads is approached.</p> <p>The value of lowResourcePersistTimeMs should be long enough to ensure that all requests in the cluster for a page view can be served by a persistent connection. 
This is typically governed by the network latency; it should not be more than a few seconds and can be as low as a few hundred milliseconds for a good network.</p> <h2>Setting poolName</h2> <p>If a server has multiple HTTP listeners configured, it may be desirable to share the thread pool between listeners, so that one listener is not starved of resources while another has free threads. If you wish to reserve capacity for a particular listener, then a shared thread pool should not be used. In the following configuration, both listeners use the same poolName and therefore share one thread pool:</p> <pre><Configure class="org.mortbay.jetty.Server">
  ...
  <Call name="addListener">
    <Arg>
      <New class="org.mortbay.http.SocketListener">
        <Set name="port">8080</Set>
        <Set name="minThreads">80</Set>
        <Set name="maxThreads">200</Set>
        <Set name="maxIdleTimeMs">30000</Set>
        <Set name="lowResourcePersistTimeMs">2500</Set>
        <Set name="poolName">Listener</Set>
      </New>
    </Arg>
  </Call>
  <Call name="addListener">
    <Arg>
      <New class="org.mortbay.http.SunJsseListener">
        <Set name="port">443</Set>
        <Set name="poolName">Listener</Set>
        <Set name="keystore">./etc/demokeystore</Set>
        <Set name="password">OBF:1vny1zlo1x8e1vnw1vn61x8g1zlu1vn4</Set>
        <Set name="keyPassword">OBF:1u2u1wml1z7s1z7a1wnl1u2g</Set>
      </New>
    </Arg>
  </Call>
  ...
</Configure>
</pre> <h1>Other Optimizations</h1> <h2>Buffering</h2> <p>Providing larger buffers for the HTTP listeners allows more efficient processing and generation of content, with less blocking and context switching. It also allows TCP/IP to run its sliding window protocol more efficiently and avoid network latencies. Prior to Jetty release 4.2.10, the default buffer size was 4096 bytes. This has now been increased to 8192 bytes. The buffer size can be set as follows:</p> <pre><Configure class="org.mortbay.jetty.Server">
  ...
  <Call name="addListener">
    <Arg>
      <New class="org.mortbay.http.SocketListener">
        <Set name="port">8080</Set>
        <Set name="minThreads">80</Set>
        <Set name="maxThreads">200</Set>
        <Set name="maxIdleTimeMs">30000</Set>
        <Set name="lowResourcePersistTimeMs">2500</Set>
        <Set name="poolName">Listener</Set>
        <Set name="bufferSize">8192</Set>
      </New>
    </Arg>
  </Call>
  ...
</Configure>
</pre> <h2>Security</h2> <p>Authenticated security constraints on a webapp can be expensive to check, as a realm is often implemented using cryptographic algorithms or involves a remote AAA server or database.</p> <p>Frequently a webapp page is constructed with many images that are not sensitive and do not need to be protected with an authenticated security constraint. Significant performance gains can be obtained by excluding such static resources from a security constraint.</p> <p>For example, consider a webapp that protects the directory /private with an authenticated constraint, but has a number of non-sensitive images in the /private/images directory. The following web.xml excerpt can be used to protect the private directory without the expense of protecting the images directory:</p> <pre>
...
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Authed User Required</web-resource-name>
    <url-pattern>/private/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>*</role-name>
  </auth-constraint>
</security-constraint>
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Images Not Protected</web-resource-name>
    <url-pattern>/private/images/*</url-pattern>
    <http-method>GET</http-method>
    <http-method>HEAD</http-method>
  </web-resource-collection>
</security-constraint></pre> <h2>Logging</h2> <p>Logging of requests can add extra CPU load per request and an extra synchronization point. The following points should be considered to optimize the logging configuration:</p> <ul> <li>Is logging really required? 
Many webapps collect request logs that are never viewed or analyzed. If the logs are unlikely to be used, then it is better not to generate them. Note that there is a security audit aspect to collecting request logs that may require them to be generated even if seldom viewed.</li> <li>Is the extended log format required? The extra content of the extended log is only useful if detailed log analysis is being performed.</li> <li>Are all requests required to be logged? Images and style sheets often do not add any significant information to a request log. The ignorePaths attribute of the NCSARequestLog class can be used to exclude some paths from the log.</li> <li>Turn off buffering.</li> </ul> <p>The following request log configuration applies the points above.</p> <pre><Configure class="org.mortbay.jetty.Server">
  ...
  <Set name="RequestLog">
    <New class="org.mortbay.http.NCSARequestLog">
      <Set name="filename">./logs/yyyy_mm_dd.request.log</Set>
      <Set name="buffered">false</Set>
      <Set name="retainDays">90</Set>
      <Set name="append">true</Set>
      <Set name="extended">false</Set>
      <Set name="logTimeZone">GMT</Set>
      <Set name="ignorePaths">
        <Array type="String">
          <Item>/images/*</Item>
          <Item>*.css</Item>
        </Array>
      </Set>
    </New>
  </Set>
  ...
</Configure></pre> <h2>Application</h2> <p>The way a web application is written can greatly affect the efficiency of the service. The following points should be considered when writing or reviewing your web application:</p> <ul> <li>Do not flush the response output stream or writers. This can result in inefficient packet fragmentation.</li> <li>If possible, implement the HttpServlet.getLastModified() method so that content is only generated and served if the browser does not have an up-to-date cached copy of the page.</li> <li>If possible, set the content length of the content served. This allows simple persistent connections for both HTTP/1.0 and HTTP/1.1 clients.</li> </ul> </body> </html>