This document is intended for system administrators who want to limit the logging data they retain. As EFF noted in our Online Service Provider Best Data Practices paper, it is essential for an online service provider (OSP) to formulate a data retention policy appropriate to its needs. In many cases, operational requirements will not call for the retention of data beyond a few days or weeks; in other cases, there is no need to log certain information at all. For example, many web publishers could get all the statistical information they need without logging visitors' IP addresses. Since having more information about users than necessary can be a liability rather than an advantage, we recommend that publishers in this situation configure their web servers not to log IP addresses at all.

However, many system administrators don't know exactly what logs they have until they look into the question. Often, logging was enabled by default -- or by previous system administrators -- and so your systems may be keeping logs you never intended. A significant portion of popular server software defaults to logging all events or transactions and retaining those logs indefinitely. Few organizations have an operational requirement for this sort of logging, and logging defaults are unlikely to coincide exactly with a carefully considered logging policy for your organization.

Therefore, to enforce your logging and data retention policy after you have formulated it, you will probably need to use technical means to find and delete logs, as well as change software configurations or set up log-rotation scripts.

Some operating systems come with preinstalled log-rotation software. However, the log-rotation software provided by an operating system vendor is normally -- at best -- able to recognize and rotate logs created by vendor-provided software.
If you have installed third-party application software, or software you have written or compiled yourself, it may keep logs that escape the notice of log rotators entirely.

Here are some of the questions system administrators can ask themselves to ensure that their data retention policies are followed as faithfully as possible.

Does my operating system have a log rotation utility such as logrotate? Is the log rotation utility enabled and functioning? Does it run automatically at predetermined intervals?

Does the configuration of the log rotation software match my logging and data retention policy?

Do I have any third-party application software or user-developed software that keeps logs? If so, is the log rotation software aware of them?

Are there any logs that might exist in an unexpected place, such as a user's home directory? (For example, Unix sites that use procmail for e-mail delivery often have ~/.procmail/log files on a per-user basis, in parallel to and often redundant with systemwide e-mail log files. Similarly, a site with multiple virtually-hosted web sites may have separate site-by-site web transaction logging -- or logs from user-created CGI scripts -- within individual user home directories. These logs can be difficult to observe with a utility such as lsof, because they are usually not held open by the software that creates them, and may be updated relatively infrequently. Therefore, merely looking for open files or recently updated files may not unearth these sorts of logs.)

Do I have application software that logs into a relational database table, such as an Oracle or MySQL database? (For extremely large logs, or logs that are intended to be routinely machine-readable, logging into a database is more likely than logging into a text file.) If so, are the records in the table allowed to persist forever, or are they periodically purged?

Do I have applications that are configured to log over a network to a remote machine, using a facility such as syslog's loghost feature? (This is especially common in clusters and in centrally-administered networks.) If so, what is that machine doing with the log data it receives over the network?

Do I have logs in binary formats (such as Unix wtmp/utmp or the Windows registry) that might be difficult to recognize as logs on sight?

If my data retention policy calls for secure deletion of log files, is my log rotation software or other software that implements the policy using an appropriate secure deletion utility? (Files that are deleted but not overwritten might be recoverable in whole or in part. Some experts have also recommended overwriting files multiple times to reduce the chance that usable information might remain on magnetic media even after a single overwrite.)

We have created a program called logfinder as a sample means of locating files that might be logs on an existing system. logfinder uses regular expressions to find local files with "log-like" contents; you can customize those expressions if necessary to meet your needs. logfinder requires Python 2 or greater and finds logs in text files on a POSIX-like system. (It might also find some log-like data in binary files if the binary files represent that data in textual form.)

logfinder can, if the lsof program is installed and when run with appropriate privileges, detect open files systemwide that grow larger over time. It can also search for text that may indicate logging activity within a given directory hierarchy, or systemwide.

As we suggest above, a program like logfinder can find some, but not all, kinds of logging activity. For example, logfinder will generally not identify logs in binary (non-text) formats or logs kept inside databases. Therefore, using a program like logfinder is usually a supplement to, not a replacement for, answering questions like those given above.
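
As a rough illustration of the regular-expression approach described above, the following Python 3 sketch walks a directory tree and reports files whose first lines mostly match "log-like" patterns. It is not logfinder's own code. The three patterns, the 50% match threshold, the 200-line sample size, and the function names looks_log_like and find_log_like_files are all assumptions chosen for illustration; a real deployment would tune the expressions to the software actually in use.

```python
import itertools
import os
import re

# Hypothetical "log-like" patterns. These are illustrative assumptions,
# not the expressions that logfinder itself uses.
LOG_PATTERNS = [
    # IPv4-looking address, e.g. a client address in a web access log
    re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    # syslog-style timestamp, e.g. "Jan  5 14:03:22"
    re.compile(
        r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
        r"\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\b"
    ),
    # Common Log Format timestamp, e.g. "[10/Oct/2000:13:55:36 -0700]"
    re.compile(r"\[\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}\]"),
]


def looks_log_like(lines, threshold=0.5):
    """Return True if at least `threshold` of the non-empty lines match a pattern."""
    considered = 0
    matched = 0
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        considered += 1
        if any(pattern.search(line) for pattern in LOG_PATTERNS):
            matched += 1
    return considered > 0 and matched / considered >= threshold


def find_log_like_files(root, sample_lines=200):
    """Walk `root` and yield regular files whose first lines look log-like."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file
            # (FIFOs, sockets, device nodes).
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "r", encoding="utf-8", errors="replace") as handle:
                    sample = list(itertools.islice(handle, sample_lines))
            except OSError:
                continue  # unreadable file (for example, permission denied)
            if looks_log_like(sample):
                yield path
```

Calling list(find_log_like_files("/home")) would, under these assumptions, list candidate log files beneath /home, including ones that no process currently holds open. Like logfinder, a scan of this kind will miss binary-format logs and logs kept in databases.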

logfinder should be run as root. If logfinder is invoked without any arguments, it will examine open files systemwide to see whether they grow larger, and then indicate whether files that appear to be growing contain log-like text. (This requires lsof to be installed, and lsof's ability to report open files accurately may depend on your operating system. So far, we've had success with Linux and Mac OS X, and some difficulty with FreeBSD and OpenBSD.) If logfinder is given one or more directory names as arguments, it will search for log-like text in files in those directories.

For additional information for online service providers about their legal rights and obligations, and about formulating a data-retention policy, please consult EFF's OSP site at

http://www.eff.org/osp/

As a general resource on logging and data retention, we highly recommend the Log Analysis web site at

http://www.loganalysis.org/

Among the useful resources collected there are pages on logfile rotation (including scheduled log deletion and trimming)

http://www.loganalysis.org/sections/rotation-tools/index.html

and a set of general log analysis tools

http://www.loganalysis.org/sections/parsing/generic-log-parsers/index.html

While many of these tools are useful principally for retaining or analyzing data, rather than for discarding it, understanding your logs, knowing what you have, and knowing what can be done with it can help any system administrator in formulating and implementing logging policies.

EFF thanks Ben Laurie for helping us think about log recognition and writing a prototype log-searching program.

We welcome your comments or enhancements; you can send them to Seth Schoen <schoen@eff.org>.