Quixote Session Management

HTTP was originally designed as a stateless protocol, meaning that every request for a document or image was conducted in a separate TCP connection, and that there was no way for a web server to tell if two separate requests actually came from the same user. It's no longer necessarily true that every request is conducted in a separate TCP connection, but HTTP is still fundamentally stateless. However, there are many applications where it is desirable or even essential to establish a "session" for each user, ie. where all requests performed by that user are somehow tied together on the server.

HTTP cookies were invented to address this requirement, and they are still the best solution for establishing sessions on top of HTTP. Thus, Quixote's session management mechanism is cookie-based. (The most common alternative is to generate long, complicated URLs with an embedded session identifier. Since Quixote views the URL as a fundamental part of the web user interface, a URL-based session management scheme would be un-Quixotic.)

For further reading: the standard for cookies that is approximately implemented by most current browsers is RFC 2109; the latest version of the standard is RFC 2965. Those RFCs can be found here:

ftp://ftp.isi.edu/in-notes/rfc2109.txt

ftp://ftp.isi.edu/in-notes/rfc2965.txt

In a nutshell, session management with Quixote works like this:

when a user-agent first requests a page from a Quixote application that implements session management, Quixote creates a Session object and generates a session ID (a random 64-bit number). The Session object is attached to the current HTTPRequest object, so that application code involved in processing this request has access to the Session object.
if, at the end of processing that request, the application code has stored any information in the Session object, Quixote saves the session in its SessionManager object for use by future requests and sends a session cookie, called QX_session by default, to the user. The session cookie contains the session ID encoded as a hexadecimal string, and is included in the response headers, eg.
```
Set-Cookie: QX_session="928F82A9B8FA92FD"
```
(You can instruct Quixote to specify the domain and path for URLs to which this cookie should be sent.)
the user agent stores this cookie for future requests
the next time the user agent requests a resource that matches the cookie's domain and path, it includes the QX_session cookie previously generated by Quixote in the request headers, eg.:
```
Cookie: QX_session="928F82A9B8FA92FD"
```
while processing the request, Quixote decodes the session ID and looks up the corresponding Session object in its SessionManager. If there is no such session, the session cookie is bogus or out-of-date, so Quixote raises SessionError; ultimately the user gets an error page. Otherwise, the Session object is attached to the HTTPRequest object that is available to all application code used to process the request.
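
The steps above can be sketched in a few lines of Python. This is only an illustration of the flow, not Quixote's code: ToySession, ToySessionManager, and their method names are invented for the sketch. Only the cookie name, the 64-bit hexadecimal ID, and SessionError come from the description above.

```python
import secrets


class SessionError(Exception):
    """Raised when a session cookie names a session we don't know about."""


class ToySession:
    """Hypothetical stand-in for a session object; not Quixote's Session."""

    def __init__(self, session_id):
        self.id = session_id
        self.user = None

    def has_info(self):
        # Worth saving only once something has been stored in it.
        return self.user is not None


class ToySessionManager:
    """Hypothetical stand-in for SessionManager, backed by a plain dict."""

    def __init__(self):
        self.sessions = {}

    def new_session(self):
        # A random 64-bit number, encoded as 16 upper-case hex digits.
        return ToySession("%016X" % secrets.randbits(64))

    def get_session(self, cookie_value):
        # No cookie: first visit, so create a fresh (unsaved) session.
        if cookie_value is None:
            return self.new_session()
        try:
            return self.sessions[cookie_value]
        except KeyError:
            # Bogus or out-of-date cookie.
            raise SessionError("unknown session ID %r" % cookie_value) from None

    def finish_request(self, session):
        # Save the session and return a cookie value only if it holds data.
        if session.has_info():
            self.sessions[session.id] = session
            return 'QX_session="%s"' % session.id
        return None
```

Note that a session with nothing stored in it produces no cookie at all, which is why a first-time visitor who never logs in leaves nothing behind on the server.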

There are two caveats to keep in mind before proceeding, one major and one minor:

Quixote's standard Session and SessionManager classes do not implement any sort of persistence, meaning that all sessions disappear when the process handling web requests terminates. Thus, session management is completely useless with a plain CGI driver script unless you add some persistence to the mix; see "Session persistence" below for information.
Quixote never expires sessions; if you want user sessions to be cleaned up after a period of inactivity, you will have to write code to do it yourself.
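
For the second caveat, here is a minimal sketch of what such clean-up code might look like. It assumes that your session objects carry a last_access timestamp that you update on every request; that attribute, the TimedSession class, and the expire_idle_sessions() function are all hypothetical names, not part of Quixote.

```python
import time


class TimedSession:
    """Hypothetical session that remembers when it was last used."""

    def __init__(self, last_access=None):
        self.last_access = time.time() if last_access is None else last_access

    def touch(self):
        # Call this at the start of every request.
        self.last_access = time.time()


def expire_idle_sessions(sessions, max_idle_seconds, now=None):
    """Delete sessions idle for longer than max_idle_seconds.

    'sessions' is any mapping of session ID to session object.
    Returns the list of session IDs that were removed.
    """
    if now is None:
        now = time.time()
    # Collect the stale IDs first: deleting while iterating over a
    # dictionary is an error.
    stale = [sid for sid, session in sessions.items()
             if now - session.last_access > max_idle_seconds]
    for sid in stale:
        del sessions[sid]
    return stale
```

You would run something like this periodically, eg. from a cron job or every N requests, against whatever mapping holds your sessions.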

Session management demo

There's a simple demo of Quixote's session management in demo/session_demo.cgi and demo/session.ptl. The demo implements a simple session persistence scheme (each session is written to a separate pickle file in /tmp/quixote-session-demo), so running it through CGI is just fine.

I'll assume that you've added a rewrite rule so that requests for /qsdemo/ are handled by session_demo.cgi, similar to the rewriting for /qdemo/ described in web-server.txt. Once that's done, point your browser at

http://<hostname>/qsdemo/

and play around.

This particular application uses sessions to keep track of just two things: the user's identity and the number of requests made in this session. The first is addressed by Quixote's standard Session class -- every Session object has a user attribute, which you can use for anything you like. In the session demo, we simply store a string, the user's name, which is entered by the user.

Tracking the number of requests is a bit more interesting. Here is the relevant code from the DemoSession class in session_demo.cgi:

def __init__ (self, request, id):
    Session.__init__(self, request, id)
    self.num_requests = 0

def start_request (self, request):
    Session.start_request(self, request)
    self.num_requests += 1

When the session is created, we initialize the request counter; and when we start processing each request, we increment it.

Using the session information in the application code is simple. For example, here's the PTL code that checks if the user has logged in (identified herself) yet, and generates a login form if not:

session = request.session
if session.user is None:
    '''
    <p>You haven\'t introduced yourself yet.<br>
    Please tell me your name:
    '''
    login_form()

(The login_form() template just emits a simple HTML form -- see demo/session.ptl for full source.)

If the user has already identified herself, then she doesn't need to do so again -- so the other branch of that if statement simply prints a friendly greeting:

else:
    ('<p>Hello, %s.  Good to see you again.</p>\n' 
     % html_quote(session.user))

Note that we must quote the user's name, because they are free to enter anything they please, including special HTML characters like & or <.
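
To see what quoting buys you, here is a small stand-alone sketch. It uses html.escape() from the Python standard library as a stand-in for Quixote's html_quote(); the greeting() function is a hypothetical helper, and the two quoting functions may differ in detail.

```python
import html


def greeting(user):
    # html.escape() is a standard-library stand-in for html_quote().
    return "<p>Hello, %s.  Good to see you again.</p>\n" % html.escape(user)


# A "name" containing markup comes out as harmless text:
# greeting("<b>Tom & Jerry</b>") returns
# "<p>Hello, &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;.  Good to see you again.</p>\n"
```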

Of course, session.user will never be set if we don't set it ourselves. The code that processes the login form is just this (from login() in demo/session.ptl):

if request.form:
    user = request.form.get("name")
    if not user:
        raise QueryError("no user name supplied")

    session.user = user

This is obviously a very simple application -- we're not doing any verification of the user's input. We have no user database, no passwords, and no limitations on what constitutes a "user name". A real application would have all of these, as well as a way for users to add themselves to the user database -- ie. register with your web site.

Configuring the session cookie

Quixote allows you to configure several aspects of the session cookie that it exchanges with clients. First, you can set the name of the cookie; this is important if you have multiple independent Quixote applications running on the same server. For example, the config file for the first application might have

SESSION_COOKIE_NAME = "foo_session"

and the second application might have

SESSION_COOKIE_NAME = "bar_session"

Next, you can use SESSION_COOKIE_DOMAIN and SESSION_COOKIE_PATH to set the cookie attributes that control which requests the cookie is included with. By default, these are both None, which instructs Quixote to send the cookie without Domain or Path qualifiers. For example, if the client requests /foo/bar/ from www.example.com, and Quixote decides that it must set the session cookie in the response to that request, then the server would send

Set-Cookie: QX_session="928F82A9B8FA92FD"

in the response headers. Since neither a domain nor a path was specified with that cookie, the browser will only include the cookie with requests to www.example.com for URIs that start with /foo/bar/.

If you want to ensure that your session cookie is included with all requests to www.example.com, you should set SESSION_COOKIE_PATH in your config file:

SESSION_COOKIE_PATH = "/"

which will cause Quixote to set the cookie like this:

Set-Cookie: QX_session="928F82A9B8FA92FD"; Path="/"

which will instruct the browser to include that cookie with all requests to www.example.com.

However, think carefully about what you set SESSION_COOKIE_PATH to -- eg. if you set it to "/", but all of your Quixote code is under "/q/" in your server's URL-space, then your users' session cookies could be unnecessarily exposed. On shared servers where you don't control all of the code, this is especially dangerous; be sure to use (eg.)

SESSION_COOKIE_PATH = "/q/"

on such servers. The trailing slash is important; without it, your session cookies will be sent to URIs like /qux and /qix, even if you don't control those URIs.
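
The effect of the trailing slash is easy to check with a sketch of the matching rule. This assumes the plain prefix comparison that the paragraph above describes; cookie_path_matches() is a hypothetical helper, and real browsers may apply stricter rules.

```python
def cookie_path_matches(cookie_path, request_path):
    """Plain prefix rule: send the cookie if the request path starts
    with the cookie's Path value."""
    return request_path.startswith(cookie_path)


# Without the trailing slash, unrelated URIs match too:
#   cookie_path_matches("/q", "/qux")       -> True
#   cookie_path_matches("/q/", "/qux")      -> False
#   cookie_path_matches("/q/", "/q/login")  -> True
```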

If you want to share the cookie across servers in your domain, eg. www1.example.com and www2.example.com, you'll also need to set SESSION_COOKIE_DOMAIN:

SESSION_COOKIE_DOMAIN = ".example.com"
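
To get a feel for how the Path and Domain qualifiers end up in the response header, here is a sketch that builds a comparable header with the standard http.cookies module. It does not use Quixote at all; session_cookie_header() is a hypothetical helper, and the exact formatting (quoting, attribute order) may differ from what Quixote sends.

```python
from http.cookies import SimpleCookie


def session_cookie_header(session_id, name="QX_session", path=None, domain=None):
    """Build a Set-Cookie header line, adding Path and Domain only if given."""
    cookie = SimpleCookie()
    cookie[name] = session_id
    if path is not None:
        cookie[name]["path"] = path
    if domain is not None:
        cookie[name]["domain"] = domain
    return cookie.output()


# With both qualifiers set, the header carries Path=/q/ and
# Domain=.example.com after the session ID.
header = session_cookie_header("928F82A9B8FA92FD", path="/q/",
                               domain=".example.com")
```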

Finally, note that the SESSION_COOKIE_* configuration variables only affect Quixote's session cookie; if you set your own cookies using the HTTPResponse.set_cookie() method, then the cookie sent to the client is completely determined by that set_cookie() call.

See RFCs 2109 and 2965 for more information on the rules browsers are supposed to follow for including cookies with HTTP requests.

Writing the session class

You will almost certainly have to write a custom session class for your application by subclassing Quixote's standard Session class. Every custom session class has two essential responsibilities:

initialize the attributes that will be used by your application
override the has_info() method, so the session manager knows when it must save your session object

The first one is fairly obvious and just good practice. The second is essential, and not at all obvious. The has_info() method exists because SessionManager does not automatically hang on to all session objects; this is a defence against clients that ignore cookies, making your session manager create lots of session objects that are just used once. As long as those session objects are not saved, the burden imposed by these clients is not too bad -- at least they aren't sucking up your memory, or bogging down the database that you save session data to. Thus, the session manager uses has_info() to know if it should hang on to a session object or not: if a session has information that must be saved, the session manager saves it and sends a session cookie to the client.

For development/testing work, it's fine to say that your session objects should always be saved:

def has_info (self):
    return 1

The opposite extreme is to forget to override has_info() altogether, in which case session management most likely won't work: unless you tickle the Session object such that the base has_info() method returns true, the session manager won't save the sessions that it creates, and Quixote will never drop a session cookie on the client.

In a real application, you need to think carefully about what data to store in your sessions, and how has_info() should react to the presence of that data. If you try to track something about every single visitor to your site, sooner or later one of those broken/malicious clients that ignores cookies and robots.txt will come along and crawl your entire site, wreaking havoc on your Quixote application (or the database underlying it).
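
As an illustration, here is a sketch of a session class that is saved only when it holds something worth keeping. The Session class below is a minimal stand-in so the example runs on its own; its has_info() is an assumption about the base behaviour, not a copy of Quixote's. ShopSession and its cart and pages_seen attributes are hypothetical.

```python
class Session:
    """Minimal stand-in for Quixote's Session base class (assumed behaviour)."""

    def __init__(self, request, id):
        self.id = id
        self.user = None

    def has_info(self):
        return self.user is not None


class ShopSession(Session):
    """Hypothetical application session."""

    def __init__(self, request, id):
        Session.__init__(self, request, id)
        self.cart = []        # worth saving as soon as it is non-empty
        self.pages_seen = 0   # tracked, but never a reason to save

    def has_info(self):
        # Anonymous visitors who only browse leave nothing behind.
        return Session.has_info(self) or bool(self.cart)
```

The page counter is deliberately ignored by has_info(): a crawler that hits a thousand pages without logging in or filling a cart never causes a session to be stored.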

Session persistence

Keeping session data across requests is all very nice, but in the real world you want that data to survive across process termination. With CGI, this is essential, since each process serves exactly one request and then terminates. With other execution mechanisms, though, it's still important -- you don't want to lose all your session data just because your long-lived server process was restarted, or your server machine was rebooted.

However, every application is different, so Quixote doesn't provide any built-in mechanism for session persistence. Instead, it provides a number of hooks, most in the SessionManager class, that let you plug in your preferred persistence mechanism.

The first and most important hook is in the SessionManager constructor: you can provide an alternate mapping object that SessionManager will use to store session objects in. By default, SessionManager uses an ordinary dictionary; if you provide a mapping object that implements persistence, then your session data will automatically persist across processes. For example, you might use the standard 'shelve' module, which provides a mapping object on top of a DBM or Berkeley DB file:

import shelve
sessions = shelve.open("/tmp/quixote-sessions")
session_mgr = SessionManager(session_mapping=sessions)

For a persistent mapping implementation that doesn't require any external libraries, see the DirMapping class in demo/session_demo.cgi.
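
To give a rough idea of what such a mapping involves, here is a sketch of a directory-backed mapping that writes each value to its own pickle file. It is not the demo's DirMapping; PickleDirMapping is a hypothetical class, it assumes keys are safe to use as file names (as hexadecimal session IDs are), and it implements only a handful of mapping methods, which may be fewer than SessionManager actually needs.

```python
import os
import pickle


class PickleDirMapping:
    """Hypothetical mapping that stores each value in <directory>/<key>."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Assumes the key is a safe file name, eg. a hex session ID.
        return os.path.join(self.directory, key)

    def __contains__(self, key):
        return os.path.exists(self._path(key))

    def __getitem__(self, key):
        try:
            with open(self._path(key), "rb") as f:
                return pickle.load(f)
        except FileNotFoundError:
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        with open(self._path(key), "wb") as f:
            pickle.dump(value, f)

    def __delitem__(self, key):
        try:
            os.remove(self._path(key))
        except FileNotFoundError:
            raise KeyError(key) from None
```

Because every lookup unpickles a fresh copy, changes made to an object after it was stored are not written back until you assign it to the mapping again.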

If you use one of these relatively simple persistent mapping types, you'll also need to override is_dirty() in your Session class. That's in addition to overriding has_info(), which determines if a session object is ever saved; is_dirty() is only called on sessions that have already been added to the session mapping, to see if they need to be "re-added". The default implementation always returns false, because once an object has been added to a normal dictionary, there's no need to add it again. However, with simple persistent mapping types like shelve and DirMapping, you need to store the object again each time it changes. Thus, is_dirty() should return true if the session object needs to be re-written. For a simple, naive, but inefficient implementation, making is_dirty() an alias for has_info() will work -- that just means that after the session has been written for the first time, it will be re-written on every request. (This is what DemoSession in demo/session_demo.cgi does.)
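
A less wasteful approach is to remember whether anything has changed since the last write. The sketch below shows one way to do that. The Session class is a minimal stand-in with assumed behaviour, and DirtyTrackingSession, set_user(), mark_clean(), and maintain_session() are hypothetical names; in particular, maintain_session() only imitates the save decision described above and is not Quixote's code.

```python
class Session:
    """Minimal stand-in for Quixote's Session base class (assumed behaviour)."""

    def __init__(self, request, id):
        self.id = id
        self.user = None

    def has_info(self):
        return self.user is not None

    def is_dirty(self):
        return False


class DirtyTrackingSession(Session):
    """Hypothetical session that records when it has unsaved changes."""

    def __init__(self, request, id):
        Session.__init__(self, request, id)
        self._dirty = False

    def set_user(self, user):
        if user != self.user:
            self.user = user
            self._dirty = True

    def is_dirty(self):
        return self._dirty

    def mark_clean(self):
        self._dirty = False


def maintain_session(mapping, session):
    """Imitate the save decision: add new sessions that have info,
    re-write stored sessions that are dirty, otherwise do nothing."""
    if session.id not in mapping:
        if session.has_info():
            session.mark_clean()          # store the clean state
            mapping[session.id] = session
            return "added"
    elif session.is_dirty():
        session.mark_clean()
        mapping[session.id] = session
        return "rewritten"
    return "unchanged"
```

The cost of this design is discipline: every change to the session has to go through a method that sets the flag, or the change will silently fail to reach the persistent store.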

The third and final part of the persistence interface only applies if you are using a transactional persistence mechanism, such as ZODB or an industrial-strength relational database. In that case, you need a place to commit or abort the transaction that contains pending changes to the current session. SessionManager provides two methods for you to override: abort_changes() and commit_changes(). abort_changes() is called by SessionPublisher whenever a request crashes, ie. whenever your application raises an exception other than PublishError. commit_changes() is called for requests that complete successfully, or that raise a PublishError exception. They are defined as follows:

def abort_changes (self, session):
    """abort_changes(session : Session)"""

def commit_changes (self, session):
    """commit_changes(session : Session)"""

Obviously, you'll have to write your own SessionManager subclass if you need to take advantage of these hooks for transactional session persistence.
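
As a rough sketch of what such a subclass might look like, here is one built on the standard sqlite3 module. The SessionManager class below is a do-nothing stand-in so that the example runs on its own; SQLiteSessionManager, its table layout, and the store_user() and lookup_user() methods are hypothetical. Only the names and signatures of abort_changes() and commit_changes() come from the text above.

```python
import sqlite3


class SessionManager:
    """Stand-in for Quixote's SessionManager: both hooks do nothing."""

    def abort_changes(self, session):
        pass

    def commit_changes(self, session):
        pass


class SQLiteSessionManager(SessionManager):
    """Hypothetical subclass that keeps session data in an SQLite table."""

    def __init__(self, connection):
        self.connection = connection
        connection.execute(
            "CREATE TABLE IF NOT EXISTS sessions "
            "(id TEXT PRIMARY KEY, user_name TEXT)")
        connection.commit()

    def store_user(self, session_id, user_name):
        # The change stays pending until commit_changes() or abort_changes().
        self.connection.execute(
            "INSERT OR REPLACE INTO sessions (id, user_name) VALUES (?, ?)",
            (session_id, user_name))

    def lookup_user(self, session_id):
        row = self.connection.execute(
            "SELECT user_name FROM sessions WHERE id = ?",
            (session_id,)).fetchone()
        return row[0] if row else None

    def abort_changes(self, session):
        # The request crashed: throw away whatever it wrote.
        self.connection.rollback()

    def commit_changes(self, session):
        # The request finished normally: make its changes permanent.
        self.connection.commit()
```

With this arrangement, a request that crashes halfway through updating a session leaves the stored data exactly as it was before the request started.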

$Id: session-mgmt.txt,v 1.7 2002/10/02 15:27:26 gward Exp $