Paul Khuong mostly on Lisp

Sun, 28 Aug 2011

Specify absolute deadlines, not relative timeouts, part 2

In the last post, I advocated the use and design of blocking interfaces based on deadlines rather than timeouts. I finally found the POSIX snippet that I was thinking of: it’s in the specification for pthread_cond_timedwait. The post also got me two very interesting comments.
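The sketch below shows in C why the deadline interface composes well. It assumes a hypothetical `struct mailbox` holding a boolean flag. pthread_cond_timedwait takes an absolute `struct timespec`, measured against CLOCK_REALTIME unless the condition variable was configured otherwise. A wait loop that must re-check its predicate after spurious wakeups can therefore pass the very same deadline on every iteration, with no remaining-time bookkeeping.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical shared state guarded by a mutex and a condition variable. */
struct mailbox {
    pthread_mutex_t lock;
    pthread_cond_t ready_cv;
    bool ready;
};

/* Wait until mb->ready is true or the absolute deadline passes.
 * Returns 0 if the predicate held, ETIMEDOUT, or another error number. */
int wait_until_ready(struct mailbox *mb, const struct timespec *deadline)
{
    int rc = 0;
    pthread_mutex_lock(&mb->lock);
    /* rc == 0 means "woken up", possibly spuriously: re-test the predicate
     * and wait again with the same absolute deadline. */
    while (!mb->ready && rc == 0)
        rc = pthread_cond_timedwait(&mb->ready_cv, &mb->lock, deadline);
    if (mb->ready)
        rc = 0;   /* the predicate may have become true right at the deadline */
    pthread_mutex_unlock(&mb->lock);
    return rc;
}
```

A caller computes the deadline once (for example, clock_gettime(CLOCK_REALTIME, ...) plus the desired delay). It can then hand that same value to several successive waits.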

One reader noted that they had had a lot more trouble understanding and fixing performance issues in a highly threaded program when they used deadlines instead of timeouts. I suppose that using deadlines based on something like wall-clock time can introduce yet another variable into an already hard-to-understand system. In the end, I still feel that's more than outweighed by the consistency with the rest of the world, especially given how easily timeouts can be built on top of deadlines.
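This C sketch shows how cheaply a timeout layers on top of a deadline. The helper names `timespec_add` and `deadline_after` are hypothetical. The recipe is: read the clock that the blocking call uses once, add the relative timeout, and pass the resulting absolute time along.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Add two normalized timespecs (tv_nsec in [0, 1e9)), carrying into tv_sec. */
struct timespec timespec_add(struct timespec a, struct timespec b)
{
    struct timespec sum = { .tv_sec = a.tv_sec + b.tv_sec,
                            .tv_nsec = a.tv_nsec + b.tv_nsec };
    if (sum.tv_nsec >= NSEC_PER_SEC) {
        sum.tv_sec += 1;
        sum.tv_nsec -= NSEC_PER_SEC;
    }
    return sum;
}

/* A relative timeout is just a deadline computed once, up front:
 * deadline = now + timeout, read from the clock the blocking call uses. */
struct timespec deadline_after(clockid_t clock, struct timespec timeout)
{
    struct timespec now;
    clock_gettime(clock, &now);
    return timespec_add(now, timeout);
}
```

The reverse direction is the awkward one. Emulating a deadline with a timeout-based call means re-reading the clock and recomputing the timeout before every call. Anything that runs between the clock read and the call, such as a signal handler, silently pushes the wake-up later.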

Another raised an important issue: which time should our deadlines be based on? The regular system clock (e.g. UTC time) is affected by many events: leap seconds, NTP adjustments, the user changing the time to fool Farmville, and so on. In fact, it's not even guaranteed to be monotonic: the system time can be moved backward. (Bonus points if you can determine how hardware suspend should be detected and/or handled.)

Of course, timeouts share all the issues of non-monotonic clocks, but I suppose most of us don't think about it. POSIX usually specifies that timeouts must ignore time adjustments (or at least backward ones), and Linux/glibc seem to mostly achieve that by using a monotonic clock. It's not clear how a portable program can achieve the same effect, though.
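POSIX offers one partial, non-portable answer for condition variables: pthread_condattr_setclock. It makes pthread_cond_timedwait interpret its absolute deadline against CLOCK_MONOTONIC instead of the default CLOCK_REALTIME. POSIX specifies it, but not every platform implements it: Linux/glibc does, and OS X does not. The C sketch below uses `init_monotonic_cond`, a hypothetical helper name. Deadlines passed to such a condition variable must then come from clock_gettime(CLOCK_MONOTONIC, ...).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Initialize `cv` so that pthread_cond_timedwait measures its absolute
 * deadline on CLOCK_MONOTONIC, which system time changes can't move backward.
 * Returns 0 on success or an error number (e.g. if the clock is unsupported). */
int init_monotonic_cond(pthread_cond_t *cv)
{
    pthread_condattr_t attr;
    int rc = pthread_condattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
    if (rc == 0)
        rc = pthread_cond_init(cv, &attr);
    pthread_condattr_destroy(&attr);   /* cv keeps its own copy of the setting */
    return rc;
}
```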

This blog post has an interesting overview of the issue, and a list of buggy programs and APIs that misuse wall-clock time.

The realtime POSIX extension includes a partial solution to this non-monotonic clock issue: clock_gettime, with CLOCK_MONOTONIC, gives access to time values that never go backward. Unfortunately, that's not always available (in particular, it's absent on Solaris, OS X and Windows). The author of the previously mentioned blog post also has a tiny portability wrapper that provides such time values on POSIX platforms with CLOCK_MONOTONIC, as well as on Solaris and OS X. Still, platforms without clock_gettime don't necessarily expose blocking calls with deadlines based on a monotonic clock either (e.g. OS X has none for its Mach semaphores), so having a sane clock alone isn't that useful.
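Where clock_gettime and CLOCK_MONOTONIC do exist, checking a deadline against the monotonic clock is short. The C sketch below uses hypothetical helpers `timespec_cmp` and `monotonic_deadline_passed`. CLOCK_MONOTONIC's starting point is unspecified, so only comparisons and differences between its readings are meaningful. A deadline taken from it must never be mixed with wall-clock values.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdbool.h>
#include <time.h>

/* Three-way comparison of two normalized timespecs: -1, 0 or 1. */
int timespec_cmp(const struct timespec *a, const struct timespec *b)
{
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec < b->tv_sec ? -1 : 1;
    if (a->tv_nsec != b->tv_nsec)
        return a->tv_nsec < b->tv_nsec ? -1 : 1;
    return 0;
}

/* True once CLOCK_MONOTONIC has reached `deadline`. This clock is not
 * affected by jumps in the system time (such as someone setting the clock
 * back), so the check can't be fooled into waiting longer than intended. */
bool monotonic_deadline_passed(const struct timespec *deadline)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return timespec_cmp(&now, deadline) >= 0;
}
```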

This isn't only a theoretical or highly improbable issue either; SBCL has had a bug caused by timeouts for quite a while. The internals include this function, which resumes nanosleep whenever it is interrupted by a signal handler.

(defun nanosleep (secs nsecs)
  (with-alien ((req (struct timespec))
               (rem (struct timespec)))
    (setf (slot req 'tv-sec) secs
          (slot req 'tv-nsec) nsecs)
    ;; Retry for as long as the syscall fails with EINTR, i.e. whenever
    ;; a signal handler interrupted the sleep.
    (loop while (and (eql sb!unix:eintr
                          (nth-value 1
                                     (int-syscall ("nanosleep" (* (struct timespec))
                                                               (* (struct timespec)))
                                                  (addr req) (addr rem))))
                     ;; KLUDGE: [...]
                     #!+darwin
                     (let ((rem-sec (slot rem 'tv-sec))
                           (rem-nsec (slot rem 'tv-nsec)))
                       (when (or (> secs rem-sec)
                                 (and (= secs rem-sec) (>= nsecs rem-nsec)))
                         ;; Update for next round.
                         (setf secs  rem-sec
                               nsecs rem-nsec)
                         t)))
          ;; Sleep again for the "remaining" time that nanosleep wrote to REM.
          do (setf (slot req 'tv-sec) (slot rem 'tv-sec)
                   (slot req 'tv-nsec) (slot rem 'tv-nsec)))))

On OS X, when nanosleep is interrupted by a signal, the second argument (rem) is filled with the time remaining in the timeout, computed once the signal handler returns. Of course, that leads to an interesting situation when the time to subtract exceeds the requested sleep (e.g. a signal handler consumes two seconds before returning to a 1-second nanosleep): the “remaining” timeout underflows into a very long timeout.

Other platforms only subtract the time elapsed between the call to nanosleep and the signal's arrival. There's never any underflow in the “remaining” timeout, at least, but that value, while always sane, is still pretty much useless: the time spent in signal handlers is never accounted for. If the loop is executed 5 times (i.e. nanosleep is interrupted 5 times), and each signal takes 1 second to handle, the function returns 5 seconds late.
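A small deterministic simulation can check this arithmetic. The sketch makes several assumptions, all hypothetical. It uses a fake clock in whole milliseconds and a 10-second sleep. A signal arrives after each full second of sleep, its handler runs for 1 s, and at most 5 signals arrive. It also assumes the non-OS X semantics just described, where rem reports the requested time minus the time slept before the signal arrived. The code compares restarting with rem against recomputing the remaining time from a fixed deadline.

```c
/* Hypothetical scenario; all times are in milliseconds on a fake clock. */
enum { SLEEP_MS = 10000, SIGNAL_EVERY_MS = 1000, HANDLER_MS = 1000, MAX_SIGNALS = 5 };

/* Restart with the "remaining" time reported by nanosleep: time spent in
 * handlers is never subtracted, so each interruption adds lateness.
 * Returns the simulated wake-up time. */
long sleep_with_rem(void)
{
    long now = 0, remaining = SLEEP_MS;
    for (int n = 0; n < MAX_SIGNALS && remaining > SIGNAL_EVERY_MS; n++) {
        now += SIGNAL_EVERY_MS;        /* slept until the signal arrived */
        remaining -= SIGNAL_EVERY_MS;  /* what rem reports */
        now += HANDLER_MS;             /* handler runs, unaccounted for */
    }
    return now + remaining;            /* final, uninterrupted sleep */
}

/* Recompute the remaining time from an absolute deadline after each
 * interruption: handler time is charged against the same deadline. */
long sleep_until_deadline(void)
{
    const long deadline = SLEEP_MS;
    long now = 0;
    for (int n = 0; n < MAX_SIGNALS && deadline - now > SIGNAL_EVERY_MS; n++) {
        now += SIGNAL_EVERY_MS;        /* slept until the signal arrived */
        now += HANDLER_MS;             /* handler runs */
    }
    return now < deadline ? deadline : now;  /* sleep out whatever is left */
}
```

Under these assumptions, the rem-based loop wakes at 15000 ms, 5 seconds late. The deadline-based loop wakes at 10000 ms.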

So, outside OS X, the “remaining” timeout computed by nanosleep is subtly useless. On OS X, it's merely more subtly useless: we have the same problem when a signal hits us between two calls to nanosleep, since the time spent handling it is never subtracted.

POSIX recommends clock_nanosleep when the interruption issue above matters. It accepts an absolute deadline (with the TIMER_ABSTIME flag) rather than a timeout, and it also lets us specify which clock the deadline is based on. As usual, that's not available everywhere, so we'll likely be stuck with a hard-to-trigger race condition in SLEEP on some platforms.
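On platforms that do provide clock_nanosleep, the fix for the loop above is short. This C sketch assumes clock_nanosleep and CLOCK_MONOTONIC are both available. `sleep_for_monotonic` is a hypothetical name, and its (secs, nsecs) arguments mirror the SBCL nanosleep function. The deadline is computed once, and an interrupted sleep simply resumes toward that same deadline, so no handler time accumulates.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Sleep for secs + nsecs (nsecs in [0, 1e9)), measured on CLOCK_MONOTONIC.
 * Returns 0 on success or an error number. */
int sleep_for_monotonic(time_t secs, long nsecs)
{
    struct timespec deadline;
    if (clock_gettime(CLOCK_MONOTONIC, &deadline) != 0)
        return errno;
    deadline.tv_sec += secs;
    deadline.tv_nsec += nsecs;
    if (deadline.tv_nsec >= NSEC_PER_SEC) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= NSEC_PER_SEC;
    }
    int rc;
    /* clock_nanosleep returns the error number directly (it doesn't set
     * errno); with TIMER_ABSTIME the remaining-time argument is unused. */
    do
        rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL);
    while (rc == EINTR);
    return rc;
}
```

A handler that runs past the deadline still delays the return by its own overrun. The next clock_nanosleep call then returns immediately, because the deadline has already passed.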

posted at: 01:30 | /Coding

Contact me by email: pvk@pvk.ca.