Paul Khuong mostly on Lisp

Sun, 22 Mar 2009

Hacking SSE Intrinsics in SBCL (part 1)

I think I managed to add sane support for SSE values (octwords) and operations to SBCL this weekend (all that because of a discussion on PRNGs, which led to remembering that SFMT wasn't too hard to implement, which led to SSE intrinsics). It wasn't too hard, but the path could clearly have been better documented. The interesting part is the creation of a new primitive data type in the compiler; once that is done, we only have to define new VOPs. Note that since I'm only targeting x86-64, alignment isn't an issue: objects are aligned to 128 bits by default.

The compiler has to be informed of the new data type at two levels:

The front- and middle-end, to create a new type (as in CL:TYPEP), and to define how to map from that type to the primitive type (which is more concerned with representation than semantics) used in the back-end;
The back-end, which has to be informed about the existence of a new primitive type, and must also be modified to correctly map that primitive type to machine registers or stack locations, and to know how to move from one to the other or how to box such values in a GC-friendly object on the heap.

Obviously, the runtime also had to be modified to be able to GC the new type of objects.

I believe it makes more sense to do this by starting low-level, in the back-end, and then building things up to the front-end, so I'll try and explain it that way.

Hacking the machine definition

SBCL's backend has a sort of type system at the VOP / TN level (virtual operations/registers). Two elements of type information are associated with each TN: the primitive type and the storage class. As its name implies, the primitive type is a low-level, C-style type (e.g. SIMPLE-ARRAY-UNSIGNED-BYTE-64 or DOUBLE-FLOAT), which is almost entirely concerned with representation. Apart from bad puns, there is no subtyping; COMPLEX and COMPLEX-DOUBLE-FLOAT are disjoint types. However, that still leaves some leeway to the back-end. A DOUBLE-FLOAT may be stored in an FP register, on the stack, or as a boxed value. Thus, TNs are also assigned a storage class before generating code.

The first step was to define new storage classes for SSE values: sse-reg and sse-stack for SSE values in XMM registers and on the stack, respectively. That's done in src/compiler/x86-64/vm.lisp, !define-storage-classes:

  (sse-stack stack :element-size 2 :alignment 2)

[...]

  (sse-reg float-registers
           :locations #.(loop for i from 0 below 15 collect i)
           :constant-scs ()
           :save-p t
           :alternate-scs (sse-stack))

The first form defines a new storage class (SC) that's stored on the stack (the storage base, SB), where each element takes two locations (in this case, 64-bit words) and requires an alignment of two locations. The second form defines a new storage class that uses the float-registers storage base, and may take any of the first 15 locations in that SB (xmm15 is reserved). Values in that SC must be saved to the stack when needed. The sse-stack SC is to be used when there aren't enough registers or when saving registers (e.g. for a call). Some more modifications were needed in the assembler to make it aware of the new SCs, but that's a mostly orthogonal concern.

Adding a new primitive type

That's enough to define a new primitive type in src/compiler/generic/primtype.lisp:

(!def-primitive-type sse-value (sse-reg descriptor-reg))

sse-value can be stored in sse-reg or descriptor-reg (as boxed values), or in any of their alternate SCs.

Defining a new kind of primitive object

I've defined a new primitive type that can be stored as a boxed value. However, I haven't yet defined how that boxed value should be represented. I allocated one of the unused widetags to sse-values in src/compiler/generic/early-objdef.lisp:

  #!-x86-64
  unused02
  #!+x86-64
  sse-value                         ; 01100010

Allocating a widetag isn't strictly necessary, but without one I would have to define my own typechecking VOPs, adapt the GC somehow, etc. The new entry defines a new constant, SB!VM::SSE-VALUE-WIDETAG, which I use in the primitive object definition (src/compiler/generic/objdef.lisp):

(define-primitive-object (sse-value
                          :lowtag other-pointer-lowtag
                          :widetag sse-value-widetag)
  (filler) ; preserve the natural 128 bit alignment
  (lo-value :c-type "long" :type (unsigned-byte 64))
  (hi-value :c-type "long" :type (unsigned-byte 64)))
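For readers who think in C, here is a rough sketch of the memory layout that define-primitive-object form should produce. The struct name and the sketch_ helpers are mine, not anything SBCL generates, and counting the header as word 0 is an assumption based on how other widetagged primitive objects are laid out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_N_WORD_BYTES 8 /* x86-64: one word is 64 bits */

/* Hypothetical C view of a boxed sse-value; the slot order follows
   the define-primitive-object form. */
struct sse_value_sketch {
    uint64_t header;   /* widetag and size */
    uint64_t filler;   /* pushes the payload to a 16-byte offset */
    uint64_t lo_value; /* low 64 bits of the octword */
    uint64_t hi_value; /* high 64 bits of the octword */
};

/* Word index of the slot that starts at the given byte offset. */
size_t sketch_slot_index(size_t byte_offset)
{
    return byte_offset / SKETCH_N_WORD_BYTES;
}

/* Total object size in words, header included. */
size_t sketch_sse_value_size(void)
{
    return sizeof(struct sse_value_sketch) / SKETCH_N_WORD_BYTES;
}

/* The two properties the layout is designed to have. */
void sketch_check_layout(void)
{
    /* The 128-bit payload starts at a multiple of 16 bytes... */
    assert(offsetof(struct sse_value_sketch, lo_value) % 16 == 0);
    /* ...and the whole object is four words long. */
    assert(sketch_sse_value_size() == 4);
}
```

Without the filler slot, the payload would start 8 bytes into the object and could never be 16-byte aligned when the object itself is.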

The macro will also define useful constants, e.g. SSE-VALUE-SIZE, as well as slot offsets for the low and high values. That information will be needed by the GC. Genesis only exports CL constants to C when the symbols are external to the package. Thus, I had to add some symbols to the export list for SB!VM in ./package-data-list.lisp-expr:

               #!+x86-64 "SSE-VALUE"
               #!+x86-64 "SSE-VALUE-P" ; will be defined later on
               #!+x86-64 "SSE-VALUE-HI-VALUE-SLOT"
               #!+x86-64 "SSE-VALUE-LO-VALUE-SLOT"
               #!+x86-64 "SSE-VALUE-SIZE"
               #!+x86-64 "SSE-VALUE-WIDETAG"

and similarly for SB!KERNEL:

               #!+x86-64
               "OBJECT-NOT-SSE-VALUE-ERROR" ; so will this

Adapting the GC

I chose a very simple representation for sse-values: the header word will contain the right widetag, obviously, and the rest will encode the size of the object (in words). That's a common scheme in SBCL, and well supported by generic code everywhere. The garbage collector (a copying Cheney GC) has three important tables that are used to dispatch to the correct function given an object's widetag: scavtab to scavenge objects for pointers, transother to copy objects to the new space and sizetab to compute the size of an object. They're all initialised in src/runtime/gc-common.c, gc_init_tables. The representation is standard, so I only had to add pointers to predefined functions, scav_unboxed, trans_unboxed and size_unboxed:

#ifdef SSE_VALUE_WIDETAG
    scavtab[SSE_VALUE_WIDETAG] = scav_unboxed;
#endif

[...]

#ifdef SSE_VALUE_WIDETAG
    transother[SSE_VALUE_WIDETAG] = trans_unboxed;
#endif

[...]

#ifdef SSE_VALUE_WIDETAG
    sizetab[SSE_VALUE_WIDETAG] = size_unboxed;
#endif
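To illustrate what that dispatch buys us, here is a toy model of the size table in C. It is not the runtime's code: every sketch_ name is hypothetical, and I'm assuming that the header stores the number of words that follow it and that sizes are rounded up to an even number of words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sketch_lispobj;

#define SKETCH_N_WIDETAG_BITS 8
#define SKETCH_WIDETAG_MASK 0xFFu
#define SKETCH_SSE_VALUE_WIDETAG 0x62u /* 01100010, as allocated earlier */

/* Assumption: the bits above the widetag count the words after the header. */
sketch_lispobj sketch_make_header(uint64_t words_after_header, unsigned widetag)
{
    return (words_after_header << SKETCH_N_WIDETAG_BITS) | widetag;
}

unsigned sketch_widetag_of(sketch_lispobj header)
{
    return (unsigned)(header & SKETCH_WIDETAG_MASK);
}

/* One handler per widetag; a null entry means "no handler registered". */
typedef long (*sketch_size_fn)(const sketch_lispobj *where);
static sketch_size_fn sketch_sizetab[256];

/* Size in words of an object with no pointers inside: header plus
   payload, rounded up to an even word count (assumed). */
long sketch_size_unboxed(const sketch_lispobj *where)
{
    long nwords = (long)(*where >> SKETCH_N_WIDETAG_BITS) + 1;
    return (nwords + 1) & ~1L;
}

/* Mirrors the single line added to the real table initialisation. */
void sketch_init_tables(void)
{
    sketch_sizetab[SKETCH_SSE_VALUE_WIDETAG] = sketch_size_unboxed;
}

/* Dispatch on the widetag found in the object's header word. */
long sketch_object_size(const sketch_lispobj *where)
{
    sketch_size_fn fn = sketch_sizetab[sketch_widetag_of(*where)];
    return fn ? fn(where) : -1;
}
```

Because the handler only looks at the header, it works for any object that follows the same convention, which is why I didn't have to write new GC functions.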

Adding utility VOPs to the compiler

That's enough code to be able to use the sse-value primitive type, use our new storage classes in VOP definitions, and pass boxed sse-values around without crashing the GC. The first VOPs to define are probably those that let the compiler move sse-values around, from an sse-reg, sse-stack or descriptor-reg to another. Not all VOPs in the cartesian product must be defined; the compiler can figure out how to piece together move VOPs to a certain extent.

(define-move-fun (load-sse-value 2) (vop x y)
  ((sse-stack) (sse-reg))
  (inst movdqa y (ea-for-sse-stack x)))

(define-move-fun (store-sse-value 2) (vop x y)
  ((sse-reg) (sse-stack))
  (inst movdqa (ea-for-sse-stack y) x))

are enough to define how to move between the stack and xmm registers (ea-for-sse-stack is a helper function that generates an effective address from an sse-value's location in the compile-time stack frame).

(define-vop (sse-move)
  (:args (x :scs (sse-reg)
            :target y
            :load-if (not (location= x y))))
  (:results (y :scs (sse-reg)
               :load-if (not (location= x y))))
  (:note "sse move")
  (:generator 0
    (unless (location= y x)
      (inst movdqa y x))))
(define-move-vop sse-move :move (sse-reg) (sse-reg))

provides and registers code to move from one xmm register to another.

(define-vop (move-from-sse)
  (:args (x :scs (sse-reg)))
  (:results (y :scs (descriptor-reg)))
  (:node-var node)
  (:note "sse to pointer coercion")
  (:generator 13
     (with-fixed-allocation (y
                             sse-value-widetag
                             sse-value-size
                             node)
       (inst movdqa (make-ea :qword
                             :base y
                             :disp 1)
             x))))
(define-move-vop move-from-sse :move
  (sse-reg) (descriptor-reg))

(define-vop (move-to-sse)
  (:args (x :scs (descriptor-reg)))
  (:results (y :scs (sse-reg)))
  (:note "pointer to sse coercion")
  (:generator 2
    (inst movdqa y (make-ea :qword :base x :disp 1))))
(define-move-vop move-to-sse :move (descriptor-reg) (sse-reg))

will be used to move from an xmm register to a boxed representation and vice-versa.
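The displacement of 1 in those two VOPs looks odd at first. Here is the arithmetic behind it, sketched in C. I'm assuming that other-pointer-lowtag is 15 on x86-64 and that the low value lives in word 2 of the object (header, then filler, then lo-value); the sketch_ names are mine.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_N_WORD_BYTES 8
#define SKETCH_OTHER_POINTER_LOWTAG 15 /* assumed x86-64 value (#b1111) */
#define SKETCH_LO_VALUE_SLOT 2         /* header = 0, filler = 1, lo-value = 2 */

/* Byte displacement from a tagged pointer to the start of a slot:
   skip to the slot, then cancel the lowtag added to the base address. */
long sketch_slot_disp(int slot)
{
    return (long)slot * SKETCH_N_WORD_BYTES - SKETCH_OTHER_POINTER_LOWTAG;
}

/* Given the untagged address of an object, does tagged + disp fall on
   a 16-byte boundary, as movdqa requires of its memory operand? */
int sketch_payload_is_aligned(uintptr_t untagged_base, long disp)
{
    uintptr_t tagged = untagged_base + SKETCH_OTHER_POINTER_LOWTAG;
    return (tagged + (uintptr_t)disp) % 16 == 0;
}
```

Under those assumptions, sketch_slot_disp(SKETCH_LO_VALUE_SLOT) is 2 * 8 - 15 = 1, and for any 16-byte aligned object the resulting address is aligned too, so the aligned move is safe.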

Finally,

(define-vop (move-sse-arg)
  (:args (x :scs (sse-reg) :target y)
         (fp :scs (any-reg)
             :load-if (not (sc-is y sse-reg))))
  (:results (y))
  (:note "SSE argument move")
  (:generator 4
     (sc-case y
       (sse-reg
        (unless (location= x y)
          (inst movdqa y x)))
       (sse-stack
        (inst movdqa (ea-for-sse-stack y fp) x)))))
(define-move-vop move-sse-arg :move-arg
  (sse-reg descriptor-reg) (sse-reg))

defines how arguments are loaded before calling a function.

Creating a new CL type and associated functions

Now that pretty much all the groundwork has been done in the backend, it's time to inform the frontend about the new data type. The system's built-in classes are defined in src/code/class.lisp, right after (defvar *built-in-classes*). I only had to insert another sublist in the list of built-in classes.

     (sb!vm:sse-value
      :codes (#.sb!vm:sse-value-widetag))

That takes care of defining a new class for the middle-end, and of mapping the class's name, in the front-end, to the class object in the middle-end.

SBCL tries pretty hard to always provide a safe language by default, so I have to make sure the type sse-value can be checked. First, a new error kind is defined in src/code/interr.lisp:

(deferr object-not-sse-value-error (object)
  (error 'type-error
         :datum object
         :expected-type 'sb!vm:sse-value))

The mapping from internal error number to a meaningful message is specified in src/compiler/generic/interr.lisp, define-internal-errors: (object-not-sse-value "Object is not of type SSE-VALUE."). Since I use a normal representation, I can use preexisting machinery to define the type checking VOPs in src/compiler/generic/late-type-vops.lisp:

(!define-type-vops sse-value-p check-sse-value sse-value
    object-not-sse-value-error
  (sse-value-widetag))

That only creates a VOP for sse-value-p; we'd sometimes like to have a real function. That's created in src/code/pred.lisp, with (def-type-predicate-wrapper sb!vm:sse-value-p). Moreover, when a VOP is defined as a translation for a function, that function must be defknowned to the compiler. I do that in src/compiler/generic/vm-fndb.lisp, with (defknown sb!vm:sse-value-p (t) boolean (foldable flushable)).

TYPEP must also be informed to use that function. That's set up in src/compiler/generic/vm-typetran.lisp, with (define-type-predicate sb!vm:sse-value-p sb!vm:sse-value).
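In effect, the check that !define-type-vops derives from the widetag amounts to something like the following C. This is only a sketch of the logic: the real VOP is emitted as assembly, the function name is made up, and the lowtag values (a 4-bit mask, with 15 for other-pointer-lowtag) are assumptions about the x86-64 port.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t sketch_lispobj;

#define SKETCH_LOWTAG_MASK 15u          /* assumed: 4 lowtag bits */
#define SKETCH_OTHER_POINTER_LOWTAG 15u /* assumed x86-64 value */
#define SKETCH_WIDETAG_MASK 0xFFu
#define SKETCH_SSE_VALUE_WIDETAG 0x62u  /* 01100010 */

/* Returns 1 if obj looks like a boxed sse-value, 0 otherwise. */
int sketch_sse_value_p(sketch_lispobj obj)
{
    /* Step 1: the pointer must carry the "other pointer" lowtag.
       Immediates and other pointer kinds are rejected without
       touching memory. */
    if ((obj & SKETCH_LOWTAG_MASK) != SKETCH_OTHER_POINTER_LOWTAG)
        return 0;

    /* Step 2: strip the lowtag and compare the header's widetag. */
    const uint64_t *header =
        (const uint64_t *)(obj - SKETCH_OTHER_POINTER_LOWTAG);
    return (*header & SKETCH_WIDETAG_MASK) == SKETCH_SSE_VALUE_WIDETAG;
}
```

Sticking to the standard header-plus-widetag representation is what lets the existing macro generate this kind of check for free.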

Making the middle and the back meet

The modifications above added a new primitive type to the back-end, along with some machinery to represent and manipulate values of that type. They also added a new built-in class to the front- and middle-end, and some type-checking code for that new class. The only thing left is to make sure the new class is mapped to the correct primitive type. The default is to map everything to the primitive type T, a boxed value. The mapping is done at the bottom of src/compiler/generic/primtype.lisp, in primitive-type-aux. I only had to modify the case that translates built-in class(oids) to primitive types: sse-value is treated like complex, function, system-area-pointer or weak-pointer. The class sse-value is mapped to the primitive type sse-value. That way, a function that is defknown to, e.g., take an argument of type sse-value (the class) can be translated by a VOP that takes an argument of primitive type sse-value, which can be stored in an sse-reg (an xmm register).

Now what?

We have the data type definitions. To make them useful we still have to define a lot of functions and VOPs (and SSE instructions). However, that's much closer to regular development and doesn't require as much digging around in the source. I'll leave that for another post.

posted at: 18:25 | /Lisp

Contact me by email: pvk@pvk.ca.