Specifying Instructions’ Semantics Using CSDL
(Preliminary Report)

Norman Ramsey and Jack W. Davidson
Department of Computer Science
University of Virginia
Charlottesville, VA 22903

November 24, 1997
Abstract

The Zephyr project is part of an effort to build a National Compiler Infrastructure, which will support research in compiling techniques and high-performance computing. Compilers work with source code, abstract syntax, intermediate forms, and machine instructions. By using high-level descriptions of the representations and semantics of these forms, we expect to be able to create compiler components that will be usable with different source languages, front ends, and target machines.

To help deal with multiple machines, we are developing a family of Computer Systems Description Languages (CSDL) to describe properties that are relevant to the construction of compilers and other systems software. The languages describe properties of a machine’s instructions or its mutable state, or both. Of particular interest is the description of the semantics of instructions, i.e., their effects on the state of the machine. This report describes our preliminary design of \( \lambda \)-RTL, a CSDL language for specifying instructions’ semantics.

We describe the effects of instructions using register transfer lists (RTLs). A register transfer list is a collection of assignments to locations, which represent registers, memory, and all other mutable state. We prescribe a form of RTLs that makes it explicit how to compute the values assigned and on what state the computation depends. The form also makes byte order explicit and provides for instructions whose effects may be undefined.

Because our form of RTLs contains so much information, it is convenient for use by tools, but it would be tedious to write RTLs by hand. \( \lambda \)-RTL, which is based on the \( \lambda \)-calculus and on register transfer lists, is a metalanguage designed to make it easier for people to write RTLs and to associate them with machine instructions. It enables us to omit substantial information from hand-written specifications; the \( \lambda \)-RTL translator infers the missing information and puts the resulting RTLs into canonical form. \( \lambda \)-RTL also provides a “grouping” construct designed to help specify large groups of similar instructions.

We are still designing \( \lambda \)-RTL. This report presents a short overview of \( \lambda \)-RTL, followed by examples. The examples include definitions of basic operators, which we believe will be useful for describing a wide variety of machines, as well as excerpts from descriptions of the SPARC and Pentium. We have chosen excerpts that illustrate features which are characteristic of these particular machines; for example, we show a model of SPARC register windows, and we show how \( \lambda \)-RTL can help manage the complexity of the Pentium instruction set.

Both the machine descriptions and \( \lambda \)-RTL itself are under development, so this report is a snapshot of a work in progress. We issue it now to solicit feedback both on our overall approach and on the details of \( \lambda \)-RTL. Please send feedback by electronic mail to zephyr-investigators@virginia.edu.
# Contents

1 Overview of CSDL  
   Machine descriptions for machine-level tools ........................................ 1  
   Core aspects for CSDL ........................................................................... 2  
   Instructions ......................................................................................... 2  
   Storage ............................................................................................... 3  
   Combining instructions and storage ...................................................... 3  
   Languages in the CSDL family ............................................................. 4  

2 Register Transfer Lists  
   Form of RTLs ...................................................................................... 6  
   Types in RTLs ....................................................................................... 7  
   RTL languages and translation ............................................................. 7  
   Interpretations of RTLs ......................................................................... 8  

3 Using λ-RTL to specify register transfer lists  
   Design considerations ......................................................................... 10  
   Restrictions eased in λ-RTL ................................................................. 11  
      Implicit fetches .............................................................................. 11  
      Slices ............................................................................................. 11  
      Implicit aggregation .................................................................... 12  
   Overview of λ-RTL ........................................................................... 12  
      Top-level declarations ................................................................ 13  
      Inner declarations ....................................................................... 14  
      Expressions .................................................................................. 14  
      Grouping ....................................................................................... 14  
      Lexical conventions ...................................................................... 15  
      The initial basis ........................................................................... 15  
      Style ............................................................................................. 16  
   Differences between λ-RTL and Standard ML ................................. 16  

4 Basic RTL operators  
   4.1 Building RTLs ............................................................................ 19  
   4.2 Booleans ....................................................................................... 19  
   4.3 Integer arithmetic and comparisons ............................................ 20
4.4 Logical operators .............................................. 21
4.5 Byte order ......................................................... 22
4.6 Undefined effects .............................................. 22
4.7 Floating-point arithmetic .................................... 22

5 Describing the SPARC, Version 8 24
5.1 Storage spaces .................................................... 24
   Integer Registers ................................................ 24
   Integer Unit control/status registers ......................... 26
5.2 SPARC instructions ............................................... 26
   SPARC addresses and operands ................................... 26
   Load instructions ............................................... 27
   Store instructions ............................................... 28
   Swapping load-store instructions ............................... 28
   Save and restore instructions ................................... 28
   Logical instructions ............................................. 30
   Add instructions ................................................ 31
   Multiply instructions ............................................ 32
   Branch on integer condition codes ............................ 33
   Call and jump instructions ................................... 35
5.3 Putting it all together ......................................... 35

6 Describing the Intel Pentium 36
6.1 Storage spaces .................................................... 37
   Registers .......................................................... 37
   Effective addresses ............................................. 39
   Instructions ..................................................... 41

7 Index 45
7.1 List of chunks .................................................. 45
7.2 Identifier index ............................................... 46

References 49
Chapter 1

Overview of CSDL

Machine descriptions for machine-level tools

Special-purpose and general-purpose computers are designed at a rapid pace, and software tools and technology don’t always keep up. The tools we need include not only the compilers, assemblers, linkers, and debuggers familiar to all programmers, but also less familiar tools, like profilers, tracers, test-coverage analyzers, and general code-modification tools. Most of these tools must work with machine instructions.

Machine-dependent detail makes it hard to build machine-level tools. For many years, compilers have used machine descriptions to capture such detail. Machine descriptions isolate target-specific information so that it can easily be examined and changed. Despite their successful use in compilers, machine descriptions are seldom used to build other systems software. Descriptions used in compilers are hard to reuse because they typically combine information about the target machine with information about the compiler. For example, “machine descriptions” written using tools like BEG (Emmelmann, Schröer, and Landwehr 1989) and BURG (Fraser, Henry, and Proebsting 1992) are actually descriptions of code generators, and they depend not only on the target machine but also on a particular intermediate language. In extreme cases (e.g., gcc’s md files), the description formalism itself depends on the compiler.

We believe we can simplify construction of compilers and other machine-level tools by developing description techniques that separate machine properties from compiler concerns. We expect this capability to be useful in a National Compiler Infrastructure because it will lead to a back-end infrastructure that should be usable not only with many different target machines, but also with different source languages, front ends, and intermediate languages.

Some existing languages for machine description, like VHDL (Lipsett, Schaefer, and Ussery 1993) and Verilog (Thomas and Moorby 1995) do describe only properties of the machine, but they are at too low a level, describing implementations as much as architectures. These description languages require too much detail that is not needed to build systems software.

We are designing a family of Computer Systems Description Languages (CSDL) to support a variety of machine-level tools while remaining independent of any one in particular. We have several goals for CSDL:

- Descriptions should be composed from simple components. Each component should describe, as much as possible, a single aspect of the target machine. Such aspects might include calling conventions, representations of instructions, pipeline implementations, memory hierarchy, or other properties. It should not always be necessary to describe aspects completely. For example, most compilers use only a subset of the instructions available on a particular
machine, and a compiler writer should be required to describe only that subset.

- An application writer should be able to derive useful tools not only when the components describing individual aspects are incomplete, but also when not all aspects are described. An application writer should not have to describe aspects of the machine unless the information is needed to build his application. For example, an application writer working entirely at the assembly-language level or above should not have to describe binary representations of instructions. Someone writing a garbage collector might need to describe the stack-frame layouts determined by a calling convention, but he should not have to describe the instructions used in the calling sequences that establish those layouts.

- Components, especially those describing "core aspects," should be reusable. For example, writers of all specifications should benefit from having a common formalization of what it means to be a "SPARC instruction supported by the bare hardware."

CSDL forms a family of languages because all family members have the same view of two core aspects of machines: instructions and state. This report explains the CSDL view of these core aspects, and it presents a rationale for and examples of the most important member of the CSDL family, which describes the semantics of machine instructions.

Core aspects for CSDL

To identify core aspects to be used throughout the CSDL family, we examined descriptions used to help retarget a variety of systems-level tools. These tools included an optimizer (Benitez and Davidson 1988), a debugger (Ramsey and Hanson 1992), an instruction scheduler (Proebsting and Fraser 1994), a call-sequence generator (Bailey and Davidson 1995), a linker (Fernández 1995), and an executable editor (Larus and Schnarr 1995). Cur- sory inspection showed no single common aspect of a machine used in descriptions for all of these tools, but a closer look revealed that all the descriptions refer either to the machine’s instruction set or to its storage locations, and some to both. For example, the specifications used by the scheduler and linker refer only to the machine’s instructions and the properties thereof. The specifications used in the call-sequence generator and in the debugger’s stack walker refer only to storage, explaining in detail how values move between registers and memory. The specifications used in the optimizer and the executable editor refer both to instructions and to storage, and in particular, how instructions change the contents of storage. From these observations, we have chosen to require that languages in the CSDL family refer to instructions, storage, or both, and that they use the models of instructions and storage presented below.

Instructions

The CSDL model of an instruction set is a list of instructions together with some identifying information about their operands. Although instruction names in assembly languages are typically overloaded, CSDL requires instructions to have unique names, because tools often need uniquely named code for each instruction in an instruction set. For example, an assembler might use a unique C procedure to encode each instruction, or an executable editor might use a unique element of a C union to represent an instance of each instruction.

An individual instruction is viewed as a function or constructor that, when applied to operands, produces something interesting: a binary representation, an assembly-language string, a semantics, etc. Instruction descriptions that use the CSDL core will include the names and types of the operands of each instruction. Types will include integers of various sizes, but it will also be possible to introduce new types to define such machine-dependent concepts as effective addresses. Values of these new types may be created by applying suitable constructors; for example, Chapter 6 shows how a Pentium effective address may be formed by applying any of
9 constructors, each one representing a different addressing mode.

The structure defined by CSDL constructors and their operands can be viewed as a discriminated-union type. It is also equivalent to an ASDL grammar (Wang et al. 1997), without recursion, in which the start symbol is “instruction,” other nonterminals refer to machine-level concepts like “effective address” or “integer-instruction operand,” and terminal symbols are defined by integers or addresses. This structure is a simplification of the “constructor” from the Specification Language for Encoding and Decoding (SLED) of the New Jersey Machine-Code Toolkit (Ramsey and Fernández 1997). It is determined by the machine, independent of any tool, so it should be useful in any specification language that deals with individual machine instructions. For example, the $\text{nML}$ machine-description language (Fauth, Praet, and Freericks 1995) uses this structure, although $\text{nML}$ is otherwise quite different from SLED.

Properties of instructions

Experience with the New Jersey Machine-Code Toolkit shows that many applications can be built from specifications that discuss only instructions and their properties, with no reference to storage. Such applications include assemblers and disassemblers, as well as code generators that work by pattern matching on the names of instructions.

We specify properties of instructions in a compositional style, so instructions’ properties are functions of the properties of their operands. We formalize that style using attribute grammars. For example, assembly-language representations of instructions can be computed as synthesized attributes, where the attributes are strings. Binary representations can also be computed as attributes, where the attributes are SLED “patterns.” Attributes readily support machines like the 68000 family, in which the representations of effective addresses depend on context.

Storage

The CSDL model of a storage space is a sequence of mutable cells. A storage space is like an array; cells are all the same size, and they are indexed by integers. For example, a typical microprocessor has a memory made up of 8-bit cells (bytes) and a register file made up of 32-bit cells. The number of cells in a storage space may be left unspecified.

The state of a machine can be described as the contents of a collection of storage spaces. We use storage spaces to model main memory, general-purpose registers, special-purpose registers, condition codes, and so on.

Experience with CCL, a Calling Convention Language (Bailey and Davidson 1995), shows that applications can be built from specifications that discuss only storage, with no reference to instructions. For example, calling conventions can be described by discussing the placement of parameters in storage cells and the effects of calls and returns on storage. From a CCL description, one can generate procedure prologs and epilogs in a compiler, and one can also generate code to test compilers’ implementations of calling conventions (Bailey and Davidson 1996). One might be able to derive stack walkers or exception-handling code from similar descriptions. A garbage collector may need to know which locations in storage can contain pointers; this is a property of an application, not of a machine, but it can be described using the CSDL storage core. (We want to describe application properties as well as machine properties, but we want to keep them separate.)

Languages in the CSDL family may refer to individual locations. Ways of writing locations may vary, but each one must resolve to a name of a storage space and an integer offset identifying a cell within that storage space.

Combining instructions and storage

An architecture manual tells programmers what constitutes the state of the processor, and it lists
instructions and their operands, but the most important thing it does is explain the semantics of each instruction in terms of that instruction’s effects on the state of the processor. Many applications can be built based on this information, but some require more detail than others. For example,

- Information about control flow is enough to build control-flow graphs.
- Information about calling conventions may be enough to recognize procedure calls and returns.
- Information about locations read and written is enough to build data-flow graphs. Such graphs, together with the ability to match individual instructions, may suffice to build code-editing tools like EEL (Larus and Schnarr 1995) or ATOM (Srivastava and Eustace 1994).
- Information about register-transfer semantics is enough to build code improvers in the style of PO (Davidson and Fraser 1980), ypo (Benitez and Davidson 1988), and gcc (Stallman 1992). These code improvers work by pattern matching, so they need not know what all of the register-transfer operators do. They do, however, need at least a partial semantics, to perform strength reduction, constant folding, and similar transformations.

To build a code generator, one needs to know more about the operators used in the register transfers. In particular, one needs to know enough so that one can find a sequence of instructions to implement each operation in the compiler’s intermediate code. It is sufficient to know that the intermediate code and the instructions compute the same operation; one need not know exactly what the operations do to the bits.

To build an emulator like SPIM (Larus 1990) or a binary translator like FXI32 (Thompson 1996), one needs enough information about the operations in the register transfers to interpret the the effect of each register transfer on each bit of the processor’s state.

We believe that these applications and more can be served by providing register-transfer semantics for instructions. The effect of a particular instruction can be specified as a register-transfer list (RTL), which modifies storage cells. As with other properties of instructions, the RTL is computed using an attribute grammar.

In principle, a single register-transfer description could support all of the different uses above. A single RTL could be given different interpretations, depending on its intended use. We believe we can support this mode of specification by prescribing a rigid form for RTLS, while supplying suitable abstraction mechanisms for use within that form. Abstraction mechanisms support our goals of enabling application writers to leave irrelevant facts unspecified. For example, we want to make it easy to write an RTL that means “register rd is assigned an unspecified function of registers rs and rt.”

Languages in the CSDL family

We consider SLED, for specifying representations of instructions (Ramsey and Fernández 1997), and CCL, for specifying calling conventions (Bailey and Davidson 1995), to be the first languages in the CSDL family. We are developing a new language, λ-RTL, for specifying instruction semantics. We expect that CSDL will expand to include languages for specifying properties of memory hierarchies and of pipelines. Indeed, the simple functional-resource languages of Proebsting and Fraser (1994) and Bala and Rubin (1995) fit nicely into the CSDL framework.

This report focuses on λ-RTL. The report does not give a specification or definition of λ-RTL, because λ-RTL is not sufficiently developed for a specification or definition to be worthwhile. Instead, this report presents some essential properties of λ-RTL, and it gives examples of machine specifications using the current, imperfect version. Chapter 2 describes the form of register transfer lists used in λ-RTL. Chapter 3 presents λ-RTL and discusses its translation into RTLS. Chapter 4 describe some basic content which fits into that form, and which
we believe will be useful for describing many machines. Chapters 5 and 6 present excerpts from λ-RTL descriptions of the SPARC and Pentium processors, respectively.
Chapter 2

Register Transfer Lists

Computer scientists have used register transfers in many different forms. For $\lambda$-RTL, we have chosen a form designed for use by tools, not by people. We have therefore insisted that as much information as possible be explicit in the RTL itself. Under our current plan,

- RTLs are represented as trees.
- All operators are fully disambiguated, e.g., as to type and size.
- There is no aliasing of locations.
- Fetches are explicit, as are changes in the size or type of data.
- Stores are annotated with the size of the data stored.
- Explicit tree nodes specify byte order. More generally, they specify how to transfer data between storage spaces of different granularity.
- RTL should be a typed representation. We plan to generate RTL type checkers from CSDL specifications. Optimizations and other transformations that are intended to preserve semantics should also preserve the property of being well typed. Typing is a good way to catch bugs in optimizers (Morrisett 1995).

The form of RTLs proposed here may be suitable not just for specification, but also for use in the implementations of compilers, binary translators, and other tools.

([ASDL specification of the form of RTLs] $\equiv$

ty = (int) -- size of a value, in bits
exp = CONST (const)
| FETCH (location, ty)
| APP (operator, exp*)
location = AGG(aggregate, cell)
cell = CELL(space, exp)
effect = STORE(location dst, exp src, ty)
| KILL(location)
| KILLALL(space)
guarded = GUARD(exp, effect)
rtl = RTL (guarded*)

Figure 2.1: ASDL specification of the form of RTLs

Form of RTLs

Figure 2.1 uses the Zephyr Abstract Syntax Description Language (Wang et al. 1997) to show the form of RTLs. A register transfer list is a list of guarded effects. Each effect represents the transfer of a value into a storage location, i.e., a store operation. The transfer takes place only if the guard (an expression) evaluates to true. Locations may be single cells or aggregates of consecutive cells within a storage space. Values are computed by expressions without side effects, which simplifies analysis and transformation. These expressions include integer constants, fetches from locations, and applications of RTL operators to lists of expressions. Effects in a list take place simultaneously, as in Dijkstra's multiple-assignment statement, so one RTL represents one change of state. This decision makes
it possible to specify swap instructions without having to introduce temporary locations.

Not every effect is a true assignment. A kill effect changes the value in a storage location in an undefined way. Kill effects are needed to model such architectural specifications as “the effect of a logical instruction on the AF flag is undefined.”

The “kill all” effect in Figure 2.1 kills all of the cells in the given storage space. It is needed to give a semantics to certain phrases of λ-RTL. We haven’t demonstrated a need for it yet, but it may be useful to model stores into arbitrary memory locations.

As an example of a typical RTL, consider a SPARC load instruction using the displacement addressing mode, written in the SPARC assembly language as

\[ \text{load} \ [\text{disp}], \ %10 \]

Although we would not want to specify just a single instance of a single instruction, the effect of this load instruction might be notated in λ-RTL as follows:

\[ \lambda-\text{RTL for load instruction} \equiv \]

\[
\text{store} \ [\text{disp}], \ %10 \leftarrow \ %9[\text{disp}]+(\text{"12})
\]

because the stack pointer is register 14 and register 10 is register 24. The corresponding RTL is much more verbose, with the sizes of all quantities identified explicitly, as a fully disambiguated tree:

```
store #32
AGGB #32 #32 fetch #32
loc 'r' AGGB #8 #32
const #5 loc 'm'
24

ADD #32
fetch #32 sx #13 #32
AGGB #32 #32 const #13
loc 'r' -12
const #5
14
```

The various constants labeled with hash marks, like #32, indicate the number of bits in arguments.

---

1The " in "12 is a unary minus.

---

results, or data being transferred. Such constants will fit into a generalization of the Hindley-Milner type system (Milner 1978).

Figure 2.2 shows the operators used in this tree. The left child of the store is a subtree representing the location consisting of the single register i0, which is register 24. The right-hand child represents a 32-bit word (a big-endian aggregation of four bytes) fetched from memory at the address given by the subtree rooted at add. This node adds the contents of the stack pointer (register 14) to the constant -12. The constant is a 13-bit constant, and the sx operator sign-extends it to 32 bits, so it can be added to the stack pointer.

**Types in RTLS**

RTL is intended to be a typed format. The type system is under development, but it is expected to include the following types:

- \#nbits: A value that is \( n \) bits wide.
- \#nloc: A location containing an \( n \)-bit value.
- \#n cells: One of a sequence of \( n \)-bit storage cells, which can be aggregated together to make a larger location, as by the AGGB nodes in the example tree.
- bool: A Boolean condition.
- effect: A side effect on storage.

Figure 2.2 uses these types to show the types of the operators used above. We plan to extend Milner’s type inference to this system, so that those writing specifications in λ-RTL can omit types and widths. Unlike in ML, type inference alone will not guarantee that terms make sense; in general, it will be necessary to check additional constraints. For example, in the RTL shown above, it would be necessary to check that the signed integer -12 can be represented using 13 bits, and that 32 is a multiple of both 8 and 32.

**RTL languages and translation**

Although we have prescribed the form of RTLS, we can’t write any RTLS until we choose a set of lo-
STORE: ∀n. #n loc × #n bits → effect

Store an n-bit value in a given location. The type indicates that for any n, STORE #n takes an n-bit location and an n-bit value and produces an effect.

FETCH: ∀n. #n loc → #n bits

For any n, FETCH #n takes an n-bit location and returns the n-bit value stored in that location.

AGGB: ∀n. ∀w. #n cells → #w loc

For any n and w, AGGB #n #w aggregates an integral number of n-bit cells into a w-bit location, making the first cell the most significant part of the new location, i.e., using big-endian byte order. w must be a multiple of n. (w and n are mnemonic for wide and narrow.)

LOC 'm': #32 bits → #8 cells

Given a 32-bit address, LOC 'm' returns the 8-bit cell in memory referred to by that address.

LOC 'r': #5 bits → #32 cells

Given a 5-bit register number, LOC 'r' returns the corresponding 32-bit register (a mutable cell).

ADD: ∀n. #n bits × #n bits → #n bits

For any n, ADD #n takes two n-bit values and returns their n-bit sum. Carry and overflow are ignored.

SX: ∀n. ∀w. #n bits → #w bits

For any n and w, SX #n #w takes an n-bit value, interprets it as a two’s-complement signed integer, and sign-extends it to produce a w-bit representation of the same value. w must be greater than n.

CONST: ∀n. (constant) → #n bits

For any n, CONST #n k represents the n-bit constant k. k must be representable in n bits. The same k could be used with different ns.

Figure 2.2: Some RTL operators and their types

Interpretations of RTLs

One reason we restrict the form of RTLs is to limit their possible meanings or interpretations. For example, access to the mutable state of a machine is available only through the fetch and store operations built into the RTL form, so we can easily tell what state is changed by an RTL and how that change depends on the previous state. Researchers and tools working with our form of RTLs have freedom to define interpretations of only two parts of the RTL form: aggregations and operators.

RTL aggregations specify byte order. More generally, aggregations make it possible to write an RTL that stores a w-bit value in (or fetches a w-bit value from) k consecutive n-bit locations, provided that w = kn. Such an aggregation has type #n cells → #w loc, and its interpretation must be a bijection between a single w-bit value and k n-
bit values. Moreover, when \( w = n \), the bijection must be the identity function. Storing uses the bijection, and fetching uses its inverse, making it possible to combine RTLs using forward substitution. Little-endian and big-endian aggregations will be built into \( \lambda \)-RTL, as will an “identity aggregation” that is defined only when \( w = n \). We imagine that users could define other aggregations by giving systems of equations, as in Ramsey (1996).

RTL operators must be interpreted as pure functions on bit vectors. Consequently, the result of applying an RTL operator must not depend on processor state; the operator must give the same answer every time. Chapter 4 presents a collection of RTL operators that we expect can be used to describe many different machines. We don’t expect this set to be complete; on the contrary, we expect that any machine description written in \( \lambda \)-RTL will introduce a handful of new RTL operators which will be unique to that machine.
Chapter 3

Using $\lambda$-RTL to specify register transfer lists

Bare RTLs are both spartan and verbose. Expressions do not include if-then-else, so conditionals must be represented by using guards on effects. There is no expression meaning “undefined;” assignments of undefined values must be specified using a kill effect. These restrictions, and the requirement that operations be fully disambiguated, make RTLs a form that is good for manipulation by tools but not so good for writing specifications.

$\lambda$-RTL is a metalanguage that enables specification writers to attach RTL trees to SLED-like constructors without having to write everything explicitly. Eventually, $\lambda$-RTL will be a higher-order, strongly typed, polymorphic, pure functional language based largely on Standard ML (Milner, Tofte, and Harper 1990). A variation on the Hindley-Milner type system will make it possible to write flexible, type-safe functions without having to write types explicitly. The current implementation of $\lambda$-RTL is, however, untyped.

Design considerations

We have drawn on our experience with SLED (Ramsey and Fernández 1997) to identify mechanisms and properties that are desirable in any CSDL language, including $\lambda$-RTL. These include:

- Use of constructors to provide an abstract view of the machine's instruction set, and use of attributes to specify properties of instructions. These mechanisms are close kin to attribute grammars and to the denotational approach to semantics, both of which have long histories of utility.

- Use of default, unnamed attributes (for example, binary representation or default constructor type), sometimes supported by special syntax. By providing a default case that does not have to be named, we can unclutter specifications. For example, when we refer to an operand of type Address, the location designated by that effective address should be the default meaning.

- Use of a specialized sublanguage for defining attributes. SLED uses patterns to specify binary representations; $\lambda$-RTL uses the prescribed forms of RTLs, to be augmented by a sophisticated type system.

- Language constructs that help eliminate repetition by permitting simultaneous description of many instructions at once. Such constructs reduce the number of opportunities for errors, and they help keep specifications concise and readable. $\lambda$-RTL includes two such constructs: first class functions, and a generalization of SLED's grouping construct, which itself is based on Icon generators (Griswold 1982).
• Formal notation that matches informal descriptions found in architecture manuals. The architecture manual is the standard model of the machine that is shared by all tool builders, so this close match not only helps people read specifications, it helps them write correct specifications. \(\lambda\text{-RTL}\) enables users to define their own infix operators, and Chapters 4, 5, and 6 show how we have used this capability to create an ESP-like notation for RTLS.

We expect to be able to analyze \(\lambda\text{-RTL}\) specifications for internal consistency and “plausibility.” For example, it should be possible to identify cases in which a register number is mistakenly used as an immediate value instead of as an offset into the storage space modeling the register file.

Restrictions eased in \(\lambda\text{-RTL}\)

\(\lambda\text{-RTL}\) descriptions are easier to write than bare RTLS in three ways: grouping and higher-order functions can help eliminate repetition, the type system will eliminate the need to write sizes explicitly, and \(\lambda\text{-RTL}\) relaxes several of the restrictions on the form of RTLS. In particular,

• In \(\lambda\text{-RTL}\), it is not necessary to write fetches explicitly.

• \(\lambda\text{-RTL}\) gives the illusion that bit slices (subfields) are locations that can be assigned to.

• \(\lambda\text{-RTL}\) gives the illusion that aggregates of cells are locations that can be assigned to, and it is not usually necessary to write aggregations explicitly.

Implicit fetches

Most programmers are used to writing \(x := x + 1\) and having the \(x\) on the left denote a location while the \(x\) on the right denotes the value stored in that location. Typical compilers identify “value contexts” and “rvalue contexts” and automatically insert fetches in rvalue contexts. We do the same in \(\lambda\text{-RTL}\), but instead of using syntax to identify the contexts, we intend to use types.

We use types and not syntax because \(\lambda\text{-RTL}\) has no special syntax for writing RTL assignments. Instead of an assignment syntax, \(\lambda\text{-RTL}\) provides a built-in store function that accepts a location and a value and produces an effect. Any user-defined function might result in a call to the built-in store, so we have to recognize the right and left contexts by their types. Thus, if a location is used where a value is expected, we insert a fetch.

\(\lambda\text{-RTL}\) does almost everything with functions, not syntax, so the writer of a specification can normally redefine the meaning of a notation by defining a new function with the same name. This strategy doesn’t work with implicit fetches because there is no explicit notation associated with a fetch. We want users to be able to control the meanings of these fetches, however, because many machines have resources that are almost, but not quite, sequences of mutable cells. For example, SPARC registers can be viewed as a collection of 32 mutable cells, except that register 0 is not mutable and always contains 0. We would like users to be able to define special meanings for “fetch from register 0” and “store into register 0” so the rest of the specification can pretend that the registers are an ordinary storage space. We do so by permitting users to attach fetch and store methods to each storage space. Methods not given explicitly default to the standard fetch and store operators. Sample fetch and store methods for the SPARC are shown on page 25. If we wanted, we could use fetch and store methods to describe the true implementation of SPARC registers, in which “registers” 8 through 31 denote locations accessed indirectly through the register-window pointer (CWP).

Slices

Many machine instructions manipulate fragments of a word stored in a mutable cell. For example, some machines represent condition codes as individual bits within a program status word. Many machines have instructions that, for example, assign to the least-significant 8 bits of a 32-bit register.
To make it easy to specify such instructions, \(\lambda\)-RTL creates the illusion that a sub-range or “slice” of a cell can be a location in its own right, one that is a suitable argument for a fetch or store operation. This illusion helps keep machine descriptions readable; for example, an effect that sets the SPARC overflow bit simply assigns to it, hiding the fact that it is buried in a program status word that has to be fetched, modified, and stored.

\(\lambda\)-RTL uses a special syntax for slices because the slicing operation is overloaded; it can be applied to locations or to values. Examples of the syntax include

\[
\begin{align*}
x_0[k] & \quad \text{Bit } k \text{ of } x. \quad \text{By default, bit 0 is the least significant bit. A future version of } \lambda\text{-RTL} \text{ may make it possible to change the numbering.} \\
x_0[k_1 .. k_2] & \quad \text{Bits } k_1 \text{ through } k_2 \text{ of } x, \text{ inclusive.} \\
x_0[k \text{ bits at } e] & \quad \text{A } k\text{-bit slice of } x, \text{ with the least significant bit at } e. \\
k' \text{'s denote integer constants, } e' \text{'s denote expressions, and } x' \text{'s may denote values or locations. In all cases the size of the slice is known statically, so its type can be computed automatically. We use the Greek letter } \sigma \text{ to stand for any of these slice specifications.}
\end{align*}
\]

Given a slice specification \(\sigma\), \text{SLICE}_\sigma\ is an overloaded function that maps locations to locations or values to values. The illusion that slices are locations is implemented by rewriting, according to the following rules:

\[
\begin{align*}
\text{FETCH}(\text{SLICE}_\sigma, l) & \mapsto \text{SLICE}_\sigma(\text{FETCH}l) \\
\text{SLICE}_\sigma, l \leftarrow n & \mapsto l \leftarrow \text{bitInsert}_\sigma(\text{FETCH}l, n)
\end{align*}
\]

where \(l\) is a location and \(n\) is a value. After the rewriting, all slices operate on bit vectors, and all fetches and stores operate on true locations. Invocation of user-defined fetch and store methods takes place after the rewriting of slices. This ordering makes it possible to use fetch and store methods to define cell-like abstractions, while ensuring that the meaning of slicing is always consistent with respect to such abstractions.

**Implicit aggregation**

\(\lambda\)-RTL provides the special syntax \(\text{space [offset]}\) for references to mutable cells. The \(\text{offset}\) can be an arbitrary expression, but the \(\text{space}\) must be a literal name, so that \(\lambda\)-RTL can identify the storage space and use appropriate fetch and store methods. To make this cell a location, \(\lambda\)-RTL applies an aggregation, which is also associated with the storage space as a method. The default method is the identity aggregation, which permits only “aggregates” of a single cell.

When little-endian, big-endian, or other aggregations are used, \(\lambda\)-RTL will infer the size of the aggregate. We have not yet determined the rules to be used for the inference, but we hope at least to support the automatic inference that four 8-bit bytes must be aggregated to form a value that goes into a 32-bit register.

The implementation of \(\lambda\)-RTL used in subsequent chapters does not support aggregation.

**Overview of \(\lambda\)-RTL**

From the point of view of external tools, the output of a \(\lambda\)-RTL specification is a set of bindings of values to attributes of constructors. The part of \(\lambda\)-RTL used to specify values is a pure functional language without recursion. The implementation used in this report is untyped, but \(\lambda\)-RTL will eventually use a polymorphic typed calculus. Many features of \(\lambda\)-RTL are inspired by Standard ML.

To describe syntax, we use an EBNF grammar with standard metasymbols for \{sequences\}, [optional constructs], and (alternative | choices). We use large metasymbols to help distinguish them from literals. Terminal symbols given literally appear in \text{typewriter} font. Other terminal symbols and all nonterminals appear in \text{italic} font. Excerpts from the grammar begin with the name of a nonterminal followed by the \(\Rightarrow\) (“produces”) symbol.

A \(\lambda\)-RTL specification is a sequence of \text{structures}. \(\lambda\)-RTL uses structures to organize the name space of values, and structures will play a role in a modules system for \(\lambda\)-RTL. A value \(x\) declared in a struc-
ture $S$ is visible within $S$ as just $x$, but outside $S$ it can only be referred to as $S, x$. Structures are written

$$
\text{structure} \Rightarrow \\
\text{structure} \text{ struct-name is struct } \{ \text{declaration} \} \text{ end}
$$

Constructors and storage spaces occupy their own individual name spaces, which are flat. RTL operators are also expected to occupy a separate, flat name space, but within $\lambda$-RTL they are not distinguished from other kinds of values. Eventually, $\lambda$-RTL will check that the same name is not used to declare distinct RTL operators in two different structures.

Declarations are either top-level or inner declarations. Inner declarations may appear anywhere, but top-level declarations may appear only outside of function and value definitions. Top-level declarations appear only in contexts that guarantee they will be evaluated exactly once.

**Top-level declarations**

$\lambda$-RTL’s top-level declarations may introduce structures, locations, storage spaces, RTL operators, or bindings to attributes of constructors.

Storage spaces are introduced by storage; they name the storage space, give its granularity (width of an individual cell) and size (number of cells), and possibly fetch, store, or aggregation methods.

RTL operators are introduced by:

$$
\text{declaration} \Rightarrow \\
\text{rtlop } \{ \text{operator-name} | \{ \text{operator-name} \} \}
$$

For example,

$$
\text{rtlop bit} \\
\text{rtlop } [= <>]
$$

introduces three new RTL operators.

In the current implementation of $\lambda$-RTL, $\text{rtlop}$ introduces an arbitrary, abstract value. Because it is the only abstraction mechanism available, it is sometimes put to odd uses. A future version of $\lambda$-RTL will require that the type of the RTL operator(s) be specified. $\lambda$-RTL will support polymorphic operators applicable to arguments of any size. This parametric polymorphism will let programmers use to use the same operator to add both 16- and 32-bit integers. $\lambda$-RTL will not support overloading, so a different operator will be required to add floating-point values.

Attribute bindings are introduced as follows:

$$
\text{declaration} \Rightarrow \\
\text{attribute of } \{ \text{constructor is expression} \} \\
\text{attribute} \Rightarrow \\
\text{default attribute of } \{ \text{attribute (attribute-name | default) of} \} \\
\text{constructor} \Rightarrow \\
\text{opcode } \{ \text{operand-name } , \{ \text{operand-name} \} \}
$$

Each expression must be terminated by a newline; if an expression doesn’t fit on one line, it must contain an open parenthesis or brace so that the $\lambda$-RTL compiler knows to continue to the closing delimiter. For example,

$$
default \text{ attribute of } \\
\text{nop() is do nothing}
$$

defines the effect of a no-op.

An expression bound to an attribute must denote a legal fragment of RTL; in the current implementation, this means a value, a location, or an RTL. More precisely, an expression must denote an RTL template. An RTL template becomes a fragment of RTL when suitable fragments of RTL are substituted for the constructor’s operands. An RTL template is an RTL, except that

- Names of constructor operands may be used to stand for RTL expressions. (Eventually we hope to use names of operands to stand for locations as well as values, but the current implementation does not support this usage.)

- Tuple or record selections from constructor operands may be used to stand for RTL expressions.

- Vectors of RTL expressions subscripted by constructor operands may be used in place of RTL expressions.
A tuple or record containing RTL templates is also considered an RTL template. The most notable consequence of these rules is that no expression whose normal form contains a λ-abstraction is an RTL template; λ-abstractions must be applied to arguments.

Eventually there will be a precise, formal definition of RTL template.

**Inner declarations**

Inner declarations may appear anywhere a top-level declaration may appear, and they may also appear in let-expressions, where they are the only kind of declaration permitted. Their typical usage is to bind local values in function bodies. Inner declarations include value bindings, function bindings, and fixity declarations. Value and function bindings are written as

\[
\text{declaration} \Rightarrow \\
\text{val value-spec is expression} \\
| \text{fun value-spec argument-pattern is expression} \\
\text{value-spec} \Rightarrow \text{name} | \{ \text{name} \}
\]

The square brackets in value-spec represent λ-RTL’s grouping mechanism, in which the expression actually denotes a sequence of values, and each value in the sequence is bound of the corresponding name on the left.

As examples, consider

\[
\text{val do_nothing is RTL.SKIP} \\
\text{fun bool n is n <> 0} \\
\text{val [eq ne] is [ (= ) (<> )]}
\]

λ-RTL supports user-defined infix operators with arbitrarily many levels of precedence. Precedence levels range from 999, which represents the highest possible precedence (tightest binding) for an infix operator, down to the most negative integer on the machine (loosest binding). The syntax is

\[
\text{declaration} \Rightarrow \\
(\text{infix} | \text{infix} | \text{infix}) \text{ precedence value-spec} \\
| \text{nonfix value-spec}
\]

Infix operators must be declared to be left-associative, right-associative, or non-associative with other operators of the same precedence. For example, we can make equality testing infix and non-associative using

\[
\text{infix} 5 \ [= \ <=>\] \\
\]

A single instance of an infix operator can be made “nonfix” (treated as an ordinary value) by enclosing it in parentheses. The nonfix declaration makes this behavior permanent.

**Expressions**

λ-RTL expressions include records, tuples, integers, conditionals, and functions (λ-abstractions). As in ML, the juxtaposition of two expressions indicates function application, and function application has higher precedence than any infix operator. Figure 3.1 shows the syntax for expressions.

**Grouping**

A group may appear in place of an expression anywhere in λ-RTL. Whereas an expression denotes a single value, a group denotes a sequence of values. All operations except binding distribute over the group, so if the expression in a declaration contains a group, then that expression denotes a sequence of values, no matter how deeply nested the group is. The most common operations to distribute over a group are function application and tuple formation.

The usual syntax for a group is a list of expressions in square brackets. The expressions need not be of the same type. Ordinary function application is inoperative at the top level of a group, so the group \([f \ x]\) is a group containing two values, as is \([f \ (x)\]\). To use function application or infix operators within a group, one must include the whole application in parentheses, e.g., \([f \ (x)\] or \([f \ (x)\])\]. In addition to the usual syntax, λ-RTL provides two kinds of special syntax to create groups of integers. All use square brackets:

\[
\text{expression} \Rightarrow \\
[ \{ \text{expression} \}] \\
| [ \text{expression} . . . \text{expression} ] \\
| [ \text{expression to expression} \\
| \{ \text{by expression | columns expression} \} ]
\]
expression \Rightarrow

\begin{align*}
& ( \text{expression} \{, \text{expression} \} ) \\
& | \{ \text{member-name in expression}\{, \text{member-name in expression} \} \} \\
& | [. \text{expression} \{, \text{expression} \} .] \\
& | \text{let} \{ \text{declaration} \} \text{in} \text{expression} \text{end} \\
& | \text{if} \text{expression} \text{then} \text{expression} \text{else} \text{expression} \text{fi} \\
& | \text{argument-pattern} . \text{expression} \\
& | \text{expression} \text{member-name} \\
& | \# \text{space-identifier} [ \text{expression} ] \\
& | \text{expression} \# [ \text{slice-specification} ] \\
& | \text{expression} \text{expression} \\
& | \text{identifier} \\
& | \text{literal-constant}
\end{align*}

<table>
<thead>
<tr>
<th>Tuple</th>
<th>Record</th>
<th>Vector</th>
</tr>
</thead>
<tbody>
<tr>
<td>Local declarations</td>
<td>Conditional</td>
<td>Function</td>
</tr>
<tr>
<td>Selection</td>
<td>Location</td>
<td>Slice</td>
</tr>
<tr>
<td>Function application</td>
<td>Named value</td>
<td>Constant</td>
</tr>
</tbody>
</table>

Figure 3.1: \(\lambda\)-RTL expressions

In the latter two cases, the constituent expressions must be compile-time constants.

Groups are evaluated left to right, last-in, first-out, so for example the expression

\(([a \ b], [1 \ 2])\)

denotes the same sequence of four expressions as

\([([a, 1] (a, 2) (b, 1) (b, 2)])\)

The evaluation rule and the design of groups in general are based on the generator mechanism of the Icon programming language (Griswold 1982).

The primary role of groups is to make it easy to specify multiple machine instructions in a single attribute binding. This is done by using \textsf{value-spec} (either a single name or a sequence of names in square brackets) to form the opcodes of constructors. Multiple \textsf{value-specs} may be used to combine, for example, a group of operations with a group of suffixes. Chapters 5 and 6 have examples.

\textsf{opcode} \Rightarrow \textsf{value-spec}\{\textsf{value-spec}\}

**Lexical conventions**

Like identifiers in ML, \(\lambda\)-RTL identifiers may be alphanumeric or symbolic. An alphanumeric identifier begins with a letter and continues with letters, digits, underscores, and primes. A symbolic identifier is a sequence of the symbols

!\&+/:<=?\^\_\{\|\}^*\%\^-#

Such an identifier may begin with any symbol except #.

Identifiers consisting solely of two or more dashes introduce comments; the comment includes the dashes and continues to the end of the line. For example, -- and --- introduce comments, but --> and --***-- do not. A comment is equivalent to a newline.

An integer literal is a sequence of decimal digits; \(\lambda\)-RTL has no floating-point literals. \(\lambda\)-RTL also supports hexadecimal literals beginning with 0x, octal literals beginning with 0, and binary literals beginning with 0b.

Literals and identifiers preceded by # are intended to be used to represent widths in \(\lambda\)-RTL’s forthcoming type system; in the current implementation, they are equivalent to ordinary literals and identifiers.

**The initial basis**

Most of the RTL-specific content of \(\lambda\)-RTL is in the initial basis, i.e., the collection of predefined functions and values. Figure 3.2 shows \(\lambda\)-RTL’s initial basis.
Boolean constants:
true  Truth
false Falsehood

RTL operators and other functions used to create RTLs:
RTL.STORE Takes location and value, produces effect.
RTL.FETCH Fetches value from location.
RTL.SKIP The empty RTL; an effect that does nothing.
RTL.GUARD Takes a Boolean expression and an effect and produces an effect.
RTL.NOT Boolean negation.
RTL.EFFECTS Takes two effects (RTLs) and composes them so they take place simultaneously (list append on list of effects).

Functions on vectors:
sub Vector subscript, a left-associative infix operator with precedence 5.
Vector.spanning A curried function of two arguments. Vector.spanning \( x \ y \) produces the vector \([x, x+1, \ldots, y]\).
Vector.foldr A higher-order function used to visit every element of a vector. Like Vector.foldr in Standard ML '97.

Figure 3.2: \( \lambda \)-RTL's initial basis

**Style**

\( \lambda \)-RTL provides expressive power with few restrictions. Writers of specifications use the functions in \( \lambda \)-RTL's initial basis to create RTLs that correspond to the form prescribed in Chapter 2. Writers can also introduce new RTL operators. This freedom gives us ample scope for experimenting with different styles of description—as well as ample rope with which to hang ourselves. We expect our experiments to lead to a library of useful RTL operators, as well as useful \( \lambda \)-RTL functions. Chapter 4 gives a preliminary collection of RTL operators.

\( \lambda \)-RTL provides no looping or recursion constructs. Loops whose sizes are known in advance can be simulated by using Vector.foldr and Vector.spanning. Because this style will be familiar only to those well versed in functional programming, we expect eventually to provide some syntactic sugar for it.

**Differences between \( \lambda \)-RTL and Standard ML**

Although \( \lambda \)-RTL is inspired in large part by Standard ML, there are significant differences, which may interest ML programmers. Syntactic differences include:

- The defining connective is is, not =.
- if expressions must be terminated by fi.
- Identifiers declared infix must be explicitly identified as left-, right-, or non-associative. Arbitrarily many levels of precedence are available.
- Dot notation is used to select elements from records as well as from structures.
- Comments begin with an identifier consisting of \( n \) dashes, where \( n \) is at least 2. Comments end at the end of a line.
There are also some semantic restrictions:

- There are no side effects (mutation, exceptions), so order of evaluation does not matter.

- Functions may not be recursive, so the normal form of all expressions can be computed at compile time.

- There are no datatype constructors. As a corollary, the only patterns used in function definitions are those that match tuples or records.
Chapter 4

Basic RTL operators

This chapter describes RTL operators and functions that we expect to be useful for describing many different machines. As we do throughout this document, we use the Noweb literate-programming tool (Ramsey 1994) to present examples of λ-RTL. Noweb extracts the code from the same files used to produce this document, so we can run the examples through the λ-RTL compiler and make sure everything compiles.

A Noweb file contains explanatory text interleaved with named “code chunks,” which contain source code and references to other code chunks. The names of chunks appear italicized and in angle brackets, as in the following C code:

```
18a (summarize the problem 18a)≡
   printf("Inputs are: ");
   (for every i that is an input, print i and a blank 18b)
   printf("\n");

18b (for every i that is an input, print i and a blank 18b)≡
   for (i = 1; i < argc; i++)
   printf("%s ", argv[i]);
```

The ≡ sign indicates the definition of a chunk. Definitions of a chunk can be continued in a later chunk; Noweb concatenates their contents. Such a concatenation is indicated by a +≡ sign in the definition:

```
18c (summarize the problem 18a)+≡
   printf("Must process %d files\n", argc-1);
```

Noweb adds navigational aids to the document. Each chunk name ends with the number of the page on which the chunk’s definition begins. When more than one definition appears on a page, they are distinguished by appending lower-case letters to the page number. When a chunk’s definition is continued, Noweb includes pointers to the previous and next definitions, written “a18a” and “18c+.” The notation “(18a)” shows where a chunk is used.

Because λ-RTL does not yet have modules, we use Noweb to include the chunk ⟨RTL basics 19a⟩ in our descriptions of the SPARC and the Pentium.
4.1 Building RTLS

λ-RTL is intended for building RTLS, not analyzing them, so it doesn't expose all the structure of RTLS; single effects, guarded effects, and full RTLS (lists of guarded effects) all have the same type effect. The initial basis provides ways to create and combine effects. We define a more compact, infix notation for parts of the basis.

\[ \textit{RTL basics 19a} \]

\begin{verbatim}
  val do_nothing is RTL.SKIP -- type will be effect
  val --> is RTL.GUARD       -- type will be bool * effect --> effect
  val <-< is RTL.STORE       -- type will be #n loc * #n bits --> effect
  infix 1 -->
  infix 2 <--
\end{verbatim}

In the underlying representation, the semantics of these operations is extended to full RTLS in the obvious way. For example, RTL.STORE produces an RTL containing a single effect whose guard is true.

We don't need an explicit FETCH, because fetches are inserted automatically, and we don't need an explicit CONST, because λ-RTL compiles literal constants into RTL CONST nodes.

We use a semicolon to combine simultaneous effects. We plan eventually to have an andthen operator to indicate sequential composition of effects, which would presumably be implemented by forward substitution. Substitution isn't implemented in the current λ-RTL, so we make andthen equivalent to simultaneous effects, even though this semantics for andthen is incorrect.

\[ \textit{RTL basics 19a} + \Xi \]

\begin{verbatim}
  val ; is RTL.EFFECTS       -- type will be effect * effect --> effect
  val andthen is ;
  infixl 0 ;
  infixl "1 andthen
\end{verbatim}

4.2 Booleans

true, false, and RTL.NOT are the Boolean operators in λ-RTL's initial basis. We prefer to use not to stand for Boolean negation.

\[ \textit{RTL basics 19a} + \Xi \]

\begin{verbatim}
  val not is RTL.NOT
\end{verbatim}

For writing other Boolean functions λ-RTL has only if-then-else-fi, which λ-RTL rewrites into suitable guards. We would like to have connectives andalso and orelse, but introducing them presents a choice about level of detail: should they be introduced as new RTL operators, whose semantics must be defined outside of λ-RTL, or should they be defined in terms of the existing λ-RTL primitives? We would like to try both alternatives, because they provide different levels of detail in the generated RTLS.

Eventually, λ-RTL will have mechanisms to support multiple levels of abstraction, probably by using some sort of parameterized modules. For now, we resort to some literate-programming tricks that are equivalent to conditional compilation. nweb can extract this code in two versions, designated simple and full. Code chunks tagged [simple] and [full] appear only in the corresponding versions, and untagged code chunks appear in both versions.
In the simple version, the boolean connectives are left abstract. In the full version, we give them their standard definitions. The simple version has the advantage that the RTLs are smaller and easier to read, but the disadvantage that the semantics have to be specified elsewhere. In the full version, these connectives are expanded to primitives, so they automatically get their proper semantics.

\[\langle \text{RTL basics [simple]} \rangle \equiv\]
\[\text{rtlop [andalso orelse]} \quad \text{-- Boolean connectives}\]
Conditional definition.

\[\langle \text{RTL basics [full]} \rangle \equiv\]
\[\text{fun andalso } (p, q) \text{ is if } p \text{ then } q \text{ else false fi}\]
\[\text{fun orelse } (p, q) \text{ is if } p \text{ then true } q \text{ else fi}\]
Conditional definition.

No matter what their definitions, we want these connectives to be left-associative infix operators.

\[\langle \text{RTL basics 19a} \rangle \equiv\]
\[\text{infixi 3 orelse}\]
\[\text{infixi 4 andalso}\]

In the long run, equality and inequality will probably have to have some built-in semantics, but for now, we leave them abstract.

\[\langle \text{RTL basics 19a} \rangle \equiv\]
\[\text{rtlop } \left[= \leftrightarrow\right]\]
\[\text{infix 5 } \left[= \leftrightarrow\right]\]

We have chosen to make inequality a primitive RTL operator, although we could have defined it in terms of
equality by writing \(\text{val } \leftrightarrow \text{ is } \setminus (x, y)\). \(\text{not } (x = y)\).

We can convert between booleans and bits. Again we have the choice of making things abstract or concrete.

\[\langle \text{RTL basics [simple]} \rangle \equiv\]
\[\text{rtlop bit } \quad \text{-- type will be bool } \rightarrow \#1 \text{ bits } \quad \text{-- turn boolean into bit}\]
\[\text{rtlop bool } \quad \text{-- type will be #1 bits } \rightarrow \text{ bool } \quad \text{-- turn bit into boolean}\]

\[\langle \text{RTL basics [full]} \rangle \equiv\]
\[\text{fun bit } p \text{ is if } p \text{ then 1 else 0 fi}\]
\[\text{fun bool n is n } \leftrightarrow \text{ 0}\]

4.3 Integer arithmetic and comparisons

These operations represent integer arithmetic performed on the bit-vector representation. Multiplication
and division come in two flavors. One interprets its operands as unsigned integers; the other interprets
its operands as two's-complement signed integers. There are two sets of signed division operators because
division may truncate towards \(-\infty\) (\text{divs} and \text{mods}) or towards 0 (\text{quots} and \text{rems}). There is, of course, no
such distinction for unsigned division.

\[\langle \text{RTL basics 19a} \rangle \equiv\]
\[\text{rtlop [add subtract neg muls mulu divu div modu divs mods quot mods quot rem s <= > >=]}\]
\[\text{-- standard arithmetic operators of several different types}\]
The intended semantics of `add` and `subtract` is “add with carry” and “subtract with borrow.” We define shorthands for the common case where carry (or borrow) in is 0 and carry out is uninteresting:

\[<\text{RTL basics} \text{ 19a}>+\Xi\]

\[
\begin{align*}
\text{fun } +&(n, m) & \text{is } (\text{add}(n, m, 0)).\text{result} & \text{-- add returns } \{\text{result, carry}\} \\
\text{fun } -&(n, m) & \text{is } (\text{subtract}(n, m, 0)).\text{result} & \text{-- subtract returns } \{\text{result, borrow}\}
\end{align*}
\]

The arithmetic operators obey the usual rules of precedence.

\[<\text{RTL basics} \text{ 19a}>+\Xi\]

\[
\begin{align*}
\text{infixl 7 } & [\text{divs mods quot rem divu modu quotu remu}] \\
\text{infixl 6 } & [+ -] \\
\text{infixn 5 } & [< <= > >=]
\end{align*}
\]

Comparisons are left abstract instead of being defined in terms of `<` and `=.`

We can sign-extend or zero-extend integers. These operators should be polymorphic in the sizes of the arguments and results.

\[<\text{RTL basics} \text{ 19a}>+\Xi\]

\[
\text{rtloil} [\text{sx zx}] & \text{-- type will be } \#n \text{ bits } \to \#m \text{ bits}
\]

### 4.4 Logical operators

Most processors offer a variety of bitwise operations on bit vectors, including those shown here.

\[<\text{RTL basics} \text{ 19a}>+\Xi\]

\[
\begin{align*}
\text{rtloil} [\text{and or xor}] & \text{-- } \#n \text{ bits } \times \#n \text{ bits } \to \#n \text{ bits} \\
\text{rtloil com} & \text{-- } \#n \text{ bits } \to \#n \text{ bits } \text{-- ‘complement’ because ‘not’ is boolean} \\
\text{rtloil bitInsert} & \text{-- } \{\text{value} : \#w \text{ bits}, \text{lsb} : \#k \text{ bits}\} \to \#n \text{ bits } \to \#w \text{ bits} \\
\text{rtloil bitExtract} & \text{-- } \{\text{value} : \#w \text{ bits}, \text{lsb} : \#k \text{ bits}\} \to \#n \text{ bits}
\end{align*}
\]

`bitInsert` and `bitExtract` insert into or extract from a `w`-bit (“wide”) `value`. The thing inserted or extracted is `n` bits (“narrow”), and the widths of both wide and narrow values are normally inferred by the type system. (We may change the types planned for `bitInsert` and `bitExtract`.)

There are instructions that manipulate bit fields whose widths aren’t known statically. These computations can’t be expressed with `bitInsert` and `bitExtract`, because we can’t assign a type (width) to the result. But we can define `bitTransfer` as a combination extraction and insertion, and in fact, this is the way such instructions must behave. Even an instruction that extracts `k` bits from a 32-bit word, where `k` is not known until run time, must still insert the result into some storage location of known size—typically a 32-bit word of all zeros.

\[<\text{RTL basics} \text{ 19a}>+\Xi\]

\[
\begin{align*}
\text{rtloil bitTransfer} & \text{-- } \{\text{src: }\{\text{value:}\#w \text{ bits, lsb:}\#k \text{ bits}\}, \text{dst:}\{\text{value:}\#w \text{ bits, lsb:}\#k \text{ bits}\}, \text{width:}\#j \text{ bits}\} \to \#w \text{ bits} \\
& \text{-- extract width bits from src at lsb, insert into dst at lsb}
\end{align*}
\]

Most processors have shift instructions. We could make the shift operations primitive RTL operators, or we could define them in terms of bit transfer.

\[<\text{RTL basics} \text{ [simple] 20a}>+\Xi\]

\[
\begin{align*}
\text{rtloil} [\text{shl shr shra}]
\end{align*}
\]
4.5 Byte order

The current implementation of λ-RTL does not support aggregations, but we would like to be able to use them in our sample specifications, so we define both big- and little-endian aggregations to be the identity function.

4.6 Undefined effects

The current implementation of λ-RTL does not support the kill effects. In a later implementation, we plan to use the question mark to stand for an “undefined” value. All λ-RTL functions and RTL operators will be strict in this value, and RTL.STORE(1oc, ?) will become RTL.KILL 1oc. We would like to use the question mark in our example descriptions, even though λ-RTL doesn’t yet have the machinery to give it its eventual meaning, so we introduce it as a nullary RTL operator.

4.7 Floating-point arithmetic

Floating-point operations need to come in different families, so for example we could distinguish IEEE 754 floating point from VAX floating point. Right now the name space of RTL operators is flat, so such distinctions are impossible. It nevertheless seems reasonable to put these operators in a structure. Our treatment of floating-point operators is not complete; we only sketch some possibilities.
We use operators surrounded by slashes to designate floating-point arithmetic operations.

\[ \text{IEEE 754 floating point 23a)\equiv} \]

\[
\begin{align*}
\text{rtlop } [/+ /- /\ast /\backslash] & \quad \text{-- basic floating-point arithmetic} \\
\text{rtlop } [\text{fabs } \text{fsqrt}] & \quad \text{-- special floating-point operators}
\end{align*}
\]

All these operators will eventually be parameterized by the size (type) of their operands.

When we get a type system, the conversions will be polymorphic in two widths: the argument and result widths. We may need a whole family of float-to-integer operators in order to account properly for rounding modes.

\[ \text{IEEE 754 floating point 23a)\equiv} \]

\[
\begin{align*}
\text{rtlop } [\text{i2f } \text{f2i}] & \quad \text{-- conversions to/from integer.}
\end{align*}
\]

There are the special values \( \pm 0 \) and \( \pm \infty \).

\[ \text{IEEE 754 floating point 23a)\equiv} \]

\[
\begin{align*}
\text{rtlop } [\text{pzero } \text{mzero}] & \quad \text{-- plus and minus zero} \quad \text{-- type will be } \#\text{n bits} \\
\text{rtlop } [\text{pinf } \text{minf}] & \quad \text{-- plus and minus infinity} \quad \text{-- type will be } \#\text{n bits}
\end{align*}
\]

There are NaNs.

\[ \text{IEEE 754 floating point 23a)\equiv} \]

\[
\begin{align*}
\text{rtlop } \text{NaN} & \quad \text{-- type will be function from significand to value}
\end{align*}
\]

There are four rounding modes required by the standard.

\[ \text{IEEE 754 floating point 23a)\equiv} \]

\[
\begin{align*}
\text{structure Round is struct} \\
\text{rtlop } [\text{nearest zero down up}] & \quad \text{-- what to round toward end}
\end{align*}
\]
Chapter 5

Describing the SPARC, Version 8

This chapter illustrates \( \lambda \)-RTL by specifying a handful of SPARC instructions. Load and store instructions illustrate the basic techniques used to move data of different sizes. Logical instructions and add instructions show simple groups of computational instructions; each group offers a slightly different treatment of condition codes. Specifications of save and restore instructions illustrate one of several possible treatments of register windows. In this case we have not yet achieved our goal of separating hardware behavior from software conventions; our model of register windows describes the abstraction that is presented by the combination of hardware, calling convention, and operating system.

5.1 Storage spaces

Storage locations manipulated by SPARC instructions include memory and several kinds of registers. We have little to say about memory other than that the machine is byte-addressed and uses big-endian byte order.

\[
\begin{align*}
\text{(SPARC basics 24)} & \equiv \\
\text{storage} & \\
&m' & \text{is cells of 8 bits called "memory" aggregate using bigEndian} \\
\text{val bigE is bigEndian -- useful abbreviation} \\
\end{align*}
\]

We introduce the abbreviation bigE for cases where we want to write aggregation explicitly.

Integer Registers

We normally expect that a machine’s registers can be modeled as a simple collection of mutable cells, but the SPARC has two architectural features that don’t fit this model: an immutable register and register windows. Register 0 is immutable; fetches from register 0 always return zero, and stores into register 0 have no effect. Registers 1–7 are ordinary mutable cells, but registers 8–31 are aliases for locations in a collection of register windows. The register windows are overlapping sets of 16 registers; an implementation may provide from 2 to 32 sets. Bits 0–4 of the processor state register constitute a current window pointer (CWP). When an instruction refers to a register numbered 8–31, the processor uses the CWP and the register number to identify a window of 16 registers and a register in that window. The SPARC save and restore instructions manipulate the CWP, as do traps and returns from traps.
This low-level model would be easy to describe using $\lambda$-RTL, but it is not the model used by most compiler writers. Compiler writers seldom need to use register 0 explicitly, because SPARC assembly language provides "synthetic" instructions that use register 0 as needed. Instead of using the detailed semantics of register windows, compiler writers adhere to the SPARC calling convention, which (with some help from the operating system) gives the illusion of an infinite collection of register windows, and which allocates one register window to each activation of each procedure. The compiler must reserve space on the stack for use as backing store, and it must use `save` and `restore` in procedure prologs and epilogs.

$\lambda$-RTL is not biased toward any particular model of register windows, and in fact it could be used to specify different models which might be useful in different situations. Eventually, $\lambda$-RTL will have a modules system that will enable us to create different models, at different levels of abstraction, of such things as SPARC register windows. For this document, we have chosen a fairly abstract model that is convenient for compiler writers. We hide most of the low-level hardware behavior, and we describe registers (except register 0) as a simple collection of mutable cells. Section 5.2 shows the details of the model that are needed to specify the effects of the `save` and `restore` instructions.

Dealing with register 0 is fairly simple; there are only two reasonable models. One is the full semantics; the other is a model which specifies only that register 0 is immutable. The simpler model suffices for some compilers, and the resulting RTLS are easier to understand. As in Chapter 4, we use `[simple]` and `[full]` to show the two alternatives.

25a \begin{verbatim}
<SPARC basics [simple] 25a>≡

structure Reg is struct
  fun fetch n is $r[n]
  fun store (n, v) is n <> 0 --> $r[n] <== v
  (alias functions for in, out, local, and global registers 29a)
end

Conditional definition.

We make register 0 immutable by ensuring that attempts to store into it have no effect. The full semantics also shows that fetches return 0.

25b \begin{verbatim}
<SPARC basics [full] 25b>≡

structure Reg is struct
  fun fetch n is if n = 0 then 0 else $r[n] fi
  fun store (n, v) is n <> 0 --> $r[n] <== v
  (alias functions for in, out, local, and global registers 29a)
end

Conditional definition.

In this straightforward model of registers, we use these special fetch and store methods, and we also permit registers to be aggregated into pairs, so we can describe instructions like 1dd.

25c \begin{verbatim}
<SPARC basics 24>≡

storage
  'r' is 32 cells of 32 bits called "registers"
  store using Reg.store
  fetch using Reg.fetch
  aggregate using bigEndian

\end{verbatim}

1 "Optimized leaf procedures" may use their caller's register window.
**Integer Unit control/status registers**

This specification gives a bare minimum of information about special registers. It emphasizes condition codes.

\[\text{storage 'i' is 6 cells of 32 bits called "IU control/status registers" locations [PSR WIM TBR Y PC nPC] is $i[[0..5]]} \]

Properly speaking, the integer condition codes are part of the processor state register (PSR).

\[\text{storage '-'} \text{ alias bogus is cells of 32 bits called "bogus space for abstract locations"} \]

\[\text{rtllop [N Z V C] locations [N Z V C] is $bogus[[N Z V C]]} \]

**5.2 SPARC instructions**

**SPARC addresses and operands**

The SPARC has a simple structure for effective addresses and operands. As far as the hardware is concerned, there are only two addressing modes, depending on whether the immediate bit is used. We've chosen to let the default attribute of the addressing modes be the address, not the location in memory denoted by that address. The address is the value of register rs1, plus either the value of register rs2 or the result of sign-extending a 13-bit immediate value.

\[\text{default attribute of indexA(rs1, rs2) is $r[rs1] + r[rs2]} \]
\[\text{dispA (rs1, simm13) is $r[rs1] + sx simm13} \]

The rmode and imode constructors produce the values used in operands of type reg_or_imm.

\[\text{default attribute of rmode (rs2) is $r[rs2]} \]
\[\text{imode (simm13) is sx simm13} \]
Load instructions

The SPARC architecture provides for not one but 256 possible address spaces. By default, load instructions use space 0x0A in user mode and space 0x0B in supervisor mode. The mode is determined by the value of the S bit in the processor state register. Values from other address spaces may be obtained by using the “load from alternate space” instructions, but these instructions are privileged. We omit all this complexity from our description, treating the machine as if it were always in user mode. This omission is partly for simplicity, but partly because λ-RTL does not deal well with numbered collections of storage spaces.

The specifications of the load integer instructions give us our first opportunity to factor λ-RTL descriptions. On the left-hand side, the phrases [s u] and [b h] are like for loops, and the carets (”) join parts of constructor names, so the opcode on the left-hand side expands to the list of constructors ldsh, ldub, and ldah. Corresponding to [s u] on the right-hand side is the “expression group” [sx zx]. sx and zx were introduced in Section 4.3 to stand for sign extension and zero extension, respectively. Corresponding to [b h] is the expression group [#8 #16]. The # sign introduces an integer constant representing a width, and the effect here is to produce big-endian aggregations of 8 and 16 bits. Combining the two groups specifies four instructions—signed and unsigned versions of byte and halfword loads.

27a <instruction defaults 27a>≡

default attribute of

\[
\text{ld}^* [s u] [b h] \quad \text{(address, rd)} \quad \text{is} \quad r [rd] \leftarrow [sx \ xz] \quad \text{(bigE [#8 #16] \text{[address]})}
\]

This double factoring works smoothly because the two 2-groups are in the same order on the left- and right-hand sides.

We use another group to define load and load-double instructions, which don’t require sign extension or zero extension. For ldd, two 32-bit registers are implicitly aggregated to hold a 64-bit value.

27b <instruction defaults 27b>≡

\[
\text{ld}^* ["d] \quad \text{(address, rd)} \quad \text{is} \quad r [rd] \leftarrow \text{bigE [#32 #64] \text{[address]}}
\]

The chunks named <instruction defauls 27a> define the default attributes of instructions. We have chosen to use the default attributes to specify the “main effect” of these instructions, but there are circumstances in which the processor traps instead of executing this main effect. Traps are not relevant to all specifications, so we have chosen to give the trap semantics separately, by binding them to an attribute named trap. Again, with a proper modules system, trap semantics could be omitted from some specifications.

All loads except byte loads could trap if the address is improperly aligned. The load-double also traps if rd is an odd-numbered register.

27c <trap semantics 27c>≡

attribute trap of

\[
\text{ld}^* ["uh] \quad \text{(address, rd)} \quad \text{is} \quad \text{alignTraps(address, [4 2 2])}
\]

\[
\text{ldd(address, rd)} \quad \text{is} \quad \text{alignTraps(address, 8); rd0[@]} \leftarrow 0 \rightarrow \text{trap(illegal_instruction)}
\]

27d ⟨SPARC utilities 27d⟩≡

fun alignTraps (address, k) is \text{address \text{mod} k \leftarrow 0 \rightarrow \text{trap(mem_address_notAligned)}}

The interaction of the trap semantics with the main semantics is not specified in λ-RTL.

The instructions that load floating-point registers or coprocessor registers don’t illustrate anything new, so we have omitted them from this example specification.
Store instructions

The store-halfword and store-byte instructions use bit extraction to show what part of the register is stored.

\[\begin{align*}
\text{sth} & \quad (\text{rd, address}) \quad \text{is} \quad \$_m[\text{address}] \leftarrow \text{sr}[\text{rd}][16 \text{ bits at } 0] \\
\text{stb} & \quad (\text{rd, address}) \quad \text{is} \quad \$_m[\text{address}] \leftarrow \text{sr}[\text{rd}][8 \text{ bits at } 0] \\
\text{st} & \quad (\text{rd, address}) \quad \text{is} \quad \$_m[\text{address}] \leftarrow \text{bigE} \#32 \#64 \text{ sr}[\text{rd}]
\end{align*}\]

The trap semantics for store instructions is the same as for load instructions. A more aggressively factored specification might merge the trap semantics of the two groups.

\[\begin{align*}
\text{trap semantics 27c} & \quad \text{+} \\
\text{st} & \quad (\text{address, rd}) \quad \text{is} \quad \text{alignTrap}(\text{address}, [4 2]) \\
\text{std}(\text{rd, address}) & \quad \text{is} \quad \text{rd} @ [0] \leftarrow 0 \rightarrow \text{trap(illegal_instruction)}; \text{alignTrap}(\text{address, 8})
\end{align*}\]

Swapping load-store instructions

Because the semantics of a register-transfer list is that of simultaneous assignment, we can specify swapping instructions without resorting to temporaries.

\[\begin{align*}
\text{instruction defaults 27a} & \quad \text{+} \\
\text{ldstub}(\text{address, rd}) & \quad \text{is} \quad \text{sr}[\text{rd}] \leftarrow \text{zx} \$_m[\text{address}]; \$_m[\text{address}] \leftarrow 0\text{ff} \\
\text{swap} & \quad (\text{address, rd}) \quad \text{is} \quad \text{sr}[\text{rd}] \leftarrow \text{bigE} \#32 \$_m[\text{address}]; \text{bigE} \#32 \$_m[\text{address}] \leftarrow \text{sr}[\text{rd}]
\end{align*}\]

We plan some experiments to determine under what circumstances we want not to require explicit aggregation.

Trap semantics for swap is like that for load and store.

\[\begin{align*}
\text{trap semantics 27c} & \quad \text{+} \\
\text{attribute} & \quad \text{trap of swap(address, rd) is alignTrap(address, 4)}
\end{align*}\]

Save and restore instructions

To describe the effects of save and restore, we introduce a fictional storage space \(w\) to stand for the locations where registers are saved. Some of these locations correspond to hardware register windows; others correspond to reserved locations on the stack.

\[\begin{align*}
\text{register windows 28e} & \quad \text{+} \\
\text{storage} & \quad \text{is} \quad \text{cells of 32 bits called } \text{"register windows"} \\
\text{,} & \quad \text{alias dot is 1 cell of 32 bits called } \text{"register-window pointer"} \\
\text{locations} & \quad \text{winptr is } \$_{\text{dot}[0]}
\end{align*}\]

The location called winptr is analogous to the current window pointer (CWP), but they are not identical. At any moment during the execution of a program, cells \(w[0..\text{winptr}-1]\) hold the contents of registers that have been saved with previous save instructions. Other cells in \(w\) space have undefined contents. Figure 5.1 shows the layout of \(w\) space.
The registers have aliases, which are shown in Table 4-1 in the SPARC manual (SPARC 1992). To make it easier to specify the save and restore instructions correctly, we create functions that implement these aliases.

**Figure 5.1: Layout of w space used to model register windows**

29a

\[\text{alias functions for in, out, local, and global registers} \quad \text{(25)}\]

\[
\begin{align*}
\text{fun in' } \ n & \text{ is } \$r[n+24] \\
\text{fun local' } \ n & \text{ is } \$r[n+16] \\
\text{fun out } \ n & \text{ is } \$r[n+8] \\
\text{fun global } \ n & \text{ is } \$r[n]
\end{align*}
\]

We have to use in' and local' because in and local are reserved words in \(\lambda\)-RTL.

The true behavior of a save instruction is to decrement CWP and to trap if the new value is invalid (according to the Window Invalid Mask register). The normal trap handler provides the illusion of infinite register windows, by saving register windows on the stack and by adjusting the WIM register. We model this illusion as movement of out registers to in registers and movement of in and local registers to the \(w\) space.

Global registers are unchanged by save; Figure 5.1 helps clarify what happens to the others.

29b

\[\text{ SPARC utilities} \quad \text{(27d)} \quad \text{=} \quad \text{(35c)} \quad \text{<27d, 29c>} \]

\[
\begin{align*}
\text{fun saveOut } \ n & \text{ is } \text{Reg.in' } \ n \leftarrow \text{Reg.out } \ n \\
\text{fun saveLocal } \ n & \text{ is } \$w[\text{winptr}+8+n] \leftarrow \text{Reg.local' } \ n \\
\text{fun saveIn } \ n & \text{ is } \$w[\text{winptr}+n] \leftarrow \text{Reg.in' } \ n
\end{align*}
\]

We use functions from the built-in Vector structure to create do\& such that do\& \(f\) applies \(f\) to the integers 0 to 7 and returns the simultaneous composition of the resulting effects. If this idiom proves useful, we will provide syntactic sugar for it.

29c

\[\text{SPARC utilities} \quad \text{(27d)} \quad \text{=} \quad \text{(35c)} \quad \text{<29b, 30b>} \]

\[
\text{fun do\& } \text{f is Vector.foldr } (\backslash (n, \text{effects}). f \ n \ ; \ \text{effects}) \text{ RTL.SKIP } (\text{Vector.spanning } 0 \ 7)
\]

We would like simply to write

29d

\[\text{incorrect instruction specifications} \quad \text{(29d)} \quad \text{=} \quad \text{(30a)} \]

\[
\begin{align*}
\text{save (rsi, reg_or_imm, rd) is (} \\
\text{ do\& saveOut; do\& saveLocal; do\& saveIn; winptr } \leftarrow \text{winptr + 16;} \\
\text{ \$r[rd] } \leftarrow \text{\$r[rsi] + reg_or_imm)}
\end{align*}
\]

Conditional definition.
but the problem with this specification is that it is ill-formed whenever rd is an “in” register—the explicit assignment to rd conflicts with the assignment created by do8 saveOut, and there is no way to say which one takes priority. We do not want to write

\[\text{save (rs1, reg.or_imm, rd) is (do8 saveOut; do8 saveLocal; do8 saveIn; winptr } \leftarrow \text{ winptr } + 16\]
and then
\[\$r[rd] \leftarrow \$r[rs1] + \text{reg.or_imm}\]

because the right-hand side of the assignment would use values from the new register window, not from the old one. One possible, if unpleasant, way out of this dilemma is to guard the assignments created by saveOut, so that register rd is not touched:

\[\text{fun saveRegs rd is (let fun saveOut n is rd } \leftrightarrow \text{n+24 } \rightarrow \text{Reg.in’ n } \leftarrow \text{Reg.out n} \text{fun saveLocal n is }$w[\text{winptr+n}] \leftarrow \text{Reg.local’ n} \text{fun saveIn n is }$w[\text{winptr+n}] \leftarrow \text{Reg.in’ n in do8 saveOut; do8 saveLocal; do8 saveIn; winptr } \leftarrow \text{winptr } + 16\end{\text{end}}\]

We use similar tactics to specify restore.

\[\text{fun restoreRegs rd is (let fun restoreOut n is rd } \leftrightarrow \text{n+8 } \rightarrow \text{Reg.out n } \leftarrow \text{Reg.in’ n} \text{fun restoreLocal n is rd } \leftrightarrow \text{n+16 } \rightarrow \text{Reg.local’ n } \leftarrow \$w[\text{winptr+8+n}] \text{fun restoreIn n is rd } \leftrightarrow \text{n+24 } \rightarrow \text{Reg.in’ n } \leftarrow \$w[\text{winptr+16+n} \text{in do8 restoreOut; do8 restoreLocal; do8 restoreIn; winptr } \leftarrow \text{winptr } - 16\end{\text{end}}\]

\[\text{restore (rs1, reg.or_imm, rd) is restoreRegs rd; }\$r[rd] \leftarrow \$r[rs1] + \text{reg.or_imm}\]

**Logical instructions**

The SPARC doesn’t have a bitwise-complement instruction; instead, each logical instruction has a variant that complements its second operand. Using existing RTL operators, we define functions to represent these “logical-complement” operations. We use \text{com} for bitwise complement; \text{not} works only on booleans.

\[\text{fun [and in orn xnor] (a, b) is [and or xor] (a, com b)}\]

\[\text{fun [and in orn xnor] (a, b) is [and or xor] (a, com b)}\]
The logical instructions include variants that set condition codes and variants that leave condition codes unchanged. Because this is a common pattern among SPARC instructions, we define `leave_cc` and `set_cc` to help specify these two kinds of effects. Instructions that do set condition codes typically set the N and Z bits according to the value of an integer result. There is no single typical treatment of the 0 and C bits, so we require that values for these bits be passed in.

31a \langle SPARC utilities 27d \rangle+\equiv
\begin{align*}
\text{fun set_cc} \ (\text{result, overflow, carry}) \ is \\
\text{icc.N } &\leftarrow\text{ bit (result < 0);} \\
\text{icc.Z } &\leftarrow\text{ bit (result = 0);} \\
\text{icc.V } &\leftarrow\text{ overflow;} \\
\text{icc.C } &\leftarrow\text{ carry.}
\end{align*}
\text{fun leave_cc} \ (\text{result, overflow, carry}) \ is\ \text{do nothing}

The logical instructions clear overflow and carry.

31b \langle SPARC utilities 27d \rangle+\equiv
\begin{align*}
\text{fun logical_cc} \ (\text{result}) \ is \ set_cc(\text{result, 0, 0})
\end{align*}

The logical instructions are among the SPARC instructions that take the form

\[ \oplus \text{ rs1, reg_or_imm, rd} \]

where \( \oplus \) is a binary operator. We define the function `binary` to get the effect of this common form.

31c \langle SPARC utilities 27d \rangle+\equiv
\begin{align*}
\text{fun binary} \ (\text{operator, rs1, r_o_i, rd}) \ is \ $r[rd] \leftarrow\text{ operator($r[rs1], r_o_i)}
\end{align*}

This function by itself isn’t sufficient for variants that set the condition codes. `binary with cc` combines the main effect with whatever effects are produced by `special_cc`, a code-setting function that is passed in. If this function is `leave_cc`, the condition codes won’t be changed.

31d \langle SPARC utilities 27d \rangle+\equiv
\begin{align*}
\text{fun binary_with_cc} \ (\text{operator, rs1, r_o_i, rd, special_cc}) \ is \\
\text{let result } &\leftarrow\text{ operator($r[rs1], r_o_i)} \\
\text{in } $r[rd] &\leftarrow\text{ result; special_cc result end}
\end{align*}

We use factoring and these utility functions to specify all the logical instructions at once.

31e \langle instruction defaults 27a \rangle+\equiv
\begin{align*}
\text{and or xor andn orn xnor}” [cc “] \ (\text{rs1, reg_or_imm, rd}) \ is \\
\text{binary_with_cc}([\text{and or xor andn orn xnor}], \text{rs1, reg_or_imm, rd}, [\text{logical_cc leave_cc}])
\end{align*}

Add instructions

Our treatment of the add instructions is similar to our treatment of the logical instructions, except

- These instructions can set the overflow and carry bits, and
- Addition is a trinary operator, because there may be “carry in.”

Overflow semantics is a bit tricky, so in the [[simple]] version of the specification we leave it abstract.

31f \langle SPARC utilities [[simple]] 31f \rangle+\equiv
\begin{align*}
\text{rtlop add_overflows}
\end{align*}

Conditional definition.

31
fun add_overflows (x, y, c) is
  let val {result, carry} = add(x, y, c)
  in x[31] = y[31] andalso x[31] <> result[31]
end

Conditional definition.

An add-like operator takes three arguments and returns two results, so we can't use binary_with_cc. The main effect always takes one result and stores it in rd, plus there may be a secondary effect on the condition codes. This secondary effect requires all the operands.

fun add_like (operator, rs1, operand2, operand3, rd, special_cc) is
  let val main = (operator($rs1, operand2, operand3)).result
  in $rd <> main; special_cc ($rs1, operand2, operand3)
end

fun add_cc (x, y, c) is
  let val {result, carry} = add(x, y, c)
  in set_cc(result, bit(add_overflows(x, y, c)), carry)
end

These functions suffice to specify the add instructions, but the naming of the instructions makes the factoring a bit tricky. A 4-group on the left-hand side corresponds to two 2-groups on the right-hand side. Groups are evaluated in left-to-right LIFO order, so the rightmost group varies most rapidly, and so for example the addcc constructor corresponds to the use of 0 for operand3 and the use of add_cc for special_cc.

[add addcc addx addzcc] (rs1, reg_or_imm, rd) is
add_like(add, rs1, reg_or_imm, 10 (icc.C), rd, [leave_cc add_cc])

One might wish that icc.C did not require parentheses here.

Multiply instructions

The multiplication and division instructions treat a general-purpose register and the Y register as a single 64-bit register. The Reg64 structure shows how to use such a pair to hold a 64-bit value. To put the value in the pair, we put the most significant 32 bits in the Y register and the least significant 32 bits in the general-purpose register. To recover the value, we zero-extend the general-purpose register to 64 bits, then insert the contents of the Y register in place of the most significant 32 bits (which we know to be zeroes).
The multiply instructions produce a 64-bit result, and they have their own way of setting condition codes, so we define another pair of auxiliary functions.

\[
\text{fun mul\_like (operator, rs1, r\_o\_i, rd, special\_cc) is}
\]
\[
\text{let val result is operator($r[rs1], r\_o\_i)}
\]
\[
\text{in Reg64.set(rd, result); special\_cc result}
\]
\[
\text{end}
\]

fun mul\_cc(result) is set\_cc(result, ?, ?)

An undefined, rather than unspecified, value is the right thing for the V and C bits, because the manual says that “specification of this condition code may change in a future revision to the architecture. Software should not test this condition code.”

Branch on integer condition codes

The SPARC processor can branch on any of 16 conditions, each of which is a function of the 4 condition-code bits. Here, we give the tests abstractly, without showing the functions. The val declaration of the 16 tests hides the previous declaration of the same identifiers as RTL operators.

\[
\text{struct Ic\_test is struct}
\]
\[
\text{rtlop [A N NE E G LE GE L GU LEU CC CS POS NEG VC VS]}
\]
\[
\text{-- type is function from integer condition codes to boolean}
\]
\[
\text{val [A N NE E G LE GE L GU LEU CC CS POS NEG VC VS] is}
\]
\[
\text{[A N NE E G LE GE L GU LEU CC CS POS NEG VC VS] icc}
\]
\[
\text{-- functions are applied to icc, and results are boolean}
\]
\[
\text{end}
\]
Here, we define the test conditions as specified in the manual. We let Z, N, V, and C stand for the proper bits within the condition code, and we use or, xor, and not as bit operations, not boolean operations. Finally, after all the bit computations, we use the utility function bool to turn the result into a boolean, so it can be used as a guard.

\[\textit{SPARC utilities } [\text{full}] \text{32a}\] +

\begin{verbatim}
structure IccTest is struct
  val [A N NE E O LE GE L GU EU LEU CC CS POS NEG VC VS] is
  val not is com -- make usage conform to manual
  infix 3 [or xor]
in  bool [ 1 0
  (not Z) Z
  (not (Z or (N xor V))) (Z or (N xor V))
  (not (N xor V)) (N xor V)
  (not (C or Z)) (C or Z)
  (not C) C
  (not N) N
  (not V) V]
end

end
\end{verbatim}

The way we specify the branch instructions may look a bit odd, because the selection of a condition, which we are used to thinking of as a postfix notation like x:A, becomes an ordinary functions by virtue of being parenthesized, e.g., (x:A).\(^2\) These functions are then grouped and applied to IccTest. The result of the application is a group of tests, each of which is the appropriate member of the IccTest structure. That group is then used to guard an assignment to nPC, which models a delayed branch.

\[\textit{instruction defaults } 27a\] +

\begin{verbatim}
b"[a n e e g le ge l gu leu cc cs pos neg vc vs] (target, annul) is
  let val conditionOf is [(A) (N) (NE) (E) (G) (LE) (GE) (L)
    (GU) (LEU) (CC) (CS) (POS) (NEG) (VC) (VS)]
  in  conditionOf IccTest --> nPC <-- target
end
\end{verbatim}

To specify when the instruction in the delay slot should be annulled, we define an attribute annul_delay. Annuling occurs only if the annul bit is set. For the conditional branches, there is an additional requirement that the branch not be taken.

\[\textit{other attributes of instructions } 34c\] +

\begin{verbatim}
attribute annul_delay of
  b"[a n e e g le ge l gu leu cc cs pos neg vc vs] (target, annul) is
    annul <> 0 andalso
    (iccTest.LE) (iccTest.GE) (iccTest.L)
    (iccTest.GU) (iccTest.LEU) (iccTest.CC) (iccTest.CS)
    (iccTest.PS) (iccTest.NEG) (iccTest.VC) (iccTest.VS)])
  [ba bn] (target, annul) is annul <> 0
\end{verbatim}

\(^2\)It's possible that we should drop this notation in favor of the somewhat clearer (\(\land x, A\)).
Call and jump instructions

Both call and jump instructions save the current value of the program counter.

\[\text{call (target)} \quad \text{is nPC} \leftarrow \text{target}; \quad \text{Reg.out 7} \leftarrow \text{PC}\]
\[\text{jmpl (address, rd)} \quad \text{is nPC} \leftarrow \text{address}; \quad \$t[rd] \leftarrow \text{PC}\]

5.3 Putting it all together

This code chunk is expanded into our \(\lambda\)-RTL description of the SPARC. Actually, it is expanded into two different descriptions, depending on whether the \([\text{simple}]\) or \([\text{full}]\) alternative is chosen. The \text{Sparc} structure includes not only the code from this chapter but also the \(\langle \text{RTL basics (conditional)} \rangle\) from Chapter 4.

\[\text{structure Sparc is struct} \quad \text{(RTL basics (conditional))}\]
\[\text{(SPARC basics 24)}\]
\[\text{(register windows 28e)}\]
\[\text{(address modes and operands 26d)}\]
\[\text{(window dressing for trap semantics 35d)}\]
\[\text{(SPARC utilities 27d)}\]
\[\text{default attribute of}\]
\[\text{(instruction defaults 27a)}\]
\[\text{(other attributes of instructions 34c)}\]
\[\text{attribute trap of}\]
\[\text{(trap semantics 27c)}\]
\[\text{end}\]

Code written to file \text{sparc.rtl}.

All that is left to do is to define what we mean by \text{trap}. Ideally, we would like a \(\lambda\)-RTL abstraction mechanism that would let us define \text{trap} as “an unspecified function from a 7-bit code to an effect.” Without a type system or a modules system, we have to specify a concrete effect. Rather than try to specify the complete semantics of traps, we model a trap as an assignment to a nonexistent location.

\[\text{storage ’t’ is 1 cell of 7 bits called “bogus trap value”}\]
\[\text{fun trap k is } \$t[0] \leftarrow k\]

These bindings include only the “trap type” of a handful of traps. We have omitted trap priorities.

\[\text{val [data_store_error}\]
\[\text{illegal_instruction}\]
\[\text{mem_address_notAligned}\]
\[\text{tag_overflow}\]
\[\text{]} \quad \text{is [0x2b 0x02 0x07 0x0a] -- 7-bit constants to be passed to \text{trap}}\]
Chapter 6

Describing the Intel Pentium

This chapter shows how we use A-RTL to describe some aspects of the Pentium instruction set that don’t have obvious analogs on the SPARC. Most Pentium instructions come in byte, word (16-bit), and doubleword variants, and these variants treat parts of the Pentium registers as first-class locations. We show how to work with these locations and how to give them their usual names. A single effective address can refer to different parts of a register depending on the opcode with which it is used, so we specify the meanings of effective addresses as triples from which one element is selected depending on context.

By a mixture of opcodes and instruction prefixes, the Pentium offers many variants of each basic operation, so factoring the specification is a concern. We use the logical instructions to illustrate aggressive factoring, different notations for instructions that implement binary operations, and a slightly different treatment of condition codes.

The large-scale structure of the Pentium description resembles that of the SPARC description, except that we add some examples that use infix and.

\[ \text{structure Pentium is struct} \]
\[ \langle \text{RTL basics (conditional)} \rangle \]
\[ \langle \text{basics 37a} \rangle \]
\[ \langle \text{effective-address utilities 39a} \rangle \]
\[ \langle \text{effective addresses 40a} \rangle \]
\[ \text{val} \ [\text{OF CF AF SF ZF PF}] \ is \ [(\text{Reg.OF}) \ (\text{Reg.CF}) \ (\text{Reg.AF}) \ (\text{Reg.SF}) \ (\text{Reg.ZF}) \ (\text{Reg.PF})] \]
\[ \text{utilities 39d} \]
\[ \text{local} \]
\[ \text{infixl 3 and} \]
\[ \text{in} \]
\[ \text{default attribute of} \ \langle \text{instruction defaults using infix and 41b} \rangle \]
\[ \text{end} \]
\[ \text{default attribute of} \ \langle \text{instruction defaults 43a} \rangle \]
\[ \text{end} \]

Code written to file intel.rtl.
6.1 Storage spaces

We begin with the basics—registers and memory. The Intel registers require no special fetch or store methods. The machine uses little-endian byte order.

37b (basics 37a)\(\equiv\) (36) 37b>
structure Reg is struct
  (registers 37c)
end

It is straightforward to define these locations in \(\lambda\)-RTL.

37c (registers 37c)\(\equiv\) (37b) 37d>
locations

  [ EAX ECX EDX EBX ESP EBP ESI EDI ] is \(\text{\$r}[0..7]\)
    -- EAX is \(\text{\$r}[0]\), etc...

  [ AX CX DX BX SP BP SI DI ] is \(\text{\$r}[0..7]\) \& \(0..15\)

  [ AL CL DL BL ] is [EAX ECX EDX EBX] \& [8 bits at 0]
  [ AH CH DH BH ] is [EAX ECX EDX EDX] \& [8 bits at 8]

Dealing with the instruction encoding is more complex. Registers are referred to by number, but as can be seen from Figure 6.1, the number doesn’t uniquely determine the location. In fact, a register number maps to one of three locations, depending on context. These three arrays specify which register, or part thereof, is denoted by which number in which context:

37d (registers 37c)\(\equiv\) (37b) 37c 39a>
val byte is [. AL, CL, DL, BL, AH, CH, DH, BH .]
val word is [. AX, CX, DX, BX, SP, BP, SI, DI .]
val dword is [. EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI .]

To find out which location is denoted by a register number, we use the number to index the array chosen to represent the proper context.
Figure 6.1: Intel Pentium registers
Status and Control Registers

The Pentium has two 32-bit status and control registers, EFLAGS and EIP. EIP is the instruction pointer. EFLAGS is a collection of 1-bit status flags and 1- or 2-bit control and system flags.

\( \text{storage} 'c' \text{ is 2 cells of 32 bits called "control registers"} \)
locations
[ EFLAGS EIP ] is $c[[0..1]]

What are commonly called condition codes are referred to as status flags in Intel documentation. The Pentium has six.

\( \text{locations} \)
[ CF PF AF ZF SF OF ] is EFLAGS @ [1 bit at [0 2 4 6 7 11]]
-- carry parity auxiliary-carry zero sign overflow

Conditional definition.

In the simplified specification, we pretend that flags occupy their own storage space.

\( \text{storage} 'F' \text{ is 6 cells of 1 bit called "abstraction representing flag bits"} \)
locations [ CF PF AF ZF SF OF ] is $F[[0..5]]

Conditional definition.

Most instructions set flags in stylized ways, as shown in Table 3-2 on page 3-14 of the Pentium manual (Intel 1993). In particular, the values of SF, ZF, and PF are determined by simple tests of the result of an operation. We capture this common semantics in the \( \lambda \)-RTL function set_flags. OF, AF, and CF are set based on conditions that are different for different kinds of instructions, so we pass their values to set_flags. We pass a record, not a tuple, to set_flags, so the order in which values are given does not matter.

Defining the other flags within EFLAGS is straightforward, and they are omitted from this example.

Effective addresses

Within an effective address, a register number may denote one of three locations, depending on context. We define functions regb, regw, and regd to be used in these contexts.

\( \text{fun regb n is Reg.byte sub n -- returns an 8-bit location} \)
\( \text{fun regw n is Reg.word sub n -- returns a 16-bit location} \)
\( \text{fun regd n is Reg.dword sub n -- returns a 32-bit location} \)
The context is usually determined by the opcode, not within the effective address itself. We would nevertheless like an effective address to have a denotation (a default attribute) independent of any context. Our solution to this problem is to let the denotation be a record with three elements—one for each context. The \texttt{R} function creates such a record from a register number. We name the elements \texttt{b}, \texttt{w}, and \texttt{d} to stand for the byte, word, and doubleword locations.

\begin{align*}
\text{(effective addresses 40a)} & \equiv \\
\text{fun } & \texttt{R n} \quad \text{is } \{ \texttt{b is regb n}, \texttt{w is regw n}, \texttt{d is regd n} \}
\end{align*}

An address in memory also stands for one of three locations, depending on context. (Because a location includes \texttt{size} as well as address, we get one byte, two bytes, or four bytes, all at the same address.) The \texttt{M} function does for memory what \texttt{R} does for registers: turn a number into a collection of locations, one of which will be used based on context.

\begin{align*}
\text{(effective addresses 40a) + } & \equiv \\
\text{fun } & \texttt{M address} \quad \text{is } \{ \texttt{b is littleEndian #8 $m[address]} \\
& \quad \texttt{, w is littleEndian #16 $m[address]} \\
& \quad \texttt{, d is littleEndian #32 $m[address]} \}
\end{align*}

Finally, in order to make immediate operands look like register or memory operands, we define an \texttt{I} function, which creates a record that provides the \textit{same} immediate value, no matter what the context.

\begin{align*}
\text{(effective addresses 40a) + } & \equiv \\
\text{fun } & \texttt{I immed} \quad \text{is } \{ \texttt{b is immed, w is immed, d is immed} \}
\end{align*}

Effective addresses denote registers or memory locations.

\begin{align*}
\text{(effective addresses 40a) + } & \equiv \\
\text{default attribute of} \\
& \begin{align*}
\text{Indir} \quad (r) & \quad \text{is } \texttt{M (regd r)} \\
\text{Disp8} \quad (d, r) & \quad \text{is } \texttt{M (sx d + regd r)} \\
\text{Disp32} \quad (d, r) & \quad \text{is } \texttt{M (sx d + regd r)} \\
\text{Index} \quad (\text{base, index, ss}) & \quad \text{is } \texttt{M (regd base + regd index << ss)} \\
\text{Index8} \quad (d, \text{base, index, ss}) & \quad \text{is } \texttt{M (regd base + sx d + regd index << ss)} \\
\text{Index32} \quad (d, \text{base, index, ss}) & \quad \text{is } \texttt{M (regd base + sx d + regd index << ss)} \\
\text{ShortIndex} \quad (d, \text{index, ss}) & \quad \text{is } \texttt{M (sx d + regd index << ss)} \\
\text{Abs32} \quad (a) & \quad \text{is } \texttt{M a} \\
\text{Reg} \quad (r) & \quad \text{is } \texttt{R r}
\end{align*}
\end{align*}

We define C notation for 32-bit shifts.

\begin{align*}
\text{(effective-address utilities 39e) + } & \equiv \\
\text{val } & \texttt{[>> << is } (\texttt{n, k}\.\texttt{[shl shl] (32, n, k)} \\
& \quad \texttt{[>> <<]} \texttt{) }
\end{align*}
Instructions

The Pentium architecture provides more opportunities for factoring than does the SPARC. Most instructions come in three size variants, and each variant uses one of the three contexts. These variants would be tedious to specify repetitiously, so we use \( \lambda \)-RTL's higher-order functions and generators to reduce repetition. A very common pattern of execution is the "two-operand instruction," which we can write

\[ l \leftarrow l \oplus r \]

where \( \oplus \) is a generic binary operator, \( l \) is an effective address dependent on context, and \( r \) could be a context-dependent effective address or an immediate value that has been "put in context" by the \( I \) function.

The \( \lambda \)-RTL function \( \text{llr} \) embodies this pattern of computation. It takes two (curried) arguments: the triple \((l, \oplus, r)\), and a size function that determines the context by choosing the \( b \), \( \mathbf{w} \), or \( d \) element from each operand record.\(^1\)

41a \(<\text{utilities 39d}>\)\(+\)

\[
\text{fun llr (left, op, right) is } \text{size left} \leftarrow \text{op(size left, size right)}
\]

\[
\text{val [b w d] is } [(\cdot b) (\cdot w) (\cdot d)]
\]

We also define functions \( b \), \( \mathbf{w} \), and \( d \) to be used as size. The parentheses around \( .b \), \( .\mathbf{w} \), and \( .d \) are required because record-field selection is normally a postfix operation.

Logical instructions

The Pentium has 42 instructions that implement the three logical operations AND, OR, and XOR. (\( \lambda \)-RTL counts different size variants as different instructions.) All these instructions have side effects on the status flags (condition codes) as well as the \( l \leftarrow l \oplus r \) effect. This section shows progressively more aggressive ways to use \( \lambda \)-RTL to describe such large groups of similar instructions.

In \( \lambda \)-RTL, one can declare any identifier to be an infix operator. Infix operators can be convenient when specifying single instructions, as shown in the specification of these three variants of AND, which operate on the A register.

41b \(<\text{instruction defaults using infix and 41b}>\)\(=\)

\[
\begin{align*}
\text{ANDiAL} (i8) & \quad \text{is } \text{Reg.AL} \leftarrow \text{Reg.AL and i8} \\
\text{ANDiAX} (i16) & \quad \text{is } \text{Reg.AX} \leftarrow \text{Reg.AX and i16} \\
\text{ANDiEAX} (i32) & \quad \text{is } \text{Reg.EAX} \leftarrow \text{Reg.EAX and i32}
\end{align*}
\]

Infix notation is less convenient when describing groups of instructions. Here we use the \( \text{llr} \) function defined above to define some variants that work on two or three sizes of values. The parentheses around \( \text{and} \) convert it from an infix operator to an ordinary value that can be passed as a parameter to \( \text{llr} \).

41c \(<\text{instruction defaults using infix and 41b}>\)\(+\)

\[
\begin{align*}
\text{ANDi } [b w d] (\text{addr}, i) & \quad \text{is } \text{llr (addr'), (and), I i) [b w d]} \\
\text{ANDi } [w d] (\text{addr}, i) & \quad \text{is } \text{llr (addr', (and), I 1 (sx i)) [w d]} \\
\text{ANDMr } [b w d] (\text{addr}, \mathbf{r}) & \quad \text{is } \text{llr (addr', (and), R reg) [b w d]} \\
\text{ANDMr } [b w d] (\text{addr}, \mathbf{r}) & \quad \text{is } \text{llr (R reg, (and), addr') [b w d]}
\end{align*}
\]

\(^1\)This function won't work quite the way we want under a Hindley-Milner type system, because some uses require the \text{size} parameter to be polymorphic. The problem can be corrected by making \text{size}, \( b \), \( \mathbf{w} \), and \( d \) pairs, and by applying one element to \text{left} and another to \text{right}.

41
The I and R functions create three-element records that are selected from by the size functions also passed to \texttt{l1r}. Groups in square brackets are used to define multiple variants at once. Each variant uses either b, w, or d as a size function.

We would like the parameter \texttt{addr} also to represent a three-element record, since that is the type of the default attribute of an effective address, as defined in Section 6.1. Unfortunately, the current implementation of λ-RTL has no type information, and it cannot infer that \texttt{addr} must represent a record containing locations. We therefore use the bogus value \texttt{addr'} in place of the address parameter \texttt{addr}. When λ-RTL is improved, these definitions will go away, and \texttt{addr'} will be replaced with \texttt{addr}.

\begin{verbatim}
42a ⟨utilities 39d⟩+≡
   rtlop stands_for_addr
   val addr’ is M stands_for_addr

   The l1r function helps factor the specification, but the ordinary function-application syntax is not very evocative of what is going on. Here, we use the infix operators :=, <, and \textgreater{} with non-standard meanings, creating a sort of "mixfix" notation that is more suggestive of the semantics:

   \[
   dst := l \lhd r.\]

   We use the notation first, then define it.

42b ⟨instruction defaults using infix and 41b⟩+≡
   ANDiAL (i8) is Reg.AL <- Reg.AL and i8
   ANDiAX (i16) is Reg.AX <- Reg.AX and i16
   ANDiEAX (i32) is Reg.EAX <- Reg.EAX and i32
   ANDi “[b w d](addr, i) is bin (addr’ := addr’ <(and)> I i) [b w d]
   ANDio “[w d]”b (addr, i) is bin (addr’ := addr’ <(and)> I (sx i)) [ w d]
   ANDmr “[b ow od] (addr, reg) is bin (addr’ := addr’ <(and)> R reg) [b w d]
   ANDmr “[b ow od] (addr, reg) is bin (R reg := R reg <(and)> addr’) [b w d]

   We implement the notation by defining :=, <, and \textgreater{} to build up records, then defining \texttt{bin} to create the effect describing the assignment. Since < and \textgreater{} have existing definitions from Chapter 4, we save those definitions in 1t and gt.

42c ⟨utilities 39d⟩+≡
   nonfix [ < ] -- save these for later
   fun < (l, operator) is \{ l is l, operator is operator \}
   fun > (l, operator), r) is \{ l is l, operator is operator, r is r \}
   fun := (dst, {l, operator, r}) is \{ dst is dst, l is l, operator is operator, r is r \}
   infix ^"10 <
   infix ^"11 >
   infix ^"12 :=

   fun bin {dst, l, operator, r} is \size . size dst <- operator (size l, size r)
\end{verbatim}
It turns out that OR and XOR have the same structure as AND, so we factor the description to include those instructions as well. Once we do that, there’s no point in making the operators infix.

\(\langle\text{instruction defaults}\rangle \equiv\)

\[
\begin{align*}
\text{AND OR XOR}^i & \text{AL} \quad \text{i8} \quad \text{is Reg.AL} \leftarrow \text{[and or xor]} \quad \text{(Reg.AL, i8)} \\
\text{AND OR XOR}^i & \text{AX} \quad \text{i16} \quad \text{is Reg.AX} \leftarrow \text{[and or xor]} \quad \text{(Reg.AX, i16)} \\
\text{AND OR XOR}^i & \text{EAX} \quad \text{i32} \quad \text{is Reg.EAX} \leftarrow \text{[and or xor]} \quad \text{(Reg.EAX, i32)} \\
\text{AND OR XOR}^i & [\text{b w d}] \quad \text{(addr, i)} \quad \text{is} \\
\text{bin (addr} & \leftarrow \text{addr} \leftarrow \text{[and or xor]} \leftarrow \text{I i}) \quad \text{[b w d]} \\
\text{AND OR XOR}^i & \text{[ow od]} \quad \text{b (addr, i)} \quad \text{is} \\
\text{bin (addr} & \leftarrow \text{addr} \leftarrow \text{[and or xor]} \leftarrow \text{I (sx i)}) \quad \text{[w d]} \\
\text{AND OR XOR}^i & \text{mr} \quad \text{[b ow od]} \quad \text{(addr, reg)} \quad \text{is} \\
\text{bin (addr} & \leftarrow \text{addr} \leftarrow \text{[and or xor]} \leftarrow \text{R reg}) \quad \text{[b w d]} \\
\text{AND OR XOR}^i & \text{mm} \quad \text{[b ow od]} \quad \text{(addr, reg)} \quad \text{is} \\
\text{bin (R reg} & \leftarrow \text{R reg} \leftarrow \text{[and or xor]} \leftarrow \text{addr'}) \quad \text{[b w d]}
\end{align*}
\]

Now that we have machinery in place, a minor extension will enable us also to specify the effects of these instructions on the status flags (condition codes). Section 4.4.1 of the Pentium manual says “The AND, OR, and XOR instructions clear the OF and CF flags, leave the AF flag undefined, and update the SF, ZF, and PF flags.” The function logical_flags implements this behavior.

\(\langle\text{utilities 39d}\rangle \equiv\)

\[
\begin{align*}
\text{fun logical_flags n is set_flags \{result is n, o is 0, c is 0, a is ?\}}
\end{align*}
\]

In our description of the SPARC, we defined a “main effect” function like bin and passed it a function that sets the condition code. Here, we use a slightly different approach—we define bin’ to return both the main effect and the result of the instruction, and we pass bin’ and its arguments to a function that sets the condition code.

\(\langle\text{utilities 39d}\rangle \equiv\)

\[
\begin{align*}
\text{fun bin' \{dst, l, operator, r\} is} \\
\text{\size.} \\
\text{let val result is operator (\size l, \size r)} \\
\text{in \{result is result, effect is \size dst \longleftarrow \size result\}} \\
\text{end}
\end{align*}
\]

logical combines the “main effect” and the effects on the flags.

\(\langle\text{utilities 39d}\rangle \equiv\)

\[
\begin{align*}
\text{val logical is main.\arg.\size.} \\
\text{let val \{result, effect\} is main \arg \size} \\
\text{in \effect; logical_flags result} \\
\text{end}
\end{align*}
\]
Here’s the final definition of 42 Pentium instructions, including their effects on the flags. We have re-cast the first three groups so we can apply logical. Because Reg.AL represents a single location, not a triple, we use \(x.x\) (the identity function) as a “size selector.” The same trick applies to Reg.AX and Reg.EAX.

\[
\begin{align*}
\text{44a} & \quad \langle \text{instruction defauls 43a} \rangle \vdash \exists \\
& \quad \begin{cases}
\text{[AND OR XOR]}^{-i}AL \ (i8) & \text{is logical bin' (Reg.AL := Reg.AL \langle [\text{and or xor}] \rangle i8) \ (x.x)} \\
\text{[AND OR XOR]}^{-i}AX \ (i16) & \text{is logical bin' (Reg.AX := Reg.AX \langle [\text{and or xor}] \rangle i16) \ (x.x)} \\
\text{[AND OR XOR]}^{-i}EAX \ (i32) & \text{is logical bin' (Reg.EAX := Reg.EAX \langle [\text{and or xor}] \rangle i32) \ (x.x)} \\
\text{[AND OR XOR]}^{-i}[b w d] (\text{addr, i}) & \text{is logical bin' (\text{addr'} := \text{addr' \langle [\text{and or xor}] \rangle i (sx i)}) \ [\ b \ w \ d]}
\end{cases}
\end{align*}
\]

The remaining logical instruction, NOT, comes in only three variants, and it does not affect the condition codes.

\[
\begin{align*}
\text{44b} & \quad \langle \text{instruction defauls 43a} \rangle \vdash \exists \\
& \quad \text{NOT}^{-[b \ w \ d]} (\text{addr}) \text{ is unary (com, addr')} \ [\ b \ w \ d]
\end{align*}
\]

\[
\begin{align*}
\text{44c} & \quad \langle \text{utilities 39d} \rangle \vdash \exists \\
& \quad \text{fun unary (op, v) is \(\text{\textbackslash size } . \text{size } v \leftarrow \text{op } (\text{size } v)\)}
\end{align*}
\]
Chapter 7

Index

The indices are split into two pieces, one for the SPARC and one for the Intel. We’ll combine them one day.

7.1 List of chunks

(for every i that is an input, print i and a blank 18b)
(IEEE 754 floating point 25a)
(RTL basics 19a)
(RTL basics [full] 20b)
(RTL basics [simple] 20a)
(summarize the problem 18a)
(address modes and operands 26d)
(alias functions for in, out, local, and global registers 29a)
(incorrect instruction specifications 29a)
(instruction defaults 27a)
(other attributes of instructions 34c)
(register windows 28c)
(RTL basics (conditional))
(SPARC basics 24)
(SPARC basics [full] 25b)
(SPARC basics [simple] 25a)
(SPARC utilities 27d)
(SPARC utilities [full] 32a)
(SPARC utilities [simple] 31f)
(sparc.rtl 35c)
(trap semantics 27c)
(window dressing for trap semantics 35d)
(basics 37a)
(effective addresses 40a)
(effective-address utilities 39e)
(instruction defaults 43a)
7.2 Identifier index

+: 21a, 21b, 26d, 29a, 29b, 29d, 30a, 30b, 30c, 30d, 30e, 40d
-: 18c, 21a, 21b, 22a, 23a, 26c, 28e, 30d, 35e, 39b, 39e
->: 19a, 25a, 25b, 27c, 27d, 28b, 30b, 30d, 34b
/: 23a, 23b, 26a
*/: 23a
/-: 23a
<=: 18b, 20g, 21b, 31a, 39d, 42b, 42c, 43a, 44a
<--: 19a, 25a, 25b, 27a, 27b, 28a, 28c, 29b, 29d, 30a, 30b, 30c, 30d, 30e, 31a, 31c, 31d, 32b, 32d, 34b, 35a, 35d, 39d, 41a, 41b, 42b, 42c, 43a, 43c, 44c
<=: 20g, 21b
<: 20d, 20f, 25a, 25b, 27c, 27d, 28b, 30b, 30d, 32a, 34c
>: 20g, 21b, 42b, 42c, 43a, 44a
>=: 20g, 21b
?: 22c, 33a, 43b
add: 20g, 21a, 32a, 32b, 32c
and: 21d, 23c, 30f, 31e, 33c, 36, 41b, 41c, 42b, 43a, 44a
andalso: 20a, 20b, 20c, 32a, 34c
andthen: 19b, 30a
bigEndian: 22b, 24, 25c
bit: 20e, 20f, 22a, 26b, 31a, 32b, 35e, 39b, 39c, 39d, 39e
bitExtract: 21d
bitInsert: 21d, 32d
bitTransfer: 21e, 22a
bool: 19a, 20e, 20f, 34a
com: 21d, 30f, 34a, 44b
divs: 20g, 21b
divu: 20g, 21b
do nothing: 19a, 31a
down: 23e
fabs: 23a
f2i: 23b
fsqrt: 23a
IEEE754: 22d
i2f: 23b
littleEndian: 22b, 37a, 40b
minf: 23c
mods: 20g, 21b
modu: 20g, 21b, 27d
muls: 20g, 33b
mulu: 20g, 33b
mzero: 23c
NaN: 23d
nearest: 23e
neg: 20g, 34b, 34c
not: 19c, 34a, 34c
or: 21d, 30f, 31e, 34a, 43a, 44a
orelse: 20a, 20b, 20c
pinf: 23c
pzero: 23c
quotes: 20a, 21b
rems: 20g, 21b
Round: 23e
shl: 21f, 22a, 40e
shra: 21f, 22a
shr1: 21f, 22a, 40e
subtract: 20g, 21a
sx: 21c, 22a, 26d, 26e, 27a, 40d, 41c, 42b, 43a, 44a
up: 23e
xor: 21d, 30f, 31e, 34a, 43a, 44a
zero: 23c, 23e, 39b
zx: 21c, 27a, 28c, 32d
zx: 21c, 27a, 28c, 32d
+: 21a, 21b, 26d, 29a, 29b, 29d, 30a, 30b, 30c, 30d, 30e, 40d
-: 18c, 21a, 21b, 22a, 23a, 26c, 28e, 30d, 35e, 39b, 39e
=: 42b, 42c, 43a, 44a
<: 18b, 20g, 21b, 31a, 39d, 42b, 42c, 43a, 44a
<=: 19a, 25a, 25b, 27a, 27b, 28a, 28c, 29b, 29d, 30a, 30b, 30c, 30d, 30e, 31a, 31c, 31d, 32b, 32d, 34b, 35a, 35d, 39d, 41a, 41b, 42b, 42c, 43a, 43c, 44c
>: 20g, 21b, 42b, 42c, 43a, 44a
>>: 40e
?: 22c, 33a, 43b
addr': 41c, 42a, 42b, 43a, 44a, 44b
AF: 36, 39b, 39c, 39d
and: 21d, 23c, 30f, 31e, 33c, 36, 41b, 41c, 42b, 43a, 44a
b: 40a, 40b, 40c, 41a, 41c, 42b, 43a, 44a, 44b
bin: 42b, 42c, 43a
bin': 43c, 44a
bit: 20e, 20f, 22a, 26b, 31a, 32b, 35e, 39b, 39c, 39d, 39e
byte: 37d, 39e
CF: 36, 39b, 39c, 39d
com: 21d, 30f, 34a, 44b
d: 40a, 40b, 40c, 40d, 41a, 41c, 42b, 43a, 44a, 44b
dword: 37d, 39e
gt: 42c
I: 40c, 41c, 42b, 43a, 44a
littleEndian: 22b, 37a, 40b
lr: 41a, 41c
logical: 43d, 44a
logical_flags: 43b, 43d
lt: 42c
M: 40b, 40d, 42a
OF: 36, 39b, 39c, 39d
or: 21d, 30f, 31e, 34a, 43a, 44a
parity: 39b, 39d
Pentium: 36
PF: 36, 39b, 39c, 39d
R: 40a, 40d, 41c, 42b, 43a, 44a
Reg: 25a, 25b, 25c, 29b, 30b, 30d, 35a, 36, 37b, 39e, 40d, 41b, 42b, 43a, 44a
regb: 39e, 40a
regd: 39e, 40a, 40d
regw: 39e, 40a
set_flags: 39d, 43b
SF: 36, 39b, 39c, 39d
shl: 21f, 22a, 40e
shr: 21f, 22a, 40e
stands_for_addr: 42a
dx: 21c, 22a, 26d, 26e, 27a, 40d, 41c, 42b, 43a, 44a
unary: 44b, 44c
w: 40a, 40b, 40c, 41a, 41c, 42b, 43a, 44a, 44b
word: 37d, 39e
xor: 21d, 30f, 31e, 34a, 43a, 44a
zero: 23c, 23e, 39b
ZF: 36, 39b, 39c, 39d
References

A formal model and specification language for procedure calling conventions.
In Conference Record of the 22nd Annual ACM Symposium on Principles of Programming Languages,
pages 298–310, San Francisco, CA.

———. 1996 (May).
Target-sensitive construction of diagnostic programs for procedure calling sequence generators.
Proceedings of the ACM SIGPLAN ’96 Conference on Programming Language Design and Implementation,

Bala, Vasanth and Norman Rubin. 1995 (November 29–December 1).
Efficient instruction scheduling using finite state automata.
In Proceedings of the 28th Annual International Symposium on Microarchitecture,
pages 46–56, Ann Arbor, Michigan.

A portable global optimizer and linker.
Proceedings of the ACM SIGPLAN ’88 Conference on Programming Language Design and Implementation,

Davidson, Jack W. and Christopher W. Fraser. 1980 (April).
The design and application of a retargetable peephole optimizer.

BEG — a generator for efficient back ends.
Proceedings of the ACM SIGPLAN ’89 Conference on Programming Language Design and Implementation,

Fauth, Andreas, Johan Van Praet, and Markus Freericks. 1995 (March).
Describing instruction set processors using nML.
In The European Design and Test Conference, pages 503–507.

Fernández, Mary F. 1995 (November).
A Retargetable Optimizing Linker.

BURG—fast optimal instruction selection and tree parsing.
Griswold, Ralph E. 1982 (October).
   The evaluation of expressions in Icon.

   Mount Prospect, IL.

   EEL: machine-independent executable editing.

Larus, James R. 1990 (September).
   Technical Report 966, Computer Sciences Department, University of Wisconsin, Madison, WI.

   VHDL: Hardware Description and Design. 12 edition.

   The Definition of Standard ML.
   Cambridge, Massachusetts: MIT Press.

Milner, Robin. 1978 (December).
   A theory of type polymorphism in programming.

Morrisett, Greg. 1995 (December).
   Compiling with Types.
   Published as technical report CMU-CS–95–226.

   Detecting pipeline structural hazards quickly.
   In Conference Record of the 21st Annual ACM Symposium on Principles of Programming Languages, pages 280–286, Portland, OR.

Ramsey, Norman and Mary F. Fernández. 1997 (May).
   Specifying representations of machine instructions.
   ACM Transactions on Programming Languages and Systems, 19(3):492–524.

Ramsey, Norman and David R. Hanson. 1992 (July).
   A retableable debugger.

Ramsey, Norman. 1994 (September).
   Literate programming simplified.
——. 1996 (April).
A simple solver for linear equations containing nonlinear operators.


Srivastava, Amitabh and Alan Eustace. 1994 (June).
ATOM: A system for building customized program analysis tools.

Using and Porting GNU CC (Version 2.0).
Free Software Foundation.

Thomas, Donald and Philip Moorby. 1995.
The Verilog Hardware Description Language. 2nd edition.

Thompson, T. 1996 (February).
An Alpha in PC clothing.
Byte, pages 195–196.

The zephyr abstract syntax description language.
In Proceedings of the 2nd USENIX Conference on Domain-Specific Languages, pages ??–??, Santa Barbara, CA.