

## USING THE R3081<sup>™</sup> IN R3051<sup>™</sup>-BASED SYSTEMS

#### By Peter McDonald

## INTRODUCTION

The IDT79R3081<sup>™</sup> RISController<sup>™</sup> is the newest member of IDT's family of high-performance, price-competitive 32-bit microprocessors. Designed to bring the high-performance MIPS<sup>®</sup> RISC architecture to low-cost, integration-sensitive systems, this processor adds to the growing family of RISControllers from IDT. The R3081 RISController is superset pin-compatible with the R3051/52, and includes 20kB of cache, a Floating-Point Accelerator, Hardware Cache Coherency support, and a series of system integration and interface features.

With its larger caches, FPA, and interface features, the R3081 can dramatically increase the performance of an existing R3051 design without adding design complexity. Often, upgrading is as simple as placing an R3081 in the R3051 socket. This application note describes common considerations when upgrading existing R3051 systems to the R3081 and, as an example, shows how to upgrade the 7RS385 evaluation board from an R3051 processor to an R3081 processor.

## **NEW FEATURES BROUGHT BY THE R3081**

The R3081 is superset pin-compatible with the R3051. That is, in general it is possible to remove an R3051 from a system and replace it with an R3081. The system should run without any hardware or software changes. However, the R3081 adds additional capabilities to the R3051 family; some systems may wish to take explicit steps to take advantage of these new capabilities.

Before discussing the system changes needed to use the superset features of the R3081, these capabilities need to be defined. As mentioned above, the R3081 includes larger Instruction and Data Caches, a Floating-Point Accelerator, Hardware Cache Coherency support, and a series of integrated control options. Each hardware option is either selected by the mode initialization vectors (values sampled on the interrupt input lines during reset) or programmed through the new CP0 Configuration register. Below is a summary of the new R3081 features. A more detailed description of these features, along with a list of the differences between the R3051 and R3081, is included in the IDT79R3081/3081E Integrated RISController Hardware User's Manual.

• Larger Instruction and Data Caches

The R3081 instruction and data caches total 20kB. The default (reset) configuration is a 16kB instruction cache and a 4kB data cache, although the caches can be dynamically reprogrammed to 8kB apiece. Both caches are parity protected over the data and tag fields. The R3081 thus differs from the R3051 and R3052 in three ways: its caches are larger, their organization is configurable, and they are parity protected.

• Addition of a Floating-Point Accelerator

A full-featured R3010A-compatible floating-point accelerator is incorporated on the R3081, adding single- and double-precision add, multiply, and divide instructions to the instruction set. Software can program which of the six integer unit interrupt inputs carries the floating-point interrupt signal; Int3 is the default. Thus, one of the six interrupt inputs of the R3051 is used for the floating-point interrupt, and coprocessor 1 instructions are executed directly by the on-chip floating-point unit.

• Cache Coherency Interface

The R3081 has a hardware-based cache coherency interface for multi-master systems. If selected, DMA cycles between memory and I/O can invalidate lines within the R3081 cache, ensuring that there is no stale data and avoiding software-directed cache flushing. This mechanism can be disabled to achieve full R3051 compatibility; alternatively, the system designer can choose to increase the performance of multi-master systems by enabling hardware cache coherency.

• Power Reduction Mode

The R3081 RISController can be dynamically programmed to reduce its operation frequency. In this mode the execution clock, and therefore the output clock, is internally divided by 16. This function allows the power reduction benefits of a lower speed clock to be achieved during idle periods, without requiring external clock shaping logic.

• Programmable Halt Mode

This programmable mode forces the R3081 RISController to stall until either an interrupt or reset is issued. This mode has two effects: it further reduces power consumption; and, it allows software to halt until some external event occurs.

• Half-Frequency Bus Mode

A selectable mode allows the R3081 bus interface to operate at one-half the frequency of the processor core. For example, the execution core can run at 33MHz, and the bus interface at 16MHz. Given the substantial amount of cache on-chip, the slow system interface will not dramatically degrade performance. The end result is a high-performance system with very low system cost.

• 1x or 2x Clock Input

The R3081 can operate with either an R3051-compatible double-frequency clock input (2x clock mode), or from a clock at the execution rate (1x clock mode). This capability both simplifies EMI at high frequency, and also allows for "clock doubling" when used in conjunction with the one-half frequency bus mode.

The IDT logo is a registered trademark and IDT79R3051, IDT79R3081, IDT/c, IDT/sim, IDT/kit and RISController are trademarks of Integrated Device Technology, Inc. All others are trademarks of their respective companies.

• Slow Bus Turnaround

A common problem for a high-speed I/O bus is the amount of time available for mastership changes. The R3081 allows software to specify a larger minimum time for the transition from the memory driving the bus (e.g. read data) to the processor driving the bus (e.g. writes). This reduces the speed requirement of data transceivers, with minimal performance impact.

• Dynamically programmed data cache refill

The R3081 allows software to dynamically select between single-word and quad-word refill on a data cache miss. This allows for additional performance tuning, by enabling the kernel to select the best algorithm for a given section of code. The default refill size is selected at reset time, as it is for the R3051.

## **POSSIBLE CHANGES**

The R3081 hardware options are either mode selectable at reset or programmed through an internal register. Hardware cache coherency support and all clocking modes (half-frequency bus mode and 1x or 2x clock input mode) are selected at reset based on the levels of Int[5:3]. In the R3051, Int[5:3] are required to be driven HIGH during reset initialization.

The interrupt inputs SInt[2:0] are already used by both the R3051 and the R3081 to select the data cache refill size, tri-state test mode, and big- or little-endian system architecture. The complete set of R3081 reset mode vectors is listed in Table 1.

A complete description of these modes is provided in the IDT79R3081/3081E Integrated RISController Hardware User's Manual.
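As a rough illustration of Table 1, the C sketch below decodes the three R3081-specific mode pins from the levels sampled at reset. It assumes the active-LOW sense implied by this note (an R3051-compatible system holds Int[5:3] HIGH, so a pin sampled LOW enables its mode). The type and function names are hypothetical, and the SInt[2:0] vectors are left out because their polarity is not stated here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the R3081-specific reset mode vectors (Table 1). */
typedef struct {
    bool coherent_dma;   /* Int5: CoherentDMAEn      */
    bool clock_1x;       /* Int4: 1xClockEn          */
    bool half_freq_bus;  /* Int3: Half-frequency Bus */
} r3081_reset_modes;

/*
 * int_levels holds the levels sampled on Int[5:0] during reset, one bit per
 * pin (bit n = Int<n>, 1 = HIGH).
 * ASSUMPTION: a mode is enabled when its pin is sampled LOW, because an
 * R3051-compatible system drives Int[5:3] HIGH during reset.
 */
r3081_reset_modes decode_reset_modes(uint8_t int_levels)
{
    r3081_reset_modes m;
    m.coherent_dma  = ((int_levels >> 5) & 1u) == 0u;
    m.clock_1x      = ((int_levels >> 4) & 1u) == 0u;
    m.half_freq_bus = ((int_levels >> 3) & 1u) == 0u;
    return m;
}
```

With all three pins HIGH (`0x38`), every new mode reads back as disabled, which corresponds to the drop-in R3051 replacement case.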

#### **Floating-Point Interrupt**

The one area where hardware changes may be necessary is the Floating-Point Accelerator. In the MIPS RISC architecture, the floating-point interrupt is fed into a general-purpose interrupt. Interrupts cause the processor to jump to the system's exception handler, which then decodes the processor status to determine the exception cause. One of the six external R3081 interrupts (by default Int3) is programmed to be the FPA interrupt. All activity on the external interrupt pin corresponding to the FPA interrupt is ignored.

Although software can select an interrupt input other than the default, only five external interrupt pins remain available to external peripherals in either case. Therefore, systems that required six external interrupts will need to modify their external interrupt structure, perhaps by having multiple peripherals share a single interrupt input. Software would then need to determine which device on that interrupt actually signalled the exception.
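The decode step performed by the exception handler can be sketched in C as follows. The bit positions come from the standard MIPS R3000 Cause and Status register layout (interrupt pending and mask bits in bits 15:8, with the two software interrupts in the lowest two positions), not from this note; the function names and the `fp_int` parameter are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CAUSE_IP_SHIFT 8u  /* Cause.IP[7:0] and Status.IM[7:0] occupy bits 15:8 */
#define HW_INT_IP_BASE 2u  /* IP[1:0] are software interrupts; Int0 is IP[2]    */

/* Bit in Cause/Status that corresponds to hardware interrupt Int<n>, n = 0..5. */
uint32_t hw_int_bit(unsigned n)
{
    return 1u << (CAUSE_IP_SHIFT + HW_INT_IP_BASE + n);
}

/*
 * True when Int<fp_int> is both pending in Cause and unmasked in Status.
 * fp_int is whichever interrupt the system has assigned to the FPA
 * (3 by default on the R3081).
 */
bool fpa_interrupt_pending(uint32_t cause, uint32_t status, unsigned fp_int)
{
    return (cause & status & hw_int_bit(fp_int)) != 0u;
}
```

Passing the interrupt number as a parameter, rather than hard-coding Int3, reflects the flexibility described above at the cost of a little extra decode work.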

Systems that have defined an interrupt other than Int3 for the FPA need to modify their startup code so as not to ignore the assertion of Int3.

#### Table 1. R3081 Mode Selectable Features

| Interrupt Pin | Mode Feature       |
|---------------|--------------------|
| Int5          | CoherentDMAEn      |
| Int4          | 1xClockEn          |
| Int3          | Half-frequency Bus |
| SInt2         | DBlockRefill       |
| SInt1         | Tri-State          |
| SInt0         | BigEndian          |

Some software applications incorporate exception handlers that allow the user to set the FPA interrupt through software. The IDT/sim<sup>™</sup> diagnostics use this method. This adds system flexibility at the cost of the extra time required to decode the interrupt.

#### The Config Register

The interrupt used by the on-chip FPA, the cache configuration, power reduction mode, data cache refill size, halt/stall mode, and slow bus turnaround are all selected by writing to the new CP0 Configuration register. The Configuration register data format is shown in Figure 1.

The reset initialization value of the config register depends somewhat on the mode vectors selected at reset. Specifically, the initial values of the Data Block Refill bit, and of the slow bus turnaround bit, are dependent on the reset vectors. At reset, the FPInt field will correspond to Int3, and the Lock, Alt. Cache, Halt, and RF bits will be cleared.

All CP0 registers are read and written with the coprocessor move instructions (`mfc0` and `mtc0`). The Configuration register is CP0 register 3. An interactive tool to read and write the R3081 Configuration register, "the R3081 Configuration Tool", is available as a demo tool through your local sales office, and runs on IDT/sim-based platforms. To ensure strict software compatibility with older applications, the Config register can be protected from subsequent writes by writing a '1' to its "Lock" field.
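The following C sketch shows one way the Config register fields of Figure 1 might be manipulated. The macro and function names are hypothetical. The FPInt encoding used here (the field holds the interrupt number directly) is an assumption and should be checked against the hardware user's manual. The functions only compute the new register value; the actual CP0 access is target-specific assembly and is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions taken from Figure 1. */
#define CFG_LOCK        (1u << 31)
#define CFG_SLOW_BUS    (1u << 30)
#define CFG_DB_REFILL   (1u << 29)
#define CFG_FPINT_SHIFT 26u
#define CFG_FPINT_MASK  (7u << CFG_FPINT_SHIFT)
#define CFG_HALT        (1u << 25)
#define CFG_RF          (1u << 24)
#define CFG_AC          (1u << 23)
#define CFG_RESERVED    0x007FFFFFu  /* bits 22:0, must be written as 0 */

/*
 * Return a Config value that routes the FPA interrupt to Int<n>.
 * ASSUMPTION: the FPInt field holds the interrupt number n directly.
 */
uint32_t cfg_set_fpint(uint32_t cfg, unsigned n)
{
    assert(n <= 5u);                          /* only Int0..Int5 exist */
    cfg &= ~(CFG_FPINT_MASK | CFG_RESERVED);  /* clear field, zero reserved bits */
    return cfg | ((uint32_t)n << CFG_FPINT_SHIFT);
}

/* Return a Config value with Lock set, so that later writes are ignored. */
uint32_t cfg_lock(uint32_t cfg)
{
    return (cfg & ~CFG_RESERVED) | CFG_LOCK;
}
```

On the target, boot code would read the current value from CP0 register 3, pass it through these helpers, and write the result back. Setting Lock last means any other field changes are made before the register stops accepting writes.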

#### Software Compatibility

The R3081 will directly execute applications written for the R3051. The larger on-chip caches will directly benefit existing applications, and thus bring an increase in system performance. Additional gains are possible, depending on the application code, by taking advantage of the hardware FPA on the R3081. Whereas the R3051 must either trap and emulate floating-point instructions, or perform explicit calls to software floating-point libraries, the R3081 can directly execute these operations.

It may be advantageous to generate two distinct binaries from one source: one that uses software libraries to emulate floating-point operations, for use with the R3051 or R3052, and another that uses the on-chip FPA to perform floating point. However, if the prospect of two distinct binaries is too onerous for a particular application, the binary could include FPA instructions; with an R3051 processor, a trap will be generated, and software could emulate the operation. Although a single binary then suffices for both processors, the cost is reduced performance on the R3051.

| Bits  | Field     | Meaning                                               |
|-------|-----------|-------------------------------------------------------|
| 31    | Lock      | 1 -> Ignore subsequent writes to this register        |
| 30    | Slow Bus  | 1 -> Extra time for bus turnaround                    |
| 29    | DB Refill | 1 -> 4 word refill                                    |
| 28:26 | FPInt     | Power of two encoding of FPInt <-> CPU Interrupt      |
| 25    | Halt      | 1 -> Stall CPU until reset or interrupt               |
| 24    | RF        | 1 -> Divide frequency by 16                           |
| 23    | AC        | 1 -> 8kB per cache configuration                      |
| 22:0  | Reserved  | Must be written as 0; returns 0 when read             |

Figure 1. CP0 Configuration Register Data Format

Software can dynamically determine whether an FPA is available by performing simple FPA diagnostics. Such diagnostics are included in the IDT/sim, IDT/c<sup>™</sup>, and IDT/kit<sup>™</sup> startup code. Thus, the boot software could check for the presence of an FPA and initialize the Coprocessor One usable bit according to the result. This allows a single binary to determine at run time whether a hardware FPA is available, and to enable the FPA instruction trap mechanism when running on the R3051 or R3052.
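A minimal C sketch of the last step, setting the Coprocessor One usable bit from the probe result, is given below. The FPA probe itself is not shown and its result is simply passed in. The bit position (CU1 at bit 29 of the Status register) is taken from the standard MIPS R3000 Status layout rather than from this note, and the names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SR_CU1 (1u << 29)  /* Status register: Coprocessor 1 usable */

/*
 * Return the Status value boot code would install after probing for an FPA.
 * With CU1 clear, coprocessor 1 instructions trap, which lets software
 * emulate them on a processor that has no FPA.
 */
uint32_t status_for_fpa(uint32_t status, bool fpa_present)
{
    return fpa_present ? (status | SR_CU1) : (status & ~SR_CU1);
}
```

All other Status bits pass through unchanged, so the helper can be applied to whatever value the startup code has already built.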

#### **Manipulating the Cache Characteristics**

Further performance gains may be possible by dynamically manipulating the cache characteristics of the R3081. The Config register allows the cache configuration to be dynamically changed from 16kB I-Cache and 4kB D-Cache to 8kB I-Cache and 8kB D-Cache. A kernel may choose to change the cache organization depending on the nature of the task about to be executed. The only caveat is that when changing the cache configuration (from 16kB/4kB to 8kB/8kB or vice versa), both the instruction and data caches need to be flushed.

In addition, software could dynamically alter the D-Cache refill size. Changing this bit does not require a cache flush.
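The flush rule can be captured in a small check, sketched here in C. The bit positions follow Figure 1; the function name is hypothetical, and the flush routines themselves are system-specific and not shown.

```c
#include <stdbool.h>
#include <stdint.h>

#define CFG_DB_REFILL (1u << 29)  /* data cache refill size (Figure 1) */
#define CFG_AC        (1u << 23)  /* 8kB per cache configuration (Figure 1) */

/*
 * True when moving from old_cfg to new_cfg switches the cache organization,
 * in which case both caches must be flushed. A change to the DB Refill bit
 * alone does not require a flush.
 */
bool cfg_change_needs_flush(uint32_t old_cfg, uint32_t new_cfg)
{
    return ((old_cfg ^ new_cfg) & CFG_AC) != 0u;
}
```

A kernel could call such a check before each Config write and skip the flush when only the refill size changes.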

Note that to ensure compatibility among multiple generations of R3051 family members, cache flushing routines that assume a constant cache size are discouraged. The R3081 Hardware User's Manual presents an algorithm by which software can determine the cache size available.
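To illustrate the idea of run-time cache sizing, the C sketch below uses a common wraparound probe on a host-side model of a direct-mapped cache. It is not necessarily the algorithm given in the hardware user's manual: the `sim_cache` model, the size limits, and all names are assumptions made for illustration. Real code would isolate the cache and use target-specific loads and stores instead of the model's accessors.

```c
#include <stddef.h>
#include <stdint.h>

#define MIN_CACHE_BYTES 512u
#define MAX_CACHE_BYTES (256u * 1024u)
#define MARKER          0xA5A5A5A5u

/* Host-side stand-in for an isolated, direct-mapped cache (hypothetical). */
typedef struct {
    uint32_t words[MAX_CACHE_BYTES / 4u];
    size_t   size_bytes;  /* real size, a power of two; unknown to the probe */
} sim_cache;

/* Addresses wrap at the real cache size, as they would in an isolated cache. */
static void cache_write(sim_cache *c, size_t addr, uint32_t v)
{
    c->words[(addr % c->size_bytes) / 4u] = v;
}

static uint32_t cache_read(const sim_cache *c, size_t addr)
{
    return c->words[(addr % c->size_bytes) / 4u];
}

/* Return the cache size in bytes, or 0 if no wraparound was detected. */
size_t probe_cache_size(sim_cache *c)
{
    size_t s;

    /* Clear every candidate boundary, then plant a marker at offset 0. */
    for (s = MIN_CACHE_BYTES; s <= MAX_CACHE_BYTES; s <<= 1)
        cache_write(c, s, 0u);
    cache_write(c, 0u, MARKER);

    /* The first boundary that aliases offset 0 is the cache size. */
    for (s = MIN_CACHE_BYTES; s <= MAX_CACHE_BYTES; s <<= 1)
        if (cache_read(c, s) == MARKER)
            return s;
    return 0u;
}
```

Because the probe discovers the size rather than assuming it, the same flush routine can serve a 4kB, 8kB, or 16kB cache without modification.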

## UPGRADING THE RS385 BOARD WITH THE R3081

Upgrading the RS385 board with the R3081 RISController is easy to accomplish. Simply remove the R3051 and replace it with the R3081. Both share the same footprint and pinout. The 1xClockEn, Half-frequency bus, and Coherent DMA modes are all disabled in a default 7RS385, so no further hardware modifications are necessary: Int[5:3] are pulled HIGH during reset, which disables these three modes.

The IDT/sim included with the 7RS385 automatically sizes the cache available; thus, the increased cache sizes of the R3081 pose no problem. IDT/sim will not, however, write to the Config register. Thus, the FPU interrupt will default to Int3, unless explicit steps are taken.

Currently on the RS385, the R3051 Int3 is used for the Centronics port interrupt. If both the Centronics port and the R3081 FPA are used, the system and/or software must be modified so that the FPA has its own dedicated interrupt. This can be done either by rewriting the boot PROM to modify the Config register, or by using a different Centronics interrupt and modifying the Centronics driver.

If the 7RS385 has been used as a porting target for another application, the types of software changes needed will be application dependent. Applications developed with IDT/kit and/or IDT/c include startup code that resizes the cache every time they are executed. IDT/sim startup code does not resize the cache at each execution. In addition, it may be desirable to recompile any floating-point code that is currently implemented with software emulation.

#### **Implementing Additional Reset Modes**

When using any of the three reset mode features unique to the R3081, minor modifications to the RS385 board are necessary to implement the interrupt input signal multiplexing during reset. As a general note, the RS385 uses a tri-statable interrupt bus to implement the multiplexing for the SInt[2:0]. An asserted MRES# enables the reset mode vector driver. A modification to the RS385 board was made to enable or disable any of the six mode selectable features with jumpers, including the new mode vectors of the R3081. Figure 2 shows the modified R3051/R3081 interface to allow enabling and disabling of the six reset modes. A buffer, U1A, was added to provide the tri-state mux for the three new reset modes.

Other solutions to implement the reset mode selection abound, depending on one's application. All R3051 designs should already pull Int[5:3] HIGH during reset, as specified in the IDT79R3051 Family Hardware User's Manual. Therefore, only the new modes being selected need to be added to the current muxing on the RS385. If only one additional mode is needed, jumper the one remaining output on the current 74FCT244 reset mode mux (U37 pin 18) to the appropriate interrupt input. Alternatively, the interrupt PAL, U28, can be reprogrammed to do some of the muxing. (If the PAL cannot be easily removed from the board, an additional device can be added to the wire-wrap area.)

#### An Interesting Upgrade

One of the more interesting upgrades possible is to increase the execution speed while decreasing the bus clock. To do this, select 1x clock mode and half-frequency bus from the new mode reset logic, and replace the R3051 oscillator with a 40MHz oscillator. The result will be a CPU core executing at 40MHz rather than 25MHz, although the bus speed has been reduced to 20MHz.
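The arithmetic behind this upgrade can be written out as a short C sketch. The type and function names are hypothetical; the relationships are the ones described in this note (a 2x clock input is divided by two to give the core clock, and the half-frequency bus mode divides the core clock by two to give the bus clock).

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t core_hz;  /* execution (core) clock */
    uint32_t bus_hz;   /* bus interface clock    */
} r3081_clocks;

/* Derive core and bus frequencies from the oscillator and the reset modes. */
r3081_clocks derive_clocks(uint32_t osc_hz, bool clock_1x, bool half_freq_bus)
{
    r3081_clocks c;
    c.core_hz = clock_1x ? osc_hz : osc_hz / 2u;           /* 1x vs 2x input   */
    c.bus_hz  = half_freq_bus ? c.core_hz / 2u : c.core_hz; /* optional /2 bus */
    return c;
}
```

For example, a 50MHz oscillator in the R3051-compatible 2x mode gives a 25MHz core and bus, while a 40MHz oscillator with 1x clock mode and the half-frequency bus gives a 40MHz core and a 20MHz bus.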

## **UPGRADING OTHER R3051 SYSTEMS**

Upgrading any R3051-based system with the R3081 RISController is very similar to upgrading the RS385 board. The one hardware item that may differ has to do with DRAMs and their refresh.

Specifically, if the refresh period is based on counting SysClk cycles, then using the reduced frequency mode of the R3081 may violate the refresh period (reduced frequency mode also divides the frequency of the output clock). There are two solutions to this, depending on the application:

- Reprogram the counter to a smaller number of SysClks. This is possible with devices such as the R3721 DRAM controller.
- Use a different reference clock for refresh. Choices include a UART clock, or the clock used to generate the input clock to the processor.
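For the first option, the new counter value follows directly from the SysClk frequency. The C sketch below computes it; the function name is hypothetical, the 15.6µs interval used in the example is a typical DRAM row-refresh figure chosen for illustration rather than a value from this note, and the register interface of any particular DRAM controller is not modelled.

```c
#include <stdint.h>

/*
 * Number of SysClk cycles between refresh requests for a given refresh
 * interval. The result is rounded down, so refresh is never requested later
 * than the interval allows.
 */
uint32_t refresh_count(uint32_t sysclk_hz, uint32_t interval_ns)
{
    return (uint32_t)(((uint64_t)sysclk_hz * interval_ns) / 1000000000u);
}
```

With a 25MHz SysClk and a 15.6µs interval the count is 390 cycles; with the clock divided by 16 in reduced frequency mode it drops to 24, which is why a counter left at its original value would refresh far too slowly.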

The RS385 board generates its refresh request from a clock independent of $\overline{\text{SysClk}}$; the clock used is derived from the UART clock.

## CONCLUSION

Incorporating the high-performance R3081 RISController into existing R3051-based systems is often as simple as swapping processors. Little design complexity is added, yet system performance increases due to the larger caches, Floating-Point Accelerator, and other features. Using more of the R3081 features to increase performance further can be accomplished with minimal hardware and software modifications.



Figure 2. 7RS385 Mode Vector Logic Upgraded for R3081