## Fast Communication Mechanisms in Coarse-grained Dynamically Reconfigurable Array Architectures

Jürgen Becker\*, Manfred Glesner\*, Ahmad Alsolaim\*\*, Janusz Starzyk\*\*

\*Darmstadt University of Technology, Institute of Microelectronic Systems, Karlstr. 15, D-64283 Darmstadt, Germany, e-mail: {becker, glesner}@mes.tu-darmstadt.de

\*\*Ohio University, Electrical and Computer Engineering, Athens, OH 45701, USA, e-mail: {alsolaim,starzyk}@bobcat.ent.ohiou.edu

#### Abstract

The paper focuses on coarse-grained dynamically reconfigurable array architectures, which promise performance and flexibility for different challenging application areas, e.g. future broadband mobile communication systems. Here, new and flexible microelectronic architectures are required to solve various problems that stem from access mechanisms, energy conservation, error rate, transmission speed characteristics of the wireless links and mobility aspects. The paper first sketches the major motivation for developing flexible microelectronic System-on-Chip (SoC) solutions for the digital baseband processing in future mobile radio devices. It then introduces a new parallel and dynamically reconfigurable hardware architecture tailored to this application area. The focus of this contribution is the efficient realization of communication and dynamic reconfiguration for such reconfigurable array architectures, which is crucial for their overall performance and flexibility.

#### 1. Introduction and Motivation

The various future demands for flexible mobile communication systems present a set of challenging problems to system designers in the wireless industry. The requirements of future generations of mobile terminals can be summarized as dynamic flexibility, higher performance and lower power consumption compared to current terminals. The combination of advances in integrated circuit technology and novel system-level solutions can contribute efficiently to the widespread commercialization of mobile high-speed communication systems. In recent years, the fast technological development in *very large scale integration* (VLSI) has led to the notion of single *system-on-a-chip* (SoC) solutions. Thus, the implementation of the various functions required by the different abstraction layers of a wireless mobile network should result in a highly integrated single chip in the future, in line with the dramatic improvement in the size and speed of electronic devices in recent years. Trends in microelectronic systems design point to higher integration levels, smaller form factors, lower power consumption and cost-effective implementations. The achievement of this goal has to be efficiently supported by the concurrent development of new design methods that additionally cover aspects such as flexibility, mixed-signal system-level exploration, re-usability and top-down SoC design. The design of mobile baseband systems involves several heterogeneous areas, covering various aspects of communication system applications, efficient CAD tool support, and microelectronic architecture and technology questions. A good understanding of all relevant points related to these inter-disciplinary areas is essential to the success of the final product.

Future mobile communication systems, e.g. third generation (3G) systems, will not only offer the same old services (voice transmission and low data rates) with improved quality, but will also have to offer many new services, ranging from internet browsing to real-time multimedia communication applications. Moreover, next generation mobile terminals should also support new services that emerge after the system has been deployed. The upcoming standards should allow the introduction of such new services as easily as possible. Thus, the design of a corresponding mobile system has to reflect all these forecast services and the required flexibility. At the same time, the mobile devices should realize all services within the physical and operational requirements of the given mobile system infrastructure. In addition, the mobile terminal has to keep its power consumption acceptable in order to be feasible for multimedia terminal operation. Finally, the *time-to-market* and low price requirements have to be fulfilled in order to be competitive. This results in the following two major requirements:

- a new efficient system model in 3G systems, e.g. CDMA-based transmission schemes being highly flexible and adaptable to new services [3], and
- new innovative flexible microelectronic design solutions.

Currently, most of the microelectronic system solutions in mobile communication are a combination of ASICs, microcontrollers, and Digital Signal Processor (DSP) devices. Universal reconfigurable hardware architectures have been shown in different application areas [1] [2] [5] to reduce power consumption and increase performance by at least one order of magnitude, e.g. for implementing filters, correlators, multipliers etc. For the selected application area, the new coarse-grained reconfigurable architecture introduced here promises more flexibility than ASICs and better performance values than DSPs or even today's fine-grained commercial reconfigurable devices [10]. Thus, the major general goal is to evaluate flexibility versus power/performance trade-offs by releasing the DSP for other tasks, or by migrating functionality from ASICs to our coarse-grained reconfigurable hardware, supporting the implementation of highly efficient SoCs for hand-held devices in mobile communication systems.

The paper is structured as follows: section 2 sketches the future challenges and the aspects to be considered for hardware/software SoC implementations of the digital baseband processing. Section 3 provides a brief description of the proposed dynamically reconfigurable hardware part of such flexible SoCs. Section 4 focuses on efficient communication and dynamic reconfiguration mechanisms for this type of coarse-grained parallel array architecture, including the interfacing of different hardware/software SoC components.

#### 2. Flexible Hw/Sw System-on-a-Chip Solutions for Mobile Communication

Next generation mobile communication systems (3G), e.g. those based on the UMTS standard, are defined to provide a transmission scheme which is highly flexible and adaptable to new data-intensive services [12], adding a new dimension to the transceiver design and to the architectures for digital baseband processing. Concepts such as Software Radios are discussed in detail in [4] [6]. Since such concepts do not deliver the necessary overall system performance [8] [9], alternative solutions have to be developed. For the computation-intensive, arithmetic-dominated functions with flexibility requirements [11] found in mobile communication applications, reconfigurable hardware offers an alternative to software programmable DSPs. This relatively new hardware architecture and technology concept can provide increased system performance at lower cost and lower implementation risk, combining the flexibility of a general-purpose DSP with the speed, density, and low cost of ASIC solutions. In addition, there are many operational challenges, such as battery life and the need for terminals that can easily and flexibly exploit different and new services dynamically, e.g. by downloading upgrades and new services or protocols from the internet and configuring the hand-held device according to the downloaded code. Flexibility can be defined as the ability of the mobile terminal to support many modes of operation, e.g. voice, audio, video, navigation, data transmission etc. This also means, for example, that the mobile device has to be able to operate within different standards, such as GSM, UMTS and IS-95. Adaptability is the ability of the mobile terminal to easily and quickly accommodate a new service.

Target SoC architectures may be composed of different cores such as DSPs, microcontrollers and memories, as well as of reconfigurable hardware and/or various ASIC support parts. An overview of a possible SoC architecture related to a *Baseband Single Chip Mobile Transceiver* is shown in figure 1. In such heterogeneous hardware/software architectures integrating different technologies, the efficient interfacing of the different SoC components is crucial for the overall SoC performance and testability (see section 4). As stated above, the choice of the final target architecture will result from a detailed application and performance analysis that also considers VLSI-oriented implementation issues. The required flexibility will be supported by the inclusion of a new coarse-grained dynamically reconfigurable architecture realizing layer 0 (L0) and layer 1 (L1) hardware operations, e.g. channelization, detection, decoding etc. Thus, this promising technology supports important aspects like reduced *time-to-market* and *risk minimization*, because ASIC development risks and fabrication times are reduced enormously, or even avoided completely in some cases.

Figure 1 SoC-Architecture Components of a Baseband Single Chip Mobile Transceiver

#### 3. A New Coarse-grained Dynamically Reconfigurable Array Architecture

The variety of services in next generation mobile communication systems will bring a large spectrum of requirements, e.g. different data rates, different qualities of service (QoS), and real-time services. To prepare future mobile terminals and their microelectronic components to cope with all these challenges, we developed a new coarse-grained and dynamically reconfigurable architecture. The integration of this application-tailored but flexible hardware architecture within flexible SoC solutions for the digital baseband processing will support the efficient realization of the above mentioned features.

The proposed Dynamically Reconfigurable Architecture for Mobile Systems (DReAM) consists of an array of coarse-grained Reconfigurable Processing Units (RPUs) operating in parallel. Each RPU is designed to execute all arithmetic data manipulations required by the data-flow oriented parts of the mobile application, as well as to support the necessary control-flow oriented operations. The complete DReAM array architecture connects all RPUs with reconfigurable local and global communication structures (see figure 2). In addition, the architecture will provide efficient and fast dynamic reconfiguration of the RPUs as well as of the interconnection structures, e.g. partial reconfiguration during run-time while other parts of the reconfigurable architecture remain active. The corresponding hardware components and communication protocol implementations are described in section 4. In the following, the design, structure and performance issues of the major hardware components in the DReAM architecture are explained briefly. For more information on DReAM and details about the performance values of all operations see [7].

Legend:

- Reconf. Proc. Unit (RPU).
- Dedicated IO (DIO).
- Comm. Switching Unit (CSU).
- Configuration Memory Unit (CMU) and its Controller.
- Switching Box (SWB).
- RPU to Bus Connection point.
- 16-Bit Global Interconnect Line.
- 16-Bit Local Interconnect Line.

Figure 2: Hardware Structure of the Dynamically Reconfigurable DReAM Architecture

The decisions during the design of the architecture were mainly based on a careful review of the requirements of the targeted application area, e.g. on the study of different algorithms needed in future mobile transceivers, including their operations and implementation precision. Examples of such complex algorithms, which also require flexibility in execution, are RAKE-receiving parts, interpolation filtering, searcher and synchronization algorithms, coding and modulation techniques etc. Based on the set of arithmetic and control-flow operations used, the structures of the RPUs and of the so-called Communication Switching Units (CSUs) were developed and optimized for performance and power. As shown in figure 2, the DReAM architecture consists of a scalable array of RPUs that have fast 16-bit direct local connections between neighbouring RPUs, whereas each sub-array of four RPUs shares one common Configuration Memory Unit (CMU). The CMU holds configuration data for performing fast dynamic reconfiguration of each of these four RPUs and is controlled by one responsible CSU. Each CSU controls two CMUs and four global interconnect Switching Boxes (SWBs). All CSUs communicate with one *Global Communication Unit* (GCU), which centrally coordinates all dynamic reconfiguration steps, e.g. of the RPUs as well as of the global interconnection structure. Moreover, the GCU controls the external communication with other hardware components of the flexible SoC. For this purpose, *Dedicated I/O Units* (DIOs) for fast and parallel transfers of the input/output data of DReAM are placed around the array architecture. The detailed hardware structure of the corresponding hardware modules and the related global as well as local inter-RPU communication mechanisms are described in section 4.
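The sharing ratios stated above (four RPUs per CMU, two CMUs and four SWBs per CSU) can be tallied for a given array size. The following is a minimal sketch, assuming the array partitions cleanly into groups of eight RPUs and that every CMU and SWB belongs to exactly one CSU; the function and class names are hypothetical and not taken from the DReAM design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayResources:
    rpus: int
    cmus: int
    csus: int
    swbs: int


def dream_resources(rows: int, cols: int) -> ArrayResources:
    """Tally control resources for a rows x cols RPU array.

    Ratios from the text: 4 RPUs share 1 CMU; 1 CSU controls 2 CMUs and 4 SWBs.
    Assumption (not from the text): the array splits into disjoint 8-RPU groups.
    """
    if rows <= 0 or cols <= 0:
        raise ValueError("rows and cols must be positive")
    rpus = rows * cols
    if rpus % 8 != 0:
        raise ValueError("this sketch only handles arrays with a multiple of 8 RPUs")
    cmus = rpus // 4   # one CMU per sub-array of four RPUs
    csus = cmus // 2   # each CSU controls two CMUs
    swbs = 4 * csus    # each CSU controls four switching boxes
    return ArrayResources(rpus=rpus, cmus=cmus, csus=csus, swbs=swbs)


if __name__ == "__main__":
    print(dream_resources(4, 4))
```

Under these assumptions a 4 x 4 array would need 4 CMUs, 2 CSUs and 8 SWBs; a real floorplan may share or place these units differently.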

The dynamically *Reconfigurable Processing Units* (RPUs) are the major hardware components of the DReAM architecture for executing the arithmetic-dominated data manipulations. These application-tailored RPUs efficiently perform the coarse-grained (8-/16-bit) integer operations needed for the examined application parts. In contrast, the CLBs (*Configurable Logic Blocks*) of today's commercially available fine-grained and universal FPGA chips operate on the 1-bit level [10]. As shown in Figure 3, each RPU consists of:

- two dynamically reconfigurable 8-bit data paths, called *Reconfigurable Arithmetic Processing Units* (RAPs),
- one Spreading Data Path (SDP),
- one RPU-controller,
- two dual port RAMs, and
- one Communication Protocol Controller

Figure 3 Hardware Structure of the Reconfigurable Processing Unit (RPU)

Each RAP can perform all necessary arithmetic operations (8-/16-bit) identified in the above mentioned examined application parts of mobile communication systems. The performance values of these operations for n operation repetitions on a stream of data are provided in Table 1, and are based on a 0.35 µm CMOS standard cell synthesis of the RPU using a Mietec/Alcatel process. For the repeated execution of an operation only one configuration set is necessary. The available set of two-input operations supports operations with one constant operand (fixed Y) as well as operations with two variable inputs (variable Y). The RAP unit is built around a fast integer multiplier, which provides high-speed constant/variable multiplication (i.e. one of the operands is constant for some time interval) and a small, compact design by using a modified Look-Up Table (LUT) multiplication procedure that applies distributed arithmetic. According to [11], most of the multiplications within the mobile system are fixed-operand operations. One Spreading Data Path (SDP) for fast and efficient execution of CDMA-based spreading tasks is implemented in each RPU. This SDP unit can be used together with the adding operations of two RAPs to efficiently implement fast complex PN-code correlation operations. Such spreading operations are often required in QPSK modulation (*Quadrature Phase Shift Keying*). The detailed hardware description and implementation issues of all RPU operations, as well as a complex application example mapped onto DReAM, e.g. a CDMA-based RAKE-receiver used in mobile devices, can be found in [7].
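To illustrate why a fixed operand makes multiplication cheaper, the following sketch shows a generic LUT-based constant multiplication: the partial products of the constant are precomputed once, and each multiplication then reduces to table lookups, shifts and additions. This is only an illustration of the general idea, assuming unsigned 8-bit inputs and 4-bit digits; it is not the modified LUT procedure of the RAP, whose details are given in [7], and all names are hypothetical.

```python
def build_lut(constant: int, digit_bits: int = 4) -> list[int]:
    """Precompute constant * d for every possible digit value d."""
    return [constant * d for d in range(1 << digit_bits)]


def lut_multiply(x: int, lut: list[int], width: int = 8, digit_bits: int = 4) -> int:
    """Multiply unsigned x by the constant baked into lut.

    x is split into digit_bits-wide digits; each digit selects a precomputed
    partial product, which is shifted to its position and accumulated.
    """
    if not 0 <= x < (1 << width):
        raise ValueError("x does not fit into the given width")
    mask = (1 << digit_bits) - 1
    result = 0
    for shift in range(0, width, digit_bits):
        digit = (x >> shift) & mask
        result += lut[digit] << shift
    return result


if __name__ == "__main__":
    lut = build_lut(37)            # filled once while the operand stays fixed
    print(lut_multiply(200, lut))  # same value as 200 * 37
```

The table has to be rebuilt whenever the constant changes, which is one plausible reading of why Table 1 lists different cycle counts for fixed and variable Y.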

The *RPU-controller* is responsible for guiding all data manipulations and transfers inside the RPU, as well as for determining from which local neighbour RPU or global interconnect line input data is consumed. Moreover, the *RPU-controller* performs, together with the CMU and its controller, the fast dynamic reconfiguration of the RPU, as explained in the following.

| Operation\*           | Best case: cycles | Best case: freq. [MHz] | Worst case: cycles | Worst case: freq. [MHz] |
|-----------------------|-------------------|------------------------|--------------------|-------------------------|
| Multipl. (variable Y) | n+10              | 60                     | 10n+1              | 11.8                    |
| Multipl. (fixed Y)    | n+2               | 100                    | n+10               | 60                      |
| MAC (variable Y)      | 3n                | 40                     | 11n                | 10.9                    |
| MAC (fixed Y)         | n+2               | 100                    | n+11               | 57.1                    |
| Addition              | n                 | 120                    | n                  | 120                     |
| Subtraction           | n                 | 120                    | n                  | 120                     |

\* n repetitions of single operations (0.35 µm CMOS)

**Table 1: DReAM Operation Performance Values**
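The frequency column of Table 1 appears to be consistent with an effective operation rate of `f_clk * n / cycles(n)` for a 120 MHz clock and n = 10 repetitions. This reading is an inference from the numbers and is not stated in the text. The sketch below recomputes the column under that assumption; the dictionary keys and function name are hypothetical.

```python
CLOCK_MHZ = 120.0  # assumption: inferred from the Addition/Subtraction rows
N_REPEAT = 10      # assumption: repetition count that reproduces the table

# Cycle counts from Table 1 as functions of the repetition count n.
CYCLES = {
    ("mult_variable", "best"): lambda n: n + 10,
    ("mult_variable", "worst"): lambda n: 10 * n + 1,
    ("mult_fixed", "best"): lambda n: n + 2,
    ("mult_fixed", "worst"): lambda n: n + 10,
    ("mac_variable", "best"): lambda n: 3 * n,
    ("mac_variable", "worst"): lambda n: 11 * n,
    ("mac_fixed", "best"): lambda n: n + 2,
    ("mac_fixed", "worst"): lambda n: n + 11,
    ("addition", "best"): lambda n: n,
    ("addition", "worst"): lambda n: n,
    ("subtraction", "best"): lambda n: n,
    ("subtraction", "worst"): lambda n: n,
}

# Frequency values [MHz] as printed in Table 1.
TABLE_FREQ_MHZ = {
    ("mult_variable", "best"): 60.0, ("mult_variable", "worst"): 11.8,
    ("mult_fixed", "best"): 100.0, ("mult_fixed", "worst"): 60.0,
    ("mac_variable", "best"): 40.0, ("mac_variable", "worst"): 10.9,
    ("mac_fixed", "best"): 100.0, ("mac_fixed", "worst"): 57.1,
    ("addition", "best"): 120.0, ("addition", "worst"): 120.0,
    ("subtraction", "best"): 120.0, ("subtraction", "worst"): 120.0,
}


def effective_rate_mhz(op: str, case: str, n: int = N_REPEAT,
                       clock_mhz: float = CLOCK_MHZ) -> float:
    """Operations per microsecond when n operations take cycles(n) clock cycles."""
    return clock_mhz * n / CYCLES[(op, case)](n)


if __name__ == "__main__":
    for key, printed in TABLE_FREQ_MHZ.items():
        print(key, round(effective_rate_mhz(*key), 2), printed)
```

Under these assumptions every recomputed value agrees with the printed one to within 0.1 MHz, so the same function can be used to estimate rates for other repetition counts.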

#### 4. Efficient Dynamic Reconfiguration and Communication Mechanisms

The performance of applications mapped onto the DReAM array architecture depends strongly on the efficiency and speed of the local and global inter-RPU communication mechanisms. Since the coarse-grained RPUs implement control and datapath parts, e.g. loop structures, an advanced asynchronous synchronization and communication mechanism is required. Here, an efficient data communication protocol has been specified and implemented, in contrast to today's fine-grained FPGA architectures, where simple point-to-point bit-level connections can be switched between configurable logic blocks (CLBs) [10]. In the DReAM array architecture, each RPU is locally connected to its four neighbours (North, East, South, and West) through fast 16-bit direct connection lines. In addition, it can be connected to the global lines through an SRAM-based switching box (SWB), as shown in figure 2. The data-driven communication mechanism inside the DReAM array architecture is realized by an asynchronous communication protocol, performed on the 16-bit local and global interconnect lines. The protocol is an efficient handshaking protocol realizing a unidirectional point-to-point connection between two RPUs. The intra-array communication protocols can be divided into two types:

- local communication between neighbouring RPUs (see figure 4 (b)), and
- global communication between any two distanced RPUs (see figure 4 (a)).

Figure 4 Transmitter/Receiver Synchronization Modules for global / local inter-RPU Communication in DReAM

For local communication a half-interleaved handshake is implemented (1-cycle delay for one 16-bit data word, see figure 5 b), and for global inter-RPU communication a fully-interleaved handshake is used (minimum 2-cycle delay, see figure 5 a). Two schemes are needed because the local and the global interconnect wires differ in length, which results in different communication signal delays. Each RPU has a *Transmitting Unit* (TX-RPU) and a *Receiving Unit* (RX-RPU). In both communication cases (local / global) the TX-RPU and RX-RPU operate on the rising edge and the *Transmitting Unit* (TX) on the falling edge. The *Receiving Unit* (RX) is needed only during global communication for handshake synchronization.
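The per-word delays quoted above give a simple lower bound on transfer time. The sketch below is only a back-of-the-envelope model, assuming words are sent back to back with no arbitration or routing overhead, which the text does not specify; the names are hypothetical.

```python
# Per-word delays from the text: 1 cycle (local), at least 2 cycles (global).
CYCLES_PER_WORD = {"local": 1, "global": 2}
WORD_BITS = 16  # width of the local and global interconnect lines


def min_transfer_cycles(payload_bits: int, link: str) -> int:
    """Lower bound on clock cycles needed to move payload_bits over one link."""
    if payload_bits < 0:
        raise ValueError("payload_bits must be non-negative")
    if link not in CYCLES_PER_WORD:
        raise ValueError(f"unknown link type: {link!r}")
    words = -(-payload_bits // WORD_BITS)  # ceiling division
    return words * CYCLES_PER_WORD[link]


if __name__ == "__main__":
    for link in ("local", "global"):
        print(link, min_transfer_cycles(64, link))
```

Because the global figure is a minimum, the result for `"global"` is a bound rather than an estimate; long routes through several switching boxes may take longer.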

As part of the global communication realization, efficient switching boxes (SWBs) have to be implemented. The global interconnect is implemented by two 16-bit lines running between neighbouring SWBs such that each RPU has access to the global interconnect (see figure 2). Each global line coming from one direction can be routed to any of the other three directions. Each SWB consists of 20 switching points implemented by SRAM-controlled pass transistors (see figure 6 (a)). The lines are named based on their location and direction with respect to the SWB, e.g. a line going in the upper west direction is called *west1*. The upper horizontal lines can be connected with the left vertical lines, and the lower horizontal lines can be connected to the right vertical lines.



Figure 5 Transmitter/Receiver Synchronization Signals for global / local inter-RPU Communication in DReAM



**Figure 6** (a) Hardware Structure of reconfigurable Switching Box (SWB) and (b) Examples of its Interconnection Routing Possibilities

Based on this, all combinations of these one-line connections are possible, provided there is no resource conflict caused by two one-line connections driving the same line. Some examples of multiple-line connections are given in figure 6 (b). The global interconnect structure realized by the SWBs described above can be partly and dynamically reconfigured during run-time by the Communication Switching Units (CSUs, see figure 7). Each CSU contains 4 configuration RAMs that keep the SWB configuration data of mappings frequently needed in different mobile communication situations. This results in fast and parallel dynamic reconfiguration (3 clock cycles) of all corresponding SWBs. The necessary timing of the corresponding data and control signals is shown in figure 7. Infrequently used SWB configurations are loaded from the on-chip SoC memory via the Dedicated I/O Units (DIOs) and through the GCU into the RAMs of the CSUs, as described below. The hardware structure of the CMUs for simultaneous dynamic reconfiguration of 4 RPUs is similar to that of the CSU, with appropriate RAM sizes for the configuration data of the RPUs (54 bits for one RPU configuration).
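The routing rule for a switching box (a line may be routed to any of the other three directions, and no line may be driven by two connections) can be expressed as a small consistency check. The sketch below is a simplified model: it treats each connection as a directed (source, destination) pair, follows the *west1* naming pattern for all eight lines, and ignores the 20 individual switching points and the upper/left and lower/right pairing. These simplifications and all names are assumptions for illustration.

```python
from itertools import product

DIRECTIONS = ("north", "east", "south", "west")
# Two 16-bit lines per direction, named after the pattern "west1" in the text.
LINES = tuple(f"{d}{i}" for d, i in product(DIRECTIONS, (1, 2)))


def direction(line: str) -> str:
    """Strip the line index, e.g. 'west1' -> 'west'."""
    return line.rstrip("12")


def check_routing(connections: list[tuple[str, str]]) -> list[str]:
    """Return a list of rule violations; an empty list means the set is acceptable."""
    problems = []
    driven_by: dict[str, str] = {}
    for src, dst in connections:
        if src not in LINES or dst not in LINES:
            problems.append(f"unknown line in {src}->{dst}")
            continue
        if direction(src) == direction(dst):
            problems.append(f"{src}->{dst} stays on the same side")
            continue
        if dst in driven_by and driven_by[dst] != src:
            problems.append(f"{dst} driven by both {driven_by[dst]} and {src}")
            continue
        driven_by[dst] = src
    return problems


if __name__ == "__main__":
    print(check_routing([("west1", "east1"), ("north1", "south1")]))
    print(check_routing([("west1", "east1"), ("north1", "east1")]))
```

A configuration tool could run such a check before writing a mapping into a CSU configuration RAM; a faithful model would additionally restrict connections to the switching points that physically exist.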

Figure 7 Hardware Structure and Signal Timing Overview for fast Dynamic Reconfiguration in DReAM

A high-performance SoC data interface connects the DReAM architecture through its DIOs to other hardware components on the same *system-on-a-chip* (SoC, see figure 8). It is realized by an efficient combination of AMBA AHB-based bus systems (Advanced Microcontroller Bus Architecture, Advanced High-performance Bus [13]) and corresponding buffered bridges. Each DIO can be connected to one RPU at the border of the DReAM architecture, and/or to RPUs inside the array through the global interconnect lines. Every DIO is able to perform the internal local and global DReAM communication protocols (see figure 4 and figure 5), as well as the interfacing functionality to the other components of the flexible SoC, e.g. the DSP, the microcontroller, and the on-chip memories. The communication with these other on-chip components is performed by an internal protocol controlling the READ and WRITE buses of the buffered AHB bridge, followed by the widely supported AHB bus protocol to the other SoC cores (see also [13]). The AHB bridge contains three state machines for controlling the READ, WRITE and AHB SLAVE interfaces. For data buffering and pipelined data transfers, each of the 16 DIOs has a corresponding FIFO buffer for reading and writing 16-bit data words. The depth of these FIFOs depends on the maximum possible burst transfer of the AMBA AHB system bus. The AHB SLAVE interface is controlled by the microcontroller, e.g. an on-chip ARM core.

**Figure 8** (a) Hardware Structure of Interfaces to different SoC components, with (b) Dedicated I/O Units (DIOs), and (c) buffered AMBA AHB-Bridge
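The buffering role of the DIO FIFOs can be sketched as a bounded queue of 16-bit words. The default depth of 16 used below is an assumption, chosen because the longest fixed-length burst in the AMBA AHB specification (Rev. 2.0) has 16 beats; the text does not state the actual depth, and the class and method names are hypothetical.

```python
from collections import deque


class DioFifo:
    """Bounded FIFO of 16-bit words, loosely modelling one DIO buffer."""

    def __init__(self, depth: int = 16) -> None:  # depth 16 is an assumption
        if depth <= 0:
            raise ValueError("depth must be positive")
        self.depth = depth
        self._words: deque[int] = deque()

    def is_full(self) -> bool:
        return len(self._words) >= self.depth

    def is_empty(self) -> bool:
        return not self._words

    def push(self, word: int) -> bool:
        """Store one word; return False (and store nothing) when the FIFO is full."""
        if not 0 <= word <= 0xFFFF:
            raise ValueError("word must fit into 16 bits")
        if self.is_full():
            return False
        self._words.append(word)
        return True

    def pop(self) -> int:
        """Remove and return the oldest word."""
        if self.is_empty():
            raise IndexError("pop from empty FIFO")
        return self._words.popleft()


if __name__ == "__main__":
    fifo = DioFifo()
    burst = list(range(16))                 # one 16-beat burst
    accepted = [fifo.push(w) for w in burst]
    print(all(accepted), fifo.is_full())
```

Sizing the buffer to a full burst lets the bridge accept a complete burst without stalling the bus while the array side drains the words at its own handshake rate.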

#### 5. Conclusions

The paper first presented an overview of the challenges in realizing flexible microelectronic system solutions for future mobile communication applications, e.g. for the digital baseband processing. It then introduced the hardware structure of a new coarse-grained dynamically reconfigurable architecture (DReAM), including its potential for SoC solutions in adaptive air interface candidate systems for future generations of wireless communication systems. DReAM is tailored to future mobile signal processing, providing an acceptable trade-off between flexibility and application performance requirements. The focus of the paper was the detailed description of suitable dynamic reconfiguration and communication mechanisms and the hardware structures of their implementation modules. Such architecture features are crucial for the overall performance and run-time flexibility of coarse-grained dynamically reconfigurable arrays.

#### References

- [1] P. Athanas, A. Abbot: Real-Time Image Processing on a Custom Computing Platform, IEEE Computer, vol. 28, no. 2, Feb. 1995.
- [2] R. W. Hartenstein, J. Becker et al.: A Novel Machine Paradigm to Accelerate Scientific Computing; Special issue on Scientific Computing of Computer Science and Informatics Journal, Computer Society of India, 1996.
- [3] H. Erben, K. Sabatakakis: Advanced software radio architecture for 3rd generation mobile systems, 48th IEEE Vehicular Technology Conference (VTC 98), 1998, vol. 2, pp. 825-829.
- [4] D. Efstathio, et al.: Recent Developments in Enabling Technologies for Software Radio, IEEE Comm. Mag., Aug. 1999, pp. 112-117.
- [5] G. R. Goslin: Using Xilinx FPGAs to Design Custom Digital Signal Processing Devices, Proc. of 1995 DSPx Technical Program, pp. 595-604.
- [6] Mitola: The software Radio Architecture, IEEE Communication Mag., May 1995, pp. 26-38.
- [7] A. Alsolaim, J. Becker, M. Glesner, J. Starzyk: Architecture and Application of a Dynamically Reconfigurable Hardware Array for Future Mobile Communication Systems; Proc. of IEEE Symposium of Field-Programmable Custom Computing Machines (FCCM'00), April 17-19, 2000, Napa, USA.
- [8] David Nicklin: Utilising FPGAs in Re-configurable Basestations And Software Radios, Xilinx Inc. Electronic Eng. Mag.
- [9] Gregory Ray Goslin: A Guide to Using Field Programmable Gate Arrays (FPGAs) for Application-Specific Digital Signal Processing Performance, Xilinx Inc. 1995.
- [10] Xilinx Corp.: http://www.xilinx.com/products/virtex.htm.
- [11] Peter Jung, Joerg Plechinger: M-GOLD: a multimode baseband platform for future mobile terminals, CTMC'99, IEEE Int'l. Conf. on Communications, Vancouver, June 1999.
- [12] Tero Ojanpera, et al.: Wideband CDMA for Third Generation Mobile Communications, Artech House Pub., 1998.
- [13] AMBA Specification (Rev. 2.0), Internal Report (ARM IHI 0011A), ARM Limited, 1999.