High Performance Audio Processing SoC Platform

Denis Muratov, Waves Audio, UDC
Vladimir Boykov, Waves Audio, UDC
Yuri Iskiv, Waves Audio, UDC
Igor Smirnov, Waves Audio, UDC
Sergey Berdyshev, SevNTU, [email protected]
Dr. Valeriy Vertegel, SevNTU, [email protected]
Dr. Yuri Gimpilevich, Sevastopol National Technical University, Ukraine, [email protected]
Gilad Keren, Waves Audio, Israel, [email protected]
Abstract
This article describes a new ASIC hardware platform (MX76k), which will be used to run current audio enhancement algorithms from Waves Audio (MaxxAudio 3) on a variety of consumer electronics devices. The platform can also be extended for use in other audio applications.
1. Introduction
Modern systems are becoming ever more complex; more and more blocks can be integrated inside a single IC. This trend also affects audio systems, which are becoming increasingly sophisticated, while the market demands lower-cost solutions, which means that entire systems must be built inside a single IC.
Every digital audio processing system has to execute DSP algorithms, which is usually done using processor cores optimized for these tasks: DSPs.
While there are many different DSP cores available on the market, not all of them can be used as a soft IP core inside a single ASIC together with other blocks. Available cores do not always meet the cost, scalability, performance and security needs of a particular application. These limitations sometimes force software developers to create their own hardware platforms, optimized for their applications.
This is the case with MaxxAudio 3 [1] and the MX76k hardware platform, which is described in the subsequent chapters of this article. The second chapter describes the overall system architecture; the third is devoted to the processor core architecture and features; the fourth discusses the verification methods used to prove IC functionality; and the fifth chapter presents system implementation results.

2. Audio processing platform architecture
The developed HW platform (MX76k) is a soft IP core written in Verilog HDL, which allows the system to be easily ported to an FPGA or any ASIC library. The description below covers an ASIC implementation.
The architecture of the system is illustrated in figure 1. The heart of the platform is a 24-bit fixed-point DSP core, which executes audio processing algorithms and some management routines. Software running on it controls all the other hardware blocks, which are described below.
The clock source feeding the system can be either an external oscillator or an audio clock. It is multiplied by PLL to ensure that the DSP has enough speed to execute the selected audio processing algorithms.
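The required PLL multiplication can be estimated from the audio workload. The sketch below is a back-of-envelope clock budget; the cycles-per-sample figure is an assumed illustrative number, not a value taken from the MX76k documentation.

```python
# Illustrative DSP clock budget (the cycles-per-sample figure is an
# assumption for the sake of the example, not an MX76k specification).
SAMPLE_RATE_HZ = 48_000      # typical audio sample rate
STEREO_CHANNELS = 4          # the platform has four stereo channels
CYCLES_PER_SAMPLE = 300      # assumed DSP cycles per sample per mono channel

# Each stereo channel carries two mono samples per sample period.
required_hz = SAMPLE_RATE_HZ * STEREO_CHANNELS * 2 * CYCLES_PER_SAMPLE
required_mhz = required_hz / 1e6   # 115.2 MHz under these assumptions
```

Under these assumed numbers the workload fits within the 125 MHz achieved on the TSMC 160nm process (see Table 1), which is the kind of margin the PLL multiplier must guarantee.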
978-1-4577-1958-5/11/$26.00 ©2011 IEEE
Optionally, an audio clock may be used as the DSP clock to reduce noise on the die when the ADC is in use.
Figure 1. SoC architecture
The system has four input and four output stereo audio channels. The digital audio stream is exchanged between the DSP and external devices by means of an I²S interface [2]. Several I²S lines are multiplexed with the ADC and DAC, which allows the system to have analogue inputs and outputs.
External configuration of the system is managed using an I²C interface [3]. The device can operate as an I²C master, reading its initial configuration from an external EEPROM, or as a slave. In the latter case, it can dynamically interact with an external host.
One peculiarity of the I²C module is that, in slave mode, it operates using a clock extracted from the SCL and SDA pair of lines. This allows the I²C module to operate when all other clocks are switched off and the IC is in a deep power-down state; in this case, the external host can wake up the system by sending the appropriate I²C commands even though the PLL clock is not available.
All I/O pins of the system can be alternatively used as GPIOs.
3. DSP core architecture
As mentioned above, the heart of the system is a 24-bit DSP core with fixed-point arithmetic. It is optimized to run Waves MaxxAudio algorithms.
The architecture of the core is illustrated in figure 2. The DSP has a Harvard architecture with a 6-stage pipeline, which allows a MAC instruction, together with reading data from on-chip memory and updating addresses, to execute in a single clock cycle. The DSP core has the following primary features:
- 24x24-bit multiplier;
- 56-bit accumulator and barrel shifter;
- Four 24-bit ALU registers for instruction source operands and two 56-bit registers to store the accumulator result;
- Eight 24-bit address index registers, which can be updated using linear, reverse-carry and modulo-N arithmetic rules;
- Interrupt and hardware loop support.
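The three address-update rules are standard DSP addressing modes: linear for arrays, modulo-N for circular delay-line buffers, and reverse-carry (bit-reversed) for FFT data reordering. The following is a behavioural sketch of what the AGU computes, assuming the usual textbook semantics of these modes; function names and the wrap width are illustrative, not the real register-level behaviour.

```python
def linear_update(addr, step):
    """Linear post-update: addr := addr + step, wrapping at 24 bits."""
    return (addr + step) & 0xFFFFFF

def modulo_update(addr, step, base, n):
    """Modulo-N update for a circular buffer of length n starting at base,
    as used for audio delay lines."""
    return base + (addr - base + step) % n

def reverse_carry_update(addr, step, bits):
    """Reverse-carry add over the low 'bits' bits: the carry propagates
    from MSB toward LSB, yielding bit-reversed (FFT) address order."""
    rev = lambda x: int(f"{x:0{bits}b}"[::-1], 2)
    mask = (1 << bits) - 1
    return rev((rev(addr & mask) + rev(step & mask)) % (1 << bits))

# Stepping by N/2 = 4 through an 8-point buffer visits the addresses
# in bit-reversed order: 0, 4, 2, 6, 1, 5, 3, 7.
order, a = [], 0
for _ in range(8):
    order.append(a)
    a = reverse_carry_update(a, 4, 3)
```

The modulo mode lets a delay line advance without an explicit wrap test, and the reverse-carry mode removes the separate bit-reversal pass an FFT would otherwise need.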
Figure 2. MX76k DSP core architecture
DSP pipeline stages are listed below.
1. Instruction fetch stage 1: address phase. The address of the instruction to be decoded is set on the program memory address bus.
2. Instruction fetch stage 2: data phase. The instruction code is latched in the instruction register for subsequent decoding.
3. Instruction decode stage.
4. Data memory address generation stage.
5. Execution stage 1: multiply phase. Multiply and complex shift operations are executed in this clock cycle. The address of the memory cell to be accessed by the instruction is set on the DSP data memory address bus.
6. Execution stage 2: accumulate phase. Add, subtract and logic operations are executed in this clock cycle. The instruction execution result is latched into a register or the addressed memory cell.
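The six stages above overlap across consecutive instructions, which is what sustains one MAC per cycle once the pipeline is full. The toy model below illustrates that overlap; it is a sketch of the scheduling idea only, not the RTL, and the stage names are paraphrased from the list above.

```python
# Toy model of the 6-stage pipeline: instruction i occupies stage s
# at cycle i + s, so a new instruction retires every cycle once the
# pipeline has filled.
STAGES = ["fetch-addr", "fetch-data", "decode",
          "mem-addr-gen", "exec-multiply", "exec-accumulate"]

def timeline(n_instructions):
    """Return (cycle, instruction, stage) events for n instructions."""
    return [(i + s, i, stage)
            for i in range(n_instructions)
            for s, stage in enumerate(STAGES)]

events = timeline(3)
retire_cycles = [c for c, i, st in events if st == "exec-accumulate"]
# Three MACs finish in 8 cycles (0..7) instead of 18 sequentially.
```

This also makes the motivation for stage 4 visible: the data memory address must be ready one cycle before the multiply phase drives it onto the data memory address bus.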
The four primary blocks constituting the core are described below:
ALU (Arithmetic-Logic Unit): all DSP arithmetic is located here.
AGU (Address Generation Unit): calculates new values and updates the data memory address registers.
PAGU (Program Address Generation Unit): calculates the new value of the program counter and updates it and the related registers.
PCU (Program Control Unit): this is one of the most complex blocks and is effectively the brain of the system. It performs instruction decoding and supplies the other blocks with control signals. In addition, the PCU contains the PIC (Program Interrupt Controller), which detects interrupts from peripheral devices, sorts them according to their priority, and notifies the PFCU (Program Flow Control Unit) that it must change the instruction execution flow accordingly.
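The PIC's sort-and-notify step amounts to a fixed-priority arbitration over the pending interrupt lines. The sketch below models that behaviour; the interrupt names and the priority table are assumptions for illustration, not the real MX76k interrupt map.

```python
def select_interrupt(pending, priorities):
    """Pick the highest-priority pending interrupt, as the PIC does
    before notifying the PFCU. Returns None when nothing is pending.
    Lower priority number = more urgent (an assumed convention)."""
    candidates = [irq for irq in pending if irq in priorities]
    if not candidates:
        return None
    return min(candidates, key=lambda irq: priorities[irq])

# Hypothetical priority table; the real peripheral set and ordering
# are not specified in the article.
PRIORITIES = {"i2c": 0, "i2s_rx": 1, "i2s_tx": 2, "timer": 3, "gpio": 4}

winner = select_interrupt({"timer", "i2s_rx"}, PRIORITIES)  # "i2s_rx"
```

Whichever line wins, the PFCU then redirects the program counter to the corresponding service routine, which is the "change the instruction execution flow" step described above.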
A network of internal data buses allows multiple data items to be moved between different DSP blocks and memories, so that arithmetic and memory load/store instructions can execute in parallel.
A Debug module is used to load and debug programs inside the DSP core. This block interfaces with external devices using a standard JTAG interface [4].
Memories are connected to the DSP core via a high performance bus (HPB) interface, which combines simplicity with a high data rate. The key feature of this interface is that it has split address and data cycles for write and read access. This is required by the DSP pipeline architecture in order to exchange data with memory without inserting stalls.
A write buffer is implemented for memories which require high-speed access but do not support this split-cycle write procedure.
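The idea of such a write buffer is that the core posts a write and keeps running, while the buffered write is retired into the slow memory later. The following is a minimal behavioural sketch of that idea, assuming a single-entry buffer with read forwarding; the real hardware depth and drain policy are not specified in the article.

```python
class WriteBuffer:
    """One-entry write buffer in front of a slow memory (a sketch of
    the concept, not the MX76k RTL)."""

    def __init__(self, memory):
        self.memory = memory   # dict modelling the slow RAM
        self.pending = None    # (addr, data) awaiting its data cycle

    def write(self, addr, data):
        self._drain()                 # retire any previous write first
        self.pending = (addr, data)   # core continues without stalling

    def read(self, addr):
        self._drain()                 # flush so reads observe fresh data
        return self.memory.get(addr, 0)

    def _drain(self):
        if self.pending is not None:
            a, d = self.pending
            self.memory[a] = d
            self.pending = None

ram = {}
wb = WriteBuffer(ram)
wb.write(0x10, 7)          # posted; ram is updated on the next access
value = wb.read(0x10)      # drains the buffer, returns 7
```

A real design would forward the pending data on an address match instead of flushing on every read, but the stall-avoidance principle is the same.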
Peripheral devices are connected to the core data buses either via their own high performance bus interface (X memory space) or via the slower APB bridge (Y memory space) [5].
The core is implemented in such a way as to consume minimum power: if a particular block is not used by the instruction currently executing, that block is turned off. Turning a block off is done by means of ASIC gated clock elements, which disable clock propagation through the appropriate clock tree branches. In the case of an FPGA implementation, this mechanism is disabled.

4. Verification methodology
The complexity of the digital section of the developed platform is about 75,000 ASIC gates. It does not contain any pre-verified third-party IP blocks and thus requires substantial pre-silicon verification before being passed to the fab. This chapter describes the verification approaches used to prove device operation.
The basic approach was the development of a Verilog test database, which includes test cases covering all the features of the peripheral blocks and the basic features of the DSP core.
SystemVerilog assertions [6] were used to check the compliance of different blocks with the common industry standards (I²C, I²S, AMBA, etc.).
Unfortunately, this is not enough to prove the DSP core's operation, because the number of different programs that can be executed on it is enormous, and covering even a fraction of the possible instruction combinations could easily take a small development team years. Therefore, a decision was made to develop a random test engine using C/C++ resources.
As a result, specialized software was developed which generates random assembly code according to the constraints defined in a given test case. Each generated program is automatically passed to the compiler and then to the software simulator; the results of software simulation are compared with the Verilog simulation results, and a pass/fail decision is made based on this comparison.
All these steps are performed automatically. Moreover, this methodology makes it possible to generate an effectively unlimited number of tests from the supplied constraints, launch them, and keep checking until a certain number of tests fail.
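The flow described above is a constrained-random equivalence campaign: generate, run on both simulators, compare. The sketch below captures its shape in a few lines; the mnemonics, register names, and the two simulator hooks are illustrative stand-ins, not the real MX76k toolchain (which, per the article, is implemented in C/C++).

```python
import random

def gen_random_program(rng, n_instructions, opcodes):
    """Generate a random assembly listing under simple constraints
    (hypothetical mnemonics and register names, for illustration)."""
    return [f"{rng.choice(opcodes)} r{rng.randrange(4)}, r{rng.randrange(4)}"
            for _ in range(n_instructions)]

def run_campaign(reference_sim, dut_sim, n_tests=100, seed=1):
    """Run each random program on the software (reference) simulator and
    on the Verilog (DUT) simulator, and record any mismatching tests."""
    rng = random.Random(seed)
    failures = []
    for t in range(n_tests):
        program = gen_random_program(rng, 16, ["mac", "add", "sub", "mov"])
        if reference_sim(program) != dut_sim(program):
            failures.append(t)
    return failures   # an empty list means the campaign passed
```

Seeding the generator makes every failing test reproducible, which is what turns a random mismatch into a debuggable test case.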
To ensure proper cooperation of the digital section with the third-party analogue blocks (PLL, ADC, DAC), the simulation test bench was supplied with Verilog models of these blocks, and appropriate test cases were developed.
The drawback of any kind of Verilog simulation is that it is slow and cannot model real-world conditions very accurately. To compensate for these disadvantages, the entire project was prototyped on an FPGA. This type of emulation was very useful for detecting problems on asynchronous paths and at clock domain boundaries. In addition, it allowed more complex applications, such as MaxxAudio 2, to be launched and their operation checked in real time.
Finally, a number of the Verilog tests were selected for gate-level simulation to prove system operation at the silicon level.
As a result of all these efforts, the first silicon implementation of the audio platform was successful and did not require any fixes, even at the metal layers.
5. Conclusion
In this paper, we have presented an SoC designed to perform audio processing tasks in various applications. Table 1 shows the results of implementing the system with different technologies, including ASIC and FPGA.
Table 1. System implementation results
Technology              Frequency (MHz)
Xilinx Virtex 6 FPGA    80
TSMC 160nm              125
Fujitsu 90nm            250
An ASIC version of the system is scheduled to go into mass production in autumn 2011.
The introduced audio platform can be extended to form part of a more complex SoC, where several such cores run in parallel with the main CPU. The soft nature of the core allows further application-specific modifications and improvements.

6. References
[1] MaxxAudio, Waves Audio Ltd., Tel-Aviv, 2011. http://www.maxx.com/Content.aspx?id=739
[2] I²S bus specification, Philips Semiconductors, 1996.
[3] I²C bus specification and user manual, NXP Semiconductors, 2007.
[4] IEEE Standard Test Access Port and Boundary-Scan Architecture, IEEE Std 1149.1-2001, IEEE Computer Society, 2001.
[5] AMBA 2 Specification, ARM, 1999.
[6] IEEE Standard for SystemVerilog: Unified Hardware Design, Specification and Verification Language, IEEE Std 1800-2005, IEEE Computer Society, 2005.