High Performance Audio Processing SoC Platform

Denis Muratov, Waves Audio, UDC
Vladimir Boykov, Waves Audio, UDC
Yuri Iskiv, Waves Audio, UDC
Igor Smirnov, Waves Audio, UDC
Sergey Berdyshev, SevNTU, [email protected]
Dr. Valeriy Vertegel, SevNTU, [email protected]
Dr. Yuri Gimpilevich, Sevastopol National Technical University, Ukraine, [email protected]
Gilad Keren, Waves Audio, Israel, [email protected]
Abstract
This article describes a new ASIC hardware platform (MX76k), which will be used to run current audio enhancement algorithms from Waves Audio (MaxxAudio 3) on a variety of consumer electronics devices. The platform can also be extended for use in other audio applications.
1. Introduction
Modern systems are becoming ever more complex; more and more blocks can be integrated inside a single IC. This trend also affects audio systems, which are becoming increasingly sophisticated, while the market demands lower-cost solutions, which means that entire systems must be built inside a single IC.
Every digital audio processing system has to execute DSP algorithms, which is usually done using processor cores optimized for these tasks: DSPs.
While there are many different DSP cores available on the market, not all of them can be used as a soft IP core inside a single ASIC together with other blocks. Available cores do not always meet the cost, scalability, performance and security needs of a particular application. These limitations sometimes force software developers to create their own hardware platforms, optimized for their applications.
This is the case with MaxxAudio 3 [1] and the MX76k hardware platform, which is described in the subsequent chapters of this article. The second chapter describes the overall system architecture; the third is devoted to the processor core architecture and features; the fourth discusses the verification methods used to prove IC functionality; and the fifth chapter presents system implementation results.

2. Audio processing platform architecture
The developed HW platform (MX76k) is a soft IP core written in Verilog HDL, which allows the system to be easily ported to an FPGA or any ASIC library. The description below covers an ASIC implementation.
The architecture of the system is illustrated in figure 1. The heart of the platform is a 24-bit fixed-point DSP core, which executes audio processing algorithms and some management routines. Software running on it controls all the other hardware blocks, which are described below.
The clock source feeding the system can be either an external oscillator or an audio clock. It is multiplied by PLL to ensure that the DSP has enough speed to execute the selected audio processing algorithms.
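The required PLL multiplication can be estimated from the audio workload. The sketch below is a back-of-envelope clock budget; the cycles-per-sample figure is an assumed illustrative number, not a value taken from the MX76k documentation.

```python
# Illustrative DSP clock budget (the cycles-per-sample figure is an
# assumption for the sake of the example, not an MX76k specification).
SAMPLE_RATE_HZ = 48_000      # typical audio sample rate
STEREO_CHANNELS = 4          # the platform has four stereo channels
CYCLES_PER_SAMPLE = 300      # assumed DSP cycles per sample per mono channel

# Each stereo channel carries two mono samples per sample period.
required_hz = SAMPLE_RATE_HZ * STEREO_CHANNELS * 2 * CYCLES_PER_SAMPLE
required_mhz = required_hz / 1e6   # 115.2 MHz under these assumptions
```

Under these assumed numbers the workload fits within the 125 MHz achieved on the TSMC 160nm process (see Table 1), which is the kind of margin the PLL multiplier must guarantee.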
978-1-4577-1958-5/11/$26.00 ©2011 IEEE
Optionally, an audio clock may be used as the DSP clock to reduce noise on the die when the ADC is in use.
Figure 1. SoC architecture
The system has four input and four output stereo audio channels. The digital audio stream is exchanged between the DSP and external devices by means of an I²S interface [2]. Several I²S lines are multiplexed with the ADC and DAC, which allows the system to have analogue inputs and outputs.
External configuration of the system is managed using an I²C interface [3]. The device can operate as an I²C master, reading its initial configuration from an external EEPROM, or as a slave. In the latter case, it can dynamically interact with an external host.
One peculiarity of the I²C module is that, in slave mode, it operates using a clock extracted from the SCL and SDA pair of lines. This allows the I²C module to operate when all other clocks are switched off and the IC is in a deep power-down state; in this case, the external host can wake up the system by sending the appropriate I²C commands even though the PLL clock is not available.
All I/O pins of the system can be alternatively used as GPIOs.
3. DSP core architecture
As mentioned above, the heart of the system is a 24-bit DSP core with fixed-point arithmetic. It is optimized to run Waves MaxxAudio algorithms.
The architecture of the core is illustrated in figure 2. The DSP has a Harvard architecture with a 6-stage pipeline, which allows a MAC instruction, together with reading data from on-chip memory and updating addresses, to execute in a single clock cycle. The DSP core has the following primary features:
- 24x24-bit multiplier;
- 56-bit accumulator and barrel shifter;
- Four 24-bit ALU registers for instruction source operands and two 56-bit registers to store the accumulator result;
- Eight 24-bit address index registers, which can be updated using linear, reverse-carry and modulo-N arithmetic rules;
- Interrupt and hardware loop support.
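The three address-update rules are standard DSP addressing modes: linear for arrays, modulo-N for circular delay-line buffers, and reverse-carry (bit-reversed) for FFT data reordering. The following is a behavioural sketch of what the AGU computes, assuming the usual textbook semantics of these modes; function names and the wrap width are illustrative, not the real register-level behaviour.

```python
def linear_update(addr, step):
    """Linear post-update: addr := addr + step, wrapping at 24 bits."""
    return (addr + step) & 0xFFFFFF

def modulo_update(addr, step, base, n):
    """Modulo-N update for a circular buffer of length n starting at base,
    as used for audio delay lines."""
    return base + (addr - base + step) % n

def reverse_carry_update(addr, step, bits):
    """Reverse-carry add over the low 'bits' bits: the carry propagates
    from MSB toward LSB, yielding bit-reversed (FFT) address order."""
    rev = lambda x: int(f"{x:0{bits}b}"[::-1], 2)
    mask = (1 << bits) - 1
    return rev((rev(addr & mask) + rev(step & mask)) % (1 << bits))

# Stepping by N/2 = 4 through an 8-point buffer visits the addresses
# in bit-reversed order: 0, 4, 2, 6, 1, 5, 3, 7.
order, a = [], 0
for _ in range(8):
    order.append(a)
    a = reverse_carry_update(a, 4, 3)
```

The modulo mode lets a delay line advance without an explicit wrap test, and the reverse-carry mode removes the separate bit-reversal pass an FFT would otherwise need.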
Figure 2. MX76k DSP core architecture
DSP pipeline stages are listed below.
1. Instruction fetch stage 1: address phase. The address of the instruction to be decoded is set on the program memory address bus.
2. Instruction fetch stage 2: data phase. The instruction code is latched in the instruction register for subsequent decoding.
3. Instruction decode stage.
4. Data memory address generation stage.
5. Execution stage 1: multiply phase. Multiply and complex shift operations are executed in this clock cycle. The address of the memory cell to be accessed by the instruction is set on the DSP data memory address bus.
6. Execution stage 2: accumulate phase. Add, subtract and logic operations are executed in this clock cycle. The instruction execution result is latched into a register or the addressed memory cell.
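The six stages above overlap across consecutive instructions, which is what sustains one MAC per cycle once the pipeline is full. The toy model below illustrates that overlap; it is a sketch of the scheduling idea only, not the RTL, and the stage names are paraphrased from the list above.

```python
# Toy model of the 6-stage pipeline: instruction i occupies stage s
# at cycle i + s, so a new instruction retires every cycle once the
# pipeline has filled.
STAGES = ["fetch-addr", "fetch-data", "decode",
          "mem-addr-gen", "exec-multiply", "exec-accumulate"]

def timeline(n_instructions):
    """Return (cycle, instruction, stage) events for n instructions."""
    return [(i + s, i, stage)
            for i in range(n_instructions)
            for s, stage in enumerate(STAGES)]

events = timeline(3)
retire_cycles = [c for c, i, st in events if st == "exec-accumulate"]
# Three MACs finish in 8 cycles (0..7) instead of 18 sequentially.
```

This also makes the motivation for stage 4 visible: the data memory address must be ready one cycle before the multiply phase drives it onto the data memory address bus.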
The four primary blocks constituting the core are described below:
ALU (Arithmetic-Logic Unit): all DSP arithmetic is located here.
AGU (Address Generation Unit): calculates new values and updates the data memory address registers.
PAGU (Program Address Generation Unit): calculates the new value of the program counter and updates it and the related registers.
PCU (Program Control Unit): this is one of the most complex blocks and is effectively the brain of the system. It performs instruction decoding and supplies the other blocks with control signals. In addition, the PCU contains the PIC (Program Interrupt Controller), which detects interrupts from peripheral devices, sorts them according to their priority, and notifies the PFCU (Program Flow Control Unit) that it must change the instruction execution flow accordingly.
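The PIC's sort-and-notify step amounts to a fixed-priority arbitration over the pending interrupt lines. The sketch below models that behaviour; the interrupt names and the priority table are assumptions for illustration, not the real MX76k interrupt map.

```python
def select_interrupt(pending, priorities):
    """Pick the highest-priority pending interrupt, as the PIC does
    before notifying the PFCU. Returns None when nothing is pending.
    Lower priority number = more urgent (an assumed convention)."""
    candidates = [irq for irq in pending if irq in priorities]
    if not candidates:
        return None
    return min(candidates, key=lambda irq: priorities[irq])

# Hypothetical priority table; the real peripheral set and ordering
# are not specified in the article.
PRIORITIES = {"i2c": 0, "i2s_rx": 1, "i2s_tx": 2, "timer": 3, "gpio": 4}

winner = select_interrupt({"timer", "i2s_rx"}, PRIORITIES)  # "i2s_rx"
```

Whichever line wins, the PFCU then redirects the program counter to the corresponding service routine, which is the "change the instruction execution flow" step described above.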
A network of internal data buses allows multiple data items to be moved between different DSP blocks and memories, so that arithmetic and memory load/store instructions can execute in parallel.
A Debug module is used to load and debug programs inside the DSP core. This block interfaces with external devices using a standard JTAG interface [4].
Memories are connected to the DSP core via a high performance bus (HPB) interface, which combines simplicity with a high data rate. The key feature of this interface is that it has split address and data cycles for write and read access. This is required by the DSP pipeline architecture in order to exchange data with memory without inserting stalls.
A write buffer is implemented for memories which require high-speed access but do not support this split-cycle write procedure.
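The idea of such a write buffer is that the core posts a write and keeps running, while the buffered write is retired into the slow memory later. The following is a minimal behavioural sketch of that idea, assuming a single-entry buffer with read forwarding; the real hardware depth and drain policy are not specified in the article.

```python
class WriteBuffer:
    """One-entry write buffer in front of a slow memory (a sketch of
    the concept, not the MX76k RTL)."""

    def __init__(self, memory):
        self.memory = memory   # dict modelling the slow RAM
        self.pending = None    # (addr, data) awaiting its data cycle

    def write(self, addr, data):
        self._drain()                 # retire any previous write first
        self.pending = (addr, data)   # core continues without stalling

    def read(self, addr):
        self._drain()                 # flush so reads observe fresh data
        return self.memory.get(addr, 0)

    def _drain(self):
        if self.pending is not None:
            a, d = self.pending
            self.memory[a] = d
            self.pending = None

ram = {}
wb = WriteBuffer(ram)
wb.write(0x10, 7)          # posted; ram is updated on the next access
value = wb.read(0x10)      # drains the buffer, returns 7
```

A real design would forward the pending data on an address match instead of flushing on every read, but the stall-avoidance principle is the same.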
Peripheral devices are connected to the core data buses either via their own high performance bus interface (X memory space) or via the slower APB bridge (Y memory space) [5].
The core is implemented in such a way as to consume minimum power: if a particular block is not used by the instruction currently executing, that block is turned off. Turning a block off is done by means of ASIC gated clock elements, which disable clock propagation through the appropriate clock tree branches. In the case of an FPGA implementation, this mechanism is disabled.

4. Verification methodology
The complexity of the digital section of the developed platform is about 75,000 ASIC gates. It does not contain any pre-verified third-party IP blocks and thus requires substantial pre-silicon verification before being passed to the fab. This chapter describes the verification approaches used to prove device operation.
The basic approach was the development of a Verilog test database, which includes test cases covering all the features of the peripheral blocks and the basic features of the DSP core.
SystemVerilog assertions [6] were used to check the compliance of different blocks with the common industry standards (I²C, I²S, AMBA, etc.).
Unfortunately, this is not enough to prove the DSP core's operation, because the number of different programs that can be executed on it is enormous, and covering even a fraction of the possible instruction combinations could easily take a small development team years. Therefore, a decision was made to develop a random test engine using C/C++ resources.
As a result, specialized software was developed which generates random assembly code according to the constraints defined in a given test case. Each generated program is automatically passed to the compiler and then to the software simulator; the results of software simulation are compared with the Verilog simulation results, and a pass/fail decision is made based on this comparison.
All these steps are performed automatically. Moreover, this methodology makes it possible to generate an effectively unlimited number of tests from the supplied constraints, launch them, and keep checking until a certain number of tests fail.
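The flow described above is a constrained-random equivalence campaign: generate, run on both simulators, compare. The sketch below captures its shape in a few lines; the mnemonics, register names, and the two simulator hooks are illustrative stand-ins, not the real MX76k toolchain (which, per the article, is implemented in C/C++).

```python
import random

def gen_random_program(rng, n_instructions, opcodes):
    """Generate a random assembly listing under simple constraints
    (hypothetical mnemonics and register names, for illustration)."""
    return [f"{rng.choice(opcodes)} r{rng.randrange(4)}, r{rng.randrange(4)}"
            for _ in range(n_instructions)]

def run_campaign(reference_sim, dut_sim, n_tests=100, seed=1):
    """Run each random program on the software (reference) simulator and
    on the Verilog (DUT) simulator, and record any mismatching tests."""
    rng = random.Random(seed)
    failures = []
    for t in range(n_tests):
        program = gen_random_program(rng, 16, ["mac", "add", "sub", "mov"])
        if reference_sim(program) != dut_sim(program):
            failures.append(t)
    return failures   # an empty list means the campaign passed
```

Seeding the generator makes every failing test reproducible, which is what turns a random mismatch into a debuggable test case.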
To ensure proper cooperation of the digital section with the third-party analogue blocks (PLL, ADC, DAC), the simulation test bench was supplied with Verilog models of these blocks, and appropriate test cases were developed.
The drawback of any kind of Verilog simulation is that it is slow and cannot model real-world conditions very accurately. To compensate for these disadvantages, the entire project was prototyped on an FPGA. This type of emulation was very useful for detecting problems on asynchronous paths and at clock domain boundaries. In addition, it allowed more complex applications, such as MaxxAudio 2, to be launched and their operation checked in real time.
Finally, a number of the Verilog tests were selected for gate-level simulation to prove system operation at the silicon level.
As a result of all these efforts, the first silicon implementation of the audio platform was successful and did not require any fixes, even at the metal layers.
5. Conclusion
In this paper, we have presented an SoC designed to perform audio processing tasks in various applications. Table 1 shows the results of implementing the system with different technologies, including ASIC and FPGA.
Table 1. System implementation results
Technology              Frequency (MHz)
Xilinx Virtex 6 FPGA    80
TSMC 160nm              125
Fujitsu 90nm            250
An ASIC version of the system is scheduled to go into mass production in autumn 2011.
The introduced audio platform can be extended to form part of a more complex SoC, where several such cores run in parallel with the main CPU. The soft nature of the core allows further application-specific modifications and improvements.

6. References
[1] MaxxAudio, Waves Audio Ltd., Tel-Aviv, 2011. http://www.maxx.com/Content.aspx?id=739
[2] I²S bus specification, Philips Semiconductors, 1996.
[3] I²C bus specification and user manual, NXP Semiconductors, 2007.
[4] IEEE Standard Test Access Port and Boundary-Scan Architecture, IEEE Std 1149.1-2001, IEEE Computer Society, 2001.
[5] AMBA 2 Specification, ARM, 1999.
[6] IEEE Standard for SystemVerilog: Unified Hardware Design, Specification and Verification Language, IEEE Std 1800-2005, IEEE Computer Society, 2005.