FPGAs Power Net-Centric - Xilinx · Letter from the Publisher Customer Collaboration, Teamwork Key to Next-Gen FPGA Development…4 Xpert OpinionWhy Software Programmability of Electronics

Issue 69Fourth Quarter 2009

Xcell journalXcell journalS O L U T I O N S F O R A P R O G R A M M A B L E W O R L DS O L U T I O N S F O R A P R O G R A M M A B L E W O R L D

INSIDE

Make MicroBlaze Processing Roar With Hardware Acceleration

FPGAs Help CERN TrackParticles Approaching Speed of Light

Hardware Trumps Software in Medical Device Design

Taming Power Draw inConsumer MPUs

INSIDE

Make MicroBlaze Processing Roar With Hardware Acceleration

FPGAs Help CERN TrackParticles Approaching Speed of Light

Hardware Trumps Software in Medical Device Design

Taming Power Draw inConsumer MPUs

www.xilinx.com/xcell/

FPGAs Power Net-CentricBattlefield on Many Fronts

FPGAs Power Net-CentricBattlefield on Many Fronts

1 . 8 0 0 . 3 3 2 . 8 6 3 8www.em.avnet.com

Copyright© 2008, Avnet, Inc. All rights reserved. AVNET and the AV logo are registered trademarks of Avnet, Inc. All other brands are the property of their respective owners.

Prices and kit configurations shown are subject to change.

Xilinx®Spartan

®

-3A Evaluation Kit

The Xilinx® Spartan®-3A Evaluation Kit provides an

easy-to-use, low-cost platform for experimenting

and prototyping applications based on the Xilinx

Spartan-3A FPGA family. Designed as an entry-level

kit, first-time FPGA designers will find the board’s

functionality to be straightforward and practical,

while advanced users will appreciate the board’s

unique features.

Get Behind the Wheel of the Xilinx Spartan-3A Evaluation Kit and take

a quick video tour to see the kit in action (Run time: 7 minutes).

Ordering Information

Part Number Hardware Resale

AES-SP3A-EVAL400-G Xilinx Spartan-3A Evaluation Kit $39.00* USD (*Limit 5 per customer)

Take the quick video tour or purchase this kit at: www.em.avnet.com/spartan3a-evl

Target ApplicationsGeneral FPGA prototyping»MicroBlaze» ™ systems

Configuration development»USB-powered controller »Cypress» ® PSoC® evaluation

Key FeaturesXilinx XC3S400A-4FTG256C »Spartan-3A FPGA

Four LEDs »Four CapSense switches »I» 2C temperature sensor

Two 6-pin expansion headers »20 x 2, 0.1-inch user I/O header »32 Mb Spansion» ® MirrorBit®

NOR GL Parallel Flash

128 Mb Spansion MirrorBit »SPI FL Serial Flash

USB-UART bridge »I» 2C port

SPI and BPI configuration »Xilinx JTAG interface »FPGA configuration via PSoC» ®

Kit IncludesXilinx Spartan-3A evaluation board »ISE» ® WebPACK™ 10.1 DVD

USB cable»Windows» ® programming application

Cypress MiniProg Programming Unit»Downloadable documentation »and reference designs

Toggle among banks of internal signals for incremental real-time internal measurements without:

See how you can save time by downloading our free application note.

www.agilent.com/find/fpga_app_note

Quickly see inside your FPGA

u.s. 1-800-829-4444 canada 1-877-894-4414© Agilent Technologies, Inc. 2009

Also works with all InfiniiVision

and Infiniium MSO models.

L E T T E R F R O M T H E P U B L I S H E R

Xilinx, Inc.2100 Logic DriveSan Jose, CA 95124-3400Phone: 408-559-7778FAX: 408-879-4780www.xilinx.com/xcell/

© 2009 Xilinx, Inc. All rights reserved. XILINX, the Xilinx Logo, and other designated brands includedherein are trademarks of Xilinx, Inc. All other trade-marks are the property of their respective owners.

The articles, information, and other materials includedin this issue are provided solely for the convenience ofour readers. Xilinx makes no warranties, express,implied, statutory, or otherwise, and accepts no liabilitywith respect to any such articles, information, or othermaterials or their use, and any use thereof is solely atthe risk of the user. Any person or entity using suchinformation in any way releases and waives any claim itmight have against Xilinx for any loss, damage, orexpense caused thereby.

PUBLISHER Mike [email protected]

EDITOR Jacqueline Damian

ART DIRECTOR Scott Blair

DESIGN/PRODUCTION Teie, Gelwicks & Associates1-800-493-5551

ADVERTISING SALES Dan [email protected]

INTERNATIONAL Melissa Zhang, Asia [email protected]

Christelle Moraga, Europe/Middle East/[email protected]

Yumi Homura, [email protected]

SUBSCRIPTIONS All Inquirieswww.xcellpublications.com

REPRINT ORDERS 1-800-493-5551

Xcell journal


Customer Collaboration, Teamwork Key to Next-Gen FPGA Development

t the year-and-a-half mark, I’ve been here at Xilinx long enough to have witnessed first-hand the remarkable effort the company and its network of partners undertake to con-ceive and design a next-generation FPGA, and the invaluable role you, our customers,

play in its creation. The many steps involved in developing one FPGA in pace with Moore’s Lawis remarkable. Doing two FPGAs at once, as Xilinx did this year, and then creating the TargetedDesign Platform strategy to automate customer flows is extraordinary.

Many years before the latest FPGA ends up on your desk, a small army of Xilinx productdevelopment experts visit a selection of customers, potential customers and even former cus-tomers to understand the requirements of their next-generation systems, what functionality theyplan to include and, in the case of the ex-customers, why they switched to an alternate approachor device. We then vet the findings against the ongoing feedback gathered by Xilinx FAEs andsales personnel.

Performing this due diligence with customers across multiple markets provides a basis formaking informed decisions about what functions to hardwire into next-generation FPGAs, whatsoft IP to offer (or which IP partners to work with) and what design tool functionality design-ers will need. Of course, this undertaking is tempered by the fact that to be successful commer-cially, an FPGA architecture’s feature set must serve the needs of either a broad set of customersor of those playing in high growth markets—or both.

While the customer interviews are under way, the financial and manufacturing teams aredoing their own extensive analysis of independent vertical-market and economic data as well asfoundry and process node capabilities. Collating all this research, the development experts andsilicon architects then start to formulate an architecture and associated tool and IP requirementspecifications. A big part of the architectural-specification process is evaluating which of themany new, patented technologies our engineers have concocted to incorporate. So rich are thechoices that the biggest challenge by far is determining what not to include.

Indeed, often a given circuit or architecture could be really cool from a hardware-engineeringor academic standpoint. However, if the software can’t take advantage of it, or if the new circuitryis too complicated for a broad set of users and including it would severely jeopardize a releasewindow, then from a business perspective, it’s wise to omit it.

After several cycles of refinement, including reviews with Xilinx’s Customer Advisory Boardand Field Advisory Board, we finalize the overall architecture specification and the chip archi-tects and designers kick off their intensive design work. Simultaneously, the tool developmentgroup begins to assess what tools will be needed to support the size of the new architecture andany new circuitry the FPGA incorporates; the IP group transitions legacy intellectual propertyand develops (or partners for) new IP; and the development board group starts formulating thebase and market-specific boards. With the advent of the Targeted Design Platforms, these groups

AYears of hard work and coordinated effort underlie a new Xilinx architecture.Even before it’s launched, plans are under way for the next one.

Mike SantariniPublisher

must now work even more closely together to create domain- and market-specific platforms(daughter card development boards, IP, reference designs and other documementation) to auto-mate the more mundane design blocks and tasks, allowing customers to focus on the parts oftheir design that add value.

O nce the foundry delivers first silicon, another cadre of Xilinx engineers does extensivecharacterization and testing. Xilinx’s large quality-assurance group works closely withour foundries and prides itself on being thorough and rigorous, to the point of frown-

ing upon a failure rate of one part per million. At the same time, Xilinx trains its FAEs, sales staffand distributors on the feature set and benefits of our new devices, tools and IP.

Even now, the design team doesn’t get to rest on its laurels. Besides ironing out any minorkinks in the silicon, its next job is to craft automotive- and A&D-grade versions of the prod-uct family, adding circuitry and in some cases new semiconductor materials to the parts—and,thus, further layers of reliability to the silicon. Eventually these devices will spark an even moreextensive round of testing, such as temperature-range reliability for automotive and A&D partsalong with exhaustive single-event upset testing, in which our SEU gurus fire charged particlebeams (neutrons, et al.) at Xilinx devices to test their fault tolerance and the functionality oferror-correction software.

Finally, as each product family rolls out of testing, our marketing organization—of which Iam a part—pulls together the press materials, promotion collateral and support documentation,updates the Web site and launches the new offering—silicon, boards, tools and related IP—to thepublic. By the time we’re doing that, the due-diligence and even specification creation processesfor the next new generation are well under way.

Even as a longtime trade press editor and industry watcher, I didn’t fully appreciate all thehard work and coordinated effort it takes to garner the customer input that really shapes adevice’s future and get a new product generation up and running. The tripling of effort behindthe release of the Virtex®-6 and Spartan®-6 FPGA families and the Targeted Design Platformsmethodology shows that the company that invented the FPGA is relentlessly striving to perfectthe devices you use to create a plethora of remarkable products. These achievements would not

have been possible without your invaluable feedback and instruction. So thanks and congratula-tions—I can’t wait to see how you put the fruits of our immense collaboration to great use.

SUSA

N C

ART

ER

C O N T E N T S

VIEWPOINTS XCELLENCE BY DESIGN APPLICATION FEATURES 1818

2828

Letter from the Publisher Customer Collaboration, Teamwork Key to Next-Gen FPGA Development…4

Xpert Opinion Why Software Programmabilityof Electronics Is a Game Changer…16

Xcellence in Consumer ProductsHow to Tame the Power Beast in Consumer Handheld MPUs…18

Xcellence in Automotive & ISM FPGAs Help Measure Trajectory of Particlesin CERN’s Proton Synchrotron…22

Xcellence in Automotive & ISM Hardware-Centric Approach Builds More Reliable Medical Devices…28

Xcellence in Wireless CommunicationsImplementing Wireless LAN Interface in an FPGA…34

1616

Cover StoryXilinx FPGAs to Power Next-Generation Networked Battlefield 88

F O U R T H Q U A R T E R 2 0 0 9 , I S S U E 6 9

Xperts Corner Improving DDR SDRAM Efficiencywith a Reordering Controller…38

Xplanation: FPGA 101 Splayed Design Makes Pin Fin Heat Sinks Cooler…42

Ask FAE-X How to Supercharge MicroBlazeProcessing with Hardware Acceleration…46

XTRA READING

THE XILINX XPERIENCE FEATURES

4242

4646

5656

Xamples… A mix of new and popular application notes…52

Xtra, Xtra The latest Xilinx tool updates and patches, as of September 2009…54

Tools of Xcellence A bit of news about our partners and their latest offerings…56

Xclamations! Share your wit and wisdom by supplying a caption for our wild and wacky artwork…62

Product Selection Guide…63

ENTER TO WIN

AN SP601 EVAL KIT!

DETAILS, PAGE 62

8 Xcell Journal Fourth Quarter 2009

Xilinx FPGAs to Power Next-Generation Networked Battlefield

Xilinx FPGAs to Power Next-Generation Networked Battlefield FPGAs will play several pivotal roles in the GIG 2.0 network, speeding threat detection and the response time of military forces worldwide.FPGAs will play several pivotal roles in the GIG 2.0 network, speeding threat detection and the response time of military forces worldwide.

COVER STORY

by Mike Santarini Publisher, Xcell JournalXilinx, [email protected]

Roughly a decade ago, the U.S. governmentand its allies established a high-speed world-wide communications network they calledthe Global Information Grid. In the yearssince, the GIG has helped the military andvarious intelligence agencies to quickly iden-tify and assess threats around the world andcoordinate appropriate responses. What’smore, the GIG allows U.S. and allied forcesto communicate with one another on thebattlefield and thus perform operationsmore effectively. This speedy network hasalso enabled the Defense Department todeploy ever-more-advanced technologies,most notably unmanned spy and assaultvehicles that operators can control—in nearreal-time—from continents away.

But for many threat detection and bat-tlefield scenarios, near real-time simply isn’tfast enough, and so today the U.S. DefenseDepartment is in the early stages of givingthe GIG a massive performance overhaul.A major part of this overhaul will be theadoption of newer network equipment,which will speed data rates from 4Gbits/second today to 40 Gbps for wirelesscommunications, bringing real-time com-munications closer to reality.

Amit Dhir, senior director of Xilinx’saerospace and defense business, said thatthis move to GiG 2.0 will be accompaniedby a massive effort to add localized com-puting to the many systems and person-nel—satellites, airplanes, tanks, ships,unmanned aerial vehicles (UAVs) and footsoldiers—at the edge of the network. Theaim is to have UAV, satellite or observationequipment not only take video and photo-graphs, but perform some of the filteringand data analysis locally, on site, beforesending it back to analysts in the Pentagonor a local command center. This will makethe analysts more effective and efficient intheir jobs, paring down what’s commonlycalled “sensor to shooter” time.

“Xilinx has a rich, 20-year history serv-ing the A&D community, and we play asignificant role in the GIG’s current net-

backbones, said Loring Wirbel, longtimecommunications journalist, blogger andauthor of the book Star Wars: U.S. Tools ofSpace Supremacy (Pluto Press). “This coin-cides with the Pentagon’s increased use ofsemi-autonomous robotic vehicles. Wherefive years ago, UAVs were piloted fromcentralized bases like Creech Air Force Base

in Nevada and flown via joysticks at work-station consoles, today UAVs are flownfrom laptops with touchscreens, using spe-cial Google Earth-like applications. Withintwo years, those applications will be portedto devices the size of smart phones.”

In light of those changes, said Wirbel,“the need for the increased performance ofGIG 2.0 is obvious, and so is the need fordistributed intelligence and more paral-lelism in ever-smaller platforms—areaswhere FPGAs certainly excel.”

Networked-Battlefield Basic TrainingToday, the networked battlefield comprisesmany elements, starting with a sophisticat-ed fiber-optic network connecting variousassets to the Pentagon. The network equip-ment securely relays data to and from satel-lites, which in turn send it to a series oflocalized mobile networks or communica-tions bubbles around the world and, moreimportantly, to front-line military units—ships, submarines, aircraft, UAVs, tanks,trucks, artillery and even individual footsoldiers—enabling commanders and thosethey lead to communicate in near real-timeand more effectively coordinate militaryoperations (Figure 1).

Each element of the GIG has its ownunique operating requirements and itsindividual ways to communicate with therest of the network. And experts say that

work infrastructure,” said Dhir. FPGAscurrently are used in the GIG’s wired net-work equipment as well as the communica-tions systems of multiple satellites andUAVs such as the Global Hawk. They arefound in the communications systems onthe Joint Strike Fighter plane and manyother aircraft and land vehicles, as well as

the guidance systems of numerous genera-tions of missile platforms.

“All of these systems were developed 10or more years ago, because the defensedevelopment cycle is relatively slow,” saidDhir. Some of this technology, developedin-house by defense contractors, pairsASICs with early FPGA technology.

“Today’s FPGAs have advanced tremen-dously,” said Dhir. “In fact, today FPGAsare systems-on-chips.” Today’s devices havemultimillion-gate logic capacity andextreme logic performance, multigigabitI/O, DSP and embedded processing, mas-sive on-board memory, scalable security(encryption and tampering) and reliabilitythat users can further enhance to suit theirapplication requirements. “Where 10 yearsago, there was a viable reason to pair anFPGA with an ASIC for GIG-related appli-cations, today FPGAs are so advanced on allfronts, there simply is no good reason to goto the trouble and expense of designing andmanufacturing an ASIC,” Dhir said.“FPGAs can do the job of ASICs and theycan do more—they can be reprogrammed.FPGAs are going to be key to helping theDefense Department provide the rightinformation at the right time to the edge ofthe network—the front-line soldier.”

“The assumptions of the GIG 2.0 are allbased on distributed intelligence and wire-less real-time links,” rather than optical

Fourth Quarter 2009 Xcell Journal 9

COVER STORY

‘FPGAs will be key in helping the Defense

Department provide the right information

at the right time to the edge of the

network—the front-line soldier.’

each will greatly benefit from the use ofmodern FPGAs, which can function as thebasis for platforms that users can retask toserve a number of GIG-related applications.

Upgrading Command and Control The Pentagon’s command-and-controlnetwork is the center of the United States’military and intelligence operationsworldwide. As such, it is the GIG’s cen-tral communications hub. This networkuses high-speed fiber optics to communi-cate with assets in North America and ahigh-speed wireless network to issue com-

mands to a series of satellites, which inturn relay data to and from military andintelligence assets and allies worldwide.

Like any modern commercial commu-nications network, the system is com-posed of sophisticated high-speedcompute farms, servers and other networkequipment that must be highly reliable,secure and extremely fast. The networkmust be able to receive and transmit datavia many different communications pro-tocols. As such, the same Xilinx FPGAsthat manufacturers of commercial commsequipment have been using for decades

also play a key role in the Pentagon’s net-work. The major difference is that thePentagon’s network requires additionalmeasures for security and reliability (seesidebar, “FPGAs Make NetworkedBattlefield Safer, More Secure”).

Delfin Rodillas, senior manager for theXilinx A&D group’s avionics, space andhigh-performance computing segments,said that as the GIG moves to 2.0, XilinxFPGAs will be even more central.

“A key element of the networked battle-field is the ability to do packet processing atvery high line rates,” said Rodillas.


COVER STORY


“Bringing packets into and out of yourchip at high throughputs is critical, as is theability to inspect the contents of these flowsand do some processing on these packets,whether it is classification or traffic man-agement. A lot of our traditional commu-nications customers have chosen FPGAsover ASICs for many years now because ofthe flexibility and value FPGAs bring forpacket throughput and processing.”

Traditionally, in the GIG as well as thecommercial sector, communications com-panies have paired communicationsprocessors, which direct incoming data

streams, with various banks of FPGAs thattranslate protocols. But as Xilinx FPGAsget larger and transceiver speeds rise, ven-dors are looking to integrate more of theprocessing functions into the FPGAs them-selves, to further trim system latency andreduce the overall bill of materials (seecover story on wired communications inXcell Journal Issue 67). Similarly, in theGIG 2.0 buildout, the Defense Departmentcan take advantage of these increasingserial line rates—3.125 Gbps in theXilinx® Spartan®-6 FPGA to 11 Gbpsand beyond in the Virtex®-6 HXT—not

only to build optimized wired networksbut also to speed up the radios andreceivers of the many systems that willconnect wirelessly to them.

Pioneering Software-Defined Radio The ability to rapidly receive and translatevarious signals and protocols, and to do soreliably, is essential in the GIG, saidManuel Uhm, director of Xilinx’s WirelessCommunications Group and chair of themarkets committee of industry group theSDR Forum. Uhm also sits on the forum’sboard of directors. A given intelligence ormilitary action may involve communica-tion and coordination with allies, who like-ly use their own secure communicationsprotocols and wireless waveforms for theirown networks.

“The GIG actually has to be a meshednetwork, where everyone is connected toeveryone else in multiple ways,” said Uhm.“For example, if a unit of allied paratroop-ers enters the battlefield, the other units onthe battlefield and in the network need toidentify the paratroopers as friendly. Theparatroopers also need to be able to com-municate with the rest of the troops andcommanders in the action.”

Uhm said it’s important that the unit belinked to nearby troops in multiple ways—including via direct connection, by amobile ad hoc (local) network, as well asthrough satellite connections. “If, for exam-ple, a unit is cut off from direct communi-cations because it is in a canyon, the localad hoc network or, failing that, the satellitenetwork should still make sure they are onthe GIG’s communications grid,” he said.“If they fall off the grid or if a given alliedunit’s system doesn’t recognize the protocol,the allied unit can be mistakenly identifiedand targeted as an enemy.”

To create this connected-to-everyone-in-multiple-ways network ability, the DefenseDepartment has been an early adopter ofwhat’s commonly known as software-defined radio (SDR), in which a system’ssoftware reprograms the FPGA and otherdevices in a given system’s radio on the fly toreceive and send signals in various protocols(see cover story on next-generation wirelessnetworks in Xcell Journal Issue 65).

COVER STORY

Figure 1 – The Global Information Grid allowscommanders and those they lead to better assessthreats and coordinate responses to them.

In fact, thanks in large part to XilinxFPGAs, SDR, with its redundant reliableconnections, is one area where the DefenseDepartment has led the commercial com-munication sector. The GIG already hasdeployed FPGA-based SDR systems inmultiple points on the networked battle-field. As the GIG 2.0 comes online, SDRwill be mandatory for all points in the net-worked battlefield.

Ian Land, senior manager of A&Dproduct planning at Xilinx, points out thatbecause the company continues its relent-less pace to offer the most advanced andhighest-capacity commercial-, defense- andspace-grade FPGAs in the industry, cus-tomers will be able to integrate multiple

GIG-related functions onto a single FPGA.That will further cut BOM costs and bat-tery/power requirements, while also reduc-ing the size, weight, power and cooling(SWAP-C) of systems.

Xilinx has the industry’s most advancedand broadest range of FPGAs, from thespace-grade SEU/radiation-tolerant Virtex-4QV, to Virtex-5Q defense-grade FPGAsand commercial-grade 40-nanometerVirtex-6 devices, ranging all the way to thelower-cost, lower-power, yet very advanced45-nm Spartan-6 family. The latter FPGAsboast a generous amount of 3.125-Gbps seri-al I/O. Xilinx also gave the Spartan-6 FPGAsAES encryption with Block RAM andeFUSE support, better suiting them to the

small battery-operated, handheld devices thatconnect to the GIG.

In tandem with this year’s introductionof its Virtex-6 and Spartan-6 FPGAs,Xilinx took its technology a step further byrolling out its Targeted Design Platformstrategy (see cover story in Xcell JournalIssue 68). The company offers domain-and market-specific development boards,tools, IP, reference designs and related ref-erence materials to automate the more-mundane design tasks, allowing customersto focus the majority of their efforts on theareas that will differentiate their designsand make them unique. The A&D groupis offering a Single Chip Crypto market-specific Targeted Design Platform (see


COVER STORY

Depending on where in the world they operate and whatfunction they perform, some of the systems in the GIG

may have much more stringent reliability requirements thanothers. Many devices, especially those that operate in spaceor in high altitudes, must be able to handle single-eventupsets (SEUs), caused when charged particles collide withelectronic circuitry.

The number of charged particles in the atmosphere (andthus the chance of encountering an SEU) increases with alti-tude (see figure). If a given part is not radiation tolerant ordoesn’t employ at least some error-correction mitigationscheme, a single charge can cause reliability problems rangingfrom delay and the introduction of errors or false signals intoa bitstream to actually latching up a device and potentiallyfreezing an entire system (see http://www.xilinx.com/esp/aero_def/radiation_effects.htm).

To help, Delfin Rodillas, senior manager for the XilinxA&D group’s avionics, space and high-performance com-puting segments, said Xilinx offers customers a range ofSEU mitigation options, from what he refers to as radia-tion-hardened-by-test to radiation-hardened-by-designtechnologies. “We have done a lot of testing to characterizethe FIT [failure in time] rate of our devices under neutronSEU effects,” said Rodillas. “In addition, Xilinx has alsodeveloped reference designs and IP to help with SEU detec-tion and correction. For example, we have a Virtex-5 SEU

macro that leverages some of the built-in features of the sil-icon.” They include cyclic redundancy checks and frameECC. In addition, “we have a macro for error correction.This further increases the reliability of our devices if theyencounter SEUs,” he said.

Rodillas said Xilinx’s Triple Module Redundancy (TMR)tool allows customers to design further layers of SEU protec-tion and reliability into their products. Essentially, the tripleredundancy becomes useful if a device were to be hit by a largeor multiple charged particles at the same time, creating amultibit upset, said Rodillas. “If one circuit fails, the twoother redundant circuits do a voting function to identify, andself-correct, the leg that has been corrupted. All of this is donewithout disrupting the device operation.”

For space-borne and other extremely high-reliability appli-cations subject to radiation, Xilinx offers IP that can con-stantly correct the configuration of the FPGA, again withoutdisrupting device operation. Rodillas notes that while theseSEU mitigation and redundancy technologies, IP andmethodologies were originally developed for A&D applica-tions, they are also starting to find use in commercial systemssuch as high-speed computing and communications, whereSEU mitigation is becoming a growing requirement.

Xilinx is a leading researcher of SEEs (single-event effects)on semiconductors. Visit http://www.xilinx.com/esp/aero_def/see.htm.

FPGAs Make Networked Battlefield Safer, More SecureReliability and security are important for any application, but for the networked battlefield, they’re absolutely critical.


ht tp : / /www.xi l inx . com/e sp /aero_de f /crypto.htm) and will release a D0-254 (avion-ics reliability standard) market-specific TDP(see http://www.xilinx.com/esp/aero_def/do254.htm) to help A&D customers getproducts to market faster.

FPGAs Deliver Localized Computing In addition to upgrading the GIG net-work infrastructure, the military and itsmany contractors are just now starting toadd localized computing to the manyassets that connect to, and communicateby means of, the GIG.

Traditionally in the networked battle-field, said Land, a military or spy unit willsend data in a live feed through the net-

work to the Pentagon, where analystsdownload, decrypt and analyze it andthen respond. “A lot of customers aretelling us that a problem with sending asignal through the network is that alongthe way, it can pick up a lot of noise,” saidLand. “If the processing is done quicklyin the battlefield unit and then sent backthrough the network in digitized form, itcan save a lot of time [analysts would oth-erwise spend] cleaning up and analyzingthe photos, video images or messages atthe command-and-control center.”

Further, a spy vehicle may take severalthousands of photographs and severalhours of video. Analysts have to wadethrough all this material to find those

photos and video segments that are ofinterest. Adding more localized comput-ing or on-board processing in the surveil-lance craft itself will allow the craft tonarrow down which photos or video seg-ments are potentially of interest, and sendonly these to the analysts. Analysts wouldbe able to identify threats more quicklyand the military potentially couldrespond more rapidly and effectively tothreats as they arise. “FPGAs are great atdoing that kind of signal processing anddoing it in real-time,” said Land.

Also, doing more of the processinglocally reduces the overall amount of traf-fic on the network and improves the net-work’s overall performance.

COVER STORY

From Foxhole to Satellite In addition to reliability, security is another essential require-ment of the net-centric battlefield. Many assets connected tothe network must be able to communicate securely.

For communications, from a warfighter in a foxhole to thebig pipes of a satellite, the Defense Department has setencryption requirements at every point in the networked bat-tlefield. In addition, communication and interoperabilitybetween allied forces can be challenging, as each country usesits own encryption schemes and each has its own requirementsfor a given operational environment.

Xilinx’s A&D group pioneered a huge leap in FPGA-basedencryption back in 2006, when it delivered the industry’s firstsingle-chip encryption technology, which allows the defenseindustry to use FPGAs in cryptographic systems in a muchmore effective and efficient manner than they have in the past.

Prior to the availability of the single-chip product, crypto-graphic systems would use multiple chips to add layers ofredundancy and reliability. Those multichip solutions were notextensible, and they increased a system’s size, weight, powerand cooling (SWAP-C). Xilinx’s revolutionary one-chip cryptosolution allows customers to do all the cryptographic functionsin a single Xilinx FPGA while still meeting the governmentreliability requirements (see http://www.mil-embedded.com/pdfs/NSA.Mar07.pdf).

Cryptographic applications are only a small part of thegeneral security concerns within the GIG. Physical security,or anti-tamper technology, is another area contractors mustaddress. Xilinx, the first company to introduce bitstream

encryption, recently expanded its lead in FPGA security byintroducing bitstream authentication as well as encryptionin the Virtex-6. Besides benefiting defense applications, sil-icon enhancements to improve security, such as authentica-tion, also address commercial concerns about protection ofIP, piracy, cloning and the like. Xilinx has not only addedenhancements within the silicon, it has also invested in thedevelopment of IP that further increases the security of theFPGA. — Mike Santarini

π−

μ−γ

π0π+

n

n

n

n

n

n

p

p

p

pe−

e+

OUTERATMOSPHERE

EARTH SURFACE

AIRPLANE ALTITUDES=7200 NEUTRONS

CM2/HOUR

SEA LEVEL=20 NEUTRONS

CM2/HOURNEUTRONS CAUSE FAILURE

COSMIC RAYS

Graphic courtesy of EDN magazine.

Neutron concentration and the likelihood of encountering a single-event effect is greatest at an altitude of 50,000 feet. At 30,000 feet,

the typical cruising altitude for commercial aircraft, ICs are 300 timesmore likely to encounter an SEE than at sea level.

Expanding the Role of Unmanned Vehicles Of the many assets that connect to the GIG,unmanned vehicles will likely see the great-est boost from an increase in network per-formance and from localized computing.

Ching Hu, senior engineer of strategicapplications in Xilinx’s A&D group, saidthat over the last decade, the military hasplaced greater emphasis on the develop-ment and deployment of unmanned vehi-cles. Today the military is deploying notonly UAVs like the Global Hawk, Predatorand Reaper, but also an array of unmannedground vehicles (UGVs) and, in the case ofthe Navy, unmanned surface vehicles andunmanned undersea vehicles (respectively,USVs and UUVs).

“One of the mottoes the Air Force andthe Navy use today is ‘Persistence in thesky 24/7/365,’” said Hu. “What thatmeans is that the military wants to be per-vasive and persistent in the sky. For exam-ple, the U.S. Navy has a UAV programcalled BAMS [broad aerial maritime sur-veillance], a Global Hawk variant thatprovides coverage from multiple launch-ing sites, where they can almost cover theentire globe. At all times they have a UAVflying at 60,000 feet and taking picturescontinuously. The great thing about aUAV is that you don’t have to worry aboutit getting tired—you just have to makesure it has fuel to keep going. It is oneadvantage we didn’t have in the past.”

Hu notes that unmanned vehicles cantake on extremely dangerous and rigorous

missions that in many cases would tax theendurance of human pilots and put theirlives at risk. In some cases, UAVs performmissions that humans simply cannot phys-ically do, such as rapid and violent threatevasions that require the craft to sustain ahigh G-force level, or simply fitting intotight spaces that cannot accommodate apiloted craft.

The Pentagon is also broadening theroles of the unmanned vehicles. Where themilitary initially deployed them mainly forsurveillance, bomb detection and defusal, itis increasingly using the craft to deployordnance and actually attack targets.

Both armed and unarmed unmannedvehicles are remote controlled, “piloted” bypersonnel who in some cases may be verynear the craft’s area of operation and are run-ning it via a direct signal—or who mayreside on the other side of the globe, con-trolling the vehicle via the many connec-tions of the networked battlefield. Becauseof the distance, there is a delay between whatthe craft sees and what its controller sees.

As a result, said Hu, operations can misstargets of opportunity. Also, most UAVshave to be landed by a separate operatorclose to the runway where they are to touchdown, because the lag through the networkcould cause a distant operator to overshoota landing strip and cause a crash. As a mat-ter of government policy, UAVs cannotland at civilian airports because of the riskof crashing into commercial planes, airportstructures or nearby homes. In fact, UAVs

usually have their own designated militaryairfields, to keep soldiers and militaryequipment out of harm’s way. A crash canbe especially devastating if the vehicle is stillcarrying ordnance.

Because of this danger and the desire toshrink the sensor-to-shooter chain, the mil-itary is always looking for ways to speed upcommunications and to add more localizedcomputing and more sophisticated sensorarrays to these UAVs. The technology isdesigned to land unmanned vehicles moresafely and consistently, while enabling themto react more quickly to targets, sort data ofinterest and perform threat detection andevasion maneuvers in real-time.

Hu notes that customers designingunmanned vehicles could greatly benefitfrom Xilinx’s many years of deployment inadvanced sensor systems. In UAVs, the useof multifunction sensors teamed with local-ized computing could not only help short-en the sensor-to-shooter chain but alsoassist the craft in assessing and evadingthreats in real-time, and help it land moreprecisely, perhaps without the need of alocal operator.

“These same FPGA technologies canusher in a new era for UAVs and the rest ofthe systems connected to the networkedbattlefield,” said Hu.

Moving forward, the ever-increasingsophistication of FPGA technologies andthe inherent flexibility they provide will givethe U.S. military and its many contractorsthe opportunity to more rapidly mature andimprove all points of the networked battle-field, while streamlining communications.It’s not hard to foresee a day when, thanksto FPGA localized computing, the militarywill be able to perform threat detection andidentification and mobilize the appropriateresponse (diplomatically or, if need be, mil-itarily) so rapidly that it will defuse threatsbefore they escalate. That will bring U.S.forces closer to the ideal of military pre-paredness defined by George Washington,namely, that “There is nothing so likely toproduce peace as to be well prepared tomeet the enemy.”

For further information on Xilinx’s A&Dofferings, visit http://www.xilinx.com/esp/aerospace.htm.


COVER STORY

Figure 2 – The Global Hawk unmanned aerial vehicle typically flies reconnaissance missions at 60,000feet and can refuel in midair to stay aloft many days at a time. It is one of a growing armada of

unmanned vehicles remotely controlled via the GIG.

This new PCIe board from the Dini Group provides 8 million ASIC gatesusing two new Xilinx Virtex-6 FPGAs. The huge size and unmatched speed(1.6 Gb/s LVDS chip to chip) of these new devices give ASIC/ASSP developers far more power and performance; the Dini Group makes thatpower easy to use.The features you expect are all here:

• Easy hosting via 4-lane PCIe, USB2.0, 1000base-T Ethernet, or stand alone.

• Stuffing options (mix and match) between 6 types of the highest capacity/performance,Virtex-6 FPGAs.

• Three independent low-skew global clock networks.

• Two DDR3 SODIMM memory sockets (PC3-10600), up to 4GB in size.

• Daughter expansion cards for ARM core tiles, AD/DA, Cameras, IO breakout, etc.

Plus, new features (many still being developed) are made available by theaddition of the MV78200 Dual CPU Marvell processor, a mind-blowingmonster for data transfer and manipulation. After FPGA configuration,both 1.0 GHz CPUs, with floating point, are customizable to your application.

New logic density, new speed, and new versatility — call the Dini Groupfor easier prototyping and algorithmic acceleration.

www.dinigroup.com • 7469 Draper Avenue • La Jolla, CA 92037 • (858) 454-3419 • e-mail: [email protected]

Next Generation FPGAs are here now!

by Jacques BenkoskiUSVP Venture Partner;Executive Chairman,[email protected]

The common wisdom says that the semi-conductor industry is in serious trouble, asincreased design cost, growth in theamount of intellectual property (IP) neces-sary to put a product together and soaringmask costs combine to make any but thehighest-volume chips economically unfea-sible. This view foresees a world populatedby microprocessors and a few consumersystems-on-chip (SoCs), with the FPGAvendors focused on serving the low-volumeprice- and power-insensitive applications.

In fact, the drive to enable software pro-grammability of integrated circuits willturn that conventional view upside-down.Entering the scene are tools coming fromthe compilation world that are able to mapthe software description onto automatical-ly created hardware extensions. The elec-tronics are, in fact, software programmedand automatically compiled, without goingthrough the equivalent of assembly lan-guage that RTL represents.

The resulting architecture is clean, withan embedded microprocessor at the heartof the IC and a series of hardware accelera-tors neatly brought to bear on the appro-priate tasks. This is true of a dedicatedASSP or ASIC, but is equally applicable toa modern FPGA with embedded cores.

The Problem with MulticoreExecuting a given task in software on ageneric processor architecture, as opposedto dedicated hardware, has been proven tobe two orders of magnitude off in cost,power and performance. The pressure toreach low-power solutions is now perva-sive beyond mobile applications. With alldue respect to the embedded cores, theywould have to run at impossible speed toswallow a 5-Gbit/second stream, performlive video transcoding or aim a beam-shaping antenna array.

Although some argue that multicore isthe solution, with dedicated cores ear-marked for specific applications, the multi-core programming problem remainsstubbornly elusive beyond the simple mul-tithreading on an identical architecture.Nobody seems able to master theunbounded complexity of heterogeneousprocessing engines with different character-istics communicating over ill-characterized

buses and networks-on-chips. At the sametime, the number of software engineerscontinues to grow and the number of hard-ware engineers continues to shrink—at leaston a relative basis. Yet in most cases, com-panies give away the software as a necessarycomponent of a platform rather than as thevaluable differentiator it is made to be.

FPGA as Distributed Compute EngineThe FPGA world has been undergoing itsown revolution. No longer simply seen asgobs of glue logic, FPGAs have nowemerged as an attractive alternative imple-mentation for many applications focused onpower, price and performance. This alterna-tive is now making its way into consumerand even mobile applications. Yet FPGAsalso emerge as a fascinating distributed com-pute fabric with a regular architecture ofcomputational elements and memories.They suddenly represent a quasi-systolic-array alternative to the Von Neumann digi-


Why Software Programmability of Electronics is a Game ChangerThe architecture of the future has an embedded microprocessor at its heart and a series of hardware accelerators to handle particular tasks.

XPERT OPINION

tal processors, with a much more attractiveperformance, cost and power trade-off.

This contrast between microprocessor-or DSP processor-based designs and FPGAas a distributed compute engine is hard tounderstand. To get a grasp on it, one shouldtry to imagine the amount of hardwareresources that are being used to fetch datafrom memory, bring it to a datapath insidethe processor and then put the data back inmemory. To achieve the best performance,this Von Neumann processor architectureimplies further hardware resources, such asbranch prediction and speculative execu-tion, that are wasteful of silicon real estate aswell as being power hogs.

Now imagine that these repetitiveloops are unrolled from the time domainand mapped to a two-dimensional arrayof distributed compute and memoryresources available in an FPGA. The dataflows from one side of the array to theother and is processed efficiently—this isthe basic reason why software running ona processor is always inherently much lessefficient that its equivalent mapped ontohardware. However, historically theFPGA hardware implementation alwayshad to go through the equivalent ofassembly coding—that is, RTL-leveldesign—whereas the processors had com-pilers and high-level languages.

These trends all converge on the com-pelling need to enable software programma-bility of ICs. Not the kind ofprogrammability that the original siliconcompiler pioneers had dreamed of, but onethat has evolved from the world of very longinstruction word (VLIW) compilation andthe world of hardware accelerators.

Microprocessor designers have longknown that it pays to generate additionalinstructions to cope in hardware with eithercommon or costly software routines. Back inthe 1980s, we had Intel’s 8087 mathcoprocessor, which later evolved into themore elegant multimedia extension (MMX)to the famed Intel Architecture. Morerecently, as ARM cores became ubiquitous,designers have been quick to recognize thatthey could offload in a similar way the moretedious tasks that the embedded micro-processor could not cope with. The path

yet offering a similar clean programmingmodel. However, there is no reason toerect an unnecessary wall between the twoworlds, as many applications will be ideallyserved with an off-the-shelf SoC and acompanion FPGA as a hardware accelera-tor for specific tasks.

One can now see a continuum of solu-tions, from software running on processorsto processors with hardware accelerators onSoCs; from SoC/FPGA coprocessing solu-tions to FPGAs with cores and hardwareaccelerators; all the way to pure FPGA-as-

compute machines. The availability of aprogramming paradigm is the unifying fac-tor underlying them all.

Already today, the DSP processingcapability embedded into SoCs has sur-passed that of the discrete DSP, as notedin the “DSP Silicon Strategies” reportissued earlier this year. Meanwhile, theITRS has predicted an explosion in thenumber of embedded accelerators. It isnot clear if they are to be called “software-programmable integrated circuits” or ifthey will ever have a name. But I think ifone steps back and looks at the overalllandscape, that trend is obvious,inevitable and game changing.

Jacques Benkoski is working with several U.S.Venture Partners companies and currentlyserves as the executive chairman of Synfora.Before joining USVP, he was CEO of MontereyDesign Systems. Prior to that, he held technicaland management positions at Epic DesignTechnology, Synopsys, STMicroelectronics,IMEC and IBM. He holds a BSc in computerengineering from Technion Israel Institute ofTechnology, and an MS and PhD in computerengineering from Carnegie Mellon University.

from there to the most recent develop-ments—those that I believe are game chang-ing—was natural, at least in hindsight.

Built around embedded microprocessorsand accompanying hardware accelerators,the new architecture radically changes theconundrum that traditional ICs—bothSoCs and FPGAs—were facing by enablinga way to capture all the software horsepowerand differentiation, and to map it ontohardware. This architecture also tames theheterogeneous multicore programmingproblem, as each hardware accelerator is

seen as an accelerated subroutine. Thatmeans you can map all the software differ-entiation into hardware, minus theheadache. The work of the system softwareengineers, which contains so much of thedifferentiated and unique IP of semiconduc-tor and system companies, can be capturedelegantly. And since it is now delivered in adifferentiated, low-power, high-performancehardware package, it allows the manufactur-ers to actually get paid for their IP.

The outcome is a new wave of rapid prod-uct introductions of complex and targetedsolutions for ever-more-powerful consumerelectronics, myriads of mobile Internetdevices, gaming-platform breakthroughs andautomotive infotainment opportunities, toname only the most obvious.

Tear Down the WallThese compiler tools have an even moredramatic impact on the FPGA world, sincethe software maps cleanly using the sameparadigm on these devices’ prearchitecteddistributed compute platform. FPGAs cannow be a formidable competitor to DSPsand other dedicated machines, with a bet-ter power, cost and performance trade-off,


FPGAs can now be a formidable competitor toDSPs and other dedicated machines, with a

better power, cost and performance trade-off, yet offering a similar clean programming model.

XPERT OPINION

by Rahul V. Shah Director of Customer Solutions [email protected]

Vishesh Agrawal ASIC Design and Verification [email protected]

The consumer handheld market is growingby leaps and bounds. With more process-ing power and increased support for moreapplications, portable products are cross-pollinating with traditional computing sys-tems even as the product life cycle hasdecreased considerably in this market seg-ment. As a result, especially in this era ofeconomic slowdown, it is imperative thatnew products meet the time-to-marketwindow to gain maximum acceptance. Adecrease in product life cycles requires areduced development cycle and anincreased emphasis on reusability andreprogrammability.

The emerging handheld market is alsoseeing interesting trends in which each indi-vidual device in a family has lower volumesbut there is more customization across theseries of devices, effectively upping the totalunit volumes. The key challenge thenbecomes how to develop a system that iswidely reusable and also customizable.


How to Tame the Power Beastin Consumer Handheld MPUsUsing an FPGA is a popular way to expand the capabilities of an embedded system.Designers can reduce power consumption at the same time.

XCELLENCE IN CONSUMER PRODUCTS

These requirements have led designersincreasingly to turn to the FPGA for hand-held-product development. The FPGA hasbecome more powerful and feature-rich,while gate counts, area and frequency haveincreased. FPGA development and turn-around cycles are considerably shorter thanthose of custom ASICs, and the addedadvantage of reprogrammability can makethe FPGA a more compelling solution forhandheld embedded systems.

In an ASIC- or an FPGA-based design,designers must take certain performancecriteria into account. The challenges can bestated in terms of area, speed and power.

As with the ASIC, vendors are taking careof the area and speed challenges in FPGAdesign. With increased gate counts, theFPGA has more area and size to accommo-

sor interfaces, with various peripherals usingonboard connections. The performance ofthe system depends on the performance ofthe processor, which has a very standardarchitecture and is not easy to customize.

The processor may also end up utilizingits activity time in processing informationfrom a slower peripheral (for example, read-ing data from a slower ADC audio on anI2S interface). Although the processor usagemay reach 100 percent in this case, thedevice is not doing microprocessor-centricactivities but is working at a significantlylower performance level. Irrespective of itscore frequency, the microprocessor mustwait for the data from a slower clock. Thisalso results in higher power consumption, asthe processor is showing full utilization. Theresult is lower battery life along with larger

date larger applications, and tools includebetter algorithms to utilize the area optimal-ly. For example, technology advancementswith newer standard-cell libraries have led toFPGAs achieving higher frequencies.

The newer and better FPGA technologybrings with it a whole new set of challengesfor the designer. Power utilization is oneissue that moves to the forefront whendesigning an FPGA-based embedded sys-tem for a handheld or portable device.

FPGA in an Embedded SystemA typical embedded system consists of aprocessor, memory, standard interfaces likeUSB, SPI, I2C and so on, along with periph-erals such as liquid-crystal display, audio-outand the like. The heart of the applicationstill resides in the processor and the proces-



VideoEncoder

VideoDecoder

Host DSP

Video Encoder

Logic

Xilinx Spartan-3EFPGA

Video Decoder

Logic

Host DSP BusInterface

UARTInterface

RS232IC

I2CSlave

ADCIC

DACIC

SPDIFRX

SPDIFTX

I2CInterface

Analog AudioInput Logic

Analog AudioOutput Logic

Digital AudioInput Logic

Digital AudioOutput Logic

Figure 1 – FPGA architecture for audio/video distribution system

heat sinks or fans for cooling, eventuallyaffecting the reliability of the entire system.

FPGAs have started to play an impor-tant role on this front, since they canoffload the processor of many of itsperipheral interaction duties. For exam-ple, the embedded distribution system ofan uncompressed audio/video streamthrough a standard Gigabit TCP/IP net-work, shown in Figure 1, has a dedicatedDSP processor interfacing with a Xilinx®

FPGA over a standard bus interface. TheFPGA is connected to various slowerperipheral devices.

For starters, it is interfacing with a 12-bit PCM audio input and a 12-bit PCMaudio output, both over I2S. It also inter-faces with a video encoder and decoderand communicates with I2C slaves andRS232 devices. There are few general-pur-pose I/Os connected to the FPGA. Thestandard bus operates at a high speed of 66MHz, while the audio peripherals work ata low speed of 1.182 MHz. Serial inter-faces like UART and I2C operate at 56.6kHz and 100 kHz respectively. The datatransfer takes place over multiple clockdomains with only the processor configur-ing the data flow.

In this case, instead of the processorinteracting with the slower peripheraldevice, the FPGA can read data from theslower PCM ADC audio and store it inits internal buffer. Either the processorcan read that buffer periodically or theFPGA can send an interrupt to theprocessor when it has sufficient data. Theprocessor now has more idle time inwhich to do processor-centric work ifrequired; otherwise it can go into sleepmode during idle time.

Power ConsumptionIn a battery-operated embedded system,power saving is of the utmost importance.

Power consumption can be categorizedinto three areas: startup power, static powerand dynamic power.

The designer cannot control startuppower consumption, which plays an impor-tant role in deciding on the power supplyselection. Most of the maximum currentlevel drawn is achieved in this phase.

But static and dynamic power con-sumption are two areas where, with properplanning and by following correct guide-lines, the embedded designer working withFPGAs can make a marked improvementin power optimization.

Static power is the flow of currentthrough a component even when there isno activity in the system, generally due todevice bias and leakage. Static power alsodepends upon operating voltage. Reducingthe operating voltage will reduce the staticpower. But this decision is not always in thehands of the designer.

What the designer can do is to definethe architecture in such a way thatrequires the least amount of resources, touse resource sharing whenever possibleand to use the FPGA blocks in the mostefficient manner.

Another technique to minimize staticpower consumption is to perform powerestimations early in the design cycle and—if required and feasible—change the topol-ogy or use a different IP block. Xilinx’sxPower Estimator tool is useful in knowingvery early in the design whether you canmeet the power budget. Early-stage powerestimation may not be totally accurate, butit does serve as a guiding tool.

Dynamic Power ConsumptionDynamic power consumption is the resultof some activity on the FPGA gate—name-ly, signal switching—when both gates areswitched on briefly, causing current flowand capacitance. The speed of the signal

switching will determine how much poweris consumed. Another factor that deter-mines dynamic power consumption is theinherent capacitance created within thegeometry of the circuitry.

Dynamic power is a function of theclock rate, the number of gates that areswitching and the switching rate of thesegates. The capacitive load on the gate fan-out and wire add to the dynamic powerconsumption, which is proportional to theproduct of capacitance, voltage and squareof frequency.

The designer has maximum control overthis type of power consumption, withaccess to many techniques that will reap themaximum benefit when it comes todynamic power.

Reducing the signal switching frequencyslashes power consumption exponentially.As in the reference design shown in Figure 1,the control logic for the UART, paritycheck or frame overrun error occurs in theslower clock domain. Even though there isno reduction in the gate count, the powerconsumption falls. Designers can alsoreduce dynamic power consumption bylowering the overall operating frequency, iffeasible. After doing the feasibility and per-formance analysis, for example, the design-ers decided that instead of working at 133MHz, the above design also works at 66MHz. The DSP supported both thosespeeds, and reducing the voltage alsohelped in dissipating less power.

Another technique is to reduce thenumber of active gates in an operatingmode. Sometimes a part of the logic,though switched on and configured atpower-up, is not required to actually doanything. If, for example, the analog audiocapture unit is active, the application is notperforming any activity on digital SPDIFaudio capture. In this case, the typical digi-tal SPDIF audio capture still performs data



One technique to minimize static power consumption is to perform powerestimations early in the design cycle and, if feasible, change the topology

or use a different IP block. Xilinx's xPower Estimator tool signalsvery early in the design process whether you can meet the power budget.


sampling, biphase decoding and so forth,which is an unnecessary power drain. If theentire digital SPDIF audio capture circuitis disabled so that no signal switching takesplace in the circuit at all, the result will bea big reduction in dynamic power.

You can achieve this by disabling theclock that propagates to that portion of thedesign. A simple way of doing so is toAND the clock signal with an enable sig-nal, shown in Figure 2. If the enable signalis low, the output of the AND gate will staylow. If the enable is high, the output of theAND gate follows the clock.

Other methods may also be applicable.If possible and supported by topology,multiplexing address and data lines canreduce the number of signaling lines. Inour example, the output to the videoencoder was 16-bit data, which we multi-

plexed over 8-bit and sent over both theedges of the clock. This also resulted in sav-ing dynamic power. Choosing a serialrather than a parallel interface will alsolower the dynamic power. Using LVTTL orLVCMOS I/O with a lower capacitive loadhelps too, although the designer may notalways be the one to decide on the I/O.

Embedded Processors Embedding the processor inside the FPGAis another strategy that handheld designers

can adopt, with manifold benefits. First, itreduces the challenge of customizing theprocessor, as discussed earlier. Second, theinteraction between the peripherals and theprocessor resides inside the FPGA andreduces the I/O count. Since I/Os use con-siderable power, this also results in someamount of power saving. Xilinx’s Virtex®-5edition supports PowerPC® 440 proces-sors, hard processors and MicroBlaze™soft processors, all of which designers canleverage to make any system with high-endor low-end applications in mind.

Xilinx has been at the forefront in thisfield and its latest FPGAs offer many poweroptimization features. The company’spower analysis and power estimation tools,xPower Estimator and xPower Analyzer,have many features that help the designerin crafting a low-power FPGA system.

With the advent of 90- and 65-nanome-ter semiconductor technologies, the gates areshrinking in size, resulting in manifestationof static power consumption—a challengingphenomenon given the growing sensitivity topower metrics. There is a lot of excitement inthis field as power issues gain prominencewith various FPGA vendors. Low-powerdesign will determine how much more wecan pack into one system. There is a direneed to standardize a design methodologythat focuses on power consumption.


In Out

Enable

Clk

Figure 2 — A simple clock-gating mechanism

GetonTarget

Is your marketingmessage reachingthe right people?

Hit your target by advertising your product or service in the Xilinx

Xcell Journal, you’ll reach thousands of qualified engineers, designers, and

engineering managers worldwide.

The Xilinx Xcell Journal is an award-winning publication, dedicated specifically to helping programmable

logic users – and it works.

We offer affordable advertising rates and a variety

of advertisement sizes to meet any budget!

Call today: (800) 493-5551 or e-mail us at

[email protected]


FPGAs Help Measure Trajectory ofParticles in CERN’s Proton Synchrotron FPGAs Help Measure Trajectory ofParticles in CERN’s Proton Synchrotron System processes 15 billion analog samples per second to track the path of particles traveling close to the speed of light.System processes 15 billion analog samples per second to track the path of particles traveling close to the speed of light.

XCEL LENCE IN AUTOMOTIVE & ISM

by Terry BarnabySystem EngineerBeam [email protected]

A particle has a hard life at CERN. A pro-ton starts its journey as a hydrogen atom ina small, common-looking red bottle thesize of a soda siphon at the head of animposing 80-meter linear accelerator. Withits electron stripped, electric fields acceler-ate it to about one third of the speed oflight. It soon enters the 200-meter-diame-ter Proton Synchrotron (PS) machine,where further acceleration and condition-ing take place, with other protons, as abeam. After that, the particle is sent to abigger machine such as the Large HadronCollider or directly used in an experiment.Normally its journey will end in a near-speed-of-light collision with some otherunsuspecting particle. At every stage alongthe way, various sensors and high-speeddata-processing engines are spying on theproton, not least in the PS machine.

A versatile juggler of particles, theProton Synchrotron (Figure 1) is a keycomponent in the accelerator complex atCERN, the European Organization forNuclear Physics in Geneva, Switzerland.The PS machine accelerates and manipu-lates protons or heavy ions for variousexperiments. Apart from system down-times, the PS runs continuously, operatingon a different particle beam, for the vari-ous ongoing experiments, every 1.2 sec-onds or so. During operation it all looksquiet around the PS ring’s complex. Allthat an observer can notice of the particlemotorway traffic within the evacuatedtubes is a loud and heavy surging buzzfrom the multiple large power transform-ers outside every 1.2 seconds as megawattsof power surge to supply the system.

One of the many important instrumentsrequired for this machine is the TrajectoryMeasurement System. The purpose of theTMS is to measure the track of the particlesas they race around the ring at nearly thespeed of light. There are 40 sets of metalplate detectors spaced around the PS ring,providing 120 analog signal channels. Eachof these signals, three per detector pickup,

generally employ a generic Linux-basedcomputer as the main control and data-access module, with a number of FPGA-based modules at the front end to do thefast data capture and real-time processingwork. The TMS is based on that systemmodel. It comprises a Linux-based systemcontroller that provides control, datapostprocessing and external data access,and four, eight-slot, CompactPCI rackmodules, each with a ConcurrentTechnologies Intel dual-core processing

card running Linux. Within the rackmodules are specially designed, FPGA-based, Pickup Processing Engine, orPUPE, cards (Figure 2).

When developing such real-time sys-tems, especially with FPGAs, it is impor-tant to decide what work the varioussystem modules will handle. FPGA hard-ware is excellent for doing simple, real-time, repetitive things in parallel and atrelatively high speed. Software running ona conventional processor is good for con-trol and data access as well as flexible datapostprocessing. Getting the partitioning ofthis work right, together with the protocolsused among the system’s various hardware

needs to be digitized at 125 megasamplesper second. That is an overall data rate of 15billion samples, or 30 Gbytes of data, persecond. Due to this high data rate, there isa need to process the data in real time, toreduce the amount of data and pick out theinformation of interest before storing it.

CERN’s scientists and engineers haddevised the basic processing algorithmssuitable for FPGA implementation. Forengineers at Alpha Data and at Beam Ltd.,our job was to develop the idea and pro-

duce a complete working system in a rea-sonable time frame and at reasonable costto do the job.

Designing the TMSFrom the early stages of design, we envis-aged a very modular system that wouldprovide fault tolerance and ensure easy livemaintenance. It also had to be flexible andexpandable. We took the concept of mod-ularity down even into the FPGA fabric,where each FPGA implements three sepa-rate pickup-detector channel blocks.

Alpha Data and Beam have workedjointly on a number of real-time, FPGA-based instrumentation systems. These



Figure 1 – CERN scientists accelerate and condition particles in the ring-shaped Proton Synchrotron.

and software modules, is essential in proj-ects of this kind.

Communication between the TMSmodules uses Gigabit Ethernet for bothexternal links and internal communica-tions between the system controller andmodule controllers. The Beam ObjectAccess Protocol (BOAP) is a simple andefficient protocol for this work. The BOAPsystem employs quality-of-service protocolsto tighten message delivery times. Weimplemented communications with thePUPE FPGA cards as a register and sharedmemory interface over a CompactPCI bus.

Along with the 120 analog channels, theCERN PS machine produces a number ofdigital system-synchronization signals. Themain one is a systemwide 10-MHz clockthat is used to synchronize the completesystem. A digital timing bus links these sig-nals to each of the PUPE boards.

PUPE FPGA Board DesignTo reduce project costs and developmenttime, we decided to use commercial off-

the-shelf components wherever possible.For the Pickup Processing Engine, we ini-tially considered using off-the-shelf FPGAmodules and A/D converter modules thatAlpha Data produces. However, in thiscase, we decided instead to design and pro-duce a custom CompactPCI board basedon an existing PMC design. The size andperformance of FPGAs had reached the

stage where one could easily support nineADC channels with the appropriate data-processing algorithms. Sharing the FPGAand associated communications hardwareamong nine ADC channels allowed us toreduce the system cost considerably, trimthe system’s physical size, more tightly cou-ple the ADC clocking system and supportextra features. Alpha Data’s experience inproducing Xilinx FPGA boards allowed usto produce the new PUPE board withinseven months, from conception to firstworking boards.

These PUPE CompactPCI cards (des-ignated ACP-FX-N2/125) are the heart ofthe Trajectory Measurement System(Figure 3). They utilize a Xilinx®

Virtex®-4 FX100 FPGA, 1 Gbyte ofDDR2 SDRAM and nine 125-MHz, 14-bit ADCs. The board’s core design isbased on Alpha Data’s ADM-XRC/FX100-10/1G FPGA PMC mod-ule, providing a high degree of FPGAfirmware compatibility with this hard-ware. The design employs a low-jitter,phase-locked loop (PLL) synchronizedclock source for the ADCs. One of thedesign issues was to distribute this clockto all nine ADCs without increasing theclock jitter appreciably. The board’s PCBlayout was also crucial to achieving high-er performance from the A/D convertersas well as low on-board noise from theboard’s components.

The board employs a second XilinxVirtex-4 LX25 device for CompactPCIinterface duties. This uses the PCI bus’



Analog Inputs

9 x ADC

16 DigitalI/O Lines

FPGA DataProcessing Engine,

Virtex-4 Based

Digital Timing andTest Inputs/Outputs PUPE: per module

PC

I Bus

Timing B

us

Module Controller(1 per Processing

Module)Boots from

System Controller

Processing Module: 3 per System

System Controller2 Units (1 Spare),

Intel Pentium-Basedwith RAID Disk Storage

Gigabit E

thernet Sw

itch

LAN

Figure 2 – The Trajectory Measurement System

Figure 3 – Alpha Data’s Pickup Processing Engine (PUPE) board is based on Xilinx FPGAs.


FPGA firmware as developed by AlphaData for its existing PMC boards. ThePUPE also has two Gigabit EthernetPHYs with the associated RJ45 connec-tors on the front panel, connected direct-ly to the FPGA. The large number of I/Opins available on the FX100 allowed us toconnect all of the ADCs and all other on-board components directly to the FPGAwith minimal glue logic. We use quick-switch devices for interfacing to external5-volt digital signals.

CERN is a primarily a research institu-tion. So throughout the TMS designprocess, we tried to add features that couldbe useful in the future as needs changed.The FX100, for example, could be swappedfor an FX140 if a CERN project requiredmore hardware resources. Since five PUPEboards share the CompactPCI bus, there is acommunications limit of about 100Mbytes/s (20 Mbytes/s per board). TheGigabit Ethernet channels can provide up to500 Mbytes/s, per five PUPE modules, forfuture needs. The PUPE is thus capable ofemploying the internal PowerPC processorswith the Gigabit Ethernet interfaces to run alocal Linux control process if required.

Design and development of the PUPEprogressed relatively smoothly, with only afew minor issues to sort out before thefirst production board run. The on-boardJTAG chain and ChipScope™ interfacehelped enormously with initial boarddebugging, allowing us to configure theFPGA fabric for test purposes (Figure 4).

PUPE FirmwareThe PUPE FPGA firmware is written inVHDL. We used the ISE® 9.2 tool set fordevelopment purposes. It was helpful usinga readily available development package;CERN was using this same tool set, so we

could exchange FPGA firmware designsrelatively easily without the necessarychanges for different vendors’ tools. Wealso used Alpha Data's standard SDK tosimplify development and deployment.

The design makes use of a number ofkey core modules that we had developedfor other projects. The use of these stan-dard, proven cores helped us to producethe FPGA firmware within the project time

scale with relative ease. The most difficultand complex block is the SDRAM accessblock. This has a time-shared, dual-portaccess scheme so that the host computercan read the acquired data while it is beingcaptured in real time. This block caused us

the most issues during development, main-ly due to the tight timing requirements,and thus increased code compilation times.Thankfully, our modular approach helpedout here, allowing us to develop and testthe firmware using just one of the threepickup channels. This reduced compilationtimes significantly, with the lower gatecount used, and thus improved the rate offirmware development.

We implemented the interface to thehost computer as a register plus sharedmemory interface. The firmware imple-ments a set of state machines that are soft-ware programmable to carry out most ofthe data-capture and data-processing func-


9 ADCs

ADC PLLClock andDistribution

Digital I/O16 Channels

2 GbitEthernet PHYs

1 Gbyte of DDR2 SDRAM

in 4 Banks

ProcessingVirtex-4 FX 100

FPGA

FlashMemory

PCI BusVirtex-4 LX25

FPGA

Com

pact

PC

I Bus

Digital Clock

FlashMemory

Figure 4 – The PUPE Board Structure

The core module is the particle beam synchronized phase-locked

loop. The FPGA allowed us to implement this real-time

structure together with the DSP processing algorithms

side-by-side in the same chip rather than in dedicated hardware.

tions with minimal host software interac-tion. The FPGA hardware can thus beprogrammed to implement the appropri-ate real-time data capture and data pro-cessing for the PS machine’s cycle.

The core module is the particle beamsynchronized phase-locked loop. Thisstructure allows the processing algorithmsto synchronize to the incoming analogbeam signals. Implementing this real-time structure would have probablyrequired dedicated hardware. The FPGAallowed us to implement it together withthe DSP processing algorithms side-by-side in the same chip.

The firmware design called for a num-ber of clock domains. The ADC clock,the PCI bus clock, the external systemclock as well as the lower-level clockswere all needed. The Virtex-4’s DLLs andclock structure handle the requirementsadmirably.

We developed the firmware in parallelwith the hardware and system software,using Alpha Data’s FPGA boards as a testand development platform prior to hav-ing the real hardware available. Thishelped to tighten up the project develop-ment cycle and ensured that any prospec-tive issues emerged at an early stage.

The FPGA utilization was about 75percent. During development we wereconscious of and targeted the power levelsthe system would use. One thing wenoticed, during development, was theneed to turn off the “keep alive” for theMulti-Gigabit Transceivers. This saved7.5 watts of power usage by each FPGA.

TMS System and SoftwareWe used a reduced but standard FedoraCore 6 Linux distribution for the systemcontroller and our own simplified Busybox-based Linux system for each of the threemodule controllers. The open-source natureof Linux, as well as its stability, is a realboon when used in scientific research. Thereal-time software we developed providescontrol, data postprocessing and access, sys-tem monitoring and test.

The FPGA’s firmware bit file resides onthe system controller and is downloaded tothe FPGAs on initialization. This method

makes it easy to test out and use differentprocessing algorithms within the FPGAs.

We built test blocks into the FPGAfirmware and system software. Thisallowed testing of the system duringdevelopment when there were no ADCsavailable, and later when system testing.(It is difficult to find an old ProtonSynchrotron lying around for test purpos-es!) Again, the FPGA helped enormouslyhere. We could easily add test hooks andmodule emulation hardware within theFPGA that could be switched in and outunder system control or just includedduring development. In fact, the PUPEfirmware has a complete data and logicanalyzer built in to aid with diagnostics inthe running system.

FPGAs in UseIn scientific research projects like this one,FPGAs really do excel. The nature of scien-tific work entails a degree of algorithmdevelopment as ideas and experimentschange. The use of an FPGA makes it easyto tightly integrate parallel DSP functionstogether with digital PLL, logic statemachines and other structures into one chip.The programmable nature of the FPGAallows the engineers to modify the fast real-time data-processing algorithms as requiredto meet the needs of the experiments. In ourcase, it also made the overall TMS andPUPE board itself quite flexible and suitablefor many other application areas.

The resulting power usage of the com-plete system is around 700 W in full use(one system controller, three Linux modulecontrollers and 15 PUPE boards). EachPUPE board takes about 35 W—a veryreasonable figure considering the process-ing work involved.

The TMS is now in use at CERN. It isproviding more detailed and complete infor-mation on the trajectory of the particlebeams, giving CERN scientists more valu-able data in their never-ending search to dis-cover the workings of the universe we live in.

More information on this system isavailable at http://portal.beam.ltd.uk/support/cern. Alpha Data’s FPGA productrange is detailed at http://www.alpha-data.com/product_comp.php.



by Chuck RussoGeneral ManagerHEI [email protected]

P.J. TanzilloMedical Product Marketing ManagerNational [email protected]

Greg CrouchEmbedded Business Development ManagerNational [email protected]

Just about everyone who owns a mobilephone has experienced a dropped call at onetime or another. While system failures orglitches in these and other consumer prod-ucts are inconvenient, they don’t have cata-strophic consequences. But a single systemfailure or glitch in medical electronics canliterally prove fatal—one reason why med-ical equipment, the devices contained inthese systems and the software running onthose devices must pass rigorous testing andconform to stringent requirements imposedby the Food and Drug Administration(FDA). While the task of bringing thesesophisticated devices to market may seemdaunting, the rewards of creating a new sys-tem that will improve the quality and evenlength of life for some patients are whatmotivate our team at HEI Inc.


Hardware-Centric Approach BuildsMore Reliable Medical DevicesHardware-Centric Approach BuildsMore Reliable Medical DevicesImproved design methods combined with Xilinx FPGAs reduce time-to-market and enhance quality.Improved design methods combined with Xilinx FPGAs reduce time-to-market and enhance quality.


To ensure that our new designs functionoptimally and reliably, and stream throughthe FDA’s approval process with ease, weemploy a highly structured design method-ology we call the “scrum/sprint develop-ment process.” We also reduce anypossibilities of software errors by simplylowering the amount of functionality weimplement in software. Instead, we imple-ment functionality in Xilinx® FPGAs.

As a bit of background, HEI is a con-tract medical-device design company thatsupports the entire life cycle of our cus-tomers’ products, from early businessdevelopment support through design anddevelopment, new-product introductionand prototyping, initial production ramp,volume manufacturing and end-of-lifemanufacturing. To better understand ourmethodology, let’s first examine the med-ical-device design process.

Three-Phase Life Cycle The FDA has some strict regulations,requirements and guidelines for medicalelectronics, with the stated purpose ofensuring the safety of the general public.Within these regulations, the FDA hasstringent requirements for the life cycle of amedical device (Figure 1). Generally, elec-tronics companies must apply these regula-tions to any component, part or accessoryof a medical device; any software used inthe production of a device; and any soft-ware used in the implementation of thedevice manufacturer’s quality system, suchas a program that records and maintainsthe device history record.

One can divide the medical-device lifecycle into three main phases. The first is theearly product life cycle (see Figure 2). The

to sort out FPGA I/O. Once we under-stand the problem, we can design a solu-tion. For device development andprototyping, we reuse the math and signal-processing capabilities along with intuitivegraphical programming to develop newalgorithms. Then, using commercial off-the-shelf hardware, we verify the algo-rithm’s performance against real-worlddata. In many cases we use NI’s XilinxFPGA-based prototyping platforms forexperimental prototypes of the final device.Specifically, we use the LabVIEW Real-

least regimented of the phases, this is wherecompanies primarily focus on the researchand development of theories and ideas. Thisphase can last from a few weeks to manyyears, depending on the complexity of thesystem the company is trying to develop.

A fundamental part of the early prod-uct life cycle phase is data collection andanalysis. Researchers and product designspecification teams typically use manytools to help streamline this process (seesidebar). At this stage, HEI will often useNational Instruments’ LabVIEW product



CONCEPT

PROTOTYPE PR

EC

LINIC

AL

CLIN

ICA

L

MANUFACTURING

MARKETING

CO

MM

ER

CIA

L U

SE

OB

SOLE

SCEN

CE

SystemSpecification

GraphicalSystem Design

AlgorithmDesign

Logic Design

SoftwareImplementation

SoftwareTest

SystemIntegration

SystemTest

Figure 1 – Diagram of the medical-device design life cycle as defined by the FDA

Figure 2 – Applying virtual instrumentation and the HEI design method early in the product life cycle gives a clear understanding of the problem to be solved. Graphical system design helps us develop functional prototypes in record time.

The prototyping platform incorporates the same components as the final deployed system, reducing overall development time.

Time and FPGA modules, along with theNI CompactRIO to quickly iterate betweenthe algorithm design and the device proto-type stage. Using off-the-shelf hardware forprototyping shrinks the time-consumingstep of hardware development and integra-tion, and gives us more time to focus ondelivering a bulletproof software design.

The second phase of the medical-devicelife cycle can be called the middle productlife cycle (see Figure 3), addressing the pro-ductization, verification, validation andmanufacturing of the designed device. Thefocus in this phase is to develop well-defined specification documents that have

clear and measurable requirements. Oncethese specifications are defined, it is time todevelop a clear mapping between require-ment documentation and actual imple-mentation code.

In today’s complex medical-device mar-ket, customers must get to market as quick-ly as possible. Many companies try to dothis using the traditional “waterfall” devel-opment methodology, in which designgroups attempt to complete each stage ofthe design process before moving on to thenext step (Figure 4). The waterfall method-ology is highly dependent on having com-plete and accurate specifications at the

beginning of the project. However, in themedical-device market, more times thannot needs evolve with the development ofthe product. What’s required is a processthat takes this evolution into account.HEI’s scrum/sprint development process isthe answer to this problem.

In the scrum/sprint process, we requireonly a high-level system architecture andspecifications to start a project. We dividethe project into “sprints” of four to sixweeks in duration. Within each sprint, weidentify all tasks we think the process willrequire and place them on a “burn-down”list. At the beginning of each sprint, weallocate tasks to the team members in aplanning session. The team meets daily fora brief stand-up meeting called a “scrum,”in which each team member answers thefollowing three questions: “what did I com-plete yesterday,” “what will I completetoday” and “what obstacles are in my way?”The project manager, or “scrum master,”manages the burn-down list to trackprogress on a daily basis.

Figures 5 and 6 show schematics of theprocess. HEI’s companywide use of thescrum/sprint development process hasreduced our development times by 30 per-cent, allowing us to implement new prod-ucts months ahead of time.

Of course, medical-device productdevelopment is nothing if it does not com-ply with FDA and other regulatory require-ments. At HEI, we have a track record ofregulatory compliance going back 25 years.In fact, the FDA has audited ourscrum/sprint product development processand found it to comply with all elements ofthe agency’s Quality System Regulation(QSR). Table 1 summarizes the waterfalland scrum/sprint development approaches.

The third and final phase in the devel-opment of a medical device is the late prod-uct life cycle (Figure 7). Very littleengineering work is necessary in this phase,but customer feedback and market success-es help drive concept development of thenext-generation product, when the cyclestarts anew.

HEI is able to quickly iterate on futureproduct derivatives, thanks to thescrum/sprint product development process



FREQUENCY

RE

SO

LU

TIO

N

ACQUIRE ANALYZE PRESENT

SO

FTWARE VIEW

HARDWAR

E V

IEW

PR

EC

LINIC

AL

CLI

NIC

AL

MANUFACTURING

Requirements

Design

Implementation

Verification

Figure 3 – Software designs are typically validated and verified during the middle of the product life cycle.

Figure 4 – Traditional “waterfall” development process attempts to complete each stage before moving on to the next.


and our use of off-the-shelf FPGA-basedhardware as well as high-level FPGA soft-ware design tools that scale from research tomanufacturing. In many cases, we find thatwe can use the generic core architectures wedevelop in multiple products. For example,the same architecture for a pump controllerthat regulates an IV and medication flowpump can be used in another design projectthat controls a motor for administeringblood transfusions.

Why Hardware Trumps SoftwareTo employ this methodology effectivelyand further speed our design process, wehad to change the way we thought aboutdesigns, moving from a software-centric toa more hardware-centric approach. Asmany are aware, medical-device recallsreached an all-time high in 2008—up 43percent from 2007. Key causes, accordingto the experts at the FDA, involve two pri-mary issues: defects in manufactured rawmaterials and poorly developed software.Companies can monitor the quality oftheir raw materials fairly easily, but it canbe very tricky to address software quality.As the lines of code in devices continue tomushroom, the problem will only wors-en—a particular concern since the FDA’sconsumer safety branch says much of theburden of safety is now the responsibilityof the medical-device designer.

At HEI, we feel there is a potential solu-tion to this problem, but it’s not in moretesting, code reviews and process. Instead,we simply try to write less software and

push more of the logic into hardware ele-ments like Xilinx FPGAs. Let’s look atsome of the common causes of softwarefailure and how we are addressing theseissues with FPGAs.

Eliminating Deadlock Most modern devices need to be able tohandle multiple tasks simultaneously, yetmost embedded processors are still limitedto one processing core. This means theprocessor can execute only one instructionat a time. Meanwhile, parallel processes

aren’t much better, as they must still sharethe main CPU. In addition, other sharedresources such as communication drivers,hard disk and user interface elements pres-ent opportunities for deadlock—the con-dition that occurs when two or moreprocesses are waiting for one another torelease a resource.

Deadlock can be very difficult toreproduce and debug, because the situa-tion often relies on multiple processesand usually requires a specific and syn-chronized sequence of events to occur.Unit testing alone will not catch mostdeadlock issues; they are usually uncov-ered by code reviews, adept system testersor simply by luck.

With FPGAs, by contrast, “processes”that are independent have their ownphysical circuitry and therefore, there areno shared resources. On each clock tick,combinatorial logic latches in each circuitand stores values in separate registers. Nodeadlocking can occur because neitherprocess relies on the other’s resources.This allows you to put much more faithin the results of simulation and unit test-ing, since other unknowns like resourcecontention are no longer an issue.


DefineRequirements

and Burn-Down List

AdjustPlan

Verify

Build andIntegrate

Design

AnalyzeRisks

Scrum/SprintDevelopment Cycle

Scrum/Sprint Waterfall

QSR compliant Yes Yes

Speed in execution Yes No

Reduces program risk Yes No

Early verification of requirements Yes No

Easy to implement change Yes No

Able to proceed without complete requirements Yes No

Short development cycles Yes No

Overall time-to-market Much reduced Typical

Figure 5 – In the “scrum/sprint” process, only high-level system architecture and specifications are required to start the project. Development is divided into “sprints” four to six weeks in duration.

Each sprint follows a six-step development cycle, as shown.

Table 1 – The scrum/sprint methodology outperforms the waterfall approach to development.

Incompatible Middleware When developing embedded software,you almost never implement every line ofcode from scratch. Instead, various toolsare available to make the firmware design-er more productive. These range fromsimple drivers to network stacks to operat-

ing systems and even code-generationtools. Though these systems are generallywell tested individually, no software isbug-free. With so many possible combina-tions of tools and libraries, the likelihoodof using components together in a novelway is relatively high.

For this reason, the FDA mandates thatfor all off-the-shelf software used in medicaldevices, companies must validate that thesoftware stack works for each of their par-ticular design’s specific use case. What doesthat mean? If, for example, we are using asignal-processing library that contains afixed-point fast Fourier transform and weare detecting the presence of a certain fre-quency component, we do not need to vali-date that the FFT returns the correct answerfor all possible inputs. Rather, we need tovalidate that it returns what we expect for allvalid inputs according to specifications. Forexample, if we are detecting only tones audi-ble to humans, there is no reason to test thatthe function returns correct values forinputs over 20 kHz.

Unfortunately, software componentsthat seem independent are not necessarilyso. Therefore, if we are using that softwarestack with an SPI driver coupled with areal-time operating system (RTOS), weneed to validate all of these componentstogether to really have confidence in theFFT. If the FFT passes a valid output to theSPI driver, but the SPI driver crashes, thenclearly there is a problem. If we then decideto modify the SPI driver, we need to vali-date the entire software stack again. This


Design Tools Must be Reliable, Too

A n important part of HEI’s design methodology is our tool set. Because the FDA regulations are so stringent for med-ical devices, companies that develop medical products are wise to use tools that are mature (not buggy themselves) and

have a reputation for reliability. For example, among the many reliable tools we use in our flow is National Instruments’LabVIEW Real-Time and LabVIEW FPGA. We use this high-level programming tool to efficiently define a solution to agiven problem. Specifically, we apply LabVIEW FPGA to our design flow process to develop IP on accelerated time lines,with fewer development translation errors and no software rework. LabVIEW’s multicore characteristic helps in deployingcode to multiple processors.

We also leverage proven hardware building blocks such as NI’s Single-Board RIO hardware, which shares both a real-timeprocessor and a Xilinx FPGA. Because so many companies have used the RIO hardware and the underlying middleware driv-ers in a broad range of applications, we consider them very reliable. This gives us peace of mind and allows us to focus onmoving as much software as possible out of the processor and into the FPGA. It also allows us to concentrate on designingthe custom circuitry that a particular product requires.

Other off-the-shelf tools our design team relies on include IBM Requisite Pro for requirements management, the TigrisSubVersion configuration-management software for documents and source code tracking, the Seapine Test Track Pro fordefect tracking, SolidWorks for mechanical modeling and SolidWorks PDMWorks for mechanical-file management. ForPCB electronics design, we use the Mentor PADS Suite. – Chuck Russo


Sprint Backlog

Daily ScrumMeeting

Backlog TasksExpandedby Team

Product Backlogas Priortized by Product Owner

DemonstrableNew Functionality

24 Hours

Source: Adapted from Agile SoftwareDevelopment with Scrum by Ken Schwaber and Mike Beedle

Figure 6 – The team identifies a “sprint backlog” of work tasks for each sprint in the cycle, expandsand assigns them, and then reviews progress and removes roadblocks during a daily “scrum” meeting.

The team delivers product functionality to the customer at the end of each sprint.


can become very cumbersome, and onefaulty component can delay the entire sys-tem development. For this reason, at HEIwe reuse as much known-good and provenmiddleware and RTOS driver combina-tions as possible. We use, for example, themiddleware drivers that are a part of NI’sSingle-Board RIO platforms.

In addition to validating software ineach use case, we also need to validate allthe third-party intellectual property (IP)we are using in our FPGA-based design.However, once we have validated all of ouruse cases, we have confidence that the IPwill behave as expected when integratedwith other components. Let’s look at ourFFT example again. If we use an FPGA, wecan acquire or generate an FFT IP core andvalidate its numerical correctness for eachuse case—the same as with the software.However, the risk of intermittent failuredecreases drastically because there is noneed for all the processor middleware wewould have in a software-centric design. Assuch, there is no longer an RTOS, and theSPI driver is its own IP core—its operationdoes not directly affect the FFT.Furthermore, if we modify the SPI driver

implementation, we don’t need to revali-date the unaffected areas of the system.

Managing Buffer OverflowAnother example of how we use FPGAs tomitigate errors that typically occur in soft-ware-centric systems is in buffer overflowmanagement. Buffer overflow occurs whena program tries to store data past the end ofthe memory that you’ve allocated for thatstorage, and it ends up overwriting someadjacent data that it shouldn’t. This can bea really nasty bug to diagnose, since thememory that was overwritten could beaccessed at any time in the future, and itmay or may not cause obvious errors. Oneof the more common buffer overflows inembedded design is a result of high-speedcommunications of some sort, perhapsfrom a network, disk or A/D converter.When communications are interrupted fortoo long, buffers can overflow, so we needto account for this to avoid crashes.

We use FPGAs to manage buffer over-flow in two ways. In one example, we usethe FPGA to manage a circular or double-buffered transfer, where it can offload theburden from a real-time processor. In this

case, the FPGA serves as a coprocessor thatreduces the interrupt load on the mainprocessor. This is a common configuration,especially in high-speed A/D converters.

In a second example, we use the FPGA asa safety layer of protection, routing all of thepatient-facing I/O through the FPGA beforeit gets to the processor. In this case, you canadd extra safety logic to the FPGA so thatyou can place all your outputs in a knownand safe state in the event of a software crashon the processor. The FPGA serves as awatchdog and its logic ensures that risk tothe patient is lowered in the event of a soft-ware failure. By making the architecturaldecision to use an FPGA in your device’s pri-mary signal chain, you can use one or bothof these two methods to guard against bufferoverflow and to better handle the situation ifand when it does occur.

In fact, the more overall system func-tionality—software as well as hardware—we move into the FPGA, the faster we canexpedite our design process and ultimatelyvalidate our design running in the cus-tomer’s end product. The sooner we canvalidate the reliability of our design runningin the overall system, the sooner our cus-tomer can validate the entire end productand get it to the FDA for approvals. This ofcourse means our clients can then get theirproducts to the market faster, and in doingso improve quality of life or even save lives.

If we were to implement a design usingan ASIC process, we would have to waitmany months for a foundry to create thehardware. The extra steps of having to ver-ify the ASIC’s physical design, create masksand then manufacture the design createmore room to introduce errors and defects.If a design does pick up an error in any ofthese steps, the result can be significantdelays in getting products to market in atimely manner. Because FPGAs are alreadyfabricated and thoroughly tested, we onlyhave to worry about our design, our soft-ware and ensuring that our design runs tothe customer’s specifications. With ourscrum-and-sprint methodology, our hard-ware-centric mind-set, use of reliable toolsand choice of FPGAs over ASICs, we canmake a difference for our customers and, inturn, their customers—the patients.


FREQUENCYR

ES

OL

UT

ION

ACQUIRE ANALYZE PRESENT

SO

FTWARE VIEW

HARDWAR

E V

IEW

CO

MM

ER

CIA

L U

SE

OB

SOLE

SCEN

CE

MARKETING

Figure 7 – The late product life cycle takes the product to market, where feedback helps determine the features for the next generation of the device. This completes

the cycle and brings it back around to the concept phase.

by Agnès Faïn Senior Design EngineerWipro-NewLogic [email protected]

Wolfgang MerykSenior Product Marketing [email protected]

Every company designing a complex andhigh-speed intellectual-property (IP) corelike an IEEE 802.11 wireless LAN con-troller faces the challenge of how to provethe design is working and is of high quali-ty. A test chip is important, since only itcan show that the IP is silicon-proven. Buta test chip can capture the hardware designstate at only one point in time. However,IP needs to be flexible to support a varietyof configurations and applications. WLANbrings an additional requirement for flexi-bility and upgradability, since there are stillnew additions to market-relevant specifica-tions (Wi-Fi, ETSI/FCC) coming, andsome of them require or at least benefitfrom hardware changes.

At Wipro-NewLogic, we decided toinvestigate an FPGA-based platform tosolve this problem. We called the resultingproduct the WiLDSYS board, to reflect thename of our WLAN IP: WiLD a/b/g.

We formulated several key require-ments for this design. First, it had to sup-port running the WLAN IP at full speed(54-Mbit/second air data rate for 802.11aand 802.11g). Next, we needed a singledatabase for targeting both FPGA andASIC. Third, we wanted a very high levelof confidence that test results and bug fixesfrom the FPGA target could be applied tothe ASIC target. Finally, we needed toimplement the external interfaces to thehost and to the radio all on the FPGA, soas to minimize the number of externalcomponents and to leverage the program-mability of the FPGA to allow alternativeinterfaces in the future. During the courseof the project we found that the FPGAplatform not only met all these goals, buteven allowed us to take the WiLDSYSboard to a full Wi-Fi certification.

But let’s start at the beginning, with anoverview of the WiLDSYS design and theselection of the FPGA device.

WiLDSYS OverviewFigure 1 shows the block diagram of theWiLDSYS platform as we implemented it inthe original design. The WiLD Core to theleft comprises the 802.11 a/b/g media-accesscontroller (MAC) and modem hardware.The MAC functions are split over theWiLD Burst Processor, a buffer manage-ment unit/DMA engine and the WiLDStream Processor, which takes care of theRC4 and AES encryption. Both blocks serveas master on the central Advanced High-Performance Bus (AHB) and are not tim-ing-critical. The WiLD modem contains theCCK/DSSS and OFDM signal processingfor the 802.11 a/b/g standard. Signal pro-cessing designed for an ASIC is not easy tomap to an FPGA, and we will describe thechallenges and solutions in more detail later.

There is also a complete ARM7-basedprocessor platform that runs the Layer 2MAC software. We support both accesspoint and station mode in the software.


Implementing Wireless LANInterface in an FPGAImplementing Wireless LANInterface in an FPGAComplex, high-speed IP such as WLAN can run at full speed and even pass Wi-Fi certification when built on the platform of a Xilinx device.Complex, high-speed IP such as WLAN can run at full speed and even pass Wi-Fi certification when built on the platform of a Xilinx device.

XCEL LENCE IN WIRELESS COMMS

Placing an external ARM7 chip on theFPGA board is a help for customers whowant to use the board but are not ARMlicensees. We also added external SRAMfor applications that exceed the amount ofmemory we have inside the FPGA, as wellas a flash device for booting. We deployedan ample number of header connectors toaccess the FPGA I/O pins of all externalinterfaces and for debug. For futureupgradability, we added a footprint for asecond FPGA, though we have not hadcause to use it so far. Figure 2 is a photo ofthe WiLDSYS FPGA board.

FPGA ChoiceFor this design, we used a Xilinx® Virtex®-4FPGA. We selected this device family forits high speed and many specializedresources, including DSP blocks, internalmemories, flexible I/O pads and high-per-formance clocking support. We went forthe Virtex-4 LX200 FPGA for its size, sincethe WiLD IP including the platform wasalready in the area of 700k gates (equiva-lent to NAND2). Table 1 shows a utiliza-tion report of our design.

buffers anywhere in the FPGA, we obtainedmore options for mapping, placing androuting our design.

Challenges of a Single Database We knew that a validation done on FPGAwould be meaningful only if the results wererepresentative to the ASIC design. The codemapped to the FPGA had to follow any bugfix or new feature added to the ASIC code.In the other direction, any correction doneduring FPGA validation had to be reportedback into the ASIC design and fully simu-lated using the ASIC’s verification environ-ment. It was clear to us from the beginningthat we could handle all this in an efficientway only if we maintained a single designdatabase for targeting both FPGA andASIC. We quickly ran into several challengesin the implementation.

In some cases, the only solution was tomodify the ASIC code. We had logic pathsin our OFDM modem that did not meettiming at 80 MHz on the Virtex-4 device.So we split the combinational paths byinserting additional flip-flops. Thisincreased the overall latency of the modem,

The Virtex-4 FPGA supports a clockspeed of up to 500 MHz. We needed thatspeed to implement our custom High-SpeedSerial Interface toward the Wipro-NewlogicWLAN radio chip, which runs at 240 MHz.

Since one of our goals was to use a sin-gle database for both the ASIC and FPGAtarget, we had to map the ASIC code to theFPGA without optimizing it for the bestutilization of FPGA resources. Typically,ASIC code uses longer combinationalpaths, and we needed margin between theFPGA’s maximum speed and our design’shighest clock frequency. Xilinx has greatlyimproved the signal-processing blocks ofthe Virtex-4 FPGA, especially the multipli-ers. Dedicated blocks called DSP48 areoptimized for multiplication and addition.We took advantage of the DSP48 resourcesto run the 802.11 a/g modem at 80 MHzin the FPGA.

We were also motivated by the availabili-ty of global clock lines. The Virtex-4 LX200FPGA features 32 global clock buffers, witheight of them not constrained to a specificFPGA region. Since it’s possible to placeflip-flops linked to one of these eight clock


PLLFPGA

RC4 RAM512 Bytes

AES RAM256 Bytes

DiagSignals

FIFO -s-s

Interface

ARM7TDMIProcessor

BootROM

SystemRAM

DiagMuxes

ARM AHBBridge

AHB ROM

Interface

AHB RAM

Interface

WiLD BμP

WiLDModem

Radio Controller

Timer SPIMaster

ExternalFlash

InterruptController GPIO

RF I/O

Clock Controller

AHBMuxes

System Controller

Address Decoder &Memory Protection Registers

BusBridge

WiLD Core

WiLD Platform

External (off-chip) Interface

Internal (on-chip) Interface

Bus MasterBus Slave

BusArbiter

DefaultSlave

WiLD StreamProcessor

CardBusBridge


Figure 1 –Block diagram of the WiLDSYS design

but that was OK, since we still had sufficientmargin. We reported the modifications backto the ASIC database after we met timing onthe FPGA and ran a full regression test onthe ASIC target. After the ASIC synthesis,there was even an area benefit for the ASICimplementation of the modified design,since the relaxed timing allowed the ASICsynthesis tool to use a lower number ofbuffers and smaller combinational cells.

The clock controller is one of the mostcritical ASIC blocks, and also one of themost difficult to map on an FPGA.Ultimately we had to create a unique blockjust for the FPGA implementation. We setup the synthesis scripts for ASIC and forFPGA to select the right block for each tar-get. We used the Virtex-4 digital clock man-ager (DCM) resources to replace thephase-locked loop that would normally beused in a WLAN ASIC design. The DCMcreated the required 240-MHz frequencyfrom a 40-MHz or 50-MHz oscillatorinput, exactly as in the ASIC.

We were targeting the WiLD IP at low-power applications and therefore addedsupport for clock gating to the design,

along with separate power domains. Weknew we could not fully implement thetwo features in the FPGA, but we wantedto include at least the control and switch-ing of the clock and power domains. Withthe eight-region free global clock buffersof the Virtex-4 LX200 FPGA, we didn’thave sufficient resources to cover all theclock frequencies and clock-gating func-tionalities of the ASIC. We chose toimplement in the FPGA just the clockgating required for correct functionality ofthe system, and not the clock gating aim-ing only at saving power.

To ease the clock tree, we used theclock-gating conversion feature availablewith the Synplify Pro FPGA synthesis tool.The synthesizer removes the gating condi-tion from the clock line and places it onthe flip-flop enable input or on the datainput. We used BUFG and BUFGCEbuffers for clock gating so that the outputof this logic was still routed using the high-speed, low-skew global clock lines of theVirtex-4 FPGA. A BUFGMUX resourceprovided glitch-free multiplexing betweenactive mode and low-power clocks.

We implemented several voltage domainsin the WiLD design that could be switchedon and off at different times. We validatedthis complex and critical functionality withthe Virtex-4, even though the FPGA con-tains only one power domain. A domainthat got powered off must be reset at power-up. On the FPGA, we emulated the power-down state with a reset of the domain.Although this method does not completelyvalidate the power domain connections andmust be associated with other ASIC checksand verifications, we were able to spot func-tional errors in domain interconnections andreset generations, which we might not havedetected otherwise with the time-limitedASIC simulations.

The WiLD platform follows the AHBstandard, with four bus masters sharingaccess to peripherals such as memories andthe Advanced Peripheral Bus (APB) sub-system. We discovered that the critical partfor timing closure was the connection ofthe address and data lines. We overcamethis problem by mapping the CPU andmemories inside the FPGA, instead ofusing the external devices. For the latter wetook advantage of the large built-in mem-ory blocks of the Virtex-4 device, drastical-ly reducing our routing delays. TheSynplify Pro tool also helped, since it wasable to extract timing data from the mem-ory netlist files (EDN), rather than seeingthem only as black boxes.

Signal processing designed for an ASICis not easy to map to an FPGA, because itis optimized for area, with reuse of hard-ware modules on each clock cycle when thedata rate is slower than the clock. FPGAsynthesis goes around this constraint usingretiming and logic duplication. We usedthe Xilinx PlanAhead™ floor-planningtool to constrain critical blocks of theOFDM modem to a given FPGA zone inorder to reduce the routing delays insideand between these blocks.

External InterfacesThe WiLDSYS FPGA platform is flexibleand easy to adapt to changes in the exter-nal interfaces, thanks to the Virtex-4’s abil-ity to support different interface drivecharacteristics through its adapted pads.


RAM/ROM Utilization

Dual-port RAMs (RAM16x1D) 140

64x1 ROMs (ROM64X1) 12

256x1 ROMs (ROM256x1) 138

Number of block RAMs 211 of 336 62%

Logic Utilization

Number of slice flip-flops 52,943 out of 178,176 29%

Number of four-input LUTs 133,498 out of 178,176 74%

Logic Distribution

Number of occupied slices 83,217 out of 89,088 93%

Total number of four-input LUTs 137,384 out of 178,176 77%

Number of bonded IOBs 361 out of 960 37%

Number of BUFG/BUFGCTRLs 8 out of 32 25%

Number of FIFO16/RAMB16s 214 out of 336 63%

Number of DSP48s 17 out of 96 17%

Number of DCM_ADVs 1 out of 12 8%


Table 1 – WiLDSYS Virtex-4 LX 200 utilization report


We generated all interface signals with thecorrect electrical characteristics on theFPGA, without use of a single externalcomponent. We routed the interface sig-nals to header connectors so that all weneeded to do was to design the appropriatemechanical interface adapter.

In the original design we used CardBusas the host interface. We defined the FPGApads as “PCI 3.3V” and then linked themone-to-one to a CardBus connector. Whencustomers asked us to change the hostinterface to MII and SDIO, we leveragedthe Virtex-4’s adapted pads to also drivethese interfaces directly. We didn’t have tochange anything on the WiLDSYS boarditself. We simply designed new mechanicalconnector adapters for MII and for SDIO,and the WiLDSYS board was ready to usein our customers’ projects.

The same applied to the RF interface.We first designed the WiLDSYS board toconnect to the Wipro-NewLogic radiothrough our custom High-Speed SerialInterface. It uses differential LVDS driversand the maximum clock speed is 240MHz. We were able to implement thisvery demanding connection completely onthe FPGA thanks to the Virtex-4’s built-inLVDS pads. We packed a flip-flop insidethose pads to match the tight 240-MHztiming constraint on this data path. Whenwe wanted to port our IP to an analog I/QRF connection, all we had to do was tomap the digital I and Q signals to standardpads and to another header connector,where we then plugged in a daughterboardwith ADCs and DACs.

FPGA and ASIC Co-VerificationWith the single database for both ASIC andFPGA, we were ready to start our FPGA-ASIC co-verification. Having just one hard-ware database meant we also had to maintainjust one version of our WLAN MAC soft-ware for both targets. We could set up theFPGA board to either station or access pointmode and have it interact with other WLANnodes that were set up using commerciallyavailable equipment. We started the FPGA-based verification by running the completeASIC system test plan, which covered allfunctional features, including many of thetest cases from the Wi-Fi Alliance. Theseincluded tests for bandwidth sharing,encryption and quality of service.

Whenever we detected a problem on theFPGA platform, we first investigated it usingFPGA-specific means. The WiLDSYS board’slogic analyzer connectors offer a much broad-er debug interface than the ASIC, which isoften limited in pin count. We utilized GPIOsignals to map diagnostic signals to these con-nectors and probed internal FPGA signalswith the help of the Xilinx FPGA Editor. Thetool opens a graphical view of the routedFPGA design; designers can pick any internalwire and route it to an unused pad. FPGAEditor directly modifies the FPGA bitmap,avoiding the time-consuming steps of map-ping and routing the design again.

By probing relevant signals, we were ableto pinpoint the source of the problem withenough precision to reproduce it in the ASICRTL simulation testbench and then correctit. All we had to do now was to run a newFPGA synthesis and create a new bitmap file.

We ran a final verification of the fix on theFPGA board by repeating the test case spe-cific to the bug as well as performing aselected number of regression tests.

We then took advantage of having theWLAN IP run at full air data rate andconnecting through a real radio to anoth-er WLAN node to add test cases for stresstesting, including overnight testing. Weacquired the equipment from the Wi-FiAlliance test bed to cover all the interop-erability testing. We achieved significant-ly higher test coverage and hencerobustness of both the WLAN hardwareand the software. The highlight of all thetesting was when we submitted the FPGAboard to an official Wi-Fi lab for certifi-cation, and we passed not only the base-line test plan for 11 a/b/g, but also theoptional testing for quality of service(WMM), 802.11d and 802.11h.

We received the final proof that ourFGPA-ASIC co-verification is yielding avery high confidence for a successful ASICdevelopment when we started to use theWiLDSYS board in customer projects. Weran a complete FPGA-based verificationthat included extensive system-level and Wi-Fi testing before tapeout and achieved sever-al first-silicon successes.

Ultimately, we had to overcome severaldesign challenges to achieve our goals ofrunning the WLAN IP at full air data rateout of an FPGA and to rely on the FPGAplatform for most of the system-level verifi-cation. The Xilinx Virtex-4 FPGA and theXilinx tools made all this possible. In theend we gained with the WiLDSYS board anextremely valuable tool, not only for ourinternal development program, but also forcustomer projects.

This positive experience has led us touse the same Virtex-4 board with differentdaughterboards to develop our Bluetooth2.1EDR IP. And since there is stilldemand for 802.11 a/b/g, we are nowlooking into fitting the IP into a XilinxSpartan-6, since this new family is offer-ing the needed resources at a much lowercost than the Virtex-4. This opens a newmarket for us, since running wireless LANout of an FPGA will now become com-mercially feasible.


Figure 2 – A Virtex-4 FPGA sits at the heart of the WiLDSYS board.

by Leith JohnsonManaging MemberNokhu Systems [email protected]

Vendors of high-speed double-data-rate(DDR) SDRAMs commonly specify theirdevices by their peak data transfer rate. Forexample, if a given vendor dubs a device aDDR3-1600, it means the vendor hasspecified the SDRAM component willtransfer data at a peak rate of 1,600 mega-transfers per second.

While the devices can indeed reach theirspecified transfer rate, in practice they can’tsustain it for real-world workloads. That’sbecause row-address conflicts, data-busturnaround penalties and write recovery allcan degrade the device’s peak transfer rate.

To make matters worse, the negativeimpact from these assorted degrada-tions has increased in lockstep witheach new generation of fasterSDRAM. For this reason, memory

controllers employing simple in-orderscheduling algorithms will likely achieve

sustained throughput that’s substantiallybelow the specified peak. However, more-advanced reordering scheduling techniquescan overcome these degrading factors,ensuring that your memory will deliverexcellent sustained transfer rates for real-world applications.


Improving DDR SDRAM Efficiency with a Reordering ControllerVirtex-6 memory controller achieves excellent sustained transfer rates for real-world workloads.

XPERTS CORNER

With the Virtex®-6, Xilinx® has intro-duced the first reordering DDR SDRAMmemory controller optimized for FPGAs.The new controller enables Virtex-6 usersto capitalize on the improved capacity, per-formance and power efficiency of the lat-est-generation DDR SDRAM technology.

On the surface, a DDR SDRAM com-ponent is simply a read-write memory. Inreality, however, modern DDR SDRAMsare complex devices. DDR SDRAM con-trollers must generate very precise sequencesof addresses, commands and data whileobserving a myriad of timing requirements.High performance requires commandpipelining at the minimum allowed timings.

Ins and Outs of DDR SDRAM What is the nature of the degradations thataffect memory performance, and why doesincreasing the peak transfer rate exacerbatetheir impact?

As illustrated in Figure 1, a basic DDRSDRAM access has the memory controllersending a row address with the activatecommand to the memory, waiting for theRAS-to-CAS delay interval (defined as thenumber of clock cycles between the rowand column address strobes) and thensending the column address and either aread or a write command. Completing the

throughput will be relatively low, very sig-nificantly less than the peak transfer rate.

DDR SDRAMs are “banked,” or bro-ken into some number of equal-size, quasi-independent sections. DDR3 DRAMshave eight banks. Accesses to differentbanks may be overlapped. For example, ifthe workload contains a read to bank 1 fol-lowed immediately by a read to bank 2,the memory controller is allowed to sendthe row address and activate to bank 1,then the row address and activate to bank2. After the RAS-to-CAS delay interval,the controller can then send the columnaddress and command for the first readfollowed by the column address and com-mand for the second read, wait for theCAS latency and then transfer two seam-less data bursts. Given a workload thatsequentially steps through the banks, this

request requires waiting for the CAS-latency interval and then sampling datafor reads or supplying data for writes.When the data transfer is complete, thecontroller issues a precharge command toclose the active row. Following theprecharge interval, the controller mayissue another activate command.

After that, it may issue multiple read orwrite commands without a precharge-acti-vate sequence. This is commonly referredto as fast-page-mode access.

Fast page mode is very efficient, since itavoids the time-consuming and power-hun-gry activate-precharge sequence, but theaccesses must be to the same row address. Ifthe workload contains a pattern of accessesto different row addresses, then the activate-precharge sequence must take placebetween each access. In this case, sustained


XPERTS CORNER

ACT

B/RA

RD

B/CA

PRE

B/XBANK/ADDRESS

COMMAND

DATA

CK

ACT

0/ 0

RD

0/ 0

PRE

0/XBANK/ADDRESS

COMMAND ACT

0/ 1

RD

0/ 0

PRE

0/X

ACT

1/ 0

RD

1/ 0

PRE

1/X

DATA

ACT

0/ 0

RD

0/ 0

PRE

0/XBANK/ADDRESS

COMMAND ACT

0/ 1

RD

0/ 0

PRE

0/X

ACT

1/ 0

RD

1/ 0

PRE

1/X

DATA

CK

IN ORDER

OUT OF ORDER

Request order :

1. Read to bank 0, row 0, column 0



Figure 1 — Basic DDR DRAM read cycle

Figure 2 – Reordering to avoid page conflict

process may be extended in such a way asto sustain the peak transfer rate.

The best way to achieve good DRAMperformance is to manipulate the work-load such that it falls into these fast-pageor rolling-bank modes. Often, however,this is not possible. Processors typicallygenerate quasi-random workloads that arenot easily manipulated in this way.

If the workload contains adjacentaccesses to different row addresses on thesame bank (row-address conflict), fol-lowed by an access to a different bank, asimple in-order memory controller willserialize all three accesses, incurring a sub-stantial efficiency penalty due to theprecharge-activate for the second access.In contrast, a reordering memory con-troller is able to send the activate for thefirst access and then the activate for thethird access, overlapping these accessesand thereby improving efficiency.

Figure 2 illustrates this concept. Thepattern is shown executed in order and

then with the second request moved infront of the third. As you can see, thereordered sequence completes before thein-order sequence.

In addition to overcoming much of thedegradation associated with row-addressconflicts, a reordering memory controllercan also address degradation due to writerecovery and bus turnaround. At the end ofa write cycle, internally the DRAM is busyactually writing the data into the array.While adjacent write accesses can proceedat peak rate, a write access followed by aread access must wait for the write to com-plete in the DRAM array. This gives rise tothe write-recovery specification.

DDR SDRAMs utilize a bidirectionalshared-bus structure. This bus has thecontroller on one end and typically two tofour dual in-line memory modules(DIMMs) on the other end. The electricallength of this bus is about 5 inches. Thistranslates to a bus propagation time ofapproximately 1 nanosecond. Whenever a

bus driver stops driving, it propagates aglitch on the bus. Nominally, two buspropagation times are required for thisglitch to settle out.

Built into the DDR SDRAM protocol isa data preamble consisting of one DDRSDRAM clock or two unit intervals. Data isnot transferred during the preamble. Thepreamble can be suppressed if the same driv-er drives the data for two adjacent accesses.In fact, the preamble must be suppressed ifthe peak transfer rate is to be achieved.

The performance impact of thesewrite-recovery and data-bus degradationscan be significant. A simple in-ordermemory controller is at the mercy of theworkload. Workloads with alternatingread-write patterns will exhibit very sub-stantially less than peak transfer rates.

But a reordering memory controller isable to group accesses based on the busdriver. In a simple single-rank configura-tion, it groups reads and writes togetherand issues them in bursts. In a multirank


XPERTS CORNER

RD

0/ 0BANK/ADDRESS

COMMAND WR

1/ 0

DATA

BANK/ ADDRESS

COMMAND

DATA

CK

IN ORDER

OUT OF ORDER

Request order :


2. Write to bank 1, row 0, column 0


4. Write to bank 1, row 0, column 1

RD

0/ 1

WR

1/ 1

RD

0/0

WR

1/0

RD

0/1

WR

1/1

Activates and precharges not shown .

Figure 3 – Reordering to avoid write-recovery and bus-turnaround penalty

In addition to overcoming much of the degradation associated with row-address conflicts, a reordering memory controller can also deal with

degradation due to write recovery and bus turnaround.


configuration, it may organize accessessuch that reads for each rank are groupedand issued in a burst.

Figure 3 illustrates the impact ofrequest reordering to avoid bus-turn-around and write-recovery penalties. Inthis example, sequential reads to bank 0are interleaved with a sequential stream ofwrites to bank 1. Two DMA machinesreading and writing to memory could veryeasily generate this workload.

The in-order case loses considerablebus bandwidth with each changeoverfrom read to write and write to read. Thereordered case can avoid these penalties toattain significantly higher throughput.

Transaction reordering always raisesthe issues of data corruption and starva-tion. Promoting a write in front of a readto the same address corrupts the contentsof a memory and must be prevented.Endlessly deferring a read access leads tostarvation and must also be avoided.

Reordered data returning from thememory controller may be an issue forsome applications. It’s easy to rectify thisproblem by fitting the memory controller

with a reorder-back-to-issue order buffer.For some corner cases, reordering may

increase latency for some read accesses.However, since reordering improves effi-ciency and therefore reduces memory con-troller occupancy, the result is to improveaverage read latency.

Table 1 illustrates the progression ofDDR SDRAM technology. Some interest-ing specifications are provided for DDR1,DDR2 and DDR3. The speed grades cho-sen are intended to represent midlife,high-volume production.

The transfer rates are increasing geo-metrically, while the core device character-istics such as RAS cycle are improving at amuch slower rate or, in the case of writerecovery, not at all. The relative penaltynormalized to unit interval increases dra-matically with each generation.

Other DDR SDRAM parameters havesimilar scaling issues. The above threewere chosen for illustrative purposes.

In-Order and Reordering ModesLet’s look at two example workloads thatwill demonstrate the capability of the newVirtex-6 DDR SDRAM memory con-troller. Because this device is able to operatein both in-order and reordering modes, theimpact of the reordering is readily visible.

The first workload models two masters.One is a CPU executing a code sequencethat is stepping through a matrix. The“stride” of this stepping through thematrix generates a memory request patternthat targets a single bank, but increments

the row address with each request. This so-called “unfortunate stride” creates a streamof read requests to a single DRAM bank,but different row addresses. The othermaster is a DMA engine generating a sim-ple sequential stream of read accesses to adifferent DRAM bank.

The in-order and reordering cases willactually execute a different overall streamof requests. The row cycle time of theDRAM limits the CPU stream comple-tion rate. Meanwhile, the DMA workloadcan proceed independently. For the in-order case, the DMA stream becomes seri-alized with the CPU stream. For thereordering case, the DMA stream fills inthe otherwise unused DQ bus cycles withuseful work. Since the throughput of theCPU stream is limited by the DRAM rowcycle time, it won’t improve much.However, the DMA stream will execute ata higher rate when reordering is enabled.

The second workload models two DMAengines, one generating a sequential streamof reads, the other a sequential stream ofwrites. It is assumed that the memory con-troller receives the read and write requestson an alternating basis. To avoid other con-flicts, all reads target bank 0, row 0, and allwrites target bank 1, row 0.

We ran each workload with reorderingturned off, then reran them with reorder-ing turned on. We used default memorycontroller parameters, with the exceptionof turning off refresh, ZQ and otherDRAM maintenance functions.

Table 2 summarizes the efficiency,measured over a period of steady-stateoperation, for the two example workloads.We computed efficiency as the number ofunit intervals containing payload dividedby the total number of unit intervals inthe period. When reordering is turned on,the memory controller’s efficiencyimproves significantly.

The trend of nonscaling DDRSDRAM parameters is expected to inten-sify as peak data rates continue toincrease. Future memory controllers willneed to implement continuously improv-ing scheduling algorithms to extract theperformance potential of future DDRSDRAM generations.

XPERTS CORNER

DDR1-6 DDR2-25 DDR3-15E

Peak transfer rate (Megatransfers/s) 333 667 1333

Unit intervals (nanoseconds) 3 1.5 0.75

ns UIs ns UIs ns UIs

RAS cycle 60 20 55 37 49.5 66

Write recovery 15 5 15 10 15 20

2x tBUS_PROP 2.0 0.67 2.0 1.3 2.0 2.6

In-order Reordered

CPU-DMA streams 36% 76%

Alternating reads/writes 25% 63%

Table 1 – DDR SDRAM transfer rates are increasing geometrically; core device characteristics are not.

Table 2 – In two example workloads, memory controller efficiency improves when

reordering is turned on.

by Barry Dagan, P.E.Chief Technology OfficerCool Innovations [email protected]

In recent years, the functionality of cutting-edgeFPGAs has skyrocketed to levels never thought pos-sible. Unfortunately, the flip side of this progress isan increase in heat dissipation. As a result, designersneed more-effective heat sinks to provide sufficientcooling for these integrated circuits.

In response to this need, thermal-managementsuppliers have introduced a variety of high-perform-ance heat sink designs that provide greater cooling ina given volume. One of the more prominent tech-nologies to debut in recent years is the splayed-pinfin heat sink. Originally designed to cool FPGAs,these devices have several characteristics that makethem especially well-suited for use in commonFPGA environments.


Splayed Design Makes Pin Fin Heat Sinks Cooler Splayed Design Makes Pin Fin Heat Sinks Cooler New configuration brings in more air to keep up with FPGA cooling demands.New configuration brings in more air to keep up with FPGA cooling demands.

XP LANAT ION:FPGA 101

Better Cooling and Airflow ManagementSplayed-pin fin heat sinks consist of anarray of round pins embedded in a squareor rectangular base. The pins, which serveas the heat sink’s fins, are slanted out-ward, as shown in Figure 1. Due to theirunique physical structure, splayed-pin finheat sinks produce unprecedented cool-ing in the moderate- and low-airspeedenvironments for which they were opti-mized. Copper and aluminum versionshave footprints ranging from 0.54 x 0.54inch to 2.05 x 2.05 inches and may standless than a half inch to just over an inchhigh. These dimensions accommodateFPGAs of all sizes.

Splayed-pin fin heat sinks are a deriva-tive of traditional pin fin heat sinks, whichfeature vertical rather than slanted pins(Figure 2). To grasp the cooling character-istics of the splayed-pin variant, it’s usefulto start by considering the cooling attrib-utes of traditional pin fin heat sinks. Thelatter provide outstanding cooling per-formance, as indicated by their low thermalresistance. The thermal-resistance value,stated in degrees Centigrade per watt(°C/W), quantifies the temperature rise in°C (above ambient) for each watt thedevice dissipates.

Several characteristics are responsible forthe low thermal resistance of traditionalpin fin heat sinks. The round shape of thepins, the omnidirectional structure of thepin array and its large surface area, and thehigh thermal conductivity of the base andpins all contribute to the heat sink’s per-formance. Round pins offer less resistanceto incoming airstreams than square or rec-tangular fins. That, combined with theomnidirectional structure of the pin array,enables surrounding airstreams to easilyenter and exit the pin array.

The turbulence created when theairstream strikes the round pins furtherincreases the airflow. As a result of the lowresistance to airflow and turbulence with-in the pin array, the large surface area ofthe heat sink is exposed to a significantvolume of airflow.

The highly conductive alloys that areused to fabricate pin fin heat sinks alsocontribute to performance. Both tradition-

pin/fin density that is inherent in heatsink design.

For a heat sink to provide substantialcooling, it must possess sufficient surfacearea. Otherwise, with very limited surfacearea, the heat sink will not be able to dissi-pate much heat. At the same time, the moresurface area the heat sink features (that is,the more pins or fins it contains), the hard-er it will be for surrounding airflows toenter the pin array. And unfortunately,without the exposure to surrounding air-

al and splayed-pin fins are manufacturedin a forging process that permits use of theAL1100 and CDA110 alloys, which fea-ture higher thermal conductivities thanalloys used in other styles of heat sinks.

Splayed pins take the cooling perform-ance of traditional pin fins to an evenhigher level by increasing the spacingbetween the pins. To understand theimpact of the increased spacing on heatsink performance, we have to considerthe conflict between surface area and


Figure 1 – Splayed copper pin fin heat sink

Figure 2 – Traditional aluminum pin fin heat sink


flows, a heat sink will not be effective,regardless of its total surface area.

So, increasing surface area by packingfins closer together aids a heat sink’s cool-ing performance. Paradoxically, however, italso detracts from its performance byreducing the airflow. This is the inherentconflict suppliers face in creating any heatsink with vertically oriented fins.

But splayed-pin fins overcome this sur-face area-vs.-density conflict by bending thepins outward. This technique increases thespacing between the pins substantially for agiven footprint without giving up surfacearea. As a result, surrounding airstreams canenter and exit the pin array much more eas-ily. The surfaces of the heat sink are exposedto faster airspeeds and the amount of cool-ing the heat sink provides is much greater.This improvement is most beneficial atlower airspeeds, because the lower the air-speed, the more difficult it is for the sur-rounding air to enter a heat sink pin array.So the performance premium afforded bysplayed-pin fin heat sinks will be greatest inlow-airspeed environments.

An additional advantage of the splayed-pin design is its low pressure drop.Widening the gap between pins allows airto pass through the heat sink more easily. Sothe speed of the air exiting the heat sink iscloser to the speed of the air entering, whencompared with heat sinks on which pins arespaced more closely together. The impact ofpressure drop is especially important inboards that contain a large number of heatsinks and other components—a low pres-sure drop means more airflow is available tocool devices downstream from the fan.

We conducted two experiments to illus-trate the advantage designers can gain byswitching to the splayed-pin fin design.Both experiments compare splayed and tra-ditional pin fin heat sinks that feature thesame footprint, height, pin count, surfacearea and metallurgy. When looking at theresults, it’s important to note that tradi-tional pin fins are highly powerful by

nature and, on their own, are consideredone of the most efficient heat sink tech-nologies available today.

Experiment 1: Cooling a Single Advanced FPGA The power or heat dissipated by a givenFPGA is positively correlated with thenumber of gates it has. Naturally, themore gates in the FPGA, the more heat itwill dissipate. For our first experiment,we chose an advanced FPGA device thatdissipates 30 W in a 42.5-mm2 package.First, we powered up this FPGA with atraditional pin fin heat sink and took

temperature measurements to determinethe heat sink’s thermal resistance. Thenwe changed to a splayed-pin fin heat sink,repeated the measurements and recalcu-lated thermal resistance. Both heat sinksfeatured a footprint of 2.05 x 2.05 inch-es, a height of 0.6 inch and were com-

posed of the highly conductive AL1100aluminum alloy.

We ran this experiment three times, eachtime at a different airspeed. In the first test,the approaching airspeed was a moderate 400linear feet per minute (LFM); in the secondtest, we reduced airspeed to 200 LFM (a lowairspeed); and in the third test, we dropped itto 100 LFM (a very low airspeed).

The results, shown in Table 1, demon-strate outstanding cooling performance forboth the splayed and traditional pin fins.Nevertheless, the switch to the splayed designproduced a substantial premium in cooling

performance in the lower-airspeed environ-ments. While at 400 LFM the splayed heatsink was only slightly better than the tradi-tional model, at 200 LFM it was already 14percent more effective. And at 100 LFM, theimprovement was a dramatic 24 percentreduction in thermal resistance. These results


Heat sink style Thermal resistance (°C/W)

Heat sink 1 Heat sink 2 Heat sink 3

Traditional pin fin heat sink (model #3-202011U) 0.47 0.67 0.73

Splayed-pin fin heat sink (model #3-202011P) 0.44 0.52 0.58

Reduction in thermal resistance with splayed-pin fin 6.5% 29% 26%

Heat sink style Thermal resistance (°C/W)

@100 LFM @200 LFM @400 LFM

Traditional pin fin heat sink (model #3-202006U) 1.60 1.08 0.71

Splayed-pin fin heat sink (model #3-202006P) 1.29 0.95 0.68

Reduction in thermal resistance with splayed-pin fin 24% 14% 4%

Widening the gap between pins allows air to pass through more easily. So the speed of the air exiting the heat sink is closer to the speed of the air entering.


Table 2 – Heat sink performance when cooling three devices aligned with a fan producing 400 LFM of airflow

Table 1 – Heat sink performance when cooling a single FPGA dissipating 30 W at different airspeeds


demonstrate the increasing value of thesplayed-pin fin design at lower airspeeds.

Experiment 2: Cooling Multiple FPGAs Boards that contain a large number ofdevices present much more complex coolingchallenges than those equipped with a singleFPGA. That’s because the multidevice situa-tions require that surrounding airflows beshared among the devices on the board.When cooling multiple “hot” FPGAs, designengineers must consider not only the ther-mal resistance of the heat sink but also thepressure drop of each heat sink. The lowerthe pressure drop, the more air will be avail-able for cooling devices that are positioneddownstream from the air source.

Splayed-pin fins feature a lower pressuredrop than vertically constructed heat sinks,as there is more space for the air to enter andexit the pin array. To illustrate the dualadvantage (lower pressure drop and addedcooling) of splayed-pin fins in multi-FPGAenvironments, we constructed a simpleexperiment that involved putting three heatsinks on identical FPGAs placed in a rowbehind a fan. The fan pushed a given airflowand then we took the temperature measure-ments needed to determine heat sink ther-mal resistance. Each FPGA dissipated 30 W.We ran the experiment twice, once withthree traditional pin fins and then again withthree splayed-pin fins. The heat sinks weused measured 2.05 x 2.05 inches x 1.1 inchtall, and the fan was blowing 400 LFM in afree-air environment.

The results of this experiment demon-strate that the switch to splayed-pin fins on aboard with multiple heat sinks pays a hugedividend. The experiment showed a 26 to 29percent reduction in thermal resistance forthe second and third devices with the splayedheat sinks. This performance premium is aproduct of the lower thermal resistance of theheat sink and its lower pressure drop.

Looking ForwardAs heat loads dissipated by cutting-edgeFPGAs continue to escalate, designers willrequire even greater cooling performancefrom their heat sinks. In some cases, a passiveheat sink by itself will be insufficient anddesigners will be forced to adopt active heatsink solutions such as fan sinks, whichmount a fan directly on the heat sink. Intime, we can expect thermal-managementvendors to offer more and more fan sinksolutions.

One example of a new, high-performancefan sink solution is an integrated model thatembeds a fan within a very efficient pin finheat sink (Figure 3). Taking advantage of theadded turbulence caused by the round pinsand the large surface area achieved by the pinarray, this integrated fan sink attains out-standing cooling performance in a very low-profile package that can fit into ATCA andPCI Express applications.

Until the fan sink comes into widespreadcommercial use, however, designers will gainan edge by using the splayed-pin fin heatsink in their FPGA-based designs.

Figure 3 – Integrated copper fan sink, the next step in cooling


Xcell Online

Visit

www.xilinx.com/publications/csi/subscribe.htm

and fill in your email address, country where you are located and your role and we’ll notify you when

a new issue of Xcell is available.


Want to be one of the first toread the latestissue of Xcell?

by Karsten TrottField Application EngineerXilinx GmbH (Munich, Germany)[email protected]

There are many algorithms in the world thatcan be changed into pure hardware to accel-erate your processor. An average standard-deviation algorithm, creation of minimum ormaximum values in a given time range, filtersand FFTs are typical algorithms that can bemoved into hardware without a big effort.But even more-unusual algorithms such asbit reversing can make the transition with theright hardware accelerator.

Xilinx’s flexible embedded systems make iteasy to attach hardware accelerators to anFPGA-based solution. Given the high com-putational power of FPGAs, such a systemcan easily outperform any standard processor,controller or DSP.


Supercharge MicroBlaze Processingwith Hardware AccelerationSupercharge MicroBlaze Processingwith Hardware AccelerationDeft design around MicroBlaze core creates powerful systems that can outperform standard controllers or DSPs.

Deft design around MicroBlaze core creates powerful systems that can outperform standard controllers or DSPs.

ASK FAE -X

The MicroBlaze™ processor, one ofthe two 32-bit cores Xilinx provides in theEmbedded Development Kit (EDK), is anapt vehicle for implementing hardwareacceleration. A typical MicroBlaze designlooks like the one in Figure 1—the coreitself contains a 32-bit multiplier, but nofloating-point unit (FPU), barrel shifter orspecial hardware accelerators. ForSpartan® FPGA devices, the default sys-tem is set to contain an area-optimizedversion of the MicroBlaze (using a three-stage pipeline), but most customers willoften start with the speed-optimized ver-sion (with a five-stage pipeline) to do per-formance evaluations. It is small andsimple, but can easily be extended.

A couple of real-world examples that ourcustomers were requesting on that processordesign will show the power of theMicroBlaze route to hardware acceleration.We will focus on Spartan devices to demon-strate that we can achieve price/performanceeven if we compare FPGA solutions to stan-dard controller cores. But the same tech-niques apply to Virtex® FPGAs as well.

Implementing a Bit-Reverse Algorithm For the first example application, let’sassume that the MicroBlaze processor isrunning at only 50 MHz. This is quiteeasy to achieve in any Spartan-3 orSpartan-6 device. All internal buses, suchas local memory buses (instruction anddata LMB) as well as the processor local

algorithm for a single 32-bit word on aMicroBlaze that was optimized for speed(using the five-stage pipeline). To executethe 20,000 required operations tookapproximately 88 milliseconds on our 50-MHz MicroBlaze.

The customer now tried to optimize thealgorithm by using a slightly differentapproach, still implemented as a pure soft-ware solution:

unsigned int v=value;

unsigned int r=0;

if (v & 0x00000001) r |= 0x80000000;

if (v & 0x00000002) r |= 0x40000000;

if (v & 0x00000004) r |= 0x20000000;

if (v & 0x00000008) r |= 0x10000000;

if (v & 0x00000010) r |= 0x08000000;

if (v & 0x00000020) r |= 0x04000000;

if (v & 0x00000040) r |= 0x02000000;

if (v & 0x00000080) r |= 0x01000000;

if (v & 0x00000100) r |= 0x00800000;

if (v & 0x00000200) r |= 0x00400000;

if (v & 0x00000400) r |= 0x00200000;

if (v & 0x00000800) r |= 0x00100000;

if (v & 0x00001000) r |= 0x00080000;

if (v & 0x00002000) r |= 0x00040000;

if (v & 0x00004000) r |= 0x00020000;

if (v & 0x00008000) r |= 0x00010000;

if (v & 0x00010000) r |= 0x00008000;

if (v & 0x00020000) r |= 0x00004000;

if (v & 0x00040000) r |= 0x00002000;

if (v & 0x00080000) r |= 0x00001000;

if (v & 0x00100000) r |= 0x00000800;

if (v & 0x00200000) r |= 0x00000400;

if (v & 0x00400000) r |= 0x00000200;

if (v & 0x00800000) r |= 0x00000100;

if (v & 0x01000000) r |= 0x00000080;

if (v & 0x02000000) r |= 0x00000040;

if (v & 0x04000000) r |= 0x00000020;

if (v & 0x08000000) r |= 0x00000010;

if (v & 0x10000000) r |= 0x00000008;

if (v & 0x20000000) r |= 0x00000004;

if (v & 0x40000000) r |= 0x00000002;

if (v & 0x80000000) r |= 0x00000001;

return r;

bus (PLB), are likewise running at 50MHz. For simplicity’s sake, let’s assumeno external DDR memory is attached.

Now imagine a customer who wants toimplement a bit-reverse algorithm on thatCPU. The MicroBlaze itself does not pro-vide this functionality directly by hardware.Let’s assume further that the bit reversinghas to be done 20,000 times per second.

To solve this problem, most customerswill start first with a pure software imple-mentation, since that’s the easiest way toachieve the desired functionality. And if theperformance is sufficient, then there is noneed to change anything.

To that end, let’s start with a small andsimple solution implemented as a simplesoftware algorithm. It turns out to be small,nice and easy to understand—but quiteinefficient:

unsigned int v=value;

unsigned int r = v;

int s = sizeof(v) * CHAR_BIT - 1;

for (v >>= 1; v; v >>= 1)

{

r <<= 1;

r |= v & 1;

s--;

}

r <<= s;

return r;

This implementation worked quitewell, but it took 220 cycles to run the


ASK FAE -X

Figure 1 – Typical MicroBlaze design using only embedded memory

That code is much longer, but it is moreefficient. Code execution on the standardMicroBlaze took 130 cycles for a single 32-bit word. This is quite an improvement,but still too long. The whole executiontime for the 20,000 operations is nowroughly 52 ms.

Then the customer did some researchon the Internet to find a better algorithmand came upon this one:

unsigned x = value;

unsigned r;

x = (((x & 0xaaaaaaaa) >> 1) | ((x

& 0x55555555) << 1));

x = (((x & 0xcccccccc) >> 2) | ((x

& 0x33333333) << 2));

x = (((x & 0xf0f0f0f0) >> 4) | ((x

& 0x0f0f0f0f) << 4));

x = (((x & 0xff00ff00) >> 8) | ((x

& 0x00ff00ff) << 8));

r = ((x >> 16) | (x << 16));

return r;

This code looks quite efficient and evensmall. And it does not need any branchingthat might break the pipelining. It runs onthe core system with only 29 cycles.

However, the algorithm makes use ofshift operations of 1, 2, 4, 8 and 16 bits.Let’s activate the barrel shifter in the prop-erty window of the MicroBlaze, as seen inFigure 2. The barrel shifter allows us toexecute shift instructions in a single cycle,

independent of the length of the shiftoperation. It makes it possible to run thepure software algorithm slightly faster onour MicroBlaze.

Activating the barrel shifter in theMicroBlaze hardware reduces the numberof cycles it takes to process the algorithmto 22. This is quite an improvement fromthe first version of the software algo-rithm. It now takes the algorithm onlyabout 8.8 ms to run the whole 20,000times, a tenfold improvement—but stillnot enough for the customer.

This is where the pure software solution,even if supported by parameter changes ofthe MicroBlaze core, runs out of steam. It’sthe point when a pure hardware solutionneeds to kick in, via a new approach thatmakes full use of hardware accelerators.

To accelerate such a basic operation,only a very simple core need be attached toa MicroBlaze Fast Simplex Link (FSL). Thestandard FSL implementation uses an FSLbus, including synchronous or asynchro-nous FIFOs, to transmit data from theMicroBlaze core to the FSL hardware accel-erator intellectual property (IP). The FSLbus with the FIFOs decouples the accessesfrom both sides.

If the standard implementation with anFSL bus including FIFOs is used, then thetypical execution time is four cycles: onecycle to write the data from the MicroBlazeto the FSL bus into the FIFO, one cycle totransfer the data from that FIFO to the FSLIP, one cycle to transfer the result back fromthe FSL IP to the FSL bus FIFO and onecycle to read the result from the FSL businto the MicroBlaze.

The connection from the MicroBlaze tothe FSL bus and from the FSL bus to theFSL IP is easy to create in the graphicalview of the EDK, as seen in Figure 3.

There is still room for improvement,however. The latency in the algorithm is crit-


ASK FAE -X

Figure 2 – Activation of the barrel shifter in the MicroBlaze properties

Figure 3 – Connecting FSL hardware using the FSL bus


ical and should be as small as possible. Butour implementation with the two FSL busesis still requiring four clock cycles. We canhalve the latency, to only two clock cycles,by changing this connection to a direct linkbetween the MicroBlaze and the hardwareaccelerator. Now it takes just one cycle towrite the data to the FSL hardware accelera-tor IP and one cycle to read the result back.

There are a few things to take care ofwhen using direct links. First, thecoprocessor IP should store the inputsand provide the results in a registeredway. Note that there is no FSL bus withFIFO to handle that operation.

Moreover, running the MicroBlaze andthe FSL IP at different clock speeds is quitetricky in this case. To avoid conflicts,designers would be best served by runningthe two at the same speed.

But how to connect the MicroBlaze andthe FSL IP directly without an FSL bus inbetween? That’s quite easy to do—just con-nect the data lines of the MicroBlaze andthe hardware accelerator. Then attach thehandshaking signals when required.

For the example with the bit-reversingIP, only a write signal is required—the IP isalways fast enough to react to any requestsfrom the MicroBlaze.

The IP itself is quite simple. Here is anextract of the VHDL code:

architecture behavioral of

fsl_bitrev is

-- data value sent by microblaze:

signal data_value :

std_logic_vector(0 to 31) := (oth-

ers=>'0');

begin

-- bitreversed value to write

back:

FSL_M_Data <= data_value;

process(FSL_Clk)

begin

if rising_edge(FSL_CLK) then

if (FSL_S_Exists = '1') then

-- create the bitreversed data:

data_value(0) <= FSL_S_Data(31);



...



end if;

end if;

end process;

end architecture behavioral;

To attach that IP without any FSL bus inbetween, you must make the followingchanges to your project’s MHS file:

BEGIN microblaze

...

PARAMETER C_FSL_LINKS = 1

...

PORT FSL0_S_EXISTS = net_vcc

PORT FSL0_S_DATA = FSL0_S_DATA

PORT FSL0_M_DATA = FSL0_M_DATA

PORT FSL0_M_WRITE = FSL0_M_EXISTS

PORT FSL0_M_Full = net_gnd

END

BEGIN fsl_bitrev

PARAMETER INSTANCE = fsl_bitrev_0

PARAMETER HW_VER = 1.00.a

PORT FSL_S_DATA = FSL0_M_DATA

PORT FSL_S_EXISTS = FSL0_M_EXISTS

PORT FSL_M_Data = FSL0_S_DATA

PORT FSL_M_Full = net_gnd

PORT FSL_Clk = clk_50_0000MHz

END

Now the game has changed dramati-cally. The hardware core will execute thebit-reverse operation in only two cycles—one cycle to write data to the IP, one cycleto read it back. The processing of all20,000 bit-reverse operations now takesonly 0.8 ms.

This is an improvement over the very firstalgorithm of 110 times. To put it anotherway, it would require a processor running atabout 5.5 GHz to run the application in thesame amount of time using the first algo-rithm—quite impressive by any measure.

Comparing the approach with the last,most effective software algorithm, the sys-tem is still 11 times better. Otherwise, itwould require a CPU running at 550 MHzto execute the same algorithm in pure soft-ware (assuming you have a barrel shifter inyour CPU).

This example, of course, is valid only ifyour CPU does not provide bit-reverseaddressing at all. Most DSPs have that func-tionality, but virtually no microcontrollersdo. And having the capability to add suchfunctionality can dramatically speed up theprocessing of such an algorithm.

The change was small, but quite effi-cient. We even reduced the code size toonly two words. Of course, some addition-al slices are required for the hardware now.But this trade-off was worth the speedup ina solution that outperforms any standardmicrocontroller.

Speedy Floating-Point PerformanceIn another example of the power ofMicroBlaze acceleration for algorithms, acustomer claimed that his floating-pointprocessing was running extremely slowly onthe MicroBlaze system. The algorithm hada simple loop to create a few results at once.

for (i=0;i<512;i++) {

f_sum += farr[i];

ASK FAE -X

The hardware core will execute the bit-reverse operation in only

two cycles—one cycle to write data to the IP, one cycle to read it back.

The processing of all 20,000 bit-reverse operations now takes only 0.8 ms.

f_sum_prod += farr[i] * farr[i];

f_sum_tprod += farr[i] *

farr[i] * farr[i];

f_sqrt + =

sqrt(farr[i]);

if (min_f > farr[i]) { min_f =

farr[i]; }

if (max_f < farr[i]) { max_f =

farr[i]; }

}

All values were single-precision float-ing-point values. The first idea that cameto mind was the most basic question ofall: Is the floating-point unit activated?Examining the project settings, we foundout that the FPU was still disabled. That’swhy it took forever to calculate just thesefew numbers. Activation of the FPU takesplaces inside the MicroBlaze property set-tings (see Figure 4).

There are two versions of the FPUsupport. We chose Extended FPU to sup-port the sqrt() operation too. The wholeloop for 512 values took now 1,108,685cycles on our 50-MHz MicroBlaze. Just aquick look into the generated assemblercode showed that the math-lib function-ality was still being used to create thesquare root. That is defined in the mathfunctionality as:

double sqrt(double);

But the customer used the square-rootfunction only to process floating-point val-ues. For that reason the MicroBlaze FPUdefined a new function to work aroundthat function, resolving the issue:

float sqrtf(float);

Changing the line f_sqrt +=sqrt(farr[i]); to f_sqrt += sqrtf(farr[i]);

caused the usage of the FPU internalsquare-root functionality inside theMicroBlaze. Now the code ran in only35,336 cycles. Once again, we achieved

this improvement of 31 times by means ofonly small changes, especially comparedwith the first implementation, which didnot use the FPU at all. You would need a


ASK FAE -X

Sometimes small changes can have a dramatic e f fec t in the processing of your algori thm. And implement ing those

small changes can help your 50-MHz MicroBlaze system compete against a high-performance DSP.

Figure 4 – Activation of extended FPU support

Figure 5 – System using a parallel floating-point accelerator


CPU running at roughly 1.5 GHz to createthose results in the same execution time.

But the customer was still not satis-fied; he was looking for even more speed.In this case, changing the algorithm fromfloating- to fixed-point arithmetic wasnot a suitable option. So instead, we cre-ated a new, special hardware accelerator(new FSL IP) to speed up the processingof the loop.

The new FSL IP utilizes the COREGenerator™ module floating-point_v4_0to create nine instances for the operations4x ADD, 2x MUL, 1x GREATER, 1xLESS and 1x SQRT. All of them areinstantiated and working fully in parallelon the same input data (Figure 5).

The instances in the FSL IP are creat-ed with some internal latency, but athroughput of 1. This required someadditional slices for the controller hard-ware inside the accelerator, but makes itpossible to feed the coprocessor with newdata every clock cycle.

Only at the end of the processing loopare some additional cycles required beforethe results can be retrieved.

We connected the MicroBlaze to theFSL IP using a direct link—FIFO wasnot required. All data that is transferredwill be buffered inside the IP andprocessed immediately afterward.

The link from the FSL IP back to theMicroBlaze was created using an FSL bus.This was easier to achieve, because wehave to send back some results—andthat’s simpler to do inside the IP. Some ofthe CoreGen modules have some latencythat is being added to the execution timeand fully covered by the getfsl() call. TheMicroBlaze just waits until all results arestored into the FSL bus FIFOs. But this isno problem as long as the data rate is 1 toachieve the required throughput.

The additional delay of the FSL busdoes not cost too much (a few cyclesonly). The C code to use the FSL IP lookslike this:

for (i=0;i<512;i++) {

putfsl(farr[i],fsl0_id);

}

// get the min,max values:

getfsl(min_f,fsl0_id);

getfsl(max_f,fsl0_id);

// get the sum and products:

getfsl(f_sum,fsl0_id);

getfsl(f_sum_prod,fsl0_id);

getfsl(f_sum_tprod,fsl0_id);

getfsl(f_sqrt,fsl0_id);

The final implementation of that algo-rithm required only about 4,630 cycles.And it is still a full floating-point imple-mentation.

The hardware calculates all results inparallel at the cost of additional slices thatare used to implement the hardwareaccelerator. But in the end, we gain animprovement of factor of about 7.6 com-pared with the extended FPU implemen-tation. Otherwise, replacing this 50-MHzprocessor with a standard processorwould require a CPU running at roughly380 MHz (assuming it had a floating-point sqrt function in hardware).

Even more dramatic is the compari-son to the original version utilizing theFPU, but with the sqrt() function used:an overall improvement of approximate-ly 239 times. This would require a float-ing-point processor running at roughly12 GHz.

As these examples show, sometimessmall changes can have a dramatic effectin the processing of your algorithm. Andimplementing them can help your 50-MHz MicroBlaze system compete againsta high-performance DSP.

First, identify the core algorithm thattakes too long to execute, then try to speedit up—either by simple changes in soft-ware, by the use of hardware or by morecomplex changes using hardware accelera-tors. And your processor system will alwaysoutperform any standard controller.

Karsten Trott is a Xilinx FAE in Munich,

Germany. He holds a PhD in analog chip

design and has a strong background in chip

design and synthesis. You can reach him at

[email protected].

ASK FAE -X

GetPublished

Would you like to be published

in XcellPublications?

It's easier than you think!

Submit an article draft for our Web-based

or printed Xcell Publications and we will

assign an editor and a graphic artist

to work with you to make your work

look as good as possible.

For more information on this

exciting and highly rewarding program,

please contact:

Mike Santarini

Publisher, Xcell Publications

[email protected]

XAPP864: SEU Strategies for Virtex-5 Deviceshttp://www.xilinx.com/support/documentation/application_notes/xapp864.pdf

Single-event upsets have the potential to affect most digital elec-tronic circuits. Xilinx takes the issue of SEUs seriously, and designsits devices to have an inherently low susceptibility to these radiation-caused events. Because Xilinx also recognizes that SEUs are unavoid-able within commercial and practical constraints, the companyprovides built-in SEU detection in the Virtex®-5 and ExtendedSpartan®-3A families to simplify and improve the system design.

An application note by Ken Chapman and Les Jones discussesstrategies and representative calculations for handling SEUs with anemphasis on reliability when addressing these low-probability events.An accompanying reference design is optimized for use with theVirtex-5 FPGA ML505 evaluation platform, but can port to otherhardware. You can use its SEU controller macro in any Virtex-5 FPGAdesign to implement an SEU detection and correction scheme.

Due to the infrequent and unpredictable nature of real SEUs,small-scale testing of their effects is impractical, as is system verifica-tion. For this reason, the SEU controller macro and reference designcan emulate an SEU by deliberately injecting an error into the FPGAconfiguration so that its subsequent detection and correction can beconfirmed. You can also use the technique of injecting errors to assessSEU mitigation circuits implemented in a design.

XAPP1137: Linux Operating System Software Debugging Techniqueswith Xilinx Embedded Development Platformshttp://www.xilinx.com/support/documentation/application_notes/xapp1137.pdf

In this application note involving Linux issues, author Brian Hill dis-cusses debugging techniques for the Linux operating system, includingdebugging boot issues, kernel panics, software and hardware debug-gers, driver-application interaction and various other tools. This appli-cation note, which includes a reference system built for the XilinxML507 Rev A board, is best suited to users who are comfortable con-figuring, building and booting Linux on a Xilinx embedded platform.

XAPP1140: Embedded Platform Software and Hardware In-the-FieldUpgrade Using Linuxhttp://www.xilinx.com/support/documentation/application_notes/xapp1140.pdf

New features and bug fixes often necessitate upgrading flash images toreplace the existing FPGA bitstream, boot loader, Linux kernel or filesystem. It’s a challenge to provide a convenient mechanism withwhich end users can perform this task. This application note by BrianHill discusses an in-the-field upgrade of the Virtex-5 FXT bitstream,Linux kernel and loader flash images, using the presently runningLinux kernel. Upgrade files come from either a USB mass-storagedevice, using the XPS USB host core, or over the network, from anFTP server. The running Linux image performs the flash upgrade.The note includes a reference design and an example methodology.

XAPP1141: The Simple MicroBlaze Microcontroller Concepthttp://www.xilinx.com/support/documentation/application_notes/xapp1141.pdf

A small microcontroller programmed in C or C++ can be more effi-cient than doing the same function in HDL. An application note byChristophe Charpentier provides a new way to easily add a simpleMicroBlaze™ microcontroller to an FPGA design without havingto learn about new tools. This small-form-factor 32-bit microcon-troller based on the MicroBlaze processor is instantiated directlyinto the HDL. You can use it immediately in a standard FPGAdesign flow without special scripts or complicated steps. You needonly three files to get started.

XAPP1136: Integrating a Video Frame Buffer Controller in System Generatorhttp://www.xilinx.com/support/documentation/application_notes/xapp1136.pdf

This application note by Douang Phanthavong and Jingzhao Oudescribes how to integrate an embedded processor system with theXilinx Multi-Port Memory Controller (MPMC) and Video Frame


Application NotesApplication NotesIf you want to do a bit more reading about how our FPGAs lend themselves to a broad number of applications, we recommend these notes. If you want to do a bit more reading about how our FPGAs lend themselvesto a broad number of applications, we recommend these notes.

XAMPLES . . .

Buffer Controller (VFBC) IP cores in System Generator for DSP. Youcan then develop custom logic inside System Generator, attach it tothe imported processor system and, as the authors show, build a pow-erful validation platform through System Generator’s hardware co-simulation capability. Combined with the MPMC, the VFBC is anideal solution for applications such as motion estimation, video scal-ing, on-screen displays and video capture used in video surveillance,videoconferencing and video broadcast products.

Integrating an off-the-shelf double-data-rate or DDR2 SDRAMmemory controller into a system is a time-consuming task. Verifyingthat the design runs correctly is also not trivial. The MPMC is a para-meterizable memory controller that supports SDRAM, DDR andDDR2 memory access. It provides access to various types of externalmemory by using one to eight ports, each of which can be configuredfrom a set of Personality Interface Modules. The VFBC is one of thesePIMs. It is unique in that user IP or other interface circuits can readand write data in two-dimensional sets regardless of the size andorganization of the external memory. This allows you to access theexternal memory without having to know all the details about thecomplicated external memory-access protocols.

The note emphasizes tool and design flows rather than the tech-nology details of the IP cores. The intent is to show you how to con-nect the MPMC and the VFBC inside System Generator using theXilinx ML506 hardware platform, although you could apply thesame design flows and methodologies to other boards. An examplevideo application—a simplified version of the reference design cur-rently available in the Xilinx Video Starter Kit—illustrates how toexercise and validate the VFBC using existing System Generator andPlatform Studio integration.

It is crucial to provide seamless, transparent and easy-to-use toolflows when designing a complex system around these popular IP cores.Fortunately, the well-integrated flows between System Generator andPlatform Studio allow you to quickly put together a system in a matterof hours or days instead of weeks or months.

XAPP458: Implementing DDR2-400 Memory Interfaces in Spartan-3A FPGAshttp://www.xilinx.com/support/documentation/application_notes/xapp458.pdf

With their requirement for low-cost, high-bandwidth memory,high-end consumer products demand high-performance DDR2memory interfaces. Xilinx integrates a Memory Interface Generator(MIG) in the CORE Generator™ software for ultimate design flex-ibility and ease of use. This free, user-friendly tool is designed to cre-ate memory interfaces in unencrypted RTL. The MIG supportsmultiple memory architectures across a variety of FPGA selections,providing the flexibility you need to easily customize your designs.

In this application note, author Eric Crabill discusses the DDR2-400(200-MHz) memory interface that Xilinx has validated in Spartan-3AFPGAs with the higher speed grade. The validation results also apply toSpartan-3AN and Spartan-3A DSP FPGAs with the same speed grade.This DDR2-400 interface is derived from the MIG’s default output.

The design is fully verified in hardware using Spartan-3AFPGAs assembled on Spartan-3A Starter Kits. The validation effortincludes characterization at different process corners, as well astemperature and voltage variations that meet commercial-graderequirements.

XAPP940 (Updated): Using Xilinx CPLDs as Motor Controllershttp://www.xilinx.com/support/documentation/application_notes/xapp940.pdf

Different electronic systems use many types of motors, each with itsown advantages. Standard DC motors are often found in automotiveas well as consumer applications and even toys. Brushless DC motorsare commonplace in power tools and, along with standard DCmotors, can operate in the medium to high speed range. AC induc-tion motors are often found in white goods and transportation appli-cations, and these can operate at very high speeds. On the lower endof the operating-speed range are stepper motors. These models havea huge variety of uses in the consumer and industrial markets,including a variety of positioning systems along with printers, scan-ners, plotters, disk drives, fax machines and medical equipment.

This application note shows how to use a Xilinx CPLD as a sim-ple stepper motor controller. The Xilinx CoolRunner™-II CPLDis the perfect device with which to control stepper motors, with theadded benefit that it can be reprogrammed to accommodate a dif-ferent motor if the system specifications change.

XAPP1121: Reference System – Optimizing Performance in PowerPC 440 Processor Systemshttp://www.xilinx.com/support/documentation/application_notes/xapp1121.pdf

The PowerPC® 440 processor block has many performance advan-tages compared with previous embedded solutions. This applica-tion note by James Lucero discusses how to measure and optimizeperformance in PPC 440 systems, an important factor in ensuringthe system is optimally built.

The author describes a reference design that can improve systemperformance in the PowerPC 440 Processor Block on the Virtex-5FXT FPGA while optimizing performance for embedded DMAsolutions. In the system, he shows how to connect the XPS CentralDMA master interface to the Processor Local Bus v4.6 on eitherPLB Slave 0 or PLB Slave 1. He then modifies the XPS CentralDMA parameters for the DMA engine to allow for parallel readsand writes through the crossbar. For HDMA, Lucero discusses set-ting threshold values for interrupts and changing addresses in mainmemory for buffer descriptors and transmit/receive buffers. In this,he connects a simple loopback core to the LocalLink interface onone HDMA. In addition, he includes performance cores for PLBv.4.6 master interfaces and HDMA to the design to measure systemperformance before and after system optimizations.

The note includes two standalone software applications todemonstrate DMA transactions for XPS Central DMA andHDMA to DDR2.


XAMPLES . . .

ISE Design Suite: Logic Edition

Ultimate productivity for FPGA logic designLatest version number: 11.3Date of latest release: September 2009Previous release: 11.2URL to download the latest patch: www.xilinx.com/download

Revision highlights: This latest release of the ISE DesignSuite: Logic Edition supports the newVirtex®-6 HXT FPGA platform. TheVirtex-6 family of devices delivers theindustry’s highest-bandwidth FPGA withup to 72 serial transceivers for applica-tions such as bridging, switching andaggregation in wired telecommunicationsand data communications systems.

ChipScope™ Pro: The Integrated Bit ErrorRatio Tester (IBERT) 2.0 now supports theSpartan®-6 LXT FPGA family.

iMPACT: Reading and programming ofeFUSE registers is now supported fordevices in the Spartan-6 family, and eFUSEsupport has been extended to the Linuxoperating system in addition to the 32-bitversion of Microsoft Windows.

ISE Simulator (ISim): File names in the ISimconsole now link to the associated file. Also,ISim users now have the ability to clear theconsole for greater ease of use.

PlanAhead™ design analysis tool: PlanAheadnow supports the creation of DCI Cascadegroups and membership editing. A newSSN Predictor is available when targetingVirtex-6 FPGAs. In addition, newPlanAhead interface enhancements allowusers to label pin rows in the package viewwhen zooming. Additional DRCs are avail-able for designs targeting Virtex-6 andSpartan-6 FPGAs.

Xilinx Power Analyzer (XPA): The XilinxPower Analyzer provides greater ease of usewith new capabilities to interrupt the poweranalysis process, support for bus reconstruc-tion in the I/O view and the ability to selectand edit multiple cells within the interface.

ISE Design Suite: Embedded Edition

An integrated software solution for designingembedded processing systemsLatest version number: 11.3Date of latest release: September 2009Previous release: 11.2URL to download the latest patch: www.xilinx.com/download

Revision highlights: All ISE Design Suite editions include theenhancements listed above for the LogicEdition. The following enhancements arespecific to the Embedded Edition.

Xilinx Platform Studio (XPS) and the EmbeddedDevelopment Kit (EDK): For command lineusers, Platform Flash XL is now supported,as is contract-based IP licensing. XPS hasadded Clock Generator support for Virtex-6 and Spartan-6 FPGA families as well asClock Wizard support for the Multi-portMemory Controller (MPMC) for theVirtex-6 and Spartan-6.

ISE Design Suite: DSP Edition

Flows and IP tailored to the needs of algo-rithm, system and hardware developersLatest version number: 11.3Date of latest release: September 2009Previous release: 11.2URL to download the latest patch: www.xilinx.com/download

Revision highlights: All ISE Design Suite editions include theenhancements listed above for the LogicEdition. The following enhancements arespecific to the DSP Edition.

Support for the Virtex-6 HXT FPGA platform:This latest release of the ISE Design Suitesupports the new Virtex-6 HXT FPGA plat-form, delivering the industry’s highest-band-width FPGA with up to 72 serialtransceivers for applications such as bridg-ing, switching and aggregation in wiredtelecommunications and data communica-tions systems.

System Generator for DSP: This tool nowsupports the following new devices: Virtex-6 FPGA Lower Power (Virtex-6-1L),Virtex-6 HXT FPGA and Virtex-5QFPGA. System Generator for DSP also pro-vides support for JTAG hardware co-simu-lation for the Spartan-6 FPGA SP605Development Platform.

New operating system support: SystemGenerator for DSP has expanded its OSsupport to include Microsoft WindowsVista Business 32-bit (English), Red Hat


Xilinx Tool & IP UpdatesXTRA, XTRA

Xilinx is continually improving its products, IP and design tools as it strives tohelp designers work more effectively. Here we report on the most current updatesto the flagship FPGA development environment, the ISE® Design Suite, as well asto Xilinx IP, as of September 2009. Also, look for new tools, IP and developmentboards from Xilinx partners in the Tools of Xcellence section of this issue.

Product updates offer significant enhancements and new features to three versionsof the ISE Design Suite: the Logic, Embedded and DSP editions. Keeping yourinstallation of ISE up to date is an easy way to ensure the best results for yourdesign. Updates to the ISE Design Suite are available from the Xilinx DownloadCenter at www.xilinx.com/download. For more information or to download afree 30-day evaluation of ISE, visit www.xilinx.com/ise.

Enterprise Desktop Linux 5.2 (32- and64-bit) and SUSE Linux Enterprise 10(32- and 64-bit).

Blockset enhancements:

• DDS Compiler 4.0 is now available inSystem Generator for DSP with a newoption that makes it possible to use theblock as a phase generator or SIN/COSlookup table only. This capability allowsyou to customize the direct digital syn-thesizer to fit your individual applicationneeds. The spurious-free dynamic range(SFDR) has been increased from 120 dBto 150 dB. Also, new options include anability to configure DDS using system-level parameters (SFDR, frequency reso-lution) or hardware parameters (phaseand output width); and to trade offXtremeDSP™ slice usage for maximumperformance. Another option allowsdesigners to configure phase incrementand phase offset as constant, program-mable or dynamic for modulation.

(Note: This block supersedes the DDSCompiler 3.0 block. DDS Compiler version4.0 is not bit-accurate with respect to earli-er versions. Also, latency of phase offseteffects has been balanced with the latency ofphase increments for ease of use in thestreaming modes. This change also applies toexisting programmable and fixed modes.)

• CIC Compiler 1.3 is now available inSystem Generator for DSP, supportingVirtex-6 and Spartan-6 FPGA plat-forms. An input and output streaminginterface has been added for multiple-channel implementations. Also new isthe capability to specify hardware over-sampling specification as a sample peri-od and to leverage the oversamplingfactor to optimize resource utilization.

(Note: This block supersedes the CICCompiler 1.2 block. The RATE_WE signalno longer acts as a reset to the core; the corewill update to the new rate on the next inputsample, for a single-channel implementation,or the next input to the first channel, formultiple-channel implementations.)

• Upsample Block has added a newlatency parameter.

LAY blocks customized to the user’s inter-face requirements.

• Virtex-6 FPGA GTH Transceiver Wizardv1.1 – Generates a custom wrapper thatconfigures one or more Virtex-6 FPGAGTH transceivers according to userrequirements. In addition, it produces anexample design, testbench and scripts toallow you to observe the transceivers oper-ating under simulation and in hardware.

Video and image processing:

• Video On Screen Display v1.0 – Asophisticated module that provides threehardware-accelerated functions includingmultiple alpha-blending layer composi-tors, simplified graphics processing unit(boxes) and simplified text processingunit for video systems.

• Video Direct Memory Access v1.0 –Allows video cores to access external mem-ory via the Video Frame Buffer Controllerwithin the Multiport Memory Controllerunder control of the host processor.

Communications and networking:

• RXAUI v1.1 – The Xilinx Reduced Pin 10Gigabit Attachment Unit Interface(RXAUI) LogiCORE IP provides a two-lane high-speed serial interface, deliveringtotal throughput of up to 10 Gbits/sec-ond. Operating at an internal clock speedof 156.25 MHz, the core includes theDune Networks RXAUI implementation,the XGMII Extender Sublayers (DTE andPHY XGXS) and the 10GBASE-X sublay-er, as described in clauses 47 and 48 ofIEEE 802.3-2005. The core supports anoptional serial MDIO management inter-face for accessing the IEEE 802.3-2005clause 45 management registers.

Enhancements to the CORE Generator:

• The CORE Generator now checks for IPlicense availability before proceedingthrough the process of core generation.

• Automated core upgrade to latest-versioncapability has been added for the followingIP cores: CIC Compiler v1.3, DDSCompiler v4.0, Distributed Memory v4.2and Multiplier Generator v11.2.

• Additional IP: Xilinx has upgraded a num-ber of blocks to support the latest version ofthe LogiCORE™ (with no change in blockfunctionality). The Multiplier LogiCOREv11.2 leverages speed and area optimizationfor LUT implementation. Also, BlockMemory Generator v3.3, DistributedMemory Generator v4.2 and FIFOGenerator v5.3 have been upgraded to sup-port Virtex-6 Lower Power (Virtex-6-1L),Virtex-6 HXT and Virtex-5Q FPGA.

Xilinx IP Updates

Name of IP: ISE IP Update 11.3Type of IP: All

Targeted application: Xilinx develops IP coresand partners with third-party IP providers todecrease customer time-to-market. The pow-erful combination of Xilinx FPGAs with IPcores provides functionality and performancesimilar to ASSPs, but with flexibility not pos-sible with ASSPs.Latest version number: 11.3Date of latest release: September 2009URL to access the latest version: www.xilinx.com/download

Informational URL: www.xilinx.com/ipcenter/coregen/updates_11_3.htm

Release notes: www.xilinx.com/support/documentation/user_guides/xtp025.pdf

Installation instructions: www.xilinx.com/ipcenter/coregen/ip_update_install_instructions.htm

Listing of all IP in this release:www.xilinx.com/ipcenter/coregen/11_3_datasheets.htm

Revision highlights: Starting with 11.1, all ISECORE Generator™ IP Updates are bundledwith quarterly ISE software updates. The lat-est versions of IP products have been testedand are delivered with the current IP release.Support for Virtex-6 HXT, Virtex-6 LowerPower and Virtex-5Q have been added toselected cores in this release. There are anumber of new cores in this release, asdescribed below.

FPGA features and design:

• Spartan-6 SelectIO™ Interface Wizardv1.1 – Generates an HDL file that containsI/O logic such as IOSERDES and IODE-


XTRA, XTRA


Tools of XcellenceTools of XcellenceNews and the latest products from Xilinx partnersNews and the latest products from Xilinx partners

TOOLS OF XCELLENCE

by Mike SantariniPublisher, Xcell JournalXilinx, [email protected]

Whether you are creating products that employ advanced audioand video for consumer systems, the automotive market, pro-

fessional recording studios or even next-gen AV systems for stadiums,the newly formed AVnu Alliance wants to help you link all these prod-ucts together and run them efficiently and harmoniously on the wide-ly deployed IEEE 802.1 Layer 2 networks—without having to payproprietary network royalties.

AVnu Alliance’s president and chairman, Rick Kreifeldt, saidadvanced audio and video are found in an ever-growing numberof products in an expanding number of markets, driving the needfor these devices to be able to share bandwidth efficiently in a sin-gle network.

“According to a recent consumer electronics study, there are 27 ormore devices in the home that could be networked for some kind ofaudio and video,” said Kreifeldt, who is the vice president ofHarman International. Because all these devices have to be synchro-nized, and must share data and band-width, they are susceptible to signalslowdowns, pops, hisses and even freezesas more products in the home or evenneighborhood try to access data andstream video on residential networks.Professional audio and video recordingand stadium-size AV systems are especial-ly vulnerable to these problems, Kreifeldtsaid, because in many cases they transferand share huge amounts of compressedand uncompressed data.

To get around these issues, many ven-dors face a build-or-buy decision. Vendorscan license (for a fee) proprietary networkssuch as CobraNet from Cirrus Logic, or they can build their own cus-tomized backplane/network and develop their own protocols to inter-connect components. A third way would involve establishing a base setof protocols running on an already established network standard thatall AV-related manufacturers can support—without paying royalties.The approach could greatly facilitate new-product introduction andsend those products into new markets.

That was the idea that impelled the formation, in late August, ofthe AVnu Alliance. Its goal is to promote an open network standardfor time-synchronized, low-latency streaming services through 802.1networks. Founding members include Avid Technology Inc.,Broadcom Corp., Cisco Systems Inc., Harman International, IntelCorp., Marvell, Meyer Sound Laboratories, Samsung Electronics Co.and Xilinx Inc.

The AVnu Alliance members will support and promote the adop-tion of the open IEEE 802.1 Audio/Video Bridging (AVB) standard,along with the related IEEE 1722 and IEEE 1733 networking stan-dards, combining them as a broader networking standard for nearlyall products that receive, play, record and transfer audio/video data.

To further refine the specifications, the AVnu Alliance is collabo-rating with various IEEE working groups to form foundational stan-dards. As of September, all were in draft form, with expectedcompletion in 2010-11. In particular, the IEEE 802.1 AS specifica-tion defines a master clock that all 802.1 AVB-compliant compo-nents can reference to synchronize their playback and recording.

The IEEE 802.1 QAT task group defines a simple reservation pro-tocol. This protocol allows applications on endpoint devices to notifythe various network elements and other connected devices to reservea particular amount of bandwidth for the stream.

Similarly, the IEEE 802.1 QAV task group is defining queuing andforwarding rules to ensure the stream can pass through the networkefficiently and within the latency specified when creating bandwidthreservation.

Finally, the IEEE 802.1BA standard specifies inter-relationsbetween the protocols and conditions for AVB systems.

“We expect the AVB protocols to be fully ratified in 2010 and2011,” said Kreifeldt. “It is our goal to havecompliance and interoperability ready uponratification.”

Amy Chang, IP marketing manager atXilinx, said Xilinx became an active memberof the AVnu Alliance and its board of direc-tors so as to provide customer solutions thatopen up new opportunities in segment mar-kets from automotive to professional to con-sumer. Xilinx silicon solutions bring thecustomization and flexibility needed toaddress the customer requirements, she said.

In fact, Xilinx was the first company torelease an FPGA-based Ethernet AVB

Endpoint last year. The reprogrammabilty of the FPGA allows cus-tomers to keep up with the latest changes in the specification andenables customization as the IEEE ratifies each phase of the standard.The core features time synchronization and the low-latency queuingand forwarding implementations per the 802.1 AS and 802.1 QAVspecifications. It is available now. More information can be found athttp://www.xilinx.com/products/ipcenter/DO-DI-EAVB-EPT.htm.

Chang notes that Xilinx and its IP partner network have served thewired and wireless network infrastructure, automotive and profes-sional broadcast markets for many years. They offer a broad set of IPcores supporting multiple open and proprietary networks and proto-cols serving a broad range of vertical applications. “Xilinx is not goingto dictate which protocol customers must use,” said Chang.“Ultimately, our customers decide which networks and protocols arebest. We support those customer preferences.”

To learn more about the AVnu Alliance and how it relates to vari-ous vertical markets using AV equipment, read the following whitepapers at www.AVnu.org: “AVnu for Home/CE,” “AVnu forProfessional A/V” and “AVnu for Automotive.”

TOOLS OF XCELLENCE


AVnu Audio/Video Network Alliance Promises Interoperability

TOOLS OF XCELLENCE

Mixed-signal semiconductor and IP signal-compression vendorSamplify Systems Inc., a Xilinx Alliance partner, has

released version 3.0 of its Prism decompression algorithm forimplementation in FPGAs.

The Santa Clara, Calif., company announced itself to the worldin 2008 and proceeded to garner a slew of trade publicationawards and customer accolades for user-modifiable compres-sion/decompression technology targeting a range of markets,including ISM and communications, that require the transfer oflarge amounts of data via high-speed I/O.

Samplify’s technology is largely composed of two elements: a16-channel A/D data converter compression IC that runs a 65-megasample decompression algorithm containing 16 channels in12 bits. Users typically implement this algorithm in an FPGA.

“Users can set the technology to get anywhere from four-to-one [in lossless mode] or two-to-one [in near-lossless mode]compression, or even run it without compression and still seebest-in-class results because it is solow power,” said Allen Evans, vicepresident of marketing at Samplify.“The closer compression is to theanalog domain, and the closerdecompression is to the softwaredomain, the better.”

Evans says Samplify’s technologyis ideally suited for medical ultra-sound equipment and 4G commu-nications. “As the number ofsmart-antenna systems increases in4G systems, so do the number ofdata converter channels,” saidEvans. “What’s unique about thisproduct is that it has signal com-pression built in on the back end.With each of these data convertersnow generating nearly 1 Gbit ofdata apiece, the compression solvesnot only the amount of data that needs to get out of the chip butthe number of LVDS pairs to get it off the chip. At four-to-onecompression, we need four LVDS pairs instead of 16.”

To further refine the overall offering, Samplify has now releasedversion 3.0 of the Prism algorithm. Where earlier versions were sym-metrical, sharing the compression and decompression burden even-ly between the IC (running compression) and the FPGA (handlingdecompression), Evans said that with version 3.0, the company hasalso optimized the algorithm to make it asymmetrical. That puts lessof the decompression burden on the FPGA. As a result, he said, thealgorithm still runs optimally in terms of performance and signalquality, but uses nearly half the FPGA logic resources required bythe previous, symmetric version.

At the same time, the company also optimized the encodingformat for software decoding, resulting in what it says is animprovement of 200 percent in decompression performance onx86-based CPUs or graphics processing units.

“I can say that the vast majority of our customers are imple-menting the decompression on FPGAs,” said Evans.

With Prism 3.0, the company is also introducing a new com-pression mode called SignalTrak, which helps maintain signalquality for pulsed signals common in ultrasound, sonar and radarapplications. In these cases, a signal pulse or burst contains bothstrong and weak reflections over very short intervals, on the orderof 100 microns. Evans said SignalTrak is a better alternative totraditional feedback gain control techniques, which often adaptto signal levels too slowly to reduce the pulsed signal’s dynamicrange without losing signal quality (for weak signals) or overload-ing (for strong signals). To get around this problem, SignalTrak’sredundancy remover automatically adapts to the signal, which

means users no longer have to program any signal characteristicsinto Prism.

“It adapts the signal, tries to widen it and then uses a dataencoder and various feedback loops to control how the compres-sion operates,” said Evans. “It has a fixed-rate mode calledRateTrack that monitors packet size and adjusts the parameters upfront to try to maintain that packet size.” SignalTrak also has afixed-quality mode, where it tries to maintain a specific qualityrange by adjusting the decompression ratio, he said.

Samplify Prism 3.0 is available now in netlist format for Xilinxand other FPGA devices. Prices start at $25,000 for a developmentlicense plus royalties, which are application dependent. For moreinformation, visit www.samplify.com.


Samplify Tips New Version of Prism Compression/Decompression Technology

USB-in-an-FPGA specialist Opal Kelly (Portland, Ore.) hasreleased a USB Integration Module based on the Xilinx

Virtex®-5 FPGA. The XEM5010 boasts more memory and I/Othan previous offerings, along with an API for a PC interface.

Since its founding in 2004, Opal Kelly has specialized in field-ing Xilinx-FPGA-based USB modules to help design teams devel-op USB-based systems, prototype IP and ASICs that use USB orsimply integrate the modules into a growing number of applica-tions, including retail and in-house test equipment, consumer sys-tems, medical devices, communications gear, R&D and education.

According to founder and president Jake Janovetz, OpalKelly has seen great success with these modules, especially itsSpartan®-3 FPGA-based XEM3010 product.

“A lot of our customers are interfacing to LVDS signals, high-speed cameras and ultrasound equipment as a low-cost emula-tion environment for IP or ASIC development,” said Janovetz.“Now they are running the FPGAs faster and harder, and needa bit more memory bandwidth, so we’re offering a Virtex-5-based system.”

The XEM5010 runs a Xilinx Virtex-5 XC5VLX50FPGA. Witha footprint of 3.35 x 2.4 x 0.69 inches, the system features 256-mebibyte DDR2 SDRAM using two completely independent128-Mbyte DDR2 SDRAMs (with a total memory bandwidth of2.128 Gbytes/second or 17.024 Gbits/s), 32 Mbits of serial flashand a high-speed USB 2.0 interface for downloading and control.

Janovetz said the “secret sauce” for Opal Kelly customers is thecompany’s FrontPanel software interface, which is an API for com-munication, configuration and interfacing to the PC. The

FrontPanel API works with Microsoft Windows XP and Vista, MacOS X and Linux, and with a variety of languages including C, C++,C#, Ruby, Python and Java. Opal Kelly includes the API softwarefree with its modules.

Because the XEM5010 is based on the Virtex-5 device, designerscan also program the system with the Xilinx WebPack tool suite,available at no charge from Xilinx. To facilitate overall system veri-fication, Opal Kelly also offers the simulation models for the HDLin the XEM5010, along with DLL support for a variety of third-party tools, including The Mathworks’ MATLAB® and NationalInstruments’ LabVIEW.

Among the early customers of the XEM5010 is SamplifySystems. Its engineers and sales team use the module for a variety oftasks, including IP development, regression testing and salesdemonstrations of Samplify’s Prism compression/decompressiontechnology (see related article, previous page).

Janovetz said that among the many end uses for the modules,more and more customers are tapping them for hardware-assistedverification. “We didn’t set out to do emulation, but some of ourcustomers were doing smaller design projects that couldn’t justifythe purchase of big, expensive emulation systems,” he said. “Andso they use our modules for regression testing and even simula-tion runs.” As a result, Janovetz said, “we are looking into sup-porting it as a hardware-assisted verification tool."

The XEM5010 modules are available immediately, off the shelf,from the Opal Kelly Online Store. Prices start at $1,500, with vol-ume discounts available. For more information on the XEM5010and Opal Kelly’s other offerings, visit www.opalkelly.com.

TOOLS OF XCELLENCE


Opal Kelly Releases Virtex-5-Based USB Integration Module

Opal Kelly’s latest USB Integration Module, thememory-rich Virtex-5-based XEM5010, comeswith an API for a PC interface.

TOOLS OF XCELLENCE

Anew version of the Active-HDL multilanguage simulatorfrom Aldec Inc., priced at $1,995, claims to run simulation

twice as fast as FPGA vendor-supplied RTL simulators. According to Dave Rinehart, vice president of marketing at the

Henderson, Nev., company, the Active-HDL Designer Editiontargets a price point between FPGA vendor, single-language simu-lation offerings that typically sell for less than $1,000 and multi-language simulators from third-party simulation tool vendors,which generally start at $6,000 for a single-node license.

It’s fairly common today for FPGA designers to integrate a mixof Verilog and VHDL IP blocks into their designs. Indeed, somedesigners—specially those who until recently have focused onASIC work—may even develop or use blocks in more-advancedlanguages such as SystemVerilog or SystemC.

While there are a lot of translators out there, it still behoovesdesigners to use simulators that can natively handle a mix of themost common languages. Aldec’s Active-HDL Designer Editionincludes mixed-language simulation support for VHDL, Verilog

and SystemVerilog design files. On top of that, the tool alsoincludes VHDL and Verilog encrypted IP and Xilinx SecureIP sup-port, and puts no performance limitations on FPGA design size.

The product is a pared-down version of Aldec’s Active-HDLline, which runs many times faster than Active-HDL DesignerEdition and packs several more features, including support forSystemC, code coverage, design rule checks, DSP modeling andverification. Rinehart said the company put a performance gover-nor on the Active-HDL Designer Edition, opting to omit thosehigh-end features to target the needs of mainstream design and ver-ification engineers. Winnowing out some of the more-exotic fea-tures also eases the time and cost burdens on Aldec’s support staff.

Rinehart believes that many designers will be so impressed withwhat they see in Designer Edition, they’ll upgrade to the full ver-sion of the tool and reap the rewards of its advanced feature set.

At a price point of $1,995 for a node-locked license and $2,495for a floating license, the simulator is certainly worth a test drive.To do so, visit www.aldec.com/DesignerEdition.


Aldec’s Multilanguage Simulator Takes Aim at Mainstream Users

Aldec’s Active-HDL Designer Edition targets a niche between single-language vendor offerings and more-expensive multilanguage simulators from third-party tool suppliers.

Innovative Integration Inc. has released the Virtex-5 version of itsXMC I/O module family, the X5 G12 series, featuring dual chan-

nels performing 1-Gsample/second 12-bit digitizing, 512 Mbytes ofDDR, additional SRAM memory and an eight-lane PCI Expresshost interface.

“Typically, our customers are systems companies that will takeour cards and integrate them into their larger systems,” said DanMcLane, co-owner and head designer of the 30-employee SimiValley, Calif., company. “They’ll use our card, like the X5 G12, as afront end to do digitizing for applications like radar or as 3G/4Gwireless receivers.”

For the X5 G12 module family, McLane said, InnovativeIntegration uses the Xilinx Virtex-5 SX95T or LX155T plus 512Mbytes of off-chip DDR2 in addition to 4 Mbytes of QDR-II mem-ory. The combination allowed his team to implement a very high-performance DSP core in the FPGA,giving the X5 G12 modules real-time signal-processing rates exceed-ing 300 GMACs/s. The architectureincludes tight integration with aneight-lane PCI Express interface thatprovides sustained transfer rates tothe host of better than 1 Gbyte/s. Assuch, the module family is well suit-ed for a broad number of digitizing-centric applications.

Innovative also does some verycreative things with Xilinx cores. Forexample, McLane’s group layered apacket system on top of the XilinxPCI Express core to allow its cards tosustain the 1-Gbit/s rates of a work-station, enabling users to do high-speed data logging with the system.The modification also allowed thecompany to field an 8-Tbyte datalogger. “We spend a lot of time onthe hardware so that the hardware isdoing a lot of the heavy lifting, andwe also spend a lot of time on systemintegration features,” said McLane.“That way our customers, who may not be all that familiar with logicdesign, can focus most of their time on optimizing at the systemlevel—software and algorithms—instead of logic.”

He points out that the Virtex-5 devices make it possible for usersto customize (or, for a fee, have Innovative’s team customize) theboard at a hardware level using VHDL as well as at an algorithmlevel using The Mathworks’ MATLAB, or at a software level usingtheir language of choice. Innovative also offers its own tool suite,called FrameWork Logic, which takes some of the bite out of mod-

ifying the logic in the cards and provides an organized frameworkfor modifying algorithms and software. To a certain degree, McLanesaid, FrameWork drives the Xilinx ISE® Design Suite tools. Buteven users unfamiliar with logic design should be able to implementthe modifications they want, if they are persistent. If not, Innovativeis willing to offer a helping hand.

Along with its modules, Innovative also offers users its ownFPGA-targeted IP as well as access to third-party cores to help themore HDL-savvy customers further customize the modules to theirrequirements. For example, for the X5 G12, Innovative offers cus-tomers a software-defined radio core that provides from 16 to 4,096display data channels, essentially allowing users to turn the X5 G12modules into receivers.

Innovative also provides with its modules software tools for hostdevelopment as well as libraries and drivers for Windows and Linux.

App notes and example configurations of the module, including onefor logging A/D samples to disk, are included as well.

And because Innovative offers a host of other cards developedsince its founding in the mid-1980s, the company supplies carrieradapters to cool the card and the overall system it’s going into. TheX5 G12 card also plugs into the company’s eInstrument EmbeddedPC, SBC-ComEx Single-Board Computer and Andale Data Loggers.

For more information on the X5 G12, visit www.innovative-dsp.com/products.php?product=X5-G12.

TOOLS OF XCELLENCE


Innovative Integration Launches X5 G12 XMC I/O Module Family

The Virtex-5 version of Innovative Integration’s XMC I/O module family has two 1-Gsample/schannels and 12-bit digitizing, plenty of memory and an eight-lane PCI Express host interface.


Xpress Yourself in Our Caption Contest

XCLAMAT IONS!

If you have a yen to Xercise your funny bone, here’syour opportunity. Starting with this issue, we invitereaders to step up to a verbal challenge and submit

an engineering- or technology-related caption for astrange, humorous or otherwise evocative illustration,cartoon or photograph. Here, for example, the Dali-likeimage of people and animals rubbernecking out of theircastle windows might inspire a caption like “Maybemanagement took the advice to move engineers out oftheir tech silos a bit too literally.”

Send your entries to [email protected]. Include yourname, job title, company affiliation and location, and astatement acknowledging that “I have read and agree to thefull official rules located at www.xilinx.com/xcellcontest.”After due deliberation, we will print the submissionswe like the best in the next issue of Xcell Journal andaward the winner the new Xilinx® SP601 EvaluationKit, our entry-level development environment for evalu-ating the Spartan®-6 family of FPGAs (Approximateretail value: $295; see http://www.xilinx.com/sp601). Tworunners-up will gain notoriety, fame and a cool, Xilinx-branded gift from our SWAG closet.

The deadline for submitting entries is December 11. So, get writing!

NO PURCHASE NECESSARY. You must be 18 or older and a resident of thefifty United States, the District of Columbia, or Canada (excluding Quebec)to enter. Entries must be entirely original and must be received by 5:00 pmPacific Time (PT) on December 11, 2009. Official rules available online atwww.xilinx.com/xcellcontest. Sponsored by Xilinx, Inc. 2100 Logic Drive, San Jose, CA 95124.

Defe

nse-G

rad

e F

PG

As

Vir

tex-5

Q F

PG

As

P

art

Nu

mb

er

XQ

5V

LX

30

TX

Q5

VLX

85

XQ

5V

LX

110

XQ

5V

LX

110

TX

Q5

VLX

15

5T

XQ

5V

LX

22

0T

XQ

5V

LX

33

0T

XQ

5V

SX

50

TX

Q5

VS

X9

5T

XQ

5V

SX

24

0T

XQ

5V

FX

70

TX

Q5

VF

X10

0T

XQ

5V

FX

13

0T

XQ

5V

LX

20

0T

Lo

gic

Reso

urc

es

Slic

es

(2)

4,8

00

12

,96

01

7,2

80

17,2

80

24

,32

03

4,5

60

51

,84

08

,16

01

4,7

20

37,4

40

11

,20

01

6,0

00

20

,48

03

0,7

20

Lo

gic

Cells

(3)

30

,72

08

2,9

44

110

,59

21

10

,59

21

55

,64

82

21

,18

43

31

,77

65

2,2

24

94

,20

82

39

,61

67

1,6

80

10

2,4

00

13

1,0

72

19

6,6

08

CLB

Flip

-Flo

ps

19

,20

05

1,8

40

69

,12

06

9,1

20

97,2

80

13

8,2

40

20

7,3

60

32

,64

05

8,8

80

14

9,6

70

44

,88

06

4,0

00

81

,92

01

22

,88

0

Mem

ory

Reso

urc

es

Maxi

mum

Dis

trib

ute

d R

AM

(K

bits)

32

08

40

1,1

20

1,1

20

1,6

40

2,2

80

3,4

20

78

01

,52

04

,20

08

20

1,2

40

1,5

80

2,2

80

Blo

ck

RA

M/F

IFO

w/E

CC

(3

6K

bits e

ach)

36

96

12

81

48

21

22

12

32

41

32

24

45

16

14

82

28

29

84

56

Tota

l B

lock

RA

M (

Kb

its)

1,2

96

3,4

56

4,6

08

5,3

28

7,6

32

7,6

32

11

,66

44

,75

28

,78

41

8,5

76

5,3

28

8,2

08

10

,72

81

6,4

16

Clo

ck R

eso

urc

es

Dig

ital C

lock

Manag

er

(DC

M)

41

21

21

21

21

21

21

21

21

21

21

21

21

2

I/O

Reso

urc

es

Phase L

ocke

d L

oo

p/P

MC

D2

66

66

66

66

66

66

6

Maxi

mum

Sin

gle

-End

ed

Pin

s3

60

56

08

00

68

06

80

68

09

60

48

06

40

96

06

40

68

08

40

96

0

Maxi

mum

Diffe

rential I/

O P

airs

18

02

80

40

03

40

34

03

40

48

02

40

32

04

80

32

03

40

42

04

80

Em

bed

ded

Hard

IP R

eso

urc

es

DS

P4

8E

Slic

es

32

48

64

64

12

81

28

19

22

88

64

01

,05

61

28

25

63

20

38

4

Po

werP

C® 4

40

Pro

cesso

r B

locks

––

––

––

––

––

12

22

PC

I E

xpre

ss

® Inte

rface B

locks

1–

–1

11

11

11

33

34

10

/10

0/1

00

Eth

ern

et

MA

C B

locks

et

4–

–4

44

44

44

44

68

Ro

cke

tIO

™ G

TP

Lo

w-P

ow

er

Transceiv

ers

8–

–1

61

61

62

41

21

62

4–

––

–

Ro

cke

tIO

™ G

TX

Hig

h-S

peed

Tra

nsceiv

ers

––

––

––

––

––

16

16

20

24

Co

nfig

ura

tio

nC

onfig

ura

tio

n M

em

ory

(M

bits)

9.4

21

.92

9.1

31

.24

3.1

55

.28

2.7

20

35

.87

9.7

27.1

39

.44

9.3

70

.9

Mis

cellaneo

us

Sp

eed

Gra

des

-1,

-2-1

,-2

-1,

-2-1

,-2

-1,

-2-1

-1-1

,-2

-1-1

-1,

-2-1

,-2

-1,

-2-1

Manufa

ctu

ring

Gra

des

II

II

II

II

II

II

II

Packag

e(7

)A

rea

EF

676

27

x 2

7 m

m4

40

44

0

EF

11

53

35

x 3

5 m

m8

00

FF

32

31

9 x

19

mm

17

2(4

)

EF

66

52

7 x

27

mm

36

0(8

)3

60

(8)

EF

11

36

35

x 3

5 m

m6

40

(16

)6

40

(16

)6

40

(16

)6

40

(16

)6

40

(16

)

EF

173

84

2.5

x 4

2.5

mm

68

0(1

6)

96

0(2

4)

68

0(1

6)

84

0(2

0)

FF

173

84

2.5

x 4

2.5

mm

96

0(2

4)

96

0(2

4)

Manu

factu

ring

Gra

des

http://w

ww.xilinx.c

om/produ

cts/milaero/rpt003

.pdf

Gra

de

De

scri

pti

on

Te

mp

era

ture

VQ

Pro

Ra

dia

tio

nH

ard

en

ed

QM

LC

lass

VM

ilit

ary

Ce

ram

icTc

=-5

5C

to+

12

5C

HQ

Pro

Flip

-Ch

ipR

ad

iati

on

To

lera

nt

Ce

ram

icT

j=

-55

Cto

+1

25

C

BS

MD

Ra

dia

tio

nTo

lera

nt

an

dN

on

-RT

SM

DM

ilit

ary

Ce

ram

icTc

=-5

5C

to+

12

5C

NM

ilit

ary

Pla

sti

cT

j=

-55

Cto

+1

25

C

MM

ilit

ary

Ce

ram

ico

rP

lasti

cT

j=

-55

Cto

+1

25

C(P

lastt

ic),

Tj

=-5

5C

to+

12

5C

(Ce

ram

ic)

IIn

du

str

ial

Pla

sti

cT

j=

-40

Cto

+10

0C

No

tes:

1. A

sin

gle

Virte

x-5

Q C

LB

co

mp

rises t

wo

slic

es, w

ith e

ach c

onta

inin

g f

our

6-i

np

ut

LU

Ts a

nd

four

Flip

-Flo

ps (

twic

e t

he n

um

ber

found

in a

Virte

x-4

slic

e),

for

a t

ota

l of

eig

ht

6-L

UTs

and

eig

ht

Flip

-Flo

ps p

er

CLB

; 2

. V

irte

x-5

lo

gic

cell

rating

s r

eflect

the incre

ased

lo

gic

cap

acity

off

ere

d b

y th

e n

ew

6-i

np

ut

LU

T a

rchitectu

re; 3

. D

igitally

Contr

olle

d Im

ped

ance (

DC

I) is a

vaila

ble

on I/O

s o

f all

devi

ces;

4. I/

O s

tand

ard

s s

up

port

ed

: H

T, L

VD

S, LV

DS

EX

T, R

SD

S, B

LVD

S, U

LVD

S, LV

PE

CL, LV

CM

OS

33

, LV

CM

OS

25

, LV

CM

OS

18

, LV

CM

OS

15

, LV

TT

L,

PC

I33

, P

CI6

6, P

CI-X

, G

TL, G

TL+

, H

ST

L I (

1.2

V,1

.5V,1

.8V

), H

STL II (1

.5V,1

.8V

), H

STL III (

1.5

V,1

.8V

), H

STL IV

(1

.5V,1

.8V

), S

STL2

I, S

STL2

II,

SS

TL1

8 I, S

STL1

8 II; 5

. O

ne s

yste

m m

onito

r b

lock

inclu

ded

in a

ll d

evi

ces; 6

. A

vaila

ble

I/O

fo

r each d

evi

ce-p

ackag

e

co

mb

inatio

n: num

ber

of S

ele

ctI

O p

ins (

num

ber

of R

ocke

tIO

tra

nsceiv

ers

).


Her

e is

a lis

t of o

ur A

&D

FPG

As f

rom

our

pro

duct

sele

ctio

n gu

ide.

To

see

our

entir

e po

rtfo

lio v

isit h

ttp://

ww

w.x

ilinx

.com

/pub

licat

ions

/mat

rix/

Prod

uct_

Sele

ctio

n_G

uide

.pdf

or

to o

btai

n a

hard

cop

y of

the

sele

ctio

n gu

ide,

con

tact

you

r lo

cal X

ilinx

sale

s rep

rese

ntat

ives

.

Defe

nse-G

rad

e F

PG

As

Vir

tex-4

Q F

PG

As

Vir

tex-I

I P

ro X

Q F

PG

As

Vir

tex-I

I X

Q F

PG

As

P

art

Nu

mb

er

XQ

4V

LX

25

XQ

4V

LX

40

XQ

4V

LX

60

XQ

4V

LX

10

0X

Q4

VLX

16

0X

Q4

VS

X5

5X

Q4

VF

X6

0X

Q4

VF

X10

0X

Q2

VP

40

XQ

2V

P7

0X

Q2

V10

00

XQ

2V

30

00

XQ

2V

60

00

Co

re V

oltag

e1

.2V

1.2

V1

.2V

1.2

V1

.2V

1.2

V1

.2V

1.2

V1

.5V

1.5

V1

.5V

1.5

V1

.5V

Lo

gic

Reso

urc

es

Slic

es

(1)

10

,75

21

8,4

32

26

,62

44

9,1

52

67,5

84

24

,57

62

5,2

80

42

,17

61

9.3

92

33

,08

85

,12

01

4,3

36

33

,79

2

Lo

gic

Cells

24

,19

24

1,4

72

59

.90

41

10

,59

21

52

,06

45

5,2

96

56

,88

09

4,8

96

44

,63

27

4,4

48

11

,52

03

2,2

56

76

,03

2

CLB

Flip

-Flo

ps

21

,50

43

6,8

64

53

,24

89

8,3

04

13

5,1

68

49

,15

25

0,5

60

84

,35

23

8,7

84

66

,17

610

,24

02

8,6

72

67,5

84

Mem

ory

Reso

urc

es

Maxi

mum

Dis

trib

ute

d R

AM

(K

bits)

16

82

88

41

67

68

1,0

56

38

43

95

65

96

06

1,0

34

16

04

48

1,0

56

Blo

ck

RA

M/F

IFO

w/E

CC

(3

6K

bits e

ach)

72

96

16

02

40

28

83

20

23

23

76

19

23

28

40

96

14

4

Tota

l B

lock

RA

M (

Kb

its)

1,2

96

1,7

28

2,8

80

4,3

20

5,1

84

5,7

60

4,1

76

6,7

68

3,4

56

5,9

04

72

01

,72

82

,59

2

Clo

ck R

eso

urc

es

Dig

ital C

lock

Manag

er

(DC

M)

88

81

21

28

12

12

88

81

21

2

I/O

Reso

urc

es

Maxi

mum

Sin

gle

-End

ed

I/O

s4

48

64

06

40

96

09

60

64

05

76

76

88

04

99

64

32

72

01

,10

4

Maxi

mum

Diffe

rential I/

O P

airs

22

43

20

32

04

80

48

03

20

22

83

84

39

64

92

21

63

60

55

2

Dig

itally

Co

ntr

olle

d Im

ped

ance

YE

SY

ES

YE

SY

ES

YE

SY

ES

YE

SY

ES

YE

SY

ES

YE

SY

ES

YE

S

Em

bed

ded

Hard

IP R

eso

urc

es

DS

P S

lices

48

64

64

96

96

51

21

28

16

0–

––

––

18

x 1

8 M

ultip

liers

––

––

––

––

19

23

28

40

96

14

4

Ro

cke

tIO

™ T

ransceiv

ers

––

––

––

16

20

8o

r1

22

0–

––

Po

werP

C® P

rocesso

r B

locks

––

––

––

22

22

––

–

Mis

cellaneo

us

Sp

eed

Gra

des

-10

-10

-10

-10

-10

-10

-10

-10

-5-5

-4-4

-4

Co

nfig

ura

tio

n M

em

ory

(M

bits)

4.8

12

.31

7.7

30

.74

0.3

22

.72

1.0

33

.01

5.5

25

.64

.110

.52

1.9

Manufa

ctu

ring

Gra

des

MI,

MM

II

MI,

MI

NN

NM

,N

,B

M

Packag

es

SF

36

3, F

F6

68

FF

66

8F

F6

68

, F

F1

14

8, E

F6

68

FF

11

48

FF

11

48

FF

11

48

EF6

72

, F

FG

11

52

*F

F1

15

2F

F1

15

2, F

G676

FF

17

04

FG

45

6, B

G57

5C

G7

17,

BG

72

8C

F1

14

4

Manu

factu

ring

Gra

des

http://w

ww.xilinx.c

om/produ

cts/milaero/rpt003

.pdf

Gra

de

De

scri

pti

on

Te

mp

era

ture

VQ

Pro

Ra

dia

tio

nH

ard

en

ed

QM

LC

lass

VM

ilit

ary

Ce

ram

icTc

=-5

5C

to+

12

5C

HQ

Pro

Flip

-Ch

ipR

ad

iati

on

To

lera

nt

Ce

ram

icT

j=

-55

Cto

+1

25

C

BS

MD

Ra

dia

tio

nTo

lera

nt

an

dN

on

-RT

SM

DM

ilit

ary

Ce

ram

icTc

=-5

5C

to+

12

5C

NM

ilit

ary

Pla

sti

cT

j=

-55

Cto

+1

25

C

MM

ilit

ary

Ce

ram

ico

rP

lasti

cT

j=

-55

Cto

+1

25

C(P

lastt

ic),

Tj

=-5

5C

to+

12

5C

(Ce

ram

ic)

IIn

du

str

ial

Pla

sti

cT

j=

-40

Cto

+10

0C

Note

s:

1. E

ach s

lice c

om

prises t

wo

4-i

np

ut

log

ic functio

n g

enera

tors

(LU

Ts),

tw

o s

tora

ge e

lem

ents

, w

ide-f

unction m

ultip

lexe

rs, and

carr

y lo

gic

.


Sp

ace-G

rad

e Q

Pro

• F

PG

As

Vir

tex-4

QV

FP

GA

sV

irte

x-I

I X

QR

FP

GA

sV

irte

x X

QR

FP

GA

s

P

art

Nu

mb

er

XQ

R4

VLX

20

0X

QR

4V

SX

55

XQ

R4

VF

X6

0X

QR

4V

FX

14

0X

QR

2V

30

00

XQ

VR

30

0X

QV

R6

00

Core

Voltag

e1

.2V

1.2

V1

.2V

1.2

V1

.5V

2.5

V2

.5V

Lo

gic

Reso

urc

es

Slic

es

(1)

89

,08

82

4,5

76

25

,28

06

3,1

68

14

,33

63

,07

26

,91

2

Log

ic C

ells

20

0,4

48

55

,29

65

6,8

80

14

2,1

28

32

,25

66

,91

21

5,5

52

CLB

Flip

-Flo

ps

17

8,1

76

49

,15

25

0,5

60

12

6,3

36

28

,67

26

,14

41

3,8

24

Mem

ory

Reso

urc

es

Maxi

mum

Dis

trib

ute

d R

AM

(K

bits)

1,3

92

38

43

95

98

74

48

1,7

11

3,5

23

Blo

ck

RA

M/F

IFO

w/E

CC

(1

8K

bits e

ach)

33

63

20

23

25

52

96

––

Tota

l B

lock

RA

M (

Kb

its)

6,0

48

5,7

60

4,1

76

9,9

36

1,7

28

64

96

Clo

ck R

eso

urc

es

Dig

ital C

lock

Manag

er

(DC

M)

12

81

22

01

24

4

I/O

Reso

urc

es

Maxi

mum

Sin

gle

-End

ed

I/

Os

96

06

40

57

68

96

72

03

16

31

6

Maxi

mum

Diffe

rential I/

O P

airs

48

03

20

22

44

48

36

0–

–

Dig

itally

Co

ntr

olle

d Im

ped

ance

YE

SY

ES

YE

SY

ES

YE

S–

–

Em

bed

ded

Hard

IP R

eso

urc

es

DS

P S

lices

96

51

21

28

19

2–

––

18

x 1

8 M

ultip

liers

––

––

96

––

10

/10

0/1

00

Eth

ern

et

MA

C B

locks

et

––

44

––

–

Po

werP

C® P

rocessor

Blo

cks

s–

–2

2–

––

Mis

cellaneo

us

Sp

eed

Gra

des

-10

-10

-10

-10

-4-4

-4

Co

nfig

ura

tio

n M

em

ory

(M

bits)

51

.42

2.7

21

.04

7.9

10

.51

.73

.5

Manufa

ctu

ring

Gra

des

VV

VV

M,

VM

,V

,B

M,

V,

B

Tota

l Io

niz

ing

Dose (

krad

)3

00

30

03

00

30

02

00

10

010

0

SE

L Im

munity

(MeV

-cm

2/m

g)

>1

25

>1

25

>1

25

>1

25

>1

60

>1

25

>1

25

Packag

e(2

)A

rea

Ava

ilab

le U

ser

I/O

s

CG

A P

ackag

es (

CG

): c

era

mic

co

lum

n g

rid

arr

ay

(1.2

7 m

m b

all

sp

acin

g)

CG

71

7(3

)3

5 x

35

mm

51

6

CFA

Packag

es (

CF

): fl

ip-c

hip

cera

mic

colu

mn g

rid

arr

ay

(1.0

mm

ball

sp

acin

g)

CF

11

44

(4)

35

x 3

5 m

m5

76

CF

11

40

(5)

35

x 3

5 m

m6

40

CF

15

09

(6)

40

x 4

0 m

m9

60

76

8

CQ

FP

Packag

es (

CB

): c

era

mic

bra

zed

quad

flat

pack

(0.0

25

inch lead

sp

acin

g)

CB

22

81

.55

x 1

.55

in

16

21

62

Manu

factu

ring

Gra

des

http://w

ww.xilinx.c

om/produ

cts/milaero/rpt003

.pdf

Gra

de

De

scri

pti

on

Te

mp

era

ture

VQ

Pro

Ra

dia

tio

nH

ard

en

ed

QM

LC

lass

VM

ilit

ary

Ce

ram

icTc

=-5

5C

to+

12

5C

HQ

Pro

Flip

-Ch

ipR

ad

iati

on

To

lera

nt

Ce

ram

icT

j=

-55

Cto

+1

25

C

BS

MD

Ra

dia

tio

nTo

lera

nt

an

dN

on

-RT

SM

DM

ilit

ary

Ce

ram

icTc

=-5

5C

to+

12

5C

NM

ilit

ary

Pla

sti

cT

j=

-55

Cto

+1

25

C

MM

ilit

ary

Ce

ram

ico

rP

lasti

cT

j=

-55

Cto

+1

25

C(P

lastt

ic),

Tj

=-5

5C

to+

12

5C

(Ce

ram

ic)

IIn

du

str

ial

Pla

sti

cT

j=

-40

Cto

+10

0C

No

tes:

1. E

ach s

lice c

om

prises t

wo 4

-inp

ut

log

ic f

unction g

enera

tors

(LU

Ts),

tw

o s

tora

ge e

lem

ents

, w

ide-f

unction m

ultip

lexe

rs, and

carr

y lo

gic

. 2

. Fo

r in

form

atio

n o

n D

SC

C S

MD

ava

ilab

ility

co

nta

ct

Xili

nx

.

3. The B

G7

28

and

CG

71

7 p

ackag

es a

re f

ootp

rint

/ p

in c

om

patib

le. 4

. The C

F1

14

4 a

nd

FF1

15

2 p

ackag

es a

re f

ootp

rint

/ p

in c

om

patib

le. 5

. The C

F1

14

0 a

nd

FF

11

48

packag

es a

re fo

otp

rint

/ p

in

co

mp

atib

le. 6

. Fo

r th

e X

QR

4V

LX

20

0, th

e C

F1

50

9 a

nd

FF1

51

3 p

ackag

es a

re f

ootp

rint

/ p

in c

om

patib

le. For

the X

QR

4V

FX

14

0, th

e C

F1

50

9 a

nd

the F

F1

517

are

fo

otp

rint

/ p

in c

om

patib

le.


Defe

nse-G

rad

eC

onfig

ura

tio

n P

RO

Ms

Sp

ace-G

rad

e Q

Pro

Rad

iatio

n T

ole

rant

Co

nfig

ura

tio

n P

RO

Ms

P

art

Nu

mb

er

XQ

17

01

LX

Q17

V16

XQ

18

VQ

4X

QF

32

PX

QR

17

01

LX

QR

17

V16

Core

Voltag

e(1

)3

.3V

3.3

V3

.3V

3.3

V3

.3V

3.3

V

Sto

rag

e B

its

1M

16

M4

M3

2M

1M

16

M

Manufa

ctu

ring

Gra

des

M,

NM

,N

NM

M,

VM

,V

Tota

l Io

niz

ing

Dose (

krad

)–

––

–5

05

0

Packag

es

CC

44

,V

Q4

4C

C4

4,

VQ

44

VQ

44

VQ

48

CC

44

CC

44

Packag

e(2

)A

rea

CC

44

0.6

9 x

0.6

9 in

VQ

44

12

x 1

2 m

m

VQ

48

20

x 2

0 m

m

Manu

factu

ring

Gra

des

http://w

ww.xilinx.c

om/produ

cts/milaero/rpt003

.pdf

Gra

de

De

scri

pti

on

Te

mp

era

ture

VQ

Pro

Ra

dia

tio

nH

ard

en

ed

QM

LC

lass

VM

ilit

ary

Ce

ram

icTc

=-5

5C

to+

12

5C

HQ

Pro

Flip

-Ch

ipR

ad

iati

on

To

lera

nt

Ce

ram

icT

j=

-55

Cto

+1

25

C

BS

MD

Ra

dia

tio

nTo

lera

nt

an

dN

on

-RT

SM

DM

ilit

ary

Ce

ram

icTc

=-5

5C

to+

12

5C

NM

ilit

ary

Pla

sti

cT

j=

-55

Cto

+1

25

C

MM

ilit

ary

Ce

ram

ico

rP

lasti

cT

j=

-55

Cto

+1

25

C(P

lastt

ic),

Tj

=-5

5C

to+

12

5C

(Ce

ram

ic)

IIn

du

str

ial

Pla

sti

cT

j=

-40

Cto

+10

0C

No

tes:

1. X

ilinx

config

ura

tion P

RO

Ms h

ave

ad

justa

ble

I/O

voltag

es f

or

com

patib

ility

with a

ll X

ilinx

FP

GA

s.

2. The C

C4

4 a

nd

PC

44

packag

es a

re f

ootp

rint/

pin

com

patib

le. For

info

rmation o

n D

SC

C q

ualifi

cation c

onta

ct

Xili

nx.


FASTER THAN THE

The path to true innovation is never a straight line. Only Xilinx programmable silicon,

software, IP and 3rd party support gives you the agility to stay ahead of the competition and adapt

to changing market requirements, without slowing down. So you have the freedom to innovate

without risk. Find out more about Xilinx Targeted Design Platforms at www.xilinx.com.

SPEED OF CHANGE

© Copyright 2009 Xilinx, Inc. All rights reserved. Xilinx and the Xilinx logo are registered trademarks of Xilinx in the United States and other countries. All other trademarks are property of their respective holders.

PN 2426

Documents

FPGAs Power Net-Centric - Xilinx · Letter from the Publisher Customer Collaboration, Teamwork Key to Next-Gen FPGA Development…4 Xpert OpinionWhy Software Programmability of Electronics