

PATRICK JONAS

EVERY year, the Massachusetts Institute of Technology’s (MIT) Technology Review magazine comes up with its annual list of 10 top emerging technologies.

This year, there are Indian names in the list.

One of the inventions that made the list is the result of work done by a team of researchers led by Prof Vivek Pai at Princeton University. It is called HashCache and has drawn recognition as a revolutionary way to expand Internet access around the world.

“HashCache is a system that stores Web content to disk, and reuses it when possible, so that your Internet access appears to be much faster,” Prof Pai told tabla!.

It has the potential to expand Internet use in developing regions around the world, taking advantage of cheaper components like large disks.

To understand how HashCache works, see report below.

There is another Indian face in Prof Pai’s team. Mr Anirudh Badam is a graduate student at Princeton who leads the project, working closely with Prof Pai. The other members of the team are Prof Kyoung Soo Park, now at the University of Pittsburgh, Prof Larry Peterson, the department chair at Princeton, and research scientist Marc Fiuczynski who helps arrange and coordinate the test deployments in various parts of the world.

The new system is currently being tested at the Kokrobitey Institute in Ghana and Obafemi Awolowo University in Nigeria.

Prof Pai was born in Mumbai and moved to the United States with his parents in 1970, when he was just two years old. He married an Indian and often visits India with her.

“Part of the motivation for looking at technologies for the developing world was my experience when I was last in India, trying to get my in-laws’ computer working again,” said Prof Pai.

Added Mr Badam: “I was very excited and enthusiastic right from the day Vivek first proposed that we find a solution to this challenging problem. The result of our efforts in this direction is HashCache and we are all very excited about it.”

The Hyderabad native is a third-year PhD student at Princeton. He moved to the US in 2006 after doing his degree in computer science from the Indian Institute of Technology in Chennai.

Mr Badam has been interested in high-performance servers since his days at the IIT. “Being from a developing country, I have had first-hand experience of using slow and intermittent Internet connections. Never until I came to the US did I realise what good Internet actually is. This was also a motivating reason behind me being very enthused to solve this problem,” Mr Badam told tabla!.

He added that Prof Pai was previously involved in developing the world’s fastest Web server (Flash Web Server) and later in the development of the world’s fastest Web proxy cache (iMimic Web Proxy Cache). “So, once I got the opportunity to work with him, there was no looking back,” he said.

The Technology Review presented the list in New Delhi on March 2 at the inaugural EmTech India conference which Prof Pai unfortunately could not attend. “We recently had a new baby, and I’m in the middle of the school semester, so my travel is a bit constrained at the moment,” he said.

What about Mr Badam? He is busy working on new technologies at Princeton. When tabla! wrote to him at 5pm (Singapore time), he replied immediately. He was in the lab, at 4am in Princeton!

[email protected]

HIMACHAL PRADESH

Schools dump painter Hussain

THE Himachal Pradesh Board of School Education, on the recommendation of a committee, has decided to do away with a chapter on noted painter M.F. Hussain, reported DNA.

Board chairman Chaman Lal Gupta said the chapter on Hussain “which has nothing to inspire students” would be replaced with one on painter Sobha Singh and former Indian president A.P.J. Abdul Kalam. He said Singh would make more sense to students in Himachal Pradesh because he was based in the state and incorporated Himachali culture in his paintings.

GUJARAT

Reliance creates refinery giant

IN THE largest ever merger of business units in India, Reliance Industries (RIL) on March 2 decided to merge its subsidiary Reliance Petroleum Ltd (RPL) with it, through a share swop of one RIL share for 16 RPL shares.

RIL’s chief financial officer Alok Agarwal told the media that, with a combined refining capacity of 1.24 million barrels per day, the merger will create the world’s largest refinery complex at any single location.

TAMIL NADU

GPS for Chennai buses

THE Metropolitan Transport Corporation is fitting 600 buses with GPS (Global Positioning System). By the end of this month 300 of them will be fitted and the rest will get it by the end of April. The GPS will help passengers waiting at bus-stops to be updated on the expected arrival time of buses.

Most of these buses would be operated on Anna Salai and Poonamallee High Road, reported The Hindu.

WE ASKED Prof Vivek Pai to explain in simple terms how HashCache works. This is his explanation:

“Let’s say that you want to visit a Web page, like http://www.cs.princeton.edu/~vivek

To your browser, that page actually looks like many different pieces, which have names like

http://www.cs.princeton.edu/~vivek/index.html

http://www.cs.princeton.edu/~vivek/nsg.css

Each of those pieces is called an object. So, to build that page, your browser is sending multiple requests, one for each object. If another colleague at your paper also visited the page, his browser would also send that same sequence of requests. All that traffic would flow from the US to Singapore twice.

What a cache does is store each object as it comes in, and then checks to see whether a request can be satisfied from the object it has stored, instead of being re-fetched all the way from the US.
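The store-and-reuse behaviour described above can be sketched in a few lines of Python. This is only an illustrative toy, not HashCache itself: the class `SimpleCache`, the stand-in `fake_origin` function and the counter `origin_fetches` are hypothetical names invented for this example.

```python
class SimpleCache:
    """Toy Web cache: keeps each fetched object and reuses it on repeat requests."""

    def __init__(self, fetch):
        self.fetch = fetch        # function that retrieves an object from the far-away server
        self.store = {}           # URL -> object contents
        self.origin_fetches = 0   # how many requests actually travelled to the origin

    def get(self, url):
        if url not in self.store:
            # First request for this object: fetch it once and keep a copy.
            self.store[url] = self.fetch(url)
            self.origin_fetches += 1
        return self.store[url]


def fake_origin(url):
    # Stand-in for a slow long-distance fetch; returns placeholder bytes.
    return ("contents of " + url).encode()


cache = SimpleCache(fake_origin)
page_objects = [
    "http://www.cs.princeton.edu/~vivek/index.html",
    "http://www.cs.princeton.edu/~vivek/nsg.css",
]

# Two colleagues visit the same page, one after the other.
for visitor in range(2):
    for url in page_objects:
        cache.get(url)

print(cache.origin_fetches)  # prints 2: the second visit is served from the cache
```

With the cache in place, the two objects cross the long-distance link once instead of twice.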

The way that caches typically operate is that they have to store each object somewhere on disk. Since it’s hard to work with long names like http://www.cs.princeton.edu/~vivek/index.html the cache reduces it to a number, say 125371242. This process is called hashing. However, it then has to keep in RAM some information that says that this particular file is located at a given location on the disk. That portion is called the index. As the size of your disk grows, the size of this index also grows.
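A minimal Python sketch of such a conventional design follows, under stated assumptions: the hash function (the first 8 bytes of SHA-1), the class `IndexedCache` and the use of an in-memory `bytearray` as a stand-in for the disk are all choices made for this illustration and are not taken from the article.

```python
import hashlib


def url_hash(url):
    # Reduce a long URL to a number. Using SHA-1 here is an assumption;
    # any well-mixed hash function would serve for the illustration.
    return int.from_bytes(hashlib.sha1(url.encode()).digest()[:8], "big")


class IndexedCache:
    """Toy conventional cache: objects on 'disk', plus an index held in RAM."""

    def __init__(self):
        self.disk = bytearray()   # stand-in for the disk: objects appended end to end
        self.index = {}           # the in-RAM index: hash -> (offset, length)

    def put(self, url, body):
        # Record where the object lands on disk, then write it.
        self.index[url_hash(url)] = (len(self.disk), len(body))
        self.disk += body

    def get(self, url):
        entry = self.index.get(url_hash(url))
        if entry is None:
            return None           # not cached
        offset, length = entry
        return bytes(self.disk[offset:offset + length])


cache = IndexedCache()
cache.put("http://www.cs.princeton.edu/~vivek/index.html", b"<html>...</html>")
cache.put("http://www.cs.princeton.edu/~vivek/nsg.css", b"body { margin: 0 }")
print(len(cache.index))  # one index entry per stored object
```

Every object stored adds an entry to `index`, which is why the RAM needed grows along with the disk.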

What HashCache does is to get rid of this index. Instead, once it calculates the hash value for a file, it uses that same number to determine what location on the disk it should use to store that file.

There are some tricks involved and some complications, but that’s the basic idea. By eliminating this mapping, HashCache does not need to keep a lot of information in RAM. Since it’s easier to buy lots of disks rather than lots of RAM, HashCache allows you to build caches using much cheaper systems.”
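The basic idea of letting the hash value pick the disk location can be sketched as follows. This is a simplified illustration and not the actual HashCache implementation: the fixed slot size, the number of slots, the record layout and the way collisions are handled (the newer object simply overwrites the older one, and the stored URL is checked on every read) are all assumptions made for this example.

```python
import hashlib

SLOT_SIZE = 256    # bytes per slot (hypothetical; real Web objects vary widely in size)
NUM_SLOTS = 1024   # number of slots on the toy "disk" (hypothetical)


def url_hash(url):
    # Same illustrative hash as in a conventional cache: URL -> number.
    return int.from_bytes(hashlib.sha1(url.encode()).digest()[:8], "big")


class IndexFreeCache:
    """Toy cache with no in-RAM index: the hash alone decides where an object lives."""

    def __init__(self):
        self.disk = bytearray(SLOT_SIZE * NUM_SLOTS)   # stand-in for the disk

    def _offset(self, url):
        # The hash value maps directly to a disk location.
        return (url_hash(url) % NUM_SLOTS) * SLOT_SIZE

    def put(self, url, body):
        key = url.encode()
        # Record layout: URL length, URL, body length, body.
        record = len(key).to_bytes(2, "big") + key + len(body).to_bytes(2, "big") + body
        if len(record) > SLOT_SIZE:
            return False   # too large for this toy layout
        off = self._offset(url)
        self.disk[off:off + len(record)] = record
        return True

    def get(self, url):
        off = self._offset(url)
        key_len = int.from_bytes(self.disk[off:off + 2], "big")
        key = bytes(self.disk[off + 2:off + 2 + key_len])
        if key != url.encode():
            return None    # slot is empty, or holds a different URL with the same slot
        pos = off + 2 + key_len
        body_len = int.from_bytes(self.disk[pos:pos + 2], "big")
        return bytes(self.disk[pos + 2:pos + 2 + body_len])


cache = IndexFreeCache()
cache.put("http://www.cs.princeton.edu/~vivek/index.html", b"<html>...</html>")
print(cache.get("http://www.cs.princeton.edu/~vivek/index.html"))
```

Apart from the disk itself, the cache object holds no per-object state: a lookup recomputes the hash and reads one location, so RAM use does not grow with the number of stored objects.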

HashCache is hot

Princeton University invention makes Internet access faster

Cache them if you can... (from left) Mr Badam, Prof Peterson, Prof Pai and Mr Fiuczynski. PHOTO: PRINCETON UNIVERSITY

How you can cache in...

Once I got the opportunity to work with Prof Pai, there was no looking back. – Mr Anirudh Badam

regional roundup

INDIA tabla! March 6, 2009 Page 6