Urbanesia's brief development history for Business Connect - 29 October 2012
URBANESIA
Development History
Business Connect – 29 October 2012
Prepared by: Batista Harahap
URBANESIA BETA V0
The first public iteration of Urbanesia
PROS
• Data structures in MySQL
• Effective memory caching implementations
• Effective SEO implementations
• Effective search server implementations
• Urbanesia is successfully consumed as a Directory
CONS
• No effective separation of Backend & Frontend web applications
• Source code = spaghetti code
• Storing low value, high volume data in MySQL
• Many queries using GROUP BY on highly populated tables
• After a warm boot, any page takes 20+ seconds to generate
• Difficult to scale horizontally & vertically
• Very low concurrency
• The product’s identity is weak
• So many features left unused by users
WHAT WE LEARNED
• Do NOT use MySQL as session storage
• Use a NoSQL database for low value, high volume data
• Separate backend & frontend web applications, create APIs for backends
• Use output caching where available
• When using PHP-APC, make sure apc.stat = 0
• Increase concurrency by reverse proxying requests to Apache
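The "use output caching" lesson can be sketched roughly as follows. This is a minimal illustration in Python, not Urbanesia's CodeIgniter code: the OutputCache class, the render_page function and the URL are all hypothetical, and a plain dictionary stands in for whatever memory cache sits in front of the application.

```python
import time
from typing import Callable, Dict, Tuple


class OutputCache:
    """Caches fully rendered pages, keyed by URL, for a fixed time-to-live."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        # url -> (expiry timestamp, rendered HTML)
        self.store: Dict[str, Tuple[float, str]] = {}

    def get_or_render(self, url: str, render: Callable[[str], str]) -> str:
        now = self.clock()
        hit = self.store.get(url)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: skip rendering and database work
        html = render(url)  # expensive path, taken once per TTL window
        self.store[url] = (now + self.ttl, html)
        return html


render_calls = []


def render_page(url: str) -> str:
    """Hypothetical renderer; a real one would query the database."""
    render_calls.append(url)
    return f"<html><body>Listing for {url}</body></html>"


cache = OutputCache(ttl_seconds=60)
first = cache.get_or_render("/jakarta/restaurants", render_page)
second = cache.get_or_render("/jakarta/restaurants", render_page)
```

Here the second request is served from the cache, so render_page runs only once. Bot traffic that hits the same URLs repeatedly is the case where this kind of cache would be expected to pay off most.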
CHALLENGES
• Handle Google Bots traffic of over 1 TB/month with only 2 servers
• Do output caching with CodeIgniter
• Achieve sub-second page generation even on warm boots
• Redesign backend by creating an API for our native apps
URBANESIA V1
The second iteration, based on refined code and infrastructure design
PROS
• Achieved sub-second page generation on warm boots
• Aggressive & effective caching mechanism
• Optimized MY_Controller
• Session storage handled by Memcache
• MySQL read/write access lowered from ~400 qps to only 1 qps
• Lean memory usage in database server
• Created an OAuth-enabled API
• Concurrency increased by using nginx as reverse proxy
• The same server setup can theoretically handle 10x the current traffic without scaling horizontally
• Google bots are now limited only by bandwidth, not by code efficiency
• Proper indexing in MySQL
• Replaced stock MySQL with a custom-built MySQL alternative: Percona Server
CONS
• Source code = spaghetti code
• Unpredictable code behavior because of the V0 inheritance; as tables fill with more rows, queries become bottlenecks
• Subqueries still exist
• Everything is still synchronous, no message queue yet
• The end product fails to give users the illusion of speed
• New hires face a steeper learning curve because of the inherited complexity plus V1’s own complexity
• Still difficult to scale horizontally & vertically
WHAT WE LEARNED
• CodeIgniter enables fast product delivery, but the optimization & efficiency of the resulting code are questionable at best
• Need to enable an asynchronous architecture
• Do not do things in realtime, instead offload to message queues
• To impress users with the illusion of speed, JavaScript must be thoroughly implemented
• Emails should not be handled by ourselves, use third-party email solutions like AWS SES
• Offload server-side international bandwidth to clients: for Facebook, use the Facebook JS SDK instead of the PHP SDK
• The product gains more engagement with content that is more focused (thematic)
• Speed of content delivery is important to engagement metrics
CHALLENGES
• Build a third iteration with a strong identity based on users’ personas
• Focus more on verticals, create the illusion of a discovery/recommendation platform
• Progressive disclosure of content
• A JavaScript framework that is light, fast and has minimal dependencies
• Make everything asynchronous and message/event based
• Redefine Urbanesia’s atomic data structure
• Do MySQL JOINs on the server side
• Get the data first FAST, compute later
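One reading of the JOIN challenge, assuming "server side" means the application server rather than the database, is to fetch each table with a simple query and combine the rows in application code. The sketch below shows that idea in Python; the table shapes, column names and the join_in_application helper are hypothetical.

```python
from typing import Dict, List


def join_in_application(businesses: List[dict],
                        categories: List[dict]) -> List[dict]:
    """Combine two independently fetched result sets in application code.

    Equivalent in spirit to an INNER JOIN on category_id, but the database
    only ever runs two simple, index-friendly SELECTs.
    """
    by_id: Dict[int, dict] = {c["id"]: c for c in categories}
    joined = []
    for b in businesses:
        category = by_id.get(b["category_id"])
        if category is not None:  # inner-join semantics: drop unmatched rows
            joined.append({**b, "category_name": category["name"]})
    return joined


# Hypothetical rows, as two separate queries might return them.
businesses = [
    {"id": 1, "name": "Kopi Corner", "category_id": 10},
    {"id": 2, "name": "Sushi Spot", "category_id": 20},
    {"id": 3, "name": "Orphan Entry", "category_id": 99},
]
categories = [
    {"id": 10, "name": "Cafe"},
    {"id": 20, "name": "Japanese"},
]

rows = join_in_application(businesses, categories)
```

The dictionary lookup makes the merge linear in the number of rows, and each of the two result sets can be cached separately.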
PRODUCTS & TECHNOLOGIES
Does the product make the technology, or does the technology make the product?
THE PRODUCT MAKES THE TECHNOLOGY!
REAL WORLD EXAMPLES
• We need to know which part of Urbanesia will really work for users
• Store preferences based on each user’s dynamic activity
• Make calculations of other contents a user might consume
• Present the content unobtrusively
• Do it fast and almost realtime
TECHNICAL SPEAK
We need to know which part of Urbanesia will really work for users
• Mine every user’s data on each visit, including anonymous users
• Log everything FAST and asynchronously
• Low value & high volume data
• Avoid MySQL at all cost
• Model data based on the chosen NoSQL database model
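"Log everything FAST and asynchronously" can be illustrated with a small in-process queue: the request path only enqueues the event, and a background worker does the slow write. This is a sketch under assumptions, not Urbanesia's setup. ActivityLogger and the event fields are hypothetical, and a Python list stands in for the NoSQL store.

```python
import queue
import threading
from typing import Callable


class ActivityLogger:
    """Accepts events instantly and writes them from a background thread."""

    def __init__(self, sink: Callable[[dict], None]):
        self._queue: "queue.Queue[dict]" = queue.Queue()
        self._sink = sink
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def log(self, event: dict) -> None:
        # Returns immediately; the caller never waits for storage.
        self._queue.put(event)

    def _drain(self) -> None:
        while True:
            event = self._queue.get()
            self._sink(event)  # the slow write happens off the request path
            self._queue.task_done()

    def flush(self) -> None:
        # Block until every queued event has been written.
        self._queue.join()


stored = []  # stand-in for the real data store
logger = ActivityLogger(stored.append)
logger.log({"user": "anonymous-1", "page": "/jakarta/cafes"})
logger.log({"user": "42", "page": "/jakarta/cafes"})
logger.flush()
```

A production setup would more likely put a separate message queue or daemon between the web tier and the store, but the shape of the call (enqueue, return, write later) is the same.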
TECHNICAL SPEAK
Introducing Redis
• Reads/writes data from memory
• Stores data on disk
• Key/value model similar to Memcache
• Ability to perform atomic tasks without worrying about state
• Redis’ primitive data types are very simple
• Ideal for low value/high volume data
• Less is more!
TECHNICAL SPEAK
Store preferences based on each user’s dynamic activity
• Simple increments
• Perfect for Sorted Sets in Redis
• Keeping them sorted means the analytics functions are supported natively by Redis == High Performance
• Fire & Forget – Consider using async frameworks like Node.js & trigger using JavaScript
• Why trigger with JavaScript? To make sure, at the very least, that it’s actually users accessing the page
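The increment-and-rank pattern above maps onto Redis's ZINCRBY and ZREVRANGE commands. To keep the example runnable without a Redis server, the sketch below uses a small in-memory stand-in; SortedCounters, its method names other than zincrby, and the key layout "user:42:categories" are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class SortedCounters:
    """In-memory stand-in for the Redis sorted-set operations used here."""

    def __init__(self) -> None:
        # key -> member -> score
        self._sets: Dict[str, Dict[str, float]] = defaultdict(dict)

    def zincrby(self, key: str, amount: float, member: str) -> float:
        """Add `amount` to the member's score, creating it at 0 if missing."""
        scores = self._sets[key]
        scores[member] = scores.get(member, 0.0) + amount
        return scores[member]

    def top(self, key: str, n: int) -> List[Tuple[str, float]]:
        """Highest-scoring members first (ties broken by name here)."""
        scores = self._sets.get(key, {})
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:n]


prefs = SortedCounters()
# Each page view fires one increment for the category the user looked at.
for category in ["coffee", "sushi", "coffee", "bakery", "coffee", "bakery"]:
    prefs.zincrby("user:42:categories", 1, category)

favourites = prefs.top("user:42:categories", 2)
```

With a real Redis server the increment is a single atomic command, which is what makes the fire-and-forget style safe under concurrent page views.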
TECHNICAL SPEAK
Node.js & Socket.io
• Node.js is a network-ready daemon with Chrome’s V8 JavaScript engine inside
• Node.js is asynchronous by default (event based)
• Socket.io is the transport used for data
• Socket.io is abstracted to fall back gracefully between WebSocket, Flash and plain AJAX
• JavaScript clients should only subscribe to onFailed events to minimize overhead
TECHNICAL SPEAK
Make calculations of other contents a user might consume
• Use Machine Learning algorithms to learn users’ behaviors
• Naïve Bayes Classifier to the rescue
• Assumes each keyword is independent of the others
• Proven algorithm used by many big websites
TECHNICAL SPEAK
Naïve Bayes Classifier
• There are no wrong or right assumptions, only accuracy
• Accuracy is increased with more data and better classifications
• Relatively easy to code
• Lots of libraries out there in different languages
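To show why it is "relatively easy to code", here is a minimal multinomial Naïve Bayes sketch in Python with add-one (Laplace) smoothing. It is an illustration only, not the code in Urbanesia's PHP repository; the class name, method names and training keywords are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set


class NaiveBayes:
    """Multinomial Naive Bayes over keywords with add-one smoothing."""

    def __init__(self) -> None:
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.doc_counts: Counter = Counter()  # category -> training docs
        self.vocabulary: Set[str] = set()

    def train(self, words: Iterable[str], category: str) -> None:
        words = list(words)
        self.doc_counts[category] += 1
        self.word_counts[category].update(words)
        self.vocabulary.update(words)

    def log_scores(self, words: List[str]) -> Dict[str, float]:
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for category, docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            score = math.log(docs / total_docs)  # log prior
            for w in words:
                # Independence assumption: per-keyword likelihoods are
                # simply multiplied (added in log space).
                score += math.log((counts[w] + 1) / (total_words + vocab_size))
            scores[category] = score
        return scores

    def classify(self, words: List[str]) -> str:
        scores = self.log_scores(words)
        return max(scores, key=scores.get)


nb = NaiveBayes()
nb.train(["kopi", "espresso", "latte"], "cafe")
nb.train(["latte", "cake"], "cafe")
nb.train(["sushi", "ramen", "sashimi"], "japanese")

guess_a = nb.classify(["espresso", "latte"])
guess_b = nb.classify(["sushi", "ramen"])
```

Working in log space avoids underflow when many small probabilities are multiplied, and the smoothing term keeps an unseen keyword from zeroing out a category.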
TECHNICAL SPEAK
Present the content unobtrusively
• Giving users the illusion that we understand them
• Do not make this feature dominant
• Show it where you want the content to look smart
TECHNICAL SPEAK
Do it fast and almost realtime
• Fast is an illusion
• Realtime is overrated
• If you don’t have enough resource to do so, schedule it and pre generate content
• Scale vertically
Talk is cheap, show me the CODES!
URBANESIA @ Github
https://github.com/Urbanesia
URBANESIA @ Github
https://github.com/Urbanesia/Simple-Naive-Bayes-Classifier-for-PHP
NAÏVE BAYES CLASSIFIER
First Iteration:
• Took ~1000 seconds to classify 1 keyword
• MySQL as storage
• No micro optimizations
NAÏVE BAYES CLASSIFIER
Second Iteration:
• Took ~400 seconds to classify 1 keyword
• MongoDB as storage
• Macro optimization trimmed 600 of 1000 seconds
• No micro optimizations
NAÏVE BAYES CLASSIFIER
Third Iteration:
• Took ~1 second to classify 1 keyword
• Redis as storage
• Insane macro optimization boost
• No micro optimizations
NAÏVE BAYES CLASSIFIER
Fourth Iteration:
• Took 0.01428 second to classify 1 keyword
• Redis as storage
• Reworked classification algorithm
• Get the data first and compute later
• More memory usage, faster execution time
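"Get the data first and compute later" can be read as: pull every count the classifier needs in one batched request, then do the arithmetic in memory. The sketch below illustrates that trade-off with a hypothetical CountingStore whose mget method mirrors the Redis MGET command; the key naming and both scoring functions are assumptions, and the deck does not show the actual reworked algorithm.

```python
from typing import Dict, List


class CountingStore:
    """Hypothetical key/value store that counts network round trips."""

    def __init__(self, data: Dict[str, int]) -> None:
        self._data = dict(data)
        self.round_trips = 0

    def get(self, key: str) -> int:
        self.round_trips += 1
        return self._data.get(key, 0)

    def mget(self, keys: List[str]) -> List[int]:
        self.round_trips += 1  # one request, however many keys
        return [self._data.get(k, 0) for k in keys]


def score_one_by_one(store: CountingStore, category: str,
                     words: List[str]) -> int:
    # Compute while fetching: one round trip per keyword.
    return sum(store.get(f"{category}:{w}") for w in words)


def score_fetch_first(store: CountingStore, category: str,
                      words: List[str]) -> int:
    # Fetch everything up front, then compute purely in memory.
    counts = store.mget([f"{category}:{w}" for w in words])
    return sum(counts)


data = {"cafe:latte": 2, "cafe:espresso": 1, "cafe:kopi": 1}
words = ["latte", "espresso", "kopi", "cake"]

slow_store = CountingStore(data)
fast_store = CountingStore(data)
slow_total = score_one_by_one(slow_store, "cafe", words)
fast_total = score_fetch_first(fast_store, "cafe", words)
```

Both functions return the same total, but the batched version holds all counts in memory at once, which matches the "more memory usage, faster execution time" note above.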
NAÏVE BAYES CLASSIFIER
Fifth Iteration:
• Reworked the trainer methods
• Created deTrain method to update data
• Created helpers to do keyword blacklists
• Consistent performance from CLI or HTTP
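The idea behind a deTrain method and keyword blacklists can be sketched as the inverse of training: subtract the counts a document contributed, and filter unwanted keywords before counting. The Python below is a hypothetical illustration, not the repository's PHP API; TrainableCounts and its method names are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


class TrainableCounts:
    """Count store that supports training, de-training and a blacklist."""

    def __init__(self, blacklist: Iterable[str] = ()) -> None:
        self.blacklist = {w.lower() for w in blacklist}
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.doc_counts: Counter = Counter()

    def _clean(self, words: Iterable[str]) -> List[str]:
        # Normalise case and drop blacklisted keywords before counting.
        lowered = (w.lower() for w in words)
        return [w for w in lowered if w not in self.blacklist]

    def train(self, words: Iterable[str], category: str) -> None:
        self.doc_counts[category] += 1
        self.word_counts[category].update(self._clean(words))

    def detrain(self, words: Iterable[str], category: str) -> None:
        """Undo a previous train() call for the same words and category."""
        if self.doc_counts[category] == 0:
            raise ValueError(f"nothing trained for {category!r}")
        self.doc_counts[category] -= 1
        self.word_counts[category].subtract(self._clean(words))
        # Unary plus drops entries whose count fell to zero or below.
        self.word_counts[category] = +self.word_counts[category]
        self.doc_counts = +self.doc_counts


model = TrainableCounts(blacklist=["the", "and"])
model.train(["The", "latte", "and", "cake"], "cafe")
model.train(["espresso", "latte"], "cafe")
model.detrain(["espresso", "latte"], "cafe")

remaining = dict(model.word_counts["cafe"])
```

Being able to remove a document's contribution means a wrong or outdated label can be corrected without retraining from scratch.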
NAÏVE BAYES CLASSIFIER
What we learned:
• Always be open to new things
• Geek Talk with peers from the industry
• Very talented people will always come up with smarter and better ways to do something
• Decide: get smart or get smarter?
• Algorithms are the engine, but they don’t mean anything without implementation
• Consider opening up source code for others to examine; the smarter the population, the better the products we create
• Focus on USERS instead of technology
NAÏVE BAYES CLASSIFIER
More insights below:
http://www.bango29.com/go/blog/2012/naive-bayes-classifier-revisited
Geekball
Every Tuesday, 17.00 – 19.00
Basket Hall C, Senayan
OUR PRODUCTS
Urbanesia’s product lineup
URBANESIA.COM
URBANESIA.COM SEARCH
M.URBANESIA.COM
URBAN’S NOTES
JAJAN
Jajan is Open Source, get the source code:
• Blackberry - https://github.com/Urbanesia/Jajan-Blackberry
• Android - https://github.com/Urbanesia/Jajan
• HTML5 - https://github.com/Urbanesia/jajan-html5
Platforms:
• Blackberry - https://appworld.blackberry.com/webstore/content/54742/
• Android - https://play.google.com/store/apps/details?id=com.bango.jajan
• iOS - https://itunes.apple.com/us/app/jajan/id527278768?mt=8
• HTML5 - https://jajan5.urbanesia.com/
WHAT’S NEXT
Our third iteration of Urbanesia.com
• A rework from scratch both in Product Design and Technical Implementation
• Focusing more on users and our RICH content
• A social network useful for everyday city life
• Machine learning implementation for our recommendation engine
KEY TAKEAWAYS
Summary
• Empower people working with you
• Invest in company culture
• Focus on USERS, not technology
• Macro to Micro optimizations & scaling
• Be open to new ideas (things)
• Geek Talks over anything, like basketball or beer
• Good is not Great
• Whatever WORKS
Hi! From Urbanesia