Ever wonder how some applications are built? Ever wonder how to combine components of the Windows Azure platform? Stop wondering and learn how we’ve built MyGet.org, a multi-tenant software-as-a-service. In this session we’ll discuss architecture, commands, events, access control, multi-tenancy and how to mix and match those things together. Learn about the growing pains and misconceptions we had on the Windows Azure platform. The result just may be a reliable, cost-effective solution that scales.
How it’s made: MyGet
Maarten Balliauw
@maartenballiauw
Who am I? Maarten Balliauw Daytime: Technical Evangelist, JetBrains
Co-founder of MyGet Author – Pro NuGet http://amzn.to/pronuget AZUG Focus on web ASP.NET MVC, Windows Azure, SignalR, ... MVP Windows Azure & ASPInsider
http://blog.maartenballiauw.be @maartenballiauw
Agenda NuGet? MyGet? How we started What we did not know Our first architecture Our second architecture Multi-tenancy ACS Tough times (learning moments) When business meets technology Conclusion
NuGet? MyGet?
Why MyGet? Safely store your IP with us Creating packages is hard. We have Build Services! Granular security Activity streams Symbol server Analytics
I’m not alone! Xavier Decoster@xavierdecoster
Yves Goeleven@yvesgoeleven
Also known as @MyGetTeam
How we started
The real beginning? May 09, 2011
NuPack! Using OData as their feeds Which is some sort of WCF… Multiple feeds? Exchanged some ideas with Xavier Prototyped something during TechDays Belgium, 2011
Prototype online! May 31, 2011
Technologies used? Windows Azure Windows Azure Table Storage & Blob Storage Windows Azure ACS (no way I’m typing another user registration) ASP.NET MVC 2 MEF
Here’s some code from back then…
[Authorize]
public class FeedController : Controller
{
    public ActionResult List()
    {
        var privateFeedTable = PrivateFeedTable.Create();
        var privateFeeds = privateFeedTable.GetAll(
            f => f.PartitionKey == User.Identity.Name.ToBase64());

        var model = new PrivateFeedListViewModel();
        foreach (var privateFeed in privateFeeds.Where(f => f.IsVisible))
        {
            var privateFeedViewModel = new PrivateFeedViewModel();
            model.Items.Add(AutoMapper.Mapper.Map(privateFeed, privateFeedViewModel));
        }

        return View(model);
    }
}
How about this one?
try
{
    privateFeedNuGetPackageTable.Add(privateFeedPackage);
}
catch
{
    // Omnomnom!
}
Best practices used back then?
Architecture at the time? Cloud Services: one web role doing all work Storage: one storage account Windows Azure Access Control Service
What we did not know…
Users would come! Grew from 5 feeds to 70 feeds in a few weeks 10 feeds per week added thereafter
Data would come! One user pushed 1.300 packages worth 1 GB of storage Others started pushing CI packages Others seemed to be copying www.nuget.org
ReSharper time! A lot of refactoring done Direct data access -> repositories Repositories used by services Services used by controllers
Using best practices SOLID and DRY (well, not everywhere but refactoring takes time) Running on two instances (availability, yay!)
We became a startup Someone mentioned they would pay for our service Think about business model Volume of feeds and packages kept going up Users in EU and US
Our first architecture
Our first architecture - code
[Diagram labels: MYGET.CORE, STORAGE]
Awesome! Best practices! Layers! Typical business application architecture!
Not so awesome… Best practices! Are they? Layers! No spaghetti code but lasagna code Typical business application architecture! Proved to be very inflexible
Our first architecture - infrastructure
[Diagram labels: EU-WEST, NORTH CENTRAL US; WEB ROLE and STORAGE (each shown twice)]
Awesome! Datacenters near our users Centralized storage Packages on CDN for faster throughput DNS fail-over if one of the DCs went down
Not so awesome… Datacenters near our users: or not? Centralized storage: speed of light! USA was slow! Packages on CDN for faster throughput: sync issues, downtime, … DNS fail-over if one of the DCs went down: seems not every ISP follows DNS standards
We persisted! Local caching in USA added 2 instances in EU, 3 in the USA Speed of light! Syncing all data kept being slow Populating cache was a nightmare CDN kept having issues Of 3 instances, only 1 was being used with enough load (60%)
We were growing! We had public subscription plans We added enterprise tenants (multi-tenancy added) Resulting in… Architecture became complex Caching and syncing became complex
ReSharper time!
Our second architecture
We had a look at our workloads Managing feeds and packages Doesn’t matter much where (sync vs. bandwidth)
Downloading packages May matter where, let the tenant decide
Builds Who cares where!
Our 2nd architecture - infrastructure
[Diagram labels: WEB ROLE, STORAGE, EU-WEST, EU-NORTH, STORAGE ACCT PER TENANT, OTHER DATACENTERS, STORAGE ACCT (SOME TENANTS), VIRTUAL MACHINE (BUILDS)]
Our 2nd architecture - code
[Diagram labels: STORAGE, STORAGE ACCT PER TENANT]
Our first architecture… … was scaled across the globe … but as synchronous as it could be … prone to all issues with latency vs. synchrony
Event Driven Architecture?*
*disclaimer: we borrowed some concepts from EDA
EDA in MyGet Some actions put an ICommand on a queue (ground rule: if it can’t be done in 1 write, use ICommand)
All actions complete with an IEvent on a queue Handlers can subscribe to ICommand and IEvent Handlers are idempotent and do not depend on other handlers
Example: log in 2 operations: 1 read, 1 write Read the profile Store the profile with LastLogin date No use of ICommand Finishes with UserLoggedInEvent
Example: change feed owner Many operations! Read two user profiles Read current access rights Change access rights Push new privileges to SymbolSource.org
One command, one event ChangeFeedOwnerCommand FeedOwnerChangedEvent
Example: change feed owner
[Diagram labels: ChangeFeedOwnerCommand, ChangeFeedOwnerCommandHandler, FeedOwnerChangedEvent, SymSrcHandler<FeedOwnerChangedEvent>, SymSrcEvent, ActivityLogHandler<FeedOwnerChangedEvent>]
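The change feed owner flow can be sketched in memory as follows. This is a minimal illustration, not MyGet's code: ICommand, IEvent, ChangeFeedOwnerCommand and FeedOwnerChangedEvent are names from the slides, while InMemoryBus, FeedOwnerWiring and the message properties are hypothetical, and the real system puts messages on a queue instead of dispatching them directly.

```csharp
using System;
using System.Collections.Generic;

public interface ICommand { }
public interface IEvent { }

// Message names come from the slides; the properties are assumed.
public sealed record ChangeFeedOwnerCommand(string Feed, string NewOwner) : ICommand;
public sealed record FeedOwnerChangedEvent(string Feed, string NewOwner) : IEvent;

// Hypothetical stand-in for the queue: hands a message to every
// handler subscribed to its exact type.
public sealed class InMemoryBus
{
    private readonly Dictionary<Type, List<Action<object>>> _handlers = new();

    public void Subscribe<T>(Action<T> handler)
    {
        if (!_handlers.TryGetValue(typeof(T), out var list))
        {
            list = new List<Action<object>>();
            _handlers[typeof(T)] = list;
        }
        list.Add(message => handler((T)message));
    }

    public void Publish(object message)
    {
        if (_handlers.TryGetValue(message.GetType(), out var list))
        {
            // Iterate over a copy so a handler may subscribe while we dispatch.
            foreach (var handler in list.ToArray())
            {
                handler(message);
            }
        }
    }
}

public static class FeedOwnerWiring
{
    // Wires one command handler and two independent event handlers,
    // sends one command and returns what the handlers recorded.
    public static List<string> Run()
    {
        var log = new List<string>();
        var bus = new InMemoryBus();

        // Command handler: does the multi-step work, then finishes with an event.
        bus.Subscribe<ChangeFeedOwnerCommand>(c =>
        {
            log.Add($"owner of {c.Feed} set to {c.NewOwner}");
            bus.Publish(new FeedOwnerChangedEvent(c.Feed, c.NewOwner));
        });

        // Event handlers know nothing about each other.
        bus.Subscribe<FeedOwnerChangedEvent>(e => log.Add($"symbols: {e.Feed}"));
        bus.Subscribe<FeedOwnerChangedEvent>(e => log.Add($"activity: {e.Feed}"));

        bus.Publish(new ChangeFeedOwnerCommand("myfeed", "xavier"));
        return log;
    }
}
```

Adding a feature such as the activity log then only means adding one more Subscribe call; the command handler stays untouched.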
Gain? We now run on 2 instances, mostly for redundancy Average CPU usage? 20% (across machines) Flexibility! Way easier to implement new features! New feature: activity log Simply subscribe to events we want to see in that log
Storage No relational database (why not?)
Event-driven architecture How do you store a feed’s packages and versions in an optimal way?
Three important values: feed name, package id, package version Table per feed Package id = PartitionKey Package version = RowKey
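The key layout above can be sketched as a small mapping function. Only the layout (table per feed, package id as PartitionKey, package version as RowKey) comes from the slide. PackageRow, PackageKeys, the "feed" prefix and the lower-casing of keys are assumptions made for illustration. The sanitizing step exists because Windows Azure table names may only contain alphanumeric characters and must start with a letter.

```csharp
using System;
using System.Linq;

// Hypothetical row address; only the key layout comes from the slide.
public sealed record PackageRow(string TableName, string PartitionKey, string RowKey);

public static class PackageKeys
{
    public static PackageRow For(string feedName, string packageId, string version)
    {
        // Keep ASCII letters and digits only. The assumed "feed" prefix
        // guarantees that the name starts with a letter.
        var safeFeed = new string(feedName
            .Where(c => c < 128 && char.IsLetterOrDigit(c))
            .ToArray());

        return new PackageRow(
            TableName: "feed" + safeFeed.ToLowerInvariant(),
            // Assumption: ids and versions are compared case-insensitively.
            PartitionKey: packageId.ToLowerInvariant(),
            RowKey: version.ToLowerInvariant());
    }
}
```

With this layout, all versions of one package share a partition, so listing them is a single-partition query. Note that stripping characters can map two feed names to the same table, and that the 63 character limit on table names is not enforced here; a real implementation needs to handle both.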
Storage Reading 1.000 rows and deserializing them is SLOW (many seconds) We cache some tables on blob storage 1.000 rows in serialized JSON = small Loading one file over HTTP = fast Searching in memory through 1.000 rows = fast
Cache update subscribed to IEvent
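A rough sketch of the caching idea: the rows of one feed are stored as a single JSON document, loaded in one read and searched in memory. CachedPackage and FeedSnapshot are hypothetical names, and System.Text.Json is used here only to keep the example self-contained; the slides do not say which serializer MyGet uses. Uploading the JSON to blob storage and rebuilding it when an IEvent arrives are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical cached row; the real tables hold more columns.
public sealed record CachedPackage(string Id, string Version);

public static class FeedSnapshot
{
    // One JSON document per feed: this is what would be written to a blob.
    public static string Serialize(IEnumerable<CachedPackage> rows) =>
        JsonSerializer.Serialize(rows);

    // One read plus one deserialization replaces many table reads.
    public static List<CachedPackage> Load(string json) =>
        JsonSerializer.Deserialize<List<CachedPackage>>(json) ?? new List<CachedPackage>();

    // Searching a thousand rows in memory is cheap.
    public static List<CachedPackage> Search(IEnumerable<CachedPackage> rows, string term) =>
        rows.Where(r => r.Id.Contains(term, StringComparison.OrdinalIgnoreCase)).ToList();
}
```

An event handler subscribed to the relevant IEvent types would call Serialize again and overwrite the blob, so readers never touch the table directly.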
Multi-tenancy
How to bring this into code Just like Request, Response and User: a Tenant is contextual All those are potentially different for every request DI containers with lifetimes exist…
Resolving a tenant
public interface ITenantContext
{
    Tenant Tenant { get; }
}

// Registration in container
builder.RegisterType<RequestTenantContext>()
    .As<ITenantContext>()
    .InstancePerLifetimeScope();

// Slide excerpt: only the host name lookup is shown
public class RequestTenantContext
{
    public Tenant Resolve(RequestContext context, IEnumerable<Tenant> tenants)
    {
        // Match the tenant on the host name of the incoming request
        var hostname = context.HttpContext.Request.Url.Host;
        return tenants.FirstOrDefault(t => t.HostName == hostname);
    }
}
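The slide only shows the lookup itself. A self-contained sketch of a context that actually implements ITenantContext could look like this. HostNameTenantContext and the shape of Tenant are assumptions, the host name is passed in as a plain string so the example runs without ASP.NET, and the case-insensitive comparison is a choice made here (DNS host names are case-insensitive), not something the slides state.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical tenant shape; the slides only show the HostName property.
public sealed record Tenant(string Name, string HostName);

public interface ITenantContext
{
    Tenant Tenant { get; }
}

public sealed class HostNameTenantContext : ITenantContext
{
    private readonly Lazy<Tenant> _tenant;

    public HostNameTenantContext(string requestHost, IEnumerable<Tenant> tenants)
    {
        // Resolve lazily, and at most once per context instance.
        _tenant = new Lazy<Tenant>(() =>
            tenants.FirstOrDefault(t =>
                string.Equals(t.HostName, requestHost, StringComparison.OrdinalIgnoreCase))
            ?? throw new InvalidOperationException($"No tenant for host '{requestHost}'."));
    }

    public Tenant Tenant => _tenant.Value;
}
```

Registered with a per-request lifetime, as in the registration on the slide, every component that takes an ITenantContext dependency sees the tenant of the current request without passing it around explicitly.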
Windows Azure Access Control Service
Imagine managing this! Multiple applications www.myget.org staging.myget.org localhost:1196 Customer1.myget.org Customer2.myget.org …
Multiple identity providers Who wants Microsoft Account? Google anyone? Oh, your custom ADFS? Sure!
ACS = identity orchestration
ACS for MyGet No more user registration One single trust relationship (= less coding) Microsoft Account, Yahoo!, Google, Facebook Other IdP’s (tenants and our own)*
*We built many others and are working on a spin-off http://socialsts.com (Twitter, LinkedIn, Microsoft Account, …)
One small trick…
var realm = TenantContext.Tenant.Realm;
var allowedAudienceUris = FederatedAuthentication.FederationConfiguration
    .IdentityConfiguration
    .AudienceRestriction
    .AllowedAudienceUris;

// Register the current tenant's realm as an allowed audience if it is not there yet
if (allowedAudienceUris.All(audience => audience.ToString() != realm))
{
    allowedAudienceUris.Add(new Uri(realm));
}
Tough times (learning moments)
Huge downtime on July 2nd, 2012 Symptoms: Users complaining about “downtime” No monitoring SMS alert Half an hour later: “site up!”, “site down!”, “site up!”, “site down!” SMS alerts No sign of issues in the Windows Azure Management portal
But what’s the cause? We just deployed our multi-tenant architecture We just enabled storage analytics ELMAH was showing storage throttling 16.000 unprocessed commands and events in queue
Full story at http://blog.myget.org/post/2012/07/02/Site-issues-on-July-2nd-2012.aspx
Huge downtime on July 2nd, 2012 One simple piece of code… GetHashCode() on the Package object was faulty GetHashCode() is used to track objects in the data context (new vs. update) 2 objects with the same hashcode = UnhandledException
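The slides do not show the faulty method, so the following is only a sketch of the general rule it broke: Equals and GetHashCode must be derived from the same fields, so that two objects share a hash code exactly when they are meant to be the same entity. The Package shape and the case-insensitive comparison are assumptions.

```csharp
using System;

// Hypothetical Package: identity is the id plus the version.
public sealed class Package : IEquatable<Package>
{
    public Package(string id, string version)
    {
        Id = id;
        Version = version;
    }

    public string Id { get; }
    public string Version { get; }

    public bool Equals(Package other) =>
        other is not null
        && string.Equals(Id, other.Id, StringComparison.OrdinalIgnoreCase)
        && string.Equals(Version, other.Version, StringComparison.OrdinalIgnoreCase);

    public override bool Equals(object obj) => Equals(obj as Package);

    // Uses exactly the fields, and the same comparison, as Equals.
    public override int GetHashCode() =>
        HashCode.Combine(
            StringComparer.OrdinalIgnoreCase.GetHashCode(Id),
            StringComparer.OrdinalIgnoreCase.GetHashCode(Version));
}
```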
An exception killed the site? WTF?!? No. Back then we caught every Exception and blindly retried the operation Resulting in 16.000 commands and events being retried continuously Causing storage throttling Causing the website to retry reads Causing more throttling Starving IIS worker threads
Lessons learned? A simple bug can halt the entire application Only retry transient errors Our monitoring wasn’t optimal Our code wasn’t optimal (code from back when MyGet was a blog post…)
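"Only retry transient errors" can be sketched as a small helper. Retry, isTransient and the linear back-off are illustrative choices, not MyGet's implementation. The caller decides which exceptions count as transient; everything else, and the failure of the last attempt, propagates to the caller instead of being retried forever.

```csharp
using System;
using System.Threading;

public static class Retry
{
    // Runs the operation at least once and at most maxAttempts times.
    public static T Execute<T>(
        Func<T> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts = 3,
        int delayMs = 0)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            // The filter lets non-transient errors and the final failure escape.
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                Thread.Sleep(delayMs * attempt); // simple linear back-off
            }
        }
    }
}
```

A bug such as the faulty GetHashCode() would fail on the first attempt and surface immediately, instead of filling the queue with work that can never succeed.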
Huge downtime February 23rd, 2013 Symptoms: Everything down Furious users on social media Windows Azure Management Portal Down Furious tweets about #WindowsAzure
The cause? Global outage of Windows Azure due to an expired SSL certificate on storage
Full story at http://blog.myget.org/post/2013/02/24/We-were-down.aspx
Considerations and lessons learned Move storage to HTTP instead of HTTPS? Windows Azure down globally impacts us quite a bit Fail-over to another solution costs money and lots of effort Decided against it for now
Considering off-Windows Azure backups of at least all packages
One more! New features… “Retention policies” introduced Seemed to be a success! 3+ million commands and events in queue
Solution: scale out 20 instances did it in a few minutes
Solution for the future: feature toggling
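A feature toggle can be as small as a set of enabled names that is checked before the new code path runs. The sketch below is an assumption about what such a toggle could look like; FeatureToggles and the feature name used in the note are hypothetical, and the slides do not describe MyGet's actual mechanism.

```csharp
using System;
using System.Collections.Generic;

public sealed class FeatureToggles
{
    private readonly HashSet<string> _enabled;

    // The enabled names would typically come from configuration.
    public FeatureToggles(IEnumerable<string> enabled)
    {
        _enabled = new HashSet<string>(enabled, StringComparer.OrdinalIgnoreCase);
    }

    public bool IsEnabled(string feature) => _enabled.Contains(feature);

    // Runs the action only when the feature is switched on.
    public void Run(string feature, Action whenEnabled)
    {
        if (IsEnabled(feature))
        {
            whenEnabled();
        }
    }
}
```

With something like toggles.Run("RetentionPolicies", ...) around the code that enqueues commands, a feature that floods the queue can be switched off through configuration instead of a redeploy.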
But overall…
When business meets technology
We’re constantly being bitten Introduce a new beta feature Come up with a revenue model See the feature needs serious rewriting (metering)
Lesson learned? Think revenue early on.
Measure everything, test assumptions “The Lean Startup” book says this Don’t build it yourself: Google Analytics
This is why we built username/password registration: seems a lot of people prefer typing instead of one click
We must keep investing in Build Services
Feed discovery is more popular than we imagined from zero reactions on our blog and Twitter
The technical fear we had about “download as ZIP” consuming too much server resources? Seems 19 people used it this month. *yawn*
Conclusion
Conclusion NuGet? MyGet? How we started What we did not know Our architecture Multi-tenancy Tough times provide learning Measurement too
Thank you!
http://blog.maartenballiauw.be
@maartenballiauw
http://amzn.to/pronuget
http://www.myget.org