© Predictable Network Solutions Ltd 2017 www.pnsol.com
PEnDAR Focus Group
Performance Ensurance by Design, Assuring Requirements
Second Webinar
Recap
Previous webinar recording at https://attendee.gotowebinar.com/register/1615742612853821699
Also on SlideShare: https://www.slideshare.net/pnsol-slides
Goals
• Consider how to enable Validation & Verification of cost and performance
  • For distributed and hierarchical systems
  • Via sophisticated but easy-to-use tools
  • Supporting both initial and ongoing incremental development
• Provide early visibility of cost/performance hazards
  • Avoiding costly failures
  • Maximising the chances of successful in-budget delivery of acceptable end-user outcomes
Partners
Supported by:
PEnDAR project
Our offer to you: learn what we know about the laws of the distributed world:
• Where the realm of the possible lies;
• How to push the envelope in a way that benefits you most;
• How to innovate without getting your fingers burnt.
Get an insight into the future:
• Understand more about achieving sustainability and viability;
• See where competition may lie in the future.
Our ‘ask’: help us to find out:
• In which markets/sectors/application domains will the benefits outweigh the costs?
• Which are the major benefits? Which are just ‘nice to have’?
• How could this methodology be consumed? Tools? Services?
Role of the focus group
Outline of a coherent methodology
[Diagram: trade-off space between outcomes delivered and resources consumed, covering individual impact (variability, exception/failure), system impact (self-induced, externally created), mitigation, correlation, propagation, and scale (schedulability, capacity, distance, number, time, space, density)]
Performance/resource analysis
[Diagram: several sub-systems sharing resources, each seeing the remainder of the system as a ∆Q]
Starting with a functional decomposition:
• Take each subsystem in isolation
  • Analyse its performance, modelling the remainder of the system as a ∆Q
  • Quantify its resource consumption, which may itself depend on the ∆Q
• Examine resource sharing
  • Within the system: quantify resource costs
  • Between systems: quantify opportunity costs
• Successive refinements
  • Consider couplings
  • Iterate to a fixed point
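The ‘iterate to a fixed point’ step can be sketched as a simple self-consistency loop (an illustrative example, not the project's tooling, with a made-up coupling function): a subsystem's delay depends on the load it sees, which depends in turn on the delay, so an estimate is refined until it stops changing.

```python
def refine(update, dq0, tol=1e-6, max_iters=100):
    """Iteratively re-analyse until the ∆Q estimate reaches a fixed point.

    'update' maps the current estimate of a subsystem's mean delay to a
    new estimate, computed with the rest of the system modelled as that ∆Q."""
    dq = dq0
    for _ in range(max_iters):
        new = update(dq)
        if abs(new - dq) < tol:
            return new
        dq = new
    raise RuntimeError("coupling did not converge")

# Hypothetical coupling: 100ms base delay plus 30% of the current delay
# fed back through retries/load.
mean_delay = refine(lambda d: 0.100 + 0.3 * d, 0.0)
print(f"{mean_delay * 1000:.1f} ms")  # converges to 100/0.7 ≈ 142.9 ms
```

When the feedback is a contraction, as here, the iteration converges quickly; a diverging iteration is itself an early warning of an unstable coupling.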
[Callout: Quantitative Timeliness Agreement (not covered today)]
Recap of ∆Q
• ∆Q is a measure of the ‘quality impairment’ of an outcome
  • The extent of deviation from ‘instantaneous and infallible’
  • Nothing in the real world is perfect, so ∆Q always exists
• ∆Q is conserved
  • A delayed outcome can’t be ‘undelayed’; a failed outcome can’t be ‘unfailed’
• ∆Q can be traded
  • E.g. accept more delay in return for more certainty of completion
• ∆Q can be represented as an improper random variable
  • Combines continuous and discrete probabilities
  • Thus encompasses normal behaviour and exceptions/failures in one model
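As a sketch (not PNSol's tooling), a discrete approximation of an improper ∆Q can be carried around as (delay, probability) masses whose probabilities sum to less than one, the deficit being the probability that the outcome never occurs; sequential composition is then a convolution:

```python
from itertools import product

def compose(dq_a, dq_b):
    """Sequential composition (⊕): total delay is the sum of the delays,
    and only outcomes that survive both stages survive overall."""
    out = {}
    for (da, pa), (db, pb) in product(dq_a, dq_b):
        out[da + db] = out.get(da + db, 0.0) + pa * pb
    return sorted(out.items())

def intangible_mass(dq):
    """Probability that the outcome never occurs (the 'improper' part)."""
    return 1.0 - sum(p for _, p in dq)

# One network leg: 95ms delay with 1% loss (figures reused from later slides)
leg = [(0.095, 0.99)]
two_legs = compose(leg, leg)
print(two_legs, intangible_mass(two_legs))  # 190ms, ≈2% combined loss
```

Note how loss composes automatically: two 1%-loss legs give an intangible mass of about 1.99%, with no separate failure model needed.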
• Decompose the performance requirement following the system structure
  • Using engineering judgment, best practice and ‘cosmic’ constraints
  • This creates initial subsystem requirements
• Validate the decomposition by re-combining it via the behaviour
  • Formally and automatically checkable
  • Can be part of continuous integration
  • Captures interactions and couplings
• Necessary and sufficient:
  • IF all subsystems function correctly and integrate properly
  • AND all subsystems satisfy their performance requirements
  • THEN the overall system will meet its performance requirement
• Apply this hierarchically until either
  • The behaviour is trivially provable, OR
  • There is a complete set of testable subsystem verification/acceptance criteria
Validating performance requirements
Quantitative worked example
Generic RPC
[Diagram: generic RPC spanning front end, network and back end. States: 0 Idle; 1 Construct transaction; 2 Send request; 3 Receive request; 4 Process request; 5 Commit results; 6 Transmit response; 7 Receive response; 8 Complete transaction]
Generic RPC – observables
[Diagram: the same RPC annotated with observation points A, B, C, D, E and F, defining ∆Q(A⇝B), ∆Q(B⇝C), ∆Q(C⇝D), ∆Q(D⇝E) and ∆Q(E⇝F)]
Generic RPC – abstract performance
[Diagram: the back end abstracted away; the front end and network remain, with the back end summarised as ∆Q(C⇝D)]
Generic RPC – abstract performance
[Diagram: the network and back end abstracted away; only the front end remains, with the rest summarised as ∆Q(B⇝E)]
Generic RPC – abstract performance
[Diagram: the entire transaction abstracted to a single ∆Q(A⇝F) from leaving to re-entering the idle state]
• The user performance requirement is a bound on ∆Q(A⇝F), e.g.
  • 50% within 500ms
  • 95% within 750ms
  • etc.
• Decompose this as ∆Q(A⇝F) = ∆Q(A⇝B) ⊕ ∆Q(B⇝E) ⊕ ∆Q(E⇝F)
  • How these terms combine depends on the behaviour
  • The interaction of performance and behaviour is examined next
Performance requirement
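With ∆Q held as delay/probability masses, checking a percentile bound like the one above amounts to evaluating a CDF at each deadline (a sketch; the end-to-end ∆Q used here is hypothetical):

```python
def cdf(dq, t):
    """P(outcome completes within time t), dq = [(delay, prob), ...]."""
    return sum(p for d, p in dq if d <= t)

def meets_requirement(dq, requirement):
    """requirement: [(deadline_seconds, min_probability), ...]."""
    return all(cdf(dq, t) >= q for t, q in requirement)

# The slide's example bound on ∆Q(A⇝F): 50% within 500ms, 95% within 750ms
requirement = [(0.500, 0.50), (0.750, 0.95)]
dq_af = [(0.300, 0.70), (0.600, 0.28)]          # hypothetical end-to-end ∆Q
print(meets_requirement(dq_af, requirement))    # True: 70% ≤ 500ms, 98% ≤ 750ms
```

Because the requirement is a set of (deadline, probability) pairs rather than a single number, it captures both typical behaviour and the tail in one check.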
• Since the request or its acknowledgement may be lost, the process retries after a 333ms timeout
  • Failure mitigation
• After three attempts the transaction is deemed to have failed
  • Failure propagation
• The loop can be unrolled for a clearer picture
Front-end behaviour (1)
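Under the simplifying assumption that a response only counts if it arrives before the timeout (late and duplicate responses ignored), the unrolled loop turns a per-attempt ∆Q into an overall ∆Q like this (an illustrative sketch with a made-up per-attempt distribution, not the project's tool):

```python
def retry_dq(attempt_dq, timeout, max_attempts):
    """∆Q of 'retry on timeout': attempt k adds (k-1) timeouts of delay,
    weighted by the probability that all earlier attempts missed the timeout."""
    within = [(d, p) for d, p in attempt_dq if d < timeout]
    p_miss = 1.0 - sum(p for _, p in within)   # lost, or later than the timeout
    out = {}
    for k in range(max_attempts):
        for d, p in within:
            delay = k * timeout + d
            out[delay] = out.get(delay, 0.0) + (p_miss ** k) * p
    return sorted(out.items())

# Per-attempt round trip: 200ms with 90% probability, otherwise lost or late
overall = retry_dq([(0.200, 0.90)], timeout=0.333, max_attempts=3)
print(overall)   # masses near 0.2s, 0.533s and 0.866s, summing to 0.999
```

The mitigation/propagation trade is visible directly: retries push the intangible mass down (here from 10% to 0.1%) at the cost of a longer delay tail.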
Front-end behaviour (2)
[Diagram: the retry loop unrolled into three successive attempts]
• There is a non-preemptive first-to-finish synchronisation between receiving the response and the timeout
  • Which execution path is taken is a probabilistic choice depending on ∆Q(B⇝E)
  • We combine the ∆Qs of the different paths with these probabilities
Assumptions, for an initial customer deployment over UK broadband with the server on the US East Coast:
• ∆Q(B⇝C) and ∆Q(D⇝E) are the same:
  • Minimum delay 95ms
  • 50ms variability
  • 1-2% loss
• ∆Q(C⇝D) distributed between 25ms and 60ms
Performance analysis (1)
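Those assumptions can be turned into a rough numerical model by discretising each leg (a sketch: the variability is treated as uniform, 1% loss is taken from the 1-2% range, and the helper names are hypothetical):

```python
from itertools import product
from functools import reduce

def uniform_dq(lo, hi, loss, n=10):
    """Discretise 'minimum delay + uniform variability' into n equal masses,
    scaled by (1 - loss) so the missing mass represents packet loss."""
    step = (hi - lo) / n
    return [(lo + (i + 0.5) * step, (1.0 - loss) / n) for i in range(n)]

def compose(a, b):
    """Sequential composition: delays add, probabilities multiply."""
    out = {}
    for (da, pa), (db, pb) in product(a, b):
        out[da + db] = out.get(da + db, 0.0) + pa * pb
    return sorted(out.items())

leg = uniform_dq(0.095, 0.145, loss=0.01)     # 95ms minimum, 50ms variability
server = uniform_dq(0.025, 0.060, loss=0.0)   # ∆Q(C⇝D): 25-60ms
b_to_e = reduce(compose, [leg, server, leg])  # ∆Q(B⇝E) = B⇝C ⊕ C⇝D ⊕ D⇝E

print(1.0 - sum(p for _, p in b_to_e))        # overall loss ≈ 0.0199
print(min(d for d, _ in b_to_e))              # fastest discretised case ≈ 0.222 s
```

A finer discretisation (larger n) sharpens the distribution without changing the loss figure, which is determined entirely by the two 1% legs.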
Performance analysis (2)
[Chart: ∆Q(C⇝D), ∆Q(B⇝E) and ∆Q(A⇝F) plotted against the requirement]
Verification
• So far, the verification task seems straightforward:
  • Provided the network and server performance are within the given bounds, the front end will deliver the required end-user experience
• However, this seems to have plenty of slack
  • Can we validate a different set of requirements?
  • How far can we relax them?
• With this model we can explore:
  • Varying the network performance
  • Varying the server performance (e.g. as a result of load)
Performance analysis (3)
[Chart: ∆Q(C⇝D), ∆Q(B⇝E) and ∆Q(A⇝F) plotted against the requirement]
• A virtualised server has a limit on the rate of I/O operations
  • So that the underlying infrastructure can be reasonably shared
  • Such parameters affect the OPEX
• Typically access is controlled via a token-bucket shaper
  • Allows small bursts
  • Limits long-term average use
  • This constitutes a QTA, and both sides of it need to be verified
• Delivered performance of the virtualised I/O thus depends on recent usage history
  • Load is a function of the number of front ends accessing the service and the proportion of retries
  • Retries depend on ∆Q(B⇝E), which in turn depends on the performance of the I/O system!
  • This coupling has been modelled here via a probability of taking a ‘slow path’ in the server
Server resource consumption
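A token-bucket shaper of the kind described can be sketched in a few lines (illustrative only; real hypervisors implement this internally, and the rate/burst figures are made up):

```python
class TokenBucket:
    """Admit operations at up to 'rate' per second on average,
    with bursts of up to 'burst' operations."""
    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = float(burst), 0.0

    def admit(self, now):
        # Refill according to elapsed time, capped at the burst size
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True   # operation proceeds at full speed
        return False      # operation takes the 'slow path'

bucket = TokenBucket(rate=100, burst=5)            # 100 IOPS average, burst of 5
burst_ok = [bucket.admit(0.0) for _ in range(6)]   # 5 admitted, the 6th refused
print(burst_ok)            # [True, True, True, True, True, False]
print(bucket.admit(0.01))  # 10ms later one token has refilled: True
```

This is exactly why the delivered I/O performance depends on recent usage history: whether an operation is admitted depends on how many tokens earlier operations have already consumed.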
Performance analysis (4)
[Chart: ∆Q(C⇝D), ∆Q(B⇝E) and ∆Q(A⇝F) plotted against the requirement]
[Diagram: the outcomes-delivered/resources-consumed picture again, now annotated for the example: impact of retry behaviour on delivered ∆Q; impact of retry behaviour on server load; effect of server loading on delivered ∆Q; server I/O constraint triggered by load; coupling of retries, load and response delay]
Relation to challenges in ICT
E.g. virtualisation
Creates:
• New digital supply-chain hazards
  • Loss of mitigation opportunities: aspects of performance are no longer under direct control
• Additional ∆Q
  • As seen in the example
  • This will be ‘traded’ to extract value: who benefits, the user or the infrastructure provider?
Now need to consider:
• Ongoing resource costs
  • OPEX rather than (sunk) CAPEX
• Opportunity costs
  • Amazon Web Services’ pricing of compute instances, SQS and Lambda micro-services shows engagement with opportunity-cost issues
Virtualisation
Need quantitative intent
• At the very least, some indication of what is really not acceptable
  • Comparison with something else (that is measurable) will do
  • But not “the best possible” or “whatever will sell”
Can approximate the best possible ∆Q
• Use it to quickly check the degree of feasibility
  • Add up minimum times; means and variances also add
• Puts bounds on future improvement
• Early visibility of performance hazards enables early mitigation
General ICT application
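The ‘add up minimum times; means and variances also add’ check fits on the back of an envelope, or in a few lines (the per-step figures below are hypothetical, and the variance sum assumes the steps are independent):

```python
# Hypothetical per-step budget: (name, minimum, mean, variance), in seconds
steps = [
    ("construct request", 0.002, 0.003, 1e-6),
    ("network outbound",  0.095, 0.120, 4e-4),
    ("process + commit",  0.025, 0.040, 1e-4),
    ("network return",    0.095, 0.120, 4e-4),
]

best_case = sum(s[1] for s in steps)         # minima add
mean      = sum(s[2] for s in steps)         # means add
stddev    = sum(s[3] for s in steps) ** 0.5  # variances add (if independent)

# No future optimisation can beat 'best_case'; a requirement tighter than
# that is infeasible before any code is written.
print(f"best {best_case*1000:.0f}ms, mean {mean*1000:.0f}ms, sd {stddev*1000:.0f}ms")
```

Comparing these three numbers with the quantitative intent gives the early visibility of hazards the slide describes, at the cost of a spreadsheet rather than a measurement campaign.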
Summary
• System performance validation consists of:
  • Analysing the interaction of system behaviour and subsystem performance requirements
  • Showing that this ‘adds up’ to meet the quantified user requirements (QTA)
• System performance verification consists of showing that subsystems meet their QTAs
  • By analysis and/or measurement of the ∆Q of the subsystems’ observable behaviour
• This provides acceptance and/or contractual criteria for third-party subsystems or services
  • It substantially reduces performance-integration risks, and hence re-work
What now?
The questions
Given the right inputs, this sort of analysis takes relatively little time and effort; however:
• Quantitative intent is fundamental
  • Where can this be found?
• Which parts of the system decomposition are forced?
  • E.g. use of existing infrastructure, such as a network
• How to handle subsystems with undocumented characteristics?
  • This can be mitigated with in-life measurement of ∆Q, driven by the validation
  • Forewarned is forearmed
  • Inexpensive, focused data rather than a wide-angle ‘big data’ approach
Thank you!