Scaling
Scaling allows you to adjust the resources you are using to meet varying demand. You are able to configure rules to control your scaling such that user experience is optimal without wasting money on unused resources.
Scaling can always be performed manually, but Azure also provides two ways of scaling automatically.
- Autoscale - automatic scaling based on user-defined rules (for example, scale out when CPU exceeds 75%)
- Automatic scaling - a higher level of automation where only the parameter is specified (for example, CPU)
All Azure autoscaling is performed in-out (changing the number of instances), rather than up-down (changing the size of each instance).
As always, the best options are reserved for more expensive services:
| Factor | Autoscale | Automatic scaling |
|---|---|---|
| Tier | Standard+ | Premium+ |
| Rule-based | Yes | No |
| Schedule-based | Yes | No |
| Always-ready | No | Yes |
| Prewarming | No | Yes |
| Scaling maximum | No | Yes |
Although autoscaling can help to eliminate downtime as demand increases, it is not a silver bullet. If your app sees a surge in usage while it is running on only a small number of instances, it may still experience downtime while Azure spins up more instances.
Autoscaling
Autoscaling is the less premium of Azure's automatic scaling options. Azure offers two ways of automatically scaling with this option:
- Rule-based: scaling based on a resource metric
- Schedule-based: scaling based on particular dates, times or days of the week
You can combine these two methods. For example, scaling if the CPU hits a specific level of utilisation, but only at specific hours of the day.
Rules
Rules can be defined to control both when an app scales out and when it scales in. Rule-based scaling allows you to scale based on the following metrics:
- CPU %
- Memory %
- Disk queue length: pending I/O tasks
- HTTP queue length: pending HTTP requests
- Data in: bytes received across all instances
- Data out: bytes sent across all instances
Autoscale trend analysis is a two-step aggregation process.
- Time aggregation: the chosen metric is aggregated across an intrinsic metric time grain (usually 1 minute), using one of avg | min | max | sum | last | count
- Duration aggregation: the output from time aggregation is further aggregated across a chosen duration (minimum 5 minutes), using one of avg | min | max | sum | last | count
time: avg(cpu)
duration: min(), 10 minutes
threshold: > 75%
If the minimum of the per-minute average CPU utilisation over the last 10 minutes exceeds 75%, the app will scale out.
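The two-step aggregation can be sketched in a few lines of Python. This is only an illustration of the idea, not Azure's implementation: the sample data, the `AGGREGATORS` table and the `should_scale_out` function are all hypothetical names made up for this example.

```python
from statistics import mean

# Hypothetical CPU % samples, one inner list per 1-minute time grain.
# Ten grains cover the 10-minute duration used in the example above.
samples_per_minute = [
    [80, 82, 78], [79, 85, 81], [90, 88, 86], [77, 80, 83], [84, 79, 80],
    [91, 87, 89], [78, 82, 80], [85, 86, 81], [80, 79, 84], [88, 90, 86],
]

# The six aggregator functions listed above (illustrative mapping).
AGGREGATORS = {
    "avg": mean,
    "min": min,
    "max": max,
    "sum": sum,
    "last": lambda values: values[-1],
    "count": len,
}

def should_scale_out(grains, time_agg, duration_agg, threshold):
    """Apply time aggregation per grain, then duration aggregation overall."""
    per_grain = [AGGREGATORS[time_agg](grain) for grain in grains]
    overall = AGGREGATORS[duration_agg](per_grain)
    return overall > threshold, overall

# time: avg(cpu), duration: min() over 10 minutes, threshold: > 75%
decision, value = should_scale_out(samples_per_minute, "avg", "min", 75)
print(decision, value)
```

With this data the lowest per-minute average is 80, which is above 75, so the rule would fire. Using `min` as the duration aggregator means a single quiet minute is enough to stop a scale-out.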
There is a configurable cooldown period following every scaling event, which prevents further scaling from occurring for a period. This is important because starting and stopping instances takes time, so there is likely to be a window in which the effect of the scaling cannot yet be seen in the measured metrics. The minimum cooldown is 5 minutes.
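As a rough sketch of how a cooldown gate behaves (the `can_scale` helper and its arguments are hypothetical, not part of any Azure API):

```python
from datetime import datetime, timedelta

COOLDOWN = timedelta(minutes=5)  # the minimum cooldown mentioned above

def can_scale(now, last_scale_event, cooldown=COOLDOWN):
    """Return True once the cooldown since the last scaling event has elapsed."""
    if last_scale_event is None:  # nothing has scaled yet
        return True
    return now - last_scale_event >= cooldown

last_event = datetime(2024, 1, 1, 12, 0)
print(can_scale(datetime(2024, 1, 1, 12, 3), last_event))  # still cooling down
print(can_scale(datetime(2024, 1, 1, 12, 5), last_event))  # cooldown elapsed
```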
Combining rules
Different rules which control both scale-outs and scale-ins can be defined in the same autoscale condition. One common way of using this feature is to define a single condition which includes both the scale in and out rules for a single metric.
One condition can also combine rules which measure different metrics:
- HTTP queue exceeds 10, scale out
- HTTP queue is zero, scale in
- CPU % exceeds 70%, scale out
- CPU % falls below 50%, scale in
The scale-out condition is met if any of the scale out rules are met.
The scale-in condition is met if all of the scale in rules are met.
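The any/all behaviour can be expressed directly in Python. The `evaluate` function below is a hypothetical sketch using the four example rules above; it only mirrors the logic described here, not how Azure evaluates rules internally.

```python
def evaluate(http_queue, cpu_percent):
    """Return the action for one condition that holds all four example rules."""
    scale_out_rules = [http_queue > 10, cpu_percent > 70]
    scale_in_rules = [http_queue == 0, cpu_percent < 50]

    if any(scale_out_rules):   # one matching scale-out rule is enough
        return "scale out"
    if all(scale_in_rules):    # every scale-in rule must match
        return "scale in"
    return "no change"

print(evaluate(http_queue=12, cpu_percent=40))  # queue rule alone triggers scale out
print(evaluate(http_queue=0, cpu_percent=60))   # CPU still too high to scale in
print(evaluate(http_queue=0, cpu_percent=30))   # both scale-in rules met
```

The asymmetry is deliberate: scaling out too eagerly costs money, whereas scaling in too eagerly costs availability.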
Automatic scaling vs. autoscaling
Azure's premium "automatic scaling" offers a few benefits over autoscaling.
- Individual apps in a plan are scaled differently and independently
- You do not need to specify the scaling rules
- You can set a per-app maximum number of instances
- Useful when your app is connected to a system which cannot scale as fast
Configuration
App Service plan > Scale out > Settings > Rules Based > Configure
Select Custom autoscale to start configuring. A default scale condition is present already and cannot be deleted. This condition is met when no other scale conditions are met. Scale conditions can either scale to a specific number of instances or scale based on a metric.
A metric-based condition can specify instance count minimums and maximums that it is allowed to scale between. The max number of instances cannot exceed the max allowed by the App Service plan tier.
You can add multiple rules to a scale condition. When creating custom rules, you define:
- Metric
- Aggregator function
- Duration
- Threshold
- Action (scale in/out)
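To make the shape of this configuration concrete, here is a minimal Python model of a rule and a metric-based condition. The class names, field names and the metric string are invented for illustration and are not the Azure SDK or ARM schema; the 5-minute check reflects the minimum duration noted earlier.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ScaleRule:
    metric: str            # e.g. "CpuPercentage" (illustrative name only)
    aggregator: str        # avg | min | max | sum | last | count
    duration_minutes: int  # window for the duration aggregation
    threshold: float
    action: str            # "out" or "in"

    def __post_init__(self):
        if self.duration_minutes < 5:
            raise ValueError("duration must be at least 5 minutes")
        if self.action not in ("out", "in"):
            raise ValueError("action must be 'out' or 'in'")

@dataclass
class ScaleCondition:
    minimum: int           # lowest instance count this condition allows
    maximum: int           # highest instance count (capped by the plan tier)
    rules: list = field(default_factory=list)

    def next_instance_count(self, current, action, step=1):
        """Apply an action, clamped to the condition's minimum and maximum."""
        delta = step if action == "out" else -step
        return max(self.minimum, min(self.maximum, current + delta))

rule = ScaleRule("CpuPercentage", "avg", 10, 70, "out")
condition = ScaleCondition(minimum=1, maximum=5, rules=[rule])
print(condition.next_instance_count(current=5, action="out"))  # already at the maximum
```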
Monitoring
The Azure portal lets you monitor when autoscaling has occurred through the Run history chart. This chart tracks the number of instances over time, along with the scale conditions that triggered each change.
Best practice
Choose your thresholds wisely to avoid "flapping". Flapping is the term for repetitive scale-in/out behaviour when thresholds are placed too close together. Example:
Scale-out on thread count > 600
Scale-in on thread count <= 600
- Average thread count reaches 625
- App scales out
- Average thread count drops to 575 due to shared load
- App scales in
- Average thread count rises to 625 due to higher traffic share
- Ad nauseam
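> [!NOTE]
> Interestingly, Azure's autoscale engine tries to prevent "flapping" by itself: before scaling in, it estimates what the metric would be after the change, and skips the scale-in if that would immediately trigger a scale-out.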
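The flapping loop is easy to reproduce in a toy simulation. The sketch below assumes a constant total of 1250 threads shared evenly across instances (so the numbers differ slightly from the example above), and the optional `guard` flag shows one possible safeguard: projecting the average after a scale-in and skipping the scale-in if it would exceed the scale-out threshold. All names are hypothetical and this is not Azure's actual algorithm.

```python
SCALE_OUT_ABOVE = 600       # scale out when average thread count > 600
SCALE_IN_AT_OR_BELOW = 600  # scale in when average thread count <= 600

def simulate(total_threads, instances, steps, guard=False):
    """Return the instance count after each evaluation, starting state included."""
    history = [instances]
    for _ in range(steps):
        average = total_threads / instances
        if average > SCALE_OUT_ABOVE:
            instances += 1
        elif average <= SCALE_IN_AT_OR_BELOW and instances > 1:
            # Projected average if one instance were removed.
            projected = total_threads / (instances - 1)
            if not (guard and projected > SCALE_OUT_ABOVE):
                instances -= 1
        history.append(instances)
    return history

print(simulate(1250, 2, 6))              # [2, 3, 2, 3, 2, 3, 2]
print(simulate(1250, 2, 6, guard=True))  # [2, 3, 3, 3, 3, 3, 3]
```

Without the guard the instance count oscillates on every evaluation. Leaving a gap between the scale-out and scale-in thresholds has a similar stabilising effect.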