Scaling

Scaling lets you adjust the resources you use to meet varying demand. You can configure rules that control scaling so that user experience stays optimal without wasting money on unused resources.

Scaling can always be performed manually, but Azure also provides two ways of scaling automatically.

All Azure autoscaling is performed in-out (changing the number of instances), rather than up-down (changing the size of each instance).

As always, the best options are reserved for more expensive services:

| Factor | Autoscale | Automatic scaling |
| --- | --- | --- |
| Tier | Standard+ | Premium+ |
| Rule-based | Yes | No |
| Schedule-based | Yes | No |
| Always-ready | No | Yes |
| Prewarming | No | Yes |
| Scaling maximum | No | Yes |

Bear in mind

Although autoscaling can help to reduce downtime as demand increases, it is not a silver bullet. If your app sees a surge in usage while it is running on only a small number of instances, it may still experience downtime while Azure spins up more instances.

Autoscaling

Autoscaling is the less premium of Azure's automatic scaling options. Azure offers two ways of automatically scaling with this option:

• Rule-based: scale in response to a measured metric
• Schedule-based: scale at set times

You can combine these two methods together. For example, scaling if the CPU hits a specific level of utilisation, but only at specific hours of the day.
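As a rough illustration of combining the two methods, the check below only reports that a CPU rule applies when the current time falls inside a schedule window. This is a minimal sketch of the idea, not how Azure evaluates profiles; the function name, the 09:00 to 17:00 window, and the 75% threshold are all assumptions chosen for the example.

```python
from datetime import time


def cpu_rule_applies(cpu_percent, now, start=time(9, 0), end=time(17, 0),
                     threshold=75.0):
    """Return True only if CPU is over the threshold AND we are inside the
    schedule window. All default values are hypothetical."""
    in_window = start <= now < end
    return in_window and cpu_percent > threshold
```

For instance, `cpu_rule_applies(90.0, time(12, 0))` is true, while the same CPU reading at 20:00 is ignored because it falls outside the window.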

Rules

Rules can be defined to control both when an app scales out and when it scales in. Rule-based scaling allows you to scale based on the following metrics:

Autoscale trend analysis is a two step aggregation process.

  1. time aggregation
    Chosen metric is aggregated across an intrinsic metric time-grain (usually 1 minute)
    • avg | min | max | sum | last | count
  2. duration aggregation
    Output from time aggregation is further aggregated across a chosen duration (min 5 minutes)
    • avg | min | max | sum | last | count

Example

time: avg(cpu)
duration: min(), 10 minutes
threshold: > 75%

If the minimum of the per-minute average CPU utilisation over the last 10 minutes exceeds 75%, the app scales out. In other words, average CPU must stay above 75% for the whole 10 minutes.
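The two-step aggregation in this example can be sketched in a few lines of Python. This is only a model of the idea under stated assumptions: the input is a hypothetical list holding the raw CPU samples for each 1-minute time-grain, and the function names are invented for illustration rather than taken from any Azure SDK.

```python
from statistics import mean


def time_aggregate(samples_per_minute):
    """Step 1 (time aggregation): average the raw samples in each
    1-minute time-grain, giving one value per minute."""
    return [mean(samples) for samples in samples_per_minute]


def should_scale_out(samples_per_minute, duration_minutes=10, threshold=75.0):
    """Step 2 (duration aggregation): take the minimum of the per-minute
    averages over the last `duration_minutes` and compare to the threshold."""
    per_minute = time_aggregate(samples_per_minute)
    window = per_minute[-duration_minutes:]
    if len(window) < duration_minutes:
        return False  # not enough data yet to cover the full duration
    return min(window) > threshold
```

Ten minutes of samples averaging 85% would trigger a scale-out, whereas a single minute dipping to 65% within the window would not, because the minimum falls below 75%.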

There is a configurable cooldown period following every scaling event, which prevents further scaling from occurring for a period. This is important because starting and stopping instances takes time, so there is likely to be a delay before the effect of the scaling shows up in the measured metrics. The minimum cooldown is 5 minutes.
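The cooldown behaviour can be pictured as a simple time check before any new scaling action. The sketch below is an illustration only; `in_cooldown` and `MIN_COOLDOWN` are hypothetical names, and the 5-minute floor is taken from the text above.

```python
from datetime import datetime, timedelta

MIN_COOLDOWN = timedelta(minutes=5)  # minimum cooldown stated in these notes


def in_cooldown(last_scale_event, now, cooldown=MIN_COOLDOWN):
    """Return True if a new scaling action should be suppressed because the
    previous one happened less than `cooldown` ago."""
    if cooldown < MIN_COOLDOWN:
        raise ValueError("cooldown cannot be shorter than 5 minutes")
    if last_scale_event is None:
        return False  # nothing has scaled yet, so nothing to wait for
    return now - last_scale_event < cooldown
```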

Combining rules

Different rules which control both scale-outs and scale-ins can be defined in the same autoscale condition. One common way of using this feature is to define a single condition which includes both the scale in and out rules for a single metric.

One condition can also combine rules which measure on different metrics:

Note

The scale-out condition is met if any of the scale-out rules are met.
The scale-in condition is met if all of the scale-in rules are met.
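The any/all asymmetry in this note maps directly onto Python's `any` and `all`. The following is a minimal sketch in which each rule has already been evaluated to a boolean; the function name and the choice to check scale-out first are assumptions made for the example.

```python
def decide(scale_out_rules, scale_in_rules):
    """Return "out", "in", or "none" given already-evaluated rule results.

    scale_out_rules / scale_in_rules: lists of booleans, one per rule.
    """
    if any(scale_out_rules):
        return "out"  # a single matching scale-out rule is enough
    # all([]) is True in Python, so require at least one scale-in rule
    if scale_in_rules and all(scale_in_rules):
        return "in"   # every scale-in rule must agree
    return "none"
```

With a CPU rule and a memory rule, high CPU alone is enough to scale out, but the app only scales in once both CPU and memory are low.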

Automatic scaling vs. autoscaling

Azure's premium "automatic scaling" offers a few benefits over autoscaling.

Configuration

App Service plan > Scale out > Settings > Rules Based > Configure

Select Custom autoscale to start configuring. A default scale condition is present already and cannot be deleted. This condition is met when no other scale conditions are met. Scale conditions can either scale to a specific number of instances or scale based on a metric.

A metric-based condition can specify instance count minimums and maximums that it is allowed to scale between. The max number of instances cannot exceed the max allowed by the App Service plan tier.
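The instance limits described above amount to clamping the desired count between the condition's minimum and maximum, with the maximum itself capped by the plan tier. A minimal sketch, assuming hypothetical parameter names and a `plan_limit` value that you would look up for your own tier:

```python
def clamp_instance_count(desired, minimum, maximum, plan_limit):
    """Clamp a desired instance count to the scale condition's limits.

    plan_limit is the tier's instance cap (a made-up input here); the
    effective maximum can never exceed it.
    """
    upper = min(maximum, plan_limit)
    if minimum > upper:
        raise ValueError("minimum exceeds the effective maximum")
    return max(minimum, min(desired, upper))
```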

You can add multiple rules to a scale condition. When creating custom rules, you define:

Monitoring

The Azure portal lets you monitor when autoscaling has occurred through the Run history chart. This chart tracks the number of instances over time, along with the scale conditions that triggered each change.

Best practice

Choose your thresholds wisely to avoid "flapping". Flapping is the term for repetitive scale-in/out behaviour when thresholds are placed too close together. Example:

Scale-out on thread count > 600 
Scale-in on thread count <= 600

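Note

Azure has built-in protection against flapping. Before scaling in, autoscale estimates what the metric would be with fewer instances, and skips the scale-in if that projected value would immediately trigger a scale-out.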
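One way to reason about flapping with the thread-count thresholds above is to project the per-instance value after an instance is removed and check it against the scale-out threshold. The sketch below assumes load is spread evenly across instances; it is an illustration of the reasoning, not Azure's actual implementation, and the function name is hypothetical.

```python
def safe_to_scale_in(avg_per_instance, instances, scale_out_threshold, step=1):
    """Return True if removing `step` instances would NOT push the projected
    per-instance metric above the scale-out threshold."""
    remaining = instances - step
    if remaining < 1:
        return False  # never scale in to zero instances
    # Total load stays the same but is shared by fewer instances.
    projected = avg_per_instance * instances / remaining
    return projected <= scale_out_threshold
```

With 3 instances averaging 575 threads each, scaling in to 2 would project 862.5 threads per instance, which exceeds 600, so the scale-in would be skipped to avoid an immediate scale-out.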