Scaling

Scaling lets you adjust the resources you use to meet varying demand. You can configure rules that control scaling so that user experience stays good without paying for unused resources.

Scaling can always be performed manually, but Azure also provides two ways of scaling automatically.

Autoscale - automatic scaling based on user-defined rules (e.g. scale out at 75% CPU)
Automatic scaling - higher level of automation where only the parameter is specified (e.g. CPU)

All Azure autoscaling is performed in and out (changing the number of instances), rather than up and down (changing the size of each instance).

As always, the best options are reserved for more expensive services:

Factor	Autoscale	Automatic scaling
Tier	Standard+	Premium+
Rule-based	Yes	No
Schedule-based	Yes	No
Always-ready	No	Yes
Prewarming	No	Yes
Per-app maximum	No	Yes

Bear in mind

Although autoscaling can help to eliminate downtime as demand increases, it is not a silver bullet. If your app sees a surge in usage while it is running on only a small number of instances, it may still experience downtime while Azure spins up more instances.

Autoscaling

Autoscaling is the less premium of Azure's automatic scaling options. Azure offers two ways of automatically scaling with this option:

Rule-based: scaling based on a resource metric
Schedule-based: scaling based on particular dates, times or days of the week

You can combine these two methods. For example, scale when the CPU hits a specific level of utilisation, but only during specific hours of the day.

Rules

Rules can be defined to control both when an app scales out and when it scales in. Rule-based scaling allows you to scale based on the following metrics:

CPU %
Memory %
Disk queue length: pending I/O tasks
HTTP queue length: pending HTTP requests
Data in: bytes received across all instances
Data out: bytes sent across all instances

Autoscale trend analysis is a two-step aggregation process.

Time aggregation
The chosen metric is aggregated over the metric's intrinsic time grain (usually 1 minute)
- avg | min | max | sum | last | count
Duration aggregation
The output of the time aggregation is aggregated again over a chosen duration (minimum 5 minutes)
- avg | min | max | sum | last | count

Example

time: avg(cpu)
duration: min(), 10 minutes
threshold: > 75%

If the minimum of the per-minute average CPU utilisation over the last 10 minutes exceeds 75%, the app scales out. In other words, every one-minute average in the window must be above 75%.
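A minimal Python sketch of the two-step aggregation in the example above, to make the order of operations concrete. The function names and sample data are hypothetical and only illustrate the idea; this is not how Azure implements it.

```python
from statistics import mean

def time_aggregate(samples_per_grain, fn=mean):
    """Step 1: collapse the raw samples in each time grain (e.g. 1 minute) to one value."""
    return [fn(grain) for grain in samples_per_grain]

def should_scale_out(samples_per_grain, threshold=75.0, duration_fn=min):
    """Step 2: aggregate the per-grain values over the duration and compare to the threshold."""
    per_grain = time_aggregate(samples_per_grain)
    return duration_fn(per_grain) > threshold

# Ten one-minute grains of CPU % samples (made-up numbers).
sustained = [[80, 82, 78]] * 10              # every minute averages 80%
one_dip = sustained[:9] + [[60, 70, 65]]     # the last minute averages 65%

print(should_scale_out(sustained))  # every minute is above 75%
print(should_scale_out(one_dip))    # a single quiet minute blocks the scale-out
```

Using min() for the duration step makes the rule conservative: one quiet minute in the window is enough to prevent a scale-out.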

There is a configurable cooldown period following every scaling event, which prevents further scaling from occurring until it has elapsed. This is important because starting and stopping instances takes time, so there is likely to be a window in which the effect of the scaling cannot yet be seen in the measured metrics. The minimum cooldown is 5 minutes.
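The cooldown can be pictured as a simple gate in front of every scaling decision. The sketch below is illustrative only; the Cooldown class and its method names are hypothetical.

```python
from datetime import datetime, timedelta

class Cooldown:
    """Blocks further scaling until the cooldown period has elapsed."""

    def __init__(self, minutes=5):
        if minutes < 5:
            # Mirrors the 5-minute minimum mentioned in these notes.
            raise ValueError("cooldown must be at least 5 minutes")
        self.period = timedelta(minutes=minutes)
        self.last_scale = None

    def can_scale(self, now):
        """True if no scaling has happened yet or the cooldown has passed."""
        return self.last_scale is None or now - self.last_scale >= self.period

    def record_scale(self, now):
        """Call this whenever a scale-in or scale-out actually happens."""
        self.last_scale = now

t0 = datetime(2024, 1, 1, 12, 0)
cooldown = Cooldown(minutes=5)
cooldown.record_scale(t0)
print(cooldown.can_scale(t0 + timedelta(minutes=3)))  # still cooling down
print(cooldown.can_scale(t0 + timedelta(minutes=5)))  # cooldown elapsed
```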

Combining rules

Rules which control both scale-outs and scale-ins can be defined in the same autoscale condition. One common way of using this feature is to define a single condition which includes both the scale-in and the scale-out rule for a single metric.

One condition can also combine rules based on different metrics:

HTTP queue exceeds 10, scale out
HTTP queue is zero, scale in
CPU % exceeds 70%, scale out
CPU % falls below 50%, scale in

Note

The scale-out condition is met if any of the scale-out rules are met.
The scale-in condition is met if all of the scale-in rules are met.
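The any/all behaviour can be sketched in Python using the four example rules above. The decide function and the metric names are hypothetical, and checking scale-out before scale-in is an assumption made for this sketch.

```python
def decide(metrics, scale_out_rules, scale_in_rules):
    """Return 'out', 'in' or 'none' for one evaluation of a scale condition."""
    # Scale out if ANY scale-out rule is met (checked first: an assumption).
    if any(rule(metrics) for rule in scale_out_rules):
        return "out"
    # Scale in only if ALL scale-in rules are met.
    if scale_in_rules and all(rule(metrics) for rule in scale_in_rules):
        return "in"
    return "none"

scale_out_rules = [
    lambda m: m["http_queue"] > 10,  # HTTP queue exceeds 10
    lambda m: m["cpu"] > 70,         # CPU % exceeds 70%
]
scale_in_rules = [
    lambda m: m["http_queue"] == 0,  # HTTP queue is zero
    lambda m: m["cpu"] < 50,         # CPU % falls below 50%
]

print(decide({"http_queue": 12, "cpu": 40}, scale_out_rules, scale_in_rules))  # one out rule met
print(decide({"http_queue": 0, "cpu": 40}, scale_out_rules, scale_in_rules))   # both in rules met
print(decide({"http_queue": 0, "cpu": 60}, scale_out_rules, scale_in_rules))   # only one in rule met
```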

Automatic scaling vs. autoscaling

Azure's premium "automatic scaling" offers a few benefits over autoscaling.

Individual apps in a plan are scaled differently and independently
You do not need to specify the scaling rules
You can set a per-app maximum number of instances
- Useful when your app is connected to a system which cannot scale as fast

Configuration

App Service plan > Scale out > Settings > Rules Based > Configure

Select Custom autoscale to start configuring. A default scale condition is present already and cannot be deleted. This condition is met when no other scale conditions are met. Scale conditions can either scale to a specific number of instances or scale based on a metric.

A metric-based condition can specify instance count minimums and maximums that it is allowed to scale between. The max number of instances cannot exceed the max allowed by the App Service plan tier.
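As a rough illustration of how these limits bound the instance count, here is a small Python sketch. The function name and the numbers are hypothetical; the real tier limits depend on the App Service plan.

```python
def clamp_instances(desired, condition_min, condition_max, tier_max):
    """Keep the instance count inside the condition's range and the tier's limit."""
    # The effective ceiling is whichever limit is lower.
    upper = min(condition_max, tier_max)
    return max(condition_min, min(desired, upper))

print(clamp_instances(12, condition_min=2, condition_max=10, tier_max=30))  # capped by the condition
print(clamp_instances(1, condition_min=2, condition_max=10, tier_max=30))   # raised to the minimum
print(clamp_instances(15, condition_min=2, condition_max=20, tier_max=10))  # capped by the tier
```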

You can add multiple rules to a scale condition. When creating custom rules, you define:

Metric
Aggregator function
Duration
Threshold
Action (scale in/out)

Monitoring

The Azure portal lets you monitor when autoscaling has occurred through the Run history chart. This chart tracks the number of instances over time, along with the scale conditions that triggered each change.

Best practice

Choose your thresholds wisely to avoid "flapping". Flapping is the term for repetitive scale-in/out behaviour when thresholds are placed too close together. Example:

Scale-out on thread count > 600 
Scale-in on thread count <= 600

Average thread count reaches 625
App scales out
Average thread count drops to 575 due to shared load
App scales in
Average thread count rises to 625 due to higher traffic share
Ad nauseam
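The loop above can be reproduced with a tiny simulation. This is a sketch of a naive scaler with no cooldown and no protection against flapping, and it assumes a fixed total of 1250 threads shared evenly across instances; the function name and numbers are hypothetical.

```python
def simulate(total_threads, instances, scale_out_above, scale_in_at_or_below, steps):
    """Return the instance count after each evaluation of a naive scaler."""
    history = []
    for _ in range(steps):
        average = total_threads / instances  # load is shared evenly
        if average > scale_out_above:
            instances += 1
        elif average <= scale_in_at_or_below and instances > 1:
            instances -= 1
        history.append(instances)
    return history

# Same threshold for both directions: the app bounces between 2 and 3 instances.
flapping = simulate(1250, 2, scale_out_above=600, scale_in_at_or_below=600, steps=4)
# A wider gap between the thresholds: the app settles at 3 instances.
stable = simulate(1250, 2, scale_out_above=600, scale_in_at_or_below=400, steps=4)

print(flapping)
print(stable)
```

Leaving a sensible margin between the scale-out and scale-in thresholds is what breaks the cycle in the second run.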

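Note

Azure does guard against flapping: before scaling in, autoscale estimates what the metric would be with fewer instances, and skips the scale-in if that estimate would immediately trigger a scale-out.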