Switch Redundancy: STP, RSTP, and Link Aggregation
In production networks, a single switch failure can bring down your entire infrastructure. That's why we build redundancy into our networks. But redundancy at Layer 2 creates a dangerous problem: loops. Let's learn how to build redundant switch networks safely.
Imagine a city with multiple roads between neighborhoods. If you send a car without a destination, it could drive in circles forever, clogging all the roads. That's what happens with network loops. STP is like a traffic system that blocks some roads to prevent endless loops while keeping alternate routes ready.
The Problem: Layer 2 Loops
Why Are Loops Dangerous?
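Unlike IP packets, which carry a TTL that eventually expires, Ethernet frames have no hop limit. A broadcast or unknown-unicast frame that enters a loop circulates forever, and wherever a switch has more than one other port in the loop, every pass creates additional copies.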
        [Switch A]
         /      \
        /        \
[Switch B]----[Switch C]
        \        /
         \      /
        [Server]
Broadcast frame sent by Server (no STP running):
- Goes to Switch B AND Switch C
- Switch B forwards to A and C
- Switch C forwards to A and B
- Each switch forwards again...
- Network saturates in seconds!
Broadcast Storm: A single broadcast frame in a loop can generate millions of copies per second, causing complete network failure.
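To see how copies behave once a loop exists, here is a minimal simulation sketch. It assumes switches with no STP that flood every broadcast out of all inter-switch ports except the ingress port. The topology dictionaries and function names are hypothetical, and copies delivered to hosts are not counted.

```python
from collections import Counter

# Hypothetical topologies: each switch lists its neighbor switches.
TRIANGLE = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
FULL_MESH_4 = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["A", "B", "C"],
}


def flood_once(links, in_flight):
    """One forwarding round without STP: each switch floods a broadcast out
    of every inter-switch port except the one the frame arrived on."""
    next_round = Counter()
    for (sender, receiver), copies in in_flight.items():
        for neighbor in links[receiver]:
            if neighbor != sender:
                next_round[(receiver, neighbor)] += copies
    return next_round


def copies_per_round(links, entry_switches, rounds):
    """Count broadcast copies on inter-switch links after each round.
    entry_switches: switches that receive the host's original frame."""
    in_flight = Counter({("host", sw): 1 for sw in entry_switches})
    totals = []
    for _ in range(rounds):
        in_flight = flood_once(links, in_flight)
        totals.append(sum(in_flight.values()))
    return totals


print(copies_per_round(TRIANGLE, ["B", "C"], 5))  # [4, 4, 4, 4, 4]
print(copies_per_round(FULL_MESH_4, ["A"], 5))    # [3, 6, 12, 24, 48]
```

In the triangle the copies never die out, so the links stay busy forever. In the four-switch mesh the count doubles every round, which is why a storm saturates links so quickly.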
Symptoms of a Layer 2 Loop
- Network becomes extremely slow or unresponsive
- Switch CPU at 100%
- Port LEDs blinking frantically
- Duplicate frames in Wireshark captures
- MAC address table constantly changing (MAC flapping)
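MAC flapping in particular is easy to spot programmatically. The sketch below assumes you already have a time-ordered list of (MAC, port) learn events, for example extracted from switch logs. The event format and the function name are hypothetical.

```python
from collections import defaultdict


def count_mac_moves(events):
    """events: iterable of (mac, port) learn events in time order.
    Returns {mac: number of times the MAC moved to a different port}."""
    last_port = {}
    moves = defaultdict(int)
    for mac, port in events:
        if mac in last_port and last_port[mac] != port:
            moves[mac] += 1
        last_port[mac] = port
    return dict(moves)


# Illustrative data only: one MAC bouncing between two ports, one stable MAC.
events = [
    ("aa:aa:aa:aa:aa:aa", "Gi0/1"),
    ("aa:aa:aa:aa:aa:aa", "Gi0/2"),
    ("aa:aa:aa:aa:aa:aa", "Gi0/1"),
    ("bb:bb:bb:bb:bb:bb", "Gi0/3"),
    ("bb:bb:bb:bb:bb:bb", "Gi0/3"),
]
print(count_mac_moves(events))  # {'aa:aa:aa:aa:aa:aa': 2}
```

A MAC address that moves many times within a few seconds is a strong hint that the same frame is arriving over more than one path.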
Solution 1: Spanning Tree Protocol (STP)
What is STP?
STP (IEEE 802.1D) prevents loops by logically blocking redundant paths. It keeps one active path and blocks others, activating them only if the primary path fails.
STP Facts:
- Operates at Layer 2
- Uses BPDUs (Bridge Protocol Data Units) to communicate
- Convergence time: 30-50 seconds (slow!)
- Creates a loop-free tree topology
How STP Works
- Elect a Root Bridge: One switch becomes the "center" of the tree
- Calculate Path Costs: Each switch calculates the cost to reach the root
- Select Ports: Each switch selects the best port toward the root
- Block Redundant Ports: Other ports are blocked to prevent loops
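The four steps above can be sketched in a few lines. This is a simplified model, not the real protocol: it assumes a connected topology with no parallel links, ignores port-ID tie-breaking, and computes the result centrally instead of exchanging BPDUs. The function name and data layout are hypothetical. MAC addresses are written in lowercase with a fixed width so that string comparison matches numeric comparison.

```python
import heapq


def spanning_tree(bridge_ids, links):
    """bridge_ids: {switch: (priority, mac)}, links: [(switch_a, switch_b, cost)].
    Returns (root, root_ports, blocked)."""
    # Step 1: the lowest bridge ID becomes the root bridge.
    root = min(bridge_ids, key=bridge_ids.get)

    neighbors = {switch: [] for switch in bridge_ids}
    for a, b, cost in links:
        neighbors[a].append((b, cost))
        neighbors[b].append((a, cost))

    # Step 2: least-cost path from every switch to the root (Dijkstra).
    cost_to_root = {root: 0}
    queue = [(0, root)]
    while queue:
        cost_so_far, switch = heapq.heappop(queue)
        if cost_so_far > cost_to_root[switch]:
            continue
        for neighbor, link_cost in neighbors[switch]:
            candidate = cost_so_far + link_cost
            if candidate < cost_to_root.get(neighbor, float("inf")):
                cost_to_root[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))

    # Step 3: each non-root switch picks the neighbor offering the cheapest
    # path to the root; ties go to the neighbor with the lower bridge ID.
    root_ports = {}
    for switch in bridge_ids:
        if switch != root:
            best, _ = min(
                neighbors[switch],
                key=lambda n: (cost_to_root[n[0]] + n[1], bridge_ids[n[0]]),
            )
            root_ports[switch] = best

    # Step 4: on every remaining link, the end closer to the root (lower
    # bridge ID on a tie) stays designated and the other end blocks.
    blocked = []
    for a, b, _ in links:
        if root_ports.get(a) == b or root_ports.get(b) == a:
            continue
        designated, blocking = sorted(
            (a, b), key=lambda s: (cost_to_root[s], bridge_ids[s])
        )
        blocked.append((blocking, designated))
    return root, root_ports, blocked


bridge_ids = {
    "A": (32768, "aa:aa:aa:aa:aa:aa"),
    "B": (32768, "bb:bb:bb:bb:bb:bb"),
    "C": (32768, "cc:cc:cc:cc:cc:cc"),
}
# Cost 4 is the classic 802.1D value for a 1 Gbps link.
links = [("A", "B", 4), ("A", "C", 4), ("B", "C", 4)]
print(spanning_tree(bridge_ids, links))
# ('A', {'B': 'A', 'C': 'A'}, [('C', 'B')])
```

For the triangle shown earlier, Switch A becomes root, B and C each use their direct link to A as root port, and C blocks its port toward B because B has the lower bridge ID.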
STP Port States
| State | Duration | Function |
|---|---|---|
| Blocking | Up to 20 sec (Max Age) | Receives BPDUs only, no data forwarding |
| Listening | 15 sec (Forward Delay) | Processes BPDUs, determines role |
| Learning | 15 sec (Forward Delay) | Builds MAC address table, no forwarding |
| Forwarding | - | Normal operation, forwards frames |
| Disabled | - | Administratively shut down |
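Total convergence time: 30-50 seconds. Listening and Learning alone take 30 seconds; when a switch must first wait for the 20-second Max Age timer to expire, the total is 50. During this time, no traffic passes through the port!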
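The 30-50 second range follows directly from the default timers. A small sketch of the arithmetic, assuming the default values from the table above (the constant and function names are illustrative):

```python
MAX_AGE = 20        # seconds a switch keeps the last BPDU before discarding it
FORWARD_DELAY = 15  # seconds spent in each of Listening and Learning


def convergence_seconds(direct_failure):
    """Worst-case time before a previously blocked port forwards traffic.
    A failure on the switch's own link is noticed immediately; a failure
    elsewhere is only noticed once the stored BPDU ages out."""
    wait_for_aging = 0 if direct_failure else MAX_AGE
    return wait_for_aging + 2 * FORWARD_DELAY


print(convergence_seconds(direct_failure=True))   # 30
print(convergence_seconds(direct_failure=False))  # 50
```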
STP Port Roles
| Role | Description |
|---|---|
| Root Port (RP) | Best path to root bridge (one per non-root switch) |
| Designated Port (DP) | Best port on each segment toward the root |
| Blocked Port | Redundant port, does not forward traffic |
Root Bridge Election
The switch with the lowest Bridge ID becomes the root. The Bridge ID is the priority (default 32768) followed by the MAC address, so priority is compared first and the MAC address only breaks ties.
# Bridge ID comparison:
Switch A: 32768 + AA:AA:AA:AA:AA:AA
Switch B: 32768 + BB:BB:BB:BB:BB:BB
Switch C: 32768 + CC:CC:CC:CC:CC:CC
→ Switch A wins (lowest MAC)
Always manually configure your root bridge! Set a lower priority on your core switch.
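One way to picture the comparison is to pack the 16-bit priority and the 48-bit MAC address into a single number, with the priority in the high-order bits. The sketch below does that and shows how lowering one switch's priority overrides the MAC-based result. The helper names are hypothetical.

```python
def bridge_id(priority, mac):
    """Pack priority (16 bits) and MAC (48 bits) into one comparable integer."""
    mac_value = int(mac.replace(":", ""), 16)
    return (priority << 48) | mac_value


def elect_root(bridges):
    """bridges: {name: (priority, mac)}. The lowest bridge ID wins."""
    return min(bridges, key=lambda name: bridge_id(*bridges[name]))


defaults = {
    "A": (32768, "AA:AA:AA:AA:AA:AA"),
    "B": (32768, "BB:BB:BB:BB:BB:BB"),
    "C": (32768, "CC:CC:CC:CC:CC:CC"),
}
print(elect_root(defaults))  # A

# Lowering the priority on C makes it root regardless of its MAC address.
tuned = dict(defaults, C=(4096, "CC:CC:CC:CC:CC:CC"))
print(elect_root(tuned))     # C
```

Because the priority occupies the most significant bits, any switch with a lower priority beats every switch with a higher one, whatever their MAC addresses are.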
STP Configuration (Cisco)
! Make this switch the root bridge (macro: sets priority to 24576, or lower if needed to beat the current root)
Switch(config)# spanning-tree vlan 1 root primary
! Or manually set priority (must be multiple of 4096)
Switch(config)# spanning-tree vlan 1 priority 4096
! Set secondary root bridge
Switch(config)# spanning-tree vlan 1 root secondary
! View STP status
Switch# show spanning-tree
Switch# show spanning-tree vlan 1
! View root bridge info
Switch# show spanning-tree root
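The "multiple of 4096" rule exists because, with the extended system ID used by per-VLAN spanning tree, only the top 4 bits of the priority field are configurable and the lower 12 bits carry the VLAN ID. That is why `show spanning-tree vlan 1` typically reports 32769 rather than 32768. A small sketch of that arithmetic, with a hypothetical function name:

```python
def effective_priority(configured, vlan_id):
    """Priority as it appears in the bridge ID when the extended system ID
    is in use: configured priority plus the VLAN ID."""
    if configured not in range(0, 61441, 4096):
        raise ValueError("priority must be a multiple of 4096 between 0 and 61440")
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be between 1 and 4094")
    return configured + vlan_id


print(effective_priority(32768, 1))  # 32769
print(effective_priority(4096, 10))  # 4106
```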
STP Configuration (Linux Bridge)
# Enable STP on a bridge
sudo brctl stp br0 on
# Set bridge priority (lower = more likely to be root)
sudo brctl setbridgeprio br0 4096
# View STP status
brctl showstp br0
# Using ip command (modern)
sudo ip link set br0 type bridge stp_state 1
bridge link show
Solution 2: Rapid Spanning Tree (RSTP)
Why RSTP?
RSTP (IEEE 802.1w) dramatically improves convergence time from 30-50 seconds to 1-2 seconds.
RSTP Improvements:
- Faster convergence (1-2 seconds)
- Backward compatible with STP
- New port roles: Alternate and Backup
- Explicit proposal/agreement handshake between neighbors instead of waiting for timers
RSTP Port States
RSTP simplifies port states to three:
| RSTP State | STP Equivalent | Learns MAC? | Forwards? |
|---|---|---|---|
| Discarding | Disabled, Blocking, Listening | No | No |
| Learning | Learning | Yes | No |
| Forwarding | Forwarding | Yes | Yes |
RSTP Port Roles
| Role | Description |
|---|---|
| Root Port | Best path to root (same as STP) |
| Designated Port | Best port on segment (same as STP) |
| Alternate Port | Backup path to root (instant failover!) |
| Backup Port | Backup to designated port on same segment |
RSTP Configuration (Cisco)
! Enable RSTP (Rapid PVST+ on Cisco)
Switch(config)# spanning-tree mode rapid-pvst
! Verify
Switch# show spanning-tree summary
Solution 3: Multiple Spanning Tree (MSTP)
Why MSTP?
In networks with many VLANs, running a separate spanning tree instance per VLAN (as Cisco's PVST+ does) wastes CPU and memory. MSTP (IEEE 802.1s) groups VLANs into a small number of instances.
# Instead of:
VLAN 1 → STP Instance 1
VLAN 2 → STP Instance 2
VLAN 3 → STP Instance 3
... (100 VLANs = 100 STP instances!)
# MSTP does:
VLANs 1-50 → MST Instance 1
VLANs 51-100 → MST Instance 2
(2 instances for 100 VLANs)
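The mapping above amounts to a simple lookup. The sketch below models it with a hypothetical table and function. VLANs that are not assigned to any instance fall back to instance 0, the default instance in MSTP.

```python
# Hypothetical mapping matching the example above.
MST_INSTANCES = {1: range(1, 51), 2: range(51, 101)}


def instance_for_vlan(vlan, instances=MST_INSTANCES):
    """Return the MST instance that carries the given VLAN."""
    for instance, vlans in instances.items():
        if vlan in vlans:
            return instance
    return 0  # unmapped VLANs stay in the default instance


used = {instance_for_vlan(v) for v in range(1, 101)}
print(len(used))               # 2 instances cover VLANs 1-100
print(instance_for_vlan(75))   # 2
print(instance_for_vlan(200))  # 0
```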
MSTP Configuration (Cisco)
! Enable MSTP
Switch(config)# spanning-tree mode mst
! Configure MST region
Switch(config)# spanning-tree mst configuration
Switch(config-mst)# name MYREGION
Switch(config-mst)# revision 1
Switch(config-mst)# instance 1 vlan 1-50
Switch(config-mst)# instance 2 vlan 51-100
Switch(config-mst)# exit
! Set root for instance
Switch(config)# spanning-tree mst 1 root primary
Switch(config)# spanning-tree mst 2 root secondary
! Verify
Switch# show spanning-tree mst configuration
Switch# show spanning-tree mst
Solution 4: Link Aggregation (LACP)
What is Link Aggregation?
Link Aggregation combines multiple physical links into one logical link, providing:
- Increased Bandwidth: 2x 1Gbps = 2Gbps of aggregate capacity (any single flow still uses one member link)
- Redundancy: If one link fails, traffic continues on others
- Load Balancing: Traffic distributed across links
Link aggregation is like having multiple lanes on a highway. If one lane is closed, traffic uses the other lanes. And with all lanes open, more cars can travel at once.
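Load balancing on an aggregated link is normally done per flow: a hash of header fields selects the member link, so frames of one flow stay in order on one link. The sketch below illustrates the idea only. It uses CRC32 over a made-up key, which is an assumption for illustration and not the hash any particular switch or the Linux bonding driver actually uses.

```python
import zlib


def pick_link(flow, active_links):
    """flow: (src_ip, dst_ip, src_port, dst_port). Returns the member link
    chosen for this flow among the links that are currently up."""
    key = "|".join(str(field) for field in flow).encode()
    return active_links[zlib.crc32(key) % len(active_links)]


links = ["eth0", "eth1"]
flow = ("192.168.1.10", "192.168.1.20", 40000, 443)

# The same flow always lands on the same link.
print(pick_link(flow, links) == pick_link(flow, links))  # True

# If eth0 fails, every flow is rehashed onto the remaining link.
print(pick_link(flow, ["eth1"]))  # eth1
```

This is why a 2x 1Gbps bundle gives 2Gbps in aggregate across many flows, while one large transfer still tops out at the speed of a single member link.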
LACP vs Static LAG
| Feature | Static LAG | LACP (IEEE 802.3ad, now 802.1AX) |
|---|---|---|
| Configuration | Manual on both ends, no negotiation | Configured on both ends, then negotiated automatically |
| Failure Detection | Physical link-down only | Link-down plus missed LACPDUs |
| Misconfiguration | Can cause loops | Safe, won't form if mismatched |
| Recommendation | Avoid | Always use LACP |
LACP Configuration (Cisco)
! Create port-channel
Switch(config)# interface range GigabitEthernet0/1-2
Switch(config-if-range)# channel-group 1 mode active
Switch(config-if-range)# exit
! Configure the port-channel interface
Switch(config)# interface Port-channel1
Switch(config-if)# switchport mode trunk
Switch(config-if)# switchport trunk allowed vlan 1,10,20
! Verify
Switch# show etherchannel summary
Switch# show etherchannel port-channel
Switch# show lacp neighbor
LACP Modes
| Mode | Description | Pairs With |
|---|---|---|
| Active | Actively sends LACP packets | Active or Passive |
| Passive | Responds to LACP packets only | Active only |
| On | Static, no LACP | On (not recommended) |
Best practice: Use Active mode on both sides for fastest negotiation and failure detection.
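The pairing rules in the table reduce to two checks, sketched below with a hypothetical helper: static mode only works against static mode, and an LACP bundle needs at least one side to initiate.

```python
VALID_MODES = {"active", "passive", "on"}


def channel_forms(local_mode, remote_mode):
    """Return True if the two channel-group modes can form a bundle."""
    local, remote = local_mode.lower(), remote_mode.lower()
    if local not in VALID_MODES or remote not in VALID_MODES:
        raise ValueError("mode must be active, passive, or on")
    if "on" in (local, remote):
        return local == remote  # static only pairs with static
    return "active" in (local, remote)  # someone has to send LACPDUs first


for pair in [("active", "active"), ("active", "passive"),
             ("passive", "passive"), ("on", "on"), ("on", "active")]:
    print(pair, channel_forms(*pair))
```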
LACP Configuration (Linux)
# Load the bonding kernel module
sudo modprobe bonding
# Create bond interface
sudo ip link add bond0 type bond mode 802.3ad
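# Add member interfaces (on most kernels an interface must be down before it can join the bond)
sudo ip link set eth0 down
sudo ip link set eth1 down
sudo ip link set eth0 master bond0
sudo ip link set eth1 master bond0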
# Bring up interfaces
sudo ip link set eth0 up
sudo ip link set eth1 up
sudo ip link set bond0 up
# Assign IP
sudo ip addr add 192.168.1.10/24 dev bond0
Persistent Configuration (Netplan - Ubuntu)
# /etc/netplan/01-bond.yaml
network:
  version: 2
  renderer: networkd
  ethernets:
    eth0:
      dhcp4: no
    eth1:
      dhcp4: no
  bonds:
    bond0:
      dhcp4: no
      interfaces:
        - eth0
        - eth1
      addresses:
        - 192.168.1.10/24
      gateway4: 192.168.1.1  # deprecated in newer netplan releases in favor of a "routes" entry with "to: default"
      parameters:
        mode: 802.3ad
        lacp-rate: fast
        mii-monitor-interval: 100
        transmit-hash-policy: layer3+4
# Apply configuration
sudo netplan apply
# Verify
cat /proc/net/bonding/bond0
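If you want to check member status from a script rather than by eye, the file can be parsed line by line. The sketch below assumes the usual "Slave Interface:" and "MII Status:" field names. The sample text is abridged and illustrative, not captured from a real system, and the function name is hypothetical.

```python
# Abridged, illustrative sample of /proc/net/bonding/bond0 content.
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

Slave Interface: eth0
MII Status: up

Slave Interface: eth1
MII Status: down
"""


def member_status(text):
    """Return {interface: MII status} for each bond member."""
    status = {}
    current = None
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None:
            status[current] = value
    return status


print(member_status(SAMPLE))  # {'eth0': 'up', 'eth1': 'down'}
```

On a live system you would pass in the file contents, for example `member_status(open("/proc/net/bonding/bond0").read())`.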
Persistent Configuration (RHEL/CentOS)
# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
TYPE=Bond
BONDING_MASTER=yes
BOOTPROTO=none
IPADDR=192.168.1.10
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
ONBOOT=yes
BONDING_OPTS="mode=802.3ad miimon=100 lacp_rate=fast xmit_hash_policy=layer3+4"
# /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
TYPE=Ethernet
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
# /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
TYPE=Ethernet
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
# Restart networking
sudo systemctl restart NetworkManager
# or
sudo systemctl restart network
Solution 5: Switch Stacking
What is Stacking?
Switch stacking connects multiple physical switches to act as one logical switch. They share:
- One management IP
- One configuration
- One MAC address table
- High-speed stack links (proprietary)
[Stack Master]
║ (stack cable)
[Stack Member]
║ (stack cable)
[Stack Member]
All three switches = ONE logical switch
Stack Benefits:
- Simplified management (one config for all)
- No STP needed between stack members
- Cross-stack link aggregation possible
- Automatic failover if master fails
Stacking is vendor-specific: Cisco StackWise, HPE IRF, Juniper Virtual Chassis, and similar technologies are proprietary, so you cannot mix vendors in one stack.
Best Practices
Design Recommendations
- Always use RSTP or MSTP (never classic STP)
- Manually set root bridge on your core switch
- Use LACP for link aggregation (not static)
- Enable PortFast on access ports (single hosts only)
- Enable BPDU Guard on access ports (prevents rogue switches)
- Document your spanning tree topology
PortFast and BPDU Guard
! Enable PortFast on access port (skip STP states)
Switch(config)# interface GigabitEthernet0/1
Switch(config-if)# spanning-tree portfast
! Enable BPDU Guard (shuts port if BPDU received)
Switch(config-if)# spanning-tree bpduguard enable
Switch(config-if)# exit
! Global PortFast on all access (non-trunk) ports
Switch(config)# spanning-tree portfast default
! Global BPDU Guard on all PortFast-enabled ports
Switch(config)# spanning-tree portfast bpduguard default
Troubleshooting
Common Issues
| Symptom | Possible Cause | Solution |
|---|---|---|
| Network slow/down | Broadcast storm (loop) | Check STP, find the loop |
| Port stuck in blocking | STP topology issue | Check root bridge, priorities |
| Slow convergence | Using classic STP | Upgrade to RSTP |
| LACP not forming | Mode mismatch (Passive/Passive, or On against Active/Passive) | Set Active on both ends |
| Port err-disabled | BPDU Guard triggered | Check for rogue switch |
Diagnostic Commands
# Cisco
show spanning-tree
show spanning-tree summary
show spanning-tree interface Gi0/1
show spanning-tree blockedports
show etherchannel summary
show lacp neighbor
# Linux
brctl showstp br0
bridge link show
cat /proc/net/bonding/bond0
Summary
Key Takeaways:
- Layer 2 loops are catastrophic - always use STP/RSTP
- RSTP converges in 1-2 seconds vs 30-50 for classic STP
- MSTP groups VLANs to reduce STP instances
- LACP provides redundancy AND bandwidth with automatic negotiation
- Switch stacking creates one logical switch from multiple physical switches
- Use PortFast + BPDU Guard on access ports
A well-designed redundant switch network can ride through hardware failures with little or no visible disruption. Take the time to implement these protocols correctly!