How to fix Linux kernel: neighbour table overflow
The ARP cache is overflowing. The most likely reason is that the host is trying to track more neighbours than the table's limits allow, for example on a large, busy subnet.
What is a neighbour table overflow?
The neighbour table is a data structure in the Linux kernel that keeps track of the network devices connected to a specific network interface. It is used to maintain information about the IP addresses and MAC addresses of the devices on the network. When a device wants to communicate with another device on the same network, it consults the neighbour table to find the MAC address of the device.
However, if the number of neighbours the host needs to track exceeds the table's hard limit (gc_thresh3, 1024 entries by default), the table overflows. The kernel then cannot add new entries, so packets to neighbours it cannot resolve are dropped and connections to those hosts fail.
kernel error
dmesg | grep -iE "neighbou?r table overflow"
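Depending on the kernel version, the message is logged as "Neighbour table overflow" or "neighbor table overflow", so a case-insensitive pattern that accepts both spellings is safer. A minimal sketch, where `count_neigh_overflow` is a hypothetical helper name (not part of any tool) that reads log text on standard input:

```shell
#!/bin/sh
# Count log lines that report a neighbour table overflow.
# Reads log text on stdin, so it works with dmesg output or a saved log file.
count_neigh_overflow() {
    # -c counts matching lines, -i ignores case, -E enables "u?" so that
    # both "neighbour" and "neighbor" match.
    # grep exits with status 1 when nothing matches, hence "|| true".
    grep -ciE 'neighbou?r table overflow' || true
}

# Usage (may need root if kernel.dmesg_restrict is enabled):
#   dmesg | count_neigh_overflow
```

A result of 0 means no overflow has been logged in the text that was piped in.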
check gc_thresh
sysctl -a | grep net.ipv4.neigh.default.gc_thresh
default value
/proc/sys/net/ipv4/neigh/default/gc_stale_time:60
/proc/sys/net/ipv4/neigh/default/gc_thresh1:128
/proc/sys/net/ipv4/neigh/default/gc_thresh2:512
/proc/sys/net/ipv4/neigh/default/gc_thresh3:1024
gc_stale_time (since Linux 2.2)
Determines how often to check for stale neighbor entries. When a neighbor entry is considered stale, it is resolved again before sending data to it. Defaults to 60 seconds.
gc_thresh1 (since Linux 2.2)
The minimum number of entries to keep in the ARP cache. The garbage collector will not run if there are fewer than this number of entries in the cache. Defaults to 128.
gc_thresh2 (since Linux 2.2)
The soft maximum number of entries to keep in the ARP cache. The garbage collector will allow the number of entries to exceed this for 5 seconds before collection will be performed. Defaults to 512.
gc_thresh3 (since Linux 2.2)
The hard maximum number of entries to keep in the ARP cache. The garbage collector will always run if there are more than this number of entries in the cache. Defaults to 1024.
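The three thresholds split the table size into bands. The sketch below only restates the descriptions above as code; `neigh_gc_state` is a hypothetical helper, and the wording of each band is an interpretation rather than kernel output:

```shell
#!/bin/sh
# Classify a neighbour entry count against the three gc thresholds.
# Arguments: count [thresh1] [thresh2] [thresh3]
# The thresholds default to the stock values 128, 512 and 1024.
neigh_gc_state() {
    count=$1
    t1=${2:-128}
    t2=${3:-512}
    t3=${4:-1024}
    if [ "$count" -lt "$t1" ]; then
        echo "below gc_thresh1: garbage collector does not run"
    elif [ "$count" -le "$t2" ]; then
        echo "up to gc_thresh2: within the soft maximum"
    elif [ "$count" -le "$t3" ]; then
        echo "above gc_thresh2: collection after 5 seconds"
    else
        echo "above gc_thresh3: garbage collector always runs"
    fi
}

# Usage:
#   neigh_gc_state 700              # uses the default thresholds
#   neigh_gc_state 700 1024 2048 4096
```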
analysis
arp -v
## count the entries in the ARP cache
arp -an | wc -l
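To judge how close the table is to overflowing, the entry count can be compared with gc_thresh3. A minimal sketch, assuming the iproute2 `ip` command is installed as an alternative to `arp`; `neigh_usage_percent` is a hypothetical helper name:

```shell
#!/bin/sh
# Print how full the neighbour table is, as an integer percentage
# of the hard limit (rounded down).
# Arguments: entry_count gc_thresh3
neigh_usage_percent() {
    echo $(( $1 * 100 / $2 ))
}

# Usage on a live system (IPv4 only):
#   count=$(ip -4 neigh show | wc -l)
#   limit=$(cat /proc/sys/net/ipv4/neigh/default/gc_thresh3)
#   echo "ARP cache: $count/$limit entries ($(neigh_usage_percent "$count" "$limit")%)"
```

A value that stays near 100 suggests the thresholds are too low for the number of neighbours on the network.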
best practice
/etc/sysctl.conf
## works best with <= 500 client computers ##
# Run the neighbour garbage collector every 3600 seconds
net.ipv4.neigh.default.gc_interval = 3600
# Check for stale ARP cache entries every 3600 seconds (default: 60)
net.ipv4.neigh.default.gc_stale_time = 3600
# Raise the ARP cache thresholds (defaults: 1024, 512, 128)
net.ipv4.neigh.default.gc_thresh3 = 4096
net.ipv4.neigh.default.gc_thresh2 = 2048
net.ipv4.neigh.default.gc_thresh1 = 1024
sysctl -p
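Because gc_thresh1, gc_thresh2 and gc_thresh3 are a minimum, a soft maximum and a hard maximum, it seems sensible to keep them in ascending order. The sketch below checks that in a sysctl-style file before it is loaded; `check_gc_thresh_order` is a hypothetical helper, and the ordering rule is an assumption drawn from the parameter descriptions, not something the kernel is known to enforce:

```shell
#!/bin/sh
# Check that gc_thresh1 <= gc_thresh2 <= gc_thresh3 in a sysctl-style file.
# Argument: path to the file.
# Exit status: 0 if ordered, 1 if not, 2 if any of the three keys is missing.
check_gc_thresh_order() {
    awk '
        /^[ \t]*#/ { next }    # skip comment lines
        /^[ \t]*net\.ipv4\.neigh\.default\.gc_thresh[123][ \t]*=/ {
            line = $0
            gsub(/[ \t]/, "", line)              # normalise to key=value
            split(line, kv, "=")
            n = substr(kv[1], length(kv[1]))     # last character: 1, 2 or 3
            v[n] = kv[2] + 0
        }
        END {
            if (!(1 in v) || !(2 in v) || !(3 in v)) exit 2
            ok = (v[1] <= v[2] && v[2] <= v[3])
            exit (ok ? 0 : 1)
        }
    ' "$1"
}

# Usage:
#   check_gc_thresh_order /etc/sysctl.conf && sysctl -p
```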
reference
man 7 arp
https://openai.com/blog/scaling-kubernetes-to-2500-nodes
https://www.cyberciti.biz/faq/centos-redhat-debian-linux-neighbor-table-overflow