What is HAProxy

Introduction

HAProxy is a popular open-source load balancer and proxy server that is used by millions of websites and web applications across the globe. It is designed to distribute incoming traffic across multiple servers, ensuring that the load is balanced and no single server is overloaded. HAProxy is known for its high performance, reliability, and flexibility, making it an ideal choice for mission-critical applications that require high availability and scalability.

HAProxy Scheduling Algorithms

Round-Robin (`roundrobin`)

This algorithm distributes requests sequentially across the pool of real servers. Each server is used in turn according to its weight; with the default equal weights, all servers are treated as equals regardless of capacity or load. The model resembles round-robin DNS but is more granular because it operates per network connection rather than per host, and it does not suffer from the imbalances caused by cached DNS queries. Because server weights can be adjusted on the fly with this scheduler, HAProxy limits the number of active servers to 4095 per backend.
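
As a rough illustration of how weights interact with this scheduler, the fragment below gives one server twice the share of the other. The backend name and addresses are placeholders in the style of the examples later in this article, not values from a real deployment.

```haproxy
backend servers
    balance roundrobin
    # Hypothetical pool: server1 should receive roughly twice as many
    # requests as server2 because of its higher weight
    server server1 192.168.1.10:80 weight 2 check
    server server2 192.168.1.11:80 weight 1 check
```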

Static Round-Robin (`static-rr`)

This algorithm distributes requests sequentially across the pool of real servers, as Round-Robin does, but it does not allow server weights to be changed dynamically. In exchange, because the weights are static, there is no limit on the number of active servers in the backend.

Least-Connection (`leastconn`)

This algorithm sends more requests to the real servers that have fewer active connections. It tends to be a better fit for dynamic environments where session or connection lengths vary widely. It also suits a pool whose servers have different capacities, since administrators can adjust weights on the fly with this scheduler.
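
A minimal sketch of this scheduler for long-lived connections might look like the following. The backend name, the database port, and the weights are assumptions chosen for illustration; a matching TCP-mode frontend is also assumed and not shown.

```haproxy
backend db_servers
    mode tcp
    balance leastconn
    # Hypothetical pool: db1 is assumed to have about twice the capacity of db2
    server db1 192.168.1.20:3306 weight 2 check
    server db2 192.168.1.21:3306 weight 1 check
```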

Source (`source`)

This algorithm hashes the source IP address of the request and divides the result by the total weight of the running servers to determine which server receives the request. As long as the set of running servers is unchanged, a given source IP is consistently served by the same real server. If the number or weight of the running servers changes, a client may be moved to another server because the hash/weight result has changed.
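
The fragment below sketches source-IP balancing. It also sets `hash-type consistent`, an HAProxy option that limits how many clients are remapped when a server is added or removed; whether that trade-off is wanted depends on the deployment. Server names and addresses are placeholders.

```haproxy
backend servers
    balance source
    # Consistent hashing: a change in the server set moves only a
    # fraction of the clients instead of reshuffling most of them
    hash-type consistent
    server server1 192.168.1.10:80 check
    server server2 192.168.1.11:80 check
```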

URI (`uri`)

This algorithm hashes the left part of the URI (before the question mark), or the whole URI if the `whole` parameter is given, and divides the result by the total weight of the running servers to determine which server receives the request. As long as the set of running servers is unchanged, requests for the same URI are consistently served by the same real server. The scheduler can be tuned with two parameters: `len`, the number of characters from the start of the URI used to compute the hash, and `depth`, the number of directory levels (delimited by forward slashes) used to compute the hash.
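
One way this could be applied is in front of a set of caches, so that each object tends to live on a single cache node. The values for `len` and `depth` below are arbitrary, and the backend and server names are hypothetical.

```haproxy
backend cache_servers
    # Hash at most the first 32 characters of the URI and at most
    # two directory levels
    balance uri len 32 depth 2
    server cache1 192.168.1.30:80 check
    server cache2 192.168.1.31:80 check
```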

URL Parameter (`url_param`)

This algorithm looks up a particular parameter in the URL of each request and hashes its value, dividing by the total weight of the running servers. If the parameter is missing from the URL, the scheduler falls back to Round-Robin scheduling. Modifiers can extend the lookup to the parameters of POST requests (`check_post`), and a limit can be placed on the number of octets to wait for before the hash is computed.
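
A short sketch, assuming the application identifies users with a parameter named `userid` (a made-up name for this example):

```haproxy
backend app_servers
    # Hash the value of the "userid" parameter; check_post also searches
    # the body of POST requests when the parameter is not in the URL
    balance url_param userid check_post
    server app1 192.168.1.40:80 check
    server app2 192.168.1.41:80 check
```

Requests such as `/profile?userid=42` would then tend to reach the same server for as long as the set of running servers stays the same.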

Header Name (`hdr`)

This algorithm looks up a named header in each HTTP request and hashes its value, dividing by the total weight of the running servers. If the header is absent, the scheduler falls back to Round-Robin scheduling.
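
As an illustration, the fragment below hashes on the `Host` header so that each virtual host tends to be served by one server. The choice of header and the backend name are assumptions; any request header can be named in the parentheses.

```haproxy
backend vhost_servers
    # Hash the value of the Host header of each request
    balance hdr(Host)
    server server1 192.168.1.10:80 check
    server server2 192.168.1.11:80 check
```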

RDP Cookie (`rdp-cookie`)

This algorithm looks up the RDP cookie in every TCP request and hashes its value, dividing by the total weight of the running servers. If the cookie is absent, the scheduler falls back to Round-Robin scheduling. This method is well suited to persistence because it keeps a session on the same server.
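
The sketch below follows the general pattern for RDP farms: HAProxy waits briefly for the RDP cookie to arrive before it makes a balancing decision. The section name, the inspection delay, and the server addresses are placeholders.

```haproxy
listen rdp_farm
    mode tcp
    bind *:3389
    # Wait up to 5 seconds for an RDP cookie in the request
    tcp-request inspect-delay 5s
    tcp-request content accept if RDP_COOKIE
    # Keep a returning session on its previous server when possible
    persist rdp-cookie
    # Otherwise pick a server by hashing the cookie
    balance rdp-cookie
    server rdp1 192.168.1.50:3389 check
    server rdp2 192.168.1.51:3389 check
```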

Installation

## CentOS/RHEL
yum install haproxy
## Ubuntu/Debian
apt-get install haproxy

## config file
vim /etc/haproxy/haproxy.cfg

## start
systemctl enable haproxy
systemctl start haproxy

Configuration of HAProxy

HAProxy's behavior is defined in a single plain-text configuration file (`/etc/haproxy/haproxy.cfg` in the installation above). The file contains a set of directives that specify settings such as the IP addresses and ports of the backend servers, the load balancing algorithm to use, and the maximum number of connections that can be handled at once. The format can look daunting at first, but most setups need only a handful of directives.
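
To make this concrete, process-wide and shared settings usually sit in `global` and `defaults` sections ahead of any frontend or backend. The skeleton below is only a sketch: the connection limit and timeout values are illustrative, not recommendations.

```haproxy
global
    # Upper bound on concurrent connections for the whole process
    maxconn 4096

defaults
    # Settings inherited by the frontends and backends that follow
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s
```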

One of the most important aspects of HAProxy configuration is the use of ACLs (Access Control Lists). ACLs allow you to define conditions under which a particular action should be taken. For example, you can create an ACL that matches requests coming from a particular IP address or user agent, and then specify that those requests should be forwarded to a specific backend server. This gives you a great deal of control over how traffic is routed through your load balancer.
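
The two conditions mentioned above could be sketched as follows. The ACL names, the address range (a documentation-only network), and the backend names are all hypothetical, and the referenced backends would need to be defined elsewhere in the file.

```haproxy
frontend http-in
    bind *:80
    # Matches requests whose source address is in the given network
    acl from_office src 203.0.113.0/24
    # Case-insensitive substring match on the User-Agent header
    acl is_mobile hdr_sub(User-Agent) -i mobile
    use_backend internal_servers if from_office
    use_backend mobile_servers if is_mobile
    default_backend servers
```

The `use_backend` rules are evaluated in order, so a request matching both ACLs goes to `internal_servers`.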

Examples of HAProxy Configuration

Here are some examples of common HAProxy configurations that you can use as a starting point for your own setup:

Simple Load Balancing

This configuration sets up a basic load balancer that distributes incoming traffic across two backend servers:

frontend http-in
    bind *:80
    default_backend servers

backend servers
    balance roundrobin
    server server1 192.168.1.10:80 check
    server server2 192.168.1.11:80 check

SSL Termination

This configuration sets up HAProxy to terminate SSL connections, decrypting the traffic and passing it on to the backend servers in plain text:

frontend http-in
    bind *:443 ssl crt /etc/haproxy/certs/mydomain.pem
    default_backend servers

backend servers
    balance roundrobin
    server server1 192.168.1.10:80 check
    server server2 192.168.1.11:80 check
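
A common companion to SSL termination is redirecting plain-HTTP visitors to HTTPS. The variant below is a sketch that reuses the certificate path from the example above and assumes the same `servers` backend; the PEM file is typically expected to hold the certificate together with its private key.

```haproxy
frontend http-in
    bind *:80
    bind *:443 ssl crt /etc/haproxy/certs/mydomain.pem
    # ssl_fc is true when the client connection uses SSL/TLS, so only
    # plain-HTTP requests are redirected
    http-request redirect scheme https unless { ssl_fc }
    default_backend servers
```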

URL-Based Routing

This configuration sets up HAProxy to route traffic to different backend servers based on the URL of the incoming request:

frontend http-in
    bind *:80
    acl is_api path_beg /api
    use_backend api_servers if is_api
    default_backend web_servers

backend api_servers
    balance roundrobin
    server server1 192.168.1.12:80 check
    server server2 192.168.1.13:80 check

backend web_servers
    balance roundrobin
    server server3 192.168.1.14:80 check
    server server4 192.168.1.15:80 check

Conclusion

HAProxy is a powerful and flexible load balancer and proxy server that can help you achieve high availability and scalability for your web applications. With its simple configuration format and powerful ACLs, you can easily set up complex routing rules and load balancing algorithms to suit your specific needs. So if you’re looking for a reliable and efficient way to manage your web traffic, give HAProxy a try!

Reference

https://www.haproxy.org/
https://www.haproxy.com/blog/the-four-essential-sections-of-an-haproxy-configuration
https://docs.haproxy.org/2.6/configuration.html
https://www.digitalocean.com/community/tutorials/an-introduction-to-haproxy-and-load-balancing-concepts
