Cloud Load Balancers - The Missing Manual



The Rackspace Cloud Load Balancers service is pretty simple and for the most part "just works". While using the service, however, I have found a few items that are not obvious (e.g. a 500 Internal Server Error while the nodes are working?) or not given enough attention (e.g. Node Service Events).

This post reviews health monitoring and 5XX statuses, node logging, adding a node hostname header, and using the CLB API.

Health Monitoring/5XX Status

So what happens if all or a few of your nodes in the LB pool are failing? A 500 error, that's what! This of course is a bad thing...

For example, you may see the load balancer give this error:

[Image: clb_service_unavailable.png, the load balancer's error page]

The 500 Internal Server Error [1] means:

"A generic error message, given when an unexpected condition was encountered and no more specific message is suitable."

[1] Wikipedia - List of HTTP status codes

With Cloud Load Balancers this means:

  • Node health monitoring is not enabled for the cloud load balancer.
  • A node or nodes are failing to respond to requests coming from the CLB service.

Without health monitoring, failing nodes remain in rotation until they either recover or fail completely.

So what can you do?

  • Enable health monitoring.
  • Verify the nodes are working as expected (e.g. the node is online, the service is running, etc.).

A few things in health monitoring can trip you up as well:

  • Unsupported body regex. The CLB's regex matching is pretty simple, so don't try to be too fancy with response pattern matching.
  • The pattern match has to be within the first 2048 bytes of the response. If you're attempting to match a pattern at the bottom of a complex page, the pattern won't get matched and the node gets marked as offline. Remember, keep it simple.
  • Unset "host" header. This usually affects HTTP servers that require the "host" header to be set on a request (e.g. "host: www.example.com"). For instance, you may enable the health monitor and find the nodes marked offline immediately. If this happens, update the health monitor's configuration and set the "hostHeader" option (more on this in the Using The CLB API section).
  • Not allowing access from the CLB service's source IP addresses. For a list of internal subnets used, see this Rackspace KB.
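
Because the monitor only inspects the start of the response, it can help to try a body pattern by hand before saving it. The following is a minimal sketch, not the CLB implementation: `body_matches_early` is a hypothetical helper, and it relies on `grep -E`, whose regex dialect may well differ from what the CLB service supports.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Rackspace tooling): succeed only if
# the extended regex in $1 matches within the first 2048 bytes of stdin.
body_matches_early() {
    head -c 2048 | grep -Eq -- "$1"
}

# Example against a node (NODE_IP is a placeholder; uncomment to use):
# curl -s -H "host: www.example.com" "http://$NODE_IP/" \
#     | body_matches_early 'Welcome' \
#     && echo "pattern found early enough" \
#     || echo "pattern missing or too far down the page"
```

If the check fails for a page that clearly contains the text, the text probably sits past the 2048-byte mark, and a smaller dedicated status page is the simpler fix.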

Update Node Logging

Now that traffic is coming from the load balancer, the nodes' logs will have the source IP of the load balancer. That is, rather than the IP of the end-user/client, the IP of the load balancer will be logged. To record the end-user's IP, log the X-Forwarded-For HTTP header, which carries it. The X-Forwarded-Proto HTTP header may be of some interest as well.
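
As an illustration of what the header carries, the sketch below pulls the client address out of an X-Forwarded-For value. It assumes the common convention that the left-most entry is the original client and later entries are proxies; `client_ip_from_xff` is a hypothetical helper name. Clients can send their own X-Forwarded-For, so treat the value as informational.

```shell
#!/bin/sh
# Hypothetical helper: print the left-most address from an
# X-Forwarded-For value such as "203.0.113.7, 10.0.0.5".
client_ip_from_xff() {
    printf '%s\n' "$1" | cut -d, -f1 | tr -d ' '
}

client_ip_from_xff '203.0.113.7, 10.0.0.5'   # prints 203.0.113.7
```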

Add Node Hostname Header

This is optional, but I recommend it, especially when you're trying to figure out which node in the load balancing pool is having a problem.
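
One way to confirm which node answered is to read the header back from a response. In the sketch below, `X-Node` is a made-up header name standing in for whatever header your nodes are configured to add, and `header_value` is a hypothetical helper that reads raw response headers (for example from `curl -sI`) on stdin.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the header named in $1 from raw
# HTTP response headers on stdin. Names are matched case-insensitively.
header_value() {
    tr -d '\r' | awk -v name="$1" '
        BEGIN { want = tolower(name) ":" }
        tolower(substr($0, 1, length(want))) == want {
            value = substr($0, length(want) + 1)
            sub(/^[ \t]+/, "", value)   # drop whitespace after the colon
            print value
            exit
        }'
}

# Example (LB_IP is a placeholder; uncomment to use):
# curl -sI "http://$LB_IP/" | header_value X-Node
```

Repeating the request a few times should show the value changing as the load balancer rotates through the pool.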

Using The CLB API

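The Cloud Load Balancers API provides a lot of functionality that isn't available via the MyCloud Panel. We're going to look at authentication, listing load balancers, load balancer statistics, node service events, and updating the load balancer configuration.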

API Authentication

The flow for working with any of the Rackspace APIs is:

  • Authenticate
  • Token is returned
  • Use token to interact with whatever service

As mentioned, before moving forward we need to authenticate to the Rackspace Identity service to get a token for the API calls. See this Rackspace KB on locating API credentials.

AUTH=https://identity.api.rackspacecloud.com
curl -s -X POST $AUTH/v2.0/tokens -d '{ "auth": { "RAX-KSKEY:apiKeyCredentials": { "username": "USERNAME", "apiKey": "APIKEY" } } }' -H "content-type: application/json" | python -m json.tool

From the output response, take note of the following:

  • publicURL
  • token id

For example, if you're using Cloud Load Balancers in Virginia:

Service Catalog

"serviceCatalog": [
    {
        "publicURL": "https://iad.loadbalancers.api.rackspacecloud.com/v1.0/000042",
        "region": "IAD",
        "tenantId": "000042"
    }
]

Token

"token": {
    "RAX-AUTH:authenticatedBy": [
        "APIKEY"
    ],
    "expires": "2012-04-13T13:15:00.000-05:00",
    "id": "aaaaa-bbbbb-ccccc-dddd",
    "tenant": {
        "id": "000042",
        "name": "000042"
    }
}
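
Rather than copying these two values by hand, they can be pulled out of the raw auth response. This is a sketch under assumptions: the fragments above are trimmed, and it assumes the full response wraps the token in a top-level `access` object (as Keystone v2.0-style responses do) and that python3 is on the PATH. `token_id` and `lb_endpoint` are hypothetical helper names; `lb_endpoint` searches the whole document for a matching `publicURL`, so it does not depend on exactly how the catalog is nested.

```shell
#!/bin/sh
# Hypothetical helpers; both read the raw JSON auth response on stdin and
# assume python3 is on PATH.

# Print the token id, assumed to live at access.token.id.
token_id() {
    python3 -c 'import json, sys
print(json.load(sys.stdin)["access"]["token"]["id"])'
}

# Print the first publicURL that mentions "loadbalancers" for region $1.
lb_endpoint() {
    python3 -c 'import json, sys

def walk(node):
    # Yield every JSON object in the document, at any depth.
    if isinstance(node, dict):
        yield node
        for child in node.values():
            yield from walk(child)
    elif isinstance(node, list):
        for child in node:
            yield from walk(child)

region = sys.argv[1]
for obj in walk(json.load(sys.stdin)):
    url = obj.get("publicURL", "")
    if obj.get("region") == region and "loadbalancers" in url:
        print(url)
        break
' "$1"
}
```

With the auth response saved in a variable, `TOKEN=$(printf '%s' "$AUTH_JSON" | token_id)` and `ENDPOINT=$(printf '%s' "$AUTH_JSON" | lb_endpoint IAD)` would set the two values used in the rest of this post.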

Now that we have the needed information, let's move forward and gather some data from the Cloud Load Balancers API.

List Load Balancers

Before we can actually do anything with our cloud load balancer, we have to have its instance ID. This can be obtained from the MyCloud Panel or from the API. For example, using the information we obtained when we authenticated to the Identity service, get a list of load balancers in IAD, (Virginia).

From the example output response below, take note of the following:

  • name (e.g. "name": "lb-site1")
  • id (e.g. "id": 71)

ENDPOINT=https://iad.loadbalancers.api.rackspacecloud.com/v1.0/000042
TOKEN=aaaaa-bbbbb-ccccc-dddd
curl -s -H "x-auth-token: $TOKEN" $ENDPOINT/loadbalancers | python -m json.tool

{
"loadBalancers":[
    {
        "name":"lb-site1",
        "id":71,
        "protocol":"HTTP",
        "port":80,
        "algorithm":"RANDOM",
        "status":"ACTIVE",
        "nodeCount":3,
        "virtualIps":[
            {
                "id":403,
                "address":"206.55.130.1",
                "type":"PUBLIC",
                "ipVersion":"IPV4"
            }
        ],
        "created":{
            "time":"2010-11-30T03:23:42Z"
        },
        "updated":{
            "time":"2010-11-30T03:23:44Z"
        }
    }
]
}
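
On an account with many load balancers, the full listing gets long. A small filter can reduce it to the fields needed here. This is a sketch: `lb_summary` is a hypothetical helper, it assumes python3 is on the PATH, and it reads only the `id`, `name`, and `status` keys shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read the /loadbalancers response on stdin and print
# one "id<TAB>name<TAB>status" line per load balancer.
lb_summary() {
    python3 -c 'import json, sys
for lb in json.load(sys.stdin).get("loadBalancers", []):
    print("{}\t{}\t{}".format(lb["id"], lb["name"], lb["status"]))'
}

# Example (uses TOKEN and ENDPOINT from above; uncomment to use):
# curl -s -H "x-auth-token: $TOKEN" "$ENDPOINT/loadbalancers" | lb_summary
```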

Load Balancer Statistics

Load balancer stats provide a brief overview of how the nodes in the pool are performing. From the documentation:

  • connectTimeOut – Connections closed by this load balancer because the 'connect_timeout' interval was exceeded.
  • connectError – Number of transaction or protocol errors in this load balancer.
  • connectFailure – Number of connection failures in this load balancer.
  • dataTimedOut – Connections closed by this load balancer because the 'timeout' interval was exceeded.
  • keepAliveTimedOut – Connections closed by this load balancer because the 'keepalive_timeout' interval was exceeded.
  • maxConn – Maximum number of simultaneous TCP connections this load balancer has processed at any one time.

curl -s -H "x-auth-token: $TOKEN" $ENDPOINT/loadbalancers/71/stats | python -m json.tool

{
"connectTimeOut":10,
"connectError":20,
"connectFailure":30,
"dataTimedOut":40,
"keepAliveTimedOut":50,
"maxConn":60
}

Node Service Events

Note

Health monitoring must be enabled in order to capture node events!

Node events can be extremely useful when diagnosing outages experienced with the nodes.

For example:

curl -s -H "x-auth-token: $TOKEN" $ENDPOINT/loadbalancers/71/nodes/events | python -m json.tool

{
    "nodeServiceEvents": [
        {
            "accountId": 000042,
            "author": "Rackspace Cloud",
            "category": "UPDATE",
            "created": "12-12-2013 23:07:01",
            "description": "Node '373' status changed to 'OFFLINE' for load balancer '71'",
            "detailedMessage": "Write failed: No route to host",
            "id": 95901,
            "loadbalancerId": 71,
            "nodeId": 373,
            "relativeUri": "/000042/loadbalancers/71/nodes/373/events",
            "severity": "INFO",
            "title": "Node Status Updated",
            "type": "UPDATE_NODE"
        },
        {
            "accountId": 000042,
            "author": "Rackspace Cloud",
            "category": "UPDATE",
            "created": "01-02-2014 21:40:18",
            "description": "Node '373' status changed to 'OFFLINE' for load balancer '71'",
            "detailedMessage": "Timeout while waiting for valid server response",
            "id": 125649,
            "loadbalancerId": 71,
            "nodeId": 373,
            "relativeUri": "/000042/loadbalancers/71/nodes/373/events",
            "severity": "INFO",
            "title": "Node Status Updated",
            "type": "UPDATE_NODE"
        },
        {
            "accountId": 000042,
            "author": "Rackspace Cloud",
            "category": "UPDATE",
            "created": "01-02-2014 21:48:44",
            "description": "Node '373' status changed to 'ONLINE' for load balancer '71'",
            "detailedMessage": "Node is working",
            "id": 125675,
            "loadbalancerId": 71,
            "nodeId": 373,
            "relativeUri": "/000042/loadbalancers/71/nodes/373/events",
            "severity": "INFO",
            "title": "Node Status Updated",
            "type": "UPDATE_NODE"
        },
        {
            "accountId": 000042,
            "author": "Rackspace Cloud",
            "category": "UPDATE",
            "created": "01-19-2014 23:11:40",
            "description": "Node '373' status changed to 'OFFLINE' for load balancer '71'",
            "detailedMessage": "Write failed: Connection refused",
            "id": 139491,
            "loadbalancerId": 71,
            "nodeId": 373,
            "relativeUri": "/000042/loadbalancers/71/nodes/373/events",
            "severity": "INFO",
            "title": "Node Status Updated",
            "type": "UPDATE_NODE"
        },
        {
            "accountId": 000042,
            "author": "Rackspace Cloud",
            "category": "UPDATE",
            "created": "01-19-2014 23:11:52",
            "description": "Node '373' status changed to 'ONLINE' for load balancer '71'",
            "detailedMessage": "Node is working",
            "id": 139497,
            "loadbalancerId": 71,
            "nodeId": 373,
            "relativeUri": "/000042/loadbalancers/71/nodes/373/events",
            "severity": "INFO",
            "title": "Node Status Updated",
            "type": "UPDATE_NODE"
        }
    ]
}
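
When diagnosing an outage, the OFFLINE transitions and their reasons are usually what matter. The sketch below filters the event list down to those. `offline_events` is a hypothetical helper that assumes python3 is on the PATH and the field names shown above. Note that the sample output above has been sanitized (`000042` is not a valid JSON number), so the filter is meant for a real response from the API.

```shell
#!/bin/sh
# Hypothetical helper: read the nodes/events response on stdin and print
# "created<TAB>nodeId<TAB>detailedMessage" for each event whose description
# mentions OFFLINE.
offline_events() {
    python3 -c 'import json, sys
for event in json.load(sys.stdin).get("nodeServiceEvents", []):
    if "OFFLINE" in event.get("description", ""):
        print("{}\t{}\t{}".format(
            event["created"], event["nodeId"], event["detailedMessage"]))'
}

# Example (uses TOKEN and ENDPOINT from above; uncomment to use):
# curl -s -H "x-auth-token: $TOKEN" "$ENDPOINT/loadbalancers/71/nodes/events" \
#     | offline_events
```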

Updating Load Balancer Configuration

A few configuration items are not exposed via the MyCloud Panel. I found the following useful in cases such as:

  • IIS and/or strictly configured HTTPD services that give an unexpected response when a request is made directly to the IP address of the server.
  • HTTPS-only load balancing, if the service will not use HTTP.

Set hostHeader for Health Monitoring

The easiest way to update a Health Monitor is to create it via the MyCloud Panel, pull it from the API, then update the response payload to include the hostHeader value.

Pull Health Monitoring Configuration

curl -s -H "x-auth-token: $TOKEN" $ENDPOINT/loadbalancers/71/healthmonitor | python -m json.tool

 {
     "healthMonitor": {
         "attemptsBeforeDeactivation": 2,
         "delay": 10,
         "path": "/",
         "statusRegex": "^[234][0-9][0-9]$",
         "timeout": 5,
         "type": "HTTP"
     }
 }

Update Health Monitoring Configuration

curl -s -X PUT -H "content-type: application/json" -H "x-auth-token: $TOKEN" $ENDPOINT/loadbalancers/71/healthmonitor -d '{"healthMonitor": { "attemptsBeforeDeactivation": 2, "delay": 10, "hostHeader": "www.virtualdisaster.net", "path": "/", "statusRegex": "^[234][0-9][0-9]$", "timeout": 5, "type": "HTTP" }}' -i

HTTP/1.1 202 Accepted

Updated Health Monitor

curl -s -H "x-auth-token: $TOKEN" $ENDPOINT/loadbalancers/71/healthmonitor | python -m json.tool

 {
     "healthMonitor": {
         "attemptsBeforeDeactivation": 2,
         "delay": 10,
         "hostHeader": "www.virtualdisaster.net",
         "path": "/",
         "statusRegex": "^[234][0-9][0-9]$",
         "timeout": 5,
         "type": "HTTP"
     }
 }
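
The pull, edit, and push steps above can also be chained so the payload is never edited by hand. This is a sketch, assuming python3 is on the PATH; `with_host_header` is a hypothetical helper and `www.example.com` is a placeholder host name.

```shell
#!/bin/sh
# Hypothetical helper: read a healthmonitor response on stdin and print the
# same document with healthMonitor.hostHeader set to $1, ready to PUT back.
with_host_header() {
    python3 -c 'import json, sys
doc = json.load(sys.stdin)
doc["healthMonitor"]["hostHeader"] = sys.argv[1]
print(json.dumps(doc, sort_keys=True))' "$1"
}

# Example round trip (uses TOKEN and ENDPOINT from above; uncomment to use):
# curl -s -H "x-auth-token: $TOKEN" "$ENDPOINT/loadbalancers/71/healthmonitor" \
#     | with_host_header www.example.com \
#     | curl -s -i -X PUT -H "content-type: application/json" \
#            -H "x-auth-token: $TOKEN" -d @- \
#            "$ENDPOINT/loadbalancers/71/healthmonitor"
```

Keeping every other field exactly as the API returned it avoids accidentally resetting values such as `delay` or `statusRegex` during the update.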

HTTPS only load balancers

If you plan to offer only HTTPS-enabled services, the load balancing service can 301-redirect HTTP requests to HTTPS.

Enable httpsRedirect on the load balancer.

curl -s -X PUT -H "content-type: application/json" -H "x-auth-token: $TOKEN" $ENDPOINT/loadbalancers/71 -d '{ "loadBalancer": { "httpsRedirect": true } }' -i

HTTP/1.1 202 Accepted

HTTPS redirect enabled load balancer (output truncated for brevity)

curl -s -H "x-auth-token: $TOKEN" $ENDPOINT/loadbalancers/71 | python -m json.tool

{
    "loadBalancer": {
        "name": "lb-site1",
        "algorithm": "RANDOM",
        "protocol": "HTTP",
        "port": 80,
        "timeout": 60,
        "connectionLogging": true,
        "httpsRedirect": true
    }
}
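
Once the setting is active, a plain HTTP request to the site should come back as a 301 with an https:// Location. The sketch below checks for exactly that in a set of response headers; `is_https_redirect` is a hypothetical helper and `www.example.com` is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: read raw response headers on stdin and succeed only
# if the status is 301 and the Location header points at an https:// URL.
is_https_redirect() {
    tr -d '\r' | awk '
        NR == 1 && $2 == "301" { redirect = 1 }
        tolower($1) == "location:" && $2 ~ /^https:\/\// { secure = 1 }
        END { exit !(redirect && secure) }'
}

# Example (uncomment to use):
# curl -sI "http://www.example.com/" | is_https_redirect \
#     && echo "redirecting to HTTPS" \
#     || echo "no HTTPS redirect seen"
```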

Primer

Initially this was going to be a "troubleshooting" guide; however, it is more of a quick reference on practical use of the service. Check it out here.
