What's running that? Load Balancer ID from Hostname II • Scott's Ramblings

This post follows on from "what’s running that - cloud load balancer ID from hostname", an exploration of how we might detect the cloud load balancer serving a particular hostname. In the last episode we managed to differentiate between AWS API Gateway, Application Load Balancers, and Network Load Balancers, but relied heavily on patterns in the DNS records AWS uses. In some cases this isn’t possible, so we want something a bit more robust.

Let’s try and do better! Once again, this’ll be an exploration into what else we can find out, and may not end up anywhere particularly “useful” 🙂

Prior Art

For inspiration I turned to nmap, which can do a bunch of things, including fingerprinting servers. I ran a few different incantations across an HTTPS server running on my LAN to see what it’s up to:

# OS detection
sudo nmap -O -p 80,443,8080 -Pn -vv $MY_API_ENDPOINT

# Ciphers
nmap -sV --script ssl-enum-ciphers -p 443 $MY_API_ENDPOINT

# Server headers
nmap -p 443 --script=http-server-header --script-trace $MY_API_ENDPOINT

… this produces a lot of output. But one thing caught my eye: one of the things http-server-header does is send an HTTP/1.0 request to the server without a Host header set:

NSE: TCP 192.168.1.112:64249 > 1.2.3.4:443 | CONNECT
NSE: TCP 192.168.1.112:64249 > 1.2.3.4:443 | 00000000: 47 45 54 20 2f 20 48 54 54 50 2f 31 2e 30 0d 0a GET / HTTP/1.0
....
NSE: TCP 192.168.1.112:64249 < 3.165.190.77:443 | HTTP/1.1 400 Bad Request
Server: nginx

… I wonder if we get anything interesting back in the response headers from the AWS load balancers if we do this? We can try this pretty easily with curl:

curl -v --http1.0 -H "Host:" $MY_API_ENDPOINT

Interestingly enough, an Edge API Gateway API returns a Server: CloudFront header! No other combination of load balancer and HTTP version returned a Server header. I reckon we now have enough to identify Edge and Regional API Gateway endpoints without having to use hostnames:

HTTP/1.1 response includes x-amz-apigw-id and HTTP/1.0 response includes Server: CloudFront header? -> Edge API Gateway
HTTP/1.1 response includes apigw-requestid and HTTP/1.0 response does not include Server: CloudFront header? -> Regional API Gateway

Adding the HTTP/1.0 Probe

By looking at nmap, we’ve learnt that we can get more signal for our load balancer fingerprinting by making different sorts of HTTP requests. Let’s modify our Go app so that we have a pattern for adding new probes in the same fashion as we add different load balancer classifiers; extractDomainInfo is already getting difficult to deal with, and this seems like a good moment.

First I’ll add an interface for the probes themselves:

type ProbeFunc func(domain string, debug bool) (interface{}, error)

Then we can take extractDomainInfo, break it apart, and make the existing probes it performs (HTTP/1.1 and CNAME) fit this interface. Something like this:

type HttpProbeData struct {
    IPv4                []string
    IPv6                []string
    CertIssuer          string
    HttpResponseHeaders map[string]string
}

// HttpProbe probe function
func HttpProbe(domain string, debug bool) (interface{}, error) { /* */ }

… and for our CNAME probe:

type CnameProbeData struct {
    WasCname       bool
    ResolvedDomain string
}

// CnameProbe checks if the given domain is a CNAME and resolves it
func CnameProbe(domain string, debug bool) (interface{}, error) { /* */ }
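The body of the CNAME probe is elided above. As a rough idea of what such a probe could look like, here is a minimal sketch that assumes the standard library's net.LookupCNAME is good enough for the job. The names cnameProbeSketch, cnameDataFrom and normalizeName are hypothetical; the real app may well do this differently.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strings"
)

type CnameProbeData struct {
	WasCname       bool
	ResolvedDomain string
}

// normalizeName lower-cases a DNS name and drops the trailing dot
// that resolvers return on fully qualified names.
func normalizeName(name string) string {
	return strings.ToLower(strings.TrimSuffix(name, "."))
}

// cnameDataFrom compares the queried domain with its canonical name.
// If they differ, at least one CNAME record was followed.
func cnameDataFrom(domain, canonical string) *CnameProbeData {
	resolved := normalizeName(canonical)
	return &CnameProbeData{
		WasCname:       resolved != normalizeName(domain),
		ResolvedDomain: resolved,
	}
}

// cnameProbeSketch matches the ProbeFunc signature from the post.
// net.LookupCNAME returns the name reached after following zero or
// more CNAME records.
func cnameProbeSketch(domain string, debug bool) (interface{}, error) {
	canonical, err := net.LookupCNAME(domain)
	if err != nil {
		return nil, err
	}
	if debug {
		fmt.Printf("%s has canonical name %s\n", domain, canonical)
	}
	return cnameDataFrom(domain, canonical), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: cname <hostname>")
		return
	}
	data, err := cnameProbeSketch(os.Args[1], true)
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	fmt.Printf("%+v\n", data)
}
```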

… and our new HTTP/1.0 probe will follow the same pattern:

type Http10ProbeData struct {
    Http10ResponseHeaders map[string]string
}

// Http10Probe runs an HTTP/1.0 probe without a `Host` header.
func Http10Probe(domain string, debug bool) (interface{}, error) { /* */ }

Finally, we modify our classifiers to pick out what they need from an aggregated collection of all the probe results. Here we can see how ClassifyRegionalAPI implements the logic outlined earlier and can identify a Regional API Gateway endpoint without relying on the hostname!

// A classifier now needs to take a map from probeName -> probeData
type ClassifierFunc func(probeResults map[string]interface{}) string

// ... and can then pick out the pieces it needs
func ClassifyRegionalAPI(probeResults map[string]interface{}) string {
    // Retrieve the HTTP probe data from the map
    httpData, ok := probeResults["HTTP"].(*probes.HttpProbeData)

    // ...

    // 1. Should have an apigw-requestid on HTTP/1.1
    if httpData.HttpResponseHeaders["Apigw-Requestid"] == "" {
        return ""
    }

    // 2. Should not have a CloudFront Server header on HTTP/1.0
    if http10Data.Http10ResponseHeaders["Server"] == "CloudFront" {
        return ""
    }

    // 3. Check if there are at most 2 IPs and specific CloudFront headers are NOT present
    if len(httpData.IPv4) <= 2 && httpData.HttpResponseHeaders["X-Amz-Cf-Pop"] == "" && httpData.HttpResponseHeaders["Via"] == "" {
        return "API Gateway: Regional API"
    }

    // No match
    return ""
}

Now that we’ve got a nice structure to add separate probes, let’s see if we can remove the DNS dependency for API Gateway, so we can identify one without having to look at the DNS at all. This will make us robust in the face of APIs that use aliases to “hide” Amazon’s DNS endpoints. Here’s what that looks like for our API Gateway - Edge API classifier:

// 1. Needs a request ID on HTTP/1.1
if httpData.HttpResponseHeaders["X-Amz-Apigw-Id"] == "" {
    return ""
}

// 2. Should have a CloudFront Server header on HTTP/1.0
if http10Data.Http10ResponseHeaders["Server"] != "CloudFront" {
    return ""
}

// 3. Check if there are at least 4 IPs and the required CloudFront headers are present
if len(httpData.IPv4) >= 4 && httpData.HttpResponseHeaders["X-Amz-Cf-Pop"] != "" && httpData.HttpResponseHeaders["Via"] != "" {
    return "API Gateway: Edge API"
}

What’s next?

This doesn’t help us at all with ALBs or NLBs: we get no extra signal from the HTTP/1.0 probe, so we are still stuck on the CNAME rules here. There are some other hints in the nmap scripts from earlier that might be useful, for instance trying to cluster around the TCP TTLs delivered back by the server. But that feels a bit thin here ¹, and we’ll leave it for another day!

As always, you can find the complete app on GitHub.

Footnotes

¹ Maybe it is enough to differentiate between load balancers if you know that you have an LB, but it is hard to imagine a pattern in TTLs is going to be enough to say “of all the things on the internet, this is definitely an AWS NLB”.