Let’s see how the ndots option works in Kubernetes.
In Kubernetes, we connect to running pods either directly or via a Kubernetes Service. This post focuses on how service DNS resolution actually works under the hood.
The resolv.conf file
In Linux, /etc/resolv.conf is used by the system resolver for domain name resolution. Kubernetes automatically injects this file into every pod with the configuration needed to resolve services inside the cluster.
Here is an example from a CNPG cluster pod running in the database namespace:
$ cat /etc/resolv.conf
nameserver 10.43.0.10
search database.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
Three things are happening here:
nameserver is the ClusterIP of the cluster DNS Service (CoreDNS in this cluster). You can confirm it in your cluster:
kubectl get svc -n kube-system | grep dns
kube-dns ClusterIP 10.43.0.10 ...
search is a list of domain suffixes the resolver appends to your hostname when trying to build a valid FQDN. For example, looking up backend-svc from a pod in the database namespace produces these attempts, in order:
backend-svc.database.svc.cluster.local
backend-svc.svc.cluster.local
backend-svc.cluster.local
options holds additional resolver settings such as ndots, timeout, and attempts. The ndots value is the one we care about here.
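To make the three fields concrete, here is a minimal Python sketch that parses the example file above. The function name parse_resolv_conf and the dictionary layout are my own choices for illustration, not part of any library, and the parser only handles the directives shown in this post.

```python
def parse_resolv_conf(text):
    """Parse resolv.conf text into nameservers, search domains, and options."""
    config = {"nameservers": [], "search": [], "options": {}}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines (starting with '#' or ';').
        if not line or line[0] in "#;":
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            config["nameservers"].append(values[0])
        elif keyword == "search":
            # If several search lines exist, the last one wins.
            config["search"] = values
        elif keyword == "options":
            for option in values:
                # Options look like "ndots:5" or are bare flags like "rotate".
                name, _, value = option.partition(":")
                config["options"][name] = int(value) if value.isdigit() else True
    return config


sample = """\
nameserver 10.43.0.10
search database.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
"""

config = parse_resolv_conf(sample)
print(config["nameservers"])
print(config["search"])
print(config["options"])
```

Running it against the sample prints the single nameserver, the three search domains, and an options dictionary containing ndots set to 5.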
What is ndots?
ndots sets how many dots a hostname must contain before the resolver tries it as-is first. If the hostname has fewer dots than the ndots value, the resolver appends each entry from the search list and tries those first, one by one, before falling back to the original name.
In short: ndots controls whether your hostname goes through the search list or gets sent to DNS directly.
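The rule can be sketched in a few lines of Python. This is a simplified model of how a glibc-style resolver orders its attempts, not the resolver's actual code; search_candidates is a hypothetical helper name, and other resolver implementations may behave differently in the details.

```python
def search_candidates(name, search, ndots):
    """Return the names a glibc-style resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no expansion.
        return [name]
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search list.
        return [name] + expanded
    # Too few dots: walk the search list first, then try the name as-is.
    return expanded + [name]


search = ["database.svc.cluster.local", "svc.cluster.local", "cluster.local"]
for candidate in search_candidates("backend-svc", search, ndots=5):
    print(candidate)
```

For backend-svc this prints the three search-list expansions followed by the bare name. The resolver stops at the first candidate that resolves, so the later entries are only reached when the earlier ones fail.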
How resolution actually works
Let’s say our backend pod in the apps namespace wants to connect to data-svc, also in the apps namespace.
Pod: backend (namespace: apps)
Target: data-svc (namespace: apps)
The resolv.conf inside the backend pod looks like this:
$ cat /etc/resolv.conf
nameserver 10.43.0.10
search apps.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
With ndots:5, any hostname with fewer than 5 dots will go through the search list. Let’s walk through each case.
Case 1: data-svc (0 dots)
0 < 5, so the resolver appends the search list from the top:
try: data-svc.apps.svc.cluster.local -> RESOLVED
Queries: 1
Wasted: 0
Because both the pod and the service are in the apps namespace, the very first search domain expansion produces the correct FQDN. This is the most efficient case: short names within the same namespace resolve in a single query.
Case 2: data-svc.apps (1 dot)
1 < 5, so the search list is still used:
try: data-svc.apps.apps.svc.cluster.local -> NXDOMAIN (namespace duplicated)
try: data-svc.apps.svc.cluster.local -> RESOLVED
Queries: 2
Wasted: 1
The first attempt fails because apps appears twice, once from our hostname and once from the first search domain. It resolves correctly on the second try.
Case 3: data-svc.apps.svc.cluster.local (4 dots)
This is the big trap. Writing out the full FQDN feels like it should resolve immediately, but 4 < 5, so the search list still kicks in:
try: data-svc.apps.svc.cluster.local.apps.svc.cluster.local -> NXDOMAIN
try: data-svc.apps.svc.cluster.local.svc.cluster.local -> NXDOMAIN
try: data-svc.apps.svc.cluster.local.cluster.local -> NXDOMAIN
try: data-svc.apps.svc.cluster.local -> RESOLVED
Queries: 4
Wasted: 3
The resolver exhausts the entire search list before finally trying the original name as-is. This is the most expensive form, despite looking the most explicit.
Case 4: data-svc.apps.svc.cluster.local. (trailing dot, 5 dots)
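The trailing dot is the key. It marks the name as fully qualified, so the resolver skips the search list entirely and queries the name directly, whatever the ndots value. The name also happens to contain 5 dots, but it is the trailing dot, not the count, that disables the search list: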
try: data-svc.apps.svc.cluster.local. -> RESOLVED
Queries: 1
Wasted: 0
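The four cases can be reproduced with a small Python simulation. It is a toy model under stated assumptions: EXISTING is a hypothetical zone containing only the target service, count_lookups is my own helper name, and like the walkthrough above it counts names tried rather than individual DNS packets.

```python
SEARCH = ["apps.svc.cluster.local", "svc.cluster.local", "cluster.local"]
NDOTS = 5
# Hypothetical zone: the only name that exists in this toy model.
EXISTING = {"data-svc.apps.svc.cluster.local"}


def count_lookups(name):
    """Return (lookups, wasted) until the first candidate resolves."""
    if name.endswith("."):
        # Fully qualified: the search list is not consulted.
        candidates = [name]
    else:
        expanded = [f"{name}.{domain}" for domain in SEARCH]
        as_is_first = name.count(".") >= NDOTS
        candidates = [name] + expanded if as_is_first else expanded + [name]
    for attempt, candidate in enumerate(candidates, start=1):
        if candidate.rstrip(".") in EXISTING:
            return attempt, attempt - 1
    # Nothing resolved: every lookup was wasted.
    return len(candidates), len(candidates)


for host in (
    "data-svc",
    "data-svc.apps",
    "data-svc.apps.svc.cluster.local",
    "data-svc.apps.svc.cluster.local.",
):
    lookups, wasted = count_lookups(host)
    print(f"{host}: queries={lookups} wasted={wasted}")
```

The printed counts match the four cases: 1, 2, 4, and 1 queries respectively.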
Summary
| Hostname | Dots | Queries | Wasted |
|---|---|---|---|
| data-svc | 0 | 1 | 0 |
| data-svc.apps | 1 | 2 | 1 |
| data-svc.apps.svc.cluster.local | 4 | 4 | 3 |
| data-svc.apps.svc.cluster.local. | 5 | 1 | 0 |
For same-namespace communication, just use the short name data-svc. It is the simplest form and the most efficient. The full FQDN without a trailing dot looks precise but is actually the slowest option because it still goes through the search list first.
A trailing dot is the only form that guarantees a single query regardless of namespace or ndots value: it tells the resolver the name is complete and needs no expansion.
And finally, my backend was able to connect to my database. 🎉