Intermittent DNS failures when running Alpine containers in user-defined docker-compose network #303
Description
This is a cross-post from moby/libnetwork#2371 as I don't know where the bug lies.
In my environment, I can reproduce the DNS resolution failures with the following minimal compose file when running LCOW:
version: '3'
services:
  foo:
    image: alpine:latest
    dns_search: internal
    entrypoint: sh -c "while true; do nslookup bar.internal && sleep 1s; done"
    networks:
      default:
        aliases:
          - foo.internal
  bar:
    image: alpine:latest
    dns_search: internal
    entrypoint: sh -c "while true; do nslookup foo.internal && sleep 1s; done"
    networks:
      default:
        aliases:
          - bar.internal

Running docker-compose up yields output like the following. Note the failures such as "bar_1 | nslookup: can't resolve 'foo.internal': Name does not resolve" and "foo_1 | nslookup: can't resolve 'bar.internal': Name does not resolve" mixed in with successful resolutions:
PS C:\source\alpine-test> docker-compose -f .\docker-compose-bad.yml up
Creating network "alpine-test_default" with the default driver
Creating alpine-test_bar_1 ... done
Creating alpine-test_foo_1 ... done
Attaching to alpine-test_foo_1, alpine-test_bar_1
foo_1 |
foo_1 | nslookup: can't resolve '(null)': Name does not resolve
foo_1 | nslookup: can't resolve 'bar.internal': Name does not resolve
bar_1 |
bar_1 | nslookup: can't resolve '(null)': Name does not resolve
bar_1 | Name: foo.internal
bar_1 | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1 | nslookup: can't resolve '(null)': Name does not resolve
foo_1 |
foo_1 | Name: bar.internal
foo_1 | Address 1: 172.18.76.19
bar_1 | nslookup: can't resolve '(null)': Name does not resolve
bar_1 |
bar_1 | Name: foo.internal
bar_1 | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1 | nslookup: can't resolve '(null)': Name does not resolve
foo_1 |
foo_1 | Name: bar.internal
foo_1 | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1 | nslookup: can't resolve '(null)': Name does not resolve
bar_1 |
bar_1 | Name: foo.internal
bar_1 | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1 |
foo_1 | nslookup: can't resolve '(null)': Name does not resolve
foo_1 | Name: bar.internal
foo_1 | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1 | nslookup: can't resolve '(null)': Name does not resolve
bar_1 |
bar_1 | nslookup: can't resolve 'foo.internal': Name does not resolve
foo_1 |
foo_1 | nslookup: can't resolve '(null)': Name does not resolve
foo_1 | Name: bar.internal
foo_1 | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1 | nslookup: can't resolve '(null)': Name does not resolve
bar_1 |
bar_1 | Name: foo.internal
bar_1 | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1 | nslookup: can't resolve '(null)': Name does not resolve
foo_1 |
foo_1 | Name: bar.internal
foo_1 | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1 |
bar_1 | nslookup: can't resolve '(null)': Name does not resolve
bar_1 | Name: foo.internal
bar_1 | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1 |
foo_1 | nslookup: can't resolve '(null)': Name does not resolve
foo_1 | Name: bar.internal
foo_1 | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1 |
bar_1 | nslookup: can't resolve '(null)': Name does not resolve
bar_1 | Name: foo.internal
bar_1 | Address 1: 172.18.67.25
foo_1 | nslookup: can't resolve '(null)': Name does not resolve
foo_1 |
foo_1 | Name: bar.internal
foo_1 | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1 | nslookup: can't resolve '(null)': Name does not resolve
bar_1 | nslookup: can't resolve 'foo.internal': Name does not resolve
bar_1 |
foo_1 |
foo_1 | nslookup: can't resolve '(null)': Name does not resolve
foo_1 | nslookup: can't resolve 'bar.internal': Name does not resolve
bar_1 | nslookup: can't resolve '(null)': Name does not resolve
bar_1 |
bar_1 | Name: foo.internal
bar_1 | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1 | nslookup: can't resolve '(null)': Name does not resolve
foo_1 |
foo_1 | Name: bar.internal
foo_1 | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1 |
bar_1 | nslookup: can't resolve '(null)': Name does not resolve
bar_1 | Name: foo.internal
bar_1 | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1 | nslookup: can't resolve '(null)': Name does not resolve
foo_1 | nslookup: can't resolve 'bar.internal': Name does not resolve
foo_1 |
bar_1 |
bar_1 | nslookup: can't resolve '(null)': Name does not resolve
bar_1 | Name: foo.internal
bar_1 | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1 |
foo_1 | nslookup: can't resolve '(null)': Name does not resolve
foo_1 | nslookup: can't resolve 'bar.internal': Name does not resolve
bar_1 |
bar_1 | nslookup: can't resolve '(null)': Name does not resolve
bar_1 | Name: foo.internal
bar_1 | Address 1: 172.18.67.25
foo_1 |
foo_1 | nslookup: can't resolve '(null)': Name does not resolve
foo_1 | Name: bar.internal
foo_1 | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1 |
bar_1 | Name: foo.internal
bar_1 | Address 1: 172.18.67.25
bar_1 | nslookup: can't resolve '(null)': Name does not resolve
Gracefully stopping... (press Ctrl+C again to force)
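To put a number on "intermittent", a hypothetical helper like the one below could tally lookups per service from a docker-compose log. It is a minimal sketch that assumes the busybox nslookup output format shown above: each line is prefixed with "<service>_<n> |", a successful lookup prints a "Name:" line, and a failure prints "can't resolve '<name>'". The "(null)" failures are skipped because they appear on every loop iteration, including iterations where the real lookup succeeds. The function name tally_lookups is illustrative only.

```python
import re
from collections import Counter

# One line of docker-compose output looks like "foo_1 | <message>".
LINE_RE = re.compile(r"^(?P<service>\S+?)_\d+\s*\|\s?(?P<msg>.*)$")
# busybox nslookup failure, e.g. "nslookup: can't resolve 'bar.internal': ..."
FAIL_RE = re.compile(r"nslookup: can't resolve '(?P<name>[^']*)'")
# A successful lookup prints a "Name: <host>" line.
OK_RE = re.compile(r"^Name:\s+\S+")


def tally_lookups(log_text):
    """Return {service: Counter(ok=..., failed=...)} for docker-compose output.

    Hypothetical helper: "(null)" failures are ignored because they appear on
    every loop iteration, including ones where the real lookup succeeds.
    """
    counts = {}
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue  # skip compose's own lines ("Creating ...", "Attaching ...")
        stats = counts.setdefault(match.group("service"), Counter())
        msg = match.group("msg").strip()
        failure = FAIL_RE.search(msg)
        if failure:
            if failure.group("name") != "(null)":
                stats["failed"] += 1
        elif OK_RE.match(msg):
            stats["ok"] += 1
    return counts
```

For example, the output of docker-compose logs --no-color could be saved to a file and passed to tally_lookups; --no-color keeps ANSI color codes out of the service prefixes so the line pattern still matches.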
The same compose stack runs on OSX without any failures. If I switch the containers from Alpine to Ubuntu, the resolutions also stop failing.
I can partially work around the problem by modifying the compose file to run a dig lookup first, like this:
version: '3'
services:
  foo:
    image: alpine:latest
    dns_search: internal
    entrypoint: sh -c "apk add bind-tools; dig bar.internal; while true; do nslookup bar.internal; sleep 2s; done"
    networks:
      default:
        aliases:
          - foo.internal
  bar:
    image: alpine:latest
    dns_search: internal
    entrypoint: sh -c "apk add bind-tools; dig foo.internal; while true; do nslookup foo.internal; sleep 2s; done"
    networks:
      default:
        aliases:
          - bar.internal

The "nslookup: can't resolve '(null)': Name does not resolve" message in the original case is reportedly spurious, per gliderlabs/docker-alpine#476 (comment). After running dig, however, the nslookup output changes and resolutions look like:
bar_1 | Server: 172.25.128.1
bar_1 | Address: 172.25.128.1#53
bar_1 |
bar_1 | Non-authoritative answer:
bar_1 | Name: foo.internal
bar_1 | Address: 172.25.139.149
bar_1 |
My host is as follows (output of docker info):
Client:
 Debug Mode: false
 Plugins:
  app: Docker Application (Docker Inc., v0.8.0-beta2)
  buildx: Build with BuildKit (Docker Inc., v0.2.0-6-g509c4b6-tp)

Server:
 Containers: 2
  Running: 0
  Paused: 0
  Stopped: 2
 Images: 138
 Server Version: master-dockerproject-2019-04-28
 Storage Driver: windowsfilter (windows) lcow (linux)
  Windows:
  LCOW:
 Logging Driver: json-file
 Plugins:
  Volume: local
  Network: ics l2bridge l2tunnel nat null overlay transparent
  Log: awslogs etwlogs fluentd gcplogs gelf json-file local logentries splunk syslog
 Swarm: inactive
 Default Isolation: hyperv
 Kernel Version: 10.0 17763 (17763.1.amd64fre.rs5_release.180914-1434)
 Operating System: Windows 10 Enterprise Version 1809 (OS Build 17763.437)
 OSType: windows
 Architecture: x86_64
 CPUs: 2
 Total Memory: 16GiB
 Name: ci-lcow-prod-1
 ID: 0ac02c9d-aaba-42f4-8749-5a64af3068d8
 Docker Root Dir: C:\ProgramData\docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
The LCOW image is built from linuxkit/lcow@d5dfdbc. I tried the latest merged PR, but containers failed to launch with it and I had to revert (more info in linuxkit/lcow#45 (comment)).
Further details are in the original issue I filed at moby/libnetwork#2371.