This repository was archived by the owner on Jul 28, 2021. It is now read-only.

Intermittent DNS failures when running Alpine containers in user-defined docker-compose network #303

@Iristyle

This is a cross-post from moby/libnetwork#2371 as I don't know where the bug lies.

In my environment, I am able to reproduce DNS resolution failures minimally with the following compose file when running LCOW.

version: '3'

services:
  foo:
    image: alpine:latest
    dns_search: internal
    entrypoint: sh -c "while true; do nslookup bar.internal && sleep 1s; done"
    networks:
      default:
        aliases:
         - foo.internal

  bar:
    image: alpine:latest
    dns_search: internal
    entrypoint: sh -c "while true; do nslookup foo.internal && sleep 1s; done"
    networks:
      default:
        aliases:
         - bar.internal

Running docker-compose up yields output like the following. Note that failures such as bar_1 | nslookup: can't resolve 'foo.internal': Name does not resolve and foo_1 | nslookup: can't resolve 'bar.internal': Name does not resolve are mixed in with successful resolutions:

PS C:\source\alpine-test> docker-compose -f .\docker-compose-bad.yml up
Creating network "alpine-test_default" with the default driver
Creating alpine-test_bar_1 ... done
Creating alpine-test_foo_1 ... done
Attaching to alpine-test_foo_1, alpine-test_bar_1
foo_1  |
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | nslookup: can't resolve 'bar.internal': Name does not resolve
bar_1  |
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  |
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  |
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  |
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  |
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  |
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  |
bar_1  | nslookup: can't resolve 'foo.internal': Name does not resolve
foo_1  |
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  |
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  |
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  |
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  |
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  |
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  |
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | nslookup: can't resolve 'foo.internal': Name does not resolve
bar_1  |
foo_1  |
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | nslookup: can't resolve 'bar.internal': Name does not resolve
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  |
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  |
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  |
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | nslookup: can't resolve 'bar.internal': Name does not resolve
foo_1  |
bar_1  |
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  |
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | nslookup: can't resolve 'bar.internal': Name does not resolve
bar_1  |
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25
foo_1  |
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  |
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
Gracefully stopping... (press Ctrl+C again to force)
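
To put a number on how intermittent the failures are, the log above can be tallied per service. The following is a minimal sketch, not part of the original report: the function name tally and the line pattern are assumptions based only on the output format shown above. It skips the '(null)' message, since that one appears on every iteration regardless of outcome.

```python
import re
import sys
from collections import Counter

# Matches a compose log line such as "foo_1  | Name:      bar.internal".
# The pattern is an assumption derived from the output shown above.
LINE = re.compile(r"^(?P<svc>\w+_\d+)\s*\|\s?(?P<msg>.*)$")


def tally(lines):
    """Count successful and failed lookups per service."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue  # e.g. "Creating network ..." or "Gracefully stopping..."
        svc, msg = match.group("svc"), match.group("msg")
        if msg.startswith("Name:"):
            counts[(svc, "ok")] += 1
        elif "can't resolve" in msg and "'(null)'" not in msg:
            counts[(svc, "fail")] += 1
    return counts


if __name__ == "__main__":
    # Reads compose output from stdin and prints one line per (service, outcome).
    for (svc, outcome), n in sorted(tally(sys.stdin).items()):
        print(f"{svc} {outcome} {n}")
```

Saved under a hypothetical name such as tally.py, it could be fed the piped output of docker-compose up, for example: docker-compose up 2>&1 | python tally.py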

I can run this compose stack on OSX and it does not fail. If I switch from the Alpine image to an Ubuntu image, the resolutions don't fail either.

I can at least work around the problem somewhat by modifying the compose file to first perform a dig against the hostname, like this:

version: '3'

services:
  foo:
    image: alpine:latest
    dns_search: internal
    entrypoint: sh -c "apk add bind-tools; dig bar.internal; while true; do nslookup bar.internal; sleep 2s; done"
    networks:
      default:
        aliases:
         - foo.internal

  bar:
    image: alpine:latest
    dns_search: internal
    entrypoint: sh -c "apk add bind-tools; dig foo.internal; while true; do nslookup foo.internal; sleep 2s; done"
    networks:
      default:
        aliases:
         - bar.internal

The nslookup: can't resolve '(null)': Name does not resolve message in the original case is reported to be extraneous per gliderlabs/docker-alpine#476 (comment). After performing a dig, however, that message changes and resolutions look like:

bar_1  | Server:                172.25.128.1
bar_1  | Address:       172.25.128.1#53
bar_1  |
bar_1  | Non-authoritative answer:
bar_1  | Name:  foo.internal
bar_1  | Address: 172.25.139.149
bar_1  |
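
Where installing bind-tools in every container is not practical, an application could instead retry failed lookups. The sketch below is an assumption-laden illustration, not something tested in the original report: resolve_with_retry, its parameters, and the default retry counts are all hypothetical. It does not address the underlying cause, and a Python process inside the Alpine container would presumably still go through the same libc resolver.

```python
import socket
import time


def resolve_with_retry(name, attempts=5, delay=0.5,
                       resolver=socket.gethostbyname, sleep=time.sleep):
    """Resolve `name`, retrying on failure.

    `resolver` and `sleep` are injectable so the retry logic can be
    exercised without a network. Re-raises the last error if every
    attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return resolver(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay)  # brief pause before the next attempt
    raise last_error
```

For the stack above, a call such as resolve_with_retry("bar.internal") from the foo service would tolerate a few consecutive "Name does not resolve" results before giving up.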

My host is as follows:

Client:
 Debug Mode: false
 Plugins:
  app: Docker Application (Docker Inc., v0.8.0-beta2)
  buildx: Build with BuildKit (Docker Inc., v0.2.0-6-g509c4b6-tp)

Server:
 Containers: 2
  Running: 0
  Paused: 0
  Stopped: 2
 Images: 138
 Server Version: master-dockerproject-2019-04-28
 Storage Driver: windowsfilter (windows) lcow (linux)
  Windows:
  LCOW:
 Logging Driver: json-file
 Plugins:
  Volume: local
  Network: ics l2bridge l2tunnel nat null overlay transparent
  Log: awslogs etwlogs fluentd gcplogs gelf json-file local logentries splunk syslog
 Swarm: inactive
 Default Isolation: hyperv
 Kernel Version: 10.0 17763 (17763.1.amd64fre.rs5_release.180914-1434)
 Operating System: Windows 10 Enterprise Version 1809 (OS Build 17763.437)
 OSType: windows
 Architecture: x86_64
 CPUs: 2
 Total Memory: 16GiB
 Name: ci-lcow-prod-1
 ID: 0ac02c9d-aaba-42f4-8749-5a64af3068d8
 Docker Root Dir: C:\ProgramData\docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

The LCOW image is built from linuxkit/lcow@d5dfdbc. I tried the latest merged PR, but it didn't launch containers and I had to revert (more info in linuxkit/lcow#45 (comment)).

There are some further details in the original issue I filed at moby/libnetwork#2371.
