linux networking – Why are “relayed” multicast packets not received?

I wrote a test program to diagnose multicast routing problems.
The program has several modes of operation:

  • sender (send out a number of multicast packets)
  • receiver (receive a number of multicast packets)
  • requester (send a multicast packet, then wait for a response with a timeout, repeat a number of times)
  • responder (receive a multicast packet, then send a response, repeat a number of times)
  • relay (like responder, but send the response to the multicast address instead of the sender's address)
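To make the difference between “responder” and “relay” concrete, here is a minimal sketch of how the reply destination could be chosen. The enum, `mode_from_name()` and `reply_destination()` are hypothetical (the program's source is not shown). The numbering follows the order of the list above, which happens to be consistent with `op_mode = 3` (responder) and `op_mode = 4` (relay) in the traces below.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical mode numbering, in the order of the list of modes. */
enum op_mode {
    MODE_SENDER, MODE_RECEIVER, MODE_REQUESTER, MODE_RESPONDER, MODE_RELAY
};

/* Map a "-m" argument to a mode; returns -1 for an unknown name. */
static int mode_from_name(const char *name)
{
    static const char *const names[] = {
        "sender", "receiver", "requester", "responder", "relay"
    };
    for (int i = 0; i < (int)(sizeof names / sizeof names[0]); i++)
        if (strcmp(name, names[i]) == 0)
            return i;
    return -1;
}

/* Choose where the reply to a received packet is sent:
 * responder -> back to the packet's source address/port (unicast),
 * relay     -> to the configured multicast group/port. */
static const struct sockaddr_in *
reply_destination(enum op_mode mode, const struct sockaddr_in *src,
                  const struct sockaddr_in *group)
{
    assert(src != NULL && group != NULL);
    return mode == MODE_RELAY ? group : src;
}
```

The only behavioural difference between the two modes in this sketch is the destination passed to `sendto()`; everything else (sockets, membership, TTL) would be shared.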

The “relay” mode was added most recently. All the other modes work as expected, but “relay” does not, even though it does more or less the same as the other modes:
The relay receives only its own responses, and the requester does not receive any response at all.

I compared a combination of (requester, responder) with (requester, relay) on the same host:

Requester 1

~/src/C/multicast> ./mc-tester -l 224.7.7.7/123 -d 224.7.7.7/1234 -m requester -c100 -v1
(1) verbosity = 1
(1) Sending 100 requests on 3...
(1) Sending "224.7.7.7/1234: v04 request #1/100" size 39 to 224.7.7.7/1234 on 3
(1) Receiving message #1 on 3...
(1) v04 received #1/1 from 172.20.16.35/60248 (TTL -1): "172.20.16.35/35949 v04: response #1/10 for #1"
(1) Sending "224.7.7.7/1234: v04 request #2/100" size 39 to 224.7.7.7/1234 on 3
(1) Receiving message #1 on 3...
(1) v04 received #1/2 from 172.20.16.35/60248 (TTL -1): "172.20.16.35/35949 v04: response #2/10 for #2"
(1) Sending "224.7.7.7/1234: v04 request #3/100" size 39 to 224.7.7.7/1234 on 3
(1) Receiving message #1 on 3...
(1) v04 received #1/3 from 172.20.16.35/60248 (TTL -1): "172.20.16.35/35949 v04: response #3/10 for #3"
^C

(“TTL -1” means the received TTL is unknown)
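For reference, a received TTL is normally obtained by enabling `IP_RECVTTL` on the receiving socket and reading the `IP_TTL` control message that `recvmsg()` returns (see ip(7)). A minimal sketch, assuming the program reports -1 whenever that control message is absent; `ttl_from_cmsg()` is a hypothetical helper name:

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Return the TTL carried in the ancillary data of a message received with
 * recvmsg() on a socket with IP_RECVTTL enabled, or -1 if no IP_TTL control
 * message is present (e.g. IP_RECVTTL was never set on that socket). */
static int ttl_from_cmsg(struct msghdr *msg)
{
    assert(msg != NULL);
    for (struct cmsghdr *c = CMSG_FIRSTHDR(msg); c != NULL;
         c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_TTL) {
            int ttl;
            /* copy out instead of dereferencing: CMSG_DATA may be unaligned */
            memcpy(&ttl, CMSG_DATA(c), sizeof ttl);
            return ttl;
        }
    }
    return -1;
}
```

Under this assumption, the requester showing “TTL -1” while the responder shows “TTL 3” would simply mean the requester's socket did not request the TTL.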

Responder 1

~windl/src/C/multicast/mc-tester -v3 -l 224.7.7.7/1234 -m responder -d 224.7.7.7/1234 -c10
(1) verbosity = 3
(2) /home/windl/src/C/multicast/mc-tester: 224.7.7.7/1234 -> 224.7.7.7/1234 (16)
(1) op_mode = 3
(2) /home/windl/src/C/multicast/mc-tester: 224.7.7.7/1234 -> 224.7.7.7/1234 (16)
(1) msg_count = 10
(2) socket(PF_INET, SOCK_DGRAM, 0)...
(2) setsockopt(3, SO_REUSEADDR, 1)...
(2) socket(PF_INET, SOCK_DGRAM, 0)...
(2) setsockopt(4, SO_REUSEADDR, 0)...
(2) bind(3, 224.7.7.7/1234)...
(2) recv_socket: getsockname(3) returned 224.7.7.7/1234
(2) setsockopt(3, IP_MULTICAST_LOOP, 0)...
(2) setsockopt(3, IP_RECVTTL, 1)...
(2) setsockopt(3, IP_ADD_MEMBERSHIP, 224.7.7.7/1234)...
(2) setsockopt(4, IP_MULTICAST_TTL, 3)...
(1) Receiving 10 messages on 3...
(1) v04 received #1/10 from 172.20.16.35/35949 (TTL 3): "224.7.7.7/1234: v04 request #1/100"
(1) Sending "172.20.16.35/35949 v04: response #1/10 for #1" size 50 to 172.20.16.35/35949 on 4
(1) v04 received #2/10 from 172.20.16.35/35949 (TTL 3): "224.7.7.7/1234: v04 request #2/100"
(1) Sending "172.20.16.35/35949 v04: response #2/10 for #2" size 50 to 172.20.16.35/35949 on 4
(1) v04 received #3/10 from 172.20.16.35/35949 (TTL 3): "224.7.7.7/1234: v04 request #3/100"
(1) Sending "172.20.16.35/35949 v04: response #3/10 for #3" size 50 to 172.20.16.35/35949 on 4
^C

So that combination worked as expected.
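The responder trace shows replies going out on a second socket (fd 4) whose only multicast setting is `IP_MULTICAST_TTL` = 3. A minimal sketch of such a send socket, assuming all other options stay at their defaults; `open_mc_send_socket()` is a hypothetical name, not taken from the program:

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a UDP socket for sending multicast datagrams with the given TTL,
 * mirroring "socket(PF_INET, SOCK_DGRAM, 0)" followed by
 * "setsockopt(4, IP_MULTICAST_TTL, 3)" in the trace.
 * Returns the descriptor, or -1 on error. */
static int open_mc_send_socket(int ttl)
{
    assert(ttl >= 0 && ttl <= 255);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof ttl) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The same descriptor can then be used with `sendto()` either to a unicast address (responder) or to the group address (relay); the TTL only affects multicast destinations.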
Now the combination that did not:

Requester 2

/src/C/multicast> ./mc-tester -l 224.7.7.7/123 -d 224.7.7.7/1234 -m requester -c100 -v1
(1) verbosity = 1
(1) Sending 100 requests on 3...
(1) Sending "224.7.7.7/1234: v04 request #1/100" size 39 to 224.7.7.7/1234 on 3
select timed out
(1) Sending "224.7.7.7/1234: v04 request #2/100" size 39 to 224.7.7.7/1234 on 3
select timed out
(1) Sending "224.7.7.7/1234: v04 request #3/100" size 39 to 224.7.7.7/1234 on 3
^C

(“select timed out” refers to receiving, not sending)
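A sketch of the receive-with-timeout step the requester presumably performs after each request. `wait_readable()` and `read_with_timeout()` are hypothetical helpers, and the timeout value is left to the caller because the program's actual timeout is not given:

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait until fd is readable or timeout_ms elapses.
 * Returns 1 if readable, 0 on timeout ("select timed out"), -1 on error. */
static int wait_readable(int fd, long timeout_ms)
{
    assert(fd >= 0 && fd < FD_SETSIZE);
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    struct timeval tv = { .tv_sec = timeout_ms / 1000,
                          .tv_usec = (timeout_ms % 1000) * 1000 };
    int n = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return (n > 0 && FD_ISSET(fd, &rfds)) ? 1 : 0;
}

/* Read one message, waiting at most timeout_ms.
 * Returns bytes read, -1 on error, or -2 if select() timed out. */
static ssize_t read_with_timeout(int fd, void *buf, size_t len, long timeout_ms)
{
    int r = wait_readable(fd, timeout_ms);
    if (r <= 0)
        return r == 0 ? -2 : -1;
    return read(fd, buf, len);
}
```

On a UDP socket, `read()` returns one datagram per call; the program may well use `recvfrom()` or `recvmsg()` instead to learn the sender's address.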

Relay

~windl/src/C/multicast/mc-tester -v3 -l 224.7.7.7/1234 -m relay -d 224.7.7.7/1234 -c10
(1) verbosity = 3
(2) /home/windl/src/C/multicast/mc-tester: 224.7.7.7/1234 -> 224.7.7.7/1234 (16)
(1) op_mode = 4
(2) /home/windl/src/C/multicast/mc-tester: 224.7.7.7/1234 -> 224.7.7.7/1234 (16)
(1) msg_count = 10
(2) socket(PF_INET, SOCK_DGRAM, 0)...
(2) setsockopt(3, SO_REUSEADDR, 0)...
(2) socket(PF_INET, SOCK_DGRAM, 0)...
(2) setsockopt(4, SO_REUSEADDR, 1)...
(2) bind(3, 224.7.7.7/1234)...
(2) recv_socket: getsockname(3) returned 224.7.7.7/1234
(2) setsockopt(3, IP_MULTICAST_LOOP, 0)...
(2) setsockopt(3, IP_RECVTTL, 1)...
(2) setsockopt(3, IP_ADD_MEMBERSHIP, 224.7.7.7/1234)...
(2) setsockopt(4, IP_MULTICAST_TTL, 3)...
(2) setsockopt(4, IP_ADD_MEMBERSHIP, 224.7.7.7/1234)...
(1) Relaying 10 messages on 3...
(1) v04 received #1/10 from 172.20.16.35/33488 (TTL 3): "224.7.7.7/1234: v04 request #1/100"
(1) Sending "224.7.7.7/1234 v04: relay #1/10 for #1" size 43 to 224.7.7.7/1234 on 4
(1) v04 received #2/10 from 172.20.16.35/44217 (TTL 3): "224.7.7.7/1234 v04: relay #1/10 for #1"
(1) Sending "224.7.7.7/1234 v04: relay #2/10 for #1" size 43 to 224.7.7.7/1234 on 4
(1) v04 received #3/10 from 172.20.16.35/44217 (TTL 3): "224.7.7.7/1234 v04: relay #2/10 for #1"
(1) Sending "224.7.7.7/1234 v04: relay #3/10 for #2" size 43 to 224.7.7.7/1234 on 4
(1) v04 received #4/10 from 172.20.16.35/44217 (TTL 3): "224.7.7.7/1234 v04: relay #3/10 for #2"
(1) Sending "224.7.7.7/1234 v04: relay #4/10 for #3" size 43 to 224.7.7.7/1234 on 4

(1) v04 received #9/10 from 172.20.16.35/44217 (TTL 3): "224.7.7.7/1234 v04: relay #8/10 for #7"
(1) Sending "224.7.7.7/1234 v04: relay #9/10 for #8" size 43 to 224.7.7.7/1234 on 4
(1) v04 received #10/10 from 172.20.16.35/44217 (TTL 3): "224.7.7.7/1234 v04: relay #9/10 for #8"
(1) Sending "224.7.7.7/1234 v04: relay #10/10 for #9" size 44 to 224.7.7.7/1234 on 4
(1) Received 10 messages
(2) setsockopt(3, IP_DROP_MEMBERSHIP)...
(2) setsockopt(4, IP_DROP_MEMBERSHIP)...
(2) close(4)...
(2) close(3)...

So the relay's messages loop back locally: the relay receives its own relayed packets, but the requester receives none of them.
The test was done on Linux (SLES 12 SP4).
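One detail visible in the relay trace: `IP_MULTICAST_LOOP` is set to 0 on fd 3 (the receiving socket), while the relayed packets are sent on fd 4. ip(7) describes this option as controlling whether multicast packets sent on the socket are looped back to local sockets, so for outgoing packets it is the setting on the sending socket that applies. Assuming the trace lists every `setsockopt()` call, fd 4 keeps the Linux default (enabled). Because requester and relay run on the same host, that loopback is also what lets any other local socket see the relayed packets at all. The sketch below only shows how to set and read the option on a sending socket; `set_mc_loop()` is a hypothetical helper:

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Set IP_MULTICAST_LOOP on a socket used for sending multicast, then read
 * it back. Returns the value now in effect (0 or 1), or -1 on error. */
static int set_mc_loop(int send_fd, int on)
{
    assert(on == 0 || on == 1);
    if (setsockopt(send_fd, IPPROTO_IP, IP_MULTICAST_LOOP, &on, sizeof on) < 0)
        return -1;
    int val = -1;
    socklen_t len = sizeof val;
    if (getsockopt(send_fd, IPPROTO_IP, IP_MULTICAST_LOOP, &val, &len) < 0)
        return -1;
    return val;
}
```

Setting the option on the receiving socket has no effect on packets sent from a different socket, which is why the sketch takes the sending descriptor.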

I decided not to lengthen the question with the C source of the program, but if requested I can provide the relevant parts or an ltrace/strace of the relay.