Linux internals and network programming: Broadcast and Local Multicast in Networking

I. Introduction

This article discusses multicast, including how link-layer addressing can be used to send multicast or broadcast traffic efficiently from one computer to several others. It also examines the Internet Group Management Protocol (IGMP) [RFC3376] used in IPv4 and Multicast Listener Discovery (MLD) [RFC3810] used in IPv6, which inform IPv4 and IPv6 multicast routers which multicast addresses are in use on a subnetwork. This article doesn't cover how multicast routing is implemented in wide area networks such as the global Internet.

Four kinds of IP addresses are used on the Internet: unicast, anycast, multicast, and broadcast.

Broadcasting and multicasting provide two services for applications: delivery of packets to multiple destinations and solicitation/discovery of servers by clients.

Delivery to multiple destinations

There are many applications that deliver information to multiple recipients: interactive conferencing and dissemination of mail or news to multiple recipients, for example. Without broadcasting or multicasting, these types of services tend to use TCP today (delivering a separate copy to each destination, which can be very inefficient).

Solicitation of servers by clients

Using broadcasting or multicasting, an application can send a request for a server without knowing any particular server’s IP address. This capability is very useful during configuration when little is known about the local networking environment. A laptop, for example, might need to get its initial IP address and find its nearest router using DHCP.

Although both broadcasting and multicasting can provide these important capabilities, multicasting is generally preferable to broadcasting because multicasting involves only those systems that support or use a particular service or protocol, and broadcasting does not. Thus, a broadcast request affects all hosts that are reachable within the scope of the broadcast, whereas multicast affects only those hosts that are likely to be interested in the request. There is a trade-off between the higher overhead and simplicity of broadcast and the improved efficiency but greater complexity associated with multicast.

Generally, only user applications that use the UDP transport protocol take advantage of broadcasting and multicasting, where it makes sense for an application to send a single message to multiple recipients. TCP is a connection-oriented protocol that implies a connection between two hosts (specified by IP addresses) and one process on each host (specified by port numbers). TCP can use unicast and anycast addresses (recall that anycast addresses behave like unicast addresses), but not broadcast or multicast addresses.

II. Broadcasting

Broadcasting refers to sending a message to all possible receivers in a network. In principle, this is simple: a router simply forwards a copy of any message it receives out of every interface other than the one on which the message arrived.

On an Ethernet network, a multicast MAC address has the low-order bit of the high-order byte turned on.

In hexadecimal this looks like 01:00:00:00:00:00. We may consider the Ethernet broadcast address ff:ff:ff:ff:ff:ff as a special case of the Ethernet multicast address. The IPv4 address 255.255.255.255 corresponds to a local network (also called “limited”) broadcast.
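The group-bit rule can be checked in a few lines. The following is a minimal sketch; the helper names parse_mac, is_group_address, and is_broadcast are made up for this illustration and are not part of any library.

```python
def parse_mac(text: str) -> bytes:
    """Parse 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    octets = bytes(int(part, 16) for part in text.split(":"))
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}")
    return octets


def is_group_address(mac: str) -> bool:
    """True if the low-order bit of the first (high-order) byte is set."""
    return bool(parse_mac(mac)[0] & 0x01)


def is_broadcast(mac: str) -> bool:
    """True only for the all-ones address ff:ff:ff:ff:ff:ff."""
    return parse_mac(mac) == b"\xff" * 6
```

Because ff has its low-order bit set, the broadcast address also passes the group-address test, which is why it can be viewed as a special case of multicast.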

III. Multicast

To reduce the amount of overhead involved in broadcasting, it is possible to send traffic only to those receivers that are interested in it. This is called multicasting. Fundamentally, this is accomplished by either having the sender indicate the receivers, or instead having the receivers independently indicate their interest. The network then becomes responsible for sending traffic only to intended/interested recipients. Implementing multicast is considerably more challenging than broadcast because multicast state (information) must be maintained by hosts and routers as to what traffic is of interest to what receivers.

1. Converting IP Multicast Addresses to 802 MAC/Ethernet Addresses

To carry IP multicast efficiently on a link-layer network, there should be a one-to-one mapping between packets and addresses at the IP layer and frames at the link layer. The IANA provides multicast group MAC addresses in the range 01:00:5e:00:00:00 through 01:00:5e:7f:ff:ff. All IPv4 multicast addresses are contained within the address space from 224.0.0.0 to 239.255.255.255 (formerly known as class D address space). All such addresses share a common 4-bit sequence of 1110 in the high-order bits. Thus, there are 32 - 4 = 28 bits available to encode the entire space of 2^28 = 268,435,456 multicast IPv4 addresses (also called group IDs). For IPv4, all 268,435,456 IPv4 multicast group IDs need to be mapped into a link-layer address space containing only 2^23 = 8,388,608 unique entries. The mapping therefore is nonunique. That is, more than one IPv4 group ID is mapped to the same MAC-layer group address. Specifically, 2^28/2^23 = 2^5 = 32 distinct IPv4 multicast group IDs are mapped to each group address. For example, both the multicast addresses 224.128.64.32 (hexadecimal e0.80.40.20) and 224.0.64.32 (hexadecimal e0.00.40.20) are mapped into the Ethernet address 01:00:5e:00:40:20.

The IPv4-to-IEEE-802 MAC multicast address mapping uses the low-order 23 bits of the IPv4 group address as the suffix of a MAC address starting with 01:00:5e. Because only 23 of the 28 group address bits are used, 32 groups are mapped to the same MAC-layer address.
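The IPv4 mapping is easy to reproduce. This is a small sketch using only the Python standard library; the function name ipv4_multicast_to_mac is an illustrative choice.

```python
import ipaddress


def ipv4_multicast_to_mac(group: str) -> str:
    """Map an IPv4 multicast group to its 01:00:5e:xx:xx:xx MAC address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not in 224.0.0.0 to 239.255.255.255")
    low23 = int(addr) & 0x7FFFFF          # keep only the low-order 23 bits
    mac = (0x01005E << 24) | low23        # 01:00:5e prefix, then the suffix
    return ":".join(f"{b:02x}" for b in mac.to_bytes(6, "big"))
```

Calling it on 224.128.64.32 and on 224.0.64.32 returns the same string, 01:00:5e:00:40:20, because the two groups differ only in one of the 5 bits that the mapping discards.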

For IPv6, the 16-bit hexadecimal prefix is 33:33. This means that the last 32 bits of the IPv6 address can be used to form the link-layer address. Thus, any address ending with the same 32 bits maps to the same MAC address. Given that all IPv6 multicast addresses begin with ff, and the subsequent 8 bits are used for flags and scope information, this leaves 128 - 16 = 112 bits for representing 2^112 groups. Thus, with the 32 bits of MAC-layer address available to encode these groups, there can be as many as 2^112/2^32 = 2^80 groups that map to the same MAC address!

The IPv6-to-IEEE-802 MAC multicast address mapping uses the low-order 32 bits of the IPv6 multicast address as the suffix of a MAC address starting with 33:33. Because only 32 of the 112 multicast address bits are used, 2^80 groups are mapped to the same MAC-layer address.
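The IPv6 mapping follows the same pattern with a 33:33 prefix and a 32-bit suffix. Again a sketch with an illustrative function name, using only the standard library.

```python
import ipaddress


def ipv6_multicast_to_mac(group: str) -> str:
    """Map an IPv6 multicast group to its 33:33:xx:xx:xx:xx MAC address."""
    addr = ipaddress.IPv6Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an ff00::/8 multicast address")
    low32 = int(addr) & 0xFFFFFFFF        # keep only the low-order 32 bits
    mac = (0x3333 << 32) | low32          # 33:33 prefix, then the suffix
    return ":".join(f"{b:02x}" for b in mac.to_bytes(6, "big"))
```

For instance, ff02::1 and ff05::1 have different scopes but end in the same 32 bits, so both map to 33:33:00:00:00:01.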

2. Receiving Multicast Datagrams

Fundamental to multicasting is the concept of a process joining or leaving one or more multicast groups on a given interface on a host. (We use the term process to mean a program being executed by the operating system, often on behalf of a user.) Membership in a multicast group on a given interface is dynamic: it changes over time as processes join and leave groups. In addition to joining or leaving groups, additional methods are needed if a process wishes to specify sources it cares to hear from or exclude. These are required parts of any API on a host that supports multicasting. For more information about the API that a host is required to support, refer to RFC 3376. We use the qualifier “interface” because membership in a group is associated with an interface. A process can join the same group on multiple interfaces, multiple groups on the same interface, or any combination thereof.
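With the BSD sockets API, joining and leaving an IPv4 group on an interface is done with the IP_ADD_MEMBERSHIP and IP_DROP_MEMBERSHIP socket options. The sketch below wraps them in Python; the helper names make_mreq, join_group, and leave_group are invented here, and the any-source join shown does not cover the source-filtering calls that RFC 3376 also describes.

```python
import socket
import struct


def make_mreq(group: str, interface_ip: str = "0.0.0.0") -> bytes:
    """Pack the 8-byte ip_mreq structure: group address, then the
    address of the local interface (0.0.0.0 lets the kernel choose)."""
    return struct.pack("4s4s",
                       socket.inet_aton(group),
                       socket.inet_aton(interface_ip))


def join_group(sock: socket.socket, group: str,
               interface_ip: str = "0.0.0.0") -> None:
    """Join `group` on the given interface for this socket."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    make_mreq(group, interface_ip))


def leave_group(sock: socket.socket, group: str,
                interface_ip: str = "0.0.0.0") -> None:
    """Leave `group` on the given interface for this socket."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP,
                    make_mreq(group, interface_ip))
```

A receiver would typically create a UDP socket, bind it to the port the group uses, and then call join_group(sock, "239.1.2.3"), where 239.1.2.3 is only a placeholder group. Calling join_group again with a different interface_ip joins the same group on a second interface.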

3. Host Address Filtering

To understand how the operating system processes received multicast datagrams for multicast groups that programs have joined, remember that filtering takes place on each host’s network interface card (NIC) each time a frame is presented to it (e.g., by a bridge or switch) for possible reception.

Each layer implements filtering on some portion of the received message. MAC address filtering can take place in either software or hardware. Cheaper NICs tend to impose a larger processing burden on software because they perform fewer functions in hardware.

In a typical switched Ethernet environment, broadcast and multicast frames are replicated on all segments within a VLAN, along a spanning tree formed among the switches. Such frames are delivered to the NIC on each host, which checks the correctness of the frame (using the CRC) and decides whether to receive the frame and deliver it to the device driver and network stack. Normally the NIC receives only those frames whose destination address is either the hardware address of the interface or the broadcast address.

However, when multicast frames are involved, the situation is somewhat more complicated.

NICs tend to come in two varieties. One type performs filtering based on the hash values of the multicast hardware addresses in which the host software has expressed interest, which means that some unwanted frames can get through because of hash collisions. The other type listens for a finite table of multicast addresses, meaning that if the host needs to receive frames destined for more multicast addresses than can fit in the table, the NIC is put into a “multicast promiscuous” mode, in which case all multicast traffic is given to the host software. Hence, both types of interfaces require that the device driver or higher-layer software check that the received frame is really wanted. Even if the interface performs perfect multicast filtering (based on the 48-bit hardware address), filtering is still required because the mapping from a multicast IPv4 or IPv6 address to a 48-bit hardware address is not unique. Despite this imperfect address mapping and hardware filtering, multicasting is still more efficient than broadcasting.
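To see why a hash-based filter lets unwanted frames through, here is a toy model. It is only an illustration: the class name, the default of 64 buckets, and the use of zlib.crc32 as the hash are all assumptions for this sketch and do not describe any particular NIC.

```python
import zlib


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


class HashMulticastFilter:
    """Toy hash filter: one bit per bucket, set when a group is joined."""

    def __init__(self, buckets: int = 64):
        self.buckets = buckets
        self.bitmap = 0

    def _bucket(self, mac: str) -> int:
        # Illustrative hash only; real hardware uses its own function.
        return zlib.crc32(mac_bytes(mac)) % self.buckets

    def add(self, mac: str) -> None:
        self.bitmap |= 1 << self._bucket(mac)

    def accepts(self, mac: str) -> bool:
        return bool((self.bitmap >> self._bucket(mac)) & 1)


def software_accepts(mac: str, joined: set) -> bool:
    """Exact check that the driver or stack still has to perform."""
    return mac in joined
```

Every joined address is accepted, but so is any other address that happens to hash into an occupied bucket. The exact software check is what finally rejects those false positives.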

For NICs that support a multi-entry address table, the destination address on each received frame is compared against this table, and if the address is found in the table, the frame is received and processed by the device driver. The entries of this table are managed by the device driver software in combination with other layers of the protocol stack (such as the IPv4 and IPv6 implementations). Once the NIC hardware has verified a frame as acceptable (i.e., the CRC is correct, any VLAN tags match, and the destination MAC address matches an address entry in one or more of the NIC’s tables), the frame is passed to the device driver, where additional filtering is performed.

First, the frame type must specify a protocol that is supported (e.g., IPv4, IPv6, ARP, etc.). Second, additional multicast filtering may be performed to check whether the host belongs to the addressed multicast group (indicated by the destination IP address). This is necessary for NICs that may generate false positives.

The device driver then passes the frame to the next layer, such as IP, if the frame type specifies an IP datagram. IP performs more filtering, based on the source and destination IP addresses, and passes the datagram up to the next layer (such as TCP or UDP) if all is well. Each time UDP receives a datagram from IP, it performs filtering based on the destination port number, and sometimes the source port number, too. If no process is currently using the destination port number, the datagram is discarded and an ICMPv4 or ICMPv6 Port Unreachable message is normally generated. (TCP performs similar filtering based on its port numbers.) If the UDP datagram has a checksum error, UDP silently discards it.

One of the primary motivations behind the development of the multicast addressing features was to avoid the overhead of broadcasting. Consider an application that is designed to use UDP broadcasts. If there are 50 hosts on the network (or VLAN), but only 20 are participating in the application, every time one of the 20 sends a UDP broadcast, the other 30 nonparticipating hosts have to process the broadcast, all the way up through the UDP layer, before the UDP datagram is discarded. The UDP datagram is discarded by these 30 hosts because the destination port number is not in use. The intent of multicasting is to reduce this load on hosts with no interest in the application.

With multicasting, a host specifically joins one or more multicast groups. If possible, the NIC is told which multicast groups the host belongs to, and only those multicast frames associated with the IP-layer multicast groups are allowed through the filter in the NIC. All of this machinery reduces the overhead imposed on the host, in exchange for additional complexity in managing multicast addresses and group memberships.

4. The Internet Group Management Protocol (IGMP) and Multicast Listener Discovery Protocol (MLD)

Two major protocols are used to allow multicast routers to learn the groups in which nearby hosts are interested: the Internet Group Management Protocol (IGMP) used by IPv4 and the Multicast Listener Discovery (MLD) protocol used by IPv6. Both are used by hosts and routers that support multicasting, and the protocols are very similar. These protocols let the multicast routers on a LAN (VLAN) know which hosts currently belong to which multicast groups. This information is required by the routers so that they know which multicast datagrams to forward on to which interfaces.

In most cases, a multicast router only requires knowledge that at least one listening host is reachable by a particular interface, as link-layer multicast addressing (assuming it is supported) permits the multicast router to send link-layer multicast frames that will be received by all interested listeners. This allows a multicast router to do its job without keeping track of every individual host on each interface that might be interested in multicast traffic for a particular group.

IGMP has evolved over time, and [RFC3376] defines version 3 (the most current one at the time of writing). MLD has evolved in parallel, and its current version (2) is defined in [RFC3810]. IGMPv3 and/or MLDv2 are required for supporting Source-Specific Multicast (SSM). See [RFC4604] for more details on how these protocols are restricted when using only a single source per multicast group. Version 1 of IGMP was the first commonly used version of IGMP. Version 2 added the ability to leave groups more quickly (also supported by MLDv1). IGMPv3 and MLDv2 add the ability to select the sources of multicast traffic and are required for deployment of SSM. While IGMP is a separate protocol used with IPv4, MLD is really part of ICMPv6.

Multicast routers send IGMP (MLD) requests to each attached subnet periodically to determine which groups and sources are of interest to the attached hosts. Hosts respond with reports indicating which groups and sources are of interest. Hosts may also send unsolicited reports if membership changes occur.

Such routers are interested in ascertaining which multicast groups are of interest on each of their attached interfaces.

These routers require this information in order to avoid simply broadcasting all traffic out of every interface.

In the figure above, we can see how IGMP (MLD) queries are sent by multicast routers. These are sent to the All Hosts multicast address, 224.0.0.1 (IGMP), or the All Nodes link-scope multicast address, ff02::1 (MLD), and processed by every host implementing IP multicast. Membership report messages are sent by group members (hosts) in response to the queries but may also be sent in an unsolicited way from hosts that wish to inform multicast routers that group membership(s) and/or interest in particular sources has changed. IGMPv3 reports are sent to the IGMPv3-capable multicast router address 224.0.0.22. MLDv2 reports are sent to the corresponding MLDv2 Listeners IPv6 multicast address ff02::16. Note that multicast routers themselves may also act as members when they join multicast groups.

The encapsulations for IGMP and MLD are shown in the figure below.

IGMP is encapsulated as a separate protocol in IPv4. MLD is a type of ICMPv6 message.

IGMP and MLD define two sets of protocol processing rules: those performed by hosts that are group members and those performed by multicast routers. Generally speaking, the job of the member hosts (which we will call “group members”) is to spontaneously report changes in interest in multicast groups and sources and to respond to periodic queries. Multicast routers send queries to ascertain whether any interest is present on an attached link for any groups, or for a specific multicast group and source. Routers also interact with wide area multicast protocols to bring the desired traffic to the interested hosts or prohibit traffic from flowing to uninterested hosts.

4.1 IGMP and MLD Processing by Group Members (“Group Member Part”)

The group members’ portion of IGMP and MLD is designed to allow hosts to specify what groups they are interested in and whether traffic sent from particular sources should be accepted or filtered out. This is accomplished by sending reports to one or more multicast routers (and participating hosts) attached to the same subnet. Reports may be sent as a result of receiving a query, or spontaneously (unsolicited) because of a local change in reception state (e.g., an application joins or leaves a group). IGMP reports take the form shown in the figure below.

The IGMPv3 membership report contains group records for N groups. Each group record indicates a multicast address and optional list of sources.

Report messages are fairly simple. They contain a vector of group records, each of which provides information about a particular multicast group, including the address of the subject group and an optional list of sources used for establishing filters.

An IGMPv3 group record includes a multicast address (group) and an optional list of sources. Groups of sources are either allowed as senders (include mode) or filtered out (exclude mode). Previous versions of IGMP reports did not include a list of sources.

Each group record contains a type, the address of the subject group, and a list of source addresses to either include or exclude. There is also support for including auxiliary data, but this feature is not used by IGMPv3. The table below reveals the significant flexibility that can be achieved using IGMPv3 report record types.

Type values for IGMP and MLD source lists indicate the filtering mode (include or exclude) and whether the source list has changed.

MLD uses the same values. A list of sources is said to refer to include mode or exclude mode. In include mode, the sources in the list are the only sources from which traffic should be accepted. In exclude mode, the sources in the list are the ones to be filtered out (all others are allowed). Leaving a group can be expressed as using an include mode filter with no sources, and a simple join of a group (i.e., for any source) can be expressed as using the exclude mode filter with no sources.

The first two record types (0x01, 0x02) are known as current-state records and are used to report the current filter state in response to a query. The next two (0x03, 0x04) are known as filter-mode-change records, which indicate a change from include to exclude mode or vice versa. The last two (0x05, 0x06) are known as source-list-change records and indicate a change to the sources being handled in either exclude or include mode. The last four types are also described more generally as state-change records or state-change reports. These are sent as a result of some local state change such as a new application being started or stopped, or a running application changing its group/source interests.

Note that IGMP and MLD queries/reports themselves are never filtered. MLD reports use a structure similar to IGMP reports but accommodate larger addresses and use an ICMPv6 type of 143.
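To make the record types concrete, the sketch below builds an IGMPv3 membership report body by hand. The record type names and the field layout follow RFC 3376; the helper names (group_record, membership_report, internet_checksum) are invented for this example, and the code only builds the bytes without sending anything.

```python
import enum
import socket
import struct


class RecordType(enum.IntEnum):
    MODE_IS_INCLUDE = 0x01         # current-state records
    MODE_IS_EXCLUDE = 0x02
    CHANGE_TO_INCLUDE_MODE = 0x03  # filter-mode-change records
    CHANGE_TO_EXCLUDE_MODE = 0x04
    ALLOW_NEW_SOURCES = 0x05       # source-list-change records
    BLOCK_OLD_SOURCES = 0x06


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over the whole message."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def group_record(rtype: RecordType, group: str, sources=()) -> bytes:
    """Record type, aux data length (0), source count, group, sources."""
    head = struct.pack("!BBH4s", rtype, 0, len(sources),
                       socket.inet_aton(group))
    return head + b"".join(socket.inet_aton(s) for s in sources)


def membership_report(records) -> bytes:
    """IGMPv3 report: type 0x22, reserved, checksum, reserved, count."""
    msg = struct.pack("!BBHHH", 0x22, 0, 0, 0, len(records))
    msg += b"".join(records)
    return msg[:2] + struct.pack("!H", internet_checksum(msg)) + msg[4:]
```

A plain join for any source is then group_record(RecordType.CHANGE_TO_EXCLUDE_MODE, "239.1.2.3") with no sources, and a leave is the same call with CHANGE_TO_INCLUDE_MODE. The group 239.1.2.3 is a placeholder.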

When receiving a query, group members do not respond immediately. Instead, they set a random (bounded) timer to determine when to respond. During this delay interval, processes may alter their group/source interests. Any such modifications can be processed together before a timer expires to trigger the report. In this way, once the timer does expire, the status of multiple groups can more likely be merged into a single report, saving overhead. The source address used for IGMP is the primary or preferred IPv4 address of the sending interface. For MLD, the source address is a link-local IPv6 address.

4.2 IGMP and MLD Processing by Multicast Routers (“Multicast Router Part”)

In IGMP and MLD, the job of the multicast router is to determine, for each multicast group, interface, and source list, whether at least one group member is present to receive corresponding traffic. This is accomplished by sending queries and building state describing the existence of such members based on the reports they send. This state is soft state, meaning that it is cleared after a certain amount of time if not refreshed. To build this state, multicast routers send IGMPv3 queries of the form depicted in the figure below.
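Soft state can be modeled as a table of expiry times that reports keep pushing forward. The following is a simplified sketch: the class and method names are hypothetical, source lists are ignored, and the lifetime is a parameter rather than a value derived from the protocol timers.

```python
import time


class GroupMembershipTable:
    """Per-(interface, group) soft state refreshed by membership reports."""

    def __init__(self, lifetime_s: float, clock=time.monotonic):
        self.lifetime_s = lifetime_s
        self.clock = clock            # injectable so tests can fake time
        self._expiry = {}             # (interface, group) -> expiry time

    def report_received(self, interface: str, group: str) -> None:
        """A report refreshes (or creates) the entry."""
        self._expiry[(interface, group)] = self.clock() + self.lifetime_s

    def has_members(self, interface: str, group: str) -> bool:
        """True while the entry has been refreshed recently enough."""
        key = (interface, group)
        expiry = self._expiry.get(key)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self._expiry[key]     # not refreshed in time: state is cleared
            return False
        return True
```

The router never needs to know which hosts sent the reports; one refreshed entry per interface and group is enough to keep forwarding. The interface name ("eth0" in a typical call) and the lifetime passed in are placeholders.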

The IGMPv3 query includes the multicast group address and optional list of sources. General queries use a group address of 0 and are sent to the All Hosts multicast address, 224.0.0.1. The QRV value encodes the maximum number of retransmissions the sender will use, and the QQIC field encodes the periodic query interval. Specific queries are used before terminating traffic flow for a group or source/group combination. In this case (and all cases with IGMPv2 or IGMPv1), the query is sent to the address of the subject group.

The IGMP query message is very similar to the ICMPv6 MLD query. In this case, the group (multicast) address is 32 bits in length and the Max Resp Code field is 8 bits instead of 16. The Max Resp Code field encodes the maximum amount of time the receiver of the query should delay before sending a report, encoded in 100 ms units for values below 128. For values above 127, the field is encoded as shown in the figure below.

The Max Resp Code field encodes the maximum time to delay responses in 100 ms units. For values above 127, an exponential value can be used to accommodate larger values.

This encoding provides for a possible range of (16)(8) = 128 to (31)(1024) = 31,744 units of 100 ms (i.e., about 13 s to 53 minutes). Using smaller values for the Max Resp Code field allows for tuning the leave latency (the elapsed time from when the last group member leaves to the time corresponding traffic ceases to be forwarded). Larger values of this field reduce the traffic load of the IGMP messages generated by members by increasing the likelihood of longer periods for reporting.

The remaining fields in a query include an Internet-style checksum across the whole message, the address of the subject group, a list of sources, and the S, QRV, and QQIC fields (also present in MLD queries). In cases where the multicast router wishes to know about interest in all multicast groups, the Group Address field is set to 0 (such queries are called “general queries”). The S and QRV fields are used for fault tolerance and retransmission of reports. The QQIC field is the Querier’s Query Interval Code. This value is the query sending period, in units of seconds and encoded using the same method as the Max Resp Code field (i.e., a range from 0 to 31,744).

There are three variants of the query message that can be sent by a multicast router: general query, group-specific query, and group-and-source-specific query. The first form is used by the multicast router to update information regarding any multicast group, and for such queries the group address is 0 and the source list is empty.
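The 8-bit code used by the Max Resp Code and QQIC fields can be decoded as follows. This sketch follows the IGMPv3 encoding in RFC 3376 (a 3-bit exponent and a 4-bit mantissa when the top bit is set); the function names are illustrative.

```python
def decode_timer_code(code: int) -> int:
    """Decode an 8-bit IGMPv3 Max Resp Code or QQIC value.

    Values below 128 are literal. Otherwise the byte is laid out as
    1 | exp (3 bits) | mant (4 bits) and means (mant | 0x10) << (exp + 3).
    """
    if not 0 <= code <= 255:
        raise ValueError("code must fit in 8 bits")
    if code < 128:
        return code
    exp = (code >> 4) & 0x07
    mant = code & 0x0F
    return (mant | 0x10) << (exp + 3)


def max_resp_seconds(code: int) -> float:
    """Max Resp Code is in units of 100 ms; convert to seconds."""
    return decode_timer_code(code) / 10
```

The smallest exponential code, 128, decodes to 16 * 8 = 128 units, and the largest, 255, decodes to 31 * 1024 = 31,744 units, matching the range stated above.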

Group-specific queries are similar to general queries but are specific to the identified group. The last type is essentially a group-specific query with a set of sources included. The specific queries are sent to the destination IP address of the subject group, as opposed to general queries that are sent to the All Hosts (All Systems) multicast address 224.0.0.1 for IPv4 or the link-scope All Nodes multicast address for IPv6 (ff02::1).
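The three query variants can be told apart from the group address and the source list alone. The sketch below parses the fixed part of an IGMPv3 query (type 0x11, layout per RFC 3376) and classifies it; the names Query, parse_query, query_kind, and destination are made up for this example, and the checksum is not verified.

```python
import socket
import struct
from typing import List, NamedTuple


class Query(NamedTuple):
    max_resp_code: int
    group: str
    suppress: bool      # S flag
    qrv: int            # Querier's Robustness Variable
    qqic: int           # Querier's Query Interval Code
    sources: List[str]


def parse_query(msg: bytes) -> Query:
    """Parse an IGMPv3 query (checksum is not checked in this sketch)."""
    if len(msg) < 12:
        raise ValueError("IGMPv3 query is at least 12 bytes")
    mtype, code, _csum, group, flags, qqic, nsrc = struct.unpack(
        "!BBH4sBBH", msg[:12])
    if mtype != 0x11:
        raise ValueError("not a membership query")
    if len(msg) < 12 + 4 * nsrc:
        raise ValueError("truncated source list")
    sources = [socket.inet_ntoa(msg[12 + 4 * i:16 + 4 * i])
               for i in range(nsrc)]
    return Query(code, socket.inet_ntoa(group), bool(flags & 0x08),
                 flags & 0x07, qqic, sources)


def query_kind(q: Query) -> str:
    if q.group == "0.0.0.0":
        return "general"
    return "group-and-source-specific" if q.sources else "group-specific"


def destination(q: Query) -> str:
    """General queries go to 224.0.0.1; specific ones to the group itself."""
    return "224.0.0.1" if query_kind(q) == "general" else q.group
```

Only the group address and the source count are needed for the classification; the remaining fields are parsed so the caller can read the timers and flags.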

The specific queries are sent in response to state-change reports in order to verify that it is appropriate for the router to take some action (e.g., to ensure that no interest remains in a particular group before constructing a filter). When receiving either filter-mode-change records or source-list-change records, the multicast router arranges to add new traffic sources and may be able to filter out traffic from certain sources. In cases where the multicast router is prepared to begin filtering out traffic that was flowing previously, it uses the group-specific query and group-and-source-specific query first. If these queries elicit no reports, the router is free to begin filtering out the corresponding traffic. Because such changes can significantly affect the flow of multicast traffic, state-change reports and specific queries are retransmitted.

References
1. TCP/IP Illustrated, Volume 1.
2. RFC 3376: Internet Group Management Protocol, Version 3 (IGMPv3).

Wednesday, August 24, 2016
