Latency and Application Performance

2021 – White Paper

Jump to:

What is Latency?

Propagation delay

Serialization delay

Queuing delay

Jitter

Latency and Bandwidth

What is Bandwidth?

Latency, Bandwidth and Link ‘Speed’

Latency and Bandwidth Utilization

What is Bandwidth Utilization

Latency during Congestion

Bandwidth Utilization Patterns

The Effects of Latency on Network Performance

The TCP Protocol

TCP and Latency

TCP, Latency and Packet Loss

The UDP Protocol

UDP and Fixed Pattern Latency

UDP and Jitter

What is Latency?

Network Latency, or Network Delay, is generally described as the amount of time it will take a signal, or a data packet, to traverse a particular network path, starting from the device that initiated the transmission and ending at the destination device.

The latency from the source to the destination is often called end-to-end latency or one-way latency. Round-trip latency is that one-way latency plus the time it will take for a response signal to come back.

Network latency is usually measured in milliseconds (ms=1/1000 sec).

Network latency is composed of three major elements, each of which can have a significant impact on the overall end-to-end latency:

1. Propagation delay
2. Serialization delay
3. Queuing delay

Propagation delay

Propagation delay is the time it takes a signal to physically traverse the path. It is determined by the distance between sender and receiver and by the speed of light; it is common to assume that signals passing through wires or fibers travel at roughly two thirds the speed of light. In high-performance networks the propagation delay is the dominant component of the end-to-end latency, and since it is bounded by physics it cannot be significantly reduced.
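As a rough illustration (a sketch added here, with an arbitrary example distance), the one-way propagation delay can be estimated directly from the path length and the two-thirds-of-light-speed assumption above:

```python
# Rough propagation-delay estimate, assuming signals travel at ~2/3 the
# speed of light in fiber/copper, as described in the text.

SPEED_OF_LIGHT_KM_S = 300_000                      # speed of light in vacuum, km/s
PROPAGATION_SPEED = SPEED_OF_LIGHT_KM_S * 2 / 3    # ~200,000 km/s in the medium

def propagation_delay_ms(distance_km: float) -> float:
    """One-way propagation delay in milliseconds for a given path length."""
    return distance_km / PROPAGATION_SPEED * 1000

# Hypothetical example: a ~6,000 km transoceanic fiber path
print(f"{propagation_delay_ms(6000):.1f} ms one way")   # ~30.0 ms
```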

The consistency of the propagation delay depends on the route the signals traverse. A static route (direct connection) will usually have fairly constant latency, as there are no route changes, while a dynamic route will usually show more latency variation (see Jitter below).

Serialization delay

Serialization delay is the time required for the sender to clock a signal or packet onto the outgoing link. For a given packet size and available bandwidth, serialization delay is a constant and can be easily calculated by dividing the packet size (in bits) by the available bandwidth (in bits per second).

The following table demonstrates serialization delays for different packet sizes through various bandwidths (approximate delay values in ms):

Packet Size (bytes)    56 Kbps   128 Kbps   256 Kbps   512 Kbps   1024 Kbps   1544 Kbps   2048 Kbps
256                         37         16          8          4           2           1           1
512                         73         32         16          8           4           3           2
1024                       149         64         32         16           8           5           4
1500                       214         94         47         23          12           8           6
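The calculation behind the table is straightforward; the following sketch reproduces the rows above (the paper's values are approximate, so a few entries round slightly differently):

```python
# Serialization delay = packet size (bits) / link bandwidth (bits per second).

def serialization_delay_ms(packet_bytes: int, bandwidth_kbps: int) -> float:
    bits = packet_bytes * 8
    return bits / (bandwidth_kbps * 1000) * 1000   # seconds -> milliseconds

bandwidths = [56, 128, 256, 512, 1024, 1544, 2048]      # Kbps
for size in [256, 512, 1024, 1500]:                     # bytes
    row = [round(serialization_delay_ms(size, bw)) for bw in bandwidths]
    print(size, row)
# e.g. 1500 bytes over 56 Kbps -> ~214 ms; over 2048 Kbps -> ~6 ms
```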

Queuing delay

Queuing delay is the sum of the delays introduced each time a packet is received, stored and re-transmitted by a queue along the route. It is the most variable delay component in modern networks, and depends on the number of queues in the route and on the queue lengths: in a heavily loaded, congested network the queues will be long and the queuing delay correspondingly high. In many cases this is the main component of the overall latency, while in networks that are not congested this delay is often negligible.

Jitter

Jitter refers to variation in the network latency, usually described as the standard deviation from the average delay. In most cases it is caused by rapidly changing network conditions, heavy buffer congestion or route changes.

Latency and Bandwidth

What is Bandwidth?

Bandwidth is the amount of information that can be transferred over a connection in a given period of time. Bandwidth is always measured in bits per second (bps, Kbps, Mbps etc.).

Note – bandwidth units and throughput units are easy to confuse, as throughput is usually measured in bytes per second (Bps, KBps, MBps, with an uppercase ‘B’). The conversion simply requires dividing the bps rate by 8 to get Bps (8 bps = 1 Bps, 1024 Kbps = 128 KBps, etc.).

People often wrongly believe that having more bandwidth will reduce the network latency. As described above, higher bandwidth does mean less serialization delay, but serialization delay is, in many cases, a negligible component of the overall delay. Adding bandwidth can therefore result in an oversized (and overpriced) pipe that has little or no effect on the latency.

Latency, Bandwidth and Link ‘Speed’

Latency and bandwidth together determine the perceived “speed” of a connection. The latency determines how long it takes for information to get from the sender to the receiver, while the bandwidth determines how much information can be delivered at any moment in time. Both affect the end-user experience, but each has a different impact.

To demonstrate the difference between bandwidth and latency, we can think of the network as a means to deliver ‘things’. These ‘things’ can be data packets when we talk about IT, or they can be actual packages shipped from one place to the other if we take an example of a package delivery company.

In the package delivery network, latency can be thought of as the time it takes a truck to drive from one station to the other to deliver a pile of ‘things’. Since the distance between the two stations is constant and the truck’s speed has a hard maximum, this time (assuming no traffic jams) is a known and essentially irreducible limit. Bandwidth in this case is the size of the pile, or, in other words, how much load the truck can carry on each trip.

In order to improve the level of service in the package delivery network we can increase the bandwidth by buying a larger truck. This will allow us to send more ‘things’ in a single trip, but still each shipment will have to go through the same distance and speed limitation between the two stations. If it took 2 hours for a small truck to get from point A to B, it will still take 2 hours with the new, bigger truck, although this time more goods will be delivered.

So, coming back to IT, the conclusion is that if we need to send large amounts of data from one point to the other we should buy and use more bandwidth. However, this will not reduce the time it physically takes the data to get there. The amount of bandwidth required is strictly related to the amount of data we send and has very little to do with the ‘trip’ time. Latency, in its simplest form, will remain unchanged even if we could theoretically get unlimited bandwidth.

It is important to mention that latency does have a direct correlation with the amount of congestion on a network link. So while adding bandwidth will not reduce the network latency, the opposite can certainly happen – having less than the required bandwidth will hurt the network by creating more congestion and higher latency, and will therefore reduce throughput (see below, Latency and Bandwidth Utilization). This highlights the importance of right-sizing network capacity: too much bandwidth can be extremely expensive and unnecessary, while too little can have a catastrophic impact on the network, application performance and the end-user experience. When it comes to bandwidth requirements, the numbers should be carefully calculated, taking into account the existing latencies and the expected throughput, to make sure the right decisions are made.

Latency and Bandwidth Utilization

What is Bandwidth Utilization

Bandwidth utilization is a number indicating the usage of a certain network resource at a given moment. The number is a percentage, corresponding to how much of the theoretical maximum available bandwidth is currently utilized by existing network traffic.

High levels of bandwidth utilization can cause congestion on network devices, which may result in the build-up of long queues and possibly dropped packets. As the theoretical maximum bandwidth is approached, further negative factors (such as increased serialization delay) kick in and degrade throughput even further.
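As a sketch of how utilization is typically computed (the sampling interval, byte count and link capacity below are purely illustrative, not values from this paper):

```python
# Bandwidth utilization over a sampling interval, as a percentage of link capacity.
# In practice the byte count would come from an interface or SNMP counter.

def utilization_pct(bytes_transferred: int, interval_s: float, link_bps: float) -> float:
    observed_bps = bytes_transferred * 8 / interval_s
    return observed_bps / link_bps * 100

# Hypothetical example: 450 MB observed in 60 s on a 100 Mbps link
print(f"{utilization_pct(450_000_000, 60, 100_000_000):.0f}% utilized")  # ~60%
```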

Latency during Congestion

As mentioned above, latency can change as a result of network congestion. With the increase in traffic load and bandwidth utilization, it is possible that latency will also increase as buffers begin to populate on the path between the sender and receiver. Measuring latency while considering network load can get complicated; to fully characterize the latency versus load, continuous measurements must be taken at various network loads.

Bandwidth Utilization Patterns

Bandwidth utilization usually follows certain patterns. For example, many enterprise networks follow a daily pattern where bandwidth utilization is low from late afternoon (about 1-2 hours before office closing hours) until the beginning of the next business day. Network traffic then increases during the workday except for a slight decline in utilization during lunchtime. On a monthly scale, there may be an increase in utilization towards the end of the month and in the first few days, as sales people and accounting are using more enterprise resources. And of course, on a yearly scale, the same patterns apply for the end of sales quarters and the end of the fiscal year.

When considering bandwidth requirements and latency impact it is always important to consider all different combinations. It is important to know what the best case scenario is, when there is very little bandwidth utilization and congestion and therefore predictable latency, but it is even more important to consider all possible worst case scenarios, when bandwidth utilization reaches maximum levels, the probability for congestion rises and latency peaks.

The Effects of Latency on Network Performance

The TCP Protocol

TCP is a transport protocol, carried over IP, that is built around several guaranteed-delivery mechanisms. A TCP transmission begins with the sending device initiating a session with the receiving device, then waiting for an acknowledgement before the actual transmission starts. After the session is set up and the transmission begins, the receiver keeps informing the sender which packets were received by sending acknowledgement packets. Acknowledgements are sent after a certain number of packets have been received, and if an acknowledgement is not received within a certain time, the packets are resent.

TCP includes another mechanism known as the ‘Window Size’. This dynamic mechanism allows the communicating parties to adjust to the connection’s current capacity by changing the number of packets sent before an acknowledgement is required. The window size is gradually increased until the sending rate exceeds what the network connection can handle. At that point some packets are dropped by the network (packet loss), and when the sender discovers that data was lost it reacts by cutting its sending rate in half. This process of decreasing and increasing the window size continues until an optimal rate is reached.
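The following toy simulation only illustrates the behaviour just described; the doubling growth and the assumed path capacity are illustrative choices, not TCP's exact congestion control algorithm:

```python
# Heavily simplified sketch of the window adjustment described above:
# the window grows until the network drops packets, then is cut in half.

network_capacity = 40        # hypothetical: packets the path can absorb per round trip
window = 1                   # packets sent before waiting for acknowledgements

for rtt in range(12):
    loss = window > network_capacity
    print(f"round {rtt}: window={window} packets, loss={loss}")
    if loss:
        window = max(1, window // 2)   # sender halves its rate after detecting loss
    else:
        window *= 2                    # probe for more capacity (slow-start-like growth)
```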

TCP and Latency

These mechanisms make TCP extremely sensitive to latency. As network latency increases, the sender sits idle waiting for acknowledgements instead of constantly sending packets. High latency also slows down the window size adjustment process, because it too depends on the rate at which acknowledgements are received. The result is an inverse relationship between latency and TCP throughput: as network latency increases, TCP throughput decreases, as illustrated in Table 1 below.

See what happens to the TCP throughput over a Fast Ethernet connection (100Mbps), when latency is introduced between the sender and receiver:

Latency    Throughput    Throughput reduction due to latency
0 ms       93.5 Mbps     0%
30 ms      16.2 Mbps     82%
60 ms      8.1 Mbps      91%
90 ms      5.3 Mbps      94%

Table 1 – Effect of Latency on TCP Throughput
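One way to see why the table looks the way it does is the classic bound of roughly one window of data per round trip. The sketch below assumes a 64 KB window (a common default without window scaling) and treats the table's latency figures as round-trip times; both are assumptions, not values stated in the paper:

```python
# TCP throughput on a high-latency path is bounded by roughly
# (window size) / (round-trip time).

WINDOW_BYTES = 64 * 1024   # assumed receive window

def tcp_throughput_ceiling_mbps(rtt_ms: float) -> float:
    return (WINDOW_BYTES * 8) / (rtt_ms / 1000) / 1_000_000

for rtt in [30, 60, 90]:
    print(f"{rtt} ms RTT -> ~{tcp_throughput_ceiling_mbps(rtt):.1f} Mbps ceiling")
# ~17.5, ~8.7 and ~5.8 Mbps, close to the measured values in Table 1
```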

As described above, with the increase in latency the sender is left idle for long periods, waiting for the receiver to acknowledge the data before the transmission can continue. During that time the receiver also waits, buffering packets until it can assemble entire messages. When a large number of sessions is handled concurrently (for example, when the receiver is a server) this requires a constant increase in buffer use, and may eventually cause overall performance degradation on the receiver.

TCP, Latency and Packet Loss

Packet loss makes TCP performance even worse. As described above, packet loss usually occurs as a result of network congestion and buffer overflows. When packet loss is detected on the network, the TCP window immediately shrinks, causing the sender to sit idle even longer while waiting for acknowledgements for even smaller chunks of data. Even worse, the acknowledgement packets themselves may be lost, forcing the sender to wait for the defined timeout period. When this happens, the packets that should have been acknowledged by the lost acknowledgement are retransmitted even though they were most probably delivered properly in the first place. This kind of retransmission adds load to the already congested link and results in a severe degradation of network performance.

The following table illustrates the effect of latency and packet loss on TCP throughput over a Fast Ethernet (100 Mbps) link. Note how much lower the TCP throughput becomes when only 2% packet loss is introduced (the throughput reduction percentage is calculated relative to the values in Table 1, so it excludes the effect of latency).

Latency    Packet Loss    Throughput    Throughput reduction due to packet loss
0 ms       2%             3.7 Mbps      96%
30 ms      2%             1.6 Mbps      90%
60 ms      2%             1.3 Mbps      83%
90 ms      2%             0.9 Mbps      83%

Table 2 – Effect of Latency and 2% Packet Loss on TCP Throughput
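A commonly cited approximation for TCP throughput under random loss is the Mathis et al. formula, throughput ≈ (MSS / RTT) * (C / sqrt(loss)). The sketch below uses an assumed 1460-byte MSS; it will not reproduce Table 2 exactly, but it shows the same trend: even a small loss rate caps throughput far below the link rate.

```python
# Mathis et al. approximation of loss-limited TCP throughput.

from math import sqrt

MSS_BYTES = 1460   # assumed Ethernet-sized segment
C = 1.22           # constant from the Mathis formula

def mathis_mbps(rtt_ms: float, loss_rate: float) -> float:
    rtt_s = rtt_ms / 1000
    return (MSS_BYTES * 8 / rtt_s) * (C / sqrt(loss_rate)) / 1_000_000

for rtt in [30, 60, 90]:
    print(f"{rtt} ms RTT, 2% loss -> ~{mathis_mbps(rtt, 0.02):.1f} Mbps")
# roughly 3.4, 1.7 and 1.1 Mbps, the same order of magnitude as Table 2
```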

The UDP Protocol

UDP is another popular IP-based transport protocol that offers a direct way of sending and receiving packets. Unlike TCP, it does not include any delivery assurance mechanism: it does not sequence packets and does not divide and reassemble messages. A device that sends UDP packets simply assumes they reach the destination, and there is no indication of whether packets have arrived or not. UDP is very efficient in terms of processing resources because it involves exchanging small data units and very little reassembly. It is typically used for streaming media applications, where an occasional lost packet does not matter.

UDP and Fixed Pattern Latency

Unlike with TCP, latency and throughput are completely independent for UDP traffic. In other words, if latency goes up or down, UDP throughput remains the same. As long as the latency pattern remains fixed (no jitter), the only effect latency has on a UDP stream is a delay of the entire stream.

Of course, high latency, even with a fixed pattern, will have a significant effect on interactive real-time applications. Clearly it would be very difficult to hold a coherent voice-over-IP conversation over a high-latency link, simply because each side has to wait noticeably until the voice packets cross the wire. The human ear can ‘accept’ round-trip latencies of up to roughly 300 milliseconds; with any higher latency users will experience annoying talk-over effects, and a conversation becomes nearly impossible.

UDP and Jitter

UDP’s relative immunity to latency ends when it comes to jitter. Jitter can have severe effects on UDP applications, especially voice and video over IP. Jitter of approximately 50 milliseconds, and often less, can result in both increased latency (as a result of buffering) and packet loss. When humans communicate they need to hear what the other side says in the same order in which it was said; otherwise it does not make much sense. Unfortunately, jitter causes packets to arrive at their destination with different timing than that with which they were sent, and sometimes even in a different order (a phenomenon known as reordering, where some packets arrive faster and some slower than they should). The words may sound intermittent and distorted, to the point where the voice becomes completely incomprehensible.

Jitter can sometimes be overcome with a buffering technique known as jitter buffering. Some voice and video over IP devices implement a “jitter buffer” on the receiving device (in software or hardware) that collects packets in a buffer and puts them back together in the proper timing and order before handing them to the receiver. This works, but it is a balancing act: it tends to add fixed-pattern latency and can hurt interactivity. It is also important to keep in mind that, because the buffer has a limited size, while it normally helps achieve better voice/video quality, once it fills up it can become yet another source of packet loss.
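The following is a minimal sketch of the jitter-buffering idea (the class, its depth parameter and the packet sequence are illustrative; real devices also schedule playout against timestamps and handle overflow):

```python
# Minimal jitter-buffer sketch: packets arriving out of order are held
# briefly and released in sequence-number order.

import heapq

class JitterBuffer:
    def __init__(self, depth: int = 5):
        self.depth = depth            # how many packets we hold before releasing
        self.heap = []                # min-heap keyed by sequence number

    def push(self, seq: int, payload: bytes):
        heapq.heappush(self.heap, (seq, payload))

    def pop_ready(self):
        """Release packets in order once the buffer is deep enough."""
        out = []
        while len(self.heap) > self.depth:
            out.append(heapq.heappop(self.heap))
        return out

# Packets arriving with jitter-induced reordering
buf = JitterBuffer(depth=2)
for seq in [1, 3, 2, 5, 4, 6, 8, 7]:
    buf.push(seq, b"...")
    for pkt in buf.pop_ready():
        print("play", pkt[0])   # sequence numbers come out in order: 1, 2, 3, ...
```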
