Keyboard, video and mouse (KVM) transmission systems were created to make it easy for users to operate multiple computers at the same time.
Initially, they were local switches that let the operator view and control the sources located at the operator station with a single keyboard and mouse. Later, the computers were moved to data processing centers, which offer better cooling and greater noise tolerance. The technology has since evolved considerably, with image compression and video compression as the two determining factors.
Over time, the output resolution of these computers has progressively increased to 1920 x 1080 pixels and, nowadays, even 4K, which has led to two current hardware design approaches:
- Equipment that uses high compression, at a ratio of 20 to 1, to fit a 1 Gb/s network (Cat-5e or Cat-6 twisted-pair cabling).
- Equipment that requires 10 Gb/s networks (Cat-7 or Cat-8 cabling) in order to eliminate this compression or reduce it to a ratio of 3 to 1.
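The arithmetic behind these two approaches can be sketched as follows. This is only an illustration: the 60 frames per second and 24 bits per pixel are assumptions not stated in the text, and the function names are hypothetical.

```python
def raw_bitrate_bps(width, height, fps=60, bits_per_pixel=24):
    """Uncompressed video bit rate in bits per second.

    fps and bits_per_pixel are assumed values, not figures from the article.
    """
    return width * height * bits_per_pixel * fps


def compressed_bitrate_bps(width, height, ratio, fps=60, bits_per_pixel=24):
    """Bit rate after applying a compression ratio of `ratio` to 1."""
    return raw_bitrate_bps(width, height, fps, bits_per_pixel) / ratio


if __name__ == "__main__":
    for name, (w, h) in {"1080p": (1920, 1080), "4K": (3840, 2160)}.items():
        raw = raw_bitrate_bps(w, h)
        high = compressed_bitrate_bps(w, h, 20)  # 1 Gb/s approach
        low = compressed_bitrate_bps(w, h, 3)    # 10 Gb/s approach
        print(f"{name}: raw {raw / 1e9:.2f} Gb/s, "
              f"20:1 -> {high / 1e6:.0f} Mb/s, "
              f"3:1 -> {low / 1e9:.2f} Gb/s")
```

Under these assumptions, a 4K signal needs roughly 11.9 Gb/s uncompressed, about 597 Mb/s at 20 to 1 (which fits a 1 Gb/s link) and about 4 Gb/s at 3 to 1 (which fits a 10 Gb/s link).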
In both cases, encoders collect the HDMI or DisplayPort (DP) signal, together with the USB connection, from each source output and generate a stream to the local network. At the operator station, a processor receives the streams and renders them on the monitors. In the opposite direction, it sends back the mouse position coordinates and the keys pressed on the operator's keyboard, both of which are connected to the processor through its USB ports.
With the arrival of packet-switched networks, equipment has been developed that carries this transmission over the IP network, giving the user remote access not only from within the room but from any point with access to that network.
Encoding, decoding and traversing the network all take time, and the total is defined as latency. To avoid compromising usability, latency must be kept as low as possible, which is why the digital signal is compressed. At this point, it is important to distinguish image compression from video compression; the two can be combined when sending a digital signal in order to obtain the quality each application requires.
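As a rough illustration of the return path, the sketch below packs a mouse position or a key press into a small binary message, as a processor might before sending it back to an encoder. The wire format, the constants and the function names are all hypothetical; they do not correspond to any vendor's protocol or to the USB HID report format.

```python
import struct

# Hypothetical wire format (an assumption for illustration), network byte order:
#   B  event type (1 = mouse move, 2 = key press)
#   H  x coordinate, or key code for key events
#   H  y coordinate, or 0 for key events
EVENT_FORMAT = "!BHH"
MOUSE_MOVE = 1
KEY_PRESS = 2


def encode_mouse_move(x, y):
    """Pack an absolute mouse position into a 5-byte message."""
    return struct.pack(EVENT_FORMAT, MOUSE_MOVE, x, y)


def encode_key_press(key_code):
    """Pack a key code into a 5-byte message."""
    return struct.pack(EVENT_FORMAT, KEY_PRESS, key_code, 0)


def decode_event(payload):
    """Unpack a message produced by one of the encode functions."""
    kind, a, b = struct.unpack(EVENT_FORMAT, payload)
    if kind == MOUSE_MOVE:
        return ("mouse_move", a, b)
    if kind == KEY_PRESS:
        return ("key_press", a)
    raise ValueError(f"unknown event type {kind}")


if __name__ == "__main__":
    print(decode_event(encode_mouse_move(960, 540)))
    print(decode_event(encode_key_press(ord("a"))))
```

The point of the sketch is the asymmetry: the control messages are a few bytes each, while the video stream in the other direction dominates the bandwidth.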
On the one hand, image compression discards, within each frame, information that the human eye is unlikely to perceive; an example is the Motion JPEG format (a sequence of JPEG images). On the other hand, video compression uses interframe prediction, as in the MPEG or H.264 formats, which reduces the pixel values encoded and sent to only those that have changed with respect to a previous frame. Since video is a sequence of still images that, shown one after another, give the appearance of movement, the compression algorithms compare consecutive images and transmit a complete image (an I frame) only at regular intervals; in between, they send only the differences with respect to the previous and following images.
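The idea of sending only what has changed can be sketched with a toy frame difference. This is a deliberate simplification: real codecs such as H.264 work on blocks with motion compensation and transform coding rather than on per-pixel differences, and the function names here are hypothetical.

```python
def frame_delta(previous, current):
    """Return (index, new_value) for every pixel that changed."""
    return [
        (i, new)
        for i, (old, new) in enumerate(zip(previous, current))
        if old != new
    ]


def apply_delta(previous, delta):
    """Rebuild the current frame from the previous frame plus the delta."""
    frame = list(previous)
    for i, value in delta:
        frame[i] = value
    return frame


if __name__ == "__main__":
    # Two tiny 8-pixel grayscale "frames": a bright object moves one pixel right.
    frame_a = [10, 10, 10, 10, 200, 200, 10, 10]
    frame_b = [10, 10, 10, 10, 10, 200, 200, 10]

    delta = frame_delta(frame_a, frame_b)
    print(delta)  # [(4, 10), (6, 200)]
    print(apply_delta(frame_a, delta) == frame_b)  # True
```

Only two of the eight pixel values need to be sent for the second frame; a static screen would produce an empty delta.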

The set of packets sent to the network, or video stream, is a sequence of information organized into groups of pictures (GOP). Type I images (intra-coded frames) are the least compressed and are encoded on their own, in a similar way to a JPEG still image; type P images (predictive frames) encode the differences with respect to a previous I or P frame; and type B images (bidirectional frames) are predicted from both a previous and a subsequent frame in the sequence.
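The frame ordering inside a GOP can be illustrated with a small generator. The GOP length of 12 and the two B frames between reference frames are example values chosen here, not figures from the text, and `gop_pattern` is a hypothetical helper.

```python
def gop_pattern(gop_length=12, b_frames=2):
    """Frame types of one GOP in display order.

    gop_length: number of frames from one I frame up to the next (assumed value).
    b_frames:   number of B frames between consecutive reference frames.
    """
    if gop_length < 1 or b_frames < 0:
        raise ValueError("gop_length must be >= 1 and b_frames >= 0")
    types = []
    for i in range(gop_length):
        if i == 0:
            types.append("I")  # complete, independently decodable image
        elif i % (b_frames + 1) == 0:
            types.append("P")  # predicted from an earlier reference frame
        else:
            types.append("B")  # predicted from earlier and later frames
    return "".join(types)


if __name__ == "__main__":
    print(gop_pattern(12, 2))  # IBBPBBPBBPBB
    print(gop_pattern(6, 0))   # IPPPPP
```

Because a B frame cannot be decoded until a later frame has arrived, a configuration with `b_frames=0` avoids that wait, which is one of the trade-offs between compression and latency.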
The bandwidth consumed by a video signal depends directly on the quality of the image to be transmitted and on how much it changes. For example, with a full-screen text editor the codec emits very little information, whereas an action movie generates a much larger stream and consumes more bandwidth. Encoders typically have dynamic bandwidths that range from about 20 kb/s to 900 Mb/s. Compression has progressively improved to the point where a 4K signal, faithful to the original in color and definition and with continuous playback, can be transmitted at an average bandwidth of 500 Mb/s.
At the same time, network electronics have evolved towards Layer 3 or Layer 2+ switches with IGMP multicast functionality. In a unicast transfer the connection is point-to-point, so when the same signal is viewed on several end devices, the bandwidth occupied on the network is the bandwidth of one signal multiplied by the number of unicast connections. In multicast connections, by contrast, the source sends the signal to the network only once, without this bandwidth penalty, and the end devices that join the multicast group pick up the signal that is circulating on the network.
In conclusion, combining existing technologies in both encoders and network electronics makes it possible to transmit high-quality signals with low latency. Devices are parameterized to bring latency down to a few milliseconds while maintaining the balance between the definition of the signal and its continuous representation.
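The difference between the two modes can be put into numbers with a short calculation of the load on the encoder's uplink. The 500 Mb/s stream is the average 4K figure quoted earlier; the four receivers and the function name are assumptions for illustration.

```python
def source_uplink_load_mbps(stream_mbps, receivers, multicast):
    """Bandwidth leaving the encoder for a given number of receivers.

    Unicast: one copy of the stream per receiver.
    Multicast: a single copy, replicated by the switches.
    """
    if receivers < 1:
        raise ValueError("at least one receiver is required")
    return stream_mbps if multicast else stream_mbps * receivers


if __name__ == "__main__":
    stream = 500    # Mb/s, average 4K stream
    receivers = 4   # assumed number of operator stations watching the source

    print(source_uplink_load_mbps(stream, receivers, multicast=False))  # 2000
    print(source_uplink_load_mbps(stream, receivers, multicast=True))   # 500
```

With these example values, unicast would need 2000 Mb/s from the encoder, more than a 1 Gb/s link can carry, while multicast stays at 500 Mb/s regardless of how many stations are watching.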