Terminology
This glossary explains the concepts, terms, and abbreviations used throughout the VASTreaming documentation. Understanding these terms will help you work more effectively with the VASTreaming libraries.
Audio and Video Fundamentals
Audio renderer is a component responsible for playing audio samples through speakers or other audio output devices. VASTreaming provides audio renderers that handle sample format conversion, resampling, buffering, and synchronization with video playback.
Bitrate is the amount of data processed per unit of time, typically measured in bits per second (bps), kilobits per second (kbps), or megabits per second (Mbps). Higher bitrates generally result in better quality but require more bandwidth and storage.
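As a quick illustration of the units involved (plain arithmetic, not a VASTreaming API), the size of a recording follows directly from bitrate and duration:

```csharp
// Size estimate: bitrate (bits per second) x duration (seconds) / 8 = bytes.
// Example: a 5 Mbps stream recorded for 60 seconds.
const double bitrateBps = 5_000_000;   // 5 Mbps
const double durationSeconds = 60;

double sizeBytes = bitrateBps * durationSeconds / 8;        // 37,500,000 bytes
Console.WriteLine($"~{sizeBytes / (1024 * 1024):F1} MiB");  // ~35.8 MiB
```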
Codec is a portmanteau of "coder-decoder" or "compressor-decompressor." A codec is software or hardware that compresses and/or decompresses digital media. Examples include H.264 (video) and AAC (audio).
Container (also called wrapper or format) is a file format that can contain multiple types of data compressed using different codecs. Examples include MP4, Transport Stream (TS), and Matroska (MKV).
Elementary stream is a stream containing a single type of data (video, audio, or metadata) encoded with a specific codec and transferred at a consistent bitrate.
Frame types (I, P, B) are the three types of frames used in video compression. I-frames (Intra-coded frames) are complete images that can be decoded independently. P-frames (Predicted frames) contain only differences from previous frames. B-frames (Bidirectionally predicted frames) reference both earlier and later frames for maximum compression.
GOP (Group of Pictures) is a sequence of video frames starting with an I-frame (keyframe) followed by P-frames and optionally B-frames. GOP size affects both compression efficiency and random access capability—larger GOPs provide better compression but make seeking less precise.
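For example, the spacing of seek points follows directly from the GOP size and frame rate (illustrative values, not a VASTreaming API):

```csharp
// A common closed-GOP pattern at 30 fps: I B B P B B P ... repeated every 30 frames.
const int gopSize = 30;        // frames per GOP
const double frameRate = 30.0; // frames per second

// Keyframe interval (seconds) = GOP size / frame rate.
double keyframeIntervalSeconds = gopSize / frameRate;  // 1.0 s between seek points
Console.WriteLine($"Keyframe every {keyframeIntervalSeconds} s");
```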
Keyframe is a video frame that contains a complete image and can be decoded without reference to other frames. Also known as an I-frame; in H.264/H.265, an IDR (Instantaneous Decoder Refresh) frame is a keyframe that additionally prevents later frames from referencing anything that precedes it. Keyframes are essential for seeking and stream switching.
Latency is the delay between when media is captured and when it is displayed to the viewer. Low latency is critical for real-time applications like video conferencing and live broadcasting.
Muxing (multiplexing) is the process of combining multiple streams (video, audio, subtitles) into a single container format. Demuxing is the reverse process of extracting individual streams from a container.
Pixel format describes how pixel color data is stored in memory for uncompressed video frames. Common formats include planar YUV (I420, NV12), packed YUV (YUY2, UYVY), and RGB variants (RGB24, BGRA32). The choice of pixel format affects memory layout, processing efficiency, and compatibility with hardware accelerators.
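As an illustration of how the pixel format dictates memory layout, the sketch below computes the buffer size of a 1920x1080 frame in two common 4:2:0 formats (plain arithmetic, not a VASTreaming API):

```csharp
// In 4:2:0 formats the chroma planes are subsampled 2x horizontally and vertically.
const int width = 1920, height = 1080;

// I420 (planar): full-resolution Y plane plus separate quarter-resolution U and V planes.
int i420Size = width * height                  // Y
             + (width / 2) * (height / 2)      // U
             + (width / 2) * (height / 2);     // V  => 3,110,400 bytes

// NV12 (semi-planar): full-resolution Y plane plus one interleaved UV plane.
int nv12Size = width * height                  // Y
             + width * (height / 2);           // interleaved UV => same total size

Console.WriteLine($"I420: {i420Size} bytes, NV12: {nv12Size} bytes");
```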
Sample in media processing refers to a discrete unit of media data. For audio, a sample is a single measurement of the audio signal. For video, a sample typically refers to a single frame. In VASTreaming, samples are delivered through the NewSample event.
Sample format describes how individual audio samples are represented in memory. Common formats include 16-bit signed integer (S16), 32-bit floating point (F32), and 24-bit packed integer. The sample format determines dynamic range, precision, and compatibility with audio processing pipelines.
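Converting between sample formats is a common step in audio pipelines; below is a minimal sketch converting 16-bit signed PCM to 32-bit float (generic code, not a VASTreaming API):

```csharp
// Convert interleaved 16-bit signed PCM samples to 32-bit float in the range [-1.0, 1.0].
static float[] S16ToF32(short[] pcm)
{
    var output = new float[pcm.Length];
    for (int i = 0; i < pcm.Length; i++)
        output[i] = pcm[i] / 32768f;   // scale by 2^15
    return output;
}
```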
Timescale is the number of time units per second used to express timestamps and durations in media containers. For example, a timescale of 90000 (common in MPEG) means each time unit represents 1/90000th of a second. Timescales enable precise timing without floating-point arithmetic.
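For example, converting between seconds and a timescale of 90000 uses only integer arithmetic:

```csharp
// Express 2.5 seconds in a 90 kHz timescale and convert back.
const long timescale = 90000;             // time units per second (common for MPEG video)
long pts = (long)(2.5 * timescale);       // 225000 units
double seconds = (double)pts / timescale; // 2.5 s

// One frame at 30 fps expressed in the same timescale:
long frameDuration = timescale / 30;      // 3000 units per frame
```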
Transcoding is the process of converting media from one codec or format to another. This typically involves decoding the source media and re-encoding it with different parameters or a different codec.
Video renderer is a component responsible for displaying video frames on screen. VASTreaming provides video renderers for various platforms and UI frameworks, handling color space conversion, scaling, and synchronization with the display refresh rate.
Video Codecs
H.264 (ISO/IEC 14496-10, ITU-T Rec. H.264, MPEG-4 Part 10 AVC) is a widely used, block-oriented, motion-compensation-based video compression standard. It offers excellent compression efficiency and is supported by virtually all devices and platforms.
H.265 (ISO/IEC 23008-2, ITU-T Rec. H.265, HEVC) or High Efficiency Video Coding is a video compression standard that provides approximately double the compression ratio of H.264 at equivalent quality. It is increasingly used for 4K and higher resolution content.
MJPEG (Motion JPEG) is a video compression format where each frame is independently compressed as a JPEG image. While less efficient than inter-frame codecs like H.264, MJPEG is simple to implement and allows easy frame-by-frame editing.
MPEG-1 (ISO/IEC 11172-2) is the first video compression standard in the MPEG family. It was designed for Video CD and early digital video applications, providing reasonable quality at bitrates around 1.5 Mbps.
MPEG-2 (ISO/IEC 13818-2, ITU-T Rec. H.262) is a video compression standard widely used in DVD, digital television broadcasting (DVB, ATSC), and transport streams. It supports interlaced video and higher bitrates than MPEG-1.
MPEG-4 Part 2 (ISO/IEC 14496-2) is a video compression standard that improved upon MPEG-2 with better compression efficiency. It includes profiles like Simple Profile and Advanced Simple Profile (ASP), used in early web video and DivX/Xvid codecs.
Audio Codecs
AAC (ISO/IEC 14496-3, MPEG-4 Part 3 Audio) stands for Advanced Audio Coding, a lossy digital audio compression standard. AAC provides better sound quality than MP3 at the same bitrate and is widely used in streaming applications.
AMR (Adaptive Multi-Rate) is a speech audio codec optimized for voice encoding. AMR-NB (Narrowband) operates at 8 kHz sample rate for telephony, while AMR-WB (Wideband) operates at 16 kHz for improved voice quality. Both are widely used in mobile networks and VoIP.
G.711 is an ITU-T standard for audio companding used in telephony. It defines two encoding laws: μ-law (used in North America and Japan) and A-law (used in Europe and most other regions). G.711 provides toll-quality voice at 64 kbps and is commonly used in VoIP and IP cameras.
MP1/MP2/MP3 (ISO/IEC 11172-3) are MPEG audio compression formats. MP1 (MPEG-1 Audio Layer I), MP2 (MPEG-1 Audio Layer II), and MP3 (MPEG-1 Audio Layer III) use lossy compression to significantly reduce file sizes while maintaining acceptable audio quality.
Opus is an open, royalty-free audio codec designed for interactive speech and music transmission over the Internet. It offers low latency and excellent quality across a wide range of bitrates, making it ideal for WebRTC applications.
PCM (Pulse Code Modulation) is uncompressed digital audio where analog signals are sampled at regular intervals and quantized to the nearest value. PCM is used as an intermediate format for processing and as a final format when quality is paramount.
Streaming Protocols
HLS (RFC 8216) stands for HTTP Live Streaming, a media streaming protocol developed by Apple. It segments media into small files delivered over standard HTTP, making it firewall-friendly and easy to deploy with CDNs.
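A minimal media playlist illustrates the segmented delivery (segment names and durations below are examples only):

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:6.0,
segment0.ts
#EXTINF:6.0,
segment1.ts
#EXT-X-ENDLIST
```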
LL-HLS (Low-Latency HLS) extends HLS to enable low-latency streaming while maintaining compatibility. It reduces latency from the typical 20-30 seconds to 2-5 seconds through partial segment delivery and playlist delta updates.
MPEG-DASH (Dynamic Adaptive Streaming over HTTP) is an adaptive bitrate streaming standard that enables high-quality streaming from conventional HTTP web servers. Like HLS, it uses segmented delivery but is codec-agnostic.
NDI (Network Device Interface) is a royalty-free standard developed by NewTek enabling video-compatible products to communicate, deliver, and receive high-quality video over a local network with low latency.
RTP (Real-time Transport Protocol) is a network protocol for delivering audio and video over IP networks. It handles timing reconstruction, loss detection, and payload identification for real-time media.
RTMP (Real Time Messaging Protocol) is a streaming protocol developed by Adobe. It maintains persistent TCP connections for low-latency communication and is commonly used for ingesting live streams to servers.
RTSP (RFC 2326) stands for Real Time Streaming Protocol. It is used for establishing and controlling media sessions between endpoints, typically paired with RTP for actual media delivery. Because the media itself is carried over RTP, sessions can use a wide variety of codecs; RTSP is commonly used by IP cameras.
SDP (Session Description Protocol) is a format for describing multimedia communication sessions. It specifies media types, codecs, transport addresses, and other parameters needed to establish streaming sessions.
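For example, a minimal SDP describing one H.264 video stream and one G.711 audio stream could look like this (addresses, ports, and the dynamic payload type are illustrative):

```
v=0
o=- 0 0 IN IP4 192.0.2.10
s=Camera stream
c=IN IP4 192.0.2.10
t=0 0
m=video 5004 RTP/AVP 96
a=rtpmap:96 H264/90000
m=audio 5006 RTP/AVP 0
a=rtpmap:0 PCMU/8000
```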
SRT (Secure Reliable Transport) is an open-source protocol optimized for low-latency live video streaming. It provides encryption, packet-loss recovery through retransmission, and congestion control, making it suitable for contribution feeds over unpredictable networks.
WebRTC (Web Real-Time Communication) is a standard enabling real-time, peer-to-peer communication directly in web browsers and mobile applications. It provides ultra-low latency (typically under 500 ms) and is used for video conferencing, live streaming, and interactive applications.
WebSocket is a communication protocol providing full-duplex communication channels over a single TCP connection. In streaming contexts, it is often used for signaling and control messages alongside media protocols.
WebTransport is a modern web API for bidirectional communication between a client and server using HTTP/3 (QUIC). It offers lower latency than WebSocket while supporting both reliable and unreliable data delivery, making it suitable for real-time media applications.
Container Formats
MP4 (ISO/IEC 14496-14) is a digital multimedia container format commonly used to store video, audio, subtitles, and metadata. It is based on the ISO Base Media File Format and is widely supported across devices and platforms.
Transport Stream (TS, MPEG-TS) is a container format specified in MPEG-2 Part 1 (ISO/IEC 13818-1, ITU-T Rec. H.222.0). It is designed for transmission over unreliable media and is used in broadcast television, IPTV, HLS, and other streaming applications.
Network and Infrastructure
ICE (Interactive Connectivity Establishment, RFC 8445) is a framework for NAT traversal used by WebRTC and other peer-to-peer protocols. ICE coordinates STUN and TURN servers to find the best connection path between peers, even when they are behind firewalls or NAT devices.
Multicast is a network communication method where data is sent from one source to multiple destinations simultaneously. Unlike unicast (one-to-one), multicast reduces bandwidth usage when delivering the same content to many receivers.
NTP (Network Time Protocol) is a networking protocol for clock synchronization between computer systems. Accurate time synchronization is essential for media streaming to ensure proper playback timing.
ONVIF (Open Network Video Interface Forum) is a global standard for interoperability of IP-based physical security products. VASTreaming supports ONVIF for IP camera discovery, configuration, and PTZ control.
PTZ (Pan-Tilt-Zoom) refers to motorized camera control capabilities. Pan rotates the camera horizontally, tilt adjusts vertical angle, and zoom changes the focal length. PTZ cameras are commonly controlled via ONVIF or proprietary protocols in surveillance and broadcast applications.
STUN (Session Traversal Utilities for NAT, RFC 5389) is a protocol that allows clients behind NAT to discover their public IP address and port mapping. It is used in WebRTC and VoIP to enable direct peer-to-peer connections when possible.
TURN (Traversal Using Relays around NAT, RFC 5766) is a protocol that relays media traffic through a server when direct peer-to-peer connection is not possible. TURN servers act as intermediaries, ensuring connectivity even in restrictive network environments at the cost of additional latency and bandwidth.
Publishing point is a virtual endpoint on a media server that serves content to clients. When a client connects to a publishing point, the server manages the connection and streams the associated content.
Unicast is point-to-point communication where data is sent from one sender to one receiver. Most Internet streaming uses unicast delivery.
URI (Uniform Resource Identifier) is a string that identifies a resource. In streaming, URIs specify the protocol and location of media sources (e.g., rtsp://camera/stream).
VOD (Video On Demand) refers to accessing pre-recorded content that can be played, paused, and seeked at any point, in contrast to live streaming.
VASTreaming Concepts
Media descriptor is a collection of MediaType objects describing all streams (video, audio, metadata) of a particular source, sink, or publishing point.
Sink in VASTreaming refers to any destination that consumes media data. Sinks implement the IMediaSink interface and include file writers, network publishers, and rendering surfaces.
Source in VASTreaming refers to any entity that produces media data. Sources implement the IMediaSource interface and include network streams, file readers, and capture devices.
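To make the source/sink model concrete, the sketch below defines minimal illustrative shapes mirroring the concepts above; these are simplified stand-ins for demonstration, not the actual VASTreaming type definitions (consult the API reference for the real interfaces):

```csharp
// Simplified stand-ins for the concepts described above; the real IMediaSource,
// IMediaSink, and MediaType definitions in VASTreaming are richer than this.
public sealed record MediaSample(byte[] Data, long Timestamp);

public interface IMediaSource
{
    event Action<MediaSample> NewSample; // samples are pushed to consumers via this event
    void Start();
}

public interface IMediaSink
{
    void Write(MediaSample sample);
}

// Wiring a source to a sink: every sample the source produces is forwarded to the sink.
public static class Pipeline
{
    public static void Connect(IMediaSource source, IMediaSink sink)
    {
        source.NewSample += sink.Write;
        source.Start();
    }
}
```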
Software and Frameworks
ASIO (Audio Stream Input/Output) is a low-latency audio driver protocol developed by Steinberg. ASIO bypasses the operating system's audio mixing layer to provide direct access to audio hardware, achieving latencies as low as 1-2 milliseconds for professional audio applications.
CUDA (Compute Unified Device Architecture) is NVIDIA's parallel computing platform and API for GPU acceleration. VASTreaming leverages CUDA for hardware-accelerated video encoding (NVENC) and decoding (NVDEC) on NVIDIA graphics cards, significantly improving performance for high-resolution and multi-stream scenarios.
Direct3D is Microsoft's graphics API for rendering 3D graphics on Windows. Part of the DirectX family, Direct3D provides hardware-accelerated video rendering and is used by VASTreaming for efficient video display and GPU-based video processing on Windows platforms.
DirectShow is a multimedia framework and API developed by Microsoft for handling media files and streams in Windows applications. While largely superseded by Media Foundation, it remains supported for compatibility.
FFmpeg is a free, open-source project providing libraries and tools for handling multimedia data. It supports encoding, decoding, transcoding, streaming, and filtering of virtually all media formats.
.NET MAUI (Multi-platform App UI) is Microsoft's cross-platform framework for building native mobile and desktop applications using C# and XAML. It is the evolution of Xamarin.Forms and supports Android, iOS, macOS, and Windows from a single codebase.
Media Foundation is Microsoft's modern multimedia framework for Windows (Vista and later, including Windows 11). It provides a pipeline architecture for media processing and supports hardware acceleration.
Metal is Apple's low-level graphics and compute API for iOS, macOS, and tvOS. It provides near-direct access to the GPU for high-performance rendering and parallel computation, and is used by VASTreaming for hardware-accelerated video processing on Apple platforms.
OBS (Open Broadcaster Software) is free, open-source software for video recording and live streaming. It is commonly used to capture and stream content to platforms via RTMP.
OpenGL (Open Graphics Library) is a cross-platform API for rendering 2D and 3D graphics. In media applications, it is used for video rendering and hardware-accelerated processing.
WASAPI (Windows Audio Session API) is Microsoft's low-level audio interface for Windows. It provides applications with direct access to audio devices for both playback and capture.
WPF (Windows Presentation Foundation) is Microsoft's UI framework for Windows desktop applications. VASTreaming provides integration components for displaying video in WPF applications.