Advanced Capture and Mixing

The ExtendedCapturePage demonstrates multi-source video mixing with compositing, overlays, visual effects, and live streaming. It extends the Simple Capture page with the ability to combine multiple video and audio inputs into a single mixed output.

Overview

The ExtendedCapturePage performs the following:

  1. Enumerates video devices, audio devices, and displays
  2. Creates a MixingSource that composites multiple inputs into a single output
  3. Supports several simultaneous input sources: camera, microphone, screen capture, video file, network stream, user-generated frames, text overlay, and logo overlay
  4. Arranges sources into layers with configurable positioning, z-order, and visual effects
  5. Previews the mixed output locally or the direct camera feed
  6. Streams the mixed output to a remote server (RTMP, RTSP, etc.)
  7. Dynamically adds and removes sources at runtime without stopping sessions

Input Sources

The mixing source accepts the following inputs, each toggled independently via checkboxes:

| Source | Type | Description |
| --- | --- | --- |
| Camera | IVideoCaptureSource2 | Hardware camera capture |
| Microphone | IAudioCaptureSource2 | Hardware microphone capture |
| Screen | IScreenCaptureSource | Display or window capture |
| Video file | IsoSource | MP4 file played in a loop |
| Network stream | IMediaSource | Remote stream (RTSP, RTMP, etc.) |
| User source | VirtualNetworkSource | Programmatically generated frames |
| Text overlay | Text content | Dynamic timestamp updated every second |
| Logo overlay | Image content | Static watermark image |

Creating the Mixing Source

this.mixingSource = new VAST.Image.Mixing.MixingSource();
this.mixingSource.AddRef();

this.mixingSource.Parameters.VideoDecoderParameters.PreferredMediaFramework = videoFramework;
this.mixingSource.Parameters.VideoDecoderParameters.AllowHardwareAcceleration = allowHardwareAcceleration;
this.mixingSource.Parameters.AudioDecoderParameters.PreferredMediaFramework = audioFramework;
this.mixingSource.Parameters.AudioDecoderParameters.AllowHardwareAcceleration = false;
this.mixingSource.Parameters.VideoEncoderParameters.PreferredMediaFramework = videoFramework;
this.mixingSource.Parameters.VideoEncoderParameters.AllowHardwareAcceleration = allowHardwareAcceleration;
this.mixingSource.Parameters.AudioEncoderParameters.PreferredMediaFramework = audioFramework;
this.mixingSource.Parameters.AudioEncoderParameters.AllowHardwareAcceleration = false;

MixingSource composites all input sources into a single output with video and audio tracks. AddRef() is called because the mixing source is shared between the preview and streaming sessions.

Decoder and encoder parameters are configured with the selected framework. The encoding framework options are the same as in the Simple Capture page.

Configuring the Descriptor

The mixing scene is defined by a Descriptor that specifies global options, input sources, output tracks, and scene composition:

VAST.Image.Mixing.Descriptor descriptor = new VAST.Image.Mixing.Descriptor
{
    AllowVideoProcessing = true,
    AllowAbsoluteTimestamps = true,
    Processing = new VAST.Image.Mixing.Processing
    {
        VideoProcessing = new VAST.Image.Mixing.VideoProcessing
        {
            Discard = !allowVideo,
        },
        AudioProcessing = new VAST.Image.Mixing.AudioProcessing
        {
            Discard = !allowAudio,
        }
    }
};

AllowVideoProcessing enables layer compositing and visual effects. AllowAbsoluteTimestamps enables absolute timestamp mode for accurate source synchronization. Discard controls whether video or audio output is generated.

The descriptor is then populated with output tracks, input sources, and scene composition before being applied via mixingSource.Update(descriptor).
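The overall flow can be sketched as follows. The helper methods populateTracks, populateSources, and populateScene are illustrative placeholders standing in for the track, source, and scene setup described in the sections below; only Update(descriptor) is taken from the sample itself:

this.populateTracks(descriptor);   // output tracks
this.populateSources(descriptor);  // input sources
this.populateScene(descriptor);    // layer composition

// A single Update call applies the complete configuration;
// it can be called again at runtime to change sources or layers.
this.mixingSource.Update(descriptor);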

Output Tracks

When streaming is active, the video track is configured for H.264 encoding:

var track = new VAST.Image.Mixing.VideoTrack
{
    Index = trackIndex,
    Width = outputWidth,
    Height = outputHeight,
    Framerate = outputFramerate,
    Codec = VAST.Common.Codec.H264,
    Bitrate = videoBitrate,
    KeyframeInterval = keyframeInterval,
    Profile = 66, // H.264 Baseline profile
    Level = 31,   // level 3.1
};

When only preview is active, the track uses uncompressed output to save encoding resources:

track.Codec = VAST.Common.Codec.Uncompressed;
track.PixelFormat = VAST.Common.PixelFormat.None;

Audio track configuration follows the same pattern — AAC for streaming, PCM for preview only. On Android, the audio track uses the capture device's native sample rate and channels because audio resampling is not available on that platform.
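A hedged sketch of the corresponding audio track, by analogy with the VideoTrack above — the AudioTrack property names (SampleRate, Channels) and the Tracks collection are assumptions, as the original sample code is not shown here:

// Sketch only: AAC while streaming, PCM when only previewing.
// SampleRate/Channels names are assumed by analogy with VideoTrack;
// on Android, use the capture device's native rate and channel count.
var audioTrack = new VAST.Image.Mixing.AudioTrack
{
    Index = audioTrackIndex,
    SampleRate = 48000,
    Channels = 2,
    Codec = isStreaming ? VAST.Common.Codec.AAC : VAST.Common.Codec.PCM,
};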

Adding Sources

Each source is added to the descriptor's Sources list as a Source. The source index in this list is used later for layer composition.

Camera and Microphone

this.activeVideoCaptureSource = VAST.Media.SourceFactory.CreateVideoCapture(
    this.videoDevice.DeviceId, this.videoCaptureMode);
this.activeVideoCaptureSource.Rotation = this.videoRotation;
this.activeVideoCaptureSource.AddRef();

this.videoCaptureSourceIndex = descriptor.Sources.Count;
descriptor.Sources.Add(new VAST.Image.Mixing.Source { MediaSource = this.activeVideoCaptureSource });

Capture sources are created via SourceFactory and added using the MediaSource property. AddRef() is called because capture sources are shared between the mixing source and the preview renderer.

Screen Capture

this.activeScreenCaptureSource = VAST.Media.SourceFactory.CreateScreenCapture();
this.activeScreenCaptureSource.DeviceId = this.screen.DeviceId;
this.activeScreenCaptureSource.Region = new VAST.Common.Rect {
    Left = this.screen.Location.Left, Top = this.screen.Location.Top,
    Right = this.screen.Location.Right, Bottom = this.screen.Location.Bottom };
this.activeScreenCaptureSource.ShowMouse = true;
this.activeScreenCaptureSource.AddRef();

Available displays are enumerated via DisplayHelper. The screen capture source captures the selected display area with the mouse cursor visible.

Video File

var fileSource = new VAST.File.ISO.IsoSource();
fileSource.Stream = stream;
fileSource.PlaybackRate = 1;
fileSource.Loop = true;
descriptor.Sources.Add(new VAST.Image.Mixing.Source { MediaSource = fileSource });

A video file (video.mp4) is loaded from the app package and played in an endless loop. On Android, the app package stream is not seekable, so it is copied into a MemoryStream first.
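The Android workaround can be sketched with plain .NET stream APIs (FileSystem.OpenAppPackageFileAsync is the standard MAUI call for reading packaged assets):

// The Android asset stream is forward-only; IsoSource needs random access,
// so buffer the file into a seekable MemoryStream first.
Stream stream = await FileSystem.OpenAppPackageFileAsync("video.mp4");
if (!stream.CanSeek)
{
    var buffered = new MemoryStream();
    await stream.CopyToAsync(buffered);
    buffered.Position = 0;
    stream = buffered;
}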

Network Stream

descriptor.Sources.Add(new VAST.Image.Mixing.Source { Uri = tboxStreamingSource.Text });

A remote stream is added by URI. The mixing source handles connection, decoding, and synchronization automatically.

User Source (Manual Frame Pushing)

this.userSource = new VAST.Network.VirtualNetworkSource();
this.userSource.AddRef();
this.userSource.AddStream(new VAST.Common.MediaType
{
    ContentType = VAST.Common.ContentType.Video,
    CodecId = VAST.Common.Codec.Uncompressed,
    PixelFormat = VAST.Common.PixelFormat.BGRA,
    Width = this.outputWidth,
    Height = this.outputHeight,
    Framerate = this.outputFramerate,
});

A VirtualNetworkSource is created for pushing user-generated frames. The demo generates an animated bouncing rectangle using SkiaSharp and pushes frames at 30 fps:

// Acquire a buffer and fill it with the rendered BGRA pixels
VAST.Common.VersatileBuffer vb = VAST.Media.MediaGlobal.LockBuffer(imageSize);
vb.Append(bmpImage.GetPixels(), imageSize);
vb.Pts = vb.Dts = currentVideoTimestamp; // presentation/decoding timestamps
vb.StreamIndex = 0; // matches the stream added via AddStream
this.userSource.PushMedia(vb);
vb.Release(); // release the local reference

Text Overlay

descriptor.Sources.Add(new VAST.Image.Mixing.Source
{
    Content = DateTime.Now.ToString(),
    Format = "text",
    HorizontalAlignment = VAST.Common.HorizontalAlignment.Left,
    VerticalAlignment = VAST.Common.VerticalAlignment.Top,
    Decoration = new VAST.Image.Mixing.Decoration
    {
        FontFamily = "Calibri",
        Size = 30,
        Bold = true,
        Italic = true,
        Color = "#FFFFFF00",
        OutlineColor = "#FF000000",
        OutlineWidth = 1,
    }
});

Text sources use the Content property with Format = "text". The Decoration object configures font, size, style, color, and outline. A background task updates the text content every second by re-calling updateSources.
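The once-per-second refresh could be implemented with a timer along these lines — a sketch assuming updateSources is the sample's method that rebuilds the source list and re-applies the descriptor:

// Sketch only: refresh the text overlay every second.
// MainThread.BeginInvokeOnMainThread is the standard MAUI UI dispatcher.
this.textTimer = new System.Threading.Timer(_ =>
{
    MainThread.BeginInvokeOnMainThread(() => this.updateSources());
}, null, dueTime: 1000, period: 1000);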

Logo Overlay

descriptor.Sources.Add(new VAST.Image.Mixing.Source
{
    Content = stream, // image stream
    Format = "image"
});

Image sources use the Content property with Format = "image" and a Stream containing the image data.

Scene Composition

After adding sources, the scene defines how they are composited into layers:

descriptor.Processing.VideoProcessing.Mixing = new VAST.Image.Mixing.VideoMixing
{
    Type = VAST.Image.Mixing.VideoMixingType.All,
    Layers = new List<VAST.Image.Mixing.Layer>()
};

VideoMixingType.All composites all layers in order. Each Layer specifies which source it renders, its position, stretch mode, and optional visual effects.

Layer Layout

| Layer | Position | Description |
| --- | --- | --- |
| Camera | Full frame | Primary background with brightness, contrast, and chroma key |
| Screen capture | Full frame | Display capture, z-order configurable relative to camera |
| Video file | Bottom-left quadrant | Picture-in-picture overlay |
| Network stream | Bottom-right quadrant | Picture-in-picture overlay |
| Text overlay | Bottom strip | Dynamic timestamp text |
| Logo overlay | Top-left corner | Static watermark (245x59 pixels) |
| User source | Full frame | Animated overlay with transparency |

Each layer uses LayoutType.Manual with explicit Location rectangles and StretchType.Preserve to maintain aspect ratio:

new VAST.Image.Mixing.Layer
{
    Sources = new List<int>(new int[] { this.videoFileSourceIndex }),
    Layout = VAST.Image.Mixing.LayoutType.Manual,
    Stretch = VAST.Image.Mixing.StretchType.Preserve,
    Location = new VAST.Common.Rect(50, this.outputHeight / 2 + 50,
        this.outputWidth / 2 - 50, this.outputHeight - 50)
}

Visual Effects

The camera layer supports brightness, contrast, and chroma key (green screen) effects:

new VAST.Image.Mixing.Layer
{
    Sources = new List<int>(new int[] { this.videoCaptureSourceIndex }),
    Layout = VAST.Image.Mixing.LayoutType.Manual,
    Stretch = VAST.Image.Mixing.StretchType.Preserve,
    Location = new VAST.Common.Rect(0, 0, this.outputWidth, this.outputHeight),
    BrightnessAdjustment = (float)this.sliderBrightness.Value,
    ContrastAdjustment = (float)this.sliderContrast.Value,
    ChromaKeyColor = chromaKeyColor,
    ChromaKeyThreshold = 0.07f + (float)this.sliderChromaKeyThreshold.Value / 200f,
    ChromaKeySmoothing = 0.05f + (float)this.sliderChromaKeySmoothing.Value / 200f,
}

These effects are adjustable at runtime via UI sliders. Changing a slider triggers updateScene which rebuilds the layer list and calls mixingSource.Update(descriptor) without recreating sources.
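The slider wiring can be sketched as follows, assuming updateScene is the sample's scene-rebuild method (ValueChanged is the standard MAUI Slider event):

// Sketch only: each effect slider triggers a scene-only update,
// which rebuilds the Layers list without recreating capture sources.
this.sliderBrightness.ValueChanged += (s, e) => this.updateScene();
this.sliderContrast.ValueChanged += (s, e) => this.updateScene();
this.sliderChromaKeyThreshold.ValueChanged += (s, e) => this.updateScene();
this.sliderChromaKeySmoothing.ValueChanged += (s, e) => this.updateScene();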

Audio Mixing

Audio uses a single-source mode that passes through the microphone input:

descriptor.Processing.AudioProcessing.Mixing = new VAST.Image.Mixing.AudioMixing
{
    Type = VAST.Image.Mixing.AudioMixingType.Single,
    SourceIndex = this.audioCaptureSourceIndex,
};

Local Preview

this.previewSession = new VAST.Media.MediaSession();
this.createMixingSource();
this.previewSession.AddSource(this.mixingSource);
this.previewSession.Start();

The preview session uses MediaSession with the mixing source. The preview renderer can display either the direct camera feed (lower latency, no effects) or the mixed output (all layers and effects visible):

if (this.pickerRenderingSource.SelectedIndex == 0)
{
    // Direct camera preview
    this.activeVideoCaptureSource.Renderer = this.videoPreview.Renderer;
}
else
{
    // Mixed output preview
    this.mixingSource.Renderer = this.videoPreview.Renderer;
}

Streaming

this.streamingSession = new VAST.Media.MediaSession();
this.createMixingSource();
this.streamingSession.AddSource(this.mixingSource);

VAST.Media.IMediaSink sink = VAST.Media.SinkFactory.Create(tboxServerUri.Text);
sink.Uri = tboxServerUri.Text;
this.streamingSession.AddSink(sink);
this.streamingSession.Start();

The streaming session connects the same mixing source to a network sink. When streaming starts, the mixing source output switches from uncompressed to H.264/AAC encoded. When streaming stops while preview remains active, the output switches back to uncompressed to save encoding resources.

Dynamic Source Updates

Sources can be added or removed at runtime via checkbox toggles without stopping sessions. Each change triggers updateSources which rebuilds the complete source list and scene composition, then applies it via mixingSource.Update(descriptor).

Scene-only changes (brightness, contrast, chroma key sliders) trigger updateScene which rebuilds only the layer composition without recreating sources.

Resource Management

The mixing source uses reference counting to coordinate between preview and streaming sessions. Both sessions call AddSource(mixingSource), incrementing the reference count. When a session is disposed, the cleanup() method checks mixingSource.RefCount — resources are released only when the last session is stopped:

if (this.mixingSource.RefCount > 1)
{
    // one or more media sessions are still active
    return;
}

this.mixingSource.Release();

Send Log

The page includes a Send Log button that uploads the application log file to VASTreaming support for diagnostics:

await VAST.Common.License.SendLog("MAUI extended capture issue");

SendLog sends the current log file to the support server. A valid license key must be configured for this feature to work.

See Also