The previous three posts describe the properties of the transport stream being interpreted. The first thing to note is that there is no problem with the MPEG transport stream coming from the capture device; if a player properly interprets the presentation times, the video will play correctly. However, very few tools I use seem to properly handle these timestamps.
To “fix” the video, I’ll need to alter it in such a way that the resulting stream will play correctly when every frame is played, and the frames are played at a fixed rate, rather than by timestamp. Remember, timestamps mean nothing to the software in question, so the only adjustments that matter are adding or dropping whole frames. Given that, let’s take a look at fixing the stream.
The First Problem - Start Time
To start in sync, the audio and video streams should start with frames from (close to) the same presentation time. The tool can either synthesize frames to extend the shorter stream, or drop frames to trim the longer stream. Deleting is always easier than creating, so I’ll go for that. An elementary (i.e. audio or video) packet filter is perfect for this.
using MattBlagden.Mpeg.ElementaryStream;
using MattBlagden.Mpeg.TransportStream;

namespace MattBlagden.FrameSniper
{
    // Drops every elementary packet presented before a minimum time.
    internal sealed class TimeLimiter : ElementaryPacketFilter
    {
        private readonly Timestamp minimumTime;

        public TimeLimiter(ITransportPacketReader transportPacketReader,
            short transportStreamId, Timestamp minimumTime)
            : base(transportPacketReader, transportStreamId)
        {
            this.minimumTime = minimumTime;
        }

        // Returning true keeps the packet; returning false drops it.
        protected override bool FilterElementaryPacket(byte elementaryStreamId,
            Timestamp presentationTime, byte[] content)
        {
            return presentationTime >= this.minimumTime;
        }
    }
}
Using this, any packets before a certain time will be dropped. So long as the specified time is at or after the beginning of both streams, both streams will begin with frames from (very close to) the same time.
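To make the trimming concrete, here is a minimal, self-contained sketch. It is not part of the tool: it reduces frames to bare presentation times and uses TimeSpan in place of the Timestamp type, and the names StartTimeSketch, CommonStart and Trim are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the real filters: each frame is reduced to its
// presentation time, expressed as a TimeSpan rather than the tool's Timestamp.
public static class StartTimeSketch
{
    // The common start time is the later of the two streams' first frames.
    public static TimeSpan CommonStart(IEnumerable<TimeSpan> video,
        IEnumerable<TimeSpan> audio)
    {
        TimeSpan videoStart = video.First();
        TimeSpan audioStart = audio.First();
        return videoStart > audioStart ? videoStart : audioStart;
    }

    // Same predicate as TimeLimiter: keep a frame only if it is presented
    // at or after the minimum time.
    public static List<TimeSpan> Trim(IEnumerable<TimeSpan> frames,
        TimeSpan minimumTime)
    {
        return frames.Where(time => time >= minimumTime).ToList();
    }
}
```

With video frames at 0, 33, 67 and 100 ms and audio frames at 48, 72 and 96 ms, the common start is 48 ms. Trimming leaves the video starting at 67 ms and the audio untouched, so the two streams begin within one video frame of each other.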
The Second Problem - Frame Rate
In the video file from the previous post, the video source generated frames slightly too quickly, so the recording contains slightly more frames than its nominal frame rate allows. This problem can also be fixed by dropping frames. This filter counts frames and calculates when each would be displayed if every frame were shown sequentially at a fixed rate. If that “sequential” display time runs later than the correct display time by at least the configured tolerance, the frame is dropped.
The other consideration for video is which type of frame to drop. Not all frames are equal; way back in the second post of this series, I showed that some frames (B-frames and P-frames) depend on other frames (P-frames and I-frames). To prevent any video corruption, only frames with no dependents should be dropped. The tool assumes the video has a fixed pattern of frame types (a Group of Pictures, or GoP), so this filter also accepts the index of a position in the GoP that’s acceptable to drop.
using MattBlagden.Mpeg.ElementaryStream;
using MattBlagden.Mpeg.TransportStream;
using MattBlagden.Mpeg.Video;

namespace MattBlagden.FrameSniper
{
    // Drops one designated frame of a GoP whenever fixed-rate playback
    // has fallen too far behind the presentation timestamps.
    internal sealed class VideoFrameRateLimiter : ElementaryPacketFilter
    {
        // 1,000 seconds expressed in milliseconds; pairs with the
        // frames-per-kilosecond rate to keep the arithmetic in integers.
        private const int MillisecondsPerKilosecond = 1000000;

        private readonly Timestamp startTime;
        private readonly short sequenceNumberToDrop;
        private readonly int maximumSkew;
        private readonly int framesPerKilosecond;
        private long frameCount = 0;

        public VideoFrameRateLimiter(
            ITransportPacketReader transportPacketReader,
            short transportStreamId, Timestamp startTime,
            short sequenceNumberToDrop, int framesPerKilosecond,
            int maximumSkew)
            : base(transportPacketReader, transportStreamId)
        {
            this.startTime = startTime;
            this.sequenceNumberToDrop = sequenceNumberToDrop;
            this.framesPerKilosecond = framesPerKilosecond;
            this.maximumSkew = maximumSkew;
        }

        protected override bool FilterElementaryPacket(byte elementaryStreamId,
            Timestamp presentationTime, byte[] content)
        {
            VideoFrameProperties frameProperties =
                VideoFrame.GetFrameProperties(content);

            // When this frame would appear if every kept frame so far
            // were played back-to-back at the fixed rate.
            long sequentialPresentationTime =
                (frameCount * MillisecondsPerKilosecond) /
                this.framesPerKilosecond;

            // When the timestamp says this frame should appear.
            long correctPresentationTime =
                (long)(presentationTime - startTime).TotalMilliseconds;

            long skew = sequentialPresentationTime - correctPresentationTime;

            // Drop only the designated GoP position, and only when the
            // video is late by at least the tolerance.
            if (skew >= maximumSkew &&
                frameProperties.SequenceNumber == this.sequenceNumberToDrop)
            {
                return false;
            }

            // Only frames that are kept advance the sequential clock.
            frameCount++;
            return true;
        }
    }
}
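To get a feel for how often this filter fires, the sketch below replays the same skew test against synthetic timestamps. It is only an illustration: the 30.000 frames/second source rate is an assumed figure rather than a measurement from the sample video, and VideoSkewSketch and CountDropped are hypothetical names.

```csharp
using System;

// Illustrative simulation only. The source rate passed in by the caller is
// an assumption; it does not come from the sample video.
public static class VideoSkewSketch
{
    private const long MillisecondsPerKilosecond = 1000000;

    // Counts how many source frames would be dropped by the same test the
    // filter applies: skew at or beyond the tolerance, at the droppable
    // position of the GoP.
    public static int CountDropped(int sourceFrames,
        int sourceFramesPerKilosecond, int nominalFramesPerKilosecond,
        int maximumSkewMs, int gopLength, int droppableIndex)
    {
        long keptFrames = 0;
        int dropped = 0;
        for (int i = 0; i < sourceFrames; i++)
        {
            // When this frame would appear if the kept frames were played
            // back-to-back at the nominal rate.
            long sequentialMs = keptFrames * MillisecondsPerKilosecond /
                nominalFramesPerKilosecond;
            // When the (synthetic) capture timestamp says it should appear.
            long correctMs = i * MillisecondsPerKilosecond /
                sourceFramesPerKilosecond;
            long skew = sequentialMs - correctMs;

            if (skew >= maximumSkewMs && i % gopLength == droppableIndex)
            {
                dropped++;
                continue;
            }
            keptFrames++;
        }
        return dropped;
    }
}
```

Calling CountDropped(30000, 30000, 29970, 50, 12, 10) simulates 1,000 seconds of a source running 0.1% fast, and the filter ends up discarding on the order of one frame in every thousand. When the source rate matches the nominal rate, the skew never grows and nothing is dropped.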
Although it wasn’t a problem in the sample video, it’s possible that audio frames exhibit similar behavior and may need to be dropped to maintain synchronization. A similar filter can selectively drop audio frames. Audio frames are independent, so any one can be dropped when out of sync.
using MattBlagden.Mpeg.Audio;
using MattBlagden.Mpeg.ElementaryStream;
using MattBlagden.Mpeg.TransportStream;

namespace MattBlagden.FrameSniper
{
    // Drops any audio frame that arrives while fixed-rate playback is
    // running later than the presentation timestamps allow.
    internal sealed class AudioFrameRateLimiter : ElementaryPacketFilter
    {
        private const int SamplesPerFrame = 1152;
        private const int MillisecondsPerSecond = 1000;

        private readonly Timestamp startTime;
        private readonly int maximumSkew;
        private int frameCount = 0;

        public AudioFrameRateLimiter(
            ITransportPacketReader transportPacketReader,
            short transportStreamId, Timestamp startTime, int maximumSkew)
            : base(transportPacketReader, transportStreamId)
        {
            this.startTime = startTime;
            this.maximumSkew = maximumSkew;
        }

        protected override bool FilterElementaryPacket(byte elementaryStreamId,
            Timestamp presentationTime, byte[] content)
        {
            AudioProperties audioProperties =
                AudioFrame.GetAudioProperties(content);

            // Multiply before dividing so integer truncation happens once,
            // not once per frame.
            long sequentialPresentationTime = (long)frameCount * SamplesPerFrame *
                MillisecondsPerSecond / audioProperties.SamplesPerSecond;

            long correctPresentationTime =
                (long)(presentationTime - startTime).TotalMilliseconds;

            long skew = sequentialPresentationTime - correctPresentationTime;

            // Audio frames are independent, so any late frame can go.
            if (skew > maximumSkew)
            {
                return false;
            }

            frameCount++;
            return true;
        }
    }
}
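The arithmetic in the audio filter is easy to check in isolation. The following sketch is an illustration only: the class and method names are made up, and NaiveMilliseconds is a deliberately worse variant included for comparison. With 1152 samples per frame, one frame of 48 kHz audio lasts exactly 24 ms, while 44.1 kHz audio does not divide evenly, which is where the order of operations starts to matter.

```csharp
using System;

// Hypothetical helper that isolates the audio timing arithmetic.
public static class AudioTimingSketch
{
    private const int SamplesPerFrame = 1152;
    private const int MillisecondsPerSecond = 1000;

    // Same order of operations as the filter: multiply everything first,
    // then divide once.
    public static long SequentialMilliseconds(int frameCount,
        int samplesPerSecond)
    {
        return (long)frameCount * SamplesPerFrame * MillisecondsPerSecond /
            samplesPerSecond;
    }

    // Deliberately naive variant for comparison: it truncates the duration
    // of a single frame and then multiplies, so the error grows per frame.
    public static long NaiveMilliseconds(int frameCount, int samplesPerSecond)
    {
        long millisecondsPerFrame =
            SamplesPerFrame * MillisecondsPerSecond / samplesPerSecond;
        return frameCount * millisecondsPerFrame;
    }
}
```

At 44.1 kHz a frame lasts about 26.12 ms. After 1,000 frames the multiply-first version gives 26,122 ms, while the naive version gives 26,000 ms, an error of more than 100 ms that would be enough to trigger spurious drops.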
Putting It All Together
The tool has a section of argument parsing and validation, followed by some DelegateElementaryPacketFilters to gather audio/video properties, just like in the previous blog post. There is one new DelegateElementaryPacketFilter to gather the GoP pattern:
Dictionary<short, VideoFrameType> groupOfPicturesFrameTypes =
    new Dictionary<short, VideoFrameType>();
transportPacketReader = new DelegateElementaryPacketFilter(
    transportPacketReader, videoTransportStreamId,
    (transportStreamId, presentationTime, content) =>
    {
        VideoFrameProperties frameProperties =
            VideoFrame.GetFrameProperties(content);
        short sequenceNumber = frameProperties.SequenceNumber;
        if (groupOfPicturesFrameTypes.ContainsKey(sequenceNumber))
        {
            doneGroupOfPictures = true;
        }
        groupOfPicturesFrameTypes[sequenceNumber] = frameProperties.FrameType;
        return true;
    });
After this comes the core functionality of the tool: dropping packets.
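// Common start time: the later of the two stream start times.
Timestamp startTime = videoStartTime.Value >= audioStartTime.Value ?
    videoStartTime.Value : audioStartTime.Value;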
short sequenceNumberToDrop = groupOfPicturesFrameTypes
    .Where(x => x.Value == VideoFrameType.BidirectionallyPredicted)
    .Max(x => x.Key);

Console.WriteLine("Video:");
Console.WriteLine(" Dimensions: {0}x{1}",
    videoProperties.Width, videoProperties.Height);
Console.WriteLine(" Aspect Ratio: {0}", videoProperties.AspectRatio);
Console.WriteLine(" Frame Rate: {0}.{1} frames/second",
    videoProperties.FramesPerKilosecond / 1000,
    videoProperties.FramesPerKilosecond % 1000);
Console.WriteLine();
Console.WriteLine("Audio:");
Console.WriteLine(" Channel Mode: {0}", audioProperties.ChannelMode);
Console.WriteLine(" Sample Rate: {0}.{1} KHz",
    audioProperties.SamplesPerSecond / 1000,
    audioProperties.SamplesPerSecond % 1000);
Console.WriteLine();
Console.WriteLine("Dropping:");
Console.WriteLine(" All frames before {0}",
    startTime.ToTimeSpan().ToString(@"hh\:mm\:ss\.fff"));
Console.Write(" Frame #{0} of GOP (", sequenceNumberToDrop);
groupOfPicturesFrameTypes.OrderBy(frame => frame.Key).ToList()
    .ForEach(frame => Console.Write(frame.Value.ToString().Substring(0, 1)));
Console.WriteLine(") when more than {0}ms out of sync", skewTolerance);

input.Position = 0;
transportPacketReader = new ContinuityCounterVerifier(transportStreamReader);
transportPacketReader = new TimeLimiter(transportPacketReader,
    videoTransportStreamId, startTime);
transportPacketReader = new TimeLimiter(transportPacketReader,
    audioTransportStreamId, startTime);
transportPacketReader = new VideoFrameRateLimiter(transportPacketReader,
    videoTransportStreamId, startTime, sequenceNumberToDrop,
    videoProperties.FramesPerKilosecond, skewTolerance);
transportPacketReader = new AudioFrameRateLimiter(transportPacketReader,
    audioTransportStreamId, startTime, skewTolerance);
transportPacketReader = new ContinuityCounterAssigner(transportPacketReader);

TransportStreamWriter transportStreamWriter = new TransportStreamWriter(output);
while (transportPacketReader.TryReadPacket(out transportPacket))
{
    transportStreamWriter.Write(transportPacket);
}
The code starts off by selecting a common start time, which is simply the later of the audio start time and video start time. Next, the group-of-pictures pattern is inspected to select the most appropriate part to drop. The selected frame is the last B-frame of the GoP. A B-frame is chosen as it is not a dependency for any other frames.
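The B-frame selection can be tried on its own with a GoP written as a string of frame-type letters. This is a simplified sketch: the real tool works from a dictionary of VideoFrameType values, and GopSketch and LastBFrameIndex are hypothetical names.

```csharp
using System;
using System.Linq;

// Hypothetical helper: the GoP is a string such as "BBIBBPBBPBBP", with one
// letter per frame, instead of the tool's dictionary of frame types.
public static class GopSketch
{
    // Returns the zero-based position of the last B-frame in the pattern.
    public static short LastBFrameIndex(string gopPattern)
    {
        return gopPattern
            .Select((type, index) => new { Type = type, Index = (short)index })
            .Where(frame => frame.Type == 'B')
            .Max(frame => frame.Index);
    }
}
```

For the pattern BBIBBPBBPBBP this returns 10, the second-to-last position. As with the tool's own query, a pattern with no B-frames at all would make Max throw, so a more defensive version would need a fallback for that case.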
A chain of readers is then created to:
- Validate the integrity of the incoming stream before it’s edited (by checking that the continuity counters on each stream are, in fact, continuous)
- Drop any video frames before the common start time
- Drop any audio frames before the common start time
- Drop the last B-frame of a GoP if the video is out of sync
- Drop audio frames if the audio is out of sync
- Reassign continuity counter values, as they are now discontinuous
Any transport packets that successfully pass through the filters are written to the output file.
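The reader chain is a decorator pattern: each stage wraps the previous reader and exposes the same TryReadPacket method. The sketch below shows the shape with heavily simplified, hypothetical types (SketchPacket, ISketchReader, ListReader and StreamFilter are stand-ins, not the tool's classes), and a packet is reduced to a stream id and a time in milliseconds.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified packet: a stream id and a presentation time.
public sealed class SketchPacket
{
    public SketchPacket(short streamId, long timeMs)
    {
        StreamId = streamId;
        TimeMs = timeMs;
    }

    public short StreamId { get; }
    public long TimeMs { get; }
}

// Every stage in the chain exposes the same reading method.
public interface ISketchReader
{
    bool TryReadPacket(out SketchPacket packet);
}

// Source stage: hands out packets from an in-memory sequence.
public sealed class ListReader : ISketchReader
{
    private readonly Queue<SketchPacket> packets;

    public ListReader(IEnumerable<SketchPacket> packets)
    {
        this.packets = new Queue<SketchPacket>(packets);
    }

    public bool TryReadPacket(out SketchPacket packet)
    {
        if (packets.Count > 0)
        {
            packet = packets.Dequeue();
            return true;
        }
        packet = null;
        return false;
    }
}

// Filter stage: applies a predicate to one stream and passes packets from
// every other stream through untouched.
public sealed class StreamFilter : ISketchReader
{
    private readonly ISketchReader inner;
    private readonly short streamId;
    private readonly Func<SketchPacket, bool> keep;

    public StreamFilter(ISketchReader inner, short streamId,
        Func<SketchPacket, bool> keep)
    {
        this.inner = inner;
        this.streamId = streamId;
        this.keep = keep;
    }

    public bool TryReadPacket(out SketchPacket packet)
    {
        // Keep pulling from the wrapped reader until a packet survives.
        while (inner.TryReadPacket(out packet))
        {
            if (packet.StreamId != streamId || keep(packet))
            {
                return true;
            }
        }
        return false;
    }
}
```

Stacking two StreamFilter instances on a ListReader, one per stream id, mirrors the two TimeLimiter stages in the tool. The consumer at the end of the chain just loops on TryReadPacket and never needs to know how many filters sit in between.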
The only command-line configuration option (other than the file paths) is the loss-of-synchronization tolerance (or “skew tolerance”). At 29.97 frames per second, one frame lasts approximately 33 milliseconds. The tool should wait until the streams are at least 16 milliseconds out of sync before dropping a video frame, as anything less than that would only worsen the synchronization issues (e.g. if a frame was dropped when the video is just one millisecond behind the audio, the video would then be 32 milliseconds ahead of the audio... worse than if nothing was done at all). In fact, the stream may go slightly out of sync and regain sync within a few frames, so a bit more room than 16 milliseconds is best to ensure the tool only drops frames when the video is actually losing sync.
For 29.97 fps NTSC video I use a tolerance of 50 milliseconds. 50 milliseconds is small enough that it’s not a noticeable loss of synchronization, but large enough to be more than just slight wobble in the stream (it’s more than a whole video frame out of sync).
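The half-frame reasoning behind the tolerance can be written down as a few lines of arithmetic. This is a sketch with hypothetical names (SkewToleranceSketch and its methods are not part of the tool), using the convention that a positive skew means the video is behind the audio.

```csharp
using System;

// Hypothetical helper for reasoning about the skew tolerance.
public static class SkewToleranceSketch
{
    // Duration of one frame in milliseconds, from a frames-per-kilosecond
    // rate such as 29970.
    public static double FrameDurationMs(int framesPerKilosecond)
    {
        return 1000000.0 / framesPerKilosecond;
    }

    // Skew left over after dropping one frame. A negative result means the
    // video is now ahead of the audio.
    public static double SkewAfterDrop(double skewMs, int framesPerKilosecond)
    {
        return skewMs - FrameDurationMs(framesPerKilosecond);
    }

    // Dropping helps only if the streams end up closer together than before.
    public static bool DropImprovesSync(double skewMs, int framesPerKilosecond)
    {
        return Math.Abs(SkewAfterDrop(skewMs, framesPerKilosecond)) <
            Math.Abs(skewMs);
    }
}
```

At 29970 frames per kilosecond a frame lasts about 33.37 ms, so a drop at 1 ms of skew leaves the video roughly 32 ms ahead and makes things worse. The break-even point is half a frame, a little under 17 ms, and at the 50 ms tolerance a drop leaves under 17 ms of residual skew.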
Video:
 Dimensions: 720x480
 Aspect Ratio: FourThree
 Frame Rate: 29.970 frames/second

Audio:
 Channel Mode: Stereo
 Sample Rate: 48.0 KHz

Dropping:
 All frames before 00:00:38.690
 Frame #10 of GOP (BBIBBPBBPBBP) when more than 50ms out of sync
Finally, even in slow-mo, my audio and video are just right.














