But how? Unless I’m misunderstanding how video encoding is done, you shouldn’t be able to reliably identify what’s an ad vs what’s actual video once it starts getting mixed together. The ad will be encoded differently for every video it’s inserted into.
I could be completely wrong about this, but the same ad clip’s data should end up looking completely different depending on any number of things.
Most encoders are deterministic, including those for the VP8/VP9 codecs that YouTube uses. I imagine they could deliberately insert some manner of randomization in there if they really wanted to, and if they intend to carry through with this plan they may have to. But the same input with the same encoder (and settings) should produce the same output every time, at least if you begin counting from a keyframe.
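To make the "same input, same output" point concrete, here's a rough sketch of what binary-level matching could look like. It assumes (and this is purely my assumption) that the stream arrives as discrete segments and that an ad's segments are byte-identical wherever the ad shows up. The function names and the set of known digests are made up for illustration.

```python
import hashlib
from typing import Iterable, List, Set


def segment_digest(segment: bytes) -> str:
    """Return a SHA-256 hex digest identifying one encoded segment."""
    return hashlib.sha256(segment).hexdigest()


def flag_ad_segments(segments: Iterable[bytes], known_ad_digests: Set[str]) -> List[int]:
    """Return the indexes of segments whose digest is in the known-ad set.

    Assumption: an ad encoded once produces identical bytes every time it
    is served. Any per-user re-encode or randomization defeats this check.
    """
    return [
        index
        for index, segment in enumerate(segments)
        if segment_digest(segment) in known_ad_digests
    ]
```

If they do start randomizing the encode, this breaks immediately, and you'd have to fall back to matching on the decoded picture or audio instead.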
Even if it can’t be identified on a binary level with clever tactics, which I think it will be unless they do some kind of picture-in-picture thing, it should be trivial with current hardware to identify it even with a fairly crude optical recognition system and a database. I.e., sample N points on the output and gauge the average RGB data for each over a couple of frames, and if that matches our entry for the ad in our crowdsourced database, skip ahead X seconds based on the database. Even better if you did it on the keyframes.
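Something like this is what I have in mind for the crude optical version. It's only a sketch: frames are assumed to be already decoded into rows of (r, g, b) tuples, and the grid size, the tolerance, and the database layout (ad id mapped to a fingerprint and a duration) are all invented for the example.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Pixel = Tuple[int, int, int]
Frame = Sequence[Sequence[Pixel]]  # frame[y][x] -> (r, g, b)


def sample_points(width: int, height: int, grid: int = 4) -> List[Tuple[int, int]]:
    """Return grid*grid evenly spaced (x, y) points, kept away from the edges."""
    return [
        ((2 * i + 1) * width // (2 * grid), (2 * j + 1) * height // (2 * grid))
        for j in range(grid)
        for i in range(grid)
    ]


def fingerprint(frames: Sequence[Frame], grid: int = 4) -> List[float]:
    """Average each RGB channel at every sample point across the given frames."""
    height, width = len(frames[0]), len(frames[0][0])
    values: List[float] = []
    for x, y in sample_points(width, height, grid):
        for channel in range(3):
            values.append(sum(f[y][x][channel] for f in frames) / len(frames))
    return values


def matches(fp_a: Sequence[float], fp_b: Sequence[float], tolerance: float = 12.0) -> bool:
    """True if the mean absolute difference per value is within tolerance."""
    if len(fp_a) != len(fp_b) or not fp_a:
        return False
    return sum(abs(a - b) for a, b in zip(fp_a, fp_b)) / len(fp_a) <= tolerance


def find_ad(
    frames: Sequence[Frame],
    database: Dict[str, Tuple[List[float], float]],
    grid: int = 4,
    tolerance: float = 12.0,
) -> Optional[Tuple[str, float]]:
    """Return (ad_id, seconds_to_skip) for the first database entry that matches."""
    current = fingerprint(frames, grid)
    for ad_id, (ad_fingerprint, duration) in database.items():
        if matches(current, ad_fingerprint, tolerance):
            return ad_id, duration
    return None
```

The tolerance is there because re-encoding shifts pixel values a little. A real version would need tuning so that two different ads with similar colors don't collide.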
Doing it based off of the audio of the ad should be even easier, since acoustic fingerprinting is a pretty cheap technology to implement these days.
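For the audio route, here's a toy version of the idea. To be clear, this is not how Shazam or any real fingerprinting library does it. It just records the loudest frequency bin in each fixed-size window and looks for the ad's sequence of bins inside the stream's sequence. The window size, the match threshold, and the assumption that the ad starts on a window boundary are all simplifications I've made up for the sketch.

```python
import cmath
import math
from typing import List, Optional, Sequence


def peak_bin(window: Sequence[float]) -> int:
    """Return the index of the strongest frequency bin (naive DFT, DC excluded)."""
    n = len(window)
    best_bin, best_magnitude = 0, -1.0
    for k in range(1, n // 2):
        total = sum(
            sample * cmath.exp(-2j * math.pi * k * i / n)
            for i, sample in enumerate(window)
        )
        if abs(total) > best_magnitude:
            best_bin, best_magnitude = k, abs(total)
    return best_bin


def audio_fingerprint(samples: Sequence[float], window: int = 64) -> List[int]:
    """Split samples into non-overlapping windows and record each window's peak bin."""
    return [
        peak_bin(samples[start:start + window])
        for start in range(0, len(samples) - window + 1, window)
    ]


def find_clip(stream_fp: Sequence[int], clip_fp: Sequence[int], min_match: float = 0.9) -> Optional[int]:
    """Return the first window index where clip_fp lines up with stream_fp, else None."""
    length = len(clip_fp)
    if length == 0:
        return None
    for start in range(len(stream_fp) - length + 1):
        hits = sum(1 for a, b in zip(stream_fp[start:start + length], clip_fp) if a == b)
        if hits / length >= min_match:
            return start
    return None
```

Real fingerprinting uses overlapping windows and several peaks per window, so it survives noise and arbitrary start offsets. The naive DFT here is also far too slow for live audio, but the matching step would look broadly similar.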
The other question will be whether YouTube is dumb enough to always insert the same type of ads in the same place in each video, which they may be at least to start with, in which case a very simple table of “skip X amount of time at Y timecode on Z video” would be feasible. Or even better, if they hard insert the ads into the video to save on processing time, such that they never change. Are they going to try to insert ads and encode video to serve to individual users in realtime? Doubt it. That’d be bonkers. YouTube already chews on uploaded videos for sometimes upwards of an hour before having them ready to serve… I don’t think they’re ready to commit to and pay for the compute power to try to pull a stunt like this in realtime.
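The "skip X at Y on Z" table really is about as simple as it sounds. A minimal sketch, assuming the ad positions are fixed per video and someone has already crowdsourced them. The video id and timings below are placeholders, not real data.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# Hypothetical crowdsourced table: video id -> sorted (start_seconds, duration_seconds).
SKIP_TABLE: Dict[str, List[Tuple[float, float]]] = {
    "example_video_id": [(120.0, 30.0), (600.0, 15.0)],
}


def resolve_position(
    video_id: str,
    position: float,
    table: Dict[str, List[Tuple[float, float]]] = SKIP_TABLE,
) -> float:
    """Return where playback should be: past the ad if position is inside one."""
    segments = table.get(video_id, [])
    starts = [start for start, _ in segments]
    # Find the last segment that starts at or before the current position.
    index = bisect_right(starts, position) - 1
    if index >= 0:
        start, duration = segments[index]
        if position < start + duration:
            return start + duration
    return position
```

A player extension would call this on every time update and seek whenever the returned value differs from the current position. It falls apart the moment ad placement varies per viewer, which is where the fingerprinting approaches come in.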
All of this is going to require some manner of crowdsourcing, unless we get really good at using AI against them or something (which’d be immensely satisfying, come to think of it).
If a song can be fingerprinted (e.g. Shazam), so can ads. Even when they’re part of a larger video.
Twitch does the same thing but you can still circumvent it. Worst case users may need a VPN to a country that doesn’t have many ads.