Peanut Butter and Chocolate
Everyone loves the clip on this page: how it works. It says it all. GPU-based video encoding is fast because all of the macroblocks can be processed at once, whereas a CPU-based solution must iterate through each macroblock one by one. However, critics out there are quick to point out that you can't just do that! Video encoders, they claim, need to operate on one macroblock at a time to perform tasks such as entropy coding or calculating motion vector deltas between macroblocks.
The critics are right, partially. There are indeed dependencies between the macroblocks, as implied by the various video standards, such as H.264 and MPEG-2. How is it possible, then, to process a frame's worth of macroblocks in parallel if there are dependencies? Well, that's the magic of the RapiHD technology. With some very clever parallel algorithms, it is possible to come up with global solutions for a frame that are as good as (if not better than) an approach that computes one macroblock at a time. The idea is to maximize the parallel processing and minimize the need for serial operations.
Elemental Technologies prides itself on being a forerunner in the area of parallel techniques for video encoding. Parallel programming on a GPU is an art, related to but different from the art of CPU-based programming. Algorithms that run fast on a CPU are not the same as those that run fast on a GPU, and vice versa. NVIDIA claims that the GPU and CPU, like peanut butter and chocolate, are better together than apart. Elemental has found that this is true. Some jobs are best done on a CPU, and some are best done on a GPU. Together, powered by the right mix of GPU and CPU software, your whole computer can be a video-encoding workhorse. For the skeptics, an interesting fact is that the GPU is the computing bottleneck in the RapiHD flow.
Because GPUs have historically increased in computing power faster than CPUs, the RapiHD technology will continue to track with that faster growth curve, as opposed to pure software encoders, which will also benefit over time, but only by tracking CPU computing growth curves. Thus, RapiHD is poised to not only blaze ahead in the short term, but continue to do so well into the future.
Editor's note: Carter is one of the key architects behind the RapiHD Video Platform, and an avowed fan of peanut butter and chocolate -- but only in combination.
About
Elemental Technologies is the leading provider of video processing solutions that enable multi-screen content delivery. Founded in 2006, Elemental is headquartered in Portland, Oregon.

Comments
I must say that your explanations are rather easy to understand. Here's hoping that continues into the product, i.e., good tooltip explanations.
Thanks for the explanation
Hi Goose,
Thanks for the comment. In a nutshell, I think you are asking this:
- Why use macroblock-level parallelism when you could simply use GOP-level parallelism?
(By the way, GOP means group of pictures, and refers to a keyframe and the frames after it that depend on it and on one another.)
To understand the reason for macroblock-level parallelism, we need to take a brief look at the GPU. In terms of raw FLOPS, the GPU is much more powerful per gate, dollar, and watt than a corresponding CPU. However, it gets this efficiency by being a rather fine-grained parallel computer that operates on several thousand threads at once, rather than just a few. In addition, the GPU is a SIMD machine. This means that it likes to run the exact same program through each of its threads, except that each one gets its own set of registers and other data to chew on.
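As a rough illustration of that SIMD idea, here is a minimal Python sketch. It is not RapiHD code: the names `sad_kernel` and `launch` and the pixel values are made up for this example, and a plain loop stands in for the thousands of hardware threads that would run the same function at the same time on a GPU.

```python
# Hypothetical illustration only: one "kernel" function, many independent data items.

def sad_kernel(cur_block, ref_block):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(abs(c - r) for c, r in zip(cur_block, ref_block))

def launch(kernel, per_thread_inputs):
    """Run the same kernel once per input tuple.

    On a GPU each call would be its own thread with its own registers;
    here a sequential list comprehension stands in for that.
    """
    return [kernel(*args) for args in per_thread_inputs]

# Three tiny made-up "macroblocks" (4 pixels each) and their reference blocks.
blocks_cur = [[10, 20, 30, 40], [5, 5, 5, 5], [0, 255, 0, 255]]
blocks_ref = [[12, 18, 30, 44], [5, 5, 5, 5], [255, 0, 255, 0]]

costs = launch(sad_kernel, list(zip(blocks_cur, blocks_ref)))
print(costs)  # [8, 0, 1020]
```

Every block goes through exactly the same instructions and only the data differs, which is the shape of work a GPU handles well.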
In your comment, you mentioned 10 shaders. That is simply not enough parallelism to really let the GPU show its true colors. Although it would be possible to let 10 identical threads run on 10 different GOPs, we would not see the kind of speedup that we have seen is possible.
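To get a feel for the difference in scale, here is a small back-of-the-envelope calculation. It assumes 16x16 macroblocks (as in H.264 and MPEG-2) and a 1920x1080 frame; the resolution and the idea of one unit of work per macroblock are assumptions chosen for the example, not a description of how RapiHD maps work to threads.

```python
import math

MB_SIZE = 16  # H.264 and MPEG-2 macroblocks cover 16x16 luma samples

def macroblocks_per_frame(width, height, mb_size=MB_SIZE):
    """Count macroblocks, rounding partial rows and columns up to whole blocks."""
    return math.ceil(width / mb_size) * math.ceil(height / mb_size)

hd_macroblocks = macroblocks_per_frame(1920, 1080)  # 120 columns * 68 rows
gop_level_threads = 10                              # the 10 shaders from the comment

print(hd_macroblocks)                        # 8160
print(hd_macroblocks // gop_level_threads)   # 816
```

Under these assumptions, a single HD frame offers several thousand independent pieces of work, roughly 800 times more than the 10 GOP-level threads.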
Also, you mentioned that an H.264 file would be threaded if done on a CPU. RapiHD produces ordinary, single-slice-per-frame H.264 streams, which can be decoded by any compliant decoder.
So why not simply break up the video into 1000 GOPs? At this point, we run into some other issues. First, there are technical bottlenecks, such as memory bandwidth and instruction fetch. We have looked at these as options. Second, and probably more importantly, in order to perform rate control, some sort of algorithm needs to run between every frame, across the GOP boundaries, and control how much compression is applied to each frame. If too much is done in parallel, there is no point at which rate control can be applied.
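The rate-control dependency can be sketched with a toy model. This is not RapiHD's algorithm: `encode_frame`, `TARGET_BITS`, the complexity numbers, and the one-step adjustment rule are all hypothetical. The only point is that the quantizer (QP) for a frame is not known until the previous frame has been encoded and measured.

```python
TARGET_BITS = 200          # hypothetical per-frame bit budget
QP_MIN, QP_MAX = 0, 51     # QP range for 8-bit H.264

def encode_frame(complexity, qp):
    """Toy stand-in for an encoder: a higher QP yields fewer bits."""
    return complexity // (qp + 1)

def run_rate_control(complexities, start_qp=26):
    """Encode frames in order, nudging QP after each one toward the budget."""
    qp = start_qp
    used_qps, bits_per_frame = [], []
    for complexity in complexities:
        bits = encode_frame(complexity, qp)   # needs the QP chosen after the previous frame
        used_qps.append(qp)
        bits_per_frame.append(bits)
        if bits > TARGET_BITS:
            qp = min(qp + 1, QP_MAX)          # over budget: compress harder
        elif bits < TARGET_BITS:
            qp = max(qp - 1, QP_MIN)          # under budget: spend more bits
    return used_qps, bits_per_frame

used_qps, bits_per_frame = run_rate_control([4000, 4000, 8000, 8000, 2000])
print(used_qps)        # [26, 25, 24, 25, 26]
print(bits_per_frame)  # [148, 153, 320, 307, 74]
```

Because each iteration reads a QP that the previous iteration wrote, the loop over frames is inherently serial in this model. The parallelism has to come from inside each frame instead.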
RapiHD will in fact use GOP-level parallelism across GPUs when that becomes supported later this year. That is a win, though, because the additional GPUs bring additional memory bandwidth.
So, in all, RapiHD has been designed and tuned to fit the GPU, the CPU, and the bus between them. It is a whole-system approach to video encoding.
Peanut butter + Chocolate together is the best combination.
Looking forward to RapiHD for Premiere being available.
Forgive me if this is total nonsense, as I am not a video encoding/transcoding god. However, I fail to see the point of your "very parallel algorithms" when you could instead treat it sequentially, as it has always been done. Let us envision a video split into 10 parts, with 10 frames in each part, and a card with 10 shaders.
E.g., shader 1 starts work on its 10 frames and chugs through them sequentially. Shader 2 does the same. At the end, shader 1 handles the overlap between them, if you feel that it is necessary for the slight quality improvement.
The benefit of doing it this way is that the H.264 output file will be threaded and will be able to take advantage of multicore CPUs when decoded. I am not sure that your solution would offer this.
Some comment on this would be appreciated.