FREE MOBILE CLOUD COMPUTING CONCEPTS - TRAINING MODULES WITH TONS OF VIDEOS
From Clyde McCall with Texas IT Specialists, LLC
GPU computing, or GPGPU, is the use of a GPU (graphics processing unit) to do general-purpose scientific and engineering computing.
The model for GPU computing is to use a CPU and GPU together in a heterogeneous co-processing model. The sequential part of the application runs on the CPU,
and the computationally intensive part is accelerated by the GPU. From the user’s perspective, the application simply
runs faster because it uses the high performance of the GPU.
The GPU has evolved over the years to deliver teraflops of floating-point performance. NVIDIA revolutionized the
GPGPU and accelerated computing world in 2006-2007 by introducing its new massively parallel architecture called “CUDA”.
The CUDA architecture consists of hundreds of processor cores that operate together to crunch through the application’s data set.
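As a rough worked example of how hundreds of cores add up to teraflops, peak throughput is the product of the number of cores $N_{\text{cores}}$, the clock rate $f_{\text{clock}}$, and the floating-point operations each core completes per cycle $F_{\text{cycle}}$. The numbers plugged in below are assumptions taken from the published specifications of the Fermi-based Tesla C2050: 448 CUDA cores at 1.15 GHz, one fused multiply-add (counted as two operations) per core per cycle in single precision, and half that rate in double precision. Sustained performance in real applications is lower than this peak.

```latex
\begin{align*}
P_{\text{peak}} &= N_{\text{cores}} \times f_{\text{clock}} \times F_{\text{cycle}} \\
P_{\text{SP}}   &= 448 \times 1.15\,\text{GHz} \times 2\,\tfrac{\text{FLOP}}{\text{cycle}}
                 = 1030.4\ \text{GFLOPS} \approx 1\ \text{TFLOPS} \\
P_{\text{DP}}   &= \tfrac{1}{2}\,P_{\text{SP}} = 515.2\ \text{GFLOPS}
\end{align*}
```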
Much of the success of GPGPU in the past few years is due to the ease of programming with the associated CUDA parallel programming
model. In this programming model, application developers modify their application to take the compute-intensive kernels
and map them to the GPU. The rest of the application remains on the CPU. Mapping a function to the GPU involves rewriting
the function to expose its parallelism and adding “C” language extensions (keywords and runtime calls) to mark GPU functions and move data to and from the GPU.
The developer launches tens of thousands of threads simultaneously, and the GPU hardware manages and schedules those
threads.
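The sketch below imitates, on the CPU and in plain C++, how the CUDA model maps one lightweight thread to each array element. It is not CUDA code: the `ThreadIndex` struct and the `launch_saxpy` loop are hypothetical stand-ins for CUDA's built-in `blockIdx`, `blockDim`, and `threadIdx` variables and its `<<<grid, block>>>` launch syntax. On a real GPU the data would first be copied to device memory and the loop iterations would run concurrently. SAXPY (y = a·x + y) is used only as a simple data-parallel example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's built-in blockIdx.x, blockDim.x, threadIdx.x.
struct ThreadIndex {
    std::size_t block_idx;   // which block this thread belongs to
    std::size_t block_dim;   // threads per block
    std::size_t thread_idx;  // position of the thread inside its block
};

// The "kernel": the work done by ONE thread, here a single SAXPY element.
void saxpy_kernel(ThreadIndex t, float a, const float* x, float* y, std::size_t n) {
    const std::size_t i = t.block_idx * t.block_dim + t.thread_idx;  // global index
    if (i < n) {  // the last block may have more threads than remaining elements
        y[i] = a * x[i] + y[i];
    }
}

// Emulated launch: a GPU would run all of these threads in parallel,
// with the hardware scheduling them; here we simply loop.
void launch_saxpy(std::size_t grid_dim, std::size_t block_dim,
                  float a, const float* x, float* y, std::size_t n) {
    for (std::size_t b = 0; b < grid_dim; ++b) {
        for (std::size_t t = 0; t < block_dim; ++t) {
            saxpy_kernel(ThreadIndex{b, block_dim, t}, a, x, y, n);
        }
    }
}

// Host-side wrapper: the sequential part that stays on the CPU.
std::vector<float> saxpy(float a, const std::vector<float>& x, std::vector<float> y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    const std::size_t block_dim = 256;                             // threads per block
    const std::size_t grid_dim = (n + block_dim - 1) / block_dim;  // round up
    launch_saxpy(grid_dim, block_dim, a, x.data(), y.data(), n);
    return y;
}
```

For 100,000 elements the wrapper requests 391 blocks of 256 threads (100,096 threads in total), and the `i < n` guard leaves the 96 surplus threads idle. Writing the kernel per element, rather than as a loop, is what lets the hardware spread the work over however many cores it has.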
The Tesla 20-series GPU is based on the “Fermi” architecture, which is the latest
CUDA architecture. Fermi is optimized for scientific applications, with key features such as 500+ gigaflops of IEEE-standard
double-precision floating-point performance, L1 and L2 caches, ECC memory error protection, local user-managed data caches
in the form of shared memory distributed throughout the GPU, and coalesced memory accesses.
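Coalescing can be illustrated with a deliberately simplified model, which is an assumption rather than an exact description of Fermi's memory system: a warp of 32 threads each loads one 4-byte `float`, and memory is served in aligned 128-byte segments, so the fewer distinct segments a warp touches, the fewer memory transactions it needs. The function name and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Simplified coalescing model (an assumption, not an exact hardware model):
// thread t of a warp reads element t * stride of a float array that starts
// at a segment-aligned address. Returns how many distinct aligned segments
// the warp touches, i.e. roughly how many memory transactions it needs.
std::size_t segments_touched(std::size_t stride,
                             std::size_t warp_size = 32,
                             std::size_t elem_bytes = 4,
                             std::size_t segment_bytes = 128) {
    assert(warp_size > 0 && elem_bytes > 0 && segment_bytes > 0);
    std::set<std::size_t> segments;
    for (std::size_t t = 0; t < warp_size; ++t) {
        const std::size_t byte_addr = t * stride * elem_bytes;
        segments.insert(byte_addr / segment_bytes);
    }
    return segments.size();
}
```

With consecutive accesses (`stride` 1) the whole warp is served by a single segment; with `stride` 2 it needs two, and with `stride` 32 every thread lands in its own segment, 32 in all. This is why GPU code is usually arranged so that neighbouring threads read neighbouring addresses.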
"GPUs
have evolved to the point where many real-world applications are easily implemented on them and run significantly faster than
on multi-core systems. Future computing architectures will be hybrid systems with parallel-core GPUs working in tandem with
multi-core CPUs."
Prof. Jack Dongarra Director of the Innovative Computing Laboratory The University of Tennessee
History of GPU Computing
Graphics chips started as fixed-function graphics pipelines. Over the years, these graphics chips became increasingly programmable, which led NVIDIA to introduce the first
GPU or Graphics Processing Unit. In the 1999-2000 timeframe, computer scientists in particular, along with researchers in
fields such as medical imaging and electromagnetics, started using GPUs to run general-purpose computational applications.
They found that the excellent floating-point performance of GPUs led to a huge performance boost for a range of scientific applications.
This was the advent of the movement called GPGPU, or General-Purpose computing on GPUs.
The problem
was that GPGPU required using graphics APIs and languages such as OpenGL and Cg to program the GPU. Developers had to make
their scientific applications look like graphics applications and map them into problems that drew triangles and polygons.
This limited scientists’ access to the tremendous performance of GPUs.
NVIDIA realized the potential to
bring this performance to the larger scientific community and invested in modifying the GPU to make it fully programmable
for scientific applications, adding support for high-level languages such as C, C++, and Fortran. This led to the CUDA
architecture for the GPU.
CUDA Parallel Architecture and Programming Model
The CUDA parallel hardware architecture is accompanied by the CUDA parallel programming model, which provides a set of abstractions for expressing
fine-grained and coarse-grained data and task parallelism. The programmer can choose to express the parallelism in high-level
languages such as C, C++, and Fortran, or through APIs such as OpenCL™ and DirectX™ 11 Compute.
NVIDIA today provides support for programming the GPU with C, C++, Fortran, OpenCL, and DirectCompute. A set of
software development tools, along with libraries and middleware, is available to developers. CUDA C allows the
GPU to be programmed in C with a minimal set of keywords or extensions.
The CUDA parallel programming model guides programmers
to partition the problem into coarse sub-problems that can be solved independently in parallel. Fine-grained parallelism within
each sub-problem is then expressed so that the sub-problem can be solved cooperatively in parallel. In CUDA terms, each sub-problem is handled by a block of threads, and the threads within that block cooperate on it.
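Summing a large array is a common way to illustrate the two levels. In the CPU-only C++ sketch below, which only emulates the CUDA structure, each block independently sums its own chunk (the coarse level), and inside a block the threads cooperate through a tree reduction over a small buffer standing in for CUDA shared memory (the fine level). The function names are hypothetical; on a GPU each pass of the inner loop would run in parallel, with passes separated by a `__syncthreads()` barrier.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fine level: the "threads" of one block cooperate to sum the block's chunk.
// `shared` plays the role of CUDA shared memory.
float block_sum(const std::vector<float>& data, std::size_t block, std::size_t block_dim) {
    assert(block_dim > 0 && (block_dim & (block_dim - 1)) == 0);  // power of two
    std::vector<float> shared(block_dim);
    for (std::size_t t = 0; t < block_dim; ++t) {  // each thread loads one element
        const std::size_t i = block * block_dim + t;
        shared[t] = i < data.size() ? data[i] : 0.0f;  // pad past the end with 0
    }
    // Tree reduction: each pass halves the number of active threads.
    // Threads t < stride only read slots >= stride, so a pass has no conflicts.
    for (std::size_t stride = block_dim / 2; stride > 0; stride /= 2) {
        for (std::size_t t = 0; t < stride; ++t) {
            shared[t] += shared[t + stride];
        }
    }
    return shared[0];
}

// Coarse level: blocks are independent, so they could run in any order
// or all at once; their partial sums are combined afterwards.
float blocked_sum(const std::vector<float>& data, std::size_t block_dim) {
    assert(block_dim > 0);
    const std::size_t grid_dim = (data.size() + block_dim - 1) / block_dim;
    std::vector<float> partial(grid_dim);
    for (std::size_t b = 0; b < grid_dim; ++b) {
        partial[b] = block_sum(data, b, block_dim);
    }
    float total = 0.0f;
    for (float p : partial) total += p;  // final combine, e.g. on the CPU
    return total;
}
```

The design choice is that no block ever waits on another block, which is what lets the same program scale from a GPU with few cores to one with many; cooperation and synchronization are confined to the threads inside a block.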
The CUDA GPU architecture and the corresponding
CUDA parallel computing model are now widely deployed, with thousands of applications and thousands of published research papers. CUDA Zone lists many of these applications and papers.
NVIDIA’s
Tesla GPU processor architecture can quickly execute computations that are important in engineering analysis
and simulation.
Graphics processing units (GPUs) have, for many years, powered the display of images and motion on computer displays.
GPUs are now powerful enough to do more than just move images across the screen. They are capable of performing high-end computations
that are a staple of many engineering activities.
Benchmarks that focus on floating-point arithmetic, the kind most often used
in these engineering computations, show that GPUs can perform such computations much faster than the traditional central processing
units (CPUs) used in today’s workstations, sometimes as much as 20 times faster, depending on the computation.
But the performance
advantage in these benchmarks doesn’t automatically make GPUs a slam dunk for running engineering applications. Comparing
CPUs with GPUs is like comparing apples with oranges.
GPU Challenges—and Rewards
The GPU remains a specialized processor, and its performance in graphics computation belies a host of difficulties in performing
true general-purpose computing. Software must be recompiled to run on these processors, their programming
tools are rudimentary, and their programming languages and features are limited.
These difficulties mean applications are limited to
those that commercial software vendors develop and make available to engineering customers or, in some cases, to code whose source
is owned by the engineering firm and ported to the GPU. Vendors have to perceive that a market for a GPU version of their
software exists, while engineering groups have to determine that the investment in hardware,
software, and expertise will pay off for them.
That concept
is a long way from the industry-standard Intel and AMD CPUs, which power the majority of workstations (and even
high-end supercomputers). Changing that would be an expensive and time-consuming affair for software vendors.
Nevertheless, the
cost and performance of GPUs can make a difference in how design engineering is done. Imagine being able to run an analysis
on your design 20 times faster than you can today, for example.
Benchmarks Are Not Real Life
But it’s not a simple matter. First of all, “20 times faster” is highly problematic: just because some computations
can be sped up that much doesn’t mean the entire analysis would be. In fact, the overall analysis could even
be slower than on a CPU alone, if the CPU can compute other parts of the analysis faster.
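Amdahl's law, which the article does not name, makes this point concrete: if only a fraction p of the original runtime is accelerated by a factor s, the rest still runs at the old speed. The sketch below adds a fixed overhead term (for example, time spent copying data between the CPU and GPU) as an illustrative assumption; the function name and parameters are hypothetical.

```cpp
#include <cassert>

// Amdahl's law with an extra, illustrative overhead term.
//   p        : fraction of the original runtime that is accelerated (0..1)
//   s        : speedup achieved on that fraction (e.g. 20 on the GPU)
//   overhead : added time, e.g. CPU<->GPU data transfers, expressed as a
//              fraction of the original runtime (an assumption, default 0)
// The original runtime is normalised to 1, so the result is old/new time.
double overall_speedup(double p, double s, double overhead = 0.0) {
    assert(p >= 0.0 && p <= 1.0);
    assert(s > 0.0 && overhead >= 0.0);
    const double new_time = (1.0 - p) + p / s + overhead;
    return 1.0 / new_time;
}
```

With half the runtime accelerated 20 times, `overall_speedup(0.5, 20.0)` is about 1.9, not 20. With only 30 percent accelerated and an overhead equal to half the original runtime, `overall_speedup(0.3, 20.0, 0.5)` is about 0.82, meaning the "accelerated" analysis is slower than the CPU alone.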
Second, it would be a significant software
development effort to run even fairly common code on a GPU. Some types of code may require modification, while other types
may not be able to run on the GPU at all. Many engineering software vendors aren’t yet convinced that the effort can
pay for itself and make a profit.
So it turns out that you still need the traditional CPU after all. You need it because that is where the vast majority
of engineering and office software runs and where the primary software development skill set resides, and because its all-around performance
is at least good enough to keep it in that role for the foreseeable future.
Intel hasn’t been sitting still as GPU performance has increased.
Up until the beginning of this year, the company had been working on its own multi-core processor, code-named
Larrabee. While it ultimately canceled the initial release of a Larrabee processor, the technology still exists and will
likely find its way into either an Intel-designed GPU or a hybrid CPU.
Such technology may ultimately provide the best of both
worlds: compatible performance on most applications, and high performance on engineering computations.
A Future with Both Processors
To their credit, NVIDIA and AMD are expanding both the sophistication of their processors and
the software development tools for developing, porting, and debugging GPU code. NVIDIA has an intriguing software tool called
Nexus that should go a long way toward helping software developers trace and debug application code from the CPU, running
on Windows, into the GPU, including parallel applications on the GPU, and back to the CPU. These enhancements mean it will
be easier to get existing software running on GPUs, although it will still require a software development effort.