Intel Xeon Phi vs. NVIDIA Tesla GPU: Just the Facts

This is an archived copy of the page http://www.nvidia.com/object/justthefacts.html, stored by archive.org.ua. It is a snapshot of the page as of 2014-09-02; the original page may have changed since then.

Accelerated computing is revolutionizing high-performance computing (HPC). It is now widely accepted that systems with accelerators deliver the highest performance and the most energy-efficient computing for HPC today. Accelerated systems are the new norm in large-scale HPC and leadership supercomputing. It's not a question of "if" but of "when and how."

We'd like to share some facts about accelerated computing by removing the promises and hype. Specifically, notions that Intel's Xeon Phi accelerator can deliver acceptable application performance compared to a GPU by simply recompiling and running code natively on Xeon Phi, or that performance optimization is easier on a Xeon Phi than a GPU, are simply not based on fact.

FACT: A GPU is significantly faster than Intel's Xeon Phi on real HPC applications.

Speeding time to result for key science applications by 2x over Xeon Phi.

Although Xeon Phi can be optimized to outperform a CPU, GPU consistently outperforms Xeon Phi on a wide range of supercomputing applications. System and configuration details¹ (Data from August 2013)

The Department of Energy uses a collection of 'mini-applications' (Miniapps) to assess the performance of computing architectures on highly representative HPC workloads. On the mini-applications shown in the above chart, GPUs deliver speedups of approximately 2.5x to 5x over CPUs. Although Intel's Xeon Phi can be optimized to outperform a CPU, GPUs remain on average more than 2x faster than Xeon Phi.

Organization	Application	GPU Speed-up over Xeon Phi
Tokyo Institute of Technology	CFD Diffusion	2.6x
Xcelerit	Monte-Carlo LIBOR Swap Pricing	2.2x - 4x
Georgia Tech	Synthetic Aperture Radar	2.1x
CGGVeritas	Reverse Time Migration	2.0x
Paralution	BLAS & SpMV	2.0x
Univ. of Wisconsin-Madison	WRF (Weather Forecasting)	1.8x
University Erlangen-Nuremberg	Medical Imaging: 3D Image Reconstruction	7x

Independent results show GPUs outperforming Xeon Phi by 1.8x to 7x, and by 2x or more in most cases. (Data from January 2014)

HPC is all about application performance. Today, more than 200 applications across a wide range of fields are GPU-accelerated.

FACT: "Recompile & Run" on Xeon Phi actually slows down your application.

The notion that developers can simply "recompile and run" applications on Intel's Xeon Phi, without any change to their CPU code, is attractive but misleading. The resulting performance is usually much slower than CPU performance, literally the opposite of acceleration.

Simple recompile and run on Xeon Phi can work, but codes run much slower than on the CPU. System and configuration details² (Data from August 2013)

While a simple recompile to run natively on Xeon Phi may work for many codes, doing so slows the application down relative to the CPU: up to 4x slower on the DOE mini-applications shown above.

"Recompile and run" faces a host of technical challenges as described in the NVIDIA blog post "No Free Lunch for Intel MIC (or GPUs)", including Amdahl's Law for serial portions of the code. Because of the poor serial performance of the Xeon Phi cores (based on an old Pentium design) compared to the modern CPU cores, the serial portion of codes run natively on a Xeon Phi can run an order of magnitude slower.

In practice, a developer must first get the code to recompile on Xeon Phi, then put in effort to refactor and optimize it, just to get back to performance parity with the CPU.
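
To make the Amdahl's Law point above concrete, here is a minimal sketch of the arithmetic. It is not taken from the NVIDIA blog post: the function name `amdahl_speedup`, its parameters, the 90% and 95% parallel fractions, and the 10x parallel speedup are illustrative assumptions, and the 10x serial slowdown simply stands in for the "order of magnitude" mentioned above.

```python
def amdahl_speedup(parallel_fraction, parallel_speedup, serial_slowdown=1.0):
    """Overall speedup relative to the baseline CPU run (Amdahl's Law).

    parallel_fraction: share of baseline runtime that can run in parallel (0..1)
    parallel_speedup:  factor by which the parallel part gets faster
    serial_slowdown:   factor by which the serial part gets slower
                       (1.0 means the serial part still runs at CPU speed)
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if parallel_speedup <= 0.0 or serial_slowdown <= 0.0:
        raise ValueError("speedup and slowdown factors must be positive")
    serial_fraction = 1.0 - parallel_fraction
    # New runtime, normalized so that the baseline CPU runtime is 1.0.
    new_time = serial_fraction * serial_slowdown + parallel_fraction / parallel_speedup
    return 1.0 / new_time


if __name__ == "__main__":
    # Hypothetical numbers, chosen only for illustration.
    # Serial part stays on a fast CPU core (offload-style execution):
    print(f"{amdahl_speedup(0.95, 10.0, 1.0):.2f}")   # 6.90
    # Same code run natively, serial part 10x slower:
    print(f"{amdahl_speedup(0.95, 10.0, 10.0):.2f}")  # 1.68
    # Slightly less parallel code, serial part 10x slower:
    print(f"{amdahl_speedup(0.90, 10.0, 10.0):.2f}")  # 0.92 (slower than the CPU)
```

Under these assumed numbers, a code that is 90% parallel ends up slower than the CPU baseline once its serial 10% runs ten times slower, even though the parallel part runs ten times faster.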

At the end of the day, it takes some effort to extract parallelism, whether you want to accelerate with Xeon Phi or GPU. At best, "recompile and run" is a mildly convenient first step for developers; at worst, an attractive claim destined to disappoint.

FACT: Programming for a GPU and for Xeon Phi requires similar effort, but the results are significantly better on a GPU.

Same optimization techniques. Same developer effort. 2x faster acceleration on GPU.

Method	GPU	Phi
Libraries	CUDA Libraries + others	Intel MKL + others
Directives	OpenACC	OpenMP + Phi Directives
Native Programming Models	CUDA	Vector Intrinsics

Developers use libraries, directives, or native programming models to program accelerators and optimize for performance. (Data from August 2013)

GPU and Intel's Xeon Phi may be different in some ways, but they are similar in that both are parallel processors. Developers need to put in similar effort and use similar optimization techniques to expose massive amounts of parallelism, whether on Xeon Phi or GPU.

As shown in the table above, a developer uses the same three methods to accelerate their code – libraries, directives, and native programming models like CUDA C for GPU or vector intrinsics on Xeon Phi.

And the programming efforts for Xeon Phi and GPU are more alike than most people realize.

Below, an N-body kernel code illustrates that comparable optimization techniques and effort are required to optimize for either accelerator. While the code changes are basically the same, performance on GPU significantly outpaces that of Xeon Phi. Download the optimization example.

A simple n-body code comparison shows similar optimization techniques must be used, but the GPU is significantly faster. System and configuration details³ (Data from August 2013)
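
For readers unfamiliar with the benchmark, the following is a plain reference sketch of the all-pairs N-body computation that such a kernel parallelizes. It is not the code from NVIDIA's optimization example: the function name `nbody_accelerations`, the softening parameter, and the choice of units with G = 1 are assumptions made for illustration. The comments mark the two properties that both accelerators exploit.

```python
import math


def nbody_accelerations(x, y, z, mass, softening=1e-3):
    """Return (ax, ay, az) for all-pairs gravity, in assumed units where G = 1.

    Positions and masses are stored as separate lists (structure of arrays),
    the layout that SIMD vector units and GPU memory coalescing both favor.
    """
    if softening <= 0.0:
        raise ValueError("softening must be positive so the i == j term is finite")
    n = len(x)
    ax, ay, az = [0.0] * n, [0.0] * n, [0.0] * n
    eps_sq = softening * softening
    # Each iteration of the outer loop is independent of the others, so on an
    # accelerator it can map to one GPU thread or one SIMD lane.
    for i in range(n):
        axi = ayi = azi = 0.0
        for j in range(n):
            dx = x[j] - x[i]
            dy = y[j] - y[i]
            dz = z[j] - z[i]
            # Softening keeps the i == j term finite; it then contributes zero
            # because dx = dy = dz = 0, so the inner loop needs no branch.
            dist_sq = dx * dx + dy * dy + dz * dz + eps_sq
            scale = mass[j] / (dist_sq * math.sqrt(dist_sq))
            axi += scale * dx
            ayi += scale * dy
            azi += scale * dz
        ax[i], ay[i], az[i] = axi, ayi, azi
    return ax, ay, az
```

The work grows as N squared while the data grows only as N, which is why this kernel is a common test of how well an accelerator's arithmetic units can be kept busy.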

"You can port easily, but the things you do in CUDA to vectorize your code still have to be done for Phi."

Dr. Karl Schulz
Director of Scientific Applications at
Texas Advanced Computing Center (TACC)
Source: HPCWire, May 17, 2013

"Our GPU codes are quite similar to the Xeon Phi codes, except for replacing SIMD operations with SIMT operations."

Source: "Swendsen-Wang Multi-Cluster Algorithm for the 2D/3D Ising Model on Xeon Phi and GPU", Zuse Institute Berlin

“Results gathered on Intel’s Xeon Phi were surprisingly disappointing… It took quite some effort to create solutions with good performance due to vectorization tuning, despite that the Xeon Phi is said to be easily programmable.”

Source: "Accelerators for Technical Computing: Is It Worth the Pain? A TCO Perspective", Aachen University

"While getting a program running on Xeon Phi is easy, I found that it is easier with CUDA and NVIDIA GPUs to achieve high sustained performances for Lattice Boltzmann applications."

Dr. Sebastiano Fabio Schifano
Department of Mathematics and Informatics
University of Ferrara

Once you see the facts, a better understanding of accelerated computing emerges. Today, a GPU provides double the performance for essentially the same developer effort. GPUs are the logical choice for accelerating parallel code. This may partly explain why scientific researchers have published with GPUs more than 10:1 over Intel Xeon Phi this year,⁴ and why NVIDIA GPUs are favored more than 20:1 over Xeon Phi in HPC systems today.⁵

Footnotes:
¹ Dual-socket Intel Xeon E5-2667, 6 cores/socket @ 2.90 GHz with HT off, 64 GB RAM, RHEL 6.2, Tesla K20X, Intel Xeon Phi 5110P. Data is based on 2 CPU versus 2 CPU + GPU. MiniMD and NAMD were run in offload mode on Intel Xeon Phi. CloverLeaf ran in native mode on Intel Xeon Phi. For codes, go to MiniMD Version 1.2RC1, CloverLeaf, and NAMD. (Data from August 2013)
² MiniFE data is based on dual-socket Intel Xeon E5-2670, 8 cores/socket @ 2.60 GHz with HT off, 128 GB RAM, RHEL 6.2, Intel Xeon Phi 5110P. All other apps are based on dual-socket Intel Xeon E5-2667, 6 cores/socket @ 2.90 GHz with HT off, 64 GB RAM, RHEL 6.2, Intel Xeon Phi 5110P. To see actual codes, go to MiniMD Version 1.2RC1, MiniFE, MiniGhost, GTC, and SNAP. (Data from August 2013)
³ Data is based on dual-socket Intel Xeon E5-2667, 6 cores/socket @ 2.90 GHz, 64 GB RAM, RHEL 6.2, Tesla K20X, Intel Xeon Phi 5110P. (Data from August 2013)
⁴ Source: Google Scholar, all results since 2013. GPU search terms: CUDA GPU. Intel Xeon Phi search terms: "Xeon Phi".
⁵ Source: Intersect360 Research, HPC User Site Census, 2013.
