With China expected to officially take the supercomputer performance crown next month, I asked an expert about the state of supercomputing in the U.S. and whether China poses a long-term threat to the United States' current preeminence in supercomputing.
Nvidia announced yesterday that its chips are powering the "Tianhe-1A" Chinese supercomputer that achieved 2.507 petaflops, beating a U.S.-based system that is currently ranked No. 1 on the June Top500 list of the fastest supercomputers in the world. The Chinese system is a unique hybrid design that uses approximately 7,000 Nvidia graphics chips along with 14,000 Intel Xeon CPUs. The graphics chips are what give the system the extra oomph to catapult it into the top supercomputer spot.
I spoke with Jack Dongarra, a university distinguished professor in the University of Tennessee's Department of Electrical Engineering and Computer Science and part of a group from the University of Tennessee, Oak Ridge National Laboratory, and Georgia Tech that recently purchased a hybrid system. Note that Oak Ridge also houses "Jaguar," the supercomputer cited above that is currently ranked No. 1 in the world on the June Top500 list. Jaguar is not a hybrid system.
Q: Does Oak Ridge have anything analogous to the Chinese hybrid system?
Dongarra: Oak Ridge has a small version of a machine that is hybrid in nature. So, this is an acquisition that just took place...out of a grant from the National Science Foundation. It involved Oak Ridge National Labs, University of Tennessee, and Georgia Tech. But it's much, much smaller than the Chinese system. The machine is in place and testing is being carried out at Oak Ridge. A node has two Intel Westmere chips and three Nvidia Fermi boards. There are 120 nodes in the system.
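For a sense of scale, the figures Dongarra cites can be tallied in a few lines of Python. This is only a back-of-the-envelope sketch: the variable names are illustrative, the Tianhe-1A counts are the approximate ones quoted earlier in this article, and chip counts alone say nothing about delivered performance.

```python
# Figures quoted in the interview for the Oak Ridge hybrid machine.
NODES = 120
CPUS_PER_NODE = 2   # Intel Westmere chips per node
GPUS_PER_NODE = 3   # Nvidia Fermi boards per node

# Approximate Tianhe-1A counts as reported in this article.
TIANHE_GPUS = 7_000
TIANHE_CPUS = 14_000

total_cpus = NODES * CPUS_PER_NODE
total_gpus = NODES * GPUS_PER_NODE

# Rough ratio of GPU counts; a count comparison, not a speed comparison.
gpu_scale = TIANHE_GPUS / total_gpus

print(f"Oak Ridge hybrid: {total_cpus} CPUs, {total_gpus} GPUs")
print(f"Tianhe-1A has roughly {gpu_scale:.0f}x as many GPUs")
```

By this count the Oak Ridge machine has 240 CPUs and 360 GPUs, which squares with Dongarra's description of it as "much, much smaller" than the Chinese system.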
What makes the Chinese supercomputer so fast?
Dongarra: The Chinese designed their own interconnect. It's not commodity. It's based on chips, based on a router, based on a switch that they produce.
Is that in essence the secret sauce?
Dongarra: It's similar to Cray. Cray's contribution, besides the integration and software, is the interconnect network. They have a very fast interconnect that makes that machine perform very well. Though [the Chinese] project is based on U.S. processors, it uses a Chinese interconnect. That's the interesting part. They've put something together that is roughly twice the bandwidth of an InfiniBand interconnect [which is used widely in the U.S.].
Will the Chinese system in fact take the No. 1 spot on the Top500 list in November?
Dongarra: Yes. I saw the machine. I saw the output. It's the real thing.
Why doesn't Oak Ridge do what the Chinese are doing?
Dongarra: Oak Ridge doesn't have the ability or technology to develop an interconnect or a router. We don't make computers. We buy computers and use them. It's not within our scope or mission to be in the computer design business.
What's your advice?
Dongarra: You have to remember that you have to not only invest in the hardware. It's like a race car. In order to run the race car, you need a driver. You need to effectively use the machine. And we need to invest in various levels within the supercomputer ecology. The ecology is made up of the hardware, the operating system, the compiler, the applications, the numerical libraries, and so on. And you have to maintain an investment across that whole software stack in order to effectively use the hardware. And that's an aspect that sometimes we forget about. It's underfunded. We fund the hardware but we don't fund the other components. The ecosystem tends to get out of balance because the hardware tends to run far ahead of what we can develop in terms of software. We have machines that have a tremendous level of parallelism. We currently have a very crude way of doing programming.
Who would do that?
Dongarra: The research is performed under the auspices of the Department of Energy, the National Science Foundation, and the Department of Defense.
Is this a red flag for the U.S.?
Dongarra: Yes, this is a wake-up call. We need to realize that other countries are capable of doing this. We're losing an advantage.