utahsetr.blogg.se - I7 6700k gflops fp64

I7 6700K GFLOPS FP64 CODE

Kaveri's fp64 peak, including both the CPU and GPU, is only about 110 gflops. You will generally be better off first optimizing your code for AVX and FMA instructions and running it on Haswell's CPU cores.

Some of you might be wondering whether Kaveri is good for HPC applications. From an API standpoint, Kaveri's GCN GPU should work fine for fp64 under all APIs. Applications that are already ported and work well on discrete GPUs will continue to be best run on discrete GPUs. However, Kaveri and HSA will enable many more applications to be GPU accelerated. In applications depending upon fp64 performance, conditions are not generally favorable to Kaveri.

The peak CPU performance will depend on the SIMD ISA that your code was written and compiled for. We consider three cases: SSE, AVX (without FMA) and AVX with FMA (either FMA3 or FMA4). It is no secret that AMD's Bulldozer family cores (Steamroller in Kaveri and Piledriver in Trinity) are no match for recent Intel cores in FP performance due to the shared FP unit in each module. As a comparison point, one core in Haswell has the same floating point performance per cycle as two modules (or four cores) in Steamroller.

On the GPU side, for Haswell we chose to include both GT2 and GT3e variants. The fp64 support situation is a bit of a mess because some GPUs only support fp64 under some APIs. The fp64 rate of Intel's GPUs does not appear to be published, but David Kanter provides an estimate of 1/4 speed compared to fp32. However, Intel only enables fp64 under DirectCompute and does not enable fp64 under OpenCL for any of its GPUs. The situation on AMD's Trinity/Richland is even more complicated. fp64 support under OpenCL is not standards-compliant and depends upon using a proprietary extension (cl_amd_fp64). Trinity/Richland do not appear to support fp64 under DirectCompute (and the MS C++ AMP implementation) from what I can tell.
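As a rough illustration of how the three SIMD cases (SSE, AVX without FMA, AVX with FMA) turn into per-cycle CPU peaks, here is a minimal Python sketch. The function name `flops_per_cycle` and its parameters are hypothetical, and the unit counts used (two SIMD pipes per Haswell core, two 128-bit FMA pipes per Steamroller module) are simplifying assumptions rather than figures stated in this post. Real cores have issue-port restrictions that this idealized model ignores.

```python
def flops_per_cycle(simd_bits, element_bits, pipes, fma):
    """Idealized peak floating point operations per cycle for one core or module.

    simd_bits    -- vector register width in bits (128 for SSE, 256 for AVX)
    element_bits -- 32 for fp32, 64 for fp64
    pipes        -- assumed number of SIMD pipes issuing one op per cycle
    fma          -- True if each lane does a fused multiply-add (2 flops)
    """
    lanes = simd_bits // element_bits
    ops_per_lane = 2 if fma else 1
    return lanes * pipes * ops_per_lane


# Assumption: one Haswell core modeled as two SIMD pipes, fp64 elements.
haswell_fp64 = {
    "SSE": flops_per_cycle(128, 64, pipes=2, fma=False),
    "AVX": flops_per_cycle(256, 64, pipes=2, fma=False),
    "AVX+FMA": flops_per_cycle(256, 64, pipes=2, fma=True),
}

# Assumption: one Steamroller module modeled as two shared 128-bit FMA pipes.
steamroller_module_fp64_fma = flops_per_cycle(128, 64, pipes=2, fma=True)

print(haswell_fp64)
print(steamroller_module_fp64_fma)
```

Under these assumptions, two Steamroller modules add up to the same fp64 per-cycle peak as one Haswell core with FMA, which is the comparison made in the text.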

With the launch of Kaveri, some people have been wondering if the platform is suitable for HPC applications. Floating point peak performance of the CPU and GPU on both fp32 and fp64 datatypes is one of the considerations. At launch time, we were not clear on the fp64 performance of Kaveri's GPU, but now we have official confirmation from AMD that it is 1/16th the rate of fp32 (similar to most GCN-based GPUs except the flagships), and we have verified this on our 7850K by running FlopsCL. I am taking this opportunity to summarize the info about Kaveri, Trinity, Llano and Intel's competing platforms Haswell and Ivy Bridge on both the CPU and GPU side. We provide a per-cycle estimate for the chips as well as the peak calculated in gflops. The per-cycle estimates already take into account the number of cores or modules. Due to turbo boost, it was difficult to decide what frequency to use for peak calculations. For CPUs, we are using the base frequency and for GPUs we are using the boost frequency, because in multithreaded and/or heterogeneous scenarios the CPU is less likely to turbo. In any case, we believe our readers are smart enough to calculate peaks at any frequency they want, given that we already supply per-cycle peaks :)
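For readers who want to redo the peak calculation at a frequency of their own choosing, the arithmetic is just the per-cycle peak multiplied by the clock. The sketch below is a minimal Python version. The helper names `peak_gflops` and `gpu_fp64_gflops` are hypothetical, and the lane counts and clocks in the example are placeholder values chosen for illustration, not numbers taken from this post. Only the 1/16 fp64-to-fp32 ratio comes from the text above.

```python
def peak_gflops(flops_per_cycle, clock_ghz):
    """Peak gflops = per-cycle peak (whole chip) times clock in GHz."""
    return flops_per_cycle * clock_ghz


def gpu_fp64_gflops(fp32_gflops, fp64_ratio=1 / 16):
    """Derive the fp64 peak from the fp32 peak and the fp64:fp32 rate."""
    return fp32_gflops * fp64_ratio


# Placeholder GPU: 512 lanes, one FMA (2 flops) per lane per cycle,
# 0.72 GHz boost clock. These values are assumptions for illustration.
gpu_fp32 = peak_gflops(512 * 2, 0.72)
gpu_fp64 = gpu_fp64_gflops(gpu_fp32)

# Placeholder CPU: 16 fp64 flops per cycle for the whole chip,
# 3.7 GHz base clock. Also assumed values.
cpu_fp64 = peak_gflops(16, 3.7)

print(f"GPU fp32: {gpu_fp32:.1f} gflops")
print(f"GPU fp64: {gpu_fp64:.1f} gflops")
print(f"CPU fp64: {cpu_fp64:.1f} gflops")
```

Swapping in a turbo clock for the CPU, or a base clock for the GPU, only changes the `clock_ghz` argument; the per-cycle figure stays the same.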