[Question] How do I push computation speed to the limit?
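Hi, hi. Thanks to everyone for the enthusiastic replies last time, which taught me how to write files in binary mode.
I had no idea binary mode was this useful.
I have another question I would like to ask everyone.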
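For my fluid simulations I offload all the heavy computation to the GPU.
Throughput there is almost always above 2 TFLOPS, around 70% of the card's theoretical peak.
(Taking the GTX 680 as an example: 2 FLOPS/clock × 1006 MHz × 1536 cores = 3.090 TFLOPS)
But the same computation on the CPU runs at under 10 GFLOPS, less than 10% of the CPU's theoretical peak.
(Taking Ivy Bridge as an example: 8 FLOPS/clock × 3.5 GHz × 4 cores = 112 GFLOPS)
At first glance the GPU seems to speed things up by hundreds or even thousands of times, but really the CPU is nowhere near showing what it can do.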
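The peak figures above are just a product of three numbers, so here is a minimal sketch of that arithmetic. The function names peak_gflops and utilization are made up for illustration; the inputs are the ones quoted in the post.

```cpp
#include <cassert>
#include <cmath>

// Theoretical peak in GFLOPS:
//   FLOPs per clock per core  x  clock in GHz  x  number of cores.
// (Hypothetical helper, not from any library.)
double peak_gflops(double flops_per_clock, double clock_ghz, int cores)
{
    assert(cores > 0 && clock_ghz > 0.0);
    return flops_per_clock * clock_ghz * cores;
}

// Fraction of the theoretical peak that a measured rate reaches.
double utilization(double measured_gflops, double peak)
{
    assert(peak > 0.0);
    return measured_gflops / peak;
}
```

With the numbers from the post, peak_gflops(2.0, 1.006, 1536) gives 3090.432 GFLOPS for the GTX 680 and peak_gflops(8.0, 3.5, 4) gives 112 GFLOPS for the CPU. A measured 2000 GFLOPS on the GPU is then about 0.65 of peak, while 10 GFLOPS on the CPU is about 0.09.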
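On the GPU, the compiler fuses each multiply and add into a single instruction, which is how it reaches 2 FLOPS/clock.
On the CPU, do I have to write SSE or AVX code by hand to use those instruction sets, or will the compiler do it for me?
I ask because adding options such as /arch:AVX in VC2010 did not make anything faster.
What I am doing here is basically an iterative computation: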
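#include <vector>
using std::vector;

// L (interior grid points per side) and h (grid spacing) are used but not
// declared in the posted snippet; they are assumed here to be globals.
int L;
double h;
vector<vector<double> > V, VNew, rho;

void Jacobi()
{
    // Parallelize the outer loop only. A second "parallel for" on the inner
    // loop requests nested parallelism, which adds overhead without more speed.
    #pragma omp parallel for
    for (int i = 1; i <= L; i++)
        for (int j = 1; j <= L; j++)
            VNew[i][j] = 0.25 * (V[i - 1][j] + V[i + 1][j] +
                                 V[i][j - 1] + V[i][j + 1] +
                                 h * h * rho[i][j]);
}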
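On the question of whether the compiler will vectorize this by itself: as far as I know, the auto-vectorizer in Visual C++ first shipped with the 2012 release, so under VC2010 /arch:AVX would mostly change how instructions are encoded rather than vectorize the loop. Independent of the compiler, a vector of vectors scatters the rows across the heap. The sketch below is one possible rewrite, not the original code: it assumes the grid is stored row-major in a single contiguous array of (L + 2) × (L + 2) doubles, and jacobi_flat is a hypothetical name. The inner loop then walks memory with stride 1, which is the access pattern a vectorizing compiler generally handles best.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: one Jacobi sweep on an (L + 2) x (L + 2) grid stored
// row-major in one contiguous array, so element (i, j) lives at i * (L + 2) + j.
void jacobi_flat(const std::vector<double>& V, std::vector<double>& VNew,
                 const std::vector<double>& rho, int L, double h)
{
    const int n = L + 2;  // row length including the two boundary columns
    assert(V.size() == static_cast<std::size_t>(n) * n);
    assert(VNew.size() == V.size() && rho.size() == V.size());
    assert(&V != &VNew);  // input and output grids must be distinct
    const double h2 = h * h;  // hoisted out of the loops

    #pragma omp parallel for
    for (int i = 1; i <= L; i++) {
        // Row pointers keep the inner loop a simple stride-1 walk.
        const double* up   = &V[(i - 1) * n];
        const double* mid  = &V[i * n];
        const double* down = &V[(i + 1) * n];
        const double* r    = &rho[i * n];
        double*       out  = &VNew[i * n];
        for (int j = 1; j <= L; j++)
            out[j] = 0.25 * (up[j] + down[j] + mid[j - 1] + mid[j + 1] + h2 * r[j]);
    }
}
```

Whether this actually gets vectorized depends on the compiler and flags, so it is worth checking the compiler's vectorization report or the generated assembly rather than assuming.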
--
※ Origin: PTT (批踢踢實業坊, ptt.cc)
◆ From: 1.170.79.135
Could you give me a short demonstration? I don't mind taking apart the inner loop.
※ Edited: Lepton, from 1.170.79.135 (08/14 23:56)
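Hi, I know the GPU has cache issues too, but I am not familiar with how to deal with them on the CPU.
Could you point me to some reference material? I will google it and study first, then come back with questions if I still don't understand.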
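One common CPU-side technique for this is loop tiling (cache blocking). A 5-point stencil does only a handful of arithmetic operations per value loaded, so on large grids memory traffic tends to become the limit before the arithmetic units do. The sketch below is only an illustration under assumptions: the grid is stored row-major in one contiguous array of (L + 2) × (L + 2) doubles, jacobi_tiled and its tile parameter are hypothetical names, and the best tile width has to be found by measurement. It is kept single-threaded for clarity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: one Jacobi sweep processed in vertical strips of
// `tile` columns. Within a strip, rows i-1, i and i+1 are short enough to
// stay in cache while i advances, so each value of V is fetched from main
// memory fewer times than in a full-width sweep over very long rows.
void jacobi_tiled(const std::vector<double>& V, std::vector<double>& VNew,
                  const std::vector<double>& rho, int L, double h, int tile)
{
    const int n = L + 2;  // row length including the two boundary columns
    assert(tile > 0);
    assert(V.size() == static_cast<std::size_t>(n) * n);
    assert(VNew.size() == V.size() && rho.size() == V.size());
    const double h2 = h * h;

    for (int jj = 1; jj <= L; jj += tile) {
        const int jend = std::min(jj + tile - 1, L);  // last column of this strip
        for (int i = 1; i <= L; i++)
            for (int j = jj; j <= jend; j++)
                VNew[i * n + j] = 0.25 * (V[(i - 1) * n + j] + V[(i + 1) * n + j] +
                                          V[i * n + j - 1] + V[i * n + j + 1] +
                                          h2 * rho[i * n + j]);
    }
}
```

The result is identical to an untiled sweep; only the order in which grid points are visited changes. A tile width of a few hundred doubles is a reasonable first guess to try, but that is an assumption to verify with timing on the actual machine.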
※ Edited: Lepton, from 1.170.79.135 (08/15 00:02)
QQ, what everyone is discussing here is so deep that I can't follow it! Looks like I never learned C/C++ properly.
※ Edited: Lepton, from 111.252.0.105 (08/15 01:00)