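[Question] Unexpected CUDA output
Platform: (e.g. Win10, Linux, ...)
Linux, GPGPU-Sim
Compiler (e.g. GCC, clang, VC++, ...) + target environment (list it if it differs from the development platform)
nvcc
Question:
I am practicing with a simple vectorAdd.
Originally main() called a function that launched the kernel, and that worked fine.
I then tried moving the kernel launch into main() itself,
but the output is no longer what I expect,
and so far I cannot find the cause.
Expected Output: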
dataD = 1.000000
dataD = 1.000000
dataD = 1.000000
dataD = 1.000000
dataD = 1.000000
dataD = 1.000000
dataD = 1.000000
dataD = 1.000000
dataD = 1.000000
dataD = 1.000000
Wrong Output:
dataD = 1.000000
dataD = 1.000000
dataD = 0.000000
dataD = 0.000000
dataD = 0.000000
dataD = 0.000000
dataD = 0.000000
dataD = 0.000000
dataD = 0.000000
dataD = 0.000000
Code: (please use the code-paste sites listed in the pinned post, and remember to format)
__global__ void VectorAdd( float* arrayA, float* arrayB, float* output )
{
    int idx = threadIdx.x;
    output[idx] = arrayA[idx] + arrayB[idx] + 1;
}

void add_vector_gpu( float* a, float* b, float *c, int size );

int main( int argc, char** argv){
    int data_size = 10;
    float *dataA = new float[data_size],
          *dataB = new float[data_size],
          *dataC = new float[data_size],
          *dataD = new float[data_size],
          *dataE = new float[data_size];

    for( int i = 0; i < data_size; ++ i )
    {
        dataA[i] = i;
        dataB[i] = -1 * i;
    }

    add_vector_cpu( dataA, dataB, dataC, data_size );

    float data_size2 = data_size * sizeof(float);
    float *dev_A, *dev_B, *dev_C, *dev_D;
    cudaMalloc( (void**)&dev_A, data_size2 );
    cudaMalloc( (void**)&dev_B, data_size2 );
    cudaMalloc( (void**)&dev_C, data_size2 );
    cudaMalloc( (void**)&dev_D, data_size2 );

    cudaMemcpy( dev_A, dataA, data_size, cudaMemcpyHostToDevice );
    cudaMemcpy( dev_B, dataB, data_size, cudaMemcpyHostToDevice );

    VectorAdd<<< 1, 10 >>>( dev_A, dev_B, dev_C );

    cudaMemcpy( dataD, dev_C, data_size, cudaMemcpyDeviceToHost );

    for( int i = 0; i < data_size; ++ i )
    {
        printf( "dataD = %f\n", dataD[i] );
    }
}
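
The wrong output can be traced to the size argument of cudaMemcpy, which counts bytes, not elements. The listing passes data_size (10), so only 10 bytes, i.e. two complete 4-byte floats, are transferred in each direction. Below is a minimal host-only sketch of the same effect, using std::memcpy in place of cudaMemcpy. The helper name elements_copied_intact is hypothetical, and 4-byte floats are assumed.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Copy `byte_count` bytes from `src` into a zero-filled buffer of the same
// length, then count how many elements arrived bit-for-bit intact.
// Precondition: byte_count <= src.size() * sizeof(float).
std::size_t elements_copied_intact(const std::vector<float>& src,
                                   std::size_t byte_count)
{
    std::vector<float> dst(src.size(), 0.0f);
    std::memcpy(dst.data(), src.data(), byte_count);

    std::size_t intact = 0;
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (std::memcmp(&dst[i], &src[i], sizeof(float)) == 0) {
            ++intact;
        }
    }
    return intact;
}
```

For ten floats all set to 1.1f, a byte_count of 10 leaves two elements intact, while 10 * sizeof(float) leaves all ten intact. Passing data_size * sizeof(float) to the three cudaMemcpy calls is the corresponding change in the listing.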
Supplement:
I would also like to ask how to printf some data from inside a kernel.
I read that you have to #include "cuPrintf.cu"
in order to use cuPrintf ("Thread_number %d\n", threadIdx.x);
but nothing gets printed. Am I using it the wrong way?
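
On printing from inside a kernel, here is a minimal sketch. It assumes a device of compute capability 2.0 or newer and a toolkit with the built-in device-side printf; whether GPGPU-Sim handles it depends on the simulator version, which is not stated here. The kernel name PrintThreadIndex is hypothetical. Device-side printf output is buffered and only shown at synchronization points, so the host calls cudaDeviceSynchronize() before exiting.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread prints its own index.
__global__ void PrintThreadIndex()
{
    printf("Thread_number %d\n", threadIdx.x);
}

int main()
{
    PrintThreadIndex<<<1, 10>>>();

    // Device-side printf output is buffered; synchronizing flushes it
    // and also reports any error raised by the kernel launch.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

With the older cuPrintf helper from the SDK, the host side also has to call cudaPrintfInit() before the launch and cudaPrintfDisplay() after it, followed by cudaPrintfEnd(). Leaving those calls out is one possible reason for seeing no output.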
--
※ Origin: PTT (ptt.cc), From: 140.118.155.204
※ Article URL: https://www.ptt.cc/bbs/C_and_CPP/M.1494208139.A.3C4.html
Push 05/08 11:26, 1F
→ 05/08 11:27, 2F
→ 05/08 11:27, 3F
→ 05/08 11:28, 4F
I see, so I carelessly set the memcpy size wrong. Thank you!
But which float were you referring to? And how does it differ from size_t?
※ Edited: v00623 (140.118.155.204), 05/08/2017 12:44:03
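
On float versus size_t: the listing stores the byte count in `float data_size2`, while cudaMalloc and cudaMemcpy take their size as a size_t, so the value is converted back to an integer on every call. For 40 bytes that round trip is harmless, but a 32-bit float has a 24-bit significand and cannot represent every larger integer exactly. Below is a minimal host-only sketch of that effect. The helper name through_float is hypothetical, and a 32-bit IEEE-754 float is assumed.

```cpp
#include <cstddef>

// Hypothetical helper: store a byte count in a float, then convert it back
// to size_t, as happens when a float variable is passed as a size argument.
std::size_t through_float(std::size_t byte_count)
{
    float as_float = static_cast<float>(byte_count);
    return static_cast<std::size_t>(as_float);
}
```

Here through_float(40) gives back 40, but through_float(16777217) gives 16777216, because 2^24 + 1 is not representable as a float. Declaring the byte count as size_t avoids the conversion altogether.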
Push 05/08 14:00, 5F
Okay, thanks!
※ Edited: v00623 (140.118.155.204), 05/08/2017 16:48:14