CUDA notes



Error Handling

unified memory profiling failed


nvprof --unified-memory-profiling off ./add_cuda

Strange Output

Pay attention to the size of the vectors when calling cudaMemcpy() (the count argument is given in bytes, not in elements), and don't mix up the order of the dimensions.

cudaMemcpy fails

Check the order of the destination and source arguments: cudaMemcpy() takes the destination pointer first, then the source pointer.


think twice before adding the following code


free(): invalid next size (fast/normal)

Error [an illegal memory access was encountered]



double *pone;

and DO NOT

double one;
double *pone = &one;
double *pone = (double*)malloc(sizeof(double));
*pone = 1.0;


use gsl in GNU

How to use the GNU scientific library (gsl) in nvidia Nsight eclipse

Getting started with parallel MCMC

Multiple definitions Error

Some similar problems and explanations:

  1. multiple definition error c++
  2. Multple c++ files causes “multiple definition” error?
  3. getting “multiple definition” errors with simple device function in CUDA C
  4. CUDA multiple definition error during linking

First Try: separate definition and implementations

According to Separate Compilation and Linking of CUDA C++ Device Code, it seems reasonable to split a device-code header that also contains the implementations into a pure header (declarations only) and a separate implementation file.

However, templates cannot be split this way, because a template's definition must be visible wherever it is instantiated; refer to How to define a template class in a .h file and implement it in a .cpp file and Why can’t templates be within extern “C” blocks?
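A minimal sketch of what keeping a template in the header looks like. The header name `scale.h` and the function `scale` are hypothetical, chosen only for illustration; they do not come from the original project.

```cpp
// scale.h (hypothetical header name, for illustration only)
#ifndef SCALE_H
#define SCALE_H

// The full definition stays in the header: the compiler needs to see the
// body at every point where scale<T> is instantiated, so it cannot be
// hidden in a separate implementation file.
template <typename T>
T scale(T x, T factor)
{
    return x * factor;
}

#endif // SCALE_H
```

Every source file that includes this header may instantiate `scale` without causing a multiple definition error, since function templates are exempt from that restriction. If only a few types are ever needed, explicit instantiation in a single implementation file is a possible alternative.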

Second Try: add extern "C"

A reference about extern "C": C++项目中的extern “C” {} (in English: extern "C" {} in C++ projects)

Several functions share the same name but take different parameter lists, so the compiler reports

more than one instance of overloaded function "gauss1_pdf" has "C" linkage

In short, overloading is a C++ feature: with C linkage the symbol is just the bare function name, so two overloads would collide. Refer to More than one instance overloaded function has C linkage.
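The conflict can be sketched as follows. Only the name `gauss1_pdf` comes from the error message above; the parameter lists and the wrapper names `gauss1_pdf_std` and `gauss1_pdf_general` are assumptions made up for this illustration.

```cpp
#include <math.h>

// C++ linkage: overloads are fine, because the compiler mangles each
// signature into a different symbol. (Signatures are hypothetical.)
double gauss1_pdf(double x)
{
    const double inv_sqrt_2pi = 0.3989422804014327; // 1 / sqrt(2 * pi)
    return inv_sqrt_2pi * exp(-0.5 * x * x);
}

double gauss1_pdf(double x, double mu, double sigma)
{
    return gauss1_pdf((x - mu) / sigma) / sigma;
}

// C linkage: the symbol is the bare function name, so every function
// needs a unique name. Declaring two functions called gauss1_pdf inside
// this block would trigger the "more than one instance" error.
extern "C" {
double gauss1_pdf_std(double x)
{
    return gauss1_pdf(x);
}

double gauss1_pdf_general(double x, double mu, double sigma)
{
    return gauss1_pdf(x, mu, sigma);
}
}
```

So `extern "C"` is only an option if the overloads are renamed, which is usually more intrusive than the original multiple definition problem.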

Last Try: add inline


Refer to C/C++ “inline” keyword in CUDA device-side code
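A minimal sketch of the `inline` approach, assuming the functions are defined directly in a header that several source files include. The header name `gauss.h`, the `HOST_DEVICE` macro, and the signatures of `gauss1_pdf` are hypothetical; the macro is a common idiom that lets the same header compile with both nvcc and a plain C++ compiler.

```cpp
// gauss.h (hypothetical header name, for illustration only)
#ifndef GAUSS_H
#define GAUSS_H

#include <math.h>

// Under nvcc, make the functions callable from host and device code;
// under a plain C++ compiler the qualifiers expand to nothing.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// `inline` allows the same definition to appear in every translation
// unit that includes this header without a multiple definition error.
HOST_DEVICE inline double gauss1_pdf(double x)
{
    const double inv_sqrt_2pi = 0.3989422804014327; // 1 / sqrt(2 * pi)
    return inv_sqrt_2pi * exp(-0.5 * x * x);
}

// Overloading still works, because no extern "C" is involved.
HOST_DEVICE inline double gauss1_pdf(double x, double mu, double sigma)
{
    return gauss1_pdf((x - mu) / sigma) / sigma;
}

#endif // GAUSS_H
```

Compared with the first two attempts, this keeps the header self-contained and preserves the overloads, at the cost of recompiling the function bodies in every file that includes the header.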

