Post list for category 'Programming/openCL & CUDA' (page 4)

80 posts in 'Programming/openCL & CUDA'

2012.06.02 CUDA 5 preview
2012.05.22 NVIDIA ION CUDA cores and the H.264 library
2012.05.18 CUDA API memory types
2012.05.04 Interoperability
2012.04.30 CUDA built-in variables
2012.04.26 Kernel blocks and threads
2012.04.23 CUDA 4.2 deviceQuery
2012.04.22 CUDA 4.2 released
2012.04.09 CUDA core count by device
2012.03.12 Compiling the AMD APP SDK samples


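NVIDIA ION CUDA cores and the H.264 library

ION has 2 multiprocessors (MPs) built in.
Hmm, how many cores did the H.264 encode/decode libraries require again...

A scalar processor is a CUDA core, and each MP has 8 of them, so it takes 4 MPs to reach 32 scalar processors.
ION's 2 MPs give only 16, so H.264 encoding looks impossible on ION.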

MPEG-2/VC-1 support
- Decode Acceleration for G8x, G9x (Requires Compute 1.1 or higher)
- Full Bitstream Decode for MCP79, MCP89, G98, GT2xx, GF1xx
- MPEG-2 CUDA accelerated decode with a GPUs with 8+ SMs (64 CUDA cores). (Windows)
- Supports HD (1080i/p) playback including Bluray content
- R185+ (Windows), R260+ (Linux)

H.264/AVCHD support
- Baseline, Main, and High Profile, up to Level 4.1
- Full Bitstream Decoding in hardware including HD (1080i/p) Bluray content
- Supports B-Frames, bitrates up to 45 mbps
- Available on NVIDIA GPUs: G8x, G9x, MCP79, MCP89, G98, GT2xx, GF1xx
- R185+ (Windows), R260+ (Linux)

[Source: CUDA_VideoDecoder_Library.pdf]

Supported on all CUDA-enabled GPUs with 32 scalar processor cores or more
[Source: CUDA_VideoEncoder_Library.pdf]
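To make the threshold arithmetic explicit, here is a small sketch in plain C (it also compiles as CUDA host code). The function names are hypothetical, not from any NVIDIA library; the only inputs are the two limits quoted above and the assumption that a compute 1.x multiprocessor has 8 scalar processors.

```c
#include <stdbool.h>

/* Assumption: compute 1.x multiprocessors have 8 scalar processors. */
enum { CORES_PER_MP_CC1X = 8 };

/* Encoder requirement quoted above: 32 scalar processor cores or more. */
bool encoder_supported(int multiprocessors, int cores_per_mp)
{
    return multiprocessors * cores_per_mp >= 32;
}

/* MPEG-2 CUDA-accelerated decode requirement quoted above: 8+ SMs. */
bool mpeg2_cuda_decode_supported(int multiprocessors)
{
    return multiprocessors >= 8;
}
```

With ION's 2 multiprocessors, `encoder_supported(2, CORES_PER_MP_CC1X)` is false (16 < 32), and so is the MPEG-2 CUDA-decode check. H.264 decoding is a separate matter: the decoder list names MCP79, the chipset ION is built on, for full bitstream decode, which is done by dedicated video hardware rather than by the CUDA cores.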

[Link: http://www.vpac.org/files/GPU-Slides/05.CudaOptimization.pdf ]

Device 0: "ION"
CUDA Driver Version: 2.30
CUDA Runtime Version: 2.30
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 1
Total amount of global memory: 268435456 bytes
Number of multiprocessors: 2
Number of cores: 16
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.10 GHz
Concurrent copy and execution: No
Run time limit on kernels: No
Integrated: Yes
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this devi

[Link: http://forums.nvidia.com/index.php?showtopic=100288 ]

[Link: http://www.nvidia.com/object/picoatom_specifications.html ]
[Link: http://en.wikipedia.org/wiki/Nvidia_Ion ]


Posted by 구차니

CUDA API memory types

The kinds of memory a CUDA device provides are listed below.

5.3.2 Device Memory Accesses
5.3.2.1 Global Memory
5.3.2.2 Local Memory
5.3.2.3 Shared Memory
5.3.2.4 Constant Memory
5.3.2.5 Texture and Surface Memory

[Source: CUDA C Programming guide.pdf]

Local memory and global memory live in the graphics card's video memory (the figure usually quoted as 512 MB or 1 GB),
while shared memory is built into each multiprocessor (MP) inside the GPU.

Comparing with deviceQuery,
on an 8800 GT 512MB card
global and local memory can use up to about 512 MB,
and shared memory is limited to 16 KB per block.

Device 0: "GeForce 8800 GT"
CUDA Driver Version:                           3.20
CUDA Runtime Version:                          3.10
CUDA Capability Major revision number:         1
CUDA Capability Minor revision number:         1
Total amount of global memory:                 536543232 bytes
Number of multiprocessors:                     14
Number of cores:                               112
Total amount of constant memory:               65536 bytes
Total amount of shared memory per block:       16384 bytes
Total number of registers available per block: 8192
Warp size:                                     32
Maximum number of threads per block:           512
Maximum sizes of each dimension of a block:    512 x 512 x 64
Maximum sizes of each dimension of a grid:     65535 x 65535 x 1

2011/01/02 - [Programming/openCL / CUDA] - deviceQuery on 8600GT 512MB + CUDA hardware structure

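In the matrix multiplication sample, the difference between the version without shared memory and the version with it (illustrated by the figures in the Programming Guide) is whether
each thread reads global memory directly, one element at a time, for its computation,

or whether a block-sized tile of global memory is first copied
into shared memory and the computation works from there.

According to another book, a global memory read takes about 700 to 900 clock cycles,
while a shared memory read takes roughly 1 cycle,
so it pays to copy data into shared memory wherever possible and compute from there.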
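A rough way to see why the tiled version wins, independent of the exact latency figures, is to count global-memory loads. This C sketch assumes square N x N matrices with N a multiple of the tile size (the same assumption the sample makes) and ignores any caching; the function names are illustrative only.

```c
/* Naive kernel: each of the N*N threads reads one full row of A and
 * one full column of B from global memory, i.e. 2*N element loads. */
long long naive_global_loads(long long n)
{
    return n * n * (2 * n);
}

/* Tiled kernel: per tile step each thread loads one element into As
 * and one into Bs, and there are n / block_size tile steps.
 * Assumes n is a multiple of block_size. */
long long tiled_global_loads(long long n, long long block_size)
{
    return n * n * (2 * (n / block_size));
}
```

For N = 1024 and BLOCK_SIZE = 16 the tiled kernel issues 16 times fewer global loads (2^27 instead of 2^31); in general the reduction factor equals the tile width, and the remaining reads come from fast shared memory.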

// Matrices are stored in row-major order:
// M(row, col) = *(M.elements + row * M.width + col)
typedef struct {
    int width;
    int height;
    float* elements;
} Matrix;

// Thread block size
#define BLOCK_SIZE 16

// Forward declaration of the matrix multiplication kernel
__global__ void MatMulKernel(const Matrix, const Matrix, Matrix);

// Matrix multiplication - Host code
// Matrix dimensions are assumed to be multiples of BLOCK_SIZE
void MatMul(const Matrix A, const Matrix B, Matrix C)
{
    // Load A and B to device memory
    Matrix d_A;
    d_A.width = A.width; d_A.height = A.height;
    size_t size = A.width * A.height * sizeof(float);
    cudaMalloc(&d_A.elements, size);
    cudaMemcpy(d_A.elements, A.elements, size,
               cudaMemcpyHostToDevice);
    Matrix d_B;
    d_B.width = B.width; d_B.height = B.height;
    size = B.width * B.height * sizeof(float);
    cudaMalloc(&d_B.elements, size);
    cudaMemcpy(d_B.elements, B.elements, size,
               cudaMemcpyHostToDevice);

    // Allocate C in device memory
    Matrix d_C;
    d_C.width = C.width; d_C.height = C.height;
    size = C.width * C.height * sizeof(float);
    cudaMalloc(&d_C.elements, size);

    // Invoke kernel
    dim3 dimBlock(BLOCK_SIZE, BLOCK_SIZE);
    dim3 dimGrid(B.width / dimBlock.x, A.height / dimBlock.y);
    MatMulKernel<<<dimGrid, dimBlock>>>(d_A, d_B, d_C);

    // Read C from device memory
    cudaMemcpy(C.elements, d_C.elements, size,
               cudaMemcpyDeviceToHost);

    // Free device memory
    cudaFree(d_A.elements);
    cudaFree(d_B.elements);
    cudaFree(d_C.elements);
}

// Matrix multiplication kernel called by MatMul()
__global__ void MatMulKernel(Matrix A, Matrix B, Matrix C)
{
    // Each thread computes one element of C
    // by accumulating results into Cvalue
    float Cvalue = 0;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    for (int e = 0; e < A.width; ++e)
        Cvalue += A.elements[row * A.width + e]
                * B.elements[e * B.width + col];
    C.elements[row * C.width + col] = Cvalue;
}
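When trying the kernel above, it helps to have a CPU result to compare against. A minimal sketch in plain C, assuming the same row-major layout; `matmul_ref` and `results_match` are hypothetical helper names, and the tolerance check exists because the GPU may accumulate and round differently from the host.

```c
/* Row-major, same layout as the sample: M(row, col) = M[row * width + col].
 * A is hA x wA, B is wA x wB, C is hA x wB. This is MatMulKernel's loop
 * with the thread's (row, col) turned into two host loops. */
void matmul_ref(const float *A, const float *B, float *C,
                int hA, int wA, int wB)
{
    for (int row = 0; row < hA; ++row)
        for (int col = 0; col < wB; ++col) {
            float sum = 0.0f;
            for (int e = 0; e < wA; ++e)
                sum += A[row * wA + e] * B[e * wB + col];
            C[row * wB + col] = sum;
        }
}

/* Returns 1 if every element agrees within a relative tolerance
 * (absolute tolerance for values smaller than 1), else 0. */
int results_match(const float *ref, const float *got, int n, float tol)
{
    for (int i = 0; i < n; ++i) {
        float diff = ref[i] - got[i];
        float mag = ref[i];
        if (diff < 0.0f) diff = -diff;
        if (mag < 0.0f) mag = -mag;
        if (mag < 1.0f) mag = 1.0f;
        if (diff > tol * mag)
            return 0;
    }
    return 1;
}
```

After `MatMul` copies the result back into `C.elements`, run `matmul_ref` on the host copies of A and B and compare the two arrays with a small tolerance.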

// Matrices are stored in row-major order:
// M(row, col) = *(M.elements + row * M.stride + col)
typedef struct {
    int width;
    int height;
    int stride;
    float* elements;
} Matrix;

// Thread block size (defined before GetSubMatrix, which uses it)
#define BLOCK_SIZE 16

// Get a matrix element
__device__ float GetElement(const Matrix A, int row, int col)
{
    return A.elements[row * A.stride + col];
}

// Set a matrix element
__device__ void SetElement(Matrix A, int row, int col,
                           float value)
{
    A.elements[row * A.stride + col] = value;
}

// Get the BLOCK_SIZExBLOCK_SIZE sub-matrix Asub of A that is
// located col sub-matrices to the right and row sub-matrices down
// from the upper-left corner of A
__device__ Matrix GetSubMatrix(Matrix A, int row, int col)
{
    Matrix Asub;
    Asub.width    = BLOCK_SIZE;
    Asub.height   = BLOCK_SIZE;
    Asub.stride   = A.stride;
    Asub.elements = &A.elements[A.stride * BLOCK_SIZE * row
                                + BLOCK_SIZE * col];
    return Asub;
}

// Forward declaration of the matrix multiplication kernel
__global__ void MatMulKernel(const Matrix, const Matrix, Matrix);

// Matrix multiplication - Host code
// Matrix dimensions are assumed to be multiples of BLOCK_SIZE
void MatMul(const Matrix A, const Matrix B, Matrix C)
{
    // Load A and B to device memory
    Matrix d_A;
    d_A.width = d_A.stride = A.width; d_A.height = A.height;
    size_t size = A.width * A.height * sizeof(float);
    cudaMalloc(&d_A.elements, size);
    cudaMemcpy(d_A.elements, A.elements, size,
               cudaMemcpyHostToDevice);
    Matrix d_B;
    d_B.width = d_B.stride = B.width; d_B.height = B.height;
    size = B.width * B.height * sizeof(float);
    cudaMalloc(&d_B.elements, size);
    cudaMemcpy(d_B.elements, B.elements, size,
               cudaMemcpyHostToDevice);

    // Allocate C in device memory
    Matrix d_C;
    d_C.width = d_C.stride = C.width; d_C.height = C.height;
    size = C.width * C.height * sizeof(float);
    cudaMalloc(&d_C.elements, size);

    // Invoke kernel
    dim3 dimBlock(BLOCK_SIZE, BLOCK_SIZE);
    dim3 dimGrid(B.width / dimBlock.x, A.height / dimBlock.y);
    MatMulKernel<<<dimGrid, dimBlock>>>(d_A, d_B, d_C);

    // Read C from device memory
    cudaMemcpy(C.elements, d_C.elements, size,
               cudaMemcpyDeviceToHost);

    // Free device memory
    cudaFree(d_A.elements);
    cudaFree(d_B.elements);
    cudaFree(d_C.elements);
}

// Matrix multiplication kernel called by MatMul()
__global__ void MatMulKernel(Matrix A, Matrix B, Matrix C)
{
    // Block row and column
    int blockRow = blockIdx.y;
    int blockCol = blockIdx.x;

    // Each thread block computes one sub-matrix Csub of C
    Matrix Csub = GetSubMatrix(C, blockRow, blockCol);

    // Each thread computes one element of Csub
    // by accumulating results into Cvalue
    float Cvalue = 0;

    // Thread row and column within Csub
    int row = threadIdx.y;
    int col = threadIdx.x;

    // Loop over all the sub-matrices of A and B that are
    // required to compute Csub
    // Multiply each pair of sub-matrices together
    // and accumulate the results
    for (int m = 0; m < (A.width / BLOCK_SIZE); ++m) {
        // Get sub-matrix Asub of A
        Matrix Asub = GetSubMatrix(A, blockRow, m);
        // Get sub-matrix Bsub of B
        Matrix Bsub = GetSubMatrix(B, m, blockCol);

        // Shared memory used to store Asub and Bsub respectively
        __shared__ float As[BLOCK_SIZE][BLOCK_SIZE];
        __shared__ float Bs[BLOCK_SIZE][BLOCK_SIZE];

        // Load Asub and Bsub from device memory to shared memory
        // Each thread loads one element of each sub-matrix
        As[row][col] = GetElement(Asub, row, col);
        Bs[row][col] = GetElement(Bsub, row, col);

        // Synchronize to make sure the sub-matrices are loaded
        // before starting the computation
        __syncthreads();

        // Multiply Asub and Bsub together
        for (int e = 0; e < BLOCK_SIZE; ++e)
            Cvalue += As[row][e] * Bs[e][col];

        // Synchronize to make sure that the preceding
        // computation is done before loading two new
        // sub-matrices of A and B in the next iteration
        __syncthreads();
    }

    // Write Csub to device memory
    // Each thread writes one element
    SetElement(Csub, row, col, Cvalue);
}
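To check the `GetSubMatrix` indexing and the two `__syncthreads()` phases without a GPU, the loop structure can be emulated on the host. This is a sketch under stated assumptions: square n x n matrices, n a multiple of the tile size, and a tile size of 2 instead of 16 (the hypothetical `TILE` constant) so it is easy to test by hand. Local arrays stand in for the `__shared__` tiles and nested loops stand in for the threads of one block.

```c
enum { TILE = 2 };  /* stands in for BLOCK_SIZE; kept small for testing */

void matmul_tiled_emulated(const float *A, const float *B, float *C, int n)
{
    for (int blockRow = 0; blockRow < n / TILE; ++blockRow)
        for (int blockCol = 0; blockCol < n / TILE; ++blockCol) {
            float Cvalue[TILE][TILE] = {{0}};  /* one per "thread" */
            for (int m = 0; m < n / TILE; ++m) {
                float As[TILE][TILE], Bs[TILE][TILE]; /* "shared" tiles */
                /* Phase 1: every thread loads one element of each tile,
                 * same addresses as GetSubMatrix + GetElement. */
                for (int row = 0; row < TILE; ++row)
                    for (int col = 0; col < TILE; ++col) {
                        As[row][col] = A[(blockRow * TILE + row) * n + m * TILE + col];
                        Bs[row][col] = B[(m * TILE + row) * n + blockCol * TILE + col];
                    }
                /* first __syncthreads() would sit here on the GPU */
                /* Phase 2: every thread multiplies its row and column. */
                for (int row = 0; row < TILE; ++row)
                    for (int col = 0; col < TILE; ++col)
                        for (int e = 0; e < TILE; ++e)
                            Cvalue[row][col] += As[row][e] * Bs[e][col];
                /* second __syncthreads() would sit here on the GPU */
            }
            /* SetElement(Csub, row, col, Cvalue) for every thread */
            for (int row = 0; row < TILE; ++row)
                for (int col = 0; col < TILE; ++col)
                    C[(blockRow * TILE + row) * n + blockCol * TILE + col] = Cvalue[row][col];
        }
}
```

Because each phase finishes for every "thread" before the next one starts, the loop order plays the role of the barriers; on the GPU, dropping the first `__syncthreads()` would let a thread multiply tile elements another thread has not loaded yet.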


Interoperability

While studying CUDA,
you keep running into the term "interoperability" with OpenGL / DirectX.
It's awkward to render in Korean, so I looked the word up: the dictionary gives 상호운용 for "interoperate".
Using that term seems fine, though..
[Link: http://endic.naver.com/enkrEntry.nhn?entryId=62756ad640c44b41919ec9e5b504d898]

3.2.11 Graphics Interoperability

Some resources from OpenGL and Direct3D may be mapped into the address space of CUDA, either to enable CUDA to read data written by OpenGL or Direct3D, or to enable CUDA to write data for consumption by OpenGL or Direct3D.

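3.2.11 Graphics Interoperability, in my own words:
some resources from OpenGL and Direct3D can be mapped into CUDA's address space (for "mapped" I wavered between the Korean words for allocated, associated, and mapped),
either so that CUDA can read data written by OpenGL or Direct3D,
or so that CUDA can write data to be used (consumed) by OpenGL or Direct3D.

Loosely put: "a memory channel is opened in both directions so that CUDA and OpenGL / Direct3D can work together", I suppose?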


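CUDA built-in variables

Setting the grid aside for now, blocks and threads can be laid out in up to three dimensions.
That means each thread needs, for every dimension, both its own index and the total count.
A #define would be convenient, but a #define is fixed at compile time, while the block shape is chosen by the host when it launches the kernel on the graphics card, so the kernel cannot learn it that way.

That is why CUDA provides the built-in variables blockDim / blockIdx / threadIdx (plus gridDim for the grid), through which each thread can find out how large its block is and where it sits.

For example, given a simple two-dimensional (20x30) block,
every thread reads blockDim.x as 20, the block's width,
and blockDim.y as 30, the block's height.

Within the block, each thread's
threadIdx.x runs from 0 to 19 and
threadIdx.y runs from 0 to 29,
while blockIdx tells the thread which block of the grid it belongs to.
(On compute 1.x cards such as the 8800 GT a 20x30 block is 600 threads, above the 512-thread limit, so this shape is for illustration only.)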
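The same index arithmetic can be emulated on the host. In this C sketch `Dim3` is a hypothetical stand-in for CUDA's `dim3`/`uint3` built-ins; on the device the values come from `blockIdx`, `blockDim` and `threadIdx` directly.

```c
/* Hypothetical host-side stand-in for the dim3 / uint3 built-ins. */
typedef struct { unsigned x, y, z; } Dim3;

/* Global x coordinate: which block, times the block width, plus the
 * position inside the block (blockIdx.x * blockDim.x + threadIdx.x). */
unsigned global_x(Dim3 block, Dim3 dim, Dim3 thread)
{
    return block.x * dim.x + thread.x;
}

/* Linear number of a thread inside a 2D block, row-major
 * (threadIdx.y * blockDim.x + threadIdx.x). */
unsigned thread_id_in_block(Dim3 dim, Dim3 thread)
{
    return thread.y * dim.x + thread.x;
}
```

For the 20x30 example, the last thread (19, 29) is number 29 * 20 + 19 = 599 inside its block, and thread 0 of block 1 along x has global x = 1 * 20 + 0 = 20.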

2011/01/16 - [Programming/openCL / CUDA] - CUDA built-in variables - built in variable


Kernel blocks and threads

Reading vectorAdd.cu, it suddenly clicked!

// Invoke kernel
int N = 50000;
int threadsPerBlock = 256;
int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;
VecAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, N);

In the kernel launch's <<< >>> configuration,
the first value is the number of blocks per grid
and the second is the number of threads per block.
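The `blocksPerGrid` line is integer division rounded up. A tiny C sketch of the same formula (the function name is hypothetical):

```c
/* Smallest number of blocks with blocks * threadsPerBlock >= n,
 * i.e. n / threadsPerBlock rounded up, as in vectorAdd.cu. */
int blocks_for(int n, int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}
```

For N = 50000 and 256 threads per block this gives 196 blocks, i.e. 50176 threads, 176 more than there are elements; that surplus is why a kernel like VecAdd needs a bounds check such as `if (i < N)`.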

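Honestly, on a single GPU you hardly need to think about the grid beyond that first number (how many blocks to launch),
yet the explanations dwell on grids and other odd concepts, which I think only made things harder to understand.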

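Anyway, looking at deviceQuery again,
the maximum number of threads per block is 512,
each block dimension can be at most 512 x 512 x 64 (with the product still capped at 512 threads),
and a grid can have at most 65535 x 65535 x 1 blocks.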

Device 0: "GeForce 8800 GT"
CUDA Driver Version:                           3.20
CUDA Runtime Version:                          3.10
CUDA Capability Major revision number:         1
CUDA Capability Minor revision number:         1
Total amount of global memory:                 536543232 bytes
Number of multiprocessors:                     14
Number of cores:                               112
Total amount of constant memory:               65536 bytes
Total amount of shared memory per block:       16384 bytes
Total number of registers available per block: 8192
Warp size:                                     32
Maximum number of threads per block:           512
Maximum sizes of each dimension of a block:    512 x 512 x 64
Maximum sizes of each dimension of a grid:     65535 x 65535 x 1

2011/01/02 - [Programming/openCL / CUDA] - deviceQuery on 8600GT 512MB + CUDA hardware structure

That is why, in the examples below, a three-dimensional thread layout worked as long as the total number of threads did not exceed 512.

dim3 blocksPerGrid(1,1);
dim3 threadsPerBlock(8,8,8);

This runs, because 8*8*8 = 512 does not exceed the maximum number of threads per block, but

dim3 blocksPerGrid(1,1);
dim3 threadsPerBlock(9,9,9);

this does not: 9*9*9 = 729 exceeds the maximum, so the launch fails with an error.
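The rule can be checked before launching. A C sketch, assuming the compute 1.1 limits from the 8800 GT deviceQuery output above; the struct fields are named after the corresponding `cudaDeviceProp` members (`maxThreadsPerBlock`, `maxThreadsDim`, `maxGridSize`), which real code would fill from `cudaGetDeviceProperties` instead of hard-coding. `Limits`, `CC11_LIMITS` and `launch_config_ok` are hypothetical names.

```c
#include <stdbool.h>

typedef struct {
    int maxThreadsPerBlock;
    int maxThreadsDim[3];   /* per-dimension block limit */
    int maxGridSize[3];     /* per-dimension grid limit  */
} Limits;

/* Values copied from the 8800 GT (compute 1.1) deviceQuery output. */
static const Limits CC11_LIMITS = { 512, {512, 512, 64}, {65535, 65535, 1} };

/* True if a <<<grid, block>>> configuration fits the given limits. */
bool launch_config_ok(const Limits *lim, const int grid[3], const int block[3])
{
    long long threads = 1;
    for (int i = 0; i < 3; ++i) {
        if (block[i] < 1 || block[i] > lim->maxThreadsDim[i]) return false;
        if (grid[i] < 1 || grid[i] > lim->maxGridSize[i]) return false;
        threads *= block[i];
    }
    /* The per-dimension limits are not enough on their own:
     * the product must also stay within maxThreadsPerBlock. */
    return threads <= lim->maxThreadsPerBlock;
}
```

With these limits (8,8,8) passes and (9,9,9) fails, matching what was observed; (1,1,65) also fails because the z dimension of a block is capped at 64.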

2011/01/22 - [Programming/openCL / CUDA] - CUDA-related articles from abroad

One-line summary: doing CUDA on a single graphics card, the grid is just the number of blocks, so forget the rest of it! Go away, grid!!!!


CUDA 4.2 deviceQuery

C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\C\bin\win32\Release>deviceQuery.exe

[deviceQuery.exe] starting...
deviceQuery.exe Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Found 1 CUDA Capable device(s)
Device 0: "GeForce 8800 GT"
CUDA Driver Version / Runtime Version 4.2 / 4.2
CUDA Capability Major/Minor version number: 1.1
Total amount of global memory: 512 MBytes (536870912 bytes)
(14) Multiprocessors x ( 8) CUDA Cores/MP: 112 CUDA Cores
GPU Clock rate: 1500 MHz (1.50 GHz)
Memory Clock rate: 900 Mhz
Memory Bus Width: 256-bit
Max Texture Dimension Size (x,y,z) 1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(8192) x 512, 2D=(8192,8192) x 512
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per multiprocessor: 768
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Concurrent copy and execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: No
Alignment requirement for Surfaces: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No
Device supports Unified Addressing (UVA): No
Device PCI Bus ID / PCI location ID: 2 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.2, CUDA Runtime Version = 4.2, NumDevs = 1, Device = GeForce 8800 GT
[deviceQuery.exe] test results...
PASSED
> exiting in 3 seconds: 3...2...1...done!

If an MP can hold 768 threads,
and one MP has 8 CUDA cores,
does each CUDA core then hold 96 threads?
It also looks like 32 (warp size) x 3, so can one core handle 3 warps?

Ugh... I don't get it T_T
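One way to read these numbers, based on how compute 1.x multiprocessors are generally described (treat it as a hedged interpretation): threads are not assigned to individual cores. The MP keeps up to 768 threads, i.e. 24 warps, resident and issues each warp instruction across its 8 cores, which takes 4 clocks for the 32 threads of a warp. A C sketch of that arithmetic; the constants come from the deviceQuery output above plus an assumed compute 1.x limit of 8 resident blocks per MP, and the function names are illustrative.

```c
/* Per-multiprocessor numbers for compute 1.x (8800 GT deviceQuery). */
enum {
    WARP_SIZE          = 32,
    CORES_PER_MP       = 8,
    MAX_THREADS_PER_MP = 768,
    MAX_BLOCKS_PER_MP  = 8    /* assumed compute 1.x limit */
};

/* Warps that can be resident on one MP at the same time: 768 / 32. */
int max_resident_warps(void)
{
    return MAX_THREADS_PER_MP / WARP_SIZE;
}

/* Clocks for the MP's 8 cores to run one instruction for all
 * 32 threads of a warp: 32 / 8. */
int clocks_per_warp_instruction(void)
{
    return WARP_SIZE / CORES_PER_MP;
}

/* Blocks of a given size that fit on one MP, counting only the thread
 * and block limits (registers and shared memory can lower this). */
int blocks_per_mp(int threadsPerBlock)
{
    int by_threads = MAX_THREADS_PER_MP / threadsPerBlock;
    return by_threads < MAX_BLOCKS_PER_MP ? by_threads : MAX_BLOCKS_PER_MP;
}
```

So 768 / 8 = 96 is correct arithmetic, but the hardware view is 24 warps shared by the whole MP rather than 3 warps owned by each core; with 256-thread blocks, 3 blocks fill one MP.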


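CUDA 4.2 released

While I wasn't paying attention, they went and released 4.2 too -_-
I pulled my 8800 GT out and have just been playing Mabinogi on it.. OTL
Sigh, time to get back to it T_T

[Link: http://developer.nvidia.com/cuda-downloads]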


CUDA core count by device

Ugh.. I couldn't tell what was what, so I'm organizing it for now..
Conclusion: the expensive, newer one is better~

Device     Multiprocessors  Cores/MP  Total cores
8800 GT    14               8         112
8800 GTX   16               8         128
GTX 480    15               32        480

[Link: http://pastebin.com/KMUXqmTY] GTX 480
[Link: http://gpucoder.livejournal.com/990.html] 8800 GTX
2011/01/18 - [Programming/openCL / CUDA] - Differences in deviceQuery results between CUDA 3.1 and 3.2 (8800 GT)
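The totals in the table are just multiprocessors x cores per MP, and cores per MP is fixed by the compute capability. A C sketch of that lookup; the values (8 for 1.x, 32 for 2.0, 48 for 2.1, 192 for 3.x) are the commonly published figures, deviceQuery's "CUDA Cores/MP" line suggests the SDK keeps a similar table internally, and the function names here are hypothetical.

```c
/* CUDA cores per multiprocessor by compute capability.
 * Returns -1 for versions not covered by this sketch. */
int cores_per_mp(int major, int minor)
{
    if (major == 1) return 8;                      /* G80, G92, GT200, ION */
    if (major == 2) return minor == 0 ? 32 : 48;   /* Fermi 2.0 / 2.1 */
    if (major == 3) return 192;                    /* Kepler */
    return -1;
}

/* Total CUDA cores = multiprocessors x cores per MP, or -1 if unknown. */
int total_cores(int multiprocessors, int major, int minor)
{
    int per_mp = cores_per_mp(major, minor);
    return per_mp < 0 ? -1 : multiprocessors * per_mp;
}
```

For example, the 8800 GT (compute 1.1) gives 14 x 8 = 112, the GTX 480 (compute 2.0) gives 15 x 32 = 480, and the 2-MP ION (compute 1.1) gives 16.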


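Compiling the AMD APP SDK samples

The samples compile with Visual Studio 2008 Express, but a few of them don't work.
I can't tell whether that's because I haven't installed a newer driver or because the graphics (740G / Radeon 2100) isn't supported T_T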

C:\Users\minimonk\Documents\AMD APP\samples\opencl\bin\x86>NBody.exe
Platform 0 : Advanced Micro Devices, Inc.
GPU not found. Falling back to CPU device
Platform found : Advanced Micro Devices, Inc.
Selected Platform Vendor : Advanced Micro Devices, Inc.
Device 0 : AMD Athlon(tm) 64 X2 Dual Core Processor 4600+ Device ID is 014A14F8

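Interestingly, when no GPU is found the sample simply falls back to the CPU, since AMD's OpenCL platform also exposes the CPU as an OpenCL device.
As I recall, CUDA just gave up in that situation when I tried it before, so I suppose this counts as an advantage?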

---
OpenCL 1.0 is supported from the Radeon HD 4300 up,
and OpenCL 1.1 from the Radeon HD 5400 up.
Among integrated graphics, only the APU E/C series are supported (OpenCL 1.1).
[Link: http://developer.amd.com/sdks/AMDAPPSDK/pages/DriverCompatibility.aspx ]

Conclusion: the Radeon 2100 built into the 740G doesn't stand a chance -_-


구차니's junk collection (구차니의 잡동사니 모음)
