컴파일러가 ATI 쪽만 인식하도록 되어있는지, nVidia의 GPU를 제대로 활용하지는 못한다.
아니면 예제 프로그램들이 openCL을 이용하기는 하지만, nVidia의 openCL과는 달라서 그럴려나?
Number of platforms: 1 Platform Profile: FULL_PROFILE Platform Version: OpenCL 1.1 ATI-Stream-v2.2 (302) Platform Name: ATI Stream Platform Vendor: Advanced Micro Devices, Inc. Platform Extensions: cl_khr_icd cl_amd_event_callback cl_khr_d3d10_sharing
Platform Name: ATI Stream Number of devices: 2 Device Type: CL_DEVICE_TYPE_CPU Device ID: 4098 Max compute units: 4 Max work items dimensions: 3 Max work items[0]: 1024 Max work items[1]: 1024 Max work items[2]: 1024 Max work group size: 1024 Preferred vector width char: 16 Preferred vector width short: 8 Preferred vector width int: 4 Preferred vector width long: 2 Preferred vector width float: 4 Preferred vector width double: 0 Max clock frequency: 2393Mhz Address bits: 32 Max memory allocation: 536870912 Image support: No Max size of kernel argument: 4096 Alignment (bits) of base address: 1024 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: Yes Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: No Cache type: Read/Write Cache line size: 64 Cache size: 32768 Global memory size: 1073741824 Constant buffer size: 65536 Max number of constant args: 8 Local memory type: Global Local memory size: 32768 Profiling timer resolution: 427 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: Yes Queue properties: Out-of-Order: No Profiling : Yes Platform ID: 00C3D40C Name: Intel(R) Core(TM) i5 CPU M 450 @ 2.40GHz Vendor: GenuineIntel Driver version: 2.0 Profile: FULL_PROFILE Version: OpenCL 1.1 ATI-Stream-v2.2 (302) Extensions: cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_printf cl_khr_d3d10_sharing Device Type: CL_DEVICE_TYPE_GPU Device ID: 4098 Max compute units: 2 Max work items dimensions: 3 Max work items[0]: 128 Max work items[1]: 128 Max work items[2]: 128 Max work group size: 128 Preferred vector width char: 16 Preferred vector width short: 8 Preferred vector width int: 4 Preferred vector width long: 2 Preferred vector width float: 4 Preferred vector width double: 0 Max clock frequency: 720Mhz Address bits: 32 Max memory allocation: 134217728 Image support: No Max size of kernel argument: 1024 Alignment (bits) of base address: 32768 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: No Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: Yes Cache type: None Cache line size: 0 Cache size: 0 Global memory size: 268435456 Constant buffer size: 65536 Max number of constant args: 8 Local memory type: Global Local memory size: 16384 Profiling timer resolution: 1 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: No Queue properties: Out-of-Order: No Profiling : Yes Platform ID: 00C3D40C Name: ATI RV710 Vendor: Advanced Micro Devices, Inc. Driver version: CAL 1.4.838 Profile: FULL_PROFILE Version: OpenCL 1.0 ATI-Stream-v2.2 (302) Extensions: cl_khr_icd cl_khr_gl_sharing cl_amd_device_attribute_query cl_khr_d3d10_sharing
Passed!
Number of platforms: 2 Platform Profile: FULL_PROFILE Platform Version: OpenCL 1.0 CUDA 3.2.1 Platform Name: NVIDIA CUDA Platform Vendor: NVIDIA Corporation Platform Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll Platform Profile: FULL_PROFILE Platform Version: OpenCL 1.1 ATI-Stream-v2.2 (302) Platform Name: ATI Stream Platform Vendor: Advanced Micro Devices, Inc. Platform Extensions: cl_khr_icd cl_amd_event_callback
Platform Name: NVIDIA CUDA Number of devices: 2 Device Type: CL_DEVICE_TYPE_GPU Device ID: 4318 Max compute units: 4 Max work items dimensions: 3 Max work items[0]: 512 Max work items[1]: 512 Max work items[2]: 64 Max work group size: 512 Preferred vector width char: 1 Preferred vector width short: 1 Preferred vector width int: 1 Preferred vector width long: 1 Preferred vector width float: 1 Preferred vector width double: 0 Max clock frequency: 1350Mhz Address bits: 5347096844566560 Max memory allocation: 134217728 Image support: Yes Max number of images read arguments: 128 Max number of images write arguments: 8 Max image 2D width: 4096 Max image 2D height: 32768 Max image 3D width: 2048 Max image 3D height: 2048 Max image 3D depth: 2048 Max samplers within kernel: 16 Max size of kernel argument: 4352 Alignment (bits) of base address: 2048 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: No Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: Yes Cache type: None Cache line size: 0 Cache size: 0 Global memory size: 268107776 Constant buffer size: 65536 Max number of constant args: 9 Local memory type: Scratchpad Local memory size: 16384 Profiling timer resolution: 1000 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: No Queue properties: Out-of-Order: Yes Profiling : Yes Platform ID: 003E8750 Name: GeForce 8600 GT Vendor: NVIDIA Corporation Driver version: 260.99 Profile: FULL_PROFILE Version: OpenCL 1.0 CUDA Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics Device Type: CL_DEVICE_TYPE_GPU Device ID: 4318 Max compute units: 4 Max work items dimensions: 3 Max work items[0]: 512 Max work items[1]: 512 Max work items[2]: 64 Max work group size: 512 Preferred vector width char: 1 Preferred vector width short: 1 Preferred vector width int: 1 Preferred vector width long: 1 Preferred vector width float: 1 Preferred vector width double: 0 Max clock frequency: 1188Mhz Address bits: 5347096844566560 Max memory allocation: 134217728 Image support: Yes Max number of images read arguments: 128 Max number of images write arguments: 8 Max image 2D width: 4096 Max image 2D height: 32768 Max image 3D width: 2048 Max image 3D height: 2048 Max image 3D depth: 2048 Max samplers within kernel: 16 Max size of kernel argument: 4352 Alignment (bits) of base address: 2048 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: No Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: Yes Cache type: None Cache line size: 0 Cache size: 0 Global memory size: 268107776 Constant buffer size: 65536 Max number of constant args: 9 Local memory type: Scratchpad Local memory size: 16384 Profiling timer resolution: 1000 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: No Queue properties: Out-of-Order: Yes Profiling : Yes Platform ID: 003E8750 Name: GeForce 8600 GT Vendor: NVIDIA Corporation Driver version: 260.99 Profile: FULL_PROFILE Version: OpenCL 1.0 CUDA Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
Error : Bytes mismatch! Error : glSharing mismatch! Error : images mismatch! Error : printf mismatch! Error : deviceAttributeQuery mismatch! Failed! Platform Name: ATI Stream Number of devices: 1 Device Type: CL_DEVICE_TYPE_CPU Device ID: 4098 Max compute units: 2 Max work items dimensions: 3 Max work items[0]: 1024 Max work items[1]: 1024 Max work items[2]: 1024 Max work group size: 1024 Preferred vector width char: 16 Preferred vector width short: 8 Preferred vector width int: 4 Preferred vector width long: 2 Preferred vector width float: 4 Preferred vector width double: 0 Max clock frequency: 2211Mhz Address bits: 32 Max memory allocation: 536870912 Image support: No Max size of kernel argument: 4096 Alignment (bits) of base address: 1024 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: Yes Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: No Cache type: Read/Write Cache line size: 64 Cache size: 65536 Global memory size: 1073741824 Constant buffer size: 65536 Max number of constant args: 8 Local memory type: Global Local memory size: 32768 Profiling timer resolution: 279 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: Yes Queue properties: Out-of-Order: No Profiling : Yes Platform ID: 01DFD40C Name: AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ Vendor: AuthenticAMD Driver version: 2.0 Profile: FULL_PROFILE Version: OpenCL 1.1 ATI-Stream-v2.2 (302) Extensions: cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_printf