gcc cortex-a9 double형 neon 연산 가속

프로그램 사용/gcc2023. 8. 8. 11:17

gcc cortex-a9 double형 neon 연산 가속

문득 cpu 사양 다시 볼까? 싶어서 보니

어? NEON이 아니라 NEON MPE?

NEON™ media-processing engine
Single and double precision Vector Floating Point Unit (VFPU)

[링크 : https://docs.xilinx.com/v/u/en-US/ds190-Zynq-7000-Overview]

그래서 cortex-A9 NEON MPE 명령을 뒤져보는데

VADD나 VSUB VMUL VDIV에 대해서 찾아보니 NEON으로는 float까지만 되도, double은 VFP를 통해서 가능할 것 같은데

D
Double precision floating-point values

F
Single precision floating-point values

H
Half precision floating-point values

I
Integer values

P
Polynomials with single-bit coefficients

X
Operation is independent of data representation.

Name Advanced SIMD VFP Description
VADD I, F F, D Add
VDIV - F, D Divide
VMUL I, F, P F, D Multiply
VSUB I, F F, D Subtract

[링크 : https://developer.arm.com/documentation/ddi0409/i/instruction-timing/cortex-a9-neon-mpe-instructions?lang=en]

타입을 바꾸어 봐도 안되서 골머리를 싸매다가(float는 된다매!!! double은 vfp로 된다매!!!)

main.c:187:2: missed: couldn't vectorize loop
main.c:177:6: missed: not vectorized: unsupported data-type double

main.c:187:2: missed: couldn't vectorize loop
main.c:177:6: missed: not vectorized: unsupported data-type float

금단의 플래그를 설정하니 잘 된다. -_-

main.c:194:2: optimized: loop vectorized using 16 byte vectors
main.c:188:2: optimized: loop vectorized using 16 byte vectors

IEEE를 무시하고 안전하지 않은 연산도 적용되고 하다보니 영 쓰기가 불안한데...

In addition GCC offers the -ffast-math flag which is a shortcut for several options, presenting the least conforming but fastest math mode. It enables -fno-trapping-math, -funsafe-math-optimizations, -ffinite-math-only, -fno-errno-math, -fno-signaling-nans, -fno-rounding-math, -fcx-limited-range and -fno-signed-zeros. Each of these flags violates IEEE in a different way. -ffast-math also may disable some features of the hardware IEEE implementation such as the support for denormals or flush-to-zero behavior. An example for such a case is x86_64 with it's use of SSE and SSE2 units for floating point math.

[링크 : https://gcc.gnu.org/wiki/FloatingPointMath]

아무튼 어제 어디서 보다 찾았던 associative 옵션을 못찾아서 헤매다가 다시 생각나서 보는데

associative하지 않다.. 이게 무슨 의미지?

Goldberg 논문에 나온 것 처럼 floating-point의 계산은 associative하지 않다.
그러므로 ffast-math 연산 방식에서는 실제 값에 오류를 포함할 수 밖에 없다.
이러한 점 때문에 ffast-math 방식은 IEEE에서 정의한 방식을 따르지 못한다.

위와 같은 특징 때문에, 정확한 값을 계산해야하는 것이라면 ffast-math를 사용하면 안된다.
하지만 대충 어림잡아서 맞는 값을 원하는 것이라면?

[링크 : https://www.cv-learn.com/20210107-gcc-ffast-math/]

float 형의 오차로 인해서 계산때 마다 동일 결과가 나오지 않는다는 의미군..

결합의((a × b) × c = a × (b × c)의 예에서처럼 계산식이 부분의 순서와 상관없이 동일한 결과가 나오는)

[링크 : https://en.dict.naver.com/#/entry/enko/43a6bbaaacf546199c5d4c57b6b88ebb]

그래서 한번 -ffast-math 대신 적용해보려는데 다른 상위 옵션에 의해서 무시 당했다고 나온다.

누가 상위 옵션이려나?

-o -W -Wall -fopt-info-vec -march=armv7-a -mfpu=neon -O3 -fassociative-math

cc1: warning: ‘-fassociative-math’ disabled; other options take precedence

-ffast-math 보단 순한 맛이긴 한데 적용이 안되면 의미 없지 머..

-fassociative-math
Allow re-association of operands in series of floating-point operations. This violates the ISO C and C++ language standard by possibly changing computation result. NOTE: re-ordering may change the sign of zero as well as ignore NaNs and inhibit or create underflow or overflow (and thus cannot be used on code that relies on rounding behavior like (x + 2**52) - 2**52. May also reorder floating-point comparisons and thus may not be used when ordered comparisons are required. This option requires that both -fno-signed-zeros and -fno-trapping-math be in effect. Moreover, it doesn’t make much sense with -frounding-math. For Fortran the option is automatically enabled when both -fno-signed-zeros and -fno-trapping-math are in effect.

The default is -fno-associative-math.

[링크 : https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html]

저작자표시 (새창열림)

'프로그램 사용 > gcc' 카테고리의 다른 글

gcc tree vectorize (0)	2023.01.26
gcc fstack-protector-strong (0)	2022.12.06
gcc vectorization 실패 (0)	2022.06.02
gcc / 문자열 선언 (0)	2022.03.17
static link (0)	2022.02.07

Posted by 구차니

구차니의 잡동사니 모음

gcc cortex-a9 double형 neon 연산 가속

'프로그램 사용 > gcc' 카테고리의 다른 글

카테고리

공지사항

태그목록

최근에 올라온 글

최근에 달린 댓글

티스토리툴바