ox-arrays/src/Data/Array/Mixed/Internal/Arith, branch repro-9.14-branch

Nested, compositional struct-of-arrays orthotope arrays
https://git.tomsmeding.com/ox-arrays/atom/src/Data/Array/Mixed/Internal/Arith?h=repro-9.14-branch
Updated: 2025-03-20T12:01:24Z

Separate arith routines into a library
  Author:    Tom Smeding <tom@tomsmeding.com>
  Updated:   2025-03-20T12:01:24Z
  Published: 2025-03-20T12:01:24Z
  ID:        urn:sha1:55036a5ea4a6e590d0404638b2823c6a4aec3fba

  The point is that this separate library does not depend on orthotope.

arith: Don't FFI-import unused dotprod_*_strided ops
  Author:    Tom Smeding <tom@tomsmeding.com>
  Updated:   2025-03-18T20:32:49Z
  Published: 2025-03-18T20:32:49Z
  ID:        urn:sha1:27c2823387b21e8ed801e4d8eeb0b3e5588a2920

Optimise reductions and dotprod with more vectorisation
  Author:    Tom Smeding <tom@tomsmeding.com>
  Updated:   2025-03-14T20:58:51Z
  Published: 2025-03-14T20:57:56Z
  ID:        urn:sha1:6276ed3c7bcd20c8b860e1275386ecd068671bcc

  Turns out that if you don't supply -ffast-math, the C compiler will
  faithfully reproduce your linear reduction order, which is rather
  disastrous for parallelisation with vector units.

  This changes the summation order, so numerical results might differ
  slightly. To wit: the test suite needed adjustment.
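The reduction-order remark in the commit message above can be illustrated with a small C sketch. This is not the code from the commit; the function names `sum_linear` and `sum_4acc` and the choice of four accumulators are assumptions made purely for illustration. The first function has a single dependency chain, which a compiler must preserve for floating point unless reassociation is explicitly allowed. The second splits the work over independent accumulators, which gives the compiler room to use vector lanes, at the cost of a different summation order.

```c
#include <assert.h>
#include <stddef.h>

/* Strict left-to-right sum: every addition depends on the previous one.
   Floating-point addition is not associative, so without a flag such as
   -ffast-math the compiler has to keep exactly this order. */
double sum_linear(const double *xs, size_t n) {
    assert(xs != NULL || n == 0);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc += xs[i];
    }
    return acc;
}

/* Hypothetical variant with four independent accumulators (the count 4 is
   an arbitrary choice for this sketch). The four chains do not depend on
   each other, so they can be mapped onto vector lanes. */
double sum_4acc(const double *xs, size_t n) {
    assert(xs != NULL || n == 0);
    double a0 = 0.0, a1 = 0.0, a2 = 0.0, a3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += xs[i];
        a1 += xs[i + 1];
        a2 += xs[i + 2];
        a3 += xs[i + 3];
    }
    /* Combine the partial sums, then add the leftover tail elements. */
    double acc = (a0 + a1) + (a2 + a3);
    for (; i < n; i++) {
        acc += xs[i];
    }
    return acc;
}
```

For inputs whose partial sums are all exactly representable (small integers, for example) both functions return the same value. For general inputs the two results may differ in the last bits, which is the kind of discrepancy that would require adjusting a test suite that compares against a linear-order reference.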

Implement quot/rem
  Author:    Tom Smeding <tom@tomsmeding.com>
  Updated:   2025-03-13T08:27:51Z
  Published: 2025-03-12T22:20:13Z
  ID:        urn:sha1:ed6acbe5f409aba2fb222693da567ce04b7c4e01

Binary ops without normalisation
  Author:    Tom Smeding <tom@tomsmeding.com>
  Updated:   2025-03-12T21:25:35Z
  Published: 2025-03-05T23:08:40Z
  ID:        urn:sha1:766a925698a97cac03e972bdaa2500085be17c65

  Before:
  > sum(*) Double [1e6] stride 1; -1: OK
  > 68.9 ms ± 4.7 ms

  After:
  > sum(*) Double [1e6] stride 1; -1: OK
  > 1.44 ms ± 50 μs

arith: Unary float ops on strided arrays without normalisation
  Author:    Tom Smeding <tom@tomsmeding.com>
  Updated:   2025-03-05T21:09:50Z
  Published: 2025-03-05T21:09:50Z
  ID:        urn:sha1:984e5315768dd190a97069167daf970c17c3c867

arith: Only strided unary int ops
  Author:    Tom Smeding <tom@tomsmeding.com>
  Updated:   2025-02-16T22:50:07Z
  Published: 2025-02-16T22:49:56Z
  ID:        urn:sha1:71908c23307952fac26a4e24066e064d9cbb71c0

  This should have negligible overhead and will save a whole bunch of C
  code duplication when the FUnops are also converted to strided form.

arith: Unary int ops on strided arrays without normalisation
  Author:    Tom Smeding <tom@tomsmeding.com>
  Updated:   2025-02-15T23:30:25Z
  Published: 2025-02-15T23:30:25Z
  ID:        urn:sha1:c14017f4bc28951be7e298d01769b5b49384a7c3

Add {m,r,s}dot1Inner
  Author:    Tom Smeding <t.j.smeding@uu.nl>
  Updated:   2024-06-19T13:57:43Z
  Published: 2024-06-19T13:57:43Z
  ID:        urn:sha1:aafe5f6b5fa772d0e2e9f9b4f91bc3e7cf696840

Clean up Foreign.hs
  Author:    Tom Smeding <tom@tomsmeding.com>
  Updated:   2024-06-18T19:55:35Z
  Published: 2024-06-18T19:55:35Z
  ID:        urn:sha1:97ab8502b9cd3f7d908160d13c7d85d23c99e203