Commit message | Author | Age
---|---|---
* Dotprod: Optimise reversed and replicated dimensions (HEAD, master) | Tom Smeding | 7 days
* Separate arith routines into a library | Tom Smeding | 12 days
    The point is that this separate library does not depend on orthotope.
* arith stats: Print timings with 3 digits precision | Tom Smeding | 14 days
    If you render microsecond timings as milliseconds, you _have_ only 3 digits after the decimal point.
* arith stats: Improve output format | Tom Smeding | 14 days
    This makes it nicer to process using unix tools. Try:

        $ sed -n '/ox-arrays-arith-stats start/,/ox-arrays-arith-stats end/ !d; /===/ !p' | sort -n -k4,4 -k6,6
* Arith statistics collection from C | Tom Smeding | 14 days
* Optimise reductions and dotprod with more vectorisation | Tom Smeding | 2025-03-14
    Turns out that if you don't supply -ffast-math, the C compiler will faithfully reproduce your linear reduction order, which is rather disastrous for parallelisation with vector units. This changes the summation order, so numerical results might differ slightly. To wit: the test suite needed adjustment.
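The following is only an illustrative C sketch, not this repository's actual code: keeping several independent partial sums gives the compiler a reduction order it can map onto vector lanes even without -ffast-math, at the price of a summation order that differs from the naive loop. The `sum_d` name and the unroll factor are made up here.

```c
#include <stddef.h>

// Hypothetical illustration: four independent accumulators let the
// compiler vectorise the reduction without -ffast-math, because the
// source itself no longer demands a single serial chain of additions.
static double sum_d(const double *restrict xs, size_t n) {
    double acc0 = 0.0, acc1 = 0.0, acc2 = 0.0, acc3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 += xs[i + 0];
        acc1 += xs[i + 1];
        acc2 += xs[i + 2];
        acc3 += xs[i + 3];
    }
    double acc = (acc0 + acc1) + (acc2 + acc3);
    for (; i < n; i++) acc += xs[i];   // scalar tail
    return acc;
}
```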
* arith: Remove CASE1, add restrict | Tom Smeding | 2025-03-14
    Turns out that GCC already generates separate code for an inner stride of 1 automatically, so there is no need to do fancy stuff in C. Also, GCC generated a whole bunch of superfluous code to correctly handle the case where output and input arrays overlap; since this never happens in our case, let's add `restrict` and save some binary size.
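Again a hypothetical sketch rather than the library's real signature: `restrict` promises the compiler that the output and input never overlap, so it can drop the overlap-safe code path, and it can version the loop for the stride-1 case on its own.

```c
#include <stddef.h>
#include <math.h>

// Hypothetical strided unary op: 'restrict' tells GCC that 'out' and
// 'in' do not alias, so no overlap-safe variant is needed; the compiler
// can also specialise the common stride == 1 case by itself.
static void unary_exp_d(double *restrict out, const double *restrict in,
                        size_t n, ptrdiff_t stride) {
    for (size_t i = 0; i < n; i++)
        out[i] = exp(in[(ptrdiff_t)i * stride]);
}
```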
* Add atan2 | Tom Smeding | 2025-03-13
* arith: Fix enum typing typos | Tom Smeding | 2025-03-13
* Implement quot/rem | Tom Smeding | 2025-03-13
* Binary ops without normalisation | Tom Smeding | 2025-03-12
    Before: sum(*) Double [1e6] stride 1; -1: OK  68.9 ms ± 4.7 ms
    After:  sum(*) Double [1e6] stride 1; -1: OK  1.44 ms ± 50 μs
* C: Simplify DOTPROD_STRIDED_OP signature | Tom Smeding | 2025-03-05
* arith: Unary float ops on strided arrays without normalisation | Tom Smeding | 2025-03-05
* arith: Only strided unary int ops | Tom Smeding | 2025-02-16
    This should have negligible overhead and will save a whole bunch of C code duplication when the FUnops are also converted to strided form.
* arith: Unary int ops on strided arrays without normalisation | Tom Smeding | 2025-02-16
* Add {m,r,s}dot1Inner | Tom Smeding | 2024-06-19
* More sensible argument order to reduce1 C op | Tom Smeding | 2024-06-18
* C cleanup: abstract strides[rank-1] case into macro | Tom Smeding | 2024-06-18
* sumAllPrim | Tom Smeding | 2024-06-17
* Only use Intel SIMD on Intel platforms | Tom Smeding | 2024-06-12
* Fix SIMD code to allow for unaligned arrays | Tom Smeding | 2024-06-11
* Manual vectorisation of dot product for floating points | Tom Smeding | 2024-06-10
* Dot product | Tom Smeding | 2024-06-10
* Rename arg{min,max} to {min,max}Index | Tom Smeding | 2024-06-10
* argmin and argmax | Tom Smeding | 2024-06-09
* Fast (C) Floating ops | Tom Smeding | 2024-05-27
* Fast Fractional ops via C code | Tom Smeding | 2024-05-26
* Refactor C interface to pass operation as enum | Tom Smeding | 2024-05-26
    This is hmatrix style: less proliferation of functions as the number of ops increases.
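A minimal, made-up sketch of the pattern (the enum and function names below are not the repository's actual identifiers): one exported entry point per element type takes the operation as an enum tag, so adding an op extends the enum and the switch instead of adding yet another foreign function.

```c
#include <stddef.h>

// Hypothetical enum-dispatched interface in the hmatrix style:
// a single entry point per element type, with the operation as a tag.
typedef enum { OP_ADD, OP_SUB, OP_MUL } binop_t;

static void binop_d(binop_t op, double *restrict out,
                    const double *a, const double *b, size_t n) {
    switch (op) {
        case OP_ADD: for (size_t i = 0; i < n; i++) out[i] = a[i] + b[i]; break;
        case OP_SUB: for (size_t i = 0; i < n; i++) out[i] = a[i] - b[i]; break;
        case OP_MUL: for (size_t i = 0; i < n; i++) out[i] = a[i] * b[i]; break;
    }
}
```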
* Add more const in C arith ops | Tom Smeding | 2024-05-24
* Better naming in C code | Tom Smeding | 2024-05-23
* Fast sum | Tom Smeding | 2024-05-23
    Also fast product, but that's currently unused.
* Fast numeric operations for Num | Tom Smeding | 2024-05-23