path: root/cbits
Commit log (message, author, age):
* Separate arith routines into a library (Tom Smeding, 11 days ago)
  The point is that this separate library does not depend on orthotope.
* arith stats: Print timings with 3 digits precision (Tom Smeding, 13 days ago)
  If you render microsecond timings as milliseconds, you have only 3 digits
  behind the decimal point.
* arith stats: Improve output format (Tom Smeding, 13 days ago)
  This makes the output nicer to process using Unix tools. Try:
    $ sed -n '/ox-arrays-arith-stats start/,/ox-arrays-arith-stats end/ !d; /===/ !p' | sort -n -k4,4 -k6,6
* Arith statistics collection from C (Tom Smeding, 13 days ago)
* Optimise reductions and dotprod with more vectorisation (Tom Smeding, 2025-03-14)
  Turns out that if you don't supply -ffast-math, the C compiler will
  faithfully reproduce your linear reduction order, which is rather
  disastrous for parallelisation with vector units. This changes the
  summation order, so numerical results might differ slightly; to wit,
  the test suite needed adjustment.
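The reordering this commit describes can be illustrated with a minimal sketch (function name and unroll factor are illustrative, not the actual cbits code): splitting a strictly linear sum into several independent partial accumulators breaks the single dependency chain, which is what lets the compiler use vector lanes even without -ffast-math.

```c
#include <stddef.h>

/* Illustrative sketch: a naive loop fixes the order a[0] + a[1] + ...,
 * which the compiler must preserve without -ffast-math.  Four
 * independent partial sums change the summation order and open the
 * loop up to vectorisation (results may differ slightly in the last
 * bits, as the commit message notes). */
double sum_partial(const double *a, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];      /* four independent dependency chains */
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    double s = (s0 + s1) + (s2 + s3);
    for (; i < n; i++)   /* scalar tail for the remaining elements */
        s += a[i];
    return s;
}
```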
* arith: Remove CASE1, add restrict (Tom Smeding, 2025-03-14)
  Turns out that GCC already generates separate code for an inner stride
  of 1 automatically, so there is no need to do fancy stuff in C. Also,
  GCC generated a whole bunch of superfluous code to correctly handle the
  case where output and input arrays overlap; since this never happens in
  our case, let's add `restrict` and save some binary size.
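The effect of `restrict` mentioned above can be sketched as follows (the function name and signature are hypothetical, not the library's actual interface): qualifying the pointers promises the compiler that output and input never overlap, so it can drop the defensive code paths for aliasing.

```c
#include <stddef.h>

/* Hypothetical sketch: with restrict, the compiler may assume `out`
 * and `in` do not alias, so it need not emit the extra overlap-safe
 * variants the commit message describes as superfluous binary size. */
void unary_negate(double *restrict out, const double *restrict in,
                  size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = -in[i];
}
```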
* Add atan2 (Tom Smeding, 2025-03-13)
* arith: Fix enum typing typos (Tom Smeding, 2025-03-13)
* Implement quot/rem (Tom Smeding, 2025-03-13)
* Binary ops without normalisation (Tom Smeding, 2025-03-12)
  Before:
    > sum(*) Double [1e6] stride 1; -1: OK
    > 68.9 ms ± 4.7 ms
  After:
    > sum(*) Double [1e6] stride 1; -1: OK
    > 1.44 ms ± 50 μs
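A hedged sketch of what "without normalisation" means here (names and signature are illustrative, not the actual cbits code): rather than first copying each strided operand into contiguous form, the binary op walks the strides directly, which avoids the copy that dominated the "before" timing.

```c
#include <stddef.h>

/* Illustrative sketch: elementwise product applied directly to
 * strided inputs, skipping a normalising copy to contiguous memory.
 * stride_a and stride_b are element strides into a and b. */
void mul_strided(double *restrict out,
                 const double *restrict a, size_t stride_a,
                 const double *restrict b, size_t stride_b,
                 size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = a[i * stride_a] * b[i * stride_b];
}
```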
* C: Simplify DOTPROD_STRIDED_OP signature (Tom Smeding, 2025-03-05)
* arith: Unary float ops on strided arrays without normalisation (Tom Smeding, 2025-03-05)
* arith: Only strided unary int ops (Tom Smeding, 2025-02-16)
  This should have negligible overhead and will save a whole bunch of C
  code duplication when the FUnops are also converted to strided form.
* arith: Unary int ops on strided arrays without normalisation (Tom Smeding, 2025-02-16)
* Add {m,r,s}dot1Inner (Tom Smeding, 2024-06-19)
* More sensible argument order to reduce1 C op (Tom Smeding, 2024-06-18)
* C cleanup: abstract strides[rank-1] case into macro (Tom Smeding, 2024-06-18)
* sumAllPrim (Tom Smeding, 2024-06-17)
* Only use Intel SIMD on Intel platforms (Tom Smeding, 2024-06-12)
* Fix SIMD code to allow for unaligned arrays (Tom Smeding, 2024-06-11)
* Manual vectorisation of dot product for floating points (Tom Smeding, 2024-06-10)
* Dot product (Tom Smeding, 2024-06-10)
* Rename arg{min,max} to {min,max}Index (Tom Smeding, 2024-06-10)
* argmin and argmax (Tom Smeding, 2024-06-09)
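The argmin operation added here can be sketched minimally (the function name is illustrative; the commit above renames these to minIndex/maxIndex): return the index of the first minimum element.

```c
#include <stddef.h>

/* Illustrative sketch of argmin for doubles: index of the first
 * occurrence of the smallest element.  Assumes n >= 1. */
size_t argmin_d(const double *a, size_t n) {
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (a[i] < a[best])
            best = i;
    return best;
}
```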
* Fast (C) Floating ops (Tom Smeding, 2024-05-27)
* Fast Fractional ops via C code (Tom Smeding, 2024-05-26)
* Refactor C interface to pass operation as enum (Tom Smeding, 2024-05-26)
  This is hmatrix style: less proliferation of functions as the number of
  ops increases.
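The enum-based interface described here (in the style of hmatrix) can be sketched like so — the enum values and names are illustrative, not the library's actual ones. One exported entry point dispatches on an operation tag, so adding an op extends the enum instead of the set of exported symbols.

```c
#include <stddef.h>

/* Hypothetical sketch: a single C entry point takes an operation tag,
 * instead of exporting one function per elementwise operation. */
typedef enum { OP_ADD, OP_SUB, OP_MUL } binop_t;

void binary_op(binop_t op, double *restrict out,
               const double *restrict a, const double *restrict b,
               size_t n) {
    switch (op) {
        case OP_ADD:
            for (size_t i = 0; i < n; i++) out[i] = a[i] + b[i];
            break;
        case OP_SUB:
            for (size_t i = 0; i < n; i++) out[i] = a[i] - b[i];
            break;
        case OP_MUL:
            for (size_t i = 0; i < n; i++) out[i] = a[i] * b[i];
            break;
    }
}
```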
* Add more const in C arith ops (Tom Smeding, 2024-05-24)
* Better naming in C code (Tom Smeding, 2024-05-23)
* Fast sum (Tom Smeding, 2024-05-23)
  Also fast product, but that's currently unused.
* Fast numeric operations for Num (Tom Smeding, 2024-05-23)