Currently, ILGPU already supports ARM64 Linux if you have a CUDA or OpenCL device, so supporting Apple Silicon is not out of the question. It has been floated that we could support Metal via a Vulkan backend and MoltenVK, but not much work has been done on that front.
Currently, the main focus for the next big accelerator type is SIMD support for a fast CPU accelerator.
Note that Apple SoCs also have a matrix unit, called AMX, which is designed specifically for matmuls. You could expand the CPU's math throughput quite dramatically by taking advantage of it. I don't think there's any official public documentation on it, but here's a GitHub repo that documents it independently: https://github.com/corsix/amx
Keep in mind that Apple's AMX is pretty much undocumented and not guaranteed to be stable, so the effort invested in integrating reverse-engineered work might not be the best allocation of resources. Particularly now that the M4 supports ARM SME, which is the "official" extension (though not yet offered via hardware intrinsics in .NET, since pretty much no hardware on the market supports it as of now; the closest one, SVE2, is coming in .NET 9).
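For context, here's roughly what the ARM hardware-intrinsics surface looks like from .NET today. This is just a probe sketch; the exact set of supported classes depends on runtime version and hardware, and the scalable-vector APIs are newer and experimental:

    using System;
    using System.Runtime.Intrinsics.Arm;

    // Probe which ARM intrinsic families the current runtime/CPU exposes.
    // AdvSimd (NEON) is the baseline on Apple Silicon; the scalable-vector
    // extensions only light up on newer runtimes and supporting hardware.
    Console.WriteLine($"ArmBase: {ArmBase.IsSupported}");
    Console.WriteLine($"AdvSimd: {AdvSimd.IsSupported}");
    Console.WriteLine($"Dp:      {Dp.IsSupported}"); // dot-product extension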
I did look at back-end implementations once, and contributing a Metal back-end or adapting the Vulkan back-end to run on top of MoltenVK seemed much more realistic.
With that said, OpenCL already works as-is, so you are not completely unsupported.
No disagreement on allocation. I just don't see why anyone would use something like this in the CUDA/CPU ecosystem where lots of established alternatives exist that do not require one to use C# or F#, neither of which is common in scientific computing.
OTOH the field is almost bare on the Apple side, in spite of there being over a billion devices out there with relatively low hardware fragmentation and an almost uniform ISA throughout the entire product lineup. Hence the suggestion.
C# and F# are very niche, that is true. I think the basics of the languages are way nicer and saner than Python's and C++'s.
You can just give it a try and see if you like it or not.
In general, I think you are right: Python has completely won for high-level libraries while C++ has completely won for implementation, and the threshold for making people move is way too high. It's difficult to match a 5-10x improvement over Python in experience, and the C++ crowd would never even look at C#, let alone think it has something to offer them, because the mythology says that C/C++ is the only true way for this kind of code and C# is just weird Java (especially now with ggml being on the radar of many).
(Fun thought experiment: imagine the average reaction to the statement "you can write high-level code that compiles to Metal Performance Shaders or targets Apple AMX, but it's C#" - not dissimilar to the reaction when people hear that C# is the prime choice for portable SIMD code.)
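For the skeptical, a minimal sketch of what portable SIMD in C# looks like with System.Numerics.Vector<T>; the method and shapes here are made up for illustration, but the same code JITs to AVX2 on x86 and NEON on Apple Silicon:

    using System;
    using System.Numerics;

    static void Add(ReadOnlySpan<float> a, ReadOnlySpan<float> b, Span<float> dst)
    {
        int i = 0;
        int width = Vector<float>.Count; // e.g. 8 floats with AVX2, 4 with NEON

        // Vectorized main loop: the JIT lowers this to the widest SIMD
        // instruction set available on the host CPU.
        for (; i <= a.Length - width; i += width)
        {
            var va = new Vector<float>(a.Slice(i));
            var vb = new Vector<float>(b.Slice(i));
            (va + vb).CopyTo(dst.Slice(i));
        }

        // Scalar tail for whatever doesn't fill a full vector.
        for (; i < a.Length; i++)
            dst[i] = a[i] + b[i];
    }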
We have thought about supporting the tensor cores in CUDA devices as well; we could probably reuse the abstractions we'd need for that for AMX support. Unfortunately, we mainly focus on CUDA because in most cases people are using CUDA for GPU compute purposes.
CUDA is a bit of well-trodden ground; you aren't going to do much better there (if at all) than cuBLAS and cuDNN. But I get what you're saying, gotta pick one's battles.
My understanding is that it's less about competing with cuBLAS and cuDNN directly and more about offering the features they expose in a better, more idiomatic way - there's a reason it's less fun and more tedious to write C++ AMP code.
I use ILGPU all the time! You would be amazed at the performance you can get out of C# if you are careful about how you write your code. I recently wrote a super fast Gaussian splat renderer in C# and used ILGPU to sort the splats on the GPU. You can even do weird CUDA stuff like passing buffers between OpenGL and CUDA, all in C#! https://github.com/NullandKale/OpenTKSplat/blob/main/OpenTKS...
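For anyone who hasn't seen ILGPU, here's roughly what a minimal kernel looks like. This is a generic sketch (not the splat sorter), with made-up names for the kernel and buffer sizes, using the standard ILGPU entry points:

    using ILGPU;
    using ILGPU.Runtime;

    class Demo
    {
        // A kernel is just a static method; ILGPU JIT-compiles it for the
        // selected accelerator (CUDA, OpenCL, or the CPU fallback).
        static void Scale(Index1D i, ArrayView<float> data, float factor) =>
            data[i] *= factor;

        static void Main()
        {
            using var context = Context.CreateDefault();
            using var accelerator = context
                .GetPreferredDevice(preferCPU: false)
                .CreateAccelerator(context);

            var kernel = accelerator
                .LoadAutoGroupedStreamKernel<Index1D, ArrayView<float>, float>(Scale);

            using var buffer = accelerator.Allocate1D<float>(1 << 20);
            kernel((int)buffer.Length, buffer.View, 2.0f);

            accelerator.Synchronize();
            float[] result = buffer.GetAsArray1D();
        }
    }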
I use ILGPU all the time for prototyping different image processing and rendering things for work and for my personal projects. I use C++ for a few things where the extra efficiency is worth it, but for most tasks C# is plenty fast enough.