Good news – I’ve finally made some progress with performance improvements for v8. I’m not yet where I want to be (i.e., where I think the true “speed of light” for the KNLs should be), but as of last night I have a newly vectorized version that is already running around 30% faster than the previous one. There’s more potential for optimization in that version – it’s a complete rewrite, so a lot is still in flux – but 30ish% is already enough to at least share this version. I’ll probably need tonight – and maybe tomorrow – to do some more testing, burn-in, packaging, etc., but “something” should be coming up soon.
BTW: Just to explain where the entire v8 performance issue came from: In the past, the cryptonight family stressed only memory performance, with relatively little “compute” thrown in … yes, there was the AES encryption step and the 64-bit multiply, but both are hardware-supported on CPUs and KNL, so the “true” cost was exclusively in the memory system. With v8, this has changed – there’s now some pretty nasty 64-bit division and double-precision floating point multiply (plus a lot of additional gunk) in the inner loop, and these are pretty compute intensive. To get these pieces fast I had to completely change the vectorization pattern in the inner core, and doing that is a pain if ever there was one: get a single bit wrong and you get a wrong result, and since all intermediate values are completely meaningless semi-random bit patterns, it’s near impossible to debug in any reasonable way … all you can do is write gigabytes of log files of every operation performed, and compare them bit by bit.
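For anyone curious what that kind of bit-by-bit log comparison can look like, here is a minimal Python sketch. It is not the miner's actual tooling: the trace format (one `<label> <hex value>` record per line, no blank lines) and the function names `parse_record` and `first_divergence` are assumptions made up for illustration.

```python
from itertools import zip_longest


def parse_record(line):
    """Split a trace line of the assumed form '<label> <hex value>'."""
    label, value_hex = line.split()
    return label, int(value_hex, 16)


def first_divergence(reference_lines, candidate_lines):
    """Return details of the first mismatching record, or None if the traces agree."""
    pairs = zip_longest(reference_lines, candidate_lines)
    for index, (ref_line, cand_line) in enumerate(pairs):
        if ref_line is None or cand_line is None:
            # One trace ended before the other.
            return {"index": index, "reason": "length mismatch"}
        ref_label, ref_value = parse_record(ref_line)
        cand_label, cand_value = parse_record(cand_line)
        if ref_label != cand_label:
            # The two runs are no longer logging the same operation.
            return {"index": index, "reason": "label mismatch"}
        if ref_value == cand_value:
            continue
        # XOR leaves a 1 in every bit position where the two values differ.
        diff = ref_value ^ cand_value
        bits = [b for b in range(diff.bit_length()) if (diff >> b) & 1]
        return {
            "index": index,
            "reason": "value mismatch",
            "label": ref_label,
            "bits": bits,  # bit 0 is the least significant bit
        }
    return None
```

Stopping at the first divergence is deliberate: with semi-random intermediate values, everything after the first wrong bit is usually wrong too, so only the earliest mismatch says anything useful. Passing open file objects instead of lists would keep memory use flat for very large logs.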
Anyway – that restructured code is now working. There’s lots of opportunity for low-level optimizations, and probably even a reasonable way of porting all that to KNC, too (which needs the same kind of vectorization) … but at least in the short term I had to change a lot, probably broke a lot (including the regular CPU version :-/), etc. It’ll take a day to clean up and release a first version, but from then on we’re back on an upward ramp. Happy!
With that – happy mining!
PS: Just to give you guys an idea of some of the things I had to deal with on this rewrite: The newly vectorized code needs to do some 64-bit integer divisions, and though KNL can do that in AVX512F, the respective intrinsic (_mm512_div_epu64) is not supported in either clang or gcc (not even in the latest top-of-tree, let alone a released version); and though the Intel compiler does support this operation, you need the very latest Intel compiler to even run on Ubuntu 18; and …
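To make concrete what such a vector divide is supposed to compute, here is a small scalar reference model in Python: eight unsigned 64-bit lanes (512 bits / 64 bits per lane), each divided independently. This is only a sketch for checking results against, not the miner's code and not a replacement for the intrinsic; the name `div_epu64_reference` and the choice to raise an error on a zero divisor are assumptions made for illustration.

```python
LANES = 8                 # 512-bit register / 64 bits per lane
MASK64 = (1 << 64) - 1    # keeps a Python int within unsigned 64-bit range


def div_epu64_reference(dividends, divisors):
    """Lane-wise unsigned 64-bit quotient, one scalar division per lane."""
    if len(dividends) != LANES or len(divisors) != LANES:
        raise ValueError("expected exactly 8 lanes per operand")
    quotients = []
    for a, b in zip(dividends, divisors):
        a &= MASK64
        b &= MASK64
        if b == 0:
            # Assumption: treat a zero divisor as a caller error.
            raise ZeroDivisionError("divisor lane is zero")
        quotients.append(a // b)  # truncating unsigned division
    return quotients
```

A model like this is slow, but it is easy to trust, which is what matters when the vectorized output has to match it exactly.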
8 thoughts on “Upcoming v.14 w/ KNL v8 Performance Improvements”
Thanks for the hard work man!
Wow, 30% is a big jump. Kudos. And the fact you can even ramp up a little more is amazing. Good luck man. As always you keep coming through in the clutch.
Will the new version speed up other algos too, or just xmrv8?
Could you change the devshare pool, please? My connection to xmr-usa.dwarfpool.com:9100 always fails (from CN).
[11:53:17] waiting for successful login on devshare account…
[11:53:17] (if miner seems to get stuck here, please make sure
[11:53:17] that your firewall/network config allows the miner
[11:53:17] access to the devShare pool(s)/port(s) specified above)
[11:53:18] waiting for successful login on devshare account…
[11:53:18] (if miner seems to get stuck here, please make sure
[11:53:18] that your firewall/network config allows the miner
[11:53:18] access to the devShare pool(s)/port(s) specified above)
[11:53:19] waiting for successful login on devshare account…
[11:53:19] (if miner seems to get stuck here, please make sure
[11:53:19] that your firewall/network config allows the miner
[11:53:19] access to the devShare pool(s)/port(s) specified above)
[11:53:20] waiting for successful login on devshare account…
[11:53:20] (if miner seems to get stuck here, please make sure
[11:53:20] that your firewall/network config allows the miner
[11:53:20] access to the devShare pool(s)/port(s) specified above)
Done. Seems dwarfpool is having some issues. Changed to cryptoknight.cc for now. New binary just uploaded (still .14.0)
Thanks a lot. Why not use supportxmr or nanopool?