Good news – I’ve finally made some progress on performance improvements for v8. I’m not yet where I want to be (i.e., where I think the true “speed of light” for the KNLs should be), but as of last night I finally have a newly vectorized version that already runs around 30% faster than the previous one. There’s more room for optimization in that version – it’s a complete rewrite, so a lot is still in flux – but 30ish% is already enough to at least share it. I’ll probably need tonight – and maybe tomorrow – for more testing, burn-in, packaging, etc., but “something” should be coming soon.
BTW, just to explain where the entire v8 performance issue came from: in the past, the cryptonight family stressed only memory performance, with relatively little “compute” thrown in … yes, there was the AES encoding step and the 64-bit multiply, but both are hardware-supported on CPUs and KNL, so the “true” cost was exclusively in the memory system. With v8 this has changed: the inner loop now contains some pretty nasty 64-bit division and double-precision floating-point multiplies (plus a lot of additional gunk), and these are quite compute-intensive. To make these pieces fast I had to completely change the vectorization pattern in the inner core, and doing that is a pain if ever there was one: get a single bit wrong and you get a wrong result, and since all intermediate numbers are completely meaningless, semi-random bit patterns, it’s near impossible to debug in any reasonable way … all you can do is write gigabytes of log files of every operation performed and compare them bit by bit.
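To make the new compute cost a bit more concrete, here is a minimal scalar C sketch of the kind of 64-bit-by-32-bit division step v8 adds to the inner loop. It is based on the public Monero variant-2 reference code as I understand it, not on this miner’s implementation. The function and parameter names are hypothetical, and the constants should be checked against the reference before relying on them. A vectorized version has to reproduce exactly this result, bit for bit, in every lane.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL sketch of one v8-style division step; names are illustrative.
   Constants follow a reading of the public Monero variant-2 reference code. */
static uint64_t v8_division_step(uint64_t dividend, uint64_t divisor_seed,
                                 uint64_t sqrt_result)
{
    /* Derive a 32-bit divisor; OR-ing in 0x80000001 makes it odd and >= 2^31,
       so it can never be zero. */
    uint32_t divisor =
        (uint32_t)((divisor_seed + (uint32_t)(sqrt_result << 1)) | 0x80000001UL);
    assert(divisor >= 0x80000001u && (divisor & 1u));

    uint64_t quotient  = dividend / divisor; /* can need up to 33 bits */
    uint64_t remainder = dividend % divisor; /* always < divisor < 2^32 */

    /* Pack: low 32 bits = truncated quotient, high 32 bits = remainder. */
    return (uint64_t)(uint32_t)quotient + (remainder << 32);
}
```

Because the divisor is at least 2^31, a full 64-bit dividend can still yield a 33-bit quotient, which is why the sketch truncates the quotient to 32 bits before packing it next to the remainder.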
Anyway – that restructured code is now working. There’s lots of opportunity for low-level optimizations, and probably even a reasonable way of porting all of it to KNC, too (which needs the same kind of vectorization) … but at least in the short term I had to change a lot and probably broke a lot (including the regular CPU version :-/). It’ll take a day to clean up and release a first version, but from then on we’re back on an upward ramp. Happy!
With that – happy mining!
PS: Just to give you guys an idea of some of the things I had to deal with in this rewrite: the newly vectorized code needs to do some 64-bit integer divisions, and though KNL’s AVX512 intrinsics include one for that (_mm512_div_epu64), it is not supported in either clang or gcc (not even in the latest top-of-tree builds, let alone released versions); and though the Intel compiler does support it, you need the very latest Intel compiler to even run on Ubuntu 18; and …..
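When neither clang/gcc support nor a new enough Intel compiler is available, one generic workaround is to fall back to lane-by-lane scalar division. Below is a minimal sketch, assuming a plain struct of eight uint64_t lanes stands in for a 512-bit register; the type and function names are hypothetical and not part of any compiler’s API. It is much slower than a real vectorized sequence, but it gives a bit-exact reference that a vectorized path can be compared against, the same kind of bit-by-bit check described above.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8 /* a 512-bit register holds 8 x 64-bit lanes */

/* HYPOTHETICAL stand-in for an __m512i holding unsigned 64-bit lanes. */
typedef struct { uint64_t lane[LANES]; } u64x8;

/* Element-wise unsigned 64-bit division, the semantics _mm512_div_epu64
   provides, done one lane at a time in plain C. */
static u64x8 div_epu64_fallback(u64x8 a, u64x8 b)
{
    u64x8 r;
    for (int i = 0; i < LANES; ++i) {
        assert(b.lane[i] != 0); /* integer division by zero is undefined in C */
        r.lane[i] = a.lane[i] / b.lane[i];
    }
    return r;
}
```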
Congratulations Luk!
Thanks for the hard work man!
Wow, 30% is a big jump. Kudos. And the fact you can even ramp up a little more is amazing. Good luck man. As always you keep coming through in the clutch.
Will the new version speed up other algos too, or just XMR v8?
Could you change the devshare pool, please? Connecting to xmr-usa.dwarfpool.com:9100 always fails for me (from CN).
[11:53:17] waiting for successful login on devshare account…
[11:53:17] (if miner seems to get stuck here, please make sure
[11:53:17] that your firewall/network config allows the miner
[11:53:17] access to the devShare pool(s)/port(s) specified above)
[11:53:18] waiting for successful login on devshare account…
[11:53:18] (if miner seems to get stuck here, please make sure
[11:53:18] that your firewall/network config allows the miner
[11:53:18] access to the devShare pool(s)/port(s) specified above)
[11:53:19] waiting for successful login on devshare account…
[11:53:19] (if miner seems to get stuck here, please make sure
[11:53:19] that your firewall/network config allows the miner
[11:53:19] access to the devShare pool(s)/port(s) specified above)
[11:53:20] waiting for successful login on devshare account…
[11:53:20] (if miner seems to get stuck here, please make sure
[11:53:20] that your firewall/network config allows the miner
[11:53:20] access to the devShare pool(s)/port(s) specified above)
Done. It seems dwarfpool is having some issues, so I changed it to cryptoknight.cc for now. New binary just uploaded (still .14.0).
Thanks a lot! Why not use supportxmr or nanopool?
KNC when?