Upcoming v.14 w/ KNL v8 Performance Improvements

Good news: I’ve finally made some progress on performance improvements for v8. I’m not yet where I want to be (i.e., where I think the true “speed of light” for the KNLs should be), but as of last night I have a newly vectorized version that is already running around 30% faster than the previous one. There’s more potential for optimization in that version (it’s a complete rewrite, so a lot is still in flux), but 30ish% is already enough to at least share it. I’ll probably need tonight, and maybe tomorrow, for some more testing, burn-in, packaging, etc., but “something” should be coming up soon.

BTW, just to explain where the entire v8 performance issue came from: in the past, the cryptonight family stressed only memory performance, with relatively little “compute” thrown in. Yes, there was the AES encoding step and the 64-bit multiply, but both are hardware-supported on CPUs and KNL, so the “true” cost was exclusively in the memory system. With v8, this has changed: there’s now some pretty nasty 64-bit division and double-precision floating point multiply (plus a lot of additional gunk) in the inner loop, and these are pretty compute intensive. To get these pieces fast I had to completely change the vectorization pattern in the inner core, and doing that is a pain if ever there was one: get a single bit wrong and you get a wrong result, and since all intermediate numbers are completely meaningless semi-random bit patterns, it’s nearly impossible to debug in any reasonable way. All you can do is write gigabytes of log files of every operation performed, and compare them bit by bit.
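To make the “compare them bit by bit” part a little more concrete, here is a minimal sketch in C. It is not the actual lukMiner code; it assumes that both the reference version and the vectorized version dump their intermediate values as arrays of 64-bit words, and the names first_mismatch and differing_bits are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the index of the first 64-bit word at which
   the two traces differ, or n if all n words are identical. */
size_t first_mismatch(const uint64_t *ref, const uint64_t *vec, size_t n)
{
    assert(ref != NULL && vec != NULL);
    for (size_t i = 0; i < n; i++) {
        if (ref[i] != vec[i])
            return i;
    }
    return n;
}

/* XOR leaves a 1 in exactly the bit positions where the two words disagree,
   which is about the only readable view of semi-random bit patterns. */
uint64_t differing_bits(uint64_t ref_word, uint64_t vec_word)
{
    return ref_word ^ vec_word;
}
```

The idea is to find the earliest operation where the two traces diverge, since everything after the first wrong bit is garbage anyway.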

Anyway, that restructured code is now working. There’s lots of opportunity for low-level optimizations, and probably even a reasonable way of porting all that to KNC, too (which needs the same kind of vectorization). In the short term, though, I had to change a lot and probably broke a lot (including the regular CPU version :-/). It’ll take a day to clean up and release a first version, but from then on we’re back on an upward ramp. Happy!

With that – happy mining!

PS: Just to give you guys an idea of some of the things I had to deal with on this rewrite: the newly vectorized code needs to do some 64-bit integer divisions, and though KNL can do that with AVX512F code, the respective intrinsic (_mm512_div_epu64) is not supported in either clang or gcc (not even in the latest top-of-tree builds, let alone in a released version). The Intel compiler does support this operation, but you need the very latest Intel compiler to even run on Ubuntu 18; and …..
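For readers wondering what a compiler-independent workaround for the missing intrinsic could look like, here is a minimal sketch in C. It is an assumption for illustration, not what lukMiner actually does: it simply treats a 512-bit register as eight 64-bit lanes held in a plain array and divides lane by lane. The names LANES and div_epu64_lanes are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8 /* a 512-bit register holds eight 64-bit lanes */

/* Hypothetical fallback: unsigned 64-bit division, one lane at a time.
   q[i] receives a[i] / b[i]; every divisor must be non-zero. */
void div_epu64_lanes(uint64_t q[LANES],
                     const uint64_t a[LANES],
                     const uint64_t b[LANES])
{
    for (int i = 0; i < LANES; i++) {
        assert(b[i] != 0);
        q[i] = a[i] / b[i];
    }
}
```

A scalar loop like this gives correct results with any compiler, but it gives up the speed that vectorizing the division was meant to deliver, so at best it serves as a reference to check a faster version against.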

Published by

lukMiner

To learn more about me, look at the "About" page on http://lukminer.org

8 thoughts on “Upcoming v.14 w/ KNL v8 Performance Improvements”

  1. Wow, 30% is a big jump. Kudos. And the fact you can even ramp up a little more is amazing. Good luck man. As always you keep coming through in the clutch.

  2. Could you change the devshare pool, please? Connecting to xmr-usa.dwarfpool.com:9100 always fails for me (from CN).

    [11:53:17] waiting for successful login on devshare account…
    [11:53:17] (if miner seems to get stuck here, please make sure
    [11:53:17] that your firewall/network config allows the miner
    [11:53:17] access to the devShare pool(s)/port(s) specified above)
    [11:53:18] waiting for successful login on devshare account…
    [11:53:18] (if miner seems to get stuck here, please make sure
    [11:53:18] that your firewall/network config allows the miner
    [11:53:18] access to the devShare pool(s)/port(s) specified above)
    [11:53:19] waiting for successful login on devshare account…
    [11:53:19] (if miner seems to get stuck here, please make sure
    [11:53:19] that your firewall/network config allows the miner
    [11:53:19] access to the devShare pool(s)/port(s) specified above)
    [11:53:20] waiting for successful login on devshare account…
    [11:53:20] (if miner seems to get stuck here, please make sure
    [11:53:20] that your firewall/network config allows the miner
    [11:53:20] access to the devShare pool(s)/port(s) specified above)
