v4r update…

Guess I should have waited for about an hour before posting the last article, but that’s how it always is – only after you give up hope do you finally find your breakthrough :-).

Anyway, quick update. Good news: I’ve got code correctness for the upcoming v4r change. It’s still “somewhat less performant than I’d like it to be” (just by an order of magnitude or so, so … meh …), but from here on out it’s no longer the extreme pain of trying to find bugs in seemingly random code sequences – now it’s only an optimization problem. Really happy – done for today, time for a beer.

Happy mining, guys.

Upcoming Monero xmr-v4r

I know, I know – I’ve been awfully quiet recently … new job, new responsibilities, plus several paper deadlines I’m working on (for my day job), and of course the usual domestic emergencies on top of that. Basically, I’ve been so far under water that I hardly even followed developments in crypto-land any more (let alone had the time for any serious development …), and if you guys had not pointed it out to me, I probably would have missed the upcoming changes entirely.

Anyway – you guys did send me a lot of messages recently, so at least I got reminded of the upcoming monero algorithm change (sometime next week, if my math is right). And to say so up front: yes, I am working hard to try and be ready for it. Over the last week or two, I finally did dig through some of the forums (monero, reddit, etc.) and looked through the existing miners (cpuminer, stak, xmrig, the monero reference codebase) to at least figure out what this change will be, and let me say, that alone was an adventure. To put it bluntly: how the heck could we – as a community – end up in a state where each and every post and/or miner seems to call this by a different name? Cryptonight R? Random? CN-R? CNvR? Monero? And even better, variant 4 and CNv4? … Four? Seriously? When the last one was either v8 or v2, depending on who you asked? Couldn’t we simply have agreed on a serial numbering – or if we can’t agree on that, at least year/month (19/3), or an Ubuntu-like CN19.02, or anything that makes sense? We are working on a technology that requires literally millions of highly distributed machines all over the world to run a parallel consensus protocol, in real time, to jointly agree on the state of a distributed blockchain, and we can’t even agree on an f-ing name for the algorithm we run??? Really? Apologies for my ranting, but I just had to get that off my chest.

Anyway – enough of the ranting, there’s a job to do. As I said above, I have already looked through all the reference codes and think I have a pretty good idea of what has to be done. Bad news: it’s a lot. Good news: I’ve got at least some of it done already. Getting it to do the right thing (and to do it fast) – well, that’s another matter, but I’m still optimistic that I’ll get there in time. So far I’m already tracking the blockchain height, have hooked up the random-math code generator to run every time the height changes, have added the plumbing to pass the generated code through to the miners, and am already calling at least an interpreter for that code. It’s still producing wrong hashes on the test net, but at least it’s doing something.

Of course, I do realize that having most of it working is no good to man or beast – a slightly wrong hash is just as wrong as a completely wrong one – but still: I’m reasonably optimistic that I should get those fixed in time. I’ll still have to hook up some sort of compiler for that code, but that should (knock on wood) actually be the simpler task … well, as they always say: we’ll see.

Anyway: Long story short – I’m working on it.

With that … happy mining!

PS: Oh, and in case you were wondering about the title: to poke at least a little fun at the whole naming mess, I decided that for my miner I’d use a name that tries to digest as many of these random names as possible. So for me, “xmr-v4r” it’ll be!

v.14 and Devshare-Pool changes

V.14 is out as of last night and can be downloaded from the usual place (http://www.lukminer.net/releases/). At least on my development machine this brings v8 hashrate back up to about 2100 H/s – not yet where I expect it to end up, but certainly better than what it was before.

Please note I also changed the devShare pool from DwarfPool to cryptoknight.cc – apparently DwarfPool was having some issues last night and was refusing connections (reported by a user, and verified by me as well). If there are any issues with that, let me know!

And finally, a note for everyone asking when a KNC version will come back: I’m working on it. As I said before, the changes that went into v8 make the code “easier” to port to KNC (the “explode” and “implode” phases should already work by now), and in particular to then keep “in sync” with future changes. However, please understand that this is still a very different architecture from the x200 KNL phis: they’re both 16-wide SIMD machines, and many instructions are shared … but quite a few are missing on one side or the other, and even worse, a few look the same but behave somewhat differently. And of course, get a single bit wrong in a single instruction and the hash is wrong, and it’s a nightmare to debug … so this takes some time. Oh, and of course, there’s no OpenCL, auto-vectorizer, or anything like that in the miner, so it’s not “just recompile” for the different architecture. Anyway, the summary is still the same: I’m working on it, it’s coming, but it’ll take some time. :-/

With that – happy mining!

Upcoming v.14 w/ KNL v8 Performance Improvements

Good news – I’ve finally made some progress with performance improvements for v8. I’m not yet where I want to be (i.e., where I think the true “speed of light” for the KNLs should be), but as of last night I finally got a newly vectorized version that was already running around 30% faster than the previous one. There’s more potential for optimizations in that version – it’s a complete rewrite, so a lot is still in flux – but 30-ish% is already enough to at least share this version. I’ll probably need tonight – and maybe tomorrow – for some more testing, burn-in, packaging, etc., but “something” should be coming soon.

BTW: Just to explain where the entire v8 performance issue came from: in the past, the cryptonight family stressed only memory performance, with relatively little “compute” thrown in … yes, there was the AES encoding step, and the 64-bit multiply, but both are hardware-supported on CPUs and KNL, so the “true” cost was exclusively in the memory system. With v8, this has changed – there’s now some pretty nasty 64-bit integer division and a square root computed in double-precision floating point (plus a lot of additional gunk) in the inner loop, and these are pretty compute intensive. To get these pieces fast I had to completely change the vectorization pattern in the inner core, and doing that is a pain if ever there was one: get a single bit wrong and you get a wrong result, and since all intermediate numbers are completely meaningless semi-random bit patterns, it’s near impossible to debug in any reasonable way … all you can do is write gigabytes of log files of every operation performed, and compare them bit by bit.
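The "log every operation and compare bit by bit" approach can be sketched as follows. This assumes the reference and the vectorized implementation both write one line per operation in the same text format with hex values; that format and the file names are hypothetical. The helper streams both logs line by line, so multi-gigabyte files never have to sit in memory at once. It reports the first divergent line and, for a pair of hex values, exactly which bits differ.

```python
from itertools import zip_longest
from typing import Iterable, List, Optional, Tuple


def first_divergence(ref: Iterable[str], test: Iterable[str]
                     ) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (line_no, ref_line, test_line) of the first mismatch, or None.

    If one log is shorter, the missing side of the mismatch is None.
    """
    for line_no, (a, b) in enumerate(zip_longest(ref, test), start=1):
        if a != b:
            return line_no, a, b
    return None


def differing_bits(ref_hex: str, test_hex: str) -> List[int]:
    """Bit positions (0 = least significant) where two hex values differ."""
    diff = int(ref_hex, 16) ^ int(test_hex, 16)
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


def compare_logs(ref_path: str, test_path: str):
    """Stream two trace files (hypothetical paths) and find the first mismatch."""
    with open(ref_path) as ref, open(test_path) as test:
        return first_divergence(ref, test)
```

For example, `first_divergence(["step=1 v=0f", "step=2 v=10"], ["step=1 v=0f", "step=2 v=11"])` returns `(2, "step=2 v=10", "step=2 v=11")`. `differing_bits("10", "11")` then pins the error down to bit 0.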

Anyway – that restructured code is now working. Lots of opportunity for some low-level optimizations, and probably even a reasonable way of porting all that to KNC, too (which needs the same kind of vectorization) … but at least in the short term I had to change a lot, probably broke a lot (including the regular CPU version :-/), etc. It’ll take a day to clean up and release a first version, but from then on we’re back on an upward ramp. Happy!

With that – happy mining!


PS: Just to give you guys an idea of some of the things I had to deal with in this rewrite: the newly vectorized code needs to do some 64-bit integer divisions, and though Intel exposes that for AVX512 via an intrinsic (_mm512_div_epu64, an SVML helper rather than a single hardware instruction), that intrinsic is not supported in either clang or gcc (not even in the latest top-of-tree builds, let alone in released versions); and though the Intel compiler does support this operation, you need the very latest Intel compiler just to run on Ubuntu 18; and …
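For context on what that intrinsic computes: `_mm512_div_epu64` divides two vectors of eight unsigned 64-bit lanes lane by lane. A plain scalar reference model like the Python sketch below is one way to produce expected values when testing a hand-rolled replacement on compilers that lack the intrinsic. This is a sketch under stated assumptions. The lane count simply follows from 512 / 64 = 8. Raising on a zero divisor is a choice of this sketch, not documented behavior of the intrinsic.

```python
from typing import List, Sequence

LANES = 512 // 64          # eight 64-bit lanes in a 512-bit register
MASK64 = (1 << 64) - 1


def div_epu64_reference(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Lane-wise unsigned 64-bit division, as a scalar reference model."""
    if len(a) != LANES or len(b) != LANES:
        raise ValueError(f"expected {LANES} lanes per operand")
    out = []
    for x, y in zip(a, b):
        x &= MASK64          # interpret each lane as an unsigned 64-bit value
        y &= MASK64
        if y == 0:
            raise ZeroDivisionError("zero divisor in lane")
        out.append(x // y)   # truncated unsigned quotient
    return out
```

Comparing a vectorized division routine against this model lane by lane, over random and edge-case inputs (all-ones dividends, divisors of 1, large divisors), catches the single-bit errors described above early.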

XMR-v8 fork: Remember to UPDATE YOUR MINERS!

Today is the day of the v8 fork – and given that the hash rate on Dwarfpool just took a precipitous drop, I assume it has just happened. As such: make sure to check your miners, and update to 0.12.1 as soon as v8 is active!

For those using the Phi 7220/7240 PCI cards – make sure to use 0.12.1, not the 0.12.0 I posted earlier this week – 0.12.0 works on socketed phis, but had a bug in the MPSS offload code, which got fixed in 0.12.1.

With that – happy mining!

PS: And of course, also change “-a xmrv7” to “-a xmrv8” in your mining scripts!

A Brief update on TRTL and AEON

Over the last week, at least two people asked me about updating the miner to support the cryptonight light algorithm required by AEON and TRTL. When I got these requests, I was a bit confused … I thought I had updated those ages ago … but who knows, maybe there had been another fork!?

Well, I didn’t have any time to look into it until earlier today, but having just re-tested the respective two command lines from the “supported coins” page, I am still confused: at least on my side, both AEON and TRTL are running just fine. That said, nobody seems to have run either of these two coins for months (there’s no dev share activity for them), so there may be some issue that I cannot reproduce.

As such: if anybody did want to run those coins and ran into issues, please let me know. The only cause I can think of is that the miner preinstalled on the lukSticks is too old (in which case all you have to do is update the miner on those sticks); otherwise it should work just fine.

All that said, AEON and TRTL might not be the most profitable coins for phis. The big advantage of the phis is their large MCDRAM, so the “heavier” the coin, the bigger their (relative) advantage over CPUs with smaller caches. So at least if market forces are only mildly in effect, CPUs with small caches should be most profitable on “light” coins, and phis most profitable on “heavy” ones.
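A back-of-the-envelope sketch of why “heavier” coins favor the phis: count how many per-hash scratchpads fit entirely in a given amount of fast memory. The scratchpad sizes are the usual ones for the CryptoNight family: 1 MiB for cn-lite, 2 MiB for cn, and 4 MiB for cn-heavy. The 8 MiB CPU cache is a hypothetical example. The KNL figure uses its 16 GB of MCDRAM, taken as 16 GiB here for round numbers. Capacity is only one factor; real throughput also depends on cores, threads, and bandwidth.

```python
MiB = 1024 * 1024
GiB = 1024 * MiB

# Per-hash scratchpad sizes of common CryptoNight variants.
SCRATCHPAD_BYTES = {"cn-lite": 1 * MiB, "cn": 2 * MiB, "cn-heavy": 4 * MiB}


def scratchpads_that_fit(fast_memory_bytes: int, variant: str) -> int:
    """Number of concurrent hashes whose scratchpads fit entirely in fast memory."""
    return fast_memory_bytes // SCRATCHPAD_BYTES[variant]


if __name__ == "__main__":
    cpu_cache = 8 * MiB        # hypothetical CPU with an 8 MiB last-level cache
    knl_mcdram = 16 * GiB      # KNL MCDRAM, rounded to 16 GiB
    for variant in SCRATCHPAD_BYTES:
        print(f"{variant:9s} cpu={scratchpads_that_fit(cpu_cache, variant):5d} "
              f"knl={scratchpads_that_fit(knl_mcdram, variant):6d}")
```

Going from cn-lite to cn-heavy, the hypothetical CPU drops from 8 to 2 scratchpads that fit in cache. Even for cn-heavy, MCDRAM still holds 4096, far more than a KNL has hardware threads. That asymmetry is the relative advantage described above.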

With that – happy mining!

xmr-v8 ready to go …

Another heads-up: I finally managed to stress test the v.12 version that supports xmr v8, and at least on the test net it works perfectly fine on the phis as well. I haven’t ported it to KNCs yet (KNL has many more users, and thus much higher priority :-/), but at least on KNL it seems to work fine.

The latest release (v0.12) is available at its usual place (http://lukminer.net/releases) – but remember to change your algorithm flag (“-a”) to “xmrv8” once the fork hits. And of course, do not use the v8 flag before that fork happens.

Finally, a note on performance: don’t be too surprised if you see significantly lower hash rates once v8 hits – the additional operations added to the inner loop are really expensive, so on the (non-ASRock) development machine I was using I’m seeing a drop from about 2500 to about 1700 H/s. That’s a bit more than I expected, but as I just said, the additional operations are expensive, so any other CPU or GPU miner will likely see quite an impact on hash rate, too … which means difficulty should adjust accordingly, at least after a few days.
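For reference, the drop from about 2500 to about 1700 H/s works out to roughly a third of the hash rate:

$$\frac{2500 - 1700}{2500} = \frac{800}{2500} = 0.32$$

That is about 32% fewer hashes per second on that machine.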

With that – happy mining!