(Experimental) RandomX support…

As long promised I finally did find some time to work a bit more on my RandomX support (the things one does over Thanksgiving “holidays” – sigh)… and at least experimentally, uploaded a first version with randomx support (use “–algo rx”). As usual, downloads are on http://www.lukminer.net/releases/ .

To indicate that this is the first version that supports anything other than cryptonight variants I went all out and tagged this one as “2.0.0” (which yes, means that I completely skipped all 1.x version numbers – well, let’s not go into that). Note I added rx support only for phis – almost all my users are only interested in that, anyway, and for regular CPUs there’s plenty other miners out there that support it, too…. so.

In terms of performance: On my 68-core Phi 7250 I’m getting ca 6800 hasher per second. That’s “decent” if you compare it to what the benchmark pages report for mid-range Xeons, and pretty good compared to GPUs … but quite a bit lower than what a high-end Epyc will do. For those that are wondering: RandomX (at least in my current implementations) stresses mostly scalar performance, and thus benefits from high clock rates and multi-issue out-of-order … none of which the Phi was designed for: Yes, it does have 4-way hyperthreading (which helps), and some sort of out-of-order processing – but it runs at only 1.4GHz (3x lower than an Epyc!), and only has a single scalar pipe (it has one more pipe for vector instructions, but there aren’t many). For CryptoNight the MCDRAM was the big saviour (CN is mostly memory-bound) – but on RandomX, the phi is actually CPU bound, not memory bound, so the MCDRAM doesn’t help that much.

Anyway – we’ll have to see if ~7kH/s is economically feasible …. guess we’ll know in a few days how price and difficulty will pan out. I also have a few more ideas on optimizations … well, we’ll see.

’til then – happy mining!

Service back up again … check your miners!

Hey,

To all those that do run monero: Yesterday morning supportxmr – the pool I use for my devshare – accidentally blocked my devshare address, thinking all the many different connections were a botnet. And since the miner won’t do anything if it can’t reach the devshare pool this not only brought down my devshare, but pretty much everybody else’s mining, too.

This has already been fixed (those guys were really helpful!), and I’m already thinking about ways of how that can be avoided in the future – but be that as it may, I’d suggest you all check if your miners actually do anything useful; as far as I can see from the devshare hash rate most of the miners are still offline despite the devshare being unblocked, so I assume there’s a lot of them where the machine itself crashed, or for some other reason did not come back up after the miner was down for so long…. and before you have your machines running all idle – best check them!

With that,

Happy Mining!

 

Update: LukSticks, and Higher Perf…

Good news up front

  1. I finally made some new luksticks (I’ll write about that separately), and
  2. I managed to get quite a good performance boost for v4r last night
    (from 1700ish to 2200ish on my 7210, and from 1900ish to 2350ish on my 7250)

Man, what a day, yesterday.

First had a long, gruelling long hike with my dogs, and thought I could hardly lift a finger after that any more, I did finally get in the mood of locking myself in my with 7250 workstation and doing some serious coding (because with the lukSticks finally done – see below – I finally did have some time left to do that!).

Locked myself away from everybody else, got some uninterrupted coding time (oh, so rare), and got to work. Got on a roll, finally found the time to do a lot of the code re-orgs that I’ve been talking about for ages – much better vectorization – and managed to constantly chip away on hash rate, one instruction at a time. In fact, that entire process worked so well that I eventually worked way into the night (it’s so hard to stop while more perf seems to lie around the corner!) until finally, at about 1am I called it a day (or called it a night?) – having manged to bump performance on my 7250 workstation from about 1900H/s to nearly 2400. Wonderful. Now just build a release, quickly test it on my 7210, and finally get to bed.

Not so. First, building the release took a while, so I had to stay up til about 2 watching the build to run through – which, when dead tired and at 2am – is about as thrilling as watching paint dry…. in particular if it literally takes an hour.

But finally, release is done, so just quickly download onto my 7210 Asrock machine for final perf measurement (because most people on this blog are more interested in 7210s than in what my workstation can do!), then just write a quick note on the blog, and finally get to bed, after a nice and satisfying day’s achievement. Right?

Not so. Downloaded the baked release, put it on a lukstick, went to my garage, popped it in, booted …. and saw no difference. None. Nada. Zilch. Of course, first thought was “oh, those darn luksticks again – probably messed up the miner upgrade”… but no, miner shows new version number, just as it should. Still, no difference. Reboot, try again. Nothing. Try another node, same thing. Conclusion: Apparently all my work was for nothing – maybe it did work on a 7250, but not on a 7210? Argh. Went to bed, totally devastated. What a bummer.

Except. Of course, that story just didn’t make sense, it simply couldn’t be… and in fact, my wife had the right idea from the start, as soon as I whined about my experience this mornin: “maybe you still ran the old code, and it just wrongly showed the new version?”. Doesn’t sound plausible, except that’s exactly what happened: Since 7250s are not exactly the fastest at compiling and linking I do bake my releases on a different machine, so before doing that I have to commit and push on my workstation, then pull on the baking machine, bake there, and wait for the goodness to happen. And since I had forgotten to bump the version number before committing (yes, I was tired) I didn’t actually bump the version number until I was on the baking node – and since git had apparently had some conflict when pulling the latest release (which I had completely overlooked – yes, I was tired) I had never actually pulled the new code at all. So I did in fact bake the old code with the new version number. Yay. (If you learned anyhing: Don’t bake releases when you’re dead tired – it can wait a day).

Anyway – just had another look this morning after at least a few hours’ sleep; found the git pull that hadn’t gone through, baked a mini-release for testing, put that on a lukstick, and ta-daaa: 2200 H/s on my 7210. A second node I tested was slightly slower – say 2100 – but even that is a massive speedup over the 1700ish from just yesterday morning.

All in all, a few weeks ago we were down to less than 30 cents/kH at 1700 kH (the first v4r version even had less than 1kH/s!) …. and today, we’re up to 2200ish kH/s, and that at today’s nearly 50 cent kH … that’s quite a difference. $1/day is a far cry from the $20/day in the crazy times – but it sure beats 30 cents/a day!

Aaaanyway – that was a long story just to say: Performance is up! I’ll be doing some more testing soon, just to make sure that I didn’t accidentally disable CPU version, MPSS version, or other coins like haven again :-/ (I actually do that during development to keep compile times low…), and will then bake a “real” release. No more optimizing today – today will be all about testing, and releasing.

Anyway – it’s all good news; I’ll write a bit more about the new lukSticks, soon, too (and that was a real nightmare!), but for now, back to making this release!

With that – happy mining!

v.15.6 – CPU re-activated, and a bit more perf on the Phis

Just uploaded v.15.6 – this was primarily built to re-enable the CPU path, but while looking at that I also realized that I had some pretty egregious spilling code in the KNL path that wasted cycles. In .15.6 this is now fixed. Latest performance:

  • on my 7250, with Ubuntu: +/- 1940 H/s
  • on a 7220A active card: +/- 1840 H/s.
  • 7210 …. still have to check, but not today.

Quite interestingly, at those rates we’re coming back to profitability numbers we haven’t seen in a while: at least if the profit calculators can be belived, at current reduced difficulty 1.9kH is back over 80cents a day. Not as much as during the crazy days (I still remember the $7.50 a day :-/), but way better than the $.20 we’ve seen recently!

With that – happy mining!

v0.15.5 with v4r support is out

Since I did get a few emails asking whether the new version is now out: Yes, it most certainly should be!

I did have some last-minute glitch in that all my testing machines apparently burned through one more of the wires in my home’s electrical wiring (third time :-/), so a few wall sockets lost power, including the one to my router – so the release was baked early yesterday, but didn’t get pushed out right away because the router was dead. Oh, heck, it all happens at the same time. Either way, a single extension cord from anothe room fixed that, and the 0.15.5 version was pushed some time before midnight yesterday night.

And as always, it should be on the releases page, under the following link: http://www.lukminer.net/releases/

A few notes:

  • this version should have code for at least a month or two’s blocks compiled in. I’ll update newer once when it’ll get close to that.
  • In this version I did fully deprecate support for the x100 generation phis. If somebody can convincingly make a case for re-activating that I can still do that – but it’d literally have to be thousands of those x100 generation phis to make that worthwhile, and I doubt those would be profitable.
  • In the latest drop I’ve disabled the CPU version; partly because this reduced the compile time by 4x (yes, CPU build is 3x more expensive because of the different instruction set specializations….), and partly because I thought nobody used that any more. I already got comments to the contrary, so will re-activate that in next drop.
  • I have not yet gotten around to fixing smaller coins like turtle or aeon.

Either way – v0.15.5 is out, and should be working.

With that – happy mining!

v4r is live.

Blockchain height 1788000 has just passed, so if you’re mining monero, you have to use the new miner.

Good news: Performance is roughly where I’d want it to be:

Screenshot from 2019-03-09 11-35-28

Bad new: I did (of course!) introduce a final bug when I built the release last night, so the v0.15.1 version currently on the downloads page will actually crash at that blockchain height. (Apologies – but that happens when you can’t actually test on the ‘real’ blockchain, yet :-/). The fix was literally 10 lines of code … but still – I’m currently baking the new version, and will upload as soon as the make is through.

Oh-kay … v4r, here we come.

Oh-kay – here we are. Two-three days ago I promised a version “probably tomorrow” …. but then – in my hubris – decided to do an apt-update on my KNL development machine … which let to a reboot … during which it realized that the harddisk has some broken sectors … which means it could no longer boot … or even mount the file system on another machine … which …. darn – which better time to lose a harddisk than two days before a release!? Seriously??? Anyway – most of the code was safe under git, it just took forever to reinstall that machine to a state where I could actually develop on it….

Aaaanyway. Release. v4r. I just tagged and built a first internal release with all the right code, including updated version numbers, mpss support, etc. A bit more testing, then I’ll post that public. To make up for lost time in that version I disabled KNC completely, and I think v4r will only run on knl/phi and mpss-knl right now, but at least on my machines it’s working great for both of those configs.

As to performance: What I’m seeing right now is close on 1800-2000 H/s on my 7250, which is pretty good. Exceeeeept…. there’s one caveat: The new thing in v4r is the “randomized code generation”, which means that every block height will run different code during hashing. To do that, there’s basically three options: a) run an interpreter, that just interprets that code sequence on the fly; or b) add a just-in-time compiler to the miner, or c) pre-build all (or at last, a lot) of the upcoming block height’s codes, and link them all together into the same binary. Since I’m not going to add a compiler (well – not yet, at least) I went for a mix of a) and c) : I do include some blocks (for those I get ca 1.8-2kH/s), but for then have a interpreter fall-back for those not built into the binary (and for those, we’ll only get 0.9-1kH/s).

Now the hope is that I’ll eventually ship a binary that has the next few months’ codes all built in – but at least right now I’m having “some minor issues” with the compiler spending literally hours to build a release when too many blocks are built in … and to make matters worse, the test net runs on a different block chain height than the real one will, so I’ll actually have to release different code than what I can test. As such, first version with v4r support might not actually have the right blocks compiled in, yet, and therefore may be running at half the speed. I do have a bigger build with more blocks running in the background …. but …. as I said, it might take a while.

Anyway, two key takeaways:
– v15.1 with v4r support currently building.
– expected performance (once the right binary is our): somewhere around 1800-2000H/s

With that – happy mining!

(PS: Now where did I put that compiler book ….. ?)

v4r update…

Guess I should have waited for about an hour before posting the last article, but that’s how it always is – only after you give up hope do you finally find your breakthrough :-).

Anyway; quick update: Good news – I got code correctness for the upcoming v4r change. It’s still “somewhat less performant than I’d like it to be” (just by an order of magnitude or so, so …. meh…), but from here on out it’s no longer the extreme pain of trying to find bugs in seemingly random code sequences – now it’s only an optimization problem. Really happy – done for today, time for a beer.

Happy mining, guys.

Upcoming Monero xmr-v4r

I know, I know – I’ve been awfully quiet recently …. new job, new responsibilities, plus several paper deadlines I’m working on (for my day job), and of course, the usual domestic emergencies on top of that. Basically, I’ve been so much under water that I hardly even followed developments in crypto-land any more (let alone having had the time for any serious development…) – and basically, if you guys had not pointed it out to me I probably would even have missed the upcoming changes.

Anyway – you guys did send me a lot of messages recently, so at least I got reminded of the upcoming monero algorithm change (sometime next week, if my math is right). And to say so up front: Yes, I am working hard to try and intercept that. Over the last week or two, I finally did dig through some of the forums (monero, reddit,etc), and look through the existing miners (cpuminer, stak, xmrig, monero reference codebase) to at least figure out what this change will be, and let me say, that alone was an adventure – to put it bluntly: how the heck could we – as a community – end up in a state where each and every post and/or miner seems to be calling this by a different name? Cryptonight R? Random? CN-R? CNvR? Monero? And even better, variant 4 and CNv4? … Four? Seriously? When the last one was either v8 or v2, depending on who you asked? Couldn’t we simply have agree on a serial numbering – or if we can’t agree on that, at least year/month (19/3), or a Ubuntu-like CN19.02, or anything that makes sense? We are working on a technology that requires literally millions of highly distributed machines all over the world to run a parallel consensus protocol, in real time, to jointly agree on the state of a distributed blockchain, and we can’t even agree on an f-ing name for the algorithm we run??? Really? Apologies for my ranting, but I just had to get that off my chest.

Anyway – enough of the ranting, there’s a job to do. As said above, I have already looked through all the reference codes, and think I have a pretty good idea of what has to be done. Bad news: It’s a lot. Good news: I got at least some of that done already. Getting it to do the right thing (and to do it fast) – well, that’s another matter, but I’m still optimistic that I’ll be getting there in time. So far I’m already tracking the blockchain height, have hooked up the random-math code generator every time the height changes, have added the plumbing to pass that through to the miners, and am actually already calling at least an interpreter for that code. It’s still producing wrong hashes on the test net, but at least it’s doing something.

Of course, I do realize that having most of it working isn’t good to man or beast – a slightly wrong hash is just as wrong as a completely wrong one – but still: I’m reasonably optimistic that I should be getting those fixed in time. I’ll still have to hook up some sort of compiler for that code, but that should (knock on wood) actually be the simpler task … well, as they always sayd: we’ll see.

Anyway: Long story short – I’m working on it.

With that … happy mining!

PS: Oh, and in case you were wondering about the title: To make at least some partial fun of all the naming mess I decided that for my miner, I’d use a name that tries to digest as many of these random names as possible. So for me, “xmr-v4r” it’ll be!

v.14 and Devshare-Pool changes

V.14 is out as of last night; and can be downloaded from the usual place (http://www.lukminer.net/releases/). At least on my development machine this brings v8 hashrate back to about 2100H/s – not where I expect it to end up, yet, but sure better than what it was before.

Please note I also changed the devShare pool from DwarfPool to cryptoknight.cc – apparently DwarfPool was having some issues last night and was refusing connections (reported by a user, and verified by me as well); if there’s any issues with that let me know!

And finally – a final note to all the inquiries about when a KNC version will come back again: I’m working on it. As said before the changes that went into v8 make the code “easier” to port to KNC (the “explode” and “implode” phases should already work by now), and in particular, to then keep “in sync” with future changes. However – please understand that this is still a very different architecture than the x200 KNL phis – they’re both 16-SIMD machines, and many instructions are shared … but quite a few are missing one or or the other side, and even worse, a few look the same but behave somewhat differently. And of course, a single bit wrong in a single instruction, and the hash is wrong, with a nightmare to debug it …. so this takes some time. Oh, and of course, there’s no OpenCL, auto-vectorizer, or anything like that in the miner, so it’s not “just recompile” for the different architecture. Anyway, the summary is still the same: I’m working on it, it’s coming, but it’ll take some time. :-/

With that – happy mining!