Upcoming v.14 w/ KNL v8 Performance Improvements

Good news – I’ve finally made some progress with performance improvements for v8. I’m not yet where I want to be (i.e., where I think the true “speed of light” for the KNLs should be), but as of last night I finally got a newly vectorized version that already runs around 30% faster than the previous one. There’s more potential for optimizations in that version – it’s a complete rewrite, so a lot is still in flux – but 30ish% is already enough to at least share this version. I’ll probably need tonight – and maybe tomorrow – to do some more testing, burn-in, packaging, etc., but “something” should be coming soon.

BTW: Just to explain where the entire v8 performance issue came from: In the past, the cryptonight family always stressed only memory performance, with relatively little “compute” thrown in … yes, there was the AES encoding step, and the 64-bit multiply, but both are hardware-supported on CPUs and KNL, so the “true” cost was exclusively in the memory system. With v8, this has changed – there’s now some pretty nasty 64-bit division and double-precision floating-point multiply (plus a lot of additional gunk) in the inner loop, and these are pretty compute-intensive. To get these pieces fast I had to completely change the vectorization pattern in the inner core, and doing that is a pain if ever there was one: get a single bit wrong and you get a wrong result, and since all intermediate numbers are completely meaningless semi-random bit patterns, it’s nearly impossible to debug reasonably … all you can do is write gigabytes of log files of every operation performed, and compare them bit by bit.
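The “log everything and compare bit by bit” approach can be sketched in a few lines of Python. This is a minimal sketch, assuming each log has already been parsed into a sequence of 64-bit intermediate values (one per operation, in the same order) from a known-good scalar reference and from the new vectorized code; the log format and the name `first_divergence` are hypothetical.

```python
from itertools import zip_longest


def first_divergence(reference, candidate):
    """Return (index, ref_value, cand_value, differing_bits) for the first
    operation whose logged value differs, or None if the traces match.

    If one trace is shorter, the missing entry shows up as None.
    """
    for i, (ref, cand) in enumerate(zip_longest(reference, candidate)):
        if ref != cand:
            # XOR shows exactly which bits went wrong (only if both values exist).
            bits = ref ^ cand if ref is not None and cand is not None else None
            return i, ref, cand, bits
    return None


ref_trace = [0x1234, 0xDEADBEEF, 0x0F0F]
vec_trace = [0x1234, 0xDEADBEEB, 0x0F0F]
print(first_divergence(ref_trace, vec_trace))  # (1, 3735928559, 3735928555, 4)
```

Since every later value depends on the first wrong one, stopping at the first mismatch is usually all you need to locate the broken operation.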

Anyway – that restructured code is now working. Lots of opportunity to do some low-level optimizations, and probably even a reasonable way of porting all that to KNC, too (which needs the same kind of vectorization) … but at least in the short term I had to change a lot, probably broke a lot (including the regular CPU version :-/), etc. It’ll take a day to clean up and release a first version, but from then on we’re back on an upward ramp. Happy!

With that – happy mining!

 

PS: Just to give you guys an idea of some of the things I had to deal with in this rewrite: The newly vectorized code needs to do some 64-bit integer divisions, and though KNL can do that in AVX512F, the respective intrinsic for that (_mm512_div_epu64) is not supported in either clang or gcc (not even in the latest top-of-tree versions, let alone in released ones); and though the Intel compiler does support this operation, you need the very latest Intel compiler to even run on Ubuntu 18; and …
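When the compiler offers no vector division intrinsic, one workaround is to store the eight 64-bit lanes to memory, divide them one by one with scalar instructions, and load the results back. The Python model below only captures the lane-wise semantics that `_mm512_div_epu64` provides (an unsigned 64-bit quotient per lane); it is a hypothetical reference to check a hand-written vector version against, not the miner's actual code. As far as I know, Intel's Intrinsics Guide lists `_mm512_div_epu64` under SVML, i.e. as a library-provided instruction sequence rather than a single AVX-512 instruction, which would explain why gcc and clang don't offer it.

```python
MASK64 = (1 << 64) - 1
LANES = 8  # a 512-bit register holds eight 64-bit lanes


def div_epu64(a, b):
    """Lane-wise unsigned 64-bit division: result[i] = a[i] // b[i].

    Inputs are masked to 64 bits, so negative Python ints behave like the
    corresponding unsigned bit patterns. A zero divisor raises
    ZeroDivisionError (what the vector routine does in that case is not
    modeled here).
    """
    if len(a) != LANES or len(b) != LANES:
        raise ValueError("expected eight 64-bit lanes per operand")
    return [(x & MASK64) // (y & MASK64) for x, y in zip(a, b)]


dividends = [2**64 - 1, 100, 7, 0, 1 << 40, 9, 10, 11]
divisors = [2, 10, 7, 5, 1 << 20, 4, 3, 12]
print(div_epu64(dividends, divisors))
# [9223372036854775807, 10, 1, 0, 1048576, 2, 3, 0]
```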

XMR-v8 fork: Remember to UPDATE YOUR MINERS!

Today is the day of the v8 fork – and given that the hash rate on Dwarfpool just took a precipitous drop, I assume that it just happened. As such: Make sure to check your miners, and update to 0.12.1 as soon as v8 is active!

For those using the Phi 7220/7240 PCI cards – make sure to use 0.12.1, not the 0.12.0 I posted earlier this week – 0.12.0 works on socketed phis, but had a bug in the MPSS offload code, which got fixed in 0.12.1.

With that – happy mining!

PS: And of course, also change “-a xmrv7” to “-a xmrv8” in your mining scripts!
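If you have many rigs, the flag change can be scripted. A minimal sketch, assuming your mining script passes the algorithm as `-a xmrv7` somewhere on its command line; the `./lukMiner` invocation and the `<other options>` placeholder below are hypothetical, and of course you should run this only once v8 is actually active.

```python
import re


def switch_algorithm(script_text, old="xmrv7", new="xmrv8"):
    """Replace every '-a <old>' with '-a <new>', keeping the original spacing.

    The trailing word boundary ensures that e.g. '-a xmrv7x' is left alone.
    """
    pattern = re.compile(r"(-a\s+)" + re.escape(old) + r"\b")
    return pattern.sub(lambda m: m.group(1) + new, script_text)


script = "./lukMiner -a xmrv7 <other options>\n"
print(switch_algorithm(script), end="")  # ./lukMiner -a xmrv8 <other options>
```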

A Brief update on TRTL and AEON

Over the last week, at least two people asked me about updating the miner to support the cryptonight-light algorithm required by AEON and TRTL. When I got these requests, I was a bit confused … I thought I had added those ages ago … but who knows, maybe there had been another fork!?

Well, I didn’t have any time to look into it until earlier today; but having now just re-tested the respective two command lines from the “supported coins” page, I am still confused: at least on my side, both AEON and TRTL are running just fine. That said, nobody seems to have run either of these two coins for months (there’s no dev share activity for them), so there seems to be some issue that I cannot reproduce.

As such: If anybody did want to run those coins, and ran into issues with it: Please let me know. The only issue I could think of is that the miner that is preinstalled on the lukSticks is too old (in which case all you have to do is update the miner on those sticks), but otherwise it should work just fine.

All that said – AEON and TRTL might not be the most profitable coins for Phis: The big advantage of the phis is that they have lots of MCDRAM, so the “heavier” the coin the bigger the (relative) advantage over CPUs with smaller caches – so at least if the market forces are only mildly in effect CPUs with small caches should be most profitable on “light” coins, and phis most profitable at “heavy” ones.
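A rough way to see this argument: each concurrent hash needs its own scratchpad, so the amount of fast memory caps how many hashes can run at full speed. The sketch below assumes the commonly cited scratchpad sizes (1 MiB for cryptonight-light, 2 MiB for regular cryptonight, 4 MiB for cryptonight-heavy), a hypothetical desktop CPU with 8 MiB of L3, and the 16 GB of MCDRAM on a 7250 (treated as 16 GiB).

```python
MIB = 1 << 20
GIB = 1 << 30

# Per-hash scratchpad sizes for the three cryptonight "weights".
SCRATCHPAD_BYTES = {"light": 1 * MIB, "regular": 2 * MIB, "heavy": 4 * MIB}


def hashes_in_fast_memory(fast_memory_bytes, variant):
    """How many scratchpads (i.e., concurrent hashes) fit in fast memory."""
    return fast_memory_bytes // SCRATCHPAD_BYTES[variant]


for variant in ("light", "regular", "heavy"):
    cpu = hashes_in_fast_memory(8 * MIB, variant)   # hypothetical 8 MiB L3
    phi = hashes_in_fast_memory(16 * GIB, variant)  # 16 GB MCDRAM
    print(f"{variant:8s} CPU L3: {cpu:2d}   Phi MCDRAM: {phi}")
```

Going from light to heavy cuts the CPU's count from 8 to 2, while the Phi's count (4096 even for heavy) stays far above the 272 hardware threads of a 7250.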

With that – happy mining!

xmr-v8 ready to go …

Another heads-up: I finally managed to stress-test the v0.12 version that supports xmr v8, and at least on the testnet it works perfectly fine, also on the phis. I haven’t ported it to KNCs yet (KNL has many more users, and thus much higher priority :-/), but at least on KNL it seems to work fine.

The latest release (v0.12) is available at its usual place (http://lukminer.net/releases) – but remember to change your algorithm flag (“-a”) to “xmrv8” once the fork hits. And of course, do not use the v8 flag before the fork happens.

Finally, a note on performance: Don’t be too surprised if you’ll see significantly lower hash rates once v8 hits – the additional operations they added to the inner loop are really expensive, so on the (non-asrock) development machine I was using I’m seeing a drop from about 2500 to about 1700 H/s. That’s a little bit more than I expected, but as I just said: The additional operations are expensive, so any other CPU or GPU miner will likely see quite an impact on hash rate, too … which means difficulty should adjust accordingly, at least after a few days.
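For reference, the numbers above work out to a drop of roughly a third:

```python
def relative_drop(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


print(f"{relative_drop(2500, 1700):.0%}")  # 32%
```

If other miners lose a similar fraction, the network hash rate should fall by about as much, which is what the difficulty adjustment then compensates for.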

With that – happy mining!

Heads-up on XMR v8 support …

It’s been a while since I last posted (I’ve switched employers, and have been rather busy in the new job …), and as I can see from the many emails I got, quite a few people had already started to wonder whether I had disappeared from the face of the earth completely – and in particular, had gotten a bit concerned about what would happen with lukMiner once the upcoming Monero v8 fork hits the road (which looks like it’ll be pretty soon now).

Thus – to hopefully put those fears to rest – here’s a quick heads-up: As of a few minutes ago I finally finished a first draft of the XMR v8 changes, and at least on the testnet they seem to be working fine. I haven’t done full testing and burn-in yet, so the final release may take a bit longer – but still: so far everything looks good for a “v8” version even before the actual switch happens …

With that – happy mining!

Created new “static” supported-coins overview page …

I have in the past tried my best to document which coins lukMiner supports – by keeping an up-to-date README with each version, by having my release scripts automatically post that together with the latest releases, by having the lukMiner binary print example command lines when launched without parameters, by posting new blog articles every time I added a new coin, etc. … but still, with the flurry of new cryptonight coin variants, several users asked for a better “overview” of what is supported, how to call it, etc. (I guess a new article for every new coin is all well and good – but not all that useful if you’re new to lukMiner and have to google through 10 such articles).

In light of those requests, I did spend some time trying to figure out how WordPress really works, and finally managed to create a new “static” page that is not a blog post, but is accessible from the main page of http://lukminer.org: It’s called – who’d have guessed – “Supported Coins”, and also has its own static URL (https://lukminer.org/supported-coins/).

In the future, I’ll still add updates to the README.md as I did in the past, but will also continually update this page with any newly supported coin, changes in how to execute them, etc. In particular, I’ll use the example command lines on this page to do my own testing – so if any of those don’t work, please let me know (and/or check your firewall settings).

With that – Happy Mining!

lukMiner v0.11.4 adds Stellite and Masari

Upon repeated popular request, I just added Stellite and Masari…

In the last few weeks there’s been a veritable flurry of different new “cousins” of algorithms in the cryptonight family – it all started with monero going v7, then aeon v1, haven, alloys, stellite, turtle, sumo, loki, ryo, niobio, and at least a dozen others that are all virtually identical, but with minute differences in their core algorithm. Adding those is relatively easy – they often differ in only a few lines of code (templates are your friend!), but it does take a certain effort to set up a node, create a wallet for the dev share, do some testing and burn-in, bake a new release, update the documentation, etc; as such, I usually add new coins only when they seem to be “real” (and even then, only on the next weekend :-/), but once a few users ask for it, I usually do add it at some point in time.

As such: I have the honor to present – ta-daa – the new version v0.11.4, which primarily adds Stellite and Masari. I’m also almost ready with IPBC and Arto, if only I can get them to compile on the cloud node that hosts my nodes …

Please note I did run them for a little while for testing, but this time did not do a full 24-hour burn-in test. If there are any issues, please let me know.

With that:

Happy Mining!

 

Haven …

Just released version 0.11.2, with support for Haven. That version was actually built two weeks ago; I only just realized that I had never moved it to the public part of the download pages (which is at http://www.lukminer.net/releases, of course), so nobody could actually use it – duh.

I had in fact already added Alloy, Turtle, and a few others recently, but the previous version did not yet contain Haven – they had their split right after the previous release, so I had missed that. Now of course, the moment the fork occurred I had some early users ask for support; I added it, gave them an (internal) link for testing, and … promptly forgot about it, until another user sent me an email earlier today asking whether I had ever thought of adding it …

Either way – the new version is now online, and as far as I’ve heard from those users who already tested it, it seems to be working just fine.

With that – happy mining!

Wheels within wheels… or better: Phis within Phis?

Another weekend, another article – this time about a special from-scratch “build” for a cute little desktop machine that contains a bootable Xeon Phi 7250 as main processor, and – to top it off – also some Phi 7220A KNL cards … so “wheels within wheels”, indeed!

This article is actually kind of long overdue – part of it goes back as far as early 2017(!), and even when it comes to writing, I had initially planned an article for as early as January (as the time stamps on some of the photos indicate), but then forgot about it … though truth be told, the second half of the article, about adding the cards (which is likely the more fun one), didn’t happen until more recently, anyway. So, better late than never.

Now before I write anything else, let me be clear that at least for this article it’s not exactly clear how “practical” it will be for the average reader. There was a ton of work in that one, most of the pieces are insanely hard to get, and in combination I don’t think that was the most profitable use of my time …. but it’s also been a huge amount of fun, so I’m not ruing it, either.

How did this all start …

The base motivation for this particular build was exactly the same as what several others have since experienced: I found some cheap KNL “ES/engineering sample” processors on eBay (7250s, I think), thought that this would be a very cheap and easy way of building my own Phi box, and just bought them. After all, $300 for a $2k list price processor? And all one needs is a motherboard, plug it in, and done, right? How hard could this be? In particular, that was in early 2017, before all the pricing specials came out (and when coin prices were sky high), so this sounded like a really good deal.

Now once I had these Phis, the “get a motherboard and plug it in” turned out to be a little bit harder than I had thought – back then, there were only two boards available that could take those chips: The “Adams Pass” server boards for the 2u4n Adams Pass servers, and the Supermicro “K1SPE” desktop boards that are also in the Supermicro SYS5031 KNL desktop systems (and in the Colfax “Ninja” DAP systems – which were super-cool, by the way). Of those, the AP systems were near impossible to get in barebone form, and though I finally found one single node (for a horrendous price!) from some Canadian reseller, I never managed to get a chassis for it, and never figured out how to power up that board without said chassis (which has the power connectors for those nodes 😦 ).

So, option 2 it is: Build a desktop machine (which was the preferred option, anyway, since you can actually put that under your desk without going deaf). Now for that I needed a Supermicro K1SPE board, which turned out to be equally hard to find – apparently Supermicro tried to enforce some policy that those could be sold only in fully assembled systems (far too expensive), so finding one took forever – but eventually, I did.

OK, got a board, got a CPU, plug it in, power it up … except that you need a fan, and try to find one that fits this board … sigh. Tried leaving off the fans and just blowing lots of cool air on it, and that at least made it boot – but only for a few seconds, then it shut off (surprise). After lots and lots of googling I eventually found a distributor in China that sold a heat sink that claimed to be compatible with this socket, except it wasn’t. The combination of “steel saw”, “file”, and “lots of sweat” eventually did convince it to fit the board’s screw holes, but by that time I had taken the CPU in and out so often (it gets stuck to the heat sink via the thermal paste :-/) that I seem to have bent some pins, and the board wouldn’t boot any more. Oh shucks.

Now, at the end of 2017 I went to the SC17 (“Supercomputing 2017”) conference in Denver, and while there, also strolled past the “CoolIT Systems” booth: CoolIT was the company that built the water-coolers for the Colfax DAP systems (as the big sticker on those coolers very clearly states), and though they didn’t sell those coolers individually, they were – upon asking – helpful enough to help me get one (well, maybe it didn’t hurt that I had worked with them in the past, in another capacity). Maybe not the most practical way for the average person to get hold of one, but either way – I got lucky, and ended up with one.

So, now having a cooler, it was time to reactivate this build. Sent the board in for repair (another $80, plus shipping, plus hours on Supermicro’s support web page), and finally got it back. And this is where the story starts.

Part 1: Building a Phi Desktop System

OK, at this point – Jan 2018 – I had the following pieces: A Xeon Phi 7250 bought off ebay; a Supermicro K1SPE board; and a CoolIT water-cooler that fit this board.

So, step 1: Take out my brand-new(ly repaired) K1SPE board, and plop in the 7250:

Step 2: Temporarily attach the water-cooler, and screw it on (tip: it might be easier to do this after you put it into a case :-/) :

Now, to test it all out, hook up some PSU (I used an EVGA Supernova 850). The board connects as usual, but for the cooler I couldn’t find a good way to connect it, so I used a fan extension cable that plugs into a SATA molex:

Now, to make it boot, I first had to find and read the manual: Since the board is not a regular ATX board, it also doesn’t fit the regular power-button etc. connectors from a regular ATX case … so I had to read the manual to find the pins that the power button would go to if I only had the right proprietary Supermicro case (you can see which two pins to short from the image below). Note it actually takes a while to boot – it doesn’t just boot up upon the first touch of those pins; you actually have to hold them shorted for a few seconds. Weird, but that’s what it is. Either way, eventually it booted.

Note I did add in a GPU as well – just for testing – but of course that’s not required, since the board comes with an on-board VGA graphics port (and in fact, for the BIOS config it’s easier if you use the VGA port, since the GPU “may” not yet be active to show it by that time). Also don’t worry if it takes forever to boot – that is normal … it can take up to a few minutes until it beeps and enters the BIOS, at least on my board. Either way, eventually it did boot, and I could enter the BIOS to change MCDRAM to cache/quadrant mode:
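To double-check after the reboot that the memory mode actually took effect, one option on Linux is to look at the NUMA nodes the kernel reports. As far as I know, in cache mode (with quadrant clustering) MCDRAM is invisible to the OS and the machine shows a single node, while in flat mode MCDRAM appears as an extra memory-only node of about 16 GB. The sketch below reads the standard `/sys/devices/system/node/node*/meminfo` files; the helper names are my own.

```python
import glob
import re

MEMTOTAL = re.compile(r"Node (\d+) MemTotal:\s+(\d+) kB")


def parse_memtotal(meminfo_text):
    """Extract (node_id, MemTotal in kB) from one node's meminfo text."""
    match = MEMTOTAL.search(meminfo_text)
    if match is None:
        raise ValueError("no MemTotal line found")
    return int(match.group(1)), int(match.group(2))


def numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Map every NUMA node id to its total memory in kB."""
    nodes = {}
    for path in glob.glob(f"{sysfs_root}/node[0-9]*/meminfo"):
        with open(path) as f:
            node_id, kb = parse_memtotal(f.read())
        nodes[node_id] = kb
    return nodes


if __name__ == "__main__":
    nodes = numa_nodes()
    print(f"{len(nodes)} NUMA node(s): {nodes}")
```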

Interestingly my BIOS complained about something with KNM incompatibility – which doesn’t make the slightest sense because I have a Knights Landing CPU – not a Knights Mill one (at least as far as I know!?), so that was funny. Either way, it eventually did allow me to make the changes, anyway (just showing that for the fun of it 🙂 ), so everything turned out well.

Rebooted, added a harddisk, installed, and used this as a development machine for quite a while (initially even without a case – who needs a case, anyway!?).

Part 2: Putting it into a case

OK, though one doesn’t need a case, I thought that for the article it’d be way nicer to actually put it into one … so I took a Phanteks Enthoo case I had from another (GPU) build, and put the system in there. Now why an actual section on that? Because as with everything “special built”, it turned out it wasn’t all that easy, of course.

The problem with fitting it into a case is that the board is not an ATX form factor board. It has about the same dimensions – so it physically fits – but … but … the holes to screw the board into the case are in different positions (at least some of them), so they don’t exactly fit. Now some do fit, so my first reaction was simply “I’ll just use the ones that fit” – after all, why would a board need more than two or three screws, anyway – it’s not going to fall out, is it? Well, let me give a clear warning here: The problem is not which board holes have no screws fitting them, but which “expected” screw positions have no holes on the motherboard: turns out there are a few of those little standoffs (the posts the screws go into) in places where the board doesn’t have holes, and these do scratch the board from the bottom – which I only realized because it no longer wanted to boot (probably shorted something). Turns out I was lucky, and no lasting damage was done – but I definitely had to take the board out again, find all the “offending” standoffs, and take them out.

The second problem was that after taking out all the wrong standoffs there were so few screws left that the board in some areas touches the case (in particular when you press on it to insert PCI cards, etc.), so after the previous experience I decided to first cut out some insulation (the foam that comes in most motherboard shipping boxes), and put that in between board and case metal. Screwed in the remaining three or so screws that did have a fitting hole, and perfect. Squeezing in the water cooler (I put it at the top, so it can radiate out through the case’s top) was a bit tricky, but eventually – with some gentle nudges – it actually did fit. So here’s the complete system:


Part 3: Wheels within Wheels

Now those first two parts were actually done in January 2018 (actually, over the Christmas/New Year’s holidays), and initially, I just put in some GPUs to use the machine as a plain old desktop box. But of course, if you already have some Phi-powered motherboard, wouldn’t it be fun to also use some Phi PCI cards in that? Phis within Phis?

Well, when I first built this machine I already had some 7220P passive Phi cards from eBay – but those turned out to be horribly hard to cool in a desktop setup (see related article), and though I did think of moving the turbo fans from that desktop build over into the KNL box, they simply didn’t fit. So at least initially, I abandoned this idea – I was doubtful it’d even work, and I wasn’t going to spend all the work trying to rig something up that wouldn’t work.

Now, later this year, as I wrote in another article, I finally got my hands on some actively cooled Phi PCI cards, and ever since then I wanted to play around with trying those in this machine – if I could get a Z170 board to take one, maybe the Phi board would take one, too?

Well, some time last week somebody who also reads this blog eventually asked me exactly this question: Do the K1SPE boards take Phi cards? And with that question, I decided to finally just try it out. And it turns out, it does: I first tried only one card, removed the old hard disk (which didn’t have MPSS on it), and tried with a regular lukStick … and it worked like a charm. Now getting bold, I also plugged in the second card, and lo and behold, that one works, too. Amazing – all kinds of boards have issues with those cards, and the one board that takes two cards without the slightest hitch probably wasn’t even ever intended to work with them.

Here’s a picture of the full system: Note I used an old Celeron plastic box as a “spacer” to improve airflow into the cards (else they are back to back).


With this experiment successful, only one thing had to be fixed: The existing luk-mpss-knl miner for MPSS offload actually expected a regular (i.e., non-KNL) host CPU, and thus got only a low hash rate on the host CPU. I had to build a special version that ported the MPSS offload into the luk-phi miner as well, but with that, lo and behold, we now have ca. 9 kH/s in that one box: ca. 2.6 kH/s from the host CPU (it’s CentOS, after all, and runs the MPSS offload), and two times 3.2 kH/s from the cards … which is nigh on perfect:
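The total is simply the host plus both cards (using the rounded figures above):

```python
host_hs = 2600           # host 7250, which also runs the MPSS offload
cards_hs = [3200, 3200]  # the two 7220A PCI cards

total_hs = host_hs + sum(cards_hs)
print(f"total: {total_hs / 1000:.1f} kH/s")  # total: 9.0 kH/s
```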

OK, that’s it for today – really have to get some food now…. I do want to stress again that this is probably not the most practical build to try and replicate – but I sure do hope you’ll enjoy it, anyway… it’s been super fun to build, so hope you’ll have some fun just by reading.

With that: Happy Mining!

PS: One of the fun things about this build is that it’s still using the EVGA Supernova 850G: An 850W power supply for a total of three KNLs (of 300W TDP each), plus board, case, fans, etc. …

 

Making lukSticks under Windows …

In the last few days I’ve received several questions from users who had issues with the lukSticks … in particular, while most found them rather useful from the Linux side of things, for the main user group for which they were intended – Windows users who aren’t comfortable with Linux – there were some issues. And as I’m not all that much of a Windows person, I never ran into these issues myself, and just assumed that whoever is a Windows person would likely figure out how to do all the ISO image burning etc. Well, it turns out that wasn’t the case, so here are a few quick instructions.

First: Update to lukSticks v3.1

Before I go into the actual burning instructions: If you still have a lukStick image from before the images I uploaded last night (yes, I still have to write about those …): Please update to those “v3.1” images (they’re at http://www.lukminer.net/releases/lukSticks-v3.1). These v3.1 lukSticks not only have the latest version of the miner you’ll need for the post-sumo times, they also have several improvements, in particular for Windows users:

  • reduced size: To adjust for USB sticks’ slight variations in “usable” size, the old lukSticks already had some “unused space” at the end, so when burnt onto a slightly too small USB stick no real harm would be done, because only ‘unused’ space was lost. That worked in theory, but apparently some burning programs simply threw an error if they couldn’t write the whole thing, so the new sticks are only as big as they really need to be (duh, should have done that in the first place). The cpu-phi one should be about 7.5 GB in size; the mpss-knl one is about 15 GB (it’s so much bigger because it basically contains 8 live images for the up to 8 KNLs).
  • DOS/Windows newlines/line feeds: In the old lukSticks, the config files were on a DOS/Windows-readable FAT file system, but unfortunately were written under Linux, so they contained only Unix-style newlines … meaning that whenever one opened those under Windows, one would see all the individual config file lines squashed together into a single line; and if one did succeed in editing those, the editor almost invariably saved with Windows newlines, which then confused the mining script, leading to errors such as “unknown algorithm ‘-a sumo<newline>'” … The new sticks fix that: the lukMiner.cfg file uses DOS newlines, and the miner script converts those on the fly when running.
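The newline handling described above boils down to normalizing line endings before parsing. The actual lukStick script isn’t shown here; this is a Python sketch of the same idea, with hypothetical function names:

```python
def to_unix_newlines(text):
    """Turn Windows (CRLF) and old Mac (CR) line endings into plain LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def to_dos_newlines(text):
    """Write-side counterpart: normalize first, then emit CRLF everywhere."""
    return to_unix_newlines(text).replace("\n", "\r\n")


edited_on_windows = "-a sumo\r\n"
print(repr(edited_on_windows.split("\n")[0]))                    # '-a sumo\r'
print(repr(to_unix_newlines(edited_on_windows).split("\n")[0]))  # '-a sumo'
```

Without the conversion, a Unix-style reader keeps the trailing carriage return as part of the value, which is exactly the kind of “sumo<newline>” algorithm name the old sticks choked on.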

Of course, make sure to download the right stick. Currently, there are two versions (the KNC one is still missing):

  • mpss-knl: That’s the one to use only if you have a system with 7220A or 7240 PCI cards (which usually have regular Xeon/Core/Celeron host CPUs). Do not use those on the bootable phi asrock/exxact etc machines. Note on terminology: the “mpss-knl” stands for “mpss offload for knl cards”.
  • cpu-phi: This is the one to use if you have an Asrock, Exxact, Neon Miner, etc based system (the cpu-phi stands for “cpu or phi”)

Unzip

The iso images I’ve uploaded are gzipped (of course), so unzip them first until you have the full ‘.iso’ images. I used 7zip (https://www.7-zip.org/), but any other tool should do.
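If you’d rather script this step, Python’s standard library can do it as well. A minimal sketch (the file names in the usage line are placeholders) that streams the data in chunks, so a 15 GB image never has to fit into RAM:

```python
import gzip
import shutil

CHUNK_BYTES = 1 << 20  # copy 1 MiB at a time


def gunzip_stream(src, dst):
    """Decompress gzip data from file object `src` into file object `dst`."""
    with gzip.GzipFile(fileobj=src, mode="rb") as gz:
        shutil.copyfileobj(gz, dst, CHUNK_BYTES)


def gunzip_file(src_path, dst_path):
    """Decompress the .gz file at src_path into dst_path."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        gunzip_stream(src, dst)


# Usage (placeholder names): gunzip_file("lukStick-image.iso.gz", "lukStick-image.iso")
```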

Burn to Disk (“imageUSB”)

Now this is the tricky stage, because “apparently” the default “right-click, then ‘burn to disk’” option under Windows doesn’t always do what one expects it to do (at least one user had issues with that).

The tool I used is called “imageUSB” (https://www.osforensics.com/tools/write-usb-images.html), and it seems to work just fine. I’m not much of a Windows user, so I don’t have any opinions on this vs. the probably many other tools – it’s just the first one that Google came up with (except one other one that wanted me to register for something), and it seems to do the trick.
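Related to the size issue from the v3.1 notes, it can be worth checking before burning that the image really fits: stick vendors count decimal gigabytes, while many tools report binary gibibytes, and a stick’s usable capacity is often a bit below its label. A small sketch with hypothetical sizes:

```python
GB = 10**9   # decimal gigabyte, as printed on USB stick packaging
GIB = 2**30  # binary gibibyte, as reported by many OS tools


def fits(image_bytes, stick_bytes):
    """Return (fits, spare_bytes) for writing an image onto a stick."""
    spare = stick_bytes - image_bytes
    return spare >= 0, spare


print(fits(15 * GB, 16 * GB))   # (True, 1000000000)
print(fits(15 * GIB, 16 * GB))  # (False, -106127360)
```

The second line shows why the unit matters: an image of 15 binary gibibytes is slightly larger than a stick of 16 decimal gigabytes.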

Here a screenshot of using this tool, on a brand new SanDisk Cruzer 16GB stick:

Once it’s done burning, it should show something like that:

Starting to configure: Un-plug, and Re-plug

Now after you’re done burning (and just as the README says you should): First unplug your USB stick, and re-insert it – else Windows won’t realize that there’s now different file systems on that stick.

When you do so, what should happen is that Windows eventually (possibly after a few seconds) finds two new drives (D: and E: in my case), it’ll likely open the File Explorer (or whatever this thingy is called under Windows), and – this is perfectly OK – it’ll open a window saying it can’t read one of those drives, and suggest formatting it … something like the following:


Note this “error” is perfectly OK, so do not format this drive! What happens here is that the USB stick contains two partitions (what Windows calls “drives”) – one Windows one with the config files (D:, which is OK), and of course the Linux partition with the Linux OS and miner – which Windows of course can’t read. Do not format it, though, or you won’t have a Linux OS (or miner) to run from …

Last: Edit the lukMiner.cfg config file

Once you click away the annoying “format” window, just open the Windows-readable drive (if it didn’t open automatically), and start editing the lukMiner.cfg file, which on my machine looks roughly like this:


Note how in the v3.1 version this config file is nicely editable under windows, with proper newlines and everything, so from that point on you should be safe.

With that: Happy Mining!

PS: And yes, I did go through that – two times already – and the resulting USB stick seems to work just fine!