(luk-)Mining with x100 “Knights Corner” Xeon Phis (3110, 5110, 7120, etc)

Okay; weekend again, so finally a few spare cycles to answer questions re mining with Phis… Yes, I know I promised an article on updated performance numbers for the second-generation Phis (the x200s), but since the x200s (at least the PCI cards) are kind of hard to come by, I recently also had lots of people ask me about mining with the older x100 Phis… and since that article is ready to go, let’s do this one first.

X100 Phi Variants … a primer…

With that introduction, this article is about mining with the x100 “Knights Corner” Phis – i.e., the Xeon Phi 7120s, the 31S1Ps, SC 5110s, etc.; pretty much any Xeon Phi with a ‘1’ in the second digit of the four-digit model number (hence the ‘x100’ – who’d have guessed?).

First, for those interested in buying some on ebay, a bit of lore on model names: The ‘1’ in the second digit indicates a first-generation Phi. The ‘A’ or ‘P’ at the end indicates whether it’s ‘A’ctively or ‘P’assively cooled (i.e., whether it has a fan or not)… and a fair word of warning: if you think of putting a ‘P’ one into your hobby mining rig at home – think again! Without strong forced airflow from the chassis you’ll quickly end up with something you can fry eggs on (and no, I’m not going to try!).

Apart from that ‘1’ and the ‘A’ and ‘P’, the other numbers indicate the exact number of cores, their clock rate, and how much memory those cards have. There are too many variants to list here, but if you’re curious, you can look up any specific model’s exact config on http://ark.intel.com . And if you’re curious what hashrate a given x100 will give: it seems you can extrapolate rather easily: take the hash rate of a model you know (say, around 650H/s on a 7120), divide by core count and clock rate of that known model, then multiply by core count and clock rate of the desired model, and you should have a reasonable estimate. (No guarantees – your mileage may vary – I’m not a lawyer and this is not legal advice, etc. pp.)
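For those who prefer that rule of thumb in code, here is a minimal sketch. The function name is made up for illustration and is not part of lukMiner; the core counts and clock rates are the ones this article lists for those cards, and the result is only the rough extrapolation described above, not a measurement.

```python
def estimate_hashrate(known_hs, known_cores, known_ghz, target_cores, target_ghz):
    """Scale a measured hash rate by core count and clock rate (rule of thumb only)."""
    per_core_per_ghz = known_hs / (known_cores * known_ghz)
    return per_core_per_ghz * target_cores * target_ghz


# Reference point from this article: around 650 H/s on a 7120 (61 cores @ 1.2 GHz).
# Target: a 57-core card at 1.1 GHz (e.g. a 31S1P or 3120A).
estimate = estimate_hashrate(650, 61, 1.2, 57, 1.1)
print(f"estimated: {estimate:.0f} H/s")
```

For the 57-core cards this comes out at roughly 557 H/s, which sits between the 545 and 570 H/s reported for the 3120A and 31S1P.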

Some frequently asked questions re Mining on X100 Phis…

OK, now to some questions I’ve literally gotten several dozen emails about:

Does lukMiner work on x100 Phis? Yes, since around Christmas it does; I initially didn’t plan on supporting them, but too many people asked… The current code may not be quite as fast as it could be, but I’d guess I’m within 10% of optimal, and that’s good enough for now.

What do I have to do to run lukMiner on my Phis? In the first version, you had to manually copy it onto the phis, do some nasty stuff with port forwarding, etc … but that is no longer necessary! Since around 0.8.7 or so (forgot the actual version) lukMiner also supports so-called “MPSS offload”, which is way simpler. A complete step-by-step “howto” is further down in this article.

How do the x100s perform? Of course that varies by the actual model; but here are three typical models:

 Model           Type                  Generation  Cores@Clock  Memory     H/s          Notes
 Xeon Phi 7120   PCI-card coprocessor  x100 (KNC)  61@1.2GHz    8GB GDDR5  ~650  1360
 Xeon Phi 31S1P  PCI-card coprocessor  x100 (KNC)  57@1.1GHz    8GB GDDR5  570          mpss offload version
 Xeon Phi 3120A  PCI-card coprocessor  x100 (KNC)  57@1.1GHz    6GB GDDR5  545   1130   As reported by user (thanks Jeremy!)

Why are the x100s so much slower than the x200s? Well, you’re comparing a roughly 5-year-old piece of hardware (which is, at least according to ark.intel.com, already officially “end of life”d!) against a much newer one…. In particular, you’re comparing an architecture with tiny caches and regular GDDR5 RAM to one with 30ish MB of cache and 16GB of “high-bandwidth memory” (aka MCDRAM); and one with relatively “wimpy” in-order cores to one with rather powerful out-of-order ones.

Note the x100s aren’t actually all that bad in absolute terms – 650H/s on a 7120 is still comparable with a brand-new 1070, and certainly pays for the power! – but hey, it would be strange indeed if they were still competitive with an x200…

Is it still worth mining on an x100? Well, there you have me. If I did have some spare money to invest I’d rather put it into x200s – they’re just so much more profitable. But if you already have them? Or get them at the right price? Sure, it still pays for the power! And since this is the older generation, there seem to be a lot of old X100 machines out there that are now being replaced with newer generation hardware …. (just snapped up a complete server on ebay, with Xeon cpu, memory, two 5110s, PSU, and everything, for only $800 total!).

What machines do those cards work in? That of course is the nasty question, because they do not work in all motherboards. At the very least, your board’s BIOS has to support “Above 4GB decoding” (sometimes called Large-BAR support, BAR being the PCI Base Address Register). There may be other restrictions, but if there’s no above-4GB decoding, it won’t work. Also, as mentioned above, make sure your system can actually cool those cards: If you have active ones (with fan) they’ll work just fine in a desktop case (see pic), but for the passive ones, you should have them in a server with strong airflow (or you’d better get very creative in your cooling!). Here are two examples: One is a desktop I built from parts with an Asrock X99 motherboard, in a desktop case; the second is a refurbished 1U server I bought off ebay.

Just to be sure, in that server I did the “usual” trick of clipping the “lower” two fan cables (for the fan control) to force full fan speed (banshee!)… probably not necessary in a cooled data center, but hey, my basement isn’t exactly professionally cooled …. (top right pic; apologies for the low quality, but you can just make out the blue and yellow cables being clipped)

Running lukMiner on a x100 Phi system…

Basically, there are two ways of running on an x100 Phi – “native mode”, and “MPSS offload”. Though you can still run lukMiner in so-called ‘native’ mode, I will from now on assume that this is “advanced usage”, and that whoever wants to do so will already know how to do it. Thus, from now on I’ll assume you want to use lukMiner on x100 Phis “the easy way”, using the MPSS stack.

Background: The x100 MPSS Stack

The MPSS (Intel’s Manycore Platform Software Stack) is the OS software stack used to drive x100 cards. For lack of a better word, it’s what on a GPU you’d call “the driver” :-). If you already know what the MPSS stack is, and already have everything installed, you can just skip to the next section … but then you wouldn’t be reading this anyway, so I assume you’re new to this.

Step 1: Get the Latest Version of MPSS (Version 3.8.3)

You can freely download the MPSS stack from the following link: https://software.intel.com/en-us/articles/intel-manycore-platform-software-stack-mpss. At the time of this writing, the latest version you’ll find there is 3.8.3 (and since the x100s have been EOL’ed that is unlikely to change 🙂 ). Please use this version; I won’t test or support any others. Once you have it, just follow the next few steps.

Step 2: Choose a suitable Linux Version (ie, CentOS 7.3)

In theory, the MPSS stack supports only RedHat and SUSE (in particular, other distributions such as Debian, Ubuntu, or Arch are not supported!). I personally don’t like either of those two choices, but luckily CentOS is fully compatible with RedHat, so for the remainder of this guide I’ll go with CentOS. In particular, I’m going with CentOS 7.3, and strongly suggest you do, too, since the steps below may be different on other linux flavors.

Step 3: Install CentOS 7.3

Once you’ve downloaded your CentOS 7.3, put it on a USB stick, and install. I chose “gnome workstation/development tools” to start with, and suggest you do the same. You can always install missing packages later, but the steps below assume that this was what the OS was installed with.

Once installed, reboot, and accept the CentOS license. Note I would suggest you never (ever!) do a software update (aka “yum update”) on this machine, or the MPSS kernel modules to be installed later might no longer match the updated kernel version. You can of course rebuild those, but it’s easier to just stick with 7.3 by simply never updating….
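If you want to double-check that a machine really is still on 7.3 before installing anything, a small sketch along the following lines might help. It assumes the release string lives in /etc/centos-release and looks like "CentOS Linux release 7.3.1611 (Core)"; the function names are made up for illustration.

```python
import re


def centos_major_minor(release_text):
    """Return (major, minor) from an /etc/centos-release style string, or None."""
    match = re.search(r"release\s+(\d+)\.(\d+)", release_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_expected_release(release_text, expected=(7, 3)):
    """True if the release string reports the expected major.minor version."""
    return centos_major_minor(release_text) == expected


if __name__ == "__main__":
    try:
        # Assumed location of the release string on CentOS.
        with open("/etc/centos-release") as release_file:
            text = release_file.read()
    except OSError:
        text = ""  # file missing: not CentOS, or not readable
    print("CentOS 7.3:", is_expected_release(text))
```

Running it again after any accidental “yum update” is a quick way to see whether the release has moved on.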

Step 4: Get and Install MPSS 3.8.3

Once you rebooted and logged in, go to https://software.intel.com/en-us/articles/intel-manycore-platform-software-stack-mpss, and download version 3.8.3 for linux/redhat 7.3. You should end up with a file called “mpss-3.8.3-linux.tar[.gz]”. Unpack it:

[luk@x100] tar xavf mpss-3.8.3-linux.tar

Then, go into the resulting directory …

[luk@x100] cd mpss-3.8.3-linux

become root

[luk@x100] sudo bash 
<password>
[root@x100] ....

and install the rpm packages for both the core software stack and the kernel modules

[root@x100] yum install *rpm modules/*rpm

That may result in a lot of warnings, but hey, they’re only warnings, not errors… (I hope).

Step 5: Configure MPSS

There’s a lot of things you can do to configure MPSS – if you have some time, read the user guide! – but for now, let’s only do what’s necessary to run the miner. In particular, I will only describe how to do this as root – that’s not exactly good policy under linux (yes, I know that!), but it’s the easiest to do…

As such, as root, generate an ssh key:

[root@x100] ssh-keygen

Now, let’s start the mpss stack for the first time, and initialize it:

[root@x100] service mpss start
[root@x100] micctrl --initdefaults
[root@x100] service mpss restart

Note the first time you run ‘service mpss start’ you’ll probably see some error messages – that’s fine because it’s not been initialized yet (that’s what the ‘--initdefaults’ in the second step does) – but at least it already loads the kernel modules that the ‘initdefaults’ requires :-).

Once the ‘service mpss restart’ has been done, you should now be able to do a

[root@x100] micinfo

… and should get some meaningful outputs for the mics you have in your system.

Note you’ll have to re-start the MPSS service after each reboot (though of course, you can automate that), but all the other steps have to be done only upon the first install (unless, of course, you change your config or update your linux kernel – which I suggest you don’t).
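Before going further, it can be worth checking that the MPSS command-line tools used above actually ended up on your PATH. The following is only a sketch: the helper name is made up, and it merely looks for the `micctrl` and `micinfo` executables without running them.

```python
import shutil


def missing_tools(tools=("micctrl", "micinfo")):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("not found on PATH:", ", ".join(missing))
    else:
        print("MPSS tools found; try running micinfo next")
```

If anything is reported missing, the rpm install step is the first place to look.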

Step 6: Running the Miner

If you just completed the previous steps the MPSS daemon is still running; if not, first start it:

[root@x100] service mpss restart

Once the mpss service is properly running, running the miner is simple. Get the latest version, unpack it, then simply run:

[root@x100] ./luk-xmr-knc-mpss --host ...

and it should be working out of the box.

A few notes:

  • The miner will automatically mine on both CPUs and all available MICs. If you don’t want the CPU cores to be used, pass “-t 0” on the command line. (but hey, why shouldn’t you use them???)
  • In the simple setup explained above, you have to run the miner as root. You can change that, but that’s up to you.
  • The “service mpss start” has to be re-done upon every boot. Alternatively, you can put a “service mpss start” into /etc/rc.local (and do a chmod +x /etc/rc.local) to do it automatically upon reboot.
  • If against all warnings you decide to go with another linux flavor, release version, different driver version, etc: I won’t even reply… :-/
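If you would rather wrap the notes above in a small launcher than type things by hand, a sketch could look like this. The helper names are made up, `pool_args` stands for whatever you normally pass after the binary name (the `--host ...` part, which is not filled in here), and the script has to run as root just like the manual steps.

```python
import subprocess

MINER = "./luk-xmr-knc-mpss"  # binary name as used in this article


def build_miner_command(pool_args, use_host_cpu=True):
    """Build the miner command line from the caller's own pool arguments."""
    command = [MINER, *pool_args]
    if not use_host_cpu:
        command += ["-t", "0"]  # "-t 0" keeps the host CPU cores out of it
    return command


def start_mining(pool_args, use_host_cpu=True):
    """Start the MPSS service (needed after every boot), then run the miner."""
    subprocess.run(["service", "mpss", "start"], check=True)
    return subprocess.run(build_miner_command(pool_args, use_host_cpu))
```

Keeping the command construction separate from the part that actually starts things makes it easy to print and eyeball the command line before letting it loose.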

Either way – that’s pretty much it. Should work out of the box.

With that: Happy mining!

 

“First light” on Phi 7250 hashrate

Since I wrote the first post on mining with a Phi 7220 setup I’ve literally been drowned in questions and inquiries, in particular relating to “what performance would I get on this or that other flavor of the phi”. For a while now, I’ve always replied to those as “I have ordered a 7250 system, and will update as soon as I get it”….

Now yesterday, my first order from Exxact (a 2U system with 4 nodes, each having a phi 7250, 48GB of memory, a 150GB harddrive, and OPA cards) finally arrived. I’m still in the very first stages of getting it up and running (I do have a day job, so more work – and a full blog post – has to wait until the weekend), but I at least installed CentOS last night, ran a first test, and can therefore at least share the “first light” results: right out of the box (literally…), I’m currently getting 2350H/s per node (which’d be roughly 9.5kH total across all four – I’ve only set up one so far).

Now 2350H/s is not bad at all for a single node (in particular at that price), but the observant reader will immediately see that it is indeed somewhat less than expected: from pure clock rate and core count I’d have expected close on 3kH per node, so at 2350 it’s performing about 20% lower than expected. At least for now, I do not yet know why the 7250 actually performs 5-10% lower than a 7210/7220 would (it has both more cores and a higher clock rate!), and in fact, I’m still rather optimistic that it’s just something I haven’t properly configured yet (most obvious difference is a very different kernel version than what my 7210 and 7220s use?!)…. so most likely, it’ll eventually do even better than that.
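To make the arithmetic behind those numbers explicit, here is a tiny sketch using only the figures quoted in this post (2350 H/s measured, close on 3 kH/s expected, four nodes in the box); the function name is made up for illustration.

```python
def shortfall_percent(expected_hs, measured_hs):
    """How far the measured rate falls below the expected one, in percent."""
    return 100.0 * (expected_hs - measured_hs) / expected_hs


measured_per_node = 2350  # H/s, the first-light number from this post
expected_per_node = 3000  # H/s, the "close on 3kH" expectation
nodes = 4                 # nodes in the 2U system

shortfall = shortfall_percent(expected_per_node, measured_per_node)
projected_total = nodes * measured_per_node

print(f"shortfall: {shortfall:.1f}%")
print(f"projected total: {projected_total} H/s")
```

That gives a shortfall of about 21.7% and a projected 9400 H/s for all four nodes, in line with the rounded figures above.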

Anyway, I had promised to share even the earliest results, so wanted to drop this short note now rather than wait for the final performance after tuning. And I promise, I’ll write a full blog on how to properly set up and configure those 7250 nodes as soon as I figure out what’s wrong with the default set-up.

Bottom line:

  • Downside: Using version 0.9.0, the phi 7250 performance (at least for now) is about 20% lower than expected
  • Upside: I can confirm at least 2350H/s/node (which will already pay for the node in less than 12 months!)… and it can only get better 🙂

Happy mining,

 

 

 

More on building Phi 7220 Mining Rigs…

Wow. What a community response. Unbelievable. Last week I posted on my latest toy, a rig that used eight Xeon Phi 7220 cards in a single 4u-server to achieve a total of roughly 24kH/s for cryptonote coins (in case you missed it, the original post is right below).

I had expected some interest in that merely because that was, to my knowledge, the new speed record for a single-node(!) mining rig…. but man, was I wrong: three thousand unique visitors in the first 48 hours, on a blog that didn’t even exist until two days before. And tons and tons of interesting comments on questions, on both the blog and reddit.

Based on that feedback from the last three days, it’s now become very clear that there are two follow-ups I’ll have to write (else I’ll drown in emails :-/). The first topic that came up again and again was “Vegas vs Phis” in terms of mining revenue, profitability, etc. …. and I promise, I’ll write one – but I’ll first wait for my 4×7250 node to arrive, so bear with me. The second big group of questions that came up again and again revolved around “how can I replicate that build” – i.e., how do the Phis work at all, where can I get them, and if I have them, how can I build a machine that’ll take them. This latter question is what this post will be about.

Mining on Phi 7220 cards…

First off, it is kind of hard to get those 7220 cards – if you don’t already have some, you’ll probably have a hard time getting any. I got mine off ebay, but that seller – at least right now – doesn’t list any more of them, so maybe they’re gone. Also, be sure to not confuse the x100 cards (3120, 5110, 7120, etc) with those newer x200 phis – the old ones will be about 5x slower, so think hard if that’s worth it. Quite frankly, if you’re interested in mining with Phis you’ll be best off buying one of the 4×7250 self-bootable machines (I’ll write more on that once mine arrives).

That said, if you already have 7220 cards – or miraculously found a good source for them – you’ll have to find a way of actually hosting them, of getting them to run, and mining on them. In terms of mining software, lukMiner will run on them, and will run rather profitably. To get it to run, though, you’ll need a copy of Intel’s MPSS 4.x software stack to drive those 7220 cards – and since Intel took that product down that isn’t all too easy to get any more. I still have an older copy from back-then-when, but don’t have permission to share that, so you’ll have to find somebody else to share that with you (If anybody that has a copy wants to share, please feel free to send a comment with a link!). Once you get both hardware and MPSS stack up, you copy luk-xmr-phi to the card, and run it; so that part is easy.

Now to the tricky question: How to actually build a rig with these cards. The problem here is that they seem to be working in only certain motherboards, and since they got pulled off the market there’s no support. So all I can do is go and share the three successful builds I managed to create – without any warranty whatsoever that they’ll work on your side.

Building a Rig – Option #1: the 24kH, 8-card, 4U server

The build I used in my original post is a professional, off-the-shelf server from Exxact Corp. I used theirs because I know they had listed this product with Phis in the summer, so had pretty high confidence that would still work. For those interested in more details: It’s a 4U server with two Xeon CPU slots, a C612 chipset (which seems to be important),  10 full-length PCI slots, and then (8+6) PCI power connectors, which works out perfectly. Since I didn’t want to take any risks I bought a complete system from Exxact, with CPUs, memory, disks, 2x1600W power (fully redundant, so 4 PSUs actually), and everything  else except the Phi cards, which I already had. In total – and with shipping – that set me back something like $6k, just as much as the cards themselves.

Of course, I could probably have built that thing from parts for much less (the two un-used redundant PSUs alone are worth several hundred bucks) … but already having $6k in Phi cards on the line I didn’t want to take any risks – and quite frankly, so far I’ve been extremely pleased with this purchase (and Exxact have been most helpful so far, too!). For anybody wanting to get this system, here a pic of both the rig, as well as the exact sticker of that machine (and I’m sure Exxact would be happy to help, too – just mention what it’s for, they’ll remember me :-/).

One final note: The careful reader will have seen that I mentioned ten PCI slots, yet my build uses only eight cards. Yes, I did fit 10 cards in there, but ran into some issues. First of all, I blew some circuits in my basement :-/ … and worse, at home (where I built the rig) I only had 110V power, which wasn’t all that good for the 2x1600W PSUs. And finally, when I did get it to boot the machine became unstable with 10 cards – maybe because of heat, or maybe because the drivers don’t like that many cards, I don’t know. Eight work, 10 I’m not sure. Might get back to trying, but for now I don’t have any spare cards any more, anyway. Here a pic with 10 cards, but again, right now I only run eight. I’m not crazy. Not really, anyway.

Option #2: A Cheaper, but still professional rig

Since $12k in a single rig is admittedly a somewhat scary thought I also played around with finding cheaper, smaller options. After looking primarily for the same board generation and C612 chipset I ended up with a SuperMicro SYS-5018GR-T server.

Initially this didn’t produce enough air flow to cool the cards, but after a bit of “friendly persuasion” on behalf of the fans (ie, cutting the two control cables of the fans to make them go full blast :-/ – see pic) that worked out of the box, too.

Got the barebone for $1100 off ebay, plus a refurbished Xeon, and a single DIMM of memory …. all together probably around $1400 (plus cards) – not that much off what you’d pay for a typical desktop PC to put GPUs in – but in rackable form-factor, so you can actually farm it out to a co-hosting place.

Option #3: The totally Stone-Soup, Do-it-yourself Build

OK, before I went ahead and bought all these servers and cards for now close on $20k I (obviously?) first did some simpler tests, buying only a single card, and testing that in something I already had. “Luckily” I had lots of un-used workstations lying around that I could test with ….. the reason I say “luckily” in quotes is that the reason I have those in the first place is that they started out as GPU mining rigs, but since “several” of those GPUs have died the mining death over the last few months I now have some unused workstations :-/. (yes, one of the reasons I switched to mining on Phis is that I simply had too many GPUs die on me – in particular a certain brand, but don’t want to offend anybody, so will keep that part to myself).

Anyway – tested many different machines, and most didn’t work. Either they didn’t boot at all, or they booted but didn’t show the cards, or had too old BIOSes (they need above 4GB decoding), etc. Finally found one of my machines that took that card, and for everybody that wants to replicate, here the specs:

  • Motherboard (likely the most important part): Asrock X99 Deluxe
  • CPU: Some X core Xeon E5 bought off ebay – probably won’t matter
  • A cheapo PCI 1x GPU to drive a monitor (won’t need it for mining, though).
  • Three Xeon Phi 7220 cards (started with one, but then put in two more)
  • EVGA 850GQ (850W) PSU
  • Phanteks Enthoo Pro M case (won’t matter), and a single DIMM of 8GB of RAM.
  • Lots of fans.
  • CentOS 7.3 with MPSS 4 stack and lukMiner 0.8.6.

And la voila, here we are:

(pic: the final build)

In terms of cooling, a regular workstation’s case fans won’t be enough. Not by a long shot they won’t. At first, I added this semi-professional Lasko fan:

(pic: the Lasko blower fan)

This obviously blows air to the inside rather than out of the case, but if you leave the case open that’s OK (and if not, it’s strong enough to make all other case fans go backwards, too, LOL).

Eventually, however, that setup looked a bit shaky even by my standards, so I went ahead and scouted for some smaller fans to cool this. Typical case fans won’t do, even if mounted right behind the cards, but I did find some higher-powered fans on ebay (see, for example, here for a listing). The blue (printed) shroud doesn’t actually fit the x200s (they have their power connectors organized differently from the x100s), but the case wouldn’t have had enough space to host those shrouds anyway, so I simply took them off – the fans are strong enough to make enough air go through the cards even without a perfect fit. Oh, and of course: Duct tape is your friend :-/ In the following two pics, the left one shows two such fans connected with two shashlik skewers and some duct tape (I didn’t say it was professional grade, did I?); the right one shows those fans mounted right behind the cards – one fan does one card, the other does two.

Again, I used the trick of messing with the fans’ control cables, and simply have them go full blast (image on the left shows the fan connector cable cut open to allow a four-pin fan connector to connect to a two-pin 15V connector – the control wires aren’t cut, but simply not connected, so the fans go full blast. Right image: that stuff connected to a 15V molex). With that, the machine has now been up and running for three weeks, no issues whatsoever (well, had to fix a few issues with hung nicehash connections in the miner, but the hardware works all right).

As mentioned above, I’m not sure that those builds can easily be recreated – for example, I have some other x99 boards that do not work, so I have no clue why this one does (i.e., no warranties, and your mileage may vary). Anyway – for those that have some of those cards I hope this info will at least open a path to getting them up and running. As such:

Happy Mining!

My latest toy: 24KH/s in a single box!

My latest toy: One box, eight Xeon Phi 7220 PCI cards, and a total (sustained!) hash rate of about 24kH/s for monero/sumo/etn/etc … Yeehaw!

This box is actually already up and running for over a week, but since I realized that a lot of people might actually like hearing about this I decided to finally share this build … after all, I think this is the highest-performing single-box build (for cryptonight/monero) out there right now….

Okay, where did this start? When I started writing lukMiner earlier this (wait – now “last”) year Intel had just announced some PCI version of the x200 Xeon Phis, the 7220 cards. While googling around on where to get one I found a page from Exxact Corp, which offered a system with 8 such cards for around $20k or so. Got me curious ;-).

Unfortunately, when I tried to place an order I was told that those cards had been un-announced, and would not be released after all…. However; I could never forget this setup, and eventually found some cards on ebay. So after eventually scrounging some of those cards together (and initially having quite some trouble finding some motherboards that they would actually run in#@!!@), I finally got Exxact to sell me a no-cards-included version of that box they had listed earlier this summer. Once it arrived I popped in the cards, installed the matching software stack, and here we are – eight 7220 cards, each doing 2800-2850 H/s (running at about 80-85 degrees C), plus two low-end Xeons in the host …. a bit of linux magic to start it all automatically upon boot, and la-voila – we have a machine that DwarfPool and NiceHash say makes an average of 24kH/s, at a power draw of only about 2.4kW… (And all in all, I paid less than $12k for the parts!).
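For a quick sanity check of those figures, the following sketch just redoes the arithmetic with the numbers quoted above. Treating the gap between the cards’ combined rate and the pool-reported 24kH/s as the host Xeons’ share is my assumption here, and the $12k is only the stated upper bound.

```python
cards = 8
per_card_low, per_card_high = 2800, 2850  # H/s per 7220 card
total_hs = 24_000                         # pool-reported average, H/s
power_w = 2_400                           # power draw, W
parts_cost_usd = 12_000                   # upper bound ("less than $12k")

cards_low = cards * per_card_low          # combined rate of all cards, low end
cards_high = cards * per_card_high        # combined rate of all cards, high end

# Assumption: whatever the cards do not account for comes from the two host Xeons.
host_low = total_hs - cards_high
host_high = total_hs - cards_low

hs_per_watt = total_hs / power_w
usd_per_hs = parts_cost_usd / total_hs

print(f"cards alone: {cards_low}-{cards_high} H/s")
print(f"left for the host CPUs: {host_low}-{host_high} H/s")
print(f"efficiency: {hs_per_watt:.1f} H/s per watt")
print(f"cost: at most ${usd_per_hs:.2f} per H/s")
```

So the cards alone account for 22.4 to 22.8 kH/s, and the whole box works out to about 10 H/s per watt at no more than $0.50 per H/s.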

The pic above is from when I assembled it in my basement… had to actually pull power cords from two different rooms to not pop my circuit breakers, and had to eventually move it to a co-hosting data center due to “somewhat excessive noise” (think “jet engine”) … but still, that was one heck of a fun project!

Happy Mining!

 

 

lukMiner 0.8.6 adds Xeon Phi Offloading Support …

After having promised this for a long time, lukMiner 0.8.6 will finally be the first version that supports automated “offload” mining on the x100 “KNC” Xeon Phis. I.e., it will no longer be necessary to configure the KNCs to have network access, or to manually copy and run native binaries to/on the mic devices … (yay). Instead, this new version adds a new ./luk-xmr-knc-offload binary, and this binary will – auto-magically – use the mics’ MPSS stack to upload the native binaries to the mic, to run the code there, etc … all through a single, host-side binary.

That new “knc-offload” binary will by default automatically use all of the mics found in the system, and will – just like the OpenCL binary – by default also use all host CPU cores…. all of which makes mining with the x100 phis so much easier (also, this no longer requires network configuration for the MPSS stack). As far as I know, this is now the first miner ever that allows for mining monero, aeon, sumo, karbo, etc, via offload on the x100 Xeon Phis. Yay!

I plan on releasing this version pretty soon; the only reason I’ve not done that yet is that a few days ago the harddrive I used for making releases unfortunately died, so I have to spend some time reinstalling … Until then, a preview “release candidate” of this 0.8.6 to come is available through this link (Update 1: Fixed original tar file, which was actually missing that new binary … apologies!) .

Happy mining!

Luk

 

Oh-kay, here it goes…

… after initially distributing all news and updates on lukMiner via twitter, email, and google sites I finally decided to go all-out, and start an actual blog on it.

I’ll primarily use this site to share updates, new features, newly supported coins, updated performance on new or interesting hardware, etc, as well as for little updates on my own experiences with it (because yes, I use it quite a bit myself, too!).

Note at least for now I’ll still host the release binaries on google drive, and will still update the old google site (i.e., https://sites.google.com/site/lukxmrminer/), and will still tweet updates on the lukMiner twitter feed (@lukMiner)… but eventually this site “should” replace those two.

At least for now – happy mining!