MPSS Version for x200 machines

Hey,

Though this will affect only a small portion of my readers (after all, the 7220 cards are pretty thin on the ground….) I still wanted to share this, because it’ll probably make at least some people’s days: I finally found a night earlier this week where I sat down and tried to build an MPSS offload version for the 7220 cards as well…. and who’d have guessed, when starting out with an MPSS version for the x100s it actually worked out pretty well.

Now the main reason I did that is that previously I had to jump through all kinds of hoops to run the phi binary “natively” on those machines, with automated ssh scripts to start mpss, copy the miner to the devices, forward ports, kill/restart if required, etcpp…. pain in the ….. well. And of course, with the MPSS version all of that is gone, because the miner itself just runs on the host, and all the offloading is done by MPSS.

Anyway – the second reason I did that – and the thing that at least some here will enjoy – is that it’s always been bugging me that when running in native mode the 8 cards actually appear on pools as eight separate machines …. and I so wanted to at least once see an output of “24kH/s” in that miner’s output… And now, with the MPSS mode, all the cards do indeed work together in a single miner, so here it is:

24kh-output

And even more fun than that, here’s the corresponding screenshot from NiceHash… in particular, have a look at the difficulty column *G*.

nicehash-high-difficulty

Since that mpss offload version has pretty steep dependencies on OS version, MPSS version, etc I’m not going to put that into the public release; however, I already created a lukStick version of it, so if anybody does have some machines with x200 cards: let me know; I’d be happy to share that stick.

With that,

Happy Mining!

 

Upcoming Changes: Monero Hard Fork, Different Cryptonight flavors, and Renamings …

Good morning, everybody!

As some of you may have heard there’s currently a certain “buzz” of excitement around the monero community, all relating to the upcoming “hard fork” – and though it’s actually surprising to me just how little is yet known about this upcoming change, I wanted to at least assure everybody that yes, of course I will update lukMiner to work with this hard fork.

Why is this “hard fork” such a big deal?

Now before saying anything else, let me go a bit into why this hard fork is such a big deal. Hard forks are nothing new to altcoins, and in fact, most coins are just  forks of a very small family of core algorithms (e.g., Monero, ETN, Sumo, etc are all just variants of the same cryptonight algorithm). Even for a given coin, hard forks are not necessarily a big deal: Yes, sometimes these hard forks lead to “splitting” of coins if only some of the community adopt these changes – e.g., ZEC (ZCash) and ZCL (ZCash Classic), or ETH (Ether) and ETC (Ether Classic) are both the effect of a given coin (e.g., ETH) having hard forked, and some continuing to mine the old, “classic” coin on the old blockchain with the old algorithm; while others use the new code on the new, “forked” blockchain. It’ll be interesting to see if this’ll happen to monero, too; though at least right now it looks like at least in this instance it probably won’t (we’ll see).

Even if it does not lead to a real “fork” into different coins, hard forks in a given coin’s codebase are nothing new – Monero alone had 6 or so in the past, and usually they’re not too big a deal: usually it’s some sort of bugfix or vulnerability fix that can’t be done in a backwards-compatible way; it usually gets adopted by everybody that runs a node, and that’s it. No big deal.

Now this time, unfortunately, the changes that are being discussed relate not only to the code affecting the nodes themselves, but to the actual hashing algorithm behind monero itself. In particular, this has two really big implications: First, it means that it will affect everybody that mines monero: If only the node code changes (as in the past) then it’s only those that run their own nodes that have to update; and in times where you pretty much have to use a pool, anyway, that means the burden of updating in the past was almost entirely limited to pool owners. This time, it’s the miners themselves that have to change; and it’s not only all the miner developers (such as I) that have to change their code, but also all the actual miners (i.e., you) that have to update to those new versions. So if you’re reading this, be prepared that you will have to update to the respective version of lukMiner as soon as this hard fork takes effect (and no, I do not know when that will be, yet).

The second big change – and maybe even more of a pain in the … ah, backside … that most people seem to not even have realized yet – is that you will probably no longer be able to interchangeably mine different cryptonight coins with the same hashing algorithm: In the pre-fork world, the same miner could mine XMR, ETN, SUMO, NiceHash, BXN, and a ton of other coins. The upcoming change however is a change to Monero, not to the cryptonight algorithm itself, and as such, it’s totally unclear which other coins (if any) will adopt this change. That sounds benign, but probably isn’t, as it creates all kinds of “collateral damage” questions: For example, will NiceHash switch its “cryptonight” over to the new Monero algorithm? Or will they (have to) add an additional algorithm just for Monero? (And if they do, what about all those NiceHash users that bought cryptonight hashes to mine monero?). Or in fact, does NiceHash even still make sense if you can no longer trade the same hashes for the most profitable coins? It’ll raise similar questions for sites like WhatToMine (can no longer compare different cryptonight algorithms), etcpp. Lots of questions, and so far, few answers.

Even for us devs, there’s a lot of potential “collateral damage”: For example, right now I can steer all my dev shares to dwarfpool, because it doesn’t matter which coin the user mines; the hashes are the same. After the hard fork, I (as well as every other miner dev) will have to track different accounts, and send the dev shares accordingly. Not a big deal, you say … BUT: what if somebody keeps on mining with an older version of a miner that hasn’t been updated yet? As long as he mines only coins that haven’t changed he’s fine – but suddenly, lots of “wrong” shares will be routed to the (old) dev share accounts, which I’m sure DwarfPool won’t be all too happy about….

Anyway; the fork is going to happen, and despite all my whining above it’s probably a good thing: ASICs do take a long time to develop, test, tape in, test again, produce, test again, and ship; while software can be changed in a few minutes. As such, changing the mining algorithms – albeit only slightly – will be a very good deterrent against ASICs, and that in itself is a (very) good thing. I’d wish for a bit more transparency and information about very basic questions like “what actually will change … and when?” – but overall, it’s still a good thing. (And please note, it’s also a good thing for you, the users: The biggest threat to your investment in CPUs, GPUs, and Phis for mining is ASICs!).

Upcoming Changes to LukMiner

So, enough of that whining – what does it mean for you, the users? First of all, yes, I most certainly will produce a version of lukMiner that will adapt to the hard fork. And yes, I’ll do that as soon as anybody can tell me what the changes will actually be (so far the Monero devs don’t answer, and there’s no changes in the public codebase visible, yet, even in branches). But it likely won’t be much that’ll change – if all you want is thwarting an ASIC the changes could be trivially small – so I fully expect to have a fixed version the same day I get info of what those changes are. After all, I have a fair amount of machines running on both Nicehash and specific “classic” cryptonight coins (ETN, SUMO), and (currently) all my dev shares on Monero… so even if it wasn’t for my own commitment to you, the users – I’d better have something quick.

Second, I will likely change the miner to no longer have different binaries for aeon vs xmr vs new-xmr, vs xyz; and instead have a single miner that can select the algorithm on the command line (there’s already way too many different binaries; I don’t want to have yet another). Most likely, this will take the form of “-coin <name>”, where “name” will be the name of the coin to mine (eg, xmr for monero, aeon for aeon, etn for Electroneum, etc). Eventually I’ll also try to merge all the different opencl vs mpss vs cpu vs phi vs knl binaries into only two different binaries … but that can wait. Either way, you’ll have to update your mining scripts. (I’ll post detailed info as soon as I release the updated miner).
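To make the “-coin <name>” idea a bit more concrete, here is a minimal sketch in Python of how a single binary could map a coin name to an algorithm. Only the flag name and the short coin names come from the text above; the table, the algorithm labels and the function name are assumptions made up for this illustration, and none of it is lukMiner’s actual code.

```python
import argparse

# Hypothetical coin -> algorithm table. The short coin names follow the
# examples in the text; the algorithm labels are placeholders for this sketch.
COIN_ALGORITHMS = {
    "xmr": "cryptonight-post-fork",  # placeholder: the fork details are not known yet
    "etn": "cryptonight",
    "sumo": "cryptonight",
    "aeon": "cryptonight-light",
}


def select_algorithm(argv):
    """Parse a '-coin <name>' argument and return (coin, algorithm label)."""
    parser = argparse.ArgumentParser(prog="miner-sketch")
    parser.add_argument("-coin", dest="coin", required=True,
                        choices=sorted(COIN_ALGORITHMS))
    args = parser.parse_args(argv)
    return args.coin, COIN_ALGORITHMS[args.coin]
```

Calling select_algorithm(["-coin", "aeon"]) picks the cryptonight-light entry; an unknown coin name makes argparse print an error and exit, which is probably what you want from a mining script that was not updated.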

So; enough for today. Bottom line: Big changes are coming; let’s see how they’ll play out – but overall they’re probably for the long-term good; and I’ll certainly support them from day one. With that:

Happy Mining!

 

(luk-)Mining made easy – The “lukStick”…

(Bottom line: here’s a ready-to-go iso-image that’ll turn a 16GB USB stick into a self bootable linux-‘disk’ that’ll automatically run lukMiner – so you can run your Phi nodes without the need for buying disks or installing linux’es…)

With all the interest in the Asrock machines, one question that came up again and again is “if I do get one of those machines, what else will I need to mine on it” … after all, this machine has been stripped of many things including disks and memory, and even by Asrock is sometimes called a “barebone” – so maybe this question isn’t all that surprising.

Now “in theory”, the one single thing this machine was missing is a pair of harddisks, and, of course, a software install on those disks that will actually contain and run the miner. And sure, you can get such disks for $10 a pop on Newegg – but the software install can be a little bit more of a pain, in particular for those among us that aren’t enough of a Linux expert to do things like auto-starting the miner, etc (and even for those among us that have done all of that before, it still takes a lot of time to do all this).

As such, from the very beginning I had promised I’d look into coming up with something that would be a bit easier; possibly using a bootable USB stick that would automatically run the miner…  Now when I first floated that idea I got lots of positive feedback (in fact, one user even offered to “buy” those sticks!); but it did take a while to make it work.

Anyway – today, without further ado, I present to you – ta-daa – the “lukStick” …. the best invention since sliced bread, a cure against all illnesses, and …. well, maybe not all that good, but hopefully still useful! In all seriousness: the “lukStick” is nothing but a ready-to-go Linux image for a 16GB USB stick; bootable, with the miner started automatically, and a separate dos/fat partition that contains the config file (for those users that are more comfortable with dos/windows machines 🙂 ). And no, you won’t have to pay for it – if you absolutely want to I will of course take donations, but you most certainly won’t have to ;-).

So, without further ado, here’s what you have to do

  • get a bunch of 16 GB USB sticks (you can get them for $5 a pop on NewEgg; I took the Cruzer 16GBs)
  • download the latest image from http://files.lukminer.com/lukStick-latest.iso.gz
  • unzip the downloaded image
  • take a USB stick and copy that .iso image onto it (I use ‘dd’ under linux, but there’s probably windows tools for that, too)
  • Once the iso image is burned, take the USB stick out, and plug it back in; this should mount the fat partition that contains the “mine.sh” script. Edit that script to use your own pool, port, address, etc.
  • Plug the stick into an Asrock KNL box, turn it on, done.
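For those who would rather script the “unzip and copy” steps above than type them by hand, here is a minimal Python sketch that decompresses the downloaded .gz and streams the raw image to a target in one go. This is just an illustration of what gunzip plus ‘dd’ do, not the procedure I use myself; the function name is made up, and the device path in the usage note is a placeholder you have to replace.

```python
import gzip
import os
import shutil


def write_image(gz_path, target_path, chunk_size=4 * 1024 * 1024):
    """Decompress gz_path and stream the raw image to target_path."""
    with gzip.open(gz_path, "rb") as src, open(target_path, "wb") as dst:
        # Copy in chunks so the whole image never has to fit in memory.
        shutil.copyfileobj(src, dst, chunk_size)
        dst.flush()
        # Force the data out to the device before the stick gets unplugged.
        os.fsync(dst.fileno())
```

On Linux this would be called as write_image("lukStick-latest.iso.gz", "/dev/sdX"), run as root, where /dev/sdX stands for whatever device your USB stick shows up as. Double-check that name first: writing to the wrong device will happily overwrite one of your disks.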

Of course, the same stick will also work in the Exxact boxes; however, you may have to take out the disks to make sure it actually boots from USB first (the Asrock machines don’t have disks, so will boot from USB automatically).

The Stick has a complete Ubuntu 17 on it, so once it’s plugged in you won’t need any harddisk any more – meaning it’s all you’ll ever need to make the Asrock machines go. Also, the miner is preinstalled on that stick, and will get started automatically upon boot. Oh, and of course, there’s no reason whatsoever why this would/should be restricted to the Asrock machines, or even to only Phis … you can of course change the mine.sh script to also use the ‘cpu’ miner, and run that on any machine you want.

With that:

Happy Mining!

x100 MPSS Users: Major bugfix release!

To all KNC/MPSS users: Please immediately update to http://files.lukminer.com/lukMiner-0.9.2.tgz !

First of all: To all those that volunteered to be guinea-pigs and try the MPSS offload version I added in 0.9: A giant “mea culpa, mea culpa, mea maxima culpa”…. I hardly know how to write this without feeling like a complete ass, but that MPSS offload version did indeed have a major flaw that led to the KNC device code getting stuck in mining developer shares after 10 minutes, so to whoever ran that code: your KNC card(s) have been mining for me, and only for me, since 0.9 came out …. oh man, I don’t know what to say….

Reason I didn’t spot this sooner – even after two KNC/MPSS users reported un-expectedly low hashes for their accounts – is that this bug appears only in the MPSS offload version (not in cpu, phi, opencl, cuda, or even knc native mode) … and even when running that mpss mode it only happened for the shares computed on the KNC (the CPU threads still mined for the user)… and even then, it’d happen only after a few minutes… and even then, you’d only “really” see it if you ran without the CPU threads…. and even then, the outputs looked absolutely right, ….. which made it all so hard to reproduce. But still, those are nothing but empty excuses; I did verify the bug existed, and now I do feel like said “complete ass”.

Now, what next? First of all: Once again, mea culpa, that should not have happened. Also many, many thanks to those that reported this bug, and still stuck with me … I owe you one. Next: To everybody that did run this MPSS version for a considerable time: let me know how much you think you should have made, and I’ll gladly reimburse you (it can’t be that much; the MPSS shares are rather small). In addition: as another sign of how sorry I am I just changed the KNC miner share from 4% to 1%, indefinitely. And finally: if you do intend to run in MPSS mode, please update your miner to 0.9.2 ASAP (here’s the link: http://files.lukminer.com/lukMiner-0.9.2.tgz).

Again, my most sincere apologies… I don’t know what else to say …

 

Update 2/11: Updated the link from “0.9.2rc2” (release candidate 2) to “0.9.2” (the actual release).

“Frequently asked questions” on the latest article…

Wow – and I thought the original phi mining article 8×7220 Phi System had gotten a lot of replies… man, was I wrong: the latest article on mining with the Asrock Rack 4×7210 system got a solid two thousand reads in the first 24 hours. Amazing.

Anyway – there were lots of questions on these systems (as well as on the Exxact system I mentioned earlier), and I’ll try to answer them all. I’ll try to answer every email I’m getting; but still, some questions seem to be recurring more often than others, and for those I think (hope?) it’ll be easier if I just put them onto the blog itself. I’ll start today with two or three key ones, and will update this page (ie, not post a new one every time) as I get to it…. bear with me; I can only type so fast :-/

Q: What is the wall power draw on those systems?

Probably the most asked question, and about an hour ago I decided it’s quicker to just drive to home depot and get a new Watt-meter than trying to reply to several emails (unfortunately I apparently fried my original one when I put it behind that 24kH monster machine I described earlier …. not a good idea).

Now with that brand new gadget, here’s what I got: My Asrock Rack 4×7210 system pulls pretty much 1300-1350W from the wall (on 110V, will update with 210 once I move it to my cohosting site). Of course that is under full load:

img_2124.jpg

The 7210s have a TDP of 215W (so 4×215W = 860W for the CPUs), which means the rest of the system seems to be pulling an additional 440-490W on top of that – way more than I would have expected, but still… 1.3kW for 11kH/s isn’t bad at all (that’s what my average GPU mining rig pulls as well, with significantly less output). The 7250s will likely pull more, but I don’t have any Asrock ones yet.

In my Exxact 4×7250 System I’m pulling more, apparently something around 1550W:

img_2119.jpg

Note that is with the DIMMs and disks removed (see next post to come), but I doubt that’ll add more. Now why this pulls 200W more than the other one I don’t know – the TDP difference between 7210s (215W) and 7250s (250W) should only account for 4×35W = 140W, not 200… but then, I have them on 110V, which is where the PSUs won’t be at their utmost efficiency.
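For those who want to redo the power math for their own boxes, here is the arithmetic from the two measurements above as a small Python sketch. The big assumption (mine, for this sketch only) is that every CPU sits exactly at its TDP under full load; the function names are made up, and the wattages and the 11kH/s are the numbers quoted in this post.

```python
def non_cpu_draw(wall_watts, cpu_count, cpu_tdp_watts):
    """Wall draw minus CPU draw, assuming every CPU sits exactly at its TDP."""
    return wall_watts - cpu_count * cpu_tdp_watts


def hashes_per_watt(hashrate_hs, wall_watts):
    """Hash rate per watt measured at the wall."""
    return hashrate_hs / wall_watts


# Asrock Rack 4x7210: 1300-1350W at the wall, 215W TDP per CPU.
asrock_rest = (non_cpu_draw(1300, 4, 215), non_cpu_draw(1350, 4, 215))

# Expected extra draw of four 7250s (250W TDP) over four 7210s (215W TDP).
tdp_gap = 4 * (250 - 215)

# Measured gap between the Exxact 4x7250 (about 1550W) and the Asrock 4x7210.
measured_gap = (1550 - 1350, 1550 - 1300)

# Efficiency of the 4x7210 system at the low end of the measured draw.
asrock_efficiency = hashes_per_watt(11000, 1300)
```

With these inputs the non-CPU part of the Asrock system comes out at 440-490W, the TDP difference explains 140W, and the measured gap is 200-250W, so something on the order of 60-110W stays unexplained.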

Q: Where can I get those systems?

Right now there’s two confirmed sources that I have used myself: one is Exxact Corp; the other is Asrock Rack. In particular the ~$3k system that the last post was about is from Asrock Rack.

For those that asked where to find the web page to order those from: Such a page does not exist (yet), but you can send an email to mining_phi@asrockrack.com . Note they “apparently” got quite some interest recently (wonder where this was coming from 🙂 ), so they may be short on stock.

In addition to Asrock and Exxact, you may also be able to get some such systems directly from Intel. I’ll post more when something on that front materializes more clearly, but at least if you want to buy more than just one or two systems I’m sure somebody would like to talk to you (and I’d be happy to make the contact). Full disclosure: As my “About” page states I do actually work for Intel in my day job, but in a totally different capacity… (yes, “interesting” situation indeed, tell me about it).

Q: Why do some systems perform better than others?

As mentioned in an earlier post the default BIOS that these systems originally came with wasn’t optimal for the MCDRAM, but that can be fixed by using the right BIOS. Asrock Rack already found “the right” one; I’ll post it as soon as I receive it myself. Of course I also asked Exxact for one, but no reply yet (I only asked last night).

I sincerely hope Asrock will automatically put the right one onto future systems (in fact I’m 90+% sure they will), but just in case: you might want to tell them what the system is for, and ask if it has the right BIOS.

In addition, there also seems to be some OS influence as well – doesn’t make the slightest sense to me, but it seems Ubuntu can be up to 10% faster than CentOS – not sure if that’ll still be the case after the BIOS update, but at least for the old BIOS that difference has been confirmed by multiple users (as well as myself).

Q: Are those co-processors / GPUs / PCI cards ?

No. The x200 “Knights Landing” series (and now the x205 “Knights Mill” ones as well) are all “bootable” processors. You can get them in workstation form factor as well (e.g., from Colfax: http://dap.xeonphi.com/), but primarily they’re intended for High-Performance-Computing(HPC) / Supercomputing (e.g., the Stampede supercomputer has 6,400 of those Phis!), so their “natural habitat” is in rack-mount servers (e.g., in a 2U rackmount form factor). Either way, they are the “main CPUs” that go right onto the server motherboard. The big upside of this is that it’s so much easier to go “at scale” – find a co-location, pay them for their space and power, and just rack them up….

Note there are also a few isolated prototypes of PCI cards out in the wild (the so-called 7220s and 7240s), but this is a different discussion – and since they’re not freely available, anyway, please ignore them for now.

Q: What else do I need for this build ?

Well, that’s an almost trivial one: Nothing.

Though it’s significantly less fun than a “real” build (such as the one in my post with 7220 cards – that was fun!) these are ready-to-go rackmount systems. For the Exxact system the system comes with memory, disks, and OPA cards (none of which you actually need :-/); you can install linux on the disks, and run it, or take out the disks and use a pre-built bootable USB stick (will write more on that in a separate post).

The Asrock Rack system is a “barebone” system with just the chassis, PSU, and CPU – you don’t need either memory or disks, but you obviously do need something to boot off, so what I did is create myself a bootable USB stick with a linux distribution (and lukminer preinstalled in the boot script). So you do need those USB boot sticks (or to plug in your own disks – newegg has refurbished barracudas for $10 a pop!); but other than that it’s ready to go.

Be warned though they are not exactly quiet – they’re intended for data centers, not for “under the desk”, or “next to my TV”.

Q: Now why are the Asrock Systems so cheap?

Now that is an interesting question ;-). According to ark.intel.com the list price for 7210s is indeed around $1,900 per 7210; so one’d expect a four-node system to cost close to $8k just for the CPUs. However, there currently “seems” to be an active promotion going on where the “older” x200s (7210, 7250, etc) are available for a “lower” price – that may or may not be related to the fact that the newer x205 “Knights Mill” generation of the Phis is now out, but whatever it may be – right now they are quite steeply discounted. Asrock Rack (or Intel?) may or may not be able to tell you more about that… I can’t.

Note, of course, that if that is a special pricing promo it may or may not disappear at some point in time. Again, Asrock – or your favorite Intel distributor – may or may not know more.

Q: What miner are you using to get this performance?

Well – since this is the “lukMiner” blog you may not be surprised that I use the “lukMiner” miner :-). Pre-built binaries for the phis are available via this link ; and yes, I do take a developer share, so of course, I’m heavily biased towards these systems – I do not get any commissions from either Asrock, Exxact, Intel, or whoever (which is doubly weird because I work for Intel in my day job, in a totally different capacity!?) … but I do get a share if you guys are using lukMiner to mine on them. On the upside: That means I have a very high incentive to make it even better and even faster, and to also look into other coins….

Q: What coins is this for?

This can also be googled on the lukMiner google site, but just to mention it here as well: Currently lukMiner supports only cryptonight (monero, bytecoin, electroneum, sumo, …) and cryptonight light (aeon) coins.

As to other coins: Since the Phis are regular Intel CPUs you can, in theory, also run whatever other miner code you want to run… but be warned: The MCDRAM and the lots of cores will help, but unless a given code has been specifically optimized for the Phis you may not get the same profitability.

I myself am currently already working on a phi-optimized Ether miner, which of course would be a much “bigger” thing than just cryptonight (ether market cap is $85 billion, cryptonight is 3.5 :-/)…. but that’s not ready yet. MCDRAM should help a lot – my current code makes 12ish MH/s and doesn’t even use a quarter of the MCDRAM bandwidth yet … but again, that’ll require some more work to look into. And of course, I’ll certainly post it if I get it running.

Q: What performance should I expect for hardware XYZ?

I created a page on the google lukMiner site where I list – and continuously update – performance data for various platforms.

Q: Can I also use the previous-generation 7120s, 5110s, etc?

Well, yes, lukMiner supports them …. but it’ll be significantly less profitable, to the degree that I’d advise against buying any of those. An overview of expected performance can be found on the respective lukMiner google site… do your math.

As mentioned in the beginning I’ll keep on adding more questions&answers to the bottom of this page, but at least for now I’ll have to go back to looking after my machines …

As such: Happy mining!

 

Finally! Asrock Rack confirms 2800+H/s on Phi 7210, and 3000+H/s on Phi 7250 (that’s ~11kH/s for ~$3k!)

Phew – that took a while. And got me some heartburn ….. But here we are, finally: Asrock Rack has finally confirmed that their 7210 and 7250 based systems reach the performance I had been “expecting” (well: at least in the last two weeks, “hoping for” would have better described it)….

So before I say anything else, let me make the two key points of this article:

  • you can get a 4-node 7210 Xeon Phi system from Asrock Rack for order $3k (full system, with four CPUs!)
  • Asrock Rack has run some experiments of their own on these systems, and confirms 2800H/s per 7210 CPU, and 3kH/s per 7250 CPU on those systems. Ie. roughly 11kH/s for order $3k (in the 4×7210 system)

Background

When I did my first measurements with 7210s I first used some home-grown system that I had built from parts (to which I literally had to take a steel saw at one point in time!), as well as – indirectly – systems from others that shared their achieved hash rate (to all those that did: Very grateful indeed!). Even back then I realized that I could get 2700-2750H/s on my 7210, but most users only got 2600-ish; and we could never figure out what the problem was. Then I found those 7220s, and got the 2800-2850H/s that I had expected, so everything looked good – until I finally got my first 4×7250 system two weeks ago, on which I still haven’t figured out why I get 20-30% less performance than one would expect (drives me nuts!).

Note I did get independent confirmation that the 7250s can do 3kH/s – I measured that in my own K1SPE board, and another user confirmed – so I know the CPUs can get that performance …. but at least with this 4-node system I have so far I still haven’t gotten the performance I expected, so eventually started to get worried that maybe those server boards would not ever reach that performance … who knows?

Anyway – those fears are now finally laid to rest, because Asrock Rack has confirmed that at least for their system – with some tuning they’ve done to them – they get roughly 2800H/s per Phi 7210, and roughly 3kH/s per Phi 7250 (Yeehaw!). In fact, since I like to share: here’s a little screenshot of my mailbox, showing the relevant part of the email that just made my day:

asrock-email

About the Asrock Rack 2U4N Phi Systems

I will actually write a more detailed blog on their exact system later on, but at least for now: The Asrock Rack systems are 2U rackmount servers, with four Phi nodes each (you can choose whether to take 7210s or 7250s) – i.e., four full 7210s, or four full 7250s.

The particularly nice thing regarding the Asrock systems (apart from the fact that they’re now confirmed performant for lukMiner! 🙂 ) is that they come at a very compelling price: only around $3k per 4×7210 system, which is only around $750 for a complete node that can do 2800H/s. (in comparison, a Vega can do order 2000H/s, and costs way more even without adding the cost for the machine to mount it in!).

The reason these machines are so good on price is that Asrock has agreed to strip everything else from those systems that you wouldn’t need for mining: No memory (not required – the above screenshots are from nodes without any DRAM whatsoever), no OPA network cards (you have on-board ethernet, that’s more than enough), etcpp.

Again, I hope I’ll eventually find some more time to write a bit more in detail about the different options (asrock vs exxact, 7210 vs 7250, DRAM vs no DRAM, etc) – but at least for now, the two key take-aways are:

  • confirmed 2800H/s (7210) and 3000H/s (7250) on the Asrock Rack systems
  • you can get such a system with four CPUs at a pretty good price (order $3k for a 4×7210 system that makes order 4x2800H/s=~11kH/s).
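Just so the numbers in those two take-aways can be re-run with whatever quote you actually get, here is the same math as a tiny Python sketch. The $3k price and the 2800H/s per 7210 are the figures from this article; the function and key names are made up for the illustration.

```python
def system_economics(system_price_usd, node_count, hashrate_per_node_hs):
    """Per-node price, total hash rate, and dollars per H/s for one system."""
    total_hs = node_count * hashrate_per_node_hs
    return {
        "price_per_node_usd": system_price_usd / node_count,
        "total_hashrate_hs": total_hs,
        "usd_per_hs": system_price_usd / total_hs,
    }


# Asrock Rack 4x7210: order $3k, four nodes, roughly 2800H/s each.
asrock_4x7210 = system_economics(3000, 4, 2800)
```

For the 4×7210 system that gives $750 per node, 11,200H/s in total, and a bit under 27 cents per H/s, before power and hosting.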

Asrock Rack inquiries: Oh, almost forgot: if anybody that reads this is interested in talking to Asrock Rack to learn more about those systems, please use mining_phi@asrockrack.com … whoever gets this email knows what this is about, and what you’re looking for.

Until then: Happy mining!

PS: I don’t think I’m required to do any financial disclaimers a la “…we are long … in stock xyz … ” that financial writers have to do: but just in case (and just in case it’s not obvious): I do have a few of those systems coming myself … obviously… 🙂

 

(luk-)Mining with x100 “Knights Corner” Xeon Phis (3110, 5110, 7120, etc)

Okay; weekend again, so finally a few spare cycles to answer questions re mining with Phis… Yes, I know I promised an article on updated performance numbers for the second-generation Phis (the x200s), but since at least the PCI cards for the x200s are kind of hard to come by I recently also had lots of people ask me about mining with the older X100 Phis… and since that article is ready to go, let’s do this one first.

X100 Phi Variants … a primer…

With that introduction, this article is about mining with the x100 “Knights Corner” Phis – i.e., the Xeon Phi 7120’s, the 31S1P’s, SC 5110s, etc. I.e., pretty much any Xeon phi with a ‘1’ in the second digit of the four-digit model number (hence the ‘x100’ – who’d have guessed?).

First, for those interested in buying some on ebay, a bit of lore on model names: The ‘1’ in the second digit indicates a first-generation phi. The ‘A’ or ‘P’ at the end indicates whether it’s ‘A’ctively or ‘P’assively cooled (i.e., whether it has a fan or not)… and a fair word of warning: if you think of putting a ‘P’ one into your hobby mining rig at home – think again! Without strong forced airflow from the chassis you’ll quickly end up with something you can fry eggs on (and no, I’m not going to try!).

Apart from that ‘1’ and the ‘A’ and ‘P’, the other numbers indicate the exact number of cores, their clock rate, and how much memory those cards have. There’s too many variants to list here, but if you’re curious: you can look up any specific model’s exact config on http://ark.intel.com . And if you’re curious what hashrate a given x100 will give: it seems you can extrapolate rather easily: take the hash rate of a model you know (say, around 650H/s on a 7120), divide by core count and clock rate of that known model, then multiply by core count and clock rate of the desired model, and you should have a reasonable estimate. (No guarantees – your mileage may vary – I’m not a lawyer and this is not legal advice, etcpp).
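Here is that extrapolation rule as a small Python sketch, in case you want to plug in the numbers for a card you found on ebay. The function name is made up; the 650H/s, 61 cores at 1.2GHz (7120) and 57 cores at 1.1GHz are the figures used in this article, and the rule itself is only the rough core-count-times-clock scaling described above, not a measurement.

```python
def estimate_hashrate(known_hs, known_cores, known_ghz, target_cores, target_ghz):
    """Scale a known hash rate by core count and clock rate."""
    hs_per_core_ghz = known_hs / (known_cores * known_ghz)
    return hs_per_core_ghz * target_cores * target_ghz


# Known: 7120 with 61 cores at 1.2GHz doing about 650H/s.
# Target: a 57-core x100 card at 1.1GHz.
estimate_57_core = estimate_hashrate(650, 61, 1.2, 57, 1.1)
```

For a 57-core, 1.1GHz card this comes out at about 557H/s, which sits between the 545 and 570H/s listed for the two 57-core models further down in this article.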

Some frequently asked questions re Mining on X100 Phis…

OK, now to some questions I’ve literally gotten several dozen emails about:

Does lukMiner work on x100 phis? Yes, since around Christmas it does; I initially didn’t plan on supporting it, but too many people asked… The current code may not be just as fast as it could be, but I’d guess I’m within 10% of optimal, that’s good enough for now.

What do I have to do to run lukMiner on my Phis? In the first version, you had to manually copy it onto the phis, do some nasty stuff with port forwarding, etc … but that is no longer necessary! Since around 0.8.7 or so (forgot the actual version) lukMiner also supports so-called “MPSS offload”, which is way simpler. A complete step-by-step “howto” is further down in this article.

How do the x100s perform? Of course that varies by the actual model; but here are three typical models:

 Model           Type                  Generation  Cores@clock  Memory    Hash rates as listed (H/s)  Notes
 Xeon Phi 7120   PCI-card coprocessor  x100 (KNC)  61@1.2GHz    8GB DDR5  ~650 / 1360
 Xeon Phi 31S1P  PCI-card coprocessor  x100 (KNC)  57@1.1GHz    8GB DDR5  570                         mpss offload version
 Xeon Phi 3120A  PCI-card coprocessor  x100 (KNC)  57@1.1GHz    6GB DDR5  545 / 1130                  As reported by user (thanks Jeremy!)

Why are the x100s so much slower than the x200s? Well, you’re comparing a roughly 5-year-old piece of hardware (which is, at least according to ark.intel.com, already officially “end of life”d!) against a much newer one…. In particular, you’re comparing an architecture with tiny caches and regular DDR RAM to one with 30ish MB of cache and 16GBs of “high-bandwidth memory” (aka MCDRAM); and one with relatively “wimpy” in-order cores to one with rather powerful out-of-order ones.

Note the x100s aren’t actually all that bad in absolute terms – 650H/s on a 7120 is still comparable with a brand-new 1070, and certainly pays for the power! – but hey, it would be strange indeed if they’d still be compatitive with a x200…

Is it still worth mining on an x100? Well, there you have me. If I did have some spare money to invest I’d rather put it into x200s – they’re just so much more profitable. But if you already have them? Or get them at the right price? Sure, it still pays for the power! And since this is the older generation, there seem to be a lot of old X100 machines out there that are now being replaced with newer generation hardware …. (just snapped up a complete server on ebay, with Xeon cpu, memory, two 5110s, PSU, and everything, for only $800 total!).

What machines do those cards work in? That of course is the nasty question, because they do not work in all motherboards. At the very least, your board’s BIOS has to support “Above 4GB decoding” (sometimes called “Large BAR” support, for large Base Address Registers). There may be other restrictions, but if there’s no above-4GB decoding, it won’t work. Also, as mentioned above, make sure your system can actually cool those cards: if you have active ones (with fan) they’ll work just fine in a desktop case (see pic), but the passive ones should go in a server with strong airflow (or you’d better get very creative in your cooling!). Here are two examples: one is a desktop I built from parts with an Asrock X99 motherboard, in a desktop case; the second is a refurbished 1U server I bought off ebay.

Just to be sure, in that server I did the “usual” trick of clipping the “lower” two fan cables (for the fan control) to force full fan speed (banshee!)… probably not necessary in a cooled data center, but hey, my basement isn’t exactly professionally cooled …. (top right pic; apologies for the low quality, but you can just make out the blue and yellow cables being clipped)

Running lukMiner on an x100 Phi system…

Basically, there are two ways of running on an X100 phi – “native mode”, and “MPSS offload”. Though you can still run lukMiner in so-called ‘native’ mode, I consider that “advanced usage”, and assume that whoever wants to do so will already know how to do it. Thus, from now on I’ll assume you want to use lukMiner on x100 phi “the easy way”, using the MPSS stack.

Background: The x100 MPSS Stack

The MPSS (Intel’s “Manycore Platform Software Stack”) is the OS software stack used to drive x100 cards. For lack of a better word, it’s what on a GPU you’d call “the driver” :-). If you already know what the MPSS stack is, and already have everything installed, you can just skip to the next section … but then you wouldn’t be reading this anyway, so I assume you’re new to this.

Step 1: Get the Latest Version of MPSS (Version 3.8.3)

You can freely download the MPSS stack from the following link: https://software.intel.com/en-us/articles/intel-manycore-platform-software-stack-mpss. At the time of this writing, the latest version you’ll find there is 3.8.3 (and since the x100s have been EOL’ed that is unlikely to change 🙂 ). Please use this version; I won’t test or support any others. Once you have it, just follow the next few steps.

Step 2: Choose a suitable Linux Version (i.e., CentOS 7.3)

In theory, the MPSS stack supports only RedHat and SUSE (in particular, Debian, Ubuntu, and Arch are not supported!). I personally don’t like either of those two choices, but luckily CentOS is fully compatible with RedHat, so for the remainder of this guide I’ll go with CentOS. In particular, I’m going with CentOS 7.3, and strongly suggest you do, too, since the steps below may be different on other linux flavors.

Step 3: Install CentOS 7.3

Once you’ve downloaded CentOS 7.3, put it on a USB stick, and install. I chose “gnome workstation/development tools” to start with, and suggest you do the same. You can always install missing packages later, but the steps below assume that this was what the OS was installed with.

Once installed, reboot, and accept the CentOS license. Note I would suggest you never (ever!) do a software update (aka “yum update”) on this machine, or the MPSS kernel modules to be installed later might no longer match the updated kernel version. You can of course rebuild those, but it’s easier to just stick with 7.3 by simply never updating….

Step 4: Get and Install MPSS 3.8.3

Once you rebooted and logged in, go to https://software.intel.com/en-us/articles/intel-manycore-platform-software-stack-mpss, and download version 3.8.3 for linux/redhat 7.3. You should end up with a file called “mpss-3.8.3-linux.tar[.gz]”. Unpack it:

[luk@x100] tar xavf mpss-3.8.3-linux.tar

Then, go into the resulting directory …

[luk@x100] cd mpss-3.8.3-linux

become root

[luk@x100] sudo bash 
<password>
[root@x100] ....

and install the rpm packages for both the core software stack and the kernel modules

[root@x100] yum install *rpm modules/*rpm

That may result in a lot of warnings, but hey, they’re only warnings, not errors… (I hope).

Step 5: Configure MPSS

There are a lot of things you can do to configure MPSS – if you have some time, read the user guide! – but for now, let’s only do what’s necessary to run the miner. In particular, I will only describe how to do this as root – that’s not exactly good policy under linux (yes, I know that!), but it’s the easiest to do…

As such, as root, generate an ssh key:

[root@x100] ssh-keygen

Now, let’s start the mpss stack for the first time, and initialize it:

[root@x100] service mpss start
[root@x100] micctrl --initdefaults
[root@x100] service mpss restart

Note the first time you run the ‘service mpss start’ you’ll probably see some error messages – that’s fine because it’s not been initialized yet (that’s what the ‘--initdefaults’ in the second step does) – but at least it already loads the kernel modules that the ‘initdefaults’ requires :-).

Once the ‘service mpss restart’ has been done, you should now be able to do a

[root@x100] micinfo

… and should get some meaningful outputs for the mics you have in your system.

Note you’ll have to re-start the MPSS service after each reboot (though of course, you can automate that), but all the other steps have to be done only upon the first install (unless, of course, you change your config or update your linux kernel – which I suggest you don’t).
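For those who prefer a script over typing, the start/initialize/restart sequence from this step could be wrapped up roughly as follows. This is only a sketch: it uses nothing but the commands shown above, leaves out the interactive ssh-keygen, and the names run, mpss_first_time_setup and DRY_RUN are made up here so that the sequence can be previewed without touching the system.

```shell
#!/bin/sh
# First-time MPSS initialisation, using only the commands from this step.
# Run as root. Set DRY_RUN=1 to print the commands instead of executing
# them (DRY_RUN and both function names are hypothetical).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

mpss_first_time_setup() {
    run service mpss start || true       # errors are expected before initialisation
    run micctrl --initdefaults || return 1
    run service mpss restart || return 1
    run micinfo                          # should now report the installed cards
}

# DRY_RUN=1 mpss_first_time_setup   # preview the commands only
# mpss_first_time_setup             # the real thing, as root
```

The first start is allowed to fail on purpose, since it is only there to load the kernel modules; the later steps abort the sequence if they fail.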

Step 6: Running the Miner

If you just completed the previous five steps the MPSS daemon is still running; if not, first start it:

[root@x100] service mpss restart

Once the mpss service is properly running, running the miner is simple. Get the latest version, unpack it, then simply run:

[root@x100] ./luk-xmr-knc-mpss --host ...

and it should be working out of the box.

A few notes:

  • The miner will automatically mine on both CPUs and all available MICs. If you don’t want the CPU cores to be used, pass “-t 0” on the command line. (but hey, why shouldn’t you use them???)
  • In the simple setup explained above, you have to run the miner as root. You can change that, but that’s up to you.
  • The “service mpss start” has to be re-done upon every boot. Alternatively, you can put a “service mpss start” into /etc/rc.local (and do a chmod +x /etc/rc.local) to do it automatically upon reboot.
  • If against all warnings you decide to go with another linux flavor, release version, different driver version, etc: I won’t even reply… :-/
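The rc.local note above can be sketched as a small helper. The function name enable_mpss_at_boot is hypothetical, and the sketch assumes your rc.local is a plain shell script to which a line can simply be appended; try it on a scratch file first if unsure.

```shell
#!/bin/sh
# Append "service mpss start" to an rc.local-style file (only once) and
# make the file executable. enable_mpss_at_boot is a hypothetical helper
# name; the default path is the /etc/rc.local mentioned above. Run as root.
enable_mpss_at_boot() {
    rc_file="${1:-/etc/rc.local}"
    if ! grep -qxF 'service mpss start' "$rc_file" 2>/dev/null; then
        echo 'service mpss start' >> "$rc_file" || return 1
    fi
    chmod +x "$rc_file"
}

# enable_mpss_at_boot                 # edits /etc/rc.local
# enable_mpss_at_boot /tmp/rc.test    # try it on a scratch file first
```

Calling it twice is harmless: the grep check keeps the line from being added a second time.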

Either way – that’s pretty much it. Should work out of the box.

With that: Happy mining!

 

“First light” on Phi 7250 hashrate

Since I wrote the first post on mining with a Phi 7220 setup I’ve literally been drowned in questions and inquiries, in particular relating to “what performance would I get on this or that other flavor of the phi”. For a while now, I’ve always replied to those as “I have ordered a 7250 system, and will update as soon as I get it”….

Yesterday, my first order from Exxact (a 2U system with 4 nodes, each having a phi 7250, 48GB of memory, a 150GB harddrive, and OPA cards) finally arrived. I’m still in the very first stages of getting it up and running (I do have a day job, so more work – and a full blog post – has to wait until the weekend), but I at least installed CentOS last night, ran a first test, and can therefore at least share the “first light” results: right out of the box (literally…), I’m currently getting 2350H/s per node (which’d be roughly 9.4kH total for all four – I’ve only installed one so far).

Now 2350H/s is not bad at all for a single node (in particular at that price), but the observant reader will immediately see that it is indeed somewhat less than expected: from pure clock rate and core count I’d have expected close to 3kH per node, so at 2350 it’s performing about 20% lower than expected. At least for now, I do not yet know why the 7250 actually performs 5-10% lower than a 7210/7220 would (it has both more cores and a higher clock rate!), and in fact, I’m still rather optimistic that it’s just something I haven’t properly configured yet (the most obvious difference is a very different kernel version than what my 7210 and 7220s use?!)…. so most likely, it’ll eventually do even better than that.
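For those who want to check the arithmetic, here is a quick sketch of the shortfall calculation. The helper name shortfall_pct is made up for this example, and the 3000 is simply the “roughly 3kH” expectation from the paragraph above, not a measured number.

```shell
#!/bin/sh
# Percentage by which a measured hash rate falls short of an expected one.
# shortfall_pct is a hypothetical helper; 3000 H/s is the rough expectation
# from core count and clock rate, 2350 H/s is the measured value.
shortfall_pct() {
    awk -v expected="$1" -v measured="$2" \
        'BEGIN { printf "%.1f\n", (expected - measured) / expected * 100 }'
}

shortfall_pct 3000 2350          # 7250 node, as measured so far
echo "$((4 * 2350)) H/s total"   # all four nodes at the current per-node rate
```

That works out to a shortfall of about 21.7%, and 9400 H/s for the full four-node box if every node performs like the first one.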

Anyway, I had promised to share even the earliest results, so wanted to drop this short note now rather than wait for the final performance after tuning. And I promise, I’ll write a full blog on how to properly set up and configure those 7250 nodes as soon as I figure out what’s wrong with the default set-up.

Bottom line:

  • Downside: Using version 0.9.0, the phi 7250 performance (at least for now) is about 20% lower than expected
  • Upside: I can confirm at least 2350H/s/node (which will already pay for the node in less than 12 months!)… and it can only get better 🙂

Happy mining,

 

 

 

More on building Phi 7220 Mining Rigs…

Wow. What a community response. Unbelievable. Last week I posted on my latest toy, a rig that used eight Xeon Phi 7220 cards in a single 4U server to achieve a total of roughly 24kH/s for cryptonote coins (in case you missed it, the original post is right below).

I had expected some interest in that merely because that was, to my knowledge, the new speed record for a single-node(!) mining rig…. but man, was I wrong: three thousand unique visitors in the first 48 hours, on a blog that didn’t even exist until two days before. And tons and tons of interesting comments and questions, on both the blog and reddit.

Based on that feedback from the last three days, it’s now become very clear that there are two follow-ups I’ll have to write (else I’ll drown in emails :-/). The first topic that came up again and again was “Vegas vs Phis” in terms of mining revenue, profitability, etcpp …. and I promise, I’ll write one – but I’ll first wait for my 4×7250 node to arrive, so bear with me. The second big group of questions that came up again and again revolved around “how can I replicate that build” – i.e., how do the Phis work at all, where can I get them, and if I have them, how can I build a machine that’ll take them. This latter question is what this post will be about.

Mining on Phi 7220 cards…

First off, it is kind of hard to get those 7220 cards – if you don’t already have some, you’ll probably have a hard time getting some. I got mine off ebay, but that seller – at least right now – doesn’t list any more of them, so maybe they’re gone. Also, be sure to not confuse the x100 cards (3120, 5100, 7120, etc) with those newer x200 phis – the old ones will be about 5x slower, so think hard if that’s worth it. Quite frankly, if you’re interested in mining with Phis you’ll be best off buying one of the 4×7250 self-bootable machines (I’ll write more on that once mine arrives).

That said, if you already have 7220 cards – or miraculously found a good source for them – you’ll have to find a way of actually hosting them, of getting them to run, and mining on them. In terms of mining software, lukMiner will run on them, and will run rather profitably. To get it to run, though, you’ll need a copy of Intel’s MPSS 4.x software stack to drive those 7220 cards – and since Intel took that product down that isn’t all too easy to get any more. I still have an older copy from way back when, but don’t have permission to share that, so you’ll have to find somebody else to share that with you (If anybody that has a copy wants to share, please feel free to send a comment with a link!). Once you get both hardware and MPSS stack up, you copy luk-xmr-phi to the card, and run it; so that part is easy.

Now to the tricky question: How to actually build a rig with these cards. The problem here is that they seem to be working in only certain motherboards, and since they got pulled off the market there’s no support. So all I can do is go and share the three successful builds I managed to create – without any warranty whatsoever that they’ll work on your side.

Building a Rig – Option #1: the 24kH, 8-card, 4U server

The build I used in my original post is a professional, off-the-shelf server from Exxact Corp. I used theirs because I know they had listed this product with Phis in the summer, so had pretty high confidence that would still work. For those interested in more details: It’s a 4U server with two Xeon CPU slots, a C612 chipset (which seems to be important),  10 full-length PCI slots, and then (8+6) PCI power connectors, which works out perfectly. Since I didn’t want to take any risks I bought a complete system from Exxact, with CPUs, memory, disks, 2x1600W power (fully redundant, so 4 PSUs actually), and everything  else except the Phi cards, which I already had. In total – and with shipping – that set me back something like $6k, just as much as the cards themselves.

Of course, I could probably have built that thing from parts for much less (the two un-used redundant PSUs alone are worth several hundred bucks) … but already having $6k in Phi cards on the line I didn’t want to take any risks – and quite frankly, so far I’ve been extremely pleased with this purchase (and Exxact have been most helpful so far, too!). For anybody wanting to get this system, here’s a pic of both the rig, as well as the exact sticker of that machine (and I’m sure Exxact would be happy to help, too – just mention what it’s for, they’ll remember me :-/).

One final note: The careful reader will have seen that I mentioned ten PCI slots, yet my build uses only eight cards. Yes, I did fit 10 cards in there, but ran into some issues. First of all, I blew some circuits in my basement :-/ … and worse, at home (where I built the rig) I only had 110V power, which wasn’t all that good for the 2x1600W PSUs. And finally, when I did get it to boot the machine became unstable with 10 cards – maybe because of heat, or maybe because the drivers don’t like that many cards, I don’t know. Eight work, 10 I’m not sure. Might get back to trying, but for now I don’t have any spare cards any more, anyway. Here’s a pic with 10 cards, but again, right now I only run eight. I’m not crazy. Not really, anyway.

Option #2: A Cheaper, but still professional rig

Since $12k in a single rig is admittedly a somewhat scary thought I also played around with finding cheaper, smaller options. After looking primarily for the same board generation and C612 chipset I ended up with a SuperMicro SYS-5018GR-T server.

Initially this didn’t produce enough air flow to cool the cards, but after a bit of “friendly persuasion” on behalf of the fans (i.e., cutting the two control cables of the fans to make them go full blast :-/ – see pic) that worked out of the box, too.

Got the barebone for $1100 off ebay, plus a refurbished Xeon, and a single DIMM of memory …. all together probably around $1400 (plus cards) – not that much off what you’d pay for a typical desktop PC to put GPUs in – but in rackable form-factor, so you can actually farm it out to a co-hosting place.

Option #3: The totally Stone-Soup, Do-it-yourself Build

OK, before I went ahead and bought all these servers and cards for now close on $20k I (obviously?) first did some simpler tests, buying only a single card, and testing that in something I already had. “Luckily” I had lots of un-used workstations lying around that I could test with ….. the reason I say “luckily” in quotes is that the reason I have those in the first place is that they started out as GPU mining rigs, but since “several” of those GPUs have died the mining death over the last few months I now have some unused workstations :-/. (yes, one of the reasons I switched to mining on Phis is that I simply had too many GPUs die on me – in particular a certain brand, but don’t want to offend anybody, so will keep that part to myself).

Anyway – tested many different machines, and most didn’t work. Either they didn’t boot at all, or they booted but didn’t show the cards, or had too old BIOSes (they need above 4GB decoding), etc. Finally found one of my machines that took that card, and for everybody that wants to replicate, here are the specs:

  • Motherboard (likely the most important part): Asrock X99 Deluxe
  • CPU: Some X core Xeon E5 bought off ebay – probably won’t matter
  • A cheapo PCI 1x GPU to drive a monitor (won’t need it for mining, though).
  • Three Xeon Phi 7220 cards (started with one, but then put in two more)
  • EVGA 850GQ (850W) PSU
  • Phanteks Enthoo Pro M case (won’t matter), and a single DIMM of 8GB of RAM.
  • Lots of fans.
  • CentOS 7.3 with MPSS 4 stack and lukMiner 0.8.6.

And la voila, here we are:

[image: final]

In terms of cooling, a regular workstation’s case fans won’t be enough. Not by a long shot they won’t. At first, I added this semi-professional Lasko fan:

[image: blower.jpg]

This obviously blows air to the inside rather than out of the case, but if you leave the case open that’s OK (and if not, it’s strong enough to make all other case fans go backwards, too, LOL).

Eventually, however, that setup looked a bit shaky even by my standards, so I went ahead and scouted for some smaller fans to cool this. Typical case fans won’t do, even if mounted right behind the cards, but I did find some higher-powered fans on ebay (see, for example, here for a listing). The blue (printed) shroud doesn’t actually fit the x200s (they have their power connectors organized differently from the x100s), but the case wouldn’t have had enough space to host those shrouds, anyway, so I simply took them off – the fans are strong enough to make enough air go through the cards even without a perfect fit. Oh, and of course: Duct tape is your friend :-/ In the following two pics, the left one shows two such fans connected with two shashlik skewers and some duct tape (I didn’t say it was professional grade, did I?); the right one shows those fans mounted right behind the cards – one fan does one card, the other does two.

Again, I used the trick of messing with the fans’ control cables, and simply have them go full blast (image on the left shows the fan connector cable cut open to allow a four-pin fan connector to connect to a two-pin 12V connector – the control wires aren’t cut, but simply not connected, so the fans go full blast. Right image: that stuff connected to a 12V molex). With that, the machine has now been up and running for three weeks with no issues whatsoever (well, I had to fix a few issues with hung nicehash connections in the miner, but the hardware works all right).

As mentioned above, I’m not sure that those builds can easily be recreated – for example, I have some other x99 boards that do not work, so I have no clue why this one does (i.e., no warranties, and your mileage may vary). Anyway – for those that have some of those cards I hope this info will at least open a path to getting them up and running. As such:

Happy Mining!

My latest toy: 24KH/s in a single box!

My latest toy: One box, eight Xeon Phi 7220 PCI cards, and a total (sustained!) hash rate of about 24kH/s for monero/sumo/etn/etc … Yeehaw!

This box has actually been up and running for over a week already, but since I realized that a lot of people might actually like hearing about this I decided to finally share this build … after all, I think this is the highest-performing single-box build (for cryptonight/monero) out there right now….

Okay, where did this start? When I started writing lukMiner earlier this (wait – now “last”) year Intel had just announced some PCI version of the x200 Xeon Phis, the 7220 cards. While googling around on where to get one I found a page from Exxact Corp, which offered a system with 8 such cards for around $20k or so. Got me curious ;-).

Unfortunately, when I tried to place an order I was told that those cards had been un-announced, and would not be released after all…. However; I could never forget this setup, and eventually found some cards on ebay. So after eventually scrounging some of those cards together (and initially having quite some trouble finding some motherboards that they would actually run in#@!!@), I finally got Exxact to sell me a no-cards-included version of that box they had listed earlier this summer. Once it arrived I popped in the cards, installed the matching software stack, and here we are – eight 7220 cards, each doing 2800-2850 H/s (running at about 80-85 degrees C), plus two low-end Xeons in the host …. a bit of linux magic to start it all automatically upon boot, and la-voila – we have a machine that DwarfPool and NiceHash say makes an average of 24kH/s, at a power draw of only about 2.4kW… (And all in all, I paid less than $12k for the parts!).
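As a back-of-the-envelope check on those numbers, here is a small sketch. The helper name per_unit is made up, and the inputs are simply the round figures quoted above (24kH/s, 2.4kW, and $12k as an upper bound on the parts cost).

```shell
#!/bin/sh
# Back-of-the-envelope figures for the 8-card box, using the round numbers
# from this post. per_unit is a hypothetical helper that divides its first
# argument by its second.
per_unit() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

echo "cards only: $((8 * 2800)) to $((8 * 2850)) H/s"   # before adding the host Xeons
echo "H/s per watt: $(per_unit 24000 2400)"
echo "dollars per H/s: $(per_unit 12000 24000)"
```

The eight cards alone account for 22400 to 22800 H/s, so the rest of the 24kH/s presumably comes from the two host Xeons; overall that is about 10 H/s per watt, at no more than about 50 cents of hardware per H/s.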

The pic above is from when I assembled it in my basement… had to actually pull power cords from two different rooms to not pop my circuit breakers, and had to eventually move it to a co-hosting data center due to “somewhat excessive noise” (think “jet engine”) … but still, that was one heck of a fun project!

Happy Mining!