Wheels within wheels… or better: Phis within Phis?

Another weekend, another article – this time about a special from-scratch “build” for a cute little desktop machine that contains a bootable Xeon Phi 7250 as its main processor, and – to top it off – also some Phi 7220A KNL cards… so “wheels within wheels”, indeed!

This article is actually kind of long overdue – part of it started as far back as early 2017(!), and even when it comes to writing I had initially planned an article for as early as January (as the time stamps on some of the photos indicate), but then forgot about it… though truth be told, the second half of the article about adding the cards (which is likely the more fun one) didn’t happen until more recently, anyway. So, better late than never.

Now before I write anything else, let me say up front that I’m not sure how “practical” this article will be for the average reader. There was a ton of work in this build, most of the pieces are insanely hard to get, and all in all I don’t think it was the most profitable use of my time… but it’s also been a huge amount of fun, so I’m not ruing it, either.

How did this all start …

The base motivation for this particular build was exactly the same as what several others have since experienced: I found some cheap KNL “ES/engineering sample” processors on eBay (7250s, I think), thought that this would be a very cheap and easy way of building my own Phi box, and just bought them. After all, $300 for a $2k list price processor? And all one needs is a motherboard, plug it in, and done, right? How hard could this be? In particular, that was in early 2017, before all the pricing specials came out (and when coin prices were sky high), so this sounded like a really good deal.

Now once I had these Phis, the “get a motherboard and plug it in” part turned out to be a little bit harder than I had thought – back then, there were only two boards available that could take those chips: the “Adams Pass” server boards for the 2U4N Adams Pass servers, and the Supermicro “K1SPE” desktop boards that are also in those Supermicro SYS5031 KNL Desktop systems (and in the Colfax “Ninja” DAP systems – which were super-cool, by the way). Of those, the AP systems were near impossible to get in barebone form, and though I finally found one single node (for a horrendous price!) from some Canadian reseller, I never managed to get a chassis for it, and never figured out how to power up that board without said chassis (which has the power connectors for those nodes :-( ).

So, option 2 it is: Build a desktop machine (which was the preferred option, anyway, since you can actually put that under your desk without going deaf). Now for that I needed a Supermicro K1SPE board, which turned out to be equally hard to find – apparently Supermicro tried to enforce some policy that those could be sold only in fully assembled systems (far too expensive), so finding one took forever – but eventually, I did.

OK, got a board, got a CPU, plug it in, power it up… except that you need a fan, and try to find one that fits this board… sigh. Tried leaving off the fans and just blowing lots of cool air on it, and that made it at least boot – but only for a few seconds, then it shut off (surprise). After lots and lots of googling I eventually found a distributor in China that sold a heat sink that claimed to be compatible with this socket, except it wasn’t. The combination of “steel saw”, “file”, and “lots of sweat” eventually did convince it to fit the board’s screw holes, but by that time I had taken the CPU in and out so often (it gets stuck to the heat sink via the cooling paste :-/) that I seem to have bent some pins, and the board wouldn’t boot any more. Oh shucks.

Now, at the end of 2017 I went to the SC17 (“Supercomputing 2017”) conference in Denver, and while there, also strolled past the “CoolIT Systems” booth: CoolIT was the company that built the water-coolers for the Colfax DAP systems (as the big sticker on those coolers very clearly states), and though they didn’t sell those coolers individually, they were – upon asking – kind enough to help me get one (well, maybe it didn’t hurt that I had worked with them in the past, in another capacity). Maybe not the most practical way for the average person to get hold of one, but either way – I got lucky, and got one.

So, now having a cooler, it was time to reactivate this build. Sent the board in for repair (another $80, plus shipping, plus hours on Supermicro’s support web page), and finally got it back. And this is where the story starts.

Part 1: Building a Phi Desktop System

OK, at this point – Jan 2018 – I had the following pieces: A Xeon Phi 7250 bought off eBay; a Supermicro K1SPE board; and a CoolIT water-cooler that fit this board.

So, step 1: Take out my brand-new(ly repaired) K1SPE board, and plop in the 7250:

Step 2: Temporarily attach the water-cooler, and screw it on (tip: it might be easier to do this after you put it into a case :-/) :

Now, to test it all out, hook up some PSU (I used an EVGA Supernova 850). The board connects as usual, but for the cooler I couldn’t find a good way to connect it, so I used a fan extension cable that plugs into a SATA molex:

Now, to make it boot, I first had to find and read the manual: Since the board is not a regular ATX board, it also doesn’t fit the regular power button etc. connectors from a regular ATX case… so I had to read the manual to find the pins that the power button would go to if I only had the right proprietary Supermicro case (you can see which two pins to short from the image below). Note it actually takes a while to boot – it doesn’t just boot up upon first touching those pins, you actually have to hold them shorted for a few seconds. Weird, but that’s what it is. Either way, eventually it booted.

Note I did add in a GPU as well – just for testing – but of course that’s not required, since the board does come with an on-board VGA graphics port (and in fact, for the BIOS config it’s easier if you use the VGA port, since the GPU “may” not yet be active to show it by that time). Also don’t get worried if it takes forever to boot – that is normal… it can be up to a few minutes until it beeps and enters BIOS, at least on my board. Either way, eventually it did boot, and I could enter the BIOS to change MCDRAM to cache/quadrant mode:

Interestingly my BIOS complained about something with KNM incompatibility – which doesn’t make the slightest sense because I have a Knights Landing CPU – not a Knights Mill one (at least as far as I know!?), so that was funny. Either way, it eventually did allow me to make the changes, anyway (just showing that for the fun of it :-) ), so everything turned out well.

Rebooted, added a harddisk, installed, and used this as a development machine for quite a while (initially even without a case – who needs a case, anyway!?).

Part 2: Putting it into a case

OK, though one doesn’t need a case, I thought that for the article it’d be way nicer to actually put it into one… so I took a Phanteks Enthoo case I had from another (GPU) build, and put it in there. Now why an actual section on that? Because as with everything “special built” it turned out it wasn’t all that easy, of course.

The problem with fitting it into a case is that the board is not an ATX form factor board. It has about the same dimensions – so it physically fits – but… the screw holes for mounting the board in the case are in different positions (at least some of them), so they don’t exactly line up. Now some do fit, so my first reaction was simply “I’ll just use the ones that fit” – after all, why would a board need more than two or three screws, anyway – it’s not going to fall out, is it? Well, let me give a clear warning here: The problem is not which board holes have no screws fitting them, but which “expected” screws have no holes on the motherboard: turns out the case has a few of those little standoffs (the thingies the screws are supposed to go into) in places where the board doesn’t have holes, and these do scratch the board from the bottom – which I only realized because it no longer wanted to boot (probably shorted something). Turns out I was lucky, and no lasting damage was done – but I definitely had to take the board out again, find all the “offending” standoffs, and take them out.

Second problem was that after taking out all the wrong ones there are so few screws left that the board touches the case in some areas (in particular when you press on it to insert PCI cards, etc.), so after the previous experience I decided to first cut out some insulation (the foam that comes in most motherboard shipping boxes), and put that in between board and case metal. Screwed in the remaining three or so screws that did have a fitting hole, and perfect. Squeezing in the water cooler (I put it at the top, so it can radiate out of the case’s top) was a bit tricky, but eventually – with some gentle nudges – it actually did fit. So here’s the complete system:

[Photo: IMG_0486]

Part 3: Wheels within Wheels

Now those first two parts were actually done in January 2018 (actually, over the Christmas/New Year’s holidays), and initially, I just put in some GPUs to use the machine as a plain old desktop box. But of course, if you already have some Phi-powered motherboard, wouldn’t it be fun to also use some Phi PCI cards in that? Phis within Phis?

Well, when I first built this machine I already had some 7220P passive Phi cards from eBay – but those turned out to be horribly hard to cool in a desktop setup (see related article), and though I did think of moving the turbo-fans from that desktop build over into the KNL box, they simply didn’t fit. So at least initially, I abandoned this idea – I was doubtful it’d even work, and I wasn’t going to spend all the work trying to rig something up that wouldn’t work.

Now later this year, as I wrote in another article, I finally got my hands on some actively cooled Phi PCI cards, and ever since then wanted to play around with trying those in this machine – if I could get a Z170 board to take one, maybe the Phi board would take one, too?

Well, some time last week somebody who also reads this blog asked me exactly this question: Do the K1SPE boards take Phi cards? And with that question, I finally decided to just try it out. And turns out, they do: First tried only one card, removed the old harddisk (which didn’t have MPSS on it), and tried with a regular lukStick… and turns out, it worked like a charm. Now getting bold, I also plugged in the second card, and lo and behold, it still works. Amazing – all kinds of boards have issues with those cards, and the one board that takes two cards without the slightest hitch probably wasn’t even ever intended to work with them.

Here is a picture of the full system: Note I used an old Celeron plastic box as a “spacer” to improve airflow into the cards (else they are back to back).

[Photo: img_0492.jpg]

With this experiment successful, only one thing had to be fixed: The existing luk-mpss-knl miner for MPSS offload actually expected a regular (i.e., non-KNL) host CPU, and thus only got a low hash rate on the host CPU. Had to build a special version that ported the MPSS offload also into the luk-phi miner, but with that, lo and behold, we now have ca 9 kH in that one box: ca 2.6k from the host CPU (it’s CentOS, after all, and runs the MPSS offload), and two times 3.2kH from the cards… which is nigh on perfect:
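As a quick sanity check on that total, here is a minimal sketch that just adds up the approximate rates quoted above. The variable and function names are made up for illustration and are not part of any lukMiner tool; the inputs are the rounded “ca” figures from the text, not fresh measurements.

```python
# Approximate hash rates as quoted in the text (kH/s); rounded figures.
HOST_KH = 2.6    # Xeon Phi 7250 host CPU
CARD_KH = 3.2    # each Phi 7220A card
NUM_CARDS = 2


def total_hashrate(host_kh, card_kh, num_cards):
    """Return the host rate plus the combined rate of all cards."""
    return host_kh + card_kh * num_cards


total = total_hashrate(HOST_KH, CARD_KH, NUM_CARDS)
print(f"total: {total:.1f} kH/s")  # prints "total: 9.0 kH/s"
```

So 2.6 + 2 × 3.2 comes out at about 9.0 kH, matching the “ca 9 kH” above.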

OK, that’s it for today – really have to get some food now…. I do want to stress again that this is probably not the most practical build to try and replicate – but I sure do hope you’ll enjoy it, anyway… it’s been super fun to build, so hope you’ll have some fun just by reading.

With that: Happy Mining!

PS: One of the fun things of this build is that it’s still using the EVGA Supernova 850G: An 850W power supply for a total of three KNLs (of 300W TDP each), plus board, case, fans, etc…
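To put numbers on why that is fun, here is a small sketch of the nominal power budget. It only uses the figures from the PS (850W supply, three KNLs at 300W TDP each); the names are illustrative, and TDP is a rated thermal figure, not the draw actually measured at the wall, so treat this as a back-of-the-envelope estimate.

```python
# Figures as quoted in the PS; TDP is a nominal rating, not measured draw.
PSU_WATTS = 850
TDP_WATTS = 300   # per KNL (one host CPU plus two cards)
NUM_KNL = 3


def tdp_headroom(psu_watts, tdp_watts, count):
    """Return PSU rating minus combined TDP (negative means over budget)."""
    return psu_watts - tdp_watts * count


headroom = tdp_headroom(PSU_WATTS, TDP_WATTS, NUM_KNL)
print(f"combined TDP: {TDP_WATTS * NUM_KNL} W, headroom: {headroom} W")
# prints "combined TDP: 900 W, headroom: -50 W"
```

On paper the three KNLs alone are already 50W over the supply’s rating, before counting board, fans, and disk.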

 

Published by

lukMiner

To learn more about me, look at the "About" page on http://lukminer.org

6 thoughts on “Wheels within wheels… or better: Phis within Phis?”

  1. Why are Phi 7220A KNL cards so picky about motherboards? Do they use some special features of PCIe that are not present in most CPUs/chipsets/BIOS?

    1. I don’t fully understand that yet, myself. At least part of the reason is that they require above-4GB decoding, which not all board BIOSes offer, in particular some older ones. Even for those that do offer that feature, another common problem with all PCI cards (GPUs included) is the number of PCI lanes different models require – I strongly suspect that the main reason the Z170 boards take 1 but not 2 of those cards is exactly that issue (the Celeron CPU doesn’t have enough lanes to cover 2×16). Part of it may also be driver related – since these are discontinued products there is no driver support other than what already runs – for example, I have a 390 based board in which the cards seem to come up just fine, but in which my lukStick kernel doesn’t recognize the Ethernet port – and I can’t update the lukStick kernel because then the driver wouldn’t work any more :-/. All that said, there are now a fair number of people who have gotten them to work just fine – so just the fact that I know of _some_ boards that don’t work doesn’t mean that they never do, either.

    1. Well, I was lucky and got a lot of them; but I don’t have enough power to run them. I’ve already powered down all my older GPU miners (the Phis are way more profitable), but even with that there’s only so much current you can draw in an American house before the breakers start popping. I could show you some pictures of two of my wall outlets… completely melted! (unbelievable, if you come from Europe)
