Since I wrote the first post on mining with a Phi 7220 setup, I’ve been flooded with questions, in particular of the form “what performance would I get on this or that other flavor of the Phi?”. For a while now, my reply has been “I have ordered a 7250 system, and will update as soon as I get it”.
Yesterday, my first order from Exxact (a 2U system with 4 nodes, each with a Phi 7250, 48GB of memory, a 150GB hard drive, and OPA cards) finally arrived. I’m still in the very first stages of getting it up and running (I do have a day job, so more work, and a full blog post, has to wait until the weekend), but I did install CentOS last night and ran a first test, so I can share the “first light” results: right out of the box (literally…), I’m currently getting 2350H/s per node (which would be roughly 9.4kH/s total across all four nodes; I have only installed one so far).
Now 2350H/s is not bad at all for a single node (in particular at that price), but the observant reader will immediately see that it is somewhat less than expected: from pure clock rate and core count I’d have expected close to 3kH/s per node, so at 2350 it’s performing about 20% lower than expected. For now, I do not know why the 7250 actually performs 5-10% lower than a 7210/7220 would (it has both more cores and a higher clock rate!). I’m still rather optimistic that it’s just something I haven’t properly configured yet (the most obvious difference is a very different kernel version from what my 7210s and 7220s use), so most likely it’ll eventually do even better than that.
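To make the arithmetic behind those numbers explicit, here is a minimal Python sketch. The measured and expected hashrates are the ones quoted above; the function names are made up for illustration, and the clock rates (1.4 GHz for the 7250, 1.3 GHz for the 7210) are assumed nominal base clocks that are not stated in this post. The "cores times clock" model is deliberately naive and ignores memory effects.

```python
# Figures quoted in the post (per node, out of the box on CentOS).
MEASURED_HS = 2350.0
EXPECTED_HS = 3000.0  # the rough "close to 3kH" expectation
NODES = 4             # nodes in the 2U chassis


def shortfall(measured: float, expected: float) -> float:
    """Return the fraction by which `measured` falls below `expected`."""
    return (expected - measured) / expected


def scaling_ratio(cores_a: int, ghz_a: float, cores_b: int, ghz_b: float) -> float:
    """Naive throughput ratio of chip A over chip B: cores x clock."""
    return (cores_a * ghz_a) / (cores_b * ghz_b)


if __name__ == "__main__":
    print(f"shortfall vs. expectation: {shortfall(MEASURED_HS, EXPECTED_HS):.1%}")
    print(f"chassis total at measured rate: {NODES * MEASURED_HS:.0f} H/s")
    # ASSUMPTION: 68 cores @ 1.4 GHz (7250) vs. 64 cores @ 1.3 GHz (7210).
    print(f"7250/7210 naive scaling: {scaling_ratio(68, 1.4, 64, 1.3):.3f}")
```

With these inputs the shortfall works out to a bit under 22%, which is the "about 20%" quoted above.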
Anyway, I had promised to share even the earliest results, so I wanted to drop this short note now rather than wait for the final performance after tuning. And I promise, I’ll write a full blog post on how to properly set up and configure those 7250 nodes as soon as I figure out what’s wrong with the default set-up.
- Downside: Using version 0.9.0, the phi 7250 performance (at least for now) is about 20% lower than expected
- Upside: I can confirm at least 2350H/s per node (which will already pay for the node in less than 12 months!)… and it can only get better 🙂
26 thoughts on ““First light” on Phi 7250 hashrate”
Thanks for your test! Did you ever test a 7290? It would be very interesting to see what it can do.
I don’t have any myself, and though I’d have access to some at my workplace I can’t use those for personal purposes, so I simply don’t know. If you ever do run one, please share results; I’d be happy to put up a page that lists performance on all models.
Actually, if you look at the specs:
– Xeon Phi 7250: 34 tiles / 68 cores (+ 4 tiles / 8 cores for yield recovery)
– Xeon Phi 7210: 32 tiles / 64 cores (+ 6 tiles / 12 cores for yield recovery)
It seems the die is the same or similar, and they just activated 2 extra tiles / 4 cores. Perhaps the fact that there are fewer cores for yield recovery somehow affects that. Basically, it may be that with 32/64 you hit the peak for the cryptonight pipeline, and beyond that you get a slowdown because of over-saturation…
Pure speculation 🙂
Good theory (I had thought of that, too, in the past 🙂 ), but the 7220s actually do scale in core count, so over-saturation likely isn’t the issue. Also, I tried running with fewer threads to reduce pressure, and that didn’t help…
Another user just posted that he confirms 3kH/s with Ubuntu 16, so likely it’s simply a kernel issue (I used the Intel-recommended CentOS for my test, and that usually has much older kernels :-/).
I have tested it on a 7250 and it can run at about 3kH/s on Ubuntu 16.04. On CentOS 7 I also got 2350H/s; the only difference was the kernel (OS).
THANK YOU for this update!
I had already suspected that the OS/kernel version is part of the issue (I also usually run Ubuntu 16.04, but had tried CentOS 7 on the latest 7250 run…), but good to hear somebody confirm that!
I have the same Exxact server, running Ubuntu Server 16.04, but can only get 2.4kH/s. How did you do it? Are there any BIOS or other tricks? Thanks a lot in advance for your help.
Could you tell me the price of the 7250 box?
It obviously depends on who you order from, what config you get, and how well you negotiate.
For me, so far I have two things ordered:
a) a 4×7250 system from Exxact; full system with SSDs, memory, OPA cards, etc.; ready to plug in, for something between $5.5k and $6k. That’s about $1,400 per node.
b) a 4×7210 “minimalistic” system from Asrock Rack (minimalistic meaning it includes the CPUs, but no memory, disks, etc., which I wouldn’t need), for about $3k. That’s pretty much exactly $750 per node (using 7210s, though, not 7250s).
Which dealer/website did you use for ordering the Asrock Rack 2U4N-F/X200? I searched and couldn’t find any useful info. Thx!
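The per-node figures above, and what the "pays for itself in less than 12 months" claim from the post would require, can be sketched in a few lines of Python. The prices are the ones quoted in this comment; the helper names are hypothetical, and no revenue figure is given anywhere in the post, so the sketch only computes the break-even threshold rather than an actual payback time.

```python
def per_node(total_usd: float, nodes: int = 4) -> float:
    """Price of one node in a chassis of `nodes` nodes."""
    return total_usd / nodes


def breakeven_monthly_revenue(node_cost_usd: float, months: float = 12.0) -> float:
    """Monthly revenue a node must earn to pay for itself in `months`."""
    return node_cost_usd / months


def usd_per_hs(node_cost_usd: float, hashrate_hs: float) -> float:
    """Purchase cost per H/s of hashrate."""
    return node_cost_usd / hashrate_hs


if __name__ == "__main__":
    print(f"Exxact 7250 node: ${per_node(5500):.0f} to ${per_node(6000):.0f}")
    print(f"Asrock 7210 node: ${per_node(3000):.0f}")
    # $1400/node and 2350 H/s are the figures quoted in the post and above.
    print(f"needed per month for 12-month payback: ${breakeven_monthly_revenue(1400):.2f}")
    print(f"cost per H/s on the 7250: ${usd_per_hs(1400, 2350):.3f}")
```

Note that the two systems are not directly comparable: the Exxact price includes memory, disks and OPA cards, while the Asrock price does not.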
I ordered directly from Asrock. I asked for permission to share the contact address, will update when I get a reply.
Look at the latest blog post re mining on the asrock rack systems – there’s an email to follow …
Can you answer why the 7120p’s and 5110’s are 5x slower? Is it the fewer cores and slower clock speed?
Those x100s are 5-year-old chips; they not only have lower peak flops, they also have a different instruction set, “smaller”/simpler cores, and in particular, no MCDRAM. In fact, it’s fascinating that with the right code those 5-year-old chips are still “comparable” to a modern GTX1070 in terms of monero performance…
Use the BIOS Luke. Усе the BIOS 😉
It’s about memory mode setup. Don’t forget that PCIe cards don’t have DDR4 memory. Only HB MCDRAM.
Probably you need to switch to CACHE or FLAT mode. And try to set All2All mode.
In extreme cases, I would remove all DDR4 memory.
LOL. “Yce” the bios … a _double_ insider joke, that’s cool ;-).
As to the BIOS: yes and no. Yes in that your suggestions are absolutely right: it’s about the MCDRAM, which by default should be in cache mode (I prefer quadrant over all2all, but it doesn’t matter), and the best and easiest way is indeed to simply rip out the DIMMs altogether. All 100% correct (you seem to know those KNLs! 😉 )… The ‘no’ comes in the sense of ‘already done, that alone won’t do it’.
“Apparently” there’s also an OS component (Ubuntu is about 10% faster than CentOS; hugepages verified enabled in both). Also, there seems to be _some_ additional BIOS influence that’s _not_ as simple as changing the MCDRAM mode: I have two “identical” K1SPE boards (with slightly different BIOS versions), and one performs around 15% higher than the other, with the same OS (in fact, the same disk, just moved over), same processor, etc. Verified with both 7210 and 7250. (And yes, both boards use the same BIOS settings wherever the same option is shown in both BIOS versions.)
Now do you have an answer for _that_ ?
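For anyone wanting to repeat the "hugepages verified enabled" check mentioned above, here is a minimal Python sketch that reads the standard Linux `/proc/meminfo` file. The function names are made up for illustration. It only looks at explicitly reserved huge pages (`HugePages_Total`), not at transparent hugepages, and it says nothing about whether the miner actually uses them.

```python
from pathlib import Path
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style 'Key:   value [kB]' lines into a dict."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a 'Key: value' line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def hugepages_enabled(fields: Dict[str, int]) -> bool:
    """True if at least one explicit huge page is reserved."""
    return fields.get("HugePages_Total", 0) > 0


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        info = parse_meminfo(meminfo.read_text())
        print("HugePages_Total:", info.get("HugePages_Total", 0))
        print("HugePages_Free: ", info.get("HugePages_Free", 0))
        print("enabled:", hugepages_enabled(info))
    else:
        print("/proc/meminfo not found (not a Linux system?)")
```

Running the same script on both the CentOS and the Ubuntu install gives a quick way to rule hugepage reservation in or out as the difference.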
Care to explain the YCE reference?
What is it about Knights Landing that makes it effective for cryptonight-based coins? Is it the MCDRAM or the AES-NI?
Do you expect similar efficiency with other crypto algorithms? Scrypt / ethash / lyra2 / blakes etc.? Why or why not?
“Yce”: in Cyrillic, the “У” (which looks like a Latin “Y”) is pronounced “u”, and the “с” is pronounced “s”, so coming from a Russian poster, “Yce the bios, luke” reads as “(y)use the bios, luke”. At least a double pun ;-).
As to the performance: it’s partly AES-NI, but even more the MCDRAM and vector units that make the difference. I’m currently working on ethash (that’s _so_ much bigger than cryptonight), and here again the vector units and MCDRAM should make a difference: currently doing 11.5MH/s on ethereum while not even using a fifth of the MCDRAM bandwidth yet (still very early code; not exactly optimal yet :-/).
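As a rough sanity check on the "fifth of the MCDRAM bandwidth" remark, the memory traffic implied by an ethash hashrate can be estimated in Python. The sketch assumes the standard ethash parameters (64 DAG accesses of 128 bytes per hash) and an MCDRAM peak of about 450 GB/s; that peak figure is an assumption for illustration, not a number measured or stated here, and the function names are hypothetical.

```python
ETHASH_ACCESSES = 64   # DAG reads per hash (standard ethash parameter)
ETHASH_MIX_BYTES = 128 # bytes fetched per DAG read

# ASSUMPTION: rough MCDRAM peak bandwidth in bytes/s, not a measured value.
ASSUMED_MCDRAM_PEAK = 450e9


def dag_bytes_per_hash() -> int:
    """Bytes of DAG data read for a single ethash hash."""
    return ETHASH_ACCESSES * ETHASH_MIX_BYTES


def dag_bandwidth(hashrate_hs: float) -> float:
    """DAG read traffic in bytes/s implied by a hashrate in H/s."""
    return hashrate_hs * dag_bytes_per_hash()


def bandwidth_bound_hashrate(peak_bytes_per_s: float) -> float:
    """Upper bound on hashrate if DAG reads were the only limit."""
    return peak_bytes_per_s / dag_bytes_per_hash()


if __name__ == "__main__":
    used = dag_bandwidth(11.5e6)  # the 11.5 MH/s quoted above
    print(f"DAG traffic at 11.5 MH/s: {used / 1e9:.1f} GB/s")
    print(f"fraction of assumed peak: {used / ASSUMED_MCDRAM_PEAK:.2f}")
    print(f"bandwidth-bound ceiling: {bandwidth_bound_hashrate(ASSUMED_MCDRAM_PEAK) / 1e6:.1f} MH/s")
```

Under these assumptions 11.5MH/s corresponds to roughly 94 GB/s of DAG reads, which is in the neighbourhood of a fifth of the assumed peak; the exact fraction depends on what bandwidth the MCDRAM really sustains.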
Really good stuff. Can you only mine CPU based coins with these or can GPU based coins also be mined profitably?
I have some ether code “mostly” ready, but can’t currently find the time to finish it up.
What’s the ether performance on the Phi? Thanks.
Had to start with completely new code to make good use of the vector units, so this is still “work in progress”. The first numbers I’m seeing are about 11ish MH/s, but that’s with code that’s not even close to optimal. Right now I’m only at a fifth or so of MCDRAM bandwidth, and most of the code is still horribly scalar, so it’s a pretty good sign that it’s already doing a double-digit number of megahashes. Will update when I get more time to look into this (right now I don’t 😦 ).
Yea the flexibility to mine other coins is keeping me on the fence from investing in the Exxact system. Electroneum seems to be a solid backup coin to mine if monero has any unexpected fallouts.
Luk, I’m running the Exxact 7250 servers but I’m only getting 2180H/s per node. I set the memory to cache mode. I’m on CentOS (about to try LukStick) but you mentioned you were getting 2350 w/ CentOS, so something still doesn’t seem right. Anything else I need to configure?
“cache” and “quadrant” are important in the BIOS, but those you should already have. I’ve had somebody once tell me he got different results with memory in vs. memory out (I don’t have memory, so I don’t know), but that _shouldn’t_ make a difference. One other difference of course could be the kernel and/or BIOS/firmware version…