x100 MPSS Users: Major bugfix release!

To all KNC/MPSS users: Please immediately update to http://files.lukminer.com/lukMiner-0.9.2.tgz !

First of all: To all those that volunteered to be guinea-pigs and try the MPSS offload version I added in 0.9: A giant “mea culpa, mea culpa, mea maxima culpa”…. I hardly know how to write this without feeling like a complete ass, but that MPSS offload version did indeed have a major flaw that led to the KNC device code getting stuck in mining developer shares after 10 minutes, so to whoever ran that code: your KNC card(s) have been mining for me, and only for me, since 0.9 came out …. oh man, I don’t know what to say….

The reason I didn’t spot this sooner, even after two KNC/MPSS users reported unexpectedly low hashes for their accounts, is that this bug appears only in the MPSS offload version (not in cpu, phi, opencl, cuda, or even knc native mode)… and even when running in MPSS mode it only happened for the shares computed on the KNC (the CPU threads still mined for the user)… and even then, it’d happen only after a few minutes… and even then, you’d only “really” see it if you ran without the CPU threads… and even then, the outputs looked absolutely right… which made it all so hard to reproduce. But still, those are nothing but empty excuses; I did verify the bug existed, and now I do feel like said “complete ass”.

Now, what next? First of all: once again, mea culpa, that should not have happened. Also many, many thanks to those that reported this bug and still stuck with me… I owe you one. Next: to everybody that did run this MPSS version for a considerable time: let me know how much you think you should have made, and I’ll gladly reimburse you (it can’t be that much; the MPSS shares are rather small). In addition, as another sign of how sorry I am, I just changed the KNC miner share from 4% to 1%, indefinitely. And finally: if you do intend to run in MPSS mode, please update your miner to 0.9.2 ASAP (here’s the link: http://files.lukminer.com/lukMiner-0.9.2.tgz).

Again, my most sincere apologies… I don’t know what else to say …

 

Update 2/11: Updated the link from “0.9.2rc2” (release candidate 2) to “0.9.2” (the actual release).

Published by

lukMiner

To learn more about me, look at the "About" page on http://lukminer.org

15 thoughts on “x100 MPSS Users: Major bugfix release!”

  1. First, no apology is required. You made it fully clear this was a work in progress. You owe me nothing if I’m making something from your hard work.
    We will still owe you in the end, and the reduction was greatly appreciated. Without lukMiner, there is no Phi mining.


    1. Thank you indeed! I am always amazed by the community of this blog’s readers: I have yet to see a _single_ negative comment or feedback in any of my posts or comments (and that in an age of hate-posts and trolls pretty much everywhere else!)…. and even in this instance, where the fault was clearly mine, only supportive comments. Highly appreciated!


  2. I totally agree with MrPoet, you owe me nothing.

    I have a question and an issue for you.
    The issue is that the -h/--help argument doesn’t work; the question is, how can I run the OpenCL version without using the CPU? I tried -t 0 but one core goes to 100% anyway. Is there an argument to select which OpenCL devices to use?

    Thanks again for your work

    PS: The KNC native miner dev fee is still 4%, so I keep using this version 🙂


  3. Thanks for reporting; just filed those to my bug tracker.
    – The “-h” is a bug – seems that broke in the rewrite for 0.9x. Will fix for next release.
    – The “-t 0” is the right thing to do for not using the CPUs – the reason one core remains running is likely the polling for results. I’ll see if I can reduce that, shouldn’t be too hard.
    – Dev share for the native version – sigh, yes, you’re right, I only changed the MPSS binary; will fix in next version!
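A minimal sketch of the kind of change the second point hints at, assuming the busy core comes from a host thread that repeatedly checks whether a result is available: the collector below blocks on a queue instead of spinning, so it sleeps until a result arrives. This is hypothetical Python, not lukMiner's code; `collect_results`, `run_demo` and the `STOP` sentinel are made-up names.

```python
import queue
import threading
from typing import List, Optional

# Hypothetical sketch (not lukMiner's actual code): instead of a host thread
# that spins asking "is there a result yet?", the collector blocks on a queue.
# queue.Queue.get() sleeps until an item arrives, so an idle collector costs
# essentially no CPU time.

STOP: Optional[int] = None  # sentinel telling the collector to exit


def collect_results(results: "queue.Queue[Optional[int]]", found: List[int]) -> None:
    while True:
        nonce = results.get()  # blocks without busy-waiting
        if nonce is STOP:
            break
        found.append(nonce)


def run_demo(nonces: List[int]) -> List[int]:
    results: "queue.Queue[Optional[int]]" = queue.Queue()
    found: List[int] = []
    collector = threading.Thread(target=collect_results, args=(results, found))
    collector.start()
    for nonce in nonces:   # stand-in for results reported by a device
        results.put(nonce)
    results.put(STOP)      # shut the collector down cleanly
    collector.join()
    return found


if __name__ == "__main__":
    print([hex(n) for n in run_demo([0x00022A19, 0x00022A1A])])
```

Running it prints `['0x22a19', '0x22a1a']`. A blocking wait like this (or a wait with a timeout) is one common alternative to polling; whether it fits the miner's device loop is an assumption.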


    THANKS!! I set up CentOS 7.3 with a Phi 3120A and an Intel i3 4130 using your tutorial; works really well!

    [15:18:41] knc device #0: share FOUND (nonce 0x00022A19)! (hash rate this thread = 541.797H/s)
    [15:18:41] submitting share w/ difficulty 15420
    [15:18:41] -> share *accepted*: 218/218 (100.00%) – total hashrate 612.97H/s (may take a while to converge)

    [root@localhost ~]# micinfo
    MicInfo Utility Log
    Created Tue Feb 13 15:16:25 2018

    System Info
    HOST OS : Linux
    OS Version : 3.10.0-514.el7.x86_64
    Driver Version : 3.8.3-1
    MPSS Version : 3.8.3

    Host Physical Memory : 3667 MB

    Device No: 0, Device Name: mic0

    Version
    Flash Version : 2.1.02.0391
    SMC Firmware Version : 1.17.6900
    SMC Boot Loader Version : 1.8.4326
    Coprocessor OS Version : 2.6.38.8+mpss3.8.3
    Device Serial Number : ADKC33300924

    Board
    Vendor ID : 0x8086
    Device ID : 0x225d
    Subsystem ID : 0x3608
    Coprocessor Stepping ID : 2
    PCIe Width : x16
    PCIe Speed : 5 GT/s
    PCIe Max payload size : 256 bytes
    PCIe Max read req size : 512 bytes
    Coprocessor Model : 0x01
    Coprocessor Model Ext : 0x00
    Coprocessor Type : 0x00
    Coprocessor Family : 0x0b
    Coprocessor Family Ext : 0x00
    Coprocessor Stepping : C0
    Board SKU : C0PRQ-3120/3140 P/A
    ECC Mode : Enabled
    SMC HW Revision : Product 300W Active CS

    Cores
    Total No of Active Cores : 57
    Voltage : 985000 uV
    Frequency : 1100000 kHz

    Thermal
    Fan Speed Control : On
    Fan RPM : 3000
    Fan PWM : 52
    Die Temp : 82 C

    GDDR
    GDDR Vendor : Elpida
    GDDR Version : 0x1
    GDDR Density : 2048 Mb
    GDDR Size : 5952 MB
    GDDR Technology : GDDR5
    GDDR Speed : 5.000000 GT/s
    GDDR Frequency : 2500000 kHz
    GDDR Voltage : 1501000 uV

    mic0 (temp):
    Cpu Temp: ……………. 81.00 C
    Memory Temp: …………. 64.00 C
    Fan-In Temp: …………. 42.00 C
    Fan-Out Temp: ………… 65.00 C
    Core Rail Temp: ………. 64.00 C
    Uncore Rail Temp: …….. 63.00 C
    Memory Rail Temp: …….. 63.00 C


    1. Nice! I remember “from the olden days” (when I worked on KNCs in my day job) how much of a challenge installing the MPSS stack could be, so glad those step-by-step instructions make it easier!
      In particular, I’m glad to hear from somebody who actually benefitted from it; it’s a surprising amount of work writing those articles, so hearing from somebody that it was actually useful is what makes it worth it! Enjoy!


  5. “I only changed the MPSS binary; will fix in next version!”

    Awesome! I’ve been looking forward to running native to see if there’s any difference from MPSS.


    1. There “shouldn’t” be any major difference, other than the network setup getting trickier (in native mode the KNCs themselves need their networking set up, while in MPSS mode only the host does).

      Fair warning, though: Those KNCs are _not_ as well tested as the other platforms, because I only run one pair of them, in a machine I don’t often get to. In particular, one user already reported that he _thinks_ that from time to time the MPSS version runs into some issues (to be more exact: when the miner restarts on dropped connection – nicehash, my bane – then “apparently” the performance after that restart is lower – probably two instances running on the KNC in parallel :-/). Will take a while to fix :-/


      1. I’ve noticed dropped connection issues on both 0.9.0 and 0.9.2. I’m not sure if it’s because it’s switching to the dev share, or if it runs into connection problems, and just returns to the dev share by default, but every time I’ve seen it, it looks like this:

        [12:58:49] semi-fatal error in serving stratum 1: could not read from socket!
        … cancelling active job and reconnecting
        [12:58:49] connecting to pool xmr-usa.dwarfpool.com:8080
        [12:58:49] semi-fatal error in serving stratum 2: could not read from socket!
        … cancelling active job and reconnecting
        [12:58:49] connecting to pool xmr-usa.dwarfpool.com:8080

        Could this be a problem with the pool you’re using? I can’t imagine that there’s different socket code for the dev side and the user side.

        Also, 0.9.0 would segfault quite a bit. Upgrading to 0.9.2 seems to have fixed that. I dumped a log:

        https://pastebin.com/Zfs38My4

        Probably irrelevant now, but I thought it worth mentioning. I can provide core dumps if necessary.

        I’m happy to help debug native mode a bit more. Instead of port forwarding, I’m going to bridge them on to the local ethernet and let them get an address from the network’s DHCP server. Then they’ll be accessible directly from anywhere on my network.

        Lastly, you listed hash rates for 3120 and 7120. I’d like to report that the 5120 averages 580H/s.


      2. Hey, Playa,

        Re the 5120 – just put it on the web page… thank you!

        Re the “connection errors” – that does indeed look like dwarfpool is frequently dropping the connections; however, in this case it’s actually nothing to be afraid of: unlike other miners, lukMiner doesn’t hold a single connection that it sometimes switches from devshares to user shares, but instead holds multiple connections at once and picks from those in parallel (depending on the miner share). If one of those drops, it will print that message and reconnect, but that does _not_ mean that the miner is _only_ connected to this connection! What I presume is happening (in particular since you seem to be using KNCs) is that for long stretches nothing is mined (or at least found) at all on the devshares, so dwarfpool says “Hey, there’s nothing happening on this connection, I’ll drop it”… and the miner simply reconnects as soon as the connection is detected as dead. That’s perfectly OK; it’s only the output that’s misleading.
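To make the “multiple connections at once” idea concrete, here is a toy model: both a user and a dev connection stay open, each found share goes to the dev connection with probability equal to the miner share (1% here), and a connection the pool dropped is reopened without touching the other one. Everything in it is an assumption for illustration (the `Connection` class, `maintain`, `route_share`, and random per-share selection as the picking rule); it is not lukMiner's implementation and opens no real sockets.

```python
import random
from typing import Dict


class Connection:
    """Stand-in for one stratum connection (hypothetical, no real sockets)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.reconnects = 0
        self.submitted = 0

    def drop(self) -> None:
        # e.g. the pool closed an idle connection ("could not read from socket")
        self.alive = False

    def reconnect(self) -> None:
        self.alive = True
        self.reconnects += 1


def maintain(conns: Dict[str, Connection]) -> None:
    # Reopen any connection detected as dead; the others keep running.
    for conn in conns.values():
        if not conn.alive:
            conn.reconnect()


def route_share(conns: Dict[str, Connection], dev_share: float,
                rng: random.Random) -> str:
    # Each share goes to the dev connection with probability dev_share,
    # otherwise to the user's connection.
    maintain(conns)
    target = "dev" if rng.random() < dev_share else "user"
    conns[target].submitted += 1
    return target


if __name__ == "__main__":
    rng = random.Random(42)
    conns = {"user": Connection("user"), "dev": Connection("dev")}
    conns["dev"].drop()  # pool drops the mostly idle dev connection
    for _ in range(10_000):
        route_share(conns, dev_share=0.01, rng=rng)
    for name, c in conns.items():
        print(name, c.submitted, "reconnects:", c.reconnects)
```

In this model, a dropped dev connection costs one reconnect and nothing else; the user connection's submissions are unaffected, which matches the point that the log message alone is misleading.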

        Re the core dumps in 0.9.0 – yes, that was a nasty bug: it did try the reconnect just as in 0.9.2, but the connection was refcounted, and when it threw the exception that refcount got decremented, so it would try to connect to a pool whose memory was already deallocated… which sometimes works, but often obviously doesn’t 🙂
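The bug pattern described above (an extra refcount decrement on the error path, followed by a reconnect through the now-freed object) can be modelled in a few lines. This is a hypothetical sketch: `PoolConn`, `serve_once_buggy` and `serve_once_fixed` are invented names, and since Python is garbage-collected, the `freed` flag stands in for what would be a real deallocation in C++. The fix shown is a balanced acquire/release around the reconnect; in C++, holding a `std::shared_ptr` copy across the reconnect gives the same guarantee.

```python
class SocketError(Exception):
    """Stand-in for 'could not read from socket'."""


class PoolConn:
    """Hypothetical manually refcounted pool connection.

    Python is garbage-collected, so the `freed` flag models what would be
    a deallocation in C++.
    """

    def __init__(self, url: str) -> None:
        self.url = url
        self.refcount = 1  # the owner's reference
        self.freed = False

    def acquire(self) -> None:
        if self.freed:
            raise RuntimeError("use after free")
        self.refcount += 1

    def release(self) -> None:
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True  # in C++ the object would be deleted here

    def connect(self) -> None:
        if self.freed:
            raise RuntimeError(f"use after free: connecting to {self.url}")


def serve_once_buggy(conn: PoolConn, fail: bool) -> None:
    try:
        if fail:
            raise SocketError("could not read from socket")
    except SocketError:
        conn.release()  # BUG: unbalanced release on the error path
        conn.connect()  # reconnect now touches a "freed" object


def serve_once_fixed(conn: PoolConn, fail: bool) -> None:
    conn.acquire()  # hold a reference for the whole call
    try:
        if fail:
            raise SocketError("could not read from socket")
    except SocketError:
        conn.connect()  # safe: the owner's reference is still counted
    finally:
        conn.release()  # balanced with acquire(), on every path
```

With the fixed version the refcount returns to 1 after a failed read, so the owner's reference stays valid; the buggy version frees the object and then raises on the reconnect.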


  6. Just discovered lukMiner, and thank you for your work.
    Is it possible to select which device to use?
    I have a GPU rig mining many cryptocurrencies, so it would be nice to select the device.


    1. Hey, Christian,

      Yes, you can. Apparently the help output is somewhat broken in the current version (I’ll fix that in the next version), but ’til then:
      “-cl 2,4,5” or “--cl-devices 2,4,5” would tell the miner to (only) use CL devices 2, 4, and 5 (you’ll see in the output which ones those are).

      With “-t 0” you can force it to _not_ use the host CPUs (though usually, why not?).
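For readers wondering how a comma-separated device list like “2,4,5” is typically turned into device indices, here is a small argparse sketch. Only the flag spellings `-cl`/`--cl-devices` and `-t` come from the reply above; the `device_list` validator, the `threads` destination name and the `prog` name are hypothetical, and this is not lukMiner’s actual argument parser.

```python
import argparse
from typing import List


def device_list(text: str) -> List[int]:
    # Hypothetical validator: "2,4,5" -> [2, 4, 5]; rejects empty or
    # non-numeric entries such as "2,,5" or "a".
    ids = []
    for part in text.split(","):
        part = part.strip()
        if not part.isdigit():
            raise argparse.ArgumentTypeError(f"invalid device id: {part!r}")
        ids.append(int(part))
    return ids


def parse_args(argv: List[str]) -> argparse.Namespace:
    p = argparse.ArgumentParser(prog="miner-sketch")  # hypothetical name
    p.add_argument("-cl", "--cl-devices", type=device_list, default=None,
                   help="comma-separated OpenCL device indices to use")
    p.add_argument("-t", dest="threads", type=int, default=None,
                   help="number of CPU mining threads (0 = none)")
    return p.parse_args(argv)


if __name__ == "__main__":
    ns = parse_args(["-cl", "2,4,5", "-t", "0"])
    print(ns.cl_devices, ns.threads)
```

With `-cl 2,4,5 -t 0` this yields the device list `[2, 4, 5]` and zero CPU threads; a `None` device list would mean “no restriction” in this sketch.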

