My rig with a BIOSTAR TB250-BTC board was constantly logging PCIe Bus Error messages to /var/log/kern.log and /var/log/sys.log. About 20 GB worth of log files!
Beyond the logging errors, I couldn’t run more than four GPUs until I applied the fix below. FYI, I am using five ZOTAC GeForce GTX 1060 AMP Edition (model: ZT-P10600B-10M) cards.
The solution: enable “Miner Mode” in the board’s BIOS settings.
- During boot, hold the Delete key until you enter the motherboard setup.
- Once in, navigate to: Chipset => Miner Mode => Set to [Enabled]
For reference, here’s the error that was filling my logs:
    pcieport 0000:00:1c.7: device [8086:a297] error status/mask=00000001/00002000
    pcieport 0000:00:1c.7: [ 0] Receiver Error (First)
    pcieport 0000:00:1c.7: AER: Corrected error received: id=00e7
    pcieport 0000:00:1c.7: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e7(Receiver ID)
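To get a feel for how fast these errors pile up, you could tally them per port. Below is a minimal Python sketch, assuming the kernel log is plain text containing lines like the ones above. Syslog timestamps in front are fine because the pattern is searched for, not anchored. `summarize_aer` and `AER_LINE` are hypothetical names, not part of any existing tool. It counts only the “PCIe Bus Error” summary line, because each event also produces the status/mask, bit-description and “AER: … received” lines, and counting all of them would overcount.

```python
import re
from collections import Counter

# Hypothetical pattern for the kernel's AER summary line, e.g.
# "pcieport 0000:00:1c.7: PCIe Bus Error: severity=Corrected, type=Physical Layer, ..."
AER_LINE = re.compile(
    r"pcieport (?P<port>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"PCIe Bus Error: severity=(?P<severity>[^,]+), type=(?P<type>[^,]+)"
)


def summarize_aer(lines):
    """Hypothetical helper: count PCIe Bus Error lines per (port, severity, type)."""
    counts = Counter()
    for line in lines:
        match = AER_LINE.search(line)  # search, so any timestamp prefix is ignored
        if match:
            counts[(match["port"], match["severity"], match["type"])] += 1
    return counts
```

For example, `with open("/var/log/kern.log", errors="replace") as log: print(summarize_aer(log).most_common())`. If the count keeps climbing between runs, the errors are still being written.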
Online research led me down a couple of paths that are NOT needed, both revolving around adding pci flags to the kernel command line in /etc/default/grub. The red-herring suggestions were:
- GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=nommconf"
- GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=nomsi"
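If you tried either flag before finding the BIOS fix, it’s worth checking that it isn’t still on your kernel command line. Below is a minimal Python sketch, assuming the usual Debian/Ubuntu layout of /etc/default/grub with straight double quotes around the value. `pci_options` is a hypothetical helper name. Like the shell, it lets the last uncommented `GRUB_CMDLINE_LINUX_DEFAULT` assignment win.

```python
import shlex


def pci_options(grub_text):
    """Hypothetical helper: return the pci= options on the last GRUB_CMDLINE_LINUX_DEFAULT line."""
    options = []
    for raw in grub_text.splitlines():
        line = raw.strip()
        if not line.startswith("GRUB_CMDLINE_LINUX_DEFAULT="):
            continue  # skips comments ("#GRUB_...") and every other setting
        value = line.split("=", 1)[1]
        # shlex.split removes the surrounding quotes the way the shell would;
        # joining and re-splitting then yields one kernel parameter per word.
        params = " ".join(shlex.split(value)).split()
        # pci= accepts a comma-separated list, e.g. "pci=nomsi,nommconf".
        options = [
            opt
            for param in params
            if param.startswith("pci=")
            for opt in param[len("pci="):].split(",")
        ]
    return options
```

For example, `pci_options(open("/etc/default/grub").read())` should return an empty list on a board fixed via Miner Mode. On Debian/Ubuntu, edits to /etc/default/grub only take effect after running `sudo update-grub` and rebooting. /proc/cmdline shows the command line the running kernel actually booted with.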
Lesson for the future: after building a rig, it is worth checking whether errors are being continuously written to the /var/log/ directory. You may not notice until you either run out of disk space or the problem finally shows up in a way that makes you investigate. In my case, that was adding a fifth GPU.
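One quick way to do that check is to list the biggest files under /var/log. Below is a minimal Python sketch, assuming you run it with enough permissions to stat the log files. `largest_logs` is a hypothetical helper name. Files and directories it can’t read are skipped rather than reported.

```python
import os


def largest_logs(root="/var/log", top=10):
    """Hypothetical helper: return the `top` largest files under root as (bytes, path), largest first."""
    sizes = []
    # os.walk silently skips directories it cannot list (its default onerror=None).
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                continue  # rotated away, broken symlink, or unreadable: skip it
    sizes.sort(reverse=True)
    return sizes[:top]
```

Running it twice a few minutes apart and comparing the sizes shows which files are still growing. A kern.log that gains megabytes per minute on an idle rig is a strong hint that something like the errors above is being logged.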