02-11-2017 07:30 PM - last edited on 02-24-2017 09:55 AM by BiggAl
As a partial follow-up to the previous topic UEFI Bug - m900 Tiny - NVMe SSD - PCIe AER, I finally determined what was causing my NVMe drive to run at such high temperatures. It appears that ASPM is disabled in UEFI on the m900 Tiny with no visible way to enable it. This causes both the NVMe drive and the 8260 WiFi to run much hotter than they should. ASPM is enabled by default and working on the ThinkPad Yoga 260, which is why its temperatures are so much lower. With no ASPM, the NVMe drive runs 31C hotter at idle and the 8260 WiFi runs 21C hotter. Running that hot doesn't leave much headroom for the drive under actual use, and these temperatures were measured after already adding a heatsink to the NVMe drive. This issue happens under both Windows 10 and Linux 4.10.
I would suspect this wouldn't be too hard to fix in UEFI.
8260 WiFi: 28C (Yoga 260) vs 49C (m900 Tiny)
NVMe: 20C (Yoga 260) vs 51C (m900 Tiny)
Mod: "It would be helpful to hear from other customers with M900 Tiny systems with NVMe SSDs experiencing unexpected high temperatures and their observations." in this thread below.
Mod: edited Subject to add M.2 for greater clarity
02-14-2017 08:04 AM - edited 02-14-2017 08:04 AM
Here's a pertinent article/test by Puget Systems showing how an M.2 SSD will throttle under high loads and temperatures.
Ultra-fast drives like the Samsung 960 Pro M.2 drive are rated for absolutely amazing performance (up to 3,500 MB/s!), but one issue with these drives is that they will throttle when put under a heavy load. When that happens, the speed of the drive is greatly reduced and in a worst case situation may even end up slower than a standard SATA-based SSD.
Keep in mind that the throttling we saw was largely due to the controller on the M.2 drive overheating. Our testing was done with the system at idle, so if your system has a number of hot components and poor ventilation around the M.2 drive, a 960 Pro (or any M.2 NVMe drive for that matter) will likely throttle faster than what we recorded.
02-14-2017 09:47 PM - edited 02-14-2017 09:54 PM
The reason NVMe drives tend to throttle is that the PCIe controller on the SSD gets hot extremely quickly if ASPM is disabled/broken in the UEFI, as is the case in the m900 Tiny. Under Linux you can manually disable ASPM on a per-port/device basis if you want to, which makes it easy to see the difference in temperatures. Unfortunately, Linux can't override ASPM being disabled by UEFI before boot.
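For anyone wanting to compare their own machine, here is a minimal sketch for listing the ASPM state that `lspci -vv` reports for each PCIe link. The function name `aspm_summary` is made up for illustration, and the parsing assumes the usual `LnkCtl: ASPM ...;` line layout printed by pciutils, which may differ between versions.

```shell
#!/bin/sh
# Summarise per-device ASPM state from `lspci -vv` output read on stdin.
# Prints one line per PCIe link, e.g. "<bus:dev.fn>: ASPM Disabled".
# (aspm_summary is a hypothetical helper name, not part of pciutils.)
aspm_summary() {
    awk '
        # A line starting in column 0 is a device header; remember its address.
        /^[0-9a-f]/ { dev = $1 }
        # LnkCtl holds the currently programmed ASPM state (LnkCap only lists what is supported).
        /LnkCtl:/ {
            sub(/^[ \t]*LnkCtl:[ \t]*/, "")
            split($0, field, ";")
            print dev ": " field[1]
        }
    '
}

# Typical use (root is needed to read the capability registers):
#   sudo lspci -vv | aspm_summary
```

On kernels built with ASPM support, `cat /sys/module/pcie_aspm/parameters/policy` should also show which ASPM policy the kernel is applying.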
For the Yoga 260, which has working ASPM, you can disable ASPM in Linux via:
#00:1c.0 PCI bridge: Intel Corporation Device 9d10 (rev f1) (prog-if 00 [Normal decode])
setpci -s 00:1c.0 0x50.B=0x40
#00:1c.2 PCI bridge: Intel Corporation Device 9d12 (rev f1) (prog-if 00 [Normal decode])
setpci -s 00:1c.2 0x50.B=0x40
#00:1c.4 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 (rev f1) (prog-if 00 [Normal decode])
setpci -s 00:1c.4 0x50.B=0x40
#02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS522A PCI Express Card Reader (rev 01)
setpci -s 02:00.0 0x80.B=0x40
#04:00.0 Network controller: Intel Corporation Wireless 8260 (rev 3a)
setpci -s 04:00.0 0x50.B=0x40
#05:00.0 Non-Volatile memory controller: Intel Corporation Device f1a5 (rev 03) (prog-if 02 [NVM Express])
setpci -s 05:00.0 0x80.B=0x40
This results in the idle temperatures going from:
8260 WiFi - 28C -> 43C
NVMe - 20C -> 34C
So that is a 14-15C difference in temperature just from the one setting, with no other changes. The Yoga 260 is still cooler than the m900 Tiny, but that is likely due to differences in the way thermal dissipation is handled (and a 15 W vs 35 W CPU), and at that point it isn't really a big deal.
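To make the `0x40` in the setpci commands less opaque, here is a small sketch that decodes such a byte. It assumes the offsets written (0x50 and 0x80) are each device's PCIe Link Control register, i.e. the PCI Express capability plus 0x10. In that register, bits 1:0 are the ASPM Control field and bit 6 is Common Clock Configuration. The helper name `decode_link_control` is made up for illustration.

```shell
#!/bin/sh
# Decode the low byte of a PCIe Link Control register value.
# Bits 1:0 = ASPM Control, bit 6 = Common Clock Configuration.
# (decode_link_control is a hypothetical helper, not a pciutils command.)
decode_link_control() {
    value=$(( $1 ))                 # accepts hex (0x40) or decimal
    case $(( value & 3 )) in
        0) aspm="disabled" ;;
        1) aspm="L0s" ;;
        2) aspm="L1" ;;
        3) aspm="L0s+L1" ;;
    esac
    if [ $(( value & 0x40 )) -ne 0 ]; then clock="yes"; else clock="no"; fi
    echo "ASPM=${aspm} common_clock=${clock}"
}

decode_link_control 0x40   # value written by the setpci commands: ASPM=disabled common_clock=yes
decode_link_control 0x43   # same byte with both ASPM bits set:    ASPM=L0s+L1 common_clock=yes
```

So writing `0x40` clears both ASPM bits while leaving the common clock bit set. As far as I know, setpci also accepts capability-relative register names such as `CAP_EXP+10.b`, which would avoid hard-coding the 0x50/0x80 offsets, but treat that as an assumption and check it against your pciutils version.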
It would be really nice if this were fixed in UEFI, as adding a heatsink to an NVMe drive in particular may void its warranty; the sticker you have to remove to fit the heatsink says as much. Without a heatsink, my NVMe drive in an m900 Tiny idles in the warning range and goes into the critical range when used. An extra 14C of headroom would make a huge difference.
02-14-2017 11:40 PM
I ran Firmware Test Suite and noticed that there is a table in ACPI called 'LUFT', which appears to be an executable of some sort that has a way to enable ASPM, but I don't know how to actually run the program or how it is exposed to the system itself.
Anyone happen to know?
LUFT @ bc4c3840 (354786 bytes)
----
[000h 0000 4] Signature : "LUFT"
[004h 0004 4] Table Length : 000569E2
[008h 0008 1] Revision : 01
[009h 0009 1] Checksum : E3
[00Ah 0010 6] Oem ID : "LENOVO"
[010h 0016 8] Oem Table ID : "TC-FW "
[018h 0024 4] Oem Revision : 00001670
[01Ch 0028 4] Asl Compiler ID : "AMI "
[020h 0032 4] Asl Compiler Revision : 00010013
As shown here:
0d10: 09 cd 21 b8 01 4c cd 21 54 68 69 73 20 70 72 6f ..!..L.!This pro
0d20: 67 72 61 6d 20 63 61 6e 6e 6f 74 20 62 65 20 72 gram cannot be r
0d30: 75 6e 20 69 6e 20 44 4f 53 20 6d 6f 64 65 2e 0d un in DOS mode..

14fa0: 00 63 00 65 00 73 00 00 00 14 45 00 6e 00 61 00 .c.e.s....E.n.a.
14fb0: 62 00 6c 00 65 00 20 00 41 00 53 00 50 00 4d 00 b.l.e. .A.S.P.M.
14fc0: 00 00 14 45 00 6e 00 61 00 62 00 6c 00 65 00 20 ...E.n.a.b.l.e.
14fd0: 00 41 00 53 00 50 00 4d 00 20 00 73 00 75 00 70 .A.S.P.M. .s.u.p
14fe0: 00 70 00 6f 00 72 00 74 00 20 00 66 00 6f 00 72 .p.o.r.t. .f.o.r
14ff0: 00 20 00 61 00 6c 00 6c 00 20 00 74 00 68 00 65 . .a.l.l. .t.h.e
15000: 00 20 00 64 00 6f 00 77 00 6e 00 73 00 74 00 72 . .d.o.w.n.s.t.r
15010: 00 65 00 61 00 6d 00 73 00 20 00 70 00 6f 00 72 .e.a.m.s. .p.o.r
15020: 00 74 00 73 00 20 00 61 00 6e 00 64 00 20 00 65 .t.s. .a.n.d. .e
15030: 00 6e 00 64 00 70 00 6f 00 69 00 6e 00 74 00 73 .n.d.p.o.i.n.t.s
15040: 00 20 00 64 00 65 00 76 00 69 00 63 00 65 00 73 . .d.e.v.i.c.e.s
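The second dump shows the text stored as UTF-16LE (each ASCII character followed by a 00 byte). A rough sketch for pulling such strings out of a dumped table follows. The helper name `find_aspm_strings` and the file name `LUFT.bin` are made up, and it assumes the kernel exposes the table under its signature in `/sys/firmware/acpi/tables/`.

```shell
#!/bin/sh
# List printable strings mentioning ASPM in a binary that stores text as UTF-16LE.
# Dropping the NUL bytes collapses ASCII-range UTF-16LE text to plain ASCII;
# every remaining non-printable byte then becomes a line break.
# (find_aspm_strings is a hypothetical helper name.)
find_aspm_strings() {
    LC_ALL=C tr -d '\000' < "$1" | LC_ALL=C tr -c '[:print:]' '\n' | grep -i 'aspm'
}

# Typical use, assuming the table is exposed under its signature:
#   sudo cat /sys/firmware/acpi/tables/LUFT > LUFT.bin
#   find_aspm_strings LUFT.bin
```

This only lists the setup strings embedded in the table; it does not show how, or whether, the option can actually be switched on.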
02-19-2017 09:43 AM
Lenovo Engineering checked with the Development team and determined that ASPM is a feature for laptop battery life, and is not used in ThinkCentre design. As a result, this feature or the lack thereof would not be responsible for a change in temperature or behavior.
It would be helpful to hear from other customers with M900 Tiny systems with NVMe SSDs experiencing unexpected high temperatures and their observations.
02-19-2017 05:46 PM
Engineering was half-right: if ASPM is disabled on a laptop you will see EXTREMELY poor battery life. However, for PCIe devices, especially M.2 NVMe SSDs, you need working ASPM to keep them from overheating even in desktop systems. Over the past few years since NVMe SSDs came out, many vendors that previously hadn't have enabled ASPM support in their desktops for this reason. You may not see many other Tiny users complaining about this issue, as they would need an m900 Tiny to see the problem. The other Tiny systems do not support NVMe drives, and by the way, that is one of the main reasons for upgrading from the m700 Tiny to the m900 Tiny. Larger systems, if they support M.2 NVMe drives, also might have sufficient active cooling (fans) to keep the drives from overheating, though they would still use a lot more power than they should.
Also note that many of the references below are to power savings. More power usage directly corresponds to higher heat output, which in a Tiny case with no real airflow causes NVMe SSDs to quickly overheat. The last point below would also lead to much higher CPU temperatures at idle, since the CPU can't go into lower power modes when ASPM does not work.
The following review was from several years ago when the first NVMe SSDs were coming out, while using a Haswell test system, prior to ASPM being commonly enabled on desktop systems. Even at that point Intel's own Ivy Bridge desktop motherboards supported ASPM properly.
"Furthermore, we have a clear indication of at least one motherboard bug. PCI Express Active State Power Management (ASPM) is a feature that allows a PCIe link to be slowed down to save power, something that is quite useful for a SSD that experiences long idle periods. ASPM can be activated in just the downstream direction (CPU to device) or in both directions. The latter is what offers significant power savings for a SSD. Our testbed motherboard offers options to configure ASPM, but when enabling the more aggressive bidirectional ASPM level, it locks up very frequently. I tried to test ASPM on my personal Haswell-based machine with a different motherboard from a different vendor, but it didn't offer any option to enable ASPM.
Using a slightly older Ivy Bridge machine with an Intel motherboard, I was able to confirm that the 950 Pro doesn't have any issues with ASPM, and that it does offer significant power savings. However, I wasn't able to dig for further power savings on that system, and all of the power measurements reported with the performance benchmarks in this review were performed on our usual testbed with ASPM off, as it has been for all previous reviews.
Motherboard power management bugs are tragically common in the desktop space, and devices that incorrectly implement ASPM are common enough that it is seldom enabled by default. As PCIe peripherals of all kinds become more common, the industry is going to have to shape up in this department, but for now consumers should not assume that ASPM will work correctly out of the box."
Not supporting ASPM in UEFI apparently also keeps Skylake CPUs from entering the lower power states that Intel requires for long-term reliability/use. Comments on the thread noted that another vendor's laptop had similar problems until ASPM was enabled in its UEFI. I should also note that I cannot reach anything deeper than the PC3 power saving state with my m900 Tiny either. That problem does not occur on the Yoga 260, which has ASPM enabled in UEFI.
"I've now got a Skylake platform with a nice shiny NVMe device, so Serial ATA policy isn't relevant (the platform doesn't even expose a SATA controller). The deepest power saving state I can get into is PC3, despite Skylake supporting PC8 - so I'm probably consuming about 40% more power than I should be. And nobody seems to know what needs to be done to fix this. I've found no public documentation on the power management dependencies on Skylake. Turning on everything in Powertop doesn't improve anything."
The post above pertains directly to U/Y chips, which are laptop chips, but the same warning appears in the datasheet for the S chips, which is what the m900 Tiny uses.
"Caution: Long term reliability cannot be assured unless all the Low-Power Idle States are enabled"
03-07-2017 02:33 PM - edited 04-28-2017 06:41 AM
Over in the Gaming forum, a Y900 desktop (Skylake) with a Samsung 960 M.2 512GB in a built-in slot is running between 26 and 29C. It does have dual front fans, but the temps are lower than I would expect if ASPM is disabled.
03-09-2017 10:26 AM
After lurking around as a guest on these forums for a while I finally felt it was time to register, to share my experiences on this.
For a while now I have monitored the temperatures (see graph below) of the Samsung 950 NVMe disk running in a vertically free standing M900 Tiny desktop (always on, with latest BIOS and updates from Microsoft as well as Intel/Lenovo installed), in a room where the temperature is around 18-20°C. During this period the Tiny has mainly been used for surfing the web (Firefox with approximately 50 open tabs), streaming music (local and remote), and watching some on-line video clips (YouTube etc) from time to time.
In the graph you can see that the temperatures at idle (i.e. with only Firefox and CrystalDiskInfo running) are usually around 53°C (e.g. immediately after logging in; if Firefox was NOT running it could drop to 52°C). Surfing the web (depending on the site) quickly raises the temperature to 56-60°C, and watching in-page video clips etc. may raise it to 64-67°C. Higher temperatures have usually been reached in conjunction with cloning a disk image, and/or updating the BIOS and installing some updates. I wish I knew what caused the peak temperatures above 70°C (with 76°C around midnight on the fifth of March).
During the entire period the Microsoft NVMe driver (not Samsung's) has been in use.
On the first of March 2017 the Firmware of the Samsung 950 PRO NVMe disk was updated with the latest Firmware Version (2B0QBXX7), the BIOS to FWKT6AA, and the latest Intel drivers from Lenovo were applied.
I would estimate CPU utilization (according to System Explorer) on average to be below 6%, and usually only 1-2%.
Please let me know if there is anything else I can help you with concerning this issue.
Kind regards // ThinkErEr
03-12-2017 11:04 PM
I have had the system idle for the last 10 minutes or so. The NVMe Samsung SSD 950 is at 52C (125F), and the i7-6700T is at 37-38C.
03-20-2017 04:23 PM - edited 03-20-2017 10:31 PM
I had to RMA my Intel 600p 1TB because it died. I don't know if the temperature had anything to do with it. The new drive idles (running off a USB stick, with nothing on the SSD at all) at 63C. As soon as you do anything it goes above the throttle point, up to 75C, then throttling kicks in and the drive slows down a LOT to keep itself below 70C. Even just reads cause it to throttle immediately. Also note that the throttling causes a HUGE performance degradation, when the whole point of buying an m900 Tiny and an NVMe drive is higher performance. The drive also immediately starts logging to the SMART log that it is going over temperature while in the m900 Tiny.
Read performance over 300 seconds, as shown in the chart below:
ThinkCentre m900 Tiny - 165MB/s (51 GB, 47 GiB)
ThinkPad Yoga 260 - 694MB/s (212 GB, 198 GiB)
The ThinkCentre performance is particularly pathetic, as high-end hard drives (with platters!) can do more than that. Not enabling ASPM in the UEFI causes the drive to run at 23% of its normal performance.
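The percentage can be reproduced from the two read rates quoted above. This is a trivial sketch, and the helper name `throughput_percent` is made up:

```shell
#!/bin/sh
# Express one throughput as a percentage of another (one decimal place).
# (throughput_percent is a hypothetical helper name.)
throughput_percent() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", 100 * slow / fast }'
}

throughput_percent 165 694   # m900 Tiny vs Yoga 260 in MB/s, prints 23.8
```

That is in line with the roughly 23% figure, or put the other way round, the Yoga 260 reads about four times faster with the same model of drive.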
I have included a chart showing the temperature difference between the ThinkPad Yoga 260 and ThinkCentre m900 Tiny, both with the Intel 600p 1TB NVMe SSD drive in them. The first 10 seconds of the graph are at the idle temperature.