Token Ring
Posts: 79
Registered: ‎12-01-2016
Location: US
Message 21 of 86 (2,240 Views)

Re: M900 Tiny: UEFI Bug - M.2 NVMe SSD & 8260 WiFi - ASPM disabled - Much hotter temperatures (t

[ Edited ]

Tonight I decided to pull the full PCI info on all the devices I have easily available to me. They all have ASPM support enabled in "PCIe Link Capability", even an 11-year-old system, which is not surprising since the specification requires it to be enabled. However, the older systems do not have ASPM support enabled in "PCIe Link Control", probably due to buggy hardware from that time, which is perfectly fine.

 

P965 - Broadwater - Jun 2006

P55 - Ibex Peak - Sep 2009

Z87 - Lynx Point - Jun 2013

 

AMD Steamroller (Kaveri) - Jul 2014

 

The last one is actually a Lenovo H50-55 Desktop system.

Red Hat - SSME
ThinkCentre m900 Tiny - ThinkPad W541 - ThinkPad Yoga 260
Token Ring
Posts: 79
Registered: ‎12-01-2016
Location: US
Message 22 of 86 (2,236 Views)


[ Edited ]

The following is what I sent to support tonight. Hopefully they can get clarification to the UEFI team soon.

 

==

 

What we know:

 

1. The Lenovo ThinkCentre m900 Tiny UEFI/BIOS is buggy: it has ASPM Support in "PCIe Link Capability" force-disabled.
  a. This causes all PCIe devices to run much hotter than they otherwise should. In the m900 Tiny these are the Intel 8260 wifi and the m.2 NVMe SSD.
  b. Without ASPM support, m.2 NVMe SSDs run about 30-40C hotter than they otherwise would. The heat increase for larger SSDs (1TB) is enough to kill them while doing writes.
  c. Since the UEFI/BIOS is shared across the entire m700/m800/m900 line, including full-size machines, this bug is probably affecting many more users than currently realize there is a serious issue.
  d. (Technician) noted that he believes the same UEFI/BIOS code is also shared with the new m710/m910 (Kaby Lake) line, which means this bug is likely present in them as well.
  e. However, there is a change in "M1AKT15A" for the m710/m910 that just says "Enhance Tiny NVME SSD support.", which might be related to this issue, but that is currently unclear.

 

2. As expected, changing the hidden UEFI/BIOS setting under "Setup -> Devices -> PCH-IO Configuration -> PCI Express Configuration -> PCI Express Root Port # -> ASPM Support" does not fix the problem.
  a. This is the wrong item to change.
  b. The default is "auto", which should let the firmware determine the value of "PCIe Link Control" on its own, so this option shouldn't need changing in the first place.
  c. (Technician) tested a beta UEFI/BIOS that exposed the ability to change the above value; as expected, it made no difference to the temperature of the m.2 NVMe SSD.
  d. This is because the item changes "PCIe Link Control", which cannot do anything since "ASPM Support" has already been completely disabled via "PCIe Link Capability".

 

3. There is no UEFI/BIOS "Setup" item to turn "ASPM Support" on and off at the "PCIe Link Capability" level, because disabling it on PCIe root ports is actually a violation of the PCIe Base Specification.
  a. The fact that it is disabled on the m900 Tiny and not on any other line of systems inspected so far leads me to believe this is an inadvertent bug in the PCIe platform init code for PCIe root ports.
  b. I do not believe this point has been communicated properly to the UEFI/BIOS team!

 

Solution:

 

1. Find where in the PCIe platform init code "ASPM support" is set for "PCIe Link Capability" on the PCIe root ports, and set it to "ASPM L0s L1" instead of "ASPM not supported".
  a. Note this does not actively enable "ASPM L0s L1" on all root ports; it just makes it possible to use it at all.
  b. The UEFI/BIOS then decides what "PCIe Link Control" should actively be set to when the setting from #2 above is "auto". Alternatively, it can be overridden via the hidden setting, as long as it's not already disabled via "PCIe Link Capability".
  c. No change should be needed for "Setup -> Devices -> PCH-IO Configuration -> PCI Express Configuration -> PCI Express Root Port # -> ASPM Support" since, as mentioned before, the default is "auto".
  d. For more details see the "PCI Express Base Specification Revision 3.0", Sections 5.4.1 and 7.8.6.

 

==
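To make the Capability vs. Control distinction in the message above concrete, here is a minimal Python sketch of how the two ASPM fields are laid out. It assumes the standard register layout (ASPM Support in bits 11:10 of Link Capabilities, ASPM Control in bits 1:0 of Link Control); the function names and the example register values are made up for illustration and were not read from an m900.

```python
# Decoding of the two ASPM fields discussed above (a sketch, not firmware code).
# Labels follow the wording lspci prints for these fields.
ASPM_SUPPORT = {
    0b00: "ASPM not supported",
    0b01: "ASPM L0s",
    0b10: "ASPM L1",
    0b11: "ASPM L0s L1",
}
ASPM_CONTROL = {
    0b00: "ASPM Disabled",
    0b01: "ASPM L0s Enabled",
    0b10: "ASPM L1 Enabled",
    0b11: "ASPM L0s L1 Enabled",
}


def lnkcap_aspm(lnkcap: int) -> int:
    """Extract the ASPM Support field (bits 11:10) from Link Capabilities."""
    return (lnkcap >> 10) & 0b11


def lnkctl_aspm(lnkctl: int) -> int:
    """Extract the ASPM Control field (bits 1:0) from Link Control."""
    return lnkctl & 0b11


def can_enable(lnkcap: int, requested: int) -> int:
    """States from `requested` that the port actually advertises in LnkCap."""
    return requested & lnkcap_aspm(lnkcap)


if __name__ == "__main__":
    healthy = 0x0C00  # hypothetical LnkCap value with ASPM Support = 11b
    broken = 0x0000   # hypothetical LnkCap value with ASPM Support = 00b
    for cap in (healthy, broken):
        print(ASPM_SUPPORT[lnkcap_aspm(cap)], "->",
              ASPM_CONTROL[can_enable(cap, 0b11)])
```

With the capability field at 00b, nothing requested through Link Control survives the mask, which is the situation described in point 2d.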

Red Hat - SSME
ThinkCentre m900 Tiny - ThinkPad W541 - ThinkPad Yoga 260
Token Ring
Posts: 79
Registered: ‎12-01-2016
Location: US
Message 23 of 86 (2,231 Views)


[ Edited ]

If you look at the Lenovo H50-55 PCI configuration in the pci.zip you will see the following below. Notice "PCIe Link Capability" is set to "ASPM L0s L1" for all ports as required by PCIe Base Specification. However only some of the devices have ASPM actually turned on via "PCIe Link Control" probably due to lack of proper support in hardware for those items.

 

The m900 Tiny has everything disabled which is the bug.

 

Note: This probably affects all m700/m800/m900 systems, not just the m900 Tiny, as they share the same UEFI/BIOS ROM file. It just happens to have killed my m.2 NVMe SSD in an m900 Tiny.

 

--

 00:03.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Root Port (prog-if 00 [Normal decode])

 

LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <512ns, L1 <64us
ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+

 

00:15.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Hudson PCI to PCI bridge (PCIE port 0) (prog-if 00 [Normal decode])


LnkCap: Port #247, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-

 

00:15.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Hudson PCI to PCI bridge (PCIE port 2) (prog-if 00 [Normal decode])


LnkCap: Port #2, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp-
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+

 

01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)


LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+

 

04:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8821AE 802.11ac PCIe Wireless Network Adapter


LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <64us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+

L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
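For anyone who wants to compare their own machine against the listing above, here is a rough Python sketch that pulls the ASPM part of the LnkCap and LnkCtl lines out of `lspci -vv` text. The function name `aspm_report` and the regular expressions are my own; they are written against the line format shown in this post, and other lspci versions may format these lines slightly differently. Capabilities are normally only shown in full when lspci runs as root.

```python
import re

# Device header, e.g. "00:15.2 PCI bridge: ..." (optionally with a PCI domain).
BDF = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+(.*)$")
# ASPM part of the capability line, e.g. "ASPM L0s L1" or "ASPM not supported".
CAP = re.compile(r"^LnkCap:.*?\bASPM ([^,]+)")
# ASPM part of the control line, e.g. "ASPM Disabled" or "ASPM L0s L1 Enabled".
CTL = re.compile(r"^LnkCtl:\s*ASPM ([^;]+)")


def aspm_report(lspci_text):
    """Return one dict per device with its LnkCap and LnkCtl ASPM strings."""
    devices = []
    current = None
    for raw in lspci_text.splitlines():
        line = raw.strip()
        header = BDF.match(line)
        if header:
            current = {"bdf": header.group(1), "name": header.group(2),
                       "cap": None, "ctl": None}
            devices.append(current)
            continue
        if current is None:
            continue
        cap = CAP.match(line)
        if cap:
            current["cap"] = cap.group(1).strip()
        ctl = CTL.match(line)
        if ctl:
            current["ctl"] = ctl.group(1).strip()
    return devices


# Two entries copied from the H50-55 listing above.
SAMPLE = """\
00:03.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Root Port (prog-if 00 [Normal decode])
LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <512ns, L1 <64us
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
04:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8821AE 802.11ac PCIe Wireless Network Adapter
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <64us
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
"""

if __name__ == "__main__":
    for dev in aspm_report(SAMPLE):
        print(dev["bdf"], "| LnkCap:", dev["cap"], "| LnkCtl:", dev["ctl"])
```

A port whose `cap` comes back as "not supported" is the case this thread is about; a port with "L0s L1" in `cap` and "Disabled" in `ctl` is merely not using a capability it has.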

Red Hat - SSME
ThinkCentre m900 Tiny - ThinkPad W541 - ThinkPad Yoga 260
Token Ring
Posts: 79
Registered: ‎12-01-2016
Location: US
Message 24 of 86 (2,119 Views)


[ Edited ]

Tracked down when the bug was added. UEFI v5F.

 

CHANGES for FWKT5FA
  - [Important] Update includes security fixes.
  - Fix Windows Event Viewer reports WHEA_Logger Event ID 17 error with some kind of configurations.
  - Support IPV4 and IPV6 boot when Boot Mode is set as "UEFI Only".

 

It disabled PCIe ASPM and introduced the continuously recurring PCIe AER errors, probably as a result. I think the issue they were trying to work around was a buggy SSD causing PCIe AER errors, as mine appeared to be doing that before the 109 firmware update; after updating to 109 I only get the errors starting with v5F. However, disabling PCIe ASPM just made the situation much worse. The proper way to work around buggy ASPM in devices is to disable it via LnkCtl for the specific device, e.g. with a blacklist, not to disable it in the PCIe root port's LnkCap for every device in the system.
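As a rough illustration of the blacklist approach, the Python sketch below picks the ASPM Control value for one link from what both ends advertise, minus whatever a per-device quirk table forbids. This is a simplified model, not how any particular firmware or OS implements it; the `Port` class, the `ASPM_QUIRKS` table and all vendor/device IDs are hypothetical.

```python
from dataclasses import dataclass

L0S = 0b01
L1 = 0b10

# Hypothetical quirk table: (vendor_id, device_id) -> ASPM states to keep off.
ASPM_QUIRKS = {
    (0x1234, 0x5678): L0S | L1,  # made-up device with broken ASPM
}


@dataclass
class Port:
    vendor_id: int
    device_id: int
    aspm_support: int  # ASPM Support field from this port's LnkCap


def link_aspm(upstream: Port, downstream: Port, quirks=ASPM_QUIRKS) -> int:
    """ASPM Control value for one link.

    Only states advertised by both ends are candidates; states listed for
    the downstream device in the quirk table are then removed.
    """
    common = upstream.aspm_support & downstream.aspm_support
    blocked = quirks.get((downstream.vendor_id, downstream.device_id), 0)
    return common & ~blocked & 0b11


if __name__ == "__main__":
    root = Port(0xAAAA, 0x0001, L0S | L1)    # root port advertising L0s and L1
    good_ssd = Port(0xBBBB, 0x0002, L0S | L1)
    bad_ssd = Port(0x1234, 0x5678, L0S | L1)
    print(link_aspm(root, good_ssd))  # well-behaved device keeps ASPM
    print(link_aspm(root, bad_ssd))   # quirked device has it turned off
```

The design point is that the quirk only touches the link of the listed device. Clearing the capability on the root port instead makes `common` zero for everything behind it, well-behaved devices included.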

 

Running v5A instead of anything later (v5F - v70) causes my PCIe SSD to idle at 37C instead of 63C and my Intel 8260 wifi to idle at 34C instead of 51C. A difference of 26C on the SSD and 17C on the wifi!

 

Or a huge drive-eating regression with v5F+, depending on the point of view...

 

Now we just have to wait on engineering to actually fix their mess.

 

 

(Adjusted wifi temps after letting it idle longer, it cooled down more than expected)

Red Hat - SSME
ThinkCentre m900 Tiny - ThinkPad W541 - ThinkPad Yoga 260
Punch Card
Posts: 9
Registered: ‎02-25-2017
Location: SE
Message 25 of 86 (2,104 Views)


Hm, that is interesting, and a very good find. Remembering you mentioned your ruined NVMe SSD, and once again seeing that WHEA_Logger Event Fix mentioned amongst the CHANGES for FWKT5FA, brings back some old memories and observations from the past that might be relevant to share.

 

I started using the system (M900 Tiny) with my Samsung SSD 950 PRO 512GB around mid-August last year (2016) and only used it occasionally, hibernating in between, until around mid-December, when I installed the version of the BIOS mentioned above. Until that WHEA_Logger issue was finally fixed by a BIOS update around mid-February, and thereafter, I left the system always on, initially in order to avoid those annoying WHEA_Logger entries filling up the log file. I remember being puzzled by the fact that Samsung Magician showed suspiciously high values for Total Bytes Written. I never managed to figure out the cause at the time, so I started to monitor the growth by taking pictures and screenshots. During the autumn and until the WHEA_Logger issue was finally fixed in February, my system consumed 7TB of Total Bytes Written. From then until today, with the system in use more or less daily, it has consumed only an additional 0.2TB of writes and now shows 7.2TB of Total Bytes Written.

 

The above could be a strong indication that using one of those pre-February/FWKT6AA (?) BIOSes can ruin, or at least severely shorten the limited lifetime of, an NVMe SSD.

ThinkCentre M900 Tiny, i7, 16GB, Samsung 950 PRO M.2 NVMe 512GB
awd
Paper Tape
Posts: 1
Registered: ‎05-16-2017
Location: US
Message 26 of 86 (1,944 Views)


I was about to order an M710 Tiny with an M.2 NVMe SSD & 8265 WiFi, but now it sounds like this is a poor choice due to temperature stress. Really appreciate all the debug work documented here, but so frustrating to see that Lenovo has not yet addressed the problem.  Not being able to use an SSD reliably in this system is a shocking limitation and has me reconsidering other brands.

 

Any confirmation from other users that the new M710/M910 series also suffers from this problem, as suspected?

 

Any word on when/if this critical issue might be resolved?

Token Ring
Posts: 79
Registered: ‎12-01-2016
Location: US
Message 27 of 86 (1,890 Views)



awd wrote:

Any confirmation from other users that the new M710/M910 series also suffers from this problem, as suspected.

 

Any word on when/if this critical issue might be resolved??


I asked in another top-level thread for 'lspci' output from the m710/m910 but haven't heard from anyone yet. I think I may bump the post to get more eyes on it. Support thinks that they have the same problem but isn't completely sure. There is also still no ETA on a fix; meanwhile, they have already released another UEFI update without fixing it.

Red Hat - SSME
ThinkCentre m900 Tiny - ThinkPad W541 - ThinkPad Yoga 260
What's DOS?
Posts: 1
Registered: ‎05-19-2017
Location: DE
Message 28 of 86 (1,850 Views)


I got the same problem. Poor performance and high temperatures. 

M900 Tiny, i5, 250 GB 960 Evo & 1TB 850 Evo 

 

 

 

 

Token Ring
Posts: 79
Registered: ‎12-01-2016
Location: US
Message 29 of 86 (1,775 Views)


[ Edited ]

Apparently Lenovo has now blocked me from contacting email support. I tried from both email accounts I had contacted them with previously and both are blocked. They also did not answer their direct phone line. I had emailed them today to see if there was any further information available and it was bounced. I then emailed them from a third email account that they had previously not seen and it went through (no bounce), but with no response yet.

 

Very professional, guys.

 

I'd recommend requesting your money back if you have an m900 and intend to use an NVMe SSD; they very clearly seem to have no intent to actually fix this issue.

 

I'm sorry to have to inform you that your message could not
be delivered to one or more recipients. It's attached below.

For further assistance, please send mail to postmaster.

If you do so, please include this problem report. You can
delete your own text from the attached returned message.

                   The mail system

<ab***@lenovo.com>: host cluster9.us.messagelabs.com[216.82.251.41] said:
    553-Message filtered. Refer to the Troubleshooting page at
    553-http://www.symanteccloud.com/troubleshooting for more 553 information.
    (#5.7.1) (in reply to end of DATA command)

<at***@lenovo.com>: host cluster9.us.messagelabs.com[216.82.251.41] said:
    553-Message filtered. Refer to the Troubleshooting page at
    553-http://www.symanteccloud.com/troubleshooting for more 553 information.
    (#5.7.1) (in reply to end of DATA command)

<ew***@lenovo.com>: host cluster9.us.messagelabs.com[216.82.251.41] said:
    553-Message filtered. Refer to the Troubleshooting page at
    553-http://www.symanteccloud.com/troubleshooting for more 553 information.
    (#5.7.1) (in reply to end of DATA command)

<sm***@lenovo.com>: host cluster9.us.messagelabs.com[216.82.251.41] said:
    553-Message filtered. Refer to the Troubleshooting page at
    553-http://www.symanteccloud.com/troubleshooting for more 553 information.
    (#5.7.1) (in reply to end of DATA command)

 

The error message indicates that your email was blocked as spam by our Signaturing anti-spam filter. Please try the following steps to resolve the issue:

I. Check if your sending IP is on any spam lists, search for “spam database lookup”. You are looking for any 3rd party lists that may have received spam from your mail server. If your IP address is on any of these block lists, please make a removal request as soon as possible, once removed please retry sending your mail. 

II. Ensure your mail server is not open relay, search for “Email Open Relay Tester” and choose from any number of testers. 

III. If your internet line is provided by DSL or Cable that shares IP’s with residential users, please ensure your mail server sends to your ISP’s smart host instead of direct to the internet. This reduces the potential of your email being detected in error as coming from a Trojan infected home user machine. 

IV. Ensure the email you are sending does not contain any spam content (i.e. forwarded spam or ‘spamvertised’ URL’s).

V. Ensure your mail server is configured correctly.

VI. Ensure you have no virus infected machines on your network that are being used to send spam through your mail server.

VII. Ensure you have no exploitable web scripts on your web servers that could be abused to send spam. The most commonly used one is PHP contact scripts which spammers use to abuse the PHP “mail()” function, giving them free reign to send what they want.

VIII. Make sure any ‘opt-in’ newsletters contain an ‘opt-out’ link to be certain users can easily unsubscribe.

If the problem persists, please contact the recipient by other means and request that your email address be added to their Symantec Cloud approved senders list. Additionally, a legitimate email which has been incorrectly given a verdict of spam can be submitted to Symantec for analysis and filter review. Please review the False Positive submission article to submit a false positive message for analysis.

 

Red Hat - SSME
ThinkCentre m900 Tiny - ThinkPad W541 - ThinkPad Yoga 260
Token Ring
Posts: 79
Registered: ‎12-01-2016
Location: US
Message 30 of 86 (1,757 Views)


I finally got a response back via my third email account: they are still looking into the issue. They are discussing with AMI whether it's actually a spec violation to have ASPM disabled on the root port, implying they don't want to fix it if they can get away with it. My contact is going to be on vacation soon and I will be too, so I doubt I will hear anything else back until at least June 12th.

Red Hat - SSME
ThinkCentre m900 Tiny - ThinkPad W541 - ThinkPad Yoga 260