Last week brought a flurry of reports about AMD's Translation Lookaside Buffer (TLB) bug and its impact on quad-core Opteron chips. According to media reports, AMD was preparing a kernel patch for the 64-bit version of Red Hat Enterprise Linux, Update 4. Unlike AMD's BIOS (Basic Input/Output System) fix and microcode update, which reportedly reduce performance by 10-20%, the Linux patch is said to cost less than 1% in performance. However, it was also reported that users would have to sign a non-disclosure agreement to get the patch.
After the reports were confirmed, AMD released the source code of the patch on the x86-64.org mailing list. However, the code is provided as-is and will not be updated, and it comes with a further warning that it is not really suitable for mainstream systems:
Due to the highly invasive nature of this patch and the very small number of users affected (you will know if your system is one of them), we do not recommend using this patch on regular Linux systems. This patch is not intended for mainstream users, and it is not a commercial Linux product! This patch has undergone only minimal functional testing. Each user must evaluate it before use to ensure that it meets the necessary quality standards.
In an earlier post to the same mailing list, AMD employee Elsie Wahlig also warned that the patch is "not recommended for upstream products." Wahlig mentioned that the patch was developed by AMD's Operating System Research Center for Linux 2.6.23.8, and provided a detailed description of the erratum:
The description of erratum 298 is as follows: "The processor operation to change the accessed or dirty bits of a page translation table entry in the L2 cache from 0b to 1b may not be atomic. A small window of time exists in which other cache operations may cause the stale page translation table entry to be installed in the L3 cache before the modified copy is returned to the L2 cache. In addition, if a probe for this cache line occurs during this window, the processor may not set the accessed or dirty bit and may corrupt data for an unrelated cache operation. The system may report an L3 cache protocol error through a machine check event. In this case, the MC4 status register (MSR 0000_0410) will contain B2000000_000B0C0F or BA000000_000B0C0F. The MC4 address register (MSR 0000_0412) will contain 26h."
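The register values quoted in the description can be turned into a simple signature check. The following is only a minimal sketch: the function name and constant names are hypothetical, and how the two MSR values are actually read from the hardware is deliberately left out.

```python
# Signature of the machine check event described for erratum 298.
# The MSR numbers and values are taken from the erratum text above;
# all names are illustrative only.
MC4_STATUS_MSR = 0x0000_0410
MC4_ADDR_MSR = 0x0000_0412

ERRATUM_298_STATUS_VALUES = {0xB2000000_000B0C0F, 0xBA000000_000B0C0F}
ERRATUM_298_ADDR_VALUE = 0x26


def matches_erratum_298(mc4_status: int, mc4_addr: int) -> bool:
    """Return True if the given MC4 register contents match the
    L3 protocol error signature quoted in the erratum description."""
    return (
        mc4_status in ERRATUM_298_STATUS_VALUES
        and mc4_addr == ERRATUM_298_ADDR_VALUE
    )
```

A match only indicates that a logged machine check looks like the one described. The absence of such an event does not show that a system is unaffected, since the erratum text says the error "may" be reported.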
Wahlig also described how the Linux patch works. It sidesteps the BIOS workaround and instead emulates the accessed and dirty bits so that the erratum is never triggered:
The kernel workaround relies on the root cause of the erratum, which involves L2 cache eviction. The problem only arises when the TLB needs to set an A or D bit in a page table entry. If the TLB never needs to set an A or D bit, the erratum cannot occur. By emulating the A and D bits using the present and writable bits, the patch ensures that the real A and D bits are always already set. This is accomplished by forcing a page fault when a page is first accessed and its emulated A bit is not set, and when a writable page is first written to and its emulated D bit is not set. The emulated A and D bits are stored in bits of the page table entry that are normally available for operating system use.
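The mechanism described above can be illustrated with a small user-space simulation. This is a minimal sketch and not AMD's patch: the hardware bit positions follow the standard x86-64 page table entry layout, while the choice of software bits, the SOFT_WRITABLE flag, and the function names are assumptions made for illustration.

```python
from typing import List, Tuple

# Hardware-defined bits of an x86-64 page table entry.
PRESENT = 1 << 0
WRITABLE = 1 << 1
ACCESSED = 1 << 5   # real A bit
DIRTY = 1 << 6      # real D bit

# Bits 9-11 are ignored by hardware and available to the OS.
# How the real patch uses them is not known here; this assignment is assumed.
SOFT_ACCESSED = 1 << 9    # emulated A bit
SOFT_DIRTY = 1 << 10      # emulated D bit
SOFT_WRITABLE = 1 << 11   # records that the mapping is logically writable


def make_pte(writable: bool) -> int:
    """Create an entry whose real A and D bits are preset, so the TLB
    never has to set them. PRESENT and WRITABLE start out clear in order
    to force a fault on the first access and on the first write."""
    pte = ACCESSED | DIRTY
    if writable:
        pte |= SOFT_WRITABLE
    return pte


def access(pte: int, write: bool) -> Tuple[int, List[str]]:
    """Simulate one memory access and the fault handling it triggers.
    Returns the updated entry and the list of faults that were taken."""
    faults: List[str] = []
    if not pte & PRESENT:
        # First access: the fault handler records the emulated A bit
        # and makes the page present.
        faults.append("not-present")
        pte |= SOFT_ACCESSED | PRESENT
    if write and not pte & WRITABLE:
        if not pte & SOFT_WRITABLE:
            raise PermissionError("write to a read-only page")
        # First write: the fault handler records the emulated D bit
        # and enables hardware writes.
        faults.append("write-protect")
        pte |= SOFT_DIRTY | WRITABLE
    return pte, faults
```

In this model, a first read of a page takes one fault and sets only the emulated A bit, a first write takes one further fault and sets the emulated D bit, and later accesses take no faults at all. That is consistent with the extra cost being paid once per page rather than on every access.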
AMD ended up releasing the patch more openly than initially expected, but the company has not given all Linux users a free pass to avoid the performance loss caused by the BIOS fix.