1. Server processor frequency
The main frequency of a server processor, also called the clock frequency, is measured in MHz and indicates the computing speed of the CPU. CPU clock frequency = base clock (external frequency) × multiplier. Many people think the clock frequency alone determines how fast a CPU runs. That view is one-sided in general, and for servers it is especially misleading. So far there is no fixed formula relating clock frequency to actual computing speed, and even the two major processor manufacturers, Intel and AMD, disagree strongly on this point. Judging from Intel's product development, the company clearly emphasizes raising its clock frequencies, as other processor manufacturers do. Yet in one comparison, a 1 GHz Transmeta processor was found to run about as efficiently as a 2 GHz Intel processor.
Therefore the clock frequency is not directly tied to a CPU's real computing power; it only indicates how fast the digital pulse signal inside the CPU oscillates. Intel's own products illustrate this: a 1 GHz Itanium chip performs roughly as fast as a 2.66 GHz Xeon/Opteron, and a 1.5 GHz Itanium 2 is about as fast as a 4 GHz Xeon/Opteron. The actual computing speed of a CPU also depends on the performance of its pipeline and other internal units.
Of course, the clock frequency is still related to actual computing speed; it is simply one aspect of CPU performance rather than a measure of the CPU's overall performance.
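A minimal sketch of the formula above (Python; the base-clock and multiplier values are hypothetical, not specs of any particular CPU):

```python
def clock_frequency_mhz(base_clock_mhz: float, multiplier: float) -> float:
    """Core clock in MHz derived from the base clock and the multiplier."""
    return base_clock_mhz * multiplier

# Illustrative combinations only.
for base, mult in [(100, 20.0), (133, 24.0), (200, 16.0)]:
    print(f"base {base} MHz x {mult} = {clock_frequency_mhz(base, mult):.0f} MHz")
```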
2. Server front-side bus (FSB) frequency
The front-side bus (FSB) frequency, i.e. the bus frequency, directly affects how quickly the CPU and memory can exchange data. The relationship can be calculated: data bandwidth = (bus frequency × data bus width) / 8. The maximum data transfer bandwidth therefore depends on the width of the data transferred simultaneously and the transfer frequency. For example, the current 64-bit Xeon Nocona has an 800 MHz front-side bus; by the formula, its maximum data transfer bandwidth is 6.4 GB/s.
The difference between the front-side bus frequency and the base clock (external frequency): the front-side bus speed refers to the speed of data transfer, while the base clock is the speed at which the CPU and the motherboard run in step. In other words, a 100 MHz base clock means the digital pulse signal oscillates one hundred million times per second, whereas a 100 MHz front-side bus means the CPU can accept up to 100 MHz × 64 bit ÷ 8 = 800 MB/s of data per second.
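A minimal sketch of the bandwidth formula above (Python; the frequencies and bus widths are just the two examples quoted in the text):

```python
def fsb_bandwidth_mb_per_s(bus_freq_mhz: float, bus_width_bits: int) -> float:
    """Peak bandwidth in MB/s: bus frequency (MHz) x bus width (bits) / 8."""
    return bus_freq_mhz * bus_width_bits / 8

print(fsb_bandwidth_mb_per_s(800, 64))  # Xeon Nocona FSB: 6400.0 MB/s = 6.4 GB/s
print(fsb_bandwidth_mb_per_s(100, 64))  # 100 MHz x 64-bit bus: 800.0 MB/s
```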
In fact, the arrival of the "HyperTransport" architecture has changed what the actual front-side bus (FSB) frequency means. We used to know the IA-32 architecture as requiring three important components: the Memory Controller Hub (MCH), the I/O Controller Hub, and the PCI Hub. Intel's typical chipsets, such as the Intel 7501 and Intel 7505, were tailor-made for dual Xeon processors; the MCH they contain gives the CPU a 533 MHz front-side bus frequency, and with DDR memory the front-side bus bandwidth can reach 4.3 GB/s.
However, as processor performance keeps improving, the system architecture faces many new problems. The "HyperTransport" architecture not only addresses them but also raises bus bandwidth more effectively. The AMD Opteron, for example, uses the flexible HyperTransport I/O bus architecture and integrates the memory controller, so the processor exchanges data with memory directly rather than passing through the system bus and chipset. In that case it is hard to say what the front-side bus (FSB) frequency of an AMD Opteron processor even refers to.
3. Processor base clock (external frequency)
The base clock (external frequency) is the reference frequency of the CPU, also measured in MHz. The CPU's base clock determines the running speed of the whole motherboard. Put plainly, on desktop machines what we call overclocking means raising the CPU's base clock (under normal circumstances the CPU multiplier is locked); that is easy to understand. For server CPUs, however, overclocking is absolutely off limits. As noted above, the CPU determines the speed at which the motherboard runs, and the two run synchronously. If a server CPU's base clock is changed by overclocking, the CPU and motherboard run asynchronously (many desktop motherboards tolerate asynchronous operation), and the whole server system becomes unstable.
In most current computer systems the base clock is also the speed at which memory and the motherboard run in step, so the CPU's base clock can be thought of as connecting directly to memory and keeping the two synchronized. The base clock is easily confused with the front-side bus (FSB) frequency; the difference between the two was explained in the front-side bus section above.
4. CPU bit and word length
Bit: digital circuits and computer technology use binary, whose only codes are "0" and "1". In the CPU, each "0" or "1" is one "bit".
Word length: in computer technology, the number of binary digits the CPU can process at once (in one operation) is called the word length. A CPU that can handle data with a word length of 8 bits is therefore called an 8-bit CPU; likewise, a 32-bit CPU can process 32 bits of binary data at a time. The difference between a byte and the word length: because common English characters can be represented with 8 binary bits, 8 bits are usually called a byte. The word length, by contrast, is not fixed and differs between CPUs. An 8-bit CPU can process only one byte at a time, a 32-bit CPU can process 4 bytes at a time, and a 64-bit CPU can process 8 bytes at a time.
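A minimal sketch of the byte/word-length relationship described above (Python; purely illustrative):

```python
def bytes_per_operation(word_length_bits: int) -> int:
    """How many 8-bit bytes a CPU of the given word length handles at once."""
    return word_length_bits // 8

for bits in (8, 32, 64):
    print(f"{bits}-bit CPU processes {bytes_per_operation(bits)} byte(s) at a time")
```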
5. Frequency multiplication coefficient
The multiplier is the ratio between the CPU clock frequency and the base clock. At the same base clock, a higher multiplier means a higher CPU clock frequency. In practice, however, a high multiplier by itself means little at a given base clock, because the data transfer speed between the CPU and the rest of the system is limited. A CPU that blindly chases a high multiplier to reach a high clock frequency runs into an obvious bottleneck: the maximum rate at which the CPU can obtain data from the system cannot keep up with its computing speed. Generally speaking, apart from engineering samples, Intel locks the multiplier on its CPUs, whereas AMD previously did not.
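A toy model of the bottleneck just described (Python; the numbers are hypothetical and the bus is simplified to one 64-bit transfer per base-clock cycle): raising the multiplier raises the core clock, but the bus bandwidth, which is tied to the base clock, stays the same.

```python
def core_clock_mhz(base_clock_mhz: float, multiplier: float) -> float:
    return base_clock_mhz * multiplier

def bus_bandwidth_mb_per_s(base_clock_mhz: float, bus_width_bits: int = 64) -> float:
    # Simplification: one transfer per base-clock cycle over a 64-bit bus.
    return base_clock_mhz * bus_width_bits / 8

base = 200  # hypothetical 200 MHz base clock
for mult in (10, 15, 20):
    print(f"multiplier {mult}: core {core_clock_mhz(base, mult):.0f} MHz, "
          f"bus bandwidth still {bus_bandwidth_mb_per_s(base):.0f} MB/s")
```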
6. CPU cache
Cache size is another important CPU indicator, and the structure and size of the cache have a large effect on CPU speed. The cache inside the CPU runs at a very high frequency, generally the same frequency as the processor, and it is far more efficient to access than system memory or the hard disk. In real workloads the CPU often needs to read the same blocks of data repeatedly, so a larger cache raises the hit rate for the data the CPU reads, letting it avoid trips to memory or disk and thereby improving system performance. However, because of die area and cost constraints, caches remain small.
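One common way to quantify why a higher hit rate helps is the average memory access time; the sketch below (Python) uses assumed latency figures purely for illustration:

```python
def amat_ns(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time = hit time + miss rate x miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns

# Assumed figures: 1 ns for a cache hit, 100 ns penalty for going to main memory.
for miss_rate in (0.10, 0.05, 0.02):
    print(f"miss rate {miss_rate:.0%}: average access time "
          f"{amat_ns(1, miss_rate, 100):.1f} ns")
```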
L1 Cache (level-one cache) is the CPU's first-level cache, divided into a data cache and an instruction cache. The capacity and structure of the built-in L1 cache have a significant impact on CPU performance. However, cache memory is built from static RAM with a complex structure, so with limited die area the L1 cache cannot be made very large. The L1 cache of a typical server CPU is usually 32 KB to 256 KB.
L2 Cache (level-two cache) is the CPU's second-level cache, which may be on-chip or off-chip. On-chip L2 cache runs at the same speed as the core clock, while off-chip L2 cache runs at only half the core clock. L2 capacity also affects CPU performance, and in principle bigger is better. The largest L2 cache on current home CPUs is 512 KB, while the L2 cache of server and workstation CPUs ranges from 256 KB to 1 MB, with some reaching 2 MB or 3 MB.
L3 Cache (level-three cache) comes in two forms: early designs were external, while current ones are built in. Its practical effect is that L3 cache further reduces memory latency and improves processor performance when computing on large amounts of data. Reducing memory latency and improving large-data computing capability also helps games. In the server field, adding L3 cache still brings a noticeable performance gain. For example, a configuration with a larger L3 cache uses physical memory more efficiently, so it can handle more data requests despite a slower disk I/O subsystem. Processors with larger L3 caches also provide more efficient file-system caching and shorter message and processor queue lengths.
In fact, the earliest L3 cache appeared on the K6-III processor released by AMD. Limited by the manufacturing process of the time, that L3 cache was not integrated on the chip but on the motherboard, and since it could only run at the system bus frequency, it was not much faster than main memory. Later, L3 cache was adopted by Intel's Itanium processor for the server market, followed by the P4EE and Xeon MP. Intel also planned an Itanium 2 processor with 9 MB of L3 cache and, later, a dual-core Itanium 2 with 24 MB of L3 cache.
Fundamentally, though, L3 cache is not decisive for processor performance. The Xeon MP with 1 MB of L3 cache, for example, is still no match for the Opteron, which suggests that increasing front-side bus bandwidth brings more effective performance gains than adding cache.
7. CPU extended instruction set
The CPU relies on instructions to perform calculations and control the system, and each CPU is designed with an instruction system matched to its hardware circuits. Instruction capability is therefore another important CPU indicator, and the instruction set is one of the most effective tools for improving microprocessor efficiency. In terms of today's mainstream architectures, instruction sets fall into two camps: complex instruction sets and reduced instruction sets. In terms of specific applications, Intel's MMX (MultiMedia eXtensions), SSE, SSE2 (Streaming SIMD Extensions 2) and SSE3, and AMD's 3DNow! are all CPU extended instruction sets that enhance the CPU's multimedia, graphics, and Internet processing capabilities.
We usually refer to a CPU's extended instruction sets as its "CPU instruction set". SSE3 is currently the smallest of these extensions: MMX contains 57 instructions, SSE 70, SSE2 144, and SSE3 only 13. SSE3 is also the newest of these instruction sets at present. Intel's Prescott processors already support SSE3, AMD will add SSE3 support to future dual-core processors, and Transmeta processors will also support it.
8. CPU core and I/O operating voltage
Starting with 586-class CPUs, the CPU's operating voltage is split into two parts: the core voltage and the I/O voltage. The core voltage is usually less than or equal to the I/O voltage. The core voltage is determined by the CPU's manufacturing process; generally, the smaller the process, the lower the core operating voltage. I/O voltages are generally 1.6 V to 5 V. Lower voltages help solve the problems of excessive power consumption and heat.
9. Manufacturing process
The micron (now nanometre) figure of a manufacturing process refers to the spacing between circuit elements within the IC. The trend in manufacturing processes is toward higher density: higher-density IC designs mean that a chip of the same size can hold denser circuitry and more complex functions. The main processes today are 180 nm, 130 nm, and 90 nm, and a 65 nm process has recently been announced.
10. Instruction set
(1)CISC instruction set
The CISC instruction set, also known as the complex instruction set (CISC, short for Complex Instruction Set Computer), is one in which each instruction of a program is executed serially in order, and the operations within each instruction are also executed serially. The advantage of sequential execution is simple control, but the utilization of the computer's various units is low and execution is slow. In practice this is the camp of the x86 series (i.e. IA-32 architecture) CPUs made by Intel and of compatible CPUs from AMD, VIA, and others; even the new x86-64 (also called AMD64) belongs to the CISC category.
To understand what an instruction set is, we have to start with today's x86-architecture CPUs. The x86 instruction set was developed by Intel for its first 16-bit CPU, the i8086. The CPU in the world's first PC, launched by IBM in 1981, was the i8088 (a simplified i8086) and also used x86 instructions, and the machine added an X87 chip to improve floating-point processing. The x86 instruction set and the X87 instruction set have since been referred to collectively as the x86 instruction set.
Although, as CPU technology developed, Intel successively introduced the newer i80386 and i80486, then the PII Xeon, PIII Xeon, and Pentium III, and finally today's Pentium 4 series and Xeon (excluding Xeon Nocona), all CPUs produced by Intel have kept using the x86 instruction set so that computers can keep running the applications developed over the years and preserve their rich software assets; its CPUs therefore still belong to the x86 series. Since Intel's x86 series and the compatible CPUs (such as the AMD Athlon MP) all use the x86 instruction set, today's huge lineup of x86 and x86-compatible CPUs took shape. x86 server CPUs today mainly comprise Intel server CPUs and AMD server CPUs.
(2)RISC instruction set
RISC stands for "Reduced Instruction Set Computing", i.e. a reduced instruction set. It grew out of the CISC instruction system: measurements of CISC machines showed that the usage frequency of different instructions varies widely. The most commonly used instructions are relatively simple ones that make up only about 20% of the instruction set yet account for about 80% of the instructions appearing in programs. A complex instruction system inevitably makes the microprocessor more complex, lengthening development and raising cost, and complex instructions require complex operations that inevitably slow the computer down. For these reasons, RISC CPUs were born in the 1980s. Compared with CISC CPUs, RISC CPUs not only streamlined the instruction system but also adopted so-called superscalar and super-pipelined structures, greatly increasing parallel processing capability.
The RISC instruction set is the direction of development for high-performance CPUs, and it stands in contrast to traditional CISC (complex instruction set) designs. By comparison, RISC has a uniform instruction format, fewer instruction types, and fewer addressing modes than a complex instruction set, so processing speed is greatly improved. At present, CPUs with this instruction system are common in mid-range and high-end servers; high-end servers in particular all use RISC CPUs. The RISC instruction system is better suited to UNIX, the operating system of high-end servers, and Linux is likewise a UNIX-like operating system. RISC-type CPUs are not compatible with Intel and AMD CPUs in either software or hardware.
At present, the CPUs that use RISC instructions in mid-to-high-end servers mainly include the following categories: PowerPC processors, SPARC processors, PA-RISC processors, MIPS processors, and Alpha processors.
(3)IA-64
There has been much debate over whether EPIC (Explicitly Parallel Instruction Computing) is the successor to the RISC and CISC systems; taken on its own, EPIC looks more like an important step for Intel's processors toward the RISC camp. In theory, a CPU designed around EPIC handles Windows application software much better, on the same host configuration, than UNIX-based application software.
Intel's server CPU built on EPIC technology is Itanium (development code name Merced), a 64-bit processor and the first in the IA-64 family. Microsoft also developed an operating system, code-named Win64, to support it in software. With Intel adopting this instruction set, the IA-64 architecture using the EPIC instruction set was born. IA-64 is a major improvement over x86 in many respects; it breaks through many limitations of the traditional IA-32 architecture and achieves breakthrough gains in data processing capability, system stability, security, usability, and manageability.
The biggest flaw of IA-64 microprocessors is their lack of x86 compatibility. To let its IA-64 processors better run software from the two earlier generations, Intel introduced an x86-to-IA-64 decoder in the IA-64 processors (Itanium, Itanium 2, ...) to translate x86 instructions into IA-64 instructions. This decoder is far from the most efficient, and it is not the best way to run x86 code (the best way is to run x86 code directly on an x86 processor), so Itanium and Itanium 2 perform very poorly when running x86 applications. This became the fundamental reason for the emergence of x86-64.
(4)X86-64 (AMD64 / EM64T)
Designed by AMD, x86-64 can handle 64-bit integer operations while remaining compatible with the x86-32 architecture. It supports 64-bit logical addressing, with the option of dropping to 32-bit addressing; data operation instructions default to 32-bit and 8-bit operands, with the option of extending to 64-bit and 16-bit; and for the general-purpose registers, the result of a 32-bit operation is expanded to the full 64 bits. Instructions thus distinguish between "direct execution" and "converted execution"; the instruction field is 8 or 32 bits, which keeps the field from becoming too long.
The creation of x86-64 (also called AMD64) did not come out of nowhere: the 32-bit address space of x86 processors is limited to 4 GB of memory, and IA-64 processors are not x86-compatible. AMD, with customers' needs fully in mind, enhanced the x86 instruction set so that it also supports a 64-bit computing mode; hence AMD calls the architecture x86-64. Technically, to perform 64-bit operations, the x86-64 architecture adds new general-purpose registers R8-R15 as an extension of the original registers, which continue to be used; the original registers such as EAX and EBX are widened from 32 to 64 bits, and eight new registers are added to the SSE unit to support SSE2. The increase in the number of registers brings a performance improvement. To support both 32-bit and 64-bit code and registers, the x86-64 architecture lets the processor work in two modes: Long Mode and Legacy Mode, with Long Mode divided into two sub-modes (64-bit mode and Compatibility mode). The standard was introduced in AMD's Opteron server processors.
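As an illustration of the 32-bit rule mentioned above (a 32-bit operation writes its result into the full 64-bit register, clearing the upper half), here is a minimal sketch in Python that only mimics the behaviour with bit masks; the register name and values are hypothetical:

```python
MASK32 = 0xFFFFFFFF

def exec_32bit_op(result32: int) -> int:
    """Model of the x86-64 rule: a 32-bit operation's result is zero-extended
    into the full 64-bit destination register (the upper 32 bits are cleared)."""
    return result32 & MASK32

rax = 0xDEADBEEF12345678             # hypothetical previous 64-bit register value
rax = exec_32bit_op(0x12345678 + 1)  # a modelled 32-bit ADD overwrites the whole register
print(hex(rax))                      # 0x12345679 -- the old upper 32 bits are gone
```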
This year Intel also launched its 64-bit EM64T technology. Before it was officially named EM64T it was called IA-32E; this is the name of Intel's 64-bit extension technology, used to distinguish it from the x86 instruction set. Intel's EM64T supports a 64-bit sub-mode similar to AMD's x86-64 technology: it uses 64-bit flat linear addressing, adds 8 new general-purpose registers (GPRs), and adds 8 more registers to support SSE instructions. Like AMD's, Intel's 64-bit technology is compatible with IA-32 and IA-32E; IA-32E is used only when running a 64-bit operating system and consists of two sub-modes, a 64-bit sub-mode and a 32-bit sub-mode, which, as with AMD64, are backward compatible. Intel's EM64T will be fully compatible with AMD's x86-64 technology. The Nocona processor already includes some of this 64-bit technology, and Intel's Pentium 4E processor also supports it.
It should be said that both are 64-bit microprocessor architectures compatible with the x86 instruction set, but there are still some differences between EM64T and AMD64: the NX bit present in AMD64 processors is not provided in Intel's processors.
11. Superpipeline and superscalar
Before explaining superpipelining and superscalar execution, let us first look at the pipeline. The pipeline was first used by Intel in the 486 chip and works like an assembly line in industrial production. In the CPU, an instruction-processing pipeline is built from 5 or 6 circuit units with different functions; an x86 instruction is split into 5 or 6 steps that these units execute in turn, so that one instruction can complete in each CPU clock cycle, increasing the CPU's computing speed. Each integer pipeline of the classic Pentium has four stages: instruction prefetch, decode, execute, and write-back; the floating-point pipeline has eight stages.
Superscalar execution means building in multiple pipelines so that several instructions execute at the same time; in essence it trades space for time. Superpipelining means refining the pipeline into more stages and raising the clock frequency so that one or more operations complete in each machine cycle; in essence it trades time for space. The Pentium 4's pipeline, for example, is 20 stages long. The more finely the pipeline is divided into stages, the less work each stage does per cycle, so the design can run at a higher clock frequency. However, an overly long pipeline also has side effects: a CPU with a higher clock frequency may well have lower actual computing speed. This is the case with Intel's Pentium 4: although its clock frequency can exceed 1.4 GHz, its computing performance falls well short of AMD's 1.2 GHz Athlon and even the Pentium III.
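A rough back-of-the-envelope sketch of the pipelining and superscalar ideas above (Python; the stage and instruction counts are made up, and real CPUs are further complicated by stalls and branches):

```python
import math

def cycles_unpipelined(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through all stages before the next one starts."""
    return n_instructions * n_stages

def cycles_pipelined(n_instructions: int, n_stages: int, width: int = 1) -> int:
    """Ideal pipeline: fill once, then retire `width` instructions per cycle."""
    return n_stages + math.ceil(n_instructions / width) - 1

n, stages = 100, 5
print(cycles_unpipelined(n, stages))   # 500 cycles with no pipelining
print(cycles_pipelined(n, stages))     # 104 cycles with a scalar 5-stage pipeline
print(cycles_pipelined(n, stages, 2))  # 54 cycles with an ideal 2-wide superscalar
```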
12. Package form
CPU packaging is a protective measure in which the CPU die or CPU module is encapsulated in specific materials to prevent damage; a CPU generally must be packaged before it can be delivered to users. The packaging method depends on how the CPU is installed and on the device integration design. Broadly speaking, CPUs installed in Socket-type sockets usually use PGA (pin grid array) packaging, while CPUs installed in Slot x slots use SEC (Single Edge Contact) cartridge packaging. There are also packaging technologies such as PLGA (Plastic Land Grid Array) and OLGA (Organic Land Grid Array). With market competition growing ever fiercer, the main direction of CPU packaging technology today is cost reduction.