Common server failures
1. The main reasons why the server cannot start:
Mains or power line failure (power outage or poor contact)
Power supply or power module failure
Memory failure (usually accompanied by an alarm beep)
CPU failure (usually accompanied by an alarm beep)
Motherboard failure
Interrupt conflicts caused by other plug-in cards
2. What should I check when the server cannot be started?
Check whether the power cord and all I/O cables are connected properly.
Check whether the motherboard receives power after the power cord is connected.
Reduce the server to its minimum configuration (a single CPU, the minimum amount of memory, and only the monitor and keyboard attached), then short the power-switch jumper on the motherboard directly to see whether the server starts.
Check the power supply: unplug all of its power connectors, short the green wire to a black wire on the power supply's motherboard connector, and see whether the power supply turns on.
If the power supply proves to be normal, troubleshoot by the replacement method: starting from the minimum configuration, swap in known-good parts one at a time, beginning with the easiest to replace (memory, then CPU, then motherboard).
3. Why does the system restart frequently?
Causes of frequent system restarts:
Power supply failure (diagnose and resolve by the replacement method)
Memory failure (can be detected from the BIOS error report)
Excessive data traffic on the network port (the workload is too high)
Software failure (resolve by updating or reinstalling the operating system)
4. Diagnosing and handling server crashes:
Server crashes are difficult to diagnose. The causes generally fall into two categories, software and hardware:
Software failure
Hardware failure
Software failure
First check the system log of the operating system, which can identify some of the causes of the crash.
Computer virus infection.
A crash caused by a bug or vulnerability in the system software. Reach this conclusion only after the hardware has been ruled out, and ask the software vendor for help.
Improper use of the software or an excessive system workload. Ask the customer to reduce the load on the server and see whether the problem goes away.
Hardware failure
Hardware conflict
Power supply failure or insufficient power (compare the rated output of the power supply with the calculated total power draw of all loads in the server)
Hard drive failure (scan the disk surface to check for bad sectors)
Memory failure (identify from the error report in the motherboard BIOS and the error messages of the operating system)
Motherboard failure (identify by the replacement method)
CPU failure (identify by the replacement method)
Add-in card failure (a SCSI/RAID card or other PCI device can also cause the system to crash; identify by the replacement method)
Note: After a system crash has been resolved, run a stress test (such as a sustained copy test) for a period of time to confirm that the fault is completely gone.
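The power comparison listed under hardware failures can be sketched as a small calculation. This is only an illustration: the Component type, the power_headroom function, the 20% safety margin, and all wattage figures are assumptions made for the example, not values from any particular server. Take real figures from the datasheets of the installed parts.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Component:
    name: str
    watts: float  # worst-case draw, taken from the component's datasheet


def power_headroom(psu_rated_watts: float, components: list[Component],
                   safety_margin: float = 0.2) -> float:
    """Return the spare watts left after reserving a safety margin.

    A negative result suggests the power supply may be undersized
    for the installed load.
    """
    total_load = sum(c.watts for c in components)
    usable = psu_rated_watts * (1.0 - safety_margin)
    return usable - total_load


if __name__ == "__main__":
    # Illustrative figures only; substitute real datasheet values.
    load = [
        Component("CPU 0", 95.0),
        Component("CPU 1", 95.0),
        Component("Memory", 40.0),
        Component("Hard drives", 60.0),
        Component("RAID card", 25.0),
    ]
    spare = power_headroom(500.0, load)
    verdict = "sufficient" if spare >= 0 else "possibly insufficient"
    print(f"Headroom: {spare:.1f} W ({verdict})")
```

A negative headroom does not prove that the power supply is the cause of the crash, but it is a reason to test with a larger supply before replacing other parts.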
5. Why does the operating system installer report that the hard disk cannot be found?
Possible causes:
No physical hard disk is installed
The hard drive cable is not connected properly
The hard disk controller driver is not installed, or the driver does not match the controller
6. How do I obtain the driver?
Use the CD supplied with the server to create the corresponding driver disk
7. Why can't the hard disk controller driver be loaded even with the correct driver?
Check whether the HostRAID function is enabled.
8. Why does the machine fail the self-test after a newly purchased hard disk is installed?
Remove the new hard drive and see whether the machine then passes the self-test.
Check whether the new hard disk has the same ID number as an existing hard disk. If two hard disks share the same ID, the self-test will fail.
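The ID check can be done on paper before the drive is installed. The sketch below assumes you have written down the ID jumper setting of each device; the function name find_id_conflicts and the device labels are hypothetical and do not come from any vendor tool.

```python
from __future__ import annotations

from collections import defaultdict


def find_id_conflicts(devices: dict[str, int]) -> dict[int, list[str]]:
    """Map each SCSI ID used by more than one device to the devices sharing it."""
    by_id: dict[int, list[str]] = defaultdict(list)
    for name, scsi_id in devices.items():
        by_id[scsi_id].append(name)
    # Keep only the IDs that are claimed by two or more devices.
    return {scsi_id: names for scsi_id, names in by_id.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical inventory: device label -> ID set on its jumpers.
    inventory = {"original disk": 0, "tape drive": 4, "new disk": 0}
    for scsi_id, names in find_id_conflicts(inventory).items():
        print(f"ID {scsi_id} is shared by: {', '.join(names)}")
```

An empty result means that every device on the bus has a unique ID, so the self-test failure has another cause.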
9. How do I format a SCSI hard drive?
If an operating system is installed: format the drive with the disk management tool.
If no operating system is installed: format the drive from the SCSI controller's management interface.
Taking an ADAPTEC RAID card as an example:
Power on the machine and press CTRL+A when the CTRL+A prompt appears.
Select channel A.
Select SCSI UTILITY; the controller scans for hard drives.
Select the hard drive you want to work on.
Select FORMAT to perform a full format of the hard drive, or select VERIFY to test the hard disk for bad sectors.
Note: Do not interrupt the operation or cut the power while the hard disk is being formatted; otherwise the disk can be damaged.
10. On an Aisino series machine with a RAID card, one of the hard drives stops working properly and a RAID alarm occurs, but the system still runs normally. What should I do?
Replace the faulty drive with a new hard drive whose capacity is greater than or equal to that of the faulty drive. A hard drive of the same model is the best choice.
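The replacement rule above can be written as a simple check. This is a minimal sketch: the function name check_replacement and the dictionary keys are made up for the example. Capacities are compared in bytes because two drives with the same nominal size can differ slightly in their exact capacity.

```python
def check_replacement(failed: dict, candidate: dict) -> str:
    """Classify a candidate drive as 'preferred', 'acceptable', or 'too small'."""
    if candidate["capacity_bytes"] < failed["capacity_bytes"]:
        return "too small"   # smaller than the failed drive: do not use
    if candidate["model"] == failed["model"]:
        return "preferred"   # same model as the failed drive
    return "acceptable"      # different model, but large enough


if __name__ == "__main__":
    # Hypothetical drives; read the real values from the drive labels.
    failed = {"model": "MODEL-A", "capacity_bytes": 73_400_000_000}
    spare = {"model": "MODEL-B", "capacity_bytes": 73_500_000_000}
    print(check_replacement(failed, spare))
```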
Common faults related to RAID cards
Category 1: Problems with the RAID card itself
Symptoms: RAID information is frequently lost, hard disks frequently go offline, REBUILD cannot be performed, and during the power-on self-test the hard disks are not detected or detection takes a long time.
Typical fault A:
After RAID1 was configured and the operating system was installed, everything was normal. When the system was restarted for the second time, however, an alarm sounded, and inspection showed that one hard disk was offline. After REBUILD the array returned to normal, but the disk went offline again after the next restart. The hard drive was suspected, but checking it revealed no problems. Finally, the RAID card was replaced and the fault was resolved.
Typical fault B:
The machine often froze and sometimes started very slowly. The system log showed an error message at system startup: device /devices/scsi/port0 did not respond during the transmission waiting time. After the RAID card was replaced, the machine returned to normal.
Category 2: Problems with the hard drive itself
Symptoms: the hard disk goes offline, its status in the RAID array is DEAD, or a REBUILD stalls partway and cannot continue.
Typical fault:
After a hard disk went offline, REBUILD reported an error at 20% and could not continue. Once the offline hard disk, the hard disk box, and the SCSI cable were all confirmed to be working normally, the online hard disk was verified and found to have bad sectors. After the hard disk was repaired and REBUILD was run again, the array returned to normal.
Category 3: Contact problems with hard drive boxes or modules
This kind of problem usually shows up as the RAID card not detecting the hard disk at all. It is relatively simple, but machines with hard disk boxes need some extra care.
Typical fault:
The RAID card could not detect the hard drive. I connected the SCSI cable to the ULTRA160 interface on the motherboard, but the fault persisted. I pulled out and replaced the hard drive box (not including the bracket behind it), but the fault persisted. I replaced the hard drive, but it still did not work. Finally, I removed the bracket behind the hard drive box (the non-hot-swappable part) and found a bent pin on its 80PIN interface. After I straightened the pin, the machine returned to normal.
11. Why can't the ID number of a SCSI hard disk used on the server be set to 7?
By default the SCSI controller itself uses ID 7, which has the highest priority on the bus, so a hard disk cannot be set to ID 7.
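When choosing an ID for a new drive, it can help to list the IDs that are still free. The sketch below assumes the controller keeps its default ID of 7, as described above, and that the bus offers 8 IDs (narrow) or 16 IDs (wide). The function name free_scsi_ids is hypothetical.

```python
from __future__ import annotations


def free_scsi_ids(used_ids: set[int], wide: bool = True,
                  controller_id: int = 7) -> list[int]:
    """List the IDs a new drive may use on a narrow (0-7) or wide (0-15) bus."""
    bus_size = 16 if wide else 8
    return [
        scsi_id
        for scsi_id in range(bus_size)
        # Skip the controller's own ID and every ID already taken by a device.
        if scsi_id != controller_id and scsi_id not in used_ids
    ]


if __name__ == "__main__":
    # Hypothetical bus with two drives already set to IDs 0 and 1.
    print(free_scsi_ids({0, 1}, wide=False))
```

If the controller has been reconfigured to a different ID, pass that value as controller_id instead of relying on the default.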
12. What should I do if the machine cannot pass the power-on self-test?
Solution:
Turn off the power of the machine, open the chassis, and move the jumper cap on the "CMOS CLEAR" jumper so that it shorts the other two pins of the jumper (refer to the motherboard manual for the jumper location).
Power on the machine and let it run the self-test. When the self-test completes, the machine reports that the CMOS has been cleared. Turn off the power again and return the jumper cap to its original position.
Restart the machine.
13. Physical memory slot error
Solution:
Power on the machine, press F2 to enter "SETUP", go to "ADVANCED" --> "MEMORY CONFIGURATION" and press Enter, then select "CLEAR DIMM ERRORS" and press Enter.
14. Why does the processor report an error, or why is only one processor found, during the self-test?
Solution:
Power on the machine and press F2 to enter "SETUP".
1. Go to "MAIN" --> "PROCESSOR" --> "CLEAR PROCESSOR ERRORS [ ]" and set this option to "YES".
2. Go to "ADVANCED" --> "RESET CONFIGURATION DATA [ ]" and set this option to "YES".
3. Go to "SERVER" --> "PROCESSOR RESET [ ]" and set this option to "YES".
4. Go to "SERVER" --> "SYSTEM MANAGEMENT", press Enter, then go to "CLEAR EVENTLOG [ ]" and set this option to "YES".
5. Press F10 to save and exit.