Date: Fri, 19 May 1995 12:06:36 -0600
From: Mark Weil [Kodak FE Abq]
Message-Id: <9505191806.AA01073@zia.West.Sun.COM>
To: Paul Caskey
Subject: scsi

HISTORY:

While preparing an IPC to be taken home I encountered some problems
using a Sun 1.05GB 3.5" disk. The solution may be interesting to some.

SYSTEM CONFIGURATION:

Sun IPC --- internal Sun 1.05GB 3.5" disk (id=3, sd0)              Empty
        --- external Seagate WREN V 500MB 5.25" disk (id=1, sd1)   SunOS
        --- external Sun 150MB .25" tape drive (id=4, st0)

The intent was to copy the OS from the WREN V to the 1.05GB internal
disk so the IPC could be taken home.

PROBLEM SUMMARY:

During the disk-to-disk dump (500MB external to 1.05GB internal)
messages in the console window indicated (at least to me) very bad
problems. The messages were of the form:

Jan 4 14:45:07 sass001 vmunix: sd0a: Error for command 'write'
Jan 4 14:45:07 sass001 vmunix: sd0a: Error Level: Retryable
Jan 4 14:45:07 sass001 vmunix: sd0a: Block 17256, Absolute Block: 17256
Jan 4 14:45:07 sass001 vmunix: sd0a: Sense Key: Aborted Command
Jan 4 14:45:07 sass001 vmunix: sd0a: Vendor 'SEAGATE' error code: 0x47

These messages were repeated with different block numbers and were
being generated on the order of, say, 10e6 per second (lots!!!).

I interpreted these messages to mean the disk was bad and the operation
was NOT successful. I assumed the resulting 1.05GB disk would not be
usable as an OS disk.

I called Mark @ Sun and discussed the problem with him. He located a
wonderful "fix it" document on the SunSolve CD. I have included that
document at the end of this one.

SOLUTION SUMMARY:

The "fix" was to use the adb utility to modify the kernel so that it
operates the SCSI bus only at asynchronous (slow) speeds.

        adb -w /vmunix
        scsi_options?W 58
        $q

Now reboot the system and repeat the tests.

The subsequent disk-to-disk dumps were successful - no errors were
generated!

OTHER APPLICATIONS:

I have seen a similar problem on moat when booting and also during the
monthly disk-to-disk dumps.
The messages are a little different, but still indicate the need for
the "fix". The messages on moat were of the form:

Dec 6 15:19:17 moat vmunix: esp0: data transfer overrun
Dec 6 15:19:17 moat vmunix: State=DATA Last State=DATA_DONE
Dec 6 15:19:17 moat vmunix: Latched stat=0x11 intr=0x10 fifo 0x1
Dec 6 15:19:17 moat vmunix: last msg out: ; last msg in: IDENTIFY
Dec 6 15:19:17 moat vmunix: DMA csr=0x80000000
Dec 6 15:19:17 moat vmunix: addr=fff0c000 last=fff05e01 last_count=61ff
Dec 6 15:19:17 moat vmunix: Cmd dump for Target 1 Lun 0:
Dec 6 15:19:17 moat vmunix: cdb=[ 0x8 0x0 0x7a 0x20 0x50 0x0 0x0 0x0 0x0 0x0 ]
Dec 6 15:19:17 moat vmunix: pkt_state 0xf pkt_flags 0x0 pkt_statistics 0x3
Dec 6 15:19:17 moat vmunix: cmd_flags=0x21 cmd_timeout 35
Dec 6 15:19:17 moat vmunix: Mapped Dma Space:
Dec 6 15:19:17 moat vmunix: Base = 0x2000 Count = 0xa000
Dec 6 15:19:17 moat vmunix: Transfer History:
Dec 6 15:19:17 moat vmunix: Base = 0x2000 Count = 0xa000
Dec 6 15:19:17 moat vmunix: current phase 0x26=DATAIN stat=0x1 0x61ff
Dec 6 15:19:17 moat vmunix: current phase 0x1b=RESEL stat=0x7 0x1 0x0
Dec 6 15:19:17 moat vmunix: current phase 0x5=MSG_IN stat=0x7 0x4
Dec 6 15:19:17 moat vmunix: current phase 0x28=DISCONNECT stat=0x7 0xa000
Dec 6 15:19:17 moat vmunix: current phase 0x2c=SAVEDP stat=0x7 0xa000
Dec 6 15:19:17 moat vmunix: current phase 0x26=DATAIN stat=0x11 0xa000
Dec 6 15:19:17 moat vmunix: current phase 0x1b=RESEL stat=0x17 0x1 0x0
Dec 6 15:19:17 moat vmunix: current phase 0x5=MSG_IN stat=0x17 0x4
Dec 6 15:19:17 moat vmunix: current phase 0x28=DISCONNECT stat=0x17 0xa000
Dec 6 15:19:17 moat vmunix: current phase 0x2c=SAVEDP stat=0x17 0xa000
Dec 6 15:19:17 moat vmunix: current phase 0x20=SELECT stat=0x10 0x1 0x0
Dec 6 15:19:17 moat vmunix: current phase 0x1=CMD_START stat=0x10 0x8 0x20
Dec 6 15:19:17 moat vmunix: current phase 0xb=CMD_CMPLT stat=0x17 0x400
Dec 6 15:19:17 moat vmunix: current phase 0x27=STATUS stat=0x17 0x0
Dec 6 15:19:17 moat vmunix: current phase 0xb=CMD_CMPLT stat=0x13
Dec 6 15:19:17 moat vmunix: current phase 0x26=DATAIN stat=0x11 0x400
Dec 6 15:19:17 moat vmunix: esp0: Target 1.0 reducing sync. transfer rate
Dec 6 15:19:17 moat vmunix: esp0: Reverting to slow SCSI cable mode
Dec 6 15:19:17 moat vmunix: sd1: SCSI transport failed: reason 'data_ovr': retrying command

I applied the kernel fix today and will reboot moat this evening. I
will examine the boot messages for errors, but I expect none.

THINGS TO REMEMBER:

The "fix" outlined in this document is a change to the kernel. Any time
a kernel is rebuilt this change will have to be included. For example,
if you rebuild a kernel to increase the MAXUSERS variable, you must
remember to apply the "adb" patch outlined above prior to rebooting.

According to the help document located by Mark @ Sun, these messages do
not indicate a fatal problem and everything should work fine. Still, a
message appearing in the console window every 1/10th of a second can't
be good for performance!

HELP DOCUMENT LOCATED BY Mark@Sun:

INFODOC ID : 1109
SYNOPSIS : guidelines for support of fast (10MB/sec) SCSI systems
DETAIL DESCRIPTION :

SCSI Configurations using Single-Ended Devices

SCOPE:

The high performance SCSI devices now available provide the capability
of significantly improving system performance for some applications.
One of the special capabilities of these devices is the ability to
transfer data at a 10-megabyte-per-second data rate using the "fast
SCSI" synchronous transfer timings defined by the SCSI-2 standard.
These high performance SCSI devices are fully compatible with standard
SCSI devices and will operate in almost all normal SCSI configurations.

Some SCSI enclosures, cables, and terminators do not take into account
the special loading and impedance matching requirements for fast SCSI.
The attachment of such peripherals may cause systems using fast SCSI
devices to operate incorrectly.
Such nonconforming SCSI cables and enclosures include some of Sun's
early designs and some third-party cables, terminators, and peripheral
device enclosures.

The installation manuals for all fast SCSI devices and all new Sun
installation manuals contain the strong recommendation that fast SCSI
devices not be placed on the same SCSI port with SCSI components that
do not conform with the requirements for fast SCSI.

This paper provides recommendations for the technical modifications
that can be made in a SCSI system to allow the operation of fast SCSI
and nonconforming enclosures, cables, or terminators on the same
system.

SOLUTION SUMMARY:

1.0 IDENTIFICATION OF SUN SYSTEMS REQUIRING SPECIAL ATTENTION

Differential SCSI host adapters and devices, including the DSBE/S card
and the Differential SCSI Data Center Disk Tray, are all designed to
meet fast SCSI requirements and will operate at 10 megabytes per
second. The maximum total cable length of a differential SCSI system is
25 meters. The installation guides for the SCSI devices indicate the
equivalent cable length of the device.

SCSI host systems that operate at 5 megabytes per second, including all
Sun SPARC-based systems developed prior to the SPARCsystem 10, will
support any presently defined configuration of 5 megabyte SCSI devices.
A fast SCSI device can be installed on such systems, since the host and
the fast SCSI device automatically negotiate the proper operational
speed. Fast SCSI devices attached to 5 megabyte hosts will only operate
at 5 megabytes, but the capacity and access latency improvements
provided by many such devices can still improve the flexibility and
performance of such systems. Single-ended SCSI systems operating at 5
megabytes have a maximum total cable length of 6 meters.
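
The speed negotiation and cable limits described above can be sketched
in a few lines of Python. This is only an illustration, assuming that
host and device simply settle on the lower of their two maximum rates;
the function and constant names are hypothetical and do not correspond
to any Sun driver interface.

```python
# Illustrative sketch only: the names below are hypothetical, not a Sun API.
# Assumption: host and device settle on the lower of their two maximum
# synchronous rates, as the text describes for fast devices on 5 MB/s hosts.

# Maximum total cable lengths quoted in section 1.0, in meters.
MAX_CABLE_M = {"differential": 25.0, "single-ended": 6.0}


def negotiated_rate_mb(host_max_mb: int, device_max_mb: int) -> int:
    """Return the transfer rate (MB/s) that both ends can support."""
    return min(host_max_mb, device_max_mb)


def cable_length_ok(bus_type: str, total_length_m: float) -> bool:
    """Check a total cable length against the limit for the bus type."""
    return total_length_m <= MAX_CABLE_M[bus_type]


if __name__ == "__main__":
    # A fast (10 MB/s) disk attached to a 5 MB/s host.
    print(negotiated_rate_mb(5, 10))
    # An 8 meter single-ended bus.
    print(cable_length_ok("single-ended", 8.0))
```

A fast disk on a 5 megabyte host therefore comes out at 5 megabytes per
second in this model, which is the case the paragraph above describes.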

1.1 SCSI systems and host adapters that operate at 10 megabytes per
second, including the SPARCsystem 600MP series, the SPARCsystem 10, and
the FSBE/S host adapter, will support any presently defined
configuration of 5 megabyte devices. Again, the host will determine
automatically that the devices are 5 megabyte per second devices and
negotiate the proper operational speed with each device.

SCSI host systems that operate at 10 megabytes per second and have at
least one fast SCSI device attached require that the entire SCSI port
configuration be composed of components that will support fast SCSI.
The components include cables, device enclosures, and terminators. The
recent Sun SCSI products, including the Desktop Storage Pack, the
Desktop Storage Module, and SCSI Expansion Pedestal, are devices and
enclosures that meet the fast SCSI requirements. The regulated
terminator (Sun part number 150-1785-02) meets the fast SCSI
requirements. The host will negotiate with the 10 megabyte devices to
perform 10 megabyte transfers and with each of the other devices to
perform transfers at their preferred rates. Single-ended SCSI systems
operating at 10 megabytes using the proper components have a maximum
total cable length of 6 meters, in accordance with the proposed SCSI-3
standard.

1.2 Those Sun enclosures with the three-row 50-pin D connector,
including the External Storage Module, do not meet the fast SCSI
requirements. Those Sun enclosures with the Centronics-style 50-pin
flat ribbon contact connector, including the Front Load 1/2-inch Tape
Drive, do not meet the fast SCSI requirements. The Sun SCSI terminators
other than 150-1785-02 do not meet the fast SCSI requirements.

Section 3 of this paper defines the steps that must be taken to assure
reliable operation of fast SCSI systems containing combinations of fast
SCSI devices and components that do not meet the fast SCSI
requirements. The maximum total cable length for such systems should
not exceed 6 meters.
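
The rules in sections 1.0 through 1.2 reduce to a small decision
function. A minimal Python sketch follows, assuming each SCSI port can
be described by a host speed plus two yes/no flags; the Port class and
its field names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Port:
    """Hypothetical description of one SCSI port (not a Sun structure)."""
    host_rate_mb: int       # 5 or 10 megabytes per second
    has_fast_device: bool   # at least one fast SCSI device is attached
    all_conforming: bool    # every cable, enclosure, terminator is fast-capable


def needs_special_modifications(port: Port) -> bool:
    """Return True when the methods of section 3 are needed.

    Only a 10 MB/s host with a fast device and at least one nonconforming
    component needs attention; other combinations are supported as they are.
    """
    if port.host_rate_mb < 10:
        return False
    if not port.has_fast_device:
        return False
    return not port.all_conforming


if __name__ == "__main__":
    mixed = Port(host_rate_mb=10, has_fast_device=True, all_conforming=False)
    print(needs_special_modifications(mixed))
```

Keeping the check as three early returns mirrors the order in which the
text introduces the cases: host speed first, then the presence of a
fast device, then conformance of the remaining components.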

SUMMARY OF SYSTEM REQUIREMENTS

TABLE 1

| SCSI Host   | fast SCSI   | 5 Mbyte SCSI   | Special       |
| Type        | device      | device         | Modifications |
|             | installed?  | installed?     | Required?     |
|_____________|_____________|________________|_______________|
|             |             |                |               |
| 5 megabyte  | don't care  | don't care     | no            |
|_____________|_____________|________________|_______________|
|             |             |                |               |
| 10 megabyte | no          | don't care     | no            |
|_____________|_____________|________________|_______________|
|             |             |                |               |
| 10 megabyte | yes         | all conform    | no            |
|             |             | to fast SCSI   |               |
|             |             | requirements   |               |
|_____________|_____________|________________|_______________|
|             |             |                |               |
| 10 megabyte | yes         | one or more    | yes           |
|             |             | don't conform  | see section 3 |
|             |             | to fast SCSI   |               |
|             |             | requirements   |               |
|_____________|_____________|________________|_______________|

2.0 IDENTIFICATION OF MIXED VENDOR SYSTEMS REQUIRING SPECIAL ATTENTION

SCSI peripheral devices, connectors, and cables provided by companies
other than Sun are not tested by Sun in the fast SCSI environment. If
any of the following symptoms occur when using such devices in Sun fast
SCSI systems, it may be because the peripheral device, related
components, or the configuration does not conform to the fast SCSI
requirements. The steps described in section 3 can usually be used to
correct these symptoms if the components meet the standard SCSI
requirements. The system will usually continue operating normally, even
if these errors do occur, because as part of the software error
recovery, the SCSI data rate is slowed to allow reliable operation. The
maximum total cable length for such devices should be 6 meters if they
properly follow the recommendations of the SCSI standards committee.

CHART OF SYMPTOMS RELATED TO SCSI DEVICES NOT MEETING FAST SCSI
REQUIREMENTS

Sun OS 4.1.3

Examples of the warning system messages that occur during boot are
contained in the appendix to this paper. The key words of one symptom
are:

        Target 1.0 reducing sync. transfer rate
        SCSI transport failed: reason 'reset': retrying command
        Target 1.0 reverting to async. mode
        SCSI transport failed: reason 'reset': retrying command

A second symptom may be:

        Current command timeout for Target 3 Lun 0
        Cmd dump for Target 3 Lun 0:
        Target 3.0 reducing sync. transfer rate
        SCSI transport failed: reason 'reset': retrying command

A third symptom may be:

        Error for command 'read'
        Error Level: Retryable
        Sense Key: Aborted Command
        Vendor 'XXYYZZ' error code: 0x47

Sun Solaris 2.x

Examples of the warning system messages that occur during boot are
contained in the appendix to this paper. The key words of one symptom
are:

        WARNING: .... SCSI bus DATA IN phase parity error
        WARNING: .... Error for command 'read' Error Level: Retryable
                      Sense Key: Aborted Command ......

A second symptom may be:

        WARNING: .... SCSI transport failed: reason 'timeout': retrying command

The present negotiated data rate in kilobytes per second can be
determined for a disk by requesting the necessary data with the prtconf
command as shown below. If the negotiated rate is lower than expected,
error recovery procedures may have been executed because of
nonconforming devices in the configuration.

        # prtconf -v
        esp, unit #0
            Driver software properties:
                name length <4>
                    value <0x00002710>.

The value 0x00002710 is 10000 kilobytes per second in decimal.

If the boot process was not observed, the boot messages are stored in
the file /var/adm/messages for reference. The messages can be displayed
by running the command:

        # dmesg | more

3.0 METHODS FOR MANAGING FAST SCSI SYSTEMS WITH NONCONFORMING COMPONENTS

3.1 Follow installation recommendations

The use of fast SCSI hosts and fast SCSI peripherals provides
significant performance improvements for some types of applications.
To take full advantage of those performance improvements, the
installation guides for SCSI devices recommend that only those
components and peripheral devices supporting fast SCSI requirements be
installed on a fast SCSI port. If nonconforming devices must also be
installed on a host, a separate SCSI host adapter should be installed
and all the nonconforming devices should be installed on that SCSI
port, isolated from all the fast SCSI devices that are running on fast
SCSI host adapters.

3.2 Actively terminate SCSI configurations containing the ESM

The External Storage Module (ESM) is a special case, since it conforms
to the fast SCSI requirements except for its adapter cable and
terminator. The following procedure allows the correct termination of
the External Storage Module and allows correct fast SCSI operation for
all fast SCSI devices installed on the SCSI port as well as normal
synchronous operation for the devices installed in the ESM.

One or two ESMs may be installed in the middle of a string of SCSI
devices. Use a Desktop Storage Pack or Desktop Storage Module with a
regulated terminator (Sun part number 150-1785-02) as the device
farthest away from the host on the SCSI port. Connect the ESMs into the
string of SCSI devices using 0.8 m Sun cables (Sun part number
530-1829-01, Rev.51). Do not exceed the maximum total cable length of 6
meters.

3.3 Slow all SCSI ports to asynchronous operation

For all other fast SCSI hosts attaching devices that do not conform
with the fast SCSI requirements, the operating system should be
modified to run all SCSI ports in asynchronous mode. This slower mode
fully interlocks all the SCSI data transfer signals and provides for
reliable operation of the External Storage Module at the end of a SCSI
bus. It allows Sun configurations containing both fast SCSI drives and
nonconforming devices to operate reliably on fast SCSI ports.
If the system configuration meets the standard SCSI requirements,
reliable operation can usually be provided with third-party components
and peripherals as well. The slower data rate applies to all SCSI ports
on the system. Some applications may show a decrease in performance
because of the slower data rate.

For the 4.1.x OS:

To change to the slower asynchronous data rate, type:

        adb -w /vmunix
        scsi_options?W 58
        $q

then reboot the system.

To turn synchronous transfer back on at the highest possible speed, use
the same procedure, replacing the middle line with:

        scsi_options?W 178

For Solaris 2.x:

To change to the slower asynchronous data rate, add the following line
to the /etc/system file:

        set scsi_options = 0x58

then reboot the system.

To turn synchronous transfer back on at the highest possible speed
without using tagged queueing, change the scsi_options line to:

        set scsi_options = 0x178

To turn synchronous transfer back on at the highest possible speed
allowing tagged queueing (if available in the operating system), change
the scsi_options line to:

        set scsi_options = 0x1f8

APPENDIX A

SAMPLES OF 4.1.3 ERROR MESSAGES

In this example, target 1 (sd1 on esp0) is a fast SCSI disk:

Sep 16 15:53:23 b34a vmunix: esp0: Target 1.0 reducing sync. transfer rate
Sep 16 15:53:23 b34a vmunix: sd1: SCSI transport failed: reason 'reset': retrying command
Sep 16 15:53:23 b34a vmunix: esp0: Current command timeout for Target 1 Lun 0
Sep 16 15:53:23 b34a vmunix: esp0: State=DATA_DONE (0xa), Last State=DATA (0x9)
Sep 16 15:53:23 b34a vmunix: esp0: Cmd dump for Target 1 Lun 0:
Sep 16 15:53:23 b34a vmunix: esp0: cdb=[0x8 0x0 0x7e 0x0 0x10 0x0 0x0 0x0 0x0 0x0]
Sep 16 15:53:23 b34a vmunix: esp0: Target 1.0 reverting to async. mode
Sep 16 15:53:23 b34a vmunix: sd1: SCSI transport failed: reason 'reset': retrying command

or

Sep 16 15:57:41 b34a vmunix: sd3 at esp0 target 0 lun 0
Sep 16 15:57:41 b34a vmunix: sd3:
Sep 16 16:01:12 b34a vmunix: esp0: Current command timeout for Target 3 Lun 0
Sep 16 16:01:12 b34a vmunix: esp0: State=DATA_DONE (0xa), Last State=DATA (0x9)
Sep 16 16:01:12 b34a vmunix: esp0: Cmd dump for Target 3 Lun 0:
Sep 16 16:01:12 b34a vmunix: esp0: cdb=[0x8 0x0 0x0 0x0 0x7e 0x0 0x0 0x0 0x0 0x0]
Sep 16 16:01:12 b34a vmunix: esp0: Target 3.0 reducing sync. transfer rate
Sep 16 16:01:12 b34a vmunix: sd0: SCSI transport failed: reason 'reset': retrying command
Sep 16 16:01:12 b34a vmunix: sd1: SCSI transport failed: reason 'reset': retrying command

or

Sep 16 16:36:51 b34a vmunix: sd3c: Error for command 'read'
Sep 16 16:36:51 b34a vmunix: sd3c: Error Level: Retryable
Sep 16 16:36:51 b34a vmunix: sd3c: Block 1386, Absolute Block: 1386
Sep 16 16:36:51 b34a vmunix: sd3c: Sense Key: Aborted Command
Sep 16 16:36:51 b34a vmunix: sd3c: Vendor 'MICROP' error code: 0x47

SAMPLES OF SOLARIS 2.x ERROR MESSAGES

In this example internal disk 1 (target 1) is a 10 MB/sec disk:

WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000 (esp0):
        SCSI bus DATA IN phase parity error
WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@1,0 (sd1):
        Error for command 'read' Error Level: Retryable
        Block 59640, Absolute Block: 59640
        Sense Key: Aborted Command
        Vendor 'SEAGATE' error code: 0x48 (), 0x0

or:

WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@1,0 (sd1):
        SCSI transport failed: reason 'timeout': retrying command

APPENDIX B

TABLE OF DEVICES, SYSTEMS, AND THEIR FAST-SCSI CHARACTERISTICS

SYSTEMS AND HOST ADAPTERS

Official Name                             SCSI Data Rate

SPARCsystem 10                            fast SCSI
  424 Megabyte internal Disk              5 MByte SCSI
  1.05 Gigabyte internal Disk             fast SCSI
SPARCstation 1                            5 MByte SCSI
SPARCstation 1+                           5 MByte SCSI
SPARCstation IPC                          5 MByte SCSI
SPARCstation SLC                          5 MByte SCSI
SPARCstation IPX                          5 MByte SCSI
SPARCstation ELC                          5 MByte SCSI
SPARCstation 2                            5 MByte SCSI
SPARCserver 4/330                         5 MByte SCSI
SPARCserver 4/370                         5 MByte SCSI
SPARCserver 4/390                         5 MByte SCSI
SPARCserver 630MP                         presently fast SCSI
SPARCserver 670MP                         presently fast SCSI
SPARCserver 690MP                         presently fast SCSI
SBus SCSI Host Adapter                    5 MByte SCSI
SBE/S Host Adapter                        5 MByte SCSI
FSBE/S Host Adapter                       fast SCSI
DSBE/S Host Adapter                       differential fast SCSI

PERIPHERALS

Official Name                             Common Name   SCSI Data Rate

Desktop Storage Pack                      Lunchbox
  207 Megabyte Disk                                     5 MByte SCSI
  424 Megabyte Disk                                     5 MByte SCSI
  Sun CD ROM                                            5 MByte SCSI
  150 Megabyte 1/4" Tape                                5 MByte SCSI
Desktop Storage Module                    Dinnerbox
  1.3 Gigabyte Disk                                     5 MByte SCSI
  2.3 Gigabyte 8 mm Tape Drive                          5 MByte SCSI
  5.0 Gigabyte 8 mm Tape Drive                          5 MByte SCSI
SCSI Expansion Pedestal                   Bullwinkle
  1.3 Gigabyte Disk                                     5 MByte SCSI
  2.3 Gigabyte 8 mm Tape Drive                          5 MByte SCSI
  5.0 Gigabyte 8 mm Tape Drive                          5 MByte SCSI
  Sun CD ROM                                            5 MByte SCSI
  2.1 Gigabyte Disk                                     differential fast SCSI
Differential SCSI Data Center Disk Tray   Tarzan
  2.1 Gigabyte Disk                                     differential fast SCSI
Front Load Tape Drive                     1/2" tape     5 MByte SCSI
External Storage Module                   P-Box         5 MByte SCSI

KEYWORDS : SCSI configurations, using single-ended devices
PRODUCT : Prphl
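
For readers wondering what the three scsi_options values in section 3.3
mean, the short Python sketch below decodes them as a bitmask. The bit
names are an assumption based on the SCSI_OPTIONS_* definitions in
Sun's SCSI header files; they are not given in the InfoDoc itself, so
check them against the headers on your own system before relying on
them. The function name is hypothetical.

```python
from typing import List

# Assumed bit assignments (not stated in the InfoDoc). They are consistent
# with the three values it quotes: 0x58 (asynchronous), 0x178 (fast
# synchronous) and 0x1f8 (fast synchronous plus tagged queueing).
SCSI_OPTION_BITS = {
    0x008: "disconnect/reconnect",
    0x010: "linked commands",
    0x020: "synchronous transfer",
    0x040: "parity",
    0x080: "tagged queueing",
    0x100: "fast SCSI",
}


def decode_scsi_options(value: int) -> List[str]:
    """Return the names of the option bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SCSI_OPTION_BITS.items())
            if value & bit]


if __name__ == "__main__":
    for value in (0x58, 0x178, 0x1F8):
        print(hex(value), "->", ", ".join(decode_scsi_options(value)))
```

Under these assumed bit names, 0x58 leaves the synchronous and fast
bits clear, which is why it forces the slower asynchronous mode, while
0x178 and 0x1f8 differ only in the tagged queueing bit.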