Using the sas2ircu utility from LSI, we can blink the drive LED to help identify the failed drive correctly. Of course, this requires an LSI card, and some LSI cards need the sas3ircu utility instead. There are reports online of this utility blinking the wrong drive, but I have not experienced this myself.
As always, use the supercomputer between your ears: make sure the serial number on the physical drive matches the serial number reported by the system before you pull anything.
[root@jetstore] ~# sas2ircu list
LSI Corporation SAS2 IR Configuration Utility.
Version 20.00.00.00 (2014.09.18)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved.
         Adapter      Vendor  Device                       SubSys  SubSys
 Index    Type          ID      ID    Pci Address          Ven ID  Dev ID
 -----  ------------  ------  ------  -----------------    ------  ------
   0     SAS2308_2     1000h    87h   00h:06h:00h:00h      1000h   3020h
         Adapter      Vendor  Device                       SubSys  SubSys
 Index    Type          ID      ID    Pci Address          Ven ID  Dev ID
 -----  ------------  ------  ------  -----------------    ------  ------
   1     SAS2308_2     1000h    87h   00h:81h:00h:00h      1000h   3020h
SAS2IRCU: Utility Completed Successfully.
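If you want to script the later steps, the controller indices can be pulled out of that listing. This is only a minimal sketch: it assumes adapter rows are the only lines that start with a bare number, as in the output above, and controller_indices is a name I made up for illustration.

```shell
# Print the controller index of every adapter row in `sas2ircu list` output.
# Assumption: adapter rows are the only lines whose first field is a bare number.
controller_indices() {
    awk 'NF >= 2 && $1 ~ /^[0-9]+$/ { print $1 }'
}

# Usage on the live system:
#   sas2ircu list | controller_indices
```

On the system above this should print 0 and 1, one per line.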
We will come back to the sas2ircu utility in a moment. First we need the serial number of the failed disk. On a multipath system, we can find the actual device names by looking for a path in the FAIL state:
[root@jetstore] ~# gmultipath list | grep -i -B 10 fail
Consumers:
1. Name: da43
Mediasize: 3000592982016 (2.7T)
Sectorsize: 512
Mode: r1w1e1
State: ACTIVE
2. Name: da16
Mediasize: 3000592982016 (2.7T)
Sectorsize: 512
Mode: r1w1e1
State: FAIL
Now we can see that da16 has failed. Time to get the serial number of that disk. da43 works just as well, since da16 and da43 are two paths to the same physical disk.
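Rather than reading the grep context by eye, the failed path name can be extracted directly. A rough sketch, assuming the gmultipath list layout shown above (a Consumers: section with Name: and State: lines); failed_paths is a hypothetical helper name, not part of gmultipath.

```shell
# Print the name of every multipath consumer (path) whose State is FAIL.
# Reads `gmultipath list` output on stdin.
failed_paths() {
    awk '
        /^Consumers:/              { in_consumers = 1; next }
        /^(Geom name|Providers):/  { in_consumers = 0 }
        in_consumers && /Name:/       { name = $NF }   # remember the latest path name
        in_consumers && /State: FAIL/ { print name }   # report it if it has failed
    '
}

# Usage on the live system:
#   gmultipath list | failed_paths
```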
[root@jetstore] ~# smartctl -a /dev/da16 | grep Serial
Serial number: WMC1F0D5T1DF
Save that serial number for the next step.
smartctl also reports other useful information about the drive, such as health statistics. It is worth a look, but not relevant here.
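To carry the serial into the next step without copy and paste, it can be captured in a shell variable. A small sketch; serial_from_smartctl is a made-up name, and the case-insensitive match is there because smartctl capitalises the label differently for SAS and SATA drives.

```shell
# Print the serial number from `smartctl -i` (or -a) output on stdin.
# Matches "Serial number:" and "Serial Number:" alike.
serial_from_smartctl() {
    awk -F': *' 'tolower($1) == "serial number" { print $2; exit }'
}

# Usage on the live system:
#   serial=$(smartctl -i /dev/da16 | serial_from_smartctl)
```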
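Next, we can display the disks attached to one of those controllers. Be sure to use the correct serial number in the grep command. sas2ircu may show the serial with a vendor prefix (WDWMC1F0D5T1DF in the output here), which grep still matches. If nothing turns up on controller 0, repeat with the other controller index: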
[root@jetstore] ~# sas2ircu 0 display | grep -C 10 WMC1F0D5T1DF
Device is a Hard disk
Enclosure # : 3
Slot # : 20
SAS Address : 50000c0-f-01f9-f6eb
State : Ready (RDY)
Size (in MB)/(in sectors) : 2861588/5860533167
Manufacturer : WD
Model Number : WD3001FYYG-01SL3
Firmware Revision : VR08
Serial No : WDWMC1F0D5T1DF
GUID : N/A
Protocol : SAS
Drive Type : SAS_HDD
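The enclosure and slot can also be picked out of the display output automatically. This is a sketch under the assumption that each device block lists Enclosure # and Slot # before Serial No, as in the output above; slot_for_serial is a hypothetical helper. It uses a substring match so the vendor-prefixed serial still matches the one from smartctl.

```shell
# slot_for_serial SERIAL
# Reads `sas2ircu N display` output on stdin and prints ENCLOSURE:SLOT for the
# first device whose "Serial No" contains SERIAL. Returns non-zero if not found.
slot_for_serial() {
    [ -n "$1" ] || return 1
    awk -v serial="$1" -F' *: *' '
        function trim(s) { sub(/[ \t\r]+$/, "", s); return s }
        /Enclosure #/ { encl = trim($2) }
        /Slot #/      { slot = trim($2) }
        /Serial No/ && index($2, serial) { print encl ":" slot; found = 1; exit }
        END { exit !found }
    '
}

# Usage on the live system (controller indices 0 and 1 as listed earlier):
#   for c in 0 1; do sas2ircu "$c" display | slot_for_serial WMC1F0D5T1DF; done
```

The printed value is in the same enclosure:slot form that the locate command takes.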
Note the enclosure and slot number of the failed drive (3 and 20 here) and turn the LED on:
sas2ircu 0 locate 3:20 ON
Turn the LED off:
sas2ircu 0 locate 3:20 OFF
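It is easy to forget the OFF step, so here is one way to pair the two commands. A minimal sketch, not a polished tool: blink_while and the SAS2IRCU variable are names I invented, and I am assuming sas3ircu accepts the same locate arguments on cards that need it.

```shell
# Controller utility; override with SAS2IRCU=sas3ircu on cards that need it
# (assumption: same `locate` syntax).
SAS2IRCU=${SAS2IRCU:-sas2ircu}

# blink_while CTRL ENCL:SLOT COMMAND...
# Turn the locate LED on, run COMMAND, then turn the LED off again once
# COMMAND returns. The exit status is that of COMMAND.
blink_while() {
    bw_ctrl=$1
    bw_loc=$2
    shift 2
    "$SAS2IRCU" "$bw_ctrl" locate "$bw_loc" ON || return 1
    "$@"
    bw_status=$?
    "$SAS2IRCU" "$bw_ctrl" locate "$bw_loc" OFF
    return "$bw_status"
}

# Usage on the live system: blink slot 3:20 for two minutes, then stop.
#   blink_while 0 3:20 sleep 120
```

If the command is interrupted with Ctrl-C the LED may stay on, so check it by eye before inserting the new disk.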
NOTE: If you are replacing a multipath disk (i.e. you see something like the detach log below when you offline and remove the disk), make sure the locate LED is OFF before inserting the replacement, or GEOM_MULTIPATH will not pick up the new disk as multipath. The logs below show what happens when a disk is inserted with the LED blinking versus off:
----------start drive detach event (already offline)------------
Aug 14 14:05:31 jetstore mps1: mpssas_prepare_remove: Sending reset for target ID 27
Aug 14 14:05:31 jetstore da43 at mps1 bus 0 scbus10 target 27 lun 0
Aug 14 14:05:31 jetstore da43: <WD WD3001FYYG-01SL3 VR08> s/n WMC1F0D5T1DF detached
Aug 14 14:05:31 jetstore GEOM_MULTIPATH: da43 in disk17 was disconnected
Aug 14 14:05:31 jetstore mps1: GEOM_MULTIPATH: all paths in disk17 were marked FAIL, restore da16
Aug 14 14:05:31 jetstore Unfreezing devq for target ID 27
Aug 14 14:05:31 jetstore GEOM_MULTIPATH: da16 is now active path in disk17
Aug 14 14:05:31 jetstore GEOM_MULTIPATH: da43 removed from disk17
Aug 14 14:05:31 jetstore (da43:mps1:0:27:0): Periph destroyed
Aug 14 14:05:31 jetstore mps0: mpssas_prepare_remove: Sending reset for target ID 38
Aug 14 14:05:31 jetstore da16 at mps0 bus 0 scbus2 target 38 lun 0
Aug 14 14:05:31 jetstore da16: <WD WD3001FYYG-01SL3 VR08> s/n WMC1F0D5T1DF detached
Aug 14 14:05:31 jetstore GEOM_MULTIPATH: da16 in disk17 was disconnected
Aug 14 14:05:31 jetstore mps0: GEOM_MULTIPATH: out of providers for disk17
Aug 14 14:05:31 jetstore Unfreezing devq for target ID 38
Aug 14 14:05:31 jetstore GEOM_MULTIPATH: da16 removed from disk17
Aug 14 14:05:31 jetstore GEOM_MULTIPATH: destroying disk17
Aug 14 14:05:31 jetstore GEOM_MULTIPATH: disk17 destroyed
Aug 14 14:05:31 jetstore (da16:mps0:0:38:0): Periph destroyed
----------end detach event-------------
----------start insert with LED BLINKING - note no GEOM_MULTIPATH----------
Aug 14 14:10:27 jetstore da16 at mps0 bus 0 scbus2 target 50 lun 0
Aug 14 14:10:27 jetstore da16: da43 at mps1 bus 0 scbus10 target 39 lun 0
Aug 14 14:10:27 jetstore syslog-ng[1426]: Error processing log message: <WD WD3001FYYG-01SL3 VR08> Fixed Direct Access SPC-4 SCSI device
Aug 14 14:10:27 jetstore da43: da16: Serial Number WMC1F0D9UX1U
Aug 14 14:10:27 jetstore syslog-ng[1426]: Error processing log message: <WD WD3001FYYG-01SL3 VR08> Fixed Direct Access SPC-4 SCSI device
Aug 14 14:10:27 jetstore da16: 600.000MB/s transfersda43: Serial Number WMC1F0D9UX1U
Aug 14 14:10:27 jetstore da43: 600.000MB/s transfersda16: Command Queueing enabled
Aug 14 14:10:27 jetstore da16: 2861588MB (5860533168 512 byte sectors)
Aug 14 14:10:27 jetstore da43: Command Queueing enabled
Aug 14 14:10:27 jetstore da43: 2861588MB (5860533168 512 byte sectors)
Aug 14 14:10:27 jetstore ses3: da43,pass47: Element descriptor: 'Slot 21'
Aug 14 14:10:27 jetstore ses0: da16,pass18: Element descriptor: 'Slot 21'
Aug 14 14:10:27 jetstore ses3: da43,pass47: SAS Device Slot Element: 1 Phys at Slot 20
Aug 14 14:10:27 jetstore ses0: da16,pass18: SAS Device Slot Element: 1 Phys at Slot 20
Aug 14 14:10:27 jetstore ses3: phy 0: SAS device type 1 id 0
Aug 14 14:10:27 jetstore ses0: phy 0: SAS device type 1 id 1
Aug 14 14:10:27 jetstore ses3: phy 0: protocols: Initiator( None ) Target( SSP )
Aug 14 14:10:27 jetstore ses0: phy 0: protocols: Initiator( None ) Target( SSP )
Aug 14 14:10:27 jetstore ses3: phy 0: parent 50030480003c273f addr 50000c0f0137b686
Aug 14 14:10:27 jetstore ses0: phy 0: parent 50030480003c27bf addr 50000c0f0137b687
-------end insert with LED BLINKING-------
------start insert with LED off----------------
Aug 14 14:28:53 jetstore da16 at mps0 bus 0 scbus2 target 50 lun 0
Aug 14 14:28:53 jetstore da43 at mps1 bus 0 scbus10 target 39 lun 0
Aug 14 14:28:53 jetstore da16: da43: <WD WD3001FYYG-01SL3 VR08> Fixed Direct Access SPC-4 SCSI device
Aug 14 14:28:53 jetstore syslog-ng[1426]: Error processing log message: <WD WD3001FYYG-01SL3 VR08> Fixed Direct Access SPC-4 SCSI device
Aug 14 14:28:53 jetstore da16: Serial Number WMC1F0D9UX1U
Aug 14 14:28:53 jetstore da43: Serial Number WMC1F0D9UX1U
Aug 14 14:28:53 jetstore da16: 600.000MB/s transfersda43: 600.000MB/s transfers
Aug 14 14:28:53 jetstore da16: Command Queueing enabled
Aug 14 14:28:53 jetstore da43: Command Queueing enabled
Aug 14 14:28:53 jetstore da16: 2861588MB (5860533168 512 byte sectors)
Aug 14 14:28:53 jetstore da43: 2861588MB (5860533168 512 byte sectors)
Aug 14 14:28:53 jetstore ses3: da43,pass47: Element descriptor: 'Slot 21'
Aug 14 14:28:53 jetstore ses0: da16,pass18: Element descriptor: 'Slot 21'
Aug 14 14:28:53 jetstore ses3: da43,pass47: SAS Device Slot Element: 1 Phys at Slot 20
Aug 14 14:28:53 jetstore ses0: da16,pass18: SAS Device Slot Element: 1 Phys at Slot 20
Aug 14 14:28:53 jetstore ses3: phy 0: SAS device type 1 id 0
Aug 14 14:28:53 jetstore ses0: phy 0: SAS device type 1 id 1
Aug 14 14:28:53 jetstore ses3: phy 0: protocols: Initiator( None ) Target( SSP )
Aug 14 14:28:53 jetstore ses0: phy 0: protocols: Initiator( None ) Target( SSP )
Aug 14 14:28:53 jetstore ses3: phy 0: parent 50030480003c273f addr 50000c0f0137b686
Aug 14 14:28:53 jetstore ses0: phy 0: parent 50030480003c27bf addr 50000c0f0137b687
Aug 14 14:29:07 jetstore GEOM_MULTIPATH: disk17 created
Aug 14 14:29:07 jetstore GEOM_MULTIPATH: da16 added to disk17
Aug 14 14:29:07 jetstore GEOM_MULTIPATH: da16 is now active path in disk17
Aug 14 14:29:07 jetstore GEOM_MULTIPATH: da43 added to disk17
------end insert with LED off----------------
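A quick way to tell the two insert cases apart is to look for the GEOM_MULTIPATH "created" line after inserting the disk. A small sketch that relies only on the message format seen in the logs above; created_multipaths is a hypothetical name, and /var/log/messages is an assumption about where your syslog setup writes kernel messages.

```shell
# Print the name of every multipath device that GEOM_MULTIPATH reports as
# created. Reads syslog lines on stdin.
created_multipaths() {
    sed -n 's/.*GEOM_MULTIPATH: \([^ ]*\) created.*/\1/p'
}

# Usage on the live system (log path is an assumption):
#   tail -n 200 /var/log/messages | created_multipaths
```

No output after an insert means the disk came up as plain da devices only, as in the LED BLINKING log.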