Saturday, December 26, 2015

Linux: Use smartctl To Check Disk Behind Adaptec RAID Controllers

http://www.cyberciti.biz/faq/linux-checking-sas-sata-disks-behind-adaptec-raid-controllers

I can use the "smartctl -d ata -a /dev/sdb" command to read the health status of a hard disk connected directly to my system. But how do I use smartctl to check a SAS or SCSI disk behind an Adaptec RAID controller from the shell prompt on a Linux operating system?

RAID controllers typically present one (logical) disk to the OS for each array of (physical) disks, so checking the SATA or SAS disks behind them requires the syntax shown below. On Adaptec RAID controllers, /dev/sgX can be used as pass-through I/O controls that provide direct access to each physical disk.

Is my Adaptec RAID card detected by Linux?

Type the following command:
# lspci | egrep -i 'raid|adaptec'
Sample outputs:
81:00.0 RAID bus controller: Adaptec AAC-RAID (rev 09)
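
For scripting, the same check can be reduced to the PCI address of the controller. This is only a minimal sketch: the function name adaptec_pci_address is made up for illustration, and it assumes the controller line contains the word "Adaptec" as in the sample output above.

```shell
#!/bin/sh
# Hypothetical helper: reads `lspci` output on stdin and prints the PCI
# address (first column) of every line that mentions Adaptec.
adaptec_pci_address() {
    awk 'tolower($0) ~ /adaptec/ { print $1 }'
}

# Usage on a real system:
#   lspci | adaptec_pci_address
```

An empty result suggests the card was not detected by the kernel.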

Download and install Adaptec Storage Manager

You need to install the Adaptec Storage Manager build for your Linux distribution that matches your installed RAID card. Visit this site to grab the software.

SATA Health Check Disk Syntax

To scan for disks, enter:
# smartctl --scan
Sample outputs:
/dev/sda -d scsi # /dev/sda, SCSI device
So /dev/sda is one device, reported as a SCSI device. This RAID device is made of 4 disks located in /dev/sg{1,2,3,4}. Type the following smartctl command to check a disk behind the /dev/sda RAID:
# smartctl -d sat --all /dev/sgX
# smartctl -d sat --all /dev/sg1

To ask the device to report its SMART health status or pending TapeAlert message, if any, run:
# smartctl -d sat --all /dev/sg1 -H
For SAS disks, use the following syntax:
# smartctl -d scsi --all /dev/sgX
# smartctl -d scsi --all /dev/sg1
### Ask the device to report its SMART health status or pending TapeAlert message ###
# smartctl -d scsi --all /dev/sg1 -H

Sample outputs:
smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
 
Device: SEAGATE  ST3146855SS      Version: 0002
Serial number: xxxxxxxxxxxxxxx
Device type: disk
Transport protocol: SAS
Local Time is: Wed Jul  7 04:34:30 2010 CDT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK
 
Current Drive Temperature:     24 C
Drive Trip Temperature:        68 C
Elements in grown defect list: 0
Vendor (Seagate) cache information
  Blocks sent to initiator = 1857385803
  Blocks received from initiator = 1967221471
  Blocks read from cache and sent to initiator = 804439119
  Number of read and write commands whose size <= segment size = 312098925
  Number of read and write commands whose size > segment size = 45998
Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 13224.42
  number of minutes until next internal SMART test = 42
 
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   58984049        1         0  58984050   58984050       3151.730           0
write:         0        0         0         0          0   9921230881.600           0
verify:     1308        0         0      1308       1308          0.000           0
 
Non-medium error count:        0
No self-tests have been logged
Long (extended) Self Test duration: 1367 seconds [22.8 minutes]
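
The two temperature lines in this output can be turned into a single "headroom" number for monitoring. The sketch below is illustrative only: temp_headroom is a made-up name, and it assumes the "Current Drive Temperature" and "Drive Trip Temperature" lines look exactly as in the smartctl 5.38 SAS output above. Other smartctl versions or drive types may print them differently.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -d scsi --all /dev/sgX` output on
# stdin and prints (trip temperature - current temperature) in degrees C.
# Exits non-zero if either line is missing.
temp_headroom() {
    awk '
        /^Current Drive Temperature:/ { cur = $4;  have_cur = 1 }
        /^Drive Trip Temperature:/    { trip = $4; have_trip = 1 }
        END {
            if (have_cur && have_trip) print trip - cur
            else exit 1
        }
    '
}

# Usage on a real system:
#   smartctl -d scsi --all /dev/sg1 | temp_headroom
```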
 
Here is another output from a SAS-based disk called /dev/sg2:
# smartctl -d scsi --all /dev/sg2 -H
Sample outputs:
Fig.01: How To Check Hardware Raid Status in Linux Command Line

Replace /dev/sg1 with your disk number. If you have a RAID 10 array with 4 disks, then:
  • /dev/sg0 - RAID 10 controller (you will not get any info for /dev/sg0).
  • /dev/sg1 - First disk in RAID 10 array.
  • /dev/sg2 - Second disk in RAID 10 array.
  • /dev/sg3 - Third disk in RAID 10 array.
  • /dev/sg4 - Fourth disk in RAID 10 array.
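
With a layout like the one listed above, the health check can be looped over every member disk. This is a sketch under assumptions: check_members and the SMARTCTL variable are names invented here, the disks are assumed to be SAS (so "-d scsi" applies), and a healthy drive is assumed to print "SMART Health Status: OK" as in the sample output earlier.

```shell
#!/bin/sh
# SMARTCTL may be overridden (for example, for a dry run); defaults to smartctl.
SMARTCTL=${SMARTCTL:-smartctl}

# Hypothetical helper: check_members TYPE DEVICE...
# Prints one line per device: "<device>: OK" or "<device>: CHECK".
check_members() {
    dtype=$1
    shift
    for dev in "$@"; do
        if "$SMARTCTL" -d "$dtype" -H "$dev" | grep -q 'SMART Health Status: OK'; then
            echo "$dev: OK"
        else
            echo "$dev: CHECK"
        fi
    done
}

# Usage on a real system (4-disk array from the list above):
#   check_members scsi /dev/sg1 /dev/sg2 /dev/sg3 /dev/sg4
```

A "CHECK" result only means the OK line was not found, so look at the full smartctl output for that disk before drawing conclusions.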

How do I run hard disk check?

Type the following command:
# smartctl -t short -d scsi /dev/sg2
# smartctl -t long -d scsi /dev/sg2

Where,
  1. -t short : Run short test.
  2. -t long : Run long test.
  3. -d scsi : Specify scsi as device type.
  4. --all : Show all SMART information for device.
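
A long test runs in the background, so a script needs to know how long to wait before reading the result. The sketch below pulls the estimated duration from the "Long (extended) Self Test duration" line shown in the sample output earlier. The function name long_test_seconds is made up, and the line format is assumed to match that smartctl 5.38 SAS output.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -d scsi --all /dev/sgX` output on
# stdin and prints the estimated long self-test duration in seconds.
long_test_seconds() {
    sed -n 's/.*Self Test duration: *\([0-9][0-9]*\) seconds.*/\1/p'
}

# Usage on a real system:
#   secs=$(smartctl -d scsi --all /dev/sg2 | long_test_seconds)
#   smartctl -t long -d scsi /dev/sg2
#   sleep "$secs"
#   smartctl -l selftest -d scsi /dev/sg2   # show the self-test log
```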

How do I use Adaptec Storage Manager?

Another simple command to just check basic status is as follows:
# /usr/StorMan/arcconf getconfig 1 | more
# /usr/StorMan/arcconf getconfig 1 | grep State
# /usr/StorMan/arcconf getconfig 1 | grep -B 3 State

Sample outputs:
----------------------------------------------------------------------
      Device #0
         Device is a Hard drive
         State                              : Online
--
         S.M.A.R.T.                         : No
      Device #1
         Device is a Hard drive
         State                              : Online
--
         S.M.A.R.T.                         : No
      Device #2
         Device is a Hard drive
         State                              : Online
--
         S.M.A.R.T.                         : No
      Device #3
         Device is a Hard drive
         State                              : Online
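
The grep output above can be condensed to one line per device. This is a rough sketch: arcconf_states is an invented name, and it assumes the "Device #N" and "State : ..." lines are laid out as in this sample. Other arcconf versions may format them differently.

```shell
#!/bin/sh
# Hypothetical helper: reads `arcconf getconfig 1` output on stdin and
# prints "Device #N: <state>" for each State line that follows a Device line.
arcconf_states() {
    awk '
        /^[ \t]*Device #[0-9]+/ {
            sub(/^[ \t]*Device #/, "")
            dev = $0 + 0
            seen = 1
            next
        }
        /^[ \t]*State[ \t]*:/ && seen {
            sub(/^[^:]*:[ \t]*/, "")   # drop the "State :" label
            sub(/[ \t]+$/, "")         # drop trailing whitespace
            print "Device #" dev ": " $0
        }
    '
}

# Usage on a real system:
#   /usr/StorMan/arcconf getconfig 1 | arcconf_states
```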
 
Please note that a newer version of arcconf is located in the /usr/Adaptec_Event_Monitor directory, so your full path must be as follows:
# /usr/Adaptec_Event_Monitor/arcconf getconfig [AD | LD [LD#] | PD | MC | [AL]] [nologs]
This command prints controller configuration information. Where,

    Option  AD  : Adapter information only
            LD  : Logical device information only
            LD# : Optionally display information about the specified logical device
            PD  : Physical device information only
            MC  : Maxcache 3.0 information only
            AL  : All information (optional)

How do I check the health of my Adaptec RAID array itself on Linux?

Simply use the following command:
# /usr/Adaptec_Event_Monitor/arcconf getconfig 1
OR (older version)
# /usr/StorMan/arcconf getconfig 1
Sample outputs:
Fig.02: Device #1 is Online, while Device #2 is Failed i.e. you have a degraded array.
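
For unattended monitoring (for example, from cron), the same output can be reduced to an exit status. This is a sketch, not a definitive check: all_devices_online is a made-up name, and it assumes the only lines beginning with "State" are the per-device ones shown in the earlier sample output, with "Online" as the healthy value.

```shell
#!/bin/sh
# Hypothetical helper: reads `arcconf getconfig 1` output on stdin.
# Returns 0 if at least one State line exists and all of them say Online,
# 1 otherwise (a Failed device, or no State line at all).
all_devices_online() {
    awk '
        /^[ \t]*State[ \t]*:/ {
            total++
            if ($NF != "Online") bad++
        }
        END { exit !(total > 0 && bad == 0) }
    '
}

# Usage on a real system:
#   /usr/Adaptec_Event_Monitor/arcconf getconfig 1 | all_devices_online \
#       || echo "RAID array needs attention"
```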
