What is SMART?
SMART (Self-Monitoring, Analysis, and Reporting Technology) is a built-in monitoring and reporting technology included in computer hard disk drives (HDDs) and solid-state drives (SSDs).
SMART was introduced in the 1990s and quickly adopted by all disk manufacturers to detect and report indicators of drive reliability. The intent was to diagnose failure modes in returned disk drives. Later, it was extended to help predict imminent disk failures.
How SoftRAID Uses SMART
SoftRAID runs SMART checks:
- Once per day automatically
- Every time you launch the SoftRAID application
What SoftRAID monitors: While disk drives store and report a whole range of SMART values, only a few are actually associated with imminent drive failure. Based on extensive studies by Google and Backblaze analyzing hundreds of thousands of drives, SoftRAID monitors these critical parameters:
For all drives (HDDs and SSDs):
- SMART test status (pass/fail)
- Reallocated sectors
- Failed reallocations
- Pending reallocations
- Unreliable sectors
For flash media only (SSDs/NVMe):
- Media wear indicator
- Media worn out status
Additional monitoring: SoftRAID also tracks I/O errors, which indicate communication failures between your Mac and the drive. While not a “SMART” parameter, I/O errors can indicate hardware issues with the drive, enclosure, cables, or volume directory damage.
What devices can SoftRAID get SMART data from?
SoftRAID can retrieve SMART data from most modern storage devices, but compatibility varies by connection type and macOS version:
Thunderbolt-connected drives:
- All drives in Thunderbolt enclosures (including all OWC Thunderbay models)
- Full SMART data available
- Most reliable SMART reporting
NVMe drives:
- Most NVMe drives support SMART reporting
- Includes Apple internal SSDs (though not officially documented by Apple)
- Full wear data and health indicators available
USB-connected drives:
- macOS Tahoe (26.x) or later required for SMART over USB
- Most modern USB drives and enclosures supported
- Some older USB controllers and bridge chips do not pass SMART data to macOS
- If SMART status shows “test unavailable,” the USB controller doesn’t support SMART reporting
eSATA and SAS drives:
- Full SMART data available when properly connected
If SMART is unavailable: For drives that don’t support SMART (or USB drives on pre-Tahoe macOS), SoftRAID maintains its own power-on hours counter and tracks I/O errors to monitor drive health.
SMART Test Failure (Immediate Action Required)
What It Means
If you receive an alert that a disk has failed the SMART test, the drive has already failed and cannot recover.
Critical: A disk that fails SMART can cause data corruption on other drives in your enclosure.
What to Do
- Stop using the disk immediately
- Remove the disk from your enclosure as soon as possible
- Replace the disk with a new, certified drive
There may be time to copy data from the failing disk before it stops functioning completely, but this is risky. The drive could fail completely at any moment.
If your volume is RAID 1, RAID 4, RAID 5, or RAID 1+0: Your volume should continue functioning with one disk missing, allowing you to replace the failed drive and rebuild.
This applies to all drive types: HDDs, SSDs, and NVMe drives.
Predicted Failure Modes
Predicted failures indicate your drive is likely to fail soon but has not completely failed yet. These warnings give you time to replace the drive proactively.
Understanding Sectors
A drive is made up of “chunks” of data called sectors, which are typically 512 bytes or 4KB (4096 bytes) in size. Each sector also stores a “checksum” (more precisely, an error-correcting code) that helps the drive reconstruct the data if the sector cannot be read cleanly.
When a drive can no longer reliably read a sector, it automatically moves the data to a spare sector elsewhere on the drive. This is called a “reallocated sector.”
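To make reallocation concrete, here is a toy Python model of a drive's spare-sector pool. It is a sketch under simplifying assumptions, not how any real drive firmware works: the class name SectorRemapper and its fields are hypothetical, a fixed list of spare sectors stands in for the drive's reserve, and running out of spares is treated as the only way a reallocation can fail.

```python
class SectorRemapper:
    """Toy model: maps logical sectors to physical ones using a small spare pool.

    Hypothetical illustration only; real firmware is far more complex.
    """

    def __init__(self, spare_sectors: list[int]) -> None:
        self.spares = list(spare_sectors)  # unused spare physical sectors
        self.remap: dict[int, int] = {}     # logical sector -> spare sector
        self.reallocated = 0                # like a "reallocated sectors" count
        self.failed = 0                     # like a "failed reallocations" count

    def physical(self, logical: int) -> int:
        """Return where a logical sector currently lives."""
        return self.remap.get(logical, logical)

    def reallocate(self, logical: int) -> bool:
        """Move a bad sector to a spare; return False if no spare is left."""
        if not self.spares:
            self.failed += 1
            return False
        self.remap[logical] = self.spares.pop(0)
        self.reallocated += 1
        return True


drive = SectorRemapper(spare_sectors=[1_000_000, 1_000_001])
drive.reallocate(42)  # sector 42 now lives in spare sector 1,000,000
drive.reallocate(77)
print(drive.physical(42), drive.reallocated)  # 1000000 2
print(drive.reallocate(99), drive.failed)     # False 1
```

Once the spare pool is empty, every further reallocation attempt fails, which is one reason a rising failed-reallocation count signals a drive near the end of its life.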
Reallocated Sectors (HDDs only)
What it means: The drive has found bad sectors and moved the data to spare sectors elsewhere on the disk.
Why it matters: In the early days of disk drives, reallocated sectors were common and drives could have hundreds. However, as disks became more reliable, reallocated sectors became rare. Google’s study of 100,000 drives confirmed that a disk with even a single reallocated sector is 20-60 times more likely to fail within the next 60 days than a normal drive.
What to do: Replace the drive immediately. Think of this like a diagnosis of a serious health condition – the drive might last another year, or it might fail in the next 24 hours.
Timing considerations:
- Just happened + low count (1-5 sectors): You probably have time to order a new drive and schedule replacement
- Count is growing or reaches mid-teens (15+): Remove the drive immediately to avoid more serious data issues
- After a power outage or long storage: Drives sometimes reallocate a sector or two after a power event or after sitting unpowered for months. Although such a drive may last many more months, remove it from primary service right away; if you keep it, use it only as a secondary or tertiary backup drive.
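The timing guidance above can be expressed as a small decision function. This is an illustrative sketch, not SoftRAID's logic: the function name is hypothetical, it takes reallocated-sector counts recorded over time (oldest first), and it treats a stable count of 6 to 14, which the guidance does not cover, as "replace soon" (an assumption).

```python
def reallocated_sector_action(counts: list[int]) -> str:
    """Suggest how urgently to swap an HDD, given reallocated-sector readings.

    `counts` holds successive readings, oldest first. Any reallocated sector
    means the drive should be replaced; this only decides how fast.
    """
    if not counts or counts[-1] == 0:
        return "no reallocated sectors"
    current = counts[-1]
    # "Growing" means the count rose after sectors had already been reallocated.
    growing = len(counts) >= 2 and 0 < counts[-2] < current
    if current >= 15 or growing:
        return "remove immediately"
    # 1-5 sectors that just appeared (and, by assumption, a stable 6-14).
    return "order a replacement and schedule the swap"


print(reallocated_sector_action([0, 0, 3]))  # order a replacement and schedule the swap
print(reallocated_sector_action([0, 3, 5]))  # remove immediately
print(reallocated_sector_action([0, 16]))    # remove immediately
```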
Failed Reallocations
What it means: The drive attempted to reallocate bad sectors but failed. The drive cannot move data to spare sectors.
Why it matters: This state generally means the drive has days or weeks before complete failure.
What to do: Replace this drive immediately. Do not wait.
Pending Reallocations
What it means: The drive tried multiple times to read a sector, used the checksum to attempt data recovery, and has now marked this sector for reallocation. It’s unclear whether this is permanent or the drive can recover.
What to do:
- Certify the disk using SoftRAID’s disk certification feature
- If the disk passes certification without reallocating these sectors, you can return it to service
- If certification fails or sectors get reallocated, replace the drive immediately
Unreliable Sectors
What it means: The drive had to retry reading data from these sectors. This can happen due to:
- Unexpected cable disconnection
- Thunderbolt bus eject/reset
- Power surge or brownout
- Other external interruptions
The disk marks these sectors as “unreliable” but may recover.
What to do:
- Certify the disk using SoftRAID’s disk certification feature
- If the disk passes certification without reallocating these sectors, you can return it to service
- Disks with “unreliable” sectors often pass certification and clear this error condition
- If certification fails, replace the drive
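Pending reallocations and unreliable sectors share the same certify-then-decide rule. The sketch below encodes that rule; CertificationResult is a hypothetical record invented for illustration, not a structure SoftRAID exposes.

```python
from dataclasses import dataclass


@dataclass
class CertificationResult:
    """Hypothetical summary of a certification pass (not a SoftRAID API)."""
    completed_without_errors: bool
    sectors_reallocated: int  # sectors reallocated during the certification pass


def after_certification(result: CertificationResult) -> str:
    """Pass with no new reallocations -> back to service; otherwise replace."""
    if result.completed_without_errors and result.sectors_reallocated == 0:
        return "return to service"
    return "replace the drive"


print(after_certification(CertificationResult(True, 0)))   # return to service
print(after_certification(CertificationResult(True, 2)))   # replace the drive
print(after_certification(CertificationResult(False, 0)))  # replace the drive
```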
Power-On Hours (POH) Warnings
Hard drives have a limited operational lifespan. Most industry experts suggest replacing mission-critical drives after 20,000-25,000 hours of use.
Why this matters: As drives age, their failure rate increases. For example:
- Year 1-2: ~1.5-2% annual failure rate
- Year 3+ (20,000-25,000 hours): 5-10% or higher annual failure rate
- 40,000+ hours: Very few drives survive past this point
- 50,000+ hours: 90% of drives have failed
SoftRAID alerts when drives reach these thresholds so you can replace them proactively before failure.
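Annual failure rates add up in a multi-drive enclosure, because any one of the drives can fail. A quick estimate of the chance that at least one drive fails within a year is 1 - (1 - AFR)^n for n drives. The sketch assumes every drive fails independently at the same rate, which is a simplification: drives from the same batch, or hit by the same power event, tend to fail together, so simultaneous failures are more likely than this model suggests.

```python
def prob_any_failure(afr: float, drives: int) -> float:
    """Chance that at least one of `drives` fails within a year.

    Assumes independent failures at the same annual failure rate `afr`
    (e.g. 0.05 for 5%).
    """
    return 1 - (1 - afr) ** drives


for afr in (0.02, 0.05, 0.10):
    print(f"AFR {afr:.0%}: {prob_any_failure(afr, 8):.1%} chance one of 8 drives fails")
```

With eight drives, AFRs of 2%, 5%, and 10% give roughly 15%, 34%, and 57%, respectively.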
Our recommendations:
- Desktop/external HDDs: Begin replacement planning at 25,000-30,000 POH
- Server HDDs: Same recommendation (25,000-30,000 POH for mission-critical environments)
- These are conservative recommendations based on real-world reliability data
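Power-on hours only translate into calendar age once you know how many hours a day the drive runs. The helper below is a simple unit conversion; the 8-hours-a-day duty cycle in the example is an assumed desktop workload, not a figure from this article.

```python
def poh_to_years(power_on_hours: float, hours_per_day: float = 24.0) -> float:
    """Convert power-on hours to calendar years at a given daily duty cycle."""
    return power_on_hours / (hours_per_day * 365)


for poh in (20_000, 25_000, 30_000):
    always_on = poh_to_years(poh)   # server or always-on enclosure
    desktop = poh_to_years(poh, 8)  # assumed 8 h/day desktop use
    print(f"{poh:,} POH = {always_on:.1f} years at 24/7, {desktop:.1f} years at 8 h/day")
```

A drive running 24/7 reaches 25,000 POH in under three years, while one powered 8 hours a day takes more than eight years, which is why these recommendations are expressed in POH rather than drive age.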
Flash Media Failures (SSDs/NVMe)
Flash media fails differently than mechanical hard drives.
Media Wear
How it works:
- Flash media is designed to constantly reallocate bad sectors as part of normal wear-leveling
- Reallocation counts are normal and don’t predict failure like they do for HDDs
- Flash drives have a limited number of spare sectors
- When spare sectors run low, the drive is likely to fail imminently
SoftRAID’s media wear indicator:
- Starts at 100% (brand new drive)
- Gradually decreases: 90%, 80%, 70%, etc.
- Represents percentage of spare sectors remaining
Performance:
- Flash drives should perform well down to 10% media wear remaining
- Below 10%: Replace the drive immediately
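Two readings of the media wear indicator taken some time apart give a rough estimate of when the drive will cross the 10% replace-now threshold. The sketch assumes wear drops linearly with time, which only holds if your write workload stays steady; the function name and its inputs are hypothetical.

```python
from datetime import date
from typing import Optional

REPLACE_BELOW = 10.0  # percent of media wear remaining at which to replace now


def days_until_replace(d1: date, wear1: float, d2: date, wear2: float) -> Optional[float]:
    """Linearly extrapolate when media wear remaining will reach REPLACE_BELOW.

    Returns 0.0 if already at or below the threshold, or None if wear did not
    decrease between the two readings (no trend to extrapolate).
    """
    if wear2 <= REPLACE_BELOW:
        return 0.0
    elapsed = (d2 - d1).days
    if elapsed <= 0 or wear2 >= wear1:
        return None
    rate = (wear1 - wear2) / elapsed  # percentage points lost per day
    return (wear2 - REPLACE_BELOW) / rate


print(days_until_replace(date(2024, 1, 1), 90.0, date(2025, 1, 1), 80.0))
```

For example, a drive that fell from 90% to 80% over 2024 (366 days) would reach 10% in roughly 2,562 more days, about seven years. Repeat the estimate as new readings arrive rather than relying on a single projection.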
Media Worn Out
What happens: When a flash drive runs out of spare sectors, it has failed. Flash drives are designed to continue operating in read-only mode when spare sectors are exhausted, but we have also seen drives stop working completely without warning.
Critical: Never let flash media reach 0% media wear remaining. Unpredictable behavior may occur, including:
- Sudden complete failure
- Data corruption
- Read-only mode (no writes possible)
I/O Errors
What is an I/O error? An I/O error means a read or write operation failed to complete. This is a communication error, not necessarily a drive failure.
What SoftRAID does:
- Reports every I/O error
- Saves I/O error counters to disks and volumes (when possible)
- Displays error count in disk and volume tiles
Common Causes of I/O Errors
- The disk is failing or has failed
- The disk or enclosure temporarily “hung” (became unresponsive)
- Disks were ejected or unplugged during an I/O operation
- Kernel panic - macOS crashed and was unable to complete operations
- Damaged volume directory - the disk was asked to read/write to an impossible location
How to Investigate I/O Errors
I/O errors should be treated seriously but need investigation. An I/O error does not automatically mean your disk failed – it means communication failed.
Steps to diagnose:
- Clear the I/O error counter:
  - Select the affected disk(s) or volume in SoftRAID
  - Go to Utilities menu → Clear I/O Counters
  - Select “Errors only”
- Monitor for patterns:
  - Continue using your system normally
  - Watch for new I/O errors
  - Note which disk(s) generate errors
- Interpret the results:
  - Same disk repeatedly: Likely a disk hardware problem – replace the disk
  - Same enclosure slot with different disks: Enclosure problem – may need to replace enclosure
  - All disks simultaneously: Likely filesystem corruption, cable issue, or enclosure-wide problem
  - No new errors: May have been a transient issue (kernel panic, power event, etc.)
See our FAQ: “I/O Errors – What is an I/O error?” for detailed troubleshooting steps.
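The interpretation rules above can be sketched as a function over the errors logged after the counters were cleared. This is a hypothetical illustration: the input format (one disk serial and slot number per error) is an assumption, timing is ignored, and a single error on one disk is treated as "keep monitoring" because the rules only address repeated errors.

```python
def interpret_io_errors(events: list[tuple[str, int]], disks_in_enclosure: int) -> str:
    """Classify I/O errors logged after the counters were cleared.

    `events` holds one (disk_serial, slot_number) pair per new I/O error.
    Timing is ignored, so "all disks" means "every disk at some point".
    """
    if not events:
        return "no new errors: likely a transient issue"
    disks = {serial for serial, _ in events}
    slots = {slot for _, slot in events}
    if disks_in_enclosure > 1 and len(disks) == disks_in_enclosure:
        return "all disks: suspect filesystem, cable, or enclosure-wide problem"
    if len(disks) == 1:
        if len(events) > 1:
            return "same disk repeatedly: likely disk hardware, replace the disk"
        return "single error on one disk: keep monitoring"
    if len(slots) == 1:
        return "same slot, different disks: suspect the enclosure slot"
    return "mixed pattern: keep monitoring and gather details for support"


print(interpret_io_errors([("WD-A1", 2), ("WD-A1", 2), ("WD-A1", 2)], 4))
print(interpret_io_errors([("WD-A1", 3), ("WD-B7", 3)], 4))
```

The serial numbers in the example are made up; the first call reports a repeated same-disk problem and the second a suspect slot.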
Important Note
If you get I/O errors on disks that have reallocated sectors or are predicted to fail, the I/O errors are likely the result of disk failure. Replace these disks immediately.
Summary: When to Replace Drives
Replace Immediately
- SMART test failure - Drive has already failed
- Failed reallocations - Days/weeks before failure
- Reallocated sectors - Any reallocated sector, even one
- Media wear remaining below 10% (SSDs/NVMe) - Running out of spare sectors
- Media worn out (SSDs/NVMe) – Drive at end of life
- Repeated I/O errors on same disk with other predicted failure indicators
Plan Replacement Soon
- Pending reallocations - Certify disk first; replace if certification fails
- Unreliable sectors - Certify disk first; replace if certification fails
- POH >25,000-30,000 - Mission-critical environments should replace proactively
Monitor Closely
- POH 20,000-25,000 - Begin planning replacement
- Single I/O error after known event (kernel panic, power loss) – Clear counter and monitor
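The summary tiers can be collected into one decision function. This is a sketch, not SoftRAID's implementation: DriveHealth and replacement_urgency are hypothetical names, the POH thresholds use the lower end of each range (20,000 and 25,000 hours), reallocation counts are ignored for flash media as described earlier, and I/O errors are left out because they need the investigation described in the I/O Errors section.

```python
from dataclasses import dataclass


@dataclass
class DriveHealth:
    """Hypothetical snapshot of the indicators in this article (not a SoftRAID API)."""
    is_flash: bool = False               # True for SSD/NVMe, False for HDD
    smart_failed: bool = False
    failed_reallocations: int = 0
    reallocated_sectors: int = 0         # only meaningful for HDDs
    pending_reallocations: int = 0
    unreliable_sectors: int = 0
    media_wear_remaining: float = 100.0  # percent; only meaningful for flash
    media_worn_out: bool = False
    power_on_hours: int = 0              # POH thresholds applied to HDDs only


def replacement_urgency(h: DriveHealth) -> str:
    """Map indicators to the replace / plan / monitor tiers of the summary."""
    replace_now = (
        h.smart_failed
        or h.failed_reallocations > 0
        or (not h.is_flash and h.reallocated_sectors > 0)
        or (h.is_flash and (h.media_worn_out or h.media_wear_remaining < 10))
    )
    if replace_now:
        return "replace immediately"
    if h.pending_reallocations > 0 or h.unreliable_sectors > 0:
        return "plan replacement soon: certify first"
    if not h.is_flash and h.power_on_hours > 25_000:
        return "plan replacement soon: high POH"
    if not h.is_flash and h.power_on_hours >= 20_000:
        return "monitor closely"
    return "no action needed"


print(replacement_urgency(DriveHealth(reallocated_sectors=1)))  # replace immediately
print(replacement_urgency(DriveHealth(is_flash=True, reallocated_sectors=500,
                                      media_wear_remaining=62)))  # no action needed
print(replacement_urgency(DriveHealth(power_on_hours=22_000)))  # monitor closely
```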
Research Foundation
Our SMART monitoring and replacement recommendations are based on extensive research:
Google’s 2007 study (100,000+ drives):
- Drives with reallocated sectors: 20-60x more likely to fail within 60 days
- First scan error: 39x increased failure risk
- 36% of failed drives showed no SMART warnings - backups remain essential
Backblaze ongoing data (2013-present, 300,000+ drives):
- 2024 annual failure rate: 1.57% overall
- Failure rates increase with drive age
- Specific models show varying reliability patterns
Getting Help
If you need assistance diagnosing SMART warnings or I/O errors, contact SoftRAID technical support and include:
- SoftRAID Tech Support Report (attach after receiving response from support staff)
- Which SMART warning you received
- I/O error patterns (if applicable)
- Steps you’ve already taken
Extended Warranty on OWC Thunderbay Pre-Certified Drives
Note: When you purchase a Thunderbay bundled with drives, OWC pre-certifies the drives inside your enclosure. As a result, we are able to offer a 3-year replacement warranty for any drive showing reallocated sectors or other serious SMART failures.
