Predicted failure and SMART Failures

What is SMART?

SMART (Self-Monitoring, Analysis, and Reporting Technology) is a monitoring system built into hard disk drives (HDDs) and solid-state drives (SSDs).

SMART was introduced in the 1990s and quickly adopted by all disk manufacturers to detect and report indicators of drive reliability. The intent was to diagnose failure modes in returned disk drives. Later, it was extended to help predict imminent disk failures.

How SoftRAID Uses SMART

SoftRAID runs SMART checks:

Once per day automatically
Every time you launch the SoftRAID application
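SoftRAID does not publish how it schedules these checks. The Python sketch below only models the two triggers listed above. The function name `smart_check_due` and the exact 24-hour interval are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed interval for the automatic once-per-day check.
CHECK_INTERVAL = timedelta(days=1)


def smart_check_due(last_check: Optional[datetime], now: datetime,
                    app_just_launched: bool) -> bool:
    """Hypothetical helper: return True if a SMART check should run now."""
    if app_just_launched:
        return True  # launching the app always triggers a check
    if last_check is None:
        return True  # no earlier check has been recorded
    # Otherwise run only if at least a day has passed since the last check.
    return now - last_check >= CHECK_INTERVAL
```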

What SoftRAID monitors: While disk drives store and report a whole range of SMART values, only a few are actually associated with imminent drive failure. Based on extensive studies by Google and Backblaze analyzing hundreds of thousands of drives, SoftRAID monitors these critical parameters:

For all drives (HDDs and SSDs):

SMART test status (pass/fail)
Reallocated sectors
Failed reallocations
Pending reallocations
Unreliable sectors

For flash media only (SSDs/NVMe):

Media wear indicator
Media worn out status

Additional monitoring: SoftRAID also tracks I/O errors, which indicate communication failures between your Mac and the drive. While not a “SMART” parameter, I/O errors can indicate hardware issues with the drive, enclosure, cables, or volume directory damage.

What devices can SoftRAID get SMART data from?

SoftRAID can retrieve SMART data from most modern storage devices, but compatibility varies by connection type and macOS version:

Thunderbolt-connected drives:

All drives in Thunderbolt enclosures (including all OWC Thunderbay models)
Full SMART data available
Most reliable SMART reporting

NVMe drives:

Most NVMe drives support SMART reporting
Includes Apple internal SSDs (though not officially documented by Apple)
Full wear data and health indicators available

USB-connected drives:

macOS Tahoe (26.x) or later required for SMART over USB
Most modern USB drives and enclosures supported
Some older USB controllers and bridge chips do not pass SMART data to macOS
On macOS Tahoe or later, if SMART status shows “test unavailable,” the USB controller or bridge chip doesn’t support SMART reporting

eSATA and SAS drives:

Full SMART data available when properly connected

If SMART is unavailable: For drives that don’t support SMART (or USB drives on pre-Tahoe macOS), SoftRAID maintains its own power-on hours counter and tracks I/O errors to monitor drive health.
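The compatibility rules above can be summarized as a small decision function. This is a hypothetical Python sketch, not SoftRAID code: the `Drive` fields, the connection names, and the `bridge_passes_smart` flag are illustrative assumptions, and the sketch treats all NVMe drives as SMART-capable even though the article says only "most" are.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    connection: str                   # assumed values: "thunderbolt", "nvme", "usb", "esata", "sas"
    macos_major: int                  # e.g. 26 for macOS Tahoe
    bridge_passes_smart: bool = True  # False for USB bridge chips that block SMART


def health_sources(drive: Drive) -> list[str]:
    """Return the health signals available for a drive (illustrative only)."""
    if drive.connection in ("thunderbolt", "nvme", "esata", "sas"):
        smart = True
    elif drive.connection == "usb":
        # SMART over USB needs macOS Tahoe (26) or later and a bridge that passes SMART data.
        smart = drive.macos_major >= 26 and drive.bridge_passes_smart
    else:
        smart = False
    if smart:
        return ["smart", "io_errors"]
    # Fallback: SoftRAID's own power-on hours counter plus I/O error tracking.
    return ["softraid_poh_counter", "io_errors"]
```

I/O error tracking appears in both branches because it does not depend on SMART support.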

SMART Test Failure (Immediate Action Required)

What It Means

If you receive an alert that a disk has failed the SMART test, the drive has already failed and cannot recover.

Critical: A disk that fails SMART can cause data corruption on other drives in your enclosure.

What to Do

Stop using the disk immediately
Remove the disk from your enclosure as soon as possible
Replace the disk with a new, certified drive

There may be time to copy data from the failing disk before it stops functioning completely, but this is risky. The drive could fail completely at any moment.

If your volume is RAID 1, RAID 4, RAID 5, or RAID 1+0: Your volume should continue functioning with one disk missing, allowing you to replace the failed drive and rebuild.

This applies to all drive types: HDDs, SSDs, and NVMe drives.

Predicted Failure Modes

Predicted failures indicate your drive is likely to fail soon but has not completely failed yet. These warnings give you time to replace the drive proactively.

Understanding Sectors

A drive is divided into “chunks” of data called sectors, typically 512 bytes or 4 KB (4,096 bytes) in size. Each sector also carries an error-correcting code (often called a “checksum”) that helps the drive reconstruct the data if the sector cannot be read cleanly.

When a drive can no longer reliably read a sector, it automatically moves the data to a spare location on the drive. This is called a “reallocated sector.”
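To make reallocation concrete, here is a toy Python model of a drive's remap table, assuming a simple pool of spare sectors. It is not how any particular drive firmware works, and the class name `ToyDrive` is hypothetical. It only shows why a drive can keep serving a logical sector after moving it, and why a reallocation can fail once the spares are used up (in this model, running out of spares is the only cause of failure).

```python
class ToyDrive:
    """Toy model of sector reallocation; not real drive firmware."""

    def __init__(self, num_sectors: int, num_spares: int):
        self.remap: dict[int, int] = {}  # logical sector -> spare physical sector
        # Spare sectors sit just past the normal sector range in this model.
        self.free_spares = list(range(num_sectors, num_sectors + num_spares))
        self.reallocated = 0
        self.failed_reallocations = 0

    def physical(self, logical: int) -> int:
        # A sector that was never remapped stays at its original location.
        return self.remap.get(logical, logical)

    def reallocate(self, logical: int) -> bool:
        """Move a bad sector to a spare; return False if no spare is left."""
        if not self.free_spares:
            self.failed_reallocations += 1  # nowhere left to move the data
            return False
        self.remap[logical] = self.free_spares.pop(0)
        self.reallocated += 1
        return True
```

For example, `ToyDrive(1000, 2)` can reallocate two bad sectors; a third attempt returns `False` and increments `failed_reallocations`.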

Reallocated Sectors (HDDs only)

What it means: The drive has found bad sectors and moved the data to spare sectors elsewhere on the disk.

Why it matters: In the early days of disk drives, reallocated sectors were common, and drives could have hundreds. As disks became more reliable, reallocated sectors became rare. Google’s study of 100,000 drives found that a disk with even a single reallocated sector is 20-60 times more likely to fail within the next 60 days than a normal drive.

What to do: Replace the drive immediately. Think of this like a diagnosis of a serious health condition – the drive might last another year, or it might fail in the next 24 hours.

Timing considerations:

Recent, low count (1-5 sectors): You probably have time to order a new drive and schedule the replacement
Count is growing or reaches the mid-teens (15+): Remove the drive immediately to avoid more serious data issues
After a power outage or long storage: Drives sometimes reallocate a sector or two after a power event or after sitting unpowered for months. The drive may last many more months, but replace it promptly; if you keep it, use it only as a secondary or tertiary backup drive.
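The timing guidance above can be expressed as a small function. This Python sketch works under stated assumptions: `previous_count` is the count from the previous check, "growing" means the count rose after already being nonzero, and stable counts of 6-14 fall back to the general "replace immediately" advice because the guidance does not single them out. The power-outage case is a judgment call and is not modeled. The function name and return codes are hypothetical.

```python
def reallocated_sector_advice(count: int, previous_count: int) -> str:
    """Map an HDD reallocated-sector count to the timing guidance (sketch)."""
    if count == 0:
        return "ok"  # no reallocated sectors
    # Growing = the count increased after it was already nonzero.
    growing = previous_count > 0 and count > previous_count
    if growing or count >= 15:
        return "remove_now"  # remove the drive immediately
    if count <= 5:
        return "schedule"  # order a new drive and schedule the swap
    return "replace_now"  # 6-14 and stable: general "replace immediately" advice
```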

Failed Reallocations

What it means: The drive attempted to reallocate bad sectors but failed. The drive cannot move data to spare sectors.

Why it matters: This state generally means the drive has days or weeks before complete failure.

What to do: Replace this drive immediately. Do not wait.

Pending Reallocations

What it means: The drive tried multiple times to read a sector, used the checksum to attempt data recovery, and has now marked this sector for reallocation. It’s unclear whether this is permanent or the drive can recover.

What to do:

Certify the disk using SoftRAID’s disk certification feature
If the disk passes certification without reallocating these sectors, you can return it to service
If certification fails or sectors get reallocated, replace the drive immediately

Unreliable Sectors

What it means: The drive had to retry reading data from these sectors. This can happen due to:

Unexpected cable disconnection
Thunderbolt bus eject/reset
Power surge or brownout
Other external interruptions

The disk marks these sectors as “unreliable” but may recover.

What to do:

Certify the disk using SoftRAID’s disk certification feature
If the disk passes certification without reallocating these sectors, you can return it to service
Disks with “unreliable” sectors often pass certification and clear this error condition
If certification fails, replace the drive
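Pending reallocations and unreliable sectors lead to the same decision once certification finishes. A hypothetical Python helper, assuming you note the drive's reallocated-sector count before and after running SoftRAID's certification (the function and parameter names are illustrative):

```python
def after_certification(passed: bool, realloc_before: int, realloc_after: int) -> str:
    """Decide the fate of a disk with pending reallocations or unreliable sectors.

    realloc_before / realloc_after: the drive's reallocated-sector count
    noted before and after certification (assumed inputs).
    """
    if not passed:
        return "replace drive"  # certification failed
    if realloc_after > realloc_before:
        # The suspect sectors were reallocated during certification.
        return "replace drive"
    return "return to service"  # passed without new reallocations
```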

Power-On Hours (POH) Warnings

Hard drives have a limited operational lifespan. Most industry experts suggest replacing mission-critical drives after 20,000-25,000 hours of use.

Why this matters: As drives age, their failure rate increases. For example:

Year 1-2: ~1.5-2% annual failure rate
Year 3+ (20,000-25,000 hours): 5-10% or higher annual failure rate
40,000+ hours: Very few drives survive past this point
50,000+ hours: 90% of drives have failed

SoftRAID alerts when drives reach these thresholds so you can replace them proactively before failure.

Our recommendations:

Desktop/external HDDs: Begin replacement planning at 25,000-30,000 POH
Server HDDs: Same recommendation (25,000-30,000 POH for mission-critical environments)
These are conservative recommendations based on real-world reliability data
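POH counts the hours a drive was powered on, not calendar time. This Python sketch converts POH to years of operation and maps POH to the tiers listed in "Summary: When to Replace Drives" (20,000-25,000 hours: monitor closely; above 25,000 hours: plan replacement). The function names and the `hours_per_day` parameter are hypothetical.

```python
def poh_to_years(poh: int, hours_per_day: float = 24) -> float:
    """Years of calendar time needed to accumulate `poh` at a given daily usage."""
    return poh / (hours_per_day * 365)


def poh_advice(poh: int) -> str:
    """Map power-on hours to the summary tiers (thresholds are adjustable)."""
    if poh > 25_000:
        return "plan replacement soon"
    if poh >= 20_000:
        return "monitor closely"  # begin planning replacement
    return "no POH action"
```

For example, a drive running 24/7 reaches 25,000 POH in about 2.85 years, while a drive powered about 8 hours a day takes roughly 8.6 years to get there.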

Flash Media Failures (SSDs/NVMe)

Flash media fails differently than mechanical hard drives.

Media Wear

How it works:

Flash media is designed to constantly reallocate bad sectors as part of normal operation, alongside wear-leveling
Reallocation counts are normal and don’t predict failure like they do for HDDs
Flash drives have a limited number of spare sectors
When spare sectors run low, the drive is likely to fail imminently

SoftRAID’s media wear indicator:

Starts at 100% (brand new drive)
Gradually decreases: 90%, 80%, 70%, etc.
Represents percentage of spare sectors remaining

Performance:

Flash drives should perform well down to 10% media wear remaining
Below 10%: Replace the drive immediately

Media Worn Out

What happens: When a flash drive runs out of spare sectors, it has failed. Flash drives are designed to continue operating in read-only mode when spare sectors are exhausted, but we have also seen drives stop working completely without warning.

Critical: Never let the media wear indicator reach 0%. Unpredictable behavior may occur, including:

Sudden complete failure
Data corruption
Read-only mode (no writes possible)
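The media wear guidance above maps directly onto the indicator's value. A minimal Python sketch, assuming the indicator is read as an integer percentage of spare sectors remaining (100 = new); the function name and return strings are hypothetical.

```python
def media_wear_advice(remaining_pct: int) -> str:
    """Map the media wear indicator (100 = new drive) to the guidance above."""
    if not 0 <= remaining_pct <= 100:
        raise ValueError("media wear indicator must be between 0 and 100")
    if remaining_pct == 0:
        return "worn out: failed"  # spare sectors exhausted
    if remaining_pct < 10:
        return "replace immediately"  # below 10% remaining
    return "ok"  # should still perform well down to 10%
```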

I/O Errors

What is an I/O error? An I/O error means a read or write operation failed to complete. This is a communication error, not necessarily a drive failure.

What SoftRAID does:

Reports every I/O error
Saves I/O error counters to disks and volumes (when possible)
Displays error count in disk and volume tiles

Common Causes of I/O Errors

The disk is failing or has failed
The disk or enclosure temporarily “hung” (became unresponsive)
Disks were ejected or unplugged during an I/O operation
Kernel panic - macOS crashed and was unable to complete operations
Damaged volume directory - the disk was asked to read/write to an impossible location

How to Investigate I/O Errors

Take I/O errors seriously, but investigate before acting. An I/O error does not automatically mean your disk failed; it means communication failed.

Steps to diagnose:

Clear the I/O error counter:

Select the affected disk(s) or volume in SoftRAID
Go to Utilities menu → Clear I/O Counters
Select “Errors only”

Monitor for patterns:

Continue using your system normally
Watch for new I/O errors
Note which disk(s) generate errors

Interpret the results:

Same disk repeatedly: Likely a disk hardware problem – replace the disk
Same enclosure slot with different disks: Enclosure problem – may need to replace enclosure
All disks simultaneously: Likely filesystem corruption, cable issue, or enclosure-wide problem
No new errors: May have been a transient issue (kernel panic, power event, etc.)
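The interpretation steps above can be sketched as a function over the errors observed after clearing the counters. This is a simplified Python illustration: each event is assumed to be recorded as a (disk identifier, enclosure slot) pair, "all disks simultaneously" is approximated as "every disk reported errors during the monitoring window," and the function name and result strings are hypothetical.

```python
def interpret_io_errors(events: list[tuple[str, int]], all_disks: set[str]) -> str:
    """Classify new I/O errors seen after clearing the counters (sketch).

    events: one (disk_id, enclosure_slot) pair per new I/O error.
    all_disks: identifiers of every disk in the enclosure.
    """
    if not events:
        return "transient"  # no new errors since the counters were cleared
    disks = {disk for disk, _ in events}
    slots = {slot for _, slot in events}
    if len(all_disks) > 1 and disks == all_disks:
        return "enclosure-wide, cable, or filesystem problem"
    if len(disks) == 1:
        if len(events) >= 2:
            return "disk problem: replace the disk"  # same disk repeatedly
        return "single error: keep monitoring"
    if len(slots) == 1:
        return "enclosure slot problem"  # same slot, different disks
    return "mixed pattern: keep monitoring"
```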

See our FAQ: “I/O Errors – What is an I/O error?” for detailed troubleshooting steps.

Important Note

If you get I/O errors on disks that have reallocated sectors or are predicted to fail, the I/O errors are likely the result of disk failure. Replace these disks immediately.

Summary: When to Replace Drives

Replace Immediately

SMART test failure - Drive has already failed
Failed reallocations - Days/weeks before failure
Reallocated sectors - Any reallocated sector, even one
Media wear below 10% remaining (SSDs/NVMe) - Running out of spare sectors
Media worn out (SSDs/NVMe) – Drive at end of life
Repeated I/O errors on same disk with other predicted failure indicators

Plan Replacement Soon

Pending reallocations - Certify disk first; replace if certification fails
Unreliable sectors - Certify disk first; replace if certification fails
POH >25,000-30,000 - Mission-critical environments should replace proactively

Monitor Closely

POH 20,000-25,000 - Begin planning replacement
Single I/O error after known event (kernel panic, power loss) - Clear counter and monitor
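The summary can be read as a triage that reports the most urgent tier any indicator reaches. The Python sketch below assumes a hypothetical `HealthReport` holding the values SoftRAID displays for one drive. It ignores reallocation counts on flash media (as described under Flash Media Failures) and applies POH thresholds only to hard drives. It is an illustration, not SoftRAID's alerting logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HealthReport:
    """Hypothetical snapshot of the indicators reported for one drive."""
    is_flash: bool = False                      # SSD/NVMe rather than HDD
    smart_failed: bool = False
    failed_reallocations: int = 0
    reallocated_sectors: int = 0
    pending_reallocations: int = 0
    unreliable_sectors: int = 0
    media_wear_remaining: Optional[int] = None  # percent remaining; flash only
    media_worn_out: bool = False                # flash only
    power_on_hours: int = 0
    repeated_io_errors: bool = False            # same disk, after clearing counters


def triage(r: HealthReport) -> str:
    """Return the most urgent tier from the summary."""
    # Reallocation counts are normal on flash media, so only HDDs count here.
    hdd_reallocated = not r.is_flash and r.reallocated_sectors > 0
    low_wear = r.media_wear_remaining is not None and r.media_wear_remaining < 10
    predicted = (hdd_reallocated or r.failed_reallocations > 0 or low_wear
                 or r.pending_reallocations > 0 or r.unreliable_sectors > 0)
    if (r.smart_failed or r.failed_reallocations > 0 or hdd_reallocated
            or low_wear or r.media_worn_out
            or (r.repeated_io_errors and predicted)):
        return "replace immediately"
    if (r.pending_reallocations > 0 or r.unreliable_sectors > 0
            or (not r.is_flash and r.power_on_hours > 25_000)):
        return "plan replacement soon"
    if not r.is_flash and r.power_on_hours >= 20_000:
        return "monitor closely"
    return "healthy"
```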

Research Foundation

Our SMART monitoring and replacement recommendations are based on extensive research:

Google’s 2007 study (100,000+ drives):

Drives with reallocated sectors: 20-60x more likely to fail within 60 days
First scan error: 39x increased failure risk
36% of failed drives showed no SMART warnings - backups remain essential

Backblaze ongoing data (2013-present, 300,000+ drives):

2024 annual failure rate: 1.57% overall
Failure rates increase with drive age
Specific models show varying reliability patterns
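Backblaze calculates annualized failure rate (AFR) from drive-days rather than simply dividing failures by the number of drives, because drives join and leave the fleet during the year. A minimal Python sketch of that calculation (the function name is hypothetical):

```python
def annualized_failure_rate(failures: int, drive_days: int) -> float:
    """AFR in percent: failures per drive-year of operation, times 100."""
    drive_years = drive_days / 365
    return failures / drive_years * 100
```

For example, 10 failures across 1,000 drives that each ran the full year (365,000 drive-days) gives an AFR of 1.0%.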

Getting Help

If you need assistance diagnosing SMART warnings or I/O errors:

Submit a support ticket. 

Include:

SoftRAID Tech Support Report (attach after receiving response from support staff)
Which SMART warning you received
I/O error patterns (if applicable)
Steps you’ve already taken

Extended Warranty on OWC Thunderbay Pre-Certified Drives

Note: When you purchase a Thunderbay bundled with drives, OWC pre-certifies the drives inside your enclosure. As a result, we are able to offer a 3-year replacement warranty for any drive showing reallocated sectors or other serious SMART failures.
