Published: Oct 15, 2025
The dashboard serves three stakeholder groups:
The dataset is Backblaze hard drive telemetry collected between January 1 and January 31, 2025.
The dashboard uses a multi-view layout with a top-level "Select View" (top left) dropdown that switches between different graphs. Five key metric cards are always visible at the bottom. This design serves all three audience groups: operations managers get a quick perspective on health summaries, procurement teams can explore vendor/model comparisons, and data scientists can explore reliability patterns.
HDD Count tracks total drives in service for capacity management
Annualized Failure Rate directly informs replacement planning and budget forecasting
Average Age At Failure guides optimal replacement timing
Average Drive Age provides fleet maturity context
ST14000NM0138 → Seagate 14TB
WDC WUH721414ALE6L4 → Western Digital 14TB
HGST HMS5C4040ALE640 → HGST 4TB
Critical Failure Predictors
Operational Metrics
A Python script removed most of the columns from the dataset, which reduced the file sizes by about 75%.
The dashboard's colors serve three purposes:
Most of this project involved learning Tableau and dealing with its issues. I went through the DataCamp course and learned the basics, but getting comfortable building the dashboard took significant time. Once I understood how to structure the data sources and panels, the interactive filtering worked well. It's clear (besides the errors I experienced) why people use it for this kind of work.
The main issue was Tableau crashing frequently. Loading more than about 7 days of data would cause the application to lock up or crash entirely, so I capped the dataset at 7 days to have a working version. Additionally, the project file got corrupted at one point and I lost a week of work. I also ran into UI bugs, such as the Data Source tab becoming unclickable. Working through these stability problems was incredibly frustrating.
If I were to do this again, I would build it as a web dashboard instead. I have much more experience with JavaScript, HTML, and CSS, and I could have built something comparable significantly faster. Tableau works for certain use cases, but in this instance it felt unnecessary, especially given the instability (I got a new error every single time I opened the project).
If I had more time, I would try connecting to a database and spend a lot more time extracting information from the dataset.
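As a rough idea of what the database step could look like, here is a minimal sketch that copies one filtered CSV into a SQLite table using only the Python standard library. SQLite is just a stand-in here, since I never picked a database, and the function name `load_csv_into_sqlite` and the table name `drive_days` are made up for illustration.

```python
import csv
import sqlite3


def load_csv_into_sqlite(csv_path, db_path, table="drive_days"):
    """Copy one CSV file into a SQLite table and return the table's row count.

    Hypothetical helper: column names come from the CSV header and every
    value is stored as text, so numeric casts would happen at query time.
    """
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        columns = reader.fieldnames
        col_defs = ", ".join(f'"{c}"' for c in columns)
        placeholders = ", ".join("?" for _ in columns)

        conn = sqlite3.connect(db_path)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})'
                )
                conn.executemany(
                    f'INSERT INTO "{table}" VALUES ({placeholders})',
                    ([row[c] for c in columns] for row in reader),
                )
            count = conn.execute(
                f'SELECT COUNT(*) FROM "{table}"'
            ).fetchone()[0]
        finally:
            conn.close()
    return count
```

Calling it once per daily file would append each day to the same table, which might make it easier to query a longer date range than loading every CSV at once.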
import os

import pandas as pd

# Placeholder paths (not shown in the original script): adjust as needed
input_directory = 'data'
output_directory = '_filtered'
os.makedirs(output_directory, exist_ok=True)

# Columns to keep
columns_to_keep = [
    'date',
    'serial_number',
    'model',
    'capacity_bytes',
    'failure',
    'datacenter',
    # SMART raw values only
    'smart_5_raw',    # Reallocated Sectors
    'smart_9_raw',    # Power-On Hours
    'smart_12_raw',   # Power Cycle Count
    'smart_187_raw',  # Reported Uncorrectable Errors
    'smart_188_raw',  # Command Timeout
    'smart_194_raw',  # Temperature
    'smart_197_raw',  # Current Pending Sectors
    'smart_198_raw',  # Offline Uncorrectable Sectors
]

# Get all CSV files
csv_files = [f for f in os.listdir(input_directory) if f.endswith('.csv')]
print(f"Found {len(csv_files)} CSV file(s) to process:\n")

for filename in csv_files:
    try:
        input_file = os.path.join(input_directory, filename)
        output_file = os.path.join(output_directory, filename)
        print(f"Processing: {filename}...")

        # Read only the columns you need
        df = pd.read_csv(input_file, usecols=columns_to_keep)

        # Save to _filtered directory
        df.to_csv(output_file, index=False)
        print(f"  ✓ Saved to: _filtered/{filename}")
        print(f"  ✓ Columns: {len(df.columns)}, Rows: {len(df):,}\n")
    except Exception as e:
        # Report the failure and continue with the next file
        print(f"  ✗ Error processing {filename}: {e}\n")
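To show how the filtered columns feed the Annualized Failure Rate card, here is a small sketch that computes AFR per model outside Tableau. It uses the formula Backblaze publishes for its drive stats, AFR = failures / (drive days / 365) × 100, and treats each CSV row as one drive on one day. Whether the workbook's calculated field matches this exactly is an assumption, and the names `MODEL_LABELS` and `annualized_failure_rates` are made up for this example.

```python
from collections import defaultdict

# Friendly labels for the model strings mentioned in this post
MODEL_LABELS = {
    "ST14000NM0138": "Seagate 14TB",
    "WDC WUH721414ALE6L4": "Western Digital 14TB",
    "HGST HMS5C4040ALE640": "HGST 4TB",
}


def annualized_failure_rates(rows):
    """Return {label: AFR in percent} from daily drive rows.

    Each row needs a 'model' and a 'failure' field (0 or 1). One row is
    one drive on one day, so the row count per model is its drive days.
    """
    drive_days = defaultdict(int)
    failures = defaultdict(int)
    for row in rows:
        # Fall back to the raw model string when no label is defined
        label = MODEL_LABELS.get(row["model"], row["model"])
        drive_days[label] += 1
        failures[label] += int(row["failure"])
    return {
        label: failures[label] / (days / 365) * 100
        for label, days in drive_days.items()
    }
```

The function accepts any iterable of dict-like rows, so a `csv.DictReader` opened on one of the filtered files can be passed in directly. Keep in mind that with only a few days of data a single failure moves the rate a lot.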
Click here to download the Tableau Packaged Workbook.
AI tools (ChatGPT/Claude) were used in the following ways during this project: