I think problem solving basically boils down to a single concept:
Figuring out the conditions that led to the observed situation.
Sure, replacing an obviously broken part is one part of problem solving, but you still have to work backwards:
4. It's not working. What could cause this problem? Oh, the framistat broke into three pieces.
3. What conditions could cause the framistat to break? Overheating. Maybe the mindexer wasn't spinning at the optimal speed.
2. Why wasn't the mindexer spinning properly? Oh, it has a failing bearing.
1. Why did the bearing fail? The seal wore, allowing dust to enter.
So many people will only address the last point of failure, without looking at the entire picture. Maybe sourcing a better-quality bearing, or preemptively replacing it as scheduled maintenance, will extend the life of the framistat. And looking at the entire picture requires some understanding of how the whole system works.
I once had to restore a Microsoft Exchange mailbox database due to a failing disk drive, but the backups had not been properly managed (not my fault). The only good backup was several months old. There was a complete set of transaction log files spanning from that backup to the current time, but the most recent log file was corrupted. That meant the most recent log file could not be restored, which caused the restore to fail and left the restored database unmountable. Using a hex editor, I was able to modify the transaction log index file to specify a different ending log file and restore the database to a very recent state.
Now, conventional wisdom says that in that scenario the database can't be properly restored. But by understanding how things work, both in the big picture and under the hood, I was able to "break the rules" and provide the customer with an acceptable fix.
But, even though I recovered the data, the problem wasn't actually solved. Only the symptom was addressed. Disks had to be replaced, and backup procedures needed to be corrected before the problem could really be considered "solved."