
Data Loss

Data loss is a neglected topic in computer security, but it shouldn't be. Computer security, as defined in the beginning of this presentation, is a set of practices to be sure your computer behaves as you expect it to, and that your data and the work done on the computer remain accessible and safe. Data lost to media failure or vendor lock-out is no better than data lost to a virus; either way, it's still gone. Computer security has to include protecting your data against threats from without AND within.

Software failure

The most common type of data loss is due to simple software failure. Probably the best-known example is the word processor that crashes while you're working on a document you haven't saved. Poof, it's gone, and you have to recreate it. This can range from annoying to disastrous, depending on how much and what kind of data you've lost. Software crashes are by nature extremely difficult to predict, so there is only one precaution to take when working on any kind of document in any application: SAVE FREQUENTLY. Develop a reflex to use the "save" keystroke combination (usually control-s on Windows and Linux, and apple-s, also called command-s, on Macintosh) every minute or two. Save after every sentence you type and after every formatting change you make. Doing this will reduce data loss due to software crashes or accidental quits to nearly zero.

Much more insidious and dangerous is corruption in your hard drive's filesystem, which can lead to massive data loss. This kind of problem can be very hard to detect until disaster strikes: think termites eating away at the inside of the joists supporting your house. Everything looks fine from the outside until enough damage is done internally and then everything collapses. Failures of this type are fortunately relatively rare with modern operating systems, but they can be disastrous when they occur.

Generally, the only clue (other than visible corruption in your files) that this kind of problem is developing is general system instability - applications that crash constantly, or erratic behavior in general that doesn't seem normal. System crashes that occur during disk activity (such as saving a long file) are more likely to contribute to this kind of problem, and they can snowball quickly and compound themselves.

Sometimes, filesystem damage of this sort can be repaired with "disk doctor" or "first aid" utilities, but it's possible for a filesystem to be so badly damaged that it cannot be recovered. This can turn into a salvage operation quite quickly. At that point, the only thing you can do is try to copy important files off the damaged hard disk onto external media of some sort or over a network, and hope they come through undamaged.
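
As a rough illustration of that kind of salvage copy, here is a minimal Python sketch. It assumes the damaged disk can still be read at least partly, and the function name salvage and both folder arguments are hypothetical. It copies every file it can read and keeps going when one fails, so a single unreadable file does not stop the whole rescue. The destination should be on a different disk from the source.

```python
import shutil
from pathlib import Path


def salvage(source: Path, destination: Path) -> list[Path]:
    """Copy every readable file; return the files that could not be copied."""
    failed = []
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        target = destination / path.relative_to(source)
        try:
            # Recreate the folder structure, then copy the file and its timestamps.
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
        except OSError:
            # A read or write error on one file should not abort the rest.
            failed.append(path)
    return failed
```

The returned list tells you which files still need attention. A file that copies without error can still contain damaged data, so open the important ones and check them.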

The only preventive action you can take to guard against filesystem damage is to KEEP GOOD AND FREQUENT BACKUPS of your important data. If you only have one copy of your important files, then you have a "single point of failure" and are at very high risk of data loss. The frequency of your backups, as with any security decision you make, should be based on the worst-case scenario: how bad would it be if you suffered total data loss? Backups can be monthly, weekly, daily, hourly, or anywhere in between. They can be manual (copying files by dragging and dropping) or automated (generally with commercial backup software), but the most important thing is that they be scheduled and regular.
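
For readers comfortable with a little scripting, an automated backup can be as simple as the Python sketch below. It is only one way to do it, not a replacement for backup software. The function name backup_folder and both folder arguments are hypothetical. Each run copies the source folder into a new folder named with the date and time, so older backups are kept rather than overwritten.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_folder(source: Path, backup_root: Path) -> Path:
    """Copy `source` into a new timestamped folder under `backup_root`."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    destination = backup_root / f"{source.name}-{stamp}"
    # copytree raises an error if the destination already exists,
    # so an earlier backup is never silently replaced.
    shutil.copytree(source, destination)
    return destination
```

Calling backup_folder(Path("Documents"), Path("E:/Backups")), where both paths are placeholders for your own, makes one dated copy. To make it "scheduled and regular", run the script from your operating system's task scheduler, and point backup_root at a different disk so the backup does not share the original's single point of failure.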

Media failure

Media failure is the hardware version of the software failure described above, and it generally produces the same result (partial or total data loss). Media failure is data loss resulting from physical damage to the storage medium, be it an internal hard disk, a floppy or zip disk, a CD, or any other media type. This kind of error is generally impossible to recover from.

All magnetic media (hard disks, floppy disks, zip disks, etc.) are of course vulnerable to magnetism. Putting a floppy or zip disk with important data on it on top of a large speaker or computer monitor is a bad idea. They are also vulnerable to heat (beware hot surfaces, car dashboards, etc.), distortion (don't sit on them), and breakdown of physical moving parts (spindles, sleeves, shutters, drive arms). CDs and DVDs are a more attractive backup medium for many reasons (no moving parts, not magnetic or vulnerable to magnetism), but are still fragile, vulnerable to heat and light, and very susceptible to scratches.

The obvious cautions about handling media - particularly external media like disks and CDs - apply to preventing media failure, and should be followed. Physical damage to internal hard disks is much less likely because they are cased in heavy protective metal, but extreme physical damage to a computer with a hard disk (dropping a laptop, a computer falling off a desk) can still cause shock damage that can break a hard disk and destroy the data on it.

Finally, there is also the added risk of media just plain wearing out. Although stories of hard disks that have worked for ten years or floppies still readable after twenty abound, it is not wise to count on such longevity. The lifespan of CDs is still unknown and a subject of frequent theoretical debate. It is widely agreed that handling media carefully and storing it in cool, dry, dark places will greatly prolong its life, but the point at which failures due to degradation become increasingly likely is hard to pin down. It is wisest to recycle old media with important files: replace backups every year or two at most, and keep redundant copies in case one set fails.
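
One way to notice a failing backup before you need it is to record a checksum (a digital fingerprint) for each file when the backup is made, and compare again later. The Python sketch below shows the idea. All three function names are hypothetical, and SHA-256 is simply one widely available checksum, chosen here as an assumption.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 checksum of one file, read in small chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to `folder`) to its checksum."""
    return {
        path.relative_to(folder).as_posix(): file_digest(path)
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def damaged_files(folder: Path, manifest: dict[str, str]) -> list[str]:
    """Return files from the manifest that are now missing or changed."""
    current = build_manifest(folder)
    return [name for name, digest in manifest.items() if current.get(name) != digest]
```

Build the manifest when the backup is fresh and keep it somewhere other than the backup media. If damaged_files later returns anything, replace that copy from your redundant set.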

One additional note for CDs and DVDs: DO NOT WRITE ON THEM. Although various pen manufacturers claim their markers are "CD safe", this claim cannot be relied on, and the result can also depend on the quality of the CD itself. The ink in pens can damage the surface of some CDs over time. And do NOT use the glue-on labels commonly available in stores: the glue can and will break down the data on the disc over time. Keep your important CDs in a case and write on the case's label, not on the disc itself.

Software obsolescence

Long-term storage of data can be put at risk by the common practice of applications storing data in closed, proprietary (secret) formats that can only be opened by those applications, creating a circumstance known as "vendor lock-in" which results in dependence on the software vendor to get to the data you created. You are then vulnerable to the software becoming obsolete and no longer supported, the software refusing to run for some reason or not performing properly due to bugs or system upgrades, or the vendor changing its licensing or pricing structures to ones less favorable to the consumer.

Storing data in open formats substantially reduces these risks by ensuring that you can migrate your data to another application if the one you're using stops being functional or satisfactory for any reason. This can take deliberate action and planning, because most applications that use proprietary formats (e.g. Microsoft Office, to name the most prominent) default to saving files in these formats, not open alternatives (like Rich Text for Word files instead of ".doc").

One option that can work well for some people is to use major open-source, freeware applications like OpenOffice as an alternative to Office, at least for documents that you wish to have a long lifespan and be free of restrictions imposed by the software that created them. For some types of application this is not practical or possible, but for basic office applications there are numerous alternatives to the commercial applications, many of which are of very high quality and in active development. The documents they create are in an open format by default and can be accessed by any number of other applications as well, including the commercial ones. It's worth evaluating whether that can work for your purposes.

At a minimum, even if you do use commercial software, you should strongly consider saving copies of important documents in open formats like text-only or rich text if you want to ensure that they will never be "locked out" due to software problems.
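
A small script can help you keep track of which documents still lack an open-format copy. The Python sketch below is an illustration under stated assumptions: the function name missing_open_copies is hypothetical, the two extension lists are examples to edit for your own files, and the script only checks for a same-named copy in the same folder. It does not convert anything; you still save the open-format copy from the application itself.

```python
from pathlib import Path

# Assumed examples of closed-format extensions; edit to suit your own files.
CLOSED_EXTENSIONS = {".doc", ".xls", ".ppt"}
# Assumed examples of open-format extensions for the saved copies.
OPEN_EXTENSIONS = {".txt", ".rtf"}


def missing_open_copies(folder: Path) -> list[Path]:
    """List closed-format files with no same-named open-format copy beside them."""
    missing = []
    for path in sorted(folder.rglob("*")):
        if path.is_file() and path.suffix.lower() in CLOSED_EXTENSIONS:
            has_copy = any(path.with_suffix(ext).exists() for ext in OPEN_EXTENSIONS)
            if not has_copy:
                missing.append(path)
    return missing
```

Running it over your documents folder now and then gives you a to-do list of files to re-save in an open format.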

There is a lot of momentum building around the world for the use of open formats for important data, at national and governmental levels as well as within academia and the business world, to avoid vulnerabilities like those discussed above. The OASIS OpenDocument format is one such initiative gaining steam in Europe, and bears watching. OpenOffice will use OASIS OpenDocument as its native file format starting in the summer of 2005.

Data loss due to software obsolescence is one of the most tragic sorts, because it is among the most preventable. Accidental or deliberate "locking out" of people from the data they created is absolutely pointless, and applications which practice it should be avoided for anything you wish to outlive the period for which the manufacturer is willing and able to provide support. Your data is more important than the commercial interests of software vendors. Don't be held hostage to them. Consider the use of open formats, and software that supports them, for your important data.

Best practices:

  • Keep good backups
  • Do not write on backup media
  • Save important data in open formats
  • Consider using non-proprietary software
