Backups vs. Archives: What’s the Difference?

July 30, 2019 // News From the Edge backups

You say you have an archive of your data and that’s all you need. I’d say not quite. Data archives and backups are different: they are meant for different things.

In today’s world, we are confronting more natural disasters, more cyberattacks and greater risks to business continuity. I want to make sure you are well-informed on the nuanced differences between data backups and archives so that you are certain your organization has the tools it needs to recover quickly.

First, what is a backup?

You can think of a backup as a copy of data that you can use to restore that data in case of loss or damage. The original data will not be deleted after a backup is made (in fact, you may have backups every single day or hour in a day, depending on how sensitive your organization is to data continuity).

Many organizations will retain backups for a certain amount of time (commonly a data retention policy spans a month or two).
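To make the idea of a retention window concrete, here is a minimal sketch of how a nightly backup set might be pruned. The function name `backups_to_prune`, the 60-day window, and the sample dates are all assumptions chosen for illustration; real backup products handle this with their own policy settings.

```python
from datetime import date, timedelta

def backups_to_prune(backup_dates, today, retention_days=60):
    """Return the backup dates that fall outside the retention window.

    retention_days=60 is an assumed value ("a month or two");
    adjust it to match your own policy.
    """
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in backup_dates if d < cutoff)

# Hypothetical example: 90 nightly backups ending on July 30, 2019.
today = date(2019, 7, 30)
nightly = [today - timedelta(days=n) for n in range(90)]

expired = backups_to_prune(nightly, today)
kept = [d for d in nightly if d not in expired]
print(len(kept), "kept;", len(expired), "expired")
```

Anything older than the window is dropped, which is exactly why a backup set on its own is a poor place to look for a document from several years ago.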

Examples of backups might include a nightly backup of all of the files on your computer or all of your photos copied from your iPhone to iCloud. Servers can be backed up too, and a database dump is another common form of backup.

What is important to realize when thinking about data backups is their purpose. The purpose of any backup is simple: to restore data if something happens to the original copy. For instance, if you had a hard drive failure, you would probably need the data on your computer restored.

Alternatively, someone might accidentally delete a file, or your files might become encrypted, caught in the crosshairs of a ransomware attack. Without a good backup system, you may have to deal with losing that file or paying a ransom. With a good backup of your data, you would be able to restore any or all of your data without shelling out a dime (the reality is that most businesses learn this the hard way, after a serious attack or data loss incident).

Now, what is an archive?

Think of an archive as a copy of your data that you keep for reference. Typically, the original data file is deleted after an archive is made.

While the purpose of a backup is to return something to how it looked at a very specific point in time (say yesterday afternoon), an archive can serve a variety of different purposes. Most commonly, it is used to find some particular data from a long time ago, often several years back.

This data might be related to a specific client. Let’s say you’re a lawyer and one of your clients from years ago is now claiming malpractice. Without the specific documents used in that case, you might not be able to prove your innocence. You would want a data archive of that specific data, along with any necessary software archived to ensure you are able to gain access to those particular files.

Think of an archive like those microfilms in the library: they are stored for people to access newspaper information going back decades (the librarian can easily access and present the microfilm to you). A backup is more like a replacement set of the papers from the past couple of weeks that people expect to be accessible (the librarian will give you ALL of the available newspapers in his or her box).

Why use an archive?

Perhaps an employee was fired for doing something they believe they had permission to do. Their lawsuit might include an electronic discovery request asking for all emails associated with those permissions. In order to defend the stance of your organization in such a case, you would need to have archived all of the relevant communications (in this case, that person’s email).

An archive is what would help you address these tasks. You can essentially archive any type of data: sales orders, quotes, or contracts. You might keep current contracts and orders in your live system and move all old client data into an archive.

Note that many archives are indexed, which allows you to search for specific data you need to find.

When you need to retrieve data, you will need to evaluate whether you are actually looking to restore or retrieve.

If you are using an archive to store your data, you are probably going to retrieve specific data rather than recover your entire system. When you retrieve something from an archive, it’s typically one file or a specific group of files. A restore from a backup, by contrast, returns data to a single point in time. With an archive system you may also retrieve data across a range of dates to get different versions (this might be necessary to retrieve emails from a date range for a specific user).

If you were to restore that same file from a backup instead, you would need to know a LOT about where the file or data was (backups aren’t indexed in the same way as a data archive). You would need to know the name of the server it was on, the database or the directory, and the name of the file or table you want back.
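To illustrate what an indexed archive makes possible, here is a minimal sketch of a keyword index that can be filtered by user and date range. The `ArchiveIndex` class, its method names, and the sample messages are hypothetical and exist only for this example; commercial archiving systems work on the same principle but are far more elaborate.

```python
from collections import defaultdict
from datetime import date

class ArchiveIndex:
    """Toy archive index: maps keywords to the records that contain them."""

    def __init__(self):
        self._by_keyword = defaultdict(set)
        self._records = {}

    def add(self, record_id, owner, sent_on, text):
        # Store the record and index every word in it.
        self._records[record_id] = (owner, sent_on, text)
        for word in text.lower().split():
            self._by_keyword[word].add(record_id)

    def search(self, keyword, owner=None, start=None, end=None):
        # Look up the keyword, then narrow by owner and date range.
        hits = []
        for record_id in self._by_keyword.get(keyword.lower(), set()):
            rec_owner, sent_on, _ = self._records[record_id]
            if owner and rec_owner != owner:
                continue
            if start and sent_on < start:
                continue
            if end and sent_on > end:
                continue
            hits.append(record_id)
        return sorted(hits)

# Hypothetical archived emails.
index = ArchiveIndex()
index.add("msg-1", "alice", date(2016, 3, 1), "Permission granted for the vendor contract")
index.add("msg-2", "alice", date(2018, 5, 9), "Lunch on Friday")
index.add("msg-3", "bob", date(2016, 4, 2), "Contract permission pending")

found = index.search("permission", owner="alice",
                     start=date(2016, 1, 1), end=date(2016, 12, 31))
print(found)
```

Note that the search never asks which server or directory the message lived on. That is the practical difference from a backup, where you would need to know the location before you could pull anything back.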

Why not just use a backup system as an archive? Why the heck does it matter?

Many folks try to use a backup system as an archive. The problem with this is when you get a request to retrieve a specific file, it might take quite a long time (in some instances months instead of minutes) and it might cost a lot more money—millions instead of a couple of dollars—to get the specific information you were looking for.

Bottom line: backups are used to completely restore your systems in the event of a disaster. Backups are meant to continuously ensure that your data is recoverable. Archiving is meant to find and retrieve specific pieces of information. In all likelihood, you need both.