Monday, December 29, 2014

Understanding the Cloud


Introduction

“The cloud” is a confusing concept to many, something that’s not surprising given that the term is used in many different contexts and often not in a consistent manner. This short essay is designed to make the concept clear enough that users can understand its application in any context.

Where Computers Keep Things

A stand-alone computer keeps information (programs as well as data) in three places: 1) right on the CPU chip; 2) in memory, often called RAM; and 3) in storage, which is usually a disk on a laptop or desktop but is most often chip-based “flash” memory in a tablet, phone, or ultralight laptop.

In its simplest meaning, “the cloud” just refers to a fourth place where things are kept—disk storage that is not local but on a server computer somewhere on the Internet.

The term “cloud” comes from the fact that, in the early days of networks, engineers made careful drawings to show which devices were in which locations and how they were connected. These network maps were helpful in planning and troubleshooting.

Detailed network drawings were possible even for corporate networks that connected computers in multiple cities.  The company leased just a few lines to connect the cities and it was easy to identify them.

But when the Internet came along and businesses started using it instead of private networks, drawing became difficult because information flowing through this new kind of network doesn’t always take the same path. Difficulty quickly became impossibility as the number of available paths between any two points on the Internet grew so large that no one was even sure what all of them were.

As a result of the Internet’s complexity, designers of network maps compromised by showing instead a connection going into a cloud symbol and coming out somewhere on the other side. The cloud is simply a symbol representing the Internet.

Things You Need to Know #1

1.         When someone talks about “cloud storage” they’re talking about a disk drive somewhere on a server that’s connected to the Internet.
2.         It follows that you can’t access cloud storage unless you’re connected to the Internet.
3.         You have to read on to find out how some services offset that issue.

An Example of Cloud Storage:  Smarter than Just a Big Disk Drive

Dropbox was one of the first “cloud storage” providers and, with some 300 million users, is still a major player. The swift entry of Google, Microsoft, Apple, Amazon, and others into the market, though, has kept it from exploding into a gigabillion-dollar company as other Internet businesses have done in the past.

The Dropbox Folder

The thing that gave Dropbox a lead in the cloud storage market was its ability to hook into your local operating system (for example, Windows or OS X) in order to create a folder on your computer. The most important thing about the Dropbox folder is that it operates like all your other folders. You can put files in and take them out in the same way, no matter the computer. Curiously, it took others a long time to be able to do the same thing: Dropbox had a smoothly functioning folder in Windows before Microsoft was able to do the same for its competing OneDrive service.

When You Save a File

Let’s say you’re working on your office desktop and you create a new file, give it a name, then save it. (Always, always do this.)

You choose the Dropbox folder as the place to save the file.

What happens next?

First, the local computer system saves the file to a Dropbox folder that is located on your local computer—a hard drive or flash memory (for an exception, see Things You Need to Know #2-3, below).

Once the file is saved locally, the Dropbox software running on your local computer notices that the file is new (doesn’t exist in the cloud) and uploads it to Dropbox’s cloud server, where it’s placed in the folder you own there. FYI: Dropbox uses Amazon Web Services rather than owning its own set of servers.

Now, there are two copies of your file. One is on your office desktop’s hard drive and one is on Dropbox’s cloud server.

When this is done, you can go home, have a drink and dinner, open up your laptop and find the file is there—in your local Dropbox folder.

This is possible because, as soon as you open the laptop at home, the Dropbox service running on the laptop looks out across the Internet and compares the local Dropbox folder with the one in the cloud. If there’s a difference, which is the case here because there’s a new file in the cloud, the service downloads the file to the local hard drive (your home laptop in this case) and you can access it.

Normally, this synchronization happens so fast that the user doesn’t know that the file wasn’t always on the local machine.

So now what happens if you edit (change) the file on your laptop?

As soon as you save it, the Dropbox service on the laptop notices there’s a discrepancy between the file on the laptop and the one in the cloud—the local one is newer. Since newer is better, the service uploads the file from the laptop and has it replace the version in the cloud.

Two things to know here.

First, the synchronization works continuously. Thus, the Dropbox service on the desktop machine in your office will be watching the cloud and, when it notices that a file there has changed, will compare it with the local one (the original that you created there). Since the version in the cloud is newer, it will be downloaded and replace the local one.

In quick summary, Dropbox’s software is constantly working to make sure that the newest version of a file is the one you see when you open any machine.
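
For readers who like to see the logic spelled out, the newest-version-wins behavior described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Dropbox’s actual algorithm: each folder is modeled as a dictionary that maps a file name to a (modified time, contents) pair, and the function name sync is made up for this example.

```python
# Hypothetical model: a "folder" is a dict of name -> (modified_time, contents).
# This illustrates newest-wins syncing; it is not Dropbox's real implementation.

def sync(local, cloud):
    """Make both folders hold the newest version of every file."""
    for name in set(local) | set(cloud):
        local_version = local.get(name)
        cloud_version = cloud.get(name)
        if cloud_version is None or (local_version and local_version[0] > cloud_version[0]):
            cloud[name] = local_version      # upload: local copy is new or newer
        elif local_version is None or cloud_version[0] > local_version[0]:
            local[name] = cloud_version      # download: cloud copy is new or newer


cloud = {}
office = {"report.txt": (1, "first draft")}
laptop = {}

sync(office, cloud)    # the office desktop uploads the new file
sync(laptop, cloud)    # the laptop downloads it at home
laptop["report.txt"] = (2, "edited at home")
sync(laptop, cloud)    # the laptop uploads the newer version
sync(office, cloud)    # the office desktop picks up the edit
print(office["report.txt"][1])
```

Running the same comparison on every connected computer is what keeps them all in step, because each one only ever compares itself with the single copy in the cloud.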

Second, Dropbox assumes you might make a mistake and keeps backup copies of older versions of your files, something that sets it apart from some of its competitors. So, when you change the file on your laptop at home and that file is uploaded to the cloud, Dropbox will replace the old file in the folder and also sync the newer file to your desktop (and any other computer you have connected) as soon as it can.

But, Dropbox will put a copy of the original file—the one before the laptop version was uploaded—into a special backup folder. This means that, if you didn’t want the changed file on the laptop to replace the original, you can just go to the cloud and find the untouched original file and access it (you’ll give it a new name after you open it, of course).
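
The idea of setting the old version aside before overwriting it can be sketched as follows. The class and method names here are invented for illustration, and the real service certainly stores its history differently; the point is only that saving over a file need not destroy what was there before.

```python
# Hypothetical sketch of keeping old versions when a file is overwritten.
# The class and method names are invented for illustration.

class VersionedStore:
    def __init__(self):
        self.current = {}    # name -> latest contents
        self.history = {}    # name -> list of earlier contents, oldest first

    def save(self, name, contents):
        if name in self.current:
            # Set the old version aside before replacing it.
            self.history.setdefault(name, []).append(self.current[name])
        self.current[name] = contents

    def previous_versions(self, name):
        return list(self.history.get(name, []))


store = VersionedStore()
store.save("budget.txt", "original figures")
store.save("budget.txt", "figures changed on the laptop")
print(store.current["budget.txt"])
print(store.previous_versions("budget.txt"))
```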

A quick warning here. If you edit a file on your home laptop then quickly close the lid without closing the file and waiting a few seconds, it’s likely the Dropbox (or other) software won’t have time to update the version in the cloud. This means that when you get back to the office the version of the file you edited just before closing your home laptop won’t be available. The takeaway: when you’re ready to shut down a laptop (or other device), first close any cloud-connected files and then wait a minute or so for the system to update. My experience is that Dropbox does this much faster than its competitors.

A computer doesn’t have to be connected to the Internet for Dropbox to work. You use your local drive as you normally would. Then, when the disconnected computer (for example, a laptop being used on an airplane) reconnects to the Internet, the synchronization process starts.
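
One plausible way to picture this offline behavior is a waiting list of changed files that gets worked off on reconnection. The sketch below assumes exactly that; the SyncClient class and its fields are hypothetical, and a plain dictionary stands in for the cloud server.

```python
# Hypothetical sketch: changes made offline are queued and sent on reconnect.

class SyncClient:
    def __init__(self):
        self.online = False
        self.local = {}      # files on this computer
        self.cloud = {}      # stand-in for the server's copy
        self.pending = []    # names of files changed while offline

    def save(self, name, contents):
        self.local[name] = contents          # a local save always works
        if self.online:
            self.cloud[name] = contents
        elif name not in self.pending:
            self.pending.append(name)

    def reconnect(self):
        self.online = True
        for name in self.pending:            # upload everything that waited
            self.cloud[name] = self.local[name]
        self.pending.clear()


client = SyncClient()
client.save("notes.txt", "written on the plane")
uploaded_while_offline = "notes.txt" in client.cloud
client.reconnect()
print(uploaded_while_offline, client.cloud["notes.txt"])
```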

So the marvelous thing about Dropbox is that it isn’t just a dumb disk drive somewhere out there on the Internet; it’s a smart drive that keeps files synchronized across multiple computers and also provides backup in case you make mistakes.

Speaking of backups, one nice thing about cloud services from major companies like Dropbox, not to mention Google, Apple, Microsoft, and Amazon, is that they don’t keep your data on just one physical drive in their data centers. Rather, the data is mirrored to another cloud drive so there’s a backup if the original drive goes down. Typically, there are also backup drives at different physical locations. Thus, if the Google server farm in Virginia is torched by Luddites, the data is still available somewhere in Pennsylvania (for example).

Things You Need to Know #2

1.         Cloud storage is typically a smart service that keeps files synchronized across many machines. Logically, this isn’t complicated because all the computers share a common central point—the versions of the files in the cloud server. Practically, the process works because software on each computer is constantly looking for newer versions of a file and making sure that the newest version is on the server for connected computers to download and use.

2.         Cloud storage can provide two kinds of backup:  1) the cloud server has a copy of your file in case your laptop goes down (the cloud server copy is itself backed up in multiple locations so you don’t have to worry about a server or even a server location going down); and 2) if you accidentally overwrite an old file with a new one, the cloud service can usually recover the old version.

3.         With Dropbox, there’s an exception to keeping all files locally. The Dropbox service running on devices with just a small amount of storage and limited Internet bandwidth, for example tablets and phones, will not keep files locally unless you tell it to do that one by one. See https://www.dropbox.com/help/82 for information. If you don’t remember to do this, you can be somewhere without an Internet connection and find that you can’t get a file from Dropbox to your tablet or phone.

4.         If your computer crashes, you don’t lose the data you have in Dropbox. Once your new or rebuilt machine is operating, you go to the Dropbox site on the web, download the company’s software and it will start. The software will ask for your username and password at login. Once you’ve done that, the software will find the cloud copy of your files and begin to download them. This will continue until the two Dropbox folders are again the same.

Cloud Storage as a Smart Service

Dropbox’s business model is to give you a certain amount of storage free (currently 2 GB) and then ask you to pay if you need more (you will if you keep a lot of photos in the cloud).

Dropbox’s big competitors like Google, Microsoft, Apple, and Amazon use the same model, but provide a lot more free storage to start:  usually 5 GB (if you look carefully you can get even more). How can they afford to do this?

Google, Apple, et al. see the cloud as a way to lock you into their brand system. Let’s take Apple as an example.

Apple sells you services and entertainment in addition to devices:  apps, music, movies, books. You can download these to your computer and use them as needed, but you can also keep them in Apple’s iCloud.

Why keep apps and stuff in the cloud?

In the case of apps, they have to be on the local machine to work, but iCloud is a very nice place to keep backup copies. If you need more local storage in your iPhone or iPad, for example to load a new version of the iOS operating system, you can delete the local copy of an app then reinstall it from the cloud later. This process also makes adding a new device relatively easy, for example a new iPad Air to replace an old one. Apple originally did a terrible job of making this process understandable, but it’s quite good now.


Music, movies, and books also have to be local to be usable but some of them, especially movies, take a lot of local storage. So, with a cloud service you can download and watch, then delete and still be able to download and watch any time you want (assuming you have an Internet connection).

Led by Amazon, providers of cloud services are also making it possible to use movies without ever downloading—you can “stream” them from the cloud to your device at will (again, assuming you have an Internet connection).

It seems really generous of Apple to keep all these big files for you, doesn’t it?

Well, it would be if that’s what they did. If you were to go to the place where your app, movie, music, and book files are kept on an Apple cloud server, you would see just one file. In that file is a list of what you own. When you ask for something, the server goes to its master database of apps, movies, etc. and pulls that item out. Just one copy needs to be stored for tens of millions of users (though there are of course backups of the master database and usually there are copies distributed across the Internet to make access faster).
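
Here is a small sketch of that arrangement, assuming the simplest possible layout: one shared catalog plus a short ownership list per user. The item IDs, user names, and the download function are all made up for illustration and say nothing about how Apple actually organizes its servers.

```python
# Hypothetical sketch: one master catalog, plus a short ownership list per user.

catalog = {
    "movie-001": "<bytes of a large movie file>",
    "song-042": "<bytes of a song>",
}

# Each user's "locker" is just a set of catalog IDs, not copies of the files.
purchases = {
    "alice": {"movie-001", "song-042"},
    "bob": {"movie-001"},
}

def download(user, item_id):
    """Return the single shared copy if the user owns the item."""
    if item_id not in purchases.get(user, set()):
        raise PermissionError(f"{user} does not own {item_id}")
    return catalog[item_id]


# Both users receive the very same stored copy of the movie.
print(download("alice", "movie-001") is download("bob", "movie-001"))
```

Adding a ten-millionth customer for the same movie costs the provider one more entry in a list, not another multi-gigabyte file.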

In summary, the idea of making cloud storage relatively free is that the particular service you use will become the place where you also buy your music, movies, and books.

Cloud Computing

When large businesses started moving their local computing power to a shared server cluster connected to the Internet, they found they could save a lot of money, both by avoiding redundant hardware and in consolidating administration of the software. So, a company which previously had a set of servers and staff at each of its seven locations now could consolidate at just one.

The next step past this is for a company not to own servers at all, but to outsource that function to a cloud computing provider like IBM (or Amazon, or others). IBM leases the hardware capacity and basic system software maintenance at a price only very large businesses could match. All you do is load your own software and you’re ready to go (and IBM can help with this). The speed of access is about the same as it would be if the computer were next door rather than in the cloud.

There are two big problems with this strategy. The first is the reliability of Internet access. Fortunately, this has been mostly solved:  businesses purchase two Internet connections from two different providers and the odds of both going down at once are tiny.
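
A little arithmetic shows why two connections help so much. The numbers below are assumptions chosen only for illustration: each connection is taken to be down 1 percent of the time, and the two are assumed to fail independently of each other. Real outages can be related (two providers sharing the same cable, for instance), so treat the result as a best case.

```python
# Illustrative arithmetic only: the 1% failure rate is an assumed figure,
# and the connections are assumed to fail independently of one another.

HOURS_PER_YEAR = 365 * 24

def expected_downtime_hours(p_down_each, connections):
    """Hours per year when every one of the connections is down at once."""
    p_all_down = p_down_each ** connections
    return p_all_down * HOURS_PER_YEAR


one = expected_downtime_hours(0.01, 1)
two = expected_downtime_hours(0.01, 2)
print(f"one connection: about {one:.1f} hours a year")
print(f"two connections: about {two * 60:.0f} minutes a year")
```

Under these assumptions, a second connection cuts expected downtime from several days a year to under an hour.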

The second problem is security of information on the Internet, and it most definitely hasn’t been solved. That being said, the problem is really no worse with cloud computing than it is for any Internet-connected system.

Thanks to the wizardry of companies like IBM, Amazon, Microsoft and others, you can actually send large-scale mathematical or data problems to the cloud and have their servers (rather than just the ones you lease) do the computation for you. In addition to the server farms that store data, these companies have fast and vast clusters that can quickly bring a thousand, ten thousand, or more CPUs to focus on your problem and just yours. Amazon is particularly good at this, providing an easy-to-use service with simple pricing that can also include as much disk storage as you want. If your hobby is gene sequencing at home, Amazon’s got you covered.
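
The basic trick behind bringing many CPUs to bear on one problem is to cut the work into pieces, hand each piece to a separate worker, and combine the answers. The sketch below shows that pattern on a single machine, using Python’s standard thread pool as a stand-in for a cloud cluster; the toy problem and the helper names are assumptions for illustration, and a real cloud job would be submitted through the provider’s own tools.

```python
# Local stand-in for a cluster: four workers each handle a slice of the problem.
from concurrent.futures import ThreadPoolExecutor

def sum_of_squares(chunk):
    """The work done by one worker on its share of the numbers."""
    return sum(n * n for n in chunk)

def split(numbers, parts):
    """Cut the list into roughly equal consecutive pieces."""
    size = -(-len(numbers) // parts)   # ceiling division
    return [numbers[i:i + size] for i in range(0, len(numbers), size)]


numbers = list(range(1_000))
chunks = split(numbers, 4)
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(sum_of_squares, chunks))
total = sum(partial_sums)          # combine the workers' answers
print(total)
```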

Summary

The term “the cloud” is just another way of saying “somewhere on the Internet.”

It’s always been possible to send files from place to place on the Internet for purposes of backup; cloud storage services like Dropbox simply make that process transparent to the user. They also provide synchronization across multiple computers.

Cloud storage is invulnerable to local system crashes. Anything you have in a system like Dropbox can be retrieved when your new or rebuilt system is up and running.

Cloud services now also provide storage and management of purchased content like music, videos, and books as well as backups of apps.

Cloud computing makes vast numbers of fast CPUs available on demand to any business or individual who knows how to use them (and can afford it).