A couple of days ago I was cleaning up my recently migrated server and ran across a directory filled with a couple thousand text files and some Perl scripts. The directory wasn’t obviously named, but after some poking around I realized I was looking at the remains of a small consulting gig from 4 years ago. It was a pretty straightforward data mining job: There was a bunch of information on a public web site that an organization needed. Filings or grant applications or something like that. I needed to download it and remix it into a spreadsheet. What should have been a really easy spider and collate job ended up being complicated by the fact that the web host had a rate-limiting module set up, so no IP address could grab more than 10-20 pages every hour. There were thousands of pages.
If this problem sounds familiar, it’s similar to the one Aaron Swartz was trying to overcome when he snuck that laptop into an MIT closet. In my case there was no login or private, privileged access, and I was running all this stuff in the middle of the night so as not to inconvenience anyone else, but the problem remained: If I’d followed the rate limiter’s desires, it would have taken weeks or months to grab the data.
I ended up getting around the rate limiter by using something called Tor, The Onion Router. Tor works by sending your traffic through a distributed network of other participants’ computers, anonymizing your physical and digital location in the process. For me, that meant that I could download all the files in 15 minutes or so in the middle of the night. For other people it means posting to Twitter or accessing dissident web sites from Syria or China or wherever.
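To put rough numbers on that, here is a back-of-the-envelope sketch. The page count (2,000, my reading of “a couple thousand”) and the six usable night-time hours are assumptions for illustration, and the function names are made up; only the 10-20 pages per hour limit comes from the job itself.

```python
import math


def hours_to_fetch(pages, pages_per_ip_per_hour, ip_count=1):
    """Hours needed when every IP address is held to the same hourly quota."""
    return pages / (pages_per_ip_per_hour * ip_count)


def addresses_for_single_pass(pages, pages_per_ip_per_hour):
    """Distinct addresses needed if each one spends its hourly quota once."""
    return math.ceil(pages / pages_per_ip_per_hour)


PAGES = 2000          # assumption: "a couple thousand" pages
NIGHT_HOURS = 6       # assumption: only scraping in the middle of the night

for quota in (10, 20):  # the limiter allowed roughly 10-20 pages per hour
    hours = hours_to_fetch(PAGES, quota)
    print(f"{quota}/hour from one IP: {hours:.0f} hours, "
          f"about {hours / NIGHT_HOURS:.0f} nights")
    print(f"  addresses needed to finish in one pass: "
          f"{addresses_for_single_pass(PAGES, quota)}")
```

From a single address the job stretches over weeks of nights. Spread across a hundred or two addresses, each one only has to spend its quota once, which is why a network that keeps changing where your traffic comes out makes the limit mostly irrelevant.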
Running across these files reminded me of something I’ve been thinking about for a while: That what the Personal Cloud really needs to take off is an immediate problem-solving use case, and to find useful examples, we might want to look in the grayer areas of the internet. We can talk about companies bidding for our orders VRM style or Internet of Things devices dumping metrics into our personal data warehouses, but both of those things are going to require a lot of supporting infrastructure before they’re really viable. If you want to get a lot of people excited about something today, you solve a problem they have today, today. And that brings me to the edges…
The Edge: Where Innovation Happens
One common aphorism shared by those in technology or innovation is that new things develop at the boundaries. Change and chaos happen at the edges and at the borders, where things mix and intermingle. Here’s the MIT Media Lab’s Joi Ito talking about it, for instance. MIT is a large, stable organization. Businesses are large, stable organizations. The Media Lab is where they meet, where they cross-pollinate and where the friction in developing new ideas is reduced as much as possible. The same can be said about border towns. New York City is an American border town. It’s the edge between our country and a whole bunch of immigrants, both old and new. The mix of ideas and talents and experiences creates new things.
Joi Ito: Innovate on the Edges and Embrace… by FORAtv
A lot of the innovation that happens on the edge happens in the gray area outside the strictly legal, or deep in the illegal. Across our southern border we have very advanced drug and gun smuggling tunnels, complete with ventilation and electricity. Neal Stephenson’s last book REAMDE was largely about northern border smuggling. Chocolate, toy-filled Kinder Eggs are illegal in this country, so people smuggle those in. I brought some back the last time I went to Mexico, and some friends brought back a whole carton when they went to Germany recently. In Gaza they even have KFC delivered by tunnel.
Given that innovation happens at the edges, that people solve their problems at the edges using interesting methods, and that the Personal Cloud needs some need-driven use cases in order to flourish, I think it’s useful to look at some of the ways people are using things like the Personal Cloud already for dubiously legal purposes (though the legality they’re avoiding isn’t always our own). Perhaps by digging into what makes them compelling, and how their developers have solved those problems, we can learn something about developing Personal Clouds for everybody.
Some Personal Cloud Definitions
When looking for products that fit the Personal Cloud mold, I’m specifically looking for interesting uses of on-demand computing and networking. Especially things that don’t inherently scale beyond the individual, either due to privacy concerns, the need to be distributed, or some other unique aspect of the approach.
A job that costs a shared service only 10% more to run for one more person isn’t a good candidate for a Personal Cloud, because the economy of scale will keep the run-it-yourself version comparatively expensive. Running your own mail server is a bad idea these days, because your data and address can already be portable (with IMAP and a personal domain), and running the spam filtering and staying on whitelists is hard. It’s a lot better to register your own domain and let a trustworthy third party host the mail. I should mention that Phil Windley has a good post about IMAP being a proto-Personal Cloud protocol, if you haven’t read it.
So, with that said, let’s look at some examples…
Tor: Anonymize All the Things
So back to Tor. Tor is built as a distributed, self-organizing network. You connect to an entry relay, the address for which you get either by looking one up publicly or, where that would get you blocked or thrown in prison, by getting passed the address of an unlisted relay, called a Tor Bridge, on the side. Once connected to the Tor network your public internet traffic is bounced through the network of Tor relays in a randomized, encrypted way, and eventually finds its way onto the public internet through an exit relay.
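The “randomized, encrypted” bouncing is easier to picture with a toy model. The sketch below wraps a request in one layer per hop, so each hop can peel exactly one layer and learns only where to send the rest. It is an illustration under loud assumptions: the hop names and keys are made up, and the XOR “cipher” is a stand-in that is not secure. Real Tor negotiates keys with each relay and uses proper ciphers and fixed-size cells.

```python
import hashlib
import json


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR with a SHA-256-derived keystream. NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def wrap(payload: str, destination: str, path):
    """Build the onion from the inside out: the last hop's layer goes on first."""
    next_hop = destination
    blob = payload.encode()
    for name, key in reversed(path):
        layer = json.dumps({"next": next_hop, "data": blob.hex()}).encode()
        blob = keystream_xor(key, layer)
        next_hop = name
    return blob


def peel(key: bytes, blob: bytes):
    """What a single hop does: remove its own layer, learn only the next hop."""
    layer = json.loads(keystream_xor(key, blob))
    return layer["next"], bytes.fromhex(layer["data"])


# Hypothetical three-hop path; names and keys are placeholders.
path = [("entry", b"key-1"), ("middle", b"key-2"), ("exit", b"key-3")]
blob = wrap("GET /filings/42", "example.org", path)

for name, key in path:
    next_hop, blob = peel(key, blob)
    print(f"{name} forwards to {next_hop}")

print("request seen by the destination:", blob.decode())
```

The entry hop knows who you are but sees only an encrypted blob addressed to the middle hop; the last hop sees the request but not who sent it. That split is the whole trick.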
The people who run Tor Bridges are paying for your traffic twice, because your connections come into their machine and then out again. Running a Tor Bridge is a labor of love, done by people who believe in anonymity and freedom of speech. It doesn’t pay, but knowing that a political dissident somewhere can speak freely about an oppressive regime has a karmic payoff.
A few years ago Amazon’s EC2 cloud computing service started offering a free micro level of service. You could sign up and run a really small cloud server for development or testing without paying. It didn’t cost Amazon much to run them, and performance wasn’t really great, but it got people onto their platform. Usually people start up Amazon-provided server instances to install software and play around on, but the folks behind Tor realized that they could create a pre-configured server image with the Tor Bridge on it, and let people spin those up in Amazon’s free usage tier. They call it the Tor Cloud. You still pay for bandwidth, but if you bridge 15 GB of bandwidth a month, your bill will only be around $3. It’s less than the price of a latte, and you do something good for internet freedom. You don’t have to know a lot about the cloud to set it up; you just register for Amazon Web Services, pick the image, and hit Start. The images are pre-configured to download software updates and patches, so there’s virtually no maintenance work. Just the kind of simplicity you need for a Personal Cloud feature.
Back It Up Or Lose It: The Archive Team
I’ve harped before on our tendency not to take care of the things we create. Web sites get acquired and shutter within months. Promises are made that users will be able to export their data, but promises are made to be broken. Fortunately for us, there’s a group of archivists led by Jason Scott called Archive Team. Archive Team scrapes sites that are destined for the Internet trash heap, and uploads the data to the Internet Archive. So far they’ve archived sites like Apple’s MobileMe homepages and Yahoo Groups, and they’re currently trying to grab as much of Posterous as they can before Twitter drops the axe. This may sound pointless until, a few years after a company acquires and then shutters the site your mom or sister blogs at or posts family photos to, you realize there’s no way for you to get that stuff back.
Archive Team runs into a lot of the same issues I had around rate limiters. Yahoo! and Twitter don’t want them slurping down the whole site; they want to take the engineering resources off those projects and let them die a quiet, cost-cutting death. To get around this, Archive Team offers a virtual machine, the Archive Team Warrior.
The Archive Team Warrior is a distributed but centrally managed web spider. The Archive Team central server slices the archiving work up into little chunks, and the Warrior on your computer asks the server for some work to do. The central server gives it a small to-do list of URLs to fetch, and the Warrior starts downloading those until it hits the site’s rate limit. Any data it can download, it sends back to the Archive Team server for bundling and uploading into the Internet Archive. Then it waits and retries until the site will let it back in.
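That ask-for-work, fetch, upload, wait loop is simple enough to sketch. Everything below is a stand-in I made up for illustration: `Tracker` plays the central server, `FakeSite` plays a rate-limited web site, and “waiting” just resets the quota instead of sleeping. It is not the Warrior’s actual code or protocol.

```python
from collections import deque


class RateLimited(Exception):
    """Raised by the fake site once the quota for the current window is spent."""


class FakeSite:
    def __init__(self, quota_per_window):
        if quota_per_window < 1:
            raise ValueError("quota must be at least 1")
        self.quota = quota_per_window
        self.used = 0

    def fetch(self, url):
        if self.used >= self.quota:
            raise RateLimited(url)
        self.used += 1
        return f"<html>{url}</html>"

    def next_window(self):
        self.used = 0


class Tracker:
    """Stand-in central server: hands out small to-do lists, collects results."""

    def __init__(self, urls, chunk_size):
        self.pending = deque(
            urls[i:i + chunk_size] for i in range(0, len(urls), chunk_size)
        )
        self.archive = {}

    def request_work(self):
        return self.pending.popleft() if self.pending else None

    def upload(self, results):
        self.archive.update(results)


def run_warrior(tracker, site):
    """Work through every chunk; return how many times we had to wait."""
    windows_waited = 0
    while (todo := tracker.request_work()) is not None:
        queue = deque(todo)
        while queue:
            fetched = {}
            while queue:
                try:
                    fetched[queue[0]] = site.fetch(queue[0])
                except RateLimited:
                    break
                queue.popleft()
            tracker.upload(fetched)   # send back whatever we managed to get
            if queue:                 # blocked: wait, then retry the rest
                site.next_window()    # a real worker would sleep here
                windows_waited += 1
    return windows_waited


urls = [f"http://example.org/page/{n}" for n in range(7)]
tracker = Tracker(urls, chunk_size=3)
waited = run_warrior(tracker, FakeSite(quota_per_window=2))
print(f"archived {len(tracker.archive)} pages, waited out {waited} windows")
```

One worker spends most of its life waiting. The point of handing out small chunks is that many volunteers, each behind a different address, can wait in parallel.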
The Archive Team manages the projects, and the Warrior presents a simple web interface where you can tweak a few settings and track how you’re doing. Most importantly, it’s hands-off. You can set it up once, and let it run in the background forever. It manages its own software updates, and you can tell it to work on whatever the Archive Team’s priorities are, and ignore it from then on. If you have a PC sitting around that you don’t use a lot, running the Warrior is a nice way to give back to the Internet that’s given us so much. It’s good karma, and it’s easy.
Pirate All the Things: Seedboxes
So far we’ve talked primarily about projects which give good karma. Now let’s talk about a project that is often used for… not so good karma. In 2001 the BitTorrent protocol was introduced, allowing for a (then) secure way to share lots of files in a bandwidth-optimized fashion. Users get pieces of a file, trackers know who’s downloading the file at any one time, and clients cooperate to distribute the pieces as widely as possible. When you’re downloading a file from BitTorrent it’s entirely likely you’ll be downloading chunks of it from people who don’t have the entire file yet, and likewise you’ll be sharing parts of the files you’ve downloaded with other people who don’t have those pieces yet. By working this way everyone gets it faster.
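Why does swapping pieces make it faster for everyone? A toy simulation shows the effect. The model is my own simplification, not the real protocol: time moves in rounds, every machine can upload one piece and download one piece per round, and each downloader grabs the rarest piece it can find. Real clients add trackers, hash checks, choking and a lot more.

```python
def simulate_swarm(num_pieces, num_leechers, peers_share=True):
    """Return (rounds_until_everyone_is_done, pieces_sent_by_incomplete_peers)."""
    everything = set(range(num_pieces))
    have = {"seed": set(everything)}
    have.update({f"peer{i}": set() for i in range(num_leechers)})
    rounds = 0
    from_incomplete = 0
    while any(pieces != everything for pieces in have.values()):
        rounds += 1
        # Peers can only pass on what they held when the round began.
        snapshot = {name: frozenset(pieces) for name, pieces in have.items()}
        copies = {p: sum(p in held for held in snapshot.values())
                  for p in everything}
        busy = set()  # each uploader sends at most one piece per round
        for name in have:
            if name == "seed":
                continue
            candidates = [
                (copies[piece], piece, source)
                for source, held in snapshot.items()
                if source != name and source not in busy
                and (peers_share or source == "seed")
                for piece in held - have[name]
            ]
            if not candidates:
                continue
            _, piece, source = min(candidates)  # rarest piece first
            have[name].add(piece)
            busy.add(source)
            if snapshot[source] != everything:
                from_incomplete += 1
    return rounds, from_incomplete


seed_only_rounds, _ = simulate_swarm(4, 3, peers_share=False)
swarm_rounds, relayed = simulate_swarm(4, 3, peers_share=True)
print(f"seed uploads everything itself: {seed_only_rounds} rounds")
print(f"peers swap pieces:              {swarm_rounds} rounds, "
      f"{relayed} pieces came from peers without the full file")
```

When only the seed uploads, every piece for every downloader has to squeeze through one machine. Once peers pass along what they already have, most of the transfers happen between people who are still downloading themselves.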
While BitTorrent might have been secure once, it’s now entirely likely that your ISP knows what you’re downloading, who you’re downloading it from, and what you’re sharing back. They can look at payload sizes, the trackers you’re talking to, traffic bursts, and pretty reasonably reconstruct your activity. The MPAA and other pirate-hunting groups can even run their own clients and integrate themselves into the network. Running a BitTorrent client from your home computer and downloading anything remotely illegal is like asking the bagger at the grocery store to help you out with your shoplifted goodies.
So let’s say you’re sharing something that you think should be legal but isn’t, or you’re trying to use BitTorrent for a legal end, like sharing a bundle of book materials or distributing an operating system or a big chunk of GeoCities, and don’t have the bandwidth at home to support it. (Or, sure, you could be downloading Iron Man 3.) This is where something called a Seedbox comes into play. A Seedbox is a server at an ISP somewhere that just runs a BitTorrent client. You can use them to get your torrents out to a bunch of people really fast, or you can use them to download files that you wouldn’t be comfortable with downloading to your home IP. You can even buy them in another country, increasing the difficulty of tracing the traffic back to you.
Seedboxes are managed servers. You don’t install software updates on them; the provider does that, but they likely won’t give you much in the way of customer support. Lots of them use a Web UI called ruTorrent, an open source frontend for the rTorrent BitTorrent client. You don’t SSH into these machines, and you probably don’t even have a server login, but you can use the web UI and conduct your business in the cloud.
In this way ruTorrent Seedboxes are a perfect prototype for our Personal Cloud. The providers don’t watch the servers or monitor their quality. Privacy is implicit when you’re doing something at the edge of legality. What they don’t know won’t hurt them as much when Interpol comes calling. The web UIs are built for self-service. You have a login, but the web UI is your entire management plane. rTorrent has an Android front-end, but most people likely manage them through the web. There isn’t any software on your home computer, just a username and password to a web site somewhere. The data’s yours, and if you wanted to shove it sideways into a cloud storage provider, you probably could.
Points of Presence: The Personal VPN
As an addendum to these offerings, a sort of postscript on the idea of exploiting technologies at the edges for personal gain, I’d be remiss if I didn’t mention personal VPNs. Tor’s good for anonymity, but what if you just want to appear like you’re somewhere else? Say, for instance, somewhere the new season of Sherlock, Doctor Who or Downton Abbey is available for streaming 6 months or a year before it comes to your country. (Or vice versa, where we get new episodes of Mad Men a year before they do.) What do you do then?
The same technology that your company uses to securely connect you to your corporate network can be used to make you appear to be in the UK, or the US Midwest, or Japan, or wherever else the content is region-limited. You run the software (likely built into your operating system), connect somewhat securely to a computer in some other country or even continent, and all your internet traffic appears to come from there.
A few years ago I was in Mexico over Christmas, and there were some really good deals on Steam’s Holiday Sale. I have a US account, with a US billing address and a US credit card, but I couldn’t buy anything because my computer was with me in Mexico. I ended up installing a bunch of software on one of my servers and setting up a VPN to it, just to buy some cheap games. These days I could just plunk down a few bucks and be good to go, and a lot of people do.
A Few Lessons Learned
Users have problems, and will go to considerable lengths to solve them. None of these services are as easy as they could be, either because they’re niche offerings (Tor and Archive Team) or because of their dubious legality (Seedboxes). ruTorrent is a lot easier to use than it probably used to be, but it still isn’t as easy as using the Netflix or iPlayer iPad apps. The Warrior is a 174 meg download that requires installing VirtualBox on your computer. The Tor Cloud Bridge requires signing up for Amazon Web Services, and navigating their UI. Getting a VPN provider or Seedbox requires research and dealing with a company that might not be entirely legit, and really falls in the class of early adopter technologies.
Even though all this stuff is hard to use, people do it. Seedboxes and private VPNs give people things they want. You may not have known that you wanted to watch the new season of Doctor Who before it comes out in the US, but once you know you can, you’ll go to some pretty extreme lengths to make that happen. Motivation can be powerful, and people will overcome serious technical hurdles if they’re properly motivated.
So looking at these examples, we can see that a Personal Cloud app really needs to offer 3 things:
1. Motivation: It needs to solve a real, immediate problem.
2. Self-Service: It needs to be super-easy to start using and offer a familiar, understandable interface.
3. Hands-Off: It needs to have software updates and easy maintenance built-in.
Any Personal Cloud offerings that don’t check these boxes may get some niche use, and may excite developers, but they aren’t going to start climbing up the adoption curve. As you build your Personal Cloud app, keep these things in mind. Users have needs we can solve, and we can empower them, but our solutions need to be compelling, simple to use, and simple to maintain.