A Kindle or a house extension?

There’s been a lot of discussion recently about ebooks, and the new Kindle in particular, covering all the downsides of the Kindle and its competitors in the ebook market.

All the downsides of ebooks are real and valid:

  • You lose the look, smell and feel of the books we grew up with
  • You can’t resell or give away your books once you’re finished with them
  • Library borrowing is awkward at best, impossible on the Kindle
  • You don’t really own your ebooks – they can be taken away remotely, as Amazon did with George Orwell’s 1984
  • You can take a paper book in the bath or the rain – worst case, you damage one book, not the whole device
  • Regular paperback books are generally cheaper than ebooks – this is true, and very silly

And I’m sure there are dozens of others that are equally valid, but for me they don’t outweigh one single huge upside of ebooks:

  • Space is expensive!

I’ve recently given boxes and boxes of books to charity, after I moved house and had nowhere to put my old books. At first I kept them in the perfect storage space – my parents’ house – but that’s not really a long-term solution, is it? A Kindle will store over 3000 books, and if you somehow fill it, you can delete some books from the device and re-download them in the future: Amazon retain your purchase list and let you retrieve the files whenever you want.

Of course there are other nice things about the Kindle: the 3G web access in 100 countries is a fantastic feature for people who like to travel while keeping in touch, text searching of books helps you find a quote you’re hunting for, and there’s the flexibility of being able to access 500,000 or more books on demand. But fundamentally, it’s the space.

So if someone tells you the Kindle is too expensive and ebooks cost too much, ask them how much it would cost to build a library extension to their house so they can keep 3000 books in it 🙂

OpenStack – The future of private and public clouds?

This week, Rackspace and NASA (a cloud computing pioneer) announced a major contribution of source code to the open-source community with the launch of OpenStack – a project to develop the software needed to deploy and operate a fully operational cloud computing solution.

Combining work from Rackspace, who run a large public cloud system, and NASA who were among the first to develop private cloud systems, the new OpenStack system currently consists of “OpenStack Object Store”, a cloud-scale storage solution based on Rackspace Cloud Storage, and the newly developed “OpenStack Compute”, the basis for an Amazon EC2 competitor providing computing infrastructure on demand.

So what do Rackspace get out of this? Well, if things go to plan for Rackspace, in 5 years you’ll be running your applications on an OpenStack cloud managed by Rackspace – either in their own data centre as part of the Rackspace public cloud, as a dedicated set of machines in a private cloud they host for you, or even as a hybrid cloud with baseline cloud computing capacity in an enterprise’s own data centre and extra capacity available on demand in the Rackspace cloud. Of course, you could choose to work with someone else on OpenStack, but Rackspace will be hoping you stick with a company that obviously knows the code well and has been running it successfully for several years. There’s a video interview with one of the Rackspace Cloud founders on Redmonk where this subject comes up.

While these contributions from Rackspace and NASA are significant pieces of the cloud puzzle, the real work of OpenStack is still to come – 25 partner organisations have signed up, and work is now under way on completing the development and testing of the systems, and on adding functionality.

The possibilities for “OpenStack Compute” in particular are significant. With cooperation from across the industry, we could see the rapid inclusion of technologies like CloudAudit, which helps companies verify the security capabilities of a cloud computing platform, and “Open vSwitch”, a network switch that operates inside the cloud, providing the management and security capabilities of a physical network switch but without many of the limitations that go with physical cabling.

Assuming OpenStack develops positively, it’s likely that there will be rapid additions of new systems like an “OpenStack Message Queue”, and “OpenStack Block Storage”, though much of the development will depend on the willingness of contributors to either hand over code that is currently closed source, or to start again with a clean slate and re-develop solutions based on the lessons they’ve previously learnt.

The other possibility is that Amazon continues to take the majority share of the cloud computing market, continues to grow its economies of scale and overall cost leadership, adds functionality to match any new additions to OpenStack (currently Amazon S3 and EC2 more than match OpenStack’s capabilities), and people learn to live with the limitations of a public cloud security model.

Either way, the future of computing is significantly different from the way it operates today for most organisations.

Converting longwords.org from Postgres to MongoDB

I couldn’t find the time to get down to the No:sql(eu) conference in London this week, but I did want to learn more about NoSQL databases, so I decided the best way to learn would be to move one of my existing websites from a traditional SQL database to a NoSQL one.

I picked MongoDB almost at random, and my longwords.org website seemed the best candidate to switch: I wrote it a couple of years ago and haven’t looked at it since, so it would be good to get to know it again. The site gets about 2500 unique visitors a month, so the traffic isn’t insignificant.

I split the migration process into 3 phases:

  1. Converting the data from Postgres to MongoDB
  2. Converting SQL queries into MongoDB Javascript
  3. Implementing MongoDB Javascript in MongoDB PHP statements

Converting the data from Postgres to MongoDB

Data conversion turned out to be the easiest part of the process.

Exporting data from Postgres is very easy, and longwords is based around a single table, so this command, run in psql, dumped the data out into a CSV file:

\copy words to '/tmp/outputfile.csv' delimiters ',' with null as '0'

The next step was to import that data, which again took just one command:

/usr/local/mongodb/bin/mongoimport -d wordsdb -c words -f word,number,votes,score --file /tmp/outputfile.csv --type csv

Easy!
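If you want to sanity-check the import, a couple of commands from the mongo console will do it – a quick sketch (the number: 1 lookup is just an illustrative spot-check):

```
// Count the imported documents - should match the number of CSV rows.
db.words.count();

// Fetch a single document to confirm the field names came through correctly.
db.words.findOne({ number: 1 });
```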

Creating the same indexes as I had in Postgres was simple too; from the mongo console, I ran these commands in the new wordsdb database:

db.words.ensureIndex({score:-1});

db.words.ensureIndex({number:1}, {unique: true});

Converting SQL queries into MongoDB Javascript

This part took the longest, simply because I didn’t know the MongoDB syntax beforehand. The longwords site used three main select statements: one to pull out the next word to display, one to return the top 10 list of most popular words, and one to return the count of total votes.

MongoDB query to return single word:

db.words.find({number:1000});

MongoDB query to return top 10 words:

db.words.find().sort( { score : -1 }).limit(10);

MongoDB query to return sum of votes:

db.words.group( { reduce: function(obj,prev) { prev.votes += obj.votes; }, initial: { votes: 0 } } );

Notice the last query makes use of the group function in MongoDB, which is a simplified interface to the MapReduce functionality, and can be used to produce the same result as the “sum(value)” function in SQL.
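For reference, the same total can be computed with mapReduce directly, since group is just a convenience wrapper around it – a sketch only, and the vote_totals output collection name is my own invention:

```
// Every document emits its votes under a single shared key,
// and the reduce step adds the emitted values together.
db.words.mapReduce(
    function() { emit("total", this.votes); },
    function(key, values) {
        var sum = 0;
        for (var i = 0; i < values.length; i++) { sum += values[i]; }
        return sum;
    },
    { out: "vote_totals" }
);
```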

There were also two update statements for when people vote yes or no to a word. These queries needed to increment the number of votes the word has received, and to increment or decrement the word’s score, depending on whether the person clicked yes or no.

MongoDB query to increase score and increase votes values:

db.words.update( { word: "ascosporous" }, { $inc: { score: 1, votes: 1 } } );

MongoDB query to decrease score and increase votes values:

db.words.update( { word: "ascosporous" }, { $inc: { score: -1, votes: 1 } } );

With these statements in place, I was ready to implement them in the MongoDB PHP module.

Implementing MongoDB Javascript in MongoDB PHP statements

This took a little bit of time, but really the format changes are pretty obvious once you get used to them.

MongoDB PHP code to return single word:

$totalwords = $words->count();
$randomlength = rand(1, $totalwords);
$result = $words->find(array('number' => $randomlength));

MongoDB PHP code to return top 10 words:

$toprated = $words->find()->sort(array("score" => -1))->limit(10);
$count = 0;
while ($count < 10)
{
    $row = $toprated->getNext();
    $rowword = ucfirst($row['word']);
    echo $rowword;
    $count++;
}

MongoDB PHP code to return sum of votes:

$keys = array();
$reduce = "function(obj,prev) { prev.votes += obj.votes; }";
$initial = array("votes" => 0);
$g = $words->group($keys, $initial, $reduce);
$votecount = $g['retval'][0]['votes'];

MongoDB PHP code to increase score and increase votes values:

$words->update(array("word" => $longword), array('$inc' => array("score" => 1, "votes" => 1)));

MongoDB PHP code to decrease score and increase votes values:

$words->update(array("word" => $longword), array('$inc' => array("score" => -1, "votes" => 1)));

Results

There was really only one issue with the conversion, and it’s one that I still haven’t overcome: the query to return the sum of votes causes significant CPU usage, unlike the original SQL statement, which was a simple “select sum(votes) from words” query.

Until I come up with a solution, I’ve disabled that small section of the longwords page, but hopefully I’ll find a suitable replacement statement. If you’ve got any suggestions, I’d love to hear them!
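One candidate I may try (a sketch only, using a hypothetical totals collection) is to keep a running total in a single document, bumped atomically with $inc at the same time as each vote, so reading the total becomes a cheap single-document lookup instead of a scan of the whole collection:

```
// On every vote, bump the running total.
// The third argument enables upsert, creating the document on the first vote.
db.totals.update({ _id: "votes" }, { $inc: { count: 1 } }, true);

// Reading the total is then a single document fetch.
db.totals.findOne({ _id: "votes" }).count;
```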

Other than that query, CPU and memory usage is minimal, as is disk I/O – there’s certainly nothing which would make me think that MongoDB isn’t a practical replacement for MySQL or Postgres for many websites.