Category Archives: unix

Converting longwords.org from Postgres to MongoDB

I couldn’t find the time to get down to the No:sql(eu) Conference in London this week, but I did want to learn more about NoSQL databases, so I decided the best way to learn would be to move one of my existing websites from a traditional SQL database, to a nosql one.

I picked MongoDB almost at random, and my longwords.org website seemed to be the best option to switch, since I wrote it a couple of years ago and haven’t looked at it since, so it would be good to get to know it again. The site gets about 2500 unique visitors a month, so the traffic isn’t insignificant.

I split the migration process into 3 phases:

  1. Converting the data from Postgres to MongoDB
  2. Converting SQL queries into MongoDB Javascript
  3. Implementing MongoDB Javascript in MongoDB PHP statements

Converting the data from Postgres to MongoDB

Data conversion turned out to be the easiest part of the process.

Exporting data in Postgres is very easy, and longwords is based around 1 single table, so this command ran in psql dumped the data out into a CSV file:

\copy words to ‘/tmp/outputfile.csv’ delimiters ‘,’ with null as ‘0’

The next step was to import that data, which again took just 1 command

/usr/local/mongodb/bin/mongoimport -d wordsdb -c words -f word,number,votes,score —file /tmp/longwordslist.txt —type csv

Easy!

Creating the same indexes as had been in Postgres was simple too, from the mongo console I ran these commands in the new wordsdb database.

db.words.ensureIndex({score:-1});

db.words.ensureIndex({number:1}, {unique: true});

Converting SQL queries into MongoDB Javascript

This part took the longest, simply because I didn’t know the MongoDB syntax beforehand. The longwords site used 3 main select statements, one which pulled out the next word to display, one to return the top 10 list of most popular words, and one to return the count of total votes.

MongoDB query to return single word:

db.words.find({number:1000});

MongoDB query to return top 10 words:

db.words.find().sort( { score : -1 }).limit(10);

MongoDB query to return sum of votes:

db.words.group( { reduce: function(obj,prev) { prev.votes += obj.votes; }, initial: { votes: 0 } } );

Notice the last query makes use of the group function in MongoDB, which is a simplified interface to the MapReduce functionality, and can be used to produce the same result as the “sum(value)” function in SQL.

There were also 2 update statements for when people vote yes or no to a word. These queries needed to increment the number of votes that word has received, and to increment or decrement the score of that word, depending on if the person clicked yes or no.

MongoDB query to increase score and increase votes values:

db.words.update( { word:”ascosporous” }, { $inc: { score : 1, votes : 1 } } );

MongoDB query to decrease score and increase votes values:

db.words.update( { word:”ascosporous” }, { $inc: { score : -1, votes : 1 } } );

With these statements in place, I was ready to implement them in the MongoDB PHP module.

Implementing MongoDB Javascript in MongoDB PHP statements

This took a little bit of time, but really the format changes are pretty obvious once you get used to it.

MongoDB PHP code to return single word:

$totalwords=$words->count();

$randomlength=rand(1,$totalwords);

$result=$words->find(array(‘number’ => $randomlength));

MongoDB PHP code to return top 10 words:

$toprated = $words->find()->sort(array(“score” => -1))->limit(10);

$count=0;

while ($count<10)

{

$row = $toprated->getNext();

$rowword = ucfirst($row[word]);

echo $rowword;

$count++;

}

MongoDB PHP code to return sum of votes:

$keys = array();

$reduce = “function(obj,prev) { prev.votes += obj.votes; }”;

$initial = array(“votes” => 0);

$g = $words->group($keys,$initial,$reduce);

$votecount = $g[retval][0][votes];

MongoDB PHP code to increase score and increase votes values:

$words->update(array(“word” => $longword), array(‘$inc’ => array(“score” => 1,”votes” => 1)));

$words->update(array(“word” => $longword), array(‘$inc’ => array(“score” => -1,”votes” => 1)));

Results

There was really only 1 issue with the conversion, and it’s one that I still haven’t overcome – the query to return the sum of votes causes significant CPU usage, unlike the original SQL statement which was a simple “select sum(votes) from words” query.

Until I come up with a solution, I’ve disabled that small section of the longwords page, but hopefully I’ll find a suitable replacement statement. If you’ve got any suggestions, I’d love to hear them!

Other than that query, CPU and memory usage is minimal, as is disk I/O – there’s certainly nothing which would make me think that MongoDB isn’t a practical replacement for MySQL or Postgres for many websites.

MongoDB init.d script for Linux

If you’re thinking of trying MongoDB, one thing you’ll notice that doesn’t come with the binary downloads is an mongodb init.d script for Linux distributions, so here’s mine. It’s based on having mongodb installed in /usr/local/mongodb with the binaries in the /bin directory under there, but if you’ve got the binaries somewhere else you just need to change one line at the top of the file.

All you need to do is download the file to your Linux server, rename it from mongodb.txt to mongodb and copy it to /etc/init.d, then fix the permissions. The following commands should do it

cp mongodb.txt /etc/init.d/mongodb

chmod 755 /etc/init.d/mongodb

chown root:root /etc/init.d mongodb

Finally, you just need to run the following 2 commands to set mongodb to start on boot:

chkconfig —add mongodb

chkconfig mongodb on

The script works on Centos, but should work fine on Ubuntu, RHEL, etc with minimal changes.

Arista Network-Blog: Arista 7500 (via abnerg)

I’m glad the 7500 is finally out. It should light a fire under anyone who claims to care about ethernet performance and value. The first time I saw the specs on a roadmap, I was pretty well blown away. It looks like what they brought to market is as good. Look forward to seeing some benchmarking and test results.

(via irq)