Build a Real Time Activity Stream on Cloud Foundry with Node.js, Redis and MongoDB 2.0 - Part III

In Part II of this series, we covered the architecture needed for persisting the Activity Streams to MongoDB and fanning them out in real time to all the clients using Redis PubSub.

Since then, some exciting new Node.js features for Cloud Foundry were launched. In addition, the MongoDB version on Cloud Foundry has been upgraded to 2.0.

In this blog post we will cover how to:

Use Mongoose-Auth to store basic user information, including information from Facebook, Twitter, and GitHub, and how we made this module with native dependencies work on Cloud Foundry
Use Mongo GridFS and ImageMagick to store user uploaded photos and profile pictures
Perform powerful stream filtering, thanks to new capabilities exposed in MongoDB 2.0
Update the UX of the app to become a real-time stream client using Bootstrap, Backbone.js and Jade

Offering SSO and Persisting Users

The requirement for this boilerplate Activity Streams App was to allow users to log in with Facebook, Twitter, or GitHub, and to persist this user data in the database.

I am sure many of you are familiar with how to store user information in a database, perhaps even in MongoDB. What some of you may not have tried is storing third-party user information, such as the profile data we obtain when users log in with Facebook, Twitter or GitHub. In a relational database, you would probably store this information across multiple records in multiple tables (e.g., users, services, accounts, or auth). With MongoDB, however, you can use embedded documents and store all this information in a single document, which reduces the complexity of the operation. I found that this was very easy to do using @bnoguchi’s mongoose-auth, which decorates the Mongoose User Schema with the fields for the third-party services.
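
To make the single-document idea concrete, here is a minimal sketch of what such a user document might look like. The sub-document names (fb, github, twit) are the ones used by the pre-save snippet later in this post; the sample values and the linkedServices helper are made up for illustration and are not part of mongoose-auth.

```javascript
// A single user document with one embedded sub-document per login service.
// All values are sample data; only the field names come from this post.
var user = {
  displayName: 'GitHub: Jane Doe',
  image: { url: 'http://1.gravatar.com/avatar/abc123?s=48' },
  github: { id: 42, name: 'Jane Doe', gravatarId: 'abc123' },
  twit: { id: 7, name: 'Jane Doe', profileImageUrl: 'http://example.com/jane.png' }
};

// Hypothetical helper: list the services that have an embedded sub-document.
function linkedServices(doc) {
  return ['fb', 'github', 'twit'].filter(function (service) {
    return Boolean(doc[service] && doc[service].id);
  });
}

console.log(linkedServices(user)); // [ 'github', 'twit' ]
```

Because everything lives in one document, adding another login service only means adding another embedded sub-document; no join table is needed.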

My previous version of the app was using another popular module from Brian called everyauth which handled SSO very well, but it did not persist the user info. It was fairly straightforward to upgrade from everyauth to mongoose-auth.

First, I updated a helper module I created called activity-streams-mongoose to offer a User Schema and made it possible to extend all schemas. Then I loaded mongoose-auth and decorated that schema with the specifics needed. You can see the exact code changes in this diff. The key part in the upgrade was to normalize the user data as it was saved. I did this by leveraging the pre-save callback MongooseJS offers.

Here is a snippet of the code:

var streamLib = require('activity-streams-mongoose')({
  mongoUrl: app.siteConf.mongoUrl,
  redis: app.siteConf.redisOptions,
  defaultActor: defaultAvatar
});

var authentication = require('./authentication.js')(streamLib, app.siteConf);

// Moved normalization to only be done on pre save
streamLib.types.UserSchema.pre('save', function (next) {
  var user = this;
  var svcUrl = null;
  if (user.fb && user.fb.id) {
    user.displayName = "FB: " + user.fb.name.full;
    asmsDB.ActivityObject.findOne().where('url', facebookHash.url).exec(function(err, doc){
      // Hand errors to Mongoose rather than throwing inside the callback
      if (err) return next(err);
      user.author = doc._id;
      // Need to fetch the users image...
      https.get({
        'host': 'graph.facebook.com'
        , 'path': '/me/picture?access_token='+ user.fb.accessToken
      }, function(response) {
        user.image = {url: response.headers.location};
        next();
      }).on('error', function(e) {
        next();
      });
    })
  } else  {
    if (user.github && user.github.id) {
      user.displayName = "GitHub: " + user.github.name;
      var avatar = 'http://1.gravatar.com/avatar/'+ user.github.gravatarId + '?s=48'
      user.image = {url: avatar};
      svcUrl = githubHash.url;
    } else if (user.twit && user.twit.id) {
      user.displayName = "Twitter: " + user.twit.name;
      user.image = {url: user.twit.profileImageUrl};
      svcUrl = twitterHash.url;
    }

    if(!user.actor) {
      asmsDB.ActivityObject.findOne().where('url', svcUrl).exec(function(err, doc){
        // Pass lookup failures on to Mongoose instead of ignoring them
        if (err) return next(err);
        user.author = doc;
        next();
      });
    } else {
      next();
    }
  }
  });

var asmsDB = new streamLib.DB(streamLib.db, streamLib.types);
streamLib.asmsDB = asmsDB;

MongoDB 2.0 and Cloud Foundry

For those of you not familiar with MongoDB 2.0, one neat feature is its support for multi-location documents. I was able to add a location property to the core Activity Object in the activity-streams-mongoose module, which can be used with the Activities, Activity Objects and User collections to perform geo queries.

var LocationHash = {
  displayName: {type: String},
  position: {
    latitude: Number,
    longitude: Number
  }
};

var ActivityObjectHash = {
  id: {type: String},
  image: {type: MediaLinkHash, default: null},
  icon: {type: MediaLinkHash, default: null},
  displayName: {type: String},
  summary: {type: String},
  content: {type: String},
  url: {type:String},
  published: {type: Date, default: null},
  objectType: {type: String},
  updated: {type: Date, default: null},
  location: LocationHash,
  fullImage : {type: MediaLinkHash, default: null},
  thumbnail : {type: MediaLinkHash, default: null},
  author : {type: ObjectId, ref: "activityObject"},
  attachments : [{type: ObjectId, ref: 'activityObject'}],
  upstreamDuplicates : [{type: String, default: null}],
  downstreamDuplicates : [{type: String, default: null}]
};

Another interesting geo feature in MongoDB 2.0 is polygonal search. This means that you can search whether a given object is in a specified area by providing the area points. For example, this can be helpful when you want to see if certain objects, like houses, are in a certain zip code.
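
As a rough sketch of what such a polygon search could look like, the helper below builds the query document. It assumes the `$within` / `$polygon` operators of the MongoDB 2.0 era (later releases renamed `$within` to `$geoWithin`), a 2d index on the queried field, and a field path of `location.position` with points given in the same latitude, longitude order as the LocationHash above. The buildPolygonQuery helper and the coordinates are invented for illustration.

```javascript
// Build a MongoDB 2.0-style polygon query for a field with a 2d geo index.
// `points` is an array of two-element pairs describing the corners of the area.
function buildPolygonQuery(field, points) {
  if (!Array.isArray(points) || points.length < 3) {
    throw new Error('A polygon needs at least three points');
  }
  var query = {};
  query[field] = { $within: { $polygon: points } };
  return query;
}

// Sample area; the coordinates are made up.
var area = [[37.70, -122.52], [37.70, -122.35], [37.83, -122.35], [37.83, -122.52]];
var polygonQuery = buildPolygonQuery('location.position', area);

console.log(JSON.stringify(polygonQuery));
```

The resulting object could then be handed to a regular find call on the Activity or ActivityObject model.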

Working with Node.js Modules with Native Dependencies

Mongoose-Auth requires the bcrypt module, which has a native dependency: it gets compiled locally when you run a Node Package Manager (npm) install, and the binary is placed in the node_modules directory. If you are working on a Mac or Windows machine and deploying to the cloud, you can run into issues by including your node_modules folder. Luckily, Cloud Foundry now supports excluding the node_modules folder and having Cloud Foundry fetch and build the npm modules server-side.

$ npm shrinkwrap
$ echo '{"ignoreNodeModules": true}' > cloudfoundry.json
$ vmc update

For more info you can read this blog post from Maria and/or watch this helpful video from Raja.

My recommendation is that when you start a new Node.js app, you add the cloudfoundry.json file with ignoreNodeModules set to true so that all the native dependencies are built directly on Cloud Foundry. Also, don’t forget to run npm shrinkwrap again whenever you change package.json.
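
One way to catch a forgotten shrinkwrap is a small check before deploying. The sketch below compares the top-level dependencies in package.json with those recorded in npm-shrinkwrap.json. The missingFromShrinkwrap helper is hypothetical, the version numbers are sample data, and the check only compares names, not versions.

```javascript
// Return the names listed under "dependencies" in package.json that have
// no entry under "dependencies" in npm-shrinkwrap.json.
function missingFromShrinkwrap(pkg, shrinkwrap) {
  var wanted = Object.keys(pkg.dependencies || {});
  var locked = shrinkwrap.dependencies || {};
  return wanted.filter(function (name) {
    return !Object.prototype.hasOwnProperty.call(locked, name);
  });
}

// Sample data standing in for the two parsed JSON files.
var pkg = {
  dependencies: { express: '2.5.x', 'mongoose-auth': '0.0.x', imagemagick: '0.1.x' }
};
var shrinkwrap = {
  dependencies: { express: { version: '2.5.9' }, 'mongoose-auth': { version: '0.0.12' } }
};

console.log(missingFromShrinkwrap(pkg, shrinkwrap)); // [ 'imagemagick' ]
```

In a real project the two objects would come from reading and parsing package.json and npm-shrinkwrap.json; a non-empty result is a hint to run npm shrinkwrap again.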

User Uploaded Photos with ImageMagick and Mongo GridFS

Photos are among the most engaging objects to show in a web app like this Activity Stream boilerplate app. Apps like Instagram and Pinterest have taken photo sharing to a whole new level and have completely redesigned the UX of photo feeds. We wanted to help developers build Activity Stream apps with rich photo sharing, and thus needed a library to help us manipulate images and a place to store all these images. Since we were already using MongoDB, I decided to leverage Mongo GridFS to store the images.

I had previously worked with storing photos in GridFS, but that was from Ruby. Finding the right tools in Node.js was a little more challenging. I found a lot of npm modules that seemed to handle it for me, but they were either unfinished or loaded several additional components that were incompatible. I really wanted to keep it simple, so I ended up following the documentation of the official Node.js MongoDB driver and creating a few routes to handle creating and viewing photos.

Here is how I ingested the photos into Mongo GridFS:

var im = require('imagemagick');
var Guid = require('guid');
var siteConf = require('./lib/getConfig');
var lib = require('./lib/asms-client.js')(app, cf).streamLib;

function ingestPhoto(req, res, next){
  if (req.files.image) {
    im.identify(req.files.image.path, function(err, features){
      if (features && features.width) {
        var guid = Guid.create();
        // Concatenating name to guid guarantees that we always have
        // unique file names
        var fileId = guid + '/' + req.files.image.name;
        // The GridStore class is the equivalent of a File class but has the
        // added benefit of allowing you to store metadata
        var gs = lib.GridStore(lib.realMongoDB, fileId, "w", {
          content_type : req.files.image.type,
          // metadata is optional
          metadata : {
            author: req.session.user._id,
            public : false,
            filename: req.files.image.name,
            path: req.files.image.path,
            width: features.width,
            height: features.height,
            format: features.format,
            size_kb: req.files.image.size / 1024 | 0
          }
        });
         // This command copies the file from the file system(temp dir)
         // to GridFS.
         // GridFS supports any file size by breaking it into chunks
         // behind the scenes
        gs.writeFile(req.files.image.path, function(err, doc){
          if (err) {
            next(err);
          } else {
            if (! req.photosUploaded) {
              req.photosUploaded = {};
            }
            // I have another express route to serve the photos by fileId
            var url = siteConf.uri + "/photos/" + fileId;
            // Add the results of the upload to the chain
            req.photosUploaded['original'] = {url : url, metadata: gs.metadata};
            req.nextSizeIndex = 0;
            next();
          }
        });
      } else {
        // Report the failure to Express instead of throwing inside the callback
        next(err || new Error("Cannot get width for photo"));
      }
    });
  } else {
    next(new Error("Could not find the file"));
  }
};

The above snippet shows how I used ImageMagick to get the photo dimensions. ImageMagick is an amazing open source software suite for manipulating images and there is an easy to use ImageMagick node module to expose its functionality.

An open source project backed by years of continual development, ImageMagick supports about 100 image formats and can perform impressive operations such as creating images from scratch; changing colors; stretching, rotating, and overlaying images; and overlaying text on images.

ImageMagick.org

And this snippet shows how to produce a new image of smaller size:

var im = require('imagemagick');
var Guid = require('guid');

function reducePhoto(req, res, next){
    var photoIngested = req.photosUploaded['original'];
    if (photoIngested) {
        var sizeName = sizes[req.nextSizeIndex].name;
        var destPath = photoIngested.metadata.path + '-' + sizeName ;
        var nameParts = photoIngested.metadata.filename.split('.');
        var newName = nameParts[0] + '-' + sizeName + '.' + nameParts[1];
        var width = sizes[req.nextSizeIndex].width;

        im.resize({
          srcPath: photoIngested.metadata.path,
          dstPath: destPath,
          width:   width
        }, function(err, stdout, stderr){
          if (err) {
              next(err);
          } else {
            console.log("The photo was resized to " + width + "px wide");
            var guid = Guid.create();
            var fileId = guid + '/' + newName;
            var ratio = photoIngested.metadata.width / width;
            var height = photoIngested.metadata.height / ratio;
            var gs = asmsClient.streamLib.GridStore(asmsClient.streamLib.realMongoDB, fileId, "w", {
                  content_type : req.files.image.type,
                  metadata : {
                      author: req.session.user._id,
                      public : false,
                      filename: newName,
                      width: width,
                      height: height,
                      path: destPath
                  }
              });
              gs.writeFile(destPath, function(err, doc){
                  if (err) {
                    next(err);
                  } else {
                      var url = siteConf.uri + "/photos/" + fileId;
                      req.photosUploaded[sizeName] = {url : url, metadata: gs.metadata};
                      req.nextSizeIndex = req.nextSizeIndex + 1;
                      next();
                  }
              });
          }
        });
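    } else {
        // Without this branch the request would hang when there is no original photo
        next(new Error("No ingested photo to resize"));
    }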
};
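
Two details in the resize step are easy to get wrong: splitting the file name on '.' breaks for names with more than one dot (or none at all), and dividing by the aspect ratio can yield a fractional height. Here is a minimal sketch of two helpers that sidestep both, using only Node's path module. The names sizedName and scaledHeight are hypothetical and not part of the app.

```javascript
var path = require('path');

// Insert the size name before the extension: "a.b.jpg" + "thumb" -> "a.b-thumb.jpg".
function sizedName(filename, sizeName) {
  var ext = path.extname(filename);         // ".jpg", or "" when there is none
  var base = path.basename(filename, ext);  // file name without the extension
  return base + '-' + sizeName + ext;
}

// Height that preserves the aspect ratio for a target width, rounded to whole pixels.
function scaledHeight(origWidth, origHeight, targetWidth) {
  return Math.round(origHeight * targetWidth / origWidth);
}

console.log(sizedName('holiday.2012.jpg', 'thumb')); // holiday.2012-thumb.jpg
console.log(scaledHeight(1600, 1200, 200));          // 150
```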

The only gotcha on Cloud Foundry was that it did not set the TMP environment variable, which the formidable module uses to locate the temp directory where files are first uploaded. Once I set it using vmc's env-add command, the problem was solved.

bash-3.2$ vmc files asms app/tmp

   36272c476f10ecbf0e3a99481a8d365b         50.6K

All apps on Cloud Foundry have permission to write files to their own directory or a subdirectory of it. If you are working with express/formidable, set the environment variable TMP to a subdirectory so that it can handle form uploads.
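
A minimal sketch of resolving that directory at startup, assuming the app reads TMP itself and falls back to a tmp subdirectory of the app directory. The fallback and the resolveUploadDir helper are assumptions for illustration, not a description of what formidable does internally.

```javascript
var path = require('path');

// Resolve the upload directory from TMP, relative to the app directory.
// Falls back to "<appDir>/tmp" when TMP is not set (an assumed default).
function resolveUploadDir(env, appDir) {
  return path.resolve(appDir, env.TMP || 'tmp');
}

// In the app this would be resolveUploadDir(process.env, process.cwd()).
console.log(resolveUploadDir({ TMP: 'uploads' }, '/home/app'));
console.log(resolveUploadDir({}, '/home/app'));
```

The directory still has to exist before the first upload arrives, so creating it during startup is a sensible next step.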

More Powerful Queries with MongoDB

Part of the beauty of the ActivityStrea.ms format is that it provides a lot of metadata about each activity which can then be searched, aggregated and pivoted to draw interesting conclusions about certain topics and trends.

Examples of these fields are: Hashtags or Topics, Verb, Object Types, Actor Types and Location.

The first step to allowing users to analyze the stream data is providing them with a map of their universe. This means allowing them to see all the possible values for each field. For example, if we are talking about hashtags, we would show our users that the population has used ten hashtags so far. We would then reveal how usage is distributed across everyone. Finally, we would give our users the ability to drill in by segmenting via actor type or location, for example. This could yield interesting results showing where certain topics are most popular.

var getDistinct = function (req, res, next, term, init){
  var key = 'used.' + term;
  req[key] = init ? init : [];
  var query = {streams: req.session.desiredStream};
  asmsDB.Activity.distinct(term, query, function(err, docs) {
    if (!err && docs) {
      _.each(docs, function(result){
        req[key].push(result);
      });
      next();
    } else {
      next(new Error('Failed to fetch distinct ' + term));
    }
  });
}

//..

function getDistinctVerbs(req, res, next){
  getDistinct(req, res, next, 'verb');
};

function getDistinctActors(req, res, next){
  getDistinct(req, res, next, 'actor');
};

function getDistinctObjects(req, res, next){
  getDistinct(req, res, next, 'object', ['none']);
};

function getDistinctObjectTypes(req, res, next){
  getDistinct(req, res, next, 'object.object.type', ['none']);
};

function getDistinctActorObjectTypes(req, res, next){
  getDistinct(req, res, next, 'actor.object.type', ['none']);
};

//...

app.get('/streams/:streamName', loadUser, getDistinctStreams, getDistinctVerbs, getDistinctObjects, getDistinctActors,
  getDistinctObjectTypes, getDistinctActorObjectTypes, getMetaData, function(req, res) {

    asmsClient.asmsDB.Activity.getStream(req.params.streamName, 20, function (err, docs) {
    var activities = [];
    if (!err && docs) {
      activities = docs;
    }
    req.streams[req.params.streamName].items = activities;
    var data = {
      currentUser: req.user,
      streams : req.streams,
      desiredStream : req.session.desiredStream,
      actorTypes: req.actorTypes,
      objectTypes : req.objectTypes,
      verbs: req.verbs,
      usedVerbs: req['used.verb'],
      usedObjects: req['used.object'],
      // Keys must match the 'used.' + term keys written by getDistinct
      usedObjectTypes: req['used.object.object.type'],
      usedActorObjectTypes: req['used.actor.object.type'],
      usedActors: req['used.actor']
    };
    if (req.is('json')) {
      res.json(data);

    } else {
       res.render('index', data);
    }
  });

});
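
The distinct queries above only tell us which values exist. To show the distribution mentioned earlier, we also need a count per value. Here is a minimal in-memory sketch, assuming the activities have already been loaded into an array (for large collections a server-side group or map-reduce would be a better fit). The countByField helper is hypothetical.

```javascript
// Count how often each value of `field` occurs in a list of activities.
function countByField(activities, field) {
  return activities.reduce(function (counts, activity) {
    var value = activity[field];
    if (value !== undefined && value !== null) {
      var seen = Object.prototype.hasOwnProperty.call(counts, value);
      counts[value] = seen ? counts[value] + 1 : 1;
    }
    return counts;
  }, {});
}

// Sample activities; only the verb field matters here.
var sample = [{ verb: 'post' }, { verb: 'like' }, { verb: 'post' }, { verb: 'share' }, {}];

console.log(countByField(sample, 'verb')); // { post: 2, like: 1, share: 1 }
```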

A Robust Activity Stream UX

The initial node-express-boilerplate app had some basic jQuery used to show plain text messages and users’ photos. In the new app, we have much richer messages and the ability to post and filter them. For this reason, we decided to use some of the great client-side open source tools available today.

After some consideration, we ended up using these three tools:

Backbone.js: A lightweight client-side MVC framework
Bootstrap: A set of CSS, HTML and JavaScript components which help developers produce great looking apps, without needing to start from scratch
Jade: A templating language, rendered on the client with the help of ClientJade

Templating, Markup and CSS

If you are a web developer, you probably know that there are many choices in tools to render HTML dynamically. A good number of web developers prefer to use templating engines to render HTML because these help you produce more readable code. Using declarative programming, you can interpolate variables and directives in the HTML. The most popular templating engine for Node.js is Embedded JavaScript (EJS), which resembles ERB in Ruby. This is what the node-express-boilerplate project included. When I started working with Node.js, I found many more choices that were not present in the Ruby world such as: Mustache, Handlebars, Dust and Jade. In fact, LinkedIn wrote an excellent blog post discussing the many alternative choices for templating engines.

I ended up selecting Jade because I already liked HAML, which is very similar to Jade in its terseness. Both templating languages use indentation to understand the hierarchy of elements. Jade is even terser than HAML because it removes the need to put % in front of the HTML tags.

Another cool thing about Jade is that it already had support for server-side and client-side rendering via ClientJade. Here is how I broke out the views allowing easy extension of the object types.

I then compiled the Jade views into js for faster client-side rendering with ClientJade.

clientjade views/*.* > public/js/templates.js

Once this was done, I simply included templates.js in the list of files to be minified and used it like this from Backbone:

var ActivityCreateView = Backbone.View.extend({
    el: '#new_activity',
    initialize: function(){
        _.bindAll(this, 'newAct', 'render', 'changeType', 'includeLocation', 'sendMessage');

        this.trimForServer = App.helper.trimForServer;

        var streamName = this.$el.find('#streamName').val();
        var verb = this.trimForServer(this.$el.find('#verb-show'));
        var objectType = this.trimForServer(this.$el.find('#object-show'));

        this.newAct(streamName, verb, objectType);
        this.render();
    },
    events: {
        "click .type-select" : "changeType",
        "click #includeLocation" : "includeLocation",
        "click #send-message" : "sendMessage"
    },
    newAct : function(streamName, verb, objectType) {
        this.streamName = streamName;
        this.model = new Activity({
            object: {content: '', objectType: objectType, title: '', url: ''},
            verb: verb,
            streams: [streamName]
        });
    },
    render: function(){
      var actData = this.model.toJSON();
      this.$el.find("#specific-activity-input").html(jade.templates[actData.object.objectType]());

      return this; // for chainable calls, like .render().el
    },
    changeType : function(event) {
        console.log(event);
        var itemName = $(event.target).data("type-show");
        if (itemName) {
            $("#" + itemName)[0].innerHTML = event.target.text + "  ";
            var val = this.trimForServer(event.target.text);
            if (itemName == "verb-show") {
                this.model.set('verb', val);
            } else {
                var obj = this.model.get('object');
                obj.objectType = val;
                this.model.set('object', obj);
            }
        }
        this.render();
    },
    includeLocation : function(event) {
        if (navigator.geolocation) {
            navigator.geolocation.getCurrentPosition(App.helper.getLocation);
        } else {
            alert("Geo Location is not supported on your device");
        }
    },
    sendMessage : function() {
        console.log("In send message");

        var obj = this.model.get('object');
        obj.content = $("#msg").val();
        obj.url = $('#url').val();
        obj.title = $('#title').val();
        obj.objectType = this.trimForServer($('#object-show'));
        this.model.set('object', obj);

        var streamName = $('#streamName').val();
        this.model.set('streams', [streamName]);

        var verb = this.trimForServer($('#verb-show'));
        this.model.set('verb', verb);

        if (this.model.isValid()) {
            if (this.model.save()) {
                this.newAct(streamName, verb, obj.objectType);
                this.render();
            }
        }

    }

});

The original node-express-boilerplate app was using 960gs and jQuery. However, this activity streams boilerplate app is a bit more complex, so I switched to Twitter’s Bootstrap as the first step. This gave me a nice way to build the nav bar, hero unit, modals, drop-downs and so on. It was also easy enough to move from one grid system to the other. For the moment the app is using the default grid system, but it can easily be made to use Bootstrap’s Fluid Layout and Responsive Design enhancements.

Manipulating Data on the Client with Backbone.js

Instead of having a large number of individual jQuery handlers on HTML elements, Backbone.js helps you break your UX apart into components called Views, which are closer to Controllers in server-side MVC frameworks. These Backbone views can take Backbone models and templates and render them, as well as listen for events on the elements that comprise the view. You can see in the snippet above that we have a Backbone view that works with a Backbone model for an Activity.

Backbone Models are pretty simple classes you create by detailing all the properties of the model and its validation rules. Here is the code for the Activity Backbone Model, which is also used to create the ActivityStreamView:

var Activity = Backbone.Model.extend({
    url : "/activities",
    // From activity-streams-mongoose/lib/activityMongoose.js
    // A function, so every instance gets fresh arrays/objects and a current timestamp
    defaults: function() {
        return {
            verb: 'post',
            object: null, //ActivityObject
            actor: null, //ActivityObject
            url: '',
            title: '',
            content: '',
            icon: null, // MediaLinkHash
            target: null, //ActivityObject
            published: Date.now(),
            updated: Date.now(),
            inReplyTo: null, //Activity
            provider: null, //ActivityObject
            generator: null, //ActivityObject
            streams: ['firehose'],
            likes: {},
            likes_count: 0,
            comments: [],
            comments_count: 0,
            userFriendlyDate: 'No idea when'
        };
    },
    validate: function(attrs) {
        if (!attrs.object) {
            return "Object is missing";
        }
        if (!attrs.object.title) {
            return "Title is missing";
        }
    }
});

You can then easily instantiate a model by passing it a plain JavaScript object. In the example below, I took the output from socket.io when a message comes in, converted it to an object and added it to the Backbone collection associated with the Stream View:

var streamView = new ActivityStreamView();
App.socketIoClient.on('message', function(json) {
  var doc = JSON.parse(json);
  if (doc) {
    streamView.collection.add(new Activity(doc));
  }
});
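
JSON.parse throws on malformed input, so a single bad message would raise an exception inside the handler above. Here is a minimal sketch of a guard that could sit in front of it. The parseActivity helper is hypothetical and the sample payload is made up.

```javascript
// Parse an incoming message; return null instead of throwing on bad input.
function parseActivity(json) {
  try {
    var doc = JSON.parse(json);
    // Only plain objects are useful as model attributes.
    return (doc && typeof doc === 'object' && !Array.isArray(doc)) ? doc : null;
  } catch (e) {
    return null;
  }
}

console.log(parseActivity('{"verb":"post","object":{"title":"Hello"}}').verb); // post
console.log(parseActivity('not json'));                                        // null
```

With such a guard, the handler would only add a new Activity to the collection when parseActivity returns an object.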

Working with Backbone.js was a lot of fun, but it definitely takes some time to convert all your logic to Backbone Views and Models. In this example, I only used Backbone.js for a subset of the app. Having Jade as my templating language together with Node.js allowed me to share code between server and client. If you have a trivial application, you may not need Backbone.js and may be able to keep it simple with express and server-side templates. In the case of this Activity Streams application, which syndicates in real time and offers the ability to react to any new item, it made sense to use Backbone.js. It also allowed me to provide the hooks for the next iteration of this app. After all, this is a boilerplate app.

Remember that it is very easy to push updates to your application on Cloud Foundry as you make progress, by running:

bash-3.2$ vmc update

Updating application 'asms'...
Uploading Application:
  Checking for available resources: OK
  Processing resources: OK
  Packing application: OK
  Uploading (380K): OK
Push Status: OK
Stopping Application 'asms': OK
Staging Application 'asms': OK
Starting Application 'asms': OK

Conclusion

As a contributor to the ActivityStrea.ms specification, I find it necessary (and fun) to get my hands dirty building the apps which use open standards to see where there are limitations and what technologies can make it easier. Working with MongoDB proved to be the right choice, giving me the ability to do complex queries, aggregation and full modeling of my objects which are needed for quickly painting the stream. I am really happy that MongoDB 2.0 is now running on Cloud Foundry because a lot of the Object Document Mappers like Mongoid in Ruby only support 2.x. This is a very exciting time to be a developer, as things are moving very fast and there is ample opportunity to make a difference via open source.

Here is the final architecture of the app at a very high level: