Developing a D3.js Edge

9. Introducing Crossfilter

Set up Crossfilter.js for our transit stop data sets
Define a dimension with Crossfilter.js
Create a filter method in Crossfilter.js

For our example, we are going to employ Crossfilter.js to make sense of our massive transit stop data sets. We have chosen Crossfilter.js for several reasons. First, it was developed in large part by Mike Bostock, the developer of D3.js, so the patterns used in Crossfilter are very similar to those in D3.js. This makes the code more cohesive and lessens the learning curve. In addition, Crossfilter provides filtering methods that will allow us to link our visualizations later on and support some exploratory analytics. Crossfilter is also designed to perform well with massive data sets like ours, which is very important in the browser environment to ensure a pleasant user experience.

Finally, Crossfilter allows us to define our own aggregate functions so that we may investigate metrics that are of interest to us. With minor modifications, we can re-filter or re-aggregate the data with Crossfilter, feed it into our reusable graphing modules, and have a completely new dimension to evaluate. This is a great feature to have when doing exploratory analytics.
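To make the idea of a custom aggregate concrete, here is a minimal sketch of three reducer functions in the (add, remove, initial) shape that Crossfilter's group.reduce() accepts. It assumes each record has a numeric DELAY field, as the cleaning function later in this chapter produces; the summary field names (count, totalDelay, meanDelay) and the sample records are made up for illustration. The reducers are applied by hand here so the sketch runs without Crossfilter loaded.

```javascript
// Start value for a group: nothing counted yet.
function reduceInitial() {
    return { count: 0, totalDelay: 0, meanDelay: 0 };
}

// Called when a record enters the group (for example, when it passes the filters).
function reduceAdd(p, v) {
    p.count += 1;
    p.totalDelay += v.DELAY;
    p.meanDelay = p.totalDelay / p.count;
    return p;
}

// Called when a record leaves the group (for example, when a filter excludes it).
function reduceRemove(p, v) {
    p.count -= 1;
    p.totalDelay -= v.DELAY;
    p.meanDelay = p.count ? p.totalDelay / p.count : 0;
    return p;
}

// Made-up records, for illustration only.
var records = [{ DELAY: 2 }, { DELAY: 4 }, { DELAY: 6 }];
var summary = reduceInitial();
records.forEach(function (r) { reduceAdd(summary, r); });
console.log(summary.meanDelay); // 4

reduceRemove(summary, records[2]);
console.log(summary.meanDelay); // 3
```

With Crossfilter loaded, the same three functions could be passed to a group on some dimension as group.reduce(reduceAdd, reduceRemove, reduceInitial), so that the summary is kept up to date as filters change.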

Setting Up Crossfilter

Source code and data files are available in the code/Chapter09/SettingUpCrossfilter/ directory tree.

There is plenty of documentation to help get you started with using Crossfilter, but we will discuss it as we work through the example. In our application, we want to use Crossfilter to make sense of the transit stop metrics that have been provided to us.

The first step in setting up Crossfilter is simply to create a new Crossfilter, like so:

// Define our data manager module.
d3Edge.dataManager = function module() {
    var exports = {},
        dispatch = d3.dispatch('geoReady', 'dataReady', 'dataLoading'),
        data,
        // Create a new Crossfilter.
        transitCrossfilter = crossfilter();
    //...
};

Here, we create a new, empty Crossfilter via transitCrossfilter = crossfilter();. We assign it to the local variable transitCrossfilter, which makes it available to us throughout the module. We now have all the Crossfilter methods available on our transitCrossfilter variable, but without any data it isn't much use. Adding data to our Crossfilter is quite simple. We call:

// Add data to our Crossfilter.
transitCrossfilter.add(data);

For our example, we want to add the transit stop metrics after the data has been loaded and cleaned in the browser. If you remember back to our data manager module, we had a method called loadCsvData that took care of this for us. So to populate our Crossfilter, we just need to put the code snippet above after our data cleaning in this method and call our loadCsvData method.

// Create a method to load the csv file, and apply cleaning function asynchronously.
exports.loadCsvData = function(_file, _cleaningFunc) {
    // Create the csv request using d3.csv.
    var loadCsv = d3.csv(_file);
    // On the progress event, dispatch the custom dataLoading event.
    loadCsv.on('progress', function() {
        dispatch.dataLoading(d3.event.loaded);
    });
    loadCsv.get(function (_err, _response) {
        // Apply the cleaning function supplied in the _cleaningFunc parameter.
        _response.forEach(function (d) {
            _cleaningFunc(d);
        });
        // Assign the cleaned response to our data variable.
        data = _response;
        // Add data to our Crossfilter.
        transitCrossfilter.add(_response);
        // Dispatch our custom dataReady event passing in the cleaned data.
        dispatch.dataReady(_response);
    });
};

// Load the Zurich delay data, cleaning each row as it arrives.
zurichDataManager.loadCsvData('./data/zurich/zurich_delay.csv', function(d) {
    var timeFormat = d3.time.format('%Y-%m-%d %H:%M:%S %p');
    d.DELAY = +d.DELAY_MIN;
    delete d.DELAY_MIN;
    d.SCHEDULED = timeFormat.parse(d.SCHEDULED);
    d.LATITUDE = +d.LATITUDE;
    d.LONGITUDE = +d.LONGITUDE;
    d.LOCATION = [d.LONGITUDE, d.LATITUDE];
});
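To see what the cleaning function does to a single record, the sketch below applies the same numeric coercions to one hand-written row. The row values and the cleanRow name are hypothetical; the point is that every field parsed from a CSV file starts out as a string and that the function modifies the row in place. Date parsing is left out so that the sketch runs without D3 loaded.

```javascript
// Hypothetical raw row, shaped like one parsed CSV record (all values are strings).
var rawRow = {
    DELAY_MIN: '3',
    LATITUDE: '47.3769',
    LONGITUDE: '8.5417'
};

// Same numeric coercions as the cleaning function in the text; SCHEDULED parsing is omitted.
function cleanRow(d) {
    d.DELAY = +d.DELAY_MIN;                 // the unary plus converts the string to a number
    delete d.DELAY_MIN;                     // drop the original string field
    d.LATITUDE = +d.LATITUDE;
    d.LONGITUDE = +d.LONGITUDE;
    d.LOCATION = [d.LONGITUDE, d.LATITUDE]; // longitude first, then latitude
    return d;
}

var cleaned = cleanRow(rawRow);
console.log(cleaned.DELAY);    // 3
console.log(cleaned.LOCATION); // [ 8.5417, 47.3769 ]
```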

Now that we have populated our Crossfilter, let's add a convenience method to our data manager module that will allow us to inspect the size of our Crossfilter. The Crossfilter API provides us with a .size() method that we can invoke to get the number of records in our Crossfilter.

// Create a convenience method to get the size of our Crossfilter.
exports.getCrossfilterSize = function () {
    return transitCrossfilter.size();
};

If we invoke this method on the Zurich data manager we should see:

zurichDataManager.getCrossfilterSize();
// Returns 219371

Location Dimension

The source code and data files are available in the code/Chapter09/LocationDimension/ directory tree.

Now that our Crossfilter has data in it, let's define a dimension. In Crossfilter, a dimension is exactly what it sounds like: a dimension of the data that is of interest to us. We use the dimension to filter on, group on, and compute aggregate statistics on. Each of our data sets has a latitude and longitude field. In our data cleaning function we combined these into a location field and we will create a Crossfilter dimension on this field. To create a dimension, we call the dimension method on our Crossfilter and define an accessor function much like D3.js. First, let us create a local variable, location, in our module that we can assign our dimension to:

// Define our data manager module.
d3Edge.dataManager = function module() {
    var exports = {},
        dispatch = d3.dispatch('geoReady', 'dataReady', 'dataLoading'),
        data,
        // Instantiate a new Crossfilter.
        transitCrossfilter = crossfilter(),
        // Define a location variable for our location dimension.
        location;
    //..........
};

Now, we can create our dimension after our data loads:

// Create a method to load the csv file, and apply cleaning function asynchronously.
exports.loadCsvData = function(_file, _cleaningFunc) {
    // Create the csv request using d3.csv.
    var loadCsv = d3.csv(_file);
    // On the progress event, dispatch the custom dataLoading event.
    loadCsv.on('progress', function() {
        dispatch.dataLoading(d3.event.loaded);
    });
    loadCsv.get(function (_err, _response) {
        // Apply the cleaning function supplied in the _cleaningFunc parameter.
        _response.forEach(function (d) {
            _cleaningFunc(d);
        });
        // Assign the cleaned response to our data variable.
        data = _response;
        // Add data to our Crossfilter.
        transitCrossfilter.add(_response);
        // Set up the location dimension.
        location = transitCrossfilter.dimension(function (d) {
            return d.LOCATION;
        });
        // Dispatch our custom dataReady event passing in the cleaned data.
        dispatch.dataReady(_response);
    });
};

In the code above, once our data has loaded, we define the accessor function for our location dimension. This creates a Crossfilter dimension using the LOCATION key of our data set.
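One thing to keep in mind is that the Crossfilter documentation asks for dimension values that are naturally ordered, which typically means primitives such as numbers or strings. An array such as LOCATION is compared as a string by JavaScript's comparison operators, as the following sketch shows with two made-up coordinate pairs. A filter that inspects each value individually, as a function passed to filterFunction does, does not rely on this ordering, but any result that depends on the dimension's sort order should be read with this in mind.

```javascript
// Two hypothetical [longitude, latitude] values, as the LOCATION accessor would return them.
var a = [8.5, 47.3],
    b = [10.1, 47.3];

// The < operator converts each array to a string before comparing.
console.log(String(a));   // "8.5,47.3"
console.log(a < b);       // false, because the character "8" sorts after "1"
console.log(a[0] < b[0]); // true when the longitudes are compared as numbers
```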

Now that we have defined our dimension, we can filter on it. This is how we can link our two graphics together. Since the map data has a location dimension and our stop data has a location dimension, we can easily filter our stop data based on selected stops on the map. We can create a filter method on our data manager module to accomplish this task. This method will accept an area as its argument. This area will be a geographic box that we will construct using a brush later on in the application. For now we can create the method and pass in some test locations to prove functionality.

Location Filter

The source code and data files are available in the code/Chapter09/LocationFilter/ directory tree.

For our location filter, we want to pass in a geographic bounding box that can be used to filter the stop data. This box will be defined by two opposite corners, each given as a longitude and latitude. Any stops that have coordinates within this box will be returned by our filter function. We will use an array of arrays in JavaScript to represent this box. The first element will be an array with the coordinates of the corner that has the smaller longitude and latitude (the bottom left corner), and the second element will be an array with the coordinates of the corner that has the larger longitude and latitude (the top right corner). The comparisons in the filter depend on this ordering. We will call filterFunction on our location dimension and return all records whose coordinates are within our bounding box. The function we pass to filterFunction receives the location array of our dimension for each record. It will look like this:

// Create a filterLocation method to filter stop data by location area.
exports.filterLocation = function (_locationArea) {
    // Get the longitudes of our bounding box, and construct an array from them.
    var longitudes = [_locationArea[0][0], _locationArea[1][0]],
        // Get the latitudes of our bounding box, and construct an array from them.
        latitudes = [_locationArea[0][1], _locationArea[1][1]];
    location.filterFunction(function (d) {
        return d[0] >= longitudes[0]
            && d[0] <= longitudes[1]
            && d[1] >= latitudes[0]
            && d[1] <= latitudes[1];
    });
    // Return all records within our bounding box.
    return location.top(Infinity);
};

This method first filters our location dimension so that only the stops within our bounding box are selected, then returns all of the selected records by calling top(Infinity). To test this method, let's invoke it with a bounding box that covers the entire world. This should return an array with a length equal to the result of our getCrossfilterSize method.

zurichDataManager.filterLocation([[-180, -90], [180, 90]]);
// Returns Array[219371]
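The bounding-box test itself can also be tried in isolation on a handful of points. The sketch below is a hypothetical variation, not the module's own method: the makeBoundingBoxTest helper and the sample coordinates are made up, and the helper sorts the corner values first so that any two opposite corners can be passed in either order.

```javascript
// Hypothetical helper: build a predicate from any two opposite corners of a box.
// Each corner is [longitude, latitude], matching the LOCATION arrays in the text.
function makeBoundingBoxTest(area) {
    var minLon = Math.min(area[0][0], area[1][0]),
        maxLon = Math.max(area[0][0], area[1][0]),
        minLat = Math.min(area[0][1], area[1][1]),
        maxLat = Math.max(area[0][1], area[1][1]);
    return function (d) {
        return d[0] >= minLon && d[0] <= maxLon
            && d[1] >= minLat && d[1] <= maxLat;
    };
}

// Made-up sample locations, for illustration only.
var stops = [[8.54, 47.37], [8.60, 47.40], [9.10, 47.00]];

// Corners given here as top left, then bottom right.
var inBox = makeBoundingBoxTest([[8.50, 47.45], [8.65, 47.30]]);
var selected = stops.filter(inBox);
console.log(selected.length); // 2
```

A predicate built this way could be handed to location.filterFunction in place of the inline function. Note that a filter stays active on a dimension until it is replaced or cleared, and Crossfilter dimensions provide a filterAll method for clearing it. This sketch does not handle a box that crosses the 180° meridian.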

Summary

Now that we can filter our stop data by location, we finally have a mechanism to link our two visualizations together. We need to be able to select a geographic bounding area on the map that can be passed into our filter function to return all of the stops within that area. For this we are going to use D3.js's brush module. This will allow us to easily select an area on the screen and translate that area into geographic coordinates, as covered in the next chapter.