Client Side CommonJS

Phillip Gates-Idem edited this page Jan 24, 2014 · 20 revisions

Overview

This proposal outlines a new mechanism for efficiently transporting CommonJS modules to the client and subsequently loading the transported modules on the client. The goal of this proposal is to exactly mirror the server-side Node.js module system that is based on the CommonJS/Modules Specification. Unlike the AMD specification and similar specs, developers do not write their code according to this spec. Instead, developers should write modular code that conforms to the Node.js module system based on require, module, exports and node_modules and a separate tool will be used to wrap CommonJS module code such that it can be efficiently transported to the client using the proposed transport protocol. The proposed transport protocol allows the source code for CommonJS modules to be bundled with other non-CommonJS JavaScript code.

For this proposal, the RaptorJS Optimizer will support plugins that automatically wrap CommonJS code and RaptorJS will also provide a client-side implementation of the module system that understands the transport protocol described in this document.

Because the RaptorJS Optimizer will be extended to facilitate the delivery of CommonJS module code to the browser, the application developer can easily control resource bundling, resource compilation and resource minification for all types of resources such as CSS, LESS, Raptor Templates and CoffeeScript (not just CommonJS module code written in JavaScript).

In addition, because the goal of this proposal is to exactly mirror the Node.js module loader, asynchronous module loading will be delegated to a separate raptor-loader module which will offer a generic package loader (See the Asynchronous Module Loading section below).

In addition, this proposal aims to avoid sending duplicate code down to the browser in cases where the same module (based on name and exact version) is shared as a dependency for multiple parent modules.

Duplicate Module Dependencies

Given a project with the following dependencies:

This would result in the following directory structure on the server after running npm install:

In this example, the module baz@3.0.0 is a duplicate dependency and the source code for the duplicate module exists multiple times on disk on the server. At runtime, foo and bar are expected to get separate instances of the baz module since they have separate file system paths. Code duplication is not ideal on the server and it is even worse in the browser.

The technique described in this proposal makes use of symbolic links to avoid duplicating modules while still maintaining the logical directory structure. This technique can be applied equally on the server and on the client. Imagine that instead of duplicating the files for modules of the same version, npm install added symbolic links to modules in a shared location where each target module directory name consisted of the module name and the version. With symbolic links in place, the module source would only exist once on disk for a given module name and version, but the logical directory structure needed for dependency resolution would still be maintained through the symbolic links. To further clarify, instead of the directory structure shown earlier, the following directory structure that takes advantage of symbolic links could be used:

  • app_root/
    • .node_modules/
      • foo_1.0.0/
        • lib/
          • index.js
          • hello.js
          • world.js
        • node_modules/
          • baz/../../baz_3.0.0
      • bar_2.0.0/
        • lib/
          • index.js
        • node_modules/
          • baz/../../baz_3.0.0
      • baz_3.0.0/
        • lib/
          • index.js
    • node_modules/
      • foo/../.node_modules/foo_1.0.0
      • bar/../.node_modules/bar_2.0.0

Conceptually, this is how the client-side module loader will maintain the required logical project structure required for dependency resolution and module loading while avoiding sending down duplicate code to the client.

Logical and Real Paths

A logical module path represents the path of a module without resolving symbolic links. The real path of a module represents the canonical path with all symbolic links resolved. For example:

  • Logical path: /node_modules/foo/node_modules/baz/lib/index
  • Real path: /baz@3.0.0/lib/index

A dependent module instance should be cached based on the logical path (e.g. /node_modules/foo/node_modules/baz/lib/index), but the definition should be looked up by the real path (e.g. /baz@3.0.0/lib/index).

Module Transport

Module Definition Wrapper

All module code will be wrapped inside a call that will register the definition based on the real path of a module that includes the root module name, the root module version and the relative file path (e.g. /baz@3.0.0/lib/index). For a given real path, only one definition should exist to avoid code duplication.

The module at the file system path of app_root/node_modules/foo/node_modules/baz/lib/index.js would be transported to the client as shown in the following code:

$rmod.def('/baz@3.0.0/lib/index', function(require, exports, module, __filename, __dirname) { /* 
  module source code goes here 
*/ });

NOTE: The filename extension will always be dropped when registering a module.
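The wrapping step can be sketched as a small string transformation. This is only an illustration under stated assumptions: wrapDefinition is a hypothetical helper name (it is not part of the proposed API), it handles string input only, and it strips only a ".js" extension.

```javascript
// Hypothetical helper: wrap module source in a $rmod.def(...) call.
// Assumption: only the ".js" extension is dropped from the real path.
function wrapDefinition(realPath, source) {
  var id = realPath.replace(/\.js$/, '');
  return '$rmod.def(' + JSON.stringify(id) +
    ', function(require, exports, module, __filename, __dirname) { ' +
    source + ' });';
}

console.log(wrapDefinition('/baz@3.0.0/lib/index.js', 'hello();'));
```

A real transport tool would likely work with streams rather than strings, as the server-side API described later in this document suggests.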

Register Module Dependencies

In addition, the following code would be sent down to the browser in order to allow client code to add "symbolic links" to allow the logical paths to be maintained:

// some module that is not nested under a "node_modules" directory requires foo@1.0.0
$rmod.dep('', 'foo', '1.0.0');          // /node_modules/foo → /foo@1.0.0

// some module that is not nested under a "node_modules" directory requires bar@2.0.0
$rmod.dep('', 'bar', '2.0.0');          // /node_modules/bar → /bar@2.0.0

// the "foo" module (installed at "/node_modules/foo") requires baz@3.0.0
$rmod.dep('/node_modules/foo', 'baz', '3.0.0'); // /node_modules/foo/node_modules/baz → /baz@3.0.0

// the "bar" module (installed at "/node_modules/bar") requires baz@3.0.0
$rmod.dep('/node_modules/bar', 'baz', '3.0.0'); // /node_modules/bar/node_modules/baz → /baz@3.0.0

Conceptually, the $rmod.dep function can be thought of as adding a symbolic file system link for a module dependency. For example, $rmod.dep('', 'foo', '1.0.0') would result in the following symbolic link: /node_modules/foo → /foo@1.0.0

In addition, $rmod.dep('/node_modules/bar', 'baz', '3.0.0') would result in the following symbolic link: /node_modules/bar/node_modules/baz → /baz@3.0.0
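To make the "symbolic link" idea concrete, here is a minimal sketch of how a client could record links and later turn a logical path into a real path. The links table and the longest-prefix lookup are assumptions made for illustration; the proposal does not prescribe a data structure.

```javascript
// Hypothetical link table: logical directory -> real directory.
var links = {};

// Mirrors the documented $rmod.dep(logicalParentPath, depId, depVersion) call.
function dep(logicalParentPath, depId, depVersion) {
  links[logicalParentPath + '/node_modules/' + depId] = '/' + depId + '@' + depVersion;
}

// Replace the longest linked prefix of a logical path with its real directory.
function realPath(logicalPath) {
  var parts = logicalPath.split('/');
  for (var i = parts.length; i > 1; i--) {
    var prefix = parts.slice(0, i).join('/');
    if (Object.prototype.hasOwnProperty.call(links, prefix)) {
      return links[prefix] + logicalPath.substring(prefix.length);
    }
  }
  return logicalPath; // not under a linked directory, so already a real path
}

dep('', 'foo', '1.0.0');
dep('/node_modules/foo', 'baz', '3.0.0');

console.log(realPath('/node_modules/foo/node_modules/baz/lib/index')); // "/baz@3.0.0/lib/index"
console.log(realPath('/src/app')); // "/src/app"
```

Searching from the longest prefix first means a nested dependency such as baz wins over its parent foo, which matches how nested symbolic links would resolve on a file system.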

Self-executing Modules

There will be no global require function introduced. Because of this, a root module that is meant to bootstrap the application on the client should be wrapped in code that makes it self-executing. The $rmod.run(path, factory) function will be used for this purpose. For example, if a module at path app_root/src/ui-pages/login/login-page.js is needed to bootstrap the login page then the following code would need to be generated:

$rmod.run('/src/ui-pages/login/login-page', function(require, exports, module, __filename, __dirname) { /* 
  module source code goes here 
*/ });

NOTE: It is possible that multiple modules might be sent down to the browser as self-executing and this behavior should not be disallowed.

Main Scripts

Node.js allows a main script to be associated with a directory. If the directory is require'd then the main script will be used to determine which module is actually resolved. The main script can be explicitly declared in a package.json file contained in the directory or, if not explicitly declared, a default main script of "index" will be assumed. The main script is assumed to be relative to the module directory.

To allow for mapping a directory to a main script, the following code will be used:

$rmod.main('/foo@1.0.0', 'lib/index');

By registering a main script for the /foo@1.0.0 module, the following will work as expected:

require.resolve('foo'); // Returns "/node_modules/foo/lib/index"
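The main script lookup could be sketched as follows. The mains table and the resolveDirectory helper are hypothetical names used only for this illustration; the sketch assumes the caller already knows both the logical and the real path of the directory.

```javascript
// Hypothetical table of main scripts, keyed by the real path of a module directory.
var mains = {};

// Mirrors the documented $rmod.main(realPath, relPath) call.
function main(realPath, relPath) {
  mains[realPath] = relPath;
}

// Return the logical path of the main script for a directory,
// falling back to "index" when no main script was registered.
function resolveDirectory(logicalDir, realDir) {
  var relPath = Object.prototype.hasOwnProperty.call(mains, realDir) ? mains[realDir] : 'index';
  return logicalDir + '/' + relPath;
}

main('/foo@1.0.0', 'lib/index');

console.log(resolveDirectory('/node_modules/foo', '/foo@1.0.0')); // "/node_modules/foo/lib/index"
console.log(resolveDirectory('/node_modules/bar', '/bar@2.0.0')); // "/node_modules/bar/index"
```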

Remapping Paths

In certain situations, the rules for resolving a module may be modified. For example, a proposal was made to add support for a special browser field inside a module's package.json file that can be used to override how a module is resolved. Please see: https://gist.github.com/defunctzombie/4339901

In cases where the server determines that the normal rules should not be used to resolve a module, the server should be able to provide the client with a remapping rule.

In some cases, it's necessary to remap a module installed at a logical path to a new module. Consider the case where code was written to require the Node.js streams module but the optimizer must provide a version of the streams module that is compatible with the web browser. In this case, it might be necessary to recognize streams as a dependency but remap it to streams-browserify (an implementation of the streams interface for the web browser). This can be achieved with the following code:

$rmod.dep('', 'streams-browserify', '1.0.0', 'streams');

The fourth argument ('streams' in the example above) to the $rmod.dep() function is an alternate name that will be recognized in place of 'streams-browserify'. It might be helpful to think of this argument as the "also-known-as" module name.

Another type of remapping rule that is supported is the ability to remap a real path using a new relative path.

For example, consider a case where a streams module needs to be written to run on the server and in the web browser. Most likely, there is a lot of common code between the two runtime environments, but it may be helpful to provide some implementation code that is specific to the web browser. A remapping rule could be defined for one or more paths within the module. For example, suppose that /[email protected]/lib/index should instead be substituted with /[email protected]/lib/browser/index for the browser runtime. This remapping rule could be written as follows:

$rmod.remap('/[email protected]/lib/index', './browser/index');

NOTE: The second argument does not actually need to include the './' prefix (this is used to make it clear that the second argument is a relative path). In the example above, 'browser/index' is equivalent.
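The following sketch shows one way the relative remapping could be applied on the client. It is an illustration only: the remapped table, the applyRemap helper and the /example@1.0.0 module are hypothetical and do not come from the proposal.

```javascript
// Hypothetical table: original real path -> substituted real path.
var remapped = {};

// Mirrors the documented $rmod.remap(realPath, relativePath) call.
// The relative path is resolved against the directory of realPath.
function remap(realPath, relativePath) {
  var parts = realPath.split('/');
  parts.pop(); // keep the containing directory
  relativePath.split('/').forEach(function (segment) {
    if (segment === '..') {
      parts.pop();
    } else if (segment !== '.' && segment !== '') {
      parts.push(segment);
    }
  });
  remapped[realPath] = parts.join('/');
}

// Look up the substituted path, or return the input when no rule applies.
function applyRemap(realPath) {
  return Object.prototype.hasOwnProperty.call(remapped, realPath) ? remapped[realPath] : realPath;
}

// '/example@1.0.0' is a made-up module used only for this example.
remap('/example@1.0.0/lib/index', './browser/index');

console.log(applyRemap('/example@1.0.0/lib/index')); // "/example@1.0.0/lib/browser/index"
```

Because "." segments are ignored, passing 'browser/index' produces the same rule as './browser/index', in line with the note above.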

Module Dependency Resolution

Dependent module names will be resolved based on the logical path of a module. A path will be either a relative module name, an already resolved absolute path or a top-level module name. Each type of path is described in the sections below.

Relative Module Paths

A relative module path is a module path that begins with "." or "..". Relative module paths are resolved relative to the logical path of the parent module. Examples:

  • require.resolve('./hello') (from /node_modules/foo/lib/index) → '/node_modules/foo/lib/hello' (real path: /foo@1.0.0/lib/hello)
  • require.resolve('../lib/world') (from /node_modules/foo/lib/index) → '/node_modules/foo/lib/world' (real path: /foo@1.0.0/lib/world)
  • require.resolve('../') (from /node_modules/foo/lib/index) → '/node_modules/foo/lib/index' (real path: /foo@1.0.0/lib/index)
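Relative resolution can be sketched as simple path arithmetic on the logical path of the parent module. The resolveRelative helper below is a hypothetical name, and the sketch stops at the joined path: it does not check that the target exists or expand a directory to its main script.

```javascript
// Resolve a relative module path against the logical path of the parent module.
function resolveRelative(target, fromLogicalPath) {
  var parts = fromLogicalPath.split('/');
  parts.pop(); // start from the directory that contains the parent module
  target.split('/').forEach(function (segment) {
    if (segment === '..') {
      parts.pop();
    } else if (segment !== '.' && segment !== '') {
      parts.push(segment);
    }
  });
  return parts.join('/');
}

console.log(resolveRelative('./hello', '/node_modules/foo/lib/index'));      // "/node_modules/foo/lib/hello"
console.log(resolveRelative('../lib/world', '/node_modules/foo/lib/index')); // "/node_modules/foo/lib/world"
```

When the result names a directory (for example '/node_modules/foo'), a full resolver would then apply the main script rules described earlier.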

Absolute Module Paths

An absolute module path is a module path prefixed with a forward slash ("/"). An absolute module path is assumed to already be resolved, and if an absolute path is passed to require.resolve(...) then the input should be returned unmodified.

Top-level Module Paths

Top-level module paths are module paths that do not begin with ".", ".." or "/". The mechanism for resolving a top-level module name on the client will be very similar to the mechanism that Node.js uses to resolve a top-level module path on the server. The client-side code will search up the module's logical path looking for "node_modules" directories with the target module name. For example, calling require('baz') from within /node_modules/foo/lib/index will result in the following paths being searched:

  1. /node_modules/foo/lib/node_modules/baz (search for baz as a dependency for /node_modules/foo/lib)
  2. /node_modules/foo/node_modules/baz (search for baz as a dependency for /node_modules/foo)
  3. /node_modules/baz (search for baz as a root dependency)

NOTE: The directory name of node_modules is not actually used on the client and it is used here only for clarity. A shorter special character (e.g. $) could be used to achieve the same end result.

Examples:

  • require.resolve('baz') (from /node_modules/foo/lib/index) → '/node_modules/foo/node_modules/baz/lib/index' (real path: /baz@3.0.0/lib/index)
  • require.resolve('foo') (from ROOT) → '/node_modules/foo/lib/index' (real path: /foo@1.0.0/lib/index)
  • require.resolve('MISSING') (from /node_modules/foo/lib/index) → Error('Module not found: MISSING')

The paths will be searched in order and the first existing module path will be used. If the module path resolves to a module directory then the defined main script will be used to resolve the full path, or "index" will be assumed if a main script is not explicitly defined for the module directory.
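The search order for a top-level module name can be sketched by walking up the logical path of the requiring module. The searchPaths helper is a hypothetical name; skipping directories that are themselves named node_modules follows the behavior of Node.js and is an assumption about how the client would behave.

```javascript
// List the candidate logical paths for a top-level module name, nearest first.
function searchPaths(moduleName, fromLogicalPath) {
  var parts = fromLogicalPath.split('/');
  parts.pop(); // drop the module file name, keep its directory
  var paths = [];
  while (parts.length > 0) {
    // Never look inside "node_modules/node_modules".
    if (parts[parts.length - 1] !== 'node_modules') {
      paths.push(parts.join('/') + '/node_modules/' + moduleName);
    }
    parts.pop();
  }
  return paths;
}

console.log(searchPaths('baz', '/node_modules/foo/lib/index'));
// [ '/node_modules/foo/lib/node_modules/baz',
//   '/node_modules/foo/node_modules/baz',
//   '/node_modules/baz' ]
```

A resolver would test each candidate against the registered dependencies and stop at the first match, raising a "Module not found" error when the list is exhausted.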

Definition and Instance Caching

As mentioned earlier, a dependent module instance should be cached based on the logical path (e.g. /node_modules/foo/node_modules/baz/lib/index), but the definition should be looked up by the real path (e.g. /baz@3.0.0/lib/index). For example:

definitionCache['<real_path>'] = <def>;
instanceCache['<logical_path>'] = <instance>;

Continuing with the original example, the following module definitions will be registered after the code for modules foo, bar and baz is sent to the browser and executed:

definitionCache['/foo@1.0.0/lib/index'] = <def>;
definitionCache['/foo@1.0.0/lib/hello'] = <def>;
definitionCache['/foo@1.0.0/lib/world'] = <def>;
definitionCache['/bar@2.0.0/lib/index'] = <def>;
definitionCache['/baz@3.0.0/lib/index'] = <def>;

After modules foo, bar and baz are require'd the module cache would be similar to the following:

instanceCache['/node_modules/foo/lib/index'] = <instance>;
instanceCache['/node_modules/foo/lib/hello'] = <instance>;
instanceCache['/node_modules/foo/lib/world'] = <instance>;
instanceCache['/node_modules/bar/lib/index'] = <instance>;
instanceCache['/node_modules/foo/node_modules/baz/lib/index'] = <instance>;
instanceCache['/node_modules/bar/node_modules/baz/lib/index'] = <instance>;

NOTE: /node_modules/foo/node_modules/baz/lib/index and /node_modules/bar/node_modules/baz/lib/index refer to the same real path, but the cache entries will refer to separate loaded instances of the same module.

A factory function will be associated with each module definition. The factory function for a module should be invoked multiple times if a module has multiple logical paths (once for each logical path).

Wrapping Compiled Code

It's possible that compiled code (such as templates) may need to be wrapped so that the resulting code can also benefit from having a path-aware dependency resolver and so that the resulting code can be require'd. For example, take the following simple template:

<template>
    Hello $data.name!
</template>

The template could be compiled down into the following code:

$rmod.def("/src/ui-components/buttons/SimpleButton/template.rhtml", function(require, exports, module, __filename, __dirname) {
    exports.create = function(helpers) {
        var empty = helpers.e,
            notEmpty = helpers.ne,
            escapeXml = helpers.x;

        return function(data, context) {
            context.w('Hello ')
                .w(escapeXml(data.name))
                .w('!');
        }
    };
});

With this approach, a compiled template module could be required and the compiled code would have access to a require function that could be used to load dependencies (relative to the template location on disk). Internally, a template could be loaded as shown in the following code:

var templatePath = require.resolve('./template.rhtml'); // e.g. "/src/ui-components/buttons/SimpleButton/template.rhtml"
var html = require('raptor-templates').renderToString(templatePath, templateData);

NOTE: Assuming the "path" module is available, the following would be equivalent:

var templatePath = require('path').join(__dirname, './template.rhtml');

Asynchronous Module Loading

Asynchronous module loading will be delegated to a new raptor-loader module. In contrast to the AMD spec, with this proposal the require function will not be extended to support an asynchronous loading callback. Extending the existing require function to support asynchronous loading is dangerous since it would break compatibility with the API that Node.js already exposes. As a result, asynchronous module loading will be delegated to a completely separate raptor-loader module that makes no assumptions about what code is being transported over the wire (it could be JavaScript or CSS, it could be CommonJS modules, AMD modules or even ECMAScript 6 modules, etc.). The raptor-loader will provide a generic asynchronous package loader. The usage for the new raptor-loader module is illustrated in the code below:

require('raptor-loader').getLoader(module).load(['foo/some/package', './another/package'], function(err) {
  if (err) {
    // One or more packages failed to load
    return;
  }
  
  // Assuming the code for "foo" and "bar" were downloaded in the asynchronously loaded packages
  var foo = require('foo');
  var bar = require('bar');
  // Do something with the loaded modules...
});

Internally, the loader metadata can be transported to the browser wrapped as a module. For example:

$rmod.def('/foo@1.0.0/some/package#async', {
    'js': ['http://mycdn/some/resource.js', ...],
    'css': ['http://mycdn/some/resource.css', ...]
});

NOTE: The #async suffix is only needed to keep the path for the pseudo async module unique and to distinguish it from a real file module.

By wrapping the asynchronous metadata as a module, we can utilize the provided resolver to resolve the metadata relative to a given base module.

Implementation

For the implementation a new module named raptor-modules will be introduced. This module will contain one sub-module to handle transporting server-side module code to the client and another sub-module for providing the client-side module API.

Server-side Implementation

The raptor-modules/transport sub-module will be responsible for generating the JavaScript code that the client-side module loader needs to function correctly. This includes wrapping modules and adding the additional metadata required to resolve and load modules.

API for "raptor-modules/transport"

defineCode(path, streamOrString, options) : Stream

Returns a Readable stream that can be used to read the wrapped code designated by the streamOrString argument. If options.object is set to true then the code will not be wrapped in a factory function and will be written out verbatim.

Examples:

require('raptor-modules/transport').defineCode('/[email protected]/lib/index', 'hello();')

$rmod.def("/[email protected]/lib/index", function(require, exports, module, __filename, __dirname) { hello(); });

With options.object === true:

require('raptor-modules/transport').defineCode('/some/path', '{ hello: "world" }', {object: true})

$rmod.def("/some/path", { hello: "world" });

runCode(logicalPath, streamOrString) : Stream

Generates the code to produce a self-executing module.

Examples:

require('raptor-modules/transport').runCode('/app/main', 'hello();')

$rmod.run("/app/main", function(require, exports, module, __filename, __dirname) { hello(); });

registerDependencyCode(logicalParentPath, targetId, targetVersion) : Stream

Generates code that when executed will register a child dependency for a given parent path.

Examples:

require('raptor-modules/transport').registerDependencyCode('', 'foo', '1.0.0')

$rmod.dep('', 'foo', '1.0.0');

require('raptor-modules/transport').registerDependencyCode('/node_modules/foo', 'baz', '3.0.0')

$rmod.dep('/node_modules/foo', 'baz', '3.0.0');

registerMainCode(realPath, main) : Stream

Generates code that when executed will register the main script for a given directory path.

Example:

require('raptor-modules/transport').registerMainCode('/[email protected]', 'lib/index')

$rmod.main('/[email protected]', 'lib/index');

registerResolvedPath(target, from, resolved) : Stream

Generates code that when executed will register a pre-resolved path for a given "target" and "from" path.

Example:

require('raptor-modules/transport').registerResolvedPath('stream', '/node_modules/foo/lib/index', '/node_modules/foo/node_modules/stream-browserify/index')

$rmod.resolved("stream", "/node_modules/foo/lib/index", "/node_modules/foo/node_modules/stream-browserify/index");

getPathInfo(path)

The getPathInfo method is used to get additional information about a module path so that the module can be transported to the client. The return value of the function is a PathInfo object that is described below.

PathInfo
  • logicalPath (String): The logical path for the module with no symbolic links resolved
  • realPath (String): The real path for the module (all symbolic links resolved)
  • filePath (String): The actual file system path
  • isDir (boolean): true if filePath refers to a directory. false otherwise
  • dep (object): If the target module was found inside a dependency (i.e. under node_modules) then this object will contain information about the dependency. For example: { parentPath: '/node_modules/foo', childId: 'baz', childVersion: '3.0.0' }
  • main (String): If the path is a directory, then the main property will contain the relative path to the main script file for the module

Examples:

getPathInfo('/development/my-project/node_modules/foo/node_modules/baz');

{
    logicalPath: '/node_modules/foo/node_modules/baz',
    realPath: '/baz@3.0.0',
    filePath: '/development/my-project/node_modules/foo/node_modules/baz',
    isDir: true,
    dep: {
        parentPath: '/node_modules/foo',
        childId: 'baz',
        childVersion: '3.0.0'
    },
    main: '/development/my-project/node_modules/foo/node_modules/baz/lib/index.js'
}

getPathInfo('/development/my-project/node_modules/foo/node_modules/baz/lib/index.js');

{
    logicalPath: '/node_modules/foo/node_modules/baz/lib/index.js',
    realPath: '/baz@3.0.0/lib/index.js',
    filePath: '/development/my-project/node_modules/foo/node_modules/baz/lib/index.js',
    isDir: false,
    dep: {
        parentPath: '/node_modules/foo',
        childId: 'baz',
        childVersion: '3.0.0'
    }
}

getPathInfo('/development/my-project/src/ui-components/buttons/Button/renderer.js');

{
    logicalPath: '/src/ui-components/buttons/Button/renderer.js',
    realPath: '/src/ui-components/buttons/Button/renderer.js',
    filePath: '/development/my-project/src/ui-components/buttons/Button/renderer.js',
    isDir: false
}

resolveRequire(path, from)

The resolveRequire method is used to resolve a module being required from a given base path. The return value of the function is a PathInfo object as described above.

Examples:

resolveRequire('baz', '/development/my-project/node_modules/foo/lib/index.js');

{
    logicalPath: '/node_modules/foo/node_modules/baz',
    realPath: '/baz@3.0.0',
    filePath: '/development/my-project/node_modules/foo/node_modules/baz',
    isDir: true,
    dep: {
        parentPath: '/node_modules/foo',
        childId: 'baz',
        childVersion: '3.0.0'
    },
    main: '/development/my-project/node_modules/foo/node_modules/baz/lib/index.js'
}

resolveRequire('/node_modules/foo/node_modules/baz/lib/index', '/development/my-project/node_modules/foo/node_modules/baz');

{
    logicalPath: '/node_modules/foo/node_modules/baz/lib/index.js',
    realPath: '/baz@3.0.0/lib/index.js',
    filePath: '/development/my-project/node_modules/foo/node_modules/baz/lib/index.js',
    isDir: false
}

Client-side Implementation

The client-side module provider will be provided by a new raptor-modules/client module. Including this module on a page will create a new $rmod global object with the following methods:

  • $rmod.def(realPath, factoryOrObject)
  • $rmod.dep(logicalParentPath, depId, depVersion, [depAlsoKnownAsId])
  • $rmod.main(realPath, relPath)
  • $rmod.run(logicalPath, factory)
  • $rmod.remap(realPath, relativePath)

require function

Pseudo code:

function require(id, from) {
  var path = resolve(id, from);                // logical path, used as the instance cache key
  var instance = instanceCache[path];
  if (!instance) {
    var def = definitionCache[realPath(path)]; // definitions are keyed by real path
    instance = create(def);
    instanceCache[path] = instance;
  }
  return instance;
}

require.resolve(path):

Resolves a dependent module path from the base path associated with the resolve function and returns a resolved logical path. This function should handle relative, absolute and top-level module paths. If the target path does not exist then an Error should be thrown. NOTE: The following should always be true:

require('./test') === require(require.resolve('./test'))

Internal Helper Functions

realPath(logicalPath):

Checks if the given logical path is a "symbolic link" and, if so, the real path is returned, otherwise the path is returned as-is.

Examples:

realPath('/node_modules/foo') // → '/foo@1.0.0'
realPath('/node_modules/foo/node_modules/baz') // → '/baz@3.0.0'
realPath('/node_modules/foo/lib/index') // → '/foo@1.0.0/lib/index'
realPath('/src/app.js') // → '/src/app.js'

resolve(path, from)

Resolves a path from a given base path and handles absolute paths, relative paths and top-level module names (dependencies). Used internally by require.resolve(path).

RaptorJS Optimizer Integration

Additional dependency types can be registered with the raptor-optimizer module to enable the transport of CommonJS modules. These dependencies can be registered using the following code:

require('raptor-modules/optimizer').registerDependencyTypes(raptorOptimizer);

Usage

Transporting a CommonJS module will be as simple as including a require dependency (along with the raptor-modules/client module). For example:

Inside a package.json:

{
  "raptor": {
    "dependencies": [
      { "module": "raptor-modules/client" },
      { "require": "./foo" },
      { "require": "bar" },
      { "require": "../app/main", "run": true }
    ]
  }
}

Or using the RaptorJS Optimizer taglib:

<optimizer:page name="index">
    <dependencies>
        <module name="raptor-modules/client" />
        <require name="./foo" />
        <require name="bar" />
        <require name="../app/main" run="true" />
    </dependencies>
</optimizer:page>

By default, all dependencies will be included and the dependencies will be determined by scanning the JavaScript source code for the module to determine which other modules are required. If the target module resolves to a module directory then the optimizer will include additional information for resolving the "main" script for the module directory (i.e. $rmod.main(...)). In addition, if the dependent module resolves to a module inside a "node_modules" directory then the optimizer will include the additional information to register that dependency (i.e. $rmod.dep(...)).

Optimizer Dependency Types

To support the above proposal, the following dependency types will be registered:

  • require: Resolves the dependencies for a required CommonJS module relative to some base path. This may include additional require dependencies and any of the dependencies listed below.
  • commonjs-def: Generates wrapped CommonJS code for a JavaScript file
  • commonjs-main: Generates code that registers a main script for a CommonJS module directory
  • commonjs-dep: Generates code that adds a symbolic link to a module dependency installed under node_modules
  • commonjs-run: Generates code that runs a module
  • commonjs-resolved: Generates code that registers a pre-resolved path

NOTE: Only the require dependency should be used directly by developers. The other dependency types are used internally.

The require dependency type will resolve to a package that includes the required dependencies to ensure that a module can be required from some base path.

For example, given the following require dependency:

{ "require": "baz" }

The following dependencies would be returned:

[
  { "type": "commonjs-main", "path": "/baz@3.0.0", "main": "lib/index" },
  { "type": "commonjs-dep", "parent": "/foo@1.0.0", "dependency": "baz@3.0.0" },
  { "type": "commonjs-def", "path": "/baz@3.0.0/lib/index", "file": "app_root/node_modules/foo/node_modules/baz/lib/index.js" }
]

For modules that are meant to run immediately, the "commonjs-run" dependency should be used. For example, given the following require dependency with "run" set to true:

{ "type": "require", "id": "baz", "from": "/app_root/node_modules/foo/lib/index.js", "run": true }

CoffeeScript, LiveScript and Other Languages

It's possible that CommonJS modules may be written in CoffeeScript, LiveScript or another language that compiles to JavaScript. If a non "js" filename extension is encountered, then the "require" resolver will use the appropriate dependency type registered with the raptor-optimizer module to compile the code to JavaScript.

Pseudo code:

if (extension !== 'js') {
  var dependency = raptorOptimizer.getDependencyTypeForPath(path);
  code = dependency.getCode();
}

Reference
