Parsing the module system in Node.js

Author：Eve Cole Update Time：2023-04-04 10:00:43


Two years ago I wrote an article introducing the module system: Understanding the concept of front-end modules: CommonJs and ES6Module. The knowledge in this article is aimed at beginners and is relatively simple. Here we also correct a few errors in the article:

[Module] and [Module System] are two different things. A module is a unit in software, and a module system is a set of syntax or tools. The module system allows developers to define and use modules in projects.
The abbreviation of ECMAScript Module is ESM, or ESModule, not ES6Module.

The previous article covers most of the basics of the module system, so this article focuses on the internal principles of module systems and gives a more complete account of the differences between them. The content of the previous article will not be repeated here.

Module system

Not all programming languages have a built-in module system, and JavaScript did not have a module system for a long time after its birth.

In the browser environment, you could only use the <script> tag to bring in different code files. Every file loaded this way shares one global scope, which causes many problems, and as the front end developed rapidly this approach could no longer meet the needs of the time. Before the official module system appeared, the front-end community created its own third-party module systems. The most commonly used ones are asynchronous module definition (AMD), universal module definition (UMD), and, the most famous of all, CommonJS.

Since Node.js is a JavaScript runtime that can directly access the underlying file system, its developers implemented a module system in accordance with the CommonJS specification.

At first, CommonJS could only be used on the Node.js platform. With the emergence of module packaging tools such as Browserify and Webpack, CommonJS can finally run on the browser side.

It was not until the release of the ECMAScript 6 specification in 2015 that the module system had a formal standard. The module system built in accordance with this standard is called ECMAScript modules (ESM for short). From then on, ESM began to unify the Node.js environment and the browser environment. Of course, ECMAScript 6 only provides syntax and semantics; the implementation is left to browser vendors and the Node.js developers. That is why we have Babel, a tool that is the envy of other programming languages. Implementing a module system is not an easy task: Node.js only gained relatively stable support for ESM in version 13.2.

But no matter what, ESM is the "son" of JavaScript, and there is nothing wrong with learning it!

The basic idea of the module system

In the slash-and-burn era of JavaScript application development, script files could only be brought in through script tags. One of the more serious problems was the lack of a namespace mechanism, which meant that every script shared the same scope. The community came up with a good solution to this problem: the revealing module pattern.

const myModule = (() => {
  const _privateFn = () => {}
  const _privateAttr = 1
  return {
    publicFn: () => {},
    publicAttr: 2
  }
})()

console.log(myModule)
console.log(myModule.publicFn, myModule._privateFn)

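Running this code shows that the logged myModule object contains only publicFn and publicAttr, and that myModule._privateFn is undefined.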

This pattern is very simple: use an IIFE to create a private scope, and use return to expose the variables that should be public. Internal variables (such as _privateFn and _privateAttr) cannot be accessed from the outer scope.

[revealing module] takes advantage of these features to hide private information and export APIs that should be exposed to the outside world. The subsequent module system is also developed based on this idea.
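As a further illustration, here is a minimal sketch of the same pattern holding private state. The counter module and its member names are hypothetical and only serve to show that state kept inside the IIFE can be changed through the exposed API but not reached directly.

```javascript
// A counter module built with the revealing module pattern (hypothetical example)
const counter = (() => {
  let count = 0 // private: only reachable through the closure

  const increment = () => {
    count += 1
    return count
  }
  const current = () => count

  // Only the members returned here are visible to the outside
  return { increment, current }
})()

counter.increment()
counter.increment()
console.log(counter.current()) // 2
console.log(counter.count) // undefined
```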

CommonJS

Based on the above ideas, we can develop a module loader.

First write a function that loads the module content. It wraps the module's source code in a function to give it a private scope, and then runs that function through eval():

// Assumes fs (the built-in Node.js file system module) is already available
function loadModule (filename, module, require) {
  const wrappedSrc =
    `(function (module, exports, require) {
      ${fs.readFileSync(filename, 'utf8')}
    })(module, module.exports, require)`
  eval(wrappedSrc)
}

Like [revealing module], the source code of the module is wrapped in a function. The difference is that a series of variables (module, module.exports, require) are also passed to the function.

It is worth noting that the module content is read through readFileSync. Generally speaking, you should not use the synchronous versions of file system APIs. But this case is different: loading modules in the CommonJS system is deliberately implemented as a synchronous operation, which ensures that multiple modules are loaded in the correct dependency order.

Then simulate the require() function, whose main function is to load the module.

function require (moduleName) {
  const id = require.resolve(moduleName)
  if (require.cache[id]) {
    return require.cache[id].exports
  }
  // Module metadata
  const module = {
    exports: {},
    id
  }
  // Update cache
  require.cache[id] = module

  // Load module
  loadModule(id, module, require)

  // Return exported variables
  return module.exports
}
require.cache = {}
require.resolve = (moduleName) => {
  // Parse out the complete module id based on moduleName
}

(1) After the function receives moduleName, it first resolves the complete path of the module and assigns it to id.
(2) If cache[id] exists, the module has already been loaded, and the cached result is returned directly.
(3) Otherwise, an environment is set up for the first load. Specifically, a module object is created containing exports (the exported content) and id (described above).
(4) The module being loaded for the first time is cached.
(5) The source code is read from the module's source file and evaluated through loadModule.
(6) Finally, return module.exports returns the exported content.
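The steps above can be put together into a small runnable sketch. It is a simplification, not how Node.js itself is implemented: the name miniRequire, the greeter.js module, and the path-only resolve function are assumptions made for this example.

```javascript
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')

// Wrap the module source in a function and run it with eval()
function loadModule (filename, module, requireFn) {
  const wrappedSrc =
    `(function (module, exports, require) {
      ${fs.readFileSync(filename, 'utf8')}
    })(module, module.exports, requireFn)`
  eval(wrappedSrc)
}

// Hypothetical stand-in for require(), named differently to avoid
// shadowing the real one
function miniRequire (moduleName) {
  const id = miniRequire.resolve(moduleName)
  if (miniRequire.cache[id]) {
    return miniRequire.cache[id].exports // already loaded: reuse it
  }
  const module = { exports: {}, id }
  miniRequire.cache[id] = module
  loadModule(id, module, miniRequire)
  return module.exports
}
miniRequire.cache = {}
// Simplified resolution (an assumption): treat the name as a file path
miniRequire.resolve = (moduleName) => path.resolve(moduleName)

// Write a tiny module to a temporary directory and load it twice
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'mini-require-'))
const file = path.join(dir, 'greeter.js')
fs.writeFileSync(file, "exports.greet = (name) => 'Hello, ' + name\n")

const greeter = miniRequire(file)
const again = miniRequire(file)
console.log(greeter.greet('ESM')) // Hello, ESM
console.log(greeter === again) // true
```

The second call returns the very same exports object, which is the caching behavior described in step (2).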

require is synchronous

When simulating the require function, there is one very important detail: the require function must be synchronous. It simply returns the module content directly and does not use a callback mechanism. The same is true for require in Node.js. Therefore, the assignment to module.exports must also be synchronous. If it is assigned asynchronously, problems will occur:

// Something went wrong
setTimeout(() => {
  module.exports = function () {}
}, 1000)

The fact that require is a synchronous function has a very important impact on how modules are defined, because it forces us to use only synchronous code when defining them. This is one reason Node.js provides synchronous versions of most asynchronous APIs.
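A minimal sketch of the problem, using the real require of Node.js and a temporary file. The file name late.js and the short delays are assumptions for this example.

```javascript
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')

// A module that assigns module.exports only after a delay
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'async-exports-'))
const file = path.join(dir, 'late.js')
fs.writeFileSync(file, [
  'setTimeout(() => {',
  '  module.exports = function () {}',
  '}, 10)'
].join('\n'))

// require() returns immediately, so it only sees the initial empty object
const late = require(file)
console.log(typeof late) // object
console.log(Object.keys(late).length) // 0

// Only a later require() call observes the reassigned exports
setTimeout(() => {
  console.log(typeof require(file)) // function
}, 50)
```

The caller that required the module first keeps a reference to the empty object and never learns about the function.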

Early Node.js had an asynchronous version of the require function, but it was quickly removed because it would make the function very complicated.

ESM

ESM is part of the ECMAScript2015 specification, which specifies an official module system for the JavaScript language to adapt to various execution environments.

Using ESM with Node.js

By default, Node.js treats files with a .js suffix as being written using CommonJS syntax. If you use ESM syntax directly in the .js file, the interpreter will report an error.

There are three ways to make Node.js interpret code as ESM:
1. Change the file extension to .mjs;
2. Add a type field with the value "module" to the nearest package.json file;
3. Pass the code as a string argument to --eval, or pipe it to node via STDIN, together with the flag --input-type=module.
For example:

 node --input-type=module --eval "import { sep } from 'node:path';
console.log(sep);"

Different types of module references

ES modules are resolved and cached as URLs (which also means special characters must be percent-encoded). The file:, node:, and data: URL schemes are supported.

file:URL
A module is loaded multiple times if the import specifiers used to resolve it have different queries or fragments.

// Considered to be two different modules
import './foo.mjs?query=1';
import './foo.mjs?query=2';
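The effect can be observed with a small sketch. It assumes a temporary foo.mjs that bumps a hypothetical global counter (fooLoads) each time it is evaluated, and it uses import() from a CommonJS script so the whole example fits in one file.

```javascript
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')
const { pathToFileURL } = require('node:url')

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'esm-query-'))
const file = path.join(dir, 'foo.mjs')
// Every evaluation of the module increments a global counter
fs.writeFileSync(file, 'globalThis.fooLoads = (globalThis.fooLoads ?? 0) + 1\n')

const base = pathToFileURL(file).href

const loadCount = (async () => {
  await import(base + '?query=1')
  await import(base + '?query=1') // same URL: served from the module cache
  await import(base + '?query=2') // different query: evaluated again
  return globalThis.fooLoads
})()

loadCount.then((n) => console.log(n)) // 2
```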

data:URL
Supports importing using MIME types:

text/javascript for ES modules
application/json for JSON
application/wasm for Wasm

 import 'data:text/javascript,console.log("hello!");';
import _ from 'data:application/json,"world!"' assert { type: 'json' };

data: URLs only resolve bare specifiers for built-in modules and absolute specifiers. Resolving relative specifiers does not work because data: is not a special scheme and has no concept of relative resolution.
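The following sketch illustrates both halves of that rule. The helper fromSource is hypothetical; it simply percent-encodes a source string into a data: URL. The exact error raised for the relative specifier may differ between Node.js versions, so the example only checks that the import is rejected.

```javascript
// Hypothetical helper: turn module source text into a data: URL
const fromSource = (source) =>
  'data:text/javascript,' + encodeURIComponent(source)

// Importing a built-in module from inside a data: URL module works
const builtinWorks = import(
  fromSource("import { sep } from 'node:path'; export default sep")
).then((ns) => ns.default)

// A relative specifier has nothing to be resolved against, so it fails
const relativeFails = import(fromSource("import './foo.mjs'"))
  .then(() => false, () => true)

builtinWorks.then((sep) => console.log(typeof sep)) // string
relativeFails.then((failed) => console.log(failed)) // true
```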

Import Assertion
This feature adds inline syntax to the module import statement to pass in more information next to the module specifier.

 import fooData from './foo.json' assert { type: 'json' };

const { default: barData } = await import('./bar.json', { assert: { type: 'json' } });

Currently only JSON modules are supported, and the assert { type: 'json' } syntax is mandatory. Newer Node.js releases use the with keyword (import attributes) in place of assert.

Importing Wasm Modules
Importing WebAssembly modules is supported under the --experimental-wasm-modules flag, allowing any .wasm file to be imported as a normal module while also supporting its own module imports.

 // index.mjs
import * as M from './module.wasm';
console.log(M)

Use the following command to execute:

 node --experimental-wasm-modules index.mjs

top-level await

The await keyword can be used at the top level in ESM.

 // a.mjs
export const five = await Promise.resolve(5)

// b.mjs
import { five } from './a.mjs'
console.log(five) // 5

asynchronous reference

The import statement resolves module dependencies statically, so it has two well-known limitations:

Module identifiers cannot be constructed at runtime;
Module import statements must be written at the top level of the file and cannot be nested in control flow statements.

However, in some situations these two restrictions are too strict. For example, lazy loading is a relatively common requirement:

When dealing with a large module, you only want to load it when you actually need one of its functions.

For this purpose, ESM provides an asynchronous import mechanism. It is available at runtime through the import() operator. Syntactically, it behaves like a function that receives a module identifier as a parameter and returns a Promise. Once the Promise resolves, the module object is available.
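A minimal sketch of lazy loading with import(). The compress function is a hypothetical example; node:zlib stands in for "a large module" and is only loaded the first time compression is requested. Note that the call sits inside a function body, where a static import statement would not be allowed.

```javascript
// Load node:zlib only when compression is actually needed
async function compress (text) {
  const { gzipSync } = await import('node:zlib')
  return gzipSync(Buffer.from(text))
}

const compressed = compress('hello')
compressed.then((buf) => console.log(Buffer.isBuffer(buf))) // true
```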

ESM loading process

Use an example of circular dependency to illustrate the ESM loading process:

 // index.js
import * as foo from './foo.js';
import * as bar from './bar.js';
console.log(foo);
console.log(bar);

// foo.js
import * as Bar from './bar.js'
export let loaded = false;
export const bar = Bar;
loaded = true;

//bar.js
import * as Foo from './foo.js';
export let loaded = false;
export const foo = Foo;
loaded = true

Running index.js logs both module objects. The loaded flags show that both foo and bar are logged as fully loaded modules. CommonJS is different: with the same circular dependency, one of the modules is always seen before it has finished loading.
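For comparison, here is a sketch of the CommonJS side of that claim. It writes two hypothetical CommonJS files, foo.js and bar.js, to a temporary directory; each records the loaded flag it observed on the other module at the moment it required it.

```javascript
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cjs-cycle-'))

fs.writeFileSync(path.join(dir, 'foo.js'), [
  'exports.loaded = false',
  "const bar = require('./bar.js')",
  'exports.sawBarLoaded = bar.loaded',
  'exports.loaded = true'
].join('\n'))

fs.writeFileSync(path.join(dir, 'bar.js'), [
  'exports.loaded = false',
  "const foo = require('./foo.js')",
  'exports.sawFooLoaded = foo.loaded',
  'exports.loaded = true'
].join('\n'))

const foo = require(path.join(dir, 'foo.js'))
const bar = require(path.join(dir, 'bar.js'))

// foo.js required bar.js first, so bar.js ran to completion...
console.log(foo.sawBarLoaded) // true
// ...but bar.js received a half-initialized foo.js from the cache
console.log(bar.sawFooLoaded) // false
```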

Let’s dive into the loading process to see why this result occurs.
The loading process can be divided into three stages:

The first stage: parsing
The second stage: declaration
The third stage: execution

Parsing stage:
The interpreter starts from the entry file (that is, index.js), analyzes the dependencies between modules, and represents them as a graph. This graph is also called the dependency graph.

At this stage, the interpreter only looks at import statements and loads the source code of the modules those statements refer to, exploring depth-first until it has the complete dependency graph. Taking the above example:
1. Starting from index.js, it finds the import * as foo from './foo.js' statement and goes to the foo.js file.
2. It continues parsing in foo.js and finds the import * as Bar from './bar.js' statement, so it goes to bar.js.
3. It continues parsing in bar.js and finds the import * as Foo from './foo.js' statement, which forms a circular dependency. However, since the interpreter is already processing the foo.js module, it does not enter it again and continues parsing the bar module.
4. After parsing the bar module, it finds no further import statements, so it returns to foo.js and continues parsing. No other import statement is found there, so it returns to index.js.
5. It finds import * as bar from './bar.js' in index.js, but since bar.js has already been parsed, it skips it and continues.

In the end, this depth-first traversal yields the complete dependency graph: index.js depends on foo.js and bar.js, and foo.js and bar.js depend on each other.
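The traversal can be sketched as a plain depth-first search. This is only a model of the idea, not the loader used by Node.js: the imports table is a hand-written stand-in for the import statements of the three example files, and visitOrder is a hypothetical helper.

```javascript
// Hand-written import table for the example files (an assumption)
const imports = {
  'index.js': ['foo.js', 'bar.js'],
  'foo.js': ['bar.js'],
  'bar.js': ['foo.js']
}

function visitOrder (entry) {
  const visited = new Set()
  const entered = [] // order in which modules are first reached
  const finished = [] // order in which modules are fully explored

  const visit = (id) => {
    if (visited.has(id)) return // already seen: do not enter again
    visited.add(id)
    entered.push(id)
    for (const dep of imports[id]) visit(dep)
    finished.push(id)
  }

  visit(entry)
  return { entered, finished }
}

const { entered, finished } = visitOrder('index.js')
console.log(entered) // [ 'index.js', 'foo.js', 'bar.js' ]
console.log(finished) // [ 'bar.js', 'foo.js', 'index.js' ]
```

The finished order is the bottom-up order of the graph: bar.js first, then foo.js, then index.js.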

Declaration phase:
The interpreter starts from the obtained dependency graph and declares each module in order from bottom to top. Specifically, every time a module is reached, all properties to be exported by the module are searched and the identifiers of the exported values are declared in memory. Please note that only declarations are made at this stage and no assignment operations are performed.
1. The interpreter starts from the bar.js module and declares the identifiers of loaded and foo.
2. Trace back up to the foo.js module and declare the loaded and bar identifiers.
3. We arrived at the index.js module, but this module has no export statement, so no identifier is declared.

After declaring all export identifiers, walk through the dependency graph again to connect the relationship between import and export.

It can be seen that a binding relationship similar to const is established between the module introduced by import and the value exported by export. The importing side can only read but not write. Moreover, the bar module read in index.js and the bar module read in foo.js are essentially the same instance.
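A small sketch of this read-only, live binding. The counter module is hypothetical and is supplied as a data: URL so the example fits in one file; Reflect.set is used because it reports a failed write as false instead of throwing.

```javascript
// Hypothetical ES module source, loaded through a data: URL
const source = [
  'export let count = 0',
  'export function increment () { count += 1 }'
].join('\n')

const result = import('data:text/javascript,' + encodeURIComponent(source))
  .then((counter) => {
    const before = counter.count
    counter.increment() // the exporting module updates its own variable
    const after = counter.count // the importer sees the new value
    const writable = Reflect.set(counter, 'count', 100) // importers cannot write
    return { before, after, writable, final: counter.count }
  })

result.then((r) => console.log(r))
// { before: 0, after: 1, writable: false, final: 1 }
```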

This is why both modules appear fully loaded in the output of this example.

This is fundamentally different from the approach used by the CommonJS system. When a module requires a CommonJS module, it receives that module's exports object, whose properties hold copies of the values at the time they were assigned. If the required module later changes its own internal variable, the importing side cannot see the new value.
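The CommonJS behavior can be shown with a short sketch. The counter.js file is a hypothetical module written to a temporary directory; its count property is a copy taken when the exports object was built.

```javascript
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cjs-copy-'))
const file = path.join(dir, 'counter.js')
fs.writeFileSync(file, [
  'let count = 0',
  'module.exports = {',
  '  count, // copies the current value (0) into the exports object',
  '  increment () { count += 1; return count }',
  '}'
].join('\n'))

const counter = require(file)
console.log(counter.increment()) // 1 (the internal variable changed)
console.log(counter.count) // 0 (the exported copy did not)
```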

Execution phase:
At this stage, the engine executes the modules' code. The dependency graph is again visited in bottom-up order and the visited files are executed one by one. Execution starts from bar.js, moves on to foo.js, and finally reaches index.js. During this process, the values of the identifiers in the export table are gradually filled in.

This process does not seem much different from CommonJS, but there are actually major differences. Since CommonJS is dynamic, it resolves the dependency graph while executing the related files. So whenever you see a require statement, you can be sure that when the program reaches it, all preceding code has already been executed. Therefore, require statements do not have to appear at the beginning of the file; they can appear anywhere, and module identifiers can be constructed from variables.

But ESM is different. In ESM, the three stages above are separate from each other. The dependency graph must be completely constructed before any code is executed. Therefore, the operations of importing and exporting modules must be static; they cannot wait until the code is executed.

The difference between ESM and CommonJS

In addition to the several differences mentioned earlier, there are some differences worth noting:

Mandatory file extension

When using the import keyword in ESM to resolve relative or absolute specifiers, the file extension must be provided and the directory index ('./path/index.js') must be fully specified. The CommonJS require function allows this extension to be omitted.

strict mode

ESM runs in strict mode by default, and this strict mode cannot be disabled. Therefore, you cannot use undeclared variables, nor can you use features that are only available in non-strict mode (such as with).

ESM does not support some references provided by CommonJS

CommonJS provides some module-level variables that cannot be used under ESM. If you try to use them, a ReferenceError is thrown. These include:

require
exports
module.exports
__filename
__dirname

Among them, __filename refers to the absolute path of the current module file, and __dirname is the absolute path of the folder where the file is located. These two variables are very helpful when constructing the relative path of the current file, so ESM provides some methods to implement the functions of the two variables.

In ESM, you can use the import.meta object to obtain a reference, which refers to the URL of the current file. Specifically, the file path of the current module is obtained through import.meta.url . The format of this path is similar to file:///path/to/current_module.js . Based on this path, the absolute path expressed by __filename and __dirname is constructed:

 import { fileURLToPath } from 'url'
import { dirname } from 'path'
const __filename = fileURLToPath(import.meta.url)
const __dirname = dirname(__filename)

You can also simulate the require() function of CommonJS:

 import { createRequire } from 'module'
const require = createRequire(import.meta.url)

What this points to

In the ESM global scope, this is undefined, but in the CommonJS module system, it is a reference to exports:

 //ESM
console.log(this) // undefined

// CommonJS
console.log(this === exports) // true

ESM loads CommonJS

As mentioned above, the CommonJS require() function can be simulated in ESM to load CommonJS modules. You can also use the standard import syntax to bring in CommonJS modules. The default import always works and gives you module.exports; named imports are only available when Node.js can detect the export names through static analysis of the CommonJS source:

import packageMain from 'commonjs-package' // Always works
import { method } from 'commonjs-package' // Works only if method can be statically detected

The CommonJS module's require always treats the files it references as CommonJS. Loading ES modules using require is not supported because ES modules have asynchronous execution. But you can use import() to load ES modules from CommonJS modules.
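A minimal sketch of the import() route from a CommonJS script. The greet.mjs module is a hypothetical ES module written to a temporary directory for the example.

```javascript
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')
const { pathToFileURL } = require('node:url')

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cjs-loads-esm-'))
const file = path.join(dir, 'greet.mjs')
fs.writeFileSync(file, "export const greet = (name) => 'Hello, ' + name\n")

// import() returns a Promise, so the result is only available asynchronously
const greeting = import(pathToFileURL(file).href)
  .then((ns) => ns.greet('CommonJS'))

greeting.then((text) => console.log(text)) // Hello, CommonJS
```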

Exporting dual modules

ESM has been available for seven years and Node.js now supports it stably, so when developing a component library we could support ESM only. However, to stay compatible with older projects, CommonJS support is also essential. There are two widely used methods for making a component library support exports for both module systems.

Using ES module wrappers

Write the package in CommonJS, or transpile ES module source code to CommonJS, and create an ES module wrapper file that defines the named exports. With conditional exports, import uses the ES module wrapper and require uses the CommonJS entry point. For example, in a package named example:

 // package.json
{
"type": "module",
"exports": {
"import": "./wrapper.mjs",
"require": "./index.cjs"
}
}

Use the explicit extensions .cjs and .mjs, because files with a plain .js extension are either treated as CommonJS by default or, with "type": "module", treated as ES modules.

 // ./index.cjs
exports.name = 'name';

// ./wrapper.mjs
import cjsModule from './index.cjs'
export const name = cjsModule.name;

In this example:

// Use ESM to introduce
import { name } from 'example'

// Use CommonJS to introduce
const { name } = require('example')

The name introduced in both ways is the same singleton.

Isolating state

The package.json file can directly define separate CommonJS and ES module entry points:

 // package.json
{
"type": "module",
"exports": {
"import": "./index.mjs",
"require": "./index.cjs"
}
}

This can be done if the CommonJS and ESM versions of the package are equivalent (for example, because one is the transpiled output of the other) and the package's state management is carefully isolated (or the package is stateless).

State is a problem because both the CommonJS and ESM versions of the package may be used in the same application; for example, the user's application code can import the ESM version while a dependency requires the CommonJS version. If this happens, two copies of the package are loaded into memory, so two separate states exist. This can lead to bugs that are difficult to track down.

In addition to writing stateless packages (for example, if JavaScript's Math were a package, it would be stateless because all its methods are static), there are ways to isolate state so that it is shared between the potentially loaded CommonJS and ESM instances of the package:

If possible, include all state in the instantiated object. For example, JavaScript's Date needs to be instantiated to contain state; if it is a package, it will be used like this:

 import Date from 'date';
const someDate = new Date();
// someDate contains state; Date does not

The new keyword is not required; package functions can return new objects or modify passed objects to maintain state outside the package.

Isolate state in one or more CommonJS files shared between the CommonJS and ESM versions of a package. For example, the entry points for CommonJS and ESM are index.cjs and index.mjs respectively:

 // index.cjs
const state = require('./state.cjs')
module.exports.state = state;

// index.mjs
import state from './state.cjs'
export {
state
}

Even if example is used in an application via require and import, every reference to example contains the same state; and changes to the state by either module system will apply to both.
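This can be checked with a small sketch. It writes hypothetical versions of state.cjs, index.cjs, and index.mjs to a temporary directory (the count field is an assumption), then loads the package through both entry points and compares the state objects.

```javascript
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')
const { pathToFileURL } = require('node:url')

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'dual-state-'))
fs.writeFileSync(path.join(dir, 'state.cjs'), 'module.exports = { count: 0 }\n')
fs.writeFileSync(path.join(dir, 'index.cjs'),
  "module.exports.state = require('./state.cjs')\n")
fs.writeFileSync(path.join(dir, 'index.mjs'),
  "import state from './state.cjs'\nexport { state }\n")

// Load the CommonJS entry point with require...
const viaRequire = require(path.join(dir, 'index.cjs'))

// ...and the ESM entry point with import()
const shared = import(pathToFileURL(path.join(dir, 'index.mjs')).href)
  .then((viaImport) => {
    viaImport.state.count += 1 // change made through the ESM entry point
    return {
      sameObject: viaImport.state === viaRequire.state,
      countSeenByRequire: viaRequire.state.count
    }
  })

shared.then((r) => console.log(r))
// { sameObject: true, countSeenByRequire: 1 }
```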


This article cites the following information:

node.js official documentation
Node.js Design Patterns