Learn about the file module and core module in Node in one article

Author: Eve Cole | Update Time: 2022-07-23 10:43:09

This article will take you through the file module and core module in Node, talk about the search for file modules, the compilation and execution of file modules, and the compilation and execution of JavaScript and C/C++ core modules. I hope it will be helpful to you!

When we develop with Node.js, we often use require to import two types of modules. One is the modules we write ourselves or third-party modules installed with npm; Node calls these file modules. The other is the modules built into Node and provided for us to use, such as os and fs; these are called core modules.

Note that file modules and core modules differ not only in whether they are built into Node, but also in how they are located, compiled and executed. File modules can also be subdivided into ordinary file modules, custom modules, C/C++ extension modules and so on, and these types differ in many details of file location and compilation.

This article will address these issues and clarify the concepts of file modules and core modules as well as their specific processes and details that need to be paid attention to in file location, compilation or execution. I hope it will be helpful to you.

Let’s start with the file module.

File module

What is a file module?

In Node, modules that are required using module identifiers starting with ., .. or / (that is, using relative or absolute paths) are treated as file modules. In addition, there is a special type of module: although its identifier contains neither a relative nor an absolute path, and it is not a core module, it points to a package. When Node locates this type of module, it uses the module path to search for it level by level. This type of module is called a custom module.

Therefore, file modules include two types, one is ordinary file modules with paths, and the other is custom modules without paths.

The file module is dynamically loaded at runtime, which requires a complete file location, compilation and execution process, and is slower than the core module.

For file positioning, Node handles these two types of file modules differently. Let’s take a closer look at the search processes for these two types of file modules.

Searching for ordinary file modules

For ordinary file modules, since the path they carry is very clear, the search will not take long, so the search efficiency is higher than the custom module introduced below. However, there are still two points to note.

First, under normal circumstances, when using require to introduce a file module, the file extension is generally not specified, for example:

const math = require("./math");

Since the extension is not specified, Node cannot determine the final file directly. In this case, Node appends the extensions .js, .json and .node in that order and tries them one by one. This process is called file extension analysis.
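
As a rough illustration, file extension analysis can be sketched as a loop over the candidate extensions. The helper name tryExtensions and the temporary directory are assumptions made for this example. Node's real resolver is internal, and it also tries the exact name before appending any extension.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Extensions are tried in this order, as described above.
const EXTENSIONS = [".js", ".json", ".node"];

// Hypothetical helper: returns the first existing candidate file, or null.
function tryExtensions(basePath) {
  for (const ext of EXTENSIONS) {
    const candidate = basePath + ext;
    if (fs.existsSync(candidate) && fs.statSync(candidate).isFile()) {
      return candidate;
    }
  }
  return null;
}

// Demo: a temporary directory that contains both math.json and math.js.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "ext-demo-"));
fs.writeFileSync(path.join(dir, "math.json"), "{}");
fs.writeFileSync(path.join(dir, "math.js"), "module.exports = {};");

const resolved = tryExtensions(path.join(dir, "math"));
console.log(path.basename(resolved)); // math.js, because .js is tried first
```

Because each attempt is a file-system check, writing the extension explicitly for .json and .node files saves a few lookups.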

It should also be noted that in actual development, in addition to requiring a specific file, we usually also specify a directory, such as:

const axios = require("../network");

In this case, Node will first perform file extension analysis. If no corresponding file is found but a directory is, Node treats the directory as a package.

Specifically, Node returns the file pointed to by the main field of package.json in that directory as the search result. If the file pointed to by main is wrong, or package.json does not exist at all, Node uses index as the default file name and then performs extension analysis with .js, .json and .node, searching for the target file one by one. If nothing is found, it throws an error.
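
The package lookup just described can be sketched roughly as follows. The helper resolveDirectory and the directory names in the demo are hypothetical. This is a simplified approximation and not Node's actual resolver. For example, it also tries index.json, and it ignores the ESM-related fields mentioned below.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

const isFile = (p) => fs.existsSync(p) && fs.statSync(p).isFile();

// Hypothetical helper: resolve a directory as a package.
function resolveDirectory(dir) {
  const pkgPath = path.join(dir, "package.json");
  if (isFile(pkgPath)) {
    try {
      const { main } = JSON.parse(fs.readFileSync(pkgPath, "utf8"));
      if (typeof main === "string") {
        const target = path.resolve(dir, main);
        if (isFile(target)) return target;
      }
    } catch {
      // Invalid package.json: fall through to the index lookup.
    }
  }
  // Fall back to index plus extension analysis.
  for (const ext of [".js", ".json", ".node"]) {
    const candidate = path.join(dir, "index" + ext);
    if (isFile(candidate)) return candidate;
  }
  throw new Error(`Cannot find module '${dir}'`);
}

// Demo: one package with a main field, one directory with only index.js.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "pkg-demo-"));
const withMain = path.join(root, "network");
fs.mkdirSync(path.join(withMain, "lib"), { recursive: true });
fs.writeFileSync(
  path.join(withMain, "package.json"),
  JSON.stringify({ main: "lib/entry.js" })
);
fs.writeFileSync(path.join(withMain, "lib", "entry.js"), "");
const withIndex = path.join(root, "utils");
fs.mkdirSync(withIndex);
fs.writeFileSync(path.join(withIndex, "index.js"), "");

const mainResult = resolveDirectory(withMain);
const indexResult = resolveDirectory(withIndex);
console.log(path.basename(mainResult));  // entry.js
console.log(path.basename(indexResult)); // index.js
```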

(Of course, since Node has two types of module systems, CJS and ESM, in addition to searching for the main field, Node will also use other methods. Since it is outside the scope of this article, I will not go into details.)

Searching for custom modules

As just mentioned, when Node searches for custom modules, it uses the module path. So what is the module path?

Friends who are familiar with module parsing should know that the module path is an array composed of paths. The specific value can be seen in the following example:

// example.js
console.log(module.paths);

print results:

As you can see, the module in Node has a module path array, which is stored in module.paths and is used to specify how Node finds the custom module referenced by the current module.

Specifically, Node traverses the module path array and tries each path in turn, checking whether the specified custom module exists in the node_modules directory at that path. If it does not, Node moves up one directory level at a time until it reaches the node_modules directory in the root directory. The search stops as soon as the target module is found; if it is never found, an error is thrown.

It can be seen that recursively searching node_modules directory step by step is Node's strategy for finding custom modules, and the module path is the specific implementation of this strategy.
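
To make the strategy concrete, here is a minimal sketch of how such a path array could be built by walking up from a starting directory. The function nodeModulePaths and the example directory are assumptions for illustration. Node computes module.paths internally, and this version only approximates it.

```javascript
const path = require("path");

// Hypothetical helper: walk from the starting directory up to the
// file-system root, appending "node_modules" at each level.
function nodeModulePaths(fromDir) {
  const paths = [];
  let current = path.resolve(fromDir);
  while (true) {
    // Skip directories that are themselves named node_modules.
    if (path.basename(current) !== "node_modules") {
      paths.push(path.join(current, "node_modules"));
    }
    const parent = path.dirname(current);
    if (parent === current) break; // reached the root
    current = parent;
  }
  return paths;
}

// On a POSIX system this lists five directories, from
// /home/user/project/src/node_modules up to /node_modules.
console.log(nodeModulePaths("/home/user/project/src"));
```

Each extra directory level adds one more entry to the array, which is why deeply nested files can take longer to resolve a custom module.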

From this we can also conclude that, when searching for custom modules, the deeper the current file sits in the directory tree, the more time-consuming the search will be. Therefore, compared with core modules and ordinary file modules, custom modules are the slowest to load.

Of course, what is found based on the module path is only a directory, not a specific file. After finding the directory, Node will also search according to the package processing process described above. The specific process will not be described again.

The above is the file positioning process and details that need to be paid attention to for ordinary file modules and custom modules. Next, let’s look at how the two types of modules are compiled and executed.

Compilation and execution of file modules

When the file pointed to by require has been located, the module identifier usually does not carry an extension. From the file extension analysis described above, we know that Node supports compiling and executing files with three kinds of extensions:

JavaScript file: read synchronously through the fs module, then compiled and executed. Files with any extension other than .node and .json are loaded as .js files.
.node file: an extension file compiled from source written in C/C++. Node loads it through the process.dlopen() method.
.json file: read synchronously through the fs module, then parsed with JSON.parse(), and the result is returned.
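
The choice between these three handling strategies can be pictured as a lookup keyed by extension, with .js as the fallback. The table LOADERS and the function pickLoader below are made up for this sketch; they only return a description of the strategy and do not load anything.

```javascript
const path = require("path");

// Hypothetical table: one strategy description per supported extension.
const LOADERS = {
  ".js": "compile and execute as JavaScript",
  ".json": "parse with JSON.parse()",
  ".node": "load with process.dlopen()",
};

// Any extension that is not .json or .node is handled like .js.
function pickLoader(filename) {
  const ext = path.extname(filename);
  return LOADERS[ext] || LOADERS[".js"];
}

console.log(pickLoader("config.json")); // parse with JSON.parse()
console.log(pickLoader("addon.node"));  // load with process.dlopen()
console.log(pickLoader("notes.txt"));   // compile and execute as JavaScript
```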

Before compiling and executing the file module, Node will wrap it using a module wrapper as shown below:

(function(exports, require, module, __filename, __dirname) {
    // Module code
});

As you can see, through the module wrapper Node places the module's code inside a function scope and isolates it from other scopes, which avoids problems such as variable naming conflicts and pollution of the global scope. At the same time, passing in the exports and require parameters gives the module the import and export capabilities it needs. This is how Node implements modules.

After understanding the module wrapper, let's first look at the compilation and execution process of the json file.

Compilation and execution of json files

Compiling and executing JSON files is the simplest case. After synchronously reading the contents of the JSON file through the fs module, Node uses JSON.parse() to parse it into a JavaScript object, assigns that object to the module's exports, and finally returns it to the module that required it. The process is simple and direct.
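
These steps translate almost directly into code. The sketch below is an approximation under stated assumptions: loadJson is a hypothetical helper, and the module object is a plain stand-in rather than Node's real Module instance.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: read synchronously, parse, assign to exports.
function loadJson(mod, filename) {
  const content = fs.readFileSync(filename, "utf8");
  try {
    mod.exports = JSON.parse(content);
  } catch (err) {
    // Add the file name so a syntax error is easier to trace.
    err.message = `${filename}: ${err.message}`;
    throw err;
  }
  return mod.exports;
}

// Demo with a temporary file and a stand-in module object.
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), "json-demo-"));
const file = path.join(tmpDir, "config.json");
fs.writeFileSync(file, '{"port": 8080}');

const fakeModule = { exports: {} };
console.log(loadJson(fakeModule, file).port); // 8080
```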

Compilation and execution of JavaScript files

After wrapping the JavaScript file with the module wrapper, Node executes the wrapped code through the runInThisContext() method of the vm module (similar to eval), which returns a function object.

Then, the exports, require, module and other parameters of the JavaScript module are passed to this function for execution. After execution, the exports attribute of the module is returned to the caller. This is the compilation and execution process of the JavaScript file.

Compilation and Execution of C/C++ Extension Modules

Before explaining the compilation and execution of C/C++ extension modules, let us first introduce what a C/C++ extension module is.

C/C++ extension modules are one category of file module. As the name suggests, they are written in C/C++. Unlike JavaScript modules, they do not need to be compiled after being loaded; once executed they can be called from outside, so they load slightly faster than JavaScript modules. Compared with file modules written in JS, C/C++ extension modules have a clear performance advantage. For functionality that Node's core modules do not cover, or that has specific performance requirements, users can write C/C++ extension modules to achieve their goals.

So what is a .node file, and what does it have to do with C/C++ extension modules?

In fact, after the written C/C++ extension module is compiled, a .node file is generated. In other words, as users of the module, we do not directly introduce the source code of the C/C++ extension module, but the compiled binary file of the C/C++ extension module. Therefore, the .node file does not need to be compiled. After Node finds the .node file, it only needs to load and execute the file. During execution, the module's exports object is populated and returned to the caller.

It is worth noting that the .node files generated by compiling C/C++ extension modules take different forms on different platforms. Under *nix systems, C/C++ extension modules are compiled into dynamically linked shared object files by compilers such as g++/gcc, with the extension .so. Under Windows they are compiled into dynamic link library files by the Visual C++ compiler, with the extension .dll. The extension we actually use, however, is .node. The .node extension is only there to look more natural: the file is really a .dll file under Windows and a .so file under *nix.

After Node finds the .node file to require, it calls the process.dlopen() method to load and execute the file. Since .node files take different forms on different platforms, dlopen() has different implementations under Windows and *nix in order to work across platforms, and these are then encapsulated by the libuv compatibility layer. The following figure shows the compilation and loading process of C/C++ extension modules under different platforms:

Core module

Core modules are compiled into the binary executable when the Node source code is built. When the Node process starts, some core modules are loaded directly into memory. Therefore, when these core modules are required, the file location and compilation steps can be skipped, and during path analysis they are checked before file modules. As a result, core modules load the fastest.
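
One way to observe this from user code is the builtinModules list exported by the module core module, together with require.resolve(). This is only a small check of which identifiers Node treats as core modules, not a view into how they are stored in memory.

```javascript
const { builtinModules } = require("module");

// Core module identifiers are known to Node ahead of time.
console.log(builtinModules.includes("fs")); // true
console.log(builtinModules.includes("os")); // true

// For a core module, resolution returns the bare name, not a file path.
console.log(require.resolve("fs")); // fs
```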

The core module is actually divided into two parts written in C/C++ and JavaScript. The C/C++ files are stored in the src directory of the Node project, and the JavaScript files are stored in the lib directory. Obviously, the compilation and execution processes of these two parts of modules are different.

Compilation and execution of JavaScript core modules

As for the compilation of JavaScript core modules: when the Node source code is built, Node uses the js2c.py tool that comes with V8 to convert all built-in JavaScript code, including the JavaScript core modules, into C++ arrays. The JavaScript code is stored in the node namespace as strings. When the Node process starts, this JavaScript code is loaded directly into memory.

When a JavaScript core module is required, Node calls process.binding() to find it in memory through module identifier analysis and retrieve it. Once retrieved, the JavaScript core module is also wrapped by the module wrapper and then executed; its exports object is exported and returned to the caller.

Compilation and execution of C/C++ core modules

Among the core modules, some are written entirely in C/C++, while others have their core part implemented in C/C++ and the rest wrapped or exported by JavaScript to meet performance requirements. Modules such as buffer, fs and os are partly written in C/C++. This pattern, in which the C++ module implements the core on the inside and the JavaScript module implements the encapsulation on the outside, is a common way for Node to improve performance.

The parts of the core modules written in pure C/C++ are called built-in modules, such as node_fs and node_os. They are usually not called directly by users but are depended on directly by JavaScript core modules. Therefore, when a core module is introduced, there is the following reference chain: file module --> JavaScript core module --> built-in module.

So how does the JavaScript core module load the built-in module?

Remember the process.binding() method? Node retrieves JavaScript core modules from memory by calling this method. The same method is also used by JavaScript core modules to help load built-in modules.

In the implementation of this method, loading a built-in module first creates an empty exports object, then calls the get_builtin_module() method to retrieve the built-in module object, fills the exports object by executing register_func(), and finally returns it to the caller to complete the export. This is the loading and execution process of a built-in module.

Based on the above analysis, and taking the os module as an example, the general process of introducing a core module along this reference chain is as follows:

In summary, requiring the os module involves requiring a JavaScript file module, loading and executing a JavaScript core module, and loading and executing a built-in module. The process is cumbersome and complicated, but because the complex underlying implementation and details are hidden from the caller, the whole module can be imported with a simple require(), which is very friendly.

Summary

This article introduces the basic concepts of file modules and core modules as well as their specific processes and details that need to be paid attention to in file location, compilation or execution. Specifically:

File modules can be divided into ordinary file modules and custom modules according to how they are located. Ordinary file modules can be located directly because their paths are explicit, sometimes with file extension analysis and directory analysis; custom modules are searched for along the module path, and once the search succeeds, the final file is located through directory analysis.
File modules can be divided into JavaScript modules and C/C++ extension modules according to how they are compiled and executed. After a JavaScript module is wrapped by the module wrapper, it is executed through the runInThisContext() method of the vm module; since a C/C++ extension module is already a compiled binary file, it can be executed directly and its exported object is returned to the caller.
Core modules are divided into JavaScript core modules and built-in modules. JavaScript core modules are loaded into memory when the Node process starts and can be retrieved through the process.binding() method and then executed; the compilation and execution of built-in modules goes through process.binding(), get_builtin_module() and register_func().

In addition, we found the reference chain Node follows when introducing core modules, that is, file module --> JavaScript core module --> built-in module. We also learned the way such modules are written: the C++ module implements the core on the inside, and the JavaScript module implements the encapsulation on the outside.