Let’s start with a simple question:
<script type="text/javascript">
alert(i); // ?
var i = 1;
</script>
The output is undefined. This phenomenon is called "pre-parsing" (commonly known as hoisting): the JavaScript engine processes var declarations and function definitions first, and the code is not executed until pre-parsing is complete. If a document stream contains multiple script code segments (JS code separated by script tags, or imported JS files), the running order is:
step1. Read the first code segment
step2. Do syntax analysis. If there is an error (such as mismatched brackets), report a syntax error and jump to step5.
step3. Do "pre-parsing" of var declarations and function definitions (this never reports an error, because only syntactically correct declarations are processed)
step4. Execute the code segment, and report an error if one occurs (for example, a variable is not defined)
step5. If there is another code segment, read the next code segment and repeat step2.
step6. End.
The analysis above already explains many problems, but I always feel that something is missing. For example, what exactly is the "pre-parsing" in step 3? And for step 4, look at the following example:
<script type="text/javascript">
alert(i); // error: i is not defined.
i = 1;
</script>
Why does the first statement cause an error? In JavaScript, isn’t it allowed to use a variable without declaring it?
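The running order in step1 through step6 can be modelled as a small loop. This is only a sketch under stated assumptions: `runSegments` is a hypothetical helper, and indirect `eval` stands in for a script segment because it also parses the whole source before running any of it in the global scope. A real browser reports errors to the console instead of collecting them.

```javascript
// Hypothetical model of the step1..step6 loop (not how a browser is implemented).
function runSegments(segments) {
  const outcomes = [];
  for (const source of segments) {   // step1 / step5: read each segment in turn
    try {
      (0, eval)(source);             // step2..step4: parse, pre-parse, then execute
      outcomes.push("ok");
    } catch (err) {
      outcomes.push(err.name);       // an error ends this segment only
    }
  }
  return outcomes;                   // step6: end
}

const segmentOutcomes = runSegments([
  "globalThis.trace = []; trace.push(typeof hoisted); var hoisted = 1;",
  "trace.push('never runs'); if (a { i = 2; }", // syntax error: nothing in it runs
  "trace.push(missingName);",                   // runtime error in step4
  "trace.push(hoisted);",                       // later segments still run
]);

console.log(segmentOutcomes);  // ["ok", "SyntaxError", "ReferenceError", "ok"]
console.log(globalThis.trace); // ["undefined", 1]
```

The second segment never pushes anything to `trace`, which matches the jump from step2 straight to step5.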
The compilation process
Time flies. I opened "Principles of Compilation", which had been sitting by the bookshelf for what felt like a lifetime, and found this note in a familiar yet unfamiliar margin:
For traditional compiled languages, the compilation steps are: lexical analysis, syntax analysis, semantic checking, code optimization, and bytecode generation.
But for interpreted languages, after the syntax tree is obtained through lexical analysis and syntax analysis, interpretation and execution can begin.
Simply put, lexical analysis is to convert a character stream (char stream) into a token stream (token stream), such as converting c = a - b; to:
NAME "c"
EQUALS
NAME "a"
MINUS
NAME "b"
SEMICOLON
The above are just examples. For more information, please see Lexical Analysis.
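To make the conversion from a character stream to a token stream concrete, here is a minimal tokenizer sketch. The token names follow the listing above. The function name `tokenize` and the tiny token set (names, `=`, `-`, `;`) are assumptions for illustration, far smaller than what a real JavaScript lexer handles.

```javascript
// Minimal tokenizer sketch: recognizes only names, "=", "-" and ";".
function tokenize(source) {
  // Sticky (y) regex: each match must start exactly at lastIndex.
  const pattern = /\s*(?:([A-Za-z_$][\w$]*)|(=)|(-)|(;))\s*/y;
  const tokens = [];
  let position = 0;
  while (position < source.length) {
    pattern.lastIndex = position;
    const match = pattern.exec(source);
    if (match === null) {
      throw new SyntaxError("Unexpected character near offset " + position);
    }
    if (match[1] !== undefined) tokens.push('NAME "' + match[1] + '"');
    else if (match[2] !== undefined) tokens.push("EQUALS");
    else if (match[3] !== undefined) tokens.push("MINUS");
    else tokens.push("SEMICOLON");
    position = pattern.lastIndex; // continue after the consumed token
  }
  return tokens;
}

console.log(tokenize("c = a - b;"));
// ['NAME "c"', "EQUALS", 'NAME "a"', "MINUS", 'NAME "b"', "SEMICOLON"]
```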
Chapter 2 of "JavaScript: The Definitive Guide" covers lexical structure, which is also described in ECMA-262. Lexical structure is the basis of a language and is easy to master. The implementation of lexical analysis is a separate research area and will not be explored here.
We can use an analogy with natural language. Lexical analysis is a one-to-one literal translation: if a paragraph of English is translated into Chinese word by word, what we get is a pile of tokens that is hard to understand. Further translation requires syntax analysis. The following figure is a syntax tree of a conditional statement:
If the syntax tree cannot be constructed, as with if(a { i = 2; }, a syntax error is reported and parsing of the entire code block ends. This is step 2 at the beginning of this article.
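As a rough illustration, the tree for a conditional statement such as `if (a) { i = 2; }` can be written as a plain object. The node names below follow the ESTree convention used by many JavaScript tools, which is an assumption made for this sketch. Engines use their own internal representations. The hypothetical helper `canBuildTree` uses `new Function`, which compiles source without running it, to show that the malformed statement is rejected during syntax analysis.

```javascript
// Sketch of a syntax tree for: if (a) { i = 2; }
const conditionalTree = {
  type: "IfStatement",
  test: { type: "Identifier", name: "a" },
  consequent: {
    type: "BlockStatement",
    body: [{
      type: "ExpressionStatement",
      expression: {
        type: "AssignmentExpression",
        operator: "=",
        left: { type: "Identifier", name: "i" },
        right: { type: "Literal", value: 2 },
      },
    }],
  },
  alternate: null, // no else branch
};

// Hypothetical helper: true if the engine can parse the source.
function canBuildTree(source) {
  try {
    new Function(source); // compiled only, never called
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

console.log(canBuildTree("if (a) { i = 2; }")); // true
console.log(canBuildTree("if(a { i = 2; }"));   // false
```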
After the syntax tree has been constructed through syntax analysis, the translated sentence may still be ambiguous, so further semantic checking is required. For traditional strongly typed languages, the main part of semantic checking is type checking, such as whether the actual and formal parameter types of a function match. For weakly typed languages, this step may not exist (I have limited energy and have not had time to look at JS engine implementations, so I am not sure whether a JS engine has a semantic checking step).
From the analysis above, a JavaScript engine must perform lexical analysis and syntax analysis, and may then perform steps such as semantic checking and code optimization. Once these compilation steps are complete (every language has a compilation process; interpreted languages simply are not compiled into binary code), the code starts executing.
The above compilation process still cannot explain the "pre-parsing" at the beginning of the article. We have to carefully explore the execution process of the JavaScript code.
Part 2 of Zhou Aimin’s "The Essence of the JavaScript Language and Programming Practice" analyzes this very carefully. Here are some of my insights:
Through compilation, the JavaScript code has been translated into a syntax tree, and it is then executed immediately according to that tree.
Going further into the execution process requires understanding the scope mechanism of JavaScript. JavaScript uses lexical scope. In layman’s terms, the scope of a JavaScript variable is determined when it is defined rather than when it is executed. That is to say, lexical scope depends on the source code, and the compiler can determine it through static analysis, so lexical scope is also called static scope. However, the semantics of with and eval cannot be implemented through static techniques alone, so strictly speaking we can only say that the scope mechanism of JS is very close to lexical scope.
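A short sketch of both points follows. All function and variable names are made up for illustration. The first function shows that a name is resolved where the function is defined, not where it is called. The second shows why eval defeats purely static analysis: in non-strict code, whether `injected` exists depends on a string only known at run time.

```javascript
function lexicalScopeDemo() {
  var label = "outer";
  function readLabel() {
    return label;        // bound to the label visible where readLabel is defined
  }
  function callWithShadow() {
    var label = "inner"; // a different variable; readLabel never sees it
    return readLabel();
  }
  return callWithShadow();
}
console.log(lexicalScopeDemo()); // "outer"

function evalScopeDemo(source) {
  eval(source);          // in non-strict code this may declare new local variables
  return typeof injected;
}
console.log(evalScopeDemo("var injected = 1;")); // "number" in non-strict code
console.log(evalScopeDemo(""));                  // "undefined"
```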
When the JS engine executes each function instance, it creates an execution context. The execution context contains a call object. The call object is a scriptObject structure used to save syntax-analysis structures such as the internal variable table varDecls, the nested function table funDecls, and the parent reference list upvalue (note: information such as varDecls and funDecls is obtained during the syntax analysis stage and saved in the syntax tree; when the function instance is executed, this information is copied from the syntax tree to the scriptObject). The scriptObject is a static system associated with the function, and its life cycle matches that of the function instance.
Lexical scope is the scope mechanism of JS, and you also need to understand how it is implemented: the scope chain. The scope chain is a name lookup mechanism. It first searches the scriptObject of the current execution context. If the name is not found there, it follows the upvalue to the parent scriptObject, and so on up to the global object.
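The lookup described above can be modelled with ordinary objects. This is a toy model, not engine code: the names `scriptObject` and `upvalue` come from the text, while `createScriptObject`, `lookup`, and the use of a `Map` as the variable table are assumptions for illustration.

```javascript
// Toy model of a scriptObject: a variable table plus a link to the parent.
function createScriptObject(varDecls, upvalue) {
  const table = new Map();
  for (const name of varDecls) table.set(name, undefined); // declared, no value yet
  return { table, upvalue };
}

// Walk the scope chain from the current scriptObject up to the global one.
function lookup(scriptObject, name) {
  for (let scope = scriptObject; scope !== null; scope = scope.upvalue) {
    if (scope.table.has(name)) return scope.table.get(name);
  }
  throw new ReferenceError(name + " is not defined");
}

const globalScope = createScriptObject(["i"], null);
globalScope.table.set("i", 1);
const outerCall = createScriptObject(["j"], globalScope);
const innerCall = createScriptObject(["k"], outerCall);

console.log(lookup(innerCall, "i")); // 1, found two levels up
console.log(lookup(innerCall, "k")); // undefined, declared but not yet assigned
```

Looking up a name that no table declares, such as `lookup(innerCall, "missing")`, throws a ReferenceError in this model.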
When a function instance is executed, a closure is created or associated with it. scriptObject is used to statically save variable tables related to functions, while closure dynamically saves these variable tables and their running values during execution. The life cycle of a closure may be longer than that of a function instance. The function instance will be automatically destroyed after the active reference becomes empty, and the closure will be recycled by the JS engine after the data reference becomes empty (in some cases, it will not be automatically recycled, resulting in a memory leak).
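A small example of a closure outliving the call that created it is sketched below. `makeCounter` is a hypothetical name. How and when an engine reclaims the closure’s data is implementation-specific and is not shown here.

```javascript
function makeCounter() {
  var count = 0;       // lives in the closure created for this call
  return function () {
    count += 1;        // still reachable after makeCounter has returned
    return count;
  };
}

const counterA = makeCounter();
const counterB = makeCounter(); // a separate call gets a separate closure

console.log(counterA()); // 1
console.log(counterA()); // 2
console.log(counterB()); // 1
```

As long as `counterA` is referenced, its `count` stays alive. Once nothing references the returned function, the engine is free to reclaim the data.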
Don’t be intimidated by the pile of terms above. Once you understand the concepts of execution context, call object, closure, lexical scope, and scope chain, many phenomena in the JS language can be explained easily.
Summary
At this point, the questions at the beginning of the article can be explained very clearly:
The so-called "pre-parsing" in step 3 is actually completed in the syntax analysis stage of step 2 and stored in the syntax tree. When a function instance is executed, varDecls and funDecls are copied from the syntax tree to the scriptObject of the execution context.
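The visible effect is that every var and function declared anywhere in a function body already exists when the first statement runs. A minimal sketch, with made-up names:

```javascript
function hoistingDemo() {
  var seen = [];
  seen.push(typeof laterVar);  // "undefined": declared already, not yet assigned
  seen.push(typeof laterFunc); // "function": function declarations arrive with their body
  var laterVar = 1;
  function laterFunc() {}
  seen.push(laterVar);         // 1: the assignment has now run
  return seen;
}

console.log(hoistingDemo()); // ["undefined", "function", 1]
```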
In step 4, an undefined variable is one that cannot be found in the variable table of the scriptObject. The JS engine searches upward along the upvalue of the scriptObject. If the name is not found anywhere, a write operation such as i = 1; ends up equivalent to window.i = 1;, which adds a new property to the window object. For a read operation, if the name still cannot be found after tracing back to the scriptObject of the global execution context, a runtime error occurs.
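The difference between reading and writing an undeclared name can be observed directly. In this sketch the name `undeclaredDemoName` is made up, `globalThis` plays the role of window outside a browser, and the functions are built with `new Function` so that they run as non-strict code (in strict mode the write would throw as well).

```javascript
// Both functions are non-strict and refer to a name declared nowhere.
const readUndeclared = new Function("return undeclaredDemoName;");
const writeUndeclared = new Function("undeclaredDemoName = 1;");

let readErrorName = null;
try {
  readUndeclared();            // read: the lookup fails all the way up the chain
} catch (err) {
  readErrorName = err.name;
}
console.log(readErrorName);    // "ReferenceError"

writeUndeclared();             // write: creates a property on the global object
console.log(globalThis.undeclaredDemoName); // 1
console.log(readUndeclared()); // 1, the same read now succeeds
```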
With that understood, the fog lifts and the sky clears.
Finally, I leave you with a question:
<script type="text/javascript">
var arg = 1;
function foo(arg) {
    alert(arg);
    var arg = 2;
}
foo(3);
</script>
What is the output of alert?