A Rough Understanding of How MicroPython Works (Part 2)

Published 2021-05-25 18:23:16

[tocm]

# 3. Parsing
Logically, parsing a MicroPython statement takes two steps:  
1. Tokenization: the lexer splits the MicroPython statement into tokens.
2. The parser builds a parse tree from the tokens.

The parsing process is implemented by the `mp_parse` function (`py/parse.c`).

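## 3.1 Tokenization
Tokenization is done by the lexer, which reads tokens from the MicroPython source one at a time and classifies them. Tokens generally fall into the following categories:  
- Identifiers: user-defined names such as function and variable names, as well as keywords defined by the language itself, such as `import` and `class`
- Operators: for example `+` and `*`
- Delimiters: for example `[` and `)`
- Literals: for example `2` and `"hello"`
- NEWLINE: ends a logical line, which may span one or more physical lines
- Indentation: INDENT and DEDENT

The token kinds defined by MicroPython are listed in `py/lexer.h`; the listing below is abridged for brevity.  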
```
/* py/lexer.h */

typedef enum _mp_token_kind_t {
    MP_TOKEN_END,                   // EOF

    MP_TOKEN_INVALID,               // INVALID
    MP_TOKEN_DEDENT_MISMATCH,
    MP_TOKEN_LONELY_STRING_OPEN,

    MP_TOKEN_NEWLINE,
    MP_TOKEN_INDENT,
    MP_TOKEN_DEDENT,

    MP_TOKEN_NAME,
    MP_TOKEN_INTEGER,
    MP_TOKEN_FLOAT_OR_IMAG,
    MP_TOKEN_STRING,
    MP_TOKEN_BYTES,

    MP_TOKEN_ELLIPSIS,

    MP_TOKEN_KW_FALSE,
    MP_TOKEN_KW_NONE,
    MP_TOKEN_KW_TRUE,
    ...
    MP_TOKEN_KW_WHILE,
    MP_TOKEN_KW_WITH,
    MP_TOKEN_KW_YIELD,

    MP_TOKEN_OP_ASSIGN,
    MP_TOKEN_OP_TILDE,

    // Order of these 6 matches corresponding mp_binary_op_t operator
    MP_TOKEN_OP_LESS,
    MP_TOKEN_OP_MORE,
    ...

    // Order of these 13 matches corresponding mp_binary_op_t operator
    MP_TOKEN_OP_PIPE,
    MP_TOKEN_OP_CARET,
    ...

    // Order of these 13 matches corresponding mp_binary_op_t operator
    MP_TOKEN_DEL_PIPE_EQUAL,
    MP_TOKEN_DEL_CARET_EQUAL,
    ...

    MP_TOKEN_DEL_PAREN_OPEN,
    MP_TOKEN_DEL_PAREN_CLOSE,
    ...
    MP_TOKEN_DEL_EQUAL,
    MP_TOKEN_DEL_MINUS_MORE,
} mp_token_kind_t;
```  
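Tokenization is performed by the `mp_lexer_to_next` function (`py/lexer.c`), which reads the MicroPython source step by step and breaks it into the tokens defined above.  
For example, take the statements in a simple script file, `lcd.py`:  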
```
import lcd

lcd.init()
print("hello")

```
They are broken into the following token sequence:  
```
MP_TOKEN_KW_IMPORT MP_TOKEN_NAME 
MP_TOKEN_NEWLINE
MP_TOKEN_NAME MP_TOKEN_DEL_PERIOD MP_TOKEN_NAME MP_TOKEN_DEL_PAREN_OPEN MP_TOKEN_DEL_PAREN_CLOSE MP_TOKEN_NEWLINE
MP_TOKEN_NAME MP_TOKEN_DEL_PAREN_OPEN MP_TOKEN_STRING MP_TOKEN_DEL_PAREN_CLOSE 
MP_TOKEN_NEWLINE
MP_TOKEN_END
```
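**Notes**:  
1. The newline ending the first line and the newline of the blank second line are merged into a single NEWLINE token.
2. The final END token represents EOF.

Of course, the lexer does not wait until it has produced all of these tokens before handing them to the parser; it passes each token on as soon as it is produced.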
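
To make the "one token at a time" idea concrete, here is a minimal sketch of a pull-style lexer. It is not MicroPython's lexer: the `toy_*` names and the `TOY_*` token kinds are made up for illustration, indentation handling is left out entirely, and only the handful of token kinds needed for `lcd.py` are recognized.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds, loosely modelled on the MP_TOKEN_* names. */
typedef enum {
    TOY_END, TOY_NEWLINE, TOY_NAME, TOY_KW_IMPORT, TOY_STRING,
    TOY_DEL_PERIOD, TOY_DEL_PAREN_OPEN, TOY_DEL_PAREN_CLOSE, TOY_INVALID
} toy_token_t;

typedef struct {
    const char *p; /* current read position in the source text */
} toy_lexer_t;

/* Produce exactly one token per call, so a parser can pull them on demand. */
static toy_token_t toy_next(toy_lexer_t *lex) {
    assert(lex != NULL && lex->p != NULL);
    while (*lex->p == ' ' || *lex->p == '\t') {
        lex->p++; /* skip horizontal whitespace (indentation is ignored here) */
    }
    char c = *lex->p;
    if (c == '\0') {
        return TOY_END;
    }
    if (c == '\n') {
        while (*lex->p == '\n') {
            lex->p++; /* consecutive newlines collapse into one NEWLINE */
        }
        return TOY_NEWLINE;
    }
    if (isalpha((unsigned char)c) || c == '_') {
        const char *start = lex->p;
        while (isalnum((unsigned char)*lex->p) || *lex->p == '_') {
            lex->p++;
        }
        size_t len = (size_t)(lex->p - start);
        if (len == 6 && strncmp(start, "import", 6) == 0) {
            return TOY_KW_IMPORT; /* the only keyword this sketch knows */
        }
        return TOY_NAME;
    }
    if (c == '"') {
        lex->p++;
        while (*lex->p != '\0' && *lex->p != '"') {
            lex->p++;
        }
        if (*lex->p != '"') {
            return TOY_INVALID; /* unterminated string */
        }
        lex->p++;
        return TOY_STRING;
    }
    lex->p++;
    switch (c) {
        case '.': return TOY_DEL_PERIOD;
        case '(': return TOY_DEL_PAREN_OPEN;
        case ')': return TOY_DEL_PAREN_CLOSE;
        default:  return TOY_INVALID;
    }
}
```

Calling `toy_next` repeatedly on the text of `lcd.py` yields the same shape of sequence as listed above, with the blank line folded into the preceding NEWLINE.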

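## 3.2 Building the parse tree
The parser builds the parse tree from the token sequence above and MicroPython's grammar tree. The parse tree records the paths through the grammar tree that lead to each element of the MicroPython statement.
### 3.2.1 The grammar tree
The grammar tree defines MicroPython's syntax rules, the path to each element of a statement, and the dependencies between elements. It provides the basis for syntax checking and for compiling each rule, and is central to compiling MicroPython statements into `bytecode`.  
MicroPython's grammar tree is described in `py/grammar.h`.  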
```
/* py/grammar.h */

// rules for writing rules:
// - zero_or_more is implemented using opt_rule around a one_or_more rule
// - don't put opt_rule in arguments of or rule; instead, wrap the call to this or rule in opt_rule

// Generic sub-rules used by multiple rules below.

DEF_RULE_NC(generic_colon_test, and_ident(2), tok(DEL_COLON), rule(test))
DEF_RULE_NC(generic_equal_test, and_ident(2), tok(DEL_EQUAL), rule(test))

// # Start symbols for the grammar:
// #       single_input is a single interactive statement;
// #       file_input is a module or sequence of commands read from an input file;
// #       eval_input is the input for the eval() functions.
// # NB: compound_stmt in single_input is followed by extra NEWLINE! --> not in MicroPython
// single_input: NEWLINE | simple_stmt | compound_stmt
// file_input: (NEWLINE | stmt)* ENDMARKER
// eval_input: testlist NEWLINE* ENDMARKER

DEF_RULE_NC(single_input, or(3), tok(NEWLINE), rule(simple_stmt), rule(compound_stmt))
DEF_RULE(file_input, c(generic_all_nodes), and_ident(1), opt_rule(file_input_2))
DEF_RULE(file_input_2, c(generic_all_nodes), one_or_more, rule(file_input_3))
DEF_RULE_NC(file_input_3, or(2), tok(NEWLINE), rule(stmt))
DEF_RULE_NC(eval_input, and_ident(2), rule(testlist), opt_rule(eval_input_2))
DEF_RULE_NC(eval_input_2, and(1), tok(NEWLINE))
...
// yield_expr: 'yield' [yield_arg]
// yield_arg: 'from' test | testlist

DEF_RULE(yield_expr, c(yield_expr), and(2), tok(KW_YIELD), opt_rule(yield_arg))
DEF_RULE_NC(yield_arg, or(2), rule(yield_arg_from), rule(testlist))
DEF_RULE_NC(yield_arg_from, and(2), tok(KW_FROM), rule(test))
```
As you can see, the file is a series of rule macros in two forms:  
```
DEF_RULE_NC(rule, kind, ...)
DEF_RULE(rule, comp, kind, ...)
```
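- rule: the rule name
- comp: the compile action to run for this rule, used later in the compilation stage
- kind: how the sub-rules that follow relate to each other, for example "or" or "and"
- ...: the sub-rules of the current rule, each of which may be a rule, opt_rule, or tok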

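The three rules defined near the top of `grammar.h`, `single_input`, `file_input`, and `eval_input`, are the root nodes for MicroPython statements coming from the REPL, from a file, and from the `eval` function respectively. Starting from a root node and exploring its child nodes level by level eventually leads to the matching token.  
Taking `MP_TOKEN_KW_IMPORT`, the first token produced from the `lcd.py` script above, as an example, let us trace the path through the grammar tree that reaches it.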

1. DEF_RULE(**file_input**, c(generic_all_nodes), and_ident(1), opt_rule(*file_input_2*))
2. DEF_RULE(**file_input_2**, c(generic_all_nodes), one_or_more, rule(*file_input_3*))
3. DEF_RULE_NC(**file_input_3**, or(2), tok(NEWLINE), rule(*stmt*))
4. DEF_RULE_NC(**stmt**, or(2), rule(compound_stmt), rule(*simple_stmt*))
5. DEF_RULE_NC(**simple_stmt**, and_ident(2), rule(*simple_stmt_2*), tok(NEWLINE))
6. DEF_RULE(**simple_stmt_2**, c(generic_all_nodes), list_with_end, rule(*small_stmt*), tok(DEL_SEMICOLON))
7. DEF_RULE_NC(**small_stmt**, or(8), rule(del_stmt), rule(pass_stmt), rule(flow_stmt), rule(*import_stmt*), rule(global_stmt), rule(nonlocal_stmt), rule(assert_stmt), rule(expr_stmt))
8. DEF_RULE_NC(**import_stmt**, or(2), rule(*import_name*), rule(import_from))
9. DEF_RULE(**import_name**, c(import_name), and(2), tok(*KW_IMPORT*), rule(dotted_as_names))

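**Notes**:
- Bold marks the node currently being explored; italics mark the node to explore next.
- Many unrelated nodes explored along the way are omitted; only the path containing the target token is kept.

Depending on how `DEF_RULE_NC` and `DEF_RULE` are defined, the rule macros in `grammar.h` are expanded into different C constructs to suit the parsing and compilation stages.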

#### 3.2.1.1 Rule index enum
The rule names are used to generate an enum whose members are the indices of the corresponding rules.  
```
/* py/parse.c */

enum {
// define rules with a compile function
#define DEF_RULE(rule, comp, kind, ...) RULE_##rule,
#define DEF_RULE_NC(rule, kind, ...)
#include "py/grammar.h"
#undef DEF_RULE
#undef DEF_RULE_NC
    RULE_const_object, // special node for a constant, generic Python object

// define rules without a compile function
#define DEF_RULE(rule, comp, kind, ...)
#define DEF_RULE_NC(rule, kind, ...) RULE_##rule,
#include "py/grammar.h"
#undef DEF_RULE
#undef DEF_RULE_NC
};
```
After expansion this becomes:
```
enum {
    // define rules with a compile function
    RULE_file_input,
    RULE_file_input_2,
    ...
    RULE_yield_expr,
    RULE_const_object, // special node for a constant, generic Python object
    
    // define rules without a compile function
    RULE_generic_colon_test,
    RULE_generic_equal_test,
    ...
    RULE_yield_arg,
    RULE_yield_arg_from,
};
```
This way, `RULE_file_input` serves as the index of the `file_input` rule, and so on.  
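
The two-pass expansion can be tried out in isolation. The sketch below is an assumption-laden miniature: instead of including a header twice, it wraps a made-up three-rule grammar in a `TOY_GRAMMAR` list macro, and every `TOY_*` name is hypothetical. What it does share with the real code is the idea that the same rule list is expanded twice with different macro definitions, so that rules with a compile function are numbered first.

```c
#include <assert.h>

/* A hypothetical three-rule grammar; each entry mimics DEF_RULE / DEF_RULE_NC. */
#define TOY_GRAMMAR(DEF_RULE, DEF_RULE_NC) \
    DEF_RULE_NC(generic_colon_test, and_ident(2)) \
    DEF_RULE(file_input, c(generic_all_nodes), and_ident(1)) \
    DEF_RULE(file_input_2, c(generic_all_nodes), one_or_more)

/* TOY_EMIT turns a rule into an enumerator; TOY_SKIP discards it. */
#define TOY_EMIT(rule, ...) TOY_RULE_##rule,
#define TOY_SKIP(rule, ...)

enum {
    /* pass 1: only rules with a compile function produce an enumerator */
    TOY_GRAMMAR(TOY_EMIT, TOY_SKIP)
    TOY_RULE_const_object,
    /* pass 2: only rules without a compile function produce an enumerator */
    TOY_GRAMMAR(TOY_SKIP, TOY_EMIT)
    TOY_RULE_COUNT
};
```

Although `generic_colon_test` is written first in the list, it ends up numbered after `TOY_RULE_const_object`, which mirrors the ordering in the expanded enum above.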
#### 3.2.1.2 Sub-rule relationship table
The `kind` field of each rule in `grammar.h` is extracted into an array, where a rule's `kind` can be looked up using the index above.  
```
/* py/parse.c */

STATIC const uint8_t rule_act_table[] = {
#define or(n)                   (RULE_ACT_OR | n)
#define and(n)                  (RULE_ACT_AND | n)
#define and_ident(n)            (RULE_ACT_AND | n | RULE_ACT_ALLOW_IDENT)
#define and_blank(n)            (RULE_ACT_AND | n | RULE_ACT_ADD_BLANK)
#define one_or_more             (RULE_ACT_LIST | 2)
#define list                    (RULE_ACT_LIST | 1)
#define list_with_end           (RULE_ACT_LIST | 3)

#define DEF_RULE(rule, comp, kind, ...) kind,
#define DEF_RULE_NC(rule, kind, ...)
#include "py/grammar.h"
#undef DEF_RULE
#undef DEF_RULE_NC

0, // RULE_const_object

#define DEF_RULE(rule, comp, kind, ...)
#define DEF_RULE_NC(rule, kind, ...) kind,
#include "py/grammar.h"
#undef DEF_RULE
#undef DEF_RULE_NC

#undef or
#undef and
#undef and_ident
#undef and_blank
#undef one_or_more
#undef list
#undef list_with_end
};
```
Partially expanded, this is:
```
STATIC const uint8_t rule_act_table[] = {
    and_ident(1),
    one_or_more,
    ...
    and(2),
    
    0, // RULE_const_object
    
    and_ident(2),
    and_ident(2),
    ...
    or(2),
    and(2),
};
```
#### 3.2.1.3 Sub-rule table
The child-node information of each rule in `grammar.h` is extracted into an array.  
```
/* py/parse.c */

// Define the argument data for each rule, as a combined array
STATIC const uint16_t rule_arg_combined_table[] = {
#define tok(t)                  (RULE_ARG_TOK | MP_TOKEN_##t)
#define rule(r)                 (RULE_ARG_RULE | RULE_##r)
#define opt_rule(r)             (RULE_ARG_OPT_RULE | RULE_##r)

#define DEF_RULE(rule, comp, kind, ...) __VA_ARGS__,
#define DEF_RULE_NC(rule, kind, ...)
#include "py/grammar.h"
#undef DEF_RULE
#undef DEF_RULE_NC

#define DEF_RULE(rule, comp, kind, ...)
#define DEF_RULE_NC(rule, kind, ...)  __VA_ARGS__,
#include "py/grammar.h"
#undef DEF_RULE
#undef DEF_RULE_NC

#undef tok
#undef rule
#undef opt_rule
};
```
Partially expanded, this is:
```
STATIC const uint16_t rule_arg_combined_table[] = {
    opt_rule(file_input_2),
    rule(file_input_3),
    ...
    tok(KW_YIELD), opt_rule(yield_arg),
    
    tok(DEL_COLON), rule(test),
    tok(DEL_EQUAL), rule(test),
    ...
    rule(yield_arg_from), rule(testlist),
    tok(KW_FROM), rule(test),
};
```
#### 3.2.1.4 Sub-rule padding enum
The number of child nodes of each rule in `grammar.h` is extracted, and that many placeholders are put into an enum.
```
/* py/parse.c */

// Macro to create a list of N identifiers where N is the number of variable arguments to the macro
#define RULE_EXPAND(x) x
#define RULE_PADDING(rule, ...) RULE_PADDING2(rule, __VA_ARGS__, RULE_PADDING_IDS(rule))
#define RULE_PADDING2(rule, ...) RULE_EXPAND(RULE_PADDING3(rule, __VA_ARGS__))
#define RULE_PADDING3(rule, _1, _2, _3, _4, _5, _6, _7, _8, _9, _10, _11, _12, _13, ...) __VA_ARGS__
#define RULE_PADDING_IDS(r) PAD13_##r, PAD12_##r, PAD11_##r, PAD10_##r, PAD9_##r, PAD8_##r, PAD7_##r, PAD6_##r, PAD5_##r, PAD4_##r, PAD3_##r, PAD2_##r, PAD1_##r,

// Use an enum to create constants specifying how much room a rule takes in rule_arg_combined_table
enum {
#define DEF_RULE(rule, comp, kind, ...) RULE_PADDING(rule, __VA_ARGS__)
#define DEF_RULE_NC(rule, kind, ...)
#include "py/grammar.h"
#undef DEF_RULE
#undef DEF_RULE_NC
#define DEF_RULE(rule, comp, kind, ...)
#define DEF_RULE_NC(rule, kind, ...) RULE_PADDING(rule, __VA_ARGS__)
#include "py/grammar.h"
#undef DEF_RULE
#undef DEF_RULE_NC
};
```
After expansion this becomes:
```
enum {
    PAD1_file_input,
    PAD1_file_input_2,
    ...
    PAD2_yield_expr, PAD1_yield_expr, 
    PAD2_generic_colon_test, PAD1_generic_colon_test,
    PAD2_generic_equal_test, PAD1_generic_equal_test,
    ...
    PAD2_yield_arg, PAD1_yield_arg,
    PAD2_yield_arg_from, PAD1_yield_arg_from
};
```
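Here, the `file_input` rule has 1 child node, so the single placeholder `PAD1_file_input` is emitted; the `yield_expr` rule has 2 child nodes, so the two placeholders `PAD2_yield_expr` and `PAD1_yield_expr` are emitted; and so on.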

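#### 3.2.1.5 Sub-rule table start offsets
This table marks where each rule's child nodes start in the sub-rule table `rule_arg_combined_table`. It ultimately relies on the values in the "sub-rule padding enum" above.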
```
/* py/parse.c */

// Macro to compute the start of a rule in rule_arg_combined_table
#define RULE_ARG_OFFSET(rule, ...) RULE_ARG_OFFSET2(rule, __VA_ARGS__, RULE_ARG_OFFSET_IDS(rule))
#define RULE_ARG_OFFSET2(rule, ...) RULE_EXPAND(RULE_ARG_OFFSET3(rule, __VA_ARGS__))
#define RULE_ARG_OFFSET3(rule, _1, _2, _3, _4, _5, _6, _7, _8, _9, _10, _11, _12, _13, _14, ...) _14
#define RULE_ARG_OFFSET_IDS(r) PAD13_##r, PAD12_##r, PAD11_##r, PAD10_##r, PAD9_##r, PAD8_##r, PAD7_##r, PAD6_##r, PAD5_##r, PAD4_##r, PAD3_##r, PAD2_##r, PAD1_##r, PAD0_##r,

// Use the above enum values to create a table of offsets for each rule's arg
// data, which indexes rule_arg_combined_table.  The offsets require 9 bits of
// storage but only the lower 8 bits are stored here.  The 9th bit is computed
// in get_rule_arg using the FIRST_RULE_WITH_OFFSET_ABOVE_255 constant.
STATIC const uint8_t rule_arg_offset_table[] = {
#define DEF_RULE(rule, comp, kind, ...) RULE_ARG_OFFSET(rule, __VA_ARGS__) & 0xff,
#define DEF_RULE_NC(rule, kind, ...)
#include "py/grammar.h"
#undef DEF_RULE
#undef DEF_RULE_NC
    0, // RULE_const_object
#define DEF_RULE(rule, comp, kind, ...)
#define DEF_RULE_NC(rule, kind, ...) RULE_ARG_OFFSET(rule, __VA_ARGS__) & 0xff,
#include "py/grammar.h"
#undef DEF_RULE
#undef DEF_RULE_NC
};
```
After expansion (with the `& 0xff` mask on each entry left out for readability) this becomes:
```
STATIC const uint8_t rule_arg_offset_table[] = {
    PAD1_file_input,
    PAD1_file_input_2,
    ...
    PAD2_yield_expr, 
    0,
    PAD2_generic_colon_test, 
    PAD2_generic_equal_test, 
    ...
    PAD2_yield_arg, 
    PAD2_yield_arg_from, 
};
```
Each of these values can be looked up in the "sub-rule padding enum".  
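
Why the lookup works: the padding enum has exactly one enumerator per sub-rule, in the same order as the sub-rule table, so the first `PADn_` enumerator of a rule is numerically equal to the index where that rule's sub-rules start. The following scaled-down sketch reproduces the trick for rules with at most 3 arguments. The grammar, the argument values, and all `TOY_*` names are hypothetical; only the macro pattern follows the code quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical grammar: rule name followed by 1 to 3 argument values. */
#define TOY_GRAMMAR(R) \
    R(file_input, 10) \
    R(import_name, 20, 21) \
    R(yield_expr, 30, 31, 32)

#define TOY_EXPAND(x) x

/* Emit one PADn_<rule> identifier per argument (up to 3 in this sketch). */
#define TOY_PADDING(rule, ...) TOY_PADDING2(rule, __VA_ARGS__, TOY_PADDING_IDS(rule))
#define TOY_PADDING2(rule, ...) TOY_EXPAND(TOY_PADDING3(rule, __VA_ARGS__))
#define TOY_PADDING3(rule, _1, _2, _3, ...) __VA_ARGS__
#define TOY_PADDING_IDS(r) PAD3_##r, PAD2_##r, PAD1_##r,

/* Select the first PADn_<rule> identifier, i.e. the rule's start offset. */
#define TOY_OFFSET(rule, ...) TOY_OFFSET2(rule, __VA_ARGS__, TOY_OFFSET_IDS(rule))
#define TOY_OFFSET2(rule, ...) TOY_EXPAND(TOY_OFFSET3(rule, __VA_ARGS__))
#define TOY_OFFSET3(rule, _1, _2, _3, _4, ...) _4
#define TOY_OFFSET_IDS(r) PAD3_##r, PAD2_##r, PAD1_##r, PAD0_##r,

/* All arguments of all rules, flattened into one array. */
static const uint8_t toy_arg_table[] = {
#define TOY_ARGS(rule, ...) __VA_ARGS__,
    TOY_GRAMMAR(TOY_ARGS)
#undef TOY_ARGS
};

/* One enumerator per argument, so each value is an index into toy_arg_table. */
enum {
#define TOY_PAD(rule, ...) TOY_PADDING(rule, __VA_ARGS__)
    TOY_GRAMMAR(TOY_PAD)
#undef TOY_PAD
    TOY_ARG_TOTAL
};

/* Start offset of each rule's arguments within toy_arg_table. */
static const uint8_t toy_offset_table[] = {
#define TOY_OFF(rule, ...) TOY_OFFSET(rule, __VA_ARGS__),
    TOY_GRAMMAR(TOY_OFF)
#undef TOY_OFF
};
```

With this grammar the offsets come out as 0, 1 and 3, and `toy_arg_table[toy_offset_table[2]]` is the first argument of `yield_expr`. Everything is computed by the preprocessor and the enum, so no counting is done by hand.
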
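The sub-rule table `rule_arg_combined_table` has more than 255 entries, so an offset into it needs 9 bits. `rule_arg_offset_table` stores only the low 8 bits, so how is the 9th bit determined?  
MicroPython uses a constant that holds the number of the first rule whose sub-rule start offset exceeds 255: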
```
/* py/parse.c */

// Define a constant that's used to determine the 9th bit of the values in rule_arg_offset_table
static const size_t FIRST_RULE_WITH_OFFSET_ABOVE_255 =
#define DEF_RULE(rule, comp, kind, ...) RULE_ARG_OFFSET(rule, __VA_ARGS__) >= 0x100 ? RULE_##rule :
#define DEF_RULE_NC(rule, kind, ...)
#include "py/grammar.h"
#undef DEF_RULE
#undef DEF_RULE_NC
#define DEF_RULE(rule, comp, kind, ...)
#define DEF_RULE_NC(rule, kind, ...) RULE_ARG_OFFSET(rule, __VA_ARGS__) >= 0x100 ? RULE_##rule :
#include "py/grammar.h"
#undef DEF_RULE
#undef DEF_RULE_NC
0;
```
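The value of this constant is produced by a chain of macro-generated conditional expressions; it ends up being the rule number RULE_xxx of the first rule whose sub-rules start beyond offset 255 in `rule_arg_combined_table`. RULE_xxx here is one of the values in the rule index enum from the beginning.  
It has to be said, this series of macro definitions is really ingenious.  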
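
The 9th-bit trick only works because offsets grow monotonically with the rule number and never reach 512, so a single threshold rule is enough to tell which offsets had bit 8 set. Here is a small sketch of the idea with made-up offsets. The `TOY_*` names and the `toy_get_offset` helper are hypothetical (the comment in the real code says the equivalent step happens in `get_rule_arg`), and the sketch assumes at least one offset is above 255.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical (rule id, true offset) pairs; offsets increase and stay below 512. */
#define TOY_RULES(X) X(0, 0) X(1, 100) X(2, 250) X(3, 260) X(4, 300)

/* Only the low 8 bits of each offset are stored. */
static const uint8_t toy_low8[] = {
#define TOY_LOW8(id, off) ((off) & 0xff),
    TOY_RULES(TOY_LOW8)
#undef TOY_LOW8
};

/* Chained conditionals pick the id of the first rule whose offset needs bit 8. */
enum {
    TOY_FIRST_ABOVE_255 =
#define TOY_FIRST(id, off) (off) >= 0x100 ? (id) :
        TOY_RULES(TOY_FIRST)
#undef TOY_FIRST
        0
};

/* Rebuild the full 9-bit offset from the stored byte and the threshold rule. */
static size_t toy_get_offset(size_t rule_id) {
    assert(rule_id < sizeof(toy_low8));
    size_t off = toy_low8[rule_id];
    if (rule_id >= (size_t)TOY_FIRST_ABOVE_255) {
        off |= 0x100; /* restore the 9th bit */
    }
    return off;
}
```

In this example rule 3 is the first one past 255, so its stored byte is 4 and `toy_get_offset(3)` reconstructs 260. The table stays one byte per rule at the cost of a single comparison per lookup.
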
#### 3.2.1.6 Parse node types
These types are used later in the compilation stage; every rule in `grammar.h` corresponds to one type.
```
/* py/compile.c */

typedef enum {
// define rules with a compile function
#define DEF_RULE(rule, comp, kind, ...) PN_##rule,
#define DEF_RULE_NC(rule, kind, ...)
    #include "py/grammar.h"
#undef DEF_RULE
#undef DEF_RULE_NC
    PN_const_object, // special node for a constant, generic Python object
// define rules without a compile function
#define DEF_RULE(rule, comp, kind, ...)
#define DEF_RULE_NC(rule, kind, ...) PN_##rule,
    #include "py/grammar.h"
#undef DEF_RULE
#undef DEF_RULE_NC
} pn_kind_t;
```
After expansion this becomes:
```
typedef enum {
    // define rules with a compile function
    PN_file_input,
    PN_file_input_2,
    ...
    PN_yield_expr,
    PN_const_object, // special node for a constant, generic Python object
    
    // define rules without a compile function
    PN_generic_colon_test,
    PN_generic_equal_test,
    ...
    PN_yield_arg,
    PN_yield_arg_from,
} pn_kind_t;
```

#### 3.2.1.7 Compile function table
This is an array of function pointers that specifies the compile action for each rule defined with `DEF_RULE`.  
```
/* py/compile.c */

STATIC const compile_function_t compile_function[] = {
// only define rules with a compile function
#define c(f) compile_##f
#define DEF_RULE(rule, comp, kind, ...) comp,
#define DEF_RULE_NC(rule, kind, ...)
    #include "py/grammar.h"
#undef c
#undef DEF_RULE
#undef DEF_RULE_NC
    compile_const_object,
};
```
After expansion this becomes:
```
STATIC const compile_function_t compile_function[] = {
    compile_generic_all_nodes,
    compile_generic_all_nodes,
    ...
    compile_yield_expr,
    compile_const_object,
};
```
All the functions in the table are defined in `py/compile.c`.
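
Because the rules with a compile function are numbered first in `pn_kind_t`, the node kind can be used directly as an index into this table. The sketch below illustrates that dispatch with stand-in functions. Every `toy_*` and `TOY_*` name, the `int` signature, and the returned tag values are assumptions made for the example; the real compile functions take compiler state and a parse node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds: rules with a compile function come first. */
typedef enum {
    TOY_PN_file_input,
    TOY_PN_import_name,
    TOY_PN_const_object,   /* last kind that has a compile function */
    TOY_PN_stmt            /* DEF_RULE_NC-style rule: no table entry */
} toy_pn_kind_t;

typedef int (*toy_compile_function_t)(int arg);

/* Stand-in compile actions; they return a tag so the dispatch is observable. */
static int toy_compile_generic_all_nodes(int arg) { return 100 + arg; }
static int toy_compile_import_name(int arg) { return 200 + arg; }
static int toy_compile_const_object(int arg) { return 300 + arg; }

/* Table order must match the enum order above. */
static const toy_compile_function_t toy_compile_function[] = {
    toy_compile_generic_all_nodes, /* TOY_PN_file_input */
    toy_compile_import_name,       /* TOY_PN_import_name */
    toy_compile_const_object,      /* TOY_PN_const_object */
};

/* Look up the action by node kind; returns -1 for kinds without one. */
static int toy_compile_node(toy_pn_kind_t kind, int arg) {
    if ((size_t)kind > (size_t)TOY_PN_const_object) {
        return -1;
    }
    return toy_compile_function[kind](arg);
}
```

The table only needs entries up to `TOY_PN_const_object`; kinds after it have no compile action, which is why the enum and the table are generated in the same two-pass order.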

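### 3.2.2 The parsing process
As mentioned earlier, tokenization and parse tree construction are actually interleaved; the whole parsing process is implemented in the `mp_parse` function (`py/parse.c`).  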
```
/* py/parse.c */

mp_parse_tree_t mp_parse(mp_lexer_t *lex, mp_parse_input_kind_t input_kind) {

    parser_t parser;
    ...

    // work out the top-level rule to use, and push it on the stack
    size_t top_level_rule;
    switch (input_kind) {
        case MP_PARSE_SINGLE_INPUT:
            top_level_rule = RULE_single_input;
            break;
        case MP_PARSE_EVAL_INPUT:
            top_level_rule = RULE_eval_input;
            break;
        default:
            top_level_rule = RULE_file_input;
    }
    ...

    // parse!

    bool backtrack = false;

    for (;;) {
    next_rule:
        if (parser.rule_stack_top == 0) {
            break;
        }
        ...

        switch (rule_act & RULE_ACT_KIND_MASK) {
            case RULE_ACT_OR:
                ...
                break;

            case RULE_ACT_AND: {
                ...
                push_result_rule(...);
                ...
                break;
            }

            default: {
                ...
                push_result_rule(...);
                ...
                break;
            }
        }
    }

    ...
    // truncate final chunk and link into chain of chunks
    if (parser.cur_chunk != NULL) {
        ...
        parser.tree.chunk = parser.cur_chunk;
    }
    
    if (
        lex->tok_kind != MP_TOKEN_END // check we are at the end of the token stream
        || parser.result_stack_top == 0 // check that we got a node (can fail on empty input)
        ) {
    syntax_error:;
        mp_obj_t exc;
        if (lex->tok_kind == MP_TOKEN_INDENT) {
            exc = mp_obj_new_exception_msg(&mp_type_IndentationError,
                MP_ERROR_TEXT("unexpected indent"));
        } else if (lex->tok_kind == MP_TOKEN_DEDENT_MISMATCH) {
            exc = mp_obj_new_exception_msg(&mp_type_IndentationError,
                MP_ERROR_TEXT("unindent doesn't match any outer indent level"));
        } else {
            exc = mp_obj_new_exception_msg(&mp_type_SyntaxError,
                MP_ERROR_TEXT("invalid syntax"));
        }
        // add traceback to give info about file name and location
        // we don't have a 'block' name, so just pass the NULL qstr to indicate this
        mp_obj_exception_add_traceback(exc, lex->source_name, lex->tok_line, MP_QSTRnull);
        nlr_raise(exc);
    }

    ...

    parser.tree.root = parser.result_stack[0];

    ...

    return parser.tree;
}
```
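The above is the code skeleton after trimming.  
- Starting from the root node of the grammar tree (which may be `RULE_single_input`, `RULE_eval_input`, or `RULE_file_input`, depending on where the MicroPython statements come from), the parser explores downward level by level, looking for the path to each token.  
- The exploration happens in the `for` loop; for each rule being explored, a different action is taken depending on whether its sub-rules are related by `and`, `or`, or something else.  
- Path information is recorded during the exploration, including the source line number, the rule number, the number of arguments the rule needs, and index information for user-defined identifiers (such as function names). This information is organized in memory as a tree, hence the name parse tree.  
- After the exploration finishes, the function checks whether there is a syntax error. Syntax checking of MicroPython statements is therefore done in this parsing stage.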
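
The different handling of `and` and `or` rules can be illustrated with a much smaller matcher. Note the deliberate simplification: the sketch below uses plain recursion and only reports whether the tokens match, whereas `mp_parse` drives the exploration with an explicit rule stack and also builds result nodes. The four rules, the token kinds, and all `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { T_END, T_NEWLINE, T_NAME, T_IMPORT, T_OPEN, T_CLOSE };
enum { R_simple_stmt, R_small_stmt, R_import_name, R_call, R_COUNT };
enum { ACT_OR, ACT_AND };

typedef struct { int is_rule; int id; } toy_arg_t;              /* rule(...) or tok(...) */
typedef struct { int act; int n; toy_arg_t args[3]; } toy_rule_t;

/* A tiny made-up grammar in table form. */
static const toy_rule_t toy_rules[R_COUNT] = {
    [R_simple_stmt] = { ACT_AND, 2, { {1, R_small_stmt}, {0, T_NEWLINE} } },
    [R_small_stmt]  = { ACT_OR,  2, { {1, R_import_name}, {1, R_call} } },
    [R_import_name] = { ACT_AND, 2, { {0, T_IMPORT}, {0, T_NAME} } },
    [R_call]        = { ACT_AND, 3, { {0, T_NAME}, {0, T_OPEN}, {0, T_CLOSE} } },
};

static int toy_match_arg(const toy_arg_t *arg, const int *toks, int pos);

/* Return the position after the match, or -1 if the rule does not match.
 * The token array must be terminated by T_END. */
static int toy_match_rule(int rule, const int *toks, int pos) {
    const toy_rule_t *r = &toy_rules[rule];
    if (r->act == ACT_OR) {
        /* "or": try each alternative from the same position, first success wins */
        for (int i = 0; i < r->n; i++) {
            int end = toy_match_arg(&r->args[i], toks, pos);
            if (end >= 0) {
                return end;
            }
        }
        return -1;
    }
    /* "and": every argument must match, one after another */
    for (int i = 0; i < r->n; i++) {
        pos = toy_match_arg(&r->args[i], toks, pos);
        if (pos < 0) {
            return -1;
        }
    }
    return pos;
}

static int toy_match_arg(const toy_arg_t *arg, const int *toks, int pos) {
    if (arg->is_rule) {
        return toy_match_rule(arg->id, toks, pos);
    }
    return toks[pos] == arg->id ? pos + 1 : -1;
}
```

Matching `R_simple_stmt` against the tokens of `import lcd` succeeds through the `R_import_name` alternative, while a call-like token sequence falls through to `R_call`; a sequence that fits neither returns -1, which is where a real parser would report a syntax error.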

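### 3.2.3 Structure of the parse tree  
The parsing stage ultimately outputs an `mp_parse_tree_t` structure, which is the parse tree itself. The structure is defined as follows:  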
```
/* py/parse.h */

typedef struct _mp_parse_t {
    mp_parse_node_t root;
    struct _mp_parse_chunk_t *chunk;
} mp_parse_tree_t;
```
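Where:
- `chunk` points to all the detailed data of the parse tree.
- `root` points to the memory address inside the chunk where the root of the tree lives.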

Next, look at `struct _mp_parse_chunk_t`:
```
/* py/parse.c */

typedef struct _mp_parse_chunk_t {
    size_t alloc;
    union {
        size_t used;
        struct _mp_parse_chunk_t *next;
    } union_;
    byte data[];
} mp_parse_chunk_t;
```
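Where:
- `alloc` is the number of data bytes the chunk holds, not counting the size of the `alloc` and `union_` fields.
- `union_` is used to link multiple chunks (through its `next` member). If the input source is large, the resulting parse tree is large too; since a single large contiguous block of memory cannot be guaranteed for the whole tree, it may be split into several chunks whose addresses are not contiguous.
- `data` holds the actual payload. Reading the full source of `mp_parse` shows that `data` is really a series of `mp_parse_node_struct_t` structures.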

```
/* py/parse.h */

typedef uintptr_t mp_parse_node_t; // must be pointer size

typedef struct _mp_parse_node_struct_t {
    uint32_t source_line;       // line number in source file
    uint32_t kind_num_nodes;    // parse node kind, and number of nodes
    mp_parse_node_t nodes[];    // nodes
} mp_parse_node_struct_t;
```
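Where:
- `source_line` is the line number in the source code.
- `kind_num_nodes` packs the parse node type (remember the parse node types covered earlier?) together with the number of child nodes.  
- `nodes` holds the child nodes; each entry may be the address of another `mp_parse_node_struct_t`, or the information for an identifier that was found.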
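
Two packing tricks hide in this structure: `kind_num_nodes` combines two numbers in one word, and an entry of `nodes` is either a pointer or a tagged leaf value. The sketch below shows how such packing can work. The exact layout (kind in the low 8 bits, leaf tag in the low bits of the word, tag value `0x02` for an identifier) is my reading of the macros in `py/parse.h` and should be treated as an assumption; all `toy_*` names are invented.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t toy_parse_node_t; /* pointer-sized, like mp_parse_node_t */

typedef struct {
    uint32_t source_line;
    uint32_t kind_num_nodes;
} toy_node_struct_t;

/* Assumed packing: kind in the low 8 bits, child count in the upper 24 bits. */
static uint32_t toy_pack(uint32_t kind, uint32_t num_nodes) {
    return (kind & 0xff) | (num_nodes << 8);
}

static uint32_t toy_kind(const toy_node_struct_t *pns) {
    return pns->kind_num_nodes & 0xff;
}

static uint32_t toy_num_nodes(const toy_node_struct_t *pns) {
    return pns->kind_num_nodes >> 8;
}

/* Assumed leaf tagging: a leaf has at least one of its two low bits set,
 * which an aligned struct pointer never has. */
#define TOY_LEAF_ID (0x02)

static toy_parse_node_t toy_leaf(uintptr_t kind, uintptr_t arg) {
    return kind | (arg << 4);
}

static int toy_is_leaf(toy_parse_node_t pn) {
    return (pn & 3) != 0;
}

static uintptr_t toy_leaf_kind(toy_parse_node_t pn) {
    return pn & 0x0f;
}

static uintptr_t toy_leaf_arg(toy_parse_node_t pn) {
    return pn >> 4;
}
```

Under these assumptions, a child entry whose low bits are clear is the address of another node structure, and anything else is a leaf carrying its own kind and payload (for an identifier, an index identifying the name).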

The description above may still not be very clear at first glance, so let us again use `lcd.py` to show what the final parse tree structure looks like.  
![image](https://note.youdao.com/yws/res/17438/WEBRESOURCEb9d0db99908506a18e7480e5dc12cd7f)

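The final parse tree is represented by a single `mp_parse_chunk_t` structure, whose `alloc` field is the total size of all the `mp_parse_node_struct_t` structures that follow.  
Start from the root, `parse_tree.root`, which represents the `RULE_file_input_2` rule. It has 3 child nodes, corresponding to the three lines of our source code. Their values are 3 addresses, which means they are intermediate nodes rather than leaf nodes. Take the first child as an example and keep following it down.  
The first child points to the `mp_parse_node_struct_t` at 0x803fd6d0, which represents the `RULE_import_name` rule. It has one child node whose value is not an address, so it is a leaf node; it corresponds to the symbol "lcd". The first branch of the parse tree therefore corresponds to the statement `import lcd`. The other two branches can be followed in the same way and are not described here.  
Drawn as a tree, the resulting data structure looks like this:  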
![image](http://note.youdao.com/yws/res/17945/WEBRESOURCEb2537ba7ecb14d395c20d4487a56a395)