php:tree structure algorithm

Author：Eve Cole Update Time：2009-06-02 18:07:31

Reposted from Joy Murakami, I have read this article before, and it is explained quite clearly.
We will encounter this problem in many places such as product classification, multi-level tree-structured forums, mailing lists, etc.: How to store multi-level structured data?

In PHP applications, the backend data storage is usually a relational database, which can save large amounts of data and provide efficient data retrieval and update services. However, the basic form of relational data is a criss-crossed table, which is a flat structure. If you want to store a multi-level tree structure in a relational database, you need to perform reasonable translation work. Next, I will discuss with you what I have seen and heard and some practical experiences.

There are basically two common design methods for storing hierarchical data in flat databases:

adjacency list model
Modified preorder tree traversal algorithm
I'm not a computer major, and I haven't learned anything about data structures, so I translated these two names literally. Please let me know if I'm wrong.

These two things may sound scary, but they are actually very easy to understand. Here I use a simple food directory as our example data. Our data structure is like this:

Food: Food
Fruit: fruit
Red: red
Cherry:cherry
Yellow:yellow
Banana:banana
Meat: Meat
Beef:beef
Pork: Pork

adjacency list model

is a model we often use and has been introduced in many tutorials and books. We describe the entire tree structure through a flat table by adding an attribute parent to each node to represent the parent node of this node. According to this principle, the data in the example can be transformed into the following table:

Displaying a multi-level tree If we need to display such a multi-level structure we need a recursive function.

<?php
// $parent is the parent of the children we want to see
// $level is increased when we go deeper into the tree,
// used to display a nice indented tree

function display_children($parent, $level)
{
// Get all child nodes of a parent node $parent
$result = mysql_query('SELECT name FROM tree '.
'WHERE parent="'.$parent.'";');

// Display each child node
while ($row = mysql_fetch_array($result))
{
//Indent the node name
echo str_repeat(' ',$level).$row['name']."n";

//Call this function again to display the child nodes of the child node

display_children($row['name'], $level+1);
}
}
?>
Using this function on the root node (Food) of the entire structure can print out the entire multi-level tree structure. Since Food is the root node and its parent node is empty, call: display_children('',0). will display the contents of the entire tree:

Food
Fruit
Red
Cherry
Yellow
Banana
Meat
Beef
Pork
If you only want to display a part of the entire structure, such as the fruit part, you can call it like this:

display_children('Fruit',0);

Almost using the same method, we can know the path from the root node to any node. For example, Cherry's path is "Food > Fruit > Red". In order to get such a path, we need to start from the deepest level "Cherry", query to get its parent node "Red" and add it to the path, and then we query Red's parent node and add it to the path. , and so on until the top level "Food"

<?php
// $node is the deepest node
function get_path($node)
{
// Query the parent node of this node
$result = mysql_query('SELECT parent FROM tree '.
'WHERE name="'.$node.'";');
$row = mysql_fetch_array($result);

// Use an array to save the path
$path = array();

// If it is not the root node, continue to query upwards
// (The root node has no parent node)
if ($row['parent']!='')
{
// the last part of the path to $node, is the name
// of the parent of $node
$path[] = $row['parent'];

// we should add the path to the parent of this node
// to the path
$path = array_merge(get_path($row['parent']), $path);
}

// return the path
return $path;
}
?>
If you use this function for "Cherry": print_r(get_path('Cherry')), you will get an array like this:

Array
(
[0] => Food
[1] => Fruit
[2] => Red
)
How to print it into the format you want is up to you.
Disadvantages: This method is very simple, easy to understand, and easy to use. But there are some disadvantages. Mainly because the running speed is very slow, because each node requires a database query, and when the amount of data is large, many queries are required to complete a tree. In addition, due to the need for recursive operations, each level of recursion needs to occupy some memory, so the space utilization efficiency is relatively low.

Now let us take a look at another method that does not use recursive calculations and is faster. This is the modified preorder tree traversal algorithm. You may have less exposure to this method, and it is not like the one above when you use it for the first time. The method is easy to understand, but because this method does not use recursive query algorithms, it has higher query efficiency.

We first draw the multi-level data on paper in the following way, write 1 on the left side of the root node Food, then continue down the tree, write 2 on the left side of Fruit, and then continue moving along the entire tree. Edges label each node with numbers on the left and right. The last number is 18 marked to the right of Food. In the picture below you can see the entire multi-level structure marked with numbers. (Don’t understand? Point to the numbers with your fingers and count from 1 to 18 and you will understand. If you still don’t understand, count again and be careful to move your fingers).
These numbers indicate the relationship between each node. The numbers of "Red" are 3 and 6, which are the descendant nodes of "Food" 1-18. Similarly, we can see that all nodes with a left value greater than 2 and a right value less than 11 are descendants of "Fruit" 2-11

In this way, the entire tree structure can be stored in the database through the left and right values. Before continuing, let's take a look at the compiled data table below.

+-----------------------+-----+-----+
| parent | name | lft | rgt |
+-----------------------+-----+-----+
| | Food | 1 | 18 |
| Food | Fruit | 2 | 11 |
| Fruit | Red | 3 | 6 |
| Red | Cherry | 4 | 5 |
| Fruit | Yellow | 7 | 10 |
| Yellow | Banana | 8 | 9 |
| Food | Meat | 12 | 17 |
| Meat | Beef | 13 | 14 |
| Meat | Pork | 15 | 16 |
+-----------------------+-----+-----+
Note: Since "left" and "right" have special meanings in SQL, we need to use "lft" and "rgt" to represent the left and right fields. In addition, the "parent" field is no longer needed in this structure to represent the tree structure. In other words, the following table structure is sufficient.

+------------+-----+-----+
| name | lft | rgt |
+------------+-----+-----+
| Food | 1 | 18 |
| Fruit | 2 | 11 |
| Red | 3 | 6 |
| Cherry | 4 | 5 |
| Yellow | 7 | 10 |
| Banana | 8 | 9 |
| Meat | 12 | 17 |
| Beef | 13 | 14 |
| Pork | 15 | 16 |
+------------+-----+-----+
Okay, we can now get data from the database. For example, if we need to get all the nodes under the "Fruit" item, we can write the query statement like this: SELECT * FROM tree WHERE lft BETWEEN 2 AND 11; This query got the following results .

+------------+-----+-----+
| name | lft | rgt |
+------------+-----+-----+
| Fruit | 2 | 11 |
| Red | 3 | 6 |
| Cherry | 4 | 5 |
| Yellow | 7 | 10 |
| Banana | 8 | 9 |
+------------+-----+-----+
See, you can get all these nodes with just one query. In order to be able to display the entire tree structure like the recursive function above, we also need to sort such a query. Sort by the lvalue of the node:

SELECT * FROM tree WHERE lft BETWEEN 2 AND 11 ORDER BY lft ASC;
The remaining question is how to display hierarchical indentation.

<?php
function display_tree($root)
{
//Get the left and right values of the root node
$result = mysql_query('SELECT lft, rgt FROM tree '.'WHERE name="'.$root.'";');
$row = mysql_fetch_array($result);

// Prepare an empty rvalue stack
$right = array();

// Get all descendant nodes of the root point
$result = mysql_query('SELECT name, lft, rgt FROM tree '.
'WHERE lft BETWEEN '.$row['lft'].' AND '.
$row['rgt'].' ORDER BY lft ASC;');

// Display each row
while ($row = mysql_fetch_array($result))
{
// only check stack if there is one
if (count($right)>0)
{
// Check if we should move the node off the stack
while ($right[count($right)-1]<$row['rgt'])
{
array_pop($right);
}
}

// Indent the name of the node.
echo str_repeat(' ',count($right)).$row['name']."n";

// Add this node to the stack
$right[] = $row['rgt'];
}
}
?>
If you run the above function you will get the same result as the recursive function. It's just that our new function may be faster because there are only 2 database queries. It is easier to know the path of a node. If we want to know the path of Cherry, we use its left and right values 4 and 5 to make a query.

SELECT name FROM tree WHERE lft < 4 AND rgt > 5 ORDER BY lft ASC;
This will give you the following result:

+----------------+
| name |
+----------------+
| Food |
| Fruit |
| Red |
+----------------+
So how many descendant nodes does a certain node have? It's very simple, the total number of descendants = (rvalue - left value - 1)/2 descendants = (right - left - 1) / 2 Don't believe it? Do the math yourself. Using this simple formula, we can quickly calculate that the "Fruit 2-11" node has 4 descendant nodes, while the "Banana 8-9" node has no descendant nodes, which means that it is not a parent node.
Amazing, right? Although I have used this method many times, it still feels amazing every time I do it.

This is indeed a good method, but is there any way to help us create such a data table with left and right values? Here is another function to introduce to you. This function can automatically convert the table with name and parent structures into a data table with left and right values.

<?php
function rebuild_tree($parent, $left) {
// the right value of this node is the left value + 1
$right = $left+1;

// get all children of this node
$result = mysql_query('SELECT name FROM tree '.
'WHERE parent="'.$parent.'";');
while ($row = mysql_fetch_array($result)) {
// recursive execution of this function for each
// child of this node
// $right is the current right value, which is
// incremented by the rebuild_tree function
$right = rebuild_tree($row['name'], $right);
}

// we've got the left value, and now that we've processed
// the children of this node we also know the right value
mysql_query('UPDATE tree SET lft='.$left.', rgt='.
$right.' WHERE name="'.$parent.'";');

// return the right value of this node + 1
return $right+1;
}
?>
Of course, this function is a recursive function. We need to run this function from the root node to rebuild a tree with left and right values

rebuild_tree('Food',1);
This function looks a bit complicated, but its function is the same as manually numbering the table, which is to convert the three-dimensional multi-layer structure into a data table with left and right values.

So how do we add, update and delete a node for such a structure? There are generally two ways to add a node:

retain the original name and parent structure, use the old method to add data to the data, and use the rebuild_tree function to renumber the entire structure after each piece of data is added.
A more efficient approach is to change all values to the right of the new node. For example: we want to add a new fruit "Strawberry" which will become the last child node of the "Red" node. First we need to make some space for it. The right value of "Red" should be changed from 6 to 8, and the left and right value of "Yellow 7-10" should be changed to 9-12. By analogy, we can know that if you want to make room for new values, you need to add 2 to all nodes with left and right values greater than 5 (5 is the right value of the last child node of "Red"). So we perform database operations like this:

UPDATE tree SET rgt=rgt+2 WHERE rgt>5;
UPDATE tree SET lft=lft+2 WHERE lft>5;
This frees up space for the newly inserted value. Now you can create a new data node in the freed space. Its left and right values are 6 and 7 respectively.

INSERT INTO tree SET lft=6, rgt=7, name ='Strawberry';

Let's do another query and see! How about it? Soon.
Okay, now you can design your multi-level database structure in two different ways. Which method you use depends entirely on your personal judgment, but for structures with many levels and a large number, I prefer the second method. The first method is easier if the query volume is small but the data needs to be added and updated frequently.

In addition, if the database supports it, you can also write rebuild_tree() and the space-freeing operation as a trigger function on the database side, and execute it automatically during insertion and update. This can achieve better operating efficiency, and you can add new nodes. SQL statements will become simpler.
Class recursive method
Posted by guest on 2004, May 31 - 9:18am.
I wrote a program using a quasi-recursive method, which is not exactly the same as the recursion in the article, and I am preparing to transplant it to xoops:
http://dev.xoops.org/modules/xfmod/project/?ulink

has experienced memory overflow, but I plan to continue to use the recursive method. I just need to continue to improve.

I hope to have the opportunity to discuss cms with you.
» reply to this comment
Or a comparison of the two methods?
Posted by guest on 2004, March 17 - 8:30pm.
I studied this article carefully and felt that it was very beneficial, but then I thought about it again and felt that there is a problem (for the sake of memory, I call the adjacent directory mode the recursive method, and the pre-sort traversal tree algorithm I call the pre-sort tree method):

1. The biggest difference between the two methods is that recursion requires the use of a stack when querying, while pre-sorting tree requires half of the nodes (referring to the second half of the inserted node) when updating the nodes. of updates. Although you also said that if there are many nodes and frequent updates, the efficiency of the pre-sorted tree will be reduced, and recursion will be better. If there are many node levels, first of all, recursion will cause stack overflow, and furthermore, recursion itself is not very efficient. , plus each level of recursion requires operating the database, the overall effect will not be ideal. My current approach is to take out all the data at once, and then perform recursive operations on the array, which will be better; if it can be further improved, a ROOT root node can be added to each row of records (currently, only adjacent parent nodes are recorded) , so that the efficiency when searching the branch tree will be higher, and it will also be very convenient when updating the tree, which should be a better way.

2. Improve the recursive method. In the article, when calculating the left and right values of the pre-sorted tree nodes, a traversal method is actually used. The stack is replaced by an array, and the push and pop are manually implemented; if this method is referenced in the recursive algorithm , if you use an array instead of the stack when performing recursion, you can also improve the efficiency of recursion.

3. Concurrency. If concurrency is taken into account, especially when updating the tree, the method of updating node information in a large area of the pre-sorted tree requires extra attention to the use of locking and transaction mechanisms to ensure data consistency.

4. In the case of multiple root nodes or multiple parent nodes, in this case, it is obviously not a standard binary tree or multi-fork tree. The pre-sorted tree algorithm needs to be greatly improved to adapt, and the recursive method is applied freely, so in this case, recursion is more adaptable. This is of course, because the recursive method is a form of linked list. Trees and graphs can be expressed by linked lists, and of course it is highly adaptable.

5. Intuitive. If you directly observe the data stored in the database without program operation, it is obvious that the data stored in the recursive mode is more intuitive, while the data in the pre-sorted tree is difficult to read directly (for hierarchical relationships). This is important in data exchange. Will it have any impact?

Generally speaking, I personally prefer to use recursive methods, but I have always been worried about the impact of recursion on efficiency. Fortunately, I have not been exposed to large-scale classification levels. Using arrays instead of stacks recursively would be a better improvement method. The pre-sorted tree is an efficient method to solve simple trees. Once you get used to it, it should be very good, especially its reverse search from the leaf node to the root node is very convenient.

Fwolf
www.fwolf.com
» reply to this comment
Very happy to see your reply
Posted by shuke on 2004, March 18 - 5:47am.
I am very glad that you read this article so carefully. This article was actually originally published on sitepoint.com. I translated it, hoping to introduce some methods to friends who want to get started. Your method is also very good, I will try it if I have a chance. (If you are interested, why not write your method and specific implementation code as a tutorial based on the above example, so that everyone can imitate it with more practical examples) If you have questions about saving multi-level structures in the database If you are interested in research, here are two other good links that can be used as reference:
Introducing the common four methods of one-time query and array sorting script. I think your script must be better than this.
In addition, I saw that you also use drupal. It also has an advanced feature called a distributed user authentication system. As long as you register on any drupal site, you can log in to access other drupal sites. Quite interesting.
Best wishes!
» reply to this comment
Building trees using loops has been implemented
Posted by guest on 2004, March 25 - 10:10pm.
I have read all the information you provided last time, but to be honest, there is not much new stuff in the first article. Maybe I didn’t understand it very well. The second article was actually written in PHP3, and the program structure was not detailed. See, too many function intersections are used.
It just so happened that I needed to use hierarchical user roles in a system, so I wrote down the traversal based on the idea of an array. I didn’t have time to sort it out, so I’ll put it here for you to take a look at. The database is ADODB, and the program is extracted directly from the system. I hope it can be described clearly. It mainly uses PHP's powerful array operations and uses loops to perform recursion. The comment is a similar method, but the timing of processing the results is different.

<?php
/**
* Show list
* @access public
*/
function DispList()
{
//Display mode without indentation
// $this->mIsDispListIndex = true;
// echo('<a href="?action=new&part=role">Add new role</a> '); _fcksavedurl=""?action=new&part=role ">Add new role</a> ');"
//
// $this->mListTitle = 'User role list';
// $this->SetDataOption('list');
//
// $this->SetQueryTable( array($this->mTableUserRole) );
//
// //Query order
// $this->SetQueryOrder( 'asc', $this->mTableUserRole, 'sequence' );
//
// $this->Query('list');
// parent::DispList();

// //Another display method, using an array as a stack, A: Save the role when pushing on the stack, and delete the source after pushing.
// $this->CheckProperty('mrDb');
// $this->CheckProperty('mrSql');
// $this->mrSql->Select('role, title, parent');
// $this->mrSql->From($this->mTableUserRole);
// $this->mrSql->Orderby('parent, sequence');
// $this->mRs = $this->mrDb->Execute($this->mrSql->Sql());
// if (0 < count($this->mRs))
// {
// $source = & $this->mRs->GetArray(); //Numeric index
// $stack = array(''); //stack
// $stacki = array(-1); //Corresponds to the stack, records the level of the data in the stack in the tree
// $target = array();
// while (0 < count($stack))
// {
// $item = array_shift($stack);
// $lev = array_shift($stacki);
// if (!empty($item))
// {
// //Put the processed data into the target array here
// array_push($target, str_repeat(' ', $lev) . $item);
// //$s1 = str_repeat(' ', $lev) . $item;
// }
// $del = array(); //Node to be deleted from $source
// $ar = array(); //Nodes that need to be added to the stack
// foreach ($source as $key=>$val)
// {
// //Find matching child nodes
// if (empty($item))
// {
// $find = empty($source[$key]['parent']);
// }
//else
// {
// $find = ($item == $source[$key]['parent']);
// }
// if ($find)
// {
// array_unshift($ar, $source[$key]['role']);
// $del[] = $key;
// }
// }
// foreach ($ar as $val)
// {
//array_unshift($stack, $val);
//array_unshift($stacki, $lev + 1);
// }
// foreach ($del as $val)
// {
// unset($source[$val]);
// }
// echo(implode(', ', $stack) . ' ' . implode(', ', $stacki) . ' ' . implode(', ', $target) . ' ');
// }
//debug_array();
// }
//else
// {
// echo('<center>No data retrieved</center>');
// }

//Another display method, using an array as a stack, B: Save the array index when pushing the stack, pop it out of the stack, and then delete the source after use.
$this->CheckProperty('mrDb');
$this->CheckProperty('mrSql');
$this->mrSql->Select('role, title, parent');
$this->mrSql->From($this->mTableUserRole);
$this->mrSql->Orderby('parent, sequence');
$this->mRs = $this->mrDb->Execute($this->mrSql->Sql());
if (!empty($this->mRs) && !$this->mRs->EOF)
{
$source = & $this->mRs->GetArray(); //numeric index
$stack = array(-1); //stack
$stacki = array(-1); //Corresponds to the stack, records the level of the data in the stack in the tree
$target = array();
while (0 < count($stack))
{
$item = array_shift($stack);
$lev = array_shift($stacki);
if (-1 != $item)
{
//Put the processed data into the target array here
$s1 = str_repeat(' ', $lev) . '<a href="?action=disp&part=role&role=' . $source[$item]['role'] . '">' . $source[$item] ['title'] . '</a>';
$s2 = '<a href="?action=edit&part=role&role=' . $source[$item]['role'] . '">Edit</a> <a href="?action=delete&part=role&role= ' . $source[$item]['role'] . '">Delete</a>';
array_push($target, array($s1, $s2));
}
$del = array(); //Node to be deleted from $source
$ar = array(); //Nodes that need to be added to the stack
foreach ($source as $key=>$val)
{
//Find matching child nodes
if (-1 == $item)
{
$find = empty($source[$key]['parent']);
}
else
{
$find = ($source[$item]['role'] == $source[$key]['parent']);
}
if($find)
{
array_unshift($ar, $key);
}
}
foreach ($ar as $val)
{
array_unshift($stack, $val);
array_unshift($stacki, $lev + 1);
}
//Delete from source
unset($source[$item]);
//echo(implode(', ', $stack) . ' ' . implode(', ', $stacki) . ' ' . implode(', ', $target) . ' ');
}
//Output
echo('<a href="?action=new&part=role">Add new role</a> ');
array_unshift($target, array('role', 'operation'));
$this->CheckProperty('mrLt');
$this->mrLt->SetData($target);
$this->mrLt->mListTitle = 'User role list';
$this->mrLt->mIsDispIndex = false;
$this->mrLt->Disp();
}
else
{
echo('<center>No data retrieved</center>');
}
} // end of function DispList
?>