A B-tree is a tree data structure that keeps data sorted and allows searches, insertions, deletions, and sequential access in logarithmic amortized time. The B-tree is a generalization of a binary search tree in that more than two paths diverge from a single node. A B-tree is always balanced. Unlike self-balancing binary search trees, the B-tree is optimized for systems that read and write large blocks of data. It is most commonly used in databases and filesystems.
A B-tree consists of nodes that contain keys where the information is stored, and pointers that point to other nodes in the tree. A specific B-tree may have an order of n; each node may have a maximum of 2n keys; each node may have a minimum of n keys, except for the root node. Keys are always kept sorted in each node. Each node with k keys must always have k + 1 pointers.
To search for a value, we are given a node. If we are starting from the beginning, let the node be the root node. We are now given k keys and k + 1 pointers. These are sorted as follows: 1st pointer, 1st key, 2nd pointer, 2nd key, etc. We find the value. If it's not in the node, we follow the pointer corresponding to the supposed location of the value, which gives us another node. The algorithm is recursively done until we find the value or we reach a dead-end.
We are given a value to insert. We search for the value. If we have found the value and is already in the tree, we would not need to do anything. Otherwise, we would end up with a leaf node where we can supposedly insert the value. If the node is not full then we can just insert the value there (in sorted order) and insertion is complete.
Otherwise, if the node is full, we will split that node into two. A center key is selected. All keys less than that are put in the left node, and all keys greater than that are put in the right node. The center key is promoted to the parent key, along with the two pointers to the split nodes.
If the parent node is already full, that too is split. If the root node would need to be split, a new root node containing the single promoted key is created; the tree's height increases.
We are given a value to delete. If the key is in a leaf node and if after the deletion there are at least n keys in the node, we can safely delete the node.
If the value is not in a leaf node, get a node in a corresponding subtree that is sequentially next to the value. (We can get the leftmost key of the right subtree or the rightmost key of the left subtree) and swap it with the key containing the value. This is done recursively until the key containing the value is in a leaf node, where it can be safely deleted.
If a deletion results in an underflow, the keys will need to be redistributed. We can get keys from a borrowing node, either the left or the right node. If the other node would still have a sufficient number of keys after transferring, just get the key from the other node and promote it to the parent. The key from the parent would then be transferred to the deficient node.
If transferring cannot be done, that is, both nodes are already at a minimum, then the nodes will need to be merged. The values from the two nodes will be merged to a single node, and the parent key would be demoted to that node. The removal of the parent key from its node might trigger another underflow, in which case reconstruction is recursively done. When the root node underflows, the tree shrinks and its height decreases, while the nodes under it combine to form a new root node.