A. Intro
Download the files for Lab 14 and make an IntelliJ project out of them.
Search in a Collection
An important operation provided by Collection classes is the contains
method: finding whether a given item is in the collection. For certain tasks in which this operation needs to be efficient, we need to choose a good data structure.
Let's take a look at some of the data structures we've already looked at. For each of the questions below, the answer is one of the five options:
- a constant number
- proportional to
k
- proportional to
n
- proportional to
log n
- proportional to
n^2
Question 1
Suppose that n
integers are stored in an IntList
object. How many comparisons are necessary in the worst case to determine if a given integer k
occurs in the sequence?
proportional to n
Question 2
Suppose that n
comparable objects are stored in an ArrayList
. How many comparisons are necessary in the worst case to determine if a given object k
occurs in the list?
proportional to n
Question 3
Suppose that n
integers are stored in increasing order in an IntList
object. How many comparisons are necessary in the worst case to determine if a given object k
occurs in the list?
proportional to n
Question 4
Suppose that n
integers are stored in increasing order in an array. How many comparisons are necessary in the worst case to determine if a given integer k
occurs in the sequence?
proportional to log n
Binary Search
In the last question above, we achieve an improved runtime by employing the well known divide-and-conquer algorithm known as binary search. Used with an array, it assumes that the elements of the array are in increasing order, and executes the following:
- Set
low
to 0 andhigh
to the length of the array, minus 1. The value we're looking for — we'll call itkey
— will be somewhere between positionlow
and positionhigh
if it's in the array. - While
low ≤ high
, do the following: (a) Computemid
, the middle of the range[low,high]
, and see if that'skey
. If so, returnmid
. (b) Otherwise, we can cut the range of possible positions forkey
in half, by settinghigh
tomid-1
or by settinglow
tomid+1
, depending on the result of the comparison. - If the loop terminates with
low > high
, we know thatkey
is not in the array, so we return some indication of failure.
The diagrams below portray a search for the key 25. Elements removed from consideration at each iteration are greyed out.
low = 0, high = 14, mid = 7 |
|
low = 0, high = 6, mid = 3 |
|
low = 4, high = 6, mid = 5 |
|
low = 4, high = 4, mid = 4 |
Since (roughly) half the elements are removed from consideration at each step, the worst-case running time is proportional to the log base 2 of N
, where N
is the number of elements in the array.
Self-test: Binary Search with Linked Lists
Consider applying the binary search algorithm to a sorted doubly linked list (a list where each node has references to both its previous and next node). The variables low
, high
, and med
would then be references to nodes in the list. In fact, binary search does not work in linked lists, and the contains
operation would have to take time proportional to the number of elements in the collection. Which of the steps in the binary search algorithm would would be the bottleneck that causes a decrease in performance in a sorted doubly linked list, relative to a sorted array?
Step 1, because it takes extra time to figure out the length of the linked list
|
Incorrect
|
|
Step 2a, because it takes extra time to check if we have found the key
|
Incorrect
|
|
Step 2a, because it takes extra time to calculate the index of the middle node
|
Incorrect
|
|
Step 2a, because takes extra time to access the middle node
|
Correct
|
B. Introduction to Binary Search Trees
Binary Search Trees
The binary search algorithm suggests a way to organize keys in an explicitly linked tree, as indicated in the diagram below.
The data structure that results is called a binary search tree (BST).
- Let the root value (one of the keys to be stored) be
k
. - Put all the keys that are smaller than
k
into a binary search tree, and let that tree bek
's left subtree. - Put all the keys that are larger than
k
into a binary search tree, and let that tree bek
's right subtree.
(This organization assumes that there are no duplicate keys among those to be stored.)
Below is an example of a binary search tree and two examples of trees that are not binary search trees. Why aren't they BSTs?
BST | nonBST | nonBST |
---|---|---|
Self-test: Identify non-binary search trees
Question 1
What nodes (can be multiple) must be removed from this tree to make it a binary search tree?
43,49
. Both of these are in the left subtree but greater than the root node 42
, and thus violate the constraint of a BST.
Question 2
What nodes (can be multiple) must be removed from this tree to make it a binary search tree?
15
. 15
is less than its parent 19
. Removing 15
and relinking the parent of 30
to 19
would create a BST.
The contains
method
It's important to note that in a Binary Search Tree, each subtree is also a Binary Search Tree. This suggests a recursive or iterative approach for implementing many methods, and the contains
method is no different. In pseudocode, here is an outline of the helper method
private boolean contains (TreeNode t, T key)
Note that the type of key
is T
, which is the generic type of the BinaryTree
class.
- An empty tree cannot contain anything, so if
t
isnull
returnfalse
. - If
key
is equal tot.myItem
, returntrue
. - If
key < t.myItem
,key
must be in the left subtree if it's in the structure at all, so return the result of searching for it in the left subtree. - Otherwise it must be in the right subtree, so return the result of searching for
key
in the right subtree.
This algorithm can go all the way down to a leaf to determine its answer. Thus in the worst case, the number of comparisons is proportional to d
, the depth of the tree. In a balanced tree (more on that later), you can expect the depth of the tree to be proportional to log n
in the worst case.
Use of Comparable objects
Finding a value in the tree will require "less than", "greater than", and "equals" comparisons. Since the operators < and > don't work with objects, we have to use method calls for comparisons.
The Java convention for this situation is to have the values stored in the tree be objects that implement the Comparable
interface. This interface prescribes just one method, int compareTo (Object)
. For Comparable
objects obj1
and obj2
, obj1.compareTo (obj2)
is
- negative if
obj1
is less thanobj2
, - positive if
obj1
is greater thanobj2
, and - zero if the two objects have equal values.
Balance and imbalance
Unfortunately, use of a binary search tree does not guarantee efficient search. For example, the tree
is a binary search tree in which search proceeds in the same runtime as a linked list. We thus are forced to consider the balance of a binary search tree. Informally, a balanced tree has subtrees that are roughly equal in size and depth. Later on in the course, we will encounter specific algorithms for maintaining balance in a binary search tree. Until then, we will work under the possibly unwarranted assumption that we don't need to worry much about balance.
One can prove, incidentally, that search in a BST of N
keys will require only about 2 ln N
comparisons (ln
is the "natural log" of N
) if the keys are inserted in random order. Well-balanced trees are common, and degenerate trees are rare.
Count of maximally imbalanced trees
There are 14 different binary search trees with 4 nodes. How many of the 14 are as deep as possible (i.e. they cause the search algorithm to make 4 comparisons in the worst case)?
(Luckily these bad trees get rarer when N
gets larger.)
8
binary search trees. Starting at the root node, we are constrained to select a single child for this root node, and it can be either to the left or to the right of this root node. Similarly for this child node. So we have 2^3
possibilities.
Insertion into a BST
As you may have noticed in the preceding step, there are a large number of binary search trees that contain a given N
keys. Correspondingly, there are typically a bunch of places in a BST that a key to be inserted might go, anywhere from the root down to a leaf. Given below are two trees; the tree on the right shows one possible way to insert the key 41 into the tree on the left.
To minimize restructuring of the tree and the creation of internal nodes with only one child, we choose in the following exercise to insert a new key only as a new leaf.
C. Implementing a BST
Now it's time to start writing code! As you go, don't forget to write JUnit tests.
Since binary search trees share many of the characteristics of regular binary trees, we can define the BinarySearchTree
class using inheritance from a provided BinaryTree
class.
Exercise 1: Examine BinarySearchTree.java
and BinaryTree.java
Note the following class definition:
public class BinarySearchTree<T extends Comparable<T>> extends BinaryTree<T> ...
This class definition is a slightly more complicated use of generic types than you have seen before in lab. Previously, you saw things like BinaryTree<T>
, which meant that the BinaryTree
class had a generic type T
that could be any class, interface, or abstract class that extends Object
. In this case, BinarySearchTree<T extends Comparable<T>>
means that the BinarySearchTree
class has a generic type T
that can be any class, interface, or abstract class that implements the Comparable<T>
interface. In this case, Comparable<T>
is used because the Comparable
interface itself uses generic types (much like the Iterable
and Iterator
interfaces).
Exercise 2: Implement contains
We use the following method signature:
public boolean contains(T key)
which takes a Comparable
object as an argument and checks whether the tree contains it. (Recall that Comparable
objects provide an int compareTo
method that returns a negative integer if this object is less than the argument, a positive integer if this is greater than the argument, and 0 if the two objects have equal values.)
Depending on whether you take a recursive or iterative approach, you may need to define a helper method.
Exercise 3: Implement add
We use the following method signature:
public void add(T key)
which takes a Comparable
object as an argument and adds it to the tree if and only if it isn't already there.
The trees you create with the add
method will thus not contain any duplicate elements.
Hint: You should be able to do this in a similar way to the contains
method. When you're done with both, you can write a JUnit test suite to test your code. Don't forget edge cases!
Exercise 4: Comparison Counting
How many comparisons between keys are necessary to produce the sample tree shown below? Ignore equality checks.
6 comparisons.
Exercise 5: Extra Challenge (Recommended, but optional)
If you're feeling confident, complete one more exercise. This will be in the BinaryTree
class rather than the BinarySearchTree
class. As a warning, past 61BL students have found this exercise quite difficult.
Write and test a constructor for the BinaryTree
class that, given two ArrayLists
, constructs the corresponding binary tree (NOT necessarily a BST). The first ArrayList
contains the objects in the order they would be enumerated in a preorder traversal; the second ArrayList
contains the objects in the order they would be enumerated in an inorder traversal. For example, given
ArrayLists
["A", "B", "C", "D", "E", "F"]
(preorder) and ["B", "A", "E", "D", "F", "C"]
(inorder), you should initialize the tree as follows:
You may assume every item in the tree is unique when doing this exercise.
D. Deletion from a BST
We've covered insertion in a BST and contains
in a BST. But how about deletion?
When inserting a key, we were allowed to choose where in the tree it should go. The deletion operation doesn't appear to allow us that flexibility; deletion of some keys will be easy (leaves or keys with only one child), but our deletion method has to be able to delete any key from the tree.
Here are a bunch of binary search trees that might result from deleting the root of the tree
Which ones do you think are reasonable?
A Good Way to Delete a Key
The following algorithm for deletion has the advantage of minimizing restructuring and unbalancing of the tree. The method returns the tree that results from the removal.
- Find the node to be removed. We'll call it
remNode
. If it's a leaf, remove it immediately by returningnull
. If it has only one child, remove it by returning the other child node. - Otherwise, remove the inorder successor of
remNode
, copy itsmyItem
intoremNode
, and returnremNode
.
Q: What is an inorder successor?
A: It is just the node that would appear AFTER the remNode
if you were to do an inorder traversal of the tree.
An example is diagrammed below. The node to remove is the node containing 4. It has two children, so it's not an easy node to remove. We locate the node with 4's inorder successor, namely 5. The node containing 5 has no children, so it's easy to delete. We copy the 5 into the node that formerly contained 4, and delete the node that originally contained 5.
before deletion | after deletion |
---|---|
Inorder Successor
Suppose myNode
is the root node in a BST with both a left child and a right child. Will sucNode
, the inorder successor of myNode
, ALWAYS have a null left child?
Yes, always
|
Correct!
|
|
No, only sometimes
|
Incorrect.
|
|
No, never
|
Incorrect.
|
We've implemented a delete
method for you already. Take a look at it and understand how it works.
Animations of BST Algorithms
Here is a cool interactive animation to help you visualize the BST insertion and deletion algorithms. Try inserting a key that ends up as a right child and another key that ends up as a left child. Add some more nodes to the tree, then delete some of them: a node with no children, a node with one child, and a node with two children.
Note that this animation removes from the BST by swapping with the inorder predecessor rather than the inorder successor. Convince yourself this is essentially equivalent.
E. The Engineer's Tradeoff
Consider the problem of finding the k
th largest key in a binary search tree. An obvious way of doing this is to use the inorder enumeration from this week's lab; we could call nextElement
until we get the desired key. If k
is near the number of keys N
in the tree, however, that algorithm's running time is proportional to N
. We'd like to do better. But how?
If you haven't yet noticed, this class is all about tradeoffs - finding the delicate balance between two conflicting factors to perfectly suit a certain task. Choosing an appropriate data structure is one tradeoff: an algorithm that requires quick access to a certain piece of information would perform better on an array, but an algorithm that uses many insert and delete operations would probably do better in a linked list. Sacrificing a shorter running time in exchange for more memory space, or vice versa, is another.
For this problem, we can reduce the runtime of the k
th largest key by storing in each node the size of the tree rooted at that node. Can you design an algorithm using this idea that runs in time proportional to d
, where d
is the depth of the tree?
F. Conclusion
Submission
Please submit BinarySearchTree.java
and BinaryTree.java
.