Lab 11: Equality, Comparison, and Java 8

A. Intro

As usual, pull the skeleton code from github.

Learning Goals

In this lab, we'll wrap up the Java-focused portion of the class. You'll learn about two fundamental parts of comparing objects, the .equals() method for equality, and the Comparator and .compareTo() method for ordering. Both techniques will be widely used in the remainder of the class.

Finally, we'll wrap up with some exercises demonstrating Java 8's newest features, notably streams. These may simplify your coding workflow, condense your code, and make it more readable.

B. Equality & Comparison

Object methods & equals()

Take a look at the methods belonging to Object in the Java API. The two that we have been seeing so far are .toString(), which, as the name implies, returns the string representation of an Object and .equals(Object obj), which compares an Object to another.

This is different from the java built-in equality, ==, which compares by value. That is, it compares primitives for equality, but compares object instances by reference. This leads to behavior like below:

String a = "ayy";
String b = "ayy";
System.out.println(a == b);      // false
System.out.println(a.equals(b)); // true

because a and b point to different objects.

When you compare instances, you'll want to use .equals(); in many Java standard library methods, it is used by default. Additionally, if you don't want to incorporate null checking, you can use Objects.equals().

Exercise: Implement equals()

In User.java, we've provided an example of a basic website User model. Override the equals() method with your own. You should use all the instance variables a User has.

Comparison by Comparable

While we haven't explicitly learned about sorting yet, the idea of sorting should be intuitive enough - you have a list of things, and you want to put it in sorted order. While this makes sense immediately for things like ints, which we can compare with primitive operations like < and ==, this becomes less clear for general objects.

Take a look at the Java API for the interface Comparable<T>. A lot of things implement Comparable in the standard library, as it is a very ubiquitous idea. The only thing required by Comparable is the method compareTo(T o), and this imposes and defines an ordering on a class the implements Comparable.

In case you didn't actually look at the Java API, here's the major requirements for compareTo, slightly adapted:

Compares this object with the specified object for order. Returns a negative integer, zero, or a positive integer as this object is less than, equal to, or greater than the specified object, respectively.

There are other requirements that usually just happen naturally with a sensical implementation:

The implementor must also ensure that the relation is transitive: (x.compareTo(y)>0 && y.compareTo(z)>0) implies x.compareTo(z)>0.

It is strongly recommended, but not strictly required that (x.compareTo(y)==0) == (x.equals(y)). Generally speaking, any class that implements the Comparable interface and violates this condition should clearly indicate this fact. The recommended language is "Note: this class has a natural ordering that is inconsistent with equals.

Typically, a class will compare to objects of the same type as itself (although it does not strictly have to). Doing so means data structures that require ordering (like sorted lists, and in the future, search trees) can contain the class.

Exercise: Implement Comparable for Users

Make User implement Comparable. Use the generic parameter to Comparable to ensure that User can only be used to compare against other Users.

Compare Users by their id number. If their id number is the same, compare them on their username.

After implementing this, you should be able to sort Users:

public static void main(String[] args) {
    User[] users = {new User(2, "christine", ""), new User(4, "antares", ""),
    new User(5, "ching", ""), new User(1, "daniel", ""), new User(1, "dan", "")};
    Arrays.sort(users);
    System.out.println(Arrays.toString(users));
}

Output:
[User{id=1, username='dan', email=''}, User{id=1, username='daniel', email=''},
User{id=2, username='christine', email=''}, User{id=4, username='antares', email=''},
User{id=5, username='ching', email=''}]

Note that here we use Arrays.sort because users is an array; if it was a Java Collection like ArrayList, we would use Collections.sort.

C. Higher Order Functions & Comparators

HoFs, Java 7 and previous

If you have seen higher order functions in python, you may remember being able to do something like this:

>>> a = [1, 2, 3]
>>> def f(x):
...     return x ** 2
...
>>> list(map(f, a))
[1, 4, 9]

Here we have a list, a, and a method, f(), and we pass around f as a reference. Specifically, we pass f to the map function, which applies f to every element of a. Thus map is a higher order function - it takes a function as an argument.

To reproduce the same behavior in Java (before Java 8), which only allows us to pass around objects, we would have to define some kind of Function interface. For example, a unary function interface might be represented like so:

public interface UnaryOperator<T> {
    public T apply(T t);
}

and a function that squares integers might be represented like so:

public class SquareFunction implements UnaryOperator<Integer> {
    public Integer apply(Integer n) {
        return n * n;
    }
}

We could write a map function over an array like so:

public class Mapper<T> {
    public static T[] map(UnaryOperator<T> f, T[] arr) {
        T[] ret = Arrays.copyOf(arr, arr.length);
        for (int i = 0; i < ret.length; i++) {
            ret[i] = f.apply(arr[i]);
        }
        return ret;
    }
}

And we could use it like so:

public static void main(String[] args) {
    int[] arr = {1, 2, 3};
    int[] mapped = Mapper.map(new SquareFunction(), arr);
    System.out.println(Arrays.toString(mapped)); // [1, 4, 9]
}

There are some significant downsides to this - for every function you want to use, you have to define a new class, along with writing all super-interfaces for the whole hierarchy. One shortcut you can take is defining an anonymous class (which we will not cover, but feel free to look up).

Application to Comparators

A Comparator<T> is an example of a Java higher-order function for imposing ordering on objects (if they have no natural ordering), and/or overriding their natural ordering as specified by Comparable. Take a look at the Java API on it.

Comparator only requires one method to be implemented - compare(T o1, T o2), which "Returns a negative integer, zero, or a positive integer as the first argument is less than, equal to, or greater than the second." Of course, it should behave transitively & symmetrically, and returning zero should also imply that o1.equals(o2).

Then, we can pass in the Comparator object as the second argument to Arrays.sort (the same works for Collections.sort) and sort on whatever we want. For example, if we wanted to sort integers in reverse order, we can create a ReverseComparator and reverse the result usual Integer comparison.

public class ReverseComparator implements Comparator<Integer> {

    public Integer compare(Integer o1, Integer o2) {
        return -(Integer.compare(o1, o2));
    }

    public static void main(String[] args) {
        int[] arr = {1, 2, 3};
        Arrays.sort(arr, new ReverseComparator());
        // The above line sorts based on ReverseComparator's compare method
        System.out.println(Arrays.toString(arr)); // [3, 2, 1]
    }
}

HoFs in Java 8

Java 8 introduces passing functions around as Function objects (almost the same as passing them around by reference) and using lambda functions (anonymous functions). Now instead of having to create your own custom interfaces, the entire Function hierarchy is implemented in the standard library.

Now, when it comes to writing things like Comparators, it means we no longer have to write a custom class. This is because a Comparator is a functional interface (that is, it only has one abstract method that must be implemented) and can be replaced by any appropriate lambda function or method reference.

Now we can use Arrays.sort like a true higher order function. Let's revisit the example of sorting integers in reverse order. Now we have two more ways to do it:

public class IntSorter {
    public static int reverseCompare(Integer i1, Integer i2) {
        return -(Integer.compare(i1, i2));
    }

    public static void main(String[] args) {
        int[] arr = {1, 2, 3};
        Arrays.sort(arr, IntSorter::reverseCompare);
        System.out.println(Arrays.toString(arr)); // [3, 2, 1]
    }
}

In the above example, we don't use a lambda statement, but instead use the :: notation to denote the reverseCompare method of the IntSorter class. You can also do this for instance methods (without associating it with an object) or with the instance method of a particular object (like instance::method). This is the equivalent of passing the method around by reference.

Note some of the differences between this and the previous Comparator example. Here, we pass in a method reference. IntSorter does not implement the Comparator interface, and we never make a new instance of IntSorter

Next, we'll try doing the same with a lambda method:

public static void main(String[] args) {
    Integer[] arr = {1, 2, 3};
    Arrays.sort(arr, (o1, o2) -> -(Integer.compare(o1, o2)));
    System.out.println(Arrays.toString(arr)); // [3, 2, 1]
}

See how much simpler that was? Notice the syntax here - first the arguments to the method are declared, then an arrow, followed by the return value. The typed parameters are not provided, and the compiler will infer them automatically by their use. Of course, longer and more specific lambda statements are possible, using multiple statements and return statements, and using typed parameters, like so:

Arrays.sort(arr, (Integer o1, Integer o2) -> {
            System.out.println("Do more stuff here");
            return -(Integer.compare(o1, o2));
        });

Exercise: getOrderedBy & Method Generics

Take a look at DBTable.java. It is a simple thin wrapper around a List<T>. You will be adding functionality for a few common operations on a list of things like what we find in databases. In our example, T will always be a User, but your code should generalize to any type.

The first method we will implement is getOrderedBy.

public <R extends Comparable<R>> List<T> getOrderedBy(Function<T, R> getter)

Note: Don't fret over the syntax explained in this paragraph. It is just here to help you better understand the exercise, and you are not expected to reproduce it. The <R extends Comparable<R>> is a method generic, which declares a generic type local to a single method(rather than the entire class), and can be used throughout the method similar to generics in classes. The method generic goes after the visibility keyword (i.e. public) and before the return type declaration(i.e. List<T>). Additionally, there is a bound on the generic type R, stating that only types that extend Comparable<R> will be accepted by the compiler.

The next interface we introduce is Function . See the Javadoc on the package java.util.Function and the Javadoc on the specific interface we are using, Function<T, R>, which describes a unary function from T to R. Like before, our Function only requires a single method be implemented, R apply(T t), which represents the behavior of the Function itself. Functions also have other methods, which are marked default, that can do other cool features, like composing functions through compose (similar to currying in CS61A) and chaining functions through andThen.

getOrderedBy should take in a method reference to a getter method of the T class, and return a new List<T>, ordered on the return value of that getter method. Example usage:

DBTable<User> t = new DBTable<>(Arrays.asList(
        new User(2, "christine", ""),
        new User(4, "antares", ""),
        new User(1, "dan", ""),
        new User(1, "daniel", ""),
        new User(5, "ching", "")
));
List<User> l = t.getOrderedBy(User::getUsername);
System.out.println(l);

Output:
[User{id=4, username='antares', email=''}, User{id=5, username='ching', email=''},
User{id=2, username='christine', email=''}, User{id=1, username='dan', email=''},
User{id=1, username='daniel', email=''}]

Implement getOrderedBy according to specification. You will need to make usage of a custom comparator, without writing Comparator once. Hint: use a lambda statement. Do not mutate DBTable's entries.

D. Streams and Data Manipulation

Introduction

A stream is a sequence of elements supporting sequential and parallel operations on its elements. The syntax looks like this:

List<String> l = Arrays.asList("a", "bd", "bb", "bc", "cc", "d");
l.stream()
    .filter(s -> s.startsWith("b"))
    .map(s -> s + s)
    .skip(1)
    .sorted(String::compareToIgnoreCase)
    .forEach(System.out::println); // returns void

Output:
bbbb
bcbc

Visually, streams create a sequence of operations. This allows for code that is more reasonable and hopefully self-explanatory. If you have seen list comprehension in Python, streams are moderately similar, but implement a strict superset of list comprehension's functionality (i.e they can do more).

Streams are created by either calling the .stream() method after any Collection class or by calling Arrays.stream(T[] array) where T represents a generic type. After that, streams consist of intermediate and terminal operations. Intermediate operations return some kind of stream, so that we can continue to call stream methods on it. filter, map, skip, sorted are all intermediate operations. Terminal operations return a non-stream object or void (like forEach). You can see the full list of operations on the official Javadoc.

Notice how many of these operations take in function references as parameters. It is assumed that these functions do not modify the state of the stream (or the structure the stream originates from) and are typically deterministic (non-random).

The stream itself is one-time-use only - that is, after it is terminated, it cannot be used again.

Working with Streams

It takes a bit of a different mindset to write data manipulation operations in the format of stream operations. One intuitive way to think about streams is as a pipeline of elements that only spits out one element at a time. Each of the intermediate operations modifies the stream in-place without looking at past or future elements, and returns the new stream. A decent analogy would be to compare streams to production lines in a factory. Each piece moves down the line through a sequence of transformations and operations but at each step the worker can complete their task without having to remember the pieces that already moved through or look at the pieces about to reach them.

Part of the motivation for working with streams is this property of independence. Allowing each piece of the collection to be processed independently of the others allows for parallel processing. If we did not use streams and instead used more traditional forms of computing, it would be as if we had our factory, but we only have a single worker who now needs to pick up each part and complete every task on it before picking up the next part. With streams, there are many workers that do not need to communicate with each other and work simultaneously on multiple parts. Also once a worker is done with their one task on a part, it moves directly to the next step in the chain and frees up the worker to start again on the next part. This allows everyone to work at the same time and not have to wait for the previous step.

For this lab, we will not focus on the parallel processing portion and will instead look at the way streams create sequences of discrete tasks to be performed on each element. As we talk about terminal and intermediate operations, almost all common operations over lists, collections, and other general data structures can be represented in terms of these.

The most important operations to think in terms of are the ideas of map, filter, and reduce. These three stream operations are the building blocks of much data manipulation, even to a very large scale. We'll discuss these in detail below.

Finally, there's a lot of operations available, some of which are extremely powerful, and it can be hard to know what the syntax is to use them. One trick is to start the stream by typing out .stream() after a collection object. This creates a stream object from that collection object and then you can type a dot and some prefix to look for which operations you want in the auto-complete menu. IntelliJ will tell you what it takes as parameters, what it returns, and you can infer from the name what it does.

Terminal Operations

Terminal operations convert a stream to something that isn't a stream. Without terminal operations, you cannot use the result of the stream. Often, this is either void or Optional<T>, a container object that may or may not contain a null value. We recommend you read the Javadoc on Stream (or at least take a glance). We'll discuss just a few of the most common and important terminal operations.

Intermediate Operations

Intermediate operations modify a stream in-place, allowing you to continue with stream operations. They all return a stream of some type.

Special Streams

Primitives have their own special streams which the programmer has the option of using. The advantage of these streams is they have additional functionality not present in other streams, like so:

Arrays.stream(new int[] {1, 2, 2, 3, 3, 1, 1, 3})
    .map(n -> n * 2)
    .sequential()
    .distinct()
    .average() // returns java.util.Optional
    .ifPresent(System.out::println)

// 4.66666666667

Of course, these streams also support other interesting terminators and intermediates, like max, min, count, sum, and so on. You can see the Javadoc on IntStream for ideas.

Exercise: getOrderedBy

Rewrite getOrderedBy to only use a single statement (one semicolon only, but can be multiple lines). Utilize streams. Make sure to reference the Java documentation to figure out what steps need to be taken.

Exercise: Fill out DBTable and UserExercises

Fill out the remainder of the DBTable.java and UserExercises, completing each method to specification using only one statement each, utilizing streams to do so. DBTable contains generic operations. UserExercises contains an assortment of very specific queries.

Each of these exercises represents a query someone (like you in the future!) might want to make on a database or list of things. Think functionally by utilizing anonymous & higher order functions. Remember the ways we can operate on our data, and the flexibility in which operations like map, filter, and reduce can be used. Being able to write functional code (no pun intended) effectively and recognizing when it should be used leads to concise, readable, and efficient code.

E. Conclusion

Streams and higher order functions give a powerful and readable way to work with data (in the form of Collections). They provide other benefits too - efficiency in execution and easy parallelization. With the onset of multi-core and multi-threaded computing, parallel computations are becoming a necessity for handling larger workloads. Streams can be made parallel extremely easily and functional programming, due to it's stateless nature, lends itself very naturally to scaling upwards. However, as parallelism is far outside the scope of this class (take CS 61C for more!), we won't explore further in that direction.

Deliverables

Here's a quick recap of what you need to do to complete this lab!

Submit User.java, DBTable.java, and UserExercises.java by pushing to git.

F. Additional Examples

collect(Collector collector)

In the lab, you saw an example of how to use collect in one of the simplest ways. Here, we will cover some of the other useful ways collect and the Collectors class can be used.

We can specify the collection we want to create (in this case an ordered set):

 // Accumulate names into a TreeSet
 TreeSet<String> set = people.stream()
                         .map(Person::getName)
                         .collect(Collectors.toCollection(TreeSet::new));

or just any Set:

 // Accumulate names into a TreeSet
 Set<String> set = people.stream()
                         .map(Person::getName)
                         .collect(Collectors.toSet());

We can do something similar to python's string.join method:

 // Convert elements to strings and concatenate them, separated by commas
 String joined = things.stream()
                       .map(Object::toString)
                       .collect(Collectors.joining(", "));

We can group up things from lists into mappings:

 // Group employees by department
 Map<Department, List<Employee>> byDept =
            employees.stream()
                    .collect(Collectors.groupingBy(Employee::getDepartment));

Collectors have many more powerful uses, which we could talk about spanning twice the length of this lab. We've just listed some of the most commonly used above, but take a look at the javadoc for more ideas of how to use them.

reduce(BinaryOperator<T> accumulator) and Optional<T>

In the lab we introduced the reduce method and mentioned that the first signature we presented, reduce(BinaryOperator<T> accumulator), returns an object of Optional<T> type. Remember that this is essentially just a small wrapper class that allows reduce to return a value if it can and nothing if the stream is empty without causing many errors. Here are some examples of how to make use of Optional<T> type objects.

We can modify a value if it was produced:

int lastTwoDigits = t.entries.stream()
           .reduce(((user, user2) -> {
               if (user.getId() > user2.getId()) {return user;}
               else {return user2;}
           }))
           .map(user -> user.getId() % 10)
           .orElse(0);

or even set the value if it was not produced:

int largestOdd = Arrays.stream(new int[] {2, 22, 8, 14, 22, 6, 12, 18})
           .filter(i -> i % 2 == 1)
           .reduce(((i1, i2) -> {
               if (i1 > i2) {return i1;}
               else {return i2;}
           }))                                            // returns java.util.Optional
           .orElse(0);                                    // Default result if no odd values

Also in this section of the lab we mentioned allMatch(Predicate<T> predicate) and anyMatch(Predicate<T> predicate). A quick example of its uses:

 boolean anyUnderage = Arrays.stream(new int[] {23, 21, 21, 20, 24, 26, 17})
                            .anyMatch(age -> age < 21)           //Would return True in this case because there is a 20 and 17 year old