Lab 11: Equality, Comparison, and Java 8

A. Intro

As usual, pull the skeleton code from github.

Learning Goals

In this lab, we'll wrap up the Java-focused portion of the class. You'll learn about two fundamental parts of comparing objects, the .equals() method for equality, and the Comparator and .compareTo() method for ordering. Both techniques will be widely used in the remainder of the class.

Finally, we'll wrap up with some exercises demonstrating Java 8's newest features, notably streams. These may simplify your coding workflow, condense your code, and make it more readable.

Testing

A JUnit test suite has been provided to verify the correctness of your methods.

B. Equality & Comparison

Object methods & equals()

Take a look at the methods belonging to Object in the Java API. The two that we have been seeing so far are .toString(), which, as the name implies, returns the string representation of an Object and .equals(Object obj), which compares an Object to another.

This is different from the java built-in equality, ==, which compares by value. That is, it compares primitives for equality, but compares object instances by reference. This leads to behavior like below:

String a = "ayy";
String b = "ayy";
System.out.println(a == b);      // false
System.out.println(a.equals(b)); // true

because a and b point to different objects.

When you compare instances, you'll want to use .equals(); in many Java standard library methods, it is used by default. Additionally, if you don't want to incorporate null checking, you can use Objects.equals().

Exercise: Implement equals()

In User.java, we've provided an example of a basic website User model. Override the equals() method with your own.

Comparison by Comparable

While we haven't explicitly learned about sorting yet, the idea of sorting should be intuitive enough - you have a list of things, and you want to put it in sorted order. While this makes sense immediately for things like ints, which we can compare with primitive operations like < and ==, this becomes less clear for general objects.

Take a look at the Java API for the interface Comparable<T>. A lot of things implement Comparable in the standard library, as it is a very ubiquitous idea. The only thing required by Comparable is the method compareTo(T o), and this imposes and defines an ordering on a class the implements Comparable.

In case you didn't actually look at the Java API, here's the major requirements for compareTo, slightly adapted:

Compares this object with the specified object for order. Returns a negative integer, zero, or a positive integer as this object is less than, equal to, or greater than the specified object, respectively.

There are other requirements that usually just happen naturally with a sensical implementation:

The implementor must also ensure that the relation is transitive: (x.compareTo(y)>0 && y.compareTo(z)>0) implies x.compareTo(z)>0.

It is strongly recommended, but not strictly required that (x.compareTo(y)==0) == (x.equals(y)). Generally speaking, any class that implements the Comparable interface and violates this condition should clearly indicate this fact. The recommended language is "Note: this class has a natural ordering that is inconsistent with equals.

Typically, a class will compare to objects of the same type as itself (although it does not strictly have to). Doing so means data structures that require ordering (like sorted lists, and in the future, search trees) can contain the class.

Exercise: Implement Comparable for Users

Make User implement Comparable. Use the generic parameter to Comparable to ensure that User can only be used to compare against other Users.

Compare Users by their id number. If their id number is the same, compare them on their username.

After implementing this, you should be able to sort Users:

public static void main(String[] args) {
    User[] users = {new User(2, "daniel", ""), new User(4, "matt", ""),
            new User(1, "sarahjkim", ""), new User(1, "alanyao", "")};
    Arrays.sort(users);
    System.out.println(Arrays.toString(users));
}

Output:
[User{id=1, username='alanyao', email=''}, User{id=1, username='sarahjkim', email=''}, User{id=2, username='daniel', email=''}, User{id=4, username='matt', email=''}]

Note that here we use Arrays.sort because users is an array; if it was a Java Collection like ArrayList, we would use Collections.sort.

C. Higher Order Functions & Comparators

HoFs, Java 7 and previous

If you learned higher order functions in python, you may remember being able to do something like this:

>>> a = [1, 2, 3]
>>> def f(x):
...     return x ** 2
...
>>> list(map(f, a))
[1, 4, 9]

Here we have a list, a, and a method, f(), and we pass around f as a reference. Specifically, we pass f to the map function, which applies f to every element of a. Thus map is a higher order function - it takes a function as an argument.

To reproduce the same behavior in Java (before Java 8), which only allows us to pass around objects, we would have to define some kind of Function interface. For example, a unary function interface might be represented like so:

public interface UnaryOperator<T> {
    public T apply(T t);
}

and a function that squares integers might be represented like so:

public class SquareFunction implements UnaryOperator<Integer> {
    public Integer apply(Integer n) {
        return n * n;
    }
}

We could write a map function over an array like so:

public class Mapper<T> {
    public static T[] map(UnaryOperator<T> f, T[] arr) {
        T[] ret = Arrays.copyOf(arr, arr.length);
        for (int i = 0; i < ret.length; i++) {
            ret[i] = f.apply(arr[i]);
        }
        return ret;
    }
}

And we could use it like so:

public static void main(String[] args) {
    int[] arr = {1, 2, 3};
    int[] mapped = Mapper.map(new SquareFunction(), arr);
    System.out.println(Arrays.toString(mapped)); // [1, 4, 9]
}

There are some significant downsides to this - for every function you want to use, you have to define a new class, along with writing all super-interfaces for the whole hierarchy. One shortcut you can take is defining an anonymous class, without assigning a name to it, and creating an object of it in one step. You can do so with the following syntax:

public static void main(String[] args) {
    int[] arr = {1, 2, 3};
    int[] mapped = Mapper.map(new UnaryOperator<Integer>() {
        public Integer apply(Integer n) {
            return n * n;
        }
    }, arr);
    System.out.println(Arrays.toString(mapped)); // [1, 4, 9]
}

Application to Comparators

A Comparator<T> is an example of a Java higher-order function for imposing ordering on objects (if they have no natural ordering), and/or overriding their natural ordering as specified by Comparable. Take a look at the Java API on it.

Comparator only requires one method to be implemented - compare(T o1, T o2), which "Returns a negative integer, zero, or a positive integer as the first argument is less than, equal to, or greater than the second." Of course, it should behave transitively & symmetrically, and returning zero should also imply that o1.equals(o2).

Then, we can pass in the Comparator object as the second argument to Arrays.sort (the same works for Collections.sort) and sort on whatever we want. For example, if we wanted to sort integers in reverse order:

public static void main(String[] args) {
    int[] arr = {1, 2, 3};
    Arrays.sort(arr, new Comparator<Integer>() {
        public Integer compare(Integer o1, Integer o2) {
            return -(Integer.compare(o1, o2));
        }
    });
    System.out.println(Arrays.toString(arr)); // [3, 2, 1]
}

We can create a new anonymous comparator and reverse the result of the usual integer comparator.

HoFs in Java 8

Java 8 introduces passing functions around as Function objects (almost the same as passing them around by reference) and using lambda functions (anonymous functions). The entire Function hierarchy is implemented in the standard library.

Immediately, when it comes to writing things like comparators, it means we no longer have to do the clunky anonymous class instantiation, or writing of custom HoF classes. This is because a Comparator is a functional interface (that is, it only has one abstract method that must be implemented) and can be replaced by any appropriate lambda function or method reference.

Now we can use Arrays.sort like a true higher order function. Let's revisit the example of sorting integers in reverse order. Now we have two more ways to do it:

public class IntSorter {
    public static int reverseCompare(Integer i1, Integer i2) {
        return -(Integer.compare(i1, i2));
    }

    public static void main(String[] args) {
        int[] arr = {1, 2, 3};
        Arrays.sort(arr, IntSorter::reverseCompare);
        System.out.println(Arrays.toString(arr)); // [3, 2, 1]
    }
}

In the above example, we don't use a lambda statement, but instead use the :: notation to denote the reverseCompare method of the IntSorter class. You can also do this for instance methods (without associating it with an object) or with the instance method of a particular object (like instance::method()). This is the equivalent of passing the method around by reference. Next, we'll try doing the same with a lambda method:

public static void main(String[] args) {
    int[] arr = {1, 2, 3};
    Arrays.sort(arr, (o1, o2) -> -(Integer.compare(o1, o2)));
    System.out.println(Arrays.toString(arr)); // [3, 2, 1]
}

See how much simpler that was? Notice the syntax here - first the arguments to the method are declared, then an arrow, followed by the return value. Of course, longer and more specific lambda statements are possible, using multiple statments and return statments, and using typed parameters, like so:

    Arrays.sort(arr, (Integer o1, Integer o2) -> {
        System.out.println("Do more stuff here");
        return -(Integer.compare(o1, o2)));
    };

but if the typed parameters are not provided, the compiler will infer them automatically by their use.

Exercise: getOrderedBy & Method Generics

Take a look at DBTable.java. It is a simple thin wrapper around a List<T>. You will be adding functionality for a few common operations on a list of things like what we find in databases. In our example, T will always be a User, but your code should generalize to any type.

The method header for this is:

public <R extends Comparable<R>> List<T> getOrderedBy(Function<T, R> getter)

Notice the new syntax elements here. The first is a method generic: public <R extends Comparable<R>> List<T>. The method generic goes after the visibility keyword (if any) and before the return type declaration, and declares a generic type for the method in question only, and can be used throughout the method like with class generics. Additionally, we provide a bound on the generic type R, stating that only types that extend Comparable<R> will be accepted by the compiler (you can also provide other types of bounds, like ? super R, which requires that the type must have R as a subclass).

The second new interface we introduce is the Function. See the Javadoc on the package java.util.Function and the Javadoc on the specific interface we are using, Function<T, R>, which describes a unary function from T to R. Like before, our Function only requires a single method be implemented, R apply(T t), which represents the behavior of the Function itself. Functions also have other methods, which are marked default, that can do other cool features, like composing functions through compose (similar to currying in CS61A) and chaining functions through andThen.

getOrderedBy should take in a method reference to a getter method of the T class, and return a List<T> ordered on the results of that getter method. Example usage:

DBTable t = new DBTable(Arrays.asList(
        new User(2, "daniel", ""),
        new User(4, "matt", ""),
        new User(1, "sarahjkim", ""),
        new User(1, "alanyao", "")
));
List<User> l = t.getOrderedBy(User::getUsername);
System.out.println(l);

Output:
[User{id=1, username='alanyao', email=''}, User{id=2, username='daniel', email=''}, User{id=4, username='matt', email=''}, User{id=1, username='sarahjkim', email=''}]

Implement getOrderedBy according to specification. You will need to make usage of a custom comparator, preferably without even writing Comparator once through the usage of lambda statements. Do not mutate entries.

D. Streams and Data Manipulation

Introduction

A stream is a sequence of elements supporting sequential (and parallel, but left out of this lab) operations on its elements. The syntax looks like this:

List<String> l = Arrays.asList("a", "bd", "bb", "bc", "cc", "d");
l.stream()
    .filter(s -> s.startsWith("b"))
    .map(s -> s + s)
    .skip(1)
    .sorted(String::compareToIgnoreCase)
    .forEach(System.out::println); // returns void

Output:
bbbb
bcbc

Generally, most of the operations are very readable and self-explanatory. If you have seen list comprehension in Python, streams are moderately similar, but implement a strict superset of list comprehension's functionality.

Streams consist of intermediate and terminal operations. Intermediate operations return some kind of stream, so that we can continue to call stream methods on it. filter, map, skip, sorted are all intermediate operations. Terminal operations return a non-stream object or void (like forEach). You can see the full list of available on the official Javadoc.

Notice how many of these operations take in function references as parameters. It is assumed that these functions do not modify the state of the stream (or the structure the stream originates from) and are typically deterministic (non-random).

The stream itself is one-time-use only - that is, after it is terminated, it cannot be used again.

Working with Streams

It takes a bit of a different mindset to write data manipulation operations in the format of stream operations. One intuitive way to think about streams is as a easily manipulable list. Each of the intermediate operations modifies the stream in-place, and returns the result.

You want to separate out each of the steps you need to make into distinct steps, each of which corresponds to an operation on a stream. As we talk abaout terminal and intermediate operations, almost all common operations over lists, collections, and other general data structures can be represented in terms of these.

The most important operations to think in terms of are the ideas of map, filter, and reduce. These three stream operations are the building blocks of much data manipulation, even to a very large scale. We'll discuss these in detail below.

Finally, there's a lot of operations available, some of which are extremely powerful, and it can be hard to know what the syntax is to use them. One trick is to start the stream by typing out .stream(), then type a dot and some prefix and look for, in the auto-complete menu, which operations you want. IntelliJ will tell you what it takes as parameters, what it returns, and you can infer from the name what it does.

Terminal Operations

Terminal operations convert a stream to something that isn't a stream. Often, this is either void or Optional<T>. In case you haven't actually read through the Javadoc on Streams, I still recommend you do so. We'll discuss a just a few of the most common and important terminal operations.

which is not that different from using an enhanced for loop. However, the true strength shines through when some general operation on strings needs to be done often; for example, a common operation can be passed as a method reference:

l.forEach(this::add); // adds everything to this list

as can be seen in the DBTable::add method.

See the Javadoc for more methods and examples, like these use cases below.

We can specify the collection we want to create (in this case an ordered set):

 // Accumulate names into a TreeSet
 TreeSet<String> set = people.stream()
                         .map(Person::getName)
                         .collect(Collectors.toCollection(TreeSet::new));

or just any Set:

 // Accumulate names into a TreeSet
 Set<String> set = people.stream()
                         .map(Person::getName)
                         .collect(Collectors.toSet());

We can do something similar to python's string.join method:

 // Convert elements to strings and concatenate them, separated by commas
 String joined = things.stream()
                       .map(Object::toString)
                       .collect(Collectors.joining(", "));

We can group up things from lists into mappings:

 // Group employees by department
 Map<Department, List<Employee>> byDept
     = employees.stream()
                .collect(Collectors.groupingBy(Employee::getDepartment));

Collectors have many more powerful uses, which we could talk about spanning twice the length of this lab. We've just listed some of the most commonly used above, but take a look at the javadoc for more ideas of how to use them.

A BinaryOperator<T> takes in two parameters of type T and returns one parameter of type T. This is applied to the first two elements of the stream until the stream becomes a single element. This is a crucial building block of data manipulation.

For example, given a stream of users, we can do something funny like finding the user with the highest id:

t.entries.stream()
        .reduce(((user, user2) -> user.getId() > user2.getId() ? user : user2))
        .ifPresent(System.out::println);

which is the equivalent of calling max() with the right comparator, and is another way of implementing max(). We can also aggregate information, like combining all the fields of users into a single mega-user:

User superuser = t.entries.stream()
        .reduce(new User(0, "", ""),
                ((user, user2) -> new User(user.getId() + user2.getId(),
                        user.getUsername() + user2.getUsername(),
                        user.getEmail() + user2.getEmail())));

One thing to note is that the first way returns Optional<T>, since the stream may be empty, but the second returns T since an identity is given. The identity passed in as the first argument behaves like the first element of the stream.

Take a look at the javadoc for Optional. You can think of it as a container that may or may not contain a single element; you can then define actions to take ifPresent, or to even apply filtering on predicates, mapping a function, or specify default values using orElse.

Reduce is quite powerful, and isn't limited to aggregation or min/max. You can be flexible with your reduction operations and combiners, throw away the result of the reduction, perform some intermediate operations not related to the list, and so on.

Intermediate Operations

Intermediate operations modify a stream in-place, allowing you to continue with stream operations. They all return a stream of some type.

For example, if we just wanted to make a new list of party email users, we could do something like this:

party_users = users.stream()
        .map(u -> new User(u.getUsername(), u.getUsername() + "@party.com")
        .collect(Collectors.toList());

Of course, you don't have to limit yourself to returning the same type - you can change your stream from one type to another:

int num_short_usernames = users.stream()
        .map(u -> u.getUsername()) // convert from Stream<User> to Stream<String>
        .filter(name -> name.length() < 5)
        .count();

One limitation of map is that it cannot increase or decrease the size of the stream. However, a more powerful method, flatMap can; it's like a map, but requires a Function<T, Stream<R>> instead as a parameter. After all the map operations are complete, it merges and closes all the streams into a single stream. For example, you could read all the words from a file:

List<String> words = Files.lines(path, StandardCharsets.UTF_8) //stream of lines
                          .flatMap(line -> Stream.of(line.split("\\s+")) //split on whitespace
                          .collect(Collectors.toList());

You can make the predicate check, or compute anything you want, and even interact with other data structures in your program. Filtering is another crucial building block of data manipulation.

Special Streams

Primitives have their own special streams which the programmer has the option of using. The advantage of these streams is they have additional functionality not present in other streams, like so:

Arrays.stream(new int[] {1, 2, 2, 3, 3, 1, 1, 3})
    .map(n -> n * 2)
    .sequential()
    .distinct()
    .average() // returns java.util.Optional
    .ifPresent(System.out::println)

// 4.66666666667

Of course, these streams also support other interesting terminators and intermediates, like max, min, count, sum, and so on. You can see the Javadoc on IntStream for ideas.

These special streams also support conversion:

List<String> l = Arrays.asList("a3", "b1", "b2", "b4", "c2", "3");
System.out.println(l.stream()        // Find the max of the second part of the strings
        .map(s -> s.substring(1))    // Get only the number at the end
        .mapToInt(Integer::parseInt) // convert stream to intstream
        .max()                       // returns java.util.Optional
        .orElse(-1));                // Default result

with arbitrary mapping methods:

users.stream()
        .mapToInt(u -> u.getEmail().length()) // Map from User to Integer
        .average()
        .ifPresent(System.out::println);

and IntStreams can be converted back:

List<String> l = Arrays.asList(1, 2, 3);
System.out.println(l.stream()
        .mapToDouble() // primitives can also be converted to each other
        .boxed() // convert to Double object stream, not DoubleStream
        //do more here

Exercise: getOrderedBy

Rewrite getOrderedBy to only use a single statement (one semicolon only, but can be multiple lines). Utilize streams. Make sure to reference the Java documentation to figure out what steps need to be taken.

Exercise: Fill out DBTable and UserExercises

Fill out the remainder of the DBTable.java and UserExercises, completing each method to specification using only one statement each, utilizing streams to do so. DBTable contains generic operations. UserExercises contains an assortment of very specific queries.

Each of these exercises represents a query someone (like you in the future!) might want to make on a database or list of things. Think functionally by utilizing anonymomus & higher order functions. Remember the ways we can operate on our data, and the flexibility in which operations like map, filter, and reduce can be used. Being able to write functional code (no pun intended) effectively and recognizing when it should be used leads to concise, readable, and efficient code.

E. Conclusion

Streams and higher order functions give a powerful and readable way to work with data (in the form of Collections). They provide other benefits too - efficiency in execution and easy parallelization. With the onset of multi-core and multi-threaded computing, parallel computations are becoming a necessity for handling larger workloads. Streams can be made parallel extremely easily and functional programming, due to it's stateless nature, lends itself very naturally to scaling upwards. However, as parallelism is far outside the scope of this class, we won't explore further in that direction.

Submission

Submit User.java and DBTable.java by pushing to git.