Before we begin, please note that this is a group project. You must work in a group of total size 3 or 4. Your code will not be accepted unless it is turned in with a group.
A. Overview of Gitlet
In this project you'll be implementing a version control system. This version control system mimics some of the basic features of the popular version control system git, but it is smaller and simpler, so we have named it gitlet.
A version control system is essentially a backup system for files on your computer. The main functionality that gitlet supports is:
- Saving backups of files. In gitlet, this is called committing.
- Restoring a backup version of a file. In gitlet, this is called checking out that version of the file.
- Viewing the history of your backups. In gitlet, you view this history in something called the log.
The point of a version control system is to help you when coding complicated projects. You save various versions of your project when you think it's working pretty well. If at some later point in time you accidentally mess up your code, then you can look back over the history of your backups, and choose one that you would like to restore.
In gitlet, you don't just commit individual files at a time. Instead, you can commit a bunch of files at the same time. We like to think of each commit as a snapshot of some part of your project at one point in time. However, for simplicity, many of the examples in the remainder of this document involve just committing one file at a time. Just keep in mind you could add in multiple files to each commit.
In this project, it will be helpful for us to visualize the commits we make over time. Suppose we have a file wug.txt, we add some text to it, and commit it. Then we modify the file and commit these changes. Then we modify the file again, and commit the changes again. Now we have saved three total backup versions of this file, each one further in time than the previous. We can visualize these commits like so:
Here we've drawn an arrow indicating that each commit contains some kind of reference to the commit that came before it. We call the commit that came before it the parent commit — this will be important later. But for now, does this drawing look familiar? That's right; it's a linked list!
The big idea behind gitlet is that we can visualize the history of the different versions of our files in a list like this. Then it's easy for us to restore old versions of files. You can imagine making a command like: "Gitlet, please revert to the state of the files at commit #2", and it would go to the second node in the linked list and restore the copies of files found there.
If we tell gitlet to revert to an old commit, the front of the linked list will no longer reflect the current state of your files, which might be a little misleading. In order to fix this problem, we introduce something called the head pointer. The head pointer keeps track of where in the linked list we're currently "at". Normally, as we make commits, the head pointer will stay at the front of the linked list, indicating that the latest commit reflects the current state of the files:
However, let's say we revert to the state of the files at commit #2 (technically, this is the reset command, which you'll see later in the spec). We move the head pointer back to show this:
All right, now, if this were all gitlet could do, it would be a pretty simple system. But gitlet has one more trick up its sleeve: it doesn't just maintain older and newer versions of files, it can maintain differing versions. Imagine you're coding a project, and you have two ideas about how to proceed: let's call one Plan A, and the other Plan B. Gitlet allows you to save both versions, and switch between them at will. Here's what this might look like, in our pictures:
It's not really a linked list anymore. It's more like a tree. We'll call this thing the commit tree. Keeping with this metaphor, each of the separate versions is called a branch of the tree. You can develop each version separately:
Notice there are two pointers into the tree, representing the furthest point of each branch. At any given time, only one of these is the currently active pointer, and this what's called the head pointer. The head pointer is the pointer at the front of the current branch.
That's it for our brief overview of the gitlet system! Don't worry if you don't fully understand it yet; the section above was just to give you a high level picture of what its meant to do. A detailed spec of what you're supposed to do for this project follows this section.
But a last word here: one feature of the commit tree that it is in some sense immutable: once a commit node has been created, it can never be destroyed (or changed at all). We can only add new things to the commit tree, not modify existing things. This is an important feature of gitlet! Remember, it's a version control system, and one of our goals with it is to allow us to save things so we don't delete them accidentally.
B. Detailed Spec of Behavior
Overall Spec
The only structure requirement we’re giving you is that you have a class named Gitlet and that it has a main method. Here’s your skeleton code for this project:
public class Gitlet {
public static void main(String[] args) {
}
}
You may, of course, write additional Java classes to support your project — in fact, please do. But don’t use any external code (aside from JUnit), and don’t use any programming language other than Java. You can use all of the Java Standard Library that you wish.
The majority of this spec will describe how Gitlet.java
's main method must react when it receives various arguments which correspond to commands to the gitlet system. But before we break down command-by-command, here are some overall guidelines the whole project should satisfy:
- Gitlet should never delete files ever, for any reason whatsoever. There are cases where Gitlet may overwrite files, but it should never delete them.
- In order for gitlet to work, it will need a place to store old copies of files (it is a backup system, after all), and other metadata. All of this stuff must be stored in a folder called
.gitlet
(Note: files with a.
in front are hidden files. You will not be able to see them by default on most operating systems). A gitlet system is considered "initialized" in a particular location if it has a.gitlet
folder there. Most gitlet commands (except for the init command) only need to work when used from a directory where a gitlet system has been initialized — i.e. a directory that has a.gitlet
folder. Note: the files that aren't in your.gitlet
folder (i.e. the current versions of the files, not the backups), are referred to as the files in your working directory. - Most commands have runtime or memory usage requirements. You must follow these. Some of the runtimes are described as constant "relative to any significant measure". The significant measures are: any measure of number or size of files, any measure of number of commits. You can ignore time required to serialize or deserialize, with the one caveat that your serialization time cannot depend in any way on the total size of files that have been added, committed, etc (what is serialization? You'll see later in the spec). You can also pretend that getting from a hash table is constant time.
- Some commands have failure cases with a specified error message. The exact formats of these are specified later in the spec. If your program ever encounters one of these failure cases, it should print the error message and not change anything else. You don't need to handle any other error cases except the ones listed as failure cases.
There are some failure cases you need to handle that don't apply to a particular command. Here they are:
- If a user doesn't input any arguments, print the message:
Please enter a command.
- If a user inputs a command that doesn't exist, please print the message:
No command with that name exists.
- If a user doesn't input any arguments, print the message:
- Some of the commands have their differences from the real git listed. The spec is not exhaustive in listing all differences from git, but it does list some of the bigger or potentially confusing and misleading ones.
- Do NOT print out anything except for what the spec says. Some of our autograder tests will break if you print anything more than necessary.
- The spec classifies some commands as "dangerous". Dangerous commands are ones that potentially overwrite files (that aren’t just metadata) — for example, if a user tells gitlet to restore files to older versions, gitlet may overwrite the current versions of the files. Just fyi.
C. The Commands
init
- Usage:
java Gitlet init
- Description: Creates a new gitlet version control system in the current directory. This system will automatically start with one commit: a commit that contains no files and has the commit message
initial commit
. - Runtime: Should be constant relative to any significant measure.
- Failure cases: If there is already a gitlet version control system in the current directory, it should abort. It should NOT overwrite the existing system with a new one. Should print the error message
A gitlet version control system already exists in the current directory.
- Dangerous?: No
- Our line count: ~20
add
- Usage:
java Gitlet add [file name]
- Description: Adds a copy of the file as it currently exists to the staging area (more on this in the description of the
commit
command). Sometimes, for this reason, adding a file is also just called staging the file. The staging area should be somewhere in.gitlet
. If the file had been marked for untracking (more on this in the description of therm
command), thenadd
just unmarks the file, and does not also add it to the staging area. - Runtime: In the worst case, should run in linear time relative to the size of the file being added.
- Failure cases: If the file does not exist, print the error message
File does not exist.
- Dangerous?: No
- Our line count: ~20
commit
- Usage:
java Gitlet commit [message]
Description: Saves a backup of certain files so they can be restored at a later time. The collection of versions of files in a commit is sometimes called the commit's snapshot of files, and the commit is said to be tracking those versions of files. By default, each commit's snapshot of files will be exactly the same as its parent commit's snapshot of files; it will keep versions of files exactly as they are, and not update them. A commit will only update the version of a file it is tracking if that file had been staged at the time of commit, in which case the commit will now include the version of the file that was staged instead of the version it got from its parent. In addition, a commit will start tracking any files that were staged but weren't tracked by its parent.
The bottom line: By default a commit is the same as its parent. Staged files are the updates to the commit.
Some additional points about commit:
- Files that were marked for untracking lose this mark after a commit. Also, the staging area is cleared after a commit. This (and the unstaging command) are absolutely the only times gitlet should ever delete files (sorry, we lied to you just a little bit above when we said never to delete files). Notice that since the staging area is inside the
.gitlet
folder, gitlet will only ever delete files inside.gitlet
. Gitlet should never, ever, ever delete files outside of.gitlet
. - After the commit command, the new commit is added as a new node in the commit tree.
- The commit just made becomes the "current commit", and the head pointer now points to it. The previous head commit is this commit's parent commit.
- Each commit should have a unique integer id number.
- Each commit should remember what time it was made.
- Each commit has a message associated with it that describes the changes to the files in the commit. This is specified by the user. Note: The entire message should take up only one entry in the array
args
that is passed tomain
. To include multiword messages, you'll have to surround them in quotes.
- Files that were marked for untracking lose this mark after a commit. Also, the staging area is cleared after a commit. This (and the unstaging command) are absolutely the only times gitlet should ever delete files (sorry, we lied to you just a little bit above when we said never to delete files). Notice that since the staging area is inside the
- Runtime: Runtime should be constant with respect to any measure of number of commits. Runtime must be no worse than linear with respect to the total size of files the commit is tracking. Additionally, this command has a memory requirement: Committing must increase the size of the
.gitlet
folder by no more than the total size of the staged files at the time of commit, not including additional metadata. This means don’t store redundant copies of versions of files that a commit receives from its parent. One more note: You are allowed to save whole additional copies of files; don't worry about only saving diffs, or anything like that. - Failure cases: If no files have been staged (or marked for untracking: more on that next), aborts. Print the error message
No changes added to the commit.
Also, every commit must have a non-blank message. If it doesn't, print the error messagePlease enter a commit message.
Also, by the way, if there is a file marked for untracking that isn't in the current commit, this is NOT a failure case. Just ignore it and continue with the commit (this can only happen if you callrm
and then move the head pointer before committing). - Dangerous?: No
- Differences from real git: In real git, commits are not associated with any old id number, but a special kind of hash code. Using a hash is actually more powerful than using an arbitrary id number; can you think of why?
- Our line count: ~40
Here's a picture of before-and-after commit:
rm
- Usage:
java Gitlet rm [file name]
- Description: Mark the file for untracking; this means it will not be included in the upcoming commit, even if it was tracked by that commit's parent. If the file had been staged, instead just unstage it, and don't also mark it for untracking.
- Runtime: Should run in constant time relative to any significant measure.
- Failure cases: If the file is neither staged nor tracked by the head commit, print the error message
No reason to remove the file.
- Dangerous?: No
- Differences from real git: Be aware the the real git
rm
command works differently: it will actually delete the file! This command is more similar togit rm --cached [file name]
. - Our line count: ~5
log
- Usage:
java Gitlet log
Description: Starting at the current head commit, display information about each commit backwards along the commit tree until the initial commit. This set of commit nodes is called the commit’s history. For every node in this history, the information it should display is the commit id, the time the commit was made, and the commit message. Here is an example of the exact format it should follow:
=== Commit 2 2015-03-14 11:59:26 A commit message. === Commit 1 2015-03-14 11:49:29 Another commit message. === Commit 0 2015-03-14 11:39:26 initial commit
Notice there is a ===
separating each commit. There is also an empty line between each commit. Also notice that commits are displayed with the most recent at the top. By the way, there's a class in the Java standard library that will help you format the dates really easily. Look into that instead of trying to construct it manually yourself!
- Runtime: Should be linear with respect to the number of nodes in head’s history.
- Failure cases: None
- Dangerous?: No
- Our line count: ~5
Here's a picture of the history of a particular commit. If the current branch's head pointer happened to be pointing to that commit, log would print out information about the circled commits:
Note that it ignores other branches and the future. Now that we have the concept of history, let's refine what we said earlier about the commit tree being immutable. It is immutable precisely in the sense that the history of a commit with a particular id may never change, ever. If you think of the commit tree as nothing more than a collection of histories, then what we're really saying is that each history is immutable.
global-log
- Usage:
java Gitlet global-log
- Description: Like log, except displays information about all commits ever made. The order of the commits does not matter.
- Runtime: Linear with respect to the number of commits ever made.
- Failure cases: None
- Dangerous?: No
- Our line count: ~5
find
- Usage:
java Gitlet find [commit message]
- Description: Prints out the id of the commit that has the given commit message. If there are multiple such commits, it prints the ids out on separate lines.
- Runtime: Should be linear relative to the number of commits that have the given message.
- Failure cases: If no such commit exists, prints the error message
Found no commit with that message.
- Dangerous?: No
- Differences from real git: Doesn't exist in real git. Similar effects can be achieved by grepping the output of log.
- Our line count: ~5
status
- Usage:
java Gitlet status
Description: Displays what branches currently exist, and marks the current branch with a
*
. Also displays what files have been staged or marked for untracking. An example of the exact format it should follow is as follows.=== Branches === *master other-branch === Staged Files === wug.txt some_folder/wugs.txt === Files Marked for Untracking === goodbye.txt
Notice there is an empty line between each section. The order of branches/files within each section does not matter.
- Runtime: Make sure this is linear relative to the number of files that have been staged or marked for untracking and the number of branches that exist.
- Failure cases: None
- Dangerous?: No
- Our line count: ~15
checkout
Checkout is a kind of general command that can do a few different things depending on what its arguments are. There are 3 possible use cases. In each section below, you'll see 3 bullet points. Each corresponds to the respective usage of checkout.
Usages:
java Gitlet checkout [file name]
java Gitlet checkout [commit id] [file name]
java Gitlet checkout [branch name]
Descriptions:
- Takes the version of the file as it exists in the head commit, the front of the current branch, and puts it in the working directory, overwriting the version of the file that's already there if there is one.
- Takes the version of the file as it exists in the commit with the given id, and puts it in the working directory, overwriting the version of the file that's already there if there is one.
- Takes all files in the commit at the head of the given branch, and puts them in the working directory, ovewriting the versions of the files that are already there if they exist. Also, at the end of this command, the given branch will now be considered the current branch.
Runtimes:
- Should be linear relative to the size of the file being checked out.
- Should be linear relative to the size of the file being checked out.
- Should be linear with respect to the total size of the files in the commit's snapshot. Should be constant with respect to any measure involving number of commits. Should be constant with respect to the number of branches.
Failure cases:
- If the file does not exist in the previous commit, aborts, printing the error message
File does not exist in the most recent commit, or no such branch exists.
- If no commit with the given id exists, print
No commit with that id exists.
Else, if the file does not exist in the given commit, printFile does not exist in that commit.
- If no branch with that name exists, print
File does not exist in the most recent commit, or no such branch exists.
If that branch is the current branch, printNo need to checkout the current branch.
- If the file does not exist in the previous commit, aborts, printing the error message
In addition, you might wonder: what happens if you have a file name that's the same as a branch name? In this case, let the branch name take precedence.
- Dangerous?: Yes!
Our line counts:
- ~20
- ~20
- ~20
branch
- Usage:
java Gitlet branch [branch name]
- Description: Creates a new branch with the given name, and points it at the current head node. A branch is nothing more than a name for a pointer to a commit node into the commit tree. Before you ever call branch, your code should be running with a default branch called "master". Note: Does NOT immediately switch to the newly created branch.
- Runtime: Should be constant relative to any significant measure.
- Failure cases: If a branch with the given name already exists, print the error message
A branch with that name already exists.
- Dangerous?: No
- Our line count: ~5
All right, let's see what branch does in detail. Suppose our state looks like this:
Now we call java Gitlet branch cool-beans
. Then we get this:
Hmm... nothing much happened. Let's switch to the branch with java Gitlet checkout cool-beans
:
Nothing much happened again?! Okay, say we make a commit now. Modify some files, then java Gitlet add...
then java Gitlet commit...
.
I was told there would be branching. But all I see is a straight line. What's going on? Maybe I should go back to my other branch with java Gitlet checkout master
:
Now I make a commit...
Phew! So that's the whole idea of branching. Did you catch what's going on? All creating a branch does is give us a new pointer. At any given time, one of these pointers is considered the currently active pointer, or the head pointer (indicated by *). We can switch the currently active head pointer with checkout [branch name]
. Whenever we commit, it means we add a new commit in front of the currently active head pointer, even if one is already there. This naturally creates branching behavior.
Make sure that the behavior of your branch
, checkout
, and commit
match what we've described above. This is pretty core functionality of gitlet that many other commands will depend upon. If any of this core functionality is broken, very many of our autograder tests won't work!
rm-branch
- Usage:
java Gitlet rm-branch [branch name]
- Description: Deletes the branch with the given name. This only means to delete the pointer associated with the branch; it does not mean to delete all commits that were created under the branch, or anything like that.
- Runtime: Should be constant relative to any significant measure.
- Failure cases: If a branch with the given name does not exist, aborts. Print the error message
A branch with that name does not exist.
If you try to remove the branch you're currently on, aborts, printing the error messageCannot remove the current branch.
- Dangerous?: No
- Our line count: ~5
reset
- Usage:
java Gitlet reset [commit id]
- Description: Checks out all the files tracked by the given commit. Also moves the current branch's head to that commit node. See the intro for an example of what happens to the head pointer after using reset.
- Runtime: Should be linear with respect to the total size of files tracked by the given commit’s snapshot. Should be constant with respect to any measure involving number of commits.
- Failure case: If no commit with the given id exists, print
No commit with that id exists.
- Dangerous?: Yes!
- Differences from real git: Just note that this command is closest to using the
--hard
option, i.e.git reset --hard [commit hash]
. - Our line count: ~10
merge
- Usage:
java Gitlet merge [branch name]
Description: Merges files from the given branch into the current branch. This method is a bit complicated, so here’s a more detailed description:
- First consider what might be called the split point of the current branch and the given branch. This is their earliest common ancestor in the commit tree.
- Any files that have been modified in the given branch since the split point, but not modified in the current branch since the split point should be changed to their versions in the given branch (checked out from the commit at the front of the given branch). (Also, this does not apply if the modification is a removal, because you should never delete files from the working directory.) These files should then all be automatically staged (or marked for untracking, if the modification was a removal). To clarify, if a file is "modified in the given branch since the split point" this means the version of the file as it exists in the commit at the front of the given branch has different content than the version of the file at the split point.
- Any files that have been modified in the current branch but not in the given branch since the split point should stay as they are.
- For files that have been modified in both branches since the split point, the files should stay as they are. However, in addition, the version of the file from the given branch should be copied into the working directory with the name [old file name].conflicted.
- Special case: If a file has been untracked between commits in a branch, this is considered a "modification" to the file. However, if it looks like you would get a conflict from this case, simply ignore the conflict, because there is nothing to conflict with.
Once files have been updated according to the above, there are two possibilities:
- If
merge
did not just generate any .conflicted files, then merge should automatically commit with a messageMerged [current branch name] with [given branch name].
. - If
merge
generated at least one .conflicted file, then merge should not automatically make a commit. Instead, it should print the messageEncountered a merge conflict.
, and then put gitlet into a conflicted state. During a conflicted state, some commands are not allowed. The only commands allowed areadd
,rm
,commit
,checkout [file]
,checkout [id] [file]
,log
,global-log
,find
, andstatus
. Conflicted state ends once the user commits. The point of the conflicted state is that the user is supposed to decide which of the two conflicting files they want, stage it if necessary, delete the other one, and then commit, completing the merge. That's what's supposed to happen, but you should just end the conflicting state as soon as the user commits, regardless of what they do. If a user tries to do a command that's not allowed while in a conflicted state, print the error messageCannot do this command until the merge conflict has been resolved.
- If
- Runtime: Should be linear in terms of the lengths of the history of each branch. Should also be linear in terms of the total size of new files added in commits in each branch.
- Failure cases: If a branch with the given name does not exist, print the error message
A branch with that name does not exist.
If attempting to merge a branch with itself, print the error messageCannot merge a branch with itself.
If merge would generate an error because the commit that it does has no changes in it, just let the normal commit error message for this go through. - Dangerous?: Yes!
Differences from real git: There are a few. For one, in git, the new commit at the end of merge is special, because it maintains two back pointers remembering which two branches it came from. But gitlet only needs to maintain one normal back pointer on the current branch.
Furthermore, the real git handles merge conflicts differently than gitlet. The real git will splice the two conflicted files together into a single file, then ask the user to pick and choose the correct sections manually. Gitlet does not do this, instead just adding in the .conflicted copy.
Of final note, git will force the user to resolve the merge conflicts before committing to complete the merge. Gitlet just allows the user to commit to complete the merge whenever they want.
- Our line count: ~70
rebase
- Usage:
java Gitlet rebase [branch name]
Description: Conceptually, what rebase does is find the split point of the current branch and the given branch, then snaps off the current branch at this point, then reattaches the current branch to the head of the given branch. Say we are on branch
branch
and we make the calljava Gitlet rebase master
: Now, this may prompt two questions:- Why would you ever want to do this? You can think of it as an alternative to merge, where instead of having two branches that come together, you just pretend as if one of the branches came after the other one. If you use it smartly, this can create a cleaner history than merge.
- Doesn't this ruin what you said about the commit tree being immutable? Yes, it does! That's because we just lied to you in the picture above. In fact, rebase does not break off the current branch. Instead, it leaves the current branch there, but makes a copy of the current branch on top of the given branch (this is called replaying the branch). Then it moves the branch pointer to point to this copy, so that you can pretend you moved it. Here's the real picture: Note: the replayed commits should have new ids, not copies of the original ids. This allows you to still access the original commits using their old ids, if you really wanted to. In addition, the replayed commits should have new time stamps, allowing you to distinguish them from the originals in
global-log
.
Rebase has one special case to look out for. If the current branch pointer is in the history of the given branch, rebase just moves the current branch to point to the same commit that the given branch points to. No commits are replayed in this case.
There's one more point to make about rebase: If the commit at the front of the given branch has files that have been modified since the split point, these these changes should propagate through the replay. This means, essentially, that the versions of the files in the given branch should take the place of their counterparts in the replayed commits, up until one of the replayed commits has a version of the file that had also been modified since the split point. In this case, what you might expect to happen is that you would get conflicted files, much like merge. However, for simplicity, we're not going to have you deal with conflicts in rebase: in this case, just keep the current branch's copies of the files. The bottom line: A file from the given branch stops propagating through once it meets a modified file in the replayed branch.
Finally, after successfully replaying nodes, reset to the node at the front of the replayed branch.
By the way, if there are multiple branches after the split point, you should NOT replay the other branches. For example, say we are on branch branch1
and we make the call java Gitlet rebase master
:
- Runtime: Should be linear relative to the history of the current brach and the given branch. Should also be linear in terms of the number of files added to both branches. Should also be linear relative to the total size of files added to the given branch. Also, be aware that rebase should not need to make any additional backup copies of files.
- Failure cases: If a branch with the given name does not exist, print the error message
A branch with that name does not exist.
If the given branch name is the same as the current branch name, print the error messageCannot rebase a branch onto itself.
If the input branch's head is in the history of the current branch's head, print the error messageAlready up-to-date.
- Dangerous?: Yes.
- Differences from the real git: The real git's rebase is a complicated and many-flagged command. Gitlet's rebase really only gets at the core idea. In particular, note that the way it handles conflicts is much different! For instance, the real rebase will pause when it encounters a conflict, make the user fix it, and then continue on after.
- Our line count: ~70
D. Miscellaneous Things to Know about the Project
Phew! That was a lot of commands to go over just now. But don't worry, not all commands are created equal. You can see for each command the approximate number of lines we took to do each part (note that this only counts code specific to that command — it doesn't double-count code reused in multiple commands). You shouldn't worry about matching our solution exactly, but hopefully it gives you an idea about the relative time consumed by each command. Notice that merge and rebase are lengthier commands than the others, so don't leave them for the last minute!
Anyway, by now this spec has given you enough information to get working on the project. But to help you out some more, there are a couple of things you should be aware of:
- This project requires reading and writing of files. In order to do these operations, you might find the classes
java.io.File
andjava.nio.file.Files
helpful. Actually, you may find various things in thejava.io
andjava.nio
packages helpful. Feel free to explore other stuff in the Java standard library. If you do a little digging through it, you might find a couple of methods that will make the io portion of this project much easier! One warning: If you find yourself using readers, writers, scanners, or streams, you're making things more complicated than need be. You need to use streams once to serialize and once to deserialize (more on this below), but otherwise you shouldn't need these types of objects at all. If you think about gitlet, you'll notice that you can only run one command every time you run the program. In order to successfully complete your version control system, you'll need to remember the commit tree across commands. This means you'll need to maintain some state that carries across multiple runs of your program. If you sit and think about this, you may realize you haven't ever had to do this so far in the class. It would be helpful if you could save objects created one time you run the program for use the next time you run the program, but so far, objects have always been destroyed when the program finishes.
The strategy we recommend for dealing with this is to write your objects to a file before ending the program. Then, next time you start the program, you first read back in the state of the objects from the file. Luckily, this is very easy to do this in Java. Look into the
java.io.Serializable
interface!A note: serializing and deserializing takes time proportional to the number of objects you are serializing or deserializing. However, in order to reduce the difficulty of the serialization (it's not the main focus of the project), you are allowed to ignore serialization time when thinking about the runtime of your commands. With one caveat: serialization time cannot depend in any way on the total size of files backed up by gitlet (this means, essentially, do not serialize the contents of files).
E. Style
A portion of your grade for this project will be based on the style of your code. Here are the requirements.
All methods must have a Javadoc comment. A Javadoc comment is a comment at the top of the method that starts with
/**
(instead of just/*
). The comment describes what the method does, as well as describes the return value, parameters, and thrown exceptions of the method. It uses a specific@
syntax to communicate this information. Below is an example of a Javadoc comment for a method (the method is complete nonsense. It's just to show you an example of the structure of Javadoc)./** * Reveals the secrets of the human mind. * * @param berko * a string that will be used for very important things * @param gleason * a list that will be used for not very important things * @return true if the sentence below is a lie * @throws TooManyListenersException * if the sentence above is not a lie */ private boolean jeansMethod(String berko, List gleason) throws TooManyListenersException { return false == false; }
Hint: If you start typing
/**
at the top of a method and then hit enter, Eclipse will automatically create a skeleton of the Javadoc for you.- No method may be longer than 60 lines (and most should get nowhere near this big). Break them up into helper methods!
- All code must follow Eclipse's standard indentation conventions (essentially the same as the standard for all Java). This is easy to enforce using Eclipse. Just hit ctrl + shift + f (or command + shift + f), and Eclipse will auto-format your code for you (You can even set this up so that Eclipse will auto-format your code every time you save).
F. Testing
So far in this course we've introduced you to unit testing. These tests test small parts of your code in isolation, such as individual methods. Unit tests, however, may not be sufficient for verifying the correctness of large and complicated systems. Another type of test, end-to-end tests, may be helpful. End-to-end tests test higher level behavior of a system. They don't just test individual methods; they test whether the complete system provides its expected functionality as the user would experience it.
For this project, you'll be required to submit both some unit tests and some end-to-end tests. Please at least provide the following end-to-end tests:
- Test that a file that has been committed at some point can be restored by checking it out from the commit it was committed at.
- Test that a file that has been committed at some point can be restored by checking it out from a commit that tracks that version of the file, even if the file wasn't staged prior to that commit.
- Test that you can
checkout [id] [file]
from an arbitrary commit in the graph (even one in another branch). - Test that resetting backward appropriately changes the output of log.
- Test that log adjusts appropriately when switching from one branch to another.
- Test that merge will generate a .conflicted file if a file has been modified in both branches since the split point.
- Test that merge will commit with files from the other branch if those files had been modified in the other branch but not in the current branch since the split point.
- Test that the output of log after a rebase includes the commit messages from both branches involved in the rebase.
- Test that changes in the base branch propagate through the replayed branch during a rebase.
These are just basic ones we're requiring you to have to get you started. You should of course do more of your own testing to ensure your code is fully correct! You should also provide unit tests where you feel they are more appropriate. Although unit tests are not strictly required, we may use them to give partial credit in case your project catastrophically fails in the autograder.
You can write unit tests just as you always have done in this course: using JUnit. But how do you write end-to-end tests? The point of the end-to-end tests is that they should simulate how the user would interact with gitlet, that is, using the terminal. The most straightforward way to write end-to-end tests would be to use shell scripting — this is essentially just automating terminal commands. However, since you're not required to be familiar with shell scripting for this course, we recommend you to just use JUnit even for your end-to-end testing, although it is a little awkward. JUnit, as its name suggests, was really designed for unit testing.
To help you over this awkwardness, we've written some example end-to-end tests you can look at in GitletTest.java. We recommend you pattern your own after these. Notice they use a helper method called gitlet
, which calls your code by executing a terminal command that calls Java. For end-to-end tests, this is the only way you should interact with your code; you shouldn't call your methods directly.
By the way, you should also try running your code from the command line and use it just like git! Don't only test with JUnit. In addition, if you're using Windows, be sure to test out your code and tests on a unix/linux/mac machine, such as the lab computers. You want to make sure that your code does not only work in a Windows environment, since our autograders will be run in linux.
G. Submission, Grading, and Checkpoints
Submit your project by putting it in a directory called proj2
on the lab machines and typing submit proj2
. In your submission, you should include Gitlet.java
, other .java files you wrote to complete the project, and your unit and end-to-end tests.
About grading: because we gave you complete freedom to organize your code how you want, we can't unit test your code, because we don't know exactly what methods you used. We can only end-to-end test your code. What this means is that we have to test sequences of commands together. Be aware that if any of the core functionality is broken (namely add
, commit
, checkout
, or log
), then many of our tests may break, and you will end up with little points. Make sure they work exactly as the spec describes! Although the fringe functionality is more difficult and time consuming to write (like merge
and rebase
), fewer tests depend on these methods, so they won't impact your grade as much.
In addition, this project has two checkpoints due along the way so we can ascertain you're on the right track. These will be worth a small portion of your final project grade.
For the first checkpoint, you'll have to submit a design document for your project. In the design document, describe all classes you plan to have, and all of their instance variables and methods. In addition, describe the structure of your .gitlet folder — what kinds of things will it contain, and how are they organized? It's okay if you change all of this stuff later — we just want to see that your group has thought through the whole project by this point. Bring a paper copy of the design document to lab.
At the checkpoint, expect your TA to randomly quiz individual group members on how the project works. Your whole group's grade will depend on how each member does, so make sure that every group member understands the whole project.
For the second checkpoint, you must be able to demonstrate for your TA that all of the commands except possibly merge and rebase basically work (this is easiest to do if you have tests that demonstrate this). At the second checkpoint, your TA will again randomly quiz different group members on the project's functionality. Everyone should understand how different parts of the project work, even if they didn't code it themselves.
H. Extra Credit: Going Remote
This project is all about mimicking git’s local features. These are useful because they allow you to backup your own files and maintain multiple versions of them. However, git’s true power is really in its remote features, allowing collaboration with other people over the internet. The point is that both you and your friend could be collaborating on a single code base. If you make changes to the files, you can send them to your friend, and vice versa. And you'll both have access to a shared history of all the changes either of you have made.
This project’s extra point is to implement some basic remote commands: namely add-remote
, rm-remote
, push
, fetch
, pull
, and clone
. You will get 2 extra credits point for completing them. This extra credit will be significantly more challenging than the rest of the project: don't attempt or plan for it until you have completed the rest of the project. In fact, we don't even recommend reading the remainder of the spec until you've completed the rest of the project.
We should also mention we think that 2 extra credits point is clearly not worth the amount of effort it takes to do this section. The extra credit is meant as a fun little additional challenge for students who might be bored with the class so far. We think this extra credit portion is super awesome, but we're not expecting everyone to do it. We should also say that there will be significantly less support from the staff to help you complete the extra credit. If you're doing the extra credit, we expect you to be able to stand on your own a little bit more than most students.
And one more note: You cannot use slip days to complete the extra credit. If you take a slip day, you will automatically get 0 on the extra credit.
Setting Up scp
Okay, have you finished the rest of the project now? Let's go on then. But before describing how the specific remote commands work, let’s first go over the basics of how the commands can interact with the internet.
All of the remote commands should work off scp (or pretend like they do). scp is a terminal command that allows you to copy files back and forth between computers across the internet. Your remote gitlet commands should work on anything that you have scp access to. That means, before beginning coding this part of the project, you should check to make sure you can use scp from the terminal. Note that scp is a unix command; Windows users should probably just use the lab computers for this (it is possible to get this set up on Windows, but it's a bit complicated and probably not worth your time unless you're familiar with this sort of thing).
Anyway, you should have scp access to your user account on the berkeley lab computers. For example, try out a command like:
scp [some file] cs61bl-[xx]@torus.cs.berkeley.edu:[some other file]
Where [xx] is your login, [some file] is the name of a file on your local computer, and [some other file] is what you want the file to be named when you copy it onto the lab computers. torus.cs.berkeley.edu is the name of a computer on campus; you can use other ones too, such as cory.eecs.berkeley.edu.
After you do the scp command, you’ll be prompted for your password to log-in to your account on the lab computer.
Unfortunately, it just won’t do to have to enter your password every time you run gitlet’s remote commands. So, we’re actually going to need to take advantage of scp’s password-less login features.
So let’s revise what we said earlier to: Your remote gitlet commands should work on anything that you have password-less scp access to.
In order to get yourself password-less login to stuff over scp, you’ll want to set up an ssh key.
You can look up guides for setting up password-less ssh online. For example, this guide on github has some instructions on creating an ssh key. Only steps 1 - 3 will be relevant to you, though, because you don't want to add the ssh key to your github account; you want to add your ssh key to your user account on the berkeley lab computers. For instructions specifically about the lab computers, you might want to check out inst's help page here (see the sections SSH Public and Private Keys (passphrases) and Password-less Logins (OpenSSH)), though the instructions aren't as clear as github's are.
You can look up other resources too, if these aren't good enough for you. Keep in mind that setting up scp is not supposed to be the difficult part of this extra credit! If you get stuck, ask questions.
Note: the simplest way to get Java to transfer files over scp is probably just to make Java call terminal commands; though there are more legit ways using scp in Java, you’re not required to use them. You can try if you want, but don't feel compelled. That said, please do not just make Java directly call terminal commands in the other portions of the project; take advantage of the file system abstractions that Java offers.
The Commands
All right, now that you've gotten scp working, onto the rest of the project!
A few notes about the remote commands:
- Runtime will not be graded. For your own edification, please don't do anything ridiculous, though.
- All the commands are significantly simplified from their git equivalents, so specific differences from git are usually not notated. Be aware they are there, however.
- Remote commands might fail for internet connectivity issues or if the scp user name and server aren't legit. It doesn't matter what happens in these cases.
- During any commands, it doesn't matter what happens to the working directory on the remote machine at all. The remote machine is only responsible for saving gitlet state and backups.
So now let's go over the commands:
add-remote
- Usage:
java Gitlet add-remote [remote name] [user name on server] [server] [location on server of .gitlet]
- Description: Saves the given login information under the given remote name. Attempts to push or pull from the given remote name will then attempt to use this scp login information and go to the given location to look for a .gitlet folder.
- Failure cases: If a remote with the given name already exists, print the error message:
A remote with that name already exists.
Note that you don't have to check if the user name and server information are legit. - Dangerous?: No.
rm-remote
- Usage:
java Gitlet rm-remote [remote name]
- Description: Remove information associated with the given remote name. The idea here is that if you ever wanted to change a remote that you added, you would have to first remove it and then re-add it.
- Failure cases: If the given remote name has not been added, print
A remote with that name does not exist.
If a remote with the given name does not exist, print the error message:A remote with that name does not exist.
- Dangerous?: No.
push
- Usage:
java Gitlet push [remote name] [remote branch name]
Description: Attempts to append the current branch's commits to the end of the given branch at the given remote. Details:
This command only works if the remote branch's head is in the history of the current local head, which means that the local branch contains some commits in the future of the remote branch. In this case, append the future commits to the remote branch. Then, the remote should reset to the front of the appended commits (so it's head will be the same as the local head). This is called fast-forwarding.
If the gitlet system on the remote machine exists but does not have the input branch, then simply add the branch to the remote gitlet.
There is one additional use case of push: If there is no gitlet system currently on the remote, push will actually initialize it there. Just copy over the entire gitlet state to the remote machine. Ignore the given branch name in this case.
- Failure cases: If the remote branch's head is not in the history of the current local head, print the error message
Please pull down remote changes before pushing.
- Dangerous?: No.
fetch
- Usage:
java Gitlet fetch [remote name] [remote branch name]
Description: Brings down commits from the remote gitlet into the local gitlet. There are two cases:
- The remote branch head is in the history of the local head. Then there is nothing to do. Please print the error message
Already up-to-date.
- There is some node in the history of the local head (possibly the local head itself) that is the latest commit in the history of the remote head. Call this node the latest common commit. Gitlet should copy the remote commits down as a new branch off of the latest common commit, and give them the branch name
remote/[remote branch name]
. Overwrite this branch (move the pointer) if it already exists.
- The remote branch head is in the history of the local head. Then there is nothing to do. Please print the error message
- Failure cases: If the remote gitlet does not have the given branch name, print the error message
That remote does not have that branch.
- Dangerous? No
pull
- Usage:
java Gitlet pull [remote name] [remote branch name]
- Description: Fetches, then merges the newly created branch with the current local branch.
- Failure cases: Just the failure cases of
fetch
andmerge
together. - Dangerous? Yes!
clone
- Usage:
java Gitlet clone [remote name] [user name on server] [server] [directory on server .gitlet is in]
- Description: This is the only command other than
init
which does not require you to be in a folder with a .gitlet in order to use. This command just copies the .gitlet directory at the remote's address into a new folder named the same as the remote name. Also takes the snapshot of files in the head's commit and puts them in the folder. - Failure cases: None. (No need to check if the information provided is legit. You can just crash in this case.)
- Dangerous?: No, unless there is already a folder on your computer named the same as the remote name. Don’t worry about this.
More About Remotes
There are two final notes to make about the remote commands.
If you think about the remote commands hard enough, you'll realize that using an arbitrary id scheme won't work in the hypothetical scenario where you're collaborating with a friend using a remote. If you and your friend make different commits on each of your local machines that end up with the same id even though they're different commits, then the remote commands will break.
One way real git solves this problem is that each commit's id is generated from its contents and metadata. The id number is not simply an arbitrary id number, but a hash. So, if you and your friend make different commits, they have to end up with different ids.
So to do the remote commands, please change your arbitrary id scheme to a hashing id scheme. The hashes should be determined at least from the commit's message, time, and hash of its parent (if it has one). This means it's okay to get collisions in a scenario where you have two commits that have the same message, time, and history. However, it is imperative you don't get collisions otherwise. To ensure this, look into using an existing hashing algorithm, rather than writing your own. Also, to help you avoid collisions, commit ids can be long hex numbers instead of just integers. Look into java standard library methods for hashing strings.
- In order to be able to push after completing a pull, you have to be able to tell whether you've already pulled down everything there is to pull down. However, the pull scheme uses merge, and because merge only maintains one back pointer, the current branch won't necessarily be able to tell that the remote commit is in its history. You'll have to change that. Modify merge so that it adds multiple back pointers to the commit following a merge (The additional histories won't appear in the log, however). Notice this has two consequences. One, the definition of one node being in another's history has changed: being in the history of a commit means being anywhere at all behind it, which can now branch out. Two, during a fetch, there may actually be mutiple latest common nodes. You'll have to attach the newly pulled down commits at all of the appropriate places.
I. Acknowledgements
Thanks to Alicia Luengo, Josh Hug, Sarah Kim, Austin Chen, Andrew Huang, Yan Zhao, Matthew Chow, especially Alan Yao, Daniel Nguyen, and Armani Ferrante for providing feedback on this project. Thanks to git for being awesome.
This project was largely inspired by this excellent article by Philip Nilsson.
This project was created by Joseph Moghadam.