Source Control

From Coder Merlin

Prerequisites

Background

Introduction

Source control enables us to track and manage changes to our code. This functionality becomes increasingly critical as the size of our projects grows, both in terms of lines of code and the number of coders. Source control shows us who changed the code and when. We're able to compare one revision of code to another and, when necessary, roll back changes to a previous revision. Source control can be a tremendous help to you (and your team) when you need to recover from accidentally damaging a project, and as such gives you the freedom to experiment without fear. However, you must use the source control system regularly and often, or it won't be able to help you. In this experience we'll configure our source control system, learn (a bit) about how to use it, and then place our journals under source control.

Git

There are many options to choose from when selecting a source control system. We'll be using one called Git, created by Linus Torvalds in 2005. It's a distributed version-control system, meaning that every Git directory on every computer is a repository with a complete history and full version-tracking abilities.

Let's say you're a graphic artist, specializing in photo restoration, and working for the director of a museum. You've received an old photo that you're responsible for restoring. Every step requires a lot of work and, being human, sometimes you make a mistake. Even if you don't make a mistake, your director might not agree with the choices that you've made. You (smartly) decide that, just as with every other project you've done, you'll track every version of your file in Git.

Let's consider your progress through this process:

(Figure: Alan-Turing-Enhancement.png)

You commit each version of your project into Git. Git considers each commit to be a snapshot of the current project. Each snapshot includes the current version of every file in your project that you've added to Git, so it becomes a simple matter to move back in time to any version. While you're able to attach a comment to each commit, internally Git uses something called an SHA-1 hash to uniquely identify each commit. The hash is a 40-character hexadecimal string generated from the commit's contents: the files and directory structure, together with metadata such as the author, date, comment, and parent commits. The hash will look something like this: 24b9da6552252987aa493b52f8696cd6d3b00373. You'll see this type of string in the git log.
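To see such a hash for ourselves, here is a minimal sketch that makes one commit in a throwaway repository and prints its identifier. The scratch directory, the file name photo-notes.txt, and the commit comment are made up for this illustration, and the sketch assumes Git's default SHA-1 object format. The hash you see will differ from the one above because it depends on the author, the time, and the content.

```shell
# Work in a scratch directory so nothing else is affected.
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Create one file and record it as the first snapshot.
# The identity is passed inline so the sketch doesn't rely on global settings.
echo "Removed scratches from the upper left corner." > photo-notes.txt
git add photo-notes.txt
git -c user.name="Jane Williams" -c user.email="jane@williams.org" \
    commit -q -m "First restoration pass"

# Ask Git for the full identifier of the commit we just made.
full_hash="$(git rev-parse HEAD)"
echo "$full_hash"

# Count the characters in the identifier.
echo "${#full_hash}"

# The same identifier appears at the top of the log entry.
git log
```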

Helpful Hint
The SHA-1 hash is a bit long and unwieldy to type. In most cases Git will accept the first few characters of a hash, as long as they match only one commit, so you don't need to type the whole thing.
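As a small sketch of this shortcut, the commands below ask Git for an abbreviated hash and then show that Git expands it back to the full one. The scratch repository and the file name notes.txt are assumptions made only for this example.

```shell
# Build a throwaway repository containing a single commit.
repo="$(mktemp -d)"
cd "$repo"
git init -q
echo "A note worth keeping." > notes.txt
git add notes.txt
git -c user.name="Jane Williams" -c user.email="jane@williams.org" \
    commit -q -m "Add notes"

# The full identifier and an abbreviated form chosen by Git.
full_hash="$(git rev-parse HEAD)"
short_hash="$(git rev-parse --short HEAD)"
echo "full:  $full_hash"
echo "short: $short_hash"

# Git resolves the abbreviation back to the same commit.
resolved="$(git rev-parse "$short_hash")"
echo "resolved: $resolved"
```

The abbreviated form can be used wherever a commit is expected, for example with git show.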

Configuration

As a first step, we'll need to tell Git our name and email address. Be sure to substitute your own name and email address in the commands below. You'll need to be able to receive email at the address you specify in order to complete the setup process.

jane-williams@codermerlin:~$ git config --global user.email "jane@williams.org"

jane-williams@codermerlin:~$ git config --global user.name "Jane Williams"
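The --global option stores these settings in a per-user configuration file. If you'd like to experiment without touching your real settings, the following sketch points the same commands at a scratch file with the --file option and reads the values back with --get. The scratch file is created only for this illustration.

```shell
# Scratch configuration file, used here instead of the real per-user file.
cfg="$(mktemp)"

# The same keys as above, written to the scratch file rather than with --global.
git config --file "$cfg" user.name "Jane Williams"
git config --file "$cfg" user.email "jane@williams.org"

# Read the values back to confirm what Git stored.
name="$(git config --file "$cfg" --get user.name)"
email="$(git config --file "$cfg" --get user.email)"
echo "$name <$email>"

# The configuration file is plain text and can be inspected directly.
cat "$cfg"
```

To check your real settings, replace --file "$cfg" with --global, for example git config --global --get user.name.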

Initialization

Let's set up a new directory for all of our experiences and, within it, a directory for this experience:

jane-williams@codermerlin:~$ mkdir Experiences

jane-williams@codermerlin:~$ cd Experiences

jane-williams@codermerlin:~/Experiences$ mkdir W1006

jane-williams@codermerlin:~/Experiences$ cd W1006

jane-williams@codermerlin:~/Experiences/W1006$ 

One can consider the structure of a project, consisting of directories, files, and the associated content, as two-dimensional. The repository can then be considered a three-dimensional structure formed by storing every committed version of the project structure across time. To initialize the repository, we issue the init command in the root of our project:

jane-williams@codermerlin:~/Experiences/W1006$ git init

Initialized empty Git repository in /home/jane-williams/Experiences/W1006/.git/ 

Note that this repository exists locally, alongside your other files for the project. There is no central server repository.

Helpful Hint
Even though we're using git init on an empty directory, there's no requirement that we do so. It's perfectly fine to initialize git in a directory in which we've already begun work.
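Here is a minimal sketch of both points: the repository lives in a .git directory next to the project's files, and git init works in a directory that already contains work. The scratch directory and the file name draft.txt are assumptions made for this example.

```shell
# A scratch project directory that already contains a file.
project="$(mktemp -d)"
cd "$project"
echo "Work begun before Git was involved." > draft.txt

# Initialize the repository in the existing directory.
git init -q

# The repository is the hidden .git directory alongside our files.
ls -a

# Ask Git whether we're inside a working tree and where the repository is.
inside="$(git rev-parse --is-inside-work-tree)"
git_dir="$(git rev-parse --git-dir)"
echo "inside work tree: $inside"
echo "repository directory: $git_dir"
```

The existing file is left untouched. Git simply reports it as untracked until we add it.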

Add a File and Check Status

Let's create a small file that we can add to our project:

jane-williams@codermerlin:~/Experiences/W1006$ echo "This file isn't empty." > file1.txt

Let's find out what Git knows about this file:

jane-williams@codermerlin:~/Experiences/W1006$ git status

...

Untracked files:

  (use "git add <file>..." to include in what will be committed)



  file1.txt

...

Note that "file1.txt" is displayed in red under the title "Untracked files". Git is telling us that this file isn't currently being tracked. If we want to track it we'll need to tell Git to do so:

jane-williams@codermerlin:~/Experiences/W1006$ git add file1.txt

Let's check the status now:

jane-williams@codermerlin:~/Experiences/W1006$ git status

...

Changes to be committed:

  (use "git rm --cached <file>..." to unstage)



  new file: file1.txt

...

Note that "file1.txt" is displayed in green under the title "Changes to be committed". Git is telling us that this file will be included in the next commit.
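The same transition can be seen in a more compact form. This sketch repeats the steps above in a scratch directory (an assumption, so that it can be run anywhere) and uses git status --short, which prints a two-character code in front of each file name: ?? for an untracked file and A for a newly added file that is staged.

```shell
# Repeat the steps above in a scratch repository.
repo="$(mktemp -d)"
cd "$repo"
git init -q
echo "This file isn't empty." > file1.txt

# Before git add: the file is untracked.
before="$(git status --short)"
echo "$before"    # ?? file1.txt

# After git add: the file is staged as a new file.
git add file1.txt
after="$(git status --short)"
echo "$after"     # A  file1.txt
```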

Going Deeper

A Git repository contains a set of commit objects. Each commit object contains:

  • the set of files representing a project at the instant of a particular commit
  • references to parent commit objects
  • a string of characters that uniquely identifies that particular commit object

The very first commit won't have any parents.

Each git repository is essentially a directed graph of commit objects.
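To make the graph structure a little more concrete, here is a hedged sketch that creates two commits in a scratch repository and then inspects them. The helper function snapshot is hypothetical and defined only for this example. It uses git cat-file -p to print a commit object and git rev-list --parents to list a commit followed by its parents.

```shell
# Scratch repository for the experiment.
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Hypothetical helper: commit with an inline identity and the given comment.
snapshot() {
    git -c user.name="Jane Williams" -c user.email="jane@williams.org" \
        commit -q -m "$1"
}

# First commit: it has no parent.
echo "one" > file1.txt
git add file1.txt
snapshot "First commit"
first="$(git rev-parse HEAD)"

# Second commit: its parent is the first commit.
echo "two, with more text" >> file1.txt
git add file1.txt
snapshot "Second commit"
second="$(git rev-parse HEAD)"

# Print the second commit object: tree, parent, author, committer, comment.
git cat-file -p "$second"

# List each commit followed by the hashes of its parents (if any).
first_line="$(git rev-list --parents -n 1 "$first")"
second_line="$(git rev-list --parents -n 1 "$second")"
echo "$first_line"
echo "$second_line"
```

The line for the first commit contains only its own hash, while the line for the second commit ends with the hash of the first. Following these parent references from commit to commit is what forms the directed graph.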

The previous status output mentioned a command to be used to unstage. What's a stage?

Git Sections

There are three states that a file in your project could be in:

  • modified - a tracked file has been modified, but hasn't yet been staged
  • staged - the current version of the file's data has been marked for inclusion in the next commit snapshot
  • committed - the file's data has been safely stored in the repository

Consequently, there are three main sections of a git project:

  • working directory
  • staging area
  • repository

(Figure: Git Sections)
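The three states can be observed by walking a single file through them. This is a minimal sketch in a scratch repository (an assumption, so that it can be run anywhere), using git status --short, where a code of M in the second column means modified but not staged, M in the first column means staged, and no output means everything is committed.

```shell
# Scratch repository with one committed file.
repo="$(mktemp -d)"
cd "$repo"
git init -q
echo "version 1" > file1.txt
git add file1.txt
git -c user.name="Jane Williams" -c user.email="jane@williams.org" \
    commit -q -m "Add file1.txt"

# Modified: the change exists only in the working directory.
echo "version 2, with a small correction" > file1.txt
modified="$(git status --short)"
echo "modified:  '$modified'"

# Staged: the change is marked for inclusion in the next commit.
git add file1.txt
staged="$(git status --short)"
echo "staged:    '$staged'"

# Committed: the change is stored in the repository.
git -c user.name="Jane Williams" -c user.email="jane@williams.org" \
    commit -q -m "Correct file1.txt"
committed="$(git status --short)"
echo "committed: '$committed'"
```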

Key Concepts

Exercises

References