Getting Started

This chapter covers what you need to know before you start using Git. We'll begin with some background on version control tools, then get Git running on your system and configure it so that you are ready to work. By the end of the chapter, you'll understand why Git has become so popular and why you should start using it.

Version Control Systems

A Version Control System (VCS) records changes to one or more files over time so that specific versions can be retrieved later.

Local Version Control Systems

Most local version control systems use a database to record the differences between successive updates of a file.

One of the most popular is RCS, which is still found on many computer systems today. RCS works by keeping sets of patches on disk (a patch is the set of differences between two revisions of a file); by applying the patches in sequence, it can reconstruct the contents of any version of the file.
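The patch idea can be tried with the standard diff and patch utilities. This is only a minimal sketch of storing a difference and replaying it, not a description of how RCS lays out its files internally; the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: store a patch between two revisions, then rebuild the newer one.
# File names are made up for illustration; requires diff, patch and cmp.
set -e
workdir=$(mktemp -d)
cd "$workdir"

printf 'line one\n' > v1.txt
printf 'line one\nline two\n' > v2.txt

# Record only the difference between the two revisions.
# diff exits with status 1 when the files differ, so that status is ignored.
diff -u v1.txt v2.txt > v1-to-v2.patch || true

# Rebuild revision 2 from revision 1 plus the stored patch.
cp v1.txt rebuilt.txt
patch -s rebuilt.txt < v1-to-v2.patch

cmp rebuilt.txt v2.txt && echo "revision 2 reconstructed"
```

Only v1.txt and the patch need to be kept; the second revision is recomputed on demand.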

Centralised Version Control Systems

Centralised Version Control Systems (CVCS) were created to solve the problem that local version control systems cannot be shared between collaborators. A CVCS uses a single, centrally managed server that holds all file revisions; collaborators connect to the server through a client to fetch the latest files or commit updates.

Benefits: Compared with a local VCS, everyone can see, to some extent, what everyone else on the project is doing. Administrators can control each developer's permissions, and a CVCS is far easier to administer than a local database on every client.

Disadvantages: The obvious drawback is the single point of failure that the central server represents. If it is down for an hour, nobody can commit updates or collaborate during that hour. If the disk holding the central database is corrupted and no proper backup exists, you lose everything, including the project's entire change history, except for whatever individual snapshots people happen to have on their own machines.

Distributed Version Control Systems

To address the pains of a centralised version control system, the Distributed Version Control System (DVCS) was introduced. In a DVCS, each client does not just check out the latest snapshot of the files; it holds a full mirror of the repository, including its complete history. If the server fails, any client's copy can be used to restore it, so there is no single point of failure and no lost history to stop collaborators from working together.

Further, many of these systems can work with several remote repositories at once. This lets you collaborate with different groups of people in different ways on the same project, and set up collaboration processes, such as hierarchical workflows, that are not possible in a centralised system.

A Brief History of Git

Like many of the great things in life, Git was born in a time of great controversy and innovation.

The Linux kernel is an open source project with a large number of participants. For most of the kernel's early maintenance (1991 to 2002), changes were passed around as patches and archived files, which was tedious work. In 2002, the project began using a proprietary distributed version control system, BitKeeper, to manage and maintain the code.

In 2005, the relationship between the Linux kernel community and the commercial company that developed BitKeeper broke down, and the community's right to use BitKeeper free of charge was revoked. This prompted the Linux development community (and in particular Linus Torvalds, the creator of Linux) to develop their own version control system based on the lessons learned while using BitKeeper. They set a number of goals for the new system:

  • Speed
  • Simple design
  • Strong support for non-linear development (thousands of parallel branches)
  • Fully distributed
  • Ability to handle very large projects like the Linux kernel efficiently (speed and data size)

Since its birth in 2005, Git has evolved and matured to be easy to use while still meeting the goals set at the beginning. It is fast, very efficient with large projects, and has a powerful branching system for non-linear development.

Git Features

So, in short, what kind of system is Git? What are its main features?

Direct snapshotting, not diff comparison

The main difference between Git and other version control systems is the way it treats data: Git records snapshots directly, not diffs. Every time you commit, or save the state of your project, Git takes a snapshot of what all your files look like at that moment and stores a reference to that snapshot. If a file has not changed, Git does not store it again; it just keeps a link to the identical file it has already stored. Git treats its data more like a stream of snapshots.
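One way to observe this is to make two commits in a throwaway repository and compare the object IDs that each snapshot records for a file. The sketch below assumes git is installed; the file names and the identity are placeholders used only for this temporary repository.

```shell
#!/bin/sh
# Sketch: an unchanged file is not stored again in the next snapshot.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity, local to this throwaway repository.
git config user.name "Example User"
git config user.email "user@example.com"

printf 'stable\n' > a.txt
printf 'first\n' > b.txt
git add a.txt b.txt
git commit -q -m "first snapshot"

# Change only b.txt, then commit again.
printf 'second\n' > b.txt
git commit -q -am "second snapshot"

# Ask each snapshot which stored object it uses for each file.
a_before=$(git rev-parse HEAD~1:a.txt)
a_after=$(git rev-parse HEAD:a.txt)
b_before=$(git rev-parse HEAD~1:b.txt)
b_after=$(git rev-parse HEAD:b.txt)

[ "$a_before" = "$a_after" ] && echo "a.txt: same object reused"
[ "$b_before" != "$b_after" ] && echo "b.txt: new object stored"
```

Both commits describe the whole project, yet the unchanged a.txt resolves to the same object ID in each.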

Nearly all operations are performed locally

Because Git is a distributed version control system, most operations need only local files and resources; no information from a central server or another computer on the network is required. As a result, most Git operations seem almost instantaneous, since there is no network latency.

Git ensures data integrity

Everything in Git is checksummed with a SHA-1 hash before it is stored, and is then referred to by that hash value rather than by file name. This means it is impossible to change the contents of any file or directory without Git knowing about it.

# Example SHA-1 hash (40 hexadecimal characters)
24b9da6552252987aa493b52f8696cd6d3b00373
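The link between content and hash can be checked with git hash-object, which prints the ID Git would assign to some content. This is a small sketch assuming git is installed with its default SHA-1 object format; the sample strings are arbitrary.

```shell
#!/bin/sh
# Sketch: the object ID depends only on the content being stored.
# Without the -w option, git hash-object computes the ID but stores nothing.
id_a=$(printf 'hello\n' | git hash-object --stdin)
id_b=$(printf 'hello\n' | git hash-object --stdin)
id_c=$(printf 'hello!\n' | git hash-object --stdin)

echo "first:   $id_a"
echo "repeat:  $id_b"   # identical content, identical ID
echo "changed: $id_c"   # one extra character, different ID
```

Because the ID is derived from the content, any change to a stored file would no longer match the ID Git has on record.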

Git generally only adds data

Nearly every action in Git only adds data to its database, so once a snapshot is committed it is very difficult to make Git do anything that cannot be undone or to make it erase data.

The Three States of Git

A file tracked by Git is always in one of three states: modified, staged, or committed.

  • Modified means that the file has been changed but not yet committed to the database.
  • Staged means that the current version of a modified file has been marked for inclusion in the next commit snapshot.
  • Committed means that the data is safely stored in the local database.

This gives a Git project three main sections: the working directory, the staging area, and the Git directory (the repository).

  • The working directory is a single checkout of one version of the project, extracted from the Git directory and placed on disk for you to use or modify.
  • The staging area (technically called the index) records which changes will go into the next commit.
  • The Git directory is where Git keeps the project's metadata and object database.

So the basic Git workflow is as follows.

  1. Modify files in the working directory (modified).
  2. Selectively stage the changes you want in the next commit, so that only those changes are added to the staging area (staged).
  3. Commit, which takes the files as they are in the staging area and stores that snapshot permanently in the Git directory (committed).
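The three steps above can be followed in a throwaway repository, using git status --short to show which state the file is in after each step. This is a sketch assuming git is installed; notes.txt and the identity are placeholders.

```shell
#!/bin/sh
# Sketch: walk one file through modified -> staged -> committed.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity, local to this throwaway repository.
git config user.name "Example User"
git config user.email "user@example.com"

# Start from a committed file so there is something to modify.
printf 'draft\n' > notes.txt
git add notes.txt
git commit -q -m "add notes"

# 1. Modify the file in the working directory.
printf 'draft\nmore text\n' > notes.txt
state_modified=$(git status --short)    # " M notes.txt": changed, not staged

# 2. Stage the change for the next commit.
git add notes.txt
state_staged=$(git status --short)      # "M  notes.txt": staged

# 3. Commit the staged snapshot.
git commit -q -m "expand notes"
state_committed=$(git status --short)   # empty: nothing left to commit

echo "modified:  [$state_modified]"
echo "staged:    [$state_staged]"
echo "committed: [$state_committed]"
```

In the short status format the first column describes the staging area and the second the working directory, which is why the M moves from the second column to the first after git add.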

Running Git for the First Time

After installing Git, you'll want to customize your Git environment. You only need to do this once on a given machine; the configuration is kept when you upgrade.

Git Configuration

  1. System configuration: applies to every user on the system and all of their repositories.
  2. Global configuration: applies only to the current user, across all of that user's repositories on the system.
  3. Repository (local) configuration: applies only to the current repository; this is the level git config uses by default.

Configuration scopes follow a nearest-wins rule: each level overrides the one before it, so repository settings override global settings, which in turn override system settings. To read or write a specific configuration file, pass the corresponding option to git config.

git config --system
git config --global
git config --local
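The override order can be seen by setting the same key at two levels. The sketch below assumes git is installed and points HOME at a temporary directory so that your real global configuration is never touched; the names are placeholders.

```shell
#!/bin/sh
# Sketch: a repository-level value overrides the global one.
set -e
sandbox=$(mktemp -d)
# Use an empty HOME so --global writes to a throwaway file,
# not to your real ~/.gitconfig.
export HOME="$sandbox/home"
unset XDG_CONFIG_HOME
mkdir -p "$HOME" "$sandbox/repo"
cd "$sandbox/repo"
git init -q

git config --global user.name "Global Name"
seen_before=$(git config user.name)   # only the global value exists

git config --local user.name "Repo Name"
seen_after=$(git config user.name)    # the repository value takes precedence

echo "before local override: $seen_before"
echo "after local override:  $seen_after"
```

Running git config without a level option reads the merged result of all levels, which is why the second lookup returns the repository value.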

User Information

Be sure to set your user name and email address. This matters because every Git commit records this information, and it is permanently written into each commit you create.

git config --global user.name "<user-name>"
git config --global user.email "<email@email.com>"

View configuration

# List all Git configuration values
git config --list

# Show the value of a single configuration key
git config <key>

Get help

git help <verb>
git <verb> --help
man git-<verb>

# Print a concise usage summary
git <verb> -h