Git and GitHub
A quick introduction
Overview
Questions
What are the basic terms used by version control systems?
Which files are contained within the .git directory?
How do you install Git?
What does the basic collaborative workflow look like?
What are some of the most important git commands?
Objectives
Understand the basic terminology of Git and GitHub
Install and set up Git
Understand the collaborative workflow and commit message etiquette
List some of the most useful commands that can be easily accessed in your everyday work
Version Control Basics
There are various Version Control Systems such as:
A version control system can be either:
centralized - all users connect to a central, master repository
distributed - each user has the entire repository on their computer
Terminology
Version Control System / Source Code Manager
A version control system (or source code manager) is a tool that manages different versions of source code. It helps to create snapshots ("commits") of project files, thereby supporting the traceability of a project.
Repository / repo
A repository is a directory that the version control system tracks; it should contain all the files of your project. Besides your project files, a repository contains (hidden) files that Git uses for configuration purposes. By default, Git considers every file in the repository for tracking. If there are files you do not wish to track, you can list them in a manually created .gitignore file.
Repositories can be located either on a local computer or on the servers of an online version control platform (such as GitHub).
Staging Area / Staging Index / Index
Before committing changes to your project code, the files you want to snapshot need to be added to the Staging Index. Changes to these files can be captured in a commit.
Commit
A commit is a snapshot of the files that are added to the Staging Index. Creating a commit saves a particular version of your project. When committing changes, you should also include a commit message that explains how the project files have changed since the previous commit. Commits therefore track the evolution of your project and allow you to see the changes from one commit to another. They are also useful when experimenting with new code, as Git makes it possible to jump back to a previous commit in case your code changes do not work out as planned.
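The relationship between the Staging Index and a commit can be tried out in a throwaway repository. The sketch below is only an illustration: the file names and the identity passed to git config are placeholders, not part of the original material.

```shell
set -eu
repo=$(mktemp -d)                          # throwaway directory for the demo
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

echo "draft analysis" > analysis.txt       # hypothetical project files
echo "scratch notes" > notes.txt

git add analysis.txt                       # stage only one of the two files
git commit -q -m "Add draft analysis"

committed=$(git ls-tree --name-only HEAD)  # files contained in the snapshot
leftover=$(git status --porcelain)         # what is still outside the commit
echo "committed: $committed"
echo "status:    $leftover"
```

Only the staged file ends up in the snapshot; notes.txt is still reported as untracked (marked with ??) until it is added and committed as well.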
SHA
A SHA ("Secure Hash Algorithm") is an identification number for each commit. It is a 40-character string composed of the characters 0–9 and a–f, such as e2adf8ae3e2e4ed40add75cc44cf9d0a869afeb6.
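To look at the SHA of a commit yourself, git rev-parse can print it in full or abbreviated form. This is a minimal sketch in a temporary repository, assuming Git's default SHA-1 object format; the file name and identity are placeholders.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

echo "hello" > README.md
git add README.md
git commit -q -m "Add README"

full_sha=$(git rev-parse HEAD)             # full identifier of the latest commit
short_sha=$(git rev-parse --short HEAD)    # abbreviated form of the same SHA
echo "full:  $full_sha (${#full_sha} characters)"
echo "short: $short_sha"
```

The short form is a prefix of the full SHA and is usually enough to refer to a commit in everyday commands.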
Branch
A branch is a line of development that diverges from the main line of development. It further allows you to experiment with the code without modifying the line of development in the master branch. When the project development in a branch turns out successful, it can be merged back to the master branch.
Checkout
Checkout allows you to point your working directory to a different commit. Therefore, you can jump to a particular SHA or to a different branch.
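Jumping to an older commit and back can be sketched as follows. The example assumes a temporary repository whose initial branch is named master (set explicitly here, because the default name depends on the Git configuration); file contents and identity are placeholders.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the initial branch master
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

echo "first draft" > file.txt
git add file.txt
git commit -q -m "Add first draft"
first=$(git rev-parse HEAD)                # remember the SHA of the first commit

echo "second draft, revised" > file.txt
git commit -q -am "Revise draft"

git checkout -q "$first"                   # point the working directory at the older commit
old_content=$(cat file.txt)
git checkout -q master                     # return to the latest commit on master
new_content=$(cat file.txt)
echo "at first commit: $old_content"
echo "at master:       $new_content"
```

Checking out a SHA directly leaves you without a current branch (a "detached HEAD"), so it is best used for looking around rather than for new commits.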
.git Directory Contents
The .git directory contains:
config file - it stores the configuration settings
description file - file used by the GitWeb program
hooks directory - client-side or server-side scripts can be placed here to hook into Git's lifecycle events
info directory - contains the global excludes file
objects directory - stores all Git objects, including the commits
refs directory - holds pointers to commits (e.g. branches and tags)
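You can check this list against a freshly initialized repository. The loop below is a small sketch; some entries (description, hooks, info) are copied from Git's template directory, so what is found may differ between installations.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q

# Report which of the entries listed above exist in the new .git directory
for entry in config description hooks info objects refs; do
  if [ -e ".git/$entry" ]; then
    echo "found:   $entry"
  else
    echo "missing: $entry"
  fi
done
```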
Git workflow
Making changes
git status, git init (creates the .git folder, which stores all changes you commit), git add (stages everything except the files listed in .gitignore)
git commit often, to avoid conflicts
a commit message should complete the sentence "If applied, this commit will <your message>" (e.g. git commit -m "change label names" or "update README")
a commit message can also refer to Issues (e.g. "close #3"); issues are like to-dos
git log: view the history of commits you’ve made
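Put together, the commands above might look like the following session. It is a sketch in a temporary repository: the identity is a placeholder and issue #3 is hypothetical, used only to show how a message can reference an issue.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"          # placeholder identity for the demo
git config user.email "user@example.com"

echo "# Project" > README.md
git status --short                           # README.md is listed as untracked
git add README.md
git commit -q -m "Add README"                # "If applied, this commit will add README"

echo "Usage notes" >> README.md
git add README.md
git commit -q -m "Update README, close #3"   # #3 is a hypothetical issue number

history=$(git log --format=%s)               # commit messages, newest first
echo "$history"
```

Reading the log from bottom to top gives the story of the project, which is why short messages in the imperative style are worth the effort.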
Collaborate with others
In practice, it is good to be sure that you have an updated version of the repository you are collaborating on, so you should git pull before making your changes. The basic collaborative workflow would be:
update your local repo with git pull origin master,
make your changes and stage them with git add,
commit your changes with git commit -m, and
upload the changes to GitHub with git push origin master
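The four steps can be rehearsed without a GitHub account by letting a local bare repository play the role of the remote. Everything in this sketch is an assumption made for the demo: the directory names alice and bob, the placeholder identities, and the local remote.git standing in for GitHub.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git                          # local stand-in for the GitHub repo
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/master

# Collaborator A creates the project and publishes it
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice Example"
git config user.email "alice@example.com"
git remote add origin "$work/remote.git"
echo "line from Alice" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git push -q origin master

# Collaborator B clones the repo and follows the four steps
cd "$work"
git clone -q remote.git bob
cd bob
git config user.name "Bob Example"
git config user.email "bob@example.com"
git pull -q origin master                              # 1. update the local repo
echo "line from Bob" >> notes.txt                      # 2. make changes ...
git add notes.txt                                      #    ... and stage them
git commit -q -m "Add second line to notes"            # 3. commit
git push -q origin master                              # 4. upload

# Collaborator A pulls and receives B's commit
cd "$work/alice"
git pull -q origin master
cat notes.txt
```

With a real GitHub repository, only the URL of origin changes; the pull, add, commit, push cycle stays the same.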
It is better to make many commits with smaller changes rather than one commit with massive changes: small commits are easier to read and review.
fork: copies a repository to your own GitHub account (which then serves as your origin) / clone: copies a remote repo to create a local repo / pull: copies changes from a remote repository to a local repository.
.gitignore: lists the files Git should not track (e.g. large files, the hidden .DS_Store files on Mac, or the data in ss-descriptives)
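A small .gitignore can be tested with git status and git check-ignore. The patterns and file names below are illustrative assumptions (the *.csv rule stands in for "large data files"), not the rules of any particular project.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q

printf '%s\n' '.DS_Store' '*.csv' > .gitignore   # illustrative ignore patterns
touch .DS_Store results.csv script.py            # hypothetical project files

visible=$(git status --porcelain)                # files Git offers to track
ignored=$(git check-ignore .DS_Store results.csv)  # paths matched by .gitignore
echo "visible to git:"
echo "$visible"
echo "ignored:"
echo "$ignored"
```

The .gitignore file itself is an ordinary project file, so it is usually committed along with the code.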
Branching, conflicts
Note, if someone pushes a commit to GitHub before you push your changes, you’ll need to integrate those into your code (and test them!) before pushing up to GitHub.
git diff displays differences between commits.
git checkout recovers old versions of files.
git revert backs out a commit.
Branching
git branch creates a new branch
git checkout switches to a different branch
Merging
git merge merges other_branch into the current branch
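A branch, a switch, and a merge fit into a few lines. The sketch below assumes a temporary repository with an initial branch named master; other_branch follows the name used above, and the files and identity are placeholders.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the initial branch master
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "Add base file"

git branch other_branch                    # create a new branch at the current commit
git checkout -q other_branch               # switch to it
echo "experiment" > experiment.txt
git add experiment.txt
git commit -q -m "Add experiment"

git checkout -q master                     # back on master, experiment.txt is absent
git diff --stat master other_branch        # summary of what the branch changed
git merge -q other_branch                  # merge other_branch into the current branch
ls
```

Because master had no new commits of its own, Git simply moves master forward to the tip of other_branch (a "fast-forward" merge) and no conflict can occur.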
Conflicts occur when two or more people change the same file(s) at the same time. The version control system does not allow people to overwrite each other’s changes blindly, but highlights conflicts so that they can be resolved.
Push: You will not be able to push to GitHub if merging your commits into GitHub’s repo would cause a merge conflict. Git will instead report an error, telling you that you need to pull changes first and make sure that your version is “up to date”. Up to date in this case means that you have downloaded and merged all the commits on your local machine, so there is no chance of divergent changes causing a merge conflict when you merge by pushing.
Pull: Whenever you pull changes from GitHub, there may be a merge conflict! These are resolved in the exact same way as when merging local branches: that is, you need to edit the files to resolve the conflict, then add and commit the updated versions.
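The edit, add, commit sequence for resolving a conflict can be practised locally. In this sketch two branches change the same line of a hypothetical settings.txt; the branch name, file contents, and identity are all assumptions made for the demo.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the initial branch master
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

echo "color = red" > settings.txt
git add settings.txt
git commit -q -m "Add settings"

git checkout -q -b other_branch
echo "color = blue" > settings.txt
git commit -q -am "Change color to blue"

git checkout -q master
echo "color = green" > settings.txt
git commit -q -am "Change color to green"

# Both branches changed the same line, so the merge stops and reports a conflict
git merge other_branch >/dev/null 2>&1 || echo "merge stopped with a conflict"
conflicted=$(git diff --name-only --diff-filter=U)   # files that need resolving
echo "conflicted: $conflicted"

echo "color = teal" > settings.txt         # edit the file to the agreed version
git add settings.txt                       # mark the conflict as resolved
git commit -q -m "Merge other_branch, resolve color conflict"
```

Before the manual edit, settings.txt contains both versions separated by conflict markers (<<<<<<<, =======, >>>>>>>); resolving means replacing that block with the content you actually want.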
Git LFS
GitHub recommends repositories remain small, ideally less than 1 GB, and less than 5 GB is strongly recommended. Smaller repositories are faster to clone and easier to work with and maintain. Individual files in a repository are strictly limited to a maximum size of 100 MB, and Git gives a warning when you update files larger than 50 MB. Git Large File Storage (LFS) is a useful Git extension if someone has to version large files (even those as large as a couple of GB in size) with Git. It replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server like GitHub.com or GitHub Enterprise. You can download the extension here: https://git-lfs.github.com/. Further information about installation for Mac/Windows/Linux can be found here: https://docs.github.com/en/github/managing-large-files/installing-git-large-file-storage.
To decide whether you need Git LFS or not, you can run the following in the command line; it lists the files greater than 1 MB in the given folder:
find . -type f -size +1M
Large files make fetching and pulling quite slow, so it is recommended to run git lfs install in a repo with files larger than 1-10 MB (depending on the project) and then git lfs track the given large files. For power users, the following command finds and then tracks all files greater than 1 MB in a given repository, skipping Git's own .git directory:
find . -type f -size +1M ! -path './.git/*' -print0 | xargs -0 git lfs track
LFS tracking modifies the .gitattributes file, so do not forget to commit that as well.
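The size check can be tried safely in a scratch directory before touching a real repository. The sketch below only runs find; the file names are made up, and the Git LFS commands appear as comments because they require the extension to be installed.

```shell
set -eu
dir=$(mktemp -d)
cd "$dir"

# Create a 2 MiB dummy file and a small text file (hypothetical names)
dd if=/dev/zero of=big_dataset.bin bs=1024 count=2048 2>/dev/null
echo "small" > notes.txt

# List regular files larger than 1 MB, skipping anything inside .git
large=$(find . -type f -size +1M ! -path './.git/*')
echo "$large"

# With Git LFS installed, the next steps inside a repository would be:
#   git lfs install
#   git lfs track "big_dataset.bin"
#   git add .gitattributes big_dataset.bin
```

If the list comes back empty, the repository probably does not need Git LFS yet.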
Useful commands
| Code | Short description |
| --- | --- |
| `git init` | Initialize a local Git repository |
| `git status` | Check the status of the Git repository (e.g. the current branch, files to commit) |
| `git add <file>` | Add files to the staging index |
| `git add .` | Add all new and modified files in the current directory to the staging index |
| `git commit -m "Text"` | Commit changes with a commit message |
| `git log` | List commits with SHA, author, date and commit message |
| `git log --oneline` | List commits with short SHA and commit message |
| `git log --stat` | List commits with additional information on the files changed and insertions/deletions |
| `git log -p` | Show detailed information on the lines inserted/deleted in commits |
| `git log -p --stat` | Combine the information from the previous two commands |
| `git log -p -w` | Show detailed information on commits, ignoring whitespace changes |
| `git show` | Show only the last commit |
| `git show <options> <object>` | View expanded details on Git objects |
| `git diff` | See the changes that have not been staged for commit yet |
| `git diff <SHA> <SHA>` | Show the changes between two commits |
| `git tag` | Show existing tags |
| `git tag -a "tagname"` | Tag the current commit |
| `git tag -d "tagname"` | Delete a tag |
| `git tag -a "tagname" "SHA pattern"` | Tag the commit with the given SHA pattern |
| `git branch "name_of_branch" "SHA pattern (optional)"` | Create a new branch, optionally at the given SHA pattern |
| `git branch "name_of_branch" master` | Start a new branch at the latest commit of the master branch |
| `git checkout "name_of_branch"` | Move the pointer to the latest commit of the specified branch |
| `git branch -d "name_of_branch"` | Delete a branch; use -D to force deletion |
| `git checkout -b "name_of_branch"` | Create a branch and check it out in one command |
| `git log --oneline --graph --all` | Show branches in a tree |
| `git merge "name_of_branch_to_merge_in"` | Merge the named branch into the current branch |
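The tag commands from the table can be combined into a short session. This is a sketch in a temporary repository; the tag names, messages, and identity are placeholders, and -m is added so that Git does not open an editor to ask for the tag message.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"              # placeholder identity for the demo
git config user.email "user@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "Add file"
first=$(git rev-parse --short HEAD)              # short SHA of the first commit

echo "v2 with more text" > file.txt
git commit -q -am "Update file"

git tag -a "v0.1" -m "First version" "$first"    # tag an earlier commit by its SHA
git tag -a "v0.2" -m "Second version"            # tag the current commit
git tag                                          # list existing tags
git tag -d "v0.2" >/dev/null                     # delete a tag
remaining=$(git tag)
echo "remaining tags: $remaining"
```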
Useful resources for mastering Git and GitHub:
Technical foundations of informatics book: https://info201.github.io/git-basics.html
Software Carpentry course (strongly recommended): https://swcarpentry.github.io/git-novice/
GitHub Learning Lab: https://lab.github.com/
If you are really committed (pun intended): https://git-scm.com/book/en/v2
Getting started with GitHub: https://help.github.com/en/github/getting-started-with-github
Git cheatsheet: https://education.github.com/git-cheat-sheet-education.pdf
Learn Git with Bitbucket Cloud: https://www.atlassian.com/git/tutorials/learn-git-with-bitbucket-cloud
Useful GUI tools for version control:
Sublime Merge: https://www.sublimemerge.com
Version Control in VS Code: https://code.visualstudio.com/docs/editor/versioncontrol