Basic local Git Workflow
Objectivesđź“Ť
- init
- add
- commit
- status
- diff
- log
Note: For this tutorial you will need a project folder to work on.
Now that we have a basic understanding of what Git is and what it does, we want to try it. In this section we want to make changes to our project and therefore create different versions of our project. To do so, we first need to learn the basic vocabulary of Git.
Git vocabulary
git init
: to give git initial control over your filesgit add
: adds a change from the working directory into the staging areagit commit
: saves a snapshot of the current version of your project in the repositorygit status
: check the status of the filesgit diff
: let’s you inspect differences between versions (in a certain way)git log
: to view the commit history
First steps
The very first step at the beginning of version controlling your files is to tell git to draw attention to this folder. With git init
we give git the power to track our files.
Navigate to your project folder in your command line/ terminal. You can also use VSCode as an IDE to work on your project. There you have an integrated terminal and are always in the folder which you currently have open in VSCode.
When you are in the right directory, type:
git init
git init
However, our files are not tracked yet, for this we have to add and commit the files with a first commit. We can check this by typing
git status
git status
Through the power we gave git by doing a git init
, git always knows exactly what's going on in our folder. git status
will tell us what that is. When using git only locally, we can have three different outputs of git status
:
- untracked files: git knows there are files in our folder but we didn't tell git to actually track them. Either we created a new file and haven' added it yet or we simply don't want this file to be tracked.
- changes not staged for commit: at some point we added this file and git tracks its status. Now we made changes to this file but we haven't added them, yet.
- changes to be committed: these changes we already added to the staging area and will be included in our next commit.
Let's move on. Because we want EVERYTHING in our folder be tracked by git, we simply add and commit everything at once. First we type
git add .
git add
Like we learned in the previous section, git add
allows us to add files individually to the staging area. With the git add .
we added ALL files to the staging area (that's what the .
does after add). If you want to stage your files individually, you simply exchange the .
with the name of the file you want to stage.
Now that we have all our files staged and waiting for the train to version database, we can commit them. We type
git commit
and our text editor opens automatically. This is because with every commit we need to write a commit message to know later what this commit was about.
The nano and vim text editors
In case you configured nano or vim as your default text editor when using git (vim is the default editor on MaRC3a/JupyterHub): Those text editors are command-line editors which means they open directly in the terminal (no new window opens). With everything, you just need to know how to operate it:
nano:
- when the editor opens for the commit message, you need to first press enter
one time to make new line at the top.
- when you've done writing your commit message, you need to save and close it: CTRL X
closes it but it asks you first if you want to save it by typing a capital Y
, then Enter
.
vim:
- when the editor opens you first need to type i
to change from command mode to insert mode
- make a new line at the top by pressing Enter
- write your commit message and then press Esc
and then :wq
to save and exit
Commit message structure
When the text editor opens and asks us for a commit message we have to insert a title and the actual commit message. The first line we write will be the title. The title should be short and expressive. Then we make an empty line and after this we write a longer commit message. This message can be as long as you wish or need it to be.
Because this is our first commit we can simply write a commit message like
"initial commit"
"added all my files of the choice_rtt project to be tracked by git"
Later, we need to be more specific when describing what we changed.
NOTE FOR MAC USERS: Please make sure your text editor really saves files in .txt
and not .rtf
.
Please make sure that your text editor is really closed!!!
Making changes
Now we want to explore a bit the different functions of git. For this we need to make our first change. Open one of your files and make a change. Afterwards, type git status
again. What does it say now?
So, we need to stage our change before we can commit it. We type
git add *path-to-your-file*
because we only want this file to be committed. Why? Because a commit message is bound to the changes we commit and we want to write a useful commit message that says what we changed. We will see later why this is important. Now, commit your change and give it a useful commit message.
git commit
git commit
When you do a git commit, you don't need to specify which file you are committing as every file in the staging area will be committed. If you don't want certain files to be committed together, you need to specify this by your add-commit workflow. Every time you do a git add
, a file comes into the staging area and is included in the next commit.
Example:
You made changes on file1, file2, and file3. An add-commit workflow like this
git add file1
git add file2
git commit
git add file3
git commit
leads to file1 and file2 committed together, file3 separately.
Whereas an add-commit workflow like this
git add file1
git add file2
git add file3
git commit
leads to all three files being committed together.
Again, our text editor opens and asks us for our commit message, so we should do that.
When we do a git status
again now, it should tells us
"nothing to commit, working tree clean"
which basically means that the version in our working directory is the exact same version as the last committed version (remember the structure of the git repository from the previous section).
Task 1
1) Make a change to one of your files. Add and commit it like we just did.
2) Make another change. Don't add it.
3) Create a new file and write something in it.
4) Do a git status
. What does it say? Do what it says needs to be done to get a clean working tree. Try the tip "git commit -m" below for committing.
git commit -m
If you type git commit -m
instead of git commit
, git expects you to write a very short commit message after the m
, like git commit -m "my commit message"
. This short commit message basically replaces the title+message form and the commit message should be short and precise. You do not have more information to this commit than the short commit message. You should always think about when to make a short commit message and when you might need a longer explanation on what you've changed.
Looking at differences
Another great thing of git, besides storing different versions for us, is its ability to show us exactly what we changed. For this we have an extra command called git diff
. Let`s try this.
Make another change to your file, close it, but DON'T commit it for now. We can now look at the difference between the version in our working directory and the one in our version database (= last committed version). Let's type a git diff
. The output looks weird but is actually really helpful, here's an example of a git diff with an explanation below1:
diff --git a/diff_test.txt b/diff_test.txt
index 6b0c6cf..b37e70a 100644
--- a/diff_test.txt
+++ b/diff_test.txt
@@ -1 +1 @@
-this is a git diff test example
+this is a diff example
- the first line just tells us which file is being compared. If you included multiple files in your commit, the differences for each file will be shown individually. So after all the differences for one file, the differences shown for another file start with this line
diff --git name-of-the-file
again. index
shows us the hashes of the files being compared. You're right, we didn't commit our current version so it shouldn't have commit hash, yet. Well, git assigns a hash to it anyway but it's not really a commit hash but you can think of it more like a "working directory hash".- the third number after
index
tells us which kind of file it is.100644
for example stands for an ordinary text file. --- a/filename
and+++ b/filename
: changes froma/diff_test.txt
are marked with a---
and the changes fromb/diff_test.txt
are marked with the+++
symbol- The remaining diff output is a list of diff 'chunks'. A diff only displays the sections of the file that have changes. In our current example, we only have one chunk as we are working with a simple scenario. Chunks have their own granular output semantics
- The first line is the chunk header. Each chunk is prepended by a header enclosed within
@@
symbols. The content of the header is a summary of changes made to the file. In our simplified example, we have -1 +1 meaning line one had changes - In a more realistic diff, you would see a header like
@@ -34,6 +34,8 @@
: In this header example, 6 lines have been extracted starting from line number 34. Additionally, 8 lines have been added starting at line number 34 - The remaining content of the diff chunk displays the recent changes. Each changed line is prepended with a + or - symbol indicating which version of the diff input the changes come from
git diff
By simply typing git diff
you will be shown the difference between the current version in the working directory and the last committed version. However, we can also look at differences between two committed versions or the difference to a staged version. For this we need the assigned commit hashes, which we can find out through a git log
.
The commit history
git log
will show you your commit history. Your commit history is basically a list of all your commits annotated with metadata. One of the git log
metadata are the commit hashes which you will need to search for/ go back to older versions or creating new branches (which we will learn in a sec). To see your commit history, simply type git log
. Here's an example and explanation of what we see:
commit 211a87947765cb34fe922bb39eed8e2357ea5ae9 (HEAD -> main)
Author: julia-pfarr <pfarr@staff.uni-marburg.de>
Date: Thu Feb 8 18:57:23 2024 +0100
change Git intro
- The first line is the commit hash. That's a unique identifier for this commit
- The next two lines are author and date. If you start working collaboratively on a project, you will also see entries from your collaborators
- the next line is the title you gave your commit. If you put in a longer commit message, you would also see this longer commit message.
(HEAD -> main)
:HEAD
is a symbolic reference pointing to wherever you are in your commit history. When you move in your commit history,HEAD
moves with you, like a shadow. In our example,HEAD
is with the latest commit and is pointing tomain
. This means we are on the latest commit that was made on the branchmain
. We will learn in the next section why having theHEAD
reference is very useful.
git log
- to exit
git log
, pressq
-
to only show a concentrated version of the commit history, type
git log --oneline
. It will only have the last 7 digits of the commit hash (which is all you need), the title of the commit, and theHEAD
andmain
info, like this:211a879 (HEAD -> main) change Git intro c3c16d6 finish intro e4a1e75 add online experiment section to psychopy 8644c34 add graphics to PsychoPy dir for testing 7506aa6 delete Remote WSL extension 541c70c reset 0ec1af7 fix image path in psychopy intro
Task 2
Do a git diff
between your second commit and your second to last commit.
Answer
git log --oneline
to find the commit hashes assigned to the respective commitsgit diff HASH_OLDER_COMMIT HASH_NEWER_COMMIT
Here`s an overview on what we did so far2:
Git and binary files
Git is not good at handling binary files. You can still track them using git by adding and comitting them but git can't do a git diff
of binary files. The reason is that git actually reads the files, thus it needs file formats that are text based (like usual code file types such as .py, .m, .R, or text files like .txt, .md, or tables in .csv, .tsv). File types like .docx, .xlsx, .pdf or images like .png, jpeg and so on are binary files which means the content of the files is actually stored in 0s and 1s. Therefore, git diff
is not really working. There are some options how you can still diff binary files with git (see this entry "diffing binary files") but it is just easier not to.
If you have a lot of binary files and large files in your project you might want to think about excluding those files from the diff operations of git and also checking them into Git LFS instead. Because git checks all files every time and needs to come to the conclusion every time that this particular files is binary. Sometimes this can go wrong and git could possibly destroy your binary file. To avoid this you can create a .gitattributes
files (the .
before the file name is important!!) and tell git to track those files with Git LFS instead.
Task 3
Create a .gitattributes
files on the first level in your project folder:
1) make new file .gitattributes
.
2) put the following content:
```
# Source files
# ============
*.pxd text diff=python
*.py text diff=python
*.py3 text diff=python
*.pyw text diff=python
*.pyx text diff=python
*.pyz text diff=python
*.pyi text diff=python
# Binary files
# ============
*.db binary
*.p binary
*.pkl binary
*.pickle binary
*.pyc binary export-ignore
*.pyo binary export-ignore
*.pyd binary
# Jupyter notebook
*.ipynb text eol=lf
# Graphics: when git-lfs activated, exchange 'binary' with: filter=lfs diff=lfs merge=lfs -text
*.png binary
*.jpg binary
*.jpeg binary
*.gif binary
*.tif binary
*.tiff binary
*.ico binary
# SVG treated as text by default.
*.svg text
# If you want to treat it as binary,
# use the following line instead.
# *.svg binary
*.eps binary
# Documents
*.bibtex text diff=bibtex
*.doc binary
*.DOC binary
*.docx binary
*.DOCX binary
*.dot binary
*.DOT binary
*.pdf binary
*.PDF binary
*.rtf diff=astextplain
*.RTF diff=astextplain
*.md text diff=markdown
*.mdx text diff=markdown
*.tex text diff=tex
*.adoc text
*.textile text
*.mustache text
*.csv text eol=crlf
*.tab text
*.tsv text
*.txt text
*.sql text
*.epub diff=astextplain
.gitignore
Another config file of git is .gitignore
. In this file you can state if you want certain files or whole directories excluded from being tracked by git. Sure, you could also just not add them. But then you will always get the message of untracked files and that might annoy or confuse you with time. So, instead you can just list those files or directories in a .gitignore
file.
For example, Mac Users should set the .DS_Store
files to be ignored. You can either do that globally (echo .DS_Store >> ~/.gitignore_global
, then git config --global core.excludesfile ~/.gitignore_global
), or in a .gitignore
file. The .gitignore
file has the same structure as the .gitattributes
file. Templates can be found here.
-
Example and description retrieved from Atlassian Git Tutorial ↩
-
Image by Peer Herholz ↩