Learn how Git Hooks work by building your own.

August 6, 2024

A wooden board in the ocean with the text: 'GIT HOOKS'.

Summary

If you have been working with the basic git commands (add, commit, push, pull) for a while, and you are interested in taking git to the next level, this blog post is for you. In this post I will introduce you to Git Hooks, show you how you can write your own hooks, and introduce you to the powerful pre-commit framework.

This post assumes basic git knowledge, and familiarity with terms like Pull Request (PR) and Continuous Integration (CI). Note that the bash commands and Python scripts are executed on a Linux operating system (Ubuntu), so you might need to make (small) code adjustments if you want to follow along.

Why should you care about Git Hooks?

When you write code in a production environment, it is likely you cannot push your code directly to the main remote branch. Instead, you have to create a PR so your colleagues can review your code before it is merged. As part of a PR, a CI pipeline can be triggered, which checks your contribution for any integration errors. In my current project the following steps are part of the CI pipeline:

Lint with pylint
Unit tests with pytest

pylint checks for errors, code smells, and violations of coding and formatting standards, while pytest runs the tests in the associated test folder. The CI pipeline never bothered me until, a few months ago, it was migrated from Jenkins to GitHub Actions. The data-platform team did not control the GitHub runners, which are the machines that execute the CI pipeline steps, and consequently the execution time increased from ~5 minutes to ~25 minutes.

Waiting for 25 minutes is far from ideal, but waiting for 25 minutes to find out that you made a trailing whitespace linter error is really frustrating. To keep my frustration in check, I regularly ran black (a Python code formatter) and pylint locally before committing and pushing code changes. However, in the heat of the moment (imagine a Monday morning with 15+ broken data pipelines and lots of pressure to get everything up and running again) I sometimes forgot to run these precious commands before submitting my changes, resulting in more waiting time and more pressure.

What I needed was a way to automatically run black and pylint before committing my local code changes and pushing the changes to the server. This is where Git Hooks come into play.

A Git Hook is just a script

Git hooks are just scripts stored in the hidden .git/hooks folder. By default the hooks folder contains example shell scripts, but any properly named executable script will work, including a Python script. Hooks exist for operations such as committing and merging, and Git triggers the matching script when that operation runs, but in this post we focus only on pre-commit. As the name implies, the pre-commit script executes before every local commit. If you are looking for more information on the available operations and example scripts, I highly recommend the Pro Git book, which is also available online.

Now that we know we can execute a Python script before we commit, it’s time to start coding.

Create a new project folder and initialize git

cd <your projects folder>
mkdir pre-commit-exploration
cd pre-commit-exploration
git init
ls -la

The ls -la command lists all files and directories, including .git, which is hidden by default. Next, move into the .git/hooks directory and open your IDE to easily inspect the folder (I use VSCode in this example). Every git operation for which a hook is available has a .sample script which includes example code and documentation.

cd .git/hooks
code .

To execute code before committing, you need to create a file named pre-commit, without an extension. Note that I use vim as a text editor, but you can replace this with a text editor of your choice or click on a ‘New file…’ button.

vim pre-commit

As a simple test, enter the following code, replacing the path in the shebang with the path to the Python executable you want to use. If you don’t know where Python is installed, run which python from your terminal.

#!/home/ernst/miniconda3/bin/python

print("Hello, World")

Next, we need to make the pre-commit file executable.

chmod +x pre-commit
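Git skips a hook file that is not executable, so the permission bits are the first thing to check when a hook does not seem to fire. The snippet below is a minimal sketch of such a check. The helper name is_runnable_hook is made up for this illustration and is not part of Git, and the demonstration uses a throwaway folder instead of a real .git/hooks folder.

```python
import os
import stat
import tempfile


def is_runnable_hook(path):
    """Return True if path is a regular file that its owner may execute."""
    mode = os.stat(path).st_mode
    return stat.S_ISREG(mode) and bool(mode & stat.S_IXUSR)


# Demonstration on a temporary folder, not on a real repository.
with tempfile.TemporaryDirectory() as hooks_dir:
    hook = os.path.join(hooks_dir, "pre-commit")
    with open(hook, "w") as handle:
        handle.write("#!/usr/bin/env python3\nprint('Hello, World')\n")

    # A freshly created file has no executable bit on Linux.
    before = is_runnable_hook(hook)

    # Add the executable bit for the owner (comparable to chmod u+x).
    os.chmod(hook, os.stat(hook).st_mode | stat.S_IXUSR)
    after = is_runnable_hook(hook)

print(before, after)
```

Granting only the executable bit is enough for a hook; there is no need to make the file writable for everyone.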

For the final step of this test you need to move back into the pre-commit-exploration folder, create a file, add it to the staging area, and commit the changes.

cd ../..
vim some_file
git add .
git commit -m "test"

If all went well you should see something like this.

Hello, World
[master (root-commit) e5afdbd] test
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 some_file

Although this was a fun exercise, it’s not yet useful. In the next section we implement running black (“the uncompromising Python code formatter”) on modified Python files.
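One detail is worth knowing before writing a more useful hook: Git looks at the exit status of the pre-commit script. Exit status 0 lets the commit proceed, and any other status aborts it (committing with --no-verify skips the hook altogether). The sketch below illustrates the pattern. The function exit_status and the example problem are hypothetical placeholders for whatever check you want to run.

```python
import sys


def exit_status(problems):
    """Report each problem on stderr and return the hook's exit status."""
    for problem in problems:
        print(problem, file=sys.stderr)
    # 0 lets the commit go through, anything else aborts it.
    return 1 if problems else 0


# Hypothetical result of a check on the staged files.
problems = ["hello.py: trailing whitespace"]
status = exit_status(problems)

# A real pre-commit script would end with: sys.exit(status)
```

The Hello World script above ends without an error, so its exit status is 0 and the commit always goes through.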

Automatically format modified Python files before every commit

The goal of this section is to modify the pre-commit script, so that it runs the black code formatter on modified Python files. Therefore, let’s start by adding a few empty files to the pre-commit-exploration folder.

vim example.md
vim hello.py
vim test.py

To retrieve the files in the staging area we need a more complex git command, which is copied directly from chapter 8.4 of the Pro Git book. If you want more details, search that chapter for the command.

command = "git diff-index --cached --name-only HEAD"

Although we could directly execute this command by changing the shebang to the bash executable, I prefer to use Python since it’s more readable, especially as the script gets longer. To run commands from a Python file, I use the subprocess module from the standard Python library. The module provides a great way to automate tasks that involve running commands from the terminal.

Combining the git diff-index command and the subprocess module results in the following code, which replaces the print("Hello, World") line in the pre-commit script.

import subprocess

command = "git diff-index --cached --name-only HEAD"
subprocess_response = subprocess.run(
    command,
    shell=True,
    stdout=subprocess.PIPE,
    text=True
)
changed_files = subprocess_response.stdout.split()
print(changed_files)

The Python code sends the command to the shell, captures the command's standard output, and decodes it as text instead of bytes (the default). Next, we need to add and commit the three new files.

git add .
git commit -m "Added three new empty files."

If all went well, you should see the following output.

['example.md', 'hello.py', 'test.py']
[master 1aaed58] Added three new empty files.
 3 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 example.md
 create mode 100644 hello.py
 create mode 100644 test.py
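Splitting the output on whitespace breaks for file names that contain spaces, and the list also includes staged deletions, which a formatter cannot open. A possible variation, shown here as a sketch and not as the approach from the Pro Git book, asks Git for NUL-separated names and only for added, copied, or modified files. The function names parse_nul_separated and staged_files are made up for this example.

```python
import subprocess


def parse_nul_separated(output):
    """Split NUL-separated git output into a list of file names."""
    return [name for name in output.split("\0") if name]


def staged_files():
    """Return staged files that were added, copied, or modified."""
    response = subprocess.run(
        # -z separates names with NUL bytes; --diff-filter=ACM skips deletions.
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM", "-z"],
        stdout=subprocess.PIPE,
        text=True,
        check=True,  # raise an error if git itself fails
    )
    return parse_nul_separated(response.stdout)


# The parser can be tried without a repository.
print(parse_nul_separated("example.md\0my script.py\0"))
# → ['example.md', 'my script.py']
```

Passing the command as a list also means no shell is involved, so shell=True is not needed.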

Next, we need to filter the Python files from ['example.md', 'hello.py', 'test.py'] by extracting and comparing the extension of each file.

changed_python_files = [file for file in changed_files if file.endswith(".py")]
print(changed_python_files)

To test this line of code you need to modify both Python files and the Markdown file (just add a single word to each file for example). When you add and commit the changes again, you should only see the Python files.

['hello.py', 'test.py']

In the final step, we want to apply the black formatter to any modified Python files. To run black on multiple Python files, we call black followed by the space-separated file names, for example:

black hello.py test.py

Therefore, the final piece of the puzzle is the following lines of code.

black_command = f"black {' '.join(changed_python_files)}"
subprocess.run(
    black_command,
    shell=True,
    stdout=subprocess.PIPE
)
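Two things are easy to miss with this snippet. When no Python files are staged, it still runs black without any file arguments. And black rewrites the files in the working folder, while the commit records what is in the staging area, so the reformatted version only ends up in the commit if it is staged again. The sketch below is one possible way to handle both. The function names are made up for this example, and note that git add stages the whole file, including changes you may have left unstaged on purpose.

```python
import subprocess


def build_black_command(python_files):
    """Return the argument list for black, or None if there is nothing to format."""
    if not python_files:
        return None
    return ["black", *python_files]


def format_and_restage(python_files):
    """Format the files with black, stage the result, and return an exit status."""
    command = build_black_command(python_files)
    if command is None:
        return 0  # nothing to do, let the commit proceed
    result = subprocess.run(command)
    if result.returncode != 0:
        return result.returncode  # black failed, abort the commit
    # Stage the reformatted files so the commit contains them.
    return subprocess.run(["git", "add", "--", *python_files]).returncode


# A real pre-commit script would end with:
# sys.exit(format_and_restage(changed_python_files))
```

Building the command as a list instead of a single string keeps file names with spaces intact.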

To validate that the pre-commit works as intended, let’s add a poorly formatted function to hello.py.

def hello_world(                         ):
  output = ("Hello" +
  "World"
  )
  return                                      output

Now it would be great if your file was automatically formatted, but you are probably facing an error which looks like this.

['hello.py']
/bin/sh: 1: black: not found
[master 325e06f] some changes
 1 file changed, 5 insertions(+), 1 deletion(-)

The easiest solution is to activate a virtual environment that has black installed (creating one and installing black first, if needed) and commit again. If you are familiar with shell scripts, you might expect that pointing the shebang at the Python executable of an environment where black is installed would also work. I tried this, but it is more complicated: subprocess.run starts a new shell, and that shell is not aware of the environment. There is likely a neat and clean solution for this, but for now this works. Furthermore, as we will see in a bit, there is a much better solution available.

conda activate write-tight
pip list | grep black
black              23.12.1
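A possible way around the problem that the new shell cannot find black is to let the interpreter from the shebang run black as a module. This is a sketch, assuming black is installed in the environment of that interpreter. sys.executable holds the path of the running Python interpreter, and python -m black runs the black that is installed for it. The function name black_module_command is made up for this example.

```python
import sys


def black_module_command(python_files):
    """Build a command that runs black through the current interpreter."""
    return [sys.executable, "-m", "black", *python_files]


command = black_module_command(["hello.py"])
print(command)

# Pass the list to subprocess.run(command) without shell=True, so black is
# not looked up on the PATH of a new shell.
```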

After these changes, try to validate the pre-commit script again. If all went well, you should see the following output.

reformatted hello.py

All done! ✨ 🍰 ✨
1 file reformatted.

And when you inspect hello.py it looks as follows.

def hello_world():
    output = "Hello" + "World"
    return output

Pre-commit will suit most of your needs

Although it’s educational and fun to build your own pre-commit scripts, in most cases you won’t need to. The pre-commit framework supports 100+ hooks across 10+ languages. Typical hooks that are used in Python projects are:

a code formatter (e.g., black)
a linter (e.g., pylint)
an import statement sorter (e.g., isort)
a tool that automatically fixes PEP 8 violations (e.g., autopep8)

You might have expected pytest to be in the list of typical hooks, but running tests as part of pre-commit is more tricky than you might think. There are a few important reasons why tests are not directly supported:

running tests is typically slow, while committing should be fast
tests typically don’t run in a completely isolated virtual environment

Every supported pre-commit hook is able to run in a completely isolated environment. What does that mean? Let’s take black as an example. black does not care about the functions you wrote, the third party packages that you use, and how they interact across different Python modules. black simply analyzes your code as is, and formats where possible. This is different from running tests, which require a virtual environment that includes all your dependencies (e.g., third-party Python packages). Note that it is possible to include pytest in your pre-commit config, but it requires a bit more work.
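To give an idea of what such a config looks like, here is a minimal sketch of a .pre-commit-config.yaml with black as an isolated hook and pytest as a local hook. The black version is an assumption taken from the pip list output earlier in this post, so pin whichever release you use. The pytest entry is a hypothetical example of the extra work mentioned above: it runs in your own active environment instead of an isolated one.

```yaml
# .pre-commit-config.yaml, placed in the root of the repository
repos:
  - repo: https://github.com/psf/black
    rev: 23.12.1  # assumption: replace with the black release you want to pin
    hooks:
      - id: black
  - repo: local
    hooks:
      - id: pytest
        name: pytest
        entry: pytest
        language: system        # use the pytest from the active environment
        pass_filenames: false   # run the whole test suite, not single files
        always_run: true
```

After creating the file, running pre-commit install sets up the hook in .git/hooks.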

If you want to get started with pre-commit, I can recommend the official documentation and the video series on Calmcode.

Thank you for your time, hopefully it was helpful. If you have any questions or remarks feel free to reach out. The full pre-commit script can also be found on GitHub in case you got lost between the code snippets.