git-big

Large file management for git

Overview

A simple plugin for git designed to allow it to handle large files. File transports are user-defined using the big-push and big-pull hooks. Designed to be more efficient, lightweight, flexible, and simpler than alternatives such as git-media and git-annex. git-big is written entirely in C89 for portability and speed.

Plugin Installation

The plugin is self-contained inside a single executable. Building it requires at least git and CMake, which the build commands below rely on, to be present on your system.

To build on a Linux or OSX system, simply run:

    git clone --recurse-submodules https://github.com/vitei/git-big.git
    cd git-big
    mkdir build && cd build
    cmake ..
    sudo cmake --build . --target install

Building on Windows can be achieved by following the same steps; however, the MinGW system is currently required.

Initialising a repository

To initialise a repository to use git-big, cd into that repository's folder and run:

    git big init

This creates attributes for use on your binary files and sets up the hooks that git-big requires. (git-big makes use of the pre-commit and pre-push hooks.)

Managing Files Using git-big

To manage a binary file using git-big you must add a glob referencing that file to your .gitattributes file. A glob can match a whole file type or a single file, so you can manage either with git-big. Here is an example .gitattributes that will manage some image files using git-big:

    *.bmp binary filter=big
    *.jpg binary filter=big
    *.png binary filter=big

Pretty simple, right? For more details on .gitattributes please read man gitattributes.
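One way to check which paths a .gitattributes file routes through the big filter is git's own check-attr command. The sketch below builds a throwaway repository with the example attributes and queries two made-up file names (logo.png and notes.txt are purely illustrative). It does not need git-big to be installed, since it only inspects attributes.

```shell
#!/bin/sh
# Sketch: inspect the "filter" attribute that git assigns to a path.
set -e

# Throwaway repository so the check does not touch a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# The example attributes from the section above.
printf '%s\n' \
    '*.bmp binary filter=big' \
    '*.jpg binary filter=big' \
    '*.png binary filter=big' > .gitattributes

# check-attr works on path names; the files do not need to exist.
png_filter=$(git check-attr filter -- logo.png)
txt_filter=$(git check-attr filter -- notes.txt)

echo "$png_filter"   # logo.png: filter: big
echo "$txt_filter"   # notes.txt: filter: unspecified
```

Running the same check-attr command inside a real repository is a quick way to confirm that a new glob matches the files you expect before committing them.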

Important: All big files must have their filter property set by a checked-in .gitattributes file (i.e. not by the global attributes file or .git/info/attributes is not enough; it must be a .gitattributes file committed to the repository). This allows git-big to ensure all users of that repository can access files managed with git-big.

After creating that .gitattributes file, bitmap, JPEG and PNG files will no longer be added directly to the git database. Instead, a hash representing the image data will be added to the git database and git-big will store the raw data in its own database. (This is located at .git/big.)
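The idea of keeping only a hash in git while the raw data lives elsewhere can be illustrated with a rough sketch. This is not git-big's implementation: the source does not say which hash function or directory layout .git/big uses, so the use of git hash-object and the store_blob helper below are assumptions made purely for illustration.

```shell
#!/bin/sh
# Illustration only: content-addressed storage in a few lines.
# ASSUMPTION: "git hash-object" stands in for whatever hash git-big really uses.

# store_blob FILE STORE_DIR
# Copies FILE into STORE_DIR under a name derived from its contents
# and prints that name (the hash) on standard output.
store_blob() {
    blob_hash=$(git hash-object "$1") || return 1
    mkdir -p "$2" && cp "$1" "$2/$blob_hash" || return 1
    printf '%s\n' "$blob_hash"
}
```

Calling store_blob image.png /tmp/store would copy the file into /tmp/store under its hash and print the hash, which is the small piece of text a repository would track in place of the image itself.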

Sharing Files Managed by git-big

Your binary files are now being managed automatically by git-big, but if you perform a git push none of those files will be available to your team. Not much use for any real-world project! Fortunately, git-big has two hooks that allow you to push and pull big files using any file transport method you want. They are called, unsurprisingly, big-push and big-pull and are located with the other git hooks inside .git/hooks.

The parameters to these two scripts are the same. The first parameter is the hash of the file's binary data. This should be used as an ID for this big file. The second parameter is a path to the raw data file managed by git-big. On a push, this file should be copied to wherever you intend to store your big data; on a pull, it should be created or overwritten with the stored data. Both hooks should return 0 on success and a non-zero exit code on error. Error output should be written to standard error.

Here are two example scripts which use SCP to push and pull git-big-stored files. Firstly, big-push:

    #!/bin/sh
    # $1 = hash of the file's data, $2 = path to the raw data file
    echo "Pushing $1..." 1>&2
    scp "$2" "username@host:data_dir/$1"

And here is the corresponding big-pull:

    #!/bin/sh
    # $1 = hash of the file's data, $2 = path to the raw data file
    echo "Pulling $1..." 1>&2
    mkdir -p "$(dirname "$2")"
    scp "username@host:data_dir/$1" "$2"

Hopefully you will see how easy it is to swap SCP out for any other program to make your own file transports!
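As a further illustration of the same contract, here is a minimal sketch of a transport that copies to and from a shared directory instead of using SCP. The BIG_STORE variable, its default path, and the function names are hypothetical and not part of git-big. The two operations are written as functions so they can be tried out in isolation; in a real hook, each function body would be the script itself, with $1 and $2 supplied by git-big.

```shell
#!/bin/sh
# Hypothetical shared location for big files; not defined by git-big.
BIG_STORE=${BIG_STORE:-/mnt/shared/big-store}

# Push: $1 = hash of the data (used as the ID), $2 = path to the raw data file.
# Returns 0 on success, non-zero if the copy fails.
big_push() {
    echo "Pushing $1..." 1>&2
    mkdir -p "$BIG_STORE" && cp "$2" "$BIG_STORE/$1"
}

# Pull: same parameters; creates or overwrites $2 with the stored data.
# Reports a missing ID on standard error and returns non-zero.
big_pull() {
    echo "Pulling $1..." 1>&2
    if [ ! -f "$BIG_STORE/$1" ]; then
        echo "big-pull: $1 not found in $BIG_STORE" 1>&2
        return 1
    fi
    mkdir -p "$(dirname "$2")" && cp "$BIG_STORE/$1" "$2"
}
```

The explicit check in big_pull means a missing file produces a clear message on standard error and a non-zero status, which is what the hook contract described above asks for.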

Cloning a repository

Cloning a repository takes a little more work. Firstly, you must clone the repository without checking the files out. Then you can initialise git-big and create the big-pull hook. Finally, you can run a checkout to get all the repository's files!

    git clone --no-checkout repo_uri repo && cd repo
    git big init
    echo "big-pull hook code" > .git/hooks/big-pull
    git checkout master

Finished!

You have now seen how to set up a repository to use git-big. Unlike other large file management plugins for git, git-big does not require any special commands to use; it works seamlessly with your normal git workflow. If you are about to commit any changes that will break large files on another user's machine, git-big will inform you and stop the commit.

Disclaimer

git-big is still alpha software. It is in active use at Vitei; however, it is almost certain that there are still bugs and edge cases we have not encountered. Please file bug reports on the bug tracker, or submit patches via GitHub.