git-clone-subset - Man Page
Clones a subset of a git repository
Synopsis
git-clone-subset [options] repository destination-dir pattern
Description
Clones a repository into a destination-dir and prunes from its history all files except those matching pattern, by running the following on the clone:
git filter-branch --prune-empty --tree-filter 'git rm ...' -- --all
This effectively creates a clone with a subset of files (and history) of the original repository. The original repository is not modified.
Useful for creating a new repository out of a set of files from another repository, migrating (only) their associated history. Very similar to:
git filter-branch --subdirectory-filter
But git-clone-subset works on a path pattern instead of just a single directory.
Options
- -h, --help
Show usage information.
- repository
URL or local path to the git repository to be cloned.
- destination-dir
Directory in which to create the clone. The same rules as for git-clone apply: it will be created if it does not exist, and it must be empty otherwise. Unlike git-clone, however, this argument is not optional: git-clone uses several rules to determine the "friendly" basename of a cloned repo, and git-clone-subset will not risk parsing its output, let alone predicting the chosen name.
- pattern
Glob pattern to match the desired files/dirs. It is ultimately evaluated by a call to bash, NOT git or sh, using the extended glob '!(<pattern>)' rule. Quote or escape it on the command line, so it does not get evaluated prematurely by your current shell. Only a single pattern is allowed: if more are required, use extglob's "|" syntax. Globs are evaluated with bash's shopt dotglob set, so hidden files (names starting with a dot) are matched too: beware. Patterns should not contain spaces or special chars like " ' $ ( ) { } `, not even quoted or escaped, since that might interfere with the !() syntax after pattern expansion.
Pattern Examples:
"*.png"
"*.png|*icon*"
"*.h|src/|lib"
Limitations
Renames are NOT followed. As a workaround, list the rename history with 'git log --follow --name-status --format='%H' -- file | grep "^[RAD]"' and include all the names the file has had in the pattern, as in "current_name|old_name|initial_name". As a side effect, if a different file has taken the place of an old name, it will be preserved too, and there is no way around this using this tool.
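Collecting the names for this workaround can be partly automated. The sketch below is not part of git-clone-subset, and the function name names_to_pattern is hypothetical. It reads the output of the git log command above from standard input and joins every distinct file name with "|". It assumes the tab-separated layout that --name-status prints, and file names free of spaces and special chars.

```bash
#!/usr/bin/env bash
# names_to_pattern (hypothetical helper): read `git log --name-status` lines
# on stdin and print every distinct file name joined with "|".
names_to_pattern() {
    awk -F'\t' '
        # Status lines start with R (rename), A (add) or D (delete).
        # Field 1 is the status, the remaining fields are file names.
        /^[RAD]/ {
            for (i = 2; i <= NF; i++)
                if (!seen[$i]++) names[n++] = $i
        }
        END {
            for (i = 0; i < n; i++)
                printf "%s%s", (i ? "|" : ""), names[i]
            print ""
        }
    '
}
```

A possible use is: git log --follow --name-status --format='%H' -- file | names_to_pattern. Commit hash lines are skipped because they are lowercase hex and never start with R, A or D. Review the result before use, since a reused old name will also keep the unrelated file, as noted above.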
There is no (easy) way to keep only some of the files in a dir: using 'dir/foo*' as the pattern will not work. Instead, keep the whole dir and remove the unwanted files afterwards, using git filter-branch and a (quite complex) combination of cloning, remote add, rebase, etc.
Pattern matching is quite limited, and much of bash's escaping and quoting does not work properly when the pattern is expanded inside !().
See Also
Author
Rodrigo Silva (MestreLion) linux@rodrigosilva.com