git-clone-subset - Man Page

Clones a subset of a git repository

Synopsis

git-clone-subset [options] repository destination-dir pattern

Description

Clones a repository into destination-dir and prunes from its history all files except the ones matching pattern, by running the following on the clone:
git filter-branch --prune-empty --tree-filter 'git rm ...' -- --all
This effectively creates a clone with a subset of files (and history) of the original repository. The original repository is not modified.

Useful for creating a new repository out of a set of files from another repository, migrating (only) their associated history. Very similar to:
git filter-branch --subdirectory-filter
But git-clone-subset works on a path pattern instead of just a single directory.

Options

-h, --help

show usage information.

repository

URL or local path to the git repository to be cloned.

destination-dir

Directory in which to create the clone. The same rules as for git-clone apply: it will be created if it does not exist, and it must be empty otherwise. But, unlike git-clone, this argument is not optional: git-clone uses several rules to determine the "friendly" basename of a cloned repo, and git-clone-subset will not risk parsing its output, let alone predicting the chosen name.

pattern

Glob pattern to match the desired files/dirs. It is ultimately evaluated by a call to bash, NOT git or sh, using the extended glob rule '!(<pattern>)'. Quote or escape it on the command line, so it does not get evaluated prematurely by your current shell. Only a single pattern is allowed: if more are required, use extglob's "|" syntax. Globs are evaluated with bash's shopt dotglob set, so hidden files (names starting with a dot) are matched too; beware. Patterns should not contain spaces or special chars like " ' $ ( ) { } `, not even quoted or escaped, since that might interfere with the !() syntax after pattern expansion.
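To preview how a pattern behaves under the rules above, the '!(<pattern>)' expansion can be tried against the top level of any directory. The following is only a minimal sketch: list_pruned is a hypothetical helper, not part of git-clone-subset, and it assumes the tool expands the pattern in the repository's top-level directory. It prints the entries that do NOT match the pattern, i.e. the ones that would be pruned.

```shell
# Hypothetical helper (not part of git-clone-subset): print the top-level
# entries of a directory that '!(<pattern>)' selects, i.e. everything that
# does NOT match the pattern.
#   $1: directory to inspect
#   $2: extglob pattern, same syntax as the <pattern> argument
list_pruned() {
    (
        cd "$1" || exit 1
        # Call bash explicitly, with extglob and dotglob enabled before the
        # command string is parsed. nullglob is added here only so that an
        # expansion with no results prints nothing useful instead of the
        # literal pattern.
        bash -O extglob -O dotglob -O nullglob -c "printf '%s\n' !($2)"
    )
}
```

For instance, list_pruned . '*.png|*icon*' lists every top-level entry that is neither a PNG file nor has "icon" in its name, hidden entries included.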

Pattern Examples:

"*.png"
"*.png|*icon*"
"*.h|src/|lib"

Limitations

Renames are NOT followed. As a workaround, list the rename history with 'git log --follow --name-status --format='%H' -- file | grep "^[RAD]"' and include all the names a file has had in the pattern, as in "current_name|old_name|initial_name". As a side effect, if a different file has taken the place of an old name, it will be preserved too, and there is no way around this using this tool.
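The rename workaround can be scripted. The sketch below wraps the git log command quoted above and joins every name found into a "|"-separated pattern. names_pattern is a hypothetical helper, not part of git-clone-subset, and it assumes file names without tabs, spaces or any of the special chars that patterns cannot contain.

```shell
# Hypothetical helper (not part of git-clone-subset): collect every name a
# file has had and join them with "|" for use as <pattern>.
#   $1: repository directory
#   $2: current path of the file, relative to the repository root
names_pattern() {
    git -C "$1" log --follow --name-status --format='%H' -- "$2" |
        # Keep only the Rename/Add/Delete status lines, dropping commit hashes
        grep '^[RAD]' |
        # Fields are tab-separated: status, then one or two path names
        awk -F'\t' '{ for (i = 2; i <= NF; i++) print $i }' |
        LC_ALL=C sort -u |
        paste -sd'|' -
}
```

The result can then be passed as the pattern argument, quoted so the current shell does not expand it.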

There is no (easy) way to keep only some files in a dir: using 'dir/foo*' as pattern will not work. Instead, keep the whole dir and remove the unwanted files afterwards, using git filter-branch and a (quite complex) combination of cloning, remote add, rebase, etc.
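As an illustration of the "remove files afterwards" step only, the unwanted files of a kept dir can be dropped from the clone's history with a second git filter-branch pass. This is a sketch of one possible approach, not what git-clone-subset itself runs: drop_paths is a hypothetical helper, it uses --index-filter rather than --tree-filter, and it assumes a pathspec without single quotes. It does not cover the cloning, remote add and rebase steps mentioned above.

```shell
# Hypothetical helper (not part of git-clone-subset): remove from the whole
# history of a repository every path matching a git pathspec.
#   $1: repository directory (e.g. the clone made by git-clone-subset)
#   $2: git pathspec of the files to drop, e.g. 'dir/*.tmp'
drop_paths() {
    # FILTER_BRANCH_SQUELCH_WARNING silences the advisory notice (and its
    # delay) that newer git versions print before running filter-branch.
    # The pathspec is single-quoted inside the filter so that git, not the
    # shell, expands the glob.
    FILTER_BRANCH_SQUELCH_WARNING=1 git -C "$1" filter-branch --prune-empty \
        --index-filter "git rm -r -q --cached --ignore-unmatch -- '$2'" \
        -- --all
}
```

Run it only on a clone with a clean working tree, since it rewrites every ref in place.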

Pattern matching is quite limited, and much of bash's escaping and quoting does not work properly when the pattern is expanded inside !().

See Also

https://github.com/MestreLion/git-tools

Author

Rodrigo Silva (MestreLion) linux@rodrigosilva.com

Info

2021-02-11