How to Cut Git History the Clean Way
My use case
Happened to my team and I to work on a repository that, as many others have, has a main
and devel
branch.
The repo is dnf and, while developing dnf5, the devel
branch became
so different from the initial codebase that it became a new component. Here, the problem to separate the codebase from
dnf to dnf5.
In this specific scenario, it was convenient for my team to cut the git history, to get a lighter repo and to cut some
binaries that were in the history of the old dnf
.
Find where to cut
Firs of all, you should find out where to cut i.e. the commit hash. I will use my hash in this example. 12
CUT_HERE=6d812e72f7c1cf979de1ad404ac5a9a460ade4de
This is the procedure to truncate with git, so let’s do it.
git checkout --orphan temp $CUT_HERE
git commit -m "Truncated history"
git rebase --onto temp $CUT_HERE master
git branch -D temp
Use a temp branch
Let’s checkout the right branch, you don’t want to cut the wrong one right?
git checkout dnf-5-devel
git checkout -b dnf5-test-branch
I just created a branch named dnf5-test-branch
which is where I will rebase my truncated history.
Now, we know already the commit we want to cut at, so let’s create a new orphan branch named temp
:
git checkout dnf-5-devel
git checkout --orphan temp $CUT_HERE
Let’s commit all files to it
git commit -m "Truncated History"
And now we need to rebase all commits from our start on dnf5-test-branch
git rebase --onto temp $CUT_HERE dnf5-test-branch
Now the history should be good to go and it should start at our commit "Truncated History"
. Let’s check it.
* eba1b096 [dnf5] Add empty Base class
* 82d11c6f Import a new project template
* 7ecef58c (temp) Truncated history
(END)
Cool! We can now delete temp
git branch -D temp
Delete binaries and unwanted files from history 3
At this point there are many ways to proceed but I find the tool git-filter-repo
45
the most useful and safe.
Here, for example, I have an entire dir to delete from history. Inside this folder there is the whole codebase of
dnf-4
, moved here for convenience, but we don’t need it anymore, right?
git filter-repo --path dnf-4 --invert-paths --force --refs dnf5-test-branch
Now I know that there are some files in a commit that I need to remove. It’s a bunch of .rpm
files.
I can find the commmit hash quite easily by looking for rpms in the log and once I find the hash (which
changes every time) I can export a file list.
git diff-tree --no-commit-id --name-only -r 0f1e4e3d24268d367b59cafa48337eec46e09192 >> deleted-rpms.txt
Then I can edit the file and select just the files I want to delete from the history. Once my file is ready I procede to delete everything I don’t ever need.
git filter-repo --paths-from-file deleted-rpms.txt --invert-paths --refs dnf5-test-branch
Check the new history
Checking the new history can be touch right? Here’s a way to do it.
git log --stat $CUT_HERE..dnf-5-devel > OLD
git log --stat --root > NEW
vimdiff OLD NEW
And we should only see differences in the commit hash.
Now it’s time to push our new codebase to the new repo so that’s easy.
Just remove all branches except dnf5-test-branch
, rename it to main
, remove all remotes, add correct remotes,
push…1, 2, 3… Done!
https://github.com/rpm-software-management/libdnf/commit/6d812e72f7c1cf979de1ad404ac5a9a460ade4de ↩︎
https://web.archive.org/web/20130116195128/http://bogdan.org.ua/2011/03/28/how-to-truncate-git-history-sample-script-included.html ↩︎
https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/removing-sensitive-data-from-a-repository ↩︎
https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html ↩︎