How to Remove a File From a GitHub Repo
- By Bruce Nielson
- ML & AI Specialist
Suppose you need to remove a file from a GitHub repo. How would you do that? If you just delete the file and commit the change, the older commits still contain that file, which means it is still part of the overall repo. What to do?
According to GitHub itself, you can use a tool like git-filter-repo to fix it.
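If you want to see the problem first-hand, here is a minimal sketch that builds a throwaway repo in a temp folder, commits a file, deletes it, and then reads the "deleted" text back out of history. The repo, the demo user, and the file contents are all made up for illustration; only standard git commands are used.

```shell
# Throwaway repo in a temp directory; every name here is invented for the demo.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

# Commit a file, then delete it in a second commit.
echo "text I should not have committed" > file_to_remove.txt
git add file_to_remove.txt
git commit -q -m "Add file"
git rm -q file_to_remove.txt
git commit -q -m "Remove file"

# History still lists both commits that touched the file...
git log --oneline -- file_to_remove.txt

# ...and the old contents can be read straight back from the first commit.
recovered=$(git show HEAD~1:file_to_remove.txt)
echo "$recovered"
```

The file is gone from the folder, but anyone with a clone can still recover it the same way, which is why the history itself has to be rewritten.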
I often find it helpful to use git as a way to compare text. If I’m working on an AI project, like the BookSearchArchive, I find I need to parse and reparse text until I’ve got it just right, or sometimes we’re dealing with copyrighted text. After I use git to compare versions of output, I can then use git-filter-repo to remove that part of the git history.
But git-filter-repo sure isn’t the easiest tool to figure out if you’re not deeply technical. It can do a lot more than just remove a single file, but that’s all I want to do with it, and I don’t want to have to filter through a lot of documentation to figure out that single use case. So let me give you a simple step-by-step guide to using git-filter-repo to remove an unwanted file from your GitHub repo.
Main Use Case: Removing a Single File from History
First, don’t use your existing cloned repo. Instead, you need to download a mirror version of the repo that contains all branches and tags so that they can be rewritten. Something like this:
git clone --mirror https://github.com/pathofarchive/NameOfArchive.git
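This will create a special version of your repo. It will be in a folder called “NameOfArchive.git” and will NOT contain any of your actual files, because a mirror clone is a bare repo: it holds only git’s internal data, with no checked-out working copy. This is a bit unnerving, I admit, but it is okay.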
If you need to reassure yourself this is a real repo, do this:
git clone NameOfArchive.git test-clone
I’m assuming you run that command one level above the mirror repo, so you may have to “cd ..” first. If done correctly, this will create a new folder called ‘test-clone’ that will look like a correct version of your repo.
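If you would rather rehearse the mirror step before touching your real repo, the sketch below does it entirely on your own machine. It assumes a small local repo (the hypothetical “origin-repo”) standing in for GitHub; the rest follows the two clone commands above.

```shell
work=$(mktemp -d)
cd "$work"

# Stand-in for the GitHub repo (hypothetical name and contents).
git init -q origin-repo
git -C origin-repo config user.name "Demo User"
git -C origin-repo config user.email "demo@example.com"
echo "hello" > origin-repo/README.txt
git -C origin-repo add README.txt
git -C origin-repo commit -q -m "Initial commit"

# Mirror clone: a bare repo, so README.txt is not visible inside the folder.
git clone -q --mirror origin-repo NameOfArchive.git
is_bare=$(git -C NameOfArchive.git rev-parse --is-bare-repository)
echo "bare: $is_bare"

# Cloning from the mirror gives a normal working copy with the files back.
git clone -q NameOfArchive.git test-clone
ls test-clone
```

Seeing README.txt reappear in test-clone is the reassurance that the mirror really does contain everything.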
Okay, now you need to run git-filter-repo. To do this, you need to download the git-filter-repo script itself (it is a single file) into your mirror repo. The beauty of git-filter-repo is that you just drop this file into your repo and git-filter-repo is ‘installed’ and ready to go. Be sure there is no extension at the end of the file (i.e. no ‘.txt’ etc.).
Now, if you are on Windows, you can probably run git-filter-repo like this using the python command:
python git-filter-repo --path file_to_remove.txt --invert-paths
Here ‘file_to_remove.txt’ is an example of a file that you are removing. --path names the file path to match. On its own, --path would keep only that file and throw away everything else; --invert-paths flips the selection, so the named file is removed and everything else is kept ‘as is’.
The process will run, and you’ll see something like this:
Parsed 239 commits
Enumerating objects: 858, done.
Counting objects: 100% (858/858), done.
Delta compression using up to 12 threads
Compressing objects: 100% (306/306), done.
Writing objects: 100% (858/858), done.
Selecting bitmap commits: 227, done.
Building bitmaps: 100% (107/107), done.
Total 858 (delta 547), reused 852 (delta 542), pack-reused 0
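If you have more than one file to remove, the --path flag can be repeated, so one pass can cover them all. The sketch below only assembles and prints the command rather than running it, so you can check it first. The two file names are hypothetical, and this simple string-building approach assumes paths without spaces.

```shell
# Hypothetical file names; replace with the paths you actually want removed.
set -- notes/draft.txt output/parsed_book.txt

# Add one --path per file, then a single --invert-paths at the end.
cmd="python git-filter-repo"
for f in "$@"; do
    cmd="$cmd --path $f"
done
cmd="$cmd --invert-paths"

# Print the command for review instead of executing it.
echo "$cmd"
```

Once the printed command looks right, run it from inside the mirror repo just like the single-file version.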
Next you need to push this all back up to your repo:
git push --mirror
And you’ll see something like this:
Enumerating objects: 858, done.
Writing objects: 100% (858/858), 7.03 MiB | 2.26 MiB/s, done.
Total 858 (delta 0), reused 0 (delta 0), pack-reused 858
remote: Resolving deltas: 100% (547/547), done.
To https://github.com/brucenielson/BookSearchArchive.git
+ 8d41638...1c4e354 main -> main (forced update)
! [remote rejected] refs/pull/1/head -> refs/pull/1/head (deny updating a hidden ref)
! [remote rejected] refs/pull/10/head -> refs/pull/10/head (deny updating a hidden ref)
! [remote rejected] refs/pull/11/head -> refs/pull/11/head (deny updating a hidden ref)
Etc.
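The scary errors are to be expected. GitHub keeps its pull request refs (the refs/pull/... lines) read-only, so a mirror push can’t update them, but your branches and tags still go through. Note the “forced update” line for main.
Your existing local clone still holds the old history, so you’ll likely find that you can’t easily update it and that you need to get a fresh clone from GitHub. So go do that and you should be all set.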
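To double-check the result, you can ask git whether any commit still touches the file. This is a minimal sketch: commits_touching is a hypothetical helper name (not a git command), and it assumes you run it from inside the fresh clone.

```shell
# Counts the commits, across all branches and tags, that still touch a path.
# commits_touching is a made-up helper name for this example.
commits_touching() {
    git log --all --oneline -- "$1" | wc -l | tr -d ' '
}

# Run inside the fresh clone; 0 means no commit in its history touches the file.
commits_touching file_to_remove.txt
```

Anything other than 0 means some branch or tag in that clone still carries the file, and the filter step needs another look.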