Rewriting History

Sometimes files get committed to Git that shouldn’t be. We all do it. This post covers a process for removing the mistake from a repository’s history without losing the rest of that history.

As covered in other posts, I like to do the Advent of Code puzzles. This year, though, I learnt that the creator of AoC had asked for the puzzle inputs not to be made public, as the inputs are personal to AoC participants and having a body of them can allow for reverse engineering the site.

I did a bit more digging and found that it’s also covered in the About Pages:

Can I copy/redistribute part of Advent of Code? Please don’t. Advent of Code is free to use, not free to copy. If you’re posting a code repository somewhere, please don’t include parts of Advent of Code like the puzzle text or your inputs. If you’re making a website, please don’t make it look like Advent of Code or name it something similar.

My first thought when I read this was “ah, crap”, as I knew that I’d been checking input files into Git and for some years had also been uploading the puzzles themselves. In my mind I was keeping the puzzle, input and solution in one convenient place; what I didn’t consider was the potential impact of this.

I had to put this right.

As far as I was concerned, the content was one of three types:

  1. My own IP, the solutions I had created to the problems. These included any build files, language-specific project files and scripts to perform setup. These are all files I’m happy to share and are covered by the Creative Commons Zero v1.0 Universal licence.
  2. AoC common IP. This covered the description of the problem and the example input given in the description.
  3. AoC specific IP. This is essentially the puzzle input as each puzzle input is individual to the AoC account.

Dealing with AoC common IP

Everybody can see these files regardless of whether they have an AoC account or not. So, as these files are public already I think that deleting my copies of them from my repositories should be sufficient. In one repository I had included the puzzle description as a code comment and in another repository I included it in README.md files. Both were easy to resolve and didn’t take too much time. Easy.

Dealing with AoC specific IP

Superficially these files are easy to deal with: delete them and check in. However… I never let my life be that simple 🙄

Once a file is committed to Git, it can be seen in the history of the repository. For example, even though the repo now has the puzzle_input.txt file deleted, this shows the commit where the file was committed:
GitHub view with a puzzle_input.txt
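
The same thing can be seen from the command line. The following is a minimal sketch rather than anything run against my real repository: it assumes a throwaway repo in a temporary directory, and the user name, the test/day01 path and the file contents are made up for illustration. It commits a puzzle_input.txt, deletes it in a second commit, and then reads the content back out of the history.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repo for illustration only; all names and contents are hypothetical.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet .
git config user.name "Demo User"
git config user.email "demo@example.com"

# Commit an input file, then delete it in a second commit.
mkdir -p test/day01
echo "1 2 3" > test/day01/puzzle_input.txt
git add test/day01/puzzle_input.txt
git commit --quiet -m "Add day 1 input"
git rm --quiet test/day01/puzzle_input.txt
git commit --quiet -m "Remove day 1 input"

# The file is gone from the working tree...
[ -e test/day01/puzzle_input.txt ] || echo "working tree: no input file"

# ...but the commits that touched it are still listed in the history,
touched_commits="$(git log --all --format=%h -- '*puzzle_input.txt')"
echo "commits touching the input file:"
echo "$touched_commits"

# ...and the content can be read back from the commit that added it.
first_commit="$(git rev-list --max-parents=0 HEAD)"
recovered="$(git show "$first_commit:test/day01/puzzle_input.txt")"
echo "recovered content: $recovered"
```

Running the same git log command inside a real clone is a quick way to find out whether any input files are still lurking in its history.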

It would be easy for somebody with a bit of Git knowledge to clone the repo and extract all the input files from the history. This means that not only do I need to delete the files, but I also need to remove them from the Git history. Luckily, there is a tool that can help with this: BFG Repo-Cleaner (BFG for short). The tool is written in Java, so you will need a Java runtime to execute it.

  1. The first step is to make a copy of any files that you want to keep. The inputs can easily be obtained from the AoC site again, but for speed (and safety) it’s worth copying the local clone of the repo to another directory.
  2. After copying the repo, the next step is to make sure all of the input files are deleted. Push the deletions to the remote so that they are gone from the main branch. This matters because BFG, by default, leaves the contents of the latest commit untouched.
  3. Now the fun begins. We need to make another clone of the repo, this time using --mirror so that we get a bare copy with every ref and the complete history, for example:
    git clone --mirror git@github.com:andrewfitzy/2023-advent-of-code.git
    
  4. Luckily I’ve been giving my input files the same name each day as I’m lazy and can copy/paste quickly. To delete all the input files with a given name from the Git history we now need to use BFG. Run this command from the parent folder of the cloned repo:
    java -jar ~/Development/BFG/bfg-1.14.0.jar --delete-files puzzle_input.txt 2023-advent-of-code.git
    
  5. When the BFG process completes it will present a report about the actions it performed. BFG rewrites the commits but the old objects are still on disk, so next we need to go into the mirrored repo, expire the reflog and run garbage collection to remove them:
    cd 2023-advent-of-code.git
    git reflog expire --expire=now --all && git gc --prune=now --aggressive
    
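  6. Once the reflog expiry and garbage collection complete, we need to push the rewritten history. Because the clone was made with --mirror, a plain push updates all of the refs on the remote: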
    git push
    
  7. We can now check in GitHub to see that the input file has been removed from the commit where it previously appeared:
    GitHub view with no puzzle_input.txt
  8. The remote will now be up to date with the removal of the input files. Our old local copy of the repo will still contain the history with the deleted file in it, though. Because of this, we need to clone the repository again:
    git clone git@github.com:andrewfitzy/2023-advent-of-code.git
    
  9. We now need to update the .gitignore file to add the puzzle input files. For the repo used in this example, the entry is:
    # Ignore any test input files in resources
    /test/*/puzzle_input.txt
    
  10. Final step is to copy the input files back into the fresh clone of the repo and make sure they can’t be committed. If this is the case we’re free to carry on with solving those AoC puzzles safe in the knowledge that we won’t be making our personal input files public.
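
To check that the ignore rule from step 9 really does stop the input files being committed, something like the following can help. It is a minimal sketch under stated assumptions: it builds a throwaway repo in a temporary directory instead of touching a real clone, and the test/day01 directory and solution.txt file are hypothetical names used only for the demonstration. The ignore rule itself is the one shown in step 9.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repo for illustration only; directory and file names are hypothetical.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet .

# The same ignore rule as in step 9.
printf '%s\n' \
  '# Ignore any test input files in resources' \
  '/test/*/puzzle_input.txt' > .gitignore

# One file that should be ignored and one that should not.
mkdir -p test/day01
echo "1 2 3" > test/day01/puzzle_input.txt
echo "solution" > test/day01/solution.txt

# check-ignore prints the rule that matched and exits 0 when the path is ignored.
git check-ignore -v test/day01/puzzle_input.txt

# Untracked files Git would offer to commit; the input file should not be listed.
untracked="$(git status --porcelain --untracked-files=all)"
echo "$untracked"

# An explicit git add of an ignored path is refused unless it is forced.
if git add test/day01/puzzle_input.txt 2>/dev/null; then
  echo "input file was staged, the ignore rule is not working"
else
  echo "git add refused the ignored input file"
fi
```

In a real clone only the git check-ignore and git status lines are needed, run from the root of the repo after copying the input files back in.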