Git – GC (Garbage Collection)

Git is a distributed version control system that efficiently handles projects of all sizes. However, over time, a Git repository can accumulate a lot of unnecessary data such as loose objects, unreachable commits, and more. To manage this data, Git employs a garbage collection mechanism, git gc, which cleans up and optimizes the repository. This article explores what Git garbage collection is, how it works, and how to use it effectively.

Table of Contents

  • What is Git Garbage Collection?
  • Why is Garbage Collection Important?
  • How Does Git GC Work?
  • Significance of git gc aggressive
  • How is git prune different from git gc?
  • What is the meaning of git gc auto?
  • git gc options

What is Git Garbage Collection?

Garbage collection in Git is the process of cleaning up and compressing unnecessary files and data in a repository. This includes objects that are no longer reachable from any branch, tag, or other reference and are therefore considered “garbage.” The primary command for invoking garbage collection in Git is git gc.

Why is Garbage Collection Important?

  1. Optimizes Repository Size: By removing unreferenced objects and compressing data, garbage collection helps in reducing the repository size.
  2. Improves Performance: A smaller and more optimized repository can significantly improve Git’s performance, especially for large projects.
  3. Maintains Repository Health: Regular garbage collection helps in maintaining the overall health and integrity of the repository.

How Does Git GC Work?

When you run git gc, Git performs several tasks:

  1. Pruning: Git removes loose objects that are no longer reachable from any reference and are older than the configured grace period (two weeks by default). These objects can include commits, trees, and blobs.
  2. Repacking: Git repacks reachable objects into a pack file, a compressed format that stores many objects together and saves space by storing similar objects as deltas. This reduces disk space usage.
  3. Removing Old Data: Git deletes pack files made redundant by the repack, expires old reflog entries, and cleans up other stale metadata.
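
The effect of these steps can be observed with git count-objects. The following is a minimal sketch that works in a throwaway repository; the file name, commit messages, and demo identity are placeholders chosen for illustration, not part of any real project.

```shell
set -eu

# Throwaway repository so the demo never touches real data.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Three commits; each one adds a blob, a tree, and a commit as loose objects.
for i in 1 2 3; do
    echo "revision $i" > file.txt
    git add file.txt
    git -c user.name="Demo" -c user.email="demo@example.com" commit -q -m "commit $i"
done

# "count" is the number of loose objects, "in-pack" the number of packed objects.
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

git gc --quiet

loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git count-objects -v | awk '/^in-pack:/ {print $2}')

echo "loose objects before gc: $loose_before"
echo "loose objects after gc:  $loose_after"
echo "packed objects after gc: $packed_after"
```

After the run, the loose objects should have moved into a pack file, so the loose count drops while the packed count rises.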

Before running, git gc reads several git config settings. Knowing these values makes the rest of git gc’s behavior easier to understand.

Git gc config:

gc.reflogExpire

This is an optional variable with a default value of 90 days. It specifies how long entries in a branch’s reflog are kept before they expire.

gc.reflogExpireUnreachable

This is an optional variable with a default value of 30 days. It specifies how long reflog entries that are no longer reachable from the current tip of the branch are kept.

gc.aggressiveWindow

This is an optional variable with a default value of 250. When git gc is run with the --aggressive option, it sets the window size used in the delta compression phase of object packing. A larger window lets Git compare each object against more candidates, which can produce a smaller pack but takes longer.

gc.aggressiveDepth

This is an optional variable with a default value of 50. It specifies the maximum delta depth used by git repack during a git gc --aggressive run.

gc.pruneExpire

This setting is optional and defaults to “2 weeks ago.” It determines how long an unreachable object is kept before being pruned.

gc.worktreePruneExpire

This setting is optional and defaults to “3 months ago.” It specifies how long the record of a stale working tree is kept before being removed.
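
These settings are read and written with git config. The sketch below sets two of them in a scratch repository and reads them back; the values “60 days” and “1 week ago” are arbitrary examples, not recommendations.

```shell
set -eu

# Scratch repository; settings are stored in its .git/config only.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Override two garbage collection settings for this repository.
git config gc.reflogExpire "60 days"
git config gc.pruneExpire "1 week ago"

# Read the values back. Keys that are never set fall back to Git's defaults.
reflog_expire=$(git config --get gc.reflogExpire)
prune_expire=$(git config --get gc.pruneExpire)
echo "gc.reflogExpire = $reflog_expire"
echo "gc.pruneExpire  = $prune_expire"

# List every gc.* setting that has been set explicitly.
git config --get-regexp '^gc\.'
```

Adding --global to the git config calls would apply the same values to every repository of the current user instead of just this one.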

git gc exec:

Behind the scenes, git gc runs several other subcommands, including git pack-refs, git reflog expire, git repack, git prune, git worktree prune, and git rerere gc. Together, these commands find the Git objects and metadata that fall outside the limits set by the git gc configuration, then compress or remove them as needed.
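
To get a feel for what happens behind the scenes, the housekeeping can be approximated by running the individual commands by hand. This is only a rough sketch under the default settings: the exact options git gc passes vary between Git versions, and the repository and commit here are placeholders.

```shell
set -eu

# Scratch repository with a single commit.
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo "hello" > file.txt
git add file.txt
git -c user.name="Demo" -c user.email="demo@example.com" commit -q -m "initial"

# Approximate, hand-run version of the housekeeping steps.
git pack-refs --all --prune                  # store refs compactly
git reflog expire --all                      # drop expired reflog entries
git repack -a -d -l -q                       # pack objects, delete redundant packs and loose copies
git prune --expire=2.weeks.ago               # remove old unreachable loose objects
git worktree prune --expire=3.months.ago     # forget stale working trees
git rerere gc                                # clean up old recorded conflict resolutions

loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
echo "loose objects after manual housekeeping: $loose_after"
```

In day-to-day use, running git gc is preferable to this manual sequence, since it picks the right options from the configuration.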

Significance of git gc aggressive

git gc can be run with the --aggressive option. The --aggressive option tells git gc to put more effort into optimizing the repository. This makes git gc run considerably slower, but it can save more disk space once it finishes. The effects of --aggressive are persistent, so it is only worth using occasionally, for example after a substantial number of changes have been made to a repo.
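
A minimal sketch of an aggressive run in a throwaway repository follows. The two git config lines simply write out the default values to show where the aggressive behavior can be tuned; the repository contents are placeholders.

```shell
set -eu

# Throwaway repository with a few commits.
repo=$(mktemp -d)
cd "$repo"
git init -q .
for i in 1 2 3; do
    echo "revision $i" > file.txt
    git add file.txt
    git -c user.name="Demo" -c user.email="demo@example.com" commit -q -m "commit $i"
done

# Optional: these are the default values, set explicitly only for illustration.
git config gc.aggressiveWindow 250
git config gc.aggressiveDepth 50

git gc --aggressive --quiet

packs=$(git count-objects -v | awk '/^packs:/ {print $2}')
loose=$(git count-objects -v | awk '/^count:/ {print $2}')
echo "packs: $packs, loose objects: $loose"
```

On a repository this small the difference from a plain git gc is negligible; the extra effort only pays off on large histories.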

How is git prune different from git gc?

git gc is the parent command and git prune is one of the commands it runs. In other words, git gc triggers git prune, which deletes loose objects that are unreachable and older than the expiry date set by the git gc configuration.
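
To see git prune on its own, the sketch below creates an object that nothing references and then prunes it. The orphaned blob is written with git hash-object purely for demonstration, and --expire=now is used so the fresh object is not protected by its age.

```shell
set -eu

# Scratch repository with one commit.
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo "hello" > file.txt
git add file.txt
git -c user.name="Demo" -c user.email="demo@example.com" commit -q -m "initial"

# Write a blob directly into the object database. It is not in the index
# or in any commit, so it is unreachable.
orphan=$(echo "orphaned content" | git hash-object -w --stdin)

# --dry-run only reports what would be removed.
would_prune=$(git prune --dry-run --expire=now)
echo "would prune: $would_prune"

# Now actually delete unreachable loose objects.
git prune --expire=now

if git cat-file -e "$orphan" 2>/dev/null; then
    orphan_present=yes
else
    orphan_present=no
fi
echo "orphan still present: $orphan_present"
```

Unlike git gc, git prune does not repack anything; it only removes unreachable loose objects.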

What is the meaning of git gc auto?

The git gc --auto variant first checks whether any housekeeping is needed on the repository. If it determines that cleanup is not required, it exits without doing any work. Several Git commands run git gc --auto after they finish to clean up any loose objects they have produced. Before doing anything, git gc --auto compares the repository against two thresholds: the number of loose objects (gc.auto, 6700 by default) and the number of pack files (gc.autoPackLimit, 50 by default). Both values can be set with git config. If the repository exceeds either housekeeping threshold, git gc --auto performs the cleanup.
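
The threshold check can be illustrated with a tiny repository that stays far below the limits. In this sketch the thresholds are written out with their default values, and gc.autoDetach is turned off so that any work would happen in the foreground; the repository contents are placeholders.

```shell
set -eu

# Scratch repository with a single commit (three loose objects).
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo "hello" > file.txt
git add file.txt
git -c user.name="Demo" -c user.email="demo@example.com" commit -q -m "initial"

# Thresholds consulted by git gc --auto, set to their default values.
git config gc.auto 6700
git config gc.autoPackLimit 50
# Keep any housekeeping in the foreground for this demo.
git config gc.autoDetach false

loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')
git gc --auto
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')

echo "loose objects before: $loose_before"
echo "loose objects after:  $loose_after"
```

Because the repository is well under both thresholds, git gc --auto should exit without packing anything, and the loose object count stays the same.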

git gc options

$ git gc --aggressive

By default, git gc favors a quick run that still gives good disk space usage and performance. The --aggressive option optimizes the repository more thoroughly at the cost of a much longer run time. Because the resulting pack is kept until the next full repack, the effects are persistent.

$ git gc --auto

With this option, git gc first checks whether any housekeeping is required and exits without doing anything if it is not. The configuration variables gc.auto and gc.autoPackLimit set the thresholds that decide when git gc --auto actually performs a cleanup.

$ git gc --prune=<date>

This option prunes loose objects older than the given date. It is on by default with a value of “2 weeks ago,” which can be overridden with gc.pruneExpire. Using --prune=now prunes loose objects regardless of their age, which increases the risk of corruption if another process is writing to the repository at the same time.

$ git gc --no-prune

This option tells git gc not to prune any loose objects.
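
The difference between the two pruning options can be checked on an unreachable object. The sketch below is an illustration in a scratch repository: the orphaned blob is created artificially with git hash-object, and --prune=now is safe here only because nothing else is using the repository.

```shell
set -eu

# Scratch repository with one commit.
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo "hello" > file.txt
git add file.txt
git -c user.name="Demo" -c user.email="demo@example.com" commit -q -m "initial"

# An object that no commit, index entry, or reflog refers to.
orphan=$(echo "orphaned content" | git hash-object -w --stdin)

# With --no-prune the unreachable object is kept.
git gc --no-prune --quiet
if git cat-file -e "$orphan" 2>/dev/null; then kept_by_no_prune=yes; else kept_by_no_prune=no; fi

# With --prune=now it is removed regardless of its age.
git gc --prune=now --quiet
if git cat-file -e "$orphan" 2>/dev/null; then kept_by_prune_now=yes; else kept_by_prune_now=no; fi

echo "kept after --no-prune:  $kept_by_no_prune"
echo "kept after --prune=now: $kept_by_prune_now"
```

A date such as --prune="1 week ago" sits between these two extremes and only removes unreachable objects older than that date.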

$ git gc --quiet

This option suppresses all progress reports.

$ git gc --force

This option forces git gc to run even if another git gc instance may already be running on the repository.

$ git gc --keep-largest-pack