2022-06-27

Git Notes

This will be a continual work in progress and a testament that I should have made this post over a decade ago.

See also my dedicated post on fast-forward merges.

Reminders: the “index” is what holds staged changes. Useful site for undoing/recovering stuff: https://dangitgit.com .

This post, and a lot of git advice and help on the internet, assumes the command line.  That is because the git command line is the same for everyone and makes it easy to give precise instructions.  But really, you should try to do as much as possible inside a nice git GUI, possibly your IDE or SourceTree.  A nice git GUI makes it easy to go from "I want to ..." to "I did it" by exploring menus and right-clicking items instead of looking up weird commands and syntax that you will soon forget.  A nice git GUI also makes it much easier to see and understand your current repo state and its history.

Branches And Remotes

If you are dealing with a branch that is also on a remote, there are actually 3 branches:

  • Normal local branch.  This is the local branch that you do your changes on.
  • Remote-tracking branch.  This is a local branch that you cannot modify like a normal branch.  This branch is your local knowledge of the branch on the remote.  For example, "remotes/origin/main" (often referred to by the shortcut "origin/main") would be your remote-tracking branch for the "main" branch on origin.  Remote-tracking branches update when you do things like a fetch or pull.
  • The remote branch.  This is the branch on the remote.  Usually your interactions with this branch are limited to things like fetch, pull, or push.

Git has a convention that using "origin/main" in a command means that you're talking about the remote-tracking branch and using "origin main" means you're talking about the remote branch (like "git pull origin main").  For instance, "git merge origin/main" will merge from your local remote-tracking branch origin/main.  And "git pull origin main" will use origin's main to update your local origin/main and then merge from your local origin/main.
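A throwaway sketch that makes the three branches visible, assuming bash and git 2.28+ (for `init -b`). A local bare repo stands in for origin; the names remote.git, alice, and bob and the Demo identity are made up. Bob's origin/main only moves when he fetches, and his normal local main stays put until he merges or pulls.

```shell
set -euo pipefail
# Throwaway demo: paths, names, and the Demo identity are made up.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
base="$(mktemp -d)"

# A bare repo plays the part of "origin".
git init -q --bare -b main "$base/remote.git"
git init -q -b main "$base/alice"
git -C "$base/alice" commit -q --allow-empty -m "first"
git -C "$base/alice" remote add origin "$base/remote.git"
git -C "$base/alice" push -q -u origin main

# Bob clones, then Alice moves the remote branch ahead of Bob's copies.
git clone -q "$base/remote.git" "$base/bob"
git -C "$base/alice" commit -q --allow-empty -m "second"
git -C "$base/alice" push -q

cd "$base/bob"
stale=$(git rev-parse origin/main)   # remote-tracking branch: Bob's old knowledge
git fetch -q origin                  # talks to the remote, updates origin/main
fresh=$(git rev-parse origin/main)
on_remote=$(git ls-remote origin refs/heads/main | cut -f1)  # the remote branch itself
local_main=$(git rev-parse main)     # normal local branch: fetch never moves it
```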

There is a git book chapter on remote branches. Stack Overflow also seems to agree (origin/main vs origin main, origin/main vs remotes/origin/main).

Tiny Details You Might Want To Skip

The only thing I know of that is close to an exception on "origin main" vs "origin/main" is when you do something like `git branch NewBranch origin/main -t`.  I believe...

  • The command creates NewBranch from origin/main (the remote-tracking branch).
  • BUT, the `-t` flag sets up NewBranch to track the remote branch that corresponds to the local remote-tracking branch.

Someone might say that they have some normal local branch that "tracks" a remote branch.  What is really going on:

  • Local config (perhaps in .git/config) has entries for `branch.<BranchName>.remote` (ex: `origin`) and `branch.<BranchName>.merge` (ex: `refs/heads/main`) to identify the remote branch to track.  So it looks like even a pedant would say that the local branch tracks a remote branch, rather than saying the local branch actually tracks the remote-tracking branch.
  • These branch config entries let git calculate the corresponding local remote-tracking branch (ex: `origin/main`, which is short for `remotes/origin/main`, which is short for `refs/remotes/origin/main`, which is often a file underneath your .git folder, though git may instead keep it as a line in `.git/packed-refs`).
  • Whenever you do a `git pull` on that normal local branch:
    • A fetch is done to make your local remote-tracking branch up to date with the remote branch.
    • Your local remote-tracking branch is merged into your normal local branch.
  • Whenever you do a `git push` on that normal local branch, I think git tries to push the changes to the remote branch and then updates your local remote-tracking branch to match the remote branch.
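The config entries above can be inspected directly. A throwaway sketch, assuming bash and git 2.28+; the repo paths, the branch name NewBranch, and the Demo identity are made up. It runs the `git branch NewBranch origin/main -t` command from earlier and then reads back what tracking actually stored and what git computes from it.

```shell
set -euo pipefail
# Throwaway demo: paths, names, and the Demo identity are made up.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
base="$(mktemp -d)"
git init -q --bare -b main "$base/remote.git"
git init -q -b main "$base/seed"
git -C "$base/seed" commit -q --allow-empty -m "first"
git -C "$base/seed" push -q "$base/remote.git" main

git clone -q "$base/remote.git" "$base/work"
cd "$base/work"

# Start from the remote-tracking branch, but track the remote branch.
git branch -q NewBranch origin/main -t

remote=$(git config branch.NewBranch.remote)  # which remote
merge=$(git config branch.NewBranch.merge)    # branch name *on the remote*
# The remote-tracking ref git derives from those two entries:
upstream=$(git rev-parse --symbolic-full-name 'NewBranch@{upstream}')
```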

Here is my non-comprehensive list of git commands that actually communicate with remotes:

  • clone
  • fetch
  • pull
  • push
  • remote prune
  • remote show
  • remote update

Git Commits Are Snapshots, Not Diffs

Some info in this section is from https://github.blog/2020-12-17-commits-are-snapshots-not-diffs/ .

The actual content of a commit object has some stuff like:

  • List of zero or more object ids (hashes) for parent commits
  • Object id (hash) for a tree object that lists out the exact contents of the repo.
    • A tree is basically a list of pointers to tree and/or blob objects.
  • Commit message.
  • Other metadata like author name.

So, a commit is a snapshot of the repo, not a diff.  Diffs are calculated and recalculated as needed.
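One way to see the snapshot for yourself is to walk from a commit to its tree to a blob with `git cat-file`. A throwaway sketch, assuming bash and git 2.28+, with a made-up file name and identity. The blob reachable from the second commit holds the whole file, not just the appended line.

```shell
set -euo pipefail
# Throwaway demo: file names and the Demo identity are made up.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q -b main
printf 'hello\n' > a.txt
git add a.txt && git commit -q -m "add a.txt"
printf 'world\n' >> a.txt
git commit -q -am "append to a.txt"

git cat-file -p HEAD               # "tree <id>", "parent <id>", author, committer, message
tree=$(git rev-parse 'HEAD^{tree}')
git cat-file -p "$tree"            # "100644 blob <id>  a.txt": pointers, no diffs
blob=$(git rev-parse HEAD:a.txt)
content=$(git cat-file -p "$blob") # the complete file as of this commit
```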

For the sake of efficiency, diffs do exist in your .git folder, but they are not based on commits.  There are blob objects for holding file content; when unpacked/loose, each one is just the complete file contents (zlib-compressed).  There are also packfiles that contain, I think, the equivalent of a file content plus one or more diffs against it, with a separate .idx index file alongside.  Decisions in how to pack are optimized for a good end result, not for any relation to the actual change history of the repo.  For instance, I think git likes diffs to be removals while repo changes tend to be additions, and git possibly likes to have tip commit stuff not require using any diffs.

(Packfiles can pack up not just blobs, but I think any git object, like a commit, tree, or tag. This has a packfile that contains blobs, trees, commits, and tags all in the same pack file.  Why put such different things in the same packfile? Maybe to keep related stuff close together on disk.)

Commands like rebase and cherry-pick certainly treat commits as if commits are diffs.  Here is roughly how cherry-pick behaves:

  • You give it a commit to "apply" to your current branch.
  • Git compares the given commit to its parent (for a merge commit, you have to choose which parent with `-m`) and calculates a diff "equivalent" to the given commit.
  • Git applies this diff to your current branch.

I have not read up on it, but I predict the concept of merging heavily treats commits as diffs.  Merging two snapshots makes a lot less sense than merging two diffs.

A lot of the behavior and written philosophy of git heavily hints at commits being diffs.  I've heard many times that git tracks content changes rather than files, and that git doesn't care about files the way other source control systems do.  Yet, in git, each commit is a snapshot of exactly what the files were, and the .git folder does not contain anything that directly corresponds to a content change.  I do not think there is actually a contradiction here.  An implementation does not have to be a straightforward reflection of desired functional behavior.  Git development has put a lot of effort into performance, which is important, and I think the world has benefited from that.

Git does not explicitly track renames; git detects and re-detects renames as needed.  I wonder how this works when you are comparing two commits with lots of intervening changes to the renamed files.  It's easy to detect that a.txt was renamed to b.txt if there are zero or small changes between them.  And then if b.txt changes some for each commit, you know b.txt is still b.txt (not renamed).  But if I am comparing commit1 with a.txt to a much later commit100 with a much altered b.txt, how does git detect the rename?  Does it detect the rename?  I will have to look into this.
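Rename detection can be poked at directly. A throwaway sketch, assuming bash and git 2.28+, with made-up file names. A diff between two commits only looks at those two snapshots and pairs a deleted path with an added one when their contents are similar enough (50% by default, tunable with `-M<n>`). Turning detection off shows what the snapshots literally differ by: one path gone, one path new.

```shell
set -euo pipefail
# Throwaway demo: file names and the Demo identity are made up.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q -b main
printf 'line %s\n' 1 2 3 4 5 > a.txt
git add a.txt && git commit -q -m "add a.txt"
git mv a.txt b.txt && git commit -q -m "rename to b.txt"

detected=$(git diff --name-status -M HEAD~1 HEAD)       # rename inferred at diff time
raw=$(git diff --name-status --no-renames HEAD~1 HEAD)  # no rename pairing
```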

Git Reset

Git Reset Vs Git Checkout

git reset (when not providing a tree-ish thing) modifies HEAD and the current branch. git checkout modifies HEAD (and files in working tree), often by making HEAD point to another branch (or a commit directly, resulting in a "detached HEAD" where the HEAD is not associated with any branch) and does not modify what any pre-existing branch points to.
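A throwaway sketch contrasting the two, assuming bash and git 2.28+, with made-up commit messages. Checking out an older commit detaches HEAD and leaves main alone; resetting moves main itself.

```shell
set -euo pipefail
# Throwaway demo: messages and the Demo identity are made up.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q -b main
git commit -q --allow-empty -m "one"
git commit -q --allow-empty -m "two"
one=$(git rev-parse HEAD~1)
two=$(git rev-parse HEAD)

git checkout -q HEAD~1                     # moves HEAD only: detached at "one"
head_ref=$(git symbolic-ref -q HEAD || echo detached)
main_after_checkout=$(git rev-parse main)  # main still points at "two"

git checkout -q main
git reset -q --hard HEAD~1                 # moves the branch main itself
main_after_reset=$(git rev-parse main)
```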

Git Reset Modes (soft, mixed, hard)

It’s hard to remember the various reset modes and what they do. The following assumes that you are not providing a “tree-ish” thing to git reset and all discussion is about git reset [<mode>] [<commit>]. From the doc: “This form resets the current branch head to <commit> and...[possibly does other stuff]”

If you do not provide a commit-ish (HEAD, branch name, tag, commit hash), then HEAD is the implied commit argument. All git reset modes set HEAD to the commit argument.

  • git reset --soft SomeCommit to change just HEAD+branch. Files and index are unchanged. Safest in that it can’t destroy changes.
    • git reset --soft with implicit HEAD arg wouldn’t do anything.
  • git reset --mixed SomeCommit to change {HEAD+branch, index} to match SomeCommit. Your files and history are unchanged. If you had changes that were only present in the index (not present in your files), those changes would be destroyed.
    • the --mixed flag is explicitly choosing the default mode; git reset --mixed ... and git reset ... do the same thing.
    • git reset --mixed with implicit HEAD arg would destroy your staged changes. If those staged changes are also present in your files, then you still have the changes as unstaged changes.
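  • git reset --hard SomeCommit to change {HEAD+branch, index, files} to match SomeCommit. Tracked files that are not in SomeCommit are deleted, but untracked files and folders are left alone (git clean is the tool for those) unless they are in the way of a tracked file being written. Dangerous in that uncommitted changes to tracked files are truly gone.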
    • git reset --hard with implicit HEAD would destroy your staged and unstaged changes.
  • git reset --merge to change {HEAD+branch, index, and files}, but keeps “those which are different between the index and the working tree (i.e. which have changes which have not been added)”. So it keeps unstaged changes? Also, “If a file that is different between <commit> and the index has unstaged changes, reset is aborted.”
  • git reset --keep is possibly a git reset --hard but aborts if “a file that is different between <commit> and HEAD has local changes”. I don’t know what “local changes” means. Unstaged changes? Staged changes?
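A throwaway sketch of the three main modes, assuming bash and git 2.28+; the file names are made up. After undoing a commit with `--soft`, its change is staged; a following `--mixed` (the default) leaves it only in the working tree; `--hard` throws it away but leaves an untracked file alone.

```shell
set -euo pipefail
# Throwaway demo: file names and the Demo identity are made up.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q -b main
echo v1 > f.txt
git add f.txt && git commit -q -m "v1"
echo v2 > f.txt && git commit -q -am "v2"

git reset -q --soft HEAD~1   # branch back to v1; index and file still say v2
soft=$(git status --porcelain)

git reset -q                 # --mixed, implicit HEAD: index reset, file keeps v2
mixed=$(git status --porcelain)

echo scratch > notes.tmp     # an untracked file
git reset -q --hard          # implicit HEAD: index and tracked files back to v1
hard=$(git status --porcelain --untracked-files=no)
```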

Git Reset Examples

You did some commits on one branch (ex: main), but you wish you had put them on their own branch (ex: MyFeatureBranch). From git-reset doc, to put the last 3 commits onto a new MyFeatureBranch branch...

# assume current branch is `main`
$ git branch --track=inherit MyFeatureBranch # new branch at HEAD; also copies main's upstream (e.g. origin/main)
$ git reset --hard HEAD~3 # or `git reset --hard origin/main`
$ git switch MyFeatureBranch

I think if I had unstaged changes, I'd stash them or just commit them beforehand.

Git Branch Creation

`git checkout -b NewBranch --track ExistingBranch` Is Surprising

Summary: `git checkout -b NewBranch --track ExistingBranch` and `git switch -c NewBranch --track ExistingBranch` create NewBranch from ExistingBranch and make NewBranch track ExistingBranch (or, if ExistingBranch is a remote-tracking branch, then you track the corresponding remote branch).  You might think that they create NewBranch from the current branch, but they do not.

The reference page for git checkout has examples like `git checkout -b <branch> --track <remote>/<branch>`, but `-t`/`--track` does not actually take a branch argument.  You're basically doing a `git checkout -b NewBranch StartingPointForNewBranch` and adding the `-t`.  The `-b` does take an arg, so `git checkout -t ExistingBranch -b NewBranch` is also valid.

You have to go to git-branch's `--track` section to get a good explanation of the flag. `-t`, `--track`, and `--track=direct` are all equivalent and "means to use the start-point branch itself as the upstream".  There's also `--track=inherit`, which "means to copy the upstream configuration of the start-point branch".
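A throwaway sketch of the surprise, assuming bash and git 2.28+; the branch names are made up. We sit on `other`, yet NewBranch is created at main and records main as its upstream. When the upstream is a local branch, git stores `.` as the "remote".

```shell
set -euo pipefail
# Throwaway demo: branch names and the Demo identity are made up.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q -b main
git commit -q --allow-empty -m "on main"
git switch -q -c other
git commit -q --allow-empty -m "on other"

# Current branch is "other", yet NewBranch is created from main.
git checkout -q -b NewBranch --track main

start=$(git rev-parse NewBranch)
up_remote=$(git config branch.NewBranch.remote)  # "." means the upstream is local
up_merge=$(git config branch.NewBranch.merge)
```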

Creating A Feature Branch With Good Tracking

See above section for details on `--track=direct` and `--track=inherit`.

When creating a feature branch...

  • `git checkout -b FeatureBranch --track=inherit` is often good.
    • Branches from current branch, which is often what you wanted anyway.  The current branch probably was set up to track the branch you want to track, so you probably get the tracking you want.
    • Can make a shell alias `alias gci='git checkout --track=inherit -b'` and use it like `gci FeatureBranch`
  • `git checkout -b FeatureBranch ExistingBranch -t` is usually what you see people recommend.
    • It can specify exactly what you want if you want to start from and track a branch, usually one of your remote-tracking branches.
    • Your current branch doesn't matter.
    • Can do an alias for branching from origin/main: `alias gcm='git checkout -t origin/main -b'` and use it like `gcm FeatureBranch`.

GitHub Pull Request Branches

To locally work with a GitHub pull request, do `git fetch SOME_REMOTE pull/PR_NUMBER/head:DESIRED_LOCAL_BRANCH_NAME`, or `gh pr checkout PR_NUMBER` (gh is the GitHub CLI), or `git checkout -b DESIRED_LOCAL_BRANCH_NAME remotes/SOME_REMOTE/pr/PR_NUMBER` (reference; this last one only works if SOME_REMOTE's fetch refspec maps `refs/pull/*/head` to `refs/remotes/SOME_REMOTE/pr/*`).  Also, I think the GitHub Desktop GUI can help with pull requests, but I'm unsure about upstream remotes.
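GitHub publishes each pull request's head commit on the server as `refs/pull/<number>/head`, which is why the first fetch command works. A throwaway sketch that imitates this with a local bare repo, assuming bash and git 2.28+: the PR number 7, the branch name pr-7, and pushing to `refs/pull/` by hand are all made up (GitHub creates those refs itself).

```shell
set -euo pipefail
# Throwaway demo: paths, PR number, names, and the Demo identity are made up.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
base="$(mktemp -d)"
git init -q --bare -b main "$base/remote.git"
git init -q -b main "$base/work"
cd "$base/work"
git commit -q --allow-empty -m "base"
git remote add origin "$base/remote.git"
git push -q origin main

# Imitate GitHub: publish a contributor's commit as refs/pull/7/head.
git switch -q -c contributor
git commit -q --allow-empty -m "PR change"
git push -q origin contributor:refs/pull/7/head
pr_tip=$(git rev-parse contributor)
git switch -q main
git branch -q -D contributor

# SOME_REMOTE=origin, PR_NUMBER=7, DESIRED_LOCAL_BRANCH_NAME=pr-7
git fetch -q origin pull/7/head:pr-7
```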

Git Interactive Rebase

Split A Commit

If the commit you want to split is the most recent one, then...

  • `git reset HEAD~` to "undo" the commit but keep files the same.
  • Do adds and commits as appropriate.

To split a commit, this article seems to say... 

  • `git rebase -i THE_COMMIT~` because you need to give a commit/ref before the commit(s) you want to change.
  • Choose to `edit` the commit that you want to split.  Save and quit the text editor.
  • You will be back at the terminal; git is in the middle of a rebase operation and HEAD will be pointing at the commit you selected to edit.
  • You can do similar to what was advised above for splitting the most recent commit.
    • `git reset HEAD~`
    • Do adds and commits as appropriate.
  • Do `git rebase --continue` if you want to go forward with what you've done or `git rebase --abort` if you want to abandon the interactive rebase.
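The interactive-rebase steps above can be scripted for a throwaway test. This sketch assumes bash and git 2.28+, and it swaps the manual todo-list edit for a `GIT_SEQUENCE_EDITOR` sed command that turns the first `pick` into `edit`; the file names and messages are made up.

```shell
set -euo pipefail
# Throwaway demo: file names, messages, and the Demo identity are made up.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q -b main
echo base > README && git add README && git commit -q -m "base"
echo a > a.txt && echo b > b.txt
git add a.txt b.txt && git commit -q -m "add a and b"   # the commit to split
echo c > c.txt && git add c.txt && git commit -q -m "add c"

the_commit=$(git rev-parse HEAD~1)
# Instead of opening an editor, mark the first todo line (the_commit) as "edit".
GIT_SEQUENCE_EDITOR="sed -i.bak -e '1s/^pick/edit/'" git rebase -q -i "$the_commit~"

git reset -q HEAD~                 # undo the commit, keep its files
git add a.txt && git commit -q -m "add a"
git add b.txt && git commit -q -m "add b"
git rebase --continue              # replays "add c" on top
```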

 

Misc

More Than One Commit/Branch Checked Out To Your File System

Git does support having more than one commit/branch on disk at a time via `git worktree add ...`: https://git-scm.com/docs/git-worktree

You can even do `git worktree add ../new-folder-sister-to-original-repo-folder` to create a worktree folder outside of the original repo folder.
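A throwaway sketch of a sister-folder worktree, assuming bash and git 2.28+; the folder and branch names are made up. Committing in the second folder moves the feature branch while the original folder stays on main with its files untouched.

```shell
set -euo pipefail
# Throwaway demo: folder names, branch names, and the Demo identity are made up.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
base="$(mktemp -d)"
git init -q -b main "$base/repo"
cd "$base/repo"
echo hi > f.txt && git add f.txt && git commit -q -m "init"

# New branch "feature", checked out in a folder next to the repo folder.
git worktree add -q -b feature ../repo-feature
echo more >> ../repo-feature/f.txt
git -C ../repo-feature commit -q -am "work on feature"
```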

Find Commit That Deleted A File

From Stack Overflow:

git log -1 --stat -- path/to/file
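Once a file is deleted, no later commit touches its path, so the newest commit that touches it is the deletion. A throwaway sketch, assuming bash and git 2.28+, with made-up file names. `--diff-filter=D` is an extra option that keeps only deleting commits, which can help if the path was later re-added.

```shell
set -euo pipefail
# Throwaway demo: file names and the Demo identity are made up.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q -b main
echo a > gone.txt && git add gone.txt && git commit -q -m "add gone.txt"
echo b >> gone.txt && git commit -q -am "edit gone.txt"
git rm -q gone.txt && git commit -q -m "delete gone.txt"
echo x > other.txt && git add other.txt && git commit -q -m "unrelated"

last_touch=$(git log -1 --format=%s -- gone.txt)           # newest commit touching the path
deleter=$(git log --diff-filter=D --format=%s -- gone.txt)  # only commits that deleted it
```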
 

Set Working Tree (Files) To What They Are At Another Commit

git restore path/to/file --source SomeCommitOrBranch # particular file
git checkout SomeCommitOrBranch -- path/to/file # particular file, alternative 
git restore . --source SomeCommitOrBranch # everything, assuming in root folder
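The two per-file commands are not quite identical: `git restore --source` only touches the working tree unless you also pass `--staged`, while `git checkout <commit> -- <path>` updates the index too. A throwaway sketch, assuming bash and git 2.28+; the file name is made up.

```shell
set -euo pipefail
# Throwaway demo: file name and the Demo identity are made up.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q -b main
echo v1 > f.txt && git add f.txt && git commit -q -m "v1"
echo v2 > f.txt && git commit -q -am "v2"

git restore --source HEAD~1 f.txt   # working tree only: change shows as unstaged
after_restore=$(git status --porcelain)

git restore f.txt                   # put v2 back from the index
git checkout HEAD~1 -- f.txt        # index and working tree: change shows as staged
after_checkout=$(git status --porcelain)
```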
  

Merge/Pull Changes To A Non-Current Branch

If you are currently on one branch and want to (fast-forward) merge a second branch into a third branch (stackoverflow), do one of the following ...

git fetch RemoteName RemoteSrcBranch:LocalDstBranch # for remotes
git fetch . LocalSrcBranch:LocalDstBranch # for local
Some people have an alias for `git fetch origin main:main` so they can fast-forward main while they are on a feature branch.
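A throwaway sketch of the local form, assuming bash and git 2.28+; the branch names are made up. Without a leading `+` on the refspec, the update must be a fast-forward, so a diverged source is rejected and main stays where it was. Git also refuses to fetch into the branch you currently have checked out, which is why this is done from another branch.

```shell
set -euo pipefail
# Throwaway demo: branch names and the Demo identity are made up.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q -b main
git commit -q --allow-empty -m "base"
git switch -q -c feature
git commit -q --allow-empty -m "feature work"

# Still on "feature": fast-forward main to feature without switching.
git fetch -q . feature:main
main_after_ff=$(git rev-parse main)

# A branch that diverged from main cannot fast-forward it.
git switch -q -c side HEAD~1
git commit -q --allow-empty -m "side work"
if git fetch -q . side:main 2>/dev/null; then refused=no; else refused=yes; fi
```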

Clone Only Certain Directories/Files

The commands in this stackoverflow answer seem to work.  Here is how to download only selected directories of a repo (even skipping commits) with slight tweaks from the stackoverflow answer...

git clone -n --depth 1 --filter=tree:0 GitRepoUrl LocalRepoDir
cd LocalRepoDir
git sparse-checkout set --no-cone Desired/Dir1 Desired/Dir2 ...
git checkout
# now you have git repo in LocalRepoDir with populated child dirs and files
###########################
# if you want to go from sparse checkout to full checkout ...
git sparse-checkout disable
# if you want to go from shallow history to full history ...
git fetch --unshallow

NOTE: even in Windows, you need to provide paths with forward slashes (/), not back slashes (\).  If you use backslashes, there will be no error message, and git will not checkout the directories you thought you specified.

This GitHub blog post on sparse-checkout is a nice read.

Below is a PowerShell script template I have been using for repos where people often want particular sparse checkouts.  You can choose whether you want commit history or not with the `GetHistory` flag.

[CmdletBinding()]
param (
    [Switch]$GetHistory,
    [string]$RepoUrl = "PutDefaultUrlHere",
    [string]$RepoDir = "PutDefaultDirNameHere"
)

Set-StrictMode -Version Latest
$ErrorActionPreference = 'Stop'

if ($GetHistory) {
    git clone --no-checkout "$RepoUrl" "$RepoDir"
}
else {
    git clone -n --depth 1 --filter=tree:0 "$RepoUrl" "$RepoDir"
}

Push-Location "$RepoDir"
try {
    git sparse-checkout set --no-cone File1 Dir1/File2 Dir2/Dir3
    git checkout
}
finally {
    Pop-Location
}

Git Aliases, PowerShell

Even if I don't use all of the below aliases very much, they are a good written record of things I might want to do but don't fully remember, thus the comments.

New-Alias g git
function ga {git add $args}
function gaa {git add --all $args}
function gbr {git branch $args} # 'git branch' to see local branches; '-r' for remote; '-a' for all; '-d SomeBranch' to delete SomeBranch;
function gb {git branch -vv $args} # perhaps a good default when wanting to see branches
function gco {git commit $args} # 'gc' is pwsh alias for Get-Content; '-m "some message"' to set commit msg inline; '--amend' to have staged changes modify previous commit
function gch {git checkout $args} # 'git checkout SomeBranch' to checkout SomeBranch; '-b NewBranch' to create NewBranch
function gnd {git checkout --track origin/main -b $args} # plz provide new branch name
function gni {git checkout --track=inherit -b $args} # plz provide new branch name
function gcp {git cherry-pick $args}
function gd {git diff $args} # shows unstaged changes
function gd9 {git diff -U999999 $args} # diff with (practically) whole-file context
function gds {git diff --staged $args} # shows staged changes
function gdh {git diff HEAD $args} # diff of working tree compared to HEAD
function gf {git fetch $args}
function glo {git log $args} # 'gl' is pwsh alias for Get-Location
function glo1 {git log -n 1 $args}
function gls {git ls-files $args}
function glsh {git ls-tree --full-tree --name-only -r HEAD $args} # show files in repo
function gpl {git pull $args}
function gps {git push $args}
function gs {git status $args}
function gw {git switch $args}
function gx {echo "### FETCH" && git fetch && echo "### BRANCH" && git branch -vv && echo "### STATUS" && git status} # '&&' chaining needs PowerShell 7+

function git-clone-dir {
    if ($args.Count -lt 3) {
        Write-Output "need args 1) repo URL, 2) local root dir to create, and 3) dir in repo to clone"
        return
    }
    $repoUrl = $args[0]
    $rootDir = $args[1]
    $dirToClone = $args[2]
    git clone -n --depth 1 --filter=tree:0 "$repoUrl" "$rootDir"
    Push-Location "$rootDir"
    try {
        git sparse-checkout set --no-cone "$dirToClone"
        git checkout
    }
    finally {
        Pop-Location
    }
}
