LINUX & HPC : Advanced Large Scale Computing at a Glance !: Git Unlocked:: A Beginner's Guide to Version Control

Git is a source code management system. It is a distributed version control system (DVCS) that allows multiple developers and teams to work in parallel. Git's primary emphasis is on speed while maintaining data integrity. Git can also perform almost all operations offline, when no network is available; the changes can be pushed to the server once the network is available again.

source: GIT workflow structure

Git is designed as a distributed version control system. Its core concepts are:

Data Structure:

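Git stores history in a Merkle-style structure (strictly a directed acyclic graph rather than a tree): each commit points to its parent commit(s), forming a chain of versions. Every object, including each commit, is identified by a hash of its content (SHA-1 by default), so any change to the content changes the hash.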
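Because objects are named by a hash of their content, you can reproduce Git's object ID by hand. A minimal sketch, assuming a shell with Git and GNU coreutils' `sha1sum` available and the default SHA-1 object format: for a blob, Git hashes the header `blob <size>`, a NUL byte, and then the raw file content.

```shell
set -eu
# Assumes git and GNU sha1sum on PATH; runs in a scratch directory outside any repository
cd "$(mktemp -d)"

# Object ID that Git assigns to the 6 bytes "hello\n" (no -w, so nothing is stored)
git_id=$(printf 'hello\n' | git hash-object --stdin)

# The same ID computed by hand: SHA-1 over "blob 6", a NUL byte, then the content
manual_id=$(printf 'blob 6\000hello\n' | sha1sum | cut -d' ' -f1)

echo "git hash-object: $git_id"
echo "manual sha1sum:  $manual_id"
```

Changing even one byte of the content changes this ID, and since trees and commits embed the IDs of what they point to, the change propagates up to every commit that includes the file.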

Objects:

There are four main types of objects in Git:

Blobs: Store file data.
Trees: Represent directories and contain blobs or other trees.
Commits: Point to a tree and to their parent commit(s), and contain metadata such as author, date, and commit message.
Tags: Annotated tags reference specific commits, typically to mark release points.
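The four object types can be inspected directly with `git cat-file`. The sketch below assumes a Unix-like shell and Git on the PATH; it builds a throwaway repository (the file `hello.txt` and tag `v1.0` are hypothetical) and asks Git for the type of each object. Only annotated tags (`git tag -a`) create tag objects; lightweight tags are plain references.

```shell
set -eu
# Assumes git on PATH; throwaway repository, all names hypothetical
export HOME="$(mktemp -d)" GIT_CONFIG_NOSYSTEM=1   # sandbox: ignore personal/system Git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$HOME/objects-demo" && cd "$HOME/objects-demo"
echo 'hello' > hello.txt
git add hello.txt
git commit -q -m "First commit"
git tag -a v1.0 -m "Release 1.0"             # annotated tag, stored as a tag object

# Ask Git which type of object each name resolves to
commit_type=$(git cat-file -t HEAD)           # the commit itself
tree_type=$(git cat-file -t 'HEAD^{tree}')    # the commit's root directory
blob_type=$(git cat-file -t HEAD:hello.txt)   # the file's content
tag_type=$(git cat-file -t v1.0)              # the annotated tag
echo "$commit_type $tree_type $blob_type $tag_type"

git cat-file -p HEAD   # pretty-print: tree ID, author, committer, message
```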

Branches:

Branches are pointers to commits, allowing multiple lines of development within the same repository. The default branch is usually called master or main.
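A short sketch of "branches are pointers", assuming a Unix-like shell, Git on the PATH, and a throwaway repository with hypothetical branch names: creating a branch copies no files, it just records a commit ID, and committing moves only the branch that HEAD currently refers to.

```shell
set -eu
# Assumes git on PATH; throwaway repository, branch names hypothetical
export HOME="$(mktemp -d)" GIT_CONFIG_NOSYSTEM=1   # sandbox: ignore personal/system Git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$HOME/branch-demo" && cd "$HOME/branch-demo"
git symbolic-ref HEAD refs/heads/main          # name the first branch "main"
git commit -q --allow-empty -m "Initial commit"

git branch feature                             # new pointer to the current commit
main_id=$(git rev-parse main)
feature_id=$(git rev-parse feature)            # same ID as main: nothing was copied
echo "main -> $main_id, feature -> $feature_id"

git checkout -q feature
git commit -q --allow-empty -m "Work on feature"
echo "HEAD refers to $(git symbolic-ref HEAD)"  # refs/heads/feature
# Only the checked-out branch moved; main still points at the initial commit
echo "main -> $(git rev-parse main), feature -> $(git rev-parse feature)"
```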

Staging Area:

Before committing changes, files are added to the staging area using git add. This allows users to review changes before finalizing them in a commit.
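A minimal sketch of the staging area, assuming a Unix-like shell and Git on the PATH (file names are hypothetical): only what has been added with `git add` ends up in the next commit, and `git diff --cached` shows exactly that set.

```shell
set -eu
# Assumes git on PATH; throwaway repository, file names hypothetical
export HOME="$(mktemp -d)" GIT_CONFIG_NOSYSTEM=1   # sandbox: ignore personal/system Git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$HOME/stage-demo" && cd "$HOME/stage-demo"
echo 'version 1' > a.txt
echo 'draft' > b.txt

git add a.txt                                  # stage a.txt only
staged=$(git diff --cached --name-only)        # what the next commit will contain
git status --short                             # "A  a.txt" (staged) and "?? b.txt" (untracked)

git commit -q -m "Add a.txt"
committed=$(git ls-tree --name-only HEAD)      # b.txt is not part of the commit
echo "staged: $staged / committed: $committed"
```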

Remote Repositories:

Git allows synchronization between local repositories and remote ones (like GitHub), enabling collaboration among multiple users.
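To try remote synchronization without a hosting service, a local bare repository can stand in for the server. This sketch assumes a Unix-like shell and Git on the PATH; `shared.git`, `alice`, and `bob` are hypothetical names.

```shell
set -eu
# Assumes git on PATH; a local bare repo plays the role of a hosted remote (names hypothetical)
export HOME="$(mktemp -d)" GIT_CONFIG_NOSYSTEM=1   # sandbox: ignore personal/system Git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q --bare "$HOME/shared.git"
git -C "$HOME/shared.git" symbolic-ref HEAD refs/heads/main

# First developer: clone the (empty) remote, commit, and push
git clone -q "$HOME/shared.git" "$HOME/alice" 2>/dev/null   # hides the "empty repository" warning
cd "$HOME/alice"
git symbolic-ref HEAD refs/heads/main
echo 'shared notes' > notes.txt
git add notes.txt && git commit -q -m "Add notes"
git push -q origin main                        # upload local commits to the remote

# Second developer: a fresh clone receives the pushed history
git clone -q "$HOME/shared.git" "$HOME/bob"
bob_notes=$(cat "$HOME/bob/notes.txt")
echo "Bob sees: $bob_notes"
```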

Some commonly used terms are listed below:

Repository - the project's files together with their complete history (stored in the .git directory).
Clone - creates a local copy of an existing (usually remote) repository, including its history.
Branch - a movable pointer to a commit, created to encapsulate the development of new features or bug fixes.
HEAD - points to the currently checked-out branch (and, through it, to that branch's latest commit).
Commit - records the staged changes as a new commit on the current branch in the local repository; nothing is sent to the remote.
Pull - fetches the changes from the remote repository and merges them into the current local branch.
Push - uploads the local commits to the remote repository.

---------------------------------------------------------------------------------------------------------

To understand when to use "git fetch" instead of "git pull", let's break it down into simpler terms with an example.

git fetch: Think of this as checking for updates. It retrieves the latest changes from a remote repository (like GitHub) but does not apply those changes to your current working files. It’s like downloading new episodes of your favorite show but not watching them yet.

git pull: This command does two things at once: it fetches the latest changes and immediately merges them into your current files. It’s like downloading those episodes and then watching them right away.

When to Use "git fetch"

1. You Want to Review Changes First: If you're working on a project and you want to see what others have done before you integrate their changes, `git fetch` allows you to check for updates without making any changes to your own work right away.

2. Avoiding Conflicts: If your branch has diverged from the remote, `git pull` immediately tries to merge and may stop with conflicts; if you have uncommitted changes that the incoming commits touch, it refuses to run. By using `git fetch`, you can first see what has changed and then decide how to handle it.

3. Working Offline or Later: If you can't deal with merging right away, you can fetch the updates while you are connected and merge them later, even without a network connection (say, on a beach without Wi-Fi).

Example: Imagine you're working on a team project.

1. You’re writing code on your laptop.

2. Your teammate pushes some updates to the shared repository while you’re still working.

3. You want to see what they changed without disrupting your current work.

git fetch origin

This command downloads any new commits from the remote repository named "origin" and updates your remote-tracking branches (such as origin/main), but it doesn't change any of your files yet.

4. After fetching, you can review what your teammate did:

git log origin/main

This shows you the commit history of the remote branch so you can see what’s new.

5. Once you're ready and have made sure there are no conflicts, you can merge those changes:

git merge origin/main

Now your local files are updated with your teammate’s changes.

In short, use "git fetch" when you want to keep your local repository updated without immediately affecting your working files. This gives you more control over how and when to integrate changes from others, which is especially useful in collaborative environments or when you're not ready to merge yet.
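The fetch-review-merge steps above can be reproduced end to end with a local bare repository standing in for "origin". A sketch assuming a Unix-like shell and Git on the PATH (all repository and file names are hypothetical); `git log HEAD..origin/main` narrows the review to the commits you do not have yet.

```shell
set -eu
# Assumes git on PATH; a local bare repo plays "origin", all names hypothetical
export HOME="$(mktemp -d)" GIT_CONFIG_NOSYSTEM=1   # sandbox: ignore personal/system Git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q --bare "$HOME/team.git"
git -C "$HOME/team.git" symbolic-ref HEAD refs/heads/main

# Your clone with one initial commit
git clone -q "$HOME/team.git" "$HOME/me" 2>/dev/null
cd "$HOME/me"
git symbolic-ref HEAD refs/heads/main
echo 'v1' > app.txt
git add app.txt && git commit -q -m "Initial version"
git push -q origin main

# Step 2: a teammate pushes an update
git clone -q "$HOME/team.git" "$HOME/teammate"
(cd "$HOME/teammate" && echo 'v2' > app.txt && git commit -q -am "Update app" && git push -q origin main)

# Steps 3-4: fetch only updates origin/main; app.txt is untouched
git fetch -q origin
after_fetch=$(cat app.txt)                      # still v1
incoming=$(git log --oneline HEAD..origin/main) # commits you do not have yet
echo "$incoming"

# Step 5: integrate when ready (a fast-forward here, since you made no new commits)
git merge -q origin/main
after_merge=$(cat app.txt)                      # now v2
```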

--------------------------------------------------------------------

The recursive clone of a Git repository:

This refers to the process of cloning a repository along with all its submodules. Submodules are essentially repositories nested inside another repository, allowing you to include external libraries or components as part of your project.

How It Works: When you perform a standard clone using the command:

git clone [repository-url]

It only clones the main repository. If that repository contains submodules, those submodule directories will be created but will remain empty until you initialize and update them.

To clone a repository along with its submodules, you use the `--recurse-submodules` option:

git clone --recurse-submodules [repository-url]

This command does the following:

1. Clones the Main Repository: It creates a local copy of the main repository.

2. Initializes Submodules: It automatically initializes any submodules defined in the ".gitmodules" file.

3. Updates Submodules: It fetches the content of those submodules, ensuring they are populated with the correct data.

Example: Imagine you have a project that relies on a library hosted in another Git repository. If you want to include this library as a submodule in your project, your main repository might look like this:

- Main Repository: "MyProject"

- Submodule: "ExternalLibrary"

If you were to clone MyProject without the recursive option, you'd end up with:

MyProject/
├── .git/
├── ExternalLibrary/ (empty)
└── other files...

With the recursive option:

git clone --recurse-submodules [repository-url]

you would get:

MyProject/
├── .git/
├── ExternalLibrary/ (with content)
└── other files...

Why use Recursive Clone?

Convenience: It saves time by automatically fetching and setting up all necessary components for your project.
Consistency: Ensures that all required libraries or modules are at the correct version specified in the main repository.

Using a recursive clone is essential when working with projects that depend on multiple repositories to ensure everything is correctly set up from the start.

$ git clone --recursive git@github.xyz.com:smpi/xyz-tests.git
Cloning into 'xyz-tests'...
remote: Enumerating objects: 24, done.
remote: Counting objects: 100% (24/24), done.
remote: Compressing objects: 100% (21/21), done.
remote: Total 18759 (delta 13), reused 7 (delta 3), pack-reused 18735
Receiving objects: 100% (18759/18759), 125.23 MiB | 33.33 MiB/s, done.
Resolving deltas: 100% (12772/12772), done.
Updating files: 100% (5233/5233), done.
Submodule 'smpi-ci/mtt' (https://github.com/open-mpi/mtt-legacy.git) registered for path 'smpi-ci/mtt'
Submodule 'tests/hpc-smpi-fvt' (git@github.xyz.com:smpi/hpc-smpi-fvt.git) registered for path 'tests/hpc-smpi-fvt'
Submodule 'tests/ompi-tests' (git@github.xyz.com:smpi/ompi-tests.git) registered for path 'tests/ompi-tests'
Cloning into '/data/nfs_smpi_ci/xyz-tests/smpi-ci/mtt'...
remote: Enumerating objects: 1abc, done.
remote: Counting objects: 100% (21/21), done.
remote: Compressing objects: 100% (18/18), done.
remote: Total 13184 (delta 7), reused 10 (delta 3), pack-reused 13163
Receiving objects: 100% (13184/13184), 3.94 MiB | 20.67 MiB/s, done.
Resolving deltas: 100% (8686/8686), done.
Cloning into '/data/nfs_smpi_ci/xyz-tests/tests/hpc-smpi-fvt'...
remote: Enumerating objects: 15511, done.
remote: Total 15511 (delta 0), reused 0 (delta 0), pack-reused 15511
Receiving objects: 100% (15511/15511), 81.99 MiB | 28.47 MiB/s, done.
Resolving deltas: 100% (10318/10318), done.
Cloning into '/data/nfs_smpi_ci/xyz-tests/tests/ompi-tests'...
remote: Enumerating objects: 36104, done.
remote: Total 36104 (delta 0), reused 0 (delta 0), pack-reused 36104
Receiving objects: 100% (36104/36104), 103.97 MiB | 26.81 MiB/s, done.
Resolving deltas: 100% (23820/23820), done.
Submodule path 'smpi-ci/mtt': checked out 'mnccd42d37c232883d3a600ac4151868a3327b7'
Submodule path 'tests/hpc-smpi-fvt': checked out 'abc5ab4afacfacccc04dacb2c55a41477d2c02'
Submodule path 'tests/ompi-tests': checked out 'xyx7f6e194df5bbc16e836f8d63556be363a94ca5'
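The empty-versus-populated behavior can be checked locally. This sketch assumes a Unix-like shell and a recent Git; `MyProject` and `ExternalLibrary` mirror the hypothetical names used earlier. Because recent Git releases refuse local file-path submodule URLs by default, the sketch passes `-c protocol.file.allow=always` for this demo only; with ordinary https or ssh URLs that setting is not needed.

```shell
set -eu
# Assumes git on PATH; local repositories stand in for hosted ones, names hypothetical
export HOME="$(mktemp -d)" GIT_CONFIG_NOSYSTEM=1   # sandbox: ignore personal/system Git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$HOME"

# A library that lives in its own repository
git init -q ExternalLibrary
(cd ExternalLibrary && echo 'lib code' > lib.c && git add lib.c && git commit -q -m "Library code")

# Main project that records the library as a submodule (.gitmodules + a pinned commit)
git init -q MyProject
(cd MyProject &&
  git -c protocol.file.allow=always submodule add "$HOME/ExternalLibrary" ExternalLibrary &&
  git commit -q -m "Add ExternalLibrary submodule")

# Standard clone: the submodule directory exists but is empty
git clone -q MyProject plain
ls -A plain/ExternalLibrary

# Recursive clone: submodules are initialized and checked out as well
git -c protocol.file.allow=always clone -q --recurse-submodules MyProject recursive
ls -A recursive/ExternalLibrary
```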

------------

NOTE: With Git 2.13 and later, --recurse-submodules can be used instead of --recursive.

With older Git versions, use:

git clone --recursive https://github.com/foo/bar.git

For already cloned repos, or older Git versions, use:

git clone https://github.com/foo/bar.git

cd bar

git submodule update --init --recursive
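For a repository that was cloned without its submodules, `git submodule update --init --recursive` fills in the missing directories afterwards. A sketch assuming a Unix-like shell and a recent Git, with hypothetical local repositories (the clone is named `bar` to echo the commands above); local file-path submodule URLs are allowed with `-c protocol.file.allow=always` for this demo only. `git submodule status` marks an uninitialized submodule with a leading `-`.

```shell
set -eu
# Assumes git on PATH; local repositories stand in for hosted ones, names hypothetical
export HOME="$(mktemp -d)" GIT_CONFIG_NOSYSTEM=1   # sandbox: ignore personal/system Git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$HOME"

# A library and a project that uses it as a submodule
git init -q ExternalLibrary
(cd ExternalLibrary && echo 'lib code' > lib.c && git add lib.c && git commit -q -m "Library code")
git init -q MyProject
(cd MyProject &&
  git -c protocol.file.allow=always submodule add "$HOME/ExternalLibrary" ExternalLibrary &&
  git commit -q -m "Add ExternalLibrary submodule")

# A clone made without submodule support
git clone -q MyProject bar
cd bar
before=$(git submodule status)    # leading "-": not initialized yet
echo "$before"

# Register submodules from .gitmodules and check out the recorded commits
git -c protocol.file.allow=always submodule update --init --recursive
after=$(git submodule status)     # leading space: checked out at the recorded commit
echo "$after"
```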

------------

The "git merge" :

The "git merge" command is a fundamental feature in Git that allows developers to combine changes from different branches into a single branch. This process is essential for integrating various lines of development, such as feature branches or bug fixes, into the main codebase.

What Does git merge Do?

1. Combines Changes: The primary function of "git merge" is to integrate the histories of two branches. When you execute "git merge <branch-name>", Git takes the changes from the specified branch and merges them into your current branch, which is often referred to as the "receiving branch".

2. Creates a Merge Commit: If the branches have diverged (i.e., both have new commits since they last shared a common ancestor), Git creates a new commit known as a merge commit. This commit has two parent commits: one from each branch being merged. This allows Git to maintain a complete history of changes.

3. Finds the Common Base: Before merging, Git identifies the common ancestor (or merge base) of the two branches. It then computes the differences (diffs) between this common ancestor and each of the branches to apply those changes simultaneously.

4. Handling Conflicts: If there are conflicting changes in the same part of a file from both branches, Git will pause the merge process and prompt you to resolve these conflicts manually. Once resolved, you can finalize the merge with a commit.

Types of Merging

1) Fast-Forward Merge: If the current branch has no new commits since its common ancestor with the branch being merged (i.e., the current branch is directly behind it), Git simply moves the current branch pointer forward to the other branch's latest commit, without creating a new commit.

2) Three-Way Merge: When both branches have diverged, a three-way merge occurs. Here, Git uses three points (the two branch tips and their common ancestor) to create a new merge commit that reconciles the changes from both branches.
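Both merge types, and the merge base Git starts from, can be observed in a throwaway repository. A sketch assuming a Unix-like shell and Git on the PATH (branch and file names are hypothetical); `git rev-list --parents -n 1 HEAD` prints a commit followed by its parents, so a merge commit lists two parents.

```shell
set -eu
# Assumes git on PATH; throwaway repository, all names hypothetical
export HOME="$(mktemp -d)" GIT_CONFIG_NOSYSTEM=1   # sandbox: ignore personal/system Git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$HOME/merge-types" && cd "$HOME/merge-types"
git symbolic-ref HEAD refs/heads/main
echo base > base.txt && git add base.txt && git commit -q -m "Base"

# 1) Fast-forward: main gains no commits while feature moves ahead
git checkout -q -b feature
echo f1 > f1.txt && git add f1.txt && git commit -q -m "Feature work"
git checkout -q main
git merge -q feature                           # main's pointer just moves to feature's tip
ff_tip=$(git rev-parse HEAD)
ff_words=$(git rev-list --parents -n 1 HEAD | wc -w | tr -d ' ')     # commit + 1 parent

# 2) Three-way: both branches get new commits after they split
git checkout -q -b feature2
echo f2 > f2.txt && git add f2.txt && git commit -q -m "More feature work"
git checkout -q main
echo m > m.txt && git add m.txt && git commit -q -m "Work on main"
base=$(git merge-base main feature2)           # common ancestor used for the three-way diff
git merge -q --no-edit feature2                # records a merge commit
merge_words=$(git rev-list --parents -n 1 HEAD | wc -w | tr -d ' ')  # commit + 2 parents
git log --oneline --graph
```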

Example: To illustrate how "git merge" works, let's go through a practical example involving two branches: main and feature.

1. Initial Setup: You start with a repository that has a `main` branch and you create a new branch called `feature` for development.

git checkout -b feature main

2. Making Changes in the Feature Branch:

You make some changes in the `feature` branch and commit them.

# Edit some files

git add <file>

git commit -m "Add new feature"

3. Switching Back to Main:

After completing your work on the "feature" branch, you switch back to the "main" branch.

git checkout main

4. Updating Main Before Merging: While you were working on "feature", someone else pushed changes to the "main" branch on the remote. Update your local "main" before merging:

git pull origin main

5. Merging the Feature Branch into Main: Now, you can merge the changes from the "feature" branch into the "main" branch.

git merge feature

If there are conflicting changes (e.g., both branches modified the same line in a file), Git will pause and prompt you to resolve these conflicts manually before completing the merge:

# Resolve conflicts in your editor, then stage the resolved files

git add <resolved-file>

# Complete the merge after resolving conflicts

git commit -m "Merge feature into main with conflict resolution"
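A reproducible version of the conflict walkthrough, assuming a Unix-like shell and Git on the PATH (the file `config.txt` and its values are hypothetical). Both branches change the same line, so `git merge` stops; the merge is completed by editing, staging, and committing.

```shell
set -eu
# Assumes git on PATH; throwaway repository, file contents hypothetical
export HOME="$(mktemp -d)" GIT_CONFIG_NOSYSTEM=1   # sandbox: ignore personal/system Git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$HOME/conflict-demo" && cd "$HOME/conflict-demo"
git symbolic-ref HEAD refs/heads/main
echo 'color = blue' > config.txt
git add config.txt && git commit -q -m "Initial config"

git checkout -q -b feature
echo 'color = green' > config.txt && git commit -q -am "Use green"
git checkout -q main
echo 'color = red' > config.txt && git commit -q -am "Use red"

# Both branches changed the same line, so Git stops and reports a conflict
if git merge feature > /dev/null; then
  echo "unexpected: merge completed without a conflict" >&2
  exit 1
fi
conflicted=$(git diff --name-only --diff-filter=U)   # paths still unmerged
cat config.txt   # <<<<<<< HEAD / color = red / ======= / color = green / >>>>>>> feature

# Resolve by writing the desired content, stage it, then conclude the merge
echo 'color = purple' > config.txt
git add config.txt
git commit -q -m "Merge feature into main with conflict resolution"
```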

Using "git merge", developers can effectively integrate changes from multiple branches, making it easier to collaborate on projects. Understanding how to handle different types of merges and resolve conflicts is crucial for maintaining a clean project history and ensuring smooth collaboration among team members.

-----------

The "git rebase" :

Git rebase is a powerful command in Git that allows you to integrate changes from one branch into another by moving the base of your current branch to a different commit. This process effectively rewrites the commit history, making it appear as though you started your work from a different point in the project history.

What Does Git Rebase Do?

1. Changing the Base: When you perform a rebase, you change the base of your current branch to a specified commit from another branch. This means that all the commits in your current branch will be reapplied on top of the new base commit, creating new commits in the process.

2. Linear History: One of the main advantages of rebasing is that it helps maintain a clean and linear project history. This is particularly useful for simplifying the commit log and making it easier to follow changes over time. Instead of having a branching structure, rebasing makes it look like all changes were made sequentially.

3. Interactive Rebasing: Git rebase can be performed in two modes: standard and interactive. The interactive mode allows you to modify commits during the rebase process, such as editing commit messages, squashing multiple commits into one, or even removing commits altogether.
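Interactive rebasing normally opens an editor with a to-do list of `pick` lines. To keep this sketch non-interactive, it assumes a Unix-like shell with `sed` and Git on the PATH, and sets `GIT_SEQUENCE_EDITOR` (the editor Git uses for that list) to a `sed` command that turns every `pick` after the first into `squash`, combining three hypothetical work-in-progress commits into one.

```shell
set -eu
# Assumes git and sed on PATH; throwaway repository, commit messages hypothetical
export HOME="$(mktemp -d)" GIT_CONFIG_NOSYSTEM=1   # sandbox: ignore personal/system Git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$HOME/squash-demo" && cd "$HOME/squash-demo"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "Base"
for n in 1 2 3; do
  echo "step $n" >> work.txt
  git add work.txt && git commit -q -m "WIP $n"
done
before=$(git rev-list --count HEAD)            # 4 commits

# Scripted stand-in for the to-do editor: keep the first "pick", squash the rest
# (-i.bak is accepted by both GNU and BSD sed); GIT_EDITOR=true keeps the combined message
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$s/^pick/squash/'" \
GIT_EDITOR=true \
  git rebase -q -i HEAD~3
after=$(git rev-list --count HEAD)             # 2 commits: Base + one squashed commit
git log --oneline
```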

To perform a standard rebase, you would typically use:

git rebase <base-branch>

For example, if you want to rebase your feature branch onto the main branch:

git checkout feature

git rebase main

This command will take all commits from "feature" and apply them on top of "main", effectively updating "feature" with the latest changes from `main`.
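A sketch of that standard rebase, assuming a Unix-like shell and Git on the PATH (file names are hypothetical). It also shows that rebased commits are new commits: the feature commit's ID changes, and afterwards `feature` sits directly on top of `main` with no merge commit.

```shell
set -eu
# Assumes git on PATH; throwaway repository, all names hypothetical
export HOME="$(mktemp -d)" GIT_CONFIG_NOSYSTEM=1   # sandbox: ignore personal/system Git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$HOME/rebase-demo" && cd "$HOME/rebase-demo"
git symbolic-ref HEAD refs/heads/main
echo base > base.txt && git add base.txt && git commit -q -m "Base"

git checkout -q -b feature
echo feat > feature.txt && git add feature.txt && git commit -q -m "Add feature"
old_feature=$(git rev-parse feature)

git checkout -q main                           # meanwhile, main moves on
echo fix > fix.txt && git add fix.txt && git commit -q -m "Fix on main"

git checkout -q feature
git rebase -q main                             # replay "Add feature" on top of main
new_feature=$(git rev-parse feature)           # a new commit with a new ID
git log --oneline --graph main feature         # one straight line, no merge commit
```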

Example:

1. Initial Setup

- You have two branches: main and feature

- You make several commits on feature while others continue to update main.

2. Rebasing:

- To incorporate the latest changes from main into feature, you would execute:

git checkout feature

git rebase main

3. Resolving Conflicts:

- If there are conflicts during the rebase, Git will pause and allow you to resolve them. After resolving conflicts in the affected files, you would continue the rebase with:

git add <resolved-file>

git rebase --continue
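The conflict-and-continue loop can be reproduced with a single conflicting commit. Assumptions: a Unix-like shell, Git on the PATH, hypothetical file contents, and `GIT_EDITOR=true` so that `git rebase --continue` keeps the original commit message without opening an editor.

```shell
set -eu
# Assumes git on PATH; throwaway repository, file contents hypothetical
export HOME="$(mktemp -d)" GIT_CONFIG_NOSYSTEM=1   # sandbox: ignore personal/system Git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$HOME/rebase-conflict" && cd "$HOME/rebase-conflict"
git symbolic-ref HEAD refs/heads/main
echo 'Hello' > greeting.txt
git add greeting.txt && git commit -q -m "Add greeting"

git checkout -q -b feature
echo 'Hello from feature' > greeting.txt && git commit -q -am "Feature greeting"
git checkout -q main
echo 'Hello from main' > greeting.txt && git commit -q -am "Main greeting"

git checkout -q feature
# Replaying "Feature greeting" onto main touches the same line, so the rebase pauses
if git rebase main > /dev/null 2>&1; then
  echo "unexpected: rebase completed without a conflict" >&2
  exit 1
fi

# Resolve the file, stage it, and let the rebase carry on
echo 'Hello from main and feature' > greeting.txt
git add greeting.txt
GIT_EDITOR=true git rebase --continue > /dev/null
git log --oneline --graph
```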

Advantages of Git Rebase

Cleaner Commit History: By avoiding unnecessary merge commits, rebasing results in a more straightforward project history that is easier to read and understand.
Easier Debugging: A linear history simplifies tracking down bugs and understanding how features were developed over time.
Better Collaboration: Rebasing helps keep feature branches up-to-date with ongoing changes in the main branch, which can reduce conflicts when it's time to merge back into main.

What Happens During Rebase?

- Updating Your Work: When you run `git rebase main` on the feature branch, Git takes the latest commit on `main` and reapplies your feature branch's commits on top of it, as if you had started working on them after the latest changes in main. This effectively updates your feature branch with any new changes that have been made in main.

- Cleaner History: Unlike merging, which creates a new commit that combines both branches (often making the history look complicated), rebasing keeps your project history neat and linear. It looks like all your work was done after the latest changes from main, even if it wasn't.

Example: Imagine you are working on a project:

1. You create a branch called feature to add a new feature.

2. While you're working, someone else adds new updates to the main branch.

3. To ensure your feature includes those updates, you run git rebase main on the feature branch.

4. After running this command, your commits from feature are placed on top of the latest commits from main, making it look like you developed your feature with all the latest updates in mind.

NOTE:

While rebasing can be very useful, rewriting history can lead to complications if not handled carefully, especially if you've already pushed your changes to a shared repository: the rebased commits get new IDs, so anyone who built on the old ones has to reconcile their work. It's generally advised not to rebase public branches that others may be using.
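The trouble with rebasing a pushed branch can be seen with a local bare repository acting as the shared server. In this sketch (Unix-like shell, Git on the PATH, hypothetical names), the rebased `feature` no longer contains the commit the server has, so a plain push is rejected; `git push --force-with-lease` overwrites the remote branch only if it still points where your last fetch or push saw it, which is safer than `--force`.

```shell
set -eu
# Assumes git on PATH; a local bare repo plays the shared server, names hypothetical
export HOME="$(mktemp -d)" GIT_CONFIG_NOSYSTEM=1   # sandbox: ignore personal/system Git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q --bare "$HOME/shared.git"
git -C "$HOME/shared.git" symbolic-ref HEAD refs/heads/main
git clone -q "$HOME/shared.git" "$HOME/dev" 2>/dev/null
cd "$HOME/dev"
git symbolic-ref HEAD refs/heads/main
echo base > base.txt && git add base.txt && git commit -q -m "Base"
git push -q origin main

git checkout -q -b feature
echo feat > feature.txt && git add feature.txt && git commit -q -m "Feature work"
git push -q origin feature                     # feature is now public

git checkout -q main
echo more > main.txt && git add main.txt && git commit -q -m "New work on main"
git push -q origin main

git checkout -q feature
git rebase -q main                             # rewrites the already-pushed commit

# The server's feature commit is no longer an ancestor, so a normal push is refused
if git push -q origin feature 2> /dev/null; then
  echo "unexpected: non-fast-forward push was accepted" >&2
  exit 1
fi
# Overwrite the remote branch only if it still matches our origin/feature
git push -q --force-with-lease origin feature
```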

Git rebase is a powerful tool for managing project history and integrating changes across branches while maintaining clarity and organization in your commit log.

Git rebase and the combination of `git fetch` followed by `git merge` (which is what `git pull` does by default) are related but not equivalent operations. Here's a breakdown of their differences and how they function:

Git Pull vs. Git Rebase

1. Git Pull:

- The `git pull` command is essentially a combination of two commands: git fetch followed by git merge. When you run git pull, Git fetches the latest changes from the remote branch and then merges those changes into your current branch, creating a new merge commit if necessary. This can lead to a more complex commit history with multiple merge commits, especially in collaborative environments where many changes are being made concurrently.

git pull origin main

2. Git Rebase:

- The git rebase command, on the other hand, takes the commits from your current branch and re-applies them on top of another branch (often the updated main branch). This effectively rewrites the commit history, creating a linear progression of commits without additional merge commits. This can make the project history cleaner and easier to follow.

git rebase main

Key Differences

History Structure:

- Merge: Results in a branching history with merge commits that reflect the integration points between branches.

- Rebase: Produces a linear history that appears as if all changes were made sequentially on top of the base branch.

Conflict Resolution:

- During a merge, if conflicts arise, you resolve them and then create a new merge commit.

- During a rebase, conflicts are resolved commit by commit as each one is reapplied; after resolving each conflict, you continue with `git rebase --continue`.

Use Cases:

- Use git pull when you want to quickly integrate remote changes into your local branch, accepting the additional merge commit.

- Use git rebase when you want to keep a clean history, especially in feature branches that have not yet been shared with others.
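The difference in history shape can be seen by pulling the same upstream commit into two clones that each have a local commit. A sketch assuming a Unix-like shell and Git on the PATH (names are hypothetical): `git pull --no-rebase` fetches and merges, while `git pull --rebase` fetches and then rebases the local commit onto the fetched branch.

```shell
set -eu
# Assumes git on PATH; a local bare repo plays the remote, names hypothetical
export HOME="$(mktemp -d)" GIT_CONFIG_NOSYSTEM=1   # sandbox: ignore personal/system Git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q --bare "$HOME/shared.git"
git -C "$HOME/shared.git" symbolic-ref HEAD refs/heads/main
git clone -q "$HOME/shared.git" "$HOME/upstream-dev" 2>/dev/null
cd "$HOME/upstream-dev"
git symbolic-ref HEAD refs/heads/main
echo base > base.txt && git add base.txt && git commit -q -m "Base"
git push -q origin main

# Two clones, each with a local commit the remote does not have yet
for who in merger rebaser; do
  git clone -q "$HOME/shared.git" "$HOME/$who"
  (cd "$HOME/$who" && echo "$who" > "$who.txt" && git add "$who.txt" &&
    git commit -q -m "Local work by $who")
done

# Meanwhile a new commit lands on the remote
echo remote > remote.txt && git add remote.txt && git commit -q -m "Remote work"
git push -q origin main

# fetch + merge: a merge commit ties both lines of work together
(cd "$HOME/merger" && git pull -q --no-rebase --no-edit origin main)
# fetch + rebase: the local commit is replayed on top of the remote work
(cd "$HOME/rebaser" && git pull -q --rebase origin main)

git -C "$HOME/merger" log --oneline --graph
git -C "$HOME/rebaser" log --oneline --graph
```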

In summary, use git fetch when you want to check for updates without altering your current work, git pull for quickly updating your branch with remote changes (but be cautious of conflicts), git merge when you want to combine different branches of work, and git rebase when you want a linear history on branches that have not been shared yet.

Git Fork: A fork is a complete copy of a repository that allows you to make changes independently without affecting the original repository. Forking is a feature of hosting platforms such as GitHub rather than a Git command, and it is often used in collaborative projects.

Purpose: Forking is typically used when you want to contribute to someone else's project. You can create your own version of the repository to experiment with or develop features without needing direct access to the original repository.

Isolation: A fork creates an entirely separate repository, meaning that changes made in your fork do not affect the original repository unless you submit a pull request and the maintainers merge it.
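Forking itself happens on the hosting platform, so this sketch uses a server-side `git clone --bare` as a stand-in for the "Fork" button. It assumes a Unix-like shell and Git on the PATH; `original.git`, `my-fork.git`, and the remote name `upstream` are hypothetical, though `upstream` is a common convention. Your pushes go to the fork, and new work from the original project arrives through the extra remote.

```shell
set -eu
# Assumes git on PATH; local bare repos stand in for hosted ones, names hypothetical
export HOME="$(mktemp -d)" GIT_CONFIG_NOSYSTEM=1   # sandbox: ignore personal/system Git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# The original project, maintained by someone else
git init -q --bare "$HOME/original.git"
git -C "$HOME/original.git" symbolic-ref HEAD refs/heads/main
git clone -q "$HOME/original.git" "$HOME/maintainer" 2>/dev/null
cd "$HOME/maintainer"
git symbolic-ref HEAD refs/heads/main
echo v1 > README && git add README && git commit -q -m "Initial"
git push -q origin main

# Stand-in for the platform's "Fork" button: a server-side copy that you own
git clone -q --bare "$HOME/original.git" "$HOME/my-fork.git"

# Work in a clone of your fork; keep the original reachable as "upstream"
git clone -q "$HOME/my-fork.git" "$HOME/work"
cd "$HOME/work"
git remote add upstream "$HOME/original.git"
echo idea > idea.txt && git add idea.txt && git commit -q -m "Experiment"
git push -q origin main                        # lands in your fork only

# The original project moves on...
(cd "$HOME/maintainer" && echo v2 > README && git commit -q -am "Update" && git push -q origin main)

# ...and you bring your fork up to date with it
git fetch -q upstream
git merge -q --no-edit upstream/main
git push -q origin main
```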


Tuesday, October 29, 2024
