Sunday, February 9, 2025

Linux Memory Management and Layout

In Linux, RAM is divided into kernel space and user space to manage memory and protect the kernel from user applications. Here's a breakdown of how this division works, along with details on high memory, low memory, and crashkernel allocation:

Kernel Space vs. User Space

Kernel Space: This area is reserved for running the kernel, kernel extensions, and most device drivers. User-space applications cannot directly access this memory.

User Space: This is the memory area where application software and some drivers execute, typically with one address space per process.

The separation ensures that processes are protected from each other and that the kernel is protected from user-space applications.

Memory Division

1.  Address Space: On 64-bit architectures such as x86-64 and ppc64, the virtual address space is split into two halves: the bottom half is for user-space allocations, and the top half is for kernel allocations. On 32-bit x86, a common arrangement is the 3G/1G split, where 3 GB is for user space and 1 GB for the kernel.

2.  Memory Allocation: Memory is allocated as needed within the address space. The address space split determines the use of virtual addresses but doesn’t dictate physical memory use. The kernel allocates memory for its own binary and any additional needs, which cannot be swapped out.

3.  Memory Mapping: The kernel manages memory mapping with the help of hardware (MMU - memory management unit). The kernel maintains its own mappings shared by all processes, and each process gets its own user-space mapping.

High Memory vs. Low Memory

Applicability: The "high/low memory" split is primarily relevant to 32-bit architectures with relatively large amounts of physical RAM (more than ~1 GB). On 64-bit architectures, or on systems with little enough physical memory, the entire physical space can be accessed from the kernel virtual address space, and all physical memory is considered "low memory".

Low Memory: In Linux systems, low memory is typically used for the kernel. A portion of the kernel virtual address space can be mapped as a single contiguous chunk into physical "low memory".

High Memory: High memory is physical memory beyond what the kernel can keep permanently mapped in its address space. User-space pages can be placed there freely; when the kernel itself needs to touch such a page, it must first establish a temporary mapping (e.g., with kmap).

In essence, high memory is a region of kernel memory used to map physical memory that cannot be contiguously mapped with the rest of the kernel's virtual memory.
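On a 32-bit kernel, the split is directly visible from userspace (a minimal sketch; on 64-bit kernels the high/low counters are absent or zero because everything is low memory):

```bash
# Show low/high memory statistics (meaningful on 32-bit kernels)
free -l
grep -iE 'high|low' /proc/meminfo   # HighTotal/HighFree, LowTotal/LowFree
```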

Crashkernel Memory Allocation:

Crashkernel is a reserved memory region used to boot a second (capture) kernel in case of a system crash, enabling crash analysis and debugging. By design, part of this reservation may be placed in low memory and the remainder in high memory.
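A hedged sketch of how this looks in practice (the sizes are illustrative, not a recommendation):

```bash
# Kernel command line: reserve crash memory explicitly in high and low RAM
#   crashkernel=512M,high crashkernel=256M,low
# or let the kernel choose a single region:
#   crashkernel=256M

# Verify the reservation on a running system:
grep -i 'crash kernel' /proc/iomem
cat /sys/kernel/kexec_crash_size
```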

GRUB (GRand Unified Bootloader) is a boot loader that loads the Linux kernel, presenting a menu of operating systems or kernels to choose from at system startup. It allows users to select which OS to boot and to pass arguments to the kernel. GRUB's RAM usage occurs in the real mode area (RMA), a part of RAM GRUB uses for its operations and for loading boot components. The current size of the RMA is 768 MB. However, GRUB2 might fail with an out-of-memory error if the kernel image or the initramfs is too large to fit in this area.

The memory layout of a program is divided into segments, each with a specific purpose. These segments include the text segment, data segment (initialized and uninitialized), heap, and stack. In the context of memory allocation, the parameters you've provided describe the boundaries and limits of memory usage for a program. Here's an explanation of each term:



memory_limit: This sets the maximum amount of memory in bytes that a script is allowed to allocate. It helps prevent poorly written scripts from using up all available memory. A value of 0 here (0000000000000000) likely means there is no memory limit. However, in some contexts, it might represent the initial memory limit, which can be changed. In PHP, setting `memory_limit` to -1 means the script can use whatever memory is left after the operating system and other processes take their share (see the quick PHP sketch after this list).

alloc_bottom: This indicates the starting address of the memory region available for allocation. In your example, it's 0000000010350000.

alloc_top: This represents the highest address up to which memory can be allocated. Here, it's 0000000030000000.

alloc_top_hi: This likely marks a higher boundary for memory allocation, used for allocations that are allowed to sit above the RMO region. In this case, it's 0000000100000000.

rmo_top: The top of the Real Mode Offset (RMO) region, i.e., the memory the processor can access while running in real mode on PowerPC. It is 0000000030000000 in your layout.

ram_top: This likely indicates the top address of the available RAM. It's 0000000100000000 in your example.
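As a quick illustration of the PHP case mentioned under memory_limit (a sketch; requires the PHP CLI):

```bash
# Inspect, then override, PHP's memory_limit from the shell
php -r 'echo ini_get("memory_limit"), PHP_EOL;'
php -d memory_limit=-1 -r 'echo ini_get("memory_limit"), PHP_EOL;'   # -1 = unlimited
```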

The stack segment sits near the top of memory at high addresses, while the text, data, and heap segments occupy lower addresses. When a function is called, stack memory is allocated for it, and when a new local variable is declared, more stack memory is allocated, causing the stack to grow downwards. Stack memory allocation and deallocation are done automatically. The heap is where dynamic memory allocation takes place using functions like `malloc()` and `calloc()`. Unlike the stack, heap allocations are not necessarily contiguous, and users can free heap memory, which can cause fragmentation.
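These segments can be observed on a live Linux system through procfs (a minimal sketch; addresses vary between runs because of ASLR):

```bash
# Dump the heap, stack, and executable mappings of the reading process.
# (Reading /proc/self/maps from grep shows grep's own layout.)
grep -E '\[heap\]|\[stack\]|r-xp' /proc/self/maps
```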

Here's the conversion for each of your memory parameters:



memory_limit: 0000000000000000 (hex) = 0 bytes. It means there is no memory limit imposed.

alloc_bottom: 0000000010350000 (hex) = 271,908,864 bytes.

alloc_top: 0000000030000000 (hex) = 805,306,368 bytes.

alloc_top_hi: 0000000100000000 (hex) = 4,294,967,296 bytes.

rmo_top: 0000000030000000 (hex) = 805,306,368 bytes.

ram_top: 0000000100000000 (hex) = 4,294,967,296 bytes.

Now, calculating the sizes of memory ranges:

1) Allocation range (alloc_bottom to alloc_top):
    alloc_top - alloc_bottom = 805,306,368 - 271,908,864 = 533,397,504 bytes
    In MB: 533,397,504 / 1,048,576 ≈ 508.69 MB
2) Upper allocation range (alloc_bottom to alloc_top_hi):
    alloc_top_hi - alloc_bottom = 4,294,967,296 - 271,908,864 = 4,023,058,432 bytes
    In GB: 4,023,058,432 / 1,073,741,824 ≈ 3.75 GB
3) RMO size (alloc_bottom to rmo_top):
    rmo_top - alloc_bottom = 805,306,368 - 271,908,864 = 533,397,504 bytes ≈ 508.69 MB
4) RAM size (alloc_bottom to ram_top):
    ram_top - alloc_bottom = 4,294,967,296 - 271,908,864 = 4,023,058,432 bytes ≈ 3.75 GB

(Treating each "top" value as an exclusive boundary, the size of a range is simply top - bottom.)
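These conversions are easy to double-check with shell arithmetic (a minimal sketch; bash accepts 0x-prefixed hex directly):

```bash
# Recompute the range sizes from the hex boundaries above
alloc_bottom=0x10350000
alloc_top=0x30000000
ram_top=0x100000000

printf 'alloc range: %d bytes (~%d MB)\n' \
  $(( alloc_top - alloc_bottom )) $(( (alloc_top - alloc_bottom) / 1048576 ))
printf 'ram range:   %d bytes (~%d GB)\n' \
  $(( ram_top - alloc_bottom )) $(( (ram_top - alloc_bottom) / 1073741824 ))
```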

-----------------------------------------

The Real Mode Area (RMA) in the context of the Linux kernel and boot process refers to the region of memory where the system initially operates in real mode before transitioning to protected mode. In real mode, the processor can address only 1 MB of memory.

Here's a breakdown of how the RMA fits into the boot process:

1.  Real Mode Operation: When the computer starts, it boots into real mode. In this mode, the processor has a limited addressing capability (1MB).

2. GRUB Loading: GRUB (or any bootloader) operates initially in real mode. It loads the kernel image into memory using BIOS disk I/O services.

3.  Memory Arrangement: In real mode, RAM is organized such that the kernel image is loaded into memory by the boot loader. A small part of the kernel containing real-mode code is loaded below the 640K barrier, while the larger part that runs in protected mode is loaded after the first megabyte.

4.  Transition to Protected Mode: After loading the necessary components, the system switches from real mode to protected mode, which allows access to more memory and advanced features.

5. Memory Addressing: In real mode, memory is accessed through segmentation, using a segment:offset pair; the physical address is computed as segment × 16 + offset, as the sketch below shows.
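The segment:offset arithmetic is easy to verify (a sketch; F000:FFF0 is the well-known x86 reset vector):

```bash
# Real-mode address: physical = (segment << 4) + offset
seg=0xF000; off=0xFFF0
printf '0x%X\n' $(( (seg << 4) + off ))   # prints 0xFFFF0, just below 1 MB
```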


Saturday, January 11, 2025

Key Components in IBM Power Systems Boot Process

Stage 1: OPAL Firmware: Initializes hardware and provides runtime services. Passes control to Petitboot as the default bootloader.

Stage 2: Petitboot: Functions as the primary bootloader. Uses kexec to load the Linux kernel and initramfs directly (see the sketch after the list below).

  • Scans available devices for bootable options.
  • Detects core.elf (GRUB binary) as a bootable ELF file.
  • Loads and executes core.elf.
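Under the hood, the kexec step Petitboot performs is roughly equivalent to the following (a hedged sketch; the paths and kernel arguments are illustrative, and Petitboot drives this from its menu rather than a shell):

```bash
# Stage a kernel and initramfs, then jump into them without a firmware reboot
kexec -l /boot/vmlinuz --initrd=/boot/initramfs.img \
      --append="root=/dev/sda2 ro"
kexec -e   # execute the loaded kernel
```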

Stage 3: GRUB: May be involved as an intermediate bootloader for Linux distributions that rely on GRUB configuration (e.g., RHEL, SLES, or Ubuntu Server). Works as part of the core.elf file, loaded by Petitboot in some scenarios.

  • GRUB reads its configuration file (e.g., /boot/grub/grub.cfg); see the regeneration sketch after this list.
  • Presents a boot menu (if configured) or selects the default kernel.
  • Loads the Linux kernel and initramfs into memory.
  • Passes control to the kernel.
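On GRUB-based distributions, grub.cfg is normally regenerated from templates rather than edited by hand (a sketch; paths and command names differ between distros):

```bash
# Regenerate grub.cfg from /etc/default/grub and /etc/grub.d/ snippets
grub2-mkconfig -o /boot/grub2/grub.cfg   # RHEL/SLES style
# update-grub                            # Debian/Ubuntu equivalent
```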

Stage 4: Linux Kernel: The kernel initializes the system and starts the init process.

----------------------------------------------------------------------------------------------------------------

Advantages:

  1. Compatibility: Supports Linux distributions with GRUB-based boot processes.
  2. Flexibility: Allows advanced boot scenarios (e.g., multiple kernels, chainloading).
  3. Optimization: Petitboot handles hardware initialization efficiently, while GRUB adds cross-distro compatibility.

While Petitboot is the default and primary bootloader on IBM Power Systems, GRUB can be used as part of the boot process, particularly through the core.elf file. Petitboot loads and executes GRUB when required, allowing Linux distributions to leverage GRUB's flexibility and maintain consistent boot processes across architectures. This combination ensures optimal performance and compatibility for enterprise-grade Linux distributions on Power Systems.

---------------------------- Power Firmware -------------------

In IBM's PowerPC architecture (commonly used in IBM Power Systems), firmware plays a critical role in managing hardware resources, initializing the system, and providing runtime services. Here's how it works and where firmware resides:

Key Firmware Components in IBM Power Systems

Hostboot:
Responsible for the low-level initialization of the system, such as memory controller setup and processor initialization. Resides in non-volatile storage (e.g., flash memory) on the system board.
Runs on the main processor during the very early stages of boot.

OpenPOWER Firmware (OPAL):
Acts as the interface between the hardware and the operating system.
Provides services such as interrupt handling, power management, and hardware abstraction.
Resides in non-volatile memory (flash storage) but is loaded into main system RAM for execution during boot.

Petitboot:
A Linux-based bootloader that uses the kexec mechanism to load the Linux kernel.
Petitboot itself resides on the system's non-volatile storage and is executed in main system RAM during the boot process.

System Management Firmware:
Manages system-level operations such as monitoring, diagnostics, and recovery.
Runs on a dedicated service processor (e.g., BMC) and resides in the BMC's non-volatile storage.

Where Firmware Resides

Non-Volatile Storage (Flash Memory):
Core firmware components like Hostboot, OPAL, and Petitboot are stored in the system's flash memory on the motherboard or a separate chip. This ensures persistence even when the system is powered off.

System RAM:
During the boot process, firmware components such as Hostboot, OPAL, and Petitboot are copied from flash memory into main system RAM for execution.
The Linux kernel uses OPAL calls to interact with hardware, and these services are available as long as the system runs.

Service Processor (BMC):
The BMC firmware resides in its own dedicated non-volatile memory on the service processor.
The BMC operates independently of the main system and manages power-on, firmware updates, and error reporting.

Interaction with Linux OS on Power Systems

Firmware-to-OS Handoff:
OPAL firmware initializes hardware and performs diagnostics before handing control to the Linux kernel. Petitboot (running on top of OPAL) loads the Linux kernel via kexec.

Runtime Services:
OPAL continues to provide runtime services to the Linux kernel, such as hardware interrupts, error handling, and power state management. Linux interacts with firmware using the OPAL API and device tree structures.
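On a running PowerNV system, OPAL's presence and services can be inspected through the device tree exported via procfs (a sketch; node contents vary with firmware level):

```bash
# Inspect the OPAL node of the flattened device tree
ls /proc/device-tree/ibm,opal/
cat /proc/device-tree/ibm,opal/compatible; echo
```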

Firmware Location on Filesystem:
Firmware blobs for devices (e.g., network cards, GPUs) are stored in /lib/firmware. Core system firmware (e.g., OPAL, Hostboot) does not reside in the Linux filesystem but in the system's non-volatile memory.

Summary for Power Systems

Primary Firmware (Hostboot, OPAL, Petitboot): Resides in non-volatile storage (flash memory) on the system board. Executed in system RAM during boot and runtime.

Service Processor Firmware (BMC): Resides in dedicated non-volatile storage on the service processor. Operates independently of the main CPU and Linux OS.

Device Firmware: Resides in /lib/firmware on the Linux filesystem and is loaded into specific hardware devices by their drivers.
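Both kinds of firmware are observable from Linux (a sketch):

```bash
# Device firmware blobs shipped with the distro, and loader activity
ls /lib/firmware | head
dmesg | grep -i firmware | head
```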

This modular design ensures separation between core firmware, runtime services, and device-specific firmware, enabling robust and scalable operations on IBM Power Systems.

------------------------------------------HMC-----------------------------------
The Hardware Management Console (HMC) is a critical component in managing IBM server systems, including IBM Power Systems and IBM Z mainframes. It serves as a physical or virtual appliance that provides a unified interface for system administrators to control and monitor multiple servers and their partitions.


Key Features of HMC

- Management Interface: The HMC offers both command-line (SSH) and web-based interfaces, allowing for flexible access and management of the systems it oversees. This includes functionalities for monitoring system health, configuring hardware, and managing logical partitions (a short CLI sketch follows this list).

- Multi-System Management: One of the significant advantages of the HMC is its ability to manage multiple servers simultaneously. This capability is essential for organizations with complex IT infrastructures, as it simplifies administration tasks and enhances operational efficiency.

- Virtualization Support: The HMC plays a crucial role in virtualization by enabling the creation and management of logical partitions (LPARs). This allows for better resource utilization and flexibility in deploying applications across different environments.

- Monitoring and Diagnostics: Administrators can quickly identify hardware issues through the HMC's monitoring tools. It provides real-time status updates and alerts, facilitating proactive maintenance and reducing downtime.

- Redundancy and Reliability: The HMC can be configured in redundant setups to ensure high availability. Dual HMCs can manage the same systems, providing backup capabilities in case one console fails.

- Security Features: The HMC is designed with security in mind, featuring a closed platform that restricts unauthorized software installations and limits access to essential functions. It is firewalled by default, with minimal open ports to enhance security against external threats.
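A few representative HMC CLI commands over SSH (a sketch; managed-system names are site-specific):

```bash
# List managed systems, then the partitions on one of them
lssyscfg -r sys -F name,state
lssyscfg -r lpar -m <managed-system> -F name,lpar_id,state
```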

Connection Between mkvterm and Virtual Serial Adapters

The mkvterm command on IBM's Hardware Management Console (HMC) is used to open a virtual terminal connection to a logical partition (LPAR). This command is closely associated with virtual serial adapters, which facilitate the connection between the HMC and the LPAR.

1. Virtual Serial Adapters:
   - Each LPAR typically has two virtual serial server adapters, allowing for console connections. These adapters are configured within the Virtual I/O Server (VIOS) environment.
   - The mkvterm command allows users to connect to these virtual serial adapters, effectively opening a console session for managing the LPAR.

2. Usage of mkvterm:
   - The mkvterm command on the HMC corresponds directly to commands used in the VIOS for managing virtual terminal sessions. When you execute mkvterm, you specify the LPAR ID to establish a connection through an available virtual serial adapter.
   - If the HMC is unavailable, users can still access the LPAR console via VIOS using a similar command (mkvterm), which underscores the flexibility of managing LPARs through different interfaces.

3. Session Management:
   - It is crucial to manage these sessions properly. For instance, if a console session is active through VIOS, attempting to start another session via the HMC will result in an error indicating that a terminal session is already open. This emphasizes the need for careful session handling to avoid conflicts; a stale session can be closed with rmvterm, as sketched below.
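A stale or conflicting console session can be closed from the HMC before reopening it (a sketch):

```bash
# Force-close any open virtual terminal for the partition, then reopen it
rmvterm -m <managed-system> --id <LPAR-ID>
mkvterm -m <managed-system> --id <LPAR-ID>
```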

4. Command Examples:
   - To create a virtual serial client adapter for a VIOS partition (run from the HMC), one might use commands like:
     ```bash
     chhwres -m ms02 -r virtualio --rsubtype serial -o a -p ms02-vio1 -s 45 -a adapter_type=client,remote_lpar_name=Machine02,remote_slot_num=0,supports_hmc=0
     ```
   - To start a console session for an LPAR using mkvterm:
     ```bash
     mkvterm -m <managed-system> --id <LPAR-ID>
     ```

The mkvterm command serves as a bridge between the HMC and virtual serial adapters, allowing administrators to manage LPARs efficiently through console connections. Proper configuration and management of these connections ensure seamless operations within IBM's Power Systems environment.