Part IV: Memory Management and Object Layout

Part IV of this guide delves into the intricate details of how Python manages memory and structures its objects in memory. Understanding these concepts is crucial for writing efficient Python code, optimizing performance, and debugging memory-related issues. This part covers the low-level memory layout of Python objects, the garbage collection mechanism, and how Python’s memory allocator works.

Table of Contents

10. Deep Dive into Object Memory Layout

11. Runtime Memory Management & Garbage Collection

12. Memory Allocator Internals & GC Tuning


10. Deep Dive into Object Memory Layout

Understanding how Python objects are structured in memory is perhaps one of the most fundamental insights for truly comprehending Python’s runtime behavior, performance characteristics, and memory footprint. Every piece of data in Python, from a simple integer to a complex custom class instance, adheres to a specific low-level memory layout. This chapter dissects these layouts in detail, revealing the underlying C structures that power Python’s object model and memory management.

10.1. The Great Unification: PyObject Layout

At the bedrock of CPython’s object system is the PyObject C structure. Every single Python object, regardless of its type, begins with this standard header. This uniformity is what allows the CPython interpreter to treat all data consistently: to count references, determine types, and perform basic operations generically. The PyObject header typically contains two crucial fields: ob_refcnt, a signed integer that tracks the number of strong references to the object, and ob_type, a pointer to the type object describing the object’s type, methods, and attributes.

While PyObject provides the essential object header, most complex Python objects (those that can participate in reference cycles, such as lists, dictionaries, custom class instances, and other mutable containers) are also tracked by the garbage collector. For these objects, an additional header, PyGC_Head, is prepended before the PyObject_HEAD in memory. This PyGC_Head contains two pointers, _gc_next and _gc_prev, which link the object into a doubly-linked list used by the generational garbage collector to manage and traverse objects.

Let’s visualize this common prefix assuming Py_ssize_t and pointers (ptr) are both 8 bytes on a 64-bit system:

Mental Diagram: PyGC_Head and PyObject_HEAD

Assuming ssize_t = 8 bytes, ptr = 8 bytes

+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+\
|                 *_gc_next                     | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ | PyGC_Head (16 bytes)
|                 *_gc_prev                     | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ /
|                 ob_refcnt                     | \
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ | PyObject_HEAD (16 bytes)
|                 *ob_type                      | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ /

Following this universal header, the specific data unique to that object type is stored.
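
From Python you can check which objects the collector tracks, i.e., which ones carry the extra PyGC_Head, via gc.is_tracked(). A small sketch (exact tracking rules vary slightly across CPython versions):

import gc

class Node:
    pass

print(gc.is_tracked(42))       # False: an int holds no references, so it cannot form a cycle
print(gc.is_tracked("hello"))  # False: same reasoning for strings
print(gc.is_tracked([1, 2]))   # True: a list can reference arbitrary objects
print(gc.is_tracked(Node()))   # True: class instances are tracked as well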

10.2. Memory Layout of User Defined Classes

When you define a standard Python class without explicitly using __slots__, instances of that class are highly dynamic. Their attributes are stored in a dynamically allocated dictionary, accessible via a special instance attribute called __dict__. This dictionary provides immense flexibility, allowing you to add, remove, or modify attributes on an instance at any time, even after it’s been created. Consider the following class:

class A:
    def __init__(self):
        self.x = 0
        self.y = 1
        self.z = 2

The memory layout for an instance of this class starts with the standard PyGC_Head and PyObject_HEAD (total 32 bytes on 64-bit). Immediately following these headers, the instance holds pointers to two additional internal objects: a pointer to its __dict__ (the dictionary storing its attributes like x, y, z) and potentially a pointer to __weakref__ (if the object supports weak references). These are themselves Python objects and thus incur their own memory overhead.

Mental Diagram: Class (without __slots__)

             +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ \
             |                    *_gc_next                  | |
             +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ | PyGC_Head (16 bytes)
             |                    *_gc_prev                  | |
C object --> +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ /
instance     |                    ob_refcnt                  | \
pointer      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ | PyObject_HEAD (16 bytes)
             |                    *ob_type                   | |
             +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ /
             |                    *__dict__                  | --> 8 bytes (ptr to dict object holding x, y, z)
             +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
             |                    *__weakref__               | --> 8 bytes (ptr to weakref object)
             +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
             |                      ...                      | (additional instance-specific C data, rare)

The __dict__ itself is a PyDictObject which has its own memory footprint. Each attribute like self.x stores a pointer to the actual integer object within this __dict__. This flexibility comes at a memory cost: every instance carries a __dict__ pointer, and the dictionary itself consumes memory, even if the class doesn’t have any instance attributes. This overhead becomes significant when creating a large number of instances.

import sys

def recursive_size(obj):
    size = sys.getsizeof(obj)
    print(f"Size of {obj.__class__.__name__}: {size} bytes")
    if hasattr(obj, "__dict__"):
        print("Size of __dict__:", sys.getsizeof(obj.__dict__))
        size += sys.getsizeof(obj.__dict__) + sum(recursive_size(v) for v in obj.__dict__.values())
    if hasattr(obj, "__slots__"):
        size += sum(recursive_size(getattr(obj, slot)) for slot in obj.__slots__)
    return size

print("A basic size:", A.__basicsize__)   # Output: 16 (PyObject_HEAD)
# sys.getsizeof: PyGC_Head + PyObject_HEAD + __dict__ + __weakref__ pointers
print("A instance size including all pointers:", sys.getsizeof(A()))     # Outout: 48
print("A instance size including all attributes:", recursive_size(A()))  # Output: 428

# Output:
# A basic size: 16
# A instance size including all pointers: 48
# Size of A: 48 bytes
# Size of __dict__: 296
# Size of int: 28 bytes
# Size of int: 28 bytes
# Size of int: 28 bytes
# A instance size including all attributes: 428

10.3. Memory Layout of Slotted Classes

The __slots__ mechanism provides a way to tell Python not to create a __dict__ for each instance of a class. Instead, it reserves a fixed amount of space directly within the object’s C structure for the specified attributes. This significantly reduces the memory footprint for instances, especially when you have many of them, and also speeds up attribute access since there’s no dictionary lookup involved.

class B:
    __slots__ = "x", "y", "z"
    def __init__(self):
        self.x = 0
        self.y = 1
        self.z = 2

When __slots__ is defined, the named attributes ("x", "y", "z") are essentially mapped to offsets within the instance’s memory block, right after the standard object headers. This is much like how C structs work. If __slots__ is an empty tuple (__slots__ = ()), the instance will be as small as possible, containing only the basic PyGC_Head and PyObject_HEAD. If specific slots are defined, pointers to the values for those slots are directly embedded in the instance memory.
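
Because the attribute set is fixed when the class is created, assigning any name not listed in __slots__ raises an AttributeError, which makes the mechanism easy to observe (Point is an illustrative class, not from the examples above):

class Point:
    __slots__ = ("x", "y")

p = Point()
p.x = 1          # fine: "x" has a reserved slot
try:
    p.z = 3      # no slot reserved for "z", and there is no __dict__ to fall back on
except AttributeError as e:
    print(e)     # 'Point' object has no attribute 'z'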

Mental Diagram: B object instance layout (with __slots__)

             +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ \
             |                    *_gc_next                  | |
             +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ | PyGC_Head (16 bytes)
             |                    *_gc_prev                  | |
B object --> +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ /               \
instance     |                    ob_refcnt                  | \               |
pointer      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ | PyObject_HEAD |
             |                    *ob_type                   | |               |
             +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ /               |
             |                       *x                      | \               | basic size
             +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ |               |
             |                       *y                      | | extra slots   |
             +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ |               |
             |                       *z                      | |               |
             +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ /               /
print("B basic size:", B.__basicsize__) # PyObject_Head + slot pointers
print("B instance size including all pointers:", sys.getsizeof(B()))    # include PyGC_Head
print("B instance size including all attributes:", recursive_size(B())) # include slot values

# Output:
# B basic size: 40
# B instance size including all pointers: 56
# Size of B: 56 bytes
# Size of int: 28 bytes
# Size of int: 28 bytes
# Size of int: 28 bytes
# B instance size including all attributes: 140

10.4. Memory Layout of Core Built-ins

Beyond user-defined classes, Python’s core built-in types also adhere to specific memory layouts, often optimized for their particular behavior and common use cases. Most variable-sized built-in types utilize a PyObject_VAR_HEAD, which is an extension of the PyObject_HEAD that includes an ob_size field. This ob_size field stores the number of elements or items within the variable-sized part of the object. On a 64-bit system, the PyObject_VAR_HEAD typically is 24 bytes (16 bytes for PyObject_HEAD + 8 bytes for ob_size).

Integer int

Python integers are (almost) arbitrary-precision, meaning they can represent numbers of any size, limited only by available memory. This is achieved by storing the integer’s value as a sequence of base-$2^{30}$ digits. Each digit is stored in an unsigned 32-bit C integer, i.e., 4 bytes. (Older or 32-bit builds use base-$2^{15}$ digits stored in unsigned shorts.) Python also needs to remember the number of digits in the integer, which is stored in the ob_size field of the PyObject_VAR_HEAD. The sign of the integer is encoded in the same field: a negative ob_size indicates a negative integer, a positive ob_size a positive one, and zero is represented by ob_size = 0.

This means that the limit on a Python integer is, in practice, bounded only by available memory. The maximum value of a digit is $2^{30}-1$ and the maximum number of digits is $2^{63}-1$ (the largest Py_ssize_t), so the largest representable Python integer is

\[(2^{30})^{2^{63}-1} - 1 \approx 2^{30 \cdot 2^{63}} \approx 2^{2^{68}} \approx 10^{88848372616373700000}\]

This number has about $10^{20}$ digits in base 10, and we would need roughly 37 million terabytes of memory just to store its $2^{63}-1$ four-byte digits.

Mental Diagram: int object layout

+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ \
|                 ob_refcnt                     | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ |
|                 *ob_type                      | | PyObject_VAR_HEAD (24 bytes)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ |
|                 ob_size                       | | (8 bytes: number of 'digits', sign included)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ /
|               digit[0] (value)                | (4 bytes for each digit)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|               digit[1] (value)                |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                     ...                       |

Note that even int(0) is allocated with one digit, so the smallest possible Python integer takes up 28 bytes.

import sys
print("Int basic size:", int.__basicsize__)              # PyObject_VAR_HEAD (24 bytes)
print("Int item size:", int.__itemsize__)                # Size of each digit (4 bytes)
print("Int (0) size:", sys.getsizeof(0))                 # Even zero has 1 digit
print("Int (2^30-1) size:", sys.getsizeof(2**30 - 1))    # One digit, within 30 bits
print("Int (2^30) size:", sys.getsizeof(2**30))          # Two digits, exceeds 30 bits
print("Int (2^60-1) size:", sys.getsizeof(2**60 - 1))    # Two digits, still within 60 bits
print("Int (512-bit) size:", sys.getsizeof(2**511 - 1))  # Multiple digits

# Output:
# Int basic size: 24
# Int item size: 4
# Int (0) size: 28
# Int (2^30-1) size: 28
# Int (2^30) size: 32
# Int (2^60-1) size: 32
# Int (512-bit) size: 96

You can inspect the C implementation of int here.

Boolean bool

Python booleans are a subclass of integers, with True represented as 1 and False as 0, meaning they have the memory footprint of a one-digit integer.

import sys
print("Bool basic size:", bool.__basicsize__)  # Output: 24 (PyObject_VAR_HEAD)
print("Bool full size:", sys.getsizeof(True))  # Output: 28 (24 + 4 for the single digit)

However, only two instances of bool are ever created in a Python session (True and False); Python stores them as singletons. So when you create a list of 10 booleans, it does not take up $10 \cdot 28 = 280$ bytes for the values, but only $10 \cdot 8 = 80$ bytes for the pointers to the two singleton instances (plus the list object itself).

x = bool(1)    # bool(0) = False, bool(>0) = True
y = bool(10)
print(f"True is Bool(1) is Bool(10): {x is True and x is y}")  # Output: True

Floating Point Number float

Python’s floating-point numbers are implemented using the C double type, which typically occupies 8 bytes (64 bits) of memory. This representation follows the IEEE 754 standard for double-precision floating-point arithmetic, allowing for a wide range of values and precision. The float object in Python includes the standard PyObject_HEAD and a field for the actual floating-point value.
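
A quick check of these numbers (values shown for a typical 64-bit CPython build; floats hold no references to other objects and therefore carry no PyGC_Head):

import sys
print("Float basic size:", float.__basicsize__)  # 24 = PyObject_HEAD (16) + C double (8)
print("Float size:", sys.getsizeof(3.14))        # 24: no PyGC_Head, no variable-size part

# Output:
# Float basic size: 24
# Float size: 24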

You can inspect the C implementation of float here.

String str

Python strings are immutable sequences of Unicode characters. Their memory layout is highly optimized. A string object (PyUnicodeObject in C) includes PyGC_Head, PyObject_HEAD, and then specific fields for strings: length (number of characters), hash (cached hash value), state (flags like ASCII/compact), and finally, the actual Unicode character data. CPython uses an optimized “compact” representation where ASCII strings use 1 byte per character, while more complex Unicode characters might use 1, 2 or 4 bytes per character, based on the string content, to save memory. The actual character data is stored directly adjacent to the object header and it may or may not be terminated by a C null-terminator (\0).

Note that the character data needs to be correctly aligned (characters can be 1, 2, or 4 bytes long), so a padding field ensures the character data starts at a correctly aligned memory address.

Mental Diagram: str object layout

+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ \
|                  ob_refcnt                    | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ | PyObject_HEAD (16 bytes)
|                  *ob_type                     | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ /
|                   length                      | (8 bytes)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    hash                       | (8 bytes)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                   state                       | (8 bytes) - internal details
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                  padding                      | (0 to 24 bytes to correctly align character data)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                char_data[0]                   | (1 byte per char for ASCII)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                char_data[1]                   |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    ...                        |

So the basic size of a string object is 40 bytes + padding.

import sys
print("String (empty) basic size:", str.__basicsize__)  # Output: 64 (includes max madding)
print("String (empty):", sys.getsizeof(""))             # Output: 41 = 40 + \0   (no padding needed)
print("String (1 char):", sys.getsizeof("a"))           # Output: 42 = 40 + 1 + \0
print("String (4 chars):", sys.getsizeof("abcd"))       # Output: 45 = 40 + 4 + \0
print("String (Unicode):", sys.getsizeof("řeřicha"))    # Output: 72

You can inspect the C implementation of str here.

Dynamic List list

Lists are mutable, ordered sequences that store pointers to other Python objects. Python lists are implemented as dynamic arrays, meaning they can grow and shrink as elements are added or removed. The list object (PyListObject in C) starts with a PyObject_VAR_HEAD, whose ob_size field holds the current number of elements in the list. This is followed by a pointer to the actual array of pointers (**ob_item) that holds references to the elements, and by the ob_alloc field, which records the total allocated capacity of that array. The following logical invariants always hold: 0 <= ob_size <= ob_alloc; len(list) == ob_size; and ob_item == NULL implies ob_size == ob_alloc == 0.

Mental Diagram: list object layout

+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ \
|                 ob_refcnt                     | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ |
|                 *ob_type                      | | PyObject_VAR_HEAD (24 bytes)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ |
|                 ob_size                       | | (number of elements currently in list)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ /
|                **ob_item                      | --> pointer to array of pointers to PyObjects (elements)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                 ob_alloc                      | --> allocated capacity for internal array
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

This means the __basicsize__ of a list is 40 bytes (on a 64-bit system). Note that when you actually create a list, Python prepends the PyGC_Head, so the total size of an empty list is 56 bytes (40 + 16 for PyGC_Head).

import sys
print("List basic size:", list.__basicsize__)       # PyObject_HEAD + 2 pointers
print("Empty list size:", sys.getsizeof([]))        # includes GB_HEAD (16 bytes), ob_item=NULL
print("List with 1 item", sys.getsizeof([1]))       # includes the item pointer
print("List with 2 items:", sys.getsizeof([1, 2]))  # includes 2 item pointers

# Output (On a 64-bit system):
# List basic size: 40
# Empty list size: 56
# List with 1 item 64
# List with 2 items: 72

When you append to a list and the current ob_alloc capacity is insufficient, Python allocates a larger array (over-allocating by roughly 12.5% to amortize the reallocation cost and keep append O(1) on average), copies the existing pointers, and updates ob_alloc. This pre-allocation strategy means that sys.getsizeof() for a list reports the size of the list object itself plus the currently allocated space for its pointers, not just the space for its ob_size elements. The exact formula CPython uses to calculate the new capacity for a requested length $newsize$ is:

\[new\_alloc = 4 \cdot \Bigl\lfloor \frac{newsize + \lfloor newsize / 8 \rfloor + 6}{4} \Bigr\rfloor\]

That is, the requested length grows by one eighth of itself plus a small constant, and the result is rounded down to a multiple of 4; the constant guarantees the new allocation is always strictly larger than the requested length.
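
The growth trace below was produced with a small probe along these lines (a sketch: the capacity is inferred from sys.getsizeof by subtracting the empty-list size and dividing by the 8-byte pointer size, which assumes a 64-bit build):

import sys

lst = []
last_cap = 0
print(f"# Initial capacity: {last_cap} (size: {sys.getsizeof(lst)} bytes)")
for i in range(1, 300):
    lst.append(None)
    # Infer capacity: bytes beyond an empty list, divided by the pointer size
    cap = (sys.getsizeof(lst) - sys.getsizeof([])) // 8
    if cap != last_cap:
        factor = f"{cap / last_cap:.3f}" if last_cap else "NaN"
        print(f"# List size changed at {i} elements. New capacity = {cap}, factor = {factor}")
        last_cap = cap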

# Initial capacity: 0 (size: 56 bytes)
# List size changed at 1 elements. New capacity = 4, factor = NaN
# List size changed at 5 elements. New capacity = 8, factor = 2.000
# List size changed at 9 elements. New capacity = 16, factor = 2.000
# List size changed at 17 elements. New capacity = 24, factor = 1.500
# List size changed at 25 elements. New capacity = 32, factor = 1.333
# List size changed at 33 elements. New capacity = 40, factor = 1.250
# List size changed at 41 elements. New capacity = 52, factor = 1.300
# List size changed at 53 elements. New capacity = 64, factor = 1.231
# List size changed at 65 elements. New capacity = 76, factor = 1.188
# List size changed at 77 elements. New capacity = 92, factor = 1.211
# List size changed at 93 elements. New capacity = 108, factor = 1.174
# List size changed at 109 elements. New capacity = 128, factor = 1.185
# List size changed at 129 elements. New capacity = 148, factor = 1.156
# List size changed at 149 elements. New capacity = 172, factor = 1.162
# List size changed at 173 elements. New capacity = 200, factor = 1.163
# List size changed at 201 elements. New capacity = 232, factor = 1.160
# List size changed at 233 elements. New capacity = 268, factor = 1.155
# List size changed at 269 elements. New capacity = 308, factor = 1.149

You can inspect the C implementation of list here.

Tuple tuple

Tuples are immutable, ordered sequences, storing pointers to other Python objects. Unlike lists, tuples are fixed-size once created. A tuple object (PyTupleObject in C) has a PyObject_VAR_HEAD (with ob_size indicating the number of elements), and its elements are stored directly as a contiguous array of PyObject* pointers immediately following the header. Since tuples are immutable, this array is allocated once and its size never changes.

Mental Diagram: tuple object layout

+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ \
|                 ob_refcnt                     | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ |
|                 *ob_type                      | | PyObject_VAR_HEAD (24 bytes)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ |
|                 ob_size                       | / (number of elements in tuple)
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ \
|            items[0] (ptr to PyObject)         | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ |
|            items[1] (ptr to PyObject)         | | --> fixed-size array of pointers to elements
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ |
|                    ...                        | /

The total size of a tuple is its __basicsize__ plus ob_size * __itemsize__, plus the 16-byte PyGC_Head once instantiated.

import sys
print("Empty tuple basic size:", tuple.__basicsize__)  # PyObject_VAR_HEAD (24 bytes)
print("Empty tuple item size:", tuple.__itemsize__)    # Size of each item (ptr to element)
print("Empty tuple:", sys.getsizeof(()))               # includes GC_Head (+16 bytes)
print("Tuple (1,2,3):", sys.getsizeof((1, 2, 3)))      # includes 3 item pointers (3 * 8 bytes)

# Output:
# Empty tuple basic size: 24
# Empty tuple item size: 8
# Empty tuple: 40
# Tuple (1,2,3): 64

You can inspect the C implementation of tuple here.

Hashset set

Sets are unordered collections of unique, hashable elements. Their internal implementation (PySetObject in C) is based on a hash table, represented as an array of setentry structs. Each setentry represents one element, storing a reference to the object together with its cached hash. Hash tables must allocate considerably more slots than the actual number of entries to maintain sparsity, which reduces collisions and keeps average-case operations O(1).

Mental Diagram: set object layout

+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ \
|                  ob_refcnt                    | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ | PyObject_HEAD (16 bytes)
|                  *ob_type                     | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ /
|                    fill                       | --> Number of active and dummy entries
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    used                       | --> Number of active entries
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    mask                       | --> The table contains mask + 1 slots, and that's a power of 2
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                               |
|              setentry *table                  | --> pointer to the internal hash table array
|                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                               |
|           other set-specific fields           | --> in total 152 bytes (on a 64-bit system)
|                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Notably, table points to a fixed-size small table embedded directly in the set object for small sets, or to separately malloc’ed memory for bigger ones. The embedded small table holds 8 setentry slots, which is enough for very small sets.

import sys
print("Empty set basic size:", set.__basicsize__)      # base size
print("Empty set:", sys.getsizeof(set()))              # includes PyGC_Head (16 bytes)
print("Set {1,2,3}:", sys.getsizeof({1, 2, 3}))        # very small set, no resizing needed
print("Set {1,2,3,4,5}:", sys.getsizeof({1,2,3,4,5}))  # larger set

# Output (On a 64-bit system):
# Empty set basic size: 200
# Empty set: 216
# Set {1,2,3}: 216
# Set {1,2,3,4,5}: 472

The set grows proportionally to the number of elements, with the internal hash table growing by a factor of 4 each time it is resized.
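
The trace below can be reproduced with a probe in the same spirit as the list one (a sketch: the slot count is inferred from the bytes sys.getsizeof reports beyond an empty set, divided by the 16-byte setentry size, so it only reflects the externally allocated table):

import sys

s = set()
empty = sys.getsizeof(s)
last_slots = 0
for i in range(5000):
    s.add(i)
    # Infer externally allocated slots: extra bytes / 16-byte setentry
    slots = (sys.getsizeof(s) - empty) // 16
    if slots and slots != last_slots:
        factor = f", factor: {slots / last_slots}" if last_slots else ""
        print(f"# Set size changed at {len(s)} elements: New slots = {slots}{factor}")
        last_slots = slots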

# Set size changed at 5 elements: New slots = 32
# Set size changed at 19 elements: New slots = 128, factor: 4.0
# Set size changed at 77 elements: New slots = 512, factor: 4.0
# Set size changed at 307 elements: New slots = 2048, factor: 4.0
# Set size changed at 1229 elements: New slots = 8192, factor: 4.0
# Set size changed at 4915 elements: New slots = 32768, factor: 4.0

You can inspect the C implementation of set here.

Dictionary dict

Dictionaries are mutable mappings of unique, hashable keys to values. Like sets, they are implemented using hash tables (PyDictObject in C), storing key-value pairs as entries in an internal array. Each entry typically holds the hash of the key, a pointer to the key object, and a pointer to the value object. Dictionaries also employ a strategy of over-allocating space to maintain a low load factor, which helps ensure efficient O(1) average-case lookup, insertion, and deletion times.

Mental Diagram: dict object layout (simplified)

+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ \
|                   ob_refcnt                   | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ | PyObject_HEAD (16 bytes)
|                   *ob_type                    | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ /
|                   ma_used                     | --> Number of items in the dictionary
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                               |
|       Internal dictionary representation      |
|                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

A dictionary has a smaller base size than a set because it doesn’t contain a preallocated small table.

import sys
print("Empty dict basic size:", dict.__basicsize__)  # base size
print("Empty dict:", sys.getsizeof({}))              # includes PyGC_Head (16 bytes)
print("Dict {1:'a',2:'b',3:'c'}:", sys.getsizeof({1:'a', 2:'b', 3:'c'}))

# Output (on a 64-bit system):
# Empty dict basic size: 48
# Empty dict: 64
# Dict {1:'a',2:'b',3:'c'}: 224

The dictionary grows proportionally to the number of elements, with the internal representation roughly doubling in size at each resize.
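
The same probing technique works for dictionaries (a sketch: the "representation" is taken to be whatever sys.getsizeof reports beyond the 64-byte empty dict):

import sys

d = {}
empty = sys.getsizeof(d)
last_rep = 0
for i in range(6000):
    d[i] = None
    rep = sys.getsizeof(d) - empty  # bytes of the internal representation
    if rep != last_rep:
        factor = f", factor: {rep / last_rep:.2f}" if last_rep else ""
        print(f"# Dict size changed at {len(d)} elements: New representation = {rep}{factor}")
        last_rep = rep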

# Dict size changed at 1 elements: New representation = 160
# Dict size changed at 6 elements: New representation = 288, factor: 1.80
# Dict size changed at 11 elements: New representation = 568, factor: 1.97
# Dict size changed at 22 elements: New representation = 1104, factor: 1.94
# Dict size changed at 43 elements: New representation = 2200, factor: 1.99
# Dict size changed at 86 elements: New representation = 4624, factor: 2.10
# Dict size changed at 171 elements: New representation = 9240, factor: 2.00
# Dict size changed at 342 elements: New representation = 18448, factor: 2.00
# Dict size changed at 683 elements: New representation = 36888, factor: 2.00
# Dict size changed at 1366 elements: New representation = 73744, factor: 2.00
# Dict size changed at 2731 elements: New representation = 147480, factor: 2.00
# Dict size changed at 5462 elements: New representation = 294928, factor: 2.00

You can inspect the C implementation of dict here.

Key Takeaways

Every Python object begins with a PyObject header (ob_refcnt and ob_type); objects that can participate in reference cycles additionally carry a 16-byte PyGC_Head. Plain class instances pay for a dynamically allocated __dict__, while __slots__ embeds attribute pointers directly in the instance, saving memory and speeding up attribute access. Variable-sized built-ins use PyObject_VAR_HEAD, and the container types (list, set, dict) over-allocate as they grow, so sys.getsizeof reports allocated capacity rather than just occupied space.

11. Runtime Memory Management & Garbage Collection

Python’s reputation for ease of use often belies the sophisticated machinery humming beneath its surface, especially concerning memory management. Unlike languages where developers explicitly manage memory (e.g., C/C++), Python largely automates this crucial task. However, a deep understanding of its internal mechanisms, particularly CPython’s approach, is vital for writing performant, predictable, and memory-efficient applications, and for diagnosing subtle memory-related bugs. This chapter delves into the core principles and components that govern how Python allocates, tracks, and reclaims memory during program execution.

11.1. PyObject Layout (Revision)

At the very heart of CPython’s memory model lies a fundamental principle: everything is an object. Numbers, strings, lists, functions, classes, modules, and even None and True/False – all are represented internally as PyObject structs in C. This uniformity is a cornerstone of Python’s flexibility and dynamism, allowing the interpreter to handle disparate data types in a consistent manner through a generic object interface. This “object-all-the-way-down” philosophy simplifies the interpreter’s design, as it doesn’t need to special-case different data types for fundamental operations like memory management or type checking.

Every PyObject in CPython begins with a standard header, which provides essential metadata that the interpreter uses to manage the object. This header is typically composed of at least two fields: ob_refcnt (object reference count) and ob_type (pointer to type object). The ob_refcnt is a C integer that tracks the number of strong references pointing to this object, forming the basis of Python’s primary memory reclamation strategy, reference counting. The ob_type is a pointer to the object’s type object, which is itself a PyObject that describes the object’s type, methods, and attributes. This pointer allows the interpreter to perform runtime type checking and dispatch method calls correctly.

For objects whose size can vary, such as lists, tuples, or strings, CPython uses a slightly different but related structure called PyVarObject. This struct extends the basic PyObject header with an additional ob_size field, which stores the number of elements or bytes the variable-sized object contains. Imagine a layered structure: a PyObject provides the universal base, PyVarObject adds variability for collections, and then specific object implementations (like PyListObject or PyDictObject) append their unique data fields. This consistent header allows the interpreter’s core C routines to manipulate objects generically, casting them to a PyObject* pointer to access their reference count or type, regardless of the specific Python type they represent.

It’s crucial to understand that while PyObject_HEAD is fundamental to all Python objects, some objects additionally incorporate a PyGC_Head during their runtime existence. This PyGC_Head, typically prepended before the PyObject_HEAD in memory, contains two pointers: _gc_next and _gc_prev. These pointers are used by Python’s generational garbage collector to link objects into a doubly-linked list, enabling efficient traversal and management of objects that can participate in reference cycles. Therefore, a Python object in memory can be thought of as consisting of an optional PyGC_Head, followed by its fixed PyObject_HEAD, and then any object-specific data.

Mental Diagram: PyObject Layout (General Form)

+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+\
|                 *_gc_next                     | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ | PyGC_Head (16 bytes), for GC-tracked objects
|                 *_gc_prev                     | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ /
|                 ob_refcnt                     | \
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ | PyObject_HEAD (16 bytes), all objects have this
|                 *ob_type                      | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ /
|                                               |
|               Object-Specific Data            | --> optional, varies by type
|                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

This uniform PyObject interface means that any piece of Python code, or any C extension interacting with Python objects, can safely retrieve an object’s reference count or its type, enabling a consistent and efficient underlying object model.

11.2. Reference Counting & The Garbage Collector

CPython employs a hybrid memory management strategy that combines two primary mechanisms: reference counting for immediate object reclamation and a generational garbage collector for resolving reference cycles. This two-pronged approach aims to balance performance (quick reclamation) with correctness (handling circular references).

Reference Counting is the simplest and most prevalent form of memory management in CPython. Every PyObject maintains a counter, ob_refcnt, which tracks the number of strong references pointing to it. When an object is created, its ob_refcnt is initialized to one. Each time a new reference to the object is created (e.g., variable assignment, passing an object as an argument, storing it in a container), its ob_refcnt is incremented via Py_INCREF (a C macro). Conversely, when a reference is removed (e.g., variable goes out of scope, del statement, container cleared), its ob_refcnt is decremented via Py_DECREF. When ob_refcnt drops to zero, it means no strong references to the object remain. At this point, the object is immediately deallocated, and its memory is returned to the allocator. This deterministic and prompt reclamation makes reference counting very efficient for most common scenarios, as memory is freed as soon as it’s no longer needed, reducing memory footprint and fragmentation.
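
You can watch the counter from Python via sys.getrefcount (note that the call itself temporarily adds one reference to its argument):

import sys

x = ["a", "b"]
print(sys.getrefcount(x))  # 2: the variable x plus the temporary argument reference
y = x                      # new strong reference -> Py_INCREF
print(sys.getrefcount(x))  # 3
del y                      # reference removed -> Py_DECREF
print(sys.getrefcount(x))  # 2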

However, reference counting has a critical limitation: it cannot detect and reclaim reference cycles. A reference cycle occurs when two or more objects refer to each other in a closed loop, even if no external references point to the cycle. For example, if object A refers to object B, and object B refers to object A, both A and B will have an ob_refcnt of at least 1, preventing them from being deallocated, even if they are otherwise unreachable. This would lead to a memory leak. To address this, CPython incorporates a generational garbage collector (GC) that runs periodically.

The generational garbage collector operates on top of reference counting specifically to find and reclaim objects involved in reference cycles. It is based on the generational hypothesis, which posits that most objects are either very short-lived or very long-lived. To optimize collection, objects are divided into three “generations” (0, 1, and 2). Newly created objects start in generation 0. If an object survives a garbage collection cycle, it is promoted to the next generation. The GC runs more frequently on younger generations (e.g., generation 0 is collected most often, generation 2 least often) because it’s statistically more likely to find short-lived objects that are no longer needed there.

The PyGC_Head plays a crucial role in this generational garbage collection process. As mentioned earlier, this header is present in objects that are collectable, meaning they can potentially participate in reference cycles. The PyGC_Head contains two pointers, _gc_next and _gc_prev, which form a doubly-linked list. Each generation (0, 1, 2) maintains its own linked list of objects currently residing in that generation. When the garbage collector runs, it traverses these linked lists to identify unreachable objects, even those whose reference counts are non-zero due to circular references.
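
The generation thresholds and current counters are visible from Python (the defaults shown are typical for CPython 3.12 and earlier; newer versions with the incremental collector may differ):

import gc

print(gc.get_threshold())  # commonly (700, 10, 10); exact defaults vary by version
print(gc.get_count())      # allocations minus deallocations tracked per generation
gc.collect(0)              # manually collect only the youngest generation
print(gc.get_count())      # the generation-0 counter has been reset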

Now, let’s address why some objects, like int and str, do not have a PyGC_Head, while others, such as custom class instances, list, set, and dict, do: only objects that can contain references to other objects can ever participate in a reference cycle, so only those need to be tracked by the cyclic garbage collector. An integer or a string is atomic; it can never refer back to anything, so reference counting alone suffices and the 16-byte PyGC_Head would be pure overhead. Containers and class instances, by contrast, can hold arbitrary references and must therefore be linked into the collector’s generation lists (you can check this at runtime with gc.is_tracked()).

11.3. Object Identity and Object Reuse

In Python, every object has a unique identity, a type, and a value. The built-in id() function returns the “identity” of an object. This identity is an integer that is guaranteed to be unique and constant for that object during its lifetime. In CPython, id(obj) typically returns the memory address of the object in CPython’s memory space. This makes id() a powerful tool for understanding object uniqueness and how Python manages memory.

Understanding object identity is crucial for distinguishing between objects that have the same value but are distinct entities in memory, versus objects that are actually the same instance. For mutable objects like lists, this distinction is clear: a = [1, 2], b = [1, 2] will result in id(a) != id(b), even though a == b (they have the same value). This is because a and b are two separate list objects in memory. For immutable objects, the behavior can be less intuitive due to CPython’s optimization known as “interning.”

Interning is a technique where CPython pre-allocates and reuses certain immutable objects to save memory and improve performance. The most common examples are small integers and certain strings. Small integers (typically from -5 to 256) are interned, meaning that any reference to these numbers will point to the exact same object in memory. This is why id(10) == id(10) holds true, and even id(10) == id(5 + 5). This optimization is possible because integers are immutable; their value never changes, so there’s no risk of one user inadvertently modifying another’s “10.” Similarly, short, common strings and string literals in source code are often interned.

# Small integers are interned
a = 10
b = 5 + int(2.5 * 2)
print(f"id(a): {id(a)}, id(b): {id(b)}, a is b: {a is b}")  # True

# Larger integers are not typically interned
x = 1000
y = 500 + int(2.5 * 200)
print(f"id(x): {id(x)}, id(y): {id(y)}, x is y: {x is y}")  # False (usually)

# Compile-time constant folding can make even large integers share one object
x = 100_001
y = 100_000 + 1
print(f"id(x): {id(x)}, id(y): {id(y)}, x is y: {x is y}")  # True (usually)

# String interning
s1 = "hello_world!!!"
s2 = "hello_world" + 3 * "!"
print(f"id(s1): {id(s1)}, id(s2): {id(s2)}, s1 is s2: {s1 is s2}")  # True

# May or may not be interned depending on CPython version/optimizations
s3 = "hello" + "_" + "world"
s4 = "hello_world"
print(f"id(s3): {id(s3)}, id(s4): {id(s4)}, s3 is s4: {s3 is s4}")  # True or False

# Output (may vary based on CPython version):
# id(a): 140736174359752, id(b): 140736174359752, a is b: True
# id(x): 2381950788208, id(y): 2381953569648, x is y: False
# id(x): 2381953568560, id(y): 2381953568560, x is y: True
# id(s1): 2381953840176, id(s2): 2381953840176, s1 is s2: True
# id(s3): 2381953843056, id(s4): 2381953843056, s3 is s4: True

Understanding id() and interning is vital for debugging and for making correct comparisons. The is operator checks for object identity (id(obj1) == id(obj2)), while the == operator checks for value equality (obj1.__eq__(obj2)). While id() provides insight into CPython’s memory management, it should generally not be used for comparison in application logic, as Python’s interning behavior for non-small integers and non-literal strings is an implementation detail and not guaranteed across different Python versions or implementations. Always use == for value comparison unless you specifically need to check if two variables refer to the exact same object in memory.

11.4. Weak References

The reference counting mechanism, while efficient, imposes a strict ownership model: as long as an object has a strong reference, it cannot be garbage collected. This can become problematic in scenarios where you need to refer to an object without preventing its collection, such as implementing caches, circular data structures (where weak references can help break cycles), or event listeners where the listener should not keep the subject alive. This is where weak references come into play.

A weak reference to an object does not increment its ob_refcnt. This means that if an object is only referenced by weak references, and its strong reference count drops to zero, it becomes eligible for garbage collection. When the object is collected, any weak references pointing to it are automatically set to None or become “dead” (they will return None when dereferenced). This allows you to build data structures where objects can be “observed” or “linked” without creating memory leaks.

Python provides the weakref module to work with weak references. The most common weak reference types are weakref.ref() for individual objects and weakref.proxy() for proxy objects that behave like the original but raise a ReferenceError if the original object is collected. You can also use weakref.WeakKeyDictionary and weakref.WeakValueDictionary, which are specialized dictionaries that hold weak references to their keys or values, respectively. This makes them ideal for caches where you want entries to be automatically removed if the cached object is no longer referenced elsewhere.
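
A minimal cache sketch built on WeakValueDictionary; in CPython, reference counting deallocates the object the moment its last strong reference disappears, so the cache entry vanishes immediately:

import weakref

class Resource:
    def __init__(self, name):
        self.name = name

cache = weakref.WeakValueDictionary()
res = Resource("config")
cache["config"] = res        # the cache does not keep the object alive
print("config" in cache)     # True
del res                      # last strong reference gone -> object collected
print("config" in cache)     # False: the dead entry was removed automatically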

For instances of standard user-defined classes (i.e., those not using __slots__), CPython’s memory layout includes a dedicated, internal slot for __weakref__ management. This is not an entry in the instance’s __dict__, but rather a specific pointer or offset within the underlying C structure of the PyObject itself, typically named tp_weaklistoffset in the PyTypeObject that defines the class. This internal slot serves as the anchor point for all weak references pointing to a particular object instance. When you create a weak reference (e.g., weakref.ref(obj)), the weakref machinery registers this weak reference with the object via this internal slot. This registration allows Python to efficiently iterate through and “clear” (set to None) all associated weak references immediately before the object is deallocated. It ensures that once an object is gone, any weak pointers to it become invalid.

import weakref

class MyObject:
    def __init__(self, name):
        self.name = name
        print(f"MyObject {self.name} created")
    def __del__(self):
        print(f"MyObject {self.name} deleted")
    def __str__(self) -> str:
        return f"MyObject({self.name})"

obj = MyObject("Strong")
weak_obj_ref = weakref.ref(obj)

# You can access the __weakref__ slot, though it's typically an internal detail
# It will be a weakref.ReferenceType object or None if no weakrefs exist
print(f"__weakref__ attribute before del: {type(obj.__weakref__)}")
print(f"Dereferencing weak_obj_ref: {weak_obj_ref()}")
del obj
print(f"Dereferencing weak_obj_ref after del: {weak_obj_ref()}")

# Output:
# MyObject Strong created
# __weakref__ attribute before del: <class 'weakref.ReferenceType'>
# Dereferencing weak_obj_ref: MyObject(Strong)
# MyObject Strong deleted
# Dereferencing weak_obj_ref after del: None

In contrast to standard classes, instances of classes that explicitly define __slots__ do not have the __weakref__ attribute by default. The core purpose of __slots__ is to achieve greater memory efficiency and faster attribute access by creating a fixed, compact memory layout for instances, without the overhead of a dynamic __dict__. When __slots__ are used, Python only allocates memory for the attributes explicitly listed in the __slots__ tuple. Since the __weakref__ slot is an optional feature for object instances, it is not included in this minimalist layout unless you specifically request it. If you define __slots__ in a class and still need its instances to be targetable by weak references, you must explicitly include '__weakref__' as one of the entries in the __slots__ tuple. This tells Python to reserve the necessary space in the instance’s fixed memory layout for managing weak references, thus allowing weakref.ref() to target instances of that slotted class.
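
A short demonstration of both behaviors (the class names here are illustrative):

import weakref

class Compact:
    __slots__ = ("x",)                # no __weakref__ slot reserved

class CompactWeakrefable:
    __slots__ = ("x", "__weakref__")  # explicitly reserve the weakref slot

try:
    weakref.ref(Compact())
except TypeError as e:
    print("Compact:", e)              # cannot create weak reference to 'Compact' object

r = weakref.ref(CompactWeakrefable())  # works now
print(r)  # <weakref at 0x...; dead> (the temporary instance is already gone)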

The weakref module is an indispensable tool for advanced memory management patterns. It allows developers to break undesired strong reference cycles, implement efficient caches (like memoization or object pools) that don’t indefinitely hold onto memory, and design flexible event systems where listeners don’t prevent the objects they’re observing from being collected. By understanding how the __weakref__ attribute ties into the memory layout and the implications for slotted classes, you can write more robust and memory-efficient Python applications, especially those dealing with long-running processes or large numbers of objects where precise memory control is paramount.

11.5. Memory Usage Tracking

While Python’s automatic memory management simplifies development, it can sometimes hide memory issues, such as unexpected object retention or gradual memory growth (memory leaks). Python provides powerful tools in its standard library to inspect and track memory usage, helping developers diagnose and resolve such problems. The gc module provides an interface to the generational garbage collector, and tracemalloc offers detailed tracing of memory blocks allocated by Python.

The gc module allows you to interact with the generational garbage collector. You can manually trigger collections using gc.collect(), which can be useful for debugging or in scenarios where you need to reclaim memory at specific points. More importantly, gc provides debugging flags (gc.DEBUG_STATS, gc.DEBUG_COLLECTABLE, gc.DEBUG_UNCOLLECTABLE) that can be set to get detailed output during GC runs. gc.DEBUG_STATS will print statistics about collected objects, while gc.DEBUG_COLLECTABLE will print information about objects found to be collectable, and gc.DEBUG_UNCOLLECTABLE (the most critical for debugging leaks) will show objects that were found to be part of cycles but could not be reclaimed (e.g., due to __del__ methods in cycles).

import gc

# Enable GC debugging stats
gc.set_debug(gc.DEBUG_STATS | gc.DEBUG_UNCOLLECTABLE)

class MyLeakObject:
    def __init__(self, name):
        self.name = name
        self.ref = None # Will create a cycle later

    def __del__(self):
        print(f"__del__ called for {self.name}")

# Create a reference cycle
a = MyLeakObject("A")
b = MyLeakObject("B")
a.ref = b
b.ref = a

# Delete external references; objects are now only part of a cycle
del a
del b

print("Attempting to collect garbage...")
gc.collect()
print("Collection finished.")

# Output:
# Attempting to collect garbage...
# gc: collecting generation 2...
# gc: objects in each generation: 357 5086 0
# gc: objects in permanent generation: 0
# __del__ called for A
# __del__ called for B
# gc: done, 2 unreachable, 0 uncollectable, 0.0010s elapsed
# Collection finished.

For more granular memory profiling, the tracemalloc module is invaluable. Introduced in Python 3.4, tracemalloc tracks memory allocations made by Python. It allows you to take snapshots of all currently allocated blocks, compare two snapshots to see what grew or shrank between them, group statistics by file and line (or by full traceback), and inspect the traceback showing exactly where a given block was allocated.

import tracemalloc

tracemalloc.start()

# Simulate memory allocation in a loop
snapshots = []
data = []
for i in range(4):
    data += [str(j) * (100 + i * 50) for j in range(1000 + i * 500)]       # line 9
    more_data = [bytearray(1024 + i * 512) for _ in range(500 + i * 200)]  # line 10
    snapshots.append(tracemalloc.take_snapshot())
    # Deallocate some memory
    if i % 2 == 1:
        data.clear()
        del more_data

# Compare snapshots to see memory changes over iterations
for idx in range(1, len(snapshots)):
    print(f"\n--- Snapshot {idx} vs {idx-1} ---")
    stats = snapshots[idx].compare_to(snapshots[idx - 1], "lineno")
    for stat in stats[:2]:
        print(stat)

tracemalloc.stop()

# Output:
# --- Snapshot 1 vs 0 ---
# module.py:9:  size=1118 KiB (+788 KiB),  count=2501 (+1500), average=458 B
# module.py:10: size=1095 KiB (+563 KiB),  count=1401 (+400),  average=800 B

# --- Snapshot 2 vs 1 ---
# module.py:10: size=1858 KiB (+763 KiB),  count=1802 (+401),  average=1056 B
# module.py:9:  size=1441 KiB (+323 KiB),  count=2001 (-500),  average=738 B

# --- Snapshot 3 vs 2 ---
# module.py:9:  size=3731 KiB (+2290 KiB), count=4501 (+2500), average=849 B
# module.py:10: size=2820 KiB (+962 KiB),  count=2202 (+400),  average=1311 B

By combining the gc module for understanding garbage collection behavior and tracemalloc for detailed allocation tracing, developers can gain profound insights into their application’s memory footprint, detect unwanted object retention, and efficiently debug memory-related performance bottlenecks or leaks.

11.6. Stack Frames & Exceptions

Exception handling is a critical aspect of robust software, and Python’s mechanism for this is closely tied to its runtime execution model, particularly the concept of stack frames. When a function is called in Python, a new frame object (a PyFrameObject in C) is created on the call stack. This frame object encapsulates the execution context of that function call.

A stack frame contains all the necessary information for a function to execute and resume correctly: f_code, a reference to the compiled code object being executed; f_locals and f_globals, the local and global namespaces; f_lasti, the index of the last executed bytecode instruction; f_back, a pointer to the caller’s frame; and the evaluation stack for intermediate values.
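
You can walk this chain of frames from Python itself; a minimal sketch using the CPython-specific sys._getframe():

import sys

def inner():
    frame = sys._getframe()  # the currently executing frame
    while frame is not None:
        print(f"{frame.f_code.co_name} at line {frame.f_lineno}")
        frame = frame.f_back  # follow the chain toward the caller

def outer():
    inner()

outer()

# Output (line numbers depend on where the code lives in the file):
# inner at line ...
# outer at line ...
# <module> at line ...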

When an exception occurs within a function, Python looks for an appropriate except block in the current frame. If none is found, the exception propagates up the call stack. This process is called stack unwinding. The interpreter uses the f_back pointer of the current frame to move to the caller’s frame, and the search for an except block continues there. This continues until an except block is found to handle the exception, or until the top of the call stack (the initial entry point of the program) is reached, at which point the unhandled exception causes the program to terminate and prints a traceback.

def third_function():
    print("Inside third_function - dividing by zero")
    # This will raise a ZeroDivisionError
    result = 1 / 0
    print("This line will not be reached.")

def second_function():
    print("Inside second_function")
    third_function()
    print("This line in second_function will not be reached.")

def first_function():
    print("Inside first_function")
    try:
        second_function()
    except ZeroDivisionError as e:
        print(f"Caught an error in first_function: {e}")
    print("first_function finished.")

first_function()

# Output:
# Inside first_function
# Inside second_function
# Inside third_function - dividing by zero
# Caught an error in first_function: division by zero
# first_function finished.

In the example above, when 1 / 0 occurs in third_function, an exception is raised. third_function doesn’t handle it, so the stack unwinds to second_function. second_function also doesn’t handle it, so it unwinds further to first_function. first_function has a try...except block for ZeroDivisionError, so it catches the exception, prints the message, and then continues execution from that point. The traceback you see when an unhandled exception occurs is essentially a representation of these stack frames, showing the function calls (and their locations) from where the exception originated, all the way up to where it was (or wasn’t) handled.

Understanding stack frames is essential for effective debugging and for optimizing recursive functions. Each frame object consumes memory, and excessively deep recursion can lead to a RecursionError (due to exceeding the interpreter’s recursion limit, which prevents stack overflow from unbounded recursion) before a true system stack overflow. This knowledge allows developers to reason about program flow, debug exceptions more effectively, and understand the memory overhead associated with function calls.

Advanced Exception Handling: try, except, else, and finally

While the basic try and except blocks are fundamental for catching and handling errors, Python’s exception handling construct offers more nuanced control flow through the else and finally clauses. Mastering these allows for cleaner, more robust, and semantically correct error management within your applications, moving beyond simple error suppression to proper resource management and distinct success/failure pathways.

The full syntax of an exception handling block is try...except...else...finally. Each part plays a distinct role: the try block wraps the code that might raise; except blocks catch and handle specific exception types; the else block runs only if the try block completed without raising, which keeps success-path code outside the protected region; and the finally block runs unconditionally, whether an exception was raised, handled, or left unhandled, making it the natural place for cleanup.
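
A compact sketch showing all four clauses and their execution order:

def parse(text):
    try:
        value = int(text)          # may raise ValueError
    except ValueError:
        print(f"not a number: {text!r}")
        return None
    else:
        print(f"parsed: {value}")  # runs only when try succeeded
        return value
    finally:
        print("attempt finished")  # always runs, even on return

parse("42")   # parsed: 42 / attempt finished
parse("abc")  # not a number: 'abc' / attempt finished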

A key best practice is to keep except blocks specific and minimal, handling only the direct error conditions you anticipate and know how to recover from. Avoid broad except Exception: unless absolutely necessary, as it can hide unexpected bugs. When using finally, focus solely on resource deallocation or state reset. Avoid complex logic within finally to prevent new exceptions that could obscure the original error. Properly structured try-except-else-finally blocks are a hallmark of robust Python code, ensuring both error resilience and proper resource management.

Key Takeaways

CPython reclaims most objects immediately through reference counting; the generational garbage collector exists specifically to find and free reference cycles that refcounting alone cannot. Object identity (is, id()) is distinct from value equality (==), and interning of small integers and identifier-like strings is an implementation detail you should not rely on. Weak references let caches and observers refer to objects without keeping them alive, and the gc and tracemalloc modules provide the visibility needed to diagnose retention problems and leaks.

12. Memory Allocator Internals & GC Tuning

Having explored the fundamental PyObject structure, reference counting, and the generational garbage collector, we now descend another layer into CPython’s memory management: the underlying memory allocators and advanced garbage collector tuning. Understanding how CPython requests and manages memory from the operating system, and how it optimizes for common object types, is crucial for truly mastering performance and memory efficiency in long-running or memory-intensive Python applications. This chapter will reveal these intricate mechanisms and provide the tools to inspect and fine-tune them.

12.1. Memory Allocation: obmalloc and Arenas

CPython doesn’t directly call malloc and free for every single object allocation and deallocation. Doing so would incur significant overhead due to frequent system calls and general-purpose allocator complexities. Instead, CPython implements its own specialized memory management layer on top of the system’s malloc (or VirtualAlloc on Windows), primarily for Python objects. This layer is often referred to as obmalloc (object malloc). The obmalloc allocator is designed to be highly efficient for the small, numerous, and frequently created/destroyed objects that characterize typical Python programs.

The core strategy of obmalloc revolves around a hierarchical structure of arenas and pools. At the highest level, CPython requests large chunks of memory from the operating system. These large chunks, typically 256KB on 64-bit systems, are called arenas. An arena is essentially a large, contiguous block of memory designated for Python object allocation. CPython pre-allocates a few arenas, and more are requested from the OS as needed. This reduces the number of direct system calls to malloc, as subsequent Python object allocations can be satisfied from within these already-allocated arenas.

Each arena is further subdivided into a fixed number of pools. A pool is a smaller, fixed-size block of memory, typically 4KB. Critically, each pool is dedicated to allocating objects of a specific size class. For instance, one pool might only allocate 16-byte objects, another 32-byte objects, and so on. This “size-segregated” approach is incredibly efficient because it eliminates the need for complex metadata or fragmentation management within the pool. When an object of a particular size is requested, obmalloc can quickly find a pool designated for that size and allocate a block from it.

Mental Diagram: obmalloc Hierarchy

+-------------------------------------------------+
|                  Operating System               |
+-------------------------------------------------+
                        ^
                        |  Requests Large Chunks
                        v
+-------------------------------------------------+
|               CPython obmalloc Layer            |
+-------------------------------------------------+
|                                                 |
+-------------------------------------------------+
|            Arena 1  (e.g., 256KB)               |
|  +----------+ +----------+ +----------+         |
|  | Pool A   | | Pool B   | | Pool C   | ...     |
|  | (16-byte)| | (32-byte)| | (64-byte)|         |
|  +----------+ +----------+ +----------+         |
|     | Allocates specific size objects           |
|     v                                           |
|   +---+ +---+ +---+                             |
|   |Obj| |Obj| |Obj|  (e.g., 16-byte objects)    |
|   +---+ +---+ +---+                             |
+-------------------------------------------------+
|            Arena 2  (e.g., 256KB)               |
|  +----------+ +----------+ +----------+         |
|  | Pool D   | | Pool E   | | Pool F   | ...     |
|  +----------+ +----------+ +----------+         |
+-------------------------------------------------+
|           (More Arenas as needed)               |

This tiered allocation strategy offers several benefits: reduced system call overhead, improved cache locality (objects of similar sizes are often grouped), and minimized internal fragmentation within pools since all blocks within a pool are of the same uniform size. This specialized allocator is a cornerstone of CPython’s ability to handle the rapid creation and destruction of numerous small Python objects efficiently.
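
CPython can dump the live state of this hierarchy: sys._debugmallocstats() is a CPython-specific diagnostic that prints per-size-class block usage, pool counts, and arena totals to stderr (its output format varies between versions):

import sys

# Writes obmalloc statistics to stderr: blocks in use per size class,
# pools per size class, and arena-level totals.
sys._debugmallocstats()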

12.2. Small-object Optimizations: Free Lists

Building upon the obmalloc arena/pool structure, CPython employs further optimizations for very common, small, and frequently deallocated objects: free lists. A free list is essentially a linked list of deallocated objects of a particular type or size. When an object is deallocated (i.e., its reference count drops to zero), instead of immediately returning its memory to the system or even to the obmalloc pool, CPython might place it onto a type-specific free list.

The most prominent examples of objects managed by free lists are floats, tuples, lists, and dicts, particularly at small sizes. (Small integers in the range -5 to 256 are covered by a related but distinct optimization: they are preallocated once and shared as singletons.) For instance, there is a free list for float objects and per-size free lists for small tuples, as well as free lists for list and dictionary objects. When a new object of a matching type and size is requested, CPython first checks the corresponding free list. If a previously deallocated object is available, it is simply reinitialized and reused, bypassing the entire allocation path (arena, pool, and any system calls). This is extremely fast.

# Example of potential free list reuse (implementation detail, not guaranteed)
a = [1, 2]
print(f"id(a): {id(a)}")
del a  # 'a' is deallocated; its block may be pushed onto the list free list

c = [1, 2]  # may pop the very block that 'a' occupied off the free list
print(f"id(c): {id(c)}")  # can match id(a) if free list reuse happened

# Possible output:
# id(a): 1725905330560
# id(c): 1725905330560

The benefits of free lists are substantial: they virtually eliminate allocation and deallocation overhead for very common operations, drastically reducing CPU cycles spent on memory management. The mechanism exploits the observation that many programs repeatedly create and destroy temporary objects of similar types and sizes. By holding onto these deallocated blocks, CPython avoids repeated trips to the underlying allocator. However, free lists are bounded: if one exceeds its maximum length (e.g., 80 entries for list objects in CPython), further deallocated objects are returned to the obmalloc pool for general reuse. This prevents the free lists themselves from hoarding memory for rarely reused objects.
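
The same reuse pattern can often be observed with floats, which maintain their own free list. This is purely illustrative; whether the address is actually reused is an implementation detail:

x = 3.14
addr = id(x)
del x                  # the float's block may be pushed onto the float free list
y = 2.718              # allocating a new float may pop that same block
print(id(y) == addr)   # frequently True on CPython, never guaranteed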

12.3. String Interning

String interning is a powerful optimization in CPython aimed at reducing memory consumption and speeding up string comparisons. Because strings are immutable, identical string values can safely point to the same object in memory without any risk of one being modified and affecting others. Interning is the process by which CPython maintains a pool of unique string objects: when a candidate string is encountered, CPython first checks this pool; if an identical string already exists, the existing object is reused, otherwise the new string is added. Relatedly, str objects cache their hash directly in the string’s own C structure (rather than in the generic PyObject header), so repeated dictionary lookups and comparisons avoid rehashing.

CPython automatically interns certain strings:

  - Identifiers that appear in source code: variable, function, class, attribute, and argument names.
  - Compile-time string constants that look like identifiers, i.e., consist only of ASCII letters, digits, and underscores.
  - The empty string and single-character Latin-1 strings, which are cached as shared objects (a closely related optimization).

Strings created dynamically (e.g., from user input, network data, or string concatenation results) are generally not automatically interned unless they meet specific criteria or are explicitly interned using sys.intern().

import sys

s1 = "my_string" # Literal, likely interned
s2 = "my_string" # Same literal, refers to the same object
print(f"s1 is s2: {s1 is s2}") # True

s3 = "my" + "_" + "string" # Dynamically created, might not be interned
s4 = "my_string"           # Depends on CPython optimization at compile time
print(f"s3 is s4 (dynamic vs literal): {s3 is s4}") # False or True

s5 = sys.intern("another_string") # Explicitly interned
s6 = "another_string"
print(f"s5 is s6 (explicitly interned): {s5 is s6}") # True

The benefits of interning are two-fold:

  1. Memory Reduction: Instead of multiple copies of identical string data, there’s only one. This can significantly reduce memory footprint in applications that use many repeated strings (e.g., parsing JSON/XML where keys repeat, or large sets of identical categorical data).
  2. Performance Improvement for Comparisons: When comparing two interned strings, Python can simply compare their memory addresses (using is or a quick internal PyObject_RichCompareBool check) instead of performing a character-by-character comparison. This O(1) identity check is much faster than an O(N) character-by-character comparison, where N is the string length. While == still performs a value comparison, it often benefits from interning checks first.
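
An illustrative micro-benchmark can make the comparison benefit visible; exact timings vary by machine and CPython version, and the strings here are made up for illustration:

import sys
import timeit

base = "x" * 10_000
a = base + "!"                       # built at runtime: a distinct object
b = "".join([base, "!"])             # equal value, separate object
print(a is b, a == b)                # False True: equality must compare characters

ia = sys.intern(a)
ib = sys.intern("".join([base, "!"]))
print(ia is ib)                      # True: both resolve to one pooled object

print(timeit.timeit("a == b", globals={"a": a, "b": b}, number=100_000))
print(timeit.timeit("ia == ib", globals={"ia": ia, "ib": ib}, number=100_000))
# The second comparison hits the identity fast path and runs much faster.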

Beyond strings, CPython also shares other immutable objects:

  - Small integers in the range -5 to 256, preallocated once at interpreter startup.
  - The empty tuple, the empty string, and single-character Latin-1 strings.
  - The singletons None, True, False, and Ellipsis.

These sharing mechanisms contribute significantly to CPython’s overall memory efficiency and performance, reducing both allocation overhead and the need for expensive comparisons.
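
A few identity checks make these sharing mechanisms visible; all of the results below are CPython implementation details rather than language guarantees:

# CPython implementation details; results may differ on other interpreters
a, b = 256, 256
print(a is b)       # True: ints in [-5, 256] are preallocated singletons
c = int("500")      # constructed at runtime to avoid compile-time constant sharing
d = int("500")
print(c is d)       # False: 500 lies outside the cached range
t1, t2 = (), tuple()
print(t1 is t2)     # True: the empty tuple is a shared singleton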

12.4. GC Tunables and Thresholds

The generational garbage collector, described in Chapter 11, is not a black box; its behavior can be inspected and subtly tuned through the gc module. Understanding its internal “tunables” allows developers to optimize its performance for specific application workloads, especially in long-running services where predictable memory behavior is critical. The primary tunables are the collection thresholds.

CPython’s GC maintains three generations (0, 1, and 2), each with an associated threshold: threshold0, threshold1, and threshold2. They do not all count the same thing. threshold0 bounds the number of allocations minus deallocations of tracked objects that may occur before a generation-0 collection is triggered, while threshold1 and threshold2 count how many collections of the next-younger generation may run before generations 1 and 2 are themselves collected.

You can inspect and modify these thresholds using gc.get_threshold() and gc.set_threshold(). The default thresholds are typically (2000, 10, 10). This means:

  - threshold0 = 2000: a generation-0 collection runs once allocations minus deallocations of tracked objects exceed 2000.
  - threshold1 = 10: generation 1 is collected after every 10 generation-0 collections.
  - threshold2 = 10: generation 2 (a full collection) runs after every 10 generation-1 collections.

import gc

# Get current thresholds
print(f"Default GC thresholds: {gc.get_threshold()}")

# Set new thresholds (e.g., for more frequent/less frequent collection)
gc.set_threshold(1000, 5, 5) # Example: collect every generation more often than the defaults
print(f"New GC thresholds: {gc.get_threshold()}")

# Output:
# Default GC thresholds: (2000, 10, 10)
# New GC thresholds: (1000, 5, 5)

Tuning these thresholds depends heavily on your application’s memory allocation patterns. For applications with many short-lived objects, you might consider decreasing threshold0 to collect more frequently, freeing memory sooner. For applications with many long-lived objects and fewer cycles, increasing thresholds might reduce the overhead of unnecessary GC runs. However, overly aggressive collection can introduce performance pauses, while overly infrequent collection can lead to higher memory usage. The best approach involves profiling and experimentation, as discussed in the next section.
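
To see how close each generation is to its threshold, and what the collector has done so far, gc.get_count() and gc.get_stats() are useful companions to the threshold functions:

import gc

print(gc.get_threshold())   # e.g., (2000, 10, 10)
print(gc.get_count())       # e.g., (43, 2, 1): pending counts for gens 0, 1, 2

# Per-generation lifetime statistics: 'collections', 'collected', 'uncollectable'
for gen, stats in enumerate(gc.get_stats()):
    print(f"generation {gen}: {stats}")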

12.5. Optimizing Long-Running Processes

For long-running Python services, such as web servers, background workers, or data processing pipelines, memory behavior can be a critical concern. Gradual memory growth (memory leaks), sudden spikes in memory usage, or unpredictable pauses due to garbage collection cycles can severely impact performance and stability. Effective optimization requires systematic profiling and careful tuning.

Profiling Memory Behavior:

  - tracemalloc (standard library): records the source location of every allocation; snapshots can be compared to find where memory grows over time (see the sketch below).
  - The gc module itself: gc.get_count(), gc.get_stats(), and debug flags such as gc.DEBUG_UNCOLLECTABLE reveal how often collections run and whether uncollectable objects accumulate.
  - OS-level metrics: tracking the process’s resident set size (RSS) over time catches gradual growth that in-process tools may miss.
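
A minimal tracemalloc sketch; the list comprehension is a hypothetical stand-in for a representative slice of your workload:

import tracemalloc

tracemalloc.start()

data = [str(i) * 10 for i in range(100_000)]  # hypothetical workload

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:5]:
    print(stat)            # top five allocation sites by total size

tracemalloc.stop()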

Tuning Strategies:

  1. Adjusting GC Thresholds: Based on profiling data, you might adjust gc.set_threshold(). If your application frequently creates and destroys many short-lived objects, a lower threshold0 might free memory faster. If objects tend to be long-lived, higher thresholds could reduce collection overhead. Experimentation with different values while monitoring memory and performance is key.
  2. Disabling/Enabling GC: For short, bursty tasks, or specific phases of an application where you know no cycles will form, gc.disable() can temporarily turn off the GC to avoid collection overhead. Remember to re-enable it with gc.enable() and ideally call gc.collect() afterward to clean up any cycles that accumulated. This is a powerful but risky tool and should only be used after thorough analysis; a minimal sketch combining it with manual collection appears after this list.
  3. Manual Collection: In some long-running processes, especially after processing a large batch of data or completing a significant logical unit of work, explicitly calling gc.collect() can be beneficial. This allows you to reclaim memory deterministically rather than waiting for the automatic thresholds to be met, which can smooth out performance by preventing large, unpredictable collection pauses.
  4. Identifying and Breaking Cycles: The most effective optimization is preventing leaks from reference cycles in the first place. Use gc.DEBUG_UNCOLLECTABLE and tracemalloc to find the offending objects. Before Python 3.4, cycles involving objects with __del__ methods were uncollectable (the order of __del__ calls in a cycle is ambiguous) and ended up in gc.garbage; since PEP 442 such cycles are collectable, but they can still keep large object graphs alive far longer than intended. Restructuring your code to break these cycles (e.g., using weakref, as discussed in Chapter 11) remains the ultimate solution.
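
A minimal sketch of strategies 2 and 3 combined, assuming a hypothetical batch-processing job (transform is a made-up stand-in for real per-item work):

import gc

def transform(item):
    # hypothetical stand-in for real per-item work
    return {"value": item, "squared": item * item}

def process_batch(items):
    gc.disable()              # avoid GC pauses during the hot loop
    try:
        return [transform(x) for x in items]
    finally:
        gc.enable()
        gc.collect()          # deterministically reclaim any cycles afterwards

print(len(process_batch(range(10_000))))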

Optimizing long-running processes is an iterative process of profiling, identifying bottlenecks, applying tuning strategies, and re-profiling to measure the impact.

12.6. GC Hooks and Callbacks

Beyond simple threshold tuning, the gc module provides powerful introspection and extensibility points through its advanced hooks and callbacks. These features allow developers to gain deeper insights into the GC’s operation and even influence application-specific behavior around collection events, facilitating advanced debugging and resource management.

The most prominent feature in this category is gc.callbacks. This is a list that you can append callable objects to. These callbacks are invoked by the garbage collector before it starts and after it finishes a collection cycle. Each callback receives two arguments:

  - phase: a string, either "start" (a collection is about to begin) or "stop" (a collection has just finished).
  - info: a dict with the keys generation (which generation is being collected), collected (the number of objects collected), and uncollectable (the number of objects that could not be collected and were added to gc.garbage).

By registering callbacks, you can:

  - Log or time collection cycles to correlate GC pauses with application latency.
  - Watch the uncollectable count as an early warning sign of memory leaks.
  - Trigger application-specific behavior around collections, such as deferring latency-sensitive work while one is in progress.

import gc

def gc_callback(phase, info):
    if phase == "start":
        print(f"GC: Collection started for generation {info['generation']}")
    elif phase == "stop":
        print(f"GC: Collection ended. Collected: {info['collected']}, Uncollectable: {info['uncollectable']}")
        if info['uncollectable'] > 0:
            print("  Potential memory leak detected! Uncollectable objects found.")
            for obj in gc.garbage:
                print(f"    Uncollectable: {type(obj)} at {id(obj)}")

# Register the callback
gc.callbacks.append(gc_callback)
# Trigger a collection to see the callback in action
gc.collect()
# Don't forget to remove callbacks if no longer needed, especially in tests
gc.callbacks.remove(gc_callback)

The gc module also offers gc.get_objects() and gc.get_referrers(), which are invaluable for advanced debugging. gc.get_objects() returns a list of every object the collector is currently tracking; it can be very large, but it is a powerful way to inspect a program’s state. gc.get_referrers(*objs) returns a list of the objects that directly refer to any of its arguments. This is especially useful for debugging reference cycles: an unexpected referrer often points straight at the source of a leak. Combined with custom callbacks, these tools give developers deep insight into and control over the garbage collection process, enabling highly optimized, memory-stable Python applications.
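
For example, a deliberately constructed cycle can be explored with gc.get_referrers(); the Node class below is made up for illustration:

import gc

class Node:
    def __init__(self, name):
        self.name = name
        self.peer = None

a = Node("a")
b = Node("b")
a.peer, b.peer = b, a    # deliberate reference cycle

# Referrers of 'b' include the dict holding a's attributes and the
# namespace where 'b' itself is bound.
for ref in gc.get_referrers(b):
    print(type(ref))

del a, b
print(gc.collect())      # nonzero: the Node cycle (and its dicts) is reclaimed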

Key Takeaways

  - CPython allocates most objects through its own obmalloc layer, which carves OS-provided arenas into size-class-segregated pools, avoiding per-object system calls.
  - Free lists recycle recently deallocated objects of common types (floats, tuples, lists, dicts), making many allocations and deallocations nearly free.
  - Interning and singleton sharing (identifier-like strings, small integers, the empty tuple) cut memory usage and enable O(1) identity-based comparisons.
  - The generational GC’s thresholds can be inspected and tuned via gc.get_threshold() and gc.set_threshold(), but changes should be driven by profiling.
  - gc.callbacks, gc.get_objects(), and gc.get_referrers() provide hooks and introspection for diagnosing leaks and smoothing collection pauses in long-running services.

Where to Go Next