Load
A pickle file is loaded using the function pypickle.pypickle.load(). For security reasons, there are various restrictions on loading.
Risk Modules
Risk modules refer to Python modules that, when deserialized using pickle, may execute unintended or harmful code due to their built-in capabilities. Modules such as os, subprocess, sys, or custom classes with overridden __reduce__ or __setstate__ methods can introduce severe security risks during unpickling. These modules are often classified as high-risk because they enable file system access, process execution, or system-level interactions. To mitigate these threats, pypickle includes validation mechanisms that block such modules, or explicitly require user approval for them before loading, ensuring safer handling of untrusted pickle files.
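To see why hooks like __reduce__ are dangerous, consider what the standard pickle module does with them: on unpickling it calls the returned callable with the given arguments, with no safeguards at all. Below is a minimal illustration using only the standard library (the echoed message is arbitrary):

```python
import os
import pickle

class Exploit:
    def __reduce__(self):
        # pickle calls this hook during serialization; it returns a
        # (callable, args) pair that is invoked on deserialization.
        return (os.system, ("echo Unsafe action executed",))

payload = pickle.dumps(Exploit())

# Plain pickle.loads() runs os.system immediately. No Exploit instance
# is ever reconstructed; the call's return value comes back instead.
result = pickle.loads(payload)
```

This is exactly the pattern that risk-module checks are designed to catch before any code runs.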
| Module or Function | Risk / Reason for Blocking |
|---|---|
| os.system | Execute shell commands |
| os.popen | Open pipe to or from a command |
| os.execve | Execute a new program, replacing the current process |
| os.remove | Remove files (can delete data) |
| os.rmdir | Remove directories |
| os.makedirs | Make directories (can modify the filesystem) |
| subprocess | Arbitrary system command execution |
| subprocess.Popen | Start a subprocess with pipe access |
| subprocess.call | Run system commands |
| sys.exit | Exit the interpreter |
| sys.modules | Manipulate loaded modules |
| sys.path | Modify the import path (can load arbitrary code) |
| nt | Windows native system calls (like os) |
| posix | Unix equivalent of nt |
| importlib | Dynamic imports and module loading |
| socket | Network access, can open sockets |
| selectors | Low-level network/socket multiplexing |
| multiprocessing | Starts subprocesses and parallel processes |
| threading | Can spawn threads (potential concurrency hazards) |
| asyncio | Asynchronous tasks (can be misused) |
| ctypes | Load arbitrary C libraries (very risky) |
| platform | Access to detailed system info (potential info leak) |
| webbrowser | Can open URLs or trigger external browser actions |
| shutil | File operations, including deleting files |
| tempfile | Temporary files and directories (filesystem access) |
| glob | Wildcard filesystem access |
| pathlib | Filesystem path operations (safe if used carefully, but can be risky) |
| codecs | Decoding arbitrary formats (rare but possible exploits) |
| builtins.eval | Execute arbitrary code from a string |
| builtins.exec | Execute arbitrary code dynamically |
| builtins.open | Open files (can read/write files) |
| builtins.__import__ | Dynamic import of modules |
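Because a pickle stream names every global it imports, you can inspect which modules a file references before ever loading it. The sketch below uses the standard library's pickletools to list referenced globals without executing the stream. It is an illustration rather than pypickle's own scanner, and it only handles the GLOBAL opcode of protocols up to 3; protocol 4+ streams use STACK_GLOBAL, which needs extra bookkeeping:

```python
import os
import pickle
import pickletools

def referenced_globals(payload: bytes):
    """Return (module, name) pairs a pickle stream references, without loading it.

    Illustration only: handles the GLOBAL opcode (protocols <= 3). Protocol 4+
    streams emit STACK_GLOBAL and require tracking the preceding string opcodes.
    """
    pairs = []
    for opcode, arg, _pos in pickletools.genops(payload):
        if opcode.name == "GLOBAL":
            # The argument is the "module name" pair, space-separated.
            module, name = arg.split(" ", 1)
            pairs.append((module, name))
    return pairs

# A payload that references os.system (functions pickle by reference).
payload = pickle.dumps(os.system, protocol=2)
print(referenced_globals(payload))  # e.g. [('posix', 'system')] on Unix
```

Scanning like this is what makes it possible to reject a file that mentions any entry from the table above before a single object is built.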
Security Mechanisms (load)
Pickle files are loaded directly if all their modules are on the allowlist. If the pickle file contains unknown modules, these need to be validated using the validate parameter. Pickle files that contain risky modules, i.e., those that can automatically make changes on the system or start (unwanted) applications, are not loaded unless explicitly allowed using the validate parameter.
| Module Type | Allowed? | How to Change Behavior |
|---|---|---|
| Unknown | ✅ | Allowed unless in the risky list |
| Risk modules | ❌ | No risk modules can be loaded unless explicitly stated |
| Risky modules | ✅ | Must be explicitly added via the validate parameter |
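The allow/block behavior in the table can be built on pickle.Unpickler.find_class, which pickle consults for every global a stream tries to import. The sketch below shows the general mechanism; it is not pypickle's actual code, and the ALLOWED set here is hypothetical:

```python
import io
import os
import pickle

# Hypothetical allowlist for this sketch; pypickle maintains its own lists.
ALLOWED = {("builtins", "complex")}

class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global the pickle stream wants to import.
        if (module, name) not in ALLOWED:
            raise pickle.UnpicklingError(f"blocked global: {module}.{name}")
        return super().find_class(module, name)

def safe_loads(payload: bytes):
    return AllowlistUnpickler(io.BytesIO(payload)).load()

# Allowed: complex objects are rebuilt through the approved builtins.complex.
ok = safe_loads(pickle.dumps(2 + 3j))

# Blocked: os.system is pickled by reference and rejected before import.
try:
    safe_loads(pickle.dumps(os.system))
except pickle.UnpicklingError as err:
    print("refused:", err)
```

Rejecting inside find_class means a blocked global is never even imported, which is why this pattern is the one the Python documentation recommends for restricting unpickling.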
import pypickle
filepath = 'test.pkl'
# Some data in a list
data = [1,2,3,4,5]
# Save
status = pypickle.save(filepath, data)
# Load file
data = pypickle.load(filepath)
Load with Validation
To prevent exploits when loading pickle files, use the validate parameter to explicitly allow only trusted modules. This ensures that only known-safe objects are deserialized. For example, you can safely load pickled objects from libraries such as sklearn by specifying the expected modules as allowed. See the example below for how to use this mechanism securely.
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import pypickle
# Load example dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# Train a model
model = LogisticRegression()
model.fit(X_train, y_train)
# Save the model
status = pypickle.save('model.pkl', model, overwrite=True)
# Gather the modules which you can use later to load the pickle file with the expected modules.
mods = pypickle.validate_modules('model.pkl')
# ['classes_',
# 'numpy',
# 'numpy._core.multiarray',
# 'sklearn.linear_model._logistic']
model_loaded = pypickle.load('model.pkl', validate=mods)
# Alternatively, skip all validation checks by setting validate=False
model_loaded = pypickle.load('model.pkl', validate=False)
# Predict
predictions = model_loaded.predict(X_test)
print(predictions)
Load without Validation
All pickle files can be loaded by setting the validate=False parameter. This disables all module and risk validation checks, allowing any object to be deserialized, including potentially unsafe ones. Below is an example where pypickle refuses to load a known exploit unless validation is explicitly bypassed. While disabling validation is possible, it is strongly discouraged unless you fully trust the source.
# Create a pickle that can perform risky operations.
class risky_module:
def __reduce__(self):
import os
return (os.system, ("echo 'Unsafe action executed'",))
import pypickle
filepath = 'risky_module.pkl'
pypickle.save(filepath, risky_module(), overwrite=True)
# Attempt a safe load: pypickle refuses because the file references os.system
loaded = pypickle.load(filepath)
# Brute-force load and ignore all risk modules
loaded = pypickle.load(filepath, validate=False)
# Load using validation
mods = pypickle.validate_modules(filepath)
# Load
loaded = pypickle.load(filepath, validate=mods)