Load

A pickle file is loaded using the function pypickle.pypickle.load(). Because of security reasons, that are various restrictions for loading.

Risk Modules

Risk modules refer to Python modules that, when deserialized using pickle, may execute unintended or harmful code due to their built-in capabilities. Modules such as os, subprocess, sys, or custom classes with overridden __reduce__ or __setstate__ methods can introduce severe security risks during unpickling. These modules are often classified as high-risk because they enable file system access, process execution, or system-level interactions. To mitigate these threats, pypickle includes validation mechanisms that block or explicitly require user approval for such modules before loading, ensuring safer handling of untrusted pickle files.

Risky Modules Blocked by Default

Module or Function

Risk / Reason for Blocking

os.system

Execute shell commands

os.popen

Open pipe to or from command

os.execve

Execute a new program, replacing the current process

os.remove

Remove files (can delete data)

os.rmdir

Remove directories

os.makedirs

Make directories (can modify filesystem)

subprocess

Arbitrary system command execution

subprocess.Popen

Start subprocess with pipe access

subprocess.call

Run system commands

sys.exit

Exit interpreter

sys.modules

Manipulate loaded modules

sys.path

Modify import path (can load arbitrary code)

nt

Windows native system calls (like os)

posix

Unix equivalent of nt

importlib

Dynamic imports and module loading

socket

Network access, can open sockets

selectors

Low-level network/socket multiplexing

multiprocessing

Starts subprocesses and parallel processes

threading

Can spawn threads (potential concurrency hazards)

asyncio

Asynchronous tasks (can be misused)

ctypes

Load arbitrary C libraries (very risky)

platform

Access to detailed system info (potential info leak)

webbrowser

Can open URLs or trigger external browser actions

shutil

File operations, including deleting files

tempfile

Temporary files and directories (file system access)

glob

Wildcard filesystem access

pathlib

Filesystem path operations (safe if used carefully, but can be risky)

codecs

Decoding arbitrary formats (rare but possible exploits)

builtins.eval

Execute arbitrary code from string

builtins.exec

Execute arbitrary code dynamically

builtins.open

File open (can read/write files)

builtins.__import__

Dynamic import of modules

Security Mechanisms (load)

pickle files are directly loaded if all modules are in the allowlist. If the pickle file contains unknown modules, the modules needs to be validated using the validate parameter. pickle files that contain risky modules, i.e., those that can automatically make changes on the system or start (unwanted) applications are not allowed unless specifically specified using the validate parameter.

Module Type

Allowed?

How to Change Behavior

Unknown

Allowed unless in risky list

Risk modules (os, etc.)

No risk modules can be loaded unless exlicitely stated

Risky modules (os, etc.)

Must be explicitly added via validate=['nt'] or validate=False

import pypickle
filepath = 'test.pkl'

# Some data in a list
data = [1,2,3,4,5]

# Save
status = pypickle.save(filepath, data)

# Load file
data = pypickle.load(filepath)

Load with Validation

To prevent exploits when loading pickle files, use the validate parameter to explicitly allow only trusted modules. This ensures that only known-safe objects are deserialized. For example, you can safely load any pickled module such as sklearn by specifying the expected modules as allowed. See the example below for how to use this mechanism securely.

from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import pypickle

# Load example dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train a model
model = LogisticRegression()
model.fit(X_train, y_train)

# Save the model
status = pypickle.save('model.pkl', model, overwrite=True)

# Gather the modules which you can use later to load the pickle file with the expected modules.
mods = pypickle.validate_modules('model.pkl')
# ['classes_',
#  'numpy',
#  'numpy._core.multiarray',
#  'sklearn.linear_model._logistic']
model_loaded = pypickle.load('model.pkl', validate=mods)

# Load the model but without adding ``validate=False``)
model_loaded = pypickle.load('model.pkl', validate=False)

# Predict
predictions = model_loaded.predict(X_test)
print(predictions)

Load without Validation

All pickle files can be loaded by setting the validate=False parameter. This disables all module and risk validation checks, allowing any object to be deserialized—including potentially unsafe ones. Below is an example where pypickle refuses to load a known exploit unless validation is explicitly bypassed. While disabling validation is possible, it is strongly discouraged unless you fully trust the source.

# Create a pickle that can perform risky operations.
class risky_module:
    def __reduce__(self):
        import os
        return (os.system, ("echo 'Unsafe action executed'",))

import pypickle
filepath = 'risky_module.pkl'
pypickle.save(filepath, risky_module(), overwrite=True)

# Attempt safe load
loaded = pypickle.load(filepath)

# Brute-force load and ignore all risk modules
loaded = pypickle.load(filepath, validate=False)

# Load using validation
mods = pypickle.validate_modules(filepath)
# Load
loaded = pypickle.load(filepath, validate=mods)

[1] https://www.datacamp.com/community/tutorials/pickle-python-tutorial