Load
A pickle file is loaded using the function pypickle.pypickle.load(). For security reasons, there are various restrictions on loading.
Risk Modules
Risk modules refer to Python modules that, when deserialized using pickle, may execute unintended or harmful code due to their built-in capabilities. Modules such as os, subprocess, sys, or custom classes with overridden __reduce__ or __setstate__ methods can introduce severe security risks during unpickling. These modules are often classified as high-risk because they enable file system access, process execution, or system-level interactions. To mitigate these threats, pypickle includes validation mechanisms that block such modules, or explicitly require user approval for them before loading, ensuring safer handling of untrusted pickle files.
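To see why hooks like __reduce__ are dangerous, consider what the standard pickle module does with them: on unpickling it calls the returned callable with the given arguments, with no safeguards at all. Below is a minimal illustration using only the standard library (the echoed message is arbitrary):

```python
import os
import pickle

class Exploit:
    def __reduce__(self):
        # pickle calls this hook during serialization; it returns a
        # (callable, args) pair that is invoked on deserialization.
        return (os.system, ("echo Unsafe action executed",))

payload = pickle.dumps(Exploit())

# Plain pickle.loads() runs os.system immediately. No Exploit instance
# is ever reconstructed; the call's return value comes back instead.
result = pickle.loads(payload)
```

This is exactly the pattern that risk-module checks are designed to catch before any code runs.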
| Module or Function | Risk / Reason for Blocking |
|---|---|
| os.system | Execute shell commands |
| os.popen | Open pipe to or from a command |
| os.execve | Execute a new program, replacing the current process |
| os.remove | Remove files (can delete data) |
| os.rmdir | Remove directories |
| os.makedirs | Make directories (can modify the filesystem) |
| subprocess | Arbitrary system command execution |
| subprocess.Popen | Start a subprocess with pipe access |
| subprocess.call | Run system commands |
| sys.exit | Exit the interpreter |
| sys.modules | Manipulate loaded modules |
| sys.path | Modify the import path (can load arbitrary code) |
| nt | Windows native system calls (like os) |
| posix | Unix equivalent of nt |
| importlib | Dynamic imports and module loading |
| socket | Network access, can open sockets |
| selectors | Low-level network/socket multiplexing |
| multiprocessing | Starts subprocesses and parallel processes |
| threading | Can spawn threads (potential concurrency hazards) |
| asyncio | Asynchronous tasks (can be misused) |
| ctypes | Load arbitrary C libraries (very risky) |
| platform | Access to detailed system info (potential info leak) |
| webbrowser | Can open URLs or trigger external browser actions |
| shutil | File operations, including deleting files |
| tempfile | Temporary files and directories (filesystem access) |
| glob | Wildcard filesystem access |
| pathlib | Filesystem path operations (safe if used carefully, but can be risky) |
| codecs | Decoding arbitrary formats (rare but possible exploits) |
| builtins.eval | Execute arbitrary code from a string |
| builtins.exec | Execute arbitrary code dynamically |
| builtins.open | Open files (can read/write files) |
| builtins.__import__ | Dynamic import of modules |
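Because a pickle stream names every global it imports, you can inspect which modules a file references before ever loading it. The sketch below uses the standard library's pickletools to list referenced globals without executing the stream. It is an illustration rather than pypickle's own scanner, and it only handles the GLOBAL opcode of protocols up to 3; protocol 4+ streams use STACK_GLOBAL, which needs extra bookkeeping:

```python
import os
import pickle
import pickletools

def referenced_globals(payload: bytes):
    """Return (module, name) pairs a pickle stream references, without loading it.

    Illustration only: handles the GLOBAL opcode (protocols <= 3). Protocol 4+
    streams emit STACK_GLOBAL and require tracking the preceding string opcodes.
    """
    pairs = []
    for opcode, arg, _pos in pickletools.genops(payload):
        if opcode.name == "GLOBAL":
            # The argument is the "module name" pair, space-separated.
            module, name = arg.split(" ", 1)
            pairs.append((module, name))
    return pairs

# A payload that references os.system (functions pickle by reference).
payload = pickle.dumps(os.system, protocol=2)
print(referenced_globals(payload))  # e.g. [('posix', 'system')] on Unix
```

Scanning like this is what makes it possible to reject a file that mentions any entry from the table above before a single object is built.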
Security Mechanisms (load)
Pickle files are loaded directly if all their modules are on the allowlist. If the pickle file contains unknown modules, these need to be validated using the validate parameter. Pickle files that contain risky modules, i.e., those that can automatically make changes on the system or start (unwanted) applications, are not loaded unless explicitly allowed using the validate parameter.
| Module Type | Allowed? | How to Change Behavior |
|---|---|---|
| Unknown | ✅ | Allowed unless in the risky list |
| Risk modules | ❌ | No risk modules can be loaded unless explicitly stated |
| Risky modules | ✅ | Must be explicitly added via the validate parameter |
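The allow/block behavior in the table can be built on pickle.Unpickler.find_class, which pickle consults for every global a stream tries to import. The sketch below shows the general mechanism; it is not pypickle's actual code, and the ALLOWED set here is hypothetical:

```python
import io
import os
import pickle

# Hypothetical allowlist for this sketch; pypickle maintains its own lists.
ALLOWED = {("builtins", "complex")}

class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global the pickle stream wants to import.
        if (module, name) not in ALLOWED:
            raise pickle.UnpicklingError(f"blocked global: {module}.{name}")
        return super().find_class(module, name)

def safe_loads(payload: bytes):
    return AllowlistUnpickler(io.BytesIO(payload)).load()

# Allowed: complex objects are rebuilt through the approved builtins.complex.
ok = safe_loads(pickle.dumps(2 + 3j))

# Blocked: os.system is pickled by reference and rejected before import.
try:
    safe_loads(pickle.dumps(os.system))
except pickle.UnpicklingError as err:
    print("refused:", err)
```

Rejecting inside find_class means a blocked global is never even imported, which is why this pattern is the one the Python documentation recommends for restricting unpickling.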
import pypickle
filepath = 'test.pkl'
# Some data in a list
data = [1,2,3,4,5]
# Save
status = pypickle.save(filepath, data)
# Load file
data = pypickle.load(filepath)
Load with Validation
To prevent exploits when loading pickle files, use the validate parameter to explicitly allow only trusted modules. This ensures that only known-safe objects are deserialized. For example, you can safely load pickled objects from libraries such as sklearn by specifying the expected modules as allowed. See the example below for how to use this mechanism securely.
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import pypickle
# Load example dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# Train a model
model = LogisticRegression()
model.fit(X_train, y_train)
# Save the model
status = pypickle.save('model.pkl', model, overwrite=True)
# Gather the modules which you can use later to load the pickle file with the expected modules.
mods = pypickle.validate_modules('model.pkl')
# ['classes_',
# 'numpy',
# 'numpy._core.multiarray',
# 'sklearn.linear_model._logistic']
model_loaded = pypickle.load('model.pkl', validate=mods)
# Alternatively, skip all validation checks by setting validate=False
model_loaded = pypickle.load('model.pkl', validate=False)
# Predict
predictions = model_loaded.predict(X_test)
print(predictions)
Load without Validation
All pickle files can be loaded by setting the validate=False parameter. This disables all module and risk validation checks, allowing any object to be deserialized, including potentially unsafe ones. Below is an example where pypickle refuses to load a known exploit unless validation is explicitly bypassed. While disabling validation is possible, it is strongly discouraged unless you fully trust the source.
# Create a pickle that can perform risky operations.
class risky_module:
def __reduce__(self):
import os
return (os.system, ("echo 'Unsafe action executed'",))
import pypickle
filepath = 'risky_module.pkl'
pypickle.save(filepath, risky_module(), overwrite=True)
# Attempt a safe load: pypickle refuses because the file references os.system
loaded = pypickle.load(filepath)
# Brute-force load and ignore all risk modules
loaded = pypickle.load(filepath, validate=False)
# Load using validation
mods = pypickle.validate_modules(filepath)
# Load
loaded = pypickle.load(filepath, validate=mods)