Limiting TensorFlow GPU Memory
- By default, TensorFlow maps nearly all of the memory of every GPU visible to the process. To manage GPU memory effectively and avoid starving other processes on the same machine, you can limit how much TensorFlow allocates.
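- Before changing any limits, it helps to confirm which GPUs TensorFlow can actually see. A minimal check, assuming TensorFlow 2.1 or later:

import tensorflow as tf

# Each device listed here would be fully claimed by default on first use.
physical_gpus = tf.config.list_physical_devices('GPU')
print("Visible GPUs:", physical_gpus)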
Enable Memory Growth
- TensorFlow can enable memory growth for a specific GPU, so the allocation starts small and grows as needed instead of being reserved up front. Memory growth must be set before any GPU has been initialized.
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Enable on-demand allocation for every GPU; must run before
        # any GPU has been initialized.
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        print(e)
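- As an alternative to calling `set_memory_growth` in code, TensorFlow 2.x GPU builds also honor the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable. A minimal sketch, assuming it is set before TensorFlow is imported:

import os

# Must be set before TensorFlow is imported, or it has no effect.
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'

import tensorflow as tf  # memory growth is now enabled for all GPUs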
Set Per-Process GPU Memory Limit
- If you prefer a fixed upper bound on the GPU memory a TensorFlow process can use, create a virtual device with a memory limit (in megabytes):
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Cap TensorFlow's usage of the first GPU at 1024 MB.
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
    except RuntimeError as e:
        # Virtual devices must be configured before the GPU is initialized.
        print(e)
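- Newer TensorFlow 2.x releases expose the same capability through the stable `tf.config.set_logical_device_configuration` and `tf.config.LogicalDeviceConfiguration` APIs. A sketch assuming one of those releases:

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        # Same 1024 MB cap, via the non-experimental API.
        tf.config.set_logical_device_configuration(
            gpus[0],
            [tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
    except RuntimeError as e:
        print(e)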
Using Logical GPUs
- You can subdivide a physical GPU into multiple logical GPUs. This is useful for testing multi-GPU code paths in parallel without needing multiple physical devices.
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Split the first physical GPU into two 2048 MB logical GPUs.
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=2048),
             tf.config.experimental.VirtualDeviceConfiguration(memory_limit=2048)])
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        print(e)
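- Once the logical devices exist, work can be pinned to each of them with `tf.device`. A minimal sketch, assuming the configuration above has already run:

import tensorflow as tf

with tf.device('/GPU:0'):
    a = tf.random.uniform((1024, 1024))
with tf.device('/GPU:1'):
    b = tf.random.uniform((1024, 1024))
print(a.device, b.device)  # each tensor lives on its own logical GPU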
Utilize TensorFlow ConfigProto
- In TensorFlow 1.x (and via `tf.compat.v1` in 2.x), the `ConfigProto` class configures session-level settings such as GPU memory growth or a per-process memory fraction.
import tensorflow as tf

config = tf.compat.v1.ConfigProto()
# Grow allocations on demand instead of claiming all memory up front.
config.gpu_options.allow_growth = True
# Cap the process at a fraction (here 50%) of each GPU's memory.
config.gpu_options.per_process_gpu_memory_fraction = 0.5
session = tf.compat.v1.Session(config=config)
# Use this session for subsequent computations
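- If the surrounding program uses TF 1.x-style Keras, this configured session can also be installed as Keras's default. A sketch, assuming the `tf.compat.v1.keras` backend is available:

import tensorflow as tf

config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)

# Route Keras's own graph execution through the memory-limited session.
tf.compat.v1.keras.backend.set_session(session)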
Monitor GPU Usage
- Monitoring GPU usage confirms that your changes have the intended effect. Tools such as `nvidia-smi` report per-process memory usage and device utilization.
nvidia-smi
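- From inside the process, recent TensorFlow releases (roughly 2.5 onward) can also report their own allocator statistics via `tf.config.experimental.get_memory_info`. A sketch:

import tensorflow as tf

# Returns a dict with 'current' and 'peak' bytes allocated by TensorFlow.
info = tf.config.experimental.get_memory_info('GPU:0')
print(f"current: {info['current'] / 1e6:.1f} MB, peak: {info['peak'] / 1e6:.1f} MB")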
Considerations and Best Practices
- Avoid over-allocating GPU memory across multiple TensorFlow processes; exceeding the available memory leads to out-of-memory errors and crashes.
- While logical GPUs allow parallel testing on a single device, they still share the underlying hardware, so partition memory carefully to avoid interference.
- Always confirm your code matches the TensorFlow version in use, especially for the experimental `VirtualDeviceConfiguration` and the TF 1.x `ConfigProto` APIs; a version guard like the sketch below makes the dependency explicit.
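- A minimal version guard, with a hypothetical minimum of 2.4 chosen only for illustration:

import tensorflow as tf

# Hypothetical guard: fail fast if the running TensorFlow is older than required.
major, minor = (int(x) for x in tf.__version__.split('.')[:2])
if (major, minor) < (2, 4):
    raise RuntimeError(f"TensorFlow >= 2.4 required, found {tf.__version__}")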