This page explains Agent Platform's TensorFlow integration and provides resources that show you how to use TensorFlow on Agent Platform. Agent Platform's TensorFlow integration makes it easier for you to train, deploy, and orchestrate TensorFlow models in production.
Run code in notebooks
Agent Platform provides two options for running your code in notebooks: Colab Enterprise and Agent Platform Workbench. To learn more about these options, see Choose a notebook solution.
Prebuilt containers for training
Agent Platform provides prebuilt Docker container images for model training. These containers are organized by machine learning frameworks and framework versions and include common dependencies that you might want to use in your training code.
To learn about which TensorFlow versions have prebuilt training containers and how to train models with a prebuilt training container, see Prebuilt containers for custom training.
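As a rough sketch of how a prebuilt training container is selected, the helper below builds an image URI from a framework version and hardware target. The registry path and naming scheme are illustrative assumptions, not official Agent Platform URIs; the Prebuilt containers for custom training page lists the real images.

```python
# Sketch: choose a prebuilt training container image by TensorFlow version
# and accelerator. The registry and naming pattern below are assumptions for
# illustration only -- consult the prebuilt-containers reference for the
# actual image URIs.

def training_container_uri(tf_version: str, accelerator: str = "cpu") -> str:
    """Build an image URI like '<registry>/tf-gpu.2-12:latest'."""
    if accelerator not in ("cpu", "gpu"):
        raise ValueError("accelerator must be 'cpu' or 'gpu'")
    version_tag = tf_version.replace(".", "-")  # e.g. '2.12' -> '2-12'
    registry = "us-docker.pkg.dev/agent-platform/training"  # assumed registry
    return f"{registry}/tf-{accelerator}.{version_tag}:latest"

print(training_container_uri("2.12", "gpu"))
```

You would pass a URI like this as the container image for a custom training job, so your training code runs against a curated TensorFlow environment instead of a container you build yourself.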
Distributed training
You can run distributed training of TensorFlow models on Agent Platform. For multi-worker training, you can use Reduction Server to optimize performance even further for all-reduce collective operations. To learn more about distributed training on Agent Platform, see Distributed training.
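Multi-worker TensorFlow training coordinates workers through the standard `TF_CONFIG` environment variable, which a managed training service typically sets for you. The sketch below builds it by hand for a hypothetical two-worker job; the hostnames are placeholders.

```python
import json
import os

# Sketch: multi-worker TensorFlow training discovers its peers through the
# standard TF_CONFIG environment variable. A managed training service sets
# this for each worker; constructing it manually looks like this.
def make_tf_config(worker_hosts, task_index):
    return {
        "cluster": {"worker": worker_hosts},              # all workers in the job
        "task": {"type": "worker", "index": task_index},  # this process's role
    }

# Placeholder hostnames for a two-worker job; task_index=0 marks this worker.
tf_config = make_tf_config(["worker-0:2222", "worker-1:2222"], task_index=0)
os.environ["TF_CONFIG"] = json.dumps(tf_config)

# With TF_CONFIG set, tf.distribute.MultiWorkerMirroredStrategy() in the
# training script coordinates the all-reduce collective operations that
# Reduction Server is designed to accelerate.
print(os.environ["TF_CONFIG"])
```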
Prebuilt containers for inference
Similar to the prebuilt containers for training, Agent Platform provides prebuilt container images for serving inferences and explanations from TensorFlow models that you created either within or outside of Agent Platform. These images provide HTTP inference servers that you can use to serve inferences with minimal configuration.
To learn about which TensorFlow versions have prebuilt inference containers and how to serve inferences with a prebuilt inference container, see Prebuilt containers for inference.
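To make the request shape concrete, the prebuilt HTTP inference servers accept a JSON body with an `instances` list, following the TensorFlow Serving-style prediction request format. The sketch below builds such a body; the input values and the model's input signature are illustrative, and the endpoint URL you POST to depends on where the model is deployed.

```python
import json

# Sketch: the prebuilt HTTP inference servers expect a JSON body with an
# "instances" list (TensorFlow Serving-style request format). Each instance
# must match the model's input signature -- here, a model taking 4 floats.
def build_prediction_request(rows):
    return json.dumps({"instances": rows})

body = build_prediction_request([[5.1, 3.5, 1.4, 0.2],
                                 [6.7, 3.1, 4.7, 1.5]])
# POST this body to the server's predict route (e.g. with urllib.request);
# the response carries a matching "predictions" list.
print(body)
```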
Optimized TensorFlow runtime
The optimized TensorFlow runtime uses model optimizations and new proprietary Google technologies to improve the speed and lower the cost of inferences compared to Agent Platform's standard prebuilt inference containers for TensorFlow.
TensorFlow Cloud Profiler integration
Train models cheaper and faster by monitoring and optimizing the performance of your training job using Agent Platform's TensorFlow Cloud Profiler integration. TensorFlow Cloud Profiler helps you understand the resource consumption of training operations so you can identify and eliminate performance bottlenecks.
To learn more about Agent Platform TensorFlow Cloud Profiler, see Profile model training performance using Profiler.
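In practice, profiling data for a training run is commonly captured through the Keras TensorBoard callback. The sketch below shows that wiring under stated assumptions: the bucket name is a placeholder, the timestamped log-directory convention is just one way to keep runs separate, and TensorFlow is imported lazily so the sketch runs even where it is not installed.

```python
import time

# Sketch: profiling data for a training job is written to a log directory
# that the profiling UI reads. A timestamped subdirectory keeps successive
# runs separate; "gs://my-bucket" below is a placeholder, not a real bucket.
def make_log_dir(base="gs://my-bucket/tensorboard"):
    return f"{base}/run-{time.strftime('%Y%m%d-%H%M%S')}"

def make_profiler_callback(log_dir):
    # Imported lazily so this sketch runs even where TensorFlow is absent.
    import tensorflow as tf
    # profile_batch=(10, 20) profiles training steps 10 through 20; skipping
    # the first steps avoids graph tracing skewing the measurements.
    return tf.keras.callbacks.TensorBoard(log_dir=log_dir,
                                          profile_batch=(10, 20))

print(make_log_dir())
```

You would pass the callback to `model.fit(..., callbacks=[make_profiler_callback(log_dir)])`, then inspect the captured trace to find input-pipeline or device-utilization bottlenecks.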
Resources for using TensorFlow on Agent Platform
To learn more and start using TensorFlow in Agent Platform, see the following resources.
Prototype to Production: A video series that provides an end-to-end example of developing and deploying a custom TensorFlow model on Agent Platform.
Optimize training performance with Reduction Server on Agent Platform: A blog post on optimizing distributed training on Agent Platform by using Reduction Server.
How to optimize training performance with the TensorFlow Cloud Profiler on Agent Platform: A blog post that shows you how to identify performance bottlenecks in your training job by using Agent Platform TensorFlow Cloud Profiler.
Custom model batch prediction with feature filtering: A notebook tutorial that shows you how to use the Agent Platform SDK for Python to train a custom tabular classification model and perform batch inference with feature filtering.
Vertex AI Pipelines: Custom training with prebuilt Google Cloud Pipeline Components: A notebook tutorial that shows you how to use Agent Platform Pipelines with prebuilt Google Cloud Pipeline Components for custom training.
Co-host TensorFlow models on the same VM for predictions: A codelab that shows you how to use the co-hosting model feature in Agent Platform to host multiple models on the same VM for online inferences.