Use case

Using GKE to support high GPU workloads

1. Overview

Stockwise provides stock market analysis services to help investors make informed investment decisions.
It also provides real-time market data, expert analyses, and informational resources.

Its analysis algorithm requires high GPU usage.

Pain
Stockwise’s infrastructure was distributed across various cloud services, including GCP VMs, Azure VMs, and Heroku Dynos. However, this architecture was unable to support the extensive GPU usage required. Furthermore, its complexity presented challenges for the development team in their daily tasks.

2. Goal

Simplify Stockwise’s architecture and enhance GPU utilization by consolidating the infrastructure within GCP services.

The solution runs on a GKE cluster, with workloads defined and deployed through Kubernetes manifests.

Before: Infrastructure fragmented across GCP VMs, Azure VMs, and Heroku Dynos limited GPU utilization.
After: A consolidated GKE cluster managed through Kubernetes manifests streamlines GPU usage, improves scalability, and simplifies development through containerization and automation.

What is GKE?

GKE (Google Kubernetes Engine) is a managed Kubernetes service offered by GCP that simplifies the deployment, scaling, and management of containerized applications.

It’s designed to handle a variety of workloads, including GPU-intensive tasks, making it an ideal solution for platforms like Stockwise that require extensive GPU usage.
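
To make the idea of a GPU workload on GKE concrete, here is a minimal sketch in Python that builds a Pod manifest as a plain dictionary and prints it as JSON, which kubectl accepts alongside YAML. The pod name, image, and accelerator type are hypothetical placeholders, not Stockwise's actual configuration. The `nvidia.com/gpu` resource name and the `cloud.google.com/gke-accelerator` node label are the ones GKE uses for NVIDIA GPUs.

```python
import json


def gpu_pod_manifest(name, image, gpu_count=1, accelerator="nvidia-tesla-t4"):
    """Build a Pod manifest (as a dict) that requests NVIDIA GPUs on GKE.

    All argument values are illustrative; the accelerator type must match
    a GPU node pool that actually exists in the cluster.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Steer the Pod to nodes that carry the requested GPU model.
            "nodeSelector": {"cloud.google.com/gke-accelerator": accelerator},
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # GPUs are requested through resource limits.
                    "resources": {"limits": {"nvidia.com/gpu": gpu_count}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical workload name and image, for illustration only.
    manifest = gpu_pod_manifest("analysis-worker", "example.invalid/analysis:latest")
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`, assuming the cluster has a matching GPU node pool.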

3. IAMOPS Solution: High-Level Design

Based on these requirements, IAMOPS built on Stockwise's existing GCP resources and designed a solution around GKE clusters, which make GPU workloads easier to run and manage.

Technologies configured with Google Cloud Architecture Framework:

Performance Optimization:

Containerization with Kubernetes allows for efficient resource utilization and scaling. DCGM Exporter and Prometheus facilitate detailed monitoring and resource allocation for GPU workloads, optimizing performance.
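
As a rough illustration of how DCGM Exporter metrics can inform resource allocation, the sketch below parses a small sample of Prometheus text-format output and averages GPU utilization per node. The sample values and host names are invented for illustration. `DCGM_FI_DEV_GPU_UTIL` is the exporter's GPU utilization gauge, and the `Hostname` label is assumed to be present on each series. The parser is deliberately simplistic and assumes label values contain no commas.

```python
from collections import defaultdict

# Hypothetical scrape output; real values come from the exporter's /metrics endpoint.
SAMPLE_METRICS = """\
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="gpu-node-a"} 92
DCGM_FI_DEV_GPU_UTIL{gpu="1",Hostname="gpu-node-a"} 88
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="gpu-node-b"} 15
"""


def mean_gpu_util_by_host(text, metric="DCGM_FI_DEV_GPU_UTIL"):
    """Return {hostname: mean utilization in percent} for one metric."""
    samples = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        # Skip comments, blanks, and any other metric families.
        if not line.startswith(metric + "{"):
            continue
        labels_part, _, rest = line.rpartition("} ")
        labels_raw = labels_part[len(metric) + 1:]
        labels = dict(pair.split("=", 1) for pair in labels_raw.split(","))
        host = labels.get("Hostname", "unknown").strip('"')
        # The first token after the labels is the value; a timestamp may follow.
        samples[host].append(float(rest.split()[0]))
    return {host: sum(vals) / len(vals) for host, vals in samples.items()}


def underutilized_hosts(means, threshold=30.0):
    """List hosts whose mean GPU utilization is below the threshold."""
    return sorted(host for host, util in means.items() if util < threshold)


if __name__ == "__main__":
    means = mean_gpu_util_by_host(SAMPLE_METRICS)
    print(means)
    print(underutilized_hosts(means))
```

In practice the same aggregation would more likely be written as a Prometheus query and shown on a dashboard; the Python version only shows the arithmetic behind such a view.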

Cost Optimization:

Utilizing Spot VMs reduces costs for GPU workloads by taking advantage of unused capacity at discounted rates.
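
One common way to direct a workload onto Spot capacity in GKE is a node selector on the `cloud.google.com/gke-spot` label, which GKE sets on Spot nodes. The sketch below adds that selector to an existing pod spec without modifying the original. The function name, the example spec, and the grace period default are assumptions for illustration; because Spot nodes can be reclaimed at short notice, the workload should tolerate interruption.

```python
import copy

SPOT_LABEL = "cloud.google.com/gke-spot"  # label GKE applies to Spot nodes


def schedule_on_spot(pod_spec, grace_seconds=15):
    """Return a copy of pod_spec that is pinned to Spot nodes.

    grace_seconds is an illustrative default (an assumption, not a GKE
    guarantee); keep it short so the Pod can exit before the node is reclaimed.
    """
    spec = copy.deepcopy(pod_spec)
    spec.setdefault("nodeSelector", {})[SPOT_LABEL] = "true"
    spec["terminationGracePeriodSeconds"] = grace_seconds
    return spec


if __name__ == "__main__":
    # Hypothetical GPU pod spec, for illustration only.
    base = {
        "nodeSelector": {"cloud.google.com/gke-accelerator": "nvidia-tesla-t4"},
        "containers": [{"name": "worker", "image": "example.invalid/analysis:latest"}],
    }
    print(schedule_on_spot(base))
```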

Reliability:

GKE offers high-availability features to ensure uptime, while a VPC with a regional subnet spanning multiple zones provides redundancy and fault tolerance.

Security, Privacy and Compliance:

IAM restricts access to GKE resources, network policies isolate applications, and encryption protects data confidentiality and integrity. Terraform helps manage infrastructure consistently and track changes for auditing and compliance purposes.
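
As an example of what isolating applications with network policies can look like, the sketch below builds two standard Kubernetes NetworkPolicy manifests as dictionaries: a default deny for ingress in a namespace, and an allow rule between two labeled applications. The namespace and the `app` label values are hypothetical, and the policies only take effect if network policy enforcement is enabled on the cluster.

```python
import json


def default_deny_ingress(namespace):
    """Deny all ingress traffic to every Pod in the namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        # An empty podSelector matches all Pods in the namespace.
        "spec": {"podSelector": {}, "policyTypes": ["Ingress"]},
    }


def allow_ingress_from_app(namespace, target_app, source_app):
    """Allow ingress to Pods labeled app=target_app from app=source_app."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {
            "name": f"allow-{source_app}-to-{target_app}",
            "namespace": namespace,
        },
        "spec": {
            "podSelector": {"matchLabels": {"app": target_app}},
            "policyTypes": ["Ingress"],
            "ingress": [
                {"from": [{"podSelector": {"matchLabels": {"app": source_app}}}]}
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical namespace and app labels, for illustration only.
    for policy in (
        default_deny_ingress("analysis"),
        allow_ingress_from_app("analysis", "gpu-worker", "api"),
    ):
        print(json.dumps(policy, indent=2))
```

Starting from a default deny and then adding narrow allow rules keeps each application reachable only by the peers it actually needs.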

System Design:

GKE and Kubernetes provide a scalable and flexible containerized platform. VPC and Firewall Rules secure the cluster and control traffic flow.

Operational Excellence:

Terraform automates infrastructure management, and Spot VMs lower costs without requiring manual intervention.

Proposed Infrastructure

4. Summary

By consolidating services that were previously spread across Heroku, GCP VMs, and Azure VMs, Stockwise moved to a single GKE infrastructure with GPU-based and CPU-based node pools. This gives the team one central place to manage and deploy its containerized application.

The automation and monitoring capabilities provided by the infrastructure increased operational efficiency and reduced downtime, resulting in improved customer satisfaction and business growth.
