Clusterone's Kubernetes Deployment
Case Study
Devopsbay, a custom software development company specializing in Kubernetes applications and machine learning software development, collaborated with Clusterone to deploy a Kubernetes cluster on varied hardware. Utilizing a custom Ansible installer, Devopsbay ensured seamless integration and operational efficiency, enabling effective machine learning execution within the Kubernetes framework.
Key aspects of the project
Technical Challenges
Deploying Kubernetes on heterogeneous hardware with different Nvidia graphics card models and restricted infrastructure access.
Cost Savings
Efficient use of existing hardware and infrastructure, reducing the need for additional investments.
Project Results
Successful deployment of a Kubernetes cluster, enabling automated distribution of ML tasks and maximizing hardware utilization.
Technical challenges
Project implementation stages
- 1
Initial Phase
Preparation of an elaborate inventory for the custom Ansible installer, detailing specific hardware configurations and ensuring comprehensive access to each server.
- 2
Harmonization
Standardizing operating system versions across all servers and installing essential software and tools.
- 3
Containerization and networking
Setting up containerization and networking support, culminating in the creation of a Kubernetes cluster.
- 4
Integration
Integrating all servers as nodes and establishing a distributed disk storage system using Ceph.
- 5
Configuration
Configuring Persistent Volumes for Ceph within Kubernetes, installing appropriate GPU drivers, and enabling GPUs for ML tasks executed as Pods in the cluster.
Problems and solutions
Hardware Diversity
The varied Nvidia graphics card models required unique driver versions. The custom Ansible installer was meticulously prepared to handle these variations.
Restricted Access
Limited to a single SSH port redirection through the firewall, the team ensured secure and efficient access to the infrastructure.
Storage Solutions
While MooseFS and LizardFS were considered, their lack of native support for Persistent Volume functionality in Kubernetes clusters led to the adoption of Ceph, which was optimized for performance.
Team involvement
- Project Results
- Machine Learning Engineers
- QA Engineers
- Front-end Developers
- Project Managers
Technological insights
Ceph
Successful deployments of Ceph tailored for both virtualization and Kubernetes cluster environments, optimizing configurations to meet specific performance criteria.
Infrastructure Provisioning
Proficiency with Ansible, complemented by tools like Foreman, Forklift, Terraform, and Packer, played a pivotal role in the project's success.
Conclusion
This case study highlights Devopsbay's technical prowess in navigating complex infrastructure challenges and their commitment to delivering solutions that meet and exceed client expectations. By leveraging advanced technologies and automation strategies, Devopsbay enabled Clusterone to fully utilize their available infrastructure for effective machine learning execution within a Kubernetes framework.