Use case
Securing Image Management for EKS Monitoring and Observability Stack
About the Customer
NexaHR is an end-to-end employee management system that helps companies digitize their HR processes, increase profitability, improve performance, and offer an enhanced employee experience.
The platform enables HR managers to achieve a seamless, contactless onboarding experience, provide additional professional development services, and personalize communication with employees, all while maintaining high standards of quality and efficiency.
Customer Challenge
The customer encountered significant challenges in managing their EKS cluster and related infrastructure, including:
- Security Risks: Reliance on public images from Docker Hub for monitoring and observability services (such as Grafana, Prometheus, Loki, and Mimir) posed security risks due to potential vulnerabilities (CVEs) in outdated dependencies. Without visibility into these vulnerabilities, the customer risked running insecure applications.
- Version Control Issues: There was a lack of control over the versions of the services being deployed, leading to inconsistencies and making it difficult to maintain the stack.
- Deployment Failures: The system used EC2 Spot Instances to run workloads that pulled their Docker images from Docker Hub. Each replacement node starts without a local image cache and must pull its images again, so the cluster regularly hit Docker Hub’s pull rate limits, and the failed image pulls resulted in deployment failures.
- Operational Complexity: The challenges with Docker Hub’s rate limits added to the operational complexity, making it more difficult to manage and deploy services efficiently.
- Increased Costs: The frequent deployment failures and operational complexities contributed to higher operational costs, impacting the overall efficiency of the system.
Solution
To address the customer’s issues, the following solution was designed and implemented:
Manual Image Management and Security:
- All essential monitoring stack images are manually pulled from Docker Hub.
- These images are then scanned for vulnerabilities (CVEs) so that known issues are identified before the images are used.
- After scanning, the images are pushed to a private Elastic Container Registry (ECR) within the customer’s AWS account.
- The ECR repositories are regularly scanned for CVEs, providing continuous monitoring and awareness of any vulnerabilities in the images.
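The pull, scan, and push steps above can be sketched as a small command planner. This is an illustrative sketch, not the tooling used in the project: the account ID, region, and image tags are placeholders, and Trivy is assumed as the pre-push scanner because the case study does not name one. The script only builds the command strings; it does not execute them.

```python
from typing import List


def ecr_uri(image: str, account_id: str, region: str) -> str:
    """Map a Docker Hub reference such as 'grafana/grafana:10.4.2' to its
    private ECR location. An explicit tag is required so that every mirrored
    version is pinned."""
    repo, sep, tag = image.rpartition(":")
    if not sep or not repo or not tag or "/" in tag:
        raise ValueError(f"image must carry an explicit tag: {image!r}")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"


def mirror_commands(image: str, account_id: str, region: str) -> List[str]:
    """Return the shell commands for mirroring one image, in order.
    Assumes the target ECR repository already exists."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    target = ecr_uri(image, account_id, region)
    return [
        f"docker pull {image}",
        # Assumption: Trivy as the scanner; a non-zero exit stops the mirror.
        f"trivy image --exit-code 1 --severity HIGH,CRITICAL {image}",
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}",
        f"docker tag {image} {target}",
        f"docker push {target}",
    ]


if __name__ == "__main__":
    # Hypothetical account, region, and tags, for illustration only.
    for img in ("grafana/grafana:10.4.2", "prom/prometheus:v2.51.0"):
        for cmd in mirror_commands(img, "123456789012", "eu-west-1"):
            print(cmd)
        print()
```

Rejecting untagged references in `ecr_uri` is a design choice that keeps the mirrored list explicit about versions, which matches the version-control goal described in this section.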
Controlled Image Deployment:
- Application pods in the EKS cluster are configured to pull images exclusively from the private ECR.
- This setup provides greater control over the versions and security of the images deployed in the production environment.
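One way to verify that pods pull exclusively from the private registry is to audit the image references in the cluster. The sketch below assumes its input is the JSON produced by `kubectl get pods --all-namespaces -o json`; the account ID, region, and sample pods are placeholders and do not come from the customer environment.

```python
from typing import Iterator, List, Tuple

PodImage = Tuple[str, str, str]  # (namespace, pod name, image reference)


def container_images(pod_list: dict) -> Iterator[PodImage]:
    """Yield every init-container and container image in a Kubernetes PodList."""
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        spec = pod.get("spec", {})
        for key in ("initContainers", "containers"):
            for container in spec.get(key, []):
                yield (
                    meta.get("namespace", "default"),
                    meta.get("name", ""),
                    container["image"],
                )


def non_ecr_images(pod_list: dict, account_id: str, region: str) -> List[PodImage]:
    """Return the images that are NOT served from the given private ECR registry."""
    prefix = f"{account_id}.dkr.ecr.{region}.amazonaws.com/"
    return [entry for entry in container_images(pod_list)
            if not entry[2].startswith(prefix)]


if __name__ == "__main__":
    # Hypothetical PodList with one compliant and one non-compliant pod.
    sample = {
        "items": [
            {
                "metadata": {"namespace": "monitoring", "name": "grafana-0"},
                "spec": {"containers": [{
                    "image": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/grafana/grafana:10.4.2"
                }]},
            },
            {
                "metadata": {"namespace": "monitoring", "name": "loki-0"},
                "spec": {"containers": [{"image": "grafana/loki:2.9.6"}]},
            },
        ]
    }
    for namespace, pod, image in non_ecr_images(sample, "123456789012", "eu-west-1"):
        print(f"{namespace}/{pod} pulls from outside ECR: {image}")
```

A check like this could run periodically or in CI; an admission policy would enforce the same rule at deploy time, but that is outside what the case study describes.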
Transition to Amazon Elastic Container Registry (ECR):
- The monitoring stack was moved to Amazon ECR, which is not subject to Docker Hub’s pull rate limits and offers enhanced reliability.
- This switch mitigates the issues related to Docker Hub’s rate limits, thereby reducing deployment failures.
- The transition also simplifies operations and helps in lowering overall operational costs due to fewer deployment issues.
High-Level Architecture Diagram
Results and Benefits
- Enhanced Security: Scanning images before deployment identified vulnerabilities and made the team aware of the associated security risks.
- Enhanced Network Security: Pulling images through an ECR endpoint within a private network improved security by avoiding the public network exposure involved in pulling from Docker Hub.
- Improved Compliance: Keeping a controlled and auditable list of images aligns with best practices and compliance requirements.
- Version Control: Ensured consistent and up-to-date versions of monitoring tools across the cluster, avoiding issues related to deprecated dependencies.
- Operational Efficiency: Streamlined the update process for monitoring services, simplifying maintenance and upgrades.
The key learnings from the project were:
- Proactive Security Management: Implementing proactive measures for image security can prevent potential breaches and ensure compliance.
- Importance of Visibility: Gaining visibility into the vulnerabilities in images used within the cluster is critical for maintaining a secure and stable environment.
- Continuous Improvement: Regularly updating and scanning images is essential to keep up with the evolving security landscape and ensure the deployment of safe and reliable applications.
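To make the visibility point concrete, the regular ECR scans can feed a simple severity gate. The sketch below assumes a response shaped like the output of `aws ecr describe-image-scan-findings`, which reports per-severity counts under `imageScanFindings.findingSeverityCounts`. The threshold and the sample counts are invented for illustration and are not results from the project.

```python
from typing import Dict

# ECR basic-scan severities, lowest to highest. ECR can also report
# "UNDEFINED"; this sketch leaves it unranked, so it never blocks.
SEVERITY_ORDER = ["INFORMATIONAL", "LOW", "MEDIUM", "HIGH", "CRITICAL"]


def blocking_findings(response: dict, threshold: str = "HIGH") -> Dict[str, int]:
    """Return the severity counts at or above `threshold`.
    An empty result means the image passes the gate."""
    counts = response.get("imageScanFindings", {}).get("findingSeverityCounts", {})
    min_rank = SEVERITY_ORDER.index(threshold)
    return {
        severity: count
        for severity, count in counts.items()
        if severity in SEVERITY_ORDER
        and SEVERITY_ORDER.index(severity) >= min_rank
        and count > 0
    }


if __name__ == "__main__":
    # Made-up counts, not real scan results.
    sample = {
        "imageScanFindings": {
            "findingSeverityCounts": {"LOW": 7, "MEDIUM": 3, "HIGH": 2}
        }
    }
    blocked = blocking_findings(sample)
    print("needs attention:" if blocked else "passes:", blocked)
```

Where the threshold sits is a policy decision; the case study does not state which severities the customer treated as blocking.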
About IAMOPS
IAMOPS is a full DevOps suite company that helps technology companies achieve production readiness.
Our mission is to ensure that our clients’ infrastructure and CI/CD pipelines are scalable, and to mitigate failure points, optimize performance, ensure uptime, and minimize costs.
Our DevOps suite includes DevOps Core, NOC 24/7, FinOps, QA Automation, and DevSecOps, all aimed at accelerating our clients’ growth.
As an AWS Advanced Tier Partner and Reseller, we focus on two key pillars: Professionalism by adhering to best practices and utilizing advanced technologies, and Customer Experience with responsiveness, availability, clear project management, and transparency to provide an exceptional experience for our clients.