A proven track record of translating AI research into production-grade systems across education, infrastructure, and enterprise domains.
AUG 2025 – PRESENT | CURRENT | FULL-TIME
MLOps Engineer
BetaCodes Pvt Ltd
Islamabad, Pakistan
Provisioned and managed NVIDIA DGX systems and GPU nodes (H100, H200) for distributed AI model training and high-throughput inference workloads.
Managed Kubernetes clusters using kubeadm, Terraform, and NVIDIA Base Command Manager (BCM) for high-availability production environments.
Built AI model serving infrastructure using vLLM and Triton Inference Server; applied quantization (INT4/INT8), tensor parallelism, and continuous batching to maximize GPU utilization.
Implemented model scaling strategies, including horizontal pod autoscaling and GPU-aware Kubernetes scheduling, to handle variable production loads.
Designed end-to-end ML pipelines with CI/CD, Docker, and cloud-native workflows on AWS, GCP, and Azure, covering model development through production.
Deployed monitoring, alerting, and observability stacks for production AI models to support SLA compliance.
Developed and deployed Retrieval-Augmented Generation (RAG) systems and agentic AI frameworks, enhancing the iQera Schools e-learning platform with generative AI capabilities.
Built scalable backend services with FastAPI and Django, integrating vector databases for semantic search across educational resources.
Managed cloud infrastructure on AWS including model hosting, API gateways, and vector database deployments.
Led project management for AI feature delivery across cross-functional teams spanning the US and Pakistan.