Implementing and Training LLM Inferencing Systems with Cisco and NVIDIA Data Center Technologies (DCLLM)

This comprehensive training equips participants with the knowledge and skills required to design, deploy, and optimize Large Language Models (LLMs) using NVIDIA GPUs and Cisco infrastructure. Through in-depth modules, hands-on labs, and real-world case studies, participants will learn how to manage data preparation, build scalable pipelines, optimize performance, ensure security, and migrate from cloud to on-premises deployments. The course provides a holistic approach to mastering the technical complexities of LLM systems while leveraging cutting-edge NVIDIA and Cisco technologies for scalability, efficiency, and security.

Retail Price: $3,995.00


Course Days: 5




Course Objectives:

By the end of this course, participants will:

  1. Master the Foundations of LLMs: Gain an in-depth understanding of LLM architecture, scaling principles, and design trade-offs.
  2. Prepare and Manage Large Datasets: Learn techniques for sourcing, preprocessing, and managing large-scale, high-quality datasets for LLM training.
  3. Deploy LLMs for Production: Use NVIDIA TensorRT and Cisco Nexus Dashboard to build efficient, low-latency inferencing pipelines.
  4. Optimize LLM Performance: Apply advanced optimization techniques like quantization, pruning, and dynamic batching to improve throughput and reduce latency.
  5. Design Scalable Pipelines: Build fault-tolerant, high-performance pipelines for real-time and batch inferencing.
  6. Monitor and Maintain Systems: Use NVIDIA and Cisco tools to monitor GPU and network performance, ensuring reliability and uptime.
  7. Ensure Security and Privacy: Implement robust security measures using Cisco Nexus Dashboard, Cisco XDR, and NVIDIA encryption tools.
  8. Build On-Premises Data Centers: Design and implement LLM inferencing systems using NVIDIA GPUs and Cisco UCS for maximum scalability and efficiency.
  9. Migrate Cloud Models to On-Premises: Transition cloud-trained LLMs to on-premises infrastructure while optimizing performance and costs.

 

Intended Audience:

This course is tailored for professionals involved in designing and managing AI and data infrastructure, including:

  • Systems Architects: To understand the integration of LLM systems into broader IT environments.
  • Network Architects: To optimize network configurations for high-speed LLM training and inferencing.
  • Storage Architects: To manage the storage and retrieval of large-scale datasets used in LLM systems.
  • AI Infrastructure Architects: To build robust and scalable AI platforms optimized for LLM workloads.
  • Data Scientists: To prepare high-quality datasets and fine-tune LLMs for specific use cases.
  • Machine Learning Engineers: To deploy and optimize LLMs for real-world applications with low latency and high throughput.

Course Content:

Module 1: Large Language Model (LLM) Foundations

  • Objectives:
    • Understand the architecture and mathematical principles of LLMs.
    • Learn design trade-offs for scalability and performance.
    • Explore emerging innovations in LLM development.
  • Topics:
    • Transformer architecture, self-attention mechanism, and positional encoding.
    • Types of LLMs: Encoder-only, decoder-only, and encoder-decoder.
    • Training objectives: Masked language modeling (MLM), causal language modeling (CLM), and sequence-to-sequence modeling.
    • Scaling laws and challenges: Parameter size, dataset size, and compute.
    • Emerging architectures: Reformer, Longformer, and multi-modal LLMs.
  • Labs:
    • Explore Transformer model architectures.
    • Compute attention scores manually for a small sequence (a worked NumPy sketch follows this module).
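
To make the attention-score lab concrete, the following minimal NumPy sketch computes scaled dot-product attention for a toy four-token sequence. The matrix sizes and random values are illustrative assumptions, not course materials.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # raw similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

# Toy example: 4-token sequence, 8-dimensional embeddings (illustrative values).
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))

output, attn_weights = scaled_dot_product_attention(Q, K, V)
print(attn_weights.round(3))  # each row sums to 1.0
```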

Module 2: Data Collection and Preparation for LLM Training

  • Objectives:
    • Understand data requirements for LLMs and their impact on performance.
    • Learn techniques for sourcing, cleaning, and managing large-scale datasets.
    • Explore NVIDIA and Cisco tools for efficient data handling.
  • Topics:
    • Data sourcing: Open-source, proprietary, and domain-specific datasets.
    • Preprocessing: Cleaning, deduplication, tokenization, and filtering.
    • Data management: Sharding, scalable storage, and high-speed data transfer.
    • Ethical considerations: Bias detection, privacy compliance, and fairness.
  • Labs:
    • Preprocessing lab: Clean, deduplicate, and tokenize a dataset using NVIDIA RAPIDS.
    • Tokenization exercise: Implement and analyze subword tokenization methods (see the sketch after this module).
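
As a rough starting point for the tokenization exercise, the sketch below trains a tiny byte-pair-encoding (BPE) vocabulary with the Hugging Face tokenizers library and encodes a sample sentence. The library choice, corpus, and vocabulary size are assumptions; the course's lab environment may use different tooling.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Tiny illustrative corpus; a real lab would stream a cleaned, deduplicated dataset.
corpus = [
    "large language models learn statistical patterns in text",
    "subword tokenization splits rare words into frequent pieces",
    "tokenization quality directly affects model performance",
]

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(vocab_size=500, special_tokens=["[UNK]", "[PAD]"])
tokenizer.train_from_iterator(corpus, trainer)

encoding = tokenizer.encode("tokenization of unseen words")
print(encoding.tokens)  # rare words decompose into learned subword units
```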

Module 3: Deployment of LLMs for Inferencing

  • Objectives:
    • Deploy LLMs for production inferencing with high performance and scalability.
    • Use NVIDIA TensorRT and Cisco Nexus Dashboard for optimized deployment.
  • Topics:
    • Deployment architectures: On-premises, cloud, and hybrid.
    • Optimizing inferencing with NVIDIA TensorRT: Precision calibration, layer fusion, and batching.
    • Traffic management and load balancing with Cisco Nexus Dashboard.
    • Exposing LLM APIs: RESTful and gRPC endpoints with security mechanisms.
  • Labs:
    • Deploy an LLM as a REST API using NVIDIA TensorRT (a minimal API sketch follows this module).
    • Configure traffic policies in Cisco Nexus Dashboard for inferencing workloads.
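
The REST-endpoint lab can be prototyped with a thin HTTP layer in front of the inference backend. The sketch below uses FastAPI with a placeholder run_inference function standing in for a TensorRT engine or Triton client call; the endpoint path, request schema, and backend stub are assumptions rather than the course's reference solution.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="LLM inference API (sketch)")

class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 64

class GenerateResponse(BaseModel):
    completion: str

def run_inference(prompt: str, max_new_tokens: int) -> str:
    # Placeholder: in the lab this would call a TensorRT-optimized engine
    # (or a Triton Inference Server client) rather than echoing the prompt.
    return prompt + " ..."

@app.post("/v1/generate", response_model=GenerateResponse)
def generate(req: GenerateRequest) -> GenerateResponse:
    completion = run_inference(req.prompt, req.max_new_tokens)
    return GenerateResponse(completion=completion)

# Run locally with, e.g.: uvicorn app:app --host 0.0.0.0 --port 8080
# (assumes this file is named app.py)
```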

Module 4: Optimizing LLM Models for Inferencing

  • Objectives:
    • Optimize LLM inferencing pipelines for low latency and high throughput.
    • Learn techniques like quantization, pruning, and model compression.
  • Topics:
    • Quantization: FP16, INT8, and mixed precision.
    • Pruning and knowledge distillation for lightweight models.
    • TensorRT optimization: Dynamic batching and asynchronous execution.
    • Benchmarking tools: NVIDIA Triton Inference Server, TensorRT Profiler.
  • Labs:
    • Apply quantization and pruning to optimize a pre-trained LLM (a PyTorch sketch follows this module).
    • Benchmark latency, memory usage, and accuracy of optimized models.
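
A minimal PyTorch sketch of the optimization lab, assuming the starting point is a PyTorch model: it applies post-training dynamic INT8 quantization to linear layers and L1 unstructured pruning to one layer. TensorRT-specific calibration and engine building are not shown, and the stand-in model is illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Stand-in model; the lab would load a pre-trained LLM instead.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))

# Post-training dynamic quantization: weights stored as INT8, activations
# quantized on the fly. Applies only to the listed module types.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# L1 unstructured pruning: zero out the 30% smallest-magnitude weights
# of the first linear layer, then make the mask permanent.
prune.l1_unstructured(model[0], name="weight", amount=0.3)
prune.remove(model[0], "weight")

sparsity = (model[0].weight == 0).float().mean().item()
print(f"Layer-0 sparsity after pruning: {sparsity:.2%}")
```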

Module 5: Scalable Pipeline Design for LLM Inferencing

  • Objectives:
    • Build robust, scalable, and fault-tolerant pipelines for inferencing.
    • Use batching, caching, and dynamic scaling for efficient pipelines.
  • Topics:
    • Pipeline components: Batching, caching, and queuing.
    • Load balancing with Cisco Nexus Dashboard for traffic optimization.
    • Fault tolerance: Automatic failover and disaster recovery plans.
    • Monitoring pipeline performance with NVIDIA DCGM and Cisco Nexus Dashboard.
  • Labs:
    • Design a scalable pipeline with batching and caching strategies (a dynamic-batching sketch follows this module).
    • Configure routing and scaling policies for GPU nodes using Nexus Dashboard.
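
As a companion to the pipeline-design lab, here is a small, framework-agnostic sketch of dynamic batching: requests are drained from a queue until either a maximum batch size or a time budget is reached. The queue contents, batch size, and wait budget are illustrative assumptions.

```python
import queue
import time

def collect_batch(request_queue, max_batch_size=8, max_wait_ms=10):
    """Drain requests until the batch is full or the wait budget expires."""
    batch = []
    deadline = time.monotonic() + max_wait_ms / 1000.0
    while len(batch) < max_batch_size:
        timeout = deadline - time.monotonic()
        if timeout <= 0:
            break
        try:
            batch.append(request_queue.get(timeout=timeout))
        except queue.Empty:
            break
    return batch

# Usage sketch: a worker loop would batch prompts before calling the model.
requests = queue.Queue()
for prompt in ["hello", "what is MLM?", "summarize this"]:
    requests.put(prompt)

batch = collect_batch(requests)
print(f"Running inference on a batch of {len(batch)} requests: {batch}")
```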

Module 6: Monitoring, Logging, and Maintenance for LLM Systems

  • Objectives:
    • Monitor and maintain LLM deployments using NVIDIA and Cisco tools.
  • Topics:
    • Key metrics: Latency, throughput, GPU utilization, and memory usage.
    • Monitoring tools: NVIDIA DCGM and Cisco Nexus Dashboard Insights.
    • Maintenance workflows for hardware and software reliability.
  • Labs:
    • Configure dashboards for real-time monitoring of GPU and network performance (a GPU-polling sketch follows this module).
    • Simulate hardware failures and evaluate maintenance workflows.
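
The monitoring lab centers on NVIDIA DCGM and Cisco Nexus Dashboard Insights; as a lightweight local approximation of the GPU-side metrics, the NVML Python bindings (the nvidia-ml-py / pynvml package) can poll utilization and memory, as sketched below. This is a stand-in illustration, not the DCGM workflow itself.

```python
import pynvml  # provided by the nvidia-ml-py package

pynvml.nvmlInit()
try:
    device_count = pynvml.nvmlDeviceGetCount()
    for i in range(device_count):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # % GPU / memory busy
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # bytes used / total
        name = pynvml.nvmlDeviceGetName(handle)
        print(f"GPU {i} ({name}): util={util.gpu}% "
              f"mem={mem.used / 2**30:.1f}/{mem.total / 2**30:.1f} GiB")
finally:
    pynvml.nvmlShutdown()
```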

Module 7: Security and Privacy Considerations in LLM Training and Inferencing

  • Objectives:
    • Secure LLM pipelines using Cisco Nexus Dashboard, Cisco XDR, and NVIDIA tools.
  • Topics:
    • NVIDIA runtime encryption and secure boot.
    • Cisco Robust Intelligence for adversarial defense and vulnerability detection.
    • Cisco XDR for unified threat detection and automated response.
    • Traffic segmentation and endpoint authentication.
  • Labs:
    • Analyze and secure an LLM using Cisco Robust Intelligence.
    • Configure Cisco XDR to monitor and respond to threats across pipelines.

Module 8: Migrating from Cloud-Based Training to On-Premises Inferencing

  • Objectives:
    • Transition LLM models from cloud training to on-premises Cisco infrastructure.
  • Topics:
    • Migration strategies for exporting and deploying models.
    • Data transfer optimization using Cisco Nexus Dashboard.
    • Integrating models with on-premises inferencing pipelines.
  • Labs:
    • Export a cloud-trained model and deploy it on Cisco UCS for inferencing (an ONNX-export sketch follows this module).
    • Optimize data transfer pipelines for low-latency inferencing.
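
A minimal sketch of the export step in the migration lab, assuming the cloud-trained model is available as a PyTorch module: it serializes the model to ONNX, an interchange format that on-premises TensorRT tooling can then consume. The tiny stand-in model and file name are illustrative.

```python
import torch
import torch.nn as nn

# Stand-in for a cloud-trained model checkpoint loaded for export.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 256))
model.eval()

# The example input defines traced shapes; dynamic axes keep batch size flexible.
dummy_input = torch.randn(1, 512)
torch.onnx.export(
    model,
    dummy_input,
    "llm_component.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)
print("Exported ONNX graph; next, build a TensorRT engine from it on premises.")
```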

Module 9: On-Premises Data Center Design for LLM Inferencing Systems

  • Objectives:
    • Design an on-premises data center with Cisco and NVIDIA technologies.
  • Topics:
    • Cisco UCS and NVIDIA GPUs for high-performance compute.
    • Network design and automation with Cisco Nexus Dashboard.
    • Storage solutions for large-scale data management.
  • Lab:
    • Design a complete data center architecture for LLM inferencing.

Module 10: On-Premises Data Center Implementation for LLM Inferencing Systems

  • Objectives:
    • Implement and configure an LLM inferencing data center using NVIDIA and Cisco technologies.
  • Topics:
    • Physical setup: NVIDIA GPUs on Cisco UCS and Nexus networking configuration.
    • Performance testing and validation of inferencing pipelines (a client-side benchmarking sketch follows below).
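
For the performance-testing topic, a simple client-side probe can measure latency and throughput against any HTTP inference endpoint. The sketch below times repeated POSTs to a hypothetical /v1/generate endpoint; the URL, payload, and request count are assumptions.

```python
import json
import statistics
import time
import urllib.request

URL = "http://localhost:8080/v1/generate"  # hypothetical endpoint from the earlier sketch
payload = json.dumps({"prompt": "Hello", "max_new_tokens": 32}).encode()

latencies = []
for _ in range(50):
    req = urllib.request.Request(
        URL, data=payload, headers={"Content-Type": "application/json"}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        resp.read()
    latencies.append(time.perf_counter() - start)

print(f"p50 latency: {statistics.median(latencies) * 1000:.1f} ms")
print(f"p95 latency: {sorted(latencies)[int(0.95 * len(latencies))] * 1000:.1f} ms")
print(f"throughput:  {len(latencies) / sum(latencies):.1f} requests/s (serial client)")
```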

