As a leading source for cutting-edge information on cryptocurrency and blockchain technology, Extreme Investor Network is thrilled to share the latest breakthroughs in AI model training from IBM Research.
IBM Research recently unveiled significant advancements in the PyTorch framework, designed to enhance the efficiency of AI model training. These groundbreaking improvements were announced at the PyTorch Conference and include a new high-throughput data loader and enhancements to large language model (LLM) training throughput.
Enhancements to PyTorch’s Data Loader:
The new high-throughput data loader in PyTorch lets developers distribute LLM training workloads seamlessly across multiple machines. Built out of necessity by IBM researcher Davis Wertheimer and his colleagues, the tool enables efficient checkpoint saving and eliminates duplicated work. Working around the limitations of existing data loaders, the IBM Research team created a PyTorch-native data loader that supports dynamic, adaptable operation, ensuring that previously seen data is not revisited even if resource allocation changes mid-job. In stress tests, the data loader streamed 2 trillion tokens over a month of continuous operation without a single failure, and it demonstrated the ability to load over 90,000 tokens per second per worker on 64 GPUs.
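To make the idea concrete, here is a minimal sketch of what "checkpointable, no-revisit" streaming means in practice: each worker walks its own shard of the data stream and can save and restore its position, so a restarted job picks up exactly where it left off. This is an illustration of the general pattern only, not IBM's actual PyTorch implementation; the class name and round-robin sharding scheme are assumptions made for the example.

```python
import json


class CheckpointableLoader:
    """Toy streaming loader: each worker yields only its shard of the
    sample stream and can checkpoint/restore its position, so previously
    seen data is never revisited after a restart."""

    def __init__(self, num_workers, worker_id):
        self.num_workers = num_workers
        self.worker_id = worker_id
        self.position = 0  # global index of the next sample to consider

    def __iter__(self):
        while True:
            idx = self.position
            self.position += 1  # advance before yielding: resume is exact
            # Round-robin sharding: this worker only yields its own slice.
            if idx % self.num_workers == self.worker_id:
                yield idx

    def state_dict(self):
        # Everything needed to resume exactly where we left off.
        return {"position": self.position}

    def load_state_dict(self, state):
        self.position = state["position"]


# Stream a few samples, checkpoint, then resume in a fresh loader.
loader = CheckpointableLoader(num_workers=4, worker_id=1)
it = iter(loader)
seen = [next(it) for _ in range(3)]       # [1, 5, 9]
ckpt = json.dumps(loader.state_dict())    # saved alongside model weights

resumed = CheckpointableLoader(num_workers=4, worker_id=1)
resumed.load_state_dict(json.loads(ckpt))
rit = iter(resumed)
more = [next(rit) for _ in range(2)]      # [13, 17] — no overlap with seen
```

The key design point mirrored here is that the loader's state is tiny (an index, not the data itself), which is what makes frequent checkpointing cheap during long streaming runs.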
Maximizing Training Throughput:
IBM Research is also focused on keeping GPUs fully utilized to prevent bottlenecks in AI model training. By employing fully sharded data parallel (FSDP) techniques together with torch.compile, the team has made model training and tuning both faster and more efficient. Using FSDP in combination with torch.compile, IBM Research achieved a training rate of 4,550 tokens per second per GPU on A100 GPUs. Further optimizations, such as adopting the FP8 datatype supported by Nvidia H100 GPUs, have shown up to 50% gains in throughput, translating into reduced infrastructure costs.
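The core idea behind FSDP can be sketched in a few lines of plain Python: each of N workers stores only 1/N of the model's parameters and reassembles ("all-gathers") the full set just in time for computation, so no single GPU has to hold the whole model. This is a conceptual illustration under simplified assumptions, not the real torch.distributed.fsdp machinery, which additionally overlaps communication with compute and re-shards after each layer.

```python
def shard(params, num_workers):
    """Split a flat parameter list into one shard per worker."""
    per = (len(params) + num_workers - 1) // num_workers  # ceil division
    return [params[i * per:(i + 1) * per] for i in range(num_workers)]


def all_gather(shards):
    """Reassemble the full parameter list from every worker's shard,
    as FSDP does just before a layer's forward or backward pass."""
    return [p for piece in shards for p in piece]


params = list(range(8))                  # stand-in for model weights
shards = shard(params, num_workers=4)    # each worker stores only 2 of 8
full = all_gather(shards)                # gathered on demand, then freed
```

Because each worker's resident memory shrinks roughly by a factor of N, larger models or batch sizes fit on the same hardware, which is where the throughput and cost benefits come from.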
Future Prospects:
Looking ahead, IBM Research is exploring new frontiers in AI model training, including the use of FP8 for model training and tuning on IBM's Artificial Intelligence Unit (AIU). The team is also focusing on Triton, OpenAI's open-source language and compiler for GPU programming, which aims to optimize training by compiling Python-like code directly into hardware-specific instructions, reducing reliance on vendor-specific programming languages. These advancements aim to move cloud-based model training from experimental stages to broader community adoption, potentially transforming the landscape of AI model training.
Stay ahead of the curve with Extreme Investor Network for the latest updates on cryptocurrency, blockchain technology, and groundbreaking advancements in AI model training. Join us in exploring the exciting future of AI innovation in the world of blockchain and cryptocurrency.