Sharada Yeluri’s Post

Technologist and Sr. Director of Engineering @ Juniper Networks

Networking for AI is undoubtedly giving rise to more consortiums! Last month, industry stalwarts announced the formation of the UALink consortium. At a high level, the consortium's goal is to develop an open-standard alternative to Nvidia's NVLink for intra-server and inter-server high-speed connectivity between GPUs/accelerators in scale-up AI/HPC systems. The plan is to use AMD's interconnect (Infinity Fabric) as the baseline for this standard.

If adoption spreads across non-Nvidia GPUs and custom hardware accelerators, it creates an open ecosystem that encourages vendor diversity for the fabric switches connecting GPUs inside a server node in scale-up systems. This could theoretically allow mixing and matching different accelerators. Open-standard interfaces could also help the many startups building AI accelerators focus their energy on optimizing the compute hardware and use UALink-based IPs/chiplets to scale up, driving faster innovation.

The UALink press release says it can connect up to 1,024 accelerators within an AI computing pod, supporting direct memory access across devices (a minimal sketch of what that memory model looks like today follows at the end of this post). This scale is more than what NVLink can do today. However, two things could slow down adoption:

❌ Existing proprietary interconnects: Many companies building scale-up systems with custom accelerators have their own interconnects, already in the second or third generation of their chips. Switching to a standard interface might pose integration and performance challenges.

❌ Technical limitations: Infinity Fabric, based on XGMI, might have much lower bandwidth than Nvidia's NVLink and other proprietary interfaces currently in use.

Despite the hurdles, a collaboration that pools substantial experience from leading technology firms is always good. They may be able to match NVLink's performance in later versions, if not in the first release of the spec. Or they may need to pivot away from Infinity Fabric? 🤔

Overall, the industry is moving in the right direction, with UEC for scale-out and now UALink for scale-up, taking on Nvidia's dominance in AI/HPC clusters. Any thoughts? #LLMs #GenAI
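For readers wondering what "direct memory access across devices" looks like in practice, here is a minimal sketch using CUDA's peer-to-peer API on an assumed two-GPU node. It illustrates the load/store memory model that NVLink enables within a server today, which UALink aims to standardize across vendors and scale to a 1,024-accelerator pod. The two-device setup and buffer size are illustrative assumptions, not anything from the UALink spec.

```c
// Minimal sketch: CUDA peer-to-peer access between two GPUs.
// Assumes a node with at least 2 GPUs whose interconnect (NVLink or
// PCIe) supports peer access. Error handling omitted for brevity.
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    int can_access = 0;
    // Can GPU 0 directly load/store GPU 1's memory?
    cudaDeviceCanAccessPeer(&can_access, 0, 1);
    if (!can_access) {
        printf("GPUs 0 and 1 cannot access each other's memory directly.\n");
        return 1;
    }

    // Allocate a buffer on GPU 1.
    float *buf1 = NULL;
    cudaSetDevice(1);
    cudaMalloc(&buf1, 1 << 20);

    // Map GPU 1's memory into GPU 0's address space.
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);

    // From here, a kernel launched on GPU 0 could dereference buf1
    // directly; the interconnect services the loads and stores.
    // A cross-device copy works over the same peer mapping:
    float *buf0 = NULL;
    cudaMalloc(&buf0, 1 << 20);
    cudaMemcpyPeer(buf1, 1, buf0, 0, 1 << 20);

    printf("Peer copy from GPU 0 to GPU 1 completed.\n");
    cudaFree(buf0);
    cudaSetDevice(1);
    cudaFree(buf1);
    return 0;
}
```

Scale-up fabrics are about making those remote loads and stores cheap enough to treat a pod's memory as one pool, which is why the bandwidth and latency of the underlying link matter so much more here than in scale-out networks.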

Ammar Khan

Product Management Leader | Cisco Pioneer Award Recipient 2019 | Marketing | Networking | Security | Cloud | ASICs | FPGAs | Systems | Software

Sharada, is there a link to the UALink consortium page?

Godwin Josh

Co-Founder of Altrosyn and Director at CDTECH | Inventor | Manufacturer

The formation of the UALink consortium represents a pivotal shift towards fostering an open standard for high-speed interconnects in AI and HPC systems. Your post highlights that while UALink aims to surpass Nvidia's NVLink by supporting up to 1,024 accelerators with direct memory access, the real challenge lies in overcoming entrenched proprietary solutions and the technical constraints of Infinity Fabric. Historically, similar shifts, such as the adoption of PCIe over proprietary interfaces, faced significant hurdles before achieving widespread success. Considering the potential for UALink to drive innovation in AI hardware, how do you envision overcoming the technical limitations of Infinity Fabric compared to NVLink? Additionally, what specific strategies could startups employ to leverage UALink's open standard for creating scalable, high-performance AI solutions?

Ameya Joshi

Stanford | IIT Gandhinagar

Sharada Yeluri, could you please share your thoughts on how NVLink compares to Infinity Fabric today in terms of bandwidth and latency? Thanks

Jeff Cooper

I like cloud security and I cannot lie...

Sharada - The formation of the UALink consortium to create an open standard for high-speed connectivity in AI/HPC systems is a significant step forward - thanks for sharing! How does this situation compare to past instances where a technology started as closed or proprietary and was later opened? Is it similar to Windows initially dominating the market, only to be later challenged by the rise of Linux? Given Linux's benefits, how can we maintain openness without fragmenting into numerous distributions and branches, thereby ensuring true interoperability across the ecosystem? Or could we compare this to TCP/IP emerging as the open standard over competing protocols like IPX/SPX and AppleTalk? How can the industry avoid the pitfalls of proprietary interconnects and technical limitations to foster a truly open and innovative environment?

Charan Sundararaman

Bridging Smart I/O and High Performance Compute. CXL, UCIe, PCIe, SoC Architecture, AI/ML | Qualcomm, Nuvia, Ex-Intel, Ex-Marvell

Interesting development, and much needed for accelerator scale-up. Other consortia, such as CXL, had similar proprietary origins but gained substantial momentum and group-think in a relatively short period of time. While CXL could technically pivot to support this use case, industry standards that become very popular bear the burden of legacy and backwards compatibility; a fresh approach may be more lightweight and better suited for this purpose.

Pratik Das

5G Technical Marketing at Qualcomm | Opinions are my own

Thank you for consistently distilling so much information and real-world experience into your posts!

Gajanana hegde

Software Designer at Aruba, a Hewlett Packard Enterprise company

Interesting!
