The Right Silicon Architecture for Data Center Networks

Why should you care about an optimized silicon architecture for your network?

Suppose you need an electric sports car. An automotive company, in an effort to streamline OPEX, architects a single vehicle with both gas and electric powertrains that can carry over ten people, for both racing and transportation. It is one architecture, and it is certainly feasible. When asked for an electric sports car, the company strips out the gas powertrain and most of the seats. It is easy to see why the result would be inelegant, unoptimized, and no match for the responsiveness and appeal of a ground-up electric sports car. Optimized products and architectures have always won over time, across industries: cars, airlines, electronics. With a focus on the right customer and application requirements, they are simply the superior choice.

How does this analogy work for Data Center Networks and Silicon architectures?

Data Centers: Networks are Critical

Over the last decade, cloud data centers have become the engine of our digital world. And in the current environment, where everyone is working, studying and connecting remotely, these data centers have become even more critical. Cloud-native applications running in these data centers enable businesses and deliver the exceptional customer experiences that drive our digital economy. These applications use microservices-based architectures, increasingly alongside AI/ML, built on a highly distributed, scalable computing model. A single transaction or query can fan out into hundreds or even thousands of IO requests that span racks across the data center network. This drives higher bandwidth within the data center (also known as east-west IO), puts a premium on latency at every network hop, and means that congestion or delay at any hop degrades the user experience, which in turn requires rich telemetry and analytics to diagnose and resolve issues quickly.

Figure 1: Typical Data Center Network
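
To see why per-hop latency and congestion matter so much at this scale, consider a back-of-envelope sketch of the fan-out effect. This is an illustrative calculation only; the fan-out sizes and the 0.1% per-sub-request delay probability are assumptions, not measurements from any specific deployment.

```python
# Illustrative "tail at scale" arithmetic: when one query fans out into
# many parallel sub-requests, the whole query is only as fast as its
# slowest sub-request. The 0.1% delay probability is an assumption.

def p_query_delayed(fanout: int, p_slow_subrequest: float) -> float:
    """Probability that at least one of `fanout` parallel sub-requests
    hits a congested path and stalls the entire query."""
    return 1.0 - (1.0 - p_slow_subrequest) ** fanout

for fanout in (1, 100, 1000):
    print(f"fan-out {fanout:>4}: {p_query_delayed(fanout, 0.001):.1%} of queries delayed")
```

Even a rare slowdown at a single hop becomes a common event once queries fan out across hundreds of servers, which is why low latency and congestion visibility at every hop are emphasized throughout this article.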

As these data centers scale to hundreds of thousands of servers, customers must deploy anywhere from hundreds to roughly 10,000 switches in each of them. While these switches need to scale in bandwidth, they also need significant power and cost efficiency to meet carbon-footprint and TCO goals. And customers want to protect their hardware investment, so that as requirements evolve they can meet them without sacrificing power, latency or performance.

The Best Switch Architecture Focused on Data Centers

Clean-Sheet, Ground-Up vs. Legacy, Bloated Architectures

Innovium’s switch R&D team has deep data-center experience with the world’s best track record. This team has delivered multiple generations of switch products powering the world’s largest data centers, resulting in tens of billions of dollars in switch system revenues. At the outset, Innovium’s focus was to create a clean-sheet, ground-up, optimized architecture that would be ultra-efficient for data centers. The result is the patented, data-center-optimized TERALYNX switch family. The TERALYNX family uses a breakthrough, scalable architecture that delivers the best performance and power efficiency, the lowest latency, the largest on-chip buffers, unmatched telemetry and leading performance per dollar. It is a highly elegant and responsive solution to the needs of data centers for the next decade and beyond.

Optimized Architecture for Data Centers

Some alternate switch products have been architected to serve an expansive and disparate range of markets, such as service provider, WAN and data center, with a single product, resulting in a bloated, inefficient architecture. Target markets drive the critical feature requirements that make products successful, and these features shape the architectural design and affect key success factors such as latency, performance, scale, power and cost. For instance, hierarchical queues and hierarchical QoS are required for service provider and WAN solutions but burden data center applications, where they are not needed. Another example is the scale of features and table sizes, which is heavily influenced by the target market and applications. Supporting these market-specific features requires additional data structures, silicon logic and silicon area (raising cost), and increases latency and power consumption. To serve multiple markets, alternate solutions have had to compromise on attributes that are critical for the data center, such as performance, power efficiency and telemetry.

Figure 2: Target Markets Impact Key Success Factors

Power Efficient Architecture to deliver Carbon Negative Goals

Power efficiency is a critical requirement in modern data centers. Large data-center operators such as Microsoft, Google, Amazon and Facebook have set aggressive timelines for becoming carbon-neutral or carbon-negative, which requires bold, non-incremental decisions by their R&D and supply chain groups.

Network switches in the ToR, Leaf and Spine networks constitute an increasingly sizeable portion of the overall infrastructure power. In a data center with 100K+ servers, networks can have up to 10K switches and power consumed by these switches is growing at a rapid pace as connectivity speeds increase.

From day one, power efficiency has been a key focus of the TERALYNX architecture, going beyond the process power savings provided by Moore’s law. The team made careful architectural trade-offs to achieve the best power efficiency without compromising features, flexibility, performance or latency. As the figure shows, the ultra-efficient TERALYNX architecture gives TERALYNX switches 27–35% lower power consumption than alternate products, including some designed in a next-generation process technology.

Figure 3: TERALYNX Power-efficiency compared with Alternative Switch Silicon
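
As a rough illustration of what that efficiency gap means at data center scale, the sketch below applies the 27–35% range from Figure 3 to the 10K-switch deployment size mentioned earlier. The 450 W baseline per-switch power is an assumed round number for illustration, not a published figure for any product.

```python
# Back-of-envelope energy savings. SWITCH_COUNT and the 27-35% range
# come from the text above; BASELINE_WATTS is an illustrative assumption.

SWITCH_COUNT = 10_000          # large data center, per the text
BASELINE_WATTS = 450           # assumed average power of an alternate switch
HOURS_PER_YEAR = 24 * 365

for savings in (0.27, 0.35):   # 27-35% lower power, per Figure 3
    saved_mwh = SWITCH_COUNT * BASELINE_WATTS * savings * HOURS_PER_YEAR / 1e6
    print(f"{savings:.0%} lower power -> ~{saved_mwh:,.0f} MWh saved per year")
```

Under these assumptions, the savings land in the range of roughly 10,000–14,000 MWh per year per data center, before counting the cooling power avoided.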

Optimized Architecture to deliver Lowest Latency & Best Application Performance

As discussed earlier, low latency is a critical requirement for a data center network where customers run distributed, microservices-based cloud-native applications and AI/ML workloads. The TERALYNX packet processing architecture provides much lower latency than alternate switch silicon architectures; as shown in Figure 4, the latency of alternate switch silicon is 60–120% higher than that of TERALYNX switches. Because microservices communicate with one another, applications often traverse multiple network hops, and the latency of each hop adds up and significantly impacts application performance. With the industry’s lowest latency, TERALYNX delivers the best application performance in the data center.

Figure 4: TERALYNX Architecture Delivers the Lowest Latency
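
A quick worked example shows how per-hop latency compounds across a path. The 500 ns per-hop figure and the five-hop path (ToR, leaf, spine, leaf, ToR) are illustrative assumptions; only the 60–120% overhead range comes from Figure 4.

```python
# Illustrative end-to-end latency across a 5-hop data center path.
# BASE_NS is an assumed per-hop switch latency, not a measured figure.

HOPS = 5                         # ToR -> leaf -> spine -> leaf -> ToR
BASE_NS = 500                    # assumed per-hop latency, ns

for overhead in (0.60, 1.20):    # alternates run 60-120% higher (Figure 4)
    alt_total = HOPS * BASE_NS * (1 + overhead)
    print(f"+{overhead:.0%} per hop: {HOPS * BASE_NS} ns vs {alt_total:.0f} ns end-to-end")
```

The per-hop gap is multiplied by every hop a request and its response traverse, which is why switch latency dominates in microservice-heavy, east-west traffic patterns.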

Optimized Buffer Architecture for Data Centers

Buffers are needed in a switch to queue packets before they are forwarded. The architecture and size of the buffers are influenced by factors such as switch throughput, connectivity speeds and radix, the congestion-control mechanisms deployed, and placement in a specific network tier. Innovium has architected the on-chip buffers of the TERALYNX product family so that customers get optimal network quality, the lowest latency and the best performance, even when incasts or microbursts occur.
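
To make the incast scenario concrete, here is a hypothetical sizing exercise: many servers answer an aggregator at once and their responses converge on a single egress port. All numbers (sender count, response size, link speed) are illustrative assumptions.

```python
# Hypothetical incast: N simultaneous responses converge on one egress
# port, so the switch must buffer whatever the link cannot drain yet.

SENDERS = 100                   # servers responding at the same time
RESPONSE_BYTES = 64 * 1024      # assumed 64 KB response per server
PORT_GBPS = 100                 # egress link speed

burst_bytes = SENDERS * RESPONSE_BYTES
drain_us = burst_bytes * 8 / (PORT_GBPS * 1e3)   # Gb/s -> bits per microsecond
print(f"instantaneous demand: {burst_bytes / 1e6:.1f} MB, "
      f"drains in {drain_us:.0f} us at {PORT_GBPS} Gb/s")
```

A burst of this size either fits in on-chip buffers or forces drops, and as the next paragraph notes, a realistic analysis must consider many such bursts arriving on many ports at once.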

There is a misconception, perpetuated by some, that microbursts in a large data center running hundreds of applications occur on only a single port at a time. In large data centers where tens of thousands of microservices and servers communicate, especially in the upper network tiers, microbursts often occur on many ports simultaneously. Highlighting microburst absorption in such a simplistic scenario is therefore misleading. Moreover, even in that scenario, latency can build up to tens or hundreds of milliseconds, degrading application performance through TCP delays and retransmits.

The TERALYNX architecture has been designed to handle microbursts optimally while maintaining high throughput and low latency in real-life data center deployments. In these deployments, congestion-control mechanisms such as Data Center TCP (DCTCP), a standard part of Linux, are typically used to dynamically size the TCP congestion window; as sketched below, DCTCP cuts the window in proportion to the measured extent of congestion. Companies like Amazon and Google have implemented congestion-control mechanisms to enable smooth, large-scale data center operation. The optimized TERALYNX buffer architecture enables the best application performance, even for demanding applications such as Hadoop and MongoDB, with plenty of margin. TERALYNX silicon and systems deliver a range of optimized products that handle microbursts at the top of rack and in upper tiers without compromising latency, power efficiency or visibility.
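
The following is a minimal sketch of the DCTCP sender-side window adjustment referenced above, following the published algorithm (Alizadeh et al.; RFC 8257). It is a simplified model for intuition, not Innovium code or a complete implementation.

```python
# Simplified DCTCP sender logic: the switch ECN-marks packets when a
# queue exceeds a threshold, and the sender cuts its window in
# proportion to the fraction of marked packets, instead of halving it.

def dctcp_update(cwnd: float, alpha: float, marked_fraction: float,
                 g: float = 1.0 / 16) -> tuple[float, float]:
    """One per-window update. `alpha` is the running estimate of the
    fraction of marked packets; `g` is the standard smoothing gain."""
    alpha = (1 - g) * alpha + g * marked_fraction
    if marked_fraction > 0:
        cwnd *= 1 - alpha / 2          # gentle, proportional backoff
    return cwnd, alpha

# Mild congestion (10% of packets marked) trims the window gradually,
# keeping switch queues, and therefore latency, shallow.
cwnd, alpha = 100.0, 0.0
for _ in range(5):
    cwnd, alpha = dctcp_update(cwnd, alpha, marked_fraction=0.10)
print(f"cwnd after 5 congested windows: {cwnd:.1f} segments (alpha={alpha:.3f})")
```

Because the backoff is proportional rather than drastic, DCTCP keeps link utilization high while keeping queues short, which is the regime the TERALYNX buffer architecture is designed to exploit.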

Unmatched Telemetry at Terabit Scale

Innovium architected its TERALYNX switch family with a breakthrough technology called FLASHLIGHT, which provides the high-frequency, hardware-driven telemetry and analytics that network admins need in the toughest troubleshooting scenarios: when microbursts occur and packets start experiencing delays and drops, resulting in poor application performance. Traditionally, application admins point fingers at network admins, and it often takes several days to find and resolve the root cause.

Innovium FLASHLIGHT gives network admins deep, real-time analytics for troubleshooting and resolving these scenarios quickly. The analytics correlate congestion to applications and flows, so a network admin can determine which application or flow caused congestion during a microburst event, as well as which applications were impacted. TERALYNX provides actionable, relevant, real-time information when the microburst happens, rather than the flood of raw data that alternate switch systems send today, which overloads telemetry collectors. FLASHLIGHT does all of this in hardware, running at line rate. It can also deliver telemetry in-band and diagnose network issues before they escalate, providing end-to-end network health telemetry and analytics that simplify operations.

Figure 5: Rich Telemetry & Analytics Architected into TERALYNX
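
To make the workflow concrete, here is a hypothetical sketch of what a telemetry consumer could do with hardware-exported queue records of the kind described above. The record format, field names and threshold are invented for illustration; they are not the FLASHLIGHT API or wire format.

```python
# Hypothetical telemetry-consumer sketch: flag microbursts and the flow
# most likely responsible. All field names and formats are invented.

from dataclasses import dataclass

@dataclass
class QueueRecord:
    port: int
    queue_depth_bytes: int      # instantaneous egress queue depth
    top_flow: str               # heaviest flow on that queue (5-tuple id)

MICROBURST_THRESHOLD = 512 * 1024    # assumed alert watermark: 512 KB

def correlate_microbursts(records: list[QueueRecord]) -> None:
    """Print cause (dominant flow) and location for each congested port,
    so admins see the culprit in real time instead of days later."""
    for rec in records:
        if rec.queue_depth_bytes > MICROBURST_THRESHOLD:
            print(f"port {rec.port}: microburst ({rec.queue_depth_bytes} B queued), "
                  f"dominant flow {rec.top_flow}")

correlate_microbursts([
    QueueRecord(port=7, queue_depth_bytes=800_000, top_flow="10.0.1.5->10.0.9.2:443"),
    QueueRecord(port=8, queue_depth_bytes=12_000, top_flow="10.0.2.1->10.0.9.3:80"),
])
```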

Future Proof, Programmable Infrastructure

TERALYNX has been architected to future-proof the data center network infrastructure without compromises. TERALYNX switches are programmable, enabling users to support new network protocols and innovations through standards-based programming and field upgradeability. More importantly, programmability in TERALYNX is delivered without compromising performance, power or latency.

Summary

As highlighted above, the Innovium team continues to innovate at an incredible pace in architecting and designing the TERALYNX product family to deliver the critical capabilities data center customers require. The figure below compares the total number of patents at Innovium (as of Nov ‘20) with those of comparable companies at the time of their IPOs, showcasing Innovium’s tremendous ground-up architectural innovation.

Figure 6: Tremendous Architectural Innovations: Patents of comparable companies

The architectural innovations in TERALYNX deliver customers the following breakthrough advantages:

  • Best application performance from lowest latency and optimal buffers
  • Greener data centers that expedite carbon-negative targets, thanks to a highly power-efficient switch architecture
  • Best TCO (total cost of ownership) from power and cost optimized solutions
  • Investment protection for the network infrastructure without any compromises
  • Simplified operations from deep network insights and analytics

Switches based on the TERALYNX architecture are proven and deployed at scale by the world’s top cloud and data center customers. As of Q3 2020, they hold a 24% worldwide market share for data center switches with 50G SerDes.

If you are architecting modern, best-in-class data centers of the future with an open, disaggregated model, please contact us at [email protected] to learn more about how we can help.