Unmatched FLASHLIGHT™ Telemetry on SONiC

This blog discusses Innovium’s breakthrough FLASHLIGHT™ Telemetry, a comprehensive telemetry solution that addresses the emerging challenges operators face in keeping their modern, terabit-scale data center networks operating at peak performance. Data center operators are grappling with tremendous bandwidth growth while, at the same time, the acceleration of cloud-native applications is demanding lower latency and higher throughput in order to deliver high service quality and high availability and to meet SLAs. Detecting and troubleshooting anomalous network behavior requires deep, high-frequency network telemetry to capture otherwise undetectable events such as short-lived microbursts. FLASHLIGHT was designed from the ground up to address the following customer needs:

  • Correlate network telemetry data to application flows
  • Avoid overwhelming network collectors with a deluge of telemetry data
  • Deliver end-to-end network visibility and actionable insights for traffic flows
  • Provide telemetry data for traffic running at line rate without involving CPUs

In this blog, we will take a closer look at Innovium’s comprehensive FLASHLIGHT Telemetry solution, which offers unmatched insights into difficult-to-detect yet often performance-crippling events. FLASHLIGHT enables application-centric, flow-focused telemetry, including comprehensive monitoring, high-frequency sampling and line-rate, hardware-driven predictive analytics.

FLASHLIGHT Telemetry on SONiC

Innovium’s FLASHLIGHT telemetry is now supported on open source SONiC. FLASHLIGHT gives operators running SONiC the tools to tackle the ever-increasing challenge of getting actionable insights in real time to implement meaningful traffic engineering. Innovium has supported several SONiC releases on top of its clean-slate SDK and supports a common SAI API across its entire product line. The diagram below shows releases made as well as our upcoming SONiC release roadmap.

SONiC provides a great platform for integrating FLASHLIGHT’s powerful telemetry features. The Redis database approach has ensured fast integration of FLASHLIGHT’s fine-grained control and deep visibility features. Additionally, comprehensive test suites have enabled fast validation of FLASHLIGHT’s telemetry features and ensured no impact on existing functionality, low latency or high throughput. Together, these have enabled fast customer delivery of high-value FLASHLIGHT Telemetry features in SONiC.

Innovium recently presented the unique benefits of FLASHLIGHT at the OCP2020 tech week summit. Here we will highlight a few of FLASHLIGHT’s capabilities, now available in SONiC, that enable in-depth visibility into data center traffic.

Buffer Drop Capture (BDC)

FLASHLIGHT’s Buffer Drop Capture (BDC) telemetry solution provides full visibility of flows impacted by packet drops. Packet drops may occur due to persistent congestion in the network or microburst events due to incasts, leading to application performance degradation.

Data center operators need visibility into the flows impacted during drop events without overwhelming collectors. Fine-grained control in FLASHLIGHT enables capturing just the drops due to microbursts or congestion, with unmatched flow-centric analytics. Collectors can be the local CPU or any server in the network. BDC metadata enables a full view of all flows impacted by packet drops.

BDC provides fine-grained flow visibility, down to a packet-level view, including the time of event capture, full 5-tuple identification of the impacted application flows, the size of the packet experiencing the drop and the real-time queue occupancy when the drop event occurred.
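To make the metadata list above concrete, here is a minimal collector-side sketch in Python. It is not FLASHLIGHT’s export format: the class names, field names and the queue occupancy unit are all hypothetical, chosen only to mirror the fields described above (capture time, 5-tuple, packet size, queue occupancy). It shows how a collector might roll per-packet drop records up into a per-flow summary.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # IP protocol number, e.g. 6 = TCP, 17 = UDP


@dataclass(frozen=True)
class DropRecord:
    """Hypothetical collector-side view of one dropped packet."""
    timestamp_ns: int     # time of event capture
    flow: FiveTuple       # full 5-tuple identification
    packet_bytes: int     # size of the dropped packet
    queue_occupancy: int  # queue occupancy at drop time (unit is an assumption)


def summarize_drops(records):
    """Group drop records by flow: drop count, dropped bytes, peak queue occupancy."""
    summary = defaultdict(lambda: {"drops": 0, "bytes": 0, "peak_queue": 0})
    for rec in records:
        entry = summary[rec.flow]
        entry["drops"] += 1
        entry["bytes"] += rec.packet_bytes
        entry["peak_queue"] = max(entry["peak_queue"], rec.queue_occupancy)
    return dict(summary)
```

Keying the summary on the 5-tuple is what ties each drop back to an application flow; the timestamps remain available in the raw records for anyone who wants to line drops up against a microburst.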

High Delay Capture (HDC)

Packet delays may occur due to congestion or microburst events due to incasts. It is fairly obvious that packet drops can degrade application performance, but high delays can have the same crippling effect. FLASHLIGHT’s High Delay Capture (HDC) Telemetry helps operators gain full visibility into flows impacted by high delays in the network.

Data center operators can gain invaluable insights from visibility into the flows impacted during high delay events. FLASHLIGHT’s powerful hardware analytics engines prevent a deluge of data from overwhelming collectors. Fine-grained control in FLASHLIGHT enables capturing specific application flow packets delayed beyond a user-set threshold, giving unmatched insights into microbursts and an early indication of congestion. HDC metadata enables a full view of all flows impacted by packet delays.

HDC also enables packet-level analysis of flows impacted by high delays, with the nanosecond-level granularity needed to analyze 100G flows.
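A short Python sketch illustrates both points: why nanosecond timestamps matter at 100G, and what capturing packets "delayed beyond a user-set threshold" means in practice. In FLASHLIGHT that filtering is described as happening in hardware; the code below is only a software model of the idea, and the `DelaySample` type, its fields and the threshold value are hypothetical. The wire-time arithmetic uses standard Ethernet framing (8-byte preamble plus 12-byte inter-frame gap).

```python
from dataclasses import dataclass


def wire_time_ns(frame_bytes, link_gbps, overhead_bytes=20):
    """Serialization time of one Ethernet frame in nanoseconds.

    overhead_bytes covers the 8-byte preamble/SFD and the 12-byte
    inter-frame gap. A rate of 1 Gb/s equals 1 bit per nanosecond.
    """
    bits = (frame_bytes + overhead_bytes) * 8
    return bits / link_gbps


@dataclass(frozen=True)
class DelaySample:
    """Hypothetical per-packet delay measurement."""
    flow_id: str
    ingress_ns: int
    egress_ns: int

    @property
    def delay_ns(self):
        return self.egress_ns - self.ingress_ns


def delayed_packets(samples, threshold_ns):
    """Keep only the samples whose delay exceeds the user-set threshold."""
    return [s for s in samples if s.delay_ns > threshold_ns]
```

For example, `wire_time_ns(64, 100)` gives 6.72: a minimum-size frame occupies a 100G link for under 7 ns, so microsecond-resolution timestamps would lump more than a hundred back-to-back packets into a single tick.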

Innovium Path Telemetry (IPT)

Innovium Path Telemetry (IPT) provides comprehensive, end-to-end network health statistics to help pinpoint hot spots in the network. IPT enables flow-centric visibility along an application flow’s packet path in the network, showing where in the network high latencies are experienced.

FLASHLIGHT gives detailed visibility and statistics showing the physical switch, port and queues along the path where congestion may be occurring, giving an early indicator of congestion events before application degradation occurs, as shown in figure 7.
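As a rough illustration of how per-hop path data can pinpoint a hot spot, the Python sketch below models a flow’s path as a list of hop records and picks out the hop contributing the most latency. The `HopRecord` type, its fields and the sample values are hypothetical and do not represent the IPT record layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HopRecord:
    """Hypothetical per-hop statistics for one flow's path."""
    switch: str
    port: str
    queue: int
    latency_ns: int


def hottest_hop(path):
    """Return the hop with the highest latency along the path."""
    return max(path, key=lambda hop: hop.latency_ns)


def hops_over(path, threshold_ns):
    """Return the hops whose latency exceeds a threshold, in path order."""
    return [hop for hop in path if hop.latency_ns > threshold_ns]


# Illustrative values only.
example_path = [
    HopRecord("leaf-1", "Ethernet0", 3, 900),
    HopRecord("spine-2", "Ethernet16", 3, 42000),
    HopRecord("leaf-7", "Ethernet8", 3, 1100),
]
```

With this data, `hottest_hop(example_path)` returns the `spine-2` record, naming the switch, port and queue where the flow is spending most of its time.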

Summary

In summary, the SONiC community will benefit greatly from Innovium’s comprehensive FLASHLIGHT Telemetry solution, which provides unparalleled network visibility into difficult-to-capture microburst events that can significantly impact network performance. With Buffer Drop Capture, High Delay Capture and Innovium Path Telemetry, FLASHLIGHT unleashes unprecedented insights to help network operators address their toughest troubleshooting scenarios and simplify operations.