Surfing Interconnect Mark R. Greenstreet and Jihong Ren University of British Columbia, Rambus Surfing Interconnect p.1/17
Motivation
Wires are the problem:
  - Wires scale poorly with feature size:
      - Gate delays scale with feature size.
      - Wire delays are invariant under feature size scaling.
      - Long (i.e. cross-chip) wires have delays that increase quadratically with inverse feature size.
  - Long wires consume substantial power.
  - Long wires have serious signal integrity and timing concerns.
Outline
  - Surfing
      - Surfing pipelines
      - Wire buffering
      - A surfing buffer for long wire interconnect
  - The timing chain
  - Comparison with traditional synchronous and asynchronous techniques
Surfing pipelines
[Figure: a chain of pipeline stages, each with in and out ports, showing a data path and a timing path.]
  - Datapath elements are faster when their timing input is asserted than when it is not.
  - If the maximum delay of a datapath element in fast mode is less than the minimum timing chain delay, and the minimum delay of a datapath element in slow mode is greater than the maximum timing chain delay, then events in the datapath are attracted to coincide with the rising edge of the timing signal.
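The attraction condition above can be illustrated with a minimal discrete model. This is a sketch under simplifying assumptions, not the circuit from the slides: each stage is treated as entirely fast or entirely slow depending on whether the data event arrives after or before the timing edge, and the function name `surf_offsets` and the delay values (timing chain delay 1.0, fast 0.85, slow 1.15, in arbitrary units) are hypothetical.

```python
def surf_offsets(x0, stages, t_chain=1.0, d_fast=0.85, d_slow=1.15):
    """Return the (data arrival - timing arrival) offset at each stage.

    Assumed model: a datapath element runs in fast mode when the data
    event arrives at or after the timing edge (offset >= 0) and in slow
    mode when it arrives before it (offset < 0).
    """
    # Surfing condition: fast delay < timing chain delay < slow delay.
    assert d_fast < t_chain < d_slow
    offsets = [x0]
    x = x0
    for _ in range(stages):
        d = d_fast if x >= 0 else d_slow
        # Data advances by d while the timing edge advances by t_chain.
        x = x + d - t_chain
        offsets.append(x)
    return offsets


if __name__ == "__main__":
    for start in (0.6, -0.6):
        trace = surf_offsets(start, 12)
        print(start, [round(v, 2) for v in trace])
```

In this model a late data event catches up by `t_chain - d_fast` per stage and an early one is held back by `d_slow - t_chain` per stage, so the offset ends up confined to a small band around the timing edge regardless of where it starts.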
Unbuffered Interconnect
[Figure: a wire of length $l$ from source to destination.]
  - Wire resistance $= r_w l$
  - Wire capacitance $= c_w l$
  - Wire delay $= \frac{r_w c_w}{2} l^2$
  - Wire delay grows quadratically with length.
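The quadratic wire delay can be checked numerically by treating the wire as a ladder of many small RC sections and summing the Elmore delay, the delay model named in the comparison methodology later in the talk. The sketch below is illustrative only: the function name `elmore_delay` and the per-length values of `r_w` and `c_w` are assumptions, not figures from the slides.

```python
def elmore_delay(length, r_w, c_w, sections=1000):
    """Elmore delay to the far end of a uniform wire.

    The wire is approximated as `sections` identical RC sections, each
    with resistance r_w*length/sections and capacitance
    c_w*length/sections.
    """
    r_sec = r_w * length / sections
    c_sec = c_w * length / sections
    # Capacitor i charges through the i sections of resistance upstream.
    return sum((i * r_sec) * c_sec for i in range(1, sections + 1))


if __name__ == "__main__":
    r_w = 1.0e5    # ohm per metre (assumed value)
    c_w = 2.0e-10  # farad per metre (assumed value)
    for length in (5e-3, 10e-3, 20e-3):
        print(length, elmore_delay(length, r_w, c_w))
```

With many sections the sum approaches $\frac{r_w c_w}{2} l^2$, so doubling the length roughly quadruples the delay.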
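Buffered Interconnect
[Figure: a wire of length $l$ from source to destination, divided into $n$ segments (1, 2, 3, ..., n) by buffers.]
  - Wire resistance (per segment) $= r_w \frac{l}{n}$
  - Wire capacitance (per segment) $= c_w \frac{l}{n}$
  - Wire delay (per segment) $= \frac{r_w c_w}{2} \left(\frac{l}{n}\right)^2$
  - Buffer delay (per segment) $= \delta_{\text{buf}}$
  - Total delay $= \frac{r_w c_w}{2n} l^2 + n \delta_{\text{buf}}$
  - Total delay is minimized when wire delay and buffer delay are equal.
  - Optimal delay grows linearly with length: $\delta_{\text{total}} = l \sqrt{2 r_w c_w \delta_{\text{buf}}}$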
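The optimum follows from minimizing the total delay over $n$: setting the derivative to zero gives $n = l\sqrt{r_w c_w / (2\delta_{\text{buf}})}$, at which point the per-segment wire delay equals $\delta_{\text{buf}}$. A small sketch of that calculation follows; the function names and the numeric values for `r_w`, `c_w`, and `d_buf` are illustrative assumptions, and $n$ is treated as a real number even though a physical design needs an integer.

```python
import math


def total_delay(length, n, r_w, c_w, d_buf):
    """Total delay of a wire split into n buffered segments."""
    return r_w * c_w * length ** 2 / (2 * n) + n * d_buf


def optimal_segments(length, r_w, c_w, d_buf):
    """Real-valued segment count that minimizes total_delay."""
    return length * math.sqrt(r_w * c_w / (2 * d_buf))


def optimal_delay(length, r_w, c_w, d_buf):
    """Minimum total delay; linear in wire length."""
    return length * math.sqrt(2 * r_w * c_w * d_buf)


if __name__ == "__main__":
    r_w, c_w, d_buf = 1.0e5, 2.0e-10, 50e-12  # assumed values
    for length in (5e-3, 10e-3, 20e-3):
        n_opt = optimal_segments(length, r_w, c_w, d_buf)
        print(length, n_opt, optimal_delay(length, r_w, c_w, d_buf))
```

Rounding the segment count to a neighboring integer gives a delay slightly above the real-valued optimum.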
Pipelined Interconnect
[Figure: a chain of three enabled latches (D, Q, en) along the wire, clocked alternately by Φ1, Φ2, Φ1.]
  - Total delay for a long wire can be greater than a clock period.
  - Pipelining allows high throughput even with long total delay.
  - Latches add extra overhead because they have larger delays than inverters.
  - Handshaking alternatives are considered later.
Surfing Interconnect
[Figure: a data path of variable-delay inverters above a timing path; an edge-to-pulse converter (e2p) at each stage couples the timing path to the data path.]
  - Variable-delay inverters in the data path provide surfing.
  - A separate data wave surfs on each edge of the timing signal. This reduces the speed at which the timing channel must operate by a factor of two (compared with level-sensitive signaling).
  - The edge-to-pulse converter provides a pulse for each edge of the timing signal.
The Surfing Buffer
[Figure: surfing buffer circuit with in and out terminals.]
  - Added drive when the timing input is asserted reduces delay.
  - The circuit is fully static, with no extra short-circuit currents and no charge sharing.
  - Unlike other surfing circuits, this buffer does not achieve negative overhead. Thus, we also refer to it as a soft latch.
The Edge-To-Pulse Converter
[Figure: a rising edge detector and a falling edge detector, with outputs labeled risingedge and fallingedge.]
  - Separate edge detectors for rising and falling edges.
  - Each edge detector is self-resetting and outputs a pulse in response to the appropriate edge.
  - The pulses from the two edge detectors are combined with self-resetting NOR gates.
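The intended behavior, one pulse per edge of the timing signal, can be sketched at the logic level. This is a purely behavioral, sampled-time model and not the self-resetting transistor circuit on the slide: the function name `edge_to_pulse`, the fixed pulse `width`, and the plain OR used to merge the two detector outputs are all assumptions made for illustration.

```python
def edge_to_pulse(samples, width=2):
    """Emit a fixed-width pulse after every transition in `samples`.

    `samples` is a list of 0/1 values of the timing signal at successive
    time steps. Rising and falling edges are detected separately and the
    two pulse trains are then merged.
    """
    n = len(samples)
    rising = [0] * n
    falling = [0] * n
    for i in range(1, n):
        if samples[i - 1] == 0 and samples[i] == 1:
            target = rising    # rising edge detector fires
        elif samples[i - 1] == 1 and samples[i] == 0:
            target = falling   # falling edge detector fires
        else:
            continue
        # The pulse lasts `width` steps, then the detector resets.
        for j in range(i, min(i + width, n)):
            target[j] = 1
    return [r | f for r, f in zip(rising, falling)]


if __name__ == "__main__":
    timing = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
    print(edge_to_pulse(timing))  # [0, 0, 1, 1, 0, 0, 1, 1, 0, 0]
```

Because both edges produce a pulse, the pulse train runs at twice the frequency of the timing signal, which matches the factor-of-two saving on the timing channel described for surfing interconnect.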
Comparison: Framework
Compare surfing with existing interconnect techniques:
  - Synchronous: two-phase, time-borrowing, transparent latches
  - Micropipeline
  - GasP with twin-control
Criteria:
  - Velocity vs. throughput: velocity is distance traversed divided by latency. Velocity decreases with increasing throughput because of increasing overhead for more latches.
  - Energy-weighted velocity vs. throughput
Methodology:
  - Optimize each approach for the given metric using Elmore delay models (logical effort with wire delays).
  - Assume a wide data bus; thus control energy is dominated by datapath energy.
  - Model parameters based on the TSMC 0.18 µm bulk CMOS process.
Design Margins
  - Surfing: 30% difference between fast and slow delays for the datapath. Control path delay set to the midpoint.
  - Handshaking: 30% timing margin between data path and control path.
  - Synchronous: Assume parts graded by speed. Thus, we only consider typical process parameters here.
  - Typically, a synchronous design must achieve its target clock frequency over all temperature and voltage conditions. Thus, we report results for a derated design operating at 1.6 V and 100 °C.
  - Temperature and power supply voltage change slowly enough that an asynchronous design can benefit from the average case. We report results for surfing and the asynchronous designs operating at 1.8 V and 25 °C.
  - We include a non-derated synchronous design operating under the same conditions for the sake of comparison.
Velocity
[Figure: velocity (m/s) versus f (GHz) for Surfing, Two Phase, GasP twin control, Micropipeline, and Two Phase derated. Higher is better.]
Energy Weighted Velocity
[Figure: energy delay product (J·s/m²) versus f (GHz) for Surfing, Two Phase, GasP twin control, Micropipeline, and Two Phase derated. Lower is better.]
Robustness
  - Verified correct operation with five-corner HSPICE simulations.
  - Verified correct operation with 0.4 V peak-to-peak Vdd noise in HSPICE simulations.
  - The clock forwarding chain is the weakest link. Very long chains will drop clock edges due to drafting-induced jitter amplification.
The Edge To Pulse Converter, Timing
[Figure: animated timing diagram of the edge-to-pulse converter with circuit events labeled a through j. The two action sequences built up across the animation are a, b, c, d, e, f, g and a, b, h, i, j.]
The Event Attractor In Action
[Figure: two panels plotting the delay from each data edge to its corresponding timing edge (ns) against stage number (0 to 20), with curves for rising edge max, rising edge min, falling edge max, and falling edge min. First panel: Delay Variation Without Surfing, annotated "Fail after the 15th stage". Second panel: Delay Variation With Surfing.]
Conclusions and Future Work
What We've Shown
  - Surfing works for interconnect:
      - Better performance than pure asynchronous approaches.
      - Competitive with best synchronous.
      - Outperforms synchronous derated for Vdd and temperature variation.
  - A simple, fully static surfing buffer with no short-circuit currents.
  - Clear reduction in timing variation.
Future Work
  - Use with low voltage swing: negative overhead(?)
  - Use surfing techniques for clock forwarding, not just data.
  - Examine crosstalk and other signal integrity issues.