A Low-Overhead Asynchronous Interconnection Network for GALS Chip Multiprocessors

Title: A Low-Overhead Asynchronous Interconnection Network for GALS Chip Multiprocessors
Publication Type: Journal Article
Year of Publication: 2011
Authors: Horak MN, Nowick SM, Carlberg M, Vishkin U
Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Volume: 30
Issue: 4
Pagination: 494-507
Date Published: 2011/04
ISSN: 0278-0070
Keywords: GALS chip multiprocessor; clock distribution; frequency 1.36 GHz; frequency 800 MHz; globally-asynchronous locally-synchronous chip multiprocessor; interface multiple synchronous timing; low-overhead asynchronous interconnection network; mixed-timing network; network routing; parallel benchmark kernel; post-layout simulation; random traffic; shared-memory parallel architecture; size 90 nm; circuit layout; clock distribution networks; microprocessor chips; multiprocessor interconnection networks; network-on-chip; parallel architectures; shared memory systems
Abstract

A new asynchronous interconnection network is introduced for globally-asynchronous locally-synchronous (GALS) chip multiprocessors. The network eliminates the need for global clock distribution, and can interface multiple synchronous timing domains operating at unrelated clock rates. In particular, two new highly-concurrent asynchronous components are introduced which provide simple routing and arbitration/merge functions. Post-layout simulations in identical commercial 90 nm technology indicate that comparable recent synchronous router nodes have 5.6-10.7× more energy per packet and 2.8-6.4× greater area than the new asynchronous nodes. Under random traffic, the network provides significantly lower latency and identical throughput over the entire operating range of the 800 MHz network and through mid-range traffic rates for the 1.36 GHz network, but with degradation at higher traffic rates. Preliminary evaluations are also presented for a mixed-timing (GALS) network in a shared-memory parallel architecture, running both random traffic and parallel benchmark kernels, as well as directions for further improvement.

DOI: 10.1109/TCAD.2011.2114970