A Fast Algorithm for Constructing Inverted Files on Heterogeneous Platforms

Title: A Fast Algorithm for Constructing Inverted Files on Heterogeneous Platforms
Publication Type: Conference Papers
Year of Publication: 2011
Authors: Wei Z, JaJa JF
Conference Name: 2011 IEEE International Parallel & Distributed Processing Symposium (IPDPS)
Date Published: May 2011
Keywords: B-tree dictionary data structure; CUDA indexer; Intel Xeon X5560 Quad-core; NVIDIA Tesla C1060; central processing unit; computer unified device architecture; graphics processing unit; heterogeneous platform; high-throughput pipelined strategy; hybrid trie data structure; inverted files construction; multicore CPU; multithreaded GPU; computer graphic equipment; coprocessors; data structures; multiprocessing systems
Abstract

Given a collection of documents residing on disk, we develop a new strategy for processing these documents and building inverted files extremely fast. Our approach is tailored to a heterogeneous platform consisting of a multicore CPU and a highly multithreaded GPU. The algorithm is based on several novel techniques: (i) a high-throughput pipelined strategy that produces parallel parsed streams, which are consumed at the same rate by parallel indexers; (ii) a hybrid trie and B-tree dictionary data structure in which the trie is represented by a table for fast look-up and each B-tree node contains string caches; (iii) allocation of parsed streams with frequent terms to CPU threads and the rest to GPU threads, so as to match the throughput of the parsed streams; and (iv) an optimized CUDA indexer implementation that ensures coalesced memory accesses and effective use of shared memory. We have performed extensive tests of our algorithm on a single node (two Intel Xeon X5560 quad-core processors) with two NVIDIA Tesla C1060 GPUs attached, and achieved a throughput of more than 262 MB/s on the ClueWeb09 dataset. Similar results were obtained on widely different datasets. The throughput of our algorithm exceeds that of the best known algorithms reported in the literature, even those run on large clusters.

DOI: 10.1109/IPDPS.2011.107