Using N-grams to Process Hindi Queries with Transliteration Variations
Report
Retrieval systems based on N-grams have been used as alternatives to word-based systems. N-grams offer a language-independent technique that allows retrieval based on portions of words. A query that contains misspellings or transliteration variations can defeat a word-based system, whereas N-gram systems are more resistant to these problems. We present an N-gram-based retrieval system that uses a collection of Hindi songs. Within this retrieval system, we study the effect of varying N on retrievability. Additionally, we present an alternative spell-checking tool based on N-grams. We conclude with a discussion of the number of N-grams produced by different values of N for different languages, and of the choice of N.
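As a rough illustration of why N-gram matching tolerates transliteration variation, the sketch below splits strings into overlapping character N-grams and ranks candidates by the overlap of their N-gram sets. It is not the system described in the report: the Dice coefficient, the function names (char_ngrams, dice, rank), and the sample words are assumptions chosen for illustration.

```python
def char_ngrams(text, n):
    """Return the set of overlapping character n-grams of text (lowercased)."""
    s = text.lower()
    # Strings shorter than n yield an empty set.
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def dice(a, b):
    """Dice coefficient between two n-gram sets (an assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def rank(query, titles, n=3):
    """Score each title against the query and return (score, title) pairs, best first."""
    q = char_ngrams(query, n)
    scored = [(dice(q, char_ngrams(t, n)), t) for t in titles]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Hypothetical romanized entries; "mohabbat" and "muhabbat" are two
    # transliterations of the same word and would not match as whole words.
    titles = ["muhabbat", "zindagi"]
    for score, title in rank("mohabbat", titles, n=3):
        print(f"{score:.2f}  {title}")
```

An exact word comparison treats the two spellings as unrelated, while the trigram sets still share most of their members. Changing n in this sketch shows the trade-off the report studies: a smaller N tolerates more variation but matches more unrelated strings.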
All rights reserved (no additional license for public reuse)
Natrajan, Anand, Allison Powell, and James French. "Using N-grams to Process Hindi Queries with Transliteration Variations." University of Virginia Dept. of Computer Science Tech Report (1997).
University of Virginia, Department of Computer Science