Using N-grams to Process Hindi Queries with Transliteration Variations

Report
Authors: Natrajan, Anand, Department of Computer Science, University of Virginia; Powell, Allison, Department of Computer Science, University of Virginia; French, James, Department of Computer Science, University of Virginia
Abstract:

Retrieval systems based on N-grams have been used as alternatives to word-based systems. N-grams offer a language-independent technique that allows retrieval based on portions of words. A query that contains misspellings or differences in transliteration can defeat word-based systems. N-gram systems are more resistant to these problems. We present a retrieval system based on N-grams that uses a collection of Hindi songs. Within this retrieval system, we study the effect of varying N on retrievability. Additionally, we present an alternative spell-checking tool based on N-grams. We conclude with a discussion of the number of N-grams produced by different values of N for different languages and a discussion of the choice of N.
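
To illustrate why matching on portions of words tolerates transliteration differences, here is a minimal Python sketch. It is not the system described in the report. The function names `char_ngrams` and `dice_overlap`, the use of the Dice coefficient as the similarity measure, and the example spellings are all assumptions chosen for illustration.

```python
def char_ngrams(word, n):
    """Return the set of overlapping character n-grams in `word`."""
    word = word.lower()
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice_overlap(a, b, n=2):
    """Dice coefficient between the n-gram sets of two words (0.0 to 1.0).

    Assumption: Dice is used here only as a simple, common set-overlap
    measure; the report may score matches differently.
    """
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


# Two hypothetical romanizations of the same Hindi word.
query, indexed = "jindagi", "zindagi"

# A word-based comparison treats the two spellings as unrelated terms.
exact_match = query == indexed

# An n-gram comparison still finds most of the word in common.
bigram_score = dice_overlap(query, indexed, n=2)
trigram_score = dice_overlap(query, indexed, n=3)
```

In this sketch the two spellings differ only in their first letter, so an exact comparison fails while most bigrams are still shared. Larger values of N make each gram more specific, so a single differing character removes a larger fraction of the shared grams, which is one reason the choice of N matters.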

Rights:
All rights reserved (no additional license for public reuse)
Language:
English
Source Citation:

Natrajan, Anand, Allison Powell, and James French. "Using N-grams to Process Hindi Queries with Transliteration Variations." University of Virginia Dept. of Computer Science Tech Report (1997).

Publisher:
University of Virginia, Department of Computer Science
Published Date:
1997