Determining Stopping Criteria in the Generation of Web-Derived Language Models

Report
Authors:
Monroe, Gary, Department of Computer Science, University of Virginia
Mikesell, David, Department of Computer Science, University of Virginia
French, James, Department of Computer Science, University of Virginia
Abstract:

In this work, we present a small-scale evaluation of two query-based sampling techniques for building language models from a database composed of World Wide Web documents. We propose a metric for deciding when to stop sampling a given web database, and we compare it with metrics used in previous work to measure the fidelity of sampled language models.
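For readers unfamiliar with query-based sampling, the loop below is a minimal sketch of the general idea under stated assumptions; it is not the authors' method or their proposed metric. A database is probed with a single-term query, the top few unseen documents are added to a learned term-frequency model, and the next query term is drawn at random from the vocabulary learned so far. The in-memory `search` function, the parameter names, and the stopping rule (stop once a batch adds fewer than `new_term_threshold` new vocabulary terms, relative to the vocabulary size) are all hypothetical placeholders.

```python
import random
from collections import Counter


def search(database, term, n):
    """Hypothetical stand-in for a web database's search interface:
    return the ids of up to n documents that contain `term`."""
    return [i for i, doc in enumerate(database) if term in doc.split()][:n]


def sample_language_model(database, seed_term, docs_per_query=4,
                          new_term_threshold=0.05, max_queries=100, rng=None):
    """Sketch of query-based sampling with a hypothetical stopping rule."""
    rng = rng or random.Random(0)
    ctf = Counter()      # learned model: collection term frequencies
    seen_docs = set()    # ids of documents already sampled
    query = seed_term
    for _ in range(max_queries):
        # Keep only documents that have not been sampled before.
        new_ids = [i for i in search(database, query, docs_per_query)
                   if i not in seen_docs]
        vocab_before = len(ctf)
        for i in new_ids:
            seen_docs.add(i)
            ctf.update(database[i].split())
        # Hypothetical stopping rule: relative vocabulary growth this batch.
        growth = (len(ctf) - vocab_before) / max(len(ctf), 1)
        if new_ids and growth < new_term_threshold:
            break
        if not ctf:
            break  # the seed query found nothing, so there is no vocabulary to draw from
        # Next query: a term chosen at random from the learned vocabulary.
        query = rng.choice(sorted(ctf))
    return ctf, seen_docs
```

Comparing a sampled model against the full database's model, or watching how fast its vocabulary grows, is the kind of measurement that a stopping criterion formalizes; the threshold here only illustrates where such a criterion plugs into the loop.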

Rights:
All rights reserved (no additional license for public reuse)
Language:
English
Source Citation:

Monroe, Gary, David Mikesell, and James French. "Determining Stopping Criteria in the Generation of Web-Derived Language Models." University of Virginia Dept. of Computer Science Tech Report (2000).

Publisher:
University of Virginia, Department of Computer Science
Published Date:
2000