Determining Stopping Criteria in the Generation of Web-Derived Language Models
In this work, we present a small-scale evaluation of two query-based sampling techniques for building language models from a database of World Wide Web documents. We propose a metric for determining when to stop sampling a given web database, and we compare this new metric with metrics used in previous work to assess the fidelity of sampled language models.
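The abstract does not spell out the proposed stopping metric, so the following is only a minimal sketch of the general query-based sampling loop it refers to, under stated assumptions. A query term is drawn from the language model learned so far, the database is searched, the returned documents are added to the sample, and the model is updated. The `search()` interface, the toy in-memory database, the parameter values, and the stopping rule (stop once the rate of previously unseen terms per sampled token falls below a threshold) are all hypothetical illustrations, not the report's method. `ctf_ratio()` is an illustrative coverage measure that compares the sampled vocabulary against the database's true term counts, the kind of fidelity check a new stopping metric would be compared against.

```python
import random
from collections import Counter


def tokenize(text):
    """Very simple tokenizer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def search(database, term, n):
    """Hypothetical search interface: return up to n documents containing term."""
    return [doc for doc in database if term in tokenize(doc)][:n]


def ctf_ratio(sampled, actual):
    """Illustrative fidelity measure: share of the database's term occurrences
    whose terms appear in the sampled vocabulary."""
    total = sum(actual.values())
    covered = sum(count for term, count in actual.items() if term in sampled)
    return covered / total if total else 0.0


def query_based_sample(database, seed_term, docs_per_query=4, max_queries=100,
                       min_new_term_rate=0.05, rng=None):
    """Sample a database by repeatedly querying with terms from the learned model.

    The stopping rule here (new-term rate below min_new_term_rate) is a
    hypothetical stand-in, not the metric proposed in the report.
    """
    rng = rng or random.Random(0)
    model = Counter()        # learned term frequencies
    seen_docs = set()        # documents already added to the sample
    used_queries = set()
    query_term = seed_term
    for _ in range(max_queries):
        used_queries.add(query_term)
        new_terms = new_tokens = 0
        for doc in search(database, query_term, docs_per_query):
            if doc in seen_docs:
                continue  # duplicates add nothing to the model
            seen_docs.add(doc)
            for tok in tokenize(doc):
                if tok not in model:
                    new_terms += 1
                model[tok] += 1
                new_tokens += 1
        # Hypothetical stopping rule: the sample has stopped yielding new vocabulary.
        if new_tokens and new_terms / new_tokens < min_new_term_rate:
            break
        # Next query: a random learned term that has not been used yet.
        candidates = [t for t in model if t not in used_queries]
        if not candidates:
            break
        query_term = rng.choice(sorted(candidates))
    return model, seen_docs
```

For example, with `db` a list of document strings, `query_based_sample(db, "cat")` returns the learned term counts and the set of sampled documents, and `ctf_ratio(model, Counter(t for d in db for t in tokenize(d)))` reports how much of the database's text the learned vocabulary covers. A real study would replace the toy `search()` with the web database's own query interface.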
All rights reserved (no additional license for public reuse)
Monroe, Gary, David Mikesell, and James French. "Determining Stopping Criteria in the Generation of Web-Derived Language Models." University of Virginia Dept. of Computer Science Tech Report (2000).
University of Virginia, Department of Computer Science