Using Machine Learning and Information Retrieval to Identify Federally Funded Research and Development Trends

Article
Authors:
Linehan, Kathryn, University of Virginia (ORCID: orcid.org/0000-0002-2142-2136); Oh, Eric, University of Virginia; Thurston, Joel, University of Virginia; SIWE, Guy Leonel, University of Virginia; Kindlon, Audrey, National Center for Science and Engineering Statistics; Jankowski, John, National Center for Science and Engineering Statistics; Shipp, Stephanie, University of Virginia
Abstract:

A vast amount of information on federally funded research and development (R&D) is available and can be utilized by researchers, policymakers, and the public to uncover insights on the directionality and extent of government R&D funding. In this work, we use natural language processing (NLP), machine learning, and information retrieval techniques to classify broad research topics and pandemic-related research topics contained within Federal RePORTER grant abstracts, a typical example of a scientific award database. In collaboration with the National Center for Science and Engineering Statistics (NCSES), we examine these topics, their trends over time, and how the topics and their trends change as a result of the number of topics produced by the model. The methods described in this paper show promise to supplement the information currently collected through the NCSES Federal Survey of Funds for Research and Development (FFS) and Survey of Federal Science and Engineering Support to Universities, Colleges, and Nonprofit Institutions (FSS) by providing information that the surveys do not collect.

Language:
English
Source Citation:

Linehan K, Oh E, Thurston J, SIWE GL, Kindlon A, Jankowski J, Shipp S (2024). Using Machine Learning and Information Retrieval to Identify Federally Funded Research and Development Trends. University of Virginia. DOI: https://doi.org/10.18130/xtmk-f634

Publisher:
University of Virginia
Published Date:
January 2024
Sponsoring Agency:
National Center for Science and Engineering Statistics
Notes:

The National Center for Science and Engineering Statistics contract #49100420C0015 funded this research.

We thank the students who contributed to this project in the 2020 and 2021 Data Science for the Public Good programs through the Social and Decision Analytics Division (SDAD), Biocomplexity Institute (BI), University of Virginia. These students are (listed in alphabetical order) Martha Czernuszenko, Lara Haase, Elizabeth Miller, Cierra Oliveira, Sean Pietrowicz, Haleigh Tomlin, and Crystal Zang. We also thank Samantha Cohen, a former SDAD Postdoctoral Research Associate, for contributing to the text cleaning and processing work. We thank Sallie Keller for helpful discussions and suggestions throughout the project. Dr. Keller is a distinguished professor of biocomplexity, division director of SDAD and BI, and professor of public health sciences at the University of Virginia. She is currently on an IPA with the US Census Bureau.