Impressive Stats

 

By Marc Airhart. Illustrations by Martha Morales.


THE DIGITAL REVOLUTION has ignited the explosive growth of statistics and data science and the related field of machine learning – approaches that are key to turning digital information into knowledge. These fields are now so top of mind that the U.S. Bureau of Labor Statistics predicts data science will see more job growth than nearly any other field this decade. Meanwhile, statistician and data scientist are each top-10 “best jobs,” according to U.S. News & World Report.

In one of The University of Texas at Austin’s newest and most rapidly expanding departments, faculty are working both to train a new generation of data scientists and statisticians and to use data to deepen our understanding of how to solve challenges in human health, the environment, astronomy and more.

Department on the rise

Not even a decade old, the Department of Statistics and Data Sciences (SDS) provides cutting-edge training to thousands of UT students each semester at the undergraduate and graduate levels and has launched several new degree programs. 

In fall 2022, the department welcomed its first cohort of undergraduate majors in statistics and data science. According to Department Chair Kate Calder, this new major stands out among degrees offered by peer institutions across the country. 

 

“The curriculum provides students with rigorous training in the foundational ideas of statistics in a way that emphasizes the modern skillset associated with a data scientist,” Calder said.

At the graduate level, the department offers residential master’s and Ph.D. degrees, a concurrent master’s degree in statistics for UT Austin students pursuing doctoral degrees in other fields and an online data science master’s degree. The online program is a collaboration with the Department of Computer Science and has already enrolled more students than the residential graduate programs of the two departments combined. Almost 1,000 students from across the world were in the program last academic year. These master’s students welcomed the opportunity to learn from top faculty in the field and earn their graduate degree for only about $10,000.

Since 2019, the department has tripled the number of tenured and tenure-track faculty. Calder is excited by the dynamic and rapidly evolving possibilities – including the opportunity to advance research impact throughout the University.

“The people that we’re bringing in have research agendas that are very synergistic with the big problems that faculty across our college and the University are tackling right now,” Calder said.

These researchers contribute to a robust computing and data ecosystem at UT Austin. Other parts of that ecosystem include a computer science program ranked in the country’s top 10, one of the world’s most powerful supercomputers at the Texas Advanced Computing Center, the NSF Institute for Foundations of Machine Learning and the Machine Learning Laboratory – those last two featuring strong participation from SDS faculty – and a home base in the technology hub of Austin. Now Texas is rapidly rising in the rankings, climbing 23 spots in U.S. News & World Report’s statistics graduate program rankings (from 50th to 27th) in just four years.

Researching health

Layla Parast, an SDS associate professor, evaluates factors called “surrogate markers,” which can help speed up the testing of treatments for conditions, like diabetes and dementia, that develop slowly over years. Testing a treatment based on primary outcomes, like cognitive decline, might take a decade or more.

“The idea with the surrogate marker is, if you can identify something like a blood biomarker that lets you measure how effective the treatment is earlier [in a study], then you can make a decision about the treatment earlier and get it to the larger patient population,” Parast said. 

Doing so would not only save time and money; it ultimately would help patients access needed treatments sooner. 

Also interested in improving human health is Parast’s colleague Professor Roger Peng, who uses environmental data to identify indoor and outdoor factors that affect health. In one study, comparing air pollution data to millions of claims for treatment of conditions such as cardiovascular disease, Peng’s team found that air pollution, even at very low levels, was statistically linked to more hospitalizations and deaths.

“We found there’s no lower threshold that’s safe for air pollution,” Peng said. “As far as we know, the safe level is zero.” He added that, from a statistical perspective, the key to detecting such a small effect was a very large sample size.

Widening the lens

Calder herself is part of a multi-institution collaboration aimed at understanding how life experiences shape adolescent well-being and health. The researchers use a rich dataset collected from 1,400 Ohio youth, including surveys, social media interactions, stress levels (indicated by cortisol levels in saliva and hair), brain imaging and location data from smartphones.

Calder specializes in “spatial statistics.” For the project, she develops tools that leverage data collected at precise geographic locations to better understand how routine activity patterns and social exposures affect adolescent physical and mental health outcomes.

Analyzing the data is allowing the collaborators to find patterns in the ways that social interactions, the presence of adults, risky behavior, cigarette use and alcohol use influence educational outcomes, mental health and physiological stress.


Conservation insights

Professor Mevin Hooten uses information about how animals move through the environment to build models that aid in conservation. Hooten’s team, for example, analyzed the movement of subpopulations of polar bears along with seasonal shifts in sea ice. Polar bears are protected marine mammals that the U.S. considers threatened. The team developed statistical approaches to better understand natural delineations among subpopulations in the U.S. and in Russia. These data-based findings supported existing management boundaries between Russia and the U.S. and facilitated international conservation efforts.


Inner cities, outer space

Arya Farahi, now a UT assistant professor of statistics and data sciences, was a graduate student in astronomy and physics at the University of Michigan with an interest in data science and machine learning when dangerously high levels of lead were detected in drinking water in Flint, Michigan. The city, urgently needing to replace old lead pipes throughout the area, reached out to the university for help with a problem: where exactly in the vast underground system were the lead pipes? Historical records were inaccurate, and there wasn’t enough time or money to dig up every pipe. Farahi pitched in and helped develop a model to predict where the lead pipes were, saving the city money and time.

Farahi now relishes the ability to apply statistics and data science approaches to problems in any area that piques his interest. Through a collaboration with the City of Austin and the University’s Good Systems grand challenge, for example, Farahi evaluates how major transportation projects affect people’s housing choices, shedding light on the relationships between transportation systems, affordable housing and gentrification. In cosmology, he’s working on projects where he models galaxies and galaxy clusters to better understand dark energy and dark matter.

“I like that I have the freedom to think about two different, completely unrelated directions,” Farahi said. “This is one thing that’s unique to our department. We’re trying to build interdisciplinary collaborations with domain experts across campus so we can have impact on the scientific community and on our local and national communities.”