I consult on natural language processing (NLP) and geospatial problems. My projects are listed below. If my interests align with a problem you have, please get in touch!

Text classification and information extraction (Apixio, 2017–present)

I work with the fantastic data science team at Apixio to build models that scan medical charts for items of clinical or administrative salience. The models range from simple (logistic regression with heavily engineered features) to more complex (such as LSTMs in TensorFlow with self-attention for extracting relationships between tokens).

Topic modeling (Solvvy, 2020)

I did a short contract with Solvvy, experimenting with various unsupervised clustering models on customer-generated content. The project was an ideal fit, since I had worked extensively with Latent Dirichlet Allocation in grad school. When the project ended, the CTO had kind words for me:

Will has been one of the most thorough, diligent, honest, and intelligent professionals I have ever worked with. Not only does he have a fantastic command of advanced ML techniques and algorithms, but he wields that knowledge with all the prudence and practicality required by industrial research applications. Will was an absolute pleasure to work with and I look forward to collaborating many more times in the future! —Justin Betteridge, CTO at Solvvy

Oilfield groundwater monitoring (USGS, 2016–present)

I assist the California Oil, Gas, and Groundwater Program at the US Geological Survey in its ongoing effort to monitor groundwater resources in and around California oilfields. My role has been to build spatial models that use Gaussian processes to map groundwater salinity. My teammates and I are also exploring ways to bolt petrophysical models onto the Gaussian process so that we can jointly model related quantities such as rock conductivity, rock porosity, and temperature.

Papers

Groundwater salinity mapping using geophysical log analysis within the Fruitvale and Rosedale Ranch oil fields, Kern County, California, USA. Michael J. Stephens, David H. Shimabukuro, Janice M. Gillespie, and Will Chang. Hydrogeology Journal. 2018.
Stratigraphic and structural controls on groundwater salinity variations in the Poso Creek Oil Field, Kern County, California, USA. Michael J. Stephens, David H. Shimabukuro, Will Chang, Janice M. Gillespie, and Zack Levinson. Hydrogeology Journal. 2021.
Mapping aquifer salinity gradients and effects of oil field produced water disposal using geophysical logs: Elk Hills, Buena Vista and Coles Levee Oil Fields, San Joaquin Valley, California. Janice M. Gillespie, Michael J. Stephens, Will Chang, and John G. Warden. PLOS ONE. 2022.

Linguistic phylogenetics (Graduate Linguistics, 2007–2015)

It was linguistics that turned me into a statistician. In my first year of grad school, I was amazed to read a statistical analysis that inferred the shape and chronology of the Indo-European family tree. My astonishment ran along these lines: how can matters of human judgment be quantified, and how can any amount of math capture the relevant phenomena? As much as I admired the paper, however, I resisted its conclusion that the Indo-European language family is about 9000 years old; almost all linguists considered an age of about 6000 years more plausible. So the paper simultaneously gave me something to strive for and something to argue against, and it ended up shaping the rest of my career. Seven years and countless stats classes later, I coauthored a response. Now I use math to model human judgment every day.

Papers

Ancestry-constrained phylogenetic analysis supports the Indo-European steppe hypothesis. Will Chang, Chundra Cathcart, David Hall, and Andrew Garrett. Language 91.1:194–244. 2015.
Press coverage: Science/AAAS News, New York Times.
Awards: Best Paper in Language.
A relaxed admixture model of language contact. Will Chang and Lev Michael. Language Dynamics and Change 4:1–26. 2014.
Exploring phonological areality in the circum-Andean region using a naive Bayes classifier. Lev Michael, Will Chang, and Tammy Stark. Language Dynamics and Change 4:27–86. 2014.

Websites

2013–2015. I helped maintain the South American Phonological Inventory Database.

Talks

Notes on Bayesian Lexicostatistics. LING 230. April 2016.
A vanishing, multiple-gain lexical trait model. Workshop: Towards a Global Language Phylogeny. Max Planck Institute for the Science of Human History, Jena. September 2014.
Linguistic mirages and lexical borrowing between Tongan and Samoan. 9th International Conference on Oceanic Linguistics (COOL9). University of Newcastle, Australia. February 2013.
The distribution of Polynesian words. 39th Annual Meeting of the Berkeley Linguistics Society. University of California, Berkeley. February 2013.
Probabilistic generative models of language contact. Workshop on Quantitative Approaches to Areal Linguistic Typology. Koninklijke Nederlandse Akademie van Wetenschappen, Amsterdam. December 2012.

Education

M.A. Linguistics, U.C. Berkeley. 2009.
M.S. Computer Science, U.C. Berkeley. 1998.
B.S. Electrical Engineering / Computer Science, U.C. Berkeley. 1994.

Employment

Senior Research Scientist, Semantic Machines, 2014–2016.
Senior Software Engineer, Cadence Design Systems, 1998–2005.