Will Chang
Machine learning freelancer

I consult on NLP and geospatial problems. Please see below for a list of my projects. If my interests align with a problem you have, please inquire!

Text classification and information extraction (Apixio, 2017–present)

I work with the fantastic data science team at Apixio to build models that scan medical charts for items of clinical or administrative salience. The models range from simple (logistic regression with heavily engineered features) to more complex (such as LSTMs in TensorFlow with self-attention for extracting relationships between tokens).

Topic modeling (Solvvy, 2020)

I did a short contract with Solvvy in which I experimented with various unsupervised clustering models on customer-generated content. The project was an ideal fit, since I had gained a lot of experience with Latent Dirichlet Allocation in grad school. When the project ended, the CTO had kind words for me:

Will has been one of the most thorough, diligent, honest, and intelligent professionals I have ever worked with. Not only does he have a fantastic command of advanced ML techniques and algorithms, but he wields that knowledge with all the prudence and practicality required by industrial research applications. Will was an absolute pleasure to work with and I look forward to collaborating many more times in the future! —Justin Betteridge, CTO at Solvvy

Oilfield groundwater monitoring (USGS, 2016–present)

I assist the California Oil, Gas, and Groundwater Program at the US Geological Survey in its ongoing effort to monitor groundwater resources in and around California oilfields. My role has been to build spatial models that use Gaussian processes to map groundwater salinity. My teammates and I are also exploring ways to bolt petrophysical models onto the Gaussian process, so that we can jointly model related quantities such as rock conductivity, rock porosity, and temperature.

Linguistic phylogenetics (Graduate Linguistics, 2007–2015)

It was linguistics that turned me into a statistician. In my first year of grad school, I was amazed to read a statistical analysis that inferred the shape and chronology of the Indo-European language family tree. My astonishment was along the lines of: how can these matters of human judgment be quantified, and how can any amount of math capture the relevant phenomena? As much as I admired the paper, however, I resisted its conclusion that the Indo-European language family is 9000 years old. Almost all linguists believed 6000 years to be more accurate. So the paper gave me something to strive both for and against, and it ended up shaping the rest of my career. Seven years and countless stats classes later, I coauthored a response. Now I use math to model human judgment every day.

Education

M.A. Linguistics, U.C. Berkeley. 2009.
M.S. Computer Science, U.C. Berkeley. 1998.
B.S. Electrical Engineering / Computer Science, U.C. Berkeley. 1994.

Previous positions

Sr Research Scientist, Semantic Machines, 2014–2016.
Sr Software Engineer, Cadence Design Systems, 1998–2005.