---
layout: page
title: "Linguistic Modelling At Scale"
permalink: /linguistic_modeling/
---

Linguistic Modelling At Scale


Raw, unstructured text is a rich source of insight, but it is tough to work with.

Buried within text data is:


I’ve written two academic papers in this area. The methods in them do in half a day what would take five researchers two weeks, and they have also surfaced novel insights into social groups that had eluded talented scholars.

My research papers are:

I’m also developing a research paper that not only models previously unknown topics within user groups but also forecasts the popularity of those topics, to support tactical planning for engaging with them.


The generalised methodology that I manage and implement follows this process:

  1. Data collection (web scraping or SQL extraction)
  2. Data cleaning
  3. Text cleaning (tokenising, stopword filtering, part-of-speech filtering)
  4. Text modelling (dimensionality reduction, topic modelling, vector space embedding)
  5. Model interrogation (answering the business question)
  6. Reporting, insight sharing and action recommendations
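
As a rough illustration of steps 3 to 5, here is a minimal Python sketch that uses only the standard library. It is not the code behind the papers: the stopword list, the sample documents and the function names (`tokenize`, `fit_idf`, `vectorize`, `cosine`, `most_similar`) are all hypothetical, and TF-IDF with cosine similarity stands in for whichever embedding or topic model a real project would use. Part-of-speech filtering is left out because it needs a trained tagger.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (assumption for illustration).
STOPWORDS = {"a", "an", "and", "are", "at", "for", "in", "is",
             "of", "on", "the", "to", "with"}


def tokenize(text):
    """Step 3: lowercase, split into word tokens, drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def fit_idf(tokenized_docs):
    """Step 4a: smoothed inverse document frequency for each known term."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter(t for tokens in tokenized_docs for t in set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0
            for t, df in doc_freq.items()}


def vectorize(tokens, idf):
    """Step 4b: sparse TF-IDF vector (term -> weight); unseen terms are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    total = sum(counts.values()) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query, docs):
    """Step 5: index of the document closest to the query in TF-IDF space."""
    tokenized = [tokenize(d) for d in docs]
    idf = fit_idf(tokenized)
    doc_vectors = [vectorize(tokens, idf) for tokens in tokenized]
    query_vector = vectorize(tokenize(query), idf)
    scores = [cosine(query_vector, dv) for dv in doc_vectors]
    return max(range(len(docs)), key=scores.__getitem__)


if __name__ == "__main__":
    # Made-up sample documents, not real data.
    sample_docs = [
        "The delivery was late and the packaging was damaged.",
        "Great customer support, the agent resolved my billing issue.",
        "Billing charged me twice and support was slow to respond.",
    ]
    best = most_similar("late delivery", sample_docs)
    print(sample_docs[best])
```

The sparse dict representation keeps the sketch dependency-free. At scale the same steps would normally run on sparse matrices, with a dimensionality reduction or topic model fitted on top of the vectors.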