---
layout: page
title: "Linguistic Modelling At Scale"
permalink: /linguistic_modeling/
---
Linguistic Modelling At Scale
Problem
Raw, unstructured text data is rich in insight but hard to work with.
Buried within text data are:
- What your customers truly think of your products
- What your next product should do for your customers
- The dominant zeitgeists among your customers, and how they are likely to evolve
Impact
I’ve written two academic papers in this area. The methods in them do in half a day the work that would otherwise take five researchers two weeks, and through them I’ve derived novel insights into, and understanding of, social groups that had eluded identification by talented scholars.
My research papers are:
Furthermore, I’m currently developing a research paper that not only models unknown topics within user groups but also forecasts how popular those topics will become, to support tactical planning for engaging with them.
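One simple way to forecast topic popularity is to fit a linear trend to each topic's share of posts per period and extrapolate it. This is shown purely as an illustration and is not the method of the paper in development. The function name `linear_trend_forecast` and the weekly shares below are hypothetical. A real forecast would likely need a fuller time-series model with seasonality and uncertainty estimates.

```python
def linear_trend_forecast(series: list[float], horizon: int = 1) -> list[float]:
    """Fit y = a + b*t by ordinary least squares and extrapolate `horizon` periods ahead."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    ts = range(n)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    # Least-squares slope and intercept.
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, series))
    b = sxy / sxx
    a = y_mean - b * t_mean
    # Periods n, n+1, ... are the future ones.
    return [a + b * (n + h) for h in range(horizon)]


# Hypothetical weekly share of posts assigned to one modelled topic.
weekly_share = [0.10, 0.12, 0.15, 0.17, 0.20]
print(linear_trend_forecast(weekly_share, horizon=2))
```

A straight-line trend is the simplest defensible baseline. Any richer forecasting model should at least beat it before being trusted for planning.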
Method
I manage and implement a generalised methodology that follows this process:
- Data collection (web scraping or SQL extraction)
- Data cleaning
- Text cleaning (tokenizing, stopword filtering, part-of-speech filtering)
- Text modelling (dimensionality reduction, topic modelling, vector space embedding)
- Model interrogation (answering the business question)
- Reporting, insight sharing and action recommendations
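The text cleaning and text modelling steps above can be sketched in a few dozen lines. This is a minimal, standard-library-only illustration and not the author's production pipeline. It rests on several assumptions:
- It uses a tiny hypothetical stopword list.
- It omits part-of-speech filtering, which would need a tagger.
- It uses Latent Dirichlet Allocation fitted by collapsed Gibbs sampling as one example of topic modelling.

The names `clean` and `lda_gibbs` and the sample reviews are hypothetical.

```python
import random
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a
# fuller list and would also filter tokens by part of speech with a tagger.
STOPWORDS = {"the", "a", "an", "and", "or", "is", "are", "it", "to", "of",
             "i", "my", "this", "for", "too", "very", "was"}


def clean(text: str) -> list[str]:
    """Lowercase, tokenize on runs of letters, drop stopwords and tokens under 3 chars."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]


def lda_gibbs(docs: list[list[str]], k: int, iters: int = 200,
              alpha: float = 0.1, beta: float = 0.01, seed: int = 0) -> list[list[str]]:
    """Fit LDA by collapsed Gibbs sampling and return up to 3 top words per topic."""
    rng = random.Random(seed)
    v = len({w for d in docs for w in d})   # vocabulary size
    ndk = [[0] * k for _ in docs]           # token counts per (document, topic)
    nkw = [Counter() for _ in range(k)]     # token counts per (topic, word)
    nk = [0] * k                            # token counts per topic
    z = []                                  # current topic of every token

    # Start from a random topic for each token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            ndk[d][t] += 1
            nkw[t][w] += 1
            nk[t] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts before resampling its topic.
                t = z[d][i]
                ndk[d][t] -= 1
                nkw[t][w] -= 1
                nk[t] -= 1
                # Unnormalised conditional probability of each topic for this token.
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + v * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][w] += 1
                nk[t] += 1

    return [[w for w, c in nkw[j].most_common(3) if c > 0] for j in range(k)]


# Hypothetical product reviews standing in for collected, cleaned data.
reviews = [
    "The battery life is great and the battery charges fast",
    "Battery drains fast, poor battery life",
    "The screen is bright and the screen colours are sharp",
    "Sharp screen, bright display, lovely colours",
]
docs = [clean(r) for r in reviews]
topics = lda_gibbs(docs, k=2)
for j, words in enumerate(topics):
    print(f"topic {j}: {words}")
```

The sampler is stochastic, so the exact topic-word lists depend on the seed, and a corpus this small gives no guarantee of clean separation. At scale, a library implementation such as scikit-learn's `LatentDirichletAllocation` would typically replace the hand-written sampler. Writing it out here simply makes the counting explicit.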