Natural Language Processing: The Frontier to Qualitative GIS
When people talk about GIS, several things come to mind instantly: creating beautiful maps, spatial analysis, and quantitative data. But what about qualitative GIS? Does that come to mind? At the time of writing (2017), there is a good chance that the answer is no. However, I’m confident that within the next 5 to 10 years (maybe sooner) it will become a hot topic in the geospatial world, despite still being in its infancy.
Why Qualitative Data?
We use quantitative data more because it is abundant and easier for computers to analyze. Numbers definitely have value and a story to tell about what’s going on, but at the end of the day we’re human beings looking for the social implications, the “so what?” One classic example is the quantitative relationship between health and financial indicators. The usual assumption is that greater wealth, as a numerical indicator, means better access to resources and thus a better life. However, this is not entirely true. In a health ranking reported by Bloomberg, several countries beat, or stand relatively well against, some of the richest nations (including the richest per capita). Several nations in the top 20 are relatively poorer and have “winter” economies, higher unemployment, and fewer opportunities than wealthier nations (#17 Canada, #23 UK, #34 USA): these are the Mediterranean countries, with Italy at #1. These countries thrive because they are more family-oriented and have higher social cohesion, simpler and less stressful lifestyles, excellent diets, and warmer climates. The point is that some of these factors (family orientation, lifestyle) may require qualitative data to fill in the explanatory story, something quantitative analysis sometimes struggles with, especially in the interdisciplinary fields of social sciences, geography, and health.
Unfortunately, qualitative research is often frowned upon. Going through ethical protocols can be extremely time consuming, especially when employing conventional methods (e.g. semi-structured interviews and ethnography) and targeting specific groups (e.g. informal caregivers), and the resulting small samples can lack statistical significance (n < 30). That doesn’t mean we should disregard qualitative data! Qualitative research just hasn’t reached its full potential, and this will change with the rapid pace of analytics and the influx of data feeding into databases on a daily basis. With that, allow me to introduce Natural Language Processing (NLP) as one of the methods.
Natural Language Processing
Think of Alexa (Amazon), Cortana (Microsoft), Siri (Apple), or to a certain extent IBM Watson. What they do in the back end is Natural Language Processing (NLP). In a nutshell, NLP analyzes massive amounts of text, or corpora, so that machines can understand how humans speak. It has been around for a while in the form of machine translation and automated question answering. What’s different today, and in the near future, is the advances in automatic text summarization, sentiment analysis, topic extraction, relationship extraction, and stemming. Some of these are gateways and foundations to Artificial Intelligence. Indirectly, we help companies (e.g. Facebook, Google) by using their “free” messaging apps: the text messages we send back and forth are collected to constantly train and improve NLP. Certainly, it’s not perfect, but the gaps are closing over time. For the qualitative GIS context, I’ll focus on sentiment analysis and the kinds of examples it can be applied to.
Sentiment Analyses in GIS
Sentiment analysis is the process of identifying the attitude or opinion (positive, negative, or neutral) of a writer or speaker, and it is widely used on reviews, survey responses, and social media. This type of NLP is becoming more popular, and it can be applied in qualitative GIS because a lot of digital data discloses geographic location. There are countless qualitative GIS applications of sentiment analysis, but some major examples include: 1) urban and transportation planning, 2) actual accessibility, 3) community improvement and engagement, and 4) aggregated health profiling.
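To make the idea concrete, here is a minimal sketch of lexicon-based sentiment scoring in Python. It is only one simple approach, not how the commercial assistants above work. The word lists, the one-word negation rule, and the 0.25 threshold are all made-up assumptions for illustration; a real project would more likely use a published lexicon or a trained model.

```python
import re

# Hypothetical toy lexicon (assumption): real projects use far larger word lists.
POSITIVE = {"safe", "clean", "friendly", "great", "love", "accessible"}
NEGATIVE = {"unsafe", "dirty", "broken", "dangerous", "hate", "crowded"}
NEGATORS = {"not", "never", "no"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Return a score in [-1, 1]: (positive hits - negative hits) / all hits.

    Text with no lexicon words scores 0.0.
    """
    tokens = tokenize(text)
    pos = neg = 0
    for i, tok in enumerate(tokens):
        if tok in POSITIVE:
            polarity = 1
        elif tok in NEGATIVE:
            polarity = -1
        else:
            continue
        # Flip the polarity when the previous word is a negator ("not safe").
        if i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        if polarity > 0:
            pos += 1
        else:
            neg += 1
    hits = pos + neg
    return 0.0 if hits == 0 else (pos - neg) / hits


def label(score, threshold=0.25):
    """Turn a numeric score into a coarse class (threshold is an assumption)."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for comment in [
        "The park is clean and friendly",
        "The crossing is not safe and the sidewalk is broken",
        "I walked to the bus stop",
    ]:
        print(label(sentiment_score(comment)), "|", comment)
```

A word-counting approach like this misses sarcasm, context, and anything outside its lexicon, which is part of why the real thing is harder than it looks.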
So how do you create one? Well, it depends on the topic and objectives. For starters though, here are some recommended steps. Note: going through these steps is much harder than it looks, and you will need to know some programming, especially for NLP (Python / R).
- Create an online web mapping app with user-interface components (qualitative data input & geotagging) and some level of anonymity; this is sometimes referred to as participatory GIS (a subcategory of qualitative GIS)
- Set up a back-end database to capture the required information
- Set up back-end processes to run NLP on the contents of the database
- Cleanly compile the results with their geotagged positions (e.g. sentiment analysis, hierarchical clustering, quantifying)
- Aggregate the compiled results to a coarser geographic scale (e.g. 250 m hexagonal bins) & classify the results for display
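As a rough sketch of the storage and aggregation steps above, the Python below keeps geotagged comments in an in-memory SQLite table, assigns each one to a hexagonal bin, and averages the sentiment per bin. Everything specific here is an assumption for illustration: the table and column names, the sample rows, treating "250 m" as the hexagon's centre-to-corner distance, coordinates already projected to metres (e.g. UTM), the classification threshold, and the `min_count` rule that drops sparsely populated bins as one possible anonymity safeguard. The `score` column is assumed to have been filled in beforehand by whatever NLP step is used.

```python
import math
import sqlite3
from collections import defaultdict

HEX_SIZE_M = 250.0  # assumption: centre-to-corner distance of each hexagon


def hex_bin(x, y, size=HEX_SIZE_M):
    """Map projected coordinates (metres) to the axial (q, r) index of a
    pointy-top hexagon, using cube rounding."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Recompute the component with the largest rounding error so q + r + s == 0.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr


def aggregate(rows, min_count=2):
    """Average sentiment per hexagon from (x, y, score) rows.

    Bins with fewer than min_count comments are dropped (assumed safeguard).
    """
    sums = defaultdict(lambda: [0.0, 0])
    for x, y, score in rows:
        cell = hex_bin(x, y)
        sums[cell][0] += score
        sums[cell][1] += 1
    return {c: (total / n, n) for c, (total, n) in sums.items() if n >= min_count}


def classify(mean_score, threshold=0.25):
    """Classify a bin's mean score for display (threshold is an assumption)."""
    if mean_score > threshold:
        return "positive"
    if mean_score < -threshold:
        return "negative"
    return "neutral"


def build_demo_db():
    """Create an in-memory database with a few made-up, already-scored comments."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE comments ("
        "id INTEGER PRIMARY KEY, x_m REAL, y_m REAL, body TEXT, score REAL)"
    )
    conn.executemany(
        "INSERT INTO comments (x_m, y_m, body, score) VALUES (?, ?, ?, ?)",
        [
            (10.0, 20.0, "Clean and friendly park", 1.0),
            (-30.0, 15.0, "Mostly great, a bit crowded", 0.5),
            (40.0, -10.0, "Benches are broken", -0.5),
            (430.0, 5.0, "Crossing is not safe", -1.0),
            (440.0, -5.0, "Dangerous intersection", -1.0),
            (2000.0, 2000.0, "Love this trail", 1.0),  # lone comment, dropped
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    rows = conn.execute("SELECT x_m, y_m, score FROM comments").fetchall()
    for cell, (mean, n) in sorted(aggregate(rows).items()):
        print(cell, n, round(mean, 2), classify(mean))
```

The output is one record per hexagon (index, comment count, mean score, class), which is the kind of table a web map could join to hexagon geometries for display. In practice a spatial database or a hexagonal indexing library would probably replace the hand-rolled binning.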
And voilà! You’re ready to run your qualitative GIS app. This is something I proposed in my dissertation and hope to develop one day for research and insight.
On a closing note, qualitative GIS can act as an effective knowledge translation tool for decision-makers: with sufficient information, decisions on whatever the topic is can become more efficient and accurate. This type of research and development reintroduces citizenship by empowering residents (their voices are heard) and by serving as a gateway to open data sites.