Design and Development of an Age Classification System of Twitter Users Based on Machine Learning Techniques

Alberto Marqués-Alba. (2017). Design and Development of an Age Classification System of Twitter Users Based on Machine Learning Techniques. Trabajo Fin de Titulación (TFG). ETSI Telecomunicación, Universidad Politécnica de Madrid.

The way people interact has changed with the spread of the Internet, which now reaches almost every corner of the planet. Almost everyone is connected, so people can access and share information from their devices at any moment. One of the greatest social changes of recent years is the emergence of social networks, where people share information of all kinds with friends, relatives, companies, or even strangers, generating a huge quantity of data. This information hides clues that can be used to build a profile of user attributes such as gender, age, interests, or religion. Demand for this kind of information is increasing, mainly because of its high commercial value: a company can use it, for example, to design a commercial campaign tailored to its clients. In this project we analyze such attributes, focusing on age. The chosen data source is Twitter, which provides an enormous amount of valuable information in the form of text that can be processed and classified. The analysis relies on the behavioral differences that can be observed between age groups on social networks, which are evident in their use of language, with different expressions and grammatical structures. The main task is the development of a Machine Learning based classification system that predicts the age of users with enough accuracy for the results to be useful. The chosen programming language is Python, and Machine Learning tools are used to process the text with different algorithms so that we can obtain the highest possible accuracy. The work starts with research: gathering information related to the project and analyzing the state of the art. Once enough information has been collected, the next step is to analyze the requirements of the project and to learn the programming environment and the tools needed. On this basis, the development of the project can begin.
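
The abstract does not name the algorithms or libraries used, so the following is only a minimal sketch of the general idea of classifying users into age groups from the words they write. It assumes a multinomial Naive Bayes model over bag-of-words counts, a common baseline for text classification and not necessarily one of the algorithms evaluated in the project. The class name, the tokenizer, the age-group labels, and the example tweets are all hypothetical and invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it into word tokens (a deliberately simple scheme)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesAgeClassifier:
    """Hypothetical multinomial Naive Bayes classifier with Laplace smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # number of training texts per age group
        self.word_counts = defaultdict(Counter)   # word frequencies per age group
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior of the age group.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                # Add-one smoothing keeps unseen words from zeroing the probability.
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples, for illustration only (not data from the project).
train_texts = [
    "omg lol that party was sooo fun",
    "lol cant wait for the weekend omg",
    "meeting with the mortgage advisor this morning",
    "long morning at the office before the school run",
]
train_labels = ["younger", "younger", "older", "older"]

model = NaiveBayesAgeClassifier().fit(train_texts, train_labels)
prediction = model.predict("omg lol so fun")
```

A realistic system would need far more data, a held-out test set to measure accuracy, and a comparison of several algorithms, as the abstract indicates.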