Textual analysis and machine learning

Credits: 5
Holder: Prof. Thomas Renault & Prof. Matthieu Picault
Language: English
Location: HEC Liège – Management School of the University of Liège, Rue Louvrex 14, Liège
Field: Finance / Accounting (FIN/ACC)

Course Description

Textual analysis and machine learning with applications to economics and finance

Instructors

Thomas Renault is an Assistant Professor at the University Paris 1 Panthéon-Sorbonne (France) and scientific advisor at the French Council of Economic Analysis. He received his PhD in 2017 from Paris 1 Panthéon-Sorbonne. His work focuses on the use of textual data, mostly from social media, to forecast financial markets and to construct novel indicators to track economic conditions. He teaches the following classes at the University Paris 1 Panthéon-Sorbonne: Applied Data Science in Finance, Applied Big Data in Finance, Introduction to Python, and Digital Data and Network Analysis. He has published articles in the Journal of Public Economics, the Journal of Banking and Finance, and the Journal of International Money and Finance.

Matthieu Picault is an Assistant Professor at the University of Orléans (France) in the Laboratoire d’Economie d’Orléans (LEO). He received his PhD from the University of Aix-Marseille (AMSE) in 2017. His research focuses primarily on central bank communication and its impact on financial and macroeconomic variables. It includes textual analysis of both official documents and media. He teaches Introduction to Python and Natural Language Processing with Python in the Applied Econometrics Master's programme. He has published articles in the Journal of International Money and Finance, Finance Research Letters, and the International Journal of Finance & Economics.

Course Content

The objective of this course is to study how the millions of texts published on the Internet and social media every day can improve our understanding of various economic and financial phenomena. After an introduction to the Python programming language, we will see how to extract online content using existing APIs or by implementing web scraping tools. We will create an application to collect articles from a major media site, and we will use an API to extract tweets from a social network dedicated to finance.

Next, we will see how to analyse a text using Natural Language Processing (NLP) methods. We will apply these methods to speeches made by the European Central Bank to show how structure can be given to unstructured data. The next session will be dedicated to sentiment analysis and will present the different methods (dictionary approach and machine learning). We will analyse Twitter data to build a sentiment indicator capturing the well-being of individuals in a country.

The fourth session will be devoted to machine learning using text as data, with an application to StockTwits data (asset pricing). In the last session, we will introduce methods for analysing unlabelled text (topic modelling and transformers). We will apply Latent Dirichlet Allocation to a large corpus of Glassdoor reviews.
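To give a flavour of the dictionary approach to sentiment analysis mentioned above, here is a minimal sketch using only the Python standard library. It is not the instructors' code: the two word lists are tiny hypothetical lexicons invented for illustration, and the scoring rule (positive minus negative counts, divided by their sum) is one common convention among several. The sketch also shows how raw text can be turned into a structured bag-of-words.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons for illustration only; a real application would
# rely on an established, domain-specific sentiment dictionary.
POSITIVE = {"growth", "improve", "improved", "strong", "recovery", "gain"}
NEGATIVE = {"risk", "weak", "decline", "uncertainty", "loss", "crisis"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_counts(text):
    """Turn raw text into a structured bag-of-words (token -> frequency)."""
    return Counter(tokenize(text))


def sentiment_score(text):
    """Return (pos - neg) / (pos + neg), or 0.0 if no lexicon word appears."""
    counts = word_counts(text)
    pos = sum(counts[word] for word in POSITIVE)
    neg = sum(counts[word] for word in NEGATIVE)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


sentence = "Strong growth continues, but uncertainty about the recovery remains a risk."
print(word_counts(sentence).most_common(3))
print(sentiment_score(sentence))
```

The score lies between -1 (only negative words) and +1 (only positive words). A dictionary method like this is transparent but ignores negation and context, which is one motivation for the machine learning methods covered later in the course.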

For each session, we will first present the related theories and methods, in a language accessible to non-mathematicians, along with their latest applications in the economic and financial literature. We will then study, and share with participants, scripts and code to carry out different tasks in Python. We will also offer participants the opportunity to present their research and/or projects and, where possible, we will assist them with their projects, on both the data collection side and the data analysis side.

Prerequisites:

Participants should have a basic understanding of computer programming. It is possible to follow the tutorial available at https://www.learnpython.org/ to learn or review the basics of programming in Python.

Participants must install Anaconda (https://www.anaconda.com/products/individual) to have a functional programming environment before the beginning of the course.
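Once Anaconda is installed, a short script along the following lines can help confirm that the environment is usable. This is only a sketch: the course description does not name specific libraries, so the package list below is an assumption based on the topics covered (data handling, web scraping, NLP, machine learning) and should be adapted to the instructors' actual requirements.

```python
import importlib.util
import sys

# Assumed package list: not specified by the course description.
# These are import names (for example, scikit-learn is imported as "sklearn").
ASSUMED_PACKAGES = ["numpy", "pandas", "requests", "bs4", "nltk", "sklearn"]


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python version:", sys.version.split()[0])
missing = missing_packages(ASSUMED_PACKAGES)
print("Missing packages:", missing if missing else "none")
```

Any package reported as missing can typically be added with `conda install` or `pip install` from the Anaconda prompt.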

Academic papers:

Altig, D., Baker, S., Barrero, J. M., Bloom, N., Bunn, P., Chen, S., … & Thwaites, G. (2020). Economic uncertainty before and during the COVID-19 pandemic. Journal of Public Economics, 191, 104274.

Kearney, C., & Liu, S. (2014). Textual sentiment in finance: A survey of methods and models. International Review of Financial Analysis, 33, 171-185.

Picault, M., Pinter, J., & Renault, T. (2022). Media sentiment on monetary policy: determinants and relevance for inflation expectations. Journal of International Money and Finance, Forthcoming.

Picault, M., & Renault, T. (2017). Words are not all created equal: A new measure of ECB communication. Journal of International Money and Finance, 79, 136-156.

Loughran, T., & McDonald, B. (2016). Textual analysis in accounting and finance: A survey. Journal of Accounting Research, 54(4), 1187-1230.

Renault, T. (2020). Sentiment analysis and machine learning in finance: a comparison of methods and models on one million messages. Digital Finance, 2(1), 1-13.

Renault, T. (2017). Intraday online investor sentiment and return patterns in the US stock market. Journal of Banking & Finance, 84, 25-40.

Thorsrud, L. A. (2020). Words are the new numbers: A newsy coincident index of the business cycle. Journal of Business & Economic Statistics, 38(2), 393-409.

Books:

Mitchell, R. (2018). Web scraping with Python: Collecting more data from the modern web. O’Reilly Media.

Bengfort, B., Bilbro, R., & Ojeda, T. (2018). Applied text analysis with Python: Enabling language-aware data products with machine learning. O’Reilly Media.

Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: Analyzing text with the natural language toolkit. O’Reilly Media.

Schedule (Updated: February 17, 2023)

Day 1	p.m.	Introduction to Python
Day 2	a.m.	Application: Getting data from API and websites
Day 2	p.m.	Natural Language Processing
Day 3	a.m.	Application: Using NLP to analyse central bank communication
Day 3	p.m.	Sentiment Analysis
Day 4	a.m.	Machine learning using text as data
Day 4	p.m.	Application: Predicting Asset Prices using StockTwits
Day 5	a.m.	Advanced methods in text mining
Day 5	p.m.	Application: Latent Dirichlet Allocation on Glassdoor

Schedule

Academic Year 2022-2023

Update October 20, 2022: The maximum number of participants has been reached. Registration is closed.

March 6 to 10, 2023

UPDATE February 2023 (classroom numbers have been added):

Monday 06/03/2023, 14:00 – 17:00: N1A Building, 2nd Floor, Classroom 220
Tuesday 07/03/2023, 9:00 – 12:00 & 14:00 – 17:00: 17th Century Building, 3rd Floor, Classroom 1715
Wednesday 08/03/2023, 9:00 – 12:00 & 14:00 – 17:00: N1A Building, 2nd Floor, Classroom 220
Thursday 09/03/2023, 9:00 – 12:00 & 14:00 – 17:00: N1A Building, 1st Floor, Classroom 126
Friday 10/03/2023, 10:00 – 12:00 & 14:00 – 16:00: 17th Century Building, 3rd Floor, Classroom 1715

Please register via https://hec-liege.idloom.events/doctoral-course-textual-analysis before February 28. Registration will close before this date if the maximum number of participants has been reached.