Textual analysis and machine learning
Credits | 5 |
Holder | Prof. Thomas Renault & Prof. Matthieu Picault |
Language | English |
Location | HEC Liège – Management School of the University of Liège. Rue Louvrex 14, Liège |
Field | Finance / Accounting (FIN/ACC) |
Course Description
Textual analysis and machine learning with applications to economics and finance
Instructors
Thomas Renault is an Assistant Professor at the University Paris 1 Panthéon-Sorbonne (France) and scientific advisor at the French Council of Economic Analysis. He received his PhD in 2017 from Paris 1 Panthéon-Sorbonne. His work focuses on the use of textual data, mostly from social media, to forecast financial markets and to construct novel indicators that track economic conditions. He teaches the following classes at the University Paris 1 Panthéon-Sorbonne: Applied Data Science in Finance, Applied Big Data in Finance, Introduction to Python, and Digital Data and Network Analysis. He has published articles in the Journal of Public Economics, the Journal of Banking and Finance, and the Journal of International Money and Finance.
Matthieu Picault is an Assistant Professor at the University of Orléans (France) in the Laboratoire d’Economie d’Orléans (LEO). He received his PhD from Aix-Marseille University (AMSE) in 2017. His research focuses primarily on central bank communication and its impact on financial and macroeconomic variables. It includes textual analysis of both official documents and media. In the Applied Econometrics Master's programme, he teaches Introduction to Python and Natural Language Processing with Python. He has published articles in the Journal of International Money and Finance, Finance Research Letters, and the International Journal of Finance & Economics.
Course Content
The objective of this course is to study how the millions of texts published on the Internet and social media every day can improve our understanding of various economic and financial phenomena. After an introduction to the Python programming language, we will see how to extract online content using existing APIs or by implementing web scraping tools. We will create an application to collect articles from a major media site, and we will use an API to extract tweets from a social network dedicated to finance. Next, we will see how to analyse a text using Natural Language Processing (NLP) methods. We will apply this to the speeches of the European Central Bank to show how it is possible to give structure to unstructured data. The next session will be dedicated to sentiment analysis and will present the different methods (dictionary approach and machine learning). We will analyse Twitter data to build a sentiment indicator capturing the well-being of individuals in a country. The fourth session will be devoted to machine learning using text as data, with an application to StockTwits data (asset pricing). In the last session, we will introduce methods of textual analysis for unlabelled data (topic modelling and transformers). We will apply Latent Dirichlet Allocation to a large corpus of Glassdoor reviews.
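To illustrate what giving structure to unstructured data can mean in practice, the sketch below turns a few short texts into a document-term count matrix using only the Python standard library. It is not course material: the sample sentences are made up, and the `tokenize` and `document_term_matrix` helpers are hypothetical names chosen for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(documents):
    """Return (vocabulary, rows), where rows[i][j] counts vocabulary[j] in documents[i]."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    # Sorted union of all tokens seen in any document
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows


# Made-up sentences, for illustration only
docs = [
    "Inflation is rising and rates may rise.",
    "Rates are unchanged; inflation is stable.",
]
vocabulary, matrix = document_term_matrix(docs)
for term, column in zip(vocabulary, zip(*matrix)):
    print(term, column)
```

Each row of `matrix` is one document and each column one word, so the texts can then be fed to standard statistical or machine learning tools.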
For each session, we will first present the related theories and methods, in a language accessible to non-mathematicians, along with their latest applications in the economic and financial literature. We will then study, and share with participants, scripts and code to carry out different tasks in Python. We will also offer participants the opportunity to present their research and/or projects and, where possible, we will assist them with both data collection and data analysis.
Prerequisites:
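As a flavour of the dictionary approach to sentiment analysis mentioned in the course content, here is a minimal sketch. It is an assumption-laden illustration, not the instructors' method: the two word lists are tiny and hand-made, and the scoring rule (positive minus negative hits, divided by total hits) is only one common convention.

```python
import re

# Tiny hand-made word lists, for illustration only (not a published dictionary)
POSITIVE = {"gain", "growth", "strong", "bullish", "improve"}
NEGATIVE = {"loss", "weak", "bearish", "decline", "risk"}


def sentiment_score(text):
    """Return (pos - neg) / (pos + neg), or 0.0 if no dictionary word is found."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


print(sentiment_score("Strong growth despite some risk"))
print(sentiment_score("Bearish: weak earnings, further decline"))
```

The score lies between -1 (only negative words) and 1 (only positive words). In applied work the word lists would come from an established, domain-specific dictionary rather than from a handful of hand-picked terms.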
Participants should have a basic understanding of computer programming. It is possible to follow the tutorial available at https://www.learnpython.org/ to learn or review the basics of programming in Python.
Participants must install Anaconda (https://www.anaconda.com/products/individual) to have a functional programming environment before the beginning of the course.
Academic papers:
- Altig, D., Baker, S., Barrero, J. M., Bloom, N., Bunn, P., Chen, S., … & Thwaites, G. (2020). Economic uncertainty before and during the COVID-19 pandemic. Journal of Public Economics, 191, 104274.
- Kearney, C., & Liu, S. (2014). Textual sentiment in finance: A survey of methods and models. International Review of Financial Analysis, 33, 171-185.
- Picault, M., Pinter, J., & Renault, T. (2022). Media sentiment on monetary policy: determinants and relevance for inflation expectations. Journal of International Money and Finance, Forthcoming.
- Picault, M., & Renault, T. (2017). Words are not all created equal: A new measure of ECB communication. Journal of International Money and Finance, 79, 136-156.
- Loughran, T., & McDonald, B. (2016). Textual analysis in accounting and finance: A survey. Journal of Accounting Research, 54(4), 1187-1230.
- Renault, T. (2020). Sentiment analysis and machine learning in finance: a comparison of methods and models on one million messages. Digital Finance, 2(1), 1-13.
- Renault, T. (2017). Intraday online investor sentiment and return patterns in the US stock market. Journal of Banking & Finance, 84, 25-40.
- Thorsrud, L. A. (2020). Words are the new numbers: A newsy coincident index of the business cycle. Journal of Business & Economic Statistics, 38(2), 393-409.
Books:
- Mitchell, R. (2018). Web scraping with Python: Collecting more data from the modern web. O’Reilly Media, Inc.
- Bengfort, B., Bilbro, R., & Ojeda, T. (2018). Applied text analysis with Python: Enabling language-aware data products with machine learning. O’Reilly Media, Inc.
- Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: Analyzing text with the natural language toolkit. O’Reilly Media, Inc.
Schedule (Updated: February 17, 2023)
Day 1 | p.m. | Introduction to Python |
Day 2 | a.m. | Application: Getting data from API and websites |
Day 2 | p.m. | Natural Language Processing |
Day 3 | a.m. | Application: Using NLP to analyse central bank communication |
Day 3 | p.m. | Sentiment Analysis |
Day 4 | a.m. | Machine learning using text as data |
Day 4 | p.m. | Application: Predicting Asset Prices using StockTwits |
Day 5 | a.m. | Advanced methods in text mining |
Day 5 | p.m. | Application: Latent Dirichlet Allocation on Glassdoor |
Schedule
Academic Year 2022-2023
March 6 to 10, 2023
UPDATE February 2023 (classroom numbers have been added):
- Monday 06/03/2023 – 14:00 – 17:00 N1A Building 2nd Floor Classroom: 220
- Tuesday 07/03/2023 – 9:00 – 12:00 & 14:00 – 17:00 17th Century Building 3rd Floor Classroom: 1715
- Wednesday 08/03/2023 – 9:00 – 12:00 & 14:00 – 17:00 N1A Building 2nd Floor Classroom: 220
- Thursday 09/03/2023 – 9:00 – 12:00 & 14:00 – 17:00 N1A Building 1st Floor Classroom: 126
- Friday 10/03/2023 – 10:00 – 12:00 & 14:00 – 16:00 17th Century Building 3rd Floor Classroom: 1715
Please register via https://hec-liege.idloom.events/doctoral-course-textual-analysis before February 28. Registration will close before this date if the maximum number of participants has been reached.