Financial Language Modelling
Problem
Open-source models like Google’s BERT-BASE architecture allow for state-of-the-art performance in natural language processing (NLP).
However, the BERT-BASE model is trained on general-domain text (English Wikipedia and BooksCorpus) and has not been exposed to finance-specific language and semantics, which limits the accuracy that financial data scientists can expect from their machine learning models.
Solution
LSEG Labs saw an opportunity to extend BERT-BASE and create finance-specific models that outperform the open-source equivalents by leveraging LSEG’s depth and breadth of unstructured financial data.
The team have trained two domain-specific versions of Google’s BERT language model using extensive News and Transcripts archives: BERT-RNA and BERT-TRAN.
The models have a better understanding of financial language, produce more accurate word embeddings, and ultimately can improve the performance of downstream tasks such as text classification, topic modelling, auto summarisation and sentiment analysis.
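To make the link between embeddings and a downstream task such as text classification concrete, here is a minimal sketch of a nearest-centroid classifier over document embeddings. It is an illustration only: the toy 3-dimensional vectors, the labels, and the helper names (`cosine`, `fit_centroids`, `predict`) are all assumptions made for the example, not part of LSEG Labs' models or API, and real BERT embeddings have far more dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def fit_centroids(embeddings, labels):
    """Average the embeddings of each class into one centroid per label."""
    grouped = {}
    for vec, label in zip(embeddings, labels):
        grouped.setdefault(label, []).append(vec)
    return {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in grouped.items()
    }

def predict(centroids, vec):
    """Return the label whose centroid is most similar to vec."""
    return max(centroids, key=lambda label: cosine(centroids[label], vec))

# Toy vectors standing in for real document embeddings (hypothetical data).
train_vecs = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.9, 0.2], [0.0, 0.8, 0.3]]
train_labels = ["controversy", "controversy", "neutral", "neutral"]
centroids = fit_centroids(train_vecs, train_labels)

print(predict(centroids, [0.85, 0.15, 0.05]))
```

In a setup like this, the classifier itself stays the same and only the embeddings change, so better embeddings would tend to show up as better downstream accuracy.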
Financial Language Modelling in action
Each of LSEG Labs’ two pre-trained models returns either a single document embedding or a vector of word embeddings:
1. BERT-RNA
This model is pre-trained on the Reuters News Archive, which consists of all Reuters articles published between 1996 and 2019.
On the downstream task of classifying financial news for ESG controversies, BERT-RNA outperformed BERT-BASE by 4% in terms of accuracy.
On the downstream task of classifying COVID-19-related news as either a risk or an opportunity, BERT-RNA again outperformed BERT-BASE by 4% in terms of accuracy.
2. BERT-TRAN
This model was pre-trained using a large corpus of earnings call transcripts, consisting of 390,000 transcripts and totalling 2.9bn words.
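The two output types mentioned above, a vector of word embeddings and a single document embedding, can be related by pooling. The sketch below shows mean pooling, one common way to collapse word embeddings into a document embedding. The source does not say how LSEG Labs derive their document embeddings, so this is an assumption for illustration; `mean_pool` is a hypothetical helper and the vectors are toy data.

```python
def mean_pool(word_embeddings):
    """Average a list of equal-length word vectors into one document vector."""
    if not word_embeddings:
        raise ValueError("need at least one word embedding")
    n = len(word_embeddings)
    # zip(*...) groups the values of each dimension across all words.
    return [sum(dim) / n for dim in zip(*word_embeddings)]

# Two toy 3-dimensional word vectors (hypothetical; real BERT vectors are much larger).
word_vectors = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
doc_vector = mean_pool(word_vectors)
print(doc_vector)  # [2.0, 1.0, 1.0]
```

The document vector has the same dimensionality as each word vector, which makes it a convenient fixed-size input for tasks such as classification or topic modelling.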
What we’re thinking next
Both models are now available on the Refinitiv Data Platform. LSEG Labs are also giving a small group of customers early access to the new models via a test user interface that includes tutorials, example training data and use cases. Their feedback will inform the next phase of the Financial Language Modelling project.
This is an early but important step towards scaling the understanding of trends and insights in finance’s unstructured data. The team hope their findings and results continue to help advance the performance of BERT in the financial industry.