While powerful, current cutting-edge LLMs may not meet the needs of specialized sectors. We introduce KodeXv0.1, a family of language models that surpasses GPT-4 in financial question answering. We adapt the Llama 3.1 8B and 70B variants to finance with a custom training regime: we collect publicly available financial documents such as earnings calls and business reports, and use them to create a high-quality synthetic dataset of Context-Question-Answer triplets that closely mirrors real-world tasks. Using this dataset, we perform RAG-aware 4-bit LoRA instruction tuning on the Llama 3.1 base variants to produce KodeX-8Bv0.1 and KodeX-70Bv0.1. We then evaluate the models on FinanceBench, FinQABench, and a withheld test set. Our results show that KodeX-8Bv0.1 is more reliable in financial contexts than comparable models, surpassing them by up to 9.24%, and even outperforms GPT-4 by up to 7.07%. KodeX-70Bv0.1 improves on this further, exceeding GPT-4's performance on every benchmark tested.
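To illustrate what "RAG-aware" instruction tuning means in practice, the sketch below shows one plausible way a Context-Question-Answer triplet could be formatted into a training example: the retrieved context is embedded in the prompt so the model learns to ground its answer in the provided passage. The field names and prompt template are illustrative assumptions, not the paper's actual format.

```python
# Hypothetical sketch of turning a synthetic CQA triplet into a
# prompt/completion pair for RAG-aware instruction tuning.
# The template below is an assumption, not the paper's actual format.

def format_cqa_example(context: str, question: str, answer: str) -> dict:
    """Build a training example whose prompt contains the retrieved context,
    so the tuned model learns to answer from the supplied passage."""
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"### Context:\n{context}\n\n"
        f"### Question:\n{question}\n\n"
        "### Answer:\n"
    )
    return {"prompt": prompt, "completion": answer}

# Example triplet (invented for illustration):
example = format_cqa_example(
    context="Q3 revenue rose 12% year-over-year to $4.2B.",
    question="By how much did Q3 revenue grow?",
    answer="Q3 revenue grew 12% year-over-year.",
)
```

During supervised fine-tuning, the loss would typically be computed only on the completion tokens, so the model is trained to produce grounded answers rather than to reproduce the context.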
Co-Authors
Neel Rajani, PhD Candidate in Responsible NLP, University of Edinburgh; BSc in Computing Science, University of Glasgow
Lilli Kiessling, MSc Candidate in Computational Neuroscience, Bernstein Center for Computational Neuroscience (BCCN) & Technische Universität Berlin; BSc in Physics, Technische Universität Berlin
Aleksandr Ogaltsov, Data Scientist (AI & ML), Kodex AI
Claus Lang, CTO & Co-Founder, Kodex AI