Aiera Launches Finance-Focused LLM Leaderboard

August 28, 2024

We’re excited to announce the release of Aiera’s new leaderboard on Hugging Face Spaces, designed to evaluate large language models (LLMs) on finance-specific tasks.

At Aiera, we use LLMs to extract insights from transcripts and documents. As we expand our applications, choosing the right model for each task becomes crucial. To maintain high-quality content for our users, we’ve developed internal benchmarks, and we’re now making a subset of these evaluations public.

Our leaderboard, powered by EleutherAI’s lm-evaluation-harness, assesses LLMs on four key tasks:

  1. Speaker Assignment (aiera_speaker_assign): Evaluates models on their ability to assign speakers to transcript segments and identify speaker changes.
  2. Earnings Call Summarization (aiera_ect_sum): Tests abstractive summarization capabilities for earnings call transcripts.
  3. Financial Q&A (finqa): Challenges models with calculation-based questions over financial texts.
  4. Transcript Sentiment Analysis (aiera_transcript_sentiment): Measures accuracy in determining financial sentiment from event transcript segments.
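
For readers who want to try the four tasks locally, here is a minimal sketch that builds an lm-evaluation-harness command line from the task identifiers listed above. It is an illustration, not the leaderboard's actual configuration. It assumes the `lm_eval` CLI is installed and that these task names are registered in your copy of the harness. The helper `build_lm_eval_command`, the `results` output path, and the model name are hypothetical.

```python
import shlex

# Task identifiers exactly as listed in the post.
AIERA_TASKS = [
    "aiera_speaker_assign",
    "aiera_ect_sum",
    "finqa",
    "aiera_transcript_sentiment",
]


def build_lm_eval_command(model_id, tasks=AIERA_TASKS, output_path="results"):
    """Return an lm_eval CLI invocation as an argument list.

    Nothing is executed here. Assumption: the tasks are registered in the
    installed harness; otherwise lm_eval will reject the task names.
    """
    return [
        "lm_eval",
        "--model", "hf",  # local Hugging Face backend, chosen for illustration
        "--model_args", f"pretrained={model_id}",
        "--tasks", ",".join(tasks),
        "--output_path", output_path,
    ]


if __name__ == "__main__":
    # "your-org/your-model" is a placeholder, not a real model.
    print(shlex.join(build_lm_eval_command("your-org/your-model")))
```

Printing the command rather than running it keeps the sketch free of side effects; paste the output into a shell once you have confirmed the tasks are available in your environment.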

All datasets used in these tasks are available on Hugging Face, promoting transparency and enabling further research in the field. Large-context models available via Hugging Face’s Serverless Inference API may be submitted for evaluation.

Note that our abstractive summarization dataset is currently biased toward Anthropic models because of how it was constructed. We’ve discussed this limitation in detail in a previous post.

This leaderboard is just the beginning. As we refine our processes and move into new areas, we plan to expand this benchmark to cover more finance-related tasks.

We invite the AI and finance communities to explore our leaderboard, contribute to the datasets, and join us in advancing the application of LLMs in the financial domain.