Aiera Launches Finance-Focused LLM Leaderboard

We’re excited to announce the release of Aiera’s new leaderboard on Hugging Face Spaces, designed to evaluate large language models (LLMs) on finance-specific tasks.

At Aiera, we leverage LLMs to extract insights from transcripts and documents. As we expand our applications, choosing the right model for each task becomes crucial. To maintain high-quality content for our users, we’ve developed internal benchmarks, and we’re now making a subset of these evaluations public.

Our leaderboard, powered by EleutherAI’s lm-evaluation-harness, assesses LLMs on four key tasks:

Speaker Assignment (aiera_speaker_assign): Evaluates models on their ability to assign speakers to transcript segments and identify speaker changes.
Earnings Call Summarization (aiera_ect_sum): Tests abstractive summarization capabilities for earnings call transcripts.
Financial Q&A (finqa): Challenges models with calculation-based questions over financial texts.
Transcript Sentiment Analysis (aiera_transcript_sentiment): Measures accuracy in determining financial sentiment from event transcript segments.
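
For readers who want to try these tasks locally, here is a minimal sketch that assembles a command line for the lm-evaluation-harness `lm_eval` CLI from the four task names above. It is not a description of how the leaderboard itself runs evaluations. The helper name `build_lm_eval_command`, the placeholder model id `org/model-name`, and the optional `task_dir` argument are assumptions made for illustration. Whether the Aiera tasks ship with your installed harness version is also an assumption, so the sketch allows pointing at a directory of task configs with `--include_path`.

```python
import shlex

# Task names exactly as listed in the post.
AIERA_TASKS = [
    "aiera_speaker_assign",
    "aiera_ect_sum",
    "finqa",
    "aiera_transcript_sentiment",
]


def build_lm_eval_command(model_id, tasks=AIERA_TASKS, task_dir=None):
    """Build an argv list for the lm-evaluation-harness `lm_eval` CLI.

    model_id: a Hugging Face model id (a placeholder in the example below).
    task_dir: optional directory of task YAML configs, passed through
              --include_path. This is an assumption for the case where the
              tasks are not bundled with the installed harness version.
    """
    cmd = [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={model_id}",
        "--tasks", ",".join(tasks),
    ]
    if task_dir is not None:
        cmd += ["--include_path", task_dir]
    return cmd


if __name__ == "__main__":
    # "org/model-name" is a placeholder, not a real model id.
    print(shlex.join(build_lm_eval_command("org/model-name")))
```

The function only builds the command and does not run it, so the result can be inspected or edited before being passed to a shell or to `subprocess.run`.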

All datasets used in these tasks are available on Hugging Face, promoting transparency and enabling further research in the field. Large-context models available via Hugging Face’s Serverless Inference API may be submitted for evaluation.

It’s worth noting that our abstractive summarization dataset currently shows a bias towards Anthropic models due to its construction method. We’ve discussed this limitation in detail in a previous post.

This leaderboard marks just the beginning. As we refine our processes and venture into new areas
