August 28, 2024
At Aiera, we use LLMs to extract insights from transcripts and documents. As our applications expand, choosing the right model for each task becomes crucial. To maintain high-quality content for our users, we’ve developed internal benchmarks, and we’re now making a subset of these evaluations public.
All datasets used in these tasks are available on Hugging Face to promote transparency and enable further research in the field. Large-context models available through Hugging Face’s Serverless Inference API can be submitted for evaluation.
It’s worth noting that our abstractive summarization dataset currently shows a bias towards Anthropic models due to its construction method. We’ve discussed this limitation in detail in a previous post.
This leaderboard is just the beginning. As we refine our processes and move into new areas, we plan to expand this benchmark with additional finance-related tasks and keep pushing the boundaries of LLM performance in the domain.
We invite the AI and finance communities to explore our leaderboard, contribute to the datasets, and join us in advancing the application of LLMs in the financial domain.