ChatGPT and other AI models unable to analyze SEC filings, Patronus AI researchers find - TechStartups

Over the past year, ChatGPT and other large language models (LLMs), including Google's Bard and Anthropic's Claude, have gained widespread attention for their impressive abilities, ranging from coding, poetry, and songwriting to devising entire movie plots. They have also shown proficiency in diverse tasks, including passing law exams, Wharton MBA exams, and medical exams.

However, amid these advancements, challenges persist. A recent report from startup Patronus AI details the struggles large language models, including OpenAI's GPT-4-Turbo, face in analyzing Securities and Exchange Commission (SEC) filings. According to Patronus AI's findings, these models often falter when asked to provide accurate answers to questions derived from SEC filings.

In an interview with CNBC, the Patronus founders said that even the best-performing configuration they tested, OpenAI's GPT-4-Turbo given the ability to read nearly the entire filing alongside the question, achieved only a 79% accuracy rate on Patronus AI's new test.

The researchers said that the language models often either declined to respond or generated information that wasn't present in the SEC filings, a phenomenon commonly described as "hallucination." Patronus AI co-founder Anand Kannappan expressed dissatisfaction with the performance, stating:

“That type of performance rate is just absolutely unacceptable. It has to be much higher for it to really work in an automated and production-ready way.”

The findings underscore the hurdles AI models face as they are integrated into real-world products, particularly in regulated industries like finance, where major companies aim to use the technology for customer service and research. Quickly extracting key figures and analyzing financial narratives has been viewed as a promising application for chatbots, with the potential to provide a competitive edge in the financial sector.

This discovery also aligns with another study that found a significant decline in ChatGPT's ability to solve basic math problems: in a matter of a few months, its accuracy plummeted from 98% to a mere 2%.

While the potential of generative AI in the banking industry is substantial, significant hurdles remain. Incorporating LLMs into products is difficult because of their non-deterministic nature, which requires rigorous testing to ensure results stay consistent, on-topic, and reliable.

Patronus AI, founded by former Meta employees, aims to address this challenge by automating LLM testing using software. They created FinanceBench, a dataset with over 10,000 questions and answers drawn from SEC filings, establishing a “minimum performance standard” for language AI in the financial sector.
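To illustrate the general idea behind such a benchmark, here is a minimal sketch of how an evaluation harness could loop over question-answer pairs, query a model with the relevant filing excerpt, and report an accuracy score. The field names, sample data, and model interface below are hypothetical placeholders rather than Patronus AI's actual FinanceBench tooling, and exact-match scoring is a simplification; grading real financial answers typically calls for more tolerant comparison, such as numeric tolerances or human review.

```python
# Minimal sketch of benchmark-style evaluation over a Q&A dataset like the one
# described above. Field names, sample data, and the model_fn interface are
# hypothetical placeholders, not Patronus AI's actual FinanceBench tooling.
from typing import Callable, Dict, List


def normalize(text: str) -> str:
    """Crude normalization so trivially different phrasings still match."""
    return " ".join(text.lower().split())


def evaluate(examples: List[Dict[str, str]],
             model_fn: Callable[[str, str], str]) -> float:
    """Ask the model every question (with its filing excerpt as context)
    and return exact-match accuracy."""
    correct = 0
    for ex in examples:
        prediction = model_fn(ex["question"], ex["filing_excerpt"])
        if normalize(prediction) == normalize(ex["answer"]):
            correct += 1
    return correct / len(examples)


if __name__ == "__main__":
    # Illustrative placeholder entry; real benchmark data would be drawn
    # from actual SEC filings.
    sample = [
        {"question": "What was total revenue in fiscal 2022?",
         "filing_excerpt": "Total revenue for fiscal 2022 was $12.3 billion.",
         "answer": "$12.3 billion"},
    ]

    # Stand-in for a real LLM call (e.g., a chat-completion API request).
    def dummy_model(question: str, context: str) -> str:
        return "$12.3 billion"

    print(f"Accuracy: {evaluate(sample, dummy_model):.1%}")
```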

Patronus AI cofounders Anand Kannappan and Rebecca Qian (Credit: Patronus AI)

The co-founders emphasized the importance of more robust testing procedures, moving beyond manual evaluations. Through FinanceBench, Patronus AI seeks to provide companies with the assurance that their AI bots won’t deliver surprising or inaccurate answers, ultimately enhancing the reliability of language models in practical applications.

Sample test questions from FinanceBench

“We definitely think that the results can be pretty promising,” Kannappan said. He added, “Models will continue to get better over time. We’re very hopeful that in the long term, a lot of this can be automated. But today, you will definitely need to have at least a human in the loop to help support and guide whatever workflow you have.”

