Hedge funds are spending billions on AI-powered trading. Retail investors are asking ChatGPT for stock tips. This essay tests what AI can actually do in stock selection as of 2026, and what it still can't.
The Scoreboard
The Results So Far
Here is what the data says. AI-powered quant funds (hedge funds that use mathematical models and algorithms to pick investments instead of relying on human judgment alone) have been running for years. LLMs have been tested against financial benchmarks since 2023. The results are mixed in a specific way: AI is good at some things and bad at others, and the distinction matters.
88%
Of active equity funds underperformed the S&P 500 over 15 years
This is the baseline. Human stock pickers, even professionals with teams and data, fail to beat the index most of the time. AI has to clear this bar, not just beat a random portfolio.
~2%
Annual alpha (returns above what the market delivers: if the S&P 500 returns 10% and your fund returns 12%, your alpha is 2%) generated by the best AI quant funds
Renaissance Technologies, Two Sigma, and DE Shaw use ML models extensively. The best generate 2-4% annual alpha after fees. Good, but not the revolution you'd expect given the hype.
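To make the alpha arithmetic concrete, here is a minimal sketch using the essay's hypothetical 10% benchmark / 12% fund example (illustrative numbers, not real fund data):

```python
def alpha(fund_return: float, benchmark_return: float) -> float:
    """Alpha: fund return minus benchmark return, both as decimals."""
    return fund_return - benchmark_return

# The essay's example: the benchmark returns 10%, the fund returns 12%.
edge = alpha(0.12, 0.10)  # 0.02, i.e. 2% alpha

# Why a small edge matters over decades: the ratio of the fund investor's
# wealth to the benchmark investor's wealth after 20 years of compounding.
wealth_ratio = ((1 + 0.12) / (1 + 0.10)) ** 20  # roughly 1.43x
```

Two percent a year looks small, but compounded over 20 years it leaves the fund investor with roughly 43% more wealth than the index investor.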
The honest answer: AI can find edges in stock selection, but the edges are small, temporary, and expensive to maintain. The biggest advantage AI has isn't picking winners. It's processing information faster than humans.
Where AI Has an Edge
Three Things AI Does Better Than Humans
01
Processing Earnings Calls at Scale
A human analyst can deeply analyze maybe 20-30 companies. An LLM can process every earnings call transcript in the S&P 500 within minutes of publication. It can flag sentiment changes, unusual language patterns, and guidance shifts across hundreds of companies simultaneously. The edge isn't better analysis of one company. It's decent analysis of all companies, instantly.
02
Removing Emotional Bias
Humans anchor to their previous beliefs. If you bought NVIDIA at $50 and it's now at $130, you are emotionally attached to the position. An AI model doesn't care what it recommended yesterday. It evaluates each data point fresh. In backtests, systematic models that remove behavioral bias generate 1-2% annual alpha purely from avoiding the mistakes humans make.
03
Alternative Data Processing
Satellite imagery of retail parking lots. Credit card transaction data. Shipping container tracking. Job posting velocity. These data sources existed before AI but were too large for human analysts to process. ML models can ingest terabytes of alternative data and find correlations that predict revenue surprises before they appear in earnings reports.
AI's stock-picking edge comes from speed and scale, not from deeper understanding. An LLM doesn't understand a business the way a veteran analyst does. It processes data patterns across thousands of businesses faster than any human team.
Where AI Fails
Four Things AI Still Can't Do
01
Predicting Regime Changes
AI models are trained on historical data. They perform well when the future resembles the past. When something changes at the structural level, the models break. COVID crashed markets in ways no historical pattern predicted. Interest rate pivots caught quant models off guard. AI excels in normal markets and struggles in abnormal ones, which is exactly when you need the edge most.
02
Understanding Narrative and Context
An LLM can read that a CEO said "we are cautiously optimistic" on an earnings call. It can flag the word "cautiously" as a negative sentiment shift. But it cannot understand the political dynamics of why the CEO chose those words, or that this particular CEO always understates results before blowout quarters. Narrative context requires judgment that current AI lacks.
03
Tail Risk Assessment
TSMC's stock could drop 50% tomorrow if geopolitical tensions spike. That risk exists every day. An AI model can assign a probability to it based on historical geopolitical events, but it cannot assess whether today's specific diplomatic signals represent an actual escalation or routine posturing. Tail risks are by definition rare, which means there isn't enough training data to model them well.
04
Maintaining Edge Over Time
Every AI edge gets arbitraged away. When one quant fund discovers that satellite parking lot data predicts retail earnings, other funds buy the same data within months. The alpha decays. AI edges in public markets have a half-life of 6-18 months before competitors replicate them. Staying ahead requires constant innovation, not a one-time model build.
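The half-life framing can be made concrete with a simple exponential-decay sketch. The 4% starting edge and the decay model itself are illustrative, applied to the 6-18 month range above:

```python
def remaining_alpha(initial_alpha: float, months: float, half_life_months: float) -> float:
    """Exponential decay: the edge halves every half_life_months.
    The half-life model is an illustration of the essay's 6-18 month
    estimate, not a measured law."""
    return initial_alpha * 0.5 ** (months / half_life_months)

# A hypothetical 4% edge, one year after discovery, under both bounds:
fast = remaining_alpha(0.04, 12, 6)   # 0.04 * 0.5**2 = 0.01 -> 1% left
slow = remaining_alpha(0.04, 12, 18)  # roughly 2.5% left
```

Even at the slow end, a quarter of the edge is gone within a year, which is why quant funds treat new data sources as depreciating assets.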
The LLM Test
What Happens When You Ask ChatGPT for Stock Picks
Researchers have tested this. Multiple academic papers in 2024 and 2025 asked LLMs to pick stocks based on earnings call transcripts, financial statements, and news sentiment. Here's what they found.
LLM Stock-Picking Performance (Academic Studies)
Annual return vs. S&P 500 benchmark, various methodologies
Method                                  Annual return   Alpha     Note
LLM sentiment on earnings transcripts   —               —         Modest alpha from sentiment signals in transcripts.
LLM + financial statement analysis      +16.8%          +4.7%     Combining language + numbers outperformed either alone.
LLM with chain-of-thought reasoning     +13.5%          +1.4%     Reasoning improved explanations but not returns.
Naive "ask ChatGPT" picks               +9.8%           -2.3%     Underperformed. Generic recommendations, no edge.
S&P 500 (benchmark)                     +12.1%          0%        The bar to clear.
The pattern: structured AI analysis with specific financial data outperforms generic AI advice. Asking ChatGPT "what stocks should I buy?" underperforms the index. Feeding an LLM earnings transcripts and financial statements and asking it to score companies on specific criteria produces modest but real alpha.
The catch: these are academic backtests. Live trading introduces slippage (the gap between the price you expect to trade at and the price you actually get), transaction costs, and the fact that by the time a public paper shows an edge, the edge is already partly arbitraged.
AI stock picking works best when it's specific, structured, and combined with traditional financial data. It works worst when it's generic, conversational, and used as a replacement for thinking. The tool amplifies the quality of the question you ask it.
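The "specific and structured" approach can be sketched as a rubric: score each company on named criteria, then rank a weighted composite. The criteria, weights, and scores below are invented; in the studies, an LLM assigns the criterion scores from transcripts and filings:

```python
# Hypothetical rubric-style scoring. Each company gets explicit criterion
# scores in [0, 1] (hand-invented here; an LLM would assign them from
# earnings transcripts and financial statements), and a weighted composite
# produces a ranking instead of a vague buy/sell opinion.
WEIGHTS = {"transcript_sentiment": 0.3, "revenue_growth": 0.4, "margin_trend": 0.3}

scores = {
    "ACME":    {"transcript_sentiment": 0.2, "revenue_growth": 0.9, "margin_trend": 0.6},
    "GLOBEX":  {"transcript_sentiment": 0.8, "revenue_growth": 0.4, "margin_trend": 0.5},
    "INITECH": {"transcript_sentiment": 0.5, "revenue_growth": 0.5, "margin_trend": 0.2},
}

def composite(criterion_scores: dict) -> float:
    """Weighted sum of criterion scores."""
    return sum(WEIGHTS[c] * s for c, s in criterion_scores.items())

ranking = sorted(scores, key=lambda co: composite(scores[co]), reverse=True)
```

The value is not in the arithmetic; it is that the criteria are explicit, so the output can be audited and the question the model answers is narrow.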
For Individual Investors
How to Use AI Without Getting Burned
Use AI For
Screening large universes of stocks on specific criteria. Summarizing earnings calls you don't have time to read. Checking whether a thesis holds up against the financial statements. Calculating metrics like PEG ratios (a stock's P/E divided by its growth rate: below 1.0 is cheap, above 2.0 is expensive) across 50 companies at once. Speed and breadth tasks where human capacity is the bottleneck.
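As one example of the screening task, a PEG screen across a small universe (tickers and figures invented for illustration):

```python
# PEG = P/E divided by expected earnings growth rate (in percent).
# The company data below is invented for illustration.
companies = {
    "ACME":    {"pe": 18.0, "growth_pct": 25.0},
    "GLOBEX":  {"pe": 30.0, "growth_pct": 10.0},
    "INITECH": {"pe": 12.0, "growth_pct": 15.0},
}

def peg(pe: float, growth_pct: float) -> float:
    """PEG ratio: P/E relative to growth. Below 1.0 reads as cheap."""
    return pe / growth_pct

# Keep only names under the "below 1.0 is cheap" threshold.
cheap = sorted(c for c, d in companies.items()
               if peg(d["pe"], d["growth_pct"]) < 1.0)
```

The screen does the breadth work; deciding whether a cheap PEG reflects a bargain or a business in decline remains the human's 20%.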
Don't Use AI For
Making buy/sell decisions without your own judgment. Predicting what will happen next quarter. Timing market entries and exits. Replacing the work of understanding a business model. AI gives you data processing. It does not give you conviction, position sizing, or the ability to hold through drawdowns.
80/20
The AI investor's rule
Use AI for 80% of the data processing: screening, summarizing, calculating. Use your own judgment for the 20% that matters: deciding what to buy, how much, and when to sell. The humans who use AI as a research amplifier will outperform both the humans who ignore AI and the humans who blindly follow it.
~2%
That's the alpha the best AI-driven funds generate. Enough to matter over decades. Not enough to skip your own homework. AI is the best research assistant in history. It is not yet a replacement for the investor.
How I Built This
Performance data from SPIVA reports, academic papers on LLM financial analysis, and publicly reported quant fund returns.
Active Fund Underperformance
SPIVA U.S. Scorecard, 15-year data through 2025
The 88% underperformance figure comes from S&P Dow Jones Indices' SPIVA reports, which track active fund performance against benchmarks. The exact percentage varies by time period (85-92% over 10-20 year periods). The directional point holds: most active managers underperform.
Quant Fund Alpha
Estimated from public reporting, not audited returns
Renaissance Technologies, Two Sigma, and DE Shaw do not publicly disclose detailed returns. The "2-4% annual alpha" estimate comes from investor letters, press reporting, and academic studies of quant fund performance. Medallion Fund (Renaissance) generates far higher returns but is closed to outside investors and operates differently from most AI-driven funds.
LLM Backtest Results
Compiled from published academic papers, 2024-2025
The performance figures in the table are representative of academic findings, not exact replications of specific papers. Returns vary by time period, stock universe, and methodology. All are backtests, not live trading results. Backtests typically overstate real-world performance by 1-3% due to lookahead bias and transaction cost assumptions.
Alpha Decay
6-18 month half-life estimate from industry research
The alpha decay half-life is an industry estimate, not a precise measurement. It varies by strategy type. Some alternative data edges last months. Some fundamental edges last years. The point is that AI edges in public markets are not permanent, and maintaining them requires ongoing investment in new data and models.
Jesse Walker has been an individual investor for 30 years. Before that, he was a poker professional, which is where he learned that the best decision and the best outcome aren't always the same thing. He writes about navigating the financial uncertainties of AI.
Nothing on this site constitutes investment advice. All content is for informational purposes only. Full terms.