There was AI before LLMs... and there will be after
Do you believe in life after ChatGPT?
When GPT-3 came out, I didn’t pay much attention to it. I had been working with AI models for about six years, mostly on data analytics and prediction models, and GPT-3 was the latest in a line of ever-better Natural Language Processing (NLP) models that were still nowhere near reliable. At the time I was working with symbolic AI (where you write the rules the AI will follow, as opposed to letting it figure them out itself with Machine Learning) for applications that required reliability.
Then 2022 happened: OpenAI packaged GPT-3.5 into ChatGPT, and the world changed.
But there was AI before LLMs, and there will be AI after. The bubble will burst, I believe, as we become more demanding of AI that is transparent, reliable, explainable, and private. We’ll revert to purpose-built models instead of blanket LLM solutions for everything.
This series will cover the types of models I believe will come back, what they’re good at, and how they can be used in a pipeline to produce much better results than an LLM, at lower cost and more sustainably. It’s not targeted at data scientists or AI engineers, but at people trying to figure out what to make of this technology, and how they can use it for themselves or their businesses.
Natural Language Processing Models
Encoder models (BERT, RoBERTa, etc.)
Encoder models are designed to understand and analyze text by converting it into mathematical representations called “embeddings.” Think of it like translating text into a format that captures its meaning in numbers, which can then be compared, classified, or searched.
Encoders excel at specific analysis tasks:
Text classification: Is this a positive review or negative? Is it a recipe or an essay? Is this email spam or legitimate?
Named entity recognition (NER): Who are the people/places/dates/organizations mentioned in this document?
Semantic search: Finding text that’s similar in meaning, not just matching keywords. “Car accident” would match “vehicular crash” even though the words are different.
These models have been workhorses for years in applications like spam filters and sentiment analysis. The key advantage: they can be fine-tuned on a relatively small dataset (hundreds or thousands of examples, not millions) to achieve excellent performance on your specific task. They’re also smaller and faster than LLMs, often running in milliseconds rather than seconds.
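To make that concrete, here’s a minimal sketch of semantic search using the open-source sentence-transformers library. The model name and example phrases are just for illustration - swap in your own text and a model that fits your domain.

```python
# A minimal semantic search sketch with an encoder model.
# Assumes the sentence-transformers library; phrases are illustrative.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

query = "car accident"
documents = [
    "vehicular crash on the highway",
    "recipe for banana bread",
    "quarterly sales report",
]

# Encode text into embeddings (vectors that capture meaning)
query_emb = model.encode(query, convert_to_tensor=True)
doc_embs = model.encode(documents, convert_to_tensor=True)

# Cosine similarity: higher score = closer in meaning, even with different words
scores = util.cos_sim(query_emb, doc_embs)[0]
for doc, score in zip(documents, scores):
    print(f"{score.item():.2f}  {doc}")
```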
Traditional NLP libraries (SpaCy, NLTK)
These tools parse text using linguistic rules and small, specialized pipelines rather than huge general-purpose models. SpaCy is particularly good at breaking text into structured pieces - identifying tokens (individual words or punctuation), finding parts of speech (nouns, verbs, adjectives), and mapping dependencies (which words relate to which).
Traditional NLP is deterministic: you write explicit rules, and the system follows them exactly. This makes it perfect for extracting structured information (dates, email addresses, customer IDs, order numbers), finding grammatical relationships between words, and matching text based on patterns you define.
These tools were used for summarization, plagiarism detection, and topic matching. The real power comes from combining them with encoder models: traditional NLP extracts the structured pieces, then encoders handle the semantic understanding. Together they create hybrid systems that are both precise and intelligent.
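Here’s a short sketch of what that extraction looks like with spaCy, assuming its standard en_core_web_sm pipeline is installed. The sample text and the order-number pattern are made up for illustration.

```python
# A short sketch of structured extraction with spaCy.
# Assumes the en_core_web_sm pipeline is installed; text and regex are illustrative.
import re
import spacy

nlp = spacy.load("en_core_web_sm")
text = "Order #48213 was placed by Jane Doe on March 3rd and shipped to Berlin."

doc = nlp(text)

# Tokens with part-of-speech tags and dependency relations
for token in doc:
    print(token.text, token.pos_, token.dep_)

# Named entities found by the pipeline (people, places, dates, organizations...)
for ent in doc.ents:
    print(ent.text, ent.label_)

# Deterministic pattern matching for the pieces rules handle best, like order numbers
order_ids = re.findall(r"#\d+", text)
print(order_ids)
```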
Sequence-to-sequence models (T5, BART)
Seq2seq models combine encoding and decoding in one architecture. They first encode the input to deeply understand it, then decode that understanding into a different output format. This two-phase process makes them ideal for transformation tasks.
They excel at:
Translation: Take English text, output Spanish text
Summarization: Take a long article, output the key points in a few sentences
Question answering: Take a document and a question, output just the answer
Format conversion: Take messy text, output clean structured data
Seq2seq models work in two steps: read everything and compress it into a representation, then generate from that. LLMs do it all in one go - they process input and generate output using the same mechanism. For tasks like translation where you need to fully understand a sentence before translating it, that two-step approach is often more accurate and efficient.
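As a concrete example, here’s a minimal summarization sketch using a BART model through the Hugging Face transformers library. The model choice and input text are illustrative, not a recommendation.

```python
# A minimal seq2seq summarization sketch with Hugging Face transformers.
# The model name and article text are illustrative.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = (
    "The city council met on Tuesday to discuss the new transit plan. "
    "After three hours of debate, members voted 7-2 to fund two additional "
    "bus lines and extend night service on the existing network."
)

# Encode the full input first, then decode a shorter output from that representation
summary = summarizer(article, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```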
Data Analytics and Prediction Models
Tree-based models (Random Forests, XGBoost, LightGBM)
These models don’t understand or generate text - they predict outcomes and classify data based on patterns. Think about it this way: if you want to understand if your customer is happy or not, you use NLP. If you want to predict whether they’ll churn, you use XGBoost.
Tree-based models work by creating decision trees (or thousands of them in the case of Random Forests) that learn patterns from your data. They’re incredibly powerful for:
Churn prediction: Which customers are likely to cancel their subscription?
Fraud detection: Is this transaction legitimate or suspicious?
Risk scoring: What’s the likelihood this loan will default?
Demand forecasting: How much inventory will we need next quarter?
A well-tuned XGBoost model can predict customer churn with scary accuracy, and it’ll tell you exactly which features drove that prediction. “This customer has an 85% churn risk because they haven’t logged in for 30 days, downgraded their plan, and opened two support tickets.” That’s actionable information you can use.
The beauty of these models is explainability. You can see exactly what factors led to a prediction and how much each one mattered. This is critical for regulated industries where you need to explain your decisions.
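Here’s a hedged sketch of what churn prediction with XGBoost looks like in practice. The feature names, toy data, and numbers are invented purely for illustration - a real model would train on far more rows and columns.

```python
# A hedged churn-prediction sketch with XGBoost on made-up tabular features.
# Column names, data, and settings are illustrative, not a real dataset.
import pandas as pd
from xgboost import XGBClassifier

# Toy training data: one row per customer
X = pd.DataFrame({
    "days_since_login": [2, 35, 1, 40, 28, 3],
    "plan_downgraded":  [0, 1, 0, 1, 1, 0],
    "support_tickets":  [0, 2, 1, 3, 2, 0],
})
y = [0, 1, 0, 1, 1, 0]  # 1 = churned

model = XGBClassifier(n_estimators=50, max_depth=3, eval_metric="logloss")
model.fit(X, y)

# Predicted churn probability for a new customer
new_customer = pd.DataFrame({
    "days_since_login": [30], "plan_downgraded": [1], "support_tickets": [2]
})
print("churn risk:", model.predict_proba(new_customer)[0, 1])

# Which features drove the model's decisions overall
for name, importance in zip(X.columns, model.feature_importances_):
    print(name, round(float(importance), 3))
```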
Classic classifiers (Logistic Regression, Naive Bayes)
Before we had neural networks for everything, we had these classic algorithms that were ridiculously effective for specific tasks. They’re fast, they require way less training data, and they tell you exactly why they made a decision.
Naive Bayes was the king of spam filters, and it works much like the way your brain makes quick judgments: when you glance at an email and instantly know it’s spam, your brain isn’t analyzing every word carefully - it’s pattern matching. You see “free,” “viagra,” “click here” and your brain says “I’ve seen this combination before in spam emails, high probability this is spam.” That’s exactly what Naive Bayes does. It calculates the probability something is spam based on the words it contains and patterns it learned from training data. Simple math, super fast, surprisingly accurate.
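Here’s a minimal sketch of that spam filter with scikit-learn. The tiny training set is made up, but the mechanics - count the words, apply Bayes’ rule over those counts - are the real thing.

```python
# A minimal Naive Bayes spam filter sketch with scikit-learn.
# The tiny training set is invented purely for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "Free offer, click here to claim your prize",
    "Meeting moved to 3pm, see agenda attached",
    "You won! Click here for your free reward",
    "Can you review the quarterly report draft?",
]
labels = ["spam", "ham", "spam", "ham"]

# Count word occurrences, then apply Bayes' rule over those counts
classifier = make_pipeline(CountVectorizer(), MultinomialNB())
classifier.fit(emails, labels)

print(classifier.predict(["Click here for a free prize"]))        # likely spam
print(classifier.predict_proba(["Click here for a free prize"]))  # probability per class
```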
Logistic Regression works similarly to how you weigh evidence to make a decision. Think about deciding whether to approve a loan: you’re not just looking at one factor, you’re weighing multiple signals - income level, credit history, debt ratio - and each one influences your decision by a certain amount. Logistic Regression does exactly this mathematically. It assigns weights to each feature and combines them to produce a probability score. “This loan application has a 75% approval probability because: stable income (35% influence), low debt ratio (40% influence), good credit history (25% influence).”
The key advantage of both these models: they work like your brain’s “fast thinking” system - the one that lets you make instant judgments based on patterns you’ve learned. They work with hundreds of training examples, not thousands or millions. And they tell you which features matter most. A loan denial from Logistic Regression can say “denied because debt-to-income ratio (40% weight) and recent credit inquiries (35% weight).” Try getting that from an LLM.
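And a similar sketch for Logistic Regression on a toy loan-approval example. The features, data, and resulting weights are invented for illustration (in practice you’d also scale the features and train on far more applications).

```python
# A hedged loan-approval sketch with logistic regression.
# Features, data, and weights are invented for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: income (in $1000s), debt-to-income ratio, years of credit history
X = np.array([
    [85, 0.20, 10],
    [40, 0.55,  2],
    [95, 0.15, 15],
    [30, 0.60,  1],
    [70, 0.30,  8],
    [35, 0.50,  3],
])
y = [1, 0, 1, 0, 1, 0]  # 1 = approved

model = LogisticRegression(max_iter=1000)
model.fit(X, y)

applicant = [[60, 0.35, 5]]
print("approval probability:", model.predict_proba(applicant)[0, 1])

# The learned weights show how strongly each feature pushes the decision
for name, weight in zip(["income", "debt_ratio", "credit_history"], model.coef_[0]):
    print(name, round(float(weight), 3))
```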
Time series models (ARIMA, Prophet)
Time series models are specialized for data that changes over time. They look at historical patterns - trends, seasonality, cycles - to predict what comes next.
They’re used for:
Sales forecasting: What will revenue look like next quarter based on historical patterns?
Inventory planning: When will we run out of stock at current demand levels?
Capacity planning: How many support agents will we need during peak season?
Anomaly detection: Is this week’s traffic normal or is something wrong?
ARIMA models work like how you naturally think about patterns over time. When you look at your coffee shop’s sales, you instinctively separate out different types of patterns: “We’re growing overall” (trend), “we’re always busier in the morning” (daily pattern), “Mondays are slow” (weekly pattern). That’s essentially what ARIMA does - it strips out the trend (through differencing) and, in its seasonal variant, the recurring cycles, then uses the series’ own recent history to forecast what comes next. Your brain does this pattern separation automatically when you say “traffic is bad today, but it’s always bad on Friday afternoons.”
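Here’s a minimal ARIMA forecasting sketch with the statsmodels library. The synthetic sales series and the model order are illustrative, not tuned for any real dataset.

```python
# A minimal seasonal ARIMA forecasting sketch with statsmodels.
# The synthetic monthly sales series and the chosen order are illustrative.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Three years of monthly sales: upward trend plus a yearly seasonal bump
months = pd.date_range("2021-01-01", periods=36, freq="MS")
sales = 100 + np.arange(36) * 2 + 15 * np.sin(np.arange(36) * 2 * np.pi / 12)
series = pd.Series(sales, index=months)

# Differencing handles the trend; the seasonal terms capture the yearly cycle
model = ARIMA(series, order=(1, 1, 1), seasonal_order=(1, 0, 1, 12))
fit = model.fit()

# Forecast the next 6 months
print(fit.forecast(steps=6))
```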
Prophet takes this further by mimicking how you’d account for real-world messiness. You know that a holiday throws off normal patterns, that missing data one day doesn’t invalidate the whole pattern, that there are multiple overlapping cycles (daily, weekly, yearly). Prophet is designed to handle exactly this - missing values, outliers, and multiple seasonal patterns layered on top of each other. It’s particularly good at forecasting metrics like daily active users, sales, or website traffic where you have human-driven patterns (weekday vs weekend, holidays, back-to-school season).
The advantage of time series models over just throwing data at an LLM: they work like your brain’s pattern recognition for temporal data. Just like you instinctively know that retail sales spike in December every year, that website traffic drops on weekends, that the long-term growth trend matters separately from seasonal ups and downs - these models understand those patterns mathematically. And like your brain making a prediction (“probably 20-30 people will show up”), they give you forecasts with confidence intervals - not just a number, but a range and a probability.
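And a short Prophet sketch to show those forecast ranges in action. The synthetic daily-traffic data is invented for illustration.

```python
# A hedged Prophet forecasting sketch with uncertainty intervals.
# The synthetic daily-traffic data is invented for illustration.
import numpy as np
import pandas as pd
from prophet import Prophet

# Prophet expects a dataframe with columns 'ds' (date) and 'y' (value)
dates = pd.date_range("2023-01-01", periods=365, freq="D")
traffic = 1000 + np.arange(365) * 0.5 + np.where(dates.dayofweek < 5, 120, 0)
df = pd.DataFrame({"ds": dates, "y": traffic})

model = Prophet(weekly_seasonality=True, yearly_seasonality=True)
model.fit(df)

# Forecast 30 days ahead: yhat is the prediction, yhat_lower/upper the range
future = model.make_future_dataframe(periods=30)
forecast = model.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```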
Putting it together: How real systems work
Here’s the secret that companies that have been doing AI for years know: you don’t pick one model type. You don’t “deploy AI” by building one LLM tool. You build pipelines where each model does what it’s best at.
Customer support system example:
Traditional NLP extracts structured data (customer ID, order number, dates)
Encoder model classifies issue type and sentiment
Random Forest scores escalation risk based on all these signals
Semantic search (using encoders) finds similar past tickets
Seq2seq model generates response draft if it’s low-risk
Rules validate the response before sending
Each piece is specialized and reproducible. Each piece is fast and cheap. Each piece can be audited and debugged.
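Here’s a simplified sketch of how those steps could be wired together. Every helper function below is a hypothetical stand-in for one of the specialized models described above - the stubs just return fixed values so the flow runs end to end.

```python
# A simplified orchestration sketch of the support pipeline described above.
# Every helper is a hypothetical stand-in; stubs return fixed values so it runs.
import re

def extract_fields(text):                        # traditional NLP: IDs, order numbers
    return {"order_ids": re.findall(r"#\d+", text)}

def classify_ticket(text):                       # encoder model: issue type + sentiment
    return "billing", "negative"

def score_escalation(fields, issue, sentiment):  # Random Forest on those signals
    return 0.4

def find_similar(text):                          # semantic search over past tickets
    return ["ticket-1042", "ticket-0987"]

def draft_reply(text, similar):                  # seq2seq model drafts a response
    return "Thanks for reaching out about your order..."

def passes_rules(draft, fields):                 # deterministic checks before sending
    return len(draft) > 0

def handle_ticket(text):
    fields = extract_fields(text)
    issue, sentiment = classify_ticket(text)
    risk = score_escalation(fields, issue, sentiment)
    similar = find_similar(text)

    if risk > 0.7:
        return {"action": "escalate_to_human", "context": similar}

    draft = draft_reply(text, similar)
    if passes_rules(draft, fields):
        return {"action": "send", "reply": draft}
    return {"action": "review", "reply": draft}

print(handle_ticket("Order #48213 arrived damaged and I want a refund."))
```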
Or fraud detection:
Traditional rules catch the obvious red flags (transaction from new country, amount over limit)
Logistic Regression does quick probability scoring on transaction patterns
XGBoost provides deeper risk assessment incorporating hundreds of features
Time series model detects if spending pattern is anomalous for this user
High-risk transactions get blocked, medium-risk get additional verification
The expensive LLM? You use it only for edge cases where the specialized models aren’t confident, or for generating the customer-facing explanation. Not for everything.
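A hedged sketch of that routing logic, with hypothetical stand-ins for each scorer and invented thresholds.

```python
# A hedged sketch of the fraud-routing logic described above. The scoring
# functions are hypothetical stand-ins (rules, logistic regression, XGBoost,
# a time series anomaly check); thresholds are invented for illustration.
def rule_flags(txn):            # hard rules: new country, amount over limit
    return txn["amount"] > 10_000 or txn["country"] not in txn["usual_countries"]

def quick_score(txn):           # logistic regression on basic transaction features
    return 0.3

def deep_score(txn):            # XGBoost over hundreds of engineered features
    return 0.55

def is_anomalous(txn):          # time series check against this user's history
    return False

def route_transaction(txn):
    if rule_flags(txn):
        return "block"
    risk = max(quick_score(txn), deep_score(txn))
    if is_anomalous(txn):
        risk += 0.2
    if risk > 0.8:
        return "block"
    if risk > 0.5:
        return "verify"          # step-up verification for medium risk
    return "approve"             # only unclear edge cases would ever reach an LLM

print(route_transaction({"amount": 120, "country": "FR", "usual_countries": ["FR", "DE"]}))
```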
Why this matters now
The LLM hype made everyone think they needed to use GPT-4 for everything. But reality is setting in:
Cost: Running an LLM on every transaction or customer interaction adds up fast
Privacy: Can’t send healthcare or financial data to external APIs
Reliability: LLMs hallucinate; classification models can be wrong, but they fail in measurable, predictable ways
Explainability: Regulated industries need to explain decisions
Sustainability: LLMs consume enormous amounts of energy - both to run the computations and to cool the data centers (some facilities use millions of gallons of water for cooling)
Most business AI tasks are repetitive and specific. You’re not generating random content or asking questions about the universe. You’re categorizing tickets, extracting data, predicting churn, forecasting demand, routing emails. These tasks don’t need a model that knows everything about everything. They need a model that knows your specific domain really well.
Purpose-built models are cheaper, faster, more accurate for their specific tasks, and you can run them on your own infrastructure. The data never leaves your servers. When something breaks, you can actually figure out why and fix it.
The takeaway
The future of practical AI isn’t about using the biggest, newest model for everything. It’s about understanding your toolbox and picking the right tools for each job - the same way your brain uses different types of thinking for different problems.
Sometimes that’s an LLM for open-ended tasks where you need broad knowledge and flexibility. Often it’s an encoder for understanding text, a Random Forest for predictions, or even a Naive Bayes classifier that’s been around for decades. These models have survived because they work - they’re the patterns and shortcuts your systems use to make fast, accurate decisions on specific tasks.
That’s what this series is about - understanding what these different tools do, so you can make informed decisions about what to use when. Not following hype, but solving real problems efficiently. In the next articles, we’ll dive deeper into how to combine these tools into pipelines that actually work in production.


