When ChatGPT is Harmful: How General-Purpose AI Can Ruin Your Project

Companies continue to invest in GPT-like models, but their versatility is increasingly becoming the cause of failure. Expert Ivan Budnik explains why choosing the "most powerful neural network" leads to unnecessary expenses and ineffective AI projects.

This material is part of RBC's new Education section, where we discuss how to develop skills, make informed decisions, and advance your career consciously.

In 2025, a study by S&P Global Market Intelligence found that 42% of companies had abandoned most of their AI initiatives, up from just 17% a year earlier. The technology itself is not the issue: it works just fine. The problem is that companies try to solve every kind of problem with the same tool.

We have seen many projects where teams deployed GPT-4 where a simpler model, such as a regression, would have been more effective. As a result, the system was slow, the budget ran out quickly, and users were dissatisfied. Yet at company meetings people kept repeating, "But GPT is the most powerful model!"

However, in such cases, power doesn't matter—it's important to choose a tool that's truly suited to the specific task. We'll explore why GPT isn't a panacea and what to do when versatility becomes a problem.

Why isn't there one neural network for all situations?

If we need to hammer in a nail, we use a hammer. We could try a microscope instead: it is heavy, and you can hit a nail with it. But it is still the wrong tool.

The same thing happens with neural networks. There are different types of models, each developed for a specific data format. Some work better with sequentially presented data, such as time series. Others are more effective at analyzing text and large data sets. There are models specifically designed for image generation.

Each model type is designed for its own application, and mathematics backs this up. The No Free Lunch theorem states that, averaged over all possible problems, no algorithm outperforms every other: an algorithm that excels on one class of problems pays for it on another. Different problems require different approaches, and this is a proven result.

Therefore, a model trained on one type of data will not automatically perform well on another. For example, if you take a network trained on cat photos and simply feed it medical images, the results will be weak unless you fine-tune it for the specifics of medicine.

What to look for when choosing a neural network

Typically, we start with three questions: what data we have, what we need to obtain, and how much of that data is available. The type of data is the basis for choosing a tool.

  • If you work with text, use transformer models. This type of neural network underlies all the top text processing models (and many others): GPT-4/5, Claude, BERT, LLaMA, Mistral, and more. In 2025, BERT, GPT-4/4.1, Claude 4.5, and their various modifications lead in almost all text processing tasks.
  • For images, use other solutions. Services like Midjourney, Nano Banana, and Seedream are suitable for image generation. Vision Transformers are used for cloud image analytics, and compact models like MobileNet are suitable for mobile apps.
  • Gradient boosting algorithms work best with tables and numerical data, primarily XGBoost and CatBoost.

Data size also matters. If you have fewer than 10,000 examples, training large models such as transformers from scratch is ineffective. In this case, either use transfer learning, that is, take a pre-trained model (such as BERT, Vision Transformer, or MobileNet) and fine-tune it for your specific task, or choose simpler algorithms.
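
The rules of thumb above can be written down as a small lookup. This is only a sketch: the function name, the category labels, and the wording of the answers are illustrative assumptions, and the 10,000-example threshold is the one quoted in the text.

```python
def suggest_model_family(data_type: str, n_examples: int) -> str:
    """Return a starting-point model family for a task.

    Hypothetical helper: it only encodes the rules of thumb from the
    article and is not a substitute for testing several models.
    """
    small_data = n_examples < 10_000  # threshold quoted in the article

    if data_type == "tabular":
        # Tables and numerical data: gradient boosting first.
        return "gradient boosting (XGBoost, CatBoost)"
    if data_type == "text":
        if small_data:
            return "fine-tune a pre-trained transformer (e.g. BERT) or use a simpler algorithm"
        return "transformer"
    if data_type == "image":
        if small_data:
            return "fine-tune a pre-trained vision model (e.g. MobileNet) or use a simpler algorithm"
        return "vision model (Vision Transformer, MobileNet)"
    raise ValueError(f"unknown data type: {data_type!r}")


print(suggest_model_family("tabular", 3_000))
print(suggest_model_family("text", 5_000))
```

The point of the sketch is the order of the questions: data type first, then data volume, and only then a concrete model.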

Why ChatGPT and similar tools can't be used for all tasks

An LLM (large language model) is a model such as GPT-4, GPT-4.1, Claude 4.5, or Llama 3. It is trained on huge text datasets and can write, summarize, answer questions, and translate. But it is not always the best tool.

The biggest mistake is using large language models everywhere possible. Explosion.ai calls this "LLM maximalism": companies build an LLM into every process. Need to filter spam? Use GPT-4. Create a summary? Again, GPT-4 or Claude 4.5. Extract dates from text? Again, an LLM.

Problems arise immediately. These models are slower than conventional algorithms, and users are unwilling to wait ten seconds for a response that previously took one. Costs also rise: each unit of text processed by the model (a token) costs money, and at high volumes an LLM bill runs into thousands of dollars.
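
A back-of-the-envelope calculation shows how per-token pricing adds up. The traffic figures and the price per 1,000 tokens below are made-up assumptions for illustration, not real rates of any provider.

```python
def monthly_llm_cost(requests_per_day: int,
                     tokens_per_request: int,
                     usd_per_1k_tokens: float,
                     days: int = 30) -> float:
    """Estimate the monthly bill when every request goes through an LLM."""
    total_tokens = requests_per_day * tokens_per_request * days
    return total_tokens / 1000 * usd_per_1k_tokens


# Hypothetical workload: 50,000 texts a day, 1,500 tokens each,
# at an assumed price of 0.01 USD per 1,000 tokens.
cost = monthly_llm_cost(requests_per_day=50_000,
                        tokens_per_request=1_500,
                        usd_per_1k_tokens=0.01)
print(f"${cost:,.2f} per month")
```

Under these assumed numbers the bill comes to about 22,500 dollars a month, which is why sending every text to the LLM gets expensive as volume grows.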

A real-life example: a company's reputation monitoring system (one that automatically tracks, analyzes, and evaluates what is being said online about a brand, company, or person) was initially built entirely on an LLM. The model filtered texts, generated summaries, and extracted the necessary data. After a month the problems were clear: the system was too slow for real-time operation, it became too expensive as the data volume grew, and there was no way to check the summaries against the original texts.

The solution turned out to be simple. The architecture was split into stages: first, a regular classifier filters out noise and the text is broken into sentences; the LLM is then used only for summaries. As a result, the system became faster and significantly cheaper.
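
The staged architecture can be sketched as follows. This is not the company's actual code: the keyword check stands in for the real classifier, `summarize` is a placeholder for whatever LLM call is used, and all function names are hypothetical.

```python
import re
from typing import Callable, List, Optional


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def is_relevant(sentence: str, brand_terms: List[str]) -> bool:
    """Stand-in for the cheap classifier: keep sentences mentioning the brand."""
    lowered = sentence.lower()
    return any(term.lower() in lowered for term in brand_terms)


def monitor(text: str,
            brand_terms: List[str],
            summarize: Callable[[List[str]], str]) -> Optional[str]:
    """Run the cheap stages first and call the expensive model only if needed."""
    relevant = [s for s in split_sentences(text) if is_relevant(s, brand_terms)]
    if not relevant:
        return None  # noise never reaches the LLM
    return summarize(relevant)


# A fake summarizer so the sketch runs without any external service.
result = monitor("Acme shipped late. Nice weather today! Support at ACME was great.",
                 ["Acme"],
                 summarize=lambda sentences: " ".join(sentences))
print(result)
```

Because the summarizer receives the exact sentences that passed the filter, each summary can be traced back to its source text, which the all-LLM version did not allow.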

When Speed Matters More Than Power: Which AI Models to Use and Why

Transformers are today's leading tools for working with text. They can analyze very large volumes of information and even process text, images, and audio simultaneously.

But they have a drawback: the longer the text, the slower they work. With huge amounts of data, transformers become too "heavy," so faster models are needed for such tasks.
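
One reason for this slowdown, not spelled out in the text, is that standard self-attention compares every token with every other token, so the work grows roughly with the square of the text length. The sketch below only counts those comparisons for a single attention layer; real systems add many other costs, and optimized attention variants can reduce this growth.

```python
def attention_pairs(n_tokens: int) -> int:
    """Token-to-token comparisons in one full self-attention layer."""
    return n_tokens * n_tokens


# Doubling the text length quadruples the number of comparisons.
for n in (1_000, 2_000, 4_000):
    print(n, attention_pairs(n))
```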

That said, older architectures such as RNNs and LSTMs remain useful. They are faster, require fewer resources, and suit devices that process data locally rather than in the cloud. They perform very well in real-time tasks: for example, they can accurately recognize human actions from sensor data.

Diffusion models have greatly advanced AI-powered image and media creation. Tools like Stable Diffusion, DALL-E, and Midjourney create high-quality, diverse images, and the diffusion approach is also applied to audio, video, and code.

However, these models are slow: they require a lot of computation to generate, so they are not suitable for applications where the result is needed instantly.

Data Drives Results: How Models Tolerate Error and Noise

Large neural networks are very sensitive to data quality. If the data contains errors, incorrect labels, or a bias toward one class, the model will consistently make mistakes. Without preliminary data preparation (cleaning, balancing, and standardizing the data), the results of such models become unpredictable.

When we say "neural networks," many people imagine something huge and complex. But there are also simpler, more stable, and more predictable algorithms that often produce excellent results when the data is junk or incomplete. These include Random Forest and XGBoost.

  • Random Forest is an algorithm that makes decisions using not one model but many small ones (decision trees), each looking at the data from its own perspective. The models then vote, and the answer with the most votes wins. Its strength: it is robust to data errors and performs reliably even with noise, missing values, and skewed data.
  • XGBoost is an algorithm that builds a forecast gradually, step by step, each time correcting its previous errors. It combines many simple models that successively complement each other, which makes it one of the most accurate data analysis methods. Its strength: it works very well on real, imperfect data and delivers high accuracy.
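
The two ideas above, voting and step-by-step error correction, can be shown in a few lines of plain Python. This is a teaching sketch, not how the Random Forest or XGBoost libraries are implemented: the "weak model" here is a one-split stump on a single feature, and all function names are illustrative.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Stump = Tuple[float, float, float]  # (threshold, value if x <= threshold, value otherwise)


def majority_vote(predictions: Sequence[str]) -> str:
    """Forest-style aggregation: the most common prediction wins."""
    return Counter(predictions).most_common(1)[0][0]


def fit_stump(xs: Sequence[float], targets: Sequence[float]) -> Stump:
    """Pick the single split that minimizes squared error on the targets."""
    best = None
    for threshold in sorted(set(xs)):
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right) if right else 0.0
        error = (sum((t - left_value) ** 2 for t in left)
                 + sum((t - right_value) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1], best[2], best[3]


def boost(xs: Sequence[float], ys: Sequence[float],
          rounds: int = 20, learning_rate: float = 0.5) -> Tuple[List[Stump], List[float]]:
    """Boosting loop: each new stump is fitted to the errors left so far."""
    predictions = [0.0] * len(ys)
    stumps: List[Stump] = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [p + learning_rate * (left_value if x <= threshold else right_value)
                       for x, p in zip(xs, predictions)]
    return stumps, predictions


def mse(ys: Sequence[float], predictions: Sequence[float]) -> float:
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


print(majority_vote(["spam", "ham", "spam"]))
xs, ys = [1, 2, 3, 4, 5, 6], [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
for rounds in (1, 5, 20):
    print(rounds, round(mse(ys, boost(xs, ys, rounds=rounds)[1]), 4))
```

The training error shrinks as rounds are added because every stump targets what the previous ones got wrong. Production libraries add regularization, subsampling, and many other refinements on top of this basic loop.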

A good example of failure is the wave of models built to detect COVID-19 from images. Most of them failed in real-world use. The main reasons were simple:

  • Mixed, inappropriate data. Information was taken from different sources, sometimes with overlap, which disrupted learning.
  • Equipment bias. The models learned to recognize not the disease but the characteristics of the devices used to take the images.
  • Shifting shooting conditions. For example, if the "healthy" photos always feature sunny weather, the model learns to detect the sun rather than signs of illness.
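
The first problem in the list, overlap between data sources, is the easiest to check mechanically. Below is a minimal sketch that flags records appearing in both the training and the test set by hashing their raw bytes. It catches exact duplicates only; near-duplicates or several images of the same patient need other checks. The function names are illustrative.

```python
import hashlib
from typing import Iterable, Set


def fingerprint(record: bytes) -> str:
    """Content hash of one record (for example, the raw bytes of an image file)."""
    return hashlib.sha256(record).hexdigest()


def overlapping_records(train: Iterable[bytes], test: Iterable[bytes]) -> Set[str]:
    """Return fingerprints of test records that also occur in the training set."""
    train_hashes = {fingerprint(r) for r in train}
    return {h for h in (fingerprint(r) for r in test) if h in train_hashes}


train_set = [b"scan-001", b"scan-002", b"scan-003"]
test_set = [b"scan-003", b"scan-004"]
print(len(overlapping_records(train_set, test_set)), "overlapping record(s)")
```

A non-empty result means the test score is inflated, because the model is partly evaluated on data it has already seen.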

How to avoid working blindly

First, understand the problem, not the models. Once you understand what exactly needs to be solved, it becomes clear which type of model is best suited.

Don't limit yourself to a single method—test several models and compare which one produces the best results. AutoML tools can help with this—these are services that automatically select the optimal algorithm, configure its parameters, and check its performance.

With AutoML (automated machine learning), you no longer have to manually:

  • select an algorithm;
  • configure parameters;
  • try different approaches;
  • compare the results with each other.

Tools such as AutoGluon, FLAML, or H2O let you quickly try dozens of model options and choose the one that shows the best results on your data, without manual tuning or lengthy experiments.
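
What these tools automate can be shown in miniature: fit several candidate models on the same training data, score each on held-out data, and keep the best. The sketch below uses two toy regressors in plain Python and does not call any AutoML library; the function names and the data are illustrative assumptions.

```python
from statistics import mean
from typing import Callable, Dict, Sequence, Tuple

Model = Callable[[float], float]
Data = Tuple[Sequence[float], Sequence[float]]


def fit_mean(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Baseline: always predict the average of the training targets."""
    average = mean(ys)
    return lambda x: average


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Least-squares straight line through the training points."""
    x_bar, y_bar = mean(xs), mean(ys)
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / variance
             if variance else 0.0)
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def mse(model: Model, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Mean squared error of a fitted model on the given data."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def pick_best(candidates: Dict[str, Callable[..., Model]],
              train: Data, valid: Data) -> Tuple[str, Dict[str, float]]:
    """Fit every candidate on train, score on valid, return the winner and all scores."""
    scores = {name: mse(fit(*train), *valid) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


train = ([0, 1, 2, 3], [0.1, 2.0, 3.9, 6.1])
valid = ([4, 5], [8.0, 10.1])
best, scores = pick_best({"mean": fit_mean, "line": fit_line}, train, valid)
print(best, scores)
```

Real AutoML tools run the same loop over far more model families and also search each model's settings, but the selection principle is the same: the held-out score decides, not the model's reputation.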

Don't just look at accuracy. It's important to understand:

  • how stable the model is;
  • whether its decisions can be explained;
  • how much it costs to run.

Sometimes a model with lower accuracy but a faster response is the better choice for real-world work.
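
One way to make this trade-off explicit is to treat speed and cost as hard limits and accuracy as the thing to maximize within them. The candidates and all their numbers below are invented for illustration, and the `Candidate` class is a hypothetical structure, not part of any library.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    accuracy: float              # share of correct answers on a validation set
    latency_ms: float            # typical response time
    cost_per_1k_requests: float  # in USD, assumed figures


def choose(candidates: List[Candidate],
           max_latency_ms: float,
           max_cost_per_1k: float) -> Optional[Candidate]:
    """Most accurate model among those that fit the latency and cost limits."""
    eligible = [c for c in candidates
                if c.latency_ms <= max_latency_ms
                and c.cost_per_1k_requests <= max_cost_per_1k]
    return max(eligible, key=lambda c: c.accuracy, default=None)


models = [
    Candidate("large LLM", accuracy=0.94, latency_ms=8_000, cost_per_1k_requests=15.0),
    Candidate("fine-tuned BERT", accuracy=0.92, latency_ms=120, cost_per_1k_requests=0.4),
    Candidate("gradient boosting", accuracy=0.89, latency_ms=5, cost_per_1k_requests=0.01),
]
winner = choose(models, max_latency_ms=500, max_cost_per_1k=1.0)
print(winner.name if winner else "no model fits the limits")
```

With these made-up figures the most accurate model is excluded by the latency limit, and the second most accurate one is selected.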

And be sure to record all experiments: what parameters you tried, what results you got. This will allow you to reproduce successful solutions in the future and help your colleagues continue their work without guesswork.
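
A minimal way to keep such a record is an append-only log with one JSON object per line. The sketch below uses only the standard library; the file layout and function names are assumptions, and dedicated experiment-tracking tools offer much more.

```python
import json
import time
from pathlib import Path
from typing import Any, Dict, List


def log_experiment(path: str, model_name: str,
                   params: Dict[str, Any], metrics: Dict[str, float]) -> Dict[str, Any]:
    """Append one experiment record to a JSON Lines file and return it."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "model": model_name,
        "params": params,
        "metrics": metrics,
    }
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def load_experiments(path: str) -> List[Dict[str, Any]]:
    """Read all records back, oldest first."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

For example, calling `log_experiment("experiments.jsonl", "xgboost", {"max_depth": 4}, {"accuracy": 0.91})` after each run (the values here are placeholders) gives colleagues a plain-text history they can read, filter, and reproduce.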

Tools to help you choose the right model

  • AutoGluon (AWS) is one of the most powerful AutoML tools. It can work with tables, text, and images, automatically combining multiple models into a powerful ensemble. According to its authors, in two popular Kaggle competitions it beat 99% of participants after just four hours of training on raw data, without complex data preparation.
  • H2O AutoML emphasizes transparency. It reveals the factors that influence the model's decisions and generates clear reports. This is especially important for industries where every decision must be explained, such as banking or medicine.
  • FLAML (Microsoft Research) optimizes performance within a budget. It selects models so as to minimize the computational resources used. It works well with popular tools such as scikit-learn, XGBoost, LightGBM, and transformers.
  • By 2025, Hugging Face had become the leading platform for ready-to-use models. The SmolVLM model, with 256 million parameters, requires less than 1 GB of video memory, yet its quality is reported to surpass that of some models hundreds of times larger. It can even run on an iPhone. The LeRobot library for robotics tasks has been released. Over 10,000 models are already integrated with Azure AI Foundry, simplifying development.
  • PyTorch Lightning has greatly simplified model training on high-powered computing platforms. It allows you to debug your model in real time without restarting, easily scale from a single GPU to hundreds, and pause experiments and resume them later.
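
The memory figure quoted for SmolVLM in the list above can be sanity-checked with simple arithmetic. The sketch assumes 16-bit weights (2 bytes per parameter) and counts the weights only, ignoring activations and runtime overhead, so it gives a lower bound rather than a measurement.

```python
def weight_memory_gb(n_params: int, bytes_per_param: int = 2) -> float:
    """Memory taken by model weights alone, in gigabytes (10**9 bytes)."""
    return n_params * bytes_per_param / 1e9


# 256 million parameters stored as 16-bit numbers (an assumption).
print(weight_memory_gb(256_000_000), "GB for the weights")
```

Under this assumption the weights take roughly half a gigabyte, which is consistent with the model fitting into less than 1 GB of video memory.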
