AI Hybridization and Price Hikes

A lot of words for “things are going to get expensive.”
Jun 5, 2026

About 2 months ago I noticed both NVIDIA and Apple were positioning themselves to supply hardware for running 4-bit quantized models in the home. Open source model developers had already been working on this problem for a while with tech like MoE, to make more powerful models available on less expensive hardware. This led to my prediction: bleeding edge AI is trying to find its way into our homes this year, but more so next.

They are all taking different approaches to reduce hardware costs. Apple’s platform, with its unified memory architecture, made for a natural choice. Its high-end, GPU-level memory speeds are a perfect match for inference, but it lacks the raw TOPS for training and decoding. NVIDIA, on the other hand, has been more focused on pairing raw horsepower with lower memory speeds. It seems Apple is focused more on running AI in homes and NVIDIA would like to help us train larger models for less money. In any case, I do not think either offering is a coincidence. NVIDIA doubled down recently. Their Linux-oriented machines have been available for over a year, but now they are coming to Windows. And Microsoft is so invested in the idea that they even made some Linux commands available without the use of WSL. It’s likely the joint venture was delayed by the need to better prepare Windows for ARM and Linux tools. They were building out a compatibility layer between two worlds.

NVIDIA announced about a month ago that they are working with Dell to get raw AI horsepower on-prem. It’s interesting timing, to say the least.

Companies have been blowing through their AI budgets in record time, after positioning themselves to be AI first, AI always, even at the cost of jobs. It makes sense. If we assume a human and an AI are equivalent in cost and performance, then they can have AI “employees” that work 24 hours a day. They never ask for a break or time off. They don’t get sick (but downtime is real). And the assumption with tech is always that it gets better and cheaper, quickly, on an exponential scale. The whole Moore's law thing, the J-curve we were accustomed to for decades.

It hasn’t become cheaper though. The opposite is happening.

New frontier models are getting more and more expensive, while open source models in the Qwen and DeepSeek varieties have changed, too. The biggest and best open source models are now being gated behind paywalls. They too knew a change was coming and they want their cut.

Previously, a high-end GPU found in homes could run a 27b sized model well, but with MoE we can now run 70b class models that keep up with the last generation of frontier models. That is to say: a home user is within reach of something like Sonnet 4.6, without a subscription.
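To put rough numbers on why 4-bit quantization matters here, below is a back-of-the-envelope sketch in Python. It only counts the memory needed to hold the weights, assuming exactly 4 or 16 bits per weight. Real 4-bit formats store some extra scaling data, and the KV cache and runtime overhead come on top, so treat these as lower bounds. The function name is my own, not from any library. Note that an MoE model still needs all of its weights in memory, even though only a fraction of them are used for each token.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes).

    Assumption: every parameter is stored at exactly `bits_per_weight` bits.
    KV cache, activations, and quantization metadata are ignored.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Model sizes mentioned in this post: 27b, 70b class, and 120b.
    for size in (27, 70, 120):
        fp16 = weight_memory_gb(size, 16)
        q4 = weight_memory_gb(size, 4)
        print(f"{size}b: ~{fp16:.1f} GB at 16-bit, ~{q4:.1f} GB at 4-bit")
```

By this estimate a 70b class model drops from roughly 140 GB of weights at 16-bit to roughly 35 GB at 4-bit, which is the difference between data center hardware and a well-equipped home machine.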

With price hikes continuing, users are losing access to Sonnet 4.6 / GPT 5.4 level models. They can no longer use them for a whole month’s worth of coding for $40. $100-200 a month is what’s realistic now, a 2.5 to 5x price hike for many.
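The arithmetic behind that multiplier, plus what it adds up to over a year, can be checked with a few lines of Python. The dollar figures are the ones from this post; the variable names are just for illustration.

```python
old_monthly = 40                 # what a month of coding used to cost, in USD
new_monthly_range = (100, 200)   # what is realistic now, in USD

# Price multiplier at the low and high end of the new range.
multipliers = [new / old_monthly for new in new_monthly_range]

# Extra spend per year compared to the old price.
extra_per_year = [(new - old_monthly) * 12 for new in new_monthly_range]

print(f"Multiplier: {multipliers[0]}x to {multipliers[1]}x")
print(f"Extra per year: ${extra_per_year[0]} to ${extra_per_year[1]}")
```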

Now that we have gained access to affordable hardware that can run 4-bit quants while losing only minimal amounts of quality, it was time for Qwen, DeepSeek, etc. to capitalize. If users are priced out of the next generation of frontier models and can only afford to subscribe to the higher capabilities of the Chinese offerings, then many have seen and will see the value proposition. Some have been willing to pay for the much less expensive alternatives, leading to tremendous potential growth for the Chinese products. They were patient, waited for this moment, and are capitalizing. It rarely pays to be first and they know this. They are clearly trying to create an offering where home users get their cake, but businesses foot the bill. It's not a bad model by any means, but it does restrict access to power users who have, or want to invest in, workstation class GPUs.

As an aside, this is a result of China not having the same access to very high end components. They made the push for high quality quantizations, because that's the hardware they have access to. So when they offer hosted solutions at a cheaper price than companies like Anthropic and OpenAI, it's because they are running it on much less expensive hardware.

So what will power the next generation of affordable models? The same players probably, but I wouldn’t expect much more for free. And now that the models are restricted to the 70b class, but hardware is gearing towards 120b support, who will fill that gap? That I do not know. Perhaps another open source provider, or NVIDIA as it continues to develop its Nemo series. The problem with Nemo and GPT-OSS is that they are meant to be blank slates. They require significant harnessing to make them effective at tasks like coding, but will always fall behind a model designed for it. The good news is Copilot gives us that harnessing for free and we can plug in our own models, regardless of whether they are local or in the cloud. This is why I am so invested in my own home AI setup.

Pay up or no access to the good stuff is the new norm. Investors want their returns, both domestically and abroad.

That’s not to say it’s all worrisome. If you have some good local hardware, being locked into current gen AI still leaves you in a good place, at a low cost. If you are trying to bring that type of power at scale, for your organization, there are options now, but it’s far from affordable. Companies will have to decide if they want to invest in their own infrastructure, continue to use providers with rising costs, or switch to foreign providers that put data at an even greater risk. It’s an age-old problem really. When I was in IT these types of decisions were made every day. I worked for companies that wanted their own labs, companies that wanted cost effective on-prem solutions, and companies that wanted to go full cloud. There isn’t a right answer, outside of ROI. More infrastructure means dealing with aging hardware and more human labor. Choosing to go with managed solutions means accounting for rising costs, but access to bleeding edge tech.
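The ROI question can be roughed out as a break-even calculation: how many months until the cumulative bill from a provider catches up with what you spent on your own hardware plus running it? Here is a minimal sketch. Every number in the example is hypothetical and only there to show the shape of the math; the function and parameter names are my own. It ignores hardware depreciation, labor, and resale value, all of which matter in a real forecast.

```python
from typing import Optional


def months_to_break_even(
    hardware_cost: float,
    local_monthly_cost: float,
    provider_monthly_cost: float,
    provider_monthly_growth: float = 0.0,
    max_months: int = 120,
) -> Optional[int]:
    """Return the first month where cumulative provider spend meets or
    exceeds cumulative local spend, or None if that never happens
    within `max_months`.

    provider_monthly_growth is a fractional price increase applied each
    month (0.02 means the provider bill rises 2% per month).
    """
    local_total = hardware_cost   # up-front purchase
    provider_total = 0.0
    price = provider_monthly_cost
    for month in range(1, max_months + 1):
        local_total += local_monthly_cost   # power, maintenance, etc.
        provider_total += price
        if provider_total >= local_total:
            return month
        price *= 1 + provider_monthly_growth
    return None


if __name__ == "__main__":
    # Hypothetical: $3000 workstation, $50/month to run, vs. a $200/month plan.
    print(months_to_break_even(3000, 50, 200))
    # Same setup, but the provider raises prices 2% every month.
    print(months_to_break_even(3000, 50, 200, provider_monthly_growth=0.02))
    # A $40/month plan never catches up to the local costs in this model.
    print(months_to_break_even(3000, 50, 40))
```

The point of the growth parameter is the one made above: the math changes when provider costs keep rising, so the same hardware purchase pays for itself sooner.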

The landscape that’s playing out seems to be no different than traditional IT vs cloud. Cloud largely won, but some companies have pulled back in recent years. The math changes over time, so companies must be prepared to make forecasts and stay on top of trends.

The next era of AI compute is going to get expensive, fast. NVIDIA knows their memory bandwidth holds them back, which is why their next architecture is based around HBM, rather than GDDR. It’s much faster, but also much, much more expensive, during a time when memory costs have soared. As far as I can tell, NVIDIA has no desire to keep hardware costs low, but they are trying to soften the blow with software, e.g. by supplying efficient models like we see with the Nemo family.

There are a lot of factors at play and some I haven't even mentioned. NVIDIA and others have been trying to get reliable edge TPUs distributed into homes. It's a clear indication that both bringing AI data centers online and getting the power to feed them are happening too slowly. Companies already have marketplaces in place to rent GPUs from people, but they are plagued with issues around reliability. I think it's more likely they will aim for putting them in businesses that can supply higher amounts of power and more reliable internet.

I have found following the AI hardware trends to be highly informative as to what is coming next. I will continue to do so and share my insights and predictions. But do keep in mind, these trends seem to be changing on a month-to-month basis or so. It's a lot of change, expensive change at the speed of sound.