
LLM Industry Insights: Practitioner Discussions

~7 minute read

Prerequisites: LLM API Pricing, LLMOps Cost Optimization

In eight months (June 2025 to February 2026), o3 inference pricing fell by 80%, open-weight models in the 650B-1.2T parameter range are being served at $2-3/M tokens (GPT-4o-mini territory), and lifetime inference compute now exceeds training compute by more than 10x. These figures come from real production discussions (Hacker News, Simon Willison) and provide the context that provider marketing materials lack: where frontier labs actually make money, why 90% of subscriptions are "used at 10%", and why autonomous coding burns millions of dollars on throwaway code.

Part 1: LLM Inference Economics (HN Discussion, Feb 2026)

The Big Question: Are Frontier Labs Profitable on Inference?

Thread: "Does anyone with more insight into the AI/LLM industry happen to know if the cost to run them in normal user-workflows is falling?"

Key Insights from Practitioners

1. Inference Costs Are Falling

"The cost per token served has been falling steadily over the past few years across basically all of the providers. OpenAI dropped the price they charged for o3 to ⅕th of what it was in June last year thanks to 'engineers optimizing inferencing'." — simonw (Simon Willison)

Reason: "Turns out there was a lot of low-hanging fruit in terms of inference optimization that hadn't been plucked yet."

2. Profitability Debate

| Viewpoint | Argument |
| --- | --- |
| Profitable on inference | "It's quite clear these companies make money on each marginal token. They've said this directly and analysts agree." — chis |
| Subsidizing inference | "I have not seen any reporting that Anthropic or OpenAI is able to make money on inference yet. It can both be true that it has gotten cheaper AND that they still are subsidizing costs." — cootsnuck |
| Training is the loss leader | "They are obviously losing money on training. I think they are selling inference for less than what it costs." — simonw |

3. Open-Weight Model Benchmarks

"The largest open models atm are DeepSeek V3 (~650B params) and Kimi K2.5 (1.2T params). They are being served at $2-2.5-3/M tokens. That's Sonnet / GPT-mini / Gemini 3 Flash price range." — NitpickLawyer

Insight: If third-party providers can serve 650B-1.2T parameter models profitably at $2-3/M tokens, frontier labs likely have margin at the $10-15/M-token tier.
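
A minimal back-of-envelope sketch of that insight, not real provider data: the $2-3/M and $10-15/M prices come from the thread, while the GPU-hour cost, replica size, and throughput below are assumptions invented for the example.

```python
# Rough serving-margin estimate. Prices are from the HN thread; the hardware
# numbers (GPU-hour cost, GPUs per replica, throughput) are illustrative guesses.

GPU_HOUR_COST_USD = 2.00             # assumed rental price per H100-class GPU-hour
GPUS_PER_REPLICA = 8                 # assumed GPUs needed to host one large model replica
TOKENS_PER_SEC_PER_REPLICA = 5_000   # assumed aggregate throughput with batching

def serving_cost_per_million_tokens() -> float:
    """Provider-side cost to generate 1M tokens under the assumptions above."""
    tokens_per_hour = TOKENS_PER_SEC_PER_REPLICA * 3600
    hourly_cost = GPU_HOUR_COST_USD * GPUS_PER_REPLICA
    return hourly_cost / tokens_per_hour * 1_000_000

cost = serving_cost_per_million_tokens()
for price in (2.5, 12.5):            # open-weight tier vs frontier tier, $/M tokens
    margin = (price - cost) / price
    print(f"price ${price}/M, cost ${cost:.2f}/M, gross margin {margin:.0%}")
```

Under these made-up hardware numbers the per-token serving cost lands well below both price tiers, which is the shape of the practitioners' argument, even though the real cost structure (idle capacity, prefill/decode mix, free tiers) is not public.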

4. Model Lifetime Compute

"By now, model lifetime inference compute is >10x model training compute, for mainstream models. Further amortized by things like base model reuse." — ACCount37

5. Subscription Economics

"Most of those subscriptions go unused. I barely use 10% of mine. So my unused tokens compensate for the few heavy users." — slopusila

"Our company, one of big global conglomerates that went for Copilot... at least 1500 users enrolled. I am pretty convinced that only a small part of users use even 10% of their quota." — sandos

Key insight: The 80/20 rule applies heavily — light users subsidize heavy users.
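
A minimal sketch of that blending effect; the plan price, quota, per-token cost, and user mix below are invented to roughly match the ~10% utilization anecdotes, not reported figures.

```python
# Blended subscription economics: light users subsidize heavy users.
# All numbers here are assumptions chosen to match the ~10% average-utilization
# anecdotes from the thread; none are disclosed provider figures.

PLAN_PRICE_USD = 20.0            # assumed flat monthly subscription price
QUOTA_TOKENS = 10_000_000        # assumed monthly token quota per seat
COST_PER_M_TOKENS = 3.0          # assumed provider serving cost per 1M tokens

# (share of users, fraction of quota they actually use)
user_mix = [(0.80, 0.02), (0.15, 0.25), (0.05, 1.00)]

avg_utilization = sum(share * used for share, used in user_mix)
avg_cost = sum(
    share * used * QUOTA_TOKENS / 1_000_000 * COST_PER_M_TOKENS
    for share, used in user_mix
)
heavy_user_cost = QUOTA_TOKENS / 1_000_000 * COST_PER_M_TOKENS

print(f"average utilization: {avg_utilization:.0%}")                       # ~10%
print(f"average serving cost per seat: ${avg_cost:.2f} vs ${PLAN_PRICE_USD} plan")
print(f"cost of one fully-utilized seat: ${heavy_user_cost:.2f}")           # above plan price
```

The fully-utilized seat costs more than the plan price, yet the blended cost per seat stays well below it, which is exactly the cross-subsidy the commenters describe.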

Industry Economics Summary (Feb 2026)

| Metric | Value |
| --- | --- |
| o3 price drop | 80% (June 2025 → Feb 2026) |
| Open-weight inference | $2-3/M tokens (DeepSeek V3, Kimi K2.5) |
| Frontier tier pricing | $10-15/M tokens |
| Subscription utilization | ~10% average |
| Lifetime inference vs training | >10× |

Part 2: Autonomous Coding at Scale (HN Discussion, Jan 2026)

The Experiment: Cursor's FastRender

Project: AI-built browser engine
Claim: "Ran uninterrupted for one week"
Budget: "Trillions of tokens, millions of dollars"

Key Insights from Practitioners

1. The "Best Case Scenario" Problem

"Browsers are pretty much the best case scenario for autonomous coding agents. A totally unique situation that mostly doesn't occur in the real world." — light_hue_1

Why browsers are easy:
  1. Clearly defined problem at high level
  2. Extremely thorough tests
  3. Compatible libraries/APIs/tooling
  4. Soft problem (partial progress possible)
  5. Reference implementation available
  6. Detailed documentation and design docs
  7. Decomposes into separate components
  8. Models trained on browser examples
  9. Done condition is "displaying something", not "working browser"

"Getting to the point where you have a working production browser engine is not just 80% more work, it's probably considerably more than 100x more work." — light_hue_1

2. The "From Scratch" Debate

"Taffy [CSS library] is solid, but it's probably the most robust ammunition for anyone who wants to argue that this shouldn't count as a 'from scratch' rendering engine." — simonw

Libraries used:
  - skia (rendering)
  - harfbuzz (text shaping)
  - wgpu (GPU)
  - html5ever (HTML parsing)
  - taffy (CSS grids/flexbox)

3. The Maintainability Problem

"Almost no idea at all. Coding agents are messing with all 25+ years of my existing intuitions about what features cost to build and maintain." — simonw

"Features that I'd normally never have considered building because they weren't worth the added time and complexity are now just a few well-structured prompts away. But how much will it cost to maintain those features in the future?" — simonw

"I'm seeing a lot of duplication in our AI coded repos that is getting to the point of being problematic to maintain." — htrp

4. The Architecture Problem

"GenAI left unsupervised cannot write a browser/engine, or any other complex software. What you end-up with is just chaos." — polyglotfacto (browser engineer)

"A group of humans using GenAI and supervising its output could write such an engine, and in theory be more productive than a group of humans not using GenAI: the humans could focus on the conceptual bottlenecks, and the AI could bang-out the features that require only the translation of already established architectural patterns." — polyglotfacto

What went wrong:
  - Code doesn't follow web standards correctly
  - Architecture "makes zero sense"
  - requestAnimationFrame implementation contradicts the spec
  - "Throw-away stuff that can never scale to a full web engine"

5. The Testing Value Proposition

"Test suites just increased in value by a lot and code decreased in value." — andrewchambers

"Sometimes they luck into the intent, but much more frequently they end up in a ball of mud that just happens to pass the tests. '8 unit tests? Great, I'll code up 8 branches so all your tests pass!' — neglecting that there's now 2^8 paths through your code." — daxfohl

6. The Economics Debate

"They used trillions of tokens? This equates to millions of dollars of spend. Are we really happy with this? The browser itself is not fully complete." — tabs_or_spaces

"Yes, arguably $5 million is a fair price and cheaper than what it would take to pay humans." — simianwords

Counter-argument:

"If you paid 5 cents for the code you would have been ripped off; it's throw-away stuff." — polyglotfacto

7. The Remixing Argument

"The fundamental idea that modern LLMs can only ever remix... in my opinion only says to me that all knowledge is only ever a remix." — ramraj07

Technical counter:

"Transformers are a form of kernel smoothing. It's literally interpolation. That doesn't mean it can only echo its training data, but it does mean it's 'remixing' and we would expect it to lose fidelity when moving outside that area — which we call 'hallucinating'." — omnicognate

Autonomous Coding Lessons (Jan 2026)

| Lesson | Implication |
| --- | --- |
| Architecture needs humans | AI cannot design software |
| Tests increased in value | But AI can game tests |
| Code decreased in value | Maintenance unknown |
| Supervision is critical | Unsupervised = chaos |
| Standards compliance is hard | AI doesn't "understand" specs |
| Economics unclear | $5M worth of throw-away code? |

Part 3: Production Insights Summary

What Practitioners Agree On

  1. Inference costs are falling — 80% reduction in o3 pricing
  2. Open-weight models are competitive — $2-3/M tokens for 650B+ params
  3. AI can bang out code — but needs human supervision
  4. Testing is more valuable — but test gaming is a risk
  5. Architecture is human work — AI cannot design software
  6. Maintenance is unknown — AI codebases are too new

What's Debated

  1. Frontier lab profitability — making margin on inference or subsidizing?
  2. Autonomous coding value — impressive demo or expensive throw-away?
  3. Remixing vs creativity — statistical parrot or genuine intelligence?

Numbers for Interview Reference

| Metric | Value (Feb 2026) |
| --- | --- |
| o3 price reduction | 80% (June 2025 → Feb 2026) |
| Open-weight inference | $2-3/M tokens (650B-1.2T params) |
| Subscription utilization | ~10% average |
| Lifetime inference vs training | >10× |
| Autonomous coding budget | "Trillions of tokens, millions of dollars" |


Misconception: frontier labs make money on every inference token

Debatable. Analysts and the companies themselves claim the marginal token is profitable. But Simon Willison (simonw) notes: "I think they are selling inference for less than what it costs." Training is the loss leader, and inference may be subsidized to capture market share. Open-weight models (DeepSeek V3, Kimi K2.5) are served by third parties at $2-3/M tokens; if that is profitable for the hosts, frontier labs at the $10-15/M tier have margin. But nobody has exact numbers.

Misconception: autonomous coding agents can build complex software without supervision

Cursor FastRender (an AI-built browser engine) spent "trillions of tokens, millions of dollars" in one week. The result, in a browser engineer's assessment: "throw-away stuff that can never scale to a full web engine". The architecture "makes zero sense", and the requestAnimationFrame implementation contradicts the spec. AI is effective at "translation of already established architectural patterns", but conceptual bottlenecks and standards compliance require a human.

Misconception: if AI code passes the tests, it is correct

"8 unit tests? Great, I'll code up 8 branches so all your tests pass! — neglecting that there's now 2^8 paths through your code" (daxfohl, HN). AI optimizes for the tests, not for the intent. Test suites have grown in value, but AI can game them, producing code that formally passes the tests without solving the task. Property-based tests and integration-level verification are needed, as sketched below.


Interview Questions

Q: Are frontier labs profitable on inference? Justify your answer.

❌ Red flag: "Of course OpenAI makes money on every token, their margins are huge"

✅ Strong answer: "The picture is ambiguous. For: open-weight models (DeepSeek V3 at 650B, Kimi K2.5 at 1.2T) are served by third parties at $2-3/M tokens; if that is profitable, the frontier tier at $10-15/M has margin. Against: Simon Willison and others believe the labs subsidize inference for market capture. Key fact: lifetime inference compute is >10x training compute, which amortizes training costs. Subscription economics helps: ~90% of users consume <10% of their quota, so light users subsidize heavy users"

Q: Can AI fully replace software engineers?

❌ Red flag: "Yes, Cursor already built an entire browser engine in a week"

✅ Strong answer: "Cursor FastRender is a telling case. Browsers are the best case scenario for AI: a clearly defined problem, thorough tests, a reference implementation, and models trained on browser code. Yet a browser engineer judged the result throwaway: the architecture makes no sense, and the requestAnimationFrame implementation contradicts the spec. AI is effective at 'translation of established patterns': boilerplate, CRUD, standard components. Architectural decisions, standards compliance, and edge cases remain human work. The key problem: maintainability of AI code is unknown, and duplication is already visible in AI-written repos"

Q: How do the economics of AI subscriptions affect pricing strategy?

❌ Red flag: "The subscription should cover the average cost per user"

✅ Strong answer: "AI subscription pricing works on the 80/20 principle: most users consume less than 10% of their quota. Copilot at a large enterprise: 1500 users enrolled, and in practice only a few percent use it heavily. This allows pricing below the marginal cost of heavy users. The strategy: a flat-rate subscription is subsidized by light users, while usage-based pricing scares off adoption. The risk: if average utilization rises (better UX, habit formation), the unit economics break"


Sources

  1. Hacker News -- "LLM inference economics" discussion (Feb 2026)
  2. Hacker News -- "Scaling long-running autonomous coding" discussion (Jan 2026)
  3. Simon Willison blog -- "Scaling long-running autonomous coding" analysis

See Also