I don't think the benchmarks capture this very well. Opus 4.5 is _significantly_ better than Sonnet 4.5 in my experience, a far bigger gap than the SWE-bench scores would suggest. I can happily leave Opus 4.5 running for 20-30 minutes and come back to very high quality software on complex tasks/refactoring. Sonnet 4.5 would fall over within a couple of minutes on the same tasks.
The problem, I think, is that they are using Moonlight, which is designed to stream games at very low latency. I very much doubt people need <30ms response times when watching an agent terminal or whatever it is they are showing!
When you try to use H.264 et al. at low latency you have to drop a lot of optimisations so each frame can be encoded as quickly as possible. I also strongly suspect the VAAPI encoder is not very good, especially at low bitrates.
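To see what that trade-off looks like in practice, here's a sketch of x264 flags via ffmpeg (filenames and the 8M bitrate are placeholder assumptions, not anything from the setup being discussed):

```shell
# Low-latency encode: -tune zerolatency disables B-frames and the
# rate-control lookahead, and -preset ultrafast skips most of the slower
# compression tools. Fast per-frame encoding, but far worse bits-per-quality.
ffmpeg -i input.mkv -c:v libx264 -preset ultrafast -tune zerolatency \
       -b:v 8M out_lowlatency.mkv

# Quality-focused encode: slower preset with B-frames and lookahead enabled,
# so the encoder can spend far fewer bits for the same visual quality,
# at the cost of several frames of added latency.
ffmpeg -i input.mkv -c:v libx264 -preset slow -b:v 8M out_quality.mkv
```

The `zerolatency` tune exists precisely because game streaming can't afford the frame buffering that lookahead and B-frames require.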
I _think_ Moonlight also forces CBR instead of VBR, which is pretty awful for this use case - imagine you have 9 seconds of nothing changing and then the window moves for 0.25 seconds. With VBR the encoder could send essentially ~0kbit/sec apart from control metadata, then spike the bitrate up when the window moved (I'm simplifying for brevity - it's more complicated than this, but hopefully you get the idea).
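To make the CBR vs VBR point concrete, here's a toy back-of-the-envelope sketch. All the rates are made-up illustrative numbers, not measurements of Moonlight or any real encoder:

```python
# Toy comparison of CBR vs VBR bit spend for a mostly-static screen share.
# Rates below are illustrative assumptions only.

STATIC_SECONDS = 9.75      # nothing on screen changes
MOTION_SECONDS = 0.25      # a window moves briefly

CBR_RATE = 40e6            # constant 40 Mbit/s regardless of content
VBR_IDLE_RATE = 50e3       # ~control/metadata only while static
VBR_PEAK_RATE = 80e6       # encoder spikes the bitrate for the motion burst

cbr_bits = CBR_RATE * (STATIC_SECONDS + MOTION_SECONDS)
vbr_bits = VBR_IDLE_RATE * STATIC_SECONDS + VBR_PEAK_RATE * MOTION_SECONDS

print(f"CBR: {cbr_bits / 1e6:.1f} Mbit, VBR: {vbr_bits / 1e6:.1f} Mbit")
print(f"VBR spends ~{cbr_bits / vbr_bits:.0f}x fewer bits over this window")
```

Even letting VBR spike to double the CBR rate during the motion burst, it comes out roughly 20x cheaper over the ten-second window, because it spends almost nothing while the screen is static.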
Basically they've used the wrong software entirely. They should look at xrdp with x264 as a starting point.
Yeah, I think the author has been caught out by the fact that there simply isn't a canonical way to encode H.264.
JPEG is nice and simple, most encoders will produce (more or less) the same result for any given quality settings. The standard tells you exactly how to compress the image. Some encoders (like mozjpeg) use a few non-standard tricks to produce 5-20% better compression, but it’s essentially just a clever lossy preprocessing pass.
With H.264, the standard essentially just says how decoders should work, and it's up to individual encoders to work out how to make best use of the available functionality for their intended use case. I'm not sure any encoder uses the full functionality (x264 refuses to use arbitrary frame ordering without B-frames, and I haven't found an encoder that takes advantage of it). Which means different encoders produce wildly different results.
I'm guessing Moonlight assumes most of its compression will come from motion prediction, and then takes massive shortcuts when encoding I-frames.
In fairness, VNC-style approaches are bloody awful even over my 2.5Gbit/s LAN on very fast hardware. It just cannot do 4K well (not sure if they need 4K or not).
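Some quick arithmetic shows why raw pixel-pushing struggles at 4K (assuming 24-bit RGB at 60fps; real VNC encodings like hextile/zlib cut this down for static content, but nowhere near what a video codec manages on full-motion updates):

```python
# Back-of-the-envelope: raw 4K60 framebuffer bandwidth vs a 2.5 Gbit/s LAN.
WIDTH, HEIGHT = 3840, 2160
BYTES_PER_PIXEL = 3   # 24-bit RGB, as a plain framebuffer update would send
FPS = 60
LAN_GBIT = 2.5

bits_per_sec = WIDTH * HEIGHT * BYTES_PER_PIXEL * FPS * 8
print(f"Uncompressed 4K60: {bits_per_sec / 1e9:.2f} Gbit/s")
print(f"That is ~{bits_per_sec / (LAN_GBIT * 1e9):.1f}x a 2.5 Gbit/s link")
```

Uncompressed 4K60 needs roughly 12 Gbit/s, so even a fast LAN can't carry it without serious compression.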
I spent some time compiling the "new" xrdp with x264 support and it is incredibly good - I basically can't tell that I'm on a remote desktop.
The bandwidth was extremely low as well. You are correct on that part - 40Mbit/s is nuts for high quality. I suspect if they are using Moonlight it's optimised for extremely low latency at the expense of bandwidth?
Moonlight is mostly designed to stream your gaming desktop to a portable device or your TV at minimal latency and maximum quality within a LAN. For that, 40Mbps is quite reasonable. It's obviously absurd for mundane VNC/productivity workloads.
Tbh though the Docker problems are very serious and extremely painful to work around. Everything works great apart from Docker, which has so many issues - it does not handle inbound IPv6 combined with outbound IPv4 well at all (at least as far as I can tell!).
Yes - I remember a while ago it fixed a pipeline problem for me where I had copy-pasted an IP address with the last digit missing. I'd spent about an hour before that looking at everything else (all the other steps succeeded, but the last one 'timed out', because I'd pasted the address wrong). As you said, it took <30 seconds to diagnose the problem.
Agreed. This is why PE buys so many SaaS companies!
My article here isn't really aimed at "good" SaaS companies that put a lot of thought into design, UX and features. I'm thinking of the tens/hundreds of thousands of SaaS platforms that have been bought by PE or virtually abandoned, that don't work very well and send through a 20% price increase at every renewal.
Author here, really good comment and I agree with you.
What _has_ surprised me though is just how many companies are building (or considering building) 'internal' tooling to replace SaaS they are not happy with. These are not the classic HN types whatsoever. I think when non-technical people get to play with AI software dev they go 'wow, so why can't we do everything like this'.
I think your point 3 is really interesting too.
But yes, the point of my article (hopefully) wasn't that SaaS is dead overnight, but that some thin/lower-"quality" products are potentially in real trouble.
People will still buy and use expertly designed products that are really nice to use. But a lot of B2B SaaS is not that - it's a slow, clunky mess that makes you want to scream!
I agree - it is surprising how many are looking at doing this in-house.
I think what they miss (and I say this as someone who spent the early part of his career outside of tech) is an understanding of what goes into maintaining software products - and this ignorance will be short-lived. I was honestly shocked how complex it was to build and maintain my first web app. So business types (like I was) who are used to 'maintaining' an Excel spreadsheet and PowerPoint deck they update every quarter may think of SaaS like a software license they can build once and use forever. They have no appreciation of the depth of challenges that come with maintaining anything in production.
My working model is that of no-code - many non-tech types experimented with Bubble etc., but quickly realised that tech products are far deeper than the (heavily curated) surface-level experience the user has. It is not like an Excel model, where the UI is the codebase. I expect vibe-coders will find the same thing.
I have on several occasions built my own versions of tools, only to cave and buy a $99 a year off the shelf version because the maintenance time isn't worth it. Non-tech folks have no idea of the depth of pain of maintaining any system.
They will learn. Will be interesting to see how it plays out.
Author here. What I'm seeing in particular is CRM/ERP solutions at risk. I know of two people in my peer group who are actively trying to replace a 'niche' ERP/CRM (not Salesforce) with an agent-built alternative. These are both >$100k/yr contracts.
They've outgrown the current (industry specific) products, arguably a long time ago. The discussions started like this:
1) Started building custom dashboards on top of data exports of said product with various AI tooling.
2) This was extremely successful, as a non-developer "business" person could specify, build and iterate on exactly the analytics they wanted. This is painful to do through a developer, because you need to iterate quickly once you see the data and realise where your thinking was wrong - and non-developers really struggle to explain this in a way that makes sense from a developer's PoV.
3) The ERP vendor wanted a big renewal price increase, alongside an API deprecation that would require a lot of existing (pre-"AI") integrations to be rewritten/redone.
4) Now building an internal replacement. They would not have even considered this before AI Agents.
FWIW this tool is not super complex, but it is extremely expensive (for what it does). It already has a load of limitations which are being worked around with various levels of horrible hacks.
There are a _lot_ of these kind of SaaS products about, for each industry. You never really hear about them.
Btw, I use Claude Code nearly every day for many hours. Opus 4.5 has been a huge leap forward - I am blown away by how it can do 10-30 minute sessions without going wrong (Sonnet definitely needed constant babysitting). And the models/agent harnesses are only getting better. Claude Code isn't even a year old yet!
Thanks for responding. This sounds like the type of thing companies have done in cycles over the years. Some % gets dissatisfied with their vendor, so they bring in-house. That predictably sucks (distraction, can't keep up with third-party tools, etc.) and so the companies go back into the marketplace. I've even been the one building some of those internal tools. :-)
Overall, that story sounds more like the niche is not well-served by software and perhaps there is an opening for a competitor to serve them well. Or perhaps the attrition will make the incumbents improve.
Robotics for picking and the general feasibility of grocery delivery in the US.
Ocado is good because their stock levels are far more accurate than the other supermarkets' for online ordering, so you don't get as many substituted/missing items. This is sort of a side effect of having dedicated picking facilities vs "real" supermarkets. I would not be surprised if they also "lose" less stock compared to in-supermarket fulfilment.
Then you have "is online grocery viable in the US?". There are a lot of areas of the US with reasonable density for this kind of service imo, and the road infrastructure is generally far superior to the UK's, which offsets any loss of density (what matters is time between deliveries, not distance per se). I imagine the much better parking in most US suburbs also helps efficiency (it's an absolute nightmare for a lot of UK delivery companies when there isn't off-street parking and they have to park pretty far from the drop).
It sounds to me that Kroger just messed up the execution of this in terms of "marketing" more than anything.