This is probably not a core concern for most HN readers, but at work we do multilingual testing for synthetic text data generation and natural language processing. Emphasis on multilingual. Gemini has made some serious leaps from 1.5 to 2.5 and now 3.0, and is actually proficient in languages that other models can only dream of. GPT-5, on the other hand, turns in really mixed performance across a lot of categories.
This goes way back. Even in the 1.5 days it was the best multilingual model, back when HN still treated it as entirely uncompetitive all-around. Just because, exactly as you're saying, it's not a core concern of people here. The two fields Gemini models have been number one at for years now are (a) multilinguality and (b) image understanding. At no point since the release of Gemini 1.5 Pro way back has any Anthropic or OpenAI model performed better at either.
Even those who have zero experience with different (human) languages could have picked this up from the LMArena leaderboards, where Gemini models have consistently ranked much higher in non-English languages than in English. This gap has actually shrunk a lot over time! In the 1.5 Pro days the advantage was huge; it would be something like 10th in English and 2nd in many other languages.
Nevertheless, it still depends on the specific language you're targeting. Gemini isn't the winner on every single one of them. If you're only going to choose one model for use with many languages, it should be Gemini. But if the set of languages isn't too large, optimizing model selection per language is worth it.
In our previous tests, when it was 1.5 Pro against GPT-4o and Claude Sonnet 3.7, Gemini wasn't winning the multilingual race, but it was definitely competitive. 2.5 and 3.0 seem to be big leaps from the 1.5 days.
That said, it also depends on the testing methodology; we tested a bunch of use cases aimed mostly at core linguistic proficiency, not so much at complex language tasks or cultural knowledge.
Which languages, how popular, how many? The biggest difference has been for low-resource or far-from-English languages: Thai, Korean, Vietnamese, and so on. For something like German or French, all of them were of course good enough that general intelligence and other factors overruled any language differences. I didn't take screenshots, maybe archive.org has them, but for the entire lifetime of that generation of models on the LMArena leaderboard there was a large gap between 1.5 Pro's rankings in such languages and its ranking in English, and that gap was backed up by our experience, including feedback from groups of native speakers.
And regarding specific models - we obviously only tested a few languages, and there are thousands of them in the world. But Gemini seems to lead the pack basically regardless of the language you throw at it. YMMV.
You could ask GPT what it knows about you and use that to seed a new model/app with your personal preferences. Not perfect and probably quite lossy, but likely much better than starting from scratch.
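For example, here is a rough sketch of the "seed the new model" half, assuming you've already asked ChatGPT something like "summarize everything you know about my preferences" and saved the answer to a text file. The file name, endpoint URL, and model name below are placeholders; any OpenAI-compatible API would work the same way.

    # Load the preference summary exported from the old assistant
    with open("my_preferences.txt") as f:
        profile = f.read()

    # Hand it to the new model as a system prompt via an
    # OpenAI-compatible API (base_url and model are placeholders)
    from openai import OpenAI

    client = OpenAI(base_url="https://example-provider.test/v1", api_key="...")
    reply = client.chat.completions.create(
        model="some-new-model",
        messages=[
            {"role": "system", "content": "Known user preferences:\n" + profile},
            {"role": "user", "content": "Suggest a weekend reading list for me."},
        ],
    )
    print(reply.choices[0].message.content)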
+1 on this one!
I only use LLMs once I'm done writing, and basically use them as my editor.
In case it helps anyone, here is my prompt:
"You are a professional writer and editor with many years of experience. Your task is to provide writing feedback, point out issues and suggest corrections. You do not use flattery. You are matter of fact. You don't completely rewrite the text unless it is absolutely necessary - instead you try to retain the original voice and style. You focus on grammar, flow and naturalness. You are welcome to provide advice changing the content, but only do that in important cases.
If the text is longer, you provide your feedback in chunks by paragraph or other logical elements.
Do not provide false praise, be honest and feel free to point out any issues."
(Yes, you kind of need to repeat that you're actively not looking for a pat on the back, otherwise it keeps telling you how brilliant your writing is instead of giving useful advice.)
A very similar workflow on my end: beets as the main tagger/organizer and Picard to pick up whatever can't be processed through beets. Beets is amazing!
Surprised this did not get any attention whatsoever. Some really surprising findings in it: 75% of firms already have a positive return on investment from AI, and less than 5% a negative one. Also, 46% of business leaders now use AI daily themselves.
Although I loathe ads, I think that for new products where the presence of ads is disclosed clearly upfront, this is acceptable. Especially if this comes with a discount. We have Kindles with and without ads and people are generally fine with it.
But the fact that this gets retrofitted to fridges that people already bought, without any way of opting out or other mitigation, is criminal. Is this a lawsuit in the making? Am I naive?
This is not acceptable. An appliance that costs four figures in USD, and that should never be connected to the Internet in the first place, should never show you ads in any way. You obviously don't "loathe" ads if you can even rationalize that kind of behavior. Stop trying to shove ads 24/7 into every millimeter of available space.
On the upside, it will be very easy to find more reliable and cheaper fridges than that.
I don't like them either, but if the fridge were cheaper (this one isn't) or had other features I wanted, I'd consider buying it and just covering up the ad screen.
Another concern is energy consumption. In this age of Energy Star ratings and touting how efficient appliances are, it doesn't make sense to have a screen that is always on.
The real killer feature Samsung needs is an automatic warranty. I would never buy Samsung again, unless they started a new trend where all repairs are covered for 10 years (not sold as an add-on during checkout).
Azure copilot is really something. It can't see the context of the page it's embedded in, and the message you send is limited to 500 characters, so good luck pasting a log or configuration.
Photoshop now has a bunch of AI features that get used in professional environments. And on the end-user side, facial recognition and magic eraser are features in apps like Google Photos that people actively use and like. People probably don't care that it's AI under the hood; in fact, they probably don't even realize it.
There is a lot of unchecked hype, but that doesn't mean there is no substance.
When people say AI, they mean LLMs. Your examples are models in general, which have been around since long before OpenAI and the techbros had the AGI wet dream.