Was just about to say this. As a staff engineer your position is (or should be!) so secure that you can get away with asking all sorts of “dumb” questions that more junior engineers don’t want to ask. I will also regularly say things in meetings like “I don’t understand, can you take us through that again” or “can you remind me how <xyz thing> works?”. Sometimes this makes the difference between a meeting being useful and everyone just being confused but afraid to say so.
In an ideal world, juniors would all do this too, but I don’t blame them if they don’t. So it’s very important to do it if you have the social capital.
One of my favorite interview questions for senior positions is "Tell me about a decision you made that you would change in hindsight." Junior level people and people who are otherwise unfit for the role will try to give answers that minimize their responsibility or (worst case) have no examples. Senior level people will have an example where they can walk you through exactly how they messed up and what they would have done differently. Good senior level candidates examine their mistakes and are honest about them.
I do this for everyone, not just senior positions. "If you were to start that again today, knowing everything you do now, what would you do differently?" is a question you can ask regardless of seniority. Even if they've only done some school projects, being able to look back and say "yeah, that could have gone better from the start" is a hugely valuable signal.
The details of how I ask it might change based on seniority, but that I ask it? No.
It’s also true that the kind of people who are ready for staff level work are already doing staff level work. While social capital is a factor, it isn’t necessarily accumulated because of title or experience.
The idea of “disambiguation” is itself ambiguous. The way I recognize other people solving problems at a staff level is that we are communicating in terms of properties, constraints, and tradeoffs. Crucially, these constraints are not necessarily business constraints, but rather constraints inherent to an architecture. For example, queuing works for ordering because it is append-only and monotonic. So as soon as you have multiple queues (such as with partitioning) or try to reorder entries, you lose the ordering guarantee. Does the problem require ordering?
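To make that concrete, here is a toy sketch in plain Python (no particular queue system in mind, just the property itself): each partition stays ordered internally, but the global order is no longer guaranteed.

```python
from collections import deque

# Events arrive in global order e1..e6 but are sharded across two partitions.
partitions = [deque(), deque()]
for i, event in enumerate(["e1", "e2", "e3", "e4", "e5", "e6"]):
    partitions[i % 2].append(event)

# Each partition is still append-only and ordered internally...
print(list(partitions[0]))  # ['e1', 'e3', 'e5']
print(list(partitions[1]))  # ['e2', 'e4', 'e6']

# ...but a consumer that happens to drain partition 1 before partition 0
# observes an order nobody ever wrote:
print(list(partitions[1]) + list(partitions[0]))  # ['e2', 'e4', 'e6', 'e1', 'e3', 'e5']
```

If the problem does require a total order, that constraint pulls you back toward a single queue (or per-key ordering), regardless of what the architecture diagram says.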
The first couple of chapters of Roy Fielding’s dissertation go through this. The first time I tried reading it, I did not have the experience to understand it; it was a slog and I got little out of it. The next time I tried reading it, it helped me gel and articulate things I had started observing from experience. I recognized that I had previously been too focused on architectural elements, when the properties and constraints were far more important. They determine what is being traded off, and the antipatterns pop out. Knowing properties and constraints allows me to quickly identify problems and start the process of disambiguation. Many of the other staff or principal engineers I have chatted with communicate along these lines.
I don’t try to ask smart questions or dumb questions. I ask questions so that I can understand properties and constraints.
I read this article back when I was learning the basics of transformers; the visualizations were really helpful. In retrospect, though, knowing how a transformer works wasn't very useful at all in my day job applying LLMs, except as a sort of deep background for reassurance that I had some idea of how the big black box producing the tokens was put together, and to give me the mathematical basis for things like context size limitations etc.
I would strongly caution anyone who thinks that they will be able to understand or explain LLM behavior better by studying the architecture closely. That is a trap. Big SotA models these days exhibit so many nontrivial emergent phenomena (in part due to the massive application of reinforcement learning techniques) that they have capabilities very few people expected to ever see when this architecture first arrived. Most of us confidently claimed even back in 2023 that, based on LLM architecture and training algorithms, LLMs would never be able to perform well on novel coding or mathematics tasks. We were wrong. That points towards some caution and humility about using network architecture alone to reason about how LLMs work and what they can do. You'd really need to be able to poke at the weights inside a big SotA model to even begin to answer those kinds of questions, but unfortunately that's only really possible if you're a "mechanistic interpretability" researcher at one of the major labs.
Regardless, this is a nice article, and this stuff is worth learning because it's interesting for its own sake! Right now I'm actually spending some vacation time implementing a transformer in PyTorch just to refresh my memory of it all. It's a lot of fun! If anyone else wants to get started with that I would highly recommend Sebastian Raschka's book and youtube videos as way into the subject: https://github.com/rasbt/LLMs-from-scratch .
Has anyone read TFA author Jay Alammar's book (published Oct 2024) and would they recommend it for a more up-to-date picture?
> massive application of reinforcement learning techniques
So sad that "reinforcement learning" is another term whose meaning has been completely destroyed by uneducated hype around LLMs (very similar to "agents"). 5 years ago nobody familiar with RL would consider what these companies are doing as "reinforcement learning".
RLHF and similar techniques are much, much closer to traditional fine-tuning than they are to reinforcement learning. RL almost always, historically, assumes online training and interaction with an environment. RLHF is collecting data from users and using it to teach the LLM to be more engaging.
This fine-tuning also doesn't magically transform LLMs into something different, but it is largely responsible for their sycophantic behavior. RLHF makes LLMs more pleasing to humans (and of course can be exploited to help move the needle on benchmarks).
It's really unfortunate that people will throw away their knowledge of computing in order to maintain a belief that LLMs are something more than they are. LLMs are great, very useful, but they're not producing "nontrivial emergent phenomena". They're increasingly trained as products to increase engagement. I've found LLMs less useful in 2025 than in 2024. And the trend of people not opening them up, looking under the hood, and playing around with them to explore what they can do has basically made me leave the field (I used to work in AI related research).
I wasn't referring to RLHF, which people were of course already doing heavily in 2023, but RLVR, aka LLMs solving tons of coding and math problems with a reward function after pre-training. I discussed that in another reply, so I won't repeat it here; instead I'd just refer you to Andrej Karpathy's 2025 LLM Year in Review which discusses it.
https://karpathy.bearblog.dev/year-in-review-2025/
> I've found LLMs less useful in 2025 than in 2024.
I really don't know how to reply to this part without sounding insulting, so I won't.
While RLVR is neat, it is still an 'offline' learning model that just borrows a reward function similar to RL.
And did you not read the entire post? Karpathy basically calls out the same point that I am making regarding RL which "of course can be exploited to help move the needle on benchmarks":
> Related to all this is my general apathy and loss of trust in benchmarks in 2025. The core issue is that benchmarks are almost by construction verifiable environments and are therefore immediately susceptible to RLVR and weaker forms of it via synthetic data generation. In the typical benchmaxxing process, teams in LLM labs inevitably construct environments adjacent to little pockets of the embedding space occupied by benchmarks and grow jaggies to cover them. Training on the test set is a new art form
Regarding:
> I really don't know how to reply to this part without sounding insulting, so I won't.
Relevant to citing him: Karpathy has publicly praised some of my past research in LLMs, so please don't hold back your insults. A poster on HN telling me I'm "not using them right!!!" won't shake my confidence terribly. I use LLMs less this year than last year and have been much more productive. I still use them, LLMs are interesting, and very useful. I just don't understand why people have to get into hysterics trying to make them more than that.
I also agree with Karpathy's statement:
> In any case they are extremely useful and I don't think the industry has realized anywhere near 10% of their potential even at present capability.
But magical thinking around them is slowing down progress imho. Your original comment itself is evidence of this:
> I would strongly caution anyone who thinks that they will be able to understand or explain LLM behavior better by studying the architecture closely.
I would say "Rip them open! Start playing around with the internals! Mess around with sampling algorithms! Ignore the 'win market share' hype and benchmark gaming and see just what you can make these models do!" Even if restricted to just open, relatively small models, there's so much more interesting work in this space.
RLVR is not offline learning. It's not learning from a static dataset. These are live rollouts that are being verified and which update the weights at each pass based on feedback from the environment.
You might argue that traditional RL involves multiple states the agent moves through. But autoregressive LLMs are the same: a forward pass generating a token also creates a change in state.
After training, the weights are fixed, of course, but that is the case of most traditional RL systems. RL does not intrinsically mean a continual updating of weights in deployment, which carries a bunch of problems.
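To illustrate the loop structure (and only the loop structure; this is a toy two-armed bandit with a made-up verifier, nothing like what the labs actually run): the policy generates a rollout live, a verifier scores it, and the weights are updated immediately rather than being fit to a static dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(2)      # the "policy weights"
correct_answer = 1        # what the (made-up) verifier accepts
lr = 0.1

def verify(choice):
    return 1.0 if choice == correct_answer else 0.0

for step in range(500):
    probs = np.exp(logits) / np.exp(logits).sum()
    choice = rng.choice(2, p=probs)   # live rollout sampled from the current policy
    reward = verify(choice)           # feedback from the environment/verifier
    grad = -probs                     # REINFORCE: gradient of log prob(choice) wrt logits
    grad[choice] += 1.0
    logits += lr * reward * grad      # weights move after every rollout

print(np.exp(logits) / np.exp(logits).sum())  # probability mass shifts onto the verified answer
```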
From the premise that RLVR can be applied to benchmaxx (true!) it does not follow that it therefore is only good for that.
What do you think about Geoffrey Hinton's concerns about AI (minus the "AGI" part)? Do you agree with those concerns, or do you believe that LLMs are merely "useful" and therefore don't pose a risk to our society?
I agree and disagree. In my day job as an AI engineer I rarely if ever need to use any “classic” deep learning to get things done. However, I’m a firm believer that understanding the internals of an LLM can set you apart as a gen AI engineer, if you’re interested in becoming the top 1% in your field. There can and will be situations where your intuition about the constraints of your model is superior to that of peers who consider the LLM a black box. I had this advice given directly to me years ago, in person, by Clem Delangue of Hugging Face - I took it seriously and really doubled down on understanding the guts of LLMs. I think it’s served me well.
I’d give similar advice to any coding bootcamp grad: yes you can get far by just knowing python and React, but to reach the absolute peak of your potential and join the ranks of the very best in the world in your field, you’ll eventually want to dive deep into computer architecture and lower level languages. Knowing these deeply will help you apply your higher level code more effectively than your coding bootcamp classmates over the course of a career.
I suppose I actually agree with you, and I would give the same advice to junior engineers too. I've spent my career going further down the stack than I really needed to for my job and it has paid off: everything from assembly language to database internals to details of unix syscalls to distributed consensus algorithms to how garbage collection works inside CPython. It's only useful occasionally, but when it is useful, it's for the most difficult performance problems or nasty bugs that other engineers have had trouble solving. If you're the best technical troubleshooter at your company, people do notice. And going deeper helps with system design too: distributed systems have all kinds of subtleties.
I mostly do it because it's interesting and I don't like mysteries, and that's why I'm relearning transformers, but I hope knowing LLM internals will be useful one day too.
Wouldn't you say that people who pursue deep architectural knowledge should just go down the AI Researcher career track? I feel like that's where that sort of knowledge actually matters.
I think the biggest problem is that most tutorials use words to illustrate how the attention mechanism works. In reality, there are no word-associated tokens inside a Transformer. Tokens != word parts. An LLM does not perform language processing inside the Transformer blocks, and a Vision Transformer does not perform image processing. Words and pixels are only relevant at the input. I think this misunderstanding was a root cause of underestimating their capabilities.
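A tiny illustration of that point (the vocabulary here is made up; real tokenizers are learned): the strings only exist at the boundary, and the first thing the model does is replace IDs with vectors.

```python
import numpy as np

# Hypothetical vocabulary: the model never sees these strings, only integer IDs.
vocab = {"un": 0, "believ": 1, "able": 2, "!": 3}
token_ids = [vocab[t] for t in ["un", "believ", "able", "!"]]  # "unbelievable!"

# Inside the transformer each ID is immediately swapped for a vector; attention
# and the MLP blocks only ever operate on these vectors.
rng = np.random.default_rng(0)
embedding_table = rng.standard_normal((len(vocab), 8))  # vocab_size x d_model
hidden_states = embedding_table[token_ids]              # shape (4, 8)
print(hidden_states.shape)  # the "words" are gone; it's rows of floats from here on
```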
An example of why a basic understanding is helpful:
A common sentiment on HN is that LLMs generate too many comments in code.
But comment spam is going to help code quality, due to the way causal transformers and positional encoding work. The model has learned to dump locally-specific reasoning tokens where they're needed, in a tightly scoped cluster that can be attended to easily, and forgotten about just as easily later on. It's like a disposable scratchpad to reduce the errors in the code it's about to write.
The solution to comment spam is textual/AST post-processing of the generated code, rather than prompting the LLM to handicap itself by not generating as many comments.
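For Python output, one possible shape of that post-processing step (a rough sketch using the standard tokenize module; it can leave trailing whitespace where comments were, and other languages would need their own tokenizer or an AST pass):

```python
import io
import tokenize

def strip_comments(source: str) -> str:
    # Drop COMMENT tokens and rebuild the source from what's left.
    kept = [
        tok for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type != tokenize.COMMENT
    ]
    return tokenize.untokenize(kept)

print(strip_comments("x = 1  # scratchpad note from the model\nprint(x)\n"))
```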
Unless you have evidence from a mechanistic interpretability study showing what's happening inside the model when it creates comments, this is really only a plausible-sounding just-so story.
Like I said, it's a trap to reason from architecture alone to behavior.
An example of why a basic understanding is helpful:
A common sentiment on HN is that LLMs generate too many comments in code.
For good reason -- comment sparsity improves code quality, due to the way causal transformers and positional encoding work. The model has learned that real, in-distribution code carries meaning in structure, naming, and control flow, not dense commentary. Fewer comments keep next-token prediction closer to the statistical shape of the code it was trained on.
Comments aren’t a free scratchpad. They inject natural-language tokens into the context window, compete for attention, and bias generation toward explanation rather than implementation, increasing drift over longer spans.
The solution to comment spam isn’t post-processing. It’s keeping generation in-distribution. Less commentary forces intent into the code itself, producing outputs that better match how code is written in the wild, and forcing the model into more realistic context avenues.
Literally the exact thing I tell new hires on projects for training models: theory is far less important than practice.
We are only just beginning to understand how these things work. I imagine it will end up being similar to Freud’s Oedipal complex: when we failed to have a fully physical understanding of cognition, we employed a schematic narrative. Something similar is already emerging.
> would never be able to perform well on novel coding or mathematics tasks. We were wrong
I'm not clear at all we were wrong. A lot of the mathematics announcements have been rolled back and "novel coding" is exactly where the LLMs seem to fail on a daily basis - things that are genuinely not represented in the training set.
The essence of it is that after the "read the whole internet and predict the next token" pre-training step (and the chat fine-tuning), SotA LLMs now have a training step where they solve huge numbers of tasks that have verifiable answers (especially programming and math). The model therefore gets the very broad general knowledge and natural language abilities from pre-training and gets good at solving actual problems (problems that can't be bullshitted or hallucinated through because they have some verifiable right answer) from the RL step. In ways that still aren't really understood, it develops internal models of mathematics and coding that allow it to generalize to solve things it hasn't seen before. That is why LLMs got so much better at coding in 2025; the success of tools like Claude Code (to pick just one example) is built upon it. Of course, the LLMs still have a lot of limitations (the internal models are not perfect and aren't like how humans think at all), but RL has taken us pretty far.
Unfortunately the really interesting details of this are mostly secret sauce stuff locked up inside the big AI labs. But there are still people who know far more than I do who do post about it, e.g. Andrej Karpathy discusses RL a bit in his 2025 LLMs Year in Review: https://karpathy.bearblog.dev/year-in-review-2025/
You can download a base model (aka foundation, aka pretrain-only) from huggingface and test it out. These were produced without any RL.
However, most modern LLMs, even base models, are not trained just on raw internet text. Most of them were also fed a huge amount of synthetic data. You can often see the exact details in their model cards. As a result, if you sample from them, you will notice that they love to output text that looks like:
6. **You will win millions playing bingo.**
- **Sentiment Classification: Positive**
- **Reasoning:** This statement is positive as it suggests a highly favorable outcome for the person playing bingo.
A base LLM that has only been pre-trained (no RL, i.e. reinforcement learning) is not "planning" very far ahead. It has only been trained to minimize prediction errors on the next word it is generating. You might consider this a bit like a person who speaks before thinking/planning, or a freestyle rapper spitting out words so fast they only have time to maintain continuity with what they've just said, not plan ahead.
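As a cartoon of that objective (a hand-built bigram table, obviously nothing like a transformer, which conditions on the whole prefix; this only makes the "one token at a time" part concrete):

```python
import numpy as np

vocab = ["<s>", "the", "cat", "sat", "down", "."]
table = np.full((6, 6), 1e-3)                       # next-token probabilities
for prev, nxt in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]:
    table[prev, nxt] = 1.0                          # "<s> the cat sat down ."
table /= table.sum(axis=1, keepdims=True)

token, out = 0, []
while token != 5:
    token = int(np.argmax(table[token]))            # pick only the single next token
    out.append(vocab[token])
print(" ".join(out))                                # "the cat sat down ."
```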
The purpose of RL (applied to LLMs as a second "post-training" stage after pre-training) is to train the LLM to act as if it had planned ahead before "speaking", so that rather than just focusing on the next word it will instead try to choose a sequence of words that will steer the output towards a particular type of response that had been rewarded during RL training.
There are two types of RL generally applied to LLMs.
1) RLHF - RL from Human Feedback, where the goal is to generate responses that during A/B testing humans had indicated a preference for (for whatever reason).
2) RLVR - RL with Verifiable Rewards, used to promote the appearance of reasoning in domains like math and programming where the LLM's output can be verified in some way (e.g. a math result or program output checked).
Without RLHF (as was the case pre-ChatGPT) the output of an LLM can be quite unhinged. Without RLVR, aka RL for reasoning, the ability of the model to reason (or give the appearance of reasoning) is a function of pre-training, and won't have the focus (like putting blinkers on a horse) to narrow generative output to achieve the desired goal.
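A cartoon of the "verifiable" part of RLVR for the coding case (the `add` task and its test are made up, and real pipelines sandbox the execution against much richer task suites; the point is only that the reward comes from checking, not from sounding plausible):

```python
def reward_for_candidate(candidate_source: str) -> float:
    # Run the model's proposed solution and check it against a known test.
    namespace = {}
    try:
        exec(candidate_source, namespace)
        return 1.0 if namespace["add"](2, 3) == 5 else 0.0
    except Exception:
        return 0.0

good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"
print(reward_for_candidate(good), reward_for_candidate(bad))  # 1.0 0.0
```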
It is almost like understanding wood at a molecular level and being a carpenter. It may help the carpentry, but you can be a great one without it. And a bad one with the knowledge.
> Most of us confidently claimed even back in 2023 that, based on LLM architecture and training algorithms, LLMs would never be able to perform well on novel coding or mathematics tasks.
I feel like there are three groups of people:
1. Those who think that LLMs are stupid slop-generating machines which couldn't ever possibly be of any use to anybody, because there's some problem that is simple for humans but hard for LLMs, which makes them unintelligent by definition.
2. Those who think we have already achieved AGI and don't need human programmers any more.
3. Those who believe LLMs will destroy the world in the next 5 years.
I feel like the composition of these three groups has been pretty much constant since the release of ChatGPT, and like with most political fights, evidence doesn't convince people either way.
Those three positions are all extreme viewpoints. There are certainly people who hold them, and they tend to be loud and confident and have an outsize presence in HN and other places online.
But a lot of us have a more nuanced take! It's perfectly possible to believe simultaneously that 1) LLMs are more than stochastic parrots 2) LLMs are useful for software development 3) LLMs have all sorts of limitations and risks (you can produce unmaintainable slop with them, and many people will, there are massive security issues, I can go on and on...) 4) We're not getting AGI or world-destroying super-intelligence anytime soon, if ever 5) We're in a bubble and it's going to pop and cause a big mess 6) This tech is still going to be transformative long term, on a similar level to the web and smartphones.
Don't let the noise from the extreme people who formed their opinions back when ChatGPT came out drown out serious discussion! A lot of us try and walk a middle course with this and have been and still are open to changing our minds.
Good to see Designing Data-Intensive Applications on there, but it should be higher — certainly above the thoroughly middling Clean Code at least! DDIA is still the first book I tell every junior to read after they’ve got a couple years experience under their belt. Can’t wait for the 2nd edition!
Thanks for articulating this position. I disagree with it, but it is similar to the position I held in late 2024. But as antirez says in TFA, things changed in 2025, and so I changed my mind ("the facts change, I change my opinions"...). LLMs and coding agents got very good about 6 months ago, and myself and a lot of other seasoned engineers I respect finally started using them seriously.
For what it's worth:
* I agree with you that LLMs probably aren't a path to AGI.
* I would add that I think we're in a big investment bubble that is going to pop, which will create a huge mess and perhaps a recession.
* I am very concerned about the effects of LLMs in wider society.
* I'm sad about the reduced prospects for talented new CS grads and other entry-level engineers in this world, although sometimes AI is just used as an excuse to paper over macroeconomic reasons for not hiring, like the end of ZIRP.
* I even agree with you that LLMs will lead to some maintenance nightmares in the industry. They amplify engineers' ability to produce code, and there are a lot of bad engineers out there, as we all know: plenty of cowboys/cowgirls who will ship as much slop as they can get away with. They shipped unmaintainable mess before, they will ship three times as much now. I think we need to be very careful.
But, if you are an experienced engineer who is willing to be disciplined and careful with your AI tools, they can absolutely be a benefit to your workflow. It's not easy: you have to move up and down a ladder of how much you rely on the tool, from true vibe coding for throwaway use-once helper scripts for some dev or admin task with a verifiable answer, all the way up to hand-crafting critical business logic and only using the agent to review it and to try and break your implementation.
You may still be right that they will create a lot of problems for the industry. I think the ideal situation for using AI coding agents is at a small startup where all the devs are top-notch, have many years of experience, care about their craft, and hold each other to a high standard. Very very few workplaces are that. But some are, and they will reap big benefits. Other places may indeed drown in slop, if they have a critical mass of bad engineers hammering on the AI button and no guard-rails to stop them.
This topic arouses strong reactions: in another thread, someone accused me of "magical thinking" and "AI-induced psychosis" for claiming precisely what TFA says in the first paragraph: that LLMs in 2025 aren't the stochastic parrots of 2023. And I thought I held a pretty middle of the road position on all this: I detest AI hype and I try to acknowledge the downsides as well as the benefits. I think we all need to move past the hype and the dug-in AI hate and take these tools seriously, so we can identify the serious questions amidst the noise.
This is the 2023 take on LLMs. It still gets repeated a lot. But it doesn’t really hold up anymore - it’s more complicated than that. Don’t let some factoid about how they are pretrained on autocomplete-like next token prediction fool you into thinking you understand what is going on in that trillion parameter neural network.
Sure, LLMs do not think like humans and they may not have human-level creativity. Sometimes they hallucinate. But they can absolutely solve new problems that aren’t in their training set, e.g. some rather difficult problems on the last Mathematical Olympiad. They don’t just regurgitate remixes of their training data. If you don’t believe this, you really need to spend more time with the latest SotA models like Opus 4.5 or Gemini 3.
Nontrivial emergent behavior is a thing. It will only get more impressive. That doesn’t make LLMs like humans (and we shouldn’t anthropomorphize them) but they are not “autocomplete on steroids” anymore either.
> Don’t let some factoid about how they are pretrained on autocomplete-like next token prediction fool you into thinking you understand what is going on in that trillion parameter neural network.
This is just an appeal to complexity, not a rebuttal to the critique of likening an LLM to a human brain.
> they are not “autocomplete on steroids” anymore either.
Yes, they are. The steroids are just even more powerful. By refining training data quality, increasing parameter size, and increasing context length we can squeeze more utility out of LLMs than ever before, but ultimately, Opus 4.5 is the same thing as GPT2, it's only that coherence lasts a few pages rather than a few sentences.
First, this is completely ignoring text diffusion and nano banana.
Second, autocompleting the name of the killer in a detective book outside of the training set requires following the plot and having at least some understanding of it.
Reinforcement learning is a technique for adjusting weights, but it does not alter the architecture of the model. No matter how much RL you do, you still retain all the fundamental limitations of next-token prediction (e.g. context exhaustion, hallucinations, prompt injection vulnerability etc)
You've confused yourself. Those problems are not fundamental to next token prediction, they are fundamental to reconstruction losses on large general text corpora.
That is to say, they are equally likely if you don't do next token prediction at all and instead do text diffusion or something. Architecture has nothing to do with it. They arise because they are early partial solutions to the reconstruction task on 'all the text ever made'. Reconstruction task doesn't care much about truthiness until way late in the loss curve (where we probably will never reach), so hallucinations are almost as good for a very long time.
RL as is typical in post-training _does not share those early solutions_, and so does not share the fundamental problems. RL (in this context) has its own share of problems which are different, such as reward hacks like: reliance on meta signaling (# Why X is the correct solution, the honest answer ...), lying (commenting out tests), manipulation (You're absolutely right!), etc. Anything to make the human press the upvote button or make the test suite pass at any cost or whatever.
With that said, RL post-trained models _inherit_ the problems of non-optimal large corpora reconstruction solutions, but they don't introduce more or make them worse in a directed manner or anything like that. There's no reason to think them inevitable, and in principle you can cut away the garbage with the right RL target.
Thinking about architecture at all (autoregressive CE, RL, transformers, etc) is the wrong level of abstraction for understanding model behavior: instead, think about loss surfaces (large corpora reconstruction, human agreement, test suites passing, etc) and what solutions exist early and late in training for them.
> This is just an appeal to complexity, not a rebuttal to the critique of likening an LLM to a human brain
I wasn’t arguing that LLMs are like a human brain. Of course they aren’t. I said twice in my original post that they aren’t like humans. But “like a human brain” and “autocomplete on steroids” aren’t the only two choices here.
As for appealing to complexity, well, let’s call it more like an appeal to humility in the face of complexity. My basic claim is this:
1) It is a trap to reason from model architecture alone to make claims about what LLMs can and can’t do.
2) The specific version of this in GP that I was objecting to was: LLMs are just transformers that do next token prediction, therefore they cannot solve novel problems and just regurgitate their training data. This is provably true or false, if we agree on a reasonable definition of novel problems.
The reason I believe this is that back in 2023 I (like many of us) used LLM architecture to argue that LLMs had all sorts of limitations around the kind of code they could write, the tasks they could do, the math problems they could solve. At the end of 2025, SotA LLMs have refuted most of these claims by being able to do the tasks I thought they’d never be able to do. That was a big surprise to a lot of us in the industry. It still surprises me every day. The facts changed, and I changed my opinion.
So I would ask you: what kind of task do you think LLMs aren’t capable of doing, reasoning from their architecture?
I was also going to mention RL, as I think that is the key differentiator that makes the “knowledge” in the SotA LLMs right now qualitatively different from GPT2. But other posters already made that point.
This topic arouses strong reactions. I already had one poster (since apparently downvoted into oblivion) accuse me of “magical thinking” and “LLM-induced-psychosis”! And I thought I was just making the rather uncontroversial point that things may be more complicated than we all thought in 2023. For what it’s worth, I do believe LLMs probably have limitations (like they’re not going to lead to AGI and are never going to do mathematics like Terence Tao) and I also think we’re in a huge bubble and a lot of people are going to lose their shirts. But I think we all owe it to ourselves to take LLMs seriously as well. Saying “Opus 4.5 is the same thing as GPT2” isn’t really a pathway to do that, it’s just a convenient way to avoid grappling with the hard questions.
Not the person you're responding to, but I think there's a non-trivial argument to make that our thoughts are just autocomplete. What is the next most likely word based on what you're seeing? Ever watched a movie and guessed the plot? Or read a comment and known where it was going to go by the end?
And I know not everyone thinks in a literal stream of words all the time (I do) but I would argue that those people's brains are just using a different "token"
There's no evidence for it, nor any explanation for why it should be the case from a biological perspective. Tokens are an artifact of computer science that have no reason to exist inside humans. Human minds don't need a discrete dictionary of reality in order to model it.
Prior to LLMs, there was never any suggestion that thoughts work like autocomplete, but now people are working backwards from that conclusion based on metaphorical parallels.
There actually was quite a lot of suggestion that thoughts work like autocomplete. A lot of it was just considered niche, e.g. because the mathematical formalisms were beyond what most psychologists or even cognitive scientists would deem useful.
Predictive coding theory was formalized back around 2010 and traces its roots to theories by Helmholtz from the 1860s.
Predictive coding theory postulates that our brains are just very strong prediction machines, with multiple layers of predictive machinery, each predicting the next.
There are so many theories regarding human cognition that you can certainly find something that is close to "autocomplete". A Hopfield network, for example.
Roots of predictive coding theory extend back to 1860s.
Natalia Bekhtereva was writing about compact concept representations in the brain akin to tokens.
> There are so many theories regarding human cognition that you can certainly find something that is close to "autocomplete"
Yes, you can draw interesting parallels between anything when you're motivated to do so. My point is that this isn't parsimonious reasoning, it's working backwards from a conclusion and searching for every opportunity to fit the available evidence into a narrative that supports it.
> Roots of predictive coding theory extend back to 1860s.
This is just another example of metaphorical parallels overstating meaningful connections. Just because next-token-prediction and predictive coding have the word "predict" in common doesn't mean the two are at all related in any practical sense.
You, and OP, are taking an analogy way too far. Yes, humans have the mental capability to predict words similar to autocomplete, but obviously this is just one out of a myriad of mental capabilities typical humans have, which work regardless of text. You can predict where a ball will go if you throw it, you can reason about gravity, and so much more. It’s not just apples to oranges, not even apples to boats, it’s apples to intersubjective realities.
I feel the link between humans and autocomplete is deeper than just an ability to predict.
Think about an average dinner party conversation. Person A talks, person B thinks about something to say that fits, person C gets an association from what A and B said and speaks...
And what are people most interested in talking about? Things they read or watched during the week perhaps?
Conversations would not have had to be like this. Imagine a species from another planet who had a "conversation" where each party simply communicated what it most needed to say/was most beneficial to say and said it. And where the chance of bringing up a topic had no correlation at all with what the previous person said (why should it?) or with what was in the newspapers that week. And who had no "interest" in the association game.
Humans saying they are not driven by associations is to me a bit like fish saying they are not noticing the water. At least MY thought processes works like that.
I don't think I am. To be honest, as ideas go and I swirl it around that empty head of mine, this one ain't half bad given how much immediate resistance it generates.
Other posters already noted other reasons for it, but I will note that you are saying 'similar to autocomplete, but obviously', suggesting you recognize the shape and immediately dismiss it as not the same, because the shape you know in humans is much more evolved and can do more things. Ngl man, as arguments go, it sounds to me like supercharged autocomplete that was allowed to develop over a number of years.
Fair enough. To someone with a background in biology, it sounds like an argument made by a software engineer with no actual knowledge of cognition, psychology, biology, or any related field, jumping to misled conclusions driven only by shallow insights and their own experience in computer science.
Or in other words, this thread sure attracts a lot of armchair experts.
> with no actual knowledge of cognition, psychology, biology
... but we also need to be careful with that assertion, because humans do not understand cognition, psychology, or biology very well.
Biology is the furthest developed, but it turns out to be like physics -- superficially and usefully modelable, but fundamental mysteries remain. We have no idea how complete our models are, but they work pretty well in our standard context.
If computer engineering is downstream from physics, and cognition is downstream from biology ... well, I just don't know how certain we can be about much of anything.
> this thread sure attracts a lot of armchair experts.
"So we beat on, boats against the current, borne back ceaselessly into our priors..."
Look up predictive coding theory. According to that theory, what our brain does is in fact just autocomplete.
However, what it is doing is layered autocomplete on itself. I.e. one part is trying to predict what the other part will be producing and training itself on this kind of prediction.
What emerges from this layered level of autocompletes is what we call thought.
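If you want the flavor of that in code, here's the usual toy version (a single "layer" updating its belief to cancel prediction error on a noisy input; a cartoon of the minimize-prediction-error idea, not a model of cortex):

```python
import numpy as np

rng = np.random.default_rng(0)
true_signal = 3.0
observations = true_signal + 0.5 * rng.standard_normal(200)  # noisy lower-level input

belief = 0.0                      # the higher layer's running estimate
rate = 0.05
for obs in observations:
    prediction = belief           # higher layer predicts what the lower layer will report
    error = obs - prediction      # only the prediction error gets passed up
    belief += rate * error        # belief nudged to make the next prediction better

print(round(belief, 2))           # converges near 3.0
```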
First: a selection mechanism is just a selection mechanism, and it shouldn't confuse the observation of emergent, tangential capabilities.
Probably you believe that humans have something called intelligence, but the pressure that produced it - the likelihood of specific genetic material to replicate - is much more tangential to intelligence than next-token-prediction.
I doubt many alien civilizations would look at us and say "not intelligent - they're just genetic information replication on steroids".
Second: modern models also undergo a ton of post-training now. RLHF, mechanized fine-tuning on specific use cases, etc etc. It's just not correct that the token-prediction loss function is "the whole thing".
> First: a selection mechanism is just a selection mechanism, and it shouldn't confuse the observation of an emergent, tangential capabilities.
Invoking terms like "selection mechanism" is begging the question because it implicitly likens next-token-prediction training to natural selection, but in reality the two are so fundamentally different that the analogy only has metaphorical meaning. Even at a conceptual level, gradient descent gradually honing in on a known target is comically trivial compared to the blind filter of natural selection sorting out the chaos of chemical biology. It's like comparing legos to DNA.
> Second: modern models also under go a ton of post-training now. RLHF, mechanized fine-tuning on specific use cases, etc etc. It's just not correct that token-prediction loss function is "the whole thing".
RL is still token prediction; it's just a technique for adjusting the weights to align with predictions that you can't model a loss function for in pre-training. When RL rewards good output, it increases the statistical strength of the model for an arbitrary purpose, but ultimately what is achieved is still a brute-force quadratic lookup for every token in the context.
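Setting aside whether "autocomplete on steroids" is fair, the quadratic-lookup part is just standard attention. A minimal numpy sketch (single head, no causal mask, made-up dimensions) shows where the n-squared in context length comes from:

```python
import numpy as np

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (n, n): every token scores every token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the whole context
    return weights @ V

n, d = 16, 8                                         # 16 tokens of context, 8-dim head
rng = np.random.default_rng(0)
Q, K, V = [rng.standard_normal((n, d)) for _ in range(3)]
print(attention(Q, K, V).shape)                      # (16, 8); doubling n quadruples the score matrix
```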
I use an enterprise LLM provided by work, on a very proprietary codebase in a semi-esoteric language. My impression is that it is still a very big autocompletion machine.
You still need to hand-hold it all the way, as it is only capable of regurgitating the tiny number of code patterns it has seen in public. As opposed to, say, a Python project.
But regardless, I don’t think anyone is claiming that LLMs can magically do things that aren’t in their training data or context window. Obviously not: they can’t learn on the job and the permanent knowledge they have is frozen in during training.
As someone who still might have a '2023 take on LLMs', even though I use them often at work, where would you recommend I look to learn more about what a '2025 LLM' is, and how they operate differently?
LLMs are a general purpose computing paradigm. LLMs are circuit builders, the converged parameters define pathways through the architecture that pick out specific programs. Or as Karpathy puts it, LLMs are a differentiable computer[1]. Training LLMs discovers programs that well reproduce the input sequence. Roughly the same architecture can generate passable images, music, or even video.
The sequence of matrix multiplications is the high-level constraint on the space of programs discoverable. But the specific parameters discovered are what determines the specifics of information flow through the network and hence what program is defined. The complexity of the trained network is emergent, meaning the internal complexity far surpasses that of the coarse-grained description of the high-level matmul sequences. LLMs are not just matmuls and logits.
Notice that the Rule 110 string picks out a machine; it is not itself the machine. To get computation out of it, you have to actually do computational work, i.e. compare the current state and perform operations to generate the subsequent state. This doesn't just automatically happen in some non-physical realm once the string is put to paper.
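For anyone who hasn't played with it, Rule 110 itself fits in a few lines; the rule number is just a lookup table, and nothing happens until something actually performs the updates, which is the point above.

```python
def rule110_step(cells):
    # Index rule 110's bit pattern by the (left, center, right) neighborhood.
    n = len(cells)
    return [
        (110 >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

row = [0] * 31 + [1]                      # a single live cell
for _ in range(10):
    print("".join(".#"[c] for c in row))
    row = rule110_step(row)
```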
For someone speaking as if you knew everything, you appear to know very little. Every LLM completion is a "hallucination"; some of them just happen to be factually correct.
I used to teach 19th-century history, and the responses definitely sound like a Victorian-era writer. And they of course sound like writing (books and periodicals etc) rather than "chat": as other responders allude to, the fine-tuning or RL process for making them good at conversation was presumably quite different from what is used for most chatbots, and they're leaning very heavily into the pre-training texts. We don't have any living Victorians to RLHF on: we just have what they wrote.
To go a little deeper on the idea of 19th-century "chat": I did a PhD on this period and yet I would be hard-pushed to tell you what actual 19th-century conversations were like. There are plenty of literary depictions of conversation from the 19th century of presumably varying levels of accuracy, but we don't really have great direct historical sources of everyday human conversations until sound recording technology got good in the 20th century. Even good 19th-century transcripts of actual human speech tend to be from formal things like court testimony or parliamentary speeches, not everyday interactions. The vast majority of human communication in the premodern past was the spoken word, and it's almost all invisible in the historical sources.
Anyway, this is a really interesting project, and I'm looking forward to trying the models out myself!
I wonder if the historical format you might want to look at for "Chat" is letters? Definitely wordier segments, but it's at least the back and forth feel and we often have complete correspondence over long stretches from certain figures.
This would probably get easier towards the start of the 20th century ofc
Good point, informal letters might actually be a better source - AI chat is (usually) a written rather than spoken interaction after all! And we do have a lot of transcribed collections of letters to train on, although they’re mostly from people who were famous or became famous, which certainly introduces some bias.
The question then would be whether to train it to respond to short prompts with longer correspondence style "letters" or to leave it up to the user to write a proper letter as a prompt. Now that would be amusing
Dear Hon. Historical LLM
I hope this letter finds you well. It is with no small urgency that I write to you seeking assistance, believing such an erudite and learned fellow as yourself should be the best one to furnish me with an answer to such a vexing question as this which I now pose to you. Pray tell, what is the capital of France?
While not specifically Victorian, couldn't we learn much from what daily conversations were like by looking at surviving oral cultures, or other relatively secluded communal pockets? I'd also say time and progress are not always equally distributed, and even within geographical regions (as the U.K.) there are likely large differences in the rate of language shifts since then, some possibly surviving well into the 20th century.
don't we have parliament transcripts? I remember something about Germany (or maybe even Prussia) developing a fast script to preserve 1-to-1 what was said
I mentioned those in the post you’re replying to :)
It’s a better source for how people spoke than books etc, but it’s not really an accurate source for patterns of everyday conversation because people were making speeches rather than chatting.
Yes, but (to write the second half of your post for you!) regulation and incentives are very different in the aviation industry, because safety and planning for long-tail risks is paramount. Therefore airlines can afford to have their pilots spend thousands of hours training on manual control in various scenarios. By contrast, I don’t think the average software development org will encourage its engineers to hand-roll a sizable proportion of their code, if (still a big if) there are major productivity costs in doing so. Rushing the Next Big Feature out the door will almost always beat out long-term investment in dev training, unfortunately.
Don’t get me wrong - manual practice is in some sense the correct solution, and I plan to try and do it myself in the next decade to make sure my skills stay sharp. But I don’t see the industry broadly encouraging it, still less making it mandatory as aviation does.
Addendum: as you probably know, even in aviation this is hard to get right. (This is sometimes called the “children of the magenta” problem, but it’s really Bainbridge again.) The most famous example is perhaps Air France Flight 447[0], where the pilots put the plane into a stall at 35,000ft after reacting poorly to the autopilot disconnecting, and did not even realize they had stalled the plane. Of course, that crash itself led to more regulations around training for manual scenarios too.
I like this framing; I think it captures some of the key differences between engineers who are instinctively enthusiastic about AI and those who are not.
Many engineers walk a path where they start out very focussed on programming details, language choice, and elegant or clever solutions. But if you're in the game long enough, and especially if you're working in medium-to-large engineering orgs on big customer-facing projects, you usually kind of move on from it. Early in my career I learned half a dozen programming languages and prided myself on various arcane arts like metaprogramming tricks. But after a while you learn that one person's clever solution is another person's maintainability nightmare, and maybe being as boring and predictable and direct as possible in the code (if slightly more verbose) would have been better. I've maintained some systems written by very brilliant programmers who were just being too clever by half.
You also come to realize that coding skills and language choice don't matter as much as you thought, and the big issues in engineering are 1) are you solving the right problem to begin with 2) people/communication/team dynamics 3) systems architecture, in that order of importance.
And also, programming just gets a little repetitive after a while. Like you say, after a decade or so, it feels a bit like "more of the same." That goes especially for most of the programming most of us are doing most of the time in our day jobs. We don't write a lot of fancy algorithms, maybe once in a blue moon and even then you're usually better off with a library. We do CRUD apps and cookie-cutter React pages and so on and so on.
If AI coding agents fall into your lap once you've reached that particular variation of a mature stage in your engineering career, you probably welcome them as a huge time saver and a means to solve problems you care about faster. After a decade, I still love engineering, but there aren't many coding tasks I particularly relish diving into. I can usually vaguely picture the shape of the solution in my head out the gate, and actually sitting down and doing it feels rather a bore and just a lot of typing and details. Which is why it's so nice when I can kick off a Claude session to do it instead, and review the results to see if they match what I had in mind.
Don't get me wrong. I still love programming if there's just the right kind of compelling puzzle to solve (rarer and rarer these days), and I still pride myself on being able to do it well. Come the holidays I will be working through Advent of Code with no AI assistance whatsoever, just me and vim. But when January rolls around and the day job returns I'll be having Claude do all the heavy lifting once again.
I'm guessing, but I'm pretty sure you're dealing with big balls of mud which has dampened your love of coding. Where implementing something is more about solving accidental complexity and dealing with technical debts than actually doing the job.
I've seen some balls of mud, sure, but I don't think that's the essence of it. It's more like:
1) When I already have a rough picture of the solution to some programming task in my head up front, I do not particularly look forward to actually going and doing it. I've done enough programming that many things feel like a variation on something I've done before. Sometimes the task is its own reward because there is a sufficiently hard and novel puzzle to solve. Mostly it is not and it's just a matter of putting in the time. Having Claude do most of the work is perfect in those cases. I don't think this is particularly anything to do with working on a ball of mud: it applies to most kinds of work on clean well-architected projects as well.
2) I have a restless mind and I just don't find doing something that interesting anymore once I have more or less mastered it. I'd prefer to be learning some new field (currently, LLMs) rather than spending a lot of time doing something I already know how to do. This is a matter of temperament: there is nothing wrong with being content in doing a job you've mastered. It's just not me.
> 1) When I already have a rough picture of the solution to some programming task in my head up front, I do not particularly look forward to actually going and doing it.
Every time I think I have a rough picture of some solution, there's always something in the implementation that proves me wrong. Then it's reading docs and figuring out whatever gotchas I've stepped into. Or where I erred in understanding the specifications. If something is that repetitive, I refactor and try to make it simple.
> I have a restless mind and I just don't find doing something that interesting anymore once I have more or less mastered it.
If I've mastered something (And I don't believe I've done so for pretty much anything), the next step is always about eliminating the tedium of interacting with that thing. Like a code generator for some framework or adding special commands to your editor for faster interaction with a project.
Or perhaps, just perhaps, the true higher-dimensional move is realizing that choice of programming language isn’t usually the critical factor in whether a project, system, or business succeeds or fails, and that obsessing over the One True Way is a trap.
It might surprise the author to learn that there are many people who:
1) Have tried lisp and clojure
2) Liked their elegance and expressiveness
3) Have read through SICP and done most of the exercises
4) Would still choose plain old boring easy-to-read always-second-best Python for 90% of use-cases (and probably Rust for the last 10%) when building a real business in the real world.
The article could really benefit from some steel-manning. Remove the cute Flatland metaphor and it is effectively arguing that lisp/clojure haven’t been universally adopted because most programmers haven’t Seen The Light in some sort of epiphany of parentheses and macros. The truth is more nuanced.
The reality of modern software development is that most people focus on languages they use for work, and developers are statistically likely to be employed at companies with large numbers of other developers.
The technical merits of languages just aren't relevant to choosing them for most developers, unless they're helping solve a people problem.
"Artisanal" languages like Lisp, and Forth can be fantastic at solving problems elegantly, but that's not the most important thing to optimize for in big organizations where a large portion of your time is spent reading code written by people you've never met who may not have known what they were doing.
Many of the tools that come from big tech are designed to ease the challenges of organizational scale. Golang enforces uniform styles so that you don't have idiosyncratic teams doing their own things. Bazel is a largely language agnostic build system, with amazing build farm support. Apple and Google have both contributed heavily to sanitizers and standard library hardening in order to detect/eliminate issues without reading the code. Facebook has poured vast resources into automatic static analysis. AWS built an entire organization around treating all their internal interfaces the same as external ones.
> "Artisanal" languages like Lisp, and Forth can be fantastic at solving problems elegantly, but that's not the most important thing to optimize for in big organizations ... Many of the tools that come from big tech are designed to ease the challenges of organizational scale.
I think the field of programming languages has grown enough that we have to start acknowledging the future of programming largely won't be in the context of what it means for devs working at large corporations. One of my favorite talks is from Amy J. Ko called A Human View of Programming [1], which argues there are many other ways to look at programming than "tool for generating business activity" and "mathematical construct", which heretofore have been the dominant views of programming languages.
There are so many other forms and purposes programming languages can and will take (she goes through them in the talk), so evaluating and creating them solely on how well they fit into a corporate R&D pipeline is a very narrow and short-term view of the field.
Indeed, it's been the case for a long time now that most people who write programs are not in fact professional software developers. The most used language in the world is Excel, by several orders of magnitude, and it's the opposite of everything devs say a "proper" language must be. There's something we as a field still need to learn from that.
I have very mixed feelings on this topic, starting with how you quantify and weigh something like "most used" for a programming language. To me, the claim feels almost as much a non sequitur as saying the most used building material in the western world is Lego blocks or Play-Doh...
Is the most used bridge-building technique a plank over a small culvert, or the properly engineered bridge that carries constant, multi-lane highway traffic for a century? How do we weigh the usage of resulting products into the usage of a design and production method? Should we consider the number of program users? The users X hours of usage?
Fundamentally, the software field is still just so young and we haven't teased apart the "obvious" different domains and domain rules that we have for production of different material goods. In some sense, the domains and domain rules for material goods emerge out of the connection to culture, economic roles, health, and safety aspects. Whether it falls into civil engineering, building codes, transportation rules, consumer product safety, food and drug, ...
The self-similar way that software can be composed into systems also makes it confusing to categorize. Imagine if we talked about other crafts the same way, and conflated textile manufacturing, clothing design, tailoring, costume making, wardrobe management, scripting, choreography, acting, and dancing as a single field that coordinates the visual movement of fabric on a stage.
> how you quantify and weigh something like "most used" for a programming language.
Define it as # of people who possess the knowledge and resources to effectively use said language to solve a problem they have in their actual lives.
> Fundamentally, the software field is still just so young and we haven't teased apart the "obvious" different domains and domain rules that we have for production of different material goods.
I think we're saying the same thing here from different angles. I said it's developed enough that we can see there are very different ways of doing things. You said it's young enough that we don't know all the different things there are.
As a member of the handmade community, I certainly hope that corporate constraints aren't the main future of the field. I just think it's a major part of the answer as it stands today.
no way? excel? and corporate programmers are not the majority of programmers? -- i mean im a non-corp programmer but i thought i was a special snowflake
Clojure is built on dynamic typing. This is pain. I wrote enough Python (pre-mypy), Javascript, and elisp to say this. Past a certain size, a dynamically typed codebase becomes needlessly hard to wrangle because of that. Hence the success of Python type annotations and Typescript.
Instead, the world should have seen the light of Hindley-Milner type systems, ML-inspired languages, immutability, or at least not sharing mutable state. Did Haskell fail? Hmm, let's look at Typescript and Rust.
Don't get me wrong, a Lisp is always a great and fun language, and you can write whatever DSL you might like on top of it. But the old joke that "a Lisp programmer knows the value of everything, and the cost of nothing" still has quite a bit of truth to it.
The reason I switched from Scheme to Common Lisp was because I wanted type checking more than I wanted hygienic macros or case-sensitive (by default) symbols.
Plenty of ways to define complex data shapes in Clojure
Spec is definitely underrated here, considering it's built into the language and has a wider scope. But most people want the IntelliSense experience, which you can get with clj-kondo + malli; since that isn't built in, most teams don't use it, fair enough.
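For anyone outside Clojure reading along, here is a minimal sketch of what a spec'd data shape looks like (the ::order shape is my own toy example, not something from the thread):

    (require '[clojure.spec.alpha :as s])

    ;; Describe the shape of an order map.
    (s/def ::id int?)
    (s/def ::email (s/and string? #(re-find #"@" %)))
    (s/def ::order (s/keys :req-un [::id ::email]))

    (s/valid? ::order {:id 1 :email "a@b.com"})  ;; => true
    (s/explain-str ::order {:id "oops"})         ;; => a readable failure report

None of this is enforced at compile time, which is exactly the trade-off this subthread is circling.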
I'd like to move the goal posts though and say I want flowstorm in every (any other?!) language
I can just run the program and scrub backwards and forwards through the execution and look at all the immutable values frame by frame with a high level UI with plenty of search/autocomplete options
For program understanding there's nothing better
The fact I can program against the timeline of values of my program and create custom UI on top is crazy
One of the most mind blowing demos to me was Bret Victor's inventing on principle and having a programmable reverse debugger for your language makes those demos viable
I built an emulator recently for work that replays locally what happens on live. Combined with flowstorm, I can go line by line and tell you exactly what happened and why: no print statements, no reruns, with my own custom UI customised to our app's interesting parts.
This is my appeal to anyone outside of Clojure please build flowstorm for JavaScript and or Python
The design of flowstorm is definitely helped by the fact that 95% of the values in a Clojure program are immutable, but I don't think it's impossible to replicate, just very difficult.
On the other hand, it would be easier to add type checking to a Lisp than it was to Python or JavaScript, and I don’t know any technical reason you couldn’t. A little Googling shows it’s been experimented with several times.
But the real strength of Lisp is in the macros, the metaprogramming system. And I suspect that properly typing most macros would be even less trivial than typing complex generic types, like lenses. Not typing a macro, and only typechecking the macroexpansion, would formally work, but usability-wise it could be on par with C++ template error reporting.
I think we (the Clojure community) quickly figured out we don't really want static typing, which is a bit evident by the low uptake of Typed Clojure.
Personally, I found it to A) be a hassle for downstream consumers, since your design suddenly impacts others because you can "lock things down", and B) have that very same effect on your own codebase, which becomes a lot less flexible exactly where it needs to be flexible.
Nowadays, I just use another language if I want static types, which happens sometimes but not nearly as often to say that dynamically typed languages are "dead" or whatever.
My point was that you could implement type checking with macros, not that you could type check macros. (Though that would be cool!) As opposed to having to change the language definition first (Python) or implement an entirely new compiler (TypeScript).
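To make that concrete, here is a toy sketch of my own (nowhere near a real type system): a macro can inspect its arguments at expansion time and reject obvious mistakes before anything runs.

    (defmacro checked-inc
      "Toy compile-time check: rejects a non-numeric literal at macroexpansion.
       Symbols and nested forms pass through unchecked in this sketch."
      [x]
      (when-not (or (number? x) (symbol? x) (seq? x))
        (throw (ex-info "checked-inc expects a number" {:got x})))
      `(inc ~x))

    (checked-inc 41)      ;; => 42
    ;; (checked-inc "hi") ;; fails at macroexpansion, before evaluation

A real checker would have to walk whole forms and track symbols through an environment, which is where it stops being trivial.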
Certainly you can implement the typechecker with macros, but it should also work on macros, before expansion. That is, you likely want (-> ...) typechecked as written, not (only) as expanded, and typing errors reported on the non-expanded form.
Word. This is a problem of lisps in general: they lose information as the same "thing" traverses the various meta-layers that constitute the system. A parsed expression is not tied to its string, and the expansion of the expression, provided it is a macro, is not tied to the original expression. In the same vein: you can't easily find the source code of a lambda that was compiled/interpreted.
Of course you can do all of this, but you need to build it yourself: see rewrite-clj. If you want to build a Clojure debugger that is able to display or refer to code with the same indentation the programmer is dealing with in their text editor, you need to bridge the gap between Clojure expressions and their string representation.
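A hedged sketch of what that bridging looks like (assuming rewrite-clj v1 on the classpath; the code string is my own example):

    (require '[rewrite-clj.zip :as z])

    (def zloc (z/of-string "(defn add [a b]\n  (+ a b))"))

    (z/sexpr zloc)   ;; => (defn add [a b] (+ a b))       -- the data, layout gone
    (z/string zloc)  ;; => "(defn add [a b]\n  (+ a b))"  -- the text, layout kept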
Anyway I concur that reversible macros would be great. Tag the output, have those tags propagate to the input by playing the macro backwards. Complex stuff really. That's a job for category theory I guess.
If we approach the question as engineers, scientifically, with numbers and studies, not anecdotes and hand-waving, then Clojure is hands down the best language in terms of productivity and bug reduction.
To this day, I know of no study that was able to demonstrate the superiority of statically typed languages [1].
What studies clearly show, is that both in terms of productivity [2] and bug reduction [3], expressivity reigns supreme.
And Clojure is the most expressive [4] out of languages that can leverage huge ecosystems (Java and JS, soon C++ through Jank dialect).
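For a flavour of what "expressive" cashes out to in practice, a throwaway example of my own (not taken from the cited studies):

    (require '[clojure.string :as str])

    ;; Word frequencies, top three, as one pipeline over immutable data.
    (->> (str/split "the cat sat on the mat the end" #"\s+")
         frequencies
         (sort-by val >)
         (take 3))
    ;; => (["the" 3] ...) -- ties come back in arbitrary order

Whether that kind of density translates into the productivity and defect numbers in the studies is, of course, the contested part.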
> Clojure is built on dynamic typing. This is pain. I wrote enough Python (pre-mypy), Javascript, and elisp to say this.
Probably not an absolute truth, but definitely a personal truth for you. For me, it's pretty much the opposite: static/fixed types are such a pain when you just wanna solve a problem and you know how to achieve it and all the invariants/constraints, but the language tells you "No, you know what, this other person said you cannot use X for Y, so I'm gonna say no" instead of just letting me do that thing.
With that said, I still reach for Rust for about ~30% of new projects, despite the types, because some languages fit some problems better, simple as that. And there are still a lot more contracting gigs available for the various Rust codebases that have fallen into disrepair, so one does what one can.
I feel like codebases, regardless of their size, are hard to wrangle not because of the languages used, but because the programmers had to rush past building proper abstractions, or past considering whether to add so many abstractions in the first place. I've seen awful heavily typed codebases as much as I've seen awful dynamically typed codebases, or awful codebases not using explicit types anywhere; to me there seems to be no correlation between "awful" and "number of explicit types used".
Personally, I prefer a big codebase with lots of (good) unit tests in a dynamic language over that same big codebase with no unit tests and explicit static typing everywhere, especially when refactoring and needing to ensure everything (from a business logic perspective) works correctly. But again, this is my personal truth, and I'm not trying to claim it's universal.
"It might surprise the author to learn that there are many people who:
1) Have tried lisp and clojure
2) Liked their elegance and expressiveness
3) Have read through SICP and done most of the exercises
4) Would still choose plain old boring easy-to-read always-second-best Python for 90% of use-cases (and probably Rust for the last 10%) when building a real business in the real world."
This is me to a T — even when I'm building hobby projects. The point of writing any code, for me, is most of all to see a certain idea to fruition, so I choose what will make me most productive getting where I want to go. And while I still worship at the altar of Common Lisp as an incredibly good language, the language matters much less than the libraries, ecosystem, and documentation for productivity (or even effective DSL style abstraction level!), so eventually I have had to make my peace with Python, TypeScript, and Rust.
Tacking on: part of seeing it to fruition, and of its continued lifetime, is ensuring you can communicate the intent and operation to a large group of potential successors and co-workers.
An incredible epiphany that you can't transmit may not be as useful as a moderately clever idea you can.
I think the missing piece is that "more expressive" languages do not automatically create more value at the team or company level.
Languages like Lisp, Clojure, Rust, Haskell, Erlang give strong engineers room to build powerful abstractions, but they also increase cognitive load and ramp-up cost. In most orgs with churn and constant hiring, you do not get to enjoy "we built great abstractions and now we are fast". You live in "someone new is trying to understand what the last person did".
That is why hand-holding and guard rails win. Not because Python or similar are technically superior, but because they support a commoditised, fungible workforce. Even if a wizard in a high-dimension language is 2x more productive, that does not necessarily beat a slightly larger team in a mainstream language once you factor in turnover and ramp-up. Companies mostly optimise for business impact, predictable delivery, and ease of staffing, not for maximising the ceiling of the top few programmers.
That said, at the individual level, as a programmer you can definitely benefit from learning and mastering those added dimensions; even if you never use them again professionally, they expand your mindset.
> Languages like Lisp, Clojure, Rust, Haskell, Erlang give strong engineers room to build powerful abstractions ... You live in "someone new is trying to understand what the last person did".
In a language like C++, Rust or even Haskell, those "powerful abstractions" are mostly about building libraries, not application code. You still benefit from powerful libraries that were built as part of the ecosystem, even while keeping your own application code simple and intuitive to ease onboarding of new coders. Part of the "powerful abstractions" is to enable a "hand-holding and guard rails" approach for application code that's going to interface with that full-featured library.
There are several languages that I could use and be economically successful with, but I refuse to use because I consider them to be poorly designed.
Using a bad language for 8 hours a day makes me irritable and it's impossible to prevent that irritability from overflowing into my interactions with other people. I'd rather that my conversations with the computer be joyful ones.
Most of the time, when someone brings in one of these fancy languages, what happens is that they leave, and the ones left behind are the ones who have to deal with the shit that was produced.
I'm going through this now, having to deal with code nobody wants to touch because it is overly complex, has no documentation, and is in a language no one else knows. Now, whenever I see an effort like this, to bring in an esoteric language for absolutely no good reason, I try to kill it as fast as possible.
I don't want to be the victim of this code in the future or have my team bear the cost of maintaining stuff they don't understand.
Sadly, much as I love Forth, it's kind of the same thing. It's an awesome language and it's a great way to bring up bare metal to a functional state, but who does that these days?
I could probably include Forth as a scripting language in a bigger app, but that app is probably going to want more complex values than machine-word-sized ints and fixed-length strings. So, oh dear, Forth's not a great fit for that, and everyone just uses Lua anyway, so Lua it is.
Which is a pity, because I like Forth, and I used it to create possibly the nerdiest project on GitHub. I like Forth a lot, and I'd encourage anyone curious about how you get from "chunk of thinking sand and copper" to "thing I can type commands in" to have a crack at it - it's easy enough to implement your own, just to see how it's done.
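In that "see how it's done" spirit, here is roughly how small the core is: a toy Forth-style evaluator sketched in Clojure (my own sketch, unrelated to the parent's project), just a stack and a word table.

    (require '[clojure.string :as str])

    (def words
      {"+"    (fn [[a b & r]] (cons (+ b a) r))
       "dup"  (fn [[a & r]]   (cons a (cons a r)))
       "drop" (fn [[_ & r]]   r)
       "."    (fn [[a & r]]   (println a) r)})

    (defn run [src]
      (reduce (fn [stack token]
                (if-let [w (words token)]
                  (w stack)                             ;; known word: apply it
                  (cons (Long/parseLong token) stack))) ;; otherwise: push a number
              '()
              (str/split src #"\s+")))

    ;; (run "2 3 + dup .")  ;; prints 5, leaves (5) on the stack

A real Forth adds a dictionary you can extend from inside the language (colon definitions), which is where the fun starts.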
But I don't expect anyone else to jump up and like it too, just because I said it's cool.
I ported Forth to a 1980s sampler, so that you could plug a MIDI cable in and run a special terminal program to write Forth programs on its 6809 processor.
It boots off a floppy disk, so it was really just a case of working out where in the ROM and OS disassembly the entry points were, and making it all fit around the assumptions the ROM makes.
This allowed me to make up a diagnostics disk to check that the RAM (a whopping 128kB of sample RAM) and IO are working.
yes. and as a long time lisper, i don't think that it's the macros.
i think lisp's magic is a lot more cultural than most people think. i.e. how lispnicks implement lisps and the ecosystem around it. how easy it is to walk the entire ladder of abstractions from machine code to project specific DSL's. how pluggable its parsing pipeline is -- something that is not even exposed in most languages, let alone customizable.
the language, the foundation, of course matters. but i think to a lesser extent than what people think. (hence the trend of trying to hire lispnicks into hard, but non-lisp, positions?)
and it's not even an obviously good culture... (just how abrasive common lispers are? need to have a thick skin if you ask a stupid question... or that grumpy, pervasive spirit of the lone wolf...?)
maybe it's just a peculiar filter that gets together peculiar people who think and write code in peculiar ways.
maybe it's not the macros, but the patterns in personality traits of the people who end up at lisp?
While what you say is true (I’ve used Lisps for 40 years and here I am writing Rust), the people who consciously make that choice are a tiny niche. There are vastly more people who don’t and can’t make that choice because they don’t have 1-3. So the empirical evidence for what’s actually critical is pretty slim.
> The article could really benefit from some steel-manning. Remove the cute Flatland metaphor and it is effectively arguing that lisp/clojure haven’t been universally adopted because most programmers haven’t Seen The Light in some sort of epiphany of parentheses and macros. The truth is more nuanced.
The talk I posted from Alan Kay is the steel man. I think you've missed the essence of TFA because it's not really about Clojure or lisp.
You may need to explain more? I don’t think I missed the big idea - the image of a separate plane or higher dimension that contains ideas not expressible in the ordinary one is a nice metaphor, and it does apply well to some things (Kuhn’s paradigms in the history of science come to mind, e.g. Newtonian Mechanics versus Relativity). I just don’t think it applies well here. What business concepts or thoughts can you express in Clojure that you can’t express in Python or Rust?
Not GP, but … because the overwhelming majority of programming is done in support of businesses selling things?
I’m not just talking about people who program for a living. The majority of academic CS chooses its research directions because of what limits people are running into for business; even privacy-focused software has been commoditized by many businesses; a large amount of OSS development is by (and often paid for by the employers of) people working for money; heck, after Linus’s initial “just a hobby OS” period, even Linux’s contribution base was primarily driven by business needs (even if those needs influenced it via “contributor had a problem at work and committed a solution for it upstream in spare time” as often as “paid contributor upstreamed a change on behalf of their employer”).
Yes and no. Most of the big new languages today are created to support the business of selling things, because languages are expensive to make and don't generate any profit themselves, so the only people who have enough money to fund their development are mega corporations, who act in self-interested ways.
But look at historical languages and why they were created:
Algol - to explore writing algorithms
Fortran - to help scientists write programs using typical math formulas
Matlab - to help write programs in linear algebra
Haskell - to explore lazy program evaluation
ML - to explore how to reason about proof automatically
C - to build an OS
Python - to interface with an OS
LISP - to formalize symbol processing
APL - to explore programs defined over arrays
LOGO - to help young kids to program computers
Prolog - to create a language around the idea of formal logic.
Smalltalk - to create an entire programming system, not just a language
(I've left out C++, Java, and JavaScript because I feel like those languages are mostly about serving business interests)
Pretty much the entire computing landscape over the past 50-70 years has been defined by people writing languages for reasons other than "this is for a business to use to make more money". So if we let business-driven interest dictate the future direction, we will miss out on all the things that could be. Would Haskell ever have been invented if business interests were the only concern for researchers?
Fortran compilers were historically implemented by hardware vendors in order to sell their hardware, and this still largely holds true across the surviving implementations with the exceptions of GNU Fortran (obviously) and nagfor (commercial s/w product). There's a good reason that Cray Research's software group was initially part of its marketing department.
The post isn't about Clojure or Lisp, it's about the author's journey as a programmer, and the mind-bending effect learning a Lisp had on their development. They're still in the midst of figuring it out, but a lot of people have been on this path before them. TFA has been written over the years by various authors about Prolog, or Haskell, or Smalltalk. In my case I would have written it about Lucid.
The interesting bit here isn't related to Clojure or Lisp, that's what people are chewing on because it's the surface level topic of the essay. The thing that interests me about this post is how it touches on the psychedelic nature of learning programming languages and what that does to one's perspective as a programmer.
So when you ask "what business thoughts can you express in the language", my response is it's not about what you can express, but more about "what new thoughts / ways of thinking has the experience of learning the language caused you to become aware of?".
Few people can go their whole lives writing Python and think all the possible thoughts there are to think about the shapes and forms programming can take. It's hard to develop a good sense for your own practice of programming if you never step outside and see it from other perspectives. It often takes exposure to completely new languages with different design points and abstractions to really give one perspective about their own practice.
The easiest one related to lisp is just the form of the syntax, which is surprising to many students. Most programmers don't even consider that you can write (+ 1 1) and that's the same thing as (1 + 1). They don't think about the pros, or the cons, or why one might be better than the other, because every language they've used and ever will use writes it as (1 + 1). But as soon as they see Lisp, they immediately see something that changes their perspective on everything they have previously learned, and that will reshape how they approach programming in the future. It doesn't have to mean they will use Lisp going forward, but it does mean they will program in their language of choice with greater purpose. That's how we each hone our craft.
Add on homoiconicity, read/eval, programming as manipulating an AST, and meta programming, and you've got yourself a righteous trip.
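The whole trip fits in a few REPL lines (a trivial example of my own):

    (def form (read-string "(+ 1 2)")) ;; code read in as plain data
    (first form)                       ;; => +  -- it's just a list of symbols and numbers
    (eval (cons '* (rest form)))       ;; => 2  -- rewrite the list, then run it

That code-is-data loop is what macros, and most of the rest of the "righteous trip", are built on.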
Now, can people learn about those things from other sources, without encountering Lisp? Of course. Can you express those ideas in mainstream languages? Yes. But the point is that many devs don't think those thoughts until they've had such a psychedelic experience. It's common enough that we should consider it part of one's growth journey as a programmer, and therefore encourage it. The author of TFA doesn't have enough experience yet to make that point, as they are still on the journey, which is why I brought Alan Kay's talk into the context, since his perspective is from the other end of the journey.
That's a bit of an ad feminam attack, isn't it? Just because I used the phrase "business concepts", somehow money is the only thing I care about when it comes to language choice? And yet, in my top-level post I said I went and learned lisp and clojure and read SICP, and I will add that I did those things for fun. So no, I don't only think of programming languages as a way to make money. Elegance and expressiveness are interesting for their own sake. I trained as a mathematician; of course I think that.
But TFA was riffing on Paul Graham's old essay Beating the Averages, which argued precisely that the expressiveness of Lisp gave his startup a business edge. That was the context of my comment. I'd add that most of what most of us do in our day jobs is to use programming languages to make money, and there's no shame in that at all. And if you want to talk about why certain languages get widespread adoption and others not, you have to talk about the corporate context: there is no way around it.
But I'll rephrase my question, just for you: "what abstract problems can you solve or thoughts can you express in Clojure that you can’t express in Python or Rust?"
Oh, I didn’t say GP was making a sexist attack. If I thought that I would have said it clearly and distinctly. Ad feminam is just the feminine version of ad hominem. I like Latin.
I’m sympathetic to looking down on the obsession with money. But there’s something deep and important about the monetary element. Engineering is about solving real-world, practical problems. The cost is a real factor in whether a potential solution is a useful one.
I think the money question is a red herring here. I’d phrase it more like: what problem in a user’s problem space is expressible only like this? And if the only user is the programmer, that’s alright, but feels more aligned with pure academia. That’s important, too! But has a much smaller audience than engineering at large.
some people only think about life as a way to make money. unfortunately coding was the best-in-slot career for too long and these kinds of people hijacked the culture.
Yeah, I thought of this first as well. There is nothing that hammers home the point that the past was a horrible place better than childhood mortality statistics. I’m surprised the author of the article didn’t mention it, given all her focus on families - I mean, good for her for realizing she didn’t understand what life in the past was really like, but she still seems a little focused on “it wasn’t cute” rather than the really big differences.
If that weren’t enough, the lack of medical care would be another. Childbirth alone carried something like a 10% lifetime risk of death for the mother, and flu, a toothache, an inflamed appendix, or any more severe cut could easily be deadly for young and old.
Everybody had tons of parasites and smelled horrible, including royalty: think hard physical labour every day, wearing the same clothes, and bathing once a year (maybe). The freedom we consider a basic human right was basically unheard of; nearly everybody was, in one form or another, a prisoner of somebody else.
I agree on all counts except for irregular bathing among elites, which was more varied with cultures in the past: largely true in early modern Europe, but the upper classes in Imperial Rome bathed pretty much daily and probably didn’t smell too bad.
To the list I would add: a group of horrible diseases (smallpox above all, which killed about a billion people throughout history) that vaccines largely pushed to the margins, at least until recently.
People sometimes say that people in the past would have been familiar with the idea that mortality is high, and were therefore fine with it when half their children died. While there would have been cultural rituals for these cases, it seems like there is reasonable evidence (epitaphs, cultural practices like eaves-drip burials or baptisms of stillborns, etc.) that the loss was still very dearly felt, and so people’s lives were just that much worse.