Need to let loose a primal scream without collecting footnotes first? Have a sneer percolating in your system but not enough time/energy to make a whole post about it? Go forth and be mid: Welcome to the Stubsack, your first port of call for learning fresh Awful you'll near-instantly regret.
Any awful.systems sub may be subsneered in this subthread, techtakes or no.
If your sneer seems higher quality than you thought, feel free to cut'n'paste it into its own post; there's no quota for posting and the bar really isn't that high.
The post-Xitter web has spawned so many "esoteric" right-wing freaks, but there's no appropriate sneer-space for them. I'm talking redscare-ish, reality-challenged "culture critics" who write about everything but understand nothing. I'm talking about reply-guys who make the same 6 tweets about the same 3 subjects. They're inescapable at this point, yet I don't see them mocked (as much as they should be).
Like, there was one dude a while back who insisted that women couldn't be surgeons because they didn't believe in the moon or in stars? I think each and every one of these guys is uniquely fucked up and if I can't escape them, I would love to sneer at them.
(Credit and/or blame to David Gerard for starting this.)
In terms of writing bots to play Pokemon specifically (which, given the prompting and the custom tools written, I think is the fairest comparison)… not very well. According to this reddit comment, a bot from 11 years ago could beat the game in 2 hours and was written in about 7.5K lines of Lua, while an open-source LLM scaffold for playing Pokemon, broadly similar to Claude's or Gemini's, is 4.8K lines (and still missing many of the tools Gemini had by the end; Gemini also took weeks of constant play instead of 2 hours).
So basically it takes about the same number of lines to do a much, much worse job. Pokebot probably required relatively more skill to implement… but OTOH, Gemini's scaffold took thousands of dollars in API calls to develop by trial and error and to run. So you can write a bot from scratch that substantially outperforms an LLM agent for moderately more programming effort and substantially less overall cost.
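To make the structural difference concrete, here's a minimal sketch; everything in it (the stub emulator, the RAM addresses, the button set) is a made-up stand-in for illustration, not the actual Lua bot or either scaffold:

```python
# Illustrative only: a fake emulator so the sketch runs; real bots hook the
# emulator's scripting API, and real scaffolds call a real LLM endpoint.
class StubEmulator:
    def __init__(self):
        self.ram = {0xD361: 4, 0xD362: 7}  # hypothetical player y/x addresses
    def read_u8(self, addr):
        return self.ram.get(addr, 0)
    def press(self, button):
        print(f"pressed {button}")

def scripted_step(emu, route):
    # Classic bot: exact state from RAM, table-driven policy, zero cost per step.
    x, y = emu.read_u8(0xD362), emu.read_u8(0xD361)
    emu.press(route.get((x, y), "A"))

def llm_step(emu, llm_complete):
    # LLM scaffold: lossy text observation, one paid API call per step.
    obs = "You are at a crossroads. Reply with one button: A B UP DOWN LEFT RIGHT."
    action = llm_complete(obs).strip().upper()
    emu.press(action if action in {"A", "B", "UP", "DOWN", "LEFT", "RIGHT"} else "A")

emu = StubEmulator()
scripted_step(emu, route={(7, 4): "UP"})         # -> pressed UP, free and instant
llm_step(emu, llm_complete=lambda prompt: "up")  # -> pressed UP, seconds and $ per step
```

The scripted loop reads exact game state for free every frame; the scaffold pays for a lossy observation on every step, which is where the weeks of play and the API bill come from.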
In terms of gameplay with reinforcement learning… still not very well. I've watched this video before on using RL directly on pixel output (with just a touch of memory hacking to set the rewards); it uses substantially less compute than LLMs playing Pokemon, and the resulting trained NN benefits from all previous training. The developer hadn't gotten it to play through the whole game… probably a few more tweaks to the reward function would manage a lot more progress? OTOH, LLMs playing Pokemon benefit from being able to use NPC dialog more directly (even if their CoT "reasoning" often goes off on erroneous tangents or completely batshit leaps of logic), while the RL approach is almost outright blind… a big problem the RL approach might run into is backtracking in the later stages, since it uses an exploration reward to drive the model forward. OTOH, the LLMs also had a lot of problems with backtracking.
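Here's roughly what that exploration-driven reward looks like, and why backtracking starves it; the state tuple and reward values are illustrative guesses, not the video's actual code:

```python
# Sketch of an exploration-shaped reward, assuming position is read out of
# emulator RAM ("memory hacking"). All specifics here are made up.
visited = set()

def exploration_reward(map_id, x, y):
    """+1 the first time the agent stands on a tile, 0 ever after."""
    tile = (map_id, x, y)
    if tile in visited:
        return 0.0      # no signal for revisiting: this is exactly why
    visited.add(tile)   # backtracking through cleared areas looks like
    return 1.0          # "no progress" to the trained policy

# A forced backtrack yields a long run of zero rewards:
print([exploration_reward(1, x, 0) for x in range(3)])  # [1.0, 1.0, 1.0]
print([exploration_reward(1, x, 0) for x in range(3)])  # [0.0, 0.0, 0.0]
```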
My (wildly optimistic by sneerclubbing standards) expectation for "LLM agents" is that people figure out how to use them as a "creative" component in more conventional bots and AI approaches, where a more conventional bot prompts the LLM for "plans" that it uses when it gets stuck. AlphaGeometry2 is a good demonstration of this: it solved 42/50 problems with a hybrid neurosymbolic-plus-LLM approach, but notably it could solve 16 of those problems with just the symbolic portion, no LLM needed, so the LLM is contributing something, but the actual rigorous verification is handled by the symbolic AI.
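A hedged sketch of that propose-and-verify pattern; `propose` and `verify` below are stubs standing in for an LLM call and a symbolic prover respectively (nothing here is AlphaGeometry2's actual interface):

```python
# "LLM proposes, symbolic engine verifies": only the verifier is trusted.
def propose(problem, history):
    """Stand-in for the LLM: suggest a candidate construction/plan."""
    candidates = ["reflect C over AB",
                  "drop a perpendicular from C",
                  "aux point M = midpoint(A, B)"]
    return candidates[len(history) % len(candidates)]

def verify(problem, plan):
    """Stand-in for the symbolic engine's rigorous accept/reject check."""
    return "midpoint" in plan  # pretend only the midpoint construction closes the proof

def solve(problem, budget=5):
    history = []
    for _ in range(budget):
        plan = propose(problem, history)  # creative but unreliable step
        if verify(problem, plan):         # rigorous step gets the final say
            return plan
        history.append(plan)              # feed failures back as context
    return None

print(solve("prove the angles are equal"))  # -> "aux point M = midpoint(A, B)"
```

The design point is that the LLM's output is never trusted directly; it only narrows the search space that the rigorous component then checks.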
Cool, thanks for doing the effort post.
This matches a bit my feeling of how it's already being used in security fields, with less focus on the conventional bots/AI, where they still use the LLMs for some things. But it's hard to separate fact from PR, and some of the things they say they do don't seem like a great fit for LLMs, especially considering what I've heard from people who are not on the hype train. (The example coming to mind is using LLMs to standardize some sort of reporting/test writing, while I heard from somebody I trust, who has seen people try that, that it failed because the LLM couldn't keep to a consistent standard.)
curious about this reference - wdym?
"We use LLMs for X in our security products" gets brought up a lot in the promotional parts of the Risky Business podcast, and it sometimes leaks into the other parts as well. That's basically the only place I hear people speak somewhat positively about it. They use LLMs (or claim to) for various things: some I thought were possible but iffy, some impossible, like having LLMs do massive amounts of organizational work. Sorry I can't recall the specifics. (I'm also behind atm.)
I've never heard the people I know speak positively about it, but they also know I'm not that positive about AI, so the likelihood they just avoid the subject is non-zero.
E: Schneier is also not totally against the use of LLMs, for example: https://www.schneier.com/blog/archives/2025/05/privacy-for-agentic-ai.html Quite disappointing. (Also, as with all security-related blogs nowadays, don't read the comments; people have lost their minds. It always was iffy, but the last few years every security-related blog that reaches some fame is filled with madmen.)
Ah, I don't listen to riskybiz because ugh, podcast
Schneier's a dipshit well past his prime, though. people should stop listening to that ossified doorstop
Ow yeah, I don't disagree on that, even if I do keep up with them. Just making my sources obvious. (One of the tics I do find valuable from the Rationalists; the verbosity and tendency to try and over-explain isn't as valuable, but hard to shift (and the one feeds into the other (and… I'm doing it again, ain't I?))).
that's fair - the first half of my post was certainly more about me than anything (but was also an indication as to why I don't hear that particular angle much - I've also ensured I get as little advertising as possible in my life)
other part: nah still, fuck schneier
rest of your comment reminds me of the tact filters post (albeit from a different angle)
Tact filter is prob what went wrong with the Lawyer person discussed elsewhere; I'd never heard about it (or had forgotten).