CPython's VM is broken...

ChubakPDP11+TakeWithGrainOfSalt · edit-2 3 months ago

@Corbin · 3 months ago

I’ve only skimmed the paper, so let me know if I’ve missed something, ideally with a page number. Also, it’s late and I’m tired, so I’m not hyperlinking anything; sorry.

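I’m not sure what a “full semantic analysis” entails, but always keep Rice’s theorem in mind: every non-trivial semantic property of programs in a Turing-complete language is undecidable in general, so there aren’t any interesting semantic analyses that are both exact and guaranteed to terminate.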

Python is a descendant of Smalltalk. Like several of its cousins, particularly the famous ECMAScript, Python doesn’t have types or classes in the Smalltalk sense, but prototypes which form a class-like hierarchy. From the static-analysis point of view, whether a type is created or instantiated is a matter of Rice’s theorem.

The ability to invoke type() at runtime is not lazy. Python is eager and strict; even generators are eager and strict, although they can cause stack frames to become “stale”; whether a stale stack frame is cleaned up is also a matter of Rice’s theorem.

None of this prevents compilation of Python. The RPython toolchain first imports an application, evaluating all calls to type() and pre-building all classes; then, it statically analyzes all of the Python objects in memory and decompiles their bytecode to determine their behaviors. The resulting executable behaves as if it were started from a snapshot of the Python heap.
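To make the "import first, analyze afterwards" idea concrete, here is a minimal sketch in plain CPython. It is not RPython and does not use its toolchain; `make_point_class` and `registry` are hypothetical names invented for illustration. It only shows that calls to type() run eagerly at import time, so by the time any analysis starts, the classes already exist as ordinary objects in memory.

```python
# Hypothetical registry standing in for "all classes found on the heap".
registry = []

def make_point_class(name, fields):
    """Build a class at runtime with type(); the call is evaluated eagerly."""
    def __init__(self, *values):
        for field, value in zip(fields, values):
            setattr(self, field, value)

    cls = type(name, (object,), {"__init__": __init__, "fields": tuple(fields)})
    registry.append(cls)
    return cls

# These run while the module is being imported, before any analysis.
Point2D = make_point_class("Point2D", ["x", "y"])
Point3D = make_point_class("Point3D", ["x", "y", "z"])

p = Point3D(1, 2, 3)

# After import, a tool can inspect the finished classes instead of
# predicting what type() would have produced.
class_names = [cls.__name__ for cls in registry]
```

A static tool that only read the source would have to reason about what `make_point_class` returns; a tool that runs the import first just looks at `registry`.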

Yes, CPython sucks. Use PyPy instead; also, use cffi to wrap C libraries.

ChubakPDP11+TakeWithGrainOfSalt · 3 months ago

I’m a simp for Alan Kay and I goon to his SnappChat, but I subscribe to Andrew Appel’s OnlyFans (along with many others), so I did not look much into Smalltalk when it comes to ’70s languages. I guess I should do that. Thanks. I could never have put two and two together to realize Python uses prototypes. This blows my mind. Funny thing is, just the other day I wrote JavaScript’s grammar: https://gist.github.com/Chubek/0ab33e40b01a029a7195326e89646ec5

I guess I still have a lot to learn, so I’d better get moving. By ‘full semantic analysis’ I meant doing a full type analysis before you emit the bytecode, not after. What is the protocol here exactly? I have seen several variants and supersets of Python that do an ML-style type analysis. They achieve it via the `NAME [ ':' TYCON ]` syntax, so the regular Python interpreter still works.

So thanks. Learned something with your post.

@[email protected] · 3 months ago

Historically Python has done no semantic analysis at all, and as far as I know CPython still ignores type annotations except for checking their syntax and (I think) checking that type expressions can be evaluated as ordinary expressions. It’s also one of the slowest languages around, and it used to be much worse in the 1.x days. The only actual declarations are global and nonlocal, unless they’ve added something else recently. Everything else that looks like a declaration is actually a statement executed for its side effects. The super function used to only be callable with two arguments, because automatically supplying self and the lexically enclosing class was considered too magical.
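A small sketch of both points, assuming a recent CPython 3; the function names `double` and `pick` are made up for illustration. Annotations are recorded on the function object but never checked, and `def` is an ordinary statement whose effect depends on which branch actually runs.

```python
def double(n: int) -> int:
    # The annotations are stored on the function; nothing enforces them.
    return n * 2

# A str is accepted without complaint, because no type check ever happens.
result = double("ab")
recorded = double.__annotations__

# `def` is a statement executed for its side effect of binding a name,
# so which definition exists is decided at runtime.
use_fast_path = False
if use_fast_path:
    def pick():
        return "fast"
else:
    def pick():
        return "slow"

chosen = pick()
```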

If you’re looking for something like Java or C#, Python isn’t for you. It was designed for use cases like fancy scripts and small applications that aren’t CPU bound. It’s about as dynamic as a language can be, meaning it’s possible to break almost any analysis you might do with a call to eval, and a lot of what you’d expect to be core language primitives, like accessing a field of an object, can execute arbitrary code.
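As a rough illustration (the `Sensor` class is hypothetical), plain attribute access can run arbitrary code through a property or `__getattr__`, and a single eval can change what that access means afterwards:

```python
class Sensor:
    def __init__(self):
        self.reads = 0

    @property
    def value(self):
        # Looks like a field read at the call site, but runs this body.
        self.reads += 1
        return 42

    def __getattr__(self, name):
        # Called only when normal lookup fails; can return anything.
        return f"synthesized:{name}"

s = Sensor()
first = s.value            # runs the property body
second = s.value           # runs it again
missing = s.anything       # falls through to __getattr__
reads_so_far = s.reads

# One eval call replaces the property with a plain string on the class,
# invalidating anything an analysis concluded about `s.value`.
eval("setattr(Sensor, 'value', 'replaced')")
after_eval = s.value
```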

@Corbin · 3 months ago

There are Python compilers which do AST analysis instead of bytecode analysis, particularly Nuitka and Shed Skin. They aren’t very good, but it’s not clear whether that’s because working with the AST is somehow harder than working with the bytecode. RPython doesn’t compile all bytecodes; most generator/coroutine functionality is missing, for example.

Think of type-checking as a syntactic analysis; this is how it avoids Rice’s theorem. Like you say, we can annotate names with type information, and we can do it without evaluating the code. The main problem here is that Python’s semantics don’t require these annotations to enforce the types of values; you may be interested in E, a research language from the 90s which did enforce type annotations on otherwise-untyped names. In Python, this doesn’t error:

>>> x :int = "42"

But in E, this does error:

? def x :int := "42"
# problem: <ClassCastException: String doesn't coerce to an int>
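For comparison, a rough Python sketch of what an enforced annotation would amount to. The `guard` helper is hypothetical: it is not how E implements guards and not anything CPython provides, and unlike real E guards it only checks the value and never attempts a coercion.

```python
def guard(expected_type, value):
    """Hypothetical stand-in for a guard: return the value or raise."""
    if not isinstance(value, expected_type):
        raise TypeError(
            f"{type(value).__name__} doesn't coerce to {expected_type.__name__}"
        )
    return value

x = guard(int, 42)          # passes the check and binds x

try:
    y = guard(int, "42")    # rejected at binding time, as in the E session
except TypeError as err:
    message = str(err)
```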

Sadly, E is long dead, and something of an archeological artifact rather than a usable system. But it may be inspiring to your future efforts, especially since it sounds like you’re learning how to build compilers. (I helped write Monte, a language which blends E and Python; it is also dead, but was more enjoyable than E.)

ChubakPDP11+TakeWithGrainOfSalt · 3 months ago

Why did you use a ? as a prompt for E, but a >>> as a prompt for Python? I know CPython uses >>> in its termio prompt (and I don’t know how they brought that to Windows?) but why would E have used a ? prompt?

@Corbin · 3 months ago

I copied and pasted from the terminal to ensure that I formatted the error message properly. The question-mark prompt is what E used, or at least E-on-Java. Monte used a little Unicode mountain:

⛰  currentProcess.getProcessID() :Int
Result: 2805098
⛰  def x :Int := "42"
Exception: "42" does not conform to Int
⛰  "42" :Int
Exception: "42" does not conform to Int

I can’t really give a reason other than that the prompt characters on Unix-like systems are arbitrary and most REPL libraries allow them to be customized.

Turun · 3 months ago

Yes, Python is between Java (long compiles, faster execution) and Perl (no compiles, slower execution). No, that won’t change, because it’s pretty successful the way it is now.

And yes, cffi is pretty dang sweet. Tons of scientific research runs on NumPy and pandas.