This paper: https://cs.brown.edu/~sk/Publications/Papers/Published/pmmwplck-python-full-monty/paper.pdf

… has been out for several years now, and the CPython authors don’t seem to have taken any heed of it. The question one faces when looking at the inner workings of CPython’s VM is:

Is Python a lazy language, or is it not? Should types and symbols be resolved by the VM at run time, or by semantic analysis beforehand? Should there be explicit tree-building and DAG-based value-numbering optimization, or should it just shit out the bytecode?

Because the VM seems to build classes on the go [list of opcodes]. I don’t pretend to know enough about this, but wouldn’t it be better if they did a full semantic analysis and then emitted the bytecode? That way execution would be faster, at the cost of a small lag for the heavier semantic analysis.
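
To make the "builds classes on the go" point concrete, here is a minimal sketch, assuming a stock CPython 3 interpreter. Compiling a class statement only produces bytecode; the class object appears when that bytecode runs and calls builtins.__build_class__ (reached through the LOAD_BUILD_CLASS opcode on CPython 3 releases up to at least 3.12; opcode names can change between versions). The names tracing_build_class, calls and calls_before_exec are mine, purely for illustration.

```python
import builtins
import dis

SOURCE = "class Cls:\n    foo = 'bar'\n"

# Compile only: this produces a code object, no class exists yet.
code = compile(SOURCE, "<example>", "exec")

# Opcode names of the module-level code (version dependent).
opnames = [ins.opname for ins in dis.get_instructions(code)]

calls = []
original_build_class = builtins.__build_class__

def tracing_build_class(func, name, *bases, **kwargs):
    # Hypothetical wrapper: record the class name, then defer to the real builtin.
    calls.append(name)
    return original_build_class(func, name, *bases, **kwargs)

builtins.__build_class__ = tracing_build_class
try:
    namespace = {}
    calls_before_exec = list(calls)   # still empty: nothing has run yet
    exec(code, namespace)             # the class is built here, at run time
finally:
    builtins.__build_class__ = original_build_class

print(opnames)
print(calls, namespace["Cls"].foo)
```

The wrapper only fires during exec, which is the whole point: there is no class to analyze until the bytecode has actually executed.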

Of course, the answer is clear: Python may not officially be a lazy language, but it virtually is one. The class syntax, as the paper says, is syntactic sugar around type with three arguments. Since type with three arguments is invoked at run time, it would be rather stupid, and slow, to do semantic analysis on a runtime function, right!? So classes are not ‘really’ classes!

For further clarity, this:

Cls = type("Cls", (), {"foo": "bar"})

is equivalent to this:

class Cls:
    foo = "bar"
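
A small sketch of that equivalence, assuming plain CPython 3 and nothing beyond builtins. The names ViaStatement, ViaType and make_class are made up for the example. It also shows why static analysis is awkward here: type() is an ordinary call, so its arguments can be computed at run time.

```python
class ViaStatement:
    foo = "bar"

ViaType = type("ViaType", (), {"foo": "bar"})

# Both are ordinary objects whose metaclass is `type`,
# with `object` as the only base and the same class attribute.
same_metaclass = type(ViaStatement) is type and type(ViaType) is type
same_bases = ViaStatement.__bases__ == ViaType.__bases__ == (object,)
same_attr = ViaStatement.foo == ViaType.foo == "bar"

def make_class(name, **attrs):
    # Hypothetical helper: the class name and body are only known at call time.
    return type(name, (), dict(attrs))

Dynamic = make_class("Dynamic", foo="bar", answer=42)

print(same_metaclass, same_bases, same_attr)
print(Dynamic.__name__, Dynamic.foo, Dynamic.answer)
```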

They might have looked at this paper and said ‘nah, don’t fix what isn’t broken’, and this exact attitude, which the Python community has from top to bottom, is why I have not used it in about 2 years and, unless paid handsomely, won’t use it in any project.

I believe Python needs to decide whether it’s a scripting language, a cross-platform juggernaut like Java, or exactly what it is now: a piece of crap hyped to high heaven!

These are my opinions; I don’t think I am educated enough for these to be facts. But look into your heart and compare CPython’s VM opcodes with the JVM’s opcodes. Both are stack machines, but the JVM has low-level, typed opcodes designed to get things done fast and portably. It has an infrastructure and an ecosystem. Several languages run on it; hell, even Python itself runs on it!

Sadly, because that dang C FFI is so sweet, CPython seems to be the de facto Python implementation. And Python is not even badly specified like Perl is. I prefer a highly non-orthogonal language like Perl for scripting any day of the week. I use Perl a lot for preprocessing C source files, or just as an AWK replacement. Is Python supposed to be that? Or Java? Decide, goddammit.

So what we get from this is: Python is a simple, AWK-ascended UNIX scripting language that lazy people have made into a de facto Java! lol

Again, I am not very educated on this matter, so please don’t take my opinions as fact. I just made this thread to share this nice paper and a bit of trivia.

Thanks.

  • @Corbin
    113 months ago

    I’ve only skimmed the paper, so let me know if I’ve missed something, ideally with a page number. Also, it’s late and I’m tired, so I’m not hyperlinking anything; sorry.

    I’m not sure what a “full semantic analysis” entails, but always keep Rice’s theorem in mind: every non-trivial semantic property of programs in a Turing-complete language is undecidable, so no analysis can decide an interesting one exactly for all programs.

    Python is a descendant of Smalltalk. Like several of its cousins, particularly the famous ECMAScript, Python doesn’t have types or classes in the Smalltalk sense, but prototypes which form a class-like hierarchy. From the static-analysis point of view, whether a type is created or instantiated is a matter of Rice’s theorem.

    The ability to invoke type() at runtime is not lazy. Python is eager and strict; even generators are eager and strict, although they can cause stack frames to become “stale”; whether a stale stack frame is cleaned up is also a matter of Rice’s theorem.

    None of this prevents compilation of Python. The RPython toolchain first imports an application, evaluating all calls to type() and pre-building all classes; then, it statically analyzes all of the Python objects in memory and decompiles their bytecode to determine their behaviors. The resulting executable behaves as if it were started from a snapshot of the Python heap.
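
    To be clear, the following is not RPython's actual machinery; it is only a loose, stdlib-only illustration of the "run first, then analyze the live objects" idea. The names Rect, area and names_used_by_methods are invented for the example.

```python
import inspect

# Phase 1: ordinary execution. The type() call runs, so the class now
# exists in memory as a normal Python object.
def area(self):
    return self.width * self.height

Rect = type("Rect", (), {"width": 2, "height": 3, "area": area})

# Phase 2: inspect the live object rather than the source text.
def names_used_by_methods(cls):
    """Map each plain function defined on cls to the names its bytecode refers to."""
    report = {}
    for name, member in vars(cls).items():
        if inspect.isfunction(member):
            # co_names lists the global and attribute names used by the code object.
            report[name] = sorted(member.__code__.co_names)
    return report

print(names_used_by_methods(Rect))
```

    The analysis never needs to understand the type() call itself, because by the time it runs, the call has already happened.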

    Yes, CPython sucks. Use PyPy instead; also, use cffi to wrap C libraries.

    • ChubakPDP11+TakeWithGrainOfSaltOP
      33 months ago

      I’m a simp for Alan Kay and I goon to his SnappChat, but I subscribe to Andrew Appel’s OnlyFans (along with many others), so I did not look much into Smalltalk when it comes to ‘70s languages’. I guess I should do that. Thanks. I could never have put two and two together to realize Python uses prototypes. This blows my mind. Funny thing is, just the other day I wrote JavaScript’s grammar. https://gist.github.com/Chubek/0ab33e40b01a029a7195326e89646ec5

      I guess I still got a lot to learn so better get moving. I guess by ‘full semantic analysis’ I meant doing a full type analysis ‘before’ you emit the bytecode, not after. What is the protocol here exactly? I have seen several variants and supersets of Python that do an ML-style type analysis. They achieve it via the `NAME [ ':' TYCON ]` syntax, so the regular Python interpreter would still work.

      So thanks. Learned something with your post.

      • @[email protected]
        33 months ago

        Historically Python has done no semantic analysis at all, and as far as I know CPython still ignores type annotations except for checking their syntax and (I think) checking that type expressions can be evaluated as ordinary Python expressions. It’s also one of the slowest languages around, and it used to be much worse in the 1.x days. The only actual declarations are global and nonlocal, unless they’ve added something else recently. Everything else that looks like a declaration is actually a statement executed for its side effects. The super function used to only be callable with two arguments, because automatically supplying self and the lexically enclosing class was considered too magical.
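
        A short sketch of three of those points, assuming CPython 3: def is a statement that binds a name when it runs, annotations are recorded but not checked on call, and the two-argument super still works next to the zero-argument form. All names here (pick_greeter, double, Base, Explicit, Implicit) are made up for illustration.

```python
def pick_greeter(loud):
    # `def` is a statement: whichever branch executes binds the name `greet`.
    if loud:
        def greet():
            return "HELLO"
    else:
        def greet():
            return "hello"
    return greet

def double(n: int) -> int:
    # The annotations are stored on the function but not enforced on call.
    return n * 2

class Base:
    def describe(self):
        return "base"

class Explicit(Base):
    def describe(self):
        # Older two-argument spelling: class and instance passed by hand.
        return "explicit-" + super(Explicit, self).describe()

class Implicit(Base):
    def describe(self):
        # Zero-argument form: the compiler supplies the class and instance.
        return "implicit-" + super().describe()

print(pick_greeter(True)(), pick_greeter(False)())
print(double("ab"))
print(Explicit().describe(), Implicit().describe())
```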

        If you’re looking for something like Java or C#, Python isn’t for you. It was designed for use cases like fancy scripts and small applications that aren’t CPU bound. It’s about as dynamic as a language can be, meaning it’s possible to break almost any analysis you might do with a call to eval, and a lot of what you’d expect to be core language primitives, like accessing a field of an object, can execute arbitrary code.
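
        As a minimal sketch of how dynamic that is (the Config class and its attribute names are hypothetical), a plain-looking field read can run arbitrary code through a property or __getattr__, and a single eval can change the class after any analysis has been done:

```python
class Config:
    def __init__(self):
        self.reads = 0

    @property
    def timeout(self):
        # Arbitrary code runs on what looks like a plain field read.
        self.reads += 1
        return 30 * self.reads

    def __getattr__(self, name):
        # Called only for attributes that normal lookup does not find.
        return f"made up on demand: {name}"

cfg = Config()
first = cfg.timeout            # property runs, reads becomes 1
second = cfg.timeout           # property runs again, reads becomes 2
missing = cfg.no_such_field    # falls through to __getattr__

# eval can replace the property behind an analyzer's back.
eval("setattr(Config, 'timeout', 'now a plain string')")
third = cfg.timeout

print(first, second, missing, third)
```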

      • @Corbin
        23 months ago

        There are Python compilers which do AST analysis instead of bytecode analysis, particularly Nuitka and Shed Skin. They aren’t very good, but it’s not clear whether that’s because working with the AST is somehow harder than working with the bytecode. RPython doesn’t compile all bytecodes; most generator/coroutine functionality is missing, for example.

        Think of type-checking as a syntactic analysis; this is how it avoids Rice’s theorem. Like you say, we can annotate names with type information, and we can do it without evaluating the code. The main problem here is that Python’s semantics don’t require these annotations to enforce the types of values; you may be interested in E, a research language from the 90s which did enforce type annotations on otherwise-untyped names. In Python, this doesn’t error:

        >>> x :int = "42"
        

        But in E, this does error:

        ? def x :int := "42"
        # problem: <ClassCastException: String doesn't coerce to an int>
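
        For comparison, a rough Python sketch of what enforcement at binding time could look like. The helper guarded is hypothetical and much weaker than E's guards, which can also coerce values; this one only checks with isinstance.

```python
def guarded(value, guard):
    """Return value unchanged if it is an instance of guard, else raise TypeError."""
    if not isinstance(value, guard):
        raise TypeError(f"{value!r} does not conform to {guard.__name__}")
    return value

x: int = "42"            # accepted: the annotation is not enforced at run time
y = guarded(42, int)     # accepted: 42 is an int

try:
    z = guarded("42", int)
except TypeError as exc:
    error_message = str(exc)

print(x, y, error_message)
```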
        

        Sadly, E is long dead, and something of an archeological artifact rather than a usable system. But it may be inspiring to your future efforts, especially since it sounds like you’re learning how to build compilers. (I helped write Monte, a language which blends E and Python; it is also dead, but was more enjoyable than E.)

        • ChubakPDP11+TakeWithGrainOfSaltOP
          23 months ago

          Why did you use a ? as a prompt for E, but a >>> as a prompt for Python? I know CPython uses >>> in its termio prompt (and I don’t know how they brought that to Windows?) but why would E have used ? as its prompt?

          • @Corbin
            13 months ago

            I copied and pasted from the terminal to ensure that I formatted the error message properly. The question-mark prompt is what E used, or at least E-on-Java. Monte used a little Unicode mountain:

            ⛰  currentProcess.getProcessID() :Int
            Result: 2805098
            ⛰  def x :Int := "42"
            Exception: "42" does not conform to Int
            ⛰  "42" :Int
            Exception: "42" does not conform to Int
            

            I can’t really give a reason other than that the prompt characters on Unix-like systems are arbitrary and most REPL libraries allow them to be customized.
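
            In CPython's case the prompts are just strings stored in sys.ps1 and sys.ps2, so a sketch like the one below (for example in a startup file) would give Python an E-style prompt. The interpreter only sets these attributes itself in interactive mode, and alternative REPL front ends may treat them differently.

```python
import sys

# sys.ps1 is the primary prompt and sys.ps2 the continuation prompt of the
# interactive interpreter; assigning to them changes what the REPL prints.
sys.ps1 = "? "
sys.ps2 = "> "

print(repr(sys.ps1), repr(sys.ps2))
```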

  • Turun
    33 months ago

    Yes, Python sits between Java (long compiles, faster execution) and Perl (no compiles, slower execution). No, it won’t change that, because it’s pretty successful the way it is now.

    And yes, cffi is pretty dang sweet. Tons of scientific research runs on NumPy and pandas.