Introducing Otterkit COBOL

Given that this is a community about the COBOL programming language, I’d like to take the opportunity to make a post about this project that I’m a part of. Our goal is to create a compiler implementing the ISO 2023 standard of the COBOL language. If you’re confused/interested in what that means, please read further.

Why a new COBOL Compiler?

It is often believed that COBOL is an antiquated and archaic language: the logo of this community is literally a dinosaur. But this is not true. Did you know that as of the most recent version (ISO 2023), the language has:

  • objects and classes
  • generics
  • asynchronous concurrency through message passing, both local and remote

Sounds more like a Java or C# than a fossil, doesn’t it?

As the “2023” in “ISO 2023” implies, the language has kept evolving ever since it was created in the '50s. So why is its reputation so bad? Mainly because most COBOL code adheres to the old 1985 standard: that was the year the GNU Manifesto was first published! As a result, the language has been frozen in the public eye for decades, since enterprise systems see little reason to put effort into modernization. This becomes a self-fulfilling prophecy, where COBOL programmers are assigned to tangles of technical debt and even FOSS compilers like GnuCOBOL target the 1985 standard because it’s the one that’s used. But it doesn’t have to be this way.

A Vision of the Future

We on the Otterkit Project team believe that modern COBOL, once free of proprietary vendor lock-in and outdated stereotypes, has the potential to be a modern, nay, insightful language that deserves a place in the current programming language landscape. That’s why we’re making an Apache 2.0-licensed COBOL compiler on the .NET platform to bring modern COBOL out in the open. This way, we hope to prove that even dinosaurs can walk again.

We would appreciate any help we can get. Below are links to a presentation that team head KT gave on the project for the .NET YouTube channel, and to the GitHub repo. Please take some time to look around, and if it strikes your fancy, please consider contributing code or money. Every bit helps.

Useful Links

GitHub Repo: https://github.com/otterkit/otterkit

Presentation: https://www.youtube.com/live/UASkE7cojSE?feature=share

  • KTSnowyM · 2 years ago

    Hi, I’m Otterkit’s lead developer. I’ll add a few more reasons why you might want to choose COBOL.

    COBOL takes a different and often unique approach to common problems that in my opinion is more efficient and elegant than most other approaches.

    The way COBOL handles strings is, in my opinion, much more efficient than in C#, Java or even Rust. In most newer languages strings are implemented as objects (C# and Java), which means that they are always heap allocated and will need to be garbage collected at some point, leading to nondeterministic memory usage. In Rust they are (usually) heap allocated structs, and you have to deal with the borrow checker and the complexity Rust brings to the overall project.

    In COBOL, strings are built-in primitive types, not an object or struct at all. This brings a couple of nice compiler optimization opportunities:

    • COBOL strings are not subject to garbage collection and do not need it. The compiler handles their memory for you, and this applies to both fixed and dynamic length strings.
    • Because they are primitive types, the compiler is free to choose the most appropriate allocation strategy for a particular string. In some cases (fixed length strings) it can allocate the string completely on the stack, so it is automatically freed on return from a function or method.
    • Dynamic length strings, where the length can change at runtime, are still available, and yes, they are still a primitive type. The compiler will handle the allocations for you without involving the garbage collector. Because they are not subject to GC, the compiler is free to copy the contents into a new bigger buffer, update all references to it, and immediately free the older buffer.
    • This means that COBOL strings can have deterministic memory usage, no waiting until the GC decides to run, and without the use and complexity of a borrow checker.

    The way COBOL handles concurrency is much safer, more efficient, and, in situations where you need both local and remote communication, much easier and less painful to use. While other languages opted for multithreaded concurrency, where a single process runs on multiple threads, COBOL opted for multiprocess concurrency, where multiple processes run concurrently and share data through message passing. Each machine runs one global Message Control System (MCS) process, which handles all the message routing between both local and remote processes.

    • Because the language has this safer concurrency model built in, you never have to worry about race conditions, locks, mutexes, or other thread safety precautions and issues. Your messages are all handled by the MCS, including sending, receiving and (if needed) storing them. User code never has to worry about thread safety.
    • Otterkit’s implementation uses Unix domain sockets for fast local message passing. You could argue that this might be slower than sharing memory between threads, but in practice the MCS sends messages between multiple processes concurrently without needing locks or mutexes throughout user code to ensure thread safety, which can be faster in the end.
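To make the share-nothing idea above concrete, here is a minimal Python sketch of two processes exchanging a message over a Unix domain socket. It is not Otterkit's MCS and not COBOL: the `exchange` and `recv_all` names and the `ack:` reply format are made up for illustration, and it assumes a Unix-like system where `os.fork` is available.

```python
import os
import socket


def recv_all(sock: socket.socket) -> bytes:
    """Read from sock until the peer closes or shuts down its write side."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


def exchange(message: bytes) -> bytes:
    """Fork a worker process, send it one message, and return its reply.

    The two processes share no mutable state; the only link between
    them is a connected pair of Unix domain sockets.
    """
    parent_sock, child_sock = socket.socketpair()  # AF_UNIX on Unix-like systems
    pid = os.fork()
    if pid == 0:
        # Worker process: receive the request, reply, and exit.
        try:
            parent_sock.close()
            request = recv_all(child_sock)
            child_sock.sendall(b"ack:" + request)  # hypothetical reply format
            child_sock.close()
        finally:
            os._exit(0)  # never fall back into the parent's code path
    # Original process: send the request, signal end of message, await reply.
    child_sock.close()
    parent_sock.sendall(message)
    parent_sock.shutdown(socket.SHUT_WR)
    reply = recv_all(parent_sock)
    parent_sock.close()
    os.waitpid(pid, 0)
    return reply
```

Calling `exchange(b"hello")` returns the worker's reply. Neither side takes a lock, because neither side can touch the other's memory; a router like the MCS described above would sit between many such endpoints.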

    A few other minor things that COBOL handles more efficiently or better than other languages:

    • String formatting is amazingly simple, including any numeric formatting.
    • Standard decimal arithmetic. Most languages either don’t have it at all, or use a non-standard decimal representation. C# has its own weird format, Java uses a slow arbitrary precision implementation, and Rust doesn’t have one at all.
    • Multi-paradigm programming is handled better. You don’t need a public static class and a GC (C#, Java) to write procedural code: if all you need are functions and no objects at all, you can write COBOL in a purely procedural way. On the other hand, if what you need are classes and objects with a GC to safely handle memory for you, then you can write COBOL in an OO way without stressing over a borrow checker or manual memory allocation (Rust, C). You can mix the two in a single codebase, and the compiler is free to optimize purely procedural code to not require a GC at all.
    • It has a boolean primitive type (and a literal for it) that is optimized for bit operations with a user-defined size, and the type only accepts bit values (1s or 0s). In other languages, bit shifting and boolean operations have to be performed on fixed size integer types, which makes it easy to introduce errors by accidentally using an integer or literal of the wrong size (0U vs 0UL vs 0ULL in C). This also makes it easier overall to process binary data directly.
    • Declarative error handling. You don’t need a try..catch, if err != nil or similar things everywhere in the middle of your code. You can define “declaratives” that run whenever an exception or error occurs. As soon as it occurs, your program jumps into the declarative for that particular error to handle it, and whether the program then continues or not is up to the user. You can easily use and import these into your source code without it becoming messy.
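As a rough analogy for the declaratives described in the last bullet, the Python sketch below registers one handler per error type in a single place and keeps the business logic free of error handling. This is not COBOL syntax and not how Otterkit implements it: the `declarative` decorator, the `run` helper, and the convention that a handler returns True to continue are all invented for illustration. Python has no built-in equivalent, so the sketch keeps a single try/except inside `run`.

```python
from typing import Callable, Dict, Type

# Hypothetical registry mapping an exception type to its handler.
# It plays the role of a declaratives section in this analogy.
_handlers: Dict[Type[BaseException], Callable[[BaseException], bool]] = {}


def declarative(exc_type: Type[BaseException]):
    """Register the decorated function as the handler for exc_type."""
    def register(handler):
        _handlers[exc_type] = handler
        return handler
    return register


def run(procedure, *args):
    """Run procedure and dispatch any error to its registered handler.

    A handler returns True to let the program continue (run then
    returns None) or False to let the error propagate. Errors with no
    registered handler propagate unchanged.
    """
    try:
        return procedure(*args)
    except Exception as exc:
        # Walk the exception's class hierarchy, most specific type first.
        for exc_type in type(exc).__mro__:
            handler = _handlers.get(exc_type)
            if handler is not None:
                if handler(exc):
                    return None
                raise
        raise


log = []


@declarative(ZeroDivisionError)
def on_divide_by_zero(exc):
    log.append(f"handled: {exc}")
    return True  # tell run() to continue


def average(total, count):
    # No error handling here: the registered handler covers it.
    return total / count
```

With these definitions, `run(average, 10, 2)` returns the quotient as usual, while `run(average, 1, 0)` jumps to `on_divide_by_zero`, records the error in `log`, and continues.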