Introducing Otterkit COBOL

Given that this is a community about the COBOL programming language, I’d like to take the opportunity to make a post about this project that I’m a part of. Our goal is to create a compiler implementing the ISO 2023 standard of the COBOL language. If you’re confused/interested in what that means, please read further.

Why a new COBOL Compiler?

It is often believed that COBOL is an antiquated and archaic language: the logo of this community is literally a dinosaur. But this is not true. Did you know that as of the most recent version (ISO 2023), the language has:

  • objects and classes
  • generics
  • asynchronous concurrency, both local and remote (message passing)

Sounds more like a Java or C# than a fossil, doesn’t it?

As the “2023” in “ISO 2023” implies, the language has been evolving ever since it was created in the '50s. So why is the reputation of this language so bad? Mainly because most COBOL code adheres to the old 1985 standard: that was the year the GNU Manifesto was first published! This means that the language has been functionally frozen in the public eye for decades, as enterprise systems see little reason to put effort into modernization. The result is a self-fulfilling prophecy, where COBOL programmers are assigned to tangles of technical debt and even FOSS compilers like GnuCOBOL target the 1985 standard because it’s the one that’s used. But it doesn’t have to be this way.

A Vision of the Future

It is our belief in the Otterkit Project team that modern COBOL, once free of proprietary vendor lock-in and outdated stereotypes, has the potential to be a modern, nay, insightful language that deserves a place in the current programming language landscape. That’s why we’re making an Apache 2.0-licensed COBOL compiler on the .NET platform to bring modern COBOL out in the open. This way, we hope to prove that even dinosaurs can walk again.

We would appreciate any help we can get: below are links to the GitHub repo and to a presentation that team head KT gave on the project for the .NET YouTube channel. Please take some time to look around, and if it strikes your fancy, please consider contributing either code or money; every bit helps.

Useful Links

GitHub Repo: https://github.com/otterkit/otterkit

Presentation: https://www.youtube.com/live/UASkE7cojSE?feature=share

    • RadioRavenRideOP

      Thank you for putting our project on the beginner list! Regarding the topic of the logo, I personally think it’s quite alright to acknowledge that COBOL is a weird and old language with a rich history going all the way back to Grace Hopper and her FLOW-MATIC language before it. However, the problem I have with the dinosaur imagery is that dinosaurs are in fact dead because they could not adapt to the changes in their environment (besides the feathered ones of course). If you’re in the market for a mascot animal I would suggest the humble Horseshoe Crab, whose biological order dates back to the Ordovician Period but still exists today.

  • zygo_histo_morpheus

    What’s the pitch for using modern COBOL? Sure, I’ll take your word that it’s not nearly as bad as COBOL 1985, but that is true of most languages!

    Sounds more like a Java or C# than a fossil, doesn’t it?

    If I want to write in a language that’s like Java or C# I can always just write in Java or C#!

    • RadioRavenRideOP

      Great question. Because of its unique spot in computing history, COBOL takes a different approach as compared to other languages like the C or Lisp families, and we’d be here forever trying to figure out which differences are good or bad. So for now I will list a few benefits that COBOL may have for a project compared to other languages:

      • COBOL is based on English, which means that while it may be a bit harder to write, it’s a little bit easier to read, which may help bring developers up to speed as a project grows in size. For example, the keyword ADD only refers to adding numbers, so there’s no ambiguity there.
      • COBOL programs are split into different divisions and sections, which means that the variables, functions, I/O, and other aspects are in predictable locations and just a ctrl+F away. Combined with the above point, this means that COBOL programs are less likely to become messy over time.
      • COBOL natively supports fixed-point decimal numbers, which, unlike binary floating-point numbers, represent decimal fractions such as 0.10 exactly. This is very useful in use cases where numbers must be exact, like banking, which is one reason so much of the banking system still runs on COBOL.
      • In my opinion, COBOL’s message-passing model is superior to the thread-sharing model in terms of safety, which would otherwise require something like a borrow checker to stop race conditions and the like.
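
To make the fixed-decimal point above concrete, here is a minimal sketch. It is written in Python rather than COBOL, and it assumes Python's standard `decimal` module as a rough stand-in for COBOL's native decimal type; the amounts are made up for illustration.

```python
from decimal import Decimal

# Ten payments of 0.10 each, first as binary floating point...
float_total = sum([0.1] * 10)

# ...then as exact decimal values (what a COBOL PIC 9V99 field would hold).
decimal_total = sum([Decimal("0.10")] * 10, Decimal("0"))

print(float_total == 1.0)                # False: binary rounding error crept in
print(decimal_total == Decimal("1.00"))  # True: decimal fractions stay exact
```

The floating-point sum drifts because 0.1 has no exact binary representation, while the decimal sum is exact to the cent.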

      I hope that answered your most pressing questions.

    • KTSnowyM

      Hi, I’m Otterkit’s lead developer. I’ll add a few more reasons why you might want to choose COBOL.

      COBOL takes a different and often unique approach to common problems that in my opinion is more efficient and elegant than most other approaches.

      The way COBOL handles strings is in my opinion much more efficient than C#, Java or even Rust. In most newer languages strings are implemented as objects (C# and Java), which means that they need to be garbage collected at some point. This leads to nondeterministic memory usage and forces them to always be heap allocated. In Rust they are (usually) heap-allocated structs, and you have to deal with the borrow checker and the complexity Rust brings to the overall project.

      In COBOL, strings are built-in primitive types, not an object or struct at all. This brings a couple of nice compiler optimization opportunities:

      • COBOL strings are not subject to garbage collection and do not need it. The compiler will handle their memory for you; this applies to both fixed-length and dynamic-length strings.
      • Because strings are primitive types, the compiler is free to choose the most appropriate allocation strategy for a particular string, and in some cases (fixed-length strings) can allocate it completely on the stack, so that it is automatically freed after returning from a function or method.
      • Dynamic-length strings, where the length can change at runtime, are still available, and yes, they are still a primitive type. The compiler will handle the allocations for you, without involving the garbage collector. Because they are not subject to GC, the compiler is free to copy the contents into a new, bigger buffer, update all references to it, and immediately free the older buffer.
      • This means that COBOL strings can have deterministic memory usage, with no waiting until the GC decides to run and without the use and complexity of a borrow checker.

      The way COBOL handles concurrency is much safer, more efficient, and, in situations where you need both local and remote communication, much easier and more painless to use. While other languages opted for multithreaded concurrency, where a single process runs on multiple threads, COBOL opted for multiprocess concurrency, where multiple processes run concurrently, sharing data through message passing. COBOL processes have a global, per-machine Message Control System (MCS) process, which handles all the message routing between both local and remote processes.

      • Because the language has this safer concurrency model built in, you never have to worry about race conditions, locks, mutexes, or other thread-safety precautions and issues. Your messages are all handled by the MCS; this includes the sending, receiving and storing of the messages (if needed). User code never has to worry about thread safety.
      • Otterkit’s implementation uses Unix domain sockets for fast local message passing. While you could argue that this might be slower than sharing memory between threads, in practice the MCS will be sending messages between multiple processes concurrently without needing locks or mutexes everywhere in user code to ensure thread safety, which can be faster in the end.
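
The multiprocess, message-passing idea described above can be sketched roughly as follows. This is Python, not COBOL, and it is only an analogy: it assumes ordinary operating-system pipes instead of an MCS or Unix domain sockets, and the names `WORKER_SOURCE` and `square_in_worker` are made up for this illustration.

```python
import subprocess
import sys

# Source of the worker process: it shares no memory with the parent and
# only reacts to messages (one number per line) arriving on its stdin.
WORKER_SOURCE = """
import sys
for line in sys.stdin:
    n = int(line)
    print(n * n, flush=True)
"""

def square_in_worker(numbers):
    """Send each number to a separate process and collect its replies."""
    with subprocess.Popen(
        [sys.executable, "-c", WORKER_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    ) as worker:
        replies = []
        for n in numbers:
            worker.stdin.write(f"{n}\n")   # send one message
            worker.stdin.flush()
            replies.append(int(worker.stdout.readline()))  # wait for the reply
        worker.stdin.close()               # no more messages: worker exits
    return replies

results = square_in_worker([2, 3, 4])
print(results)  # [4, 9, 16]
```

Neither process touches the other's variables, so there is nothing to lock; the only shared thing is the stream of messages.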

      A few other minor things that COBOL handles more efficiently or better than other languages:

      • String formatting is amazingly simple, including any numeric formatting.
      • Standard decimal arithmetic. Most languages either don’t have it at all, or use a non-standard decimal representation. C# has its own weird format, Java uses a slow arbitrary precision implementation, and Rust doesn’t have one at all.
      • Multi-paradigm is handled better. You don’t need a public static class and a GC (C#, Java) to write procedural code: if all you need are functions and you don’t need any objects at all, you can write COBOL in a purely procedural way. On the other hand, if what you need are classes and objects with a GC to safely handle memory for you, then you can write COBOL in an OO way without stressing over a borrow checker or manual memory allocation (Rust, C). You can mix the two in a single codebase, and the compiler is free to optimize purely procedural code to not require a GC at all.
      • It has a boolean primitive type (and a literal for it) that is optimized for bit operations and has a user-defined size; the type only accepts bit values (1s or 0s). In other languages, by contrast, bit shifting and boolean operations have to be performed on fixed-size integer types, which can lead to error-prone code when an integer or literal of the wrong size is used by accident (0U vs 0UL vs 0ULL in C). This also makes it easier overall to process binary data directly.
      • Declarative error handling. You don’t need a try..catch, if err != nil or similar things everywhere in the middle of your code. You can define “declaratives” that run whenever an exception or error occurs. As soon as it occurs, your program will jump into the declarative for that particular error to handle it, and whether the program then continues is defined by the user. You can easily use and import these into your source code without it becoming messy.
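
As a rough analogy for the declarative error handling in the last point, here is a small Python sketch. Python has no direct equivalent of COBOL declaratives, so the `Declaratives` class and its `use_after` and `perform` methods are hypothetical helpers invented for this illustration; they only mimic the idea of declaring handlers once, away from the main logic.

```python
class Declaratives:
    """Registry of handlers keyed by exception type (hypothetical helper)."""

    def __init__(self):
        self._handlers = {}

    def use_after(self, exc_type):
        """Decorator: declare the handler for one kind of error."""
        def register(handler):
            self._handlers[exc_type] = handler
            return handler
        return register

    def perform(self, procedure, *args):
        """Run a procedure; on error, jump to the matching handler."""
        try:
            return procedure(*args)
        except Exception as exc:
            for exc_type, handler in self._handlers.items():
                if isinstance(exc, exc_type):
                    return handler(exc)
            raise  # no declarative covers this error


declaratives = Declaratives()

@declaratives.use_after(ZeroDivisionError)
def on_divide_by_zero(exc):
    return 0  # the handler decides how the program continues

def average(total, count):
    # The main logic contains no error-handling code at all.
    return total / count

safe = declaratives.perform(average, 10, 0)    # handled by the declarative
normal = declaratives.perform(average, 10, 4)  # no error, normal result
print(safe, normal)  # 0 2.5
```

The single try/except lives inside the helper here; in COBOL the runtime does that dispatch for you, which is the point being made above.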