• mo_ztt ✅
    link
    fedilink
    English
    42
    edit-2
    6 months ago

    This glosses over what is, to me, the most fascinating part of the history.

    SVN was, along with its proprietary contemporaries like Perforce, a pain in the ass as far as branching and merging and collaborative workflow across more than one branch of development or one machine installation. It worked, kind of, but for something at the scale of the Linux kernel it was simply unmanageable. The distributed nature and scale and speed of Linux kernel development, and the cantankerousness of most of the kernel team, meant that they weren’t willing or able to put up with any available solution while their project was progressively scaling up. And therefore, the version control system used by the team for one of the most advanced pieces of technology in the world, for quite a lot of its development, was… diff and patch. And email. Linus just kept separate directories for separate versions, with the names of the directories corresponding to version numbers, and every single change that needed to go into the kernel would get emailed to Linus (or emailed around between developers who were collaborating on “a branch”), and people applied diffs manually in order to pull changes.

    That sounds like a joke, but it actually made more sense than anything else they could do. The workflow of a pull request, with no necessity to maintain a shared source repository which was centrally updated, simply didn’t exist in any version control system at the time. diff and patch and email could do that, and it was quick and easy for small changes and possible (with some discipline) even for large changes. Everyone sort of knew that it wasn’t ideal, but it was literally better than anything else available.
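    That workflow can still be reproduced today with nothing but the standard tools (the directory and file names here are made up for illustration):

```shell
# maintainer keeps plain versioned directories; a contributor copies one and edits it
mkdir linux-2.0.35 && printf 'int main(void) { return 0; }\n' > linux-2.0.35/init.c
cp -r linux-2.0.35 linux-work
printf 'int main(void) { return 42; }\n' > linux-work/init.c

# contributor generates a unified diff of the whole tree and emails it...
diff -urN linux-2.0.35 linux-work > fix.patch || true  # diff exits 1 when trees differ

# ...and the maintainer applies it to his own copy with patch
cp -r linux-2.0.35 linux-merged
(cd linux-merged && patch -p1 < ../fix.patch)
```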

    So, BitKeeper. It was a little weird for such an open-source darling to switch to a proprietary VCS, but its creator and CEO Larry McVoy was a respected kernel developer himself, and he was basically offering something for free that would be a significant step forward. BitKeeper supported a lot of modes of distributed operation that were unique among VCSs at the time, informed by McVoy’s own firsthand experience with working on the kernel alongside everyone else, and so it was generally regarded as a step forward technologically. From around 2002 to 2005, the kernel used BitKeeper as its primary source control.

    It was a little bit of an uneasy peace, though. Some kernel developers were unhappy with having the source and all revisions “held hostage” within a proprietary VCS, and wanted the ability not to use BitKeeper. McVoy, for his part, didn’t do himself any favors by being kind of a pain in the ass at times. A central point of contention was his objection to people “reverse engineering” elements of the BitKeeper protocol. Some developers wanted to be able to interoperate with bk without having to license McVoy’s software. McVoy said they weren’t allowed to, and said they could have a license to his software for free, so what was the big deal. Things came to a head when Andrew Tridgell, a developer who did have access to bk, was accused of “reverse engineering” and had his license (and hence his ability to work) revoked. Everyone got mad, arguing about BitKeeper substantially eclipsed useful work on the kernel, and the need for a better solution could no longer be ignored. (It was later revealed that Tridgell’s “reverse engineering” was to access the BitKeeper endpoint with telnet and type “help”.)

    Linus, in his inimitable fashion, decided to solve the problem by putting his head down to create the solution and then dictatorially deciding that his solution was going to be the way going forward. It may not be in people’s memory now as clearly as it was back then, but he was (and is) a genius wizard-man at systems design and architecture, and obviously his word was basically the word of God on significant matters within the kernel community. Basically, he took a lot of the concepts that made BitKeeper work, put them within a much cleaner and more general architecture, spent a few weeks making a working prototype with a lot of feedback and testing by the kernel team (which by this point badly needed the whole thing to work just so they could do their day-to-day development again without any fistfights), and then said: we’re switching to this, everyone install it and shut up and stop fighting please. Over the succeeding years it was so clearly leaps and bounds ahead of any other VCS that more or less everyone in the world switched to it.

    Moral of the story? Let people be. If McVoy had been a little more accommodating, maybe we’d have BitKeeperHub today, and he’d have been able to collect license fees from half the developers on the planet because everyone wanted to use his still-superior-to-everything-that-exists solution.

    (edit: Added the details about Andrew Tridgell’s reverse engineering)

    • Martin
      link
      fedilink
      17
      6 months ago

      This comment was a better read than the linked article.

      • mo_ztt ✅
        link
        fedilink
        English
        11
        edit-2
        6 months ago

        I know right? I was all excited when I saw the OP article because I was like, oh cool, someone’s telling the story about this neat little piece of computing history. Then I went and read it and it was like “ChatGPT please tell me about the history of source control in a fairly boring style, with one short paragraph devoted to each applicable piece of technology.”

    • TechNom (nobody)
      link
      English
      2
      edit-2
      6 months ago

      Nicely written! Going into my bookmarks.

      SVN was, along with its proprietary contemporaries like Perforce, a pain in the ass as far as branching and merging and collaborative workflow across more that one branch of development or one machine installation.

      I’m one of those who was unfortunate enough to use SVN. It was my first version control too. Committing took forever - the client had to fetch data from the server to see if the ‘trunk’ (the branch) had been updated (I see why Torvalds hated it). Even committing caused conflicts sometimes. People also used to avoid branching, because merging back was hell! There were practically no merges that didn’t end in merge conflicts.

      The biggest advance in merge workflow from those days was the introduction of 3-way merges. Practically all modern VCSs - including git and mercurial - use 3-way merge. 3-way merges cut down merge conflicts by a huge margin overnight. Git even uses 3-way merges for seemingly unrelated tasks like revert, cherry-pick, rebase, etc. (and it works well for them - we barely even notice it!). Surprisingly though, 3-way merges have been around since the 1970s (the diff3 program). Why CVS and SVN didn’t use it is beyond me.
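      For anyone who hasn’t seen it, that 1970s diff3 program can do the merge all by itself (a toy example):

```shell
# base is the common ancestor; mine and yours are two divergent edits of it
printf 'one\ntwo\nthree\n' > base
printf 'ONE\ntwo\nthree\n' > mine    # I changed line 1
printf 'one\ntwo\nTHREE\n' > yours   # you changed line 3

# -m merges both sets of changes relative to the shared base;
# it exits 0 on a clean merge and 1 if there are conflicts
diff3 -m mine base yours > merged
cat merged
```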

      Anyway, my next VCS was Bazaar. It’s still around as Breezy. It is a distributed VCS like Git, but was more similar to SVN in its interface. It was fun - but I moved on to Mercurial before settling with Git. Honestly, Mercurial was the sweetest VCS I ever tried. Git’s interface really shows the fact that it was created by kernel developers for kernel developers (more on this later). Mercurial’s interface, on the other hand, is well thought out and easy to figure out. This is surprising because both Git and Mercurial share a similar model of revisions. Mercurial was born a few days after Git. It even stood a chance of winning the race to become the dominant VCS. But Mercurial lost kernel developers’ mindshare due to Python - it simply wasn’t as fast as Git. Then GitHub happened and the rest is history.

      And therefore, the version control system used by the team for one of the most advanced pieces of technology in the world for quite a lot of its development, was… diff and patch. And email

      That sounds like a joke, but it actually made more sense than anything else they could do.

      I was a relatively late adopter of Git, but the signs of what you say are still there in it. Git is still unapologetically based on the idea of keeping versioned folders and patches. Git actually has a dual personality built on these two! While it treats commits as snapshots of history (similar to versioned folders), many operations are actually based on patches (3-way merges, to be precise). That includes merging, rebase, revert, cherry-pick, etc. There’s no getting around this fact. IMHO, not understanding this is what makes Git confusing for beginners.
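      A toy illustration of that dual personality (this is not git’s actual code - just Python’s difflib standing in for it): what’s stored is two full snapshots, and the patch is derived from them on demand, which is essentially what the patch-based operations do.

```python
import difflib

# two full "snapshots" of a file, the way git stores commits
old = ["def greet():\n", "    print('hi')\n"]
new = ["def greet():\n", "    print('hello')\n"]

# patch-oriented operations derive a patch from the snapshots on demand
patch = list(difflib.unified_diff(old, new, fromfile="a/greet.py", tofile="b/greet.py"))
print("".join(patch), end="")
```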

      Perhaps this is nowhere more apparent than in the case of quilt. Quilt is a tool used to manage a ‘stack of patches’. It gives you the ability to absorb changes to source code into a patch and to apply or remove a set of patches. This is as close as you can get to a VCS without being a VCS. Kernel devs still use quilt sometimes and exchange quilt patch stacks. Git even has a command for importing quilt patch stacks - git-quiltimport. There are even tools that integrate patch stacks into Git - like stgit. If you haven’t tried it yet, you should. It’s hard to predict if you’ll like it, but if you do, it becomes a powerful tool in your arsenal. It’s like rebase on steroids. (aside: This functionality is built into mercurial.)
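      The push/pop idea that quilt automates can be mimicked with plain patch(1) - quilt’s value is keeping the stack’s order and state for you, and refreshing patches as the tree changes underneath them. A toy sketch of the underlying mechanics (not quilt itself):

```shell
printf 'hello\n' > file.txt

# a one-hunk patch written by hand for the demo
printf -- '--- a/file.txt\n+++ b/file.txt\n@@ -1 +1,2 @@\n hello\n+world\n' > 01-add-world.patch

patch -p1 < 01-add-world.patch      # "push": file.txt now has both lines
patch -R -p1 < 01-add-world.patch   # "pop": file.txt is back to just hello
```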

      diff and patch and email could do that, and it was quick and easy for small changes and possible (with some discipline) even for large changes. Everyone sort of knew that it wasn’t ideal, but it was literally better than anything else available.

      I recently got into packaging for Linux. Trust me - there’s nothing as easy or convenient as dealing with patches. It’s closer to plain vanilla files than any VCS ever was.

      Some kernel developers were unhappy with having the source and all revisions “held hostage” within a proprietary VCS

      As I understand, the biggest problem was that not everyone was given equal access. Most significantly, many developers didn’t have access to the repo metadata - the metadata necessary to perform things like blame, bisect, or even diffs.

      As best I remember, things came to a head when a couple members of the former group actually had their licenses pulled because McVoy said they had broken the agreement by “reverse engineering” his protocols, with people disagreeing over whether what they’d done actually fit that description, and the whole thing blew up completely with people arguing and some people unable to work.

      That sounds accurate. To add more context, it was Andrew Tridgell who ‘reverse engineered’ it. He became the target of Torvalds’s ire because of this. He did reveal his ‘reverse engineering’ later: he telnetted into the server and typed ‘help’.

      Linus, in his inimitable fashion, decided to solve the problem by putting his head down to create a solution and then dictatorially deciding that this was going to be the way going forward.

      I thought I should mention Junio Hamano. He was probably the second biggest contributor to git back then. Torvalds practically handed over the development of git to him a few months after its inception. Hamano has been the lead maintainer ever since. There is one aspect of his leadership that I really like. Git is by no means a simple or easy tool. There has been ample criticism of it. Yet, the git team has tried sincerely to address it without hostility. Some of the earlier warts were satisfactorily resolved in later versions (for example, restore and switch are way nicer than checkout).

      • mo_ztt ✅
        link
        fedilink
        English
        3
        edit-2
        6 months ago

        I’m one of those who was unfortunate enough to use SVN.

        Same. I guess I’m an old guy, because I literally started with RCS, then the big step up that was CVS, and then used CVS for quite some time while it was the standard. SVN was always ass. I can’t even really put my finger on what was so bad about it; I just remember it being an unpleasant experience, for all it was supposed to “fix” the difficulties with CVS. I much preferred CVS. Perforce was fine, and used basically the exact same model as SVN just with some polish, so I think the issue was the performance and interface.

        Also, my god, you gave me flashbacks to the days when a merge conflict would dump the details of the conflict into your source file and you’d have to go in and clean it up manually in the editor. I’d forgotten about that. It wasn’t pleasant.

        Git interface really shows the fact that it is created by kernel developers for kernel developers (more on this later).

        Yeah, absolutely. I was going to talk about this a little but my thing was already long. The two most notable features of git are its high performance and its incredibly cryptic interface, and knowing the history makes it make a lot of sense why that is.

        Mercurial interface, on the other hand is well thought out and easy to figure out. This is surprising because both Git and Mercurial share a similar model of revisions. Mercurial was born a few days after Git. It even stood a chance for winning the race to become the dominant VCS. But Mercurial lost kernel developers’ mindshare due to Python - it simply wasn’t as fast as Git.

        Yeah. I was present on the linux-kernel mailing list while all this was going on, purely as a fanboy, and I remember Linus’s fanatical attention to performance as a key consideration at every stage. I actually remember there was some level of skepticism about the philosophy of “just download the whole history from the beginning of time to your local machine if you want to do anything” – like the time and space requirements in order to do that probably wouldn’t be feasible for a massive source tree with a long history. Now that it’s reality, it doesn’t seem weird, but at the time it seemed like a pretty outlandish approach, because with the VCS technologies that existed at the time it would have been murder. But, the kernel developers are not lacking in engineering capabilities, and clean design and several rounds of optimization to figure out clever ways to tighten things up made it work fine, and now it’s normal.

        Perhaps this is no more apparent than in the case of quilt. Quilt is a software that is used to manage a ‘stack of patches’. It gives you the ability to absorb changes to source code into a patch and apply or remove a set of patches. This is as close you can get to a VCS without being a VCS. Kernel devs still use quilt sometimes and exchange quilt patch stacks. Git even has a command for importing quilt patch stacks - git-quiltimport. There are even tools that integrate patch stacks into Git - like stgit. If you haven’t tried it yet, you should. It’s hard to predict if you’ll like it. But if you do, it becomes a powerful tool in your arsenal. It’s like rebase on steroids. (aside: This functionality is built into mercurial).

        That’s cool. Yeah, I’ll look into it; I have no need of it for any real work I’m doing right now but it sounds like a good tool to be familiar with.

        I still remember the days of big changes to the kernel being sent to the mailing list as massive series of organized patchsets (like 20 or more messages with each one having a pretty nontrivial patchset to implement some piece of the change), with each patch set as a conceptually distinct change, so you could review them one at a time and at the end understand the whole huge change from start to finish and apply it to your tree if you wanted to. Stuff like that was why I read the mailing list; I just remember being in awe of the type of engineering chops and the diligence applied to everyone working together that was on display.

        I recently got into packaging for Linux. Trust me - there’s nothing as easy or convenient as dealing with patches. It’s closer to plain vanilla files than any VCS ever was.

        Agreed. I was a little critical-sounding of diff and patch as a system, but honestly patches are great; there’s a reason they used that system for so long.

        As I understand, the biggest problem was that not everyone was given equal access. Most significantly, many developers didn’t have access to the repo metadata. The metadata that was necessary to perform things like blame, bisect or even diffs.

        Sounds right. It sounds like your memory on it is better than mine, but I remember there being some sort of “export” where people who didn’t want to use bk could look at the kernel source tree as a linear sequence of commits (i.e. not really making it clear what had happened if someone merged together two sequences of commits that had been developed separately for a while). It wasn’t good enough to do real work with; it was more just a stopgap if someone needed to check out the current development head or something, and that’s it.

        That sounds accurate. To add more context, it was Andrew Tridgell who ‘reverse engineered’ it. He became the target of Torvald’s ire due to this. He did reveal his ‘reverse engineering’ later. He telnetted into the server and typed ‘help’.

        😆

        I’ll update my comment to reflect this history, since I didn’t remember this level of detail.

    • aard
      link
      fedilink
      1
      6 months ago

      There were also other distributed VCSs around - with arch being available not too long after BitKeeper - but they all typically worked only for some styles of working, and pretty much all of them ran into massive performance issues once the codebase got large.

      • mo_ztt ✅
        link
        fedilink
        English
        3
        6 months ago

        Yeah. Seeing the development thought process at work during the engineering of git was really cool. The philosophy was basically, at its core it’s not a version control system. It’s a content-addressable filesystem. That’s what you need in order to build a good distributed version control, so we’ll make two layers and make each one individually very good at what it does. Then in a UI sense, the idea was to give you the tools to be able to do needed operations easily, but still expose the underlying stuff if you need direct access to it. And then to optimize the whole thing to within an inch of its life under the types of workloads it’ll probably be experiencing when being used as version control.
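        The content-addressable layer is simple enough to sketch in a few lines: every object is stored under the SHA-1 of a small header plus its content. This toy version of blob storage follows git’s actual hashing scheme, though real git also zlib-compresses objects and fans them out into .git/objects/ subdirectories:

```python
import hashlib

def blob_id(content: bytes) -> str:
    # git hashes "blob <size>\0<content>", not the raw bytes
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

store = {}

def put(content: bytes) -> str:
    oid = blob_id(content)
    store[oid] = content  # identical content is stored only once
    return oid

oid = put(b"hello\n")
print(oid)  # the same id `git hash-object` would print for this content
```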

        It was also, as far as I’m aware, the first nontrivial use of something like a blockchain. The property where each commit is referred to by its hash, and the hash encompasses the hash of the previous commit, was a necessary step for security and an obvious-in-retrospect way to identify commits in a unique way.
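        A toy model of that chaining (real git also hashes the tree, author, and timestamps into each commit): because a commit’s id covers its parent’s id, altering any ancestor changes the identity of every descendant.

```python
import hashlib

def commit_id(message, parent):
    # the parent's hash is part of what gets hashed,
    # so each id transitively covers the whole history
    data = ("parent %s\nmessage %s" % (parent, message)).encode()
    return hashlib.sha1(data).hexdigest()

c1 = commit_id("initial import", parent=None)
c2 = commit_id("fix scheduler", parent=c1)

# tamper with the first commit and every later id changes too
c1_tampered = commit_id("initial import (altered)", parent=None)
c2_tampered = commit_id("fix scheduler", parent=c1_tampered)
assert c2 != c2_tampered
```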

        Basically the combination of innovative design with a bunch of core concepts that weren’t really commonly in use at the time, combined with excellent engineering to make it all solid and working well, was pretty mind-blowing to see, and it all came together in just a few weeks and started to get used for real in a big sense. Then, the revolution accomplished, Linus handed git off to someone else and everyone just got back to work on the kernel.

        • aard
          link
          fedilink
          2
          6 months ago

          It really was just the (very solid) foundation he knocked out in that time - the UI was horrible back then. Linus just did what was required to get the kernel unstuck.

          I started moving the first of my own CVS repos to git in late 2007, and it wasn’t ready for the average user at that time yet.

          Linus handing it off quickly was the right thing to do, though - otherwise we all might be using something else nowadays, with just the kernel and a handful of projects with similar requirements using it. Many great developers would’ve wanted to hold on to their baby in that situation, preventing it from growing to its full potential.

  • @mrkite
    link
    1
    6 months ago

    Back before it was awful, SourceForge required your code to be in CVS, and then later SVN.