cross-posted from: https://programming.dev/post/8121669

Taggart (@mttaggart) writes:

Japan determines copyright doesn’t apply to LLM/ML training data.

On a global scale, Japan’s move adds a twist to the regulation debate. Current discussions have focused on a “rogue nation” scenario where a less developed country might disregard a global framework to gain an advantage. But with Japan, we see a different dynamic. The world’s third-largest economy is saying it won’t hinder AI research and development. Plus, it’s prepared to leverage this new technology to compete directly with the West.

I am going to live in the sea.

www.biia.com/japan-goes-all-in-copyright-doesnt-apply-to-ai-training/

  • @[email protected]
    English
    122
    6 months ago

    Nice, time to train one with all the Nintendo leaks and generate some Zelda art and a new Mario title!

    • ZickZack
      85
      6 months ago

      train one with all the Nintendo leaks

      This is fine

      generate some Zelda art and a new Mario title

      This is copyright infringement.

      The ruling in Japan (and, I predict, eventually in other countries too) is that the act of training a model (which is just a statistical estimator) is not something copyright covers, so it cannot be copyright infringement. This is already standard practice for everything else: you cannot copyright a mathematical function, regardless of how much data you use to fit it. That is sensible: CERN has fit physics models to petabytes’ worth of data, but that doesn’t mean they hold a copyright on the laws of nature; they just hold the copyright on the data itself. However, if you generate something that is copyrighted, that item is still copyrighted: it doesn’t matter whether you used an AI image generator, Photoshop, or a tattoo gun.

        • ZickZack
          3
          6 months ago

          And that would be completely legal, just like any random guy on DeviantArt can draw something in the style of e.g. Picasso without getting into trouble (unless of course they claim it was painted by Picasso, but that should be obvious).

    • @[email protected]
      English
      15
      6 months ago

      Nintendo would have staged a coup against the government if the decision made this scenario actually possible.

  • @[email protected]
    English
    60
    6 months ago

    I think this is a difficult concept to tackle, but the main argument I see about using existing works as ‘training data’ is the idea that ‘everything is a remix’.

    I, as a human, can paint an exact copy of a Picasso work or any other artist. This is not illegal and I have no need of a license to do this. I definitely don’t need a license to paint something ‘in the style of Picasso’, and I can definitely sell it with my own name on it.

    But the question is, what about when a computer does the same thing? What is the difference? Speed? Scale? Anyone can view a picture of the Mona Lisa at any time and make their own painting of it. You can’t use the image of the Mona Lisa without accreditation and licensing, but what about a recreation of the Mona Lisa?

    I’m not really arguing pro-AI here, although it may sound like it. I’ve just heard the ‘licensing’ argument many times and I’d really like to hear what the difference between a human copying and a computer copying is, if someone knows more about the law.

    • @[email protected]
      English
      50
      6 months ago

      Um - your examples are so old the copyright expired centuries ago. Of course you can copy them. And you can absolutely use an image of the Mona Lisa without accreditation or licensing.

      Painting and selling an exact copy of a recent work, such as a Banksy, is a crime.

      … however making an exact copy of a Banksy for personal use, or to learn, or to teach other people, or copying the style… that’s all perfectly legal.

      I don’t think this is a black and white issue. Using AI to copy something might be a crime. You absolutely can use it to infringe on copyright. The real question is who’s at fault? I would argue the person who asked the AI to create the copy is at fault - not the company running the servers.

        • @[email protected]
          English
          9
          6 months ago

          Huh? What does being non profit have to do with it? Private companies are allowed to learn from copyrighted work. Microsoft and Apple, for example, look at each other’s software and copy ideas (not code, just ideas) all the time. The fact Linux is non-profit doesn’t give them any additional rights or protection.

        • @[email protected]
          English
          4
          6 months ago

          They’re not gatekeeping LLMs though; there are publicly available models and data sets.

      • @[email protected]
        English
        7
        6 months ago

        Thanks for your response. I realize I muddied the waters on my question by mentioning exact copies.

        My real question is based on the ‘everything is a remix’ idea. I can create a work ‘in the style of Banksy’ and sell it. The US copyright and trademark laws state that a work only has to be 10% differentiated from the original in order to be legal to use, so creating a piece of work that ‘looks like it could have been created by Banksy, but was not created by Banksy’ is legal.

        So since most AI does not create exact copies, this is where I find the licensing argument possibly weak. I really haven’t seen AI like Midjourney creating exact replicas of works - but admittedly, I am not following every single piece of art created on Midjourney, or Stable Diffusion, or DALL-E, or any of the other platforms, and I’m not enough of an expert in trademark law to answer these questions.

        • @[email protected]
          English
          9
          6 months ago

          Thanks for your response

          Always happy to discuss copyright. :-) Our IP laws are long overdue for an overhaul in my opinion. And the only way to make that happen is for as many people as possible to discuss the issues. I plan to spend the rest of my life creating copyrighted work, and I really hope I don’t spend all of it under the current rules…

          The US copyright and trademark laws state that a work only has to be 10% differentiated from the original in order to be legal to use

          The law doesn’t say that. The Blurred Lines copyright case, for example, involved far less than 10%. Probably less than 1%, and it was still unclear whether it was infringement or not. It took five years of lawsuits to reach a contested conclusion: the first court found it to be infringing, then an appeals panel of judges reached a split decision in which the majority upheld that verdict over a strong dissent.

          Copyright is incredibly complex and unclear. It’s generally best to just not get into a copyright lawsuit in the first place. Usually when someone accuses you of copyright infringement you try to pay them whatever amount of money (in the Blurred Lines case, there were discussions of 50% of the artist’s income from the song) to make them go away even if your lawyers tell you you’re probably going to get a not guilty verdict.

        • Jojo
          English
          2
          6 months ago

          I really haven’t seen AI like Midjourney creating exact replicas of works

          I don’t have a source to cite, but I did read an article that showed a bad-faith actor deliberately trying to use AI to copy images directly. While the results weren’t exact replicas, they were reasonable facsimiles of the originals, to the extent that if a human had created them without AI, it would have been blatant copyright infringement, despite not being quite identical.

          I wish I had the examples on hand to show, but it was months ago, and unfortunately I have neither the skills nor the time to retrieve them.

      • @[email protected]
        English
        0
        6 months ago

        To be at fault the user would have to know the AI creation they distributed commits copyright infringement. How can you tell? Is everyone doing months of research to be vaguely sure it’s not like someone else’s work?

        Even if you had an AI trained on only public domain assets you could still end up putting in the words that generate something copyrighted.

        Companies created a random copyright infringement tool for users to randomly infringe copyright.

        • redfellow
          English
          8
          6 months ago

          The same way you can tell if you repainted a Banksy yourself. If you don’t realize, and monetize, then you are liable for a copyright lawsuit regardless of the way you created the piece in question.

          And if no one can detect similarities beyond influences, then it’s not infringing anything.

          • @[email protected]
            English
            -2
            5 months ago

            You may recognize a Banksy, but to someone else it’s as if I said you ought to know your work resembles one by Coinsey: who?

            This is exacerbated when people can create creative works via AI while having even less knowledge than their peers who know how to DIY. A potentially life-ruining lawsuit is a bad way to find out you can’t monetize something.

            • redfellow
              English
              4
              6 months ago

              If only there was some way to find out prior to selling stuff as if you made it. If only. Darn it!

              • @[email protected]
                English
                -1
                6 months ago

                I don’t understand. If I make something, that doesn’t mean I’m not infringing someone’s work.

                • redfellow
                  English
                  2
                  6 months ago

                  Point: regardless of HOW it was made, the process of figuring out whether it infringes on something is the same. It’s still not always easy, and due to the shittiness of current IP laws, even longtime professional artists sometimes make mistakes.

                  In the end it’s just about money.

      • @[email protected]
        English
        -4
        6 months ago

        Your example is a dude who paints unsolicited on other people’s property. What kind of copyright does a ghost have?

        • Jojo
          English
          2
          6 months ago

          A surprising amount, though it would potentially be quite difficult to prove.

    • @[email protected]
      English
      8
      6 months ago

      Here’s the thing… Generative AI had a plagiarism/remix phase. It raised some serious questions about copyright.

      It lasted for a matter of weeks.

      We’re all still hung up on it, but go to civit.ai

      Play with it. Look at what people are creating.

      If you’re not convinced, put up a bounty for something extremely specific

      Art has changed. There’s no putting it back in the bottle; this is the tiniest leading edge of the singularity.

      • Camelbeard
        English
        1
        6 months ago

        Just a small warning: I just played around with civit. I tried to make some images, and also wanted to try to make some NSFW images. Anyway, be really careful what you prompt. I accidentally generated some images of very young-looking people, which I never intended.

    • @ericjmorey (OP)
      English
      18
      6 months ago

      Or it leads the way in producing the most useless, misleading bullshit more efficiently. We’ll see.

      • @[email protected]
        English
        3
        6 months ago

        Not sure this is the flex you think it is. The US health industry utilizes fax to send client health information millions of times a day, and it is considered a secure communication.

        • @[email protected]
          English
          14
          6 months ago

          I don’t think you realize how boomer Japan is regarding technology in the new millennium. Their industry tech is always ahead of the curve (especially robotics), but their lifestyle tech is just… god, it was like going back 10-15 years in time. They still had as many flip phone commercials and plans as smartphone ones back when I was living there in 2018. Stores in Ginza, one of the most expensive places in Japan, would have “cash only” signs because they didn’t want to learn how to set up a card machine. The older population has really been holding them back.

          They’ve had to digitize a lot of stuff due to Covid (thank god) but me and most people I knew were issued actual paper paycheques we’d have to physically take to the bank for payday. The lines at the bank on the 15th or 25th of the month in Tokyo were something else.

  • @[email protected]
    English
    26
    6 months ago

    What’s stopping somebody from making an LLM that can reproduce media that was used in its training with close to 100% accuracy? If that happens, then we’ll have a copyright laundering service.

    • @[email protected]
      English
      29
      6 months ago

      Reproducing copyrighted works would be a problem. Consuming them is not.

      In your example, a copyright case would be able to move forward and be tested in court, and I would think it stands a good chance of prevailing. It would be the same as a case against someone who wrote a script for a website to reproduce copyrighted work on command. The difference is this isn’t that. And if and when it does that, the AI can be tuned to prevent it from continuing to do it.

      • @[email protected]
        English
        0
        6 months ago

        Hi chatgpt7, I like legend of Zelda tears of the second kingdom, please code a similar game but change the colour of the grass from light green to medium light green.

        • @[email protected]
          English
          1
          5 months ago

          Again, that’s producing a copyrighted work. That would be illegal. That isn’t the same as feeding the code into the LLM to use as a reference for when someone asks for help coding movement mechanics for a third-person action game of their own imagination.

    • @[email protected]
      English
      18
      6 months ago

      If you make it reproduce copyrighted media, it is a problem.

      As long as the stuff it generates doesn’t resemble any copyrighted works, even if it was trained on copyrighted works, I don’t see why that should be a problem.

      • @[email protected]
        English
        2
        6 months ago

        I don’t even think there’s a problem recreating it, you just can’t distribute it.

        For personal use it’s fine.

        It’s not like Disney is suing everyone drawing Mickey Mouse in their personal art workbook.

    • @[email protected]
      English
      12
      6 months ago

      What media is an LLM going to be able to reproduce that I can’t already reproduce with a copy paste?

      • @[email protected]
        English
        -14
        6 months ago

        That’s not the point. If you rip a DVD, you have the movie, but you can’t sell DVDs with the movie, because it is copyrighted. After the “AI” has recreated it, the copyright is gone, so you can sell that version with impunity.

    • @[email protected]
      English
      6
      6 months ago

      Copyright infringement is about the act of reproduction, not the tools used to reproduce it. The court effectively said the LLM itself is not illegal just like a photocopier or CD/DVD burner is not illegal. It’s illegal if someone used an LLM, or photocopier, to make an unauthorized copy of a protected work though.

    • @[email protected]
      English
      4
      6 months ago

      It will go to a judge and the judge will say that changing three pixels doesn’t make it derivative. Regardless of the method of transformation, the same fair use and parody laws apply.

      • Camelbeard
        English
        21
        6 months ago

        If you read a book you can talk about it, quote it, draw characters from that book, write your own ending, etc.

        Isn’t that kind of the same? Let’s say some day we have an AI with near-human intelligence. Why can’t the AI be trained on copyrighted works, just like humans are? All our school books are copyrighted works.

          • Camelbeard
            English
            6
            6 months ago

            So if AI companies pay for a book or music (like a consumer) it’s no problem? Because I don’t think this is about paying for content, it’s that content holders refuse to work with AI companies.

            • @[email protected]
              English
              6
              6 months ago

              Unironically yes, if AI companies paid for training data everyone would be much happier.

              I sincerely doubt that NOBODY is willing to sell data to them. It’s far more likely that they have not offered anyone a fair price yet, which makes sense because that would set a precedent.

              Even then, if people don’t want to sell them their copyrighted work then tough. You can’t compel people to take customers they don’t want.

              • Armok: God of Blood
                English
                1
                6 months ago

                So if I go on a free website that hosts art (ArtStation, DeviantArt, etc.) and get training data that I could have legally accessed for free…

                • @[email protected]
                  English
                  -3
                  6 months ago

                  They’ve all already done that haha. You could argue that a human has only one life in which to remix that art but an AI is theoretically immortal, so it’s a different category of customer.

                  At any rate, it’s clear that AI should not have free access to copyrighted works, like news articles, academic papers, stock images, and various kinds of non-DeviantArt art.

        • @[email protected]
          English
          -2
          6 months ago

          I’m pretty sure it’s technically copyright infringement to draw the characters (if they have a design in the book in images) or write fanfic, but no one cares. The only fan stuff that actually gets taken down is Nintendo fan games and, in the past, videos of Nintendo games posted without permission.

      • @[email protected]
        English
        1
        6 months ago

        You can. Distributing copies is illegal, not downloading them. That’s why torrents are bad and streaming sites are fine. (Some exceptions might apply depending on your country).

  • @[email protected]
    English
    2
    6 months ago

    From the “source” (the Japanese one, not the broken link at the bottom):

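    I confirmed the current position under Japanese copyright law on analysis and training by AI, and also conveyed concerns about rights infringement to the government. However, how copyright applies to the output itself at the generation and output stage, and how the copyright of the source data is treated, have not yet been confirmed, so I plan to confirm these through future questioning and similar means.

    In making use of generative AI, new regulation is needed to protect copyright holders.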

    It’s an analysis of the current copyright law in Japan. It does not mean that they won’t update the law eventually. What a terrible article.

  • @[email protected]
    English
    1
    6 months ago

    Pouring the entire library into a sieve to get a gigabyte of linear algebra is pretty goddamn transformative. I do not understand why people think this is something they should stop, even if they mistakenly think it can be stopped.

    Do you really want an internet full of synthetic catgirls? Because this is how you get an internet full of synthetic catgirls.

    That’s this guy’s argument against AI. He thinks that’s a threat.

    This fight is over. It’s been barely a year, and random people are already advancing the state of the art using mundane consumer hardware. You’re not about to claw back their plain-text corpuseses and suspiciously thematic image folders. Destroy every existing model and new ones will emerge in a matter of days.

    Onerous legal obstacles will only restrict this to exactly the rich bastards you don’t want exploiting it. The sort of people who always see labor-saving technology as a way to fuck labor harder.