I have a program that requires all keywords to be in a single paragraph, separated by commas most of the time.

For example:

I have these terms:

1-Term
1.1-Term
2-Term
3-Term
4-Term

that I collected and organized into groups and subgroups with titles and subtitles:

Title

  • 1-Term

  • 1.1-Term

  • 2-Term

    • Sub-Title
      • 3-Term
      • 4-Term

But then I want to turn them into:

1-Term, 1.1-Term, 2-Term, 3-Term, 4-Term 
 

That means removing certain marked words (titles and subtitles), any empty/blank space, and line breaks, while adding commas between the terms. I want to keep certain dashes “-” (like those inside words):

1-Term,1.1-Term,2-Term,3-Term,4-Term
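The transformation described above could be sketched in Python roughly as follows. This is only a sketch under assumptions: the `HEADINGS` set, the `join_terms` name, and the rule that a bullet is a marker followed by whitespace are all made up for illustration, not something the post specifies.

```python
import re

# Hypothetical set of heading lines to drop; replace with your own titles.
HEADINGS = {"Title", "Sub-Title"}

# A bullet is "•", "*" or "-" at the start of a line, followed by whitespace.
# Requiring the whitespace keeps dashes inside terms such as "1-Term".
BULLET = re.compile(r"^\s*[•*-]\s+")

def join_terms(text, separator=", "):
    """Strip bullets and blank lines, drop headings, join the rest."""
    terms = []
    for line in text.splitlines():
        term = BULLET.sub("", line).strip()
        if term and term not in HEADINGS:
            terms.append(term)
    return separator.join(terms)

sample = """Title

  • 1-Term

  • 1.1-Term

  • 2-Term

    • Sub-Title
      • 3-Term
      • 4-Term
"""
print(join_terms(sample))  # 1-Term, 1.1-Term, 2-Term, 3-Term, 4-Term
```

Passing `separator=","` gives the variant without spaces after the commas.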

  • bus_factor@lemmy.world
    11 hours ago

    Your description is too vague to really get a good answer. In general, if you’re doing complex string manipulation, you’ll use a full-fledged programming language with regex support, like Python, Perl or Awk, possibly piped into each other and/or other tools like Sed or Cut. I can’t be more specific than that without a more specific description where you describe the actual data and criteria.

    Are you starting with the first or second example? Why do the prefix numbers change between examples? How do you tell text and title/subtitle apart?

    • Cactus_HeadOP
      10 hours ago

      Why do the prefix numbers change between examples?

      My bad, I fixed it.

      I want to show that two terms are related, e.g. Star Wars and Jedi, by grouping them together:

      Franchises

      Stars wars
      Jedi

      Transformers


      Also, I am not able to add line breaks between bullet points in Markdown, so instead I get this:

      Franchises

      • Stars wars

      • Jedi

      • Transformers

      So I can't show the grouping here on Lemmy. I would also have liked the list I make to be Markdown-compatible, but I guess that's a separate issue.

    • Cactus_HeadOP
      9 hours ago

      Basically, I collect keywords (e.g. Transformers, A Deep Dive, Harry Potter, The Worst, Xbox, Star Wars, Jedi) from videos on my YouTube home page and organize them into lists:

      • YouTuber terms:

        • A Deep Dive
        • The Worst

      • Franchises:
        • Star wars
        • Jedi
        • Harry Potter
        • Transformers

      • Companies:

        • Xbox

      And turn it into:

      A Deep Dive, The Worst, Star wars, Jedi, Harry Potter, Transformers, Xbox
      
      

      Removing the titles and subtitles.

      How do you tell text and title/subtitle apart

      I was thinking of putting a symbol such as “#” in front of the title, for example:

      # - YouTuber terms:  
      

      so the script knows to ignore that whole line, like a comment in most programming languages.
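A minimal Python sketch of that "#" idea, assuming one keyword per line and optional "- ", "• " or "* " bullets. The function name `keywords_from_list` and its `comment_prefix` parameter are hypothetical, chosen only for this illustration:

```python
def keywords_from_list(text, comment_prefix="#"):
    """Join all non-empty, non-comment lines with ', '."""
    keywords = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and any line marked as a title with the prefix.
        if not line or line.startswith(comment_prefix):
            continue
        # Drop a leading bullet marker, if present.
        for bullet in ("- ", "• ", "* "):
            if line.startswith(bullet):
                line = line[len(bullet):].strip()
                break
        if line:
            keywords.append(line)
    return ", ".join(keywords)

sample = """
# - YouTuber terms:
  - A Deep Dive
  - The Worst
# - Franchises:
  - Star wars
  - Jedi
  - Harry Potter
  - Transformers
# - Companies:
  - Xbox
"""
print(keywords_from_list(sample))
# A Deep Dive, The Worst, Star wars, Jedi, Harry Potter, Transformers, Xbox
```

One thing to keep in mind: a keyword that itself starts with "#" would also be skipped, so a rarer marker may be safer if that can happen.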

      • a14o@feddit.org
        9 hours ago

        This is not difficult to achieve at all with tools like sed or awk. But unless you provide a concrete example input file or files, all we can do is point to those tools.

        • Cactus_HeadOP
          3 hours ago

          Something like this?

          - Franchise(Title): 
          
            - Harry potter
          
            - Perfect Blue
          
            - Jurassic world
            - Jurassic Park
          
            - Jedi
            - Star wars
            - The clone wars
          
            - MCU
          
            - Cartoons(Sub-Title):
          
              - Gumball 
          
              - Flapjack
          
              - Steven Universe
          
              - Stars vs. the forces of Evil
          
              - Wordgril
          
              - Flapjack
          
          

          Turned into

          Harry potter,Perfect Blue,Jurassic world,Jurassic Park,Jedi,Star wars,The clone wars,MCU,Gumball,Flapjack,Steven Universe,Stars vs. the forces of Evil,Wordgril,Flapjack
          

          Both “Franchise” and “Cartoons” were removed, i.e. not included with the other words.
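For an input shaped like the example above, one possible Python sketch treats every entry that ends with a colon as a title and skips it. That rule, the function name `flatten_keywords`, and the optional `drop_duplicates` flag are assumptions made for illustration; the thread does not settle on a criterion.

```python
def flatten_keywords(text, drop_duplicates=False):
    """Collect '- ' list entries, skipping headings that end with ':'."""
    keywords = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("- "):
            line = line[2:].strip()
        # Assumption: a heading is any entry that ends with a colon.
        if not line or line.endswith(":"):
            continue
        keywords.append(line)
    if drop_duplicates:
        # dict.fromkeys keeps the first occurrence and preserves order.
        keywords = list(dict.fromkeys(keywords))
    return ",".join(keywords)

sample = """- Franchise(Title):

  - Harry potter

  - Jurassic world
  - Jurassic Park

  - Cartoons(Sub-Title):

    - Gumball
    - Flapjack
    - Stars vs. the forces of Evil
    - Flapjack
"""
print(flatten_keywords(sample))
print(flatten_keywords(sample, drop_duplicates=True))
```

The second call is there because the sample lists "Flapjack" twice; whether repeats should be kept is up to you.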

          • moonpiedumplings
            6 hours ago

            This is technically YAML, I think: a list (with one entry) of lists that contains mostly single items but also one other list. You should be able to parse this with a YAML parser such as PyYAML for Python (a third-party package, since Python's standard library does not include a YAML parser).

            Note that YAML is picky about syntax, though, so it wouldn’t be able to handle deviations.
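A rough sketch of the YAML route, assuming the third-party PyYAML package is installed and that the titles end up as mapping keys once the text is parsed. The helper names `collect_terms` and `terms_from_yaml` are hypothetical:

```python
def collect_terms(node):
    """Return the leaf values of parsed YAML data, skipping mapping keys."""
    if isinstance(node, dict):
        # Keys are the titles (e.g. "Franchise(Title)"); only visit values.
        return [t for value in node.values() for t in collect_terms(value)]
    if isinstance(node, list):
        return [t for item in node for t in collect_terms(item)]
    if node is None:
        # A title with nothing under it parses to None.
        return []
    # YAML may turn entries like "3" into numbers, so convert back to text.
    return [str(node).strip()]

def terms_from_yaml(text):
    import yaml  # PyYAML, third-party: pip install pyyaml
    return ",".join(collect_terms(yaml.safe_load(text)))

# Hand-written stand-in for what a YAML parser would return for a
# shortened version of the example list.
parsed = [
    {
        "Franchise(Title)": [
            "Harry potter",
            "MCU",
            {"Cartoons(Sub-Title)": ["Gumball", "Flapjack"]},
        ]
    }
]
print(",".join(collect_terms(parsed)))  # Harry potter,MCU,Gumball,Flapjack
```

`terms_from_yaml` would be called with the contents of the list file. As noted above, it only works while the file stays valid YAML, so inconsistent indentation will raise a parse error.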