Edit

My question was very badly written, but the new title reflects the actual question. Thanks to three very friendly and dedicated users (@harsh3466, @tuna, @learnbyexample), I was able to find a solution for my files, so thank you, guys!!!

For anyone who comes across this post, here are three possible ways to achieve the desired result.

Solution 1 (https://lemmy.ml/post/25346014/16383487)

#! /bin/bash
files="/home/USER/projects/test.md"

mdlinks="$(grep -Po ']\((?!https).*\)' "$files")"
mdlinks2="$(grep -Po '#.*' <<<"$mdlinks")"

while IFS= read -r line; do
	# Converts 1.2 to 1-2 (a third-level heading needs a supplementary [0-9])
	dashlink="$(echo "$line" | sed -r 's|(.+[0-9]+)\.([0-9]+.+\))|\1-\2|')"
	sed -i "s/$line/${dashlink}/" "$files"

	# Lowercases everything after the hash (#)
	lowercaselink="$(echo "$dashlink" | sed -r 's|#.+\)|\L&|')"
	sed -i "s/$dashlink/${lowercaselink}/" "$files"

	# Replaces encoded spaces (%20) with dashes after the hash (#)
	spacelink="$(echo "$lowercaselink" | sed 's|%20|-|g')"
	sed -i "s/$lowercaselink/${spacelink}/" "$files"

done <<<"$mdlinks2"
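Solution 1 passes each link to `sed -i` as the pattern (`s/$line/.../`), so characters such as `.`, `*`, `[` and `/` in a link are read as regex syntax or as the delimiter rather than literally. A minimal sketch of escaping both sides first, assuming GNU sed and the `/` delimiter used in the script; `sed_escape_pattern` and `sed_escape_replacement` are hypothetical helper names, not part of the solution:

```shell
# Hypothetical helpers (not part of Solution 1): escape a string so that
# sed treats it literally inside s/PATTERN/REPLACEMENT/ with / as delimiter.

# Pattern side (BRE): backslash-escape [ \ / . * ^ $
sed_escape_pattern() {
  printf '%s\n' "$1" | sed 's|[[\\/.*^$]|\\&|g'
}

# Replacement side: backslash-escape \ / &
sed_escape_replacement() {
  printf '%s\n' "$1" | sed 's|[\\/&]|\\&|g'
}

# Usage inside the loop, e.g.:
#   sed -i "s/$(sed_escape_pattern "$line")/$(sed_escape_replacement "$dashlink")/" "$files"
```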

Solution 2 (https://lemmy.ml/post/25346014/16453351)

sed -E ':l;s/(\[[^]]*\]\()([^)#]*#[^)]*\))/\1\n\2/;Te;H;g;s/\n//;s/\n.*//;x;s/.*\n//;/^https?:/!{:h;s/^([^#]*#[^)]*)(%20|\.)([^)]*\))/\1-\3/;th;s/(#[^)]*\))/\L\1/;};tl;:e;H;z;x;s/\n//;'

Solution 3 (https://lemmy.ml/post/25346014/16453161)

perl -pe 's/\[[^]]+\]\((?!https?)[^#]*#\K[^)]+(?=\))/lc $&=~s:%20|\d\K\.(?=\d):-:gr/ge'

Relevant links

https://mike.bailey.net.au/notes/software/apps/obsidian/issues/markdown-heading-anchors/#background


Hi everyone!

I need some help with string manipulation using sed and regex. I spent a whole day on trial and error and searching the web for a solution, but it’s way beyond my capabilities. Maybe there are some sed/regex gurus here who are willing to give me a helping hand!

From everything I gathered on the web, it seems to call for a rather complicated regex and sed substitution. Here we go!

What am I trying to achieve?

I have a lot of Markdown guides that I want to host on a self-hosted Forgejo instance. However, the classic Markdown links are not the same as the ones GitHub/Forgejo expect.

Convert the following string:

[Some text](#Header%20Linking%20MARKDOWN.md)

Into

[Some text](#header-linking-markdown.md)

As you can see, these are the requirements:

  • Pattern: [Some text](#link%20to%20header.md)
  • Only edit what’s between parentheses
  • Replace space (%20) with -
  • Everything as lowercase
  • Links are sometimes in nested parentheses
    • e.g. (look here [Some text](#link%20to%20header.md))
  • Do not change a line that begins with https (external links)

While the whole thing is a bit complex, the trickiest part is probably the nested parentheses :/

What I tried

The furthest I got was the following:

sed -Ei 's|\(([^\)]+)\)|\L&|g' test3.md #make everything between parentheses lowercase

sed -i '/https/ ! s/%20/-/g' test3.md #change every %20 occurrence to -

These sed/regex substitutions are what I put together while roaming the web, but they have a lot of flaws and don’t work with nested parentheses. Also, the second one changes every %20 occurrence in the file (on any line that doesn’t contain https), not just those inside links.
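To see the nested-parentheses flaw concretely, here are the two commands chained on a made-up sample line (`\L` needs GNU sed, and `run_attempts` is just a hypothetical name for the pipeline):

```shell
# The two attempts above, chained as one pipeline (GNU sed, for \L).
run_attempts() {
  sed -E 's|\(([^\)]+)\)|\L&|g' | sed '/https/ ! s/%20/-/g'
}

printf '%s\n' '(look here [Some text](#link%20to%20header.md))' | run_attempts
# prints: (look here [some text](#link-to-header.md))
```

The first `(` only ends at the first `)`, so the match also covers `[Some text]` and the link text gets lowercased too.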

The closest solution I found on Stack Overflow looks similar, but I wasn’t able to fit it to my needs; my limited understanding of regex/sed makes it impossible for me to adapt it to my requirements.


I would appreciate any help, even if a different tool is needed. However, I’m more interested in the learning process, so a script or CLI alternative would be very welcome :) Actually, any help is appreciated :D!

Thanks in advance.

  • learnbyexample (5 days ago)

    Here’s a solution with Perl (assuming you don’t want to change links whose target starts with http/https right after the opening (, rather than lines that start with https):

    perl -pe 's/\[[^]]+\]\(\K(?!https?)[^)]+(?=\))/lc $&=~s|%20|-|gr/ge' ip.txt
    
    • e flag allows you to use Perl code in the substitution portion.
    • \[[^]]+\]\(\K match the [link text] part and the opening (, then use \K to mark the start of the matching portion (text before it won’t be part of $&)
    • (?!https?) don’t match if the link target starts with http or https
    • [^)]+(?=\)) match non-) characters and assert that a ) follows them
    • $&=~s|%20|-|gr change %20 to - in the matched portion; the r flag returns the modified string instead of changing $& itself
    • lc is a function to change text to lowercase
    • N0x0n@lemmy.ml (OP, 3 days ago)

      Sorry for the late response… I was busy with another user :S My English isn’t good enough for me to respond to everyone at the same time… Anyway…

      I tried your Perl regex substitution and it does exactly what I asked for in my post, so thank you very much for your help! However, I missed a few use cases where your regex breaks… But that’s on me; your command works as expected!!!

      [Link with numbers](Another%20Markdown%20file.md#1.3%20this%20is%20another%20test.md)
      

      The part before the hash (#) needs to keep its original form (even with %20) because it links directly to a Markdown file, not to a heading (hope that’s understandable?). It took a lot of time with another user, but we came up with a script that does everything:

      #! /bin/bash
      
      files="/home/USER/projects/test.md"
      
      mdlinks="$(grep -Po ']\((?!https).*\)' "$files")"
      mdlinks2="$(grep -Po '#.*' <<<"$mdlinks")"
      
      while IFS= read -r line; do
      	# Converts 1.2 to 1-2 (a third-level heading needs a supplementary [0-9])
      	dashlink="$(echo "$line" | sed -r 's|(.+[0-9]+)\.([0-9]+.+\))|\1-\2|')"
      	sed -i "s/$line/${dashlink}/" "$files"
      
      	# Lowercases everything after the hash (#)
      	lowercaselink="$(echo "$dashlink" | sed -r 's|#.+\)|\L&|')"
      	sed -i "s/$dashlink/${lowercaselink}/" "$files"
      
      	# Replaces encoded spaces (%20) with dashes after the hash (#)
      	spacelink="$(echo "$lowercaselink" | sed 's|%20|-|g')"
      	sed -i "s/$lowercaselink/${spacelink}/" "$files"
      
      done <<<"$mdlinks2"
      

      If you’re motivated, you can still improve your regex :) I’m kind of curious whether it’s possible with a one-liner! Thanks again for your help, and sorry for the late response!!
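Outside of a regex, the "keep the file part, fix only the part after the first #" rule can also be expressed with bash parameter expansion on a single link target. This is not one of the thread's solutions, just a sketch: `normalize_target` is a hypothetical name, it handles one target at a time (extracting targets from a file is left out), and the digit.digit rule mirrors the 1.3 to 1-3 case above:

```shell
# Hypothetical helper (not from the thread): normalize only the fragment of a
# link target, i.e. the part after the first #, and keep the file part as-is.
normalize_target() {
  local target=$1 file frag
  case $target in
    http://* | https://*) printf '%s\n' "$target"; return ;;  # external: untouched
    *'#'*) ;;                                                # has a fragment
    *) printf '%s\n' "$target"; return ;;                    # no fragment
  esac
  file=${target%%#*}   # everything before the first #
  frag=${target#*#}    # everything after the first #
  # %20 -> -, digit.digit -> digit-digit (looped so 1.2.3 becomes 1-2-3), lowercase
  frag=$(printf '%s\n' "$frag" |
    sed -E -e 's/%20/-/g' -e ':a' -e 's/([0-9])\.([0-9])/\1-\2/' -e 'ta' |
    tr '[:upper:]' '[:lower:]')
  printf '%s#%s\n' "$file" "$frag"
}
```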

      • learnbyexample (2 days ago)

        This might work, but I think it is best to not tinker further if you already have a working script (especially one that you understand and can modify further if needed).

        perl -pe 's/\[[^]]+\]\((?!https?)[^#]*#\K[^)]+(?=\))/lc $&=~s:%20|\d\K\.(?=\d):-:gr/ge'
        
        • N0x0n@lemmy.ml (OP, 2 days ago)

          Thank you! It actually ticks every use case (for my files) and looks pretty rad!

          > This might work, but I think it is best to not tinker further if you already have a working script (especially one that you understand and can modify further if needed).

          I totally agree, but I will keep your regex as a reference. In the near future, I’ll try to decompose your regex as a learning exercise, but it looks very complex!

          Another user came up with the following solution:

          sed -E ':l;s/(\[[^]]*\]\()([^)#]*#[^)]*\))/\1\n\2/;Te;H;g;s/\n//;s/\n.*//;x;s/.*\n//;/^https?:/!{:h;s/^([^#]*#[^)]*)(%20|\.)([^)]*\))/\1-\3/;th;s/(#[^)]*\))/\L\1/;};tl;:e;H;z;x;s/\n//;'
          

          Just as a little experiment, if you want to spend some time on an answer: what do you think of it? It’s another way to achieve the same kind of result, but the two approaches are significantly different. I know there are a thousand ways to achieve the same result, but I’m kind of curious how it looks through an expert’s eyes :).

          Thanks again for your help and the time you took to write up a complex regex for my use case! 👍

    • bizdelnick@lemmy.ml (5 days ago)

      I didn’t test this, but it will change the whole URL while changes are only needed in its fragment component (after the first #).

      • learnbyexample (5 days ago)

        Hmm, OP mentioned “Only edit what’s between parentheses”; I don’t see anywhere that the whole URL shouldn’t be changed…