  • webbureaucrat to Rust: XHTML 1.0 Transitional parser?
    3 months ago

    I wish I could be more help. My advice is to look for a more robust general-purpose HTML parsing library, possibly even a browser emulator, rather than a library built specifically for XHTML 1.0 Transitional or a converter.

    In my Python web automation course in college we used BeautifulSoup and, I think, mechanize. Either of those would probably be robust enough for what you’re trying to do, but if it has to be Rust, I’m not sure what’s out there. Otherwise, you could step up to full browser automation with Selenium or something similar.

    Or, if you’re trying to do something fairly simple that doesn’t require parsing the whole document but is still a little too complex for plain old regular expressions, you might be able to build a small parser with the Rust pest crate. Of course, I would absolutely not recommend trying to build your own full-featured XHTML parser.
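To give a rough idea of what "parse only the part you need" can look like, here is a minimal sketch that uses only the Rust standard library rather than pest (pest would need an external dependency and a grammar file). It pulls the href values out of `<a ...>` start tags. The function names `extract_hrefs` and `attr_value` are made up for illustration. It assumes well-formed XHTML (lowercase tag names, quoted attribute values) and does not handle comments, CDATA sections, entity decoding, or a `>` inside an attribute value.

```rust
/// Collects the `href` value of every `<a ...>` start tag in `input`.
/// Hypothetical helper: assumes well-formed XHTML with lowercase tag names
/// and quoted attribute values. Comments and CDATA are not handled.
fn extract_hrefs(input: &str) -> Vec<String> {
    let mut hrefs = Vec::new();
    let mut rest = input;
    while let Some(start) = rest.find("<a") {
        let after = &rest[start + 2..];
        // Require whitespace after "<a" so that <abbr>, <address>, etc. are skipped.
        if !after.starts_with(|c: char| c.is_ascii_whitespace()) {
            rest = after;
            continue;
        }
        // The start tag ends at the next '>' (assumes no '>' inside attribute values).
        let end = match after.find('>') {
            Some(i) => i,
            None => break,
        };
        if let Some(value) = attr_value(&after[..end], "href") {
            hrefs.push(value.to_string());
        }
        rest = &after[end + 1..];
    }
    hrefs
}

/// Returns the quoted value of attribute `name` inside a start tag's body,
/// or `None` if the attribute is absent or the body is not `name="value"` pairs.
fn attr_value<'a>(tag_body: &'a str, name: &str) -> Option<&'a str> {
    let mut rest = tag_body;
    loop {
        rest = rest.trim_start();
        let eq = rest.find('=')?;
        let attr_name = rest[..eq].trim_end();
        let after_eq = rest[eq + 1..].trim_start();
        // XHTML requires attribute values to be quoted with " or '.
        let quote = after_eq.chars().next()?;
        if quote != '"' && quote != '\'' {
            return None;
        }
        let value_and_rest = &after_eq[1..];
        let close = value_and_rest.find(quote)?;
        if attr_name == name {
            return Some(&value_and_rest[..close]);
        }
        rest = &value_and_rest[close + 1..];
    }
}
```

Calling `extract_hrefs` on a document string returns the link targets in document order. Anything much beyond this (nesting, entities, namespaces) is where a grammar-based tool like pest, or a real parsing library, starts to pay off.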