The original paper is how most language printers are implemented, and this post describes a slightly more expressive algorithm.
The basic rule of these printers is: you have a “single-line” and “multi-line” representation for most AST nodes and have to choose one. But if you choose multi-line (and sometimes in general), child nodes have their own single and multi-line representations, which can be chosen differently. You want to print as many single-line nodes as possible without exceeding a certain line-length. Brute-forcing this is exponential to the AST depth, but Wadler’s algorithm and this derivative are linear time.
From a r/ProgrammingLanguages comment there’s also this paper (OOPSLA 2023) which is even more expressive: instead of just putting as many nodes on one line as possible, it lets the user specify a semi-arbitrary algorithm (“cost factory”) to determine which layout is optimal.