Question is how to do these in Rust. An example might be a browser DOM: each node has a parent pointer, a list of child pointers, left and right sibling pointers, maybe a CSS node pointer, etc. Inserting or deleting nodes has to repair the pointers to and from the neighboring nodes as needed.
I know this is doable since obviously Servo (Rust’s initial driving application) has to do it. I hope the answer doesn’t involve the word “unsafe”. But I am quite unclear about how to create such a structure under Rust’s rules of pointer ownership, and also how to reliably reclaim storage when nodes and trees/subtrees are deleted. Plus there will be thread safety rules that should be statically enforced if possible.
I’ve heard that doubly linked lists in Rust are handled by an unsafe library, yet this seems even worse. Thanks.
The safe, fast and easy way to do trees is by using Rc<RefCell<T>>. Rc/Arc allows data to be owned multiple times. You want this because this way a node can be referenced by its parent and its child at the same time. However, Rc makes the inner type inmutable. And you probably will want to mutate it in a tree, that’s what RefCell is for. With RefCell you do the borrow checking at run-time instead of at compile-time. This allows you to mutate T even though Rc only gives you an inmutable reference. This is called interior mutability.
RefCell doesn’t eliminate the borrow checker though, you must still follow its rules. If you try to get 2 mutable references to the inner type of RefCell, it will panic.
I know you don’t want to read unsafe, but you gotta hear about the alternative. Just use pointers. Pointers don’t have the borrow checker to restrict them. And self-referencing structures with interior mutability are not easy to borrow-check automatically. You can have the raw pointers as private fields of the struct so the code that is actually unsafe will be a few very small functions.
Here’s why the other options worse than pointers:
Rc<RefCell<T>> will clutter your code with boilerplate and it’s a pain to deal with. Pointers are not too ergonomic in rust (mainly because there is no
->
operator), but they need way less boilerplate. Also, you already need to manually check the mutability rules, why not all the rules.Another option that I’ve seen is “have a hashmap with all the nodes, and just store the id of the node instead of a reference”. This is the same as “have all the nodes on a Vector and store the index”. Think about this for a second. You have a pool of memory and a number that identifies what part of that pool is the memory you want. Seen it yet? That is exactly what a pointer is! If you do that, you’re just disabling the borrow-checker anyway. You just created your own memory allocator and will have to manage your memory manually, at that point just use pointers, it will be the same except with fewer boilerplate and indirection.
Typically for this I’ve done some wrapper type around a vector storage for the node data (a wish.com arena, if you will). Links are just some abstraction around indices.
It’s kind of annoying to write initially, but it was easier than learning unsafe stuff. To do it correctly & most efficiently though, one would likely need unsafe code.
Typically for this I’ve done some wrapper type around a vector storage for the node data (a wish.com arena, if you will). Links are just some abstraction around indices.
Not sure what wish.com is, but yeah, you’re describing a traditional Fortran approach. I would say machine pointers are a hardware primitive, and using indices like that are an abstraction. Anyway, the reply is appreciated. It at least tells me that I’m not missing something super obvious.
Check out the crate
indexmap
. Check its dependants too if you want to see how higher level abstractions can be built utilizing it.Thanks, I will check that. It sounds interesting.
One way to do this is to use reference-counting pointers such as
std::rc::Rc
orstd::sync:Arc
. The parent node can hold a strong reference to each child node and each child node has aWeak
reference to its parent.I guess that’s possible. The sibling pointers can also be weakrefs, I think. I don’t know if there are ever arbitrary links between DOM nodes rather than just the tree-like ones. But, you are onto something if creation of weakrefs safely avoids the pointer ownership rules. I’ll have to look into this further.
One way of solving this is to structure all of your nodes into a HashMap with the node ID as the key and the node type as the value. The underlying node type could have node IDs for referencing purposes. You lose the ability to reference the parent/child/sibling directly, but you avoid direct circular dependencies. That said, now you need to manage dangling references for when the node is removed from the main HashMap collection.
Thanks, this sounds like a pretty serious abstraction inversion, exposes you to various kinds of bugs such as memory leaks, and gives a performance hit from all the hashing compared with using machine pointers, but I guess it does get rid of the circular references. Is it really the idiomatic answer? I’m ok if it is, but it’s not what I’d have expected from a low level language designed to replace C.
There are a few different ways to solve this problem without using unsafe and I’m not sure what would be considered idiomatic. Another option is to ultimately encapsulate all of the nodes in a reference-counted box like Rc or Arc and specify the type of your parent/child/sibling references to the same reference-counted box type. With that, you just share cloned instances around as needed.
The primary drawback here is that for mutable access you end up having to perform a lock every time on an underlying Mutex (or something similar). You also no longer have direct access to the singular instance of the node.
There are pros and cons to every approach.
Thanks. I might ask on irc how Servo does it. Given Servo’s connection with Rust, it would be surprising for this to be too awkward.
One note about doubly linked lists - I believe the received wisdom these days is that in almost all situations they’re a bad algorithm to solve your problem. (I watched a video about it sometime… And read something in the Rust docs… I’m not an expert!)
I see other comments mentioning a hash map… Hopefully you’ve got some good options.
Linked lists in general are cache unfriendly, though it helps if you have a relocating GC that puts the nodes back in order. The hash map idea is a possibility. I’m not actually trying to implement something like this. It’s more of a question about Rust’s approach from a PL perspective.
One way of solving this is to structure all of your nodes into a HashMap with the node ID as they key and the node type as the value. The underlying node type could have node IDs for referencing purposes. You lose the ability to reference the parent/child/sibling directly, but you avoid direct circular dependencies. That said, now you need to manage dangling references for when the node is removed from the main HashMap collection.
Doubly linked list is one of std’s collections. And safety in Rust is built on top of unsafely, because there is no way around that.
Did you try to actually look up literally anything before asking?! Because simply checking out
std::collections
docs would have given you some answers.I don’t see anything in the std::collections doc relevant to this question. Yes I knew about the presence of doubly linked lists in that library. Obviously you could implement the DOM structure unsafely with C-style pointers but then what are you getting from using Rust instead of C? So I hoped to hear how to do it safely, maybe using a library object as a building block. But, I don’t think the stuff in std::collections suffice for the purpose.
Rust safety isn’t necessarily built on unsafety, though apparently it is in the case of doubly linked lists. I think singly linked lists can be made safely in Rust, similar to using C++ std::unique_ptr. It’s possible, though very complicated, to eliminate all the unsafety everywhere. Rust doesn’t attempt this, but (for example) ATS does (ats-lang.org).
Anyway your scolding is unhelpful. If you’ve got a usable and clear answer to my question I’d be happy to try to digest it. Otherwise I don’t see any point in continuing to respond.