Consider this Clojure snippet:
(defn fetch-thing [db user-id thing-id]
(if (db/has-permission db user-id thing-id)
{:status 200 :body (db/fetch-thing db thing-id)}
{:status 400})
It’s clean, does clearly what it looks like it does, but there’s a small
problem; it doesn’t enforce a transaction. If db is a simple database
connection without a transaction and something happens in the database between
the second and the third line, for example if the thing
referenced by
thing-id
is deleted, the output of db/fetch-thing
can get quite
unpredictable. Could we make this kind of programs safer?
Enter Rust
Rust is a relatively new programming language, with focus on speed and safety. Rust has the unique notion of ‘ownership’, which enables memory safety without relying on a garbage collector. Nowadays in most languages all variables are actually references, which are just thrown around and are either wildly mutable or completely immutable. Rust provides a way to write code where mutation is explicit and safe.
Rust implements this by providing a unique feature: the borrow checker. Essentially, the borrow checker is a compile-time feature which ensures that objects live long enough, and at the same time prevents unsafe concurrent access to variables. This is implemented by following a couple of simple rules:
- If an object drops out of scope, it is destroyed
- An object must outlive all references to it
- An object can only have either multiple immutable references, or a single mutable reference
In this blog post we are mostly concerned about the first rule – if an object drops out of scope, it is destroyed.
The borrow checker is mostly praised for its ability to enable manual memory management without suffering from null pointer exceptions or segfaults – Safe Rust doesn’t have any null or otherwise invalid references! In my opinion, the best benefit of the borrow checker is not the speedup it provides by eliminating the garbage collector, but the ability to catch logic bugs. If you manage to construct your problem as a problem around ownership, the Rust compiler can check your logic at compile-time!
Trait interlude
Rust doesn’t have inheritance nor interfaces. Instead, generic code is implemented around traits. Traits are similar to interfaces in the sense that they contain a bunch of methods, and certain types implement those interfaces. Types can only implement traits either where the type is defined, or where the trait is defined. This unfortunately means that one cannot implement a third-party trait for a third-party type. However, unlike interfaces in eg. Java, one can implement their own traits for other types.
When writing generic code, generic type parameters can receive trait bounds to specify things that can be done with the types. For example, in the following function
fn debug_print<T: Debug>(t: T) { dbg!(t); }
the type T is required to implement the trait Debug
, which allows turning the
object into an programmer-readable (but not necessarily human-readable) format.
The generic functions are compiled similarily as in C++: any calls to the function are specialized to the types, which technically can create binary bloat, but practically reduces it as the compiler has better options for optimization.
Modeling database queries around ownership
So, we need to be able to make a single database query without a transaction, convert a database connection into a transaction and make multiple queries inside a transaction.
Sounds like while making a query without a transaction, the connection should
refuse any additional queries. If the queries are made within a transaction,
additional queries should be accepted. Furthermore, the transactions should be
cleaned up when dropped, which postgres::Transaction
does by default,
defaulting to rollback, rather than commit. Let’s sketch an API for it:
pub fn one(connection: db::Connection) {
db::find_person(connection, "Joe"); // OK
}
pub fn two(connection: db::Connection) {
connection.transaction(|tx| {
let person = db::find_person(&tx, "John")?;
tx.commit()?;
Ok(person)
});
}
pub fn three(connection: db::Connection) {
connection.transaction(|tx| {
let first = db::find_person(&tx, "Mary")?;
let second = db::find_person(&tx, "Suzy")?;
tx.commit()?;
Ok((first, second))
});
}
pub fn four(connection: db::Connection) {
db::find_person(connection, "Foo");
db::find_person(connection, "Bar"); // ERROR: two queries without a transaction
}
Technical detals
The typical postgres
connection types are modeled around typical usage rather
than this special case, so we clearly need some wrappers around them. Let’s
start by defining them:
pub struct Connection(Box<postgres::Connection>);
pub struct Transaction<'a>(Box<postgres::transaction::Transaction<'a>>);
These types contain exactly one member: the respective postgres
connection
type, with the crucial difference that Connection
and Transaction
do not
derive the Clone
trait used to acquire new copies of the connection. Ignore
the Box
and 'a
, they only tell Rust that the references to the variables
exist long enough.
Next, we need a trait and relevant implementations to convert these types to
the trait provided by postgres
which provides methods such as query
.
pub trait IntoGenericConnection {
type G: postgres::GenericConnection;
fn into_generic_connection(&self) -> &Self::G;
}
impl IntoGenericConnection for Connection {
type G = postgres::Connection;
fn into_generic_connection(&self) -> &Self::G {
&self.0
}
}
impl<'a> IntoGenericConnection for &'a Transaction<'a> {
type G = postgres::transaction::Transaction<'a>;
fn into_generic_connection(&self) -> &Self::G {
&self.0
}
}
impl Connection {
pub fn transaction<F, R, E>(self, callback: F) -> Result<R, E>
where F: FnOnce(Transaction) -> Result<R, E> {
let tx = self.0.transaction().unwrap();
let res = callback(Transaction(Box::new(tx)))?;
Ok(res)
}
}
Then we just call into_generic_connection()
while inside the relevant
database function:
pub fn find_person<IGC: IntoGenericConnection>(db: IGC, name: &str) -> Option<Person> {
let conn = db.into_generic_connection();
conn.query("SELECT id, name FROM account WHERE name=$1", &[&name]).unwrap()
.into_iter()
.map(|row| Person { id: row.get(0), username: row.get(1) })
.next()
}
With these couple lines of code, any database accesses are guarded against
accidental unsafe usage. The programmers still have access to the backdoor used
to gain a reference to the connection without using a transaction (through
IntoGenericConnection
), but using it is explicit, rather than accidental. The
same wrapper types can also be used to enforce a transaction around database
functions which make multiple queries simply by replacing the IGC
generic
with Transaction
.
Now, the last example gives an error message around the lines of the following snippet:
error[E0382]: use of moved value: `connection`
--> src/backend/src/router.rs:3:20
|
1 | pub fn four(connection: db::Connection) {
| ---------- move occurs because `connection` has type
| `db_traits::Connection`, which does not implement the
| `Copy` trait
2 | db::find_person(connection, "Foo");
| ---------- value moved here
3 | db::find_person(connection, "Bar");
| ^^^^^^^^^^ value used here after move
Performance
Does this provide any performance drawbacks? Both Connection
and
Transaction
are exactly the size of Box
, which means moving them around is
essentially free. Both implementations of into_generic_connection
are no-ops,
as boxes are secretly just pointers, and the trait methods are referencing the
first and only member of the struct – which is a no-op. Thus, there shouldn’t
be any performance hits when using this method.
Another way to implement this would be to use a zero-sized type for connections and load the connection when converting to GenericConnection or Transaction, but the implementations would be more complex.
Conclusion
While a relatively new language, Rust can already be used to tackle practical problems. This is just one example how Rust can be used to implement safe and fast programs.