What would it take to add refinement types to Rust?
Comments
amenghra
dwattttt
> Do you want to be able to add inches and feet?
This probably doesn't have to be too complicated; the usual answer for Rust is "no". Rust doesn't even let you "just" add two unsigned integers of different sizes. Following that design, I would imagine units would require an explicit "turn feet into inches" or the other way around.
valenterry
That's where Scala shines. I wrote about this here a bit: https://valentin.willscher.de/posts/contextual-syntax/
Rust is heavily inspired by Scala, but I guess achieving something like the examples in my post is difficult. I really hope Rust finds one way or another to make it work. Because simply forbidding everything all the time isn't even the safest way - it drives many people to just avoid it altogether and use unsafe code.
nine_k
While at it, could you give a few examples that illustrate how Rust was inspired by Scala proper, and not Haskell / SML / OCaml (which also influenced Scala)?
akkad33
This page from official Rust website of its influences does not mention Scala. https://doc.rust-lang.org/reference/influences.html
In fact this is the first time I've seen anyone say Scala influenced Rust let alone "heavily". Seems like a stretch
amelius
It says here:
https://en.wikipedia.org/wiki/ML_(programming_language)
that ML influenced Rust, Scala, Haskell and OCaml, so that's the common denominator.
However, although Rust wants to be an ML style language all the idioms break because there's no GC.
threeseed
So I regularly jump between Scala and Rust and it's more a feeling.
In that overall the way option types, pattern matching, FP operations e.g. map/filter, immutability by default and type parameters have been implemented are very similar to Scala. And since these make up a large percentage of the everyday code you write you see the similarity as being larger than maybe it is.
akkad33
All these are just functional features you find in every functional language though. Rust was highly influenced by OCAML and other ML languages
threeseed
Hence why I said it was a feeling.
And I think the fact the overall style of the language e.g. braces, semicolon, method signatures, loop handling is so similar to Scala versus OCaml contributes to this.
vlovich123
Might be annoying if you’re working with floating point but given that each conversion introduces error, it’s probably good to be explicit and recommend internally to be consistent
stouset
Thanks to generics you could theoretically work with whatever underlying type makes the most sense for your use-case.
let v1 : Inches<Ratio> = Ratio::new(5, 8).into();
let v2 : Inches<Ratio> = Ratio::new(3, 8).into();
let v3 : Feet<_> = (v1 + v2).into();
assert_equal!(v3, Ratio::new(1, 12));
throwawaymaths
no that's not the point. if you let rust automagically decide when and where to apply conversions you could easily wind up in a situation where you have more operations than you need, which increases numerical error, and also be a bitch to uncover or refactor to minimize conversions.
stouset
Rust doesn’t automatically apply conversions, full stop.
And with a thoughtful approach to the API, you could avoid numerical error entirely by using integral types.
amenghra
That’s a good starting point. The temperature delta issue remains: adding and subtracting temperatures should create a different unit. Thankfully there aren’t a lot of non-linear units.
throwawaymaths
thats why i put the word "you" in there. rust will do whatever automagic conversions you specify in the impls. Perhaps i should have used the verb "make" instead of "let". if you make rust automagically convert.
int_19h
In F#, which brought the unit-of-measure feature to (relative) mainstream, there are no automatic translations between different units for the same measure, avoiding most of these traps. You can't add inches and feet - you have to explicitly convert one or the other first. Of course, there's still precision and rounding issues here, but no more so than with any other floating-point arithmetic expression you might write.
weinzierl
I find the idea of being able to specify numerical types with arbitrary ranges very appealing.
In Pascal these are called range types, e.g.
month: 1..12
would define an integer where the type system would ensure that it is always between 1 and 12.
Apart from Ada this seems to be an alien concept to all other languages. The concept of a "range type" also seems to have other meanings.
What is the Pascal "range type" properly called in type theory and what is its relationship to refinement types?
3eb7988a1663
Nim calls these subranges https://nim-lang.org/docs/manual.html#types-subrange-types
Subrange = range[0..5]
PositiveFloat = range[0.0..Inf]
frogulis
Anybody correct me if I'm wrong -- in Pascal these subranges can specify the size of the variable in memory and check compile-type assignment to a literal value, but don't (can't) do much else.
There's the concept of dependent types in e.g. Idris which lets us correctly do this sort of range check throughout the program (not just on literal assignment) but it comes with strict requirements on what the compiler can allow, such as no unbounded recursion, because checking dependent types is roughly equivalent to running the program.
touisteur
Ranges are awfully useful in Ada, but a very small help in dimensionality/units analyis.
If you mean to go all the way on units, dimensions and typing, there's a bestiary there, quited maintained (with several Ada 'takes' too) https://www.gmpreussner.com/research/dimensional-analysis-in...
evertedsphere
it is a refinement type over the base number type, where the predicate constrains the number to lie in an interval
e.g. { x:Int8 | 1 ≤ x ≤ 12 }
munchler
> an integer where the type system would ensure that it is always between 1 and 12
How does Pascal handle overflow/underflow? E.g. Month 10 + Month 11 = Month 21?
loeg
Uh, seemingly, yeah. 10+11 = 21. No bounds checking on addition or assignment.
dannymi
$ cat a.pp
{$R+}
var
a: 1..12;
b: 1..12;
c: 1..12;
begin
a := 10;
b := 11;
c := a + b;
Writeln(c)
end.
$ fpc a.pp
Free Pascal Compiler version 3.2.2 [2021/05/19] for x86_64
Copyright (c) 1993-2021 by Florian Klaempfl and others
Target OS: Linux for x86-64
Compiling a.pp
Linking a
12 lines compiled, 0.1 sec
$ ./a
Runtime error 201 at $00000000004010D8
$00000000004010D8
$0000000000422EEC
int_19h
That is just this one particular implementation of Pascal (unsure which one). The language spec does say that it is an error to assign a value out of range, but as far as what this actually means, it only says that "a complying processor is required to provide documentation concerning its treatment of errors".
eddsolves
I assume modulo?
bspammer
I frequently wish I had a natural number type. So many programs would benefit from the type system guaranteeing that numbers are never negative.
josephg
Rust has this one - though the ergonomics are a little awkward. Its called NonZero:
archargelod
Nim has Natural and Positive (signed int) types that checked at compile and runtime. You can also define arbitrary range types for any ordinal type, for example enums:
type
Month = enum Jan, Feb, Mar, Apr, May, ...
SpringMonth = range[Mar..May]
var m: SpringMonth = Mar
It will raise under/overflow exception if value falls out of range.frogulis
Easy enough to write in some languages... as long as you only need to support addition and multiplication, or a very limited set of numbers.
riffraff
raku has "type subsets" which can do this (and more)
williamdclt
I’ve seen “month” being implemented in typescript as a big sum type ‘1 | 2 | 3 | … | 12’ :) probably ranges could just be syntactical sugar on top of that, but I suppose it probably causes big inefficiencies for the compiler
int_19h
It can be syntactic sugar in terms of how it's defined, but the implementation can still be smart and use a compact representation where possible (this is basically RLE).
Similarly, if you have tuples (or structs with anon fields etc) in the language, you could unify those with arrays such that struct { int; int; } is the same as int[2].
amelius
In Pascal, how would you denote a type that can take all integers between 1 and 1000 except 123?
DylanSp
Haskell's Ord typeclass is for ordering; it just represents types that have a total order, it doesn't represent ranges.
cpa
My bad the ordinal data type class is Enum not Ord!
nmilo
It should be easier now with const generics, I know the popular C++ library looks like Value<Unit, T, L, M, ...> where the letters are numbers representing the dimension of time, length, mass, etc. So m/s^2 would always be Value<f64, -2, 1, 0, ...>. By keeping it normalized you don't need Equivalent or whatever
thesuperbigfrog
I would love to see refinement types added to Rust.
They are very handy for preventing errors and expressing intent in Ada:
https://learn.adacore.com/courses/Ada_For_The_CPP_Java_Devel...
JoshTriplett
As a related concept, we've talked about adding "pattern types" to Rust: `Result<T> is Err(_)` (the type of a Result that you've just confirmed is an `Err`, so you don't have to handle the `Ok` case), or `u32 is 1..` (equivalent to `NonZero<u32>` but more flexible).
bonzini
Is the former different from a hypothetical Result::Err type (I don't know if you'd spell that Result::Err<T> or Result::<T>::Err)?
n_plus_1_acc
Different solutions to the same problem. Pattern types are more general than enum variant types.
JoshTriplett
Same concept, but generalized to any type, not just enums.
loeg
The author didn't mention this, but maybe there could be a type (unit) canonicalization step that always produces the same reduced/simplified type for any equivalent set of Mul/Divs? So that you don't need the later equivalency check.
E.g. with the article's example, where 'a' is Meters and 'b' is Seconds, 'a/(b*b)' and 'a/b/b' both have type 'Div<Meters, Mul<Seconds, Seconds>>' instead of one having the type 'Div<Div<Meters, Seconds>, Seconds>'.
For only Muls and Divs you can basically just have a histogram of units to powers (e.g., m/s^2 => m: 1, s: -2) which uniquely represent equivalent types.
hinkley
James Gosling looked at this and wrote about it at the time. It can be a bit mind bending. Especially with units named after people.
If you’re looking at statistics for current or power the type system might try to convert it to joules even though you wanted to look at average wattage.
epcoa
I do not understand, average wattage/power would be integration of wattage (joules) divided by time (so back to watts), it’s still watts? Under what implementation would you end up with joules that isn’t just oddly broken in general?
I guess I could see a naive implementation (confusing integration/sampling and discrete summation) going the other way, erroneously ending up with a nonsensical W/s.
Also don’t understand what it has to do with eponyms, which are just substitutes for base units, either your DA works or not, no? Average wattage is kg⋅m²⋅s⁻³ not kg⋅m²⋅s⁻² (joules) or kg⋅m²⋅s⁻⁴
jillesvangurp
I used javax.measure a few years ago on a project that was about building a material search engine. It featured search by all sorts of chemical and physical properties. Complete with all the wonderful units that are used around the globe to measure tensile strength, breaking point, conductivity, magnetic properties, etc. To avoid comparing apples to oranges when searching, being able to normalize and convert between different units is key. Additionally, allowing to user to search with their preferred units is also important. We supported hundreds of different properties for a wide range of ceramics, metal alloys and other materials.
Not the easiest framework to work with but pretty well thought out. A good starting point if you are looking to reinvent that wheel.
loeg
Yeah, but equivalent units with different names is sort of only a display/formatting issue, right?
weinzierl
This is a matter of perspective.
For example the unit Sievert is an official SI unit despite being just J/kg. This is because confusing equivalent dose and absorbed dose, which also has the unit of J/kg, could be very dangerous.
Note, that this is different from J sometimes being written as Ws. While there are informal conventions, when we use J and when Ws, using the unconventional one would not be technically wrong because 1 J is simply 1 Ws, whereas 1 Sv is not necessarily 1 J/kg when the later is an absorbed dose.
I think one could reasonably disagree with these decisions but that is how the SI people see it.
bonzini
Sievert is more of a "newtype" in Rust/Haskell terminology than a separate unit. Likewise for radians and moles.
namibj
Torque is actually J/rad.
hinkley
Not in freedom units.
loeg
Sure, but I think this is still better than declaring type system bankruptcy because determining whether trees of units are equivalent is hard. You could do newtype wrappers in situations where it is warranted.
dwattttt
> This is a matter of perspective.
I'd make a stronger statement here; this is a specific example of when having units is most powerful. When even though two things are expressed in some common form, they nevertheless represent something different.
> Yeah, but equivalent units with different names is sort of only a display/formatting issue, right?
This could be said of two u32s as well.
pests
> When even though two things are expressed in some common form, they nevertheless represent something different.
This is where people go wrong trying to DRY and other refactors. Slightly forced example but
function averagePerClassroom(total) { return total / 30; } // 30 kids per class
function averagePerMonth(total) { return total / 30; } // assume 30
"Oh, the function body is the same therefore lets refactor this into an "averagePer" function" expect its two completely different concepts and once the code calculates the actual days per month or once classes are no longer 30 people suddenly things need to be un-refactored, or what I see more often, is just branching off inside the new single function based on an argument flag. Horrible.amalcon
A joule, notably, is a newton-meter. Torque is also measured in newton-meters, but the meters are perpendicular to the newtons.
Adding energy to torque is rarely going to be intentional.
jiggawatts
A simple way to do this is to store a vector of the powers.
For example momentum is kilogram * meter / second, which is MASS^1 * LENGTH^1 * TIME^-1
As a vector, this can be represented as (1,1,-1) where the positions are M, L, T respectively.
In that format velocity is represented as (0,1,-1), acceleration is (0,1,-2), etc...
This is automatically canonicalised and much easier to manipulate than a tree of operations.
Of course, this assumes uniform units such as CGS or MKS in something sane like the metric system. Conversion back and forth is generally straightforward, as long as the types encode the system used. E.g.: CGS<1,1,-1> and MKS<1,1,-1> both represent momentum, but at different scales.
Imperial also works, and other base units can be added to extend the system. This can include things like current, temperature, moles, etc...
loeg
> This is automatically canonicalised and much easier to manipulate than a tree of operations.
As long as the vector is sorted by unit, yeah. With that caveat, it's the same idea.
jiggawatts
It’s a positional system with the same basic measurement units always at the same index locations.
adgjlsfhk1
there are 7 si units, and you can represent all the powers you need in 8 bits, so you can pack every reasonable SI unit in 64 bits
int_19h
The author did mention this. It's exactly what the F# article they cite is about:
http://typesatwork.imm.dtu.dk/material/TaW_Paper_TypesAtWork...
loeg
I believe you that the F# PDF describes it, but the text of the fine article itself does not describe it in any recognizable way.
Waterluvian
If I understand this properly, I’ve needed this in typescript a lot.
I can make `type uuid = string` for self documentation, but a lot of plugins will just label it “string” and developers can (and have) mistakenly put some other identifier, like the robot’s hostname.
Of course we validate at the API but it’d be more skookum if we could prevent accidental wiring together of front-end components that make this error.
String literals help a ton. Gosh they’re wonderful to care about the shape of a string in the type system. But sometimes I really want to say “strict uuid” as in “I don’t care if it quacks like a duck, it’s not called duck.”
feznyng
As spockz pointed out, you’re looking for the new type pattern. Rust supports this explicitly but you’ve got to do some workarounds to get it in typescript.
https://kubyshkin.name/posts/newtype-in-typescript/
Unfortunately doesn’t help much when you’re dealing with functions from packages someone else has typed.
mcfedr
Check out what are often called tagged types.
You basically define a `type A = string &{a: SomeSymbol}`
And then have a type assertion function that just returns true, and you have control over the places A can come from
demurgos
Yes, it's the standard way if you want implicit compat with the base type, so you can pass a value of type `A` to a function expecting a `string`. An other name for this pattern is "branded types".
steve_adams_86
I really like the Effect schema library for tagged types. I’m not sure how well it works for primitives, though.
Waterluvian
Ah yeah I remember poking at that. I should review it again. I think at the time it felt like a bit of a hack, but maybe some features of the past years of TS updates have helped.
I assume the idea is to lie to the type system about the existence of the symbol, and at runtime it is just a string.
TOGoS
To me it feels like less of a hack if I can make the type declaration not a lie. e.g.
type FooID = string & { typeName? : "FooID" }
Read as 'of course this thing doesn't have a typeName[1] property, since it's a string, but if it did have the property, the value would be "FooID"'. You can then cast between FooID and string, but not between FooID and some other type that declares a typeName property.[1] I actually tend to use 'classRef' with an RDFish long name for the type, but that makes examples longer and isn't the point.
Waterluvian
That’s a really helpful way of framing it. I’m tempted to try this with our uuid type and see how well it works.
I think I’ll also try to base it on a template string too if that’s possible. Given we use a standard dash segmented uuid string.
spockz
Haskell has had this with `newtype` for ages. Type safety without runtime overhead.
jph
Type refinements are a great concept and I'd love to see them in Rust. And double-refinement types are great for helping with conversions, such as with Rust From/Into, and potentially a dynamic converter function.
Examples of double-refinements that I'd like:
- Common units like Length:Meter and Length:Foot.
- Color bits like Color:RGB24 and Color:CYMK24.
- Worldwide currency like Money:USD and Money:GBP with a converter function that knows exchange rates.
- Human languages like String:English vs String:Cymraeg with a converter function that knows translations.
skissane
> - Worldwide currency like Money:USD and Money:GBP with a converter function that knows exchange rates.
Exchange rates vary over time, so you'd arguably need a type which includes a timestamp (e.g. "USD 1000 at 2024-12-25").
And that's ignoring all these other complexities such as the spread, different currency converters offering differing rates, unofficial and multiple official rates in countries with currency controls (e.g. Argentina), hedging, etc
magicalhippo
> multiple official rates
One such case is customs agencies publishing their own exchange rates for use in custom declarations, for example here[1] for the US or here[2] for Sweden.
[1]: https://www.cbp.gov/trade/document/report/daily-foreign-curr...
[2]: https://tulltaxan.tullverket.se/arctictariff-public-web/#!/t...
jaza
Plus, perhaps the biggest complexity of all is that the currency rates are often not free, particularly if you want "live pricing" (updated every few seconds), and particularly if it's for commercial use. And, the fact that they may or may not be free, also illustrates well the fact that there are no definitive rates, there are only "rates according to X".
spockz
The currency converter is the odd one out because it context dependent. (Time, contract, etc)
0xFF0123
Similar to time with a timezone?
eddsolves
I think liquid Haskell is a great example of refinement types?
marxisttemp
Could someone clarify to me the different between dependent types and refinement types?
dietr1ch
Refinements embed a predicate that elements of the new type must pass.
def Nat = Int: (|x| -> x >= 0)
Dependent types allow types to be computed from functions (and depend on arguments, otherwise it seems they become just weird constants), def Five(as_type: String) -> NumericType(as_type):
match as_type:
"string" => "five"
"int" => 5
"float" => 5.0f
"double" => 5.0d
_ => panic() // Unnecessary if you refine `as_type` from a String to an enum or a fixed set of strings.
Dependent types seem weird, but they help making types first-class (https://www.youtube.com/watch?v=mOtKD7ml0NU&t=325s) and gaining types like `Array<T, N>` that allow ensuring things are the right length, and define append/extend properly.tubthumper8
What happens when the predicate returns false for the refinement type?
Like if `(Nat 5) - (Nat 6)`. Or is subtraction disallowed?
klibertp
My understanding is that you get a type error when the type checker cannot prove the result of an operation conforms to a refined type. For constants, it's easy: `(Nat 5) - (Nat 3)` is Nat while `(Nat 5) - (Nat 6)` is a type error. For values coming from elsewhere, you need to give the type-checker some guarantee yourself:
let x = Nat 5;
let y = read_nat();
assert y < x; // without this, the following would fail to type check
x - y
(you need occurrence typing, too, in this example)In terms of numbers, what you can expect from refinement types is similar to what you get with CLP(FD) in Prolog.
tubthumper8
Right, but what happens if the Assert isn't true? Does the program crash/abort? If so, is there a stack trace or how do you troubleshoot it?
dietr1ch
I guess the only natural way would be to have operations send you back to the Base type, and then have a `TryInto(v: Base) -> Result<Refined>` function that would verify the predicate and allow you getting back into the refined type.
Maybe some operations can be proved to stay within the refined type, like adding natural numbers, but that's something that the used would need to provide as a function allowing that under assertions or some proof that the compiler can verify and trust.
klibertp
It's a standard assert, so whatever happens on assertion failure normally happens here, too. This is orthogonal to refinement types - they are concerned with whether the check is made or not (ie. with them you cannot accidentally forget to write that assert). The check failing is a runtime concern (similar to dynamic cast failure), the type system has nothing to do with that.
xigoi
Just curious, what language is this?
dietr1ch
I made up the syntax to try to keep it easy to read for people using python/rust/c.
I just realized my lambda syntax on the Nat predicate is redundant because I didn't clean up and that using snake_case for function names would be better in a language that lets you operate on functions like they are values.
burakemir
Dependent types typically refers to type systems where a type can depend on a term. The canonical example is "Vector n" where n is some expression that evaluates to a natural number.
Refinement types typically(1) refers to a type systems that lets you create a subtype of a type through refining (qualifying) with a predicate or constraint on the shape. Examples {x \in int | is_even x } or { x \in List | len(x) = 1 }
Refinement types can be very powerful but that may well make type checking undecidable (think of a type of Turing machines, and the refinement that keeps only the ones that halt). By being careful about the logic used in the refinements, one may retain decidability.
(1) The article seems to have a different idea of what a refinement type is: quote "a type system that does its work after another type system has already done its work".
I am not going to play orthodox guardian of type theory terminology here, yet to me personally, it does seem unfortunate to use that term. The author seems to really want a form of type-level computation, which could be interesting if it could be rigorously specified and it's relation to the existing type level reduction clarified.
demurgos
Dependent types can make type constructors generic over _values_ (instead of only types). Refinement types keep the separation between types and values, but they let you wrap an existing type to enforce extra constraints or semantics. Another example of refinement are pattern types where you can attach a match pattern that is enforced to pass.
akdor1154
You could just read the first paragraph of TFA?
ordu
I'd really like to have them. If I'm writing an algorithm that maps interval (0, 1) of probabilities onto different intervals of u32 integers, and probabilites are also represented as integers (an integer part of p*2^n), I will want to have different units for those integer probabilites and their mapped values. I don't want to mix them accidentally and to add probability and a mapped probability accidentally. Probably I dont want to add cumulative probability and a probability, though I'm not really sure about that, because sometimes I want to add (for example when calculating cumulative probabilities) and I want to compare them. Though maybe it will work with some exotic rules, like CumProb-CumProb -> Prob, while CumProb+CumProb is forbidden?
And there are more examples of that, I can have probabilities from different distributions, I don't want to add them accidentally, though multiplications of them is all over the place, so let it be. Of course I have no hope that any language could deal with the explosion of types that are needed to represent this, but if compiler just gave me f64 as a result of multiplication of probabilites, I'd be happy.
It can be done in Rust on case by case bases, but it is a lot of boilerplate. The issue is the typedef IntProb = u32, treat IntProb like an alias to u32 and rustc converts these types into each other silently like C compiler converts int to char. One can do struct IntProb(u32), but then it would be needed to implement traits like Add, Mul, Cmp and so on, which is possible but it is too much work. If it was possible to force rustc to treat typedef MyType = {NumericType} as a distinct numeric type that requires explicit conversion into NumericType, while retaining all the traits of NumericType, just substituting in them NumericType with MyType, it would be great.
I think, that all the complexities described in the article stem from the attempt to create an universal instrument that can do everything and to keep bees. The real difficulty is to pick a small subset of wants, that will cover 80% of needs, while being really simple. I see no issues with occasional .into::<Length<f64>>(), like I see no issues with (my_struc.index as usize), if I keep index as u8 to spare memory, but use it as usize of course. I see no need in different units for the same quantity, because in any case I'd want to convert everything into the same uniform units before I start adding and multiplying. But I'd like to have restrictions on available operations between different types, and I'd like to have a possibility to add optional dynamic checks for type value, that I could turn on for a debug build and turn off for a release one. For example, I'd like to check that any probability p fits into [0, 1].
sieabahlpark
[dead]
pwdisswordfishz
[flagged]
sagacity
It's unfortunate that this is the only takeaway of the article for you.
voidfunc
[flagged]
skavi
Well sure they matter, but “yaoioum” wasn’t really the main point of the post. It was more of a threat; “add refinement types to Rust, or yoric will continue building poorly named static analyzers.”
Units might seem simple but they have a ton of edge cases. Do you want to be able to add inches and feet? Be careful about potential precision/rounding issues. What is the unit for a temperature delta? You can’t simply keep the original unit (eg C or F) because conversion from F to C is a different rule than ΔF to ΔC. Etc.
Units do prevent bugs in programs, so they have an important role to play. But they also need to be designed very carefully.
Java adopted units via JSR 385 (https://belief-driven-design.com/java-measurement-jsr-385-21...)