Luau is the primary programming language to place the facility of semantic subtyping within the fingers of tens of millions of creators.
Minimizing false positives
One of many points with sort error reporting in instruments just like the Script Evaluation widget in Roblox Studio is false positives. These are warnings which can be artifacts of the evaluation, and don’t correspond to errors which may happen at runtime. For instance, this system
native x = CFrame.new() native y if (math.random()) then y = CFrame.new() else y = Vector3.new() finish native z = x * y
experiences a kind error which can not occur at runtime, since CFrame
helps multiplication by each Vector3
and CFrame
. (Its sort is ((CFrame, CFrame) -> CFrame) & ((CFrame, Vector3) -> Vector3)
.)
False positives are particularly poor for onboarding new customers. If a type-curious creator switches on typechecking and is straight away confronted with a wall of spurious pink squiggles, there’s a sturdy incentive to instantly swap it off once more.
Inaccuracies in sort errors are inevitable, since it’s inconceivable to determine forward of time whether or not a runtime error shall be triggered. Kind system designers have to decide on whether or not to stay with false positives or false negatives. In Luau that is decided by the mode: strict
mode errs on the facet of false positives, and nonstrict
mode errs on the facet of false negatives.
Whereas inaccuracies are inevitable, we attempt to take away them at any time when potential, since they lead to spurious errors, and imprecision in type-driven tooling like autocomplete or API documentation.
Subtyping as a supply of false positives
One of many sources of false positives in Luau (and plenty of different comparable languages like TypeScript or Stream) is subtyping. Subtyping is used at any time when a variable is initialized or assigned to, and at any time when a operate known as: the sort system checks that the kind of the expression is a subtype of the kind of the variable. For instance, if we add sorts to the above program
native x : CFrame = CFrame.new() native y : Vector3 | CFrame if (math.random()) then y = CFrame.new() else y = Vector3.new() finish native z : Vector3 | CFrame = x * y
then the sort system checks that the kind of CFrame
multiplication is a subtype of (CFrame, Vector3 | CFrame) -> (Vector3 | CFrame)
.
Subtyping is a really helpful function, and it helps wealthy sort constructs like sort union (T | U
) and intersection (T & U
). For instance, quantity?
is applied as a union sort (quantity | nil)
, inhabited by values which can be both numbers or nil
.
Sadly, the interplay of subtyping with intersection and union sorts can have odd outcomes. A easy (however slightly synthetic) case in older Luau was:
native x : (quantity?) & (string?) = nil native y : nil = nil y = x -- Kind '(quantity?) & (string?)' couldn't be transformed into 'nil' x = y
This error is brought on by a failure of subtyping, the outdated subtyping algorithm experiences that (quantity?) & (string?)
just isn’t a subtype of nil
. This can be a false optimistic, since quantity & string
is uninhabited, so the one potential inhabitant of (quantity?) & (string?)
is nil
.
That is a synthetic instance, however there are actual points raised by creators brought on by the issues, for instance https://devforum.roblox.com/t/luau-recap-july-2021/1382101/5. Presently, these points largely have an effect on creators making use of subtle sort system options, however as we make sort inference extra correct, union and intersection sorts will turn into extra widespread, even in code with no sort annotations.
This class of false positives not happens in Luau, as we’ve got moved from our outdated method of syntactic subtyping to another referred to as semantic subtyping.
Syntactic subtyping
AKA “what we did earlier than.”
Syntactic subtyping is a syntax-directed recursive algorithm. The fascinating instances to take care of intersection and union sorts are:
- Reflexivity:
T
is a subtype ofT
- Intersection L:
(T₁ & … & Tⱼ)
is a subtype ofU
at any time when among theTᵢ
are subtypes ofU
- Union L:
(T₁ | … | Tⱼ)
is a subtype ofU
at any time when all theTᵢ
are subtypes ofU
- Intersection R:
T
is a subtype of(U₁ & … & Uⱼ)
at any time whenT
is a subtype of all theUᵢ
- Union R:
T
is a subtype of(U₁ | … | Uⱼ)
at any time whenT
is a subtype of among theUᵢ
.
For instance:
- By Reflexivity:
nil
is a subtype ofnil
- so by Union R:
nil
is a subtype ofquantity?
- and:
nil
is a subtype ofstring?
- so by Intersection R:
nil
is a subtype of(quantity?) & (string?)
.
Yay! Sadly, utilizing these guidelines:
quantity
isn’t a subtype ofnil
- so by Union L:
(quantity?)
isn’t a subtype ofnil
- and:
string
isn’t a subtype ofnil
- so by Union L:
(string?)
isn’t a subtype ofnil
- so by Intersection L:
(quantity?) & (string?)
isn’t a subtype ofnil
.
That is typical of syntactic subtyping: when it returns a “sure” consequence, it’s appropriate, however when it returns a “no” consequence, it may be incorrect. The algorithm is a conservative approximation, and since a “no” consequence can result in sort errors, this can be a supply of false positives.
Semantic subtyping
AKA “what we do now.”
Moderately than pondering of subtyping as being syntax-directed, we first think about its semantics, and later return to how the semantics is applied. For this, we undertake semantic subtyping:
- The semantics of a kind is a set of values.
- Intersection sorts are regarded as intersections of units.
- Union sorts are regarded as unions of units.
- Subtyping is regarded as set inclusion.
For instance:
Kind | Semantics |
---|---|
quantity |
{ 1, 2, 3, … } |
string |
{ “foo”, “bar”, … } |
nil |
{ nil } |
quantity? |
{ nil, 1, 2, 3, … } |
string? |
{ nil, “foo”, “bar”, … } |
(quantity?) & (string?) |
{ nil, 1, 2, 3, … } ∩ { nil, “foo”, “bar”, … } = { nil } |
and since subtypes are interpreted as set inclusions:
Subtype | Supertype | As a result of |
---|---|---|
nil |
quantity? |
{ nil } ⊆ { nil, 1, 2, 3, … } |
nil |
string? |
{ nil } ⊆ { nil, “foo”, “bar”, … } |
nil |
(quantity?) & (string?) |
{ nil } ⊆ { nil } |
(quantity?) & (string?) |
nil |
{ nil } ⊆ { nil } |
So based on semantic subtyping, (quantity?) & (string?)
is equal to nil
, however syntactic subtyping solely helps one path.
That is all positive and good, but when we wish to use semantic subtyping in instruments, we’d like an algorithm, and it seems checking semantic subtyping is non-trivial.
Semantic subtyping is tough
NP-hard to be exact.
We are able to cut back graph coloring to semantic subtyping by coding up a graph as a Luau sort such that checking subtyping on sorts has the identical consequence as checking for the impossibility of coloring the graph
For instance, coloring a three-node, two coloration graph may be executed utilizing sorts:
sort Crimson = "pink" sort Blue = "blue" sort Colour = Crimson | Blue sort Coloring = (Colour) -> (Colour) -> (Colour) -> boolean sort Uncolorable = (Colour) -> (Colour) -> (Colour) -> false
Then a graph may be encoded as an overload operate sort with subtype Uncolorable
and supertype Coloring
, as an overloaded operate which returns false
when a constraint is violated. Every overload encodes one constraint. For instance a line has constraints saying that adjoining nodes can not have the identical coloration:
sort Line = Coloring & ((Crimson) -> (Crimson) -> (Colour) -> false) & ((Blue) -> (Blue) -> (Colour) -> false) & ((Colour) -> (Crimson) -> (Crimson) -> false) & ((Colour) -> (Blue) -> (Blue) -> false)
A triangle is comparable, however the finish factors additionally can not have the identical coloration:
sort Triangle = Line & ((Crimson) -> (Colour) -> (Crimson) -> false) & ((Blue) -> (Colour) -> (Blue) -> false)
Now, Triangle
is a subtype of Uncolorable
, however Line
just isn’t, for the reason that line may be 2-colored. This may be generalized to any finite graph with any finite variety of colours, and so subtype checking is NP-hard.
We take care of this in two methods:
- we cache sorts to scale back reminiscence footprint, and
- hand over with a “Code Too Advanced” error if the cache of sorts will get too massive.
Hopefully this doesn’t come up in follow a lot. There may be good proof that points like this don’t come up in follow from expertise with sort methods like that of Customary ML, which is EXPTIME-complete, however in follow you need to exit of your option to code up Turing Machine tapes as sorts.
Kind normalization
The algorithm used to determine semantic subtyping is sort normalization. Moderately than being directed by syntax, we first rewrite sorts to be normalized, then test subtyping on normalized sorts.
A normalized sort is a union of:
- a normalized nil sort (both
by no means
ornil
) - a normalized quantity sort (both
by no means
orquantity
) - a normalized boolean sort (both
by no means
ortrue
orfalse
orboolean
) - a normalized operate sort (both
by no means
or an intersection of operate sorts) and so forth
As soon as sorts are normalized, it’s easy to test semantic subtyping.
Each sort may be normalized (sigh, with some technical restrictions round generic sort packs). The vital steps are:
- eradicating intersections of mismatched primitives, e.g.
quantity & bool
is changed byby no means
, and - eradicating unions of capabilities, e.g.
((quantity?) -> quantity) | ((string?) -> string)
is changed by(nil) -> (quantity | string)
.
For instance, normalizing (quantity?) & (string?)
removes quantity & string
, so all that’s left is nil
.
Our first try at implementing sort normalization utilized it liberally, however this resulted in dreadful efficiency (complicated code went from typechecking in lower than a minute to working in a single day). The explanation for that is annoyingly easy: there may be an optimization in Luau’s subtyping algorithm to deal with reflexivity (T
is a subtype of T
) that performs an inexpensive pointer equality test. Kind normalization can convert pointer-identical sorts into semantically-equivalent (however not pointer-identical) sorts, which considerably degrades efficiency.
Due to these efficiency points, we nonetheless use syntactic subtyping as our first test for subtyping, and solely carry out sort normalization if the syntactic algorithm fails. That is sound, as a result of syntactic subtyping is a conservative approximation to semantic subtyping.
Pragmatic semantic subtyping
Off-the-shelf semantic subtyping is barely completely different from what’s applied in Luau, as a result of it requires fashions to be set-theoretic, which requires that inhabitants of operate sorts “act like capabilities.” There are two the explanation why we drop this requirement.
Firstly, we normalize operate sorts to an intersection of capabilities, for instance a horrible mess of unions and intersections of capabilities:
((quantity?) -> quantity?) | (((quantity) -> quantity) & ((string?) -> string?))
normalizes to an overloaded operate:
((quantity) -> quantity?) & ((nil) -> (quantity | string)?)
Set-theoretic semantic subtyping doesn’t help this normalization, and as an alternative normalizes capabilities to disjunctive regular type (unions of intersections of capabilities). We don’t do that for ergonomic causes: overloaded capabilities are idiomatic in Luau, however DNF just isn’t, and we don’t wish to current customers with such non-idiomatic sorts.
Our normalization depends on rewriting away unions of operate sorts:
((A) -> B) | ((C) -> D) → (A & C) -> (B | D)
This normalization is sound in our mannequin, however not in set-theoretic fashions.
Secondly, in Luau, the kind of a operate utility f(x)
is B
if f
has sort (A) -> B
and x
has sort A
. Unexpectedly, this isn’t at all times true in set-theoretic fashions, attributable to uninhabited sorts. In set-theoretic fashions, if x
has sort by no means
then f(x)
has sort by no means
. We don’t wish to burden customers with the concept operate utility has a particular nook case, particularly since that nook case can solely come up in lifeless code.
In set-theoretic fashions, (by no means) -> A
is a subtype of (by no means) -> B
, it doesn’t matter what A
and B
are. This isn’t true in Luau.
For these two causes (that are largely about ergonomics slightly than something technical) we drop the set-theoretic requirement, and use pragmatic semantic subtyping.
Negation sorts
The opposite distinction between Luau’s sort system and off-the-shelf semantic subtyping is that Luau doesn’t help all negated sorts.
The widespread case for wanting negated sorts is in typechecking conditionals:
-- initially x has sort T if (sort(x) == "string") then -- on this department x has sort T & string else -- on this department x has sort T & ~string finish
This makes use of a negated sort ~string
inhabited by values that aren’t strings.
In Luau, we solely enable this sort of typing refinement on check sorts like string
, operate
, Half
and so forth, and not on structural sorts like (A) -> B
, which avoids the widespread case of normal negated sorts.
Prototyping and verification
In the course of the design of Luau’s semantic subtyping algorithm, there have been adjustments made (for instance initially we thought we had been going to have the ability to use set-theoretic subtyping). Throughout this time of fast change, it was vital to have the ability to iterate shortly, so we initially applied a prototype slightly than leaping straight to a manufacturing implementation.
Validating the prototype was vital, since subtyping algorithms can have surprising nook instances. Because of this, we adopted Agda because the prototyping language. In addition to supporting unit testing, Agda helps mechanized verification, so we’re assured within the design.
The prototype doesn’t implement all of Luau, simply the purposeful subset, however this was sufficient to find refined function interactions that may most likely have surfaced as difficult-to-fix bugs in manufacturing.
Prototyping just isn’t good, for instance the primary points that we hit in manufacturing had been about efficiency and the C++ normal library, that are by no means going to be caught by a prototype. However the manufacturing implementation was in any other case pretty easy (or at the least as easy as a 3kLOC change may be).
Subsequent steps
Semantic subtyping has eliminated one supply of false positives, however we nonetheless have others to trace down:
- Overloaded operate functions and operators
- Property entry on expressions of complicated sort
- Learn-only properties of tables
- Variables that change sort over time (aka typestates)
The hunt to take away spurious pink squiggles continues!
Acknowledgments
Because of Giuseppe Castagna and Ben Greenman for useful feedback on drafts of this publish.
Alan coordinates the design and implementation of the Luau sort system, which helps drive most of the options of improvement in Roblox Studio. Dr. Jeffrey has over 30 years of expertise with analysis in programming languages, has been an energetic member of quite a few open-source software program initiatives, and holds a DPhil from the College of Oxford, England.