lördag 20 juni 2009

GSoC status report, week 4

Just a quick note to get something in the HWN. :-)

As promised last week, I've implemented "everything" now, and I have released a series of release candidates for an actual stable version 1.0.0. All help in testing it for bugs will be greatly appreciated!

lördag 13 juni 2009

GSoC status report, week 3

Maybe I can get one of these written before the weekly news go up this time... 

My first milestone for haskell-src-exts is to have full and correct support for (almost) everything code-related. When I'm there, I will release haskell-src-exts 1.0.0. And I'm really quite close now.

On the issue of extensions, haskell-src-exts now supports nearly all extensions implemented by GHC. The two exceptions are UnicodeSyntax and NewQualifiedOperators, both of which are considered low priority. Neither should be hard to fix, but other things take precedence currently. There is also ExtensibleRecords, RestrictedTypeSynonyms and HereDocuments, all Hugs extensions, which are not yet supported. These are also considered less important for now.

On the issue of parametrisation, I've had to restructure quite a bit to enable the proper checks for types and contexts. In particular the fact that contexts are parsed as types, and passed in a ctype, means I've had to implement a two-tier check that they are correct. First to check that the type actually represents a valid context, which also transforms it into a list of assertions. Second to check that the assertions are correct in the context they are used. With FlexibleInstances on then anything goes, but without it there's a difference between contexts in class and instance declarations compared to elsewhere. This two-tier system is now in place, and seems to work just fine. In the process, it also gets rid of the wart that was the TyPred constructor in the Type data type, just like I previously got rid of e.g. Wildcard in the Exp data type. I'll probably write a separate entry about this stuff.

I've also improved the interface a lot, with multiple entry points to the parser and some tidying up of the top-level functions. In particular, parseFile now looks at the LANGUAGE pragmas to see what extensions to look for while parsing the rest.

So what remains for 1.0.0? Two things basically. First there are a few bugs in the trac left to squash, which (with one ugly but uncommon exception) shouldn't be much work. Second, there's that issue of handling operator fixities. That shouldn't be a lot of work either, so I have good hopes of having a release candidate out some time early next week. Stay tuned!

tisdag 9 juni 2009

What's in a forall?

Look at the following module:

module Foo where
type Foo = Int
Is this module syntactically correct? I doubt anyone would say no to that, it's just a standard type synonym declaration, nothing strange. Fully Haskell 98 compliant too.

What about this module then:
module Foo where
type Foo = Bar
Is it syntactically correct? Sure it is, the fact that Bar isn't in scope is completely irrelevant where syntax is concerned. Loading it into GHCi would give a
Not in scope: type constructor or class `Bar'
error, which comes from the name resolver/type inference stage, not from the parsing stage. There's a clear cut line between a syntactic error and a semantic error here.

Let's look at yet another module:
module Foo where
type Foo = forall a . a -> a
Is it syntactically correct? Obviously that depends on your angle - it isn't Haskell 98 due to the explicit forall, but GHC would accept it readily with the proper extension enabled, namely Rank2Types or RankNTypes. But does that mean that this is only syntactically correct when one of those extensions are enabled? Let's see.

With no extension flags given, GHCi refuses the module with the error message
Illegal operator `.' in type `forall a . (a -> a)'
It doesn't recognize the explicit quantification syntax, and instead thinks that . is a type-level operator, which isn't allowed since I don't have TypeOperators on. Just for fun, trying it with TypeOperators enabled makes GHCi complain that the names forall and a are not in scope. I should add that in both these cases, GHCi actually also suggests that I might be looking for something like Rank2Types, since it cleverly recognizes the forall word. At any rate, with no extension given or with only TypeOperators it is clear that we don't get the syntax we're after.

With either Rank2Types or RankNTypes enabled, GHCi obviously correctly recognizes the forall as a quantifier and will accept the program. But those two flags are not the only ones that enable the forall syntax. In fact there are three other flags that do - ExistentialQuantification, LiberalTypeSynonyms and PolymorphicComponents. If we try to load our module into GHCi with either of these enabled, we get a different error message:
Illegal polymorphic or qualified type: 
forall a. a -> a
Apparently GHCi recognized and accepted the syntax of the type declaration, but the type checker balked at it because we haven't enabled sufficiently liberal types. But this is not a syntactic issue, but something else entirely, at least where GHCi is concerned.

I don't fully agree with GHCi here though, because of ExistentialQuantification. For the two other cases - LiberalTypeSynonyms and PolymorphicComponents - I think it's perfectly fine that forall-quantified types are made generally available syntactically, even if the type checker will exclude some programs that try to use them in the wrong ways. Both those extention deal with how and where to allow forall-quantified types, the first as arguments to type constructors and the second as arguments to data constructors. But for ExistentialQuantification, the category of syntactically allowed programs is actually different. What ExistentialQuantification allows us is to write forall quantifiers that precede constructors in data type declarations, e.g:
module Bar where
data Bar = forall a . Bar a
Indeed when I try to run this with no extentions enabled, GHCi complains that forall is not a viable data constructor and suggests that I use ExistentialQuantification. I would definitely consider it a syntactic issue to allow the forall keyword to precede data constructor declarations like this, and hence I don't agree with GHCi.

On a side note, something funny happens when I turn on e.g. PolymorphicComponents which enables forall as a keyword. Then GHCi complains that
Data constructor `Bar' has existential 
type variables, or a context
and suggests I use ExistentialQuantification or GADTs to allow this. Indeed, GADTs are a generalisation of ExistentialQuantification, except that it doesn't enable forall as a keyword. So using any extension that enables the forall keyword, plus GADTs, or just ExistentialQuantification, will make GHCi accept the program.

Going back to types, what about the following module:
module Bar where
import Foo (Foo)
type Bar = (Foo -> Int) -> Int
Syntactically correct? Absolutely, there's nothing strange here at all, fully Haskell 98 compliant. Will GHCi accept it? That depends on Foo. If the type Foo is defined as we did earlier as an explicitly polymorphic type, then Bar is a rank-3 type, which requires the RankNTypes extension. The correctness of Bar doesn't depend on module Bar alone, but on its dependent module Foo. Even if I enable Rank2Types for Bar (and Foo of course, which needs it), I still get an error since this type is actually rank 3. And in fact, if I inline Foo in the definition of Bar to instead write
module Bar where
type Bar = ((forall a . a -> a) -> Int) -> Int
GHCi gives me the exact same error message (except of course it mentions the explicit polymorphic type instead of the equivalent Foo):
Illegal polymorphic or qualified type:
forall a. a -> a
So clearly this last program is also syntactically correct, as long as the forall keyword is enabled, just like the simple one that imported Foo. Neither is type correct without RankNTypes, but both are syntactically correct, assuming explicit quantification is enabled at all.

What I'm trying to get at here with all of this is that it's not easy to draw a line between what constitutes a syntactic error and what is rather a semantic error when it comes to forall-quantified types. There are a number of different stances to take here, and I need to pick one for haskell-src-exts to implement.

The core issue is this: Is there a general syntactic difference between programs accepted with Rank2Types and programs accepted with RankNTypes? It is certainly possible to syntactically detect some programs that would require the full RankNTypes, by inspection of the types used. If a type has a higher rank than allowed, and this can be seen right in the syntax by inspecting the arguments to the function arrows, then it would be possible to rule the program out. But as shown by the import example above, there is no way we can syntactically see the difference between these classes of programs in general, we would need to do actual type analysis (and import chasing) for that. Fundamentally it's simply not a syntactic issue.

The question then becomes, to what extent do we bother to even try to detect the difference for those cases where it is possible? I don't think we should, at all. There may not be a clean cut separation between syntactic and type/semantic issues at all times, but I feel we are best off trying to adhere to what separation we can, and making it as clear cut as possible. The principle of least surprise suggests to me that it would be better to not try to be clever, and instead document clearly what haskell-src-exts considers actual syntax and what is left for other analyses.

The stance I intend to take is thus the following:
  • Allow forall-quantified types in all shapes and forms as long as one of Rank2Types, RankNTypes, LiberalTypeSynonyms and PolymorphicComponents is enabled.

  • Allow forall-quantified data constructor declarations (using normal non-GADT-style syntax) if ExistentialQuantification is enabled. GADTs would not lexically enable the forall keyword, and even when coupled with another extension that does I don't think it makes sense to let it enable this syntax.

  • Don't make any other difference between any of these classes of programs in what is allowed and what isn't.
If you think I am making the wrong choice here, please speak up, I would love to get input on this. And of course, please speak up if you agree with me as well. :-)