Language-Oriented Programming - 懵懵灯灯's BLOG

Language-Oriented Programming


Event Title:
Language-oriented Programming and Language Workbenches: Shifting Paradigms (EVENT: 92643)

Event Date: August 17, 2007 at 03:42 PM Eastern Daylight Time

Session Date: August 17, 2007 at 03:42 PM Eastern Daylight Time


Transcription:

Neal Ford: So, good morning, everybody. I am Neal Ford, and this is Martin Fowler. We are going to be fighting our slide show all morning, so that should be interesting. We are here to talk about language-oriented programming, and here are some questions that we are going to answer.

Why is there so much XML mixed in with my Java code? Why won't everybody shut up already about Ruby on Rails? It seems like any time you talk to someone about Ruby on Rails they are irrationally exuberant about it, and there has got to be some reason for that. Why do things like aspects exist? Why do we actually need something like aspects if Java is enough? Is there an evolutionary step beyond object-oriented programming? Is the abstraction layer we have been using so far sufficient for the problems we are trying to solve today? And what the heck is language-oriented programming anyway? That is what Martin and I are here to talk about: this idea of language-oriented programming.

For the past 20 years or so, we have been trying to model the world with trees. That is the way we model the world in object-oriented languages: we build hierarchies, trees of how we model things in code. It turns out that works pretty well, because most of the world is tree-shaped. Most of the world is hierarchical, and you can fit things into trees pretty easily. But that abstraction breaks down sometimes. We try to model the real world with a nice, abstract picture of trees, and of course this is idealized. This is not the way the world really looks. The world really looks like this: tangled branches and interconnections and all sorts of other things that are really, really hard to model in these kinds of idealized pictures. So we have been trying to model the world with hierarchies, but hierarchies fall down at some point, and so we invent things like aspects. The red line that you see here represents aspects, which cut through the tree-shaped hierarchies that we built to model things in the world. But that just adds complexity to the problem we are trying to solve, and complexity is exactly one of the things we are trying to kill off when we build abstractions.

Well, how have we done this in the past? Look, for example, at assembly language. How many people here still write assembly language for their day job? Yeah, that is what I thought. Nobody writes in it any more because it is too low a level of abstraction; it just takes too long to get anything done. So what we have always done in the past is take our abstractions and raise them a few levels, building new abstractions on top of the old ones. In fact, if you think about your hard drive, it is really just a spinning platter with 1s and 0s on it, but we never think of it that way either. We have all these nice metaphors and abstractions on top of it. So what we are suggesting is that maybe it is time to upgrade our abstraction layer one more step, toward language rather than just hierarchies, to be able to represent things.

Martin Fowler: So, one, one…sorry.

Neal Ford: Go ahead. Go ahead.

Martin Fowler: One way of thinking about this is that the object-oriented stuff, the kinds of abstractions we build with our object-oriented thinking, are really abstractions that allow us to build up a vocabulary. We are able to create our own words that talk about the problem space we are working with. But when we think about language, the way we talk to each other, it is not just about vocabulary; it is also about how we put the words together, the grammar of how we speak. So one of the things we are beginning to think about is: okay, we now know how to build up these vocabularies, so how can we better do the combination of these things? How can we start thinking about the grammar side as well as the vocabulary side?

Neal Ford: What we are talking about here is changing abstraction mechanisms: modeling the world with language instead of hierarchies, because in some cases language actually makes a better modeling mechanism than hierarchies do.

Well, what we are talking about here is using abstractions from the past, objects and generics and aspects and all those things, to build better abstractions. We are not talking about throwing away object-oriented programming. Clearly, object-oriented programming has done a lot of good for us, but for a lot of the problems we are trying to solve it still sits at too low a level of abstraction. What we are trying to do is raise our abstraction level one step higher, which is what we have always done to solve problems in the computer science world.

So, why language? Why choose this as our new favorite abstraction mechanism? Well, it turns out that the human brain is really, really good at supplying context, and context is a really important concept when you are talking about using language as an abstraction mechanism. Here is a classic example of a DSL: an Iced Decaf Triple Vanilla Skim with whip latte. Of course, this is the Starbucks DSL. This is how you order coffee at Starbucks. In fact, when new employees come to work at Starbucks, the first thing they have to do is learn the Starbucks DSL, and if you say it to them incorrectly they repeat it back to you in the correct format. There is a very strict structure to the way people at Starbucks talk, and there are all these rules. There are over a million possible combinations for ordering coffee at Starbucks, and they have a very exacting way of talking about them. I will let Martin talk about this one because I have no idea what this means.

Martin Fowler: Exactly, but the point is not so much the content of the example as the notion that it is, again, a combination of words for a specific contextual area, arranged in a particular way that carries meaning. There is also a sense of flow here: as you use and combine things together, you are able to express yourself much more clearly, and you need both the vocabulary and the way of combining things to make that work.

Neal Ford: This is really just a shorthand mechanism for human communication. Think about this cricket example: if you were talking with your friends about cricket, think how cumbersome it would be to start from first principles all the time. "There is a religion called sport, where you gather groups of people together as a team, and they play each other on something called a pitch, and there is a ball and a bat." It would take so long to have any kind of conversation that it would be useless to have it, and yet that is exactly what we do when we talk to APIs and frameworks. We start at the lowest possible level of understanding and have to explain every single detail in code to our framework, because it does not have any sort of context built into it, including the context of your own business. If you are a Java ace and you go to work for a new business, on day one the hardest job you face is learning the DSL of that business. Every business has its own domain-specific language, and it is very tightly keyed to the kinds of problems they solve and the kind of work that you do.

Martin Fowler: And this is where this kind of thinking ties in with the classic object-oriented ideas of domain modeling, or indeed with a lot of what database people do with data modeling. In all of these cases, you are trying to understand some particular business domain, build up that vocabulary, and talk about how the various ideas fit together. But what these techniques have not typically done is talk about how we express these combinations. Again, the vocabulary is what we have focused on, and what we have yet to focus on is the grammar side.

Neal Ford: So, all complex human endeavors have their own DSL. It is all about this implicit context that the brain is really, really good at supplying when you have a conversation. If someone in your line of work has a discussion with you, they do not have to start over from first principles every time, because you share the same context, and that is what we are trying to convey to our APIs and frameworks in the computer world, because people are really, really good at recognizing implicit context. Our brains are wired specifically to do that. Now, you may say, well, that is just another kind of API. What is the distinguishing factor between a DSL and an API? I am going to show you a couple of examples. Here is a good example of an API. This is what it would look like if you had to order coffee at Starbucks using a Java framework for ordering coffee, and the interesting thing you will notice about this example is Coffee latte = new Coffee(Size.VENTI); latte.setFatContent(...); latte... Notice how much repetition there is. We repeat the object over and over again because Java is like a completely context-free language. You have to tell it over and over again what the intent is of the code we are executing. It is almost as if Java had no memory at all, and we have to repeat ourselves over and over again to say "this is what I want you to do; remember, this is what we are still talking about; it is still the same context"… Compare that with the DSL, which has an implicit context: notice that the word coffee never actually shows up here, and yet most people, two or three words into reading it, understand that there is an implicit context and that we are actually talking about coffee.
DSLs always have an implicit context that shows up either not at all or in a very, very light way, usually at most once, so that you do not have to supply that context over and over again.
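To make the API-versus-DSL contrast concrete, here is a minimal Java sketch. The Coffee class, the Size and FatContent enums, and the method names order, iced, decaf, skim and with are all hypothetical, invented for illustration rather than taken from the slide or any real framework. The first half repeats the latte variable on every line; the second half establishes the coffee context once and lets each call return this, so the rest of the order reads as a single phrase.

```java
import java.util.ArrayList;
import java.util.List;

// Runnable entry point contrasting the two styles.
class CoffeeDemo {
    public static void main(String[] args) {
        // API style: the object name is repeated on every line.
        Coffee latte = new Coffee(Size.VENTI);
        latte.setFatContent(FatContent.SKIM);
        latte.setIced(true);
        latte.setDecaf(true);
        latte.addSyrup("vanilla");
        System.out.println(latte.describe());

        // Fluent style: the "coffee" context is established once, up front.
        Coffee same = Coffee.order(Size.VENTI).iced().decaf().skim().with("vanilla");
        System.out.println(same.describe());
    }
}

enum Size { TALL, GRANDE, VENTI }

enum FatContent { WHOLE, SKIM }

// Hypothetical domain class; names are illustrative only.
class Coffee {
    private final Size size;
    private FatContent fatContent = FatContent.WHOLE;
    private boolean iced;
    private boolean decaf;
    private final List<String> syrups = new ArrayList<>();

    Coffee(Size size) { this.size = size; }

    // Conventional setter API: every method returns void.
    void setFatContent(FatContent f) { this.fatContent = f; }
    void setIced(boolean iced) { this.iced = iced; }
    void setDecaf(boolean decaf) { this.decaf = decaf; }
    void addSyrup(String syrup) { syrups.add(syrup); }

    // Fluent layer: each method returns this, so calls chain like words in a phrase.
    static Coffee order(Size size) { return new Coffee(size); }
    Coffee iced() { this.iced = true; return this; }
    Coffee decaf() { this.decaf = true; return this; }
    Coffee skim() { this.fatContent = FatContent.SKIM; return this; }
    Coffee with(String syrup) { syrups.add(syrup); return this; }

    String describe() {
        return size + (iced ? " iced" : "") + (decaf ? " decaf" : "")
                + " " + fatContent + " latte with " + syrups;
    }
}
```

Both objects end up in exactly the same state; the only difference is how much context the caller has to restate along the way.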

Martin Fowler: And what is happening here is a further development of something that has been a constant part of software development for quite a while: concentrating on how we make the code we write more readable and more expressive. I remember, 10 or 15 years ago, arguing with some people about whether it is worth putting a lot of effort into good naming of objects and methods. How important is it to have methods whose names really convey what they do? Increasingly over the years people have realized that it is important to have very clear method names and very clear class names. By thinking about how you name things well, you can reduce the need for other forms of documentation, and that clarity of code becomes very important. Now, with the interest in DSLs, people are beginning to say, "Okay, we have got that part of things sorted out," but if we look at the example on the previous slide, it is still hard to read because of all this repetition, because of this lack of context. So how can we take a step forward to make things a lot more readable and clear again? Because the real art of programming is not the communication with the computer; it is the communication with the other human beings who are going to have to read that code now and in the future. I like to be quoted as saying that any damn fool can write a program that a computer can understand, but good programmers write code that humans understand, and this is really part of that drive: how do we communicate with humans?

Neal Ford: And in fact, if you look at the previous example, this is one of the reasons your business people do not like to try to read source code: it is obfuscated because there is so much repeated context. The meaning gets lost in the noise. All that end users who do not understand Java see here is noise drowning out the actual content of what you are talking about, whereas this is very boiled down; this is the way people actually talk, not the way people usually write code. So let us talk about some nomenclature and create some definitions. This is actually Martin's definition of a domain-specific language: a limited form of computer language designed for a specific class of problem.

Martin Fowler: I would not call it my definition. This has been around in the software world for quite a while. One of the things we will see is that domain-specific language, as a term, has very fuzzy boundaries. There are things that are very clearly a domain-specific language, and things that are very clearly not, but there is a very large overlap area. It is like classifying something as blue or green: some things are clearly blue and some are clearly green, but there are some colors where I can argue with my wife endlessly about whether they are blue or green, and that is definitely the case with domain-specific languages. But one key property of domain-specific languages is that they have a narrow focus. You could not write an entire programming system using a single domain-specific language, because its range is just too small. The idea is that you combine one or more domain-specific languages with other domain-specific languages, and usually with a general-purpose programming language as well, in order to actually get stuff done.

Neal Ford: And this is actually one of the places where Martin and I disagree very slightly, because I do not believe Martin thinks that this Starbucks DSL actually falls within the definition of a domain-specific language.

Martin Fowler: Yeah, I tend to use domain-specific language to specifically mean a software language. So, it is only something that we can actually execute on the computer, and when we talk about things that humans use, that we have not necessarily formalized to that degree, I tend to use a term like domain language or something a little bit broader, so as to make a distinction between the specific software construct and the more fuzzy real world construct.

Neal Ford: I am much more liberal with my definition. I believe that any language that describes a problem domain is a domain-specific language and we just agree to disagree about that.

Martin Fowler: Yeah. You are just wrong.

Neal Ford: Yes. The other term, and I think this one actually is your term.

Martin Fowler: No, this is not my term either.

Neal Ford: So, you’ve stolen this one as well.

Martin Fowler: I try my best not to invent new terms, but you cannot help it from time to time. So, language-oriented programming: I first came across this term in an article by Sergey Dmitriev, one of the founders of JetBrains, who came up with the IntelliJ tool. As we will see shortly, they are working on some very interesting stuff in this space, and he got the term from some obscure academic paper. So I do not know where it originally came from, but I like the term very much because it talks about this shift from thinking about vocabulary, which is objects, to the notion of a language that combines vocabulary and grammar, and so I felt language-oriented programming was a good term. I was also looking for a generic term that would stretch across a whole range of different styles and was not owned by a company, and language-oriented programming seemed a good fit for that because it is not tied to a particular product.

Neal Ford: And this also ties in with the idea that what we are talking about here is an evolutionary step beyond object-oriented programming: using objects as building blocks for the next abstraction layer, which is to use languages as an abstraction mechanism. You can pretty much separate DSLs into two broad categories. The first is the internal DSL, built using the underlying syntax of a base language.

Martin Fowler: Right. I think it is easiest to quickly define both of them first. I use the terms internal and external, and these are terms I decided to use. External DSLs are mini-languages in the UNIX tradition, and internal DSLs are really expressions within a programming language, but written with a language-like feel. So with an internal DSL you are operating completely within your host language. If you are programming in Java, your DSL is Java. If you are programming in Ruby, your DSL is Ruby. One of the strongest traditions of this style of programming is in the Lisp world. You talk to Lisp people about how they construct Lisp programs, and they often think in terms of building up languages, and this is one of the many reasons why this kind of stuff is seen as very, very old school: Lisp people were doing this 30 or 40 years ago. So it is nothing new in that sense, but the new thing is perhaps putting more attention into it, particularly in places where people have not paid much attention to it in the past.

Neal Ford: In fact, I think this is so prevalent in Lisp because Lisp as a language is so horrific to actually code in that the first thing you want to do is hide it under some extra layers of abstraction, so you can hopefully get away from all the parentheses and that sort of stuff. So they almost had to invent this style of programming to get away from the core syntax, because it is so confusing, so daunting.

Martin Fowler: To all the Lisp people out there: he said that, not me.

Neal Ford: An external DSL is the opposite kind of DSL, built using your own grammar and a lexer and parser, often generated code of some kind.

Martin Fowler: And this is the traditional UNIX-style little language, where you describe how something operates and configure a little language to work things through. It is very, very common in the UNIX world, and UNIX people will very often talk about how they put together some little language in order to drive a particular program or configure a particular programming environment, and they think nothing of pulling out lex and yacc, wiring them together, and producing what they need.
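lex and yacc generate a lexer and parser from a grammar description. As a dependency-free stand-in, the Java sketch below hand-writes both for a tiny external DSL, assuming a made-up coffee-order grammar, order := size modifier* drink. The grammar, the word lists, and the OrderParser and Order classes are all hypothetical; a real external DSL might well use a parser generator instead of hand-written code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class OrderParserDemo {
    public static void main(String[] args) {
        Order order = new OrderParser("Venti iced decaf skim latte").parse();
        System.out.println(order);
    }
}

// Semantic model produced by the parser; the rest of the program never sees the DSL text.
class Order {
    final String size;
    final List<String> modifiers;
    final String drink;

    Order(String size, List<String> modifiers, String drink) {
        this.size = size;
        this.modifiers = modifiers;
        this.drink = drink;
    }

    @Override
    public String toString() {
        return "Order[size=" + size + ", modifiers=" + modifiers + ", drink=" + drink + "]";
    }
}

// Hypothetical grammar: order := size modifier* drink
class OrderParser {
    private static final Set<String> SIZES = Set.of("tall", "grande", "venti");
    private static final Set<String> MODIFIERS = Set.of("iced", "decaf", "triple", "vanilla", "skim");
    private static final Set<String> DRINKS = Set.of("latte", "mocha", "cappuccino");

    private final List<String> tokens;
    private int pos = 0;

    OrderParser(String text) {
        // Lexer: normalise case and split the input on whitespace.
        this.tokens = List.of(text.trim().toLowerCase(Locale.ROOT).split("\\s+"));
    }

    Order parse() {
        String size = expect(SIZES, "size");
        List<String> mods = new ArrayList<>();
        // Zero or more modifiers, consumed greedily.
        while (pos < tokens.size() && MODIFIERS.contains(tokens.get(pos))) {
            mods.add(tokens.get(pos++));
        }
        String drink = expect(DRINKS, "drink");
        if (pos != tokens.size()) {
            throw new IllegalArgumentException("unexpected token '" + tokens.get(pos) + "'");
        }
        return new Order(size, mods, drink);
    }

    // Consume one token from the allowed set, or fail with a readable message.
    private String expect(Set<String> allowed, String what) {
        if (pos >= tokens.size() || !allowed.contains(tokens.get(pos))) {
            String found = pos < tokens.size() ? tokens.get(pos) : "end of input";
            throw new IllegalArgumentException("expected " + what + " but found '" + found + "'");
        }
        return tokens.get(pos++);
    }
}
```

The parser's only job is to turn text into a plain Order object; malformed input such as "latte venti" is rejected with an error rather than half-understood.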

Neal Ford: So let us look at some examples of internal DSLs. Is anybody already doing this in the mainstream world? If you discount the very interesting stuff that has been around in the Lisp community in this area, is anybody really doing this for real right now? Well, a good example is Ruby on Rails. If you look at Ruby on Rails code, it is very, very declarative. In fact, most of the code you see here is hardly recognizable as Ruby code at all; it is the DSL they have written in Rails. Ruby is actually a very popular target for building internal DSLs because the language has very loose syntax rules, and you can get away with a lot of stuff that you cannot get away with in a statically typed language with much stricter rules about its syntax and the way its code looks.

Martin Fowler: The important thing here, and I do not know how familiar you are with Ruby, is that this is actually all valid Ruby code, but it feels like a different kind of language. It is as if you have invented whole new keywords and ways of putting them together, so you feel like you are in a different language from Ruby itself. At one level, yes, it is all Ruby, but it does not feel quite like that. And here again we are into this very fuzzy question of what the difference is between an internal DSL and an API. The fuzzy boundary for internal DSLs is between APIs and languages, but the essence of it, I think, is a sense that you just do not feel as if you are in regular Ruby. You feel like you have extended the language, in this case to do something slightly different. It is partly about the syntax, as Neal said, but it is also partly about the features of the language, certain programming constructs. In particular, the ability to have closures is very, very useful for this kind of thing.
And one of the things that makes Ruby very interesting, particularly in comparison with something like Lisp, is that Lisp gives you a limited set of mechanisms to work with, but those mechanisms work really, really well. Ruby gives you a very wide range of mechanisms. In some of the experimenting I have been doing for a book I am working on about this topic, I took a very simple DSL and ended up implementing it 20-odd different ways in Ruby, using different combinations of language constructs. Some of them worked well, some less well, but the really interesting thing is how many different ways you can take a single DSL and work it out in Ruby, because Ruby gives you so many options. If you are working in Lisp, you have equivalent power but a smaller range of options, and if you are working in Java, you also have a smaller range of options, and less power too, because it does not give you some of the alternatives you might need.

Neal Ford: In fact, as Martin said, one of the things that makes Ruby so effective, and Rails uses it very effectively, is the closure. The last line of code you see here, before_destroy, actually takes in a block of code, but it is nicely contextual, so it is very easy to read right where it is defined, rather than having to create a new class, an anonymous inner class, and attach a lot of handlers and that sort of stuff. A lot of behavior can be defined inline, using the closures that Ruby supports. And I think the DSL portion of Rails is one of the reasons people are so irrationally exuberant about it.
How many times has someone come up to you almost foaming at the mouth about how much they loved using some Java web framework? And yet you see these Rails guys do this all the time, and part of the reason is that the tool they are using, Rails, is perfectly suited to the problem they are trying to solve. It is a domain-specific language for building database-backed web applications, very, very highly tuned to do that, and so the tool fits into your hand really well as you write code in it. There is very little friction between what you want to accomplish and what the tool allows you to do. You do not have to do a lot of workarounds and other stuff; you can express the intent of what you want to do very clearly and very succinctly using the DSL portion of Rails. That movement directly toward intent is really important: the more friction you can remove between your intent and the way you realize that intent, the better.
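Java only gained lambdas in Java 8, years after this talk, but they make the closure point about the before_destroy block easy to see. The sketch below assumes a hypothetical Model class with a beforeDestroy hook that loosely echoes the Rails callback; it is neither Rails' API nor any Java framework's. It registers one hook the pre-lambda way, with an anonymous inner class, and one with a lambda written inline where it is used.

```java
import java.util.ArrayList;
import java.util.List;

class CallbackDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Model account = new Model("account-42");

        // Pre-lambda style: a whole anonymous class just to carry one line of behaviour.
        account.beforeDestroy(new Runnable() {
            @Override
            public void run() { log.add("archiving " + account.id); }
        });

        // Closure style: the behaviour sits inline, right where it is registered.
        account.beforeDestroy(() -> log.add("notifying owner of " + account.id));

        account.destroy();
        System.out.println(log); // prints [archiving account-42, notifying owner of account-42]
    }
}

// Hypothetical model class; only the idea of a before-destroy hook comes from Rails.
class Model {
    final String id;
    private final List<Runnable> beforeDestroyHooks = new ArrayList<>();
    private boolean destroyed;

    Model(String id) { this.id = id; }

    void beforeDestroy(Runnable hook) { beforeDestroyHooks.add(hook); }

    void destroy() {
        // Run every registered hook, in registration order, before marking the model destroyed.
        for (Runnable hook : beforeDestroyHooks) {
            hook.run();
        }
        destroyed = true;
    }

    boolean isDestroyed() { return destroyed; }
}
```

Both hooks capture the surrounding local variables; the lambda simply drops the ceremony of declaring a class to do it.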

Another good example of where this is being used in the world right now is expectations in mock object libraries. Virtually every mock object library does this: I have got a couple of snippets here from jMock, EasyMock does the same thing, even in the .NET world Rhino Mocks does the same thing, and so does Mocha in the Ruby world. The reason you see expectations written this way goes exactly to what we were talking about for DSLs: there is a limited problem domain we are trying to address, which is setting expectations on mock objects. Think about how many lines of code this would take if it were written in a more traditional Java API style. You would have at least five different lines of setter code here, set this, set this, set this, set this, and that obscures the intent of what you are trying to do, which is to set an expectation saying that this thing expects this call, with this method, and so on. Notice how much context has been drained away from setting this expectation in jMock; what you are left with is just the intent of what you are trying to accomplish.

Martin Fowler: A few interesting things here. Again, notice that this is using Java. We are not using some weird language like Ruby or Lisp, which brings out the point that this kind of internal DSL work can be done in a relatively straightforward language. It is also important to notice that the language looks different, most obviously in terms of formatting, because now we have these cascades of methods with a dot on each line, which looks quite ugly at first. I remember the first time I looked at code written this way, it looked kind of weird, but you get used to it fairly quickly, and then you begin to appreciate how useful it can be.

It is also worth mentioning at this point that while I use the term internal DSL here, you will also hear some people, particularly the mock-object people, use the term embedded DSL. In fact, embedded DSL has a longer history of usage than internal DSL. I avoid it because it gets confused with embedded languages in the sense that, say, VBA is an embedded language in Microsoft Word, and because embedded has these two meanings, I decided to use internal DSL for these things. It is also important to realize that this is, again, very much a way of manipulating an object model. In the end, these are just objects being wired together in a particular way. There is no reason you could not use a regular API to do this. Indeed, what is going on under the covers in the jMock library is a regular API, and they originally built it with a traditional API. What they did with this expression syntax is add a layer over that API that allows you to create these expectations in a friendlier, more readable format.

Neal Ford: If you think about how this is implemented, it is not some sort of mysterious rocket-science implementation, especially the way this is chained together. In Java, for all your traditional set methods, instead of returning void, which is kind of a waste of a perfectly good return value, why not just return this? That allows you to chain together a series of method calls like this and achieve what we are calling a fluent interface, which we will talk about in just a second. In fact, let us talk about it now: fluent interfaces, where you treat lines of code as sentences, because in English and in most spoken languages a sentence is a complete unit of thought. This idea of a fluent interface comes from readability, and I am going to give Martin credit for this term too, though he will probably deny this one as well.

Martin Fowler: I get half credit for this.

Neal Ford: Okay.

Martin Fowler: The origin of this was a workshop I attended with Eric Evans; you might have heard of Eric Evans. He wrote the excellent book 'Domain-Driven Design', and one of the things about Eric and me is that we are both ex-Smalltalkers. We did a lot of Smalltalk programming in the mid-90s; in fact, that is where we first met, working together on a Smalltalk project. One of the things we talked about during the workshop, which Eric particularly lamented, was that when he worked with APIs in Java, they did not seem to have the same flow that a lot of the better Smalltalk APIs had. We tried to explain what we meant, and in that discussion we came up with the term fluent interface. We liked the term because it brought up that notion of a language. If you read a regular API, it has that stuttery quality of somebody who does not really speak your language properly… well, being British, I cannot say anything about other people's languages, but you know what I mean: somebody speaking a language they are not really comfortable with. What we wanted to see was much more of this sense of fluency and flow. And Smalltalk people, unlike Lisp people, never really talked about defining languages in Smalltalk. They talked about building domain models and putting them together, but a good number of them still had this very strong notion of trying to make that model work in this flowing way. I remember sitting down and doing some programming once with Ward Cunningham, and I was really quite taken aback by the way he would rearrange what I was doing with an API just to make it read better and make things fit together much better. I did not really understand what he was doing at the time; it is something I only really appreciated later on, and again, it is this push towards readability that drives it.
And so we felt that fluent interface was a good term for that. You take an API and make it flow a bit more. Tactics, particular mechanisms such as having set methods return the object you have just changed and cascading the method calls, are common things to do; in fact, that was default practice in Smalltalk. But those are mechanisms. The real aim is to have something with this readability, this sense of flow, which is a very hard thing to define. It is not a precise thing, but it is what we are trying to achieve. Now, fluent interfaces and internal DSLs are really two ways of looking at the same thing, depending on where you come from: whether you think of yourself as "I am trying to create a language here" or "I just have an API and I am trying to make it more readable and more useful."

Neal Ford: And readability is critical. Everybody is familiar with the statistic that lines of code are read two-and-a-half times as often as they are written. So even if it takes a little longer to write a line of code to make it more readable, there is a big payoff in the end, because anyone who reads it can understand what you are saying much more clearly. So let us look at a concrete example of building a fluent interface. As I said, this does not have to be any sort of rocket science or any brand-new technique; this is the kind of code most of us deal with all day, every day. This is traditional API-style code in Java, where you create a Car object and associate a MarketingDescription with it. So you create a Car object, you create a MarketingDescription, you call setType and setSubType and set all these attributes, and finally you set the description on the Car object. You can very easily convert this into a fluent API that looks like this. All this really is: Car.describedAs instantiates a Car object, and each of the dots here is really just what used to be one of the setters in the API style of code. What you are doing here is creating set methods that are aware of the context in which they operate, which lets you create a sentence out of this Java code. What is nice about this is that your business analyst, or whoever is consuming this code, now has a fighting chance of being able to read it, because it is much, much closer to the way they talk about Cars and MarketingDescriptions, rather than the stilted, very formal, almost Old English style of writing code we are accustomed to in the Java world, with all this repeated context and all this extra noise in terms of sets and properties and that sort of stuff. 
Martin Fowler: There is a writer on graphical displays called Edward Tufte, who is highly regarded as one of the best people to read when it comes to presenting visual information, and he says that when you are doing a chart or diagram, it is very important to remove the noise. Everything on the diagram should convey some meaning; there should not be any extraneous stuff. That is really what we are trying to do with the code: remove all the extraneous stuff. The other thing I tend to do a lot when I do this, which is not at all obvious from the final picture we have here, is create an additional object as a layer over the basic API. I refer to this as an expression builder object. The problem is that if you put all these methods on the Car class and then looked at the API of the Car class, it would look very odd, because the methods do not make much sense sitting there in the javadoc on their own. They look meaningless because they are robbed of their context. So what I like to do is put an expression builder object over my regular API, purely designed to support the fluent interface, with a way of getting the Car object back out when I am done. That is the kind of mechanism I use to implement it. That way, the regular object can have a normal API with nice-looking javadoc, and the fluent stuff is nicely contained in the builder. And I can play all sorts of tricks inside the builder to make the language flow, because the builder is focused purely on the language.
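The following Java sketch puts the Car and MarketingDescription example together with an expression builder in the style just described. The builder vocabulary (box, insulated, length, includes, lining, build) and the attribute names are assumptions made up for illustration, not the slide's actual code. Car keeps its ordinary setter API and gains only one static entry point, describedAs(), which hands out a CarBuilder; the builder collects the fluent calls and translates them into calls on the regular API when build() is called.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CarDemo {
    public static void main(String[] args) {
        // Traditional API style: every step restates the object it is talking about.
        Car car = new Car();
        MarketingDescription desc = new MarketingDescription();
        desc.setType("Box");
        desc.setSubType("Insulated");
        desc.setAttribute("length", "50.5");
        desc.setAttribute("ladder", "yes");
        desc.setAttribute("lining type", "cork");
        car.setDescription(desc);

        // Fluent style through the expression builder.
        Car fluent = Car.describedAs()
                .box()
                .insulated()
                .length(50.5)
                .includes("ladder")
                .lining("cork")
                .build();

        System.out.println(car.getDescription());
        System.out.println(fluent.getDescription());
    }
}

// Regular domain object with an ordinary setter API.
class MarketingDescription {
    private String type;
    private String subType;
    private final Map<String, String> attributes = new LinkedHashMap<>(); // keeps insertion order

    void setType(String type) { this.type = type; }
    void setSubType(String subType) { this.subType = subType; }
    void setAttribute(String key, String value) { attributes.put(key, value); }

    @Override
    public String toString() { return type + "/" + subType + " " + attributes; }
}

class Car {
    private MarketingDescription description;

    void setDescription(MarketingDescription d) { this.description = d; }
    MarketingDescription getDescription() { return description; }

    // The only fluent-related member on Car: it returns a builder, not a Car.
    static CarBuilder describedAs() { return new CarBuilder(); }
}

// Expression builder: holds the fluent vocabulary so Car's javadoc stays clean.
class CarBuilder {
    private final MarketingDescription desc = new MarketingDescription();

    CarBuilder box() { desc.setType("Box"); return this; }
    CarBuilder insulated() { desc.setSubType("Insulated"); return this; }
    CarBuilder length(double length) { desc.setAttribute("length", String.valueOf(length)); return this; }
    CarBuilder includes(String equipment) { desc.setAttribute(equipment, "yes"); return this; }
    CarBuilder lining(String material) { desc.setAttribute("lining type", material); return this; }

    // Translate the finished expression into calls on the regular API.
    Car build() {
        Car car = new Car();
        car.setDescription(desc);
        return car;
    }
}
```

Both paths print the same description. Keeping the vocabulary in CarBuilder means methods like box() and lining() never appear in Car's own API, where, as noted above, they would look meaningless out of context.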

The jMock library is an excellent example of that structure. Again, it has a regular API, and then a fluent layer placed on top of it through just a couple of extra classes. They take quite a sophisticated approach that uses a number of tricks to make things flow well. One particularly nice one is that, depending on where you are in your expression, certain terms are legal or not, and they do this by implementing multiple interfaces on the same object. The nice consequence is that, in a good editor, the IntelliSense you get leads you through building the expression, so it also pays off in usability when you are writing it.

Neal Ford: Absolutely. If you are writing jMock code, it is very well designed, because every time you hit dot, the things you see are the things you are interested in doing next with the expectation you are setting. And I want to re-emphasize that this is not some magical thing we are creating here. All the building blocks are already there in the Java language, and creating these fluent APIs is more an attitude shift than anything else. There is no rocket-science technology here at all; it is really just the intent of seeing how readable we can make code, rather than how obfuscated we can make it, which seems to be the default in the Java world for some reason.
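
The trick of exposing different words at different points in the expression can be sketched with plain Java interfaces. This is not jMock's API; the interface and method names below are hypothetical. One builder object implements all three interfaces, but each method's declared return type narrows what the caller, and the IDE's completion list, can see next, so `expect().method("getName").once().willReturn("Fred")` compiles while `expect().once()` does not.

```java
// Hypothetical sketch of "progressive" interfaces; not jMock's actual API.
final class ProgressiveInterfaces {

    // Each interface offers only the words that are legal at that point.
    interface MethodClause { CountClause method(String name); }
    interface CountClause { ResultClause once(); ResultClause times(int n); }
    interface ResultClause { Expectation willReturn(Object value); }

    // The finished, immutable expectation.
    static final class Expectation {
        final String method;
        final int count;
        final Object result;

        Expectation(String method, int count, Object result) {
            this.method = method;
            this.count = count;
            this.result = result;
        }

        @Override
        public String toString() { return method + " x" + count + " -> " + result; }
    }

    // One object implements every clause, but each method's declared return
    // type restricts what the caller (and IDE completion) sees next.
    private static final class Builder implements MethodClause, CountClause, ResultClause {
        private String method;
        private int count;

        @Override public CountClause method(String name) { this.method = name; return this; }
        @Override public ResultClause once() { return times(1); }
        @Override public ResultClause times(int n) { this.count = n; return this; }
        @Override public Expectation willReturn(Object value) { return new Expectation(method, count, value); }
    }

    static MethodClause expect() { return new Builder(); }
}
```
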

There is another good example of this style of coding out in the world. There is an open source library you can download from Google Code called Hamcrest, which is really just fluent-interface wrappers for JUnit matchers. It is an add-on to JUnit that allows you to say things like... I guess at Google when they do good work they give them biscuits, because all their examples are in terms of biscuits. So I guess that is what they are doing with their millions of dollars: buying biscuits. But you can say, for example, assertThat(theBiscuit, is(equalTo(myBiscuit))), and that is a much more readable line of code than the formal assertEquals(this, that) syntax you normally see. And all of this is a very light patina over the existing JUnit matcher classes. In fact, the latest versions of jMock incorporate the Hamcrest library to make their expectations even more expressive. So there are lots of people out in the world embracing this style of coding.

There are some building blocks for internal DSLs. As we emphasized, you do not have to do anything special to create these kinds of fluent interfaces, but there are some building blocks you can use. Languages with looser rules tend to make better DSL bases, simply because the looser rules let you get closer to English and closer to the goal of fluency. Java has some pretty strict rules about where its punctuation goes: every statement has to end with a semicolon, and so on. So you see a lot of people pursuing this style of coding in more dynamic languages like Groovy or Ruby, and of course JRuby, which lets you write Ruby code that runs on the JVM. Here is a good example. You can add support for time intervals in Groovy, because, let us face it, the java.util.Calendar class is pretty broken in Java. It is different from all the other classes: its set methods work completely differently from every other class in Java. They cannot seem to get date handling right in Java: first there was java.util.Date, and then they said, no, let us take a mulligan on that, throw it away and redo it. So they duplicated the whole thing and said, no, use Calendar instead.
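
Hamcrest itself provides `org.hamcrest.MatcherAssert.assertThat` and `org.hamcrest.Matchers.is`/`equalTo`. To keep the example self-contained, the sketch below re-implements a tiny hypothetical subset (MiniMatchers is an invented name) to show why the style reads well: `is(...)` adds no logic of its own and exists only so the assertion reads as English, and the failure message is assembled from the same words.

```java
import java.util.Objects;

// Hypothetical, minimal re-implementation of the Hamcrest style (not the real library).
final class MiniMatchers {

    private MiniMatchers() { }

    interface Matcher<T> {
        boolean matches(Object actual);
        String describe();
    }

    // The matcher that does the real work.
    static <T> Matcher<T> equalTo(T expected) {
        return new Matcher<T>() {
            public boolean matches(Object actual) { return Objects.equals(expected, actual); }
            public String describe() { return "<" + expected + ">"; }
        };
    }

    // Pure syntactic sugar: delegates to the inner matcher, only changing the wording.
    static <T> Matcher<T> is(Matcher<T> inner) {
        return new Matcher<T>() {
            public boolean matches(Object actual) { return inner.matches(actual); }
            public String describe() { return "is " + inner.describe(); }
        };
    }

    static <T> void assertThat(T actual, Matcher<? super T> matcher) {
        if (!matcher.matches(actual)) {
            throw new AssertionError("Expected: " + matcher.describe() + " but was: <" + actual + ">");
        }
    }
}
```

A failing `assertThat("biscuit", is(equalTo("cookie")))` reports `Expected: is <cookie> but was: <biscuit>`.
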

What you can use in Groovy is something called a Category, which essentially lets you add new methods to built-in classes like Integer. So I have a category here, IntegerWithTimeSupport, that adds method calls to Integer, and the goal is to be able to write the line of code you see in red: 2.days.fromToday.at(4.pm). That returns a java.util.Calendar instance set, as you probably guessed, to two days from today with the time set to 4 pm. This is a perfect example of what we were talking about before: using the building blocks we already have, java.util.Calendar and the existing Java APIs, and building a fluent interface on top of them. The nice thing Groovy allows us to do here, and Ruby does as well, is add methods to numbers, so that you can create time-interval support for core objects like Java's Integers.

Martin Fowler: Yeah, there are a number of things that I think help make a language more DSL-friendly. Looser rules are important. A lot of this is syntactic stuff, like dropping things that are not necessary. It is amazing how much difference to readability you get if you do not have to put parentheses, particularly empty pairs of parentheses, around things. Little things like that often make a lot of difference. Being able to add methods to existing classes within a context lets you reorganize things and gives you more flexibility. Closures are very, very important for more sophisticated cases. Another thing that is very important is literal collection structures, particularly lists and hash maps, so that you can easily put them in as literals. In fact, that is the core of Lisp: the ability to write lists easily, and I think you need that in order to do DSLs well.
It is also useful to be able to easily construct symbolic types, as opposed to using strings or enums everywhere, because again that allows a shorter, more compact way of representing things.
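
Java has no Groovy-style categories or open classes, so `2.days` cannot be added to Integer. A common workaround, sketched here under that assumption, is static factory methods that keep the same word order: `days(2).fromToday().at(pm(4))`. The names TimeIntervals, Interval, Moment and pm are hypothetical; the result is a java.util.Calendar, matching the talk. A from(Calendar) variant takes an explicit start date so the behavior can be checked without depending on the current time.

```java
import java.util.Calendar;

// Hypothetical Java emulation of Groovy's 2.days.fromToday.at(4.pm).
final class TimeIntervals {

    private TimeIntervals() { }

    // Java cannot add methods to Integer, so a static factory stands in:
    // days(2) plays the role of 2.days.
    static Interval days(int n) { return new Interval(n); }

    // pm(4) plays the role of 4.pm: converts a 1..12 pm hour to the 24-hour clock.
    static int pm(int hour) {
        if (hour < 1 || hour > 12) throw new IllegalArgumentException("hour must be 1..12");
        return hour == 12 ? 12 : hour + 12;
    }

    static final class Interval {
        private final int days;

        Interval(int days) { this.days = days; }

        Moment fromToday() { return from(Calendar.getInstance()); }

        // Explicit starting point; the caller's Calendar is cloned, not mutated.
        Moment from(Calendar start) {
            Calendar c = (Calendar) start.clone();
            c.add(Calendar.DAY_OF_MONTH, days);
            return new Moment(c);
        }
    }

    static final class Moment {
        private final Calendar date;

        Moment(Calendar date) { this.date = date; }

        // Sets the time of day on the hour and returns a java.util.Calendar, as in the talk.
        Calendar at(int hourOfDay) {
            Calendar c = (Calendar) date.clone();
            c.set(Calendar.HOUR_OF_DAY, hourOfDay);
            c.set(Calendar.MINUTE, 0);
            c.set(Calendar.SECOND, 0);
            c.set(Calendar.MILLISECOND, 0);
            return c;
        }
    }
}
```

From code inside TimeIntervals, or through a static import once the class lives in a named package, the call reads `days(2).fromToday().at(pm(4))`; the remaining parentheses are exactly the kind of noise Martin mentions. On Java 8 and later, java.time would be a more natural base than Calendar, which is used here only to match the talk.
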

Neal Ford: And this is an example of open classes, which is where you add behavior to core types like Integer, the stuff you normally cannot touch in Java. It is very handy to be able to add behavior to them: for example, adding the idea of time intervals to the Integer class, so that you can write 2.days and have that method on Integer instantiate the time interval for you.

Martin Fowler: Lots of little things, but each one like the parentheses, does not seem that important when you mention it on its own and yet it makes a difference in getting something that reads clearly.

Neal Ford: So think about how much it would obfuscate this example if we had to put parentheses at the end of each of these calls: that would just be noise that would help destroy the fluency of the line of code we are trying to create. So dynamic languages tend to make better building blocks, just because they support some of the building-block elements we are talking about. Of course, you are not required to use them, but they generally make a better base for internal DSLs. External DSLs are the other category of DSLs. They are written in a different language from the application's main host language and are transformed, by some form of compiler or interpreter, into executable code of some kind. These can be plain text files, like configuration files in the Apache world or the UNIX world, or XML documents. I think one of the reasons Java is overrun with XML right now is that we really want some sort of external syntax, and we have settled on XML because it is so easy to parse, and of course now we are overrun with it. Every framework has at least one XML configuration document, and in a non-trivial application you end up with four or five different XML dialects, each of which is essentially its own language, because when you create a grammar for an XML file, you are simply defining the grammar for an external language embodied in that XML document.

Martin Fowler: This is a very important point, because people will often ask, do we actually use DSLs? I would argue that most Java projects use a lot of DSLs, and they are all embedded in the endless XML files floating around. They are external DSLs: you have to parse them and bring them in, and usually the framework does that for you. We gravitated to XML because there are readily available tools for parsing XML files, but the resulting XML is not very clear, and you do not get great readability, because there are a lot of noise words involved. And again you have that lack of flow. But I think the fact that XML files are so pervasive in the Java world and the .NET world is a testament to the fact that we need and want to express things in domain-specific languages; the mechanism we have used is just not terribly good.
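
To make the point that an XML configuration file is an external DSL concrete, here is a sketch of the parsing step a framework performs on your behalf, using the JDK's built-in DOM parser (javax.xml.parsers). The `<routes>`/`<route path="..." handler="...">` dialect and the Route class are invented for illustration: the element and attribute names are the language's vocabulary, and the loop translates them into objects the application can use.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical "routes" XML dialect, treated as an external DSL.
final class XmlRouteConfig {

    // Semantic model that the XML "program" describes.
    static final class Route {
        final String path;
        final String handler;

        Route(String path, String handler) {
            this.path = path;
            this.handler = handler;
        }
    }

    // Translates <routes><route path="..." handler="..."/>...</routes> into Route objects.
    static List<Route> parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations (a common hardening step against XXE);
            // this feature name is understood by the JDK's built-in parser.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("route");
            List<Route> routes = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                routes.add(new Route(e.getAttribute("path"), e.getAttribute("handler")));
            }
            return routes;
        } catch (Exception e) {
            // Parser configuration, I/O and syntax errors all surface as one unchecked error.
            throw new IllegalArgumentException("invalid route configuration", e);
        }
    }
}
```
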

Neal Ford: And so what we have done in the XML world is sacrifice readability for parsability. You would never show an XML document to an end user to explain anything, because it would frighten them to death. All those pointy brackets make it look like a porcupine: it looks as if you could cut yourself if you touched it, because it looks hazardous to consume. One of my favorite quotes about XML is from Dave Thomas, who said that XML is really just data dressed up like a hooker. So, building blocks for external DSLs: you have parser generators, and we have a wealth of these in the Java world right now. We have Antlr, which the first time I saw it I thought had something to do with Ant, though it has nothing whatsoever to do with Ant, as well as JavaCC, Yacc and SableCC, including some really sophisticated tools: in the Antlr world there is now AntlrWorks, which makes it very easy to work with the grammars you create.

Martin Fowler: Yeah, the interesting thing about these tools, however, is that there is not very much written about how to use them well. Like most people, I did a little compiler class at college for a few weeks and promptly forgot most of what I was taught, and as a result I never got terribly comfortable using these tools. Over the last few months, as part of the research for the book I am working on, I have been burying my head in these kinds of things, looking at example little languages in the UNIX world and the Java world, and seeing how people use these tools. And one thing that is very, very obvious to me is that (a) the tools are actually not that hard to use once you get the hang of them, but (b) there is very, very little material out there to help you get the hang of them. So it is a real struggle to learn how to use them. Now, the situation is improving on some fronts, in particular with the Antlr toolset.
There is a book that has just appeared in ‘The Pragmatic Programmer’ series that talks about how to use Antlr. There is a very nice IDE called AntlrWorks that gives you syntax highlighting, refactoring of grammars, visualizations of what the grammar files recognize, and a specialized debugger, and that will certainly make this kind of stuff a lot easier to use. But there is still a big gap between the tools and the knowledge needed to exploit them effectively. I am hoping some of the work I am doing will help fill that gap, but at the moment I do have to warn you about that. If you want to play around with these kinds of external tools, for the moment I would definitely recommend Antlr, because of the book and because of the IDE; they help you get into it and use it a lot more easily. Unfortunately, what is not in the book is much advice about how to choose and design a DSL appropriately. It tells you a lot about how Antlr works, but not a huge amount about how to use it for DSL work.

Neal Ford: Internal DSLs have some advantages because you have the full power of the underlying language and you have full access to sophisticated tools like IDEs. When you are defining a fluent interface in Java, you can use IntelliJ or Eclipse and you get code insight and all the other things that we are accustomed to in the Java world because we have all these sophisticated tools for working with Java code and of course an internal DSL if it is written in Java is fundamentally a Java program just expressed in a different way.

Martin Fowler: And that is both a plus and a minus, because in many ways what you want with a DSL is to restrict your level of expression, since that way you make fewer mistakes. So sometimes having the full-blown language available can be confusing, though if you are very familiar with it, it becomes much less of an issue.

Neal Ford: The disadvantage of internal DSLs is that they are hard to write in modern “curly brace” languages, because you just cannot boil away a lot of the context. Java is very strict about parentheses, periods, semicolons and the other punctuation the language requires. You are limited by the syntax and semantics of the language: for example, in pure Java you cannot add methods to the Integer class so that you can express time ranges with an Integer as the object you call the method on. And you have to understand the base language or you are in syntax trouble. This is a classic blunder by people from the PHP world who look at Rails and say, oh, Rails, this is a much more sophisticated form of PHP, I will start doing that. They read some recipes from a Rails book somewhere, but they do not understand Ruby syntax, and they very quickly get into trouble, because ultimately you are writing Ruby code. Even though it does not look much like Ruby code, that is what you are writing, because it is an internal DSL sitting on top of the base language, Ruby.

External DSLs have the advantage that you are free to use any form you like; just let your imagination be your guide. That is both an advantage and a disadvantage, because you have infinite possibilities for what the language can look like. You are really limited only by your ability to parse the language you create into something you can produce code from. Like Martin said, he has created 20 different versions of the same DSL, experimenting with exactly how it looks and how it works.
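
As an illustration of what building the translator involves without a parser generator, here is a hand-written recursive-descent parser for a tiny, invented rate-plan language. The grammar (shown in the comments), the keywords and the Plan and Rate classes are all hypothetical, not the DSL from the talk; a tool like Antlr would generate the equivalent of the plan() and rate() methods from a grammar file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hand-written recursive-descent parser for an invented rate-plan language:
//   plan := "plan" NAME "{" rate* "}"
//   rate := "rate" NAME PRICE "from" HOUR "to" HOUR
final class RatePlanParser {

    static final class Rate {
        final String name;
        final double pricePerKwh;
        final int fromHour;
        final int toHour;

        Rate(String name, double pricePerKwh, int fromHour, int toHour) {
            this.name = name;
            this.pricePerKwh = pricePerKwh;
            this.fromHour = fromHour;
            this.toHour = toHour;
        }
    }

    static final class Plan {
        final String name;
        final List<Rate> rates;

        Plan(String name, List<Rate> rates) {
            this.name = name;
            this.rates = rates;
        }
    }

    private final List<String> tokens;
    private int pos = 0;

    private RatePlanParser(String source) {
        // Lexing: pad braces with spaces so they become separate tokens, then split on whitespace.
        String spaced = source.replace("{", " { ").replace("}", " } ").trim();
        this.tokens = Arrays.asList(spaced.split("\\s+"));
    }

    static Plan parse(String source) {
        RatePlanParser p = new RatePlanParser(source);
        Plan plan = p.plan();
        if (p.pos != p.tokens.size()) throw new IllegalArgumentException("unexpected input after '}'");
        return plan;
    }

    // One method per grammar rule.
    private Plan plan() {
        expect("plan");
        String name = next();
        expect("{");
        List<Rate> rates = new ArrayList<>();
        while (peek().equals("rate")) {
            rates.add(rate());
        }
        expect("}");
        return new Plan(name, rates);
    }

    private Rate rate() {
        expect("rate");
        String name = next();
        double price = Double.parseDouble(next()); // NumberFormatException is an IllegalArgumentException
        expect("from");
        int from = Integer.parseInt(next());
        expect("to");
        int to = Integer.parseInt(next());
        return new Rate(name, price, from, to);
    }

    private String peek() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("unexpected end of input");
        return tokens.get(pos);
    }

    private String next() {
        String token = peek();
        pos++;
        return token;
    }

    private void expect(String keyword) {
        String token = next();
        if (!token.equals(keyword)) {
            throw new IllegalArgumentException("expected '" + keyword + "' but found '" + token + "'");
        }
    }
}
```

A sample program in this made-up language would be `plan residential { rate peak 0.21 from 7 to 23 rate night 0.09 from 23 to 7 }`, with line breaks wherever the author likes.
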

Martin Fowler: And the parsing point is important, because there are certain things you can do to make parsing easier and things that make it harder. Again, this is part of what is not often clearly explained out there.

Neal Ford: The disadvantage of external DSLs is that you have to build the translator, and that is no small feat, because you have to understand grammars and parsing; even though the tools make that easier, it is still a daunting task. You also lack the support of your base language: you finally create this beautiful external DSL, but now you have to use a regular text editor to edit it. You do not get any of the nice symbolic integration you get with tools like Eclipse and IntelliJ, because ultimately you are just writing a plain text file that gets translated into something else. Of course, you could write your own IntelliJ for your own language, but now you are talking about the really daunting task of building not only a language but also the tools that understand it.

Martin Fowler: This was of course less of an issue in the UNIX days, because, hey, you just make your own Emacs major mode and that is straightforward. But the bar has gone up now. With more sophisticated IDEs, you expect a lot more from the programming experience, so a plain text file becomes much more awkward to use.

Neal Ford: Just what you want to tell your users: oh well, here is Emacs, just edit it in Emacs, that is perfectly fine. Which brings us around to Language Workbenches. Martin and I are not the only people in the world interested in this whole language-oriented programming idea. In fact, there are at least three major vendors, actually more than this, but these three are representative of those actively pursuing this style of coding. There is Intentional Software, founded by Charles Simonyi. A lot of you probably recognize his name: he was the guy at Microsoft who created Word and Excel, and at some point I think he was trying to get Microsoft to pursue his vision for this Language Workbench kind of tool, and they were not that interested. So he did what any self-respecting Microsoft employee would do: he took his billions of dollars, stomped out the door, and created his own company, Intentional Software, which has been working for years now on a very sophisticated tool for building external DSLs. Microsoft has since started embracing this idea and is creating what it calls Software Factories, which is sort of the Language Workbench idea but tilted more toward the modeling world, creating executable models and that sort of thing. And finally there is the Meta Programming System (MPS), developed by JetBrains, the makers of IntelliJ. MPS is the one we are going to talk a little bit about here, because it is the only one you can actually touch right now.

Martin Fowler: Yeah, just before we dive into it, though, I will mention something briefly, partly to do with the Microsoft stuff. We have focused on textual languages so far in this talk, for all sorts of reasons I am not going to delve into, but there is also a school that is very interested in graphical languages, and Microsoft's domain-specific language work is very much focused on that. A lot of people in that world come out of a CASE tool and model-driven development background, and there is a lot of questionable stuff in that area, but there is also some very good stuff, and there are definite links with some of the things we are talking about. I tend to see it as within the broad category of domain-specific languages, but it is just outside the bounds of what we have time to talk about today.

Neal Ford: So let us do a little background on Language Workbenches for a second. This is pretty much what compilation has looked like since you wrote your very first computer program: you have some sort of external representation, a text file; you hand it to a compiler, which parses it into a syntax tree; and then a linker or some other mechanism creates an executable representation of that code. That is the way compilers have worked for a very, very long time now, since the 1950s at least.

Martin Fowler: You can think of this as a sequence of transformations. You begin with source code, which is one representation. We transform it into a different representation that sits inside the compiler's memory, the parse tree or abstract syntax tree, and then we use a code generator to transform that into a third representation, the executable representation we actually run. In Java's case, that executable representation is Java bytecode, which itself goes through another transformation step to become machine code for a particular system. So in many ways what we are thinking of here is a pipeline of multiple representations with transformation steps between them, and when I say that you parse a source file into an abstract syntax tree, if you look inside that process you again find a sequence of transformations.
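
Martin's pipeline of representations can be seen end to end in a toy compiler for integer arithmetic. The sketch below is a hypothetical illustration, not any real compiler's design: source text becomes a token list, tokens become an abstract syntax tree, the tree becomes postfix stack-machine instructions (standing in for bytecode), and a small loop executes those instructions. Each step consumes one representation and produces the next.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy pipeline: text -> tokens -> AST -> stack-machine code -> result.
final class CompilerPipeline {

    // AST node types.
    interface Node { }
    static final class Num implements Node { final int value; Num(int value) { this.value = value; } }
    static final class BinOp implements Node {
        final char op; final Node left; final Node right;
        BinOp(char op, Node left, Node right) { this.op = op; this.left = left; this.right = right; }
    }

    // Step 1: source text -> tokens (runs of digits and single-character operators).
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add(src.substring(start, i));
            } else if ("+-*".indexOf(c) >= 0) {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return tokens;
    }

    // Step 2: tokens -> AST. '*' binds tighter than '+' and '-'; all are left-associative.
    static Node parse(List<String> tokens) {
        int[] pos = {0};
        Node tree = expr(tokens, pos);
        if (pos[0] != tokens.size()) throw new IllegalArgumentException("unexpected token: " + tokens.get(pos[0]));
        return tree;
    }

    private static Node expr(List<String> t, int[] pos) {
        Node left = term(t, pos);
        while (pos[0] < t.size() && (t.get(pos[0]).equals("+") || t.get(pos[0]).equals("-"))) {
            char op = t.get(pos[0]++).charAt(0);
            left = new BinOp(op, left, term(t, pos));
        }
        return left;
    }

    private static Node term(List<String> t, int[] pos) {
        Node left = number(t, pos);
        while (pos[0] < t.size() && t.get(pos[0]).equals("*")) {
            pos[0]++;
            left = new BinOp('*', left, number(t, pos));
        }
        return left;
    }

    private static Node number(List<String> t, int[] pos) {
        if (pos[0] >= t.size()) throw new IllegalArgumentException("expected a number");
        return new Num(Integer.parseInt(t.get(pos[0]++)));
    }

    // Step 3: AST -> executable representation (postfix instructions).
    static List<String> generate(Node tree) {
        List<String> code = new ArrayList<>();
        emit(tree, code);
        return code;
    }

    private static void emit(Node n, List<String> code) {
        if (n instanceof Num) { code.add("PUSH " + ((Num) n).value); return; }
        BinOp b = (BinOp) n;
        emit(b.left, code);
        emit(b.right, code);
        code.add(b.op == '+' ? "ADD" : b.op == '-' ? "SUB" : "MUL");
    }

    // Step 4: execute the instructions on an operand stack.
    static int run(List<String> code) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String ins : code) {
            if (ins.startsWith("PUSH ")) { stack.push(Integer.parseInt(ins.substring(5))); continue; }
            int right = stack.pop(), left = stack.pop();
            stack.push(ins.equals("ADD") ? left + right : ins.equals("SUB") ? left - right : left * right);
        }
        return stack.pop();
    }
}
```

For `1 + 2 * 3`, the generated code is `PUSH 1, PUSH 2, PUSH 3, MUL, ADD`, and running it yields 7.
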
Neal Ford: But all of this changed a little bit when IntelliJ first came out. Martin coined a phrase for this, and I believe it is wholly his: post-IntelliJ IDEs, because IntelliJ was really the first tool that let you edit directly against the abstract syntax tree instead of just the text. That is actually how refactoring works. These tools do not do a massive search-and-replace on your source file; what they do is modify the abstract syntax tree and then reflect that back into the source code of your application. And you will notice we are getting better and better at this: the early refactoring tools were very intolerant of compilation errors, and you had to be able to compile the whole thing. Now they have gotten much better, and you can edit parts of an abstract syntax tree even if another part of it does not work because it has syntax errors or other sorts of ambiguities. That leads us to the idea of a Language Workbench, which turns the traditional relationship around, because now you are operating mostly on some sort of projection of the abstract syntax tree. That could be a plain text file, but it could just as easily be some sort of graphical designer tool, or a database schema designer, or something like that. You still have some sort of versioned storage that lets you version this abstract syntax tree, and you have some way to transform it into an executable representation of some kind, but the transformation process is different here. In traditional compilation, the focus of your work is the beginning of the sequence, which flows through to the end. A Language Workbench is more concerned with the middle part, which is really the important part: the abstract syntax tree.

Martin Fowler: In traditional compilation, the key artifact is the source code, which has to contain everything about the program, and the abstract syntax tree is very transient, because it exists only during compilation, although in a post-IntelliJ IDE it also exists during editing. With a Language Workbench, the central thing is the abstract representation, and when you edit it, you edit through a projection of what it looks like to you, and you do not have to have everything in that projection. You can have different projections showing different subsets of information, and the abstract representation ties everything together. So it twists around the traditional relationship between source code and the other representations.

Neal Ford: The editable representation is a projection of the abstract representation, and the abstract representation has to be comfortable with errors and ambiguities, just the way refactoring tools are now. Think how annoying a tool would be that said you have to be able to compile before it will save your work. We would never want to use a tool like that. So one of the challenges in building a tool like this is to make it comfortable letting you save the internal representation even if it contains an error or is incomplete somehow, which is a nifty trick.
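
A rough sketch of "the abstract representation is central; editors are projections", assuming a trivial rate model. This is not how MPS stores models, and all names are hypothetical. The model tolerates missing data (a null price), two projections render different subsets of it, and an edit made through either view changes only the model, after which both views are re-rendered from it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical projectional-editing sketch: one model, several views.
final class ProjectionExample {

    // Abstract representation: the single source of truth. A null price is
    // allowed, so an incomplete model can still be saved.
    static final class Rate {
        final String name;
        Double price; // null = not yet entered

        Rate(String name, Double price) {
            this.name = name;
            this.price = price;
        }
    }

    static final class Model {
        final List<Rate> rates = new ArrayList<>();
    }

    // Projection 1: a sentence-like textual view that marks missing values.
    static String textView(Model m) {
        StringBuilder sb = new StringBuilder();
        for (Rate r : m.rates) {
            sb.append("charge ").append(r.name).append(" at ")
              .append(r.price == null ? "<price?>" : r.price).append(" per kWh\n");
        }
        return sb.toString();
    }

    // Projection 2: a tabular view showing a different subset (name and completeness).
    static String tableView(Model m) {
        StringBuilder sb = new StringBuilder("name | complete\n");
        for (Rate r : m.rates) {
            sb.append(r.name).append(" | ").append(r.price != null).append('\n');
        }
        return sb.toString();
    }

    // An edit made "through" a projection updates only the abstract model.
    static void setPrice(Model m, String rateName, double price) {
        for (Rate r : m.rates) {
            if (r.name.equals(rateName)) { r.price = price; return; }
        }
        throw new IllegalArgumentException("no rate named " + rateName);
    }
}
```
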

So the examples we are going to show here, very briefly, are screenshots, and fairly old screenshots of MPS at that. The idea behind MPS is that instead of building objects, APIs and hierarchies like you normally do, you build concepts. A concept is something like a date, a very, very small thing, but along with that concept you also build an editor that is concept-aware. This is an example Martin created in conjunction with the guys at JetBrains: a DSL that describes rate plans for electricity usage, I believe it is a…

Martin Fowler: Hmm…mm.
Neal Ford: …gas utility usage. What you do is build up a date concept and a numeric financial concept, build all these concepts as individual entities, and then wire them together to make bigger and bigger things. So, in this example, notice that you get syntax highlighting, because now the language you have created is aware of what its pieces are, so it can do intelligent syntax highlighting. You can also do intelligent editing, so that when you enter a cell, it knows what format it is expecting. You can also do things like code insight, because the concept is defined as an entity you are interested in, and you can define the ways in which it can legally be edited. One of the things created along with this rate plan DSL is the concept of a financial number, which can encompass dollar figures but also Excel-like formulas and other sorts of functionality. And the interesting thing about a Language Workbench like this is that when you tab into this field, just as in IntelliJ, the options you get are the ones in context for the particular concept you created; the only things that show up are those pertinent to that context.
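
The context-sensitive completion Neal describes can be approximated with a table of concept definitions. This is a hypothetical sketch, not the MPS API: each concept lists which concepts may fill its slot (what the completion menu would offer) and, for leaf concepts, a pattern that typed text must match (the "intelligent editing" of a cell). The concept names echo the rate-plan example; the dollar-amount format is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical concept registry driving completion and cell validation.
final class ConceptCompletion {

    static final class Concept {
        final String name;
        final Pattern cellFormat;            // null = structural concept, no free text
        final List<String> allowedChildren;  // concepts legal in this concept's slot

        Concept(String name, Pattern cellFormat, String... allowedChildren) {
            this.name = name;
            this.cellFormat = cellFormat;
            this.allowedChildren = Arrays.asList(allowedChildren);
        }
    }

    private final Map<String, Concept> concepts = new LinkedHashMap<>();

    void define(Concept c) { concepts.put(c.name, c); }

    // Completion menu for a slot: only the concepts legal in this context.
    List<String> completionsFor(String slotConcept) {
        Concept c = concepts.get(slotConcept);
        return c == null ? Collections.<String>emptyList() : c.allowedChildren;
    }

    // Intelligent editing: a leaf cell accepts only text in its expected format.
    boolean accepts(String concept, String typed) {
        Concept c = concepts.get(concept);
        return c != null && c.cellFormat != null && c.cellFormat.matcher(typed).matches();
    }
}
```
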

So, one of the common complaints against this style of coding is: doesn't this lead to language cacophony? If I have developers out here creating their own languages, isn't it going to be a nightmare, because some guy develops some sort of language, wanders off, and now I have a mess of a whole bunch of different languages? If you end up in that situation, it means your language is very poorly designed. This is exactly the same problem you run into when your developers build APIs internally: if they do a terrible job of building the API, it is going to be hard to maintain too. It is exactly the same with a language, because in some ways what we are trying to do is represent the same kind of concept, just in a slightly different way.

Martin Fowler: In many ways, what is going on with domain-specific languages is very much what already happens with frameworks. To take on a project, you do not just use your base language; you also have to bring in multiple APIs, libraries and frameworks. If you are going to build a web app these days, you have to decide: I need something like Spring MVC, I need Spring, I need Hibernate, or a whole bunch of other frameworks, and they become part of your development environment, a whole bunch of things you have to learn. I do not think domain-specific languages add very much to that. They are no harder to learn than the frameworks they cover, because fundamentally they are a fairly slim veneer over the frameworks themselves. It is the abstractions and frameworks that are the hard things; the domain-specific languages are merely small things that make them easier to use. So I do not think it actually makes a big difference.
In particular, it is important to stress that we are not talking about multiple general-purpose languages. Each domain-specific language is usually something very small and limited. Again, the comparison is to something like a Hibernate configuration file or a Struts configuration file; the only difference is that it is written in something a bit more readable than XML.

Neal Ford: And in fact, if it is hard to read your DSL, then you have done a very poor job of creating it, because that was one of the goals, to create more readable code.

Martin Fowler: Well, keeping an eye on our time, we should probably think about wrapping up. I know we have got a few more slides but, so we can…

Neal Ford: Yep. We will just go over these very quickly. Why aren't we already doing this? External DSLs give you the most potential, but at a much higher cost, because you have to build your own parser and so on. There is the COBOL inference, which says: once we are able to write our own languages, we will not need developers any more. And this is another Martin Fowlerism: most technologies that are supposed to eliminate professional programmers do nothing of the sort. Your end users are not interested in writing code; they are interested in the code doing stuff, and they still need programmers for that. What we are talking about is creating code that a business analyst can read, not write, so that we speak the same language as the business people. If you can create really fluent interfaces, you can communicate with them better, which is what I said.

Martin Fowler: And that is a point worth emphasizing. It is something we have begun to see in some of the projects we have done at ThoughtWorks, where we have made a deliberate attempt to use DSLs and make them readable for business people. The important thing here is that they are readable and reviewable, not necessarily writable.

Neal Ford: So, on the boundary between external DSLs and general-purpose languages, as Martin has already said, you do not want to create a general-purpose language. You want to create a very, very tightly focused domain-specific language that covers just one domain. You are better off having a lot of small domain-specific languages than trying to create a new general-purpose language, which is...

Martin Fowler: In general, domain-specific languages are not Turing-complete, and in general they do not provide abstraction mechanisms such as subroutines or objects or things of that kind. That is not 100% true, but it is true most of the time.

Neal Ford: They tend to be very declarative rather than imperative in nature.

Martin Fowler: Yeah.

Neal Ford: So, there is a token type not recognized….

Martin Fowler: Oh! Cool.

Neal Ford: That is interesting. When you are building DSLs, it is a good idea to start with the end in mind. There are two different ways to do this. You can take an API and morph it into a DSL, but there is another way, and this is a classic picture: the Rake napkin. Jim Weirich, when he created Rake, which is make for Ruby, was frustrated with the make utility he was using. He was at lunch one day talking to a colleague and said, “I am so frustrated with make. I think I am just going to rewrite it in Ruby.” So they sat down and sketched on a napkin what it would look like in Ruby, and this is an actual scan of the Rake napkin. He took it back to his desk, and two hours later he had the beginnings of the Rake project. So he started with what he wanted to see and worked toward that. So: internal DSLs for dynamic languages, Language Workbenches for static languages. We believe this is a huge competitive advantage, because you get better abstraction mechanisms: slightly harder to write, but easier to maintain. This may well be the next big paradigm, the next evolutionary step beyond object-oriented programming.
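
Starting with the end in mind can be tried in Java too: write the usage you want first, as Jim Weirich did on the napkin, then add the smallest classes that make it compile. The sketch below is a hypothetical Rake-like task vocabulary written that way; it is not Rake's API, and TaskDsl, task, dependsOn, does and invoke are invented names. Lambdas (Java 8 and later) play the role of the closures Martin mentioned earlier, and invoke() runs prerequisites depth-first, each task at most once, rejecting dependency cycles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical Rake-like vocabulary, designed by writing the desired usage first.
final class TaskDsl {

    static final class Task {
        final String name;
        final List<String> prerequisites = new ArrayList<>();
        Runnable action = () -> { };

        Task(String name) { this.name = name; }

        Task dependsOn(String... names) {
            prerequisites.addAll(Arrays.asList(names));
            return this;
        }

        // The lambda passed here acts as a closure holding the task body.
        Task does(Runnable action) {
            this.action = action;
            return this;
        }
    }

    private final Map<String, Task> tasks = new LinkedHashMap<>();

    // Declares a task, or returns the existing one so it can be extended.
    Task task(String name) { return tasks.computeIfAbsent(name, Task::new); }

    // Runs prerequisites depth-first, each task at most once, rejecting cycles.
    void invoke(String name) { invoke(name, new HashSet<>(), new HashSet<>()); }

    private void invoke(String name, Set<String> done, Set<String> inProgress) {
        if (done.contains(name)) return;
        Task t = tasks.get(name);
        if (t == null) throw new IllegalArgumentException("unknown task: " + name);
        if (!inProgress.add(name)) throw new IllegalStateException("dependency cycle at: " + name);
        for (String prerequisite : t.prerequisites) invoke(prerequisite, done, inProgress);
        t.action.run();
        inProgress.remove(name);
        done.add(name);
    }
}
```

The "napkin" version of the usage was simply `build.task("test").dependsOn("compile").does(() -> ...)`; everything else exists only to make that line work.
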

Martin Fowler: Yeah. I do not tend to look at it as the next big thing. I do not necessarily know what the next big thing is, but I do think this is interesting, and the fact that I am putting the time in to make it almost certainly my next book is a sign of how interesting I think it might be. But I do not want to oversell how important it is. It is another useful tool to consider adding to the toolbox, and it is worth noting that some bits of it you can use today, particularly the internal DSL stuff. Other bits are much more on the horizon: the Language Workbenches, I think, are really, really interesting, but it is going to be a few years before most people can think about using them on real projects.

Neal Ford: So they are running us out of here. Thank you very much, and I hope you enjoy the rest of the conference.

Posted on 2008-05-25 14:09 by 懵懵灯灯. Category: Language
