Incremental compilation for Crystal - Part 1

Read Time:7 Minute, 15 Second

I strongly believe that the Crystal programming language has great potential. The only thing that, in my mind, is holding it back a bit is its slow compile times, which become more and more noticeable as your project grows.

Why is the compiler slow? And can we improve the situation? This is what I’m going to talk about here, and think about solutions as we go.

The main reason is that the compiler compiles the entire program, from scratch, every time you invoke it. Well, it at least does a full semantic analysis of your program: there is some caching going on regarding producing object files. But semantic analysis is always done, from scratch, for the entire program.

And when I say “the entire program” it’s the code that you wrote, but also the code from shards you use, and even the standard library. And, well, it’s not the “entire program” because only those types and methods that are effectively used need to be analyzed and compiled, but it can still be quite a lot of code.

The actual algorithms used in the compiler are good and fast. We could try to make them even faster. But, as you add more and more code to your program, inevitably, no matter how fast the algorithms, it will take more and more time to compile programs.

Not doing semantic analysis from scratch means doing it by chunks, and reusing that previous knowledge.

Most, if not all, compiled programming languages will analyze and compile individual files and then link them afterwards. Can we do the same in Crystal?

Let’s consider this file:

Enter fullscreen mode Exit fullscreen mode

Well, we have no idea what are the types of x and y. If we call it with two ints then we can type and compile that method with that information. And the same is true if we pass it two strings. Or any object that has a + method with an argument. So we can’t type and compile that file just like that.

That’s one thing. Another thing is not knowing what types resolve to. For example in this file:

Enter fullscreen mode Exit fullscreen mode

Here something doesn’t take any arguments so we could try to type and compile it right away. But if we do so… what’s Foo? And what’s the class method bar in it? We can’t know. Well, maybe a require is missing in that file:

Enter fullscreen mode Exit fullscreen mode

and now if defines Foo all is good. But Crystal (and Ruby) allow not requiring a dependency like that, and instead it can come from somewhere else. For example:

Enter fullscreen mode Exit fullscreen mode

Here defines a class Foo with a class method bar, and requires dependency which in turns defines something that uses that Foo. And this is perfectly fine, even though there’s no explicit mention of where does Foo come from in something.

The conclusion so far is that it’s almost always impossible to look at a single file and type it without knowing the full program.

At this point one might wonder “is it a good idea that a file can work without it specifying where do types and methods come from?” But here I’m not going to try to answer that question, which is mainly subjective, and instead try to focus on how we could improve Crystal as it is right now. Of course Ruby could be “improved” (made faster) in many ways if we changed it (like, disallow some dynamisms, force some type declarations, etc.) but Ruby is being improved while not changing the way it is, while keeping its philosophy. Ruby has a vision, and it’s being improved while keeping that vision. And I think that with Crystal we should strive to do the same.

So let’s discard for now the idea of being able to compile files individually. What else can we do?

One idea is to compile the entire program first, and then try to reuse that knowledge for subsequent compilations. It would work like this:

  1. We compile the entire program and build a dependencies graph: which files depend on which other files?
  2. We also remember the types used in methods? For example if add(x, y) was called with two ints, we remember that it returns an int.
  3. Next time we compile a program, if we find a call to add(x, y) with two ints, we check if the file where add was defined, or any of its dependencies (recursively) changed, and if not, we could avoid typing that method again as we would know that it returns an int.

Would that really work?

I created a branch that modifies the compiler to output a file dependencies graph for a given program. You can try it by checking out that branch, creating a compiler from it and then compiling any program. This will generate a file which you can then turn into a PDF by using this command:

Enter fullscreen mode Exit fullscreen mode

(you’ll need to install graphviz first)

Running it against this program:

Enter fullscreen mode Exit fullscreen mode

will produce this graph:

Incremental compilation for Crystal – Part 1

What a lovely tree-like graph! It’s very easy to see how no cycles exist here.

They say “beauty lies in the eyes of the beholder” but I hope we can all agree that the graph above is a mess.

Still, within all that mess, we can find this:

Image description defines puts, so it’s natural that dependes on it. But what’s interesting is that nobody depends on (naturally.) And that also means that if next time we only change then we can reuse everything else from the previous compilation.

This looks promising. But running the tool with some bigger programs I found something. Let’s take a look at this program:

Enter fullscreen mode Exit fullscreen mode

The graph near now looks like this:

Image description

Now has more dependencies, but something curious is that someone is now depending on Who?

We could play the “trace the line” game to find it, but looking at the dot file there’s this:

Enter fullscreen mode Exit fullscreen mode

The above means “src/” depends on “”. WAT? How can the file that defines IO depend on “”? It should be the other way around!

Debugging this a bit, we can see that the call the program makes is this one:

Enter fullscreen mode Exit fullscreen mode

Next we have that the definition of puts is:

Enter fullscreen mode Exit fullscreen mode

So that’s calling puts on an IO (STDOUT). Let’s take a look at that:

Enter fullscreen mode Exit fullscreen mode

This is calling IO#<<(obj) where obj is an instance of Foo. Let’s take a look at that << method:

Enter fullscreen mode Exit fullscreen mode

So this is calling obj.to_s(self), where obj is an instance of Foo. And where is that method defined? In! Bingo!

That’s why suddenly depends on

One way to understand this is to know that in Crystal all methods are monomorphized. Monomorwhat? It just means that if you call IO#puts with a type X, a method will be generated for that particular X. If you call it with a Y, a method will be generated for that particular Y (for that particular type.) In the example above, an IO#<<(obj) method was generated where obj is of type Foo.

This is one reason that, I think, explains how messy the graph above is.

In other compiled, statically typed programming languages, this doesn’t happen. Usually obj responds to some explicit interface and then the method lies behind a virtual table and a virtual dispatch. None of this happens in Crystal.

Does it matter that depends on Maybe it’s not a big deal. Well, it probably is. It means that changes to a file are likely to affect seemingly unrelated files. And so if we try to use the “reuse previous compilation” strategy, not a lot could be reused in the end.

This is not the end of the story. It’s just the end of this blog post. I didn’t continue thinking about how to approach this problem, but hopefully these posts spark some interest and some thoughts from others about how to solve this interesting problem.


Tag Cloud

Java Java Logical Programs OTP Generation in Java python Recursion youtube video ASCII Upper and Lower Case blockchain javascript graph learn to code software development Successful Software Engineers breadth first search Java Array Programs Java Programs Uncategorized android ios programming kotlin web-development django data sql cybersecurity database swiftui serverless aws swift rust react background-position gradients loader mask grid nth-child pseudo elements indieweb WordPress Print Array without brackets C++ factorial Java String Programs Final Keyword Static Variable Axie Infinity Cryptokitties NFT games tool inserting MISC Tips Codes python code python projects python3 system info python project Bigginers How to Do Integrations Payment Gateways PHP checkout page in php Implement stripe payment gateway in Step by step in PHP integrate stripe gatway in php mysql payment gateway integration in php step by step payment gateway integration in php step by step with source code payment gateway integration in website PHP Integrate Stripe Payment Gateway Tutorial PHP shopping cart checkout code shopping cart in php stripe php checkout PHP/MySQL/JSON best international payment gateway does google pay accept international payments how to accept international payments in india paytm payment gateway razorpay codeigniter github razorpay custom checkout github razorpay get payment details razorpay integration in codeigniter github razorpay international payments Razorpay payment gateway integration in CodeIgniter razorpay payment gateway integration in php code Razorpay payment gateway integration with PHP and CodeIgniter Razorpay payment gateway setup in CodeIgniter Library & Frameworks Tips & Tricks UI/UX & Front-end coding birds online html code for google sign in login with google account in PHP login with google account using javascript login with google account using javascript codeigniter login with google account using php login with google account using php source code
6 ways we improved our documentation in 2022 Previous post 6 ways we improved our documentation in 2022
#DevDiscuss: Monoliths vs. Microservices Next post #DevDiscuss: Monoliths vs. Microservices

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.