Cache build information between webpack processes

The current watch mode for webpack is fairly good at caching loader results and general workload between file changes, as long as you keep the same webpack process running. The initial build for some projects, on the other hand, can take more than a minute. This makes the build process very slow if you need to stop your watcher, if node crashes, or if you simply want to restart your connect server with different flags.

It would be nice if webpack were able to persist compilation information to the filesystem between invocations of the node process and reload that cache later, so that, ideally, performance between separate $ webpack calls would be close to how long --watch takes to recompile when it detects a change.

6 thoughts on “Cache build information between webpack processes”

  1. This could be very useful also for speeding up incremental builds in a continuous deployment setup.
    @sokra could you point us in the right direction if we want to help with this?

  2. I’ve been looking at this too. Here’s my current implementation: https://github.com/mzgoddard/webpack-cache-module-plugin

    I dove into webpack’s source while trying my hand at this. I realized the cyclical parts of the cache are caused by how Compilation handles built modules, primarily by storing Reasons that point back to the module that depends on the newly built one. If a cached module is used, the old Reasons are thrown away and new ones are computed during an iterative build.

    From reading some other plugins, I thought I might use this detail and hard-code the members of modules to be serialized. To deserialize them, I’d let webpack’s normalModuleFactory do a bunch of the lifting, and in the factory’s module plugin interface wrap the NormalModule in a proxy object that uses the specifically selected serialized members for the first run, provided the timestamps of its dependencies weren’t out of date.
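
    For illustration, here is a minimal sketch of that shape against the webpack 1/2 plugin API, simplified to copy hand-picked members (_source and buildTimestamp shown as examples) back onto the module instead of wrapping it in a full proxy. The cache map and the isFresh dependency-timestamp check are stand-ins supplied by the caller, not webpack APIs:

    function CachedMembersPlugin(cache, isFresh) {
        this.cache = cache;     // request -> { _source, buildTimestamp, ... }
        this.isFresh = isFresh; // (entry) -> boolean, dependency mtime check
    }

    CachedMembersPlugin.prototype.apply = function(compiler) {
        var cache = this.cache;
        var isFresh = this.isFresh;
        compiler.plugin('compilation', function(compilation, params) {
            // Let the normalModuleFactory do the heavy lifting of creating
            // the NormalModule, then restore selected members onto it.
            params.normalModuleFactory.plugin('module', function(module) {
                var entry = cache[module.request];
                if (entry && isFresh(entry)) {
                    module._source = entry._source;
                    module.buildTimestamp = entry.buildTimestamp;
                }
                return module;
            });
        });
    };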

    This works, at least in my generated sandbox. I think the takeaway either way is that when serializing or deserializing the cache, the modules should be treated as disconnected (

    webpack/lib/Module.js, lines 29 to 37 in f7d799a:

    Module.prototype.disconnect = function() {
        this.reasons.length = 0;
        this.lastId = this.id;
        this.id = null;
        this.index = null;
        this.index2 = null;
        this.chunks.length = 0;
        DependenciesBlock.prototype.disconnect.call(this);
    };

    ). As far as I can tell, Module#disconnect removes the circular references; it’s essentially the state the modules are in when they are first created, and the state Compilation wants when using a CachedModule.

    The other thing I noticed, and I’m not sure if it affects @bholloway‘s work but I think it does, is that even with these cached proxies around the normal modules I saw maybe a 10% improvement in performance. It wasn’t until after some more poking around and logging timestamps that I noticed the biggest consumer of time was resolving the locations of dependencies. Implementing a stored UnsafeCache (https://github.com/mzgoddard/webpack-cache-module-plugin/blob/0f09a55b01715c8e70dbc64098a220b6f5c67f68/lib/CachePlugin.js#L80-L97) gave me the performance improvement we’re all hoping for from a persistent cache. When Compilation processes modules for dependencies and recurses, building the depended-on modules, it has to resolve what each dependency is, or more specifically the module factory has to. I thought I knew how iterative builds avoided this, but looking back at webpack’s source I’m not as sure. Thinking about it now, I’d guess the CachedInputFileSystem is helping here once primed by the first run.

    Freezing and thawing the cache to disk is pretty straightforward. I think caching dependency resolutions, or possibly part of the CachedInputFileSystem, is where a large win may be waiting, but those are less straightforward. One thought: if we could list all attempted filepaths for each context and dependency pair, we might be able to watch or check those paths on start to invalidate path resolutions. Another thought is that the CachedInputFileSystem could help here. If we persisted its stats, readlink, and readdir info to disk and revalidated that on start, we could gain a similar impact to UnsafeCache but be … safer. I imagine, though, that revalidating info for the CachedInputFileSystem could take a lot of time, considering the number of paths involved in resolving dependencies and loaders.
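
    To make the freeze/thaw idea concrete, here is a minimal sketch of persisting resolutions, keyed by a context + request pair. The JSON file layout and the mtime-based revalidation are assumptions for illustration, not how UnsafeCache or CachedInputFileSystem actually store things:

    var fs = require('fs');

    // Thaw: read the cache file and keep only entries whose resolved
    // file still has the mtime we recorded when freezing.
    function thawResolutionCache(cachePath) {
        var entries;
        try {
            entries = JSON.parse(fs.readFileSync(cachePath, 'utf8'));
        } catch (e) {
            return {}; // no cache yet, start cold
        }
        var valid = {};
        Object.keys(entries).forEach(function(key) {
            var entry = entries[key]; // { resolved: '/abs/path', mtime: 123 }
            try {
                if (fs.statSync(entry.resolved).mtime.getTime() === entry.mtime) {
                    valid[key] = entry.resolved;
                }
            } catch (e) { /* resolved file is gone; drop the entry */ }
        });
        return valid;
    }

    // Freeze: record each resolution along with the resolved file's mtime.
    function freezeResolutionCache(cachePath, resolutions) {
        var entries = {};
        Object.keys(resolutions).forEach(function(key) {
            var resolved = resolutions[key];
            entries[key] = {
                resolved: resolved,
                mtime: fs.statSync(resolved).mtime.getTime()
            };
        });
        fs.writeFileSync(cachePath, JSON.stringify(entries));
    }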

    To reiterate, from what I’ve found: the Reason and Chunk objects (and the others removed in disconnect) stored on Modules are the circular references; they are not wanted during iterative compilations, so they can be safely ignored. Resolving dependencies is very elaborate and takes a lot of time; to approach iterative build times, somehow safely caching dependency path resolutions or the CachedInputFileSystem is likely a needed step.

    Hope this helps.

  3. I think flame graphs could give us better insights into where so much time is spent. I’ll try to get some from my current project and post instructions on how to get them (in case you don’t know how already ^^).

    We talked about this issue at our last weekly meeting. We’re planning to deprecate all parts of the loader APIs that make parallel compilations impossible (namely the sync APIs and _compiler/_compilation). However, in order to deprecate these we need to provide better alternatives for loaders that had to access these internal objects (namely TypeScript loaders, but probably some others too).

    @sokra had the idea of a loader API that allows hooking into different compilation states. Loaders would thus be more like plugins, but still on a per-file basis. He wanted to create a proposal so that we can discuss it with other loader authors.

    Since webpack@2 already uses a separate module to execute loaders, we could write a parallel-loader which spawns multiple processes (see our meeting notes for details). The parallel-loader would need to provide a webpack-like loader context and handle all the communication in the background. I think it’s a good idea to push this into user-space instead of embedding it into webpack core. This way we can keep the core simple, and parallel compilation is probably not always desired due to the costs of spawning a new process.
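
    As a rough illustration of that shape (not the actual proposal), a parallel-loader’s entry point could hand each file to a forked worker over IPC. The ./loader-worker script and the message format here are invented, and mirroring the full loader context across processes is the hard part this sketch skips:

    var fork = require('child_process').fork;

    // Hypothetical worker script that performs the actual transform.
    var worker = fork(require.resolve('./loader-worker'));
    var pending = {};
    var nextId = 0;

    worker.on('message', function(msg) {
        var callback = pending[msg.id];
        delete pending[msg.id];
        callback(msg.error ? new Error(msg.error) : null, msg.result);
    });

    module.exports = function(source) {
        // webpack's async loader API lets this module's build resume
        // once the worker process replies.
        var callback = this.async();
        var id = nextId++;
        pending[id] = callback;
        worker.send({ id: id, resource: this.resourcePath, source: source });
    };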

  4. @abergs: run your webpack under node’s inspector (with a command like the following):

    node --debug-brk --inspect webpack
    

    Then point your browser at the URL that command prints (make sure you’re on node 6+, preferably 7).

    In the browser page, find the Profiles panel and click “Start Profile”, then let the debugger continue (you may need to go back to the Sources panel to do this). Once the run is complete, go back to the Profiles panel and click “Stop Profile”; you’ll then find the collected profile in the list on the left. That profile will contain the flame graph you’re after.

  5. That’s what you get when legacy code moves to webpack, lol. It’s gotten slightly smaller now, but for the longest time we had a massive hairball chunk about 50 MB in size, for which I had to write a custom plugin emulating @sokra‘s aggressive splitting plugin to split it into slices that get loaded in parallel. Good times… 😂

  6. I was running some tests on how hard-source-webpack-plugin could be added to the Angular CLI pipeline, and the results seem promising.

    It does seem fairly sensitive to Webpack internals, though. Since it relies on serializing Webpack sources, any change to them can break the plugin.

    This unfortunately means that it will necessarily lag behind the latest Webpack stable until its changes are integrated into hard-source-webpack-plugin. Webpack 4 also seems to bring a fair number of performance improvements, which will probably affect sources and thus need to be integrated into the plugin.

    This makes me think that the only way a plugin such as hard-source-webpack-plugin could be stable is if it’s integrated directly into Webpack, or has access to a stable API (e.g. sources supporting serialization). Maybe the second option would provide plugin authors with more options.
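
    For reference, wiring the plugin into a plain webpack config looked like this (its documented usage at the time); how the Angular CLI pipeline would expose a hook for adding it is the open question:

    var HardSourceWebpackPlugin = require('hard-source-webpack-plugin');

    module.exports = {
        // ...entry, output, and module rules as usual...
        plugins: [
            // Caches module build output on disk between runs.
            new HardSourceWebpackPlugin()
        ]
    };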
