Analysis server hangs with Linux 5.5 and IntelliJ based IDEs

After upgrading to Linux 5.5 (I’m on Linux 5.5.2-1-MANJARO), analysis in IntelliJ stops working after a while (analysis errors view doesn’t update anymore, formatting times out, no autocomplete, …).
The analyzer diagnostics page is still reachable after that happens, but it appears that the analysis server just stops serving regular requests. Interestingly, I can’t reproduce this with VS Code. I tried deleting ~/.dartServer/, but that didn’t fix the problem.

I first thought this was an IntelliJ problem, but flutter/flutter#49185 (comment) makes me think that this is analyzer-related. I wanted to open another issue here because, for me, this also happens on the non-Flutter Dart SDK (both stable & latest dev).

I can reproduce this consistently and on different projects, so I’d be glad to provide more information if that’s necessary.

Author: Fantashit

3 thoughts on “Analysis server hangs with Linux 5.5 and IntelliJ based IDEs”

  1. The following simple program reproduces the hang on a compute instance with a 5.5.x Linux kernel.

    import 'dart:convert';
    import 'dart:io' as io;

    void main(List<String> args) async {
      if (args.length != 1) {
        print('Usage: run.dart child|parent');
        return;
      }
      if (args[0] == 'child') {
        // Child: write many lines to stdout, which the parent reads via a pipe.
        for (int i = 0; i < 30000; i++) {
          print('line $i');
        }
        print('done');
        return;
      } else if (args[0] == 'parent') {
        // Parent: re-run this script as a child and echo everything it writes.
        final p = await io.Process.start(
            io.Platform.executable, [io.Platform.script.toFilePath(), 'child']);
        p.stdout.transform(utf8.decoder).listen((x) => print('stdout: $x'));
        p.stderr.transform(utf8.decoder).listen((x) => print('stderr: $x'));
        // Wait for the child to exit.
        final exitCode = await p.exitCode;
        print('process exited with $exitCode');
      }
    }

    Run it with:

    $ dart run.dart parent

    The parent process then hangs.

    Based on my cursory analysis (I have never looked at this part of the code before), I think this is a bug in our code: we don’t seem to be using epoll correctly. We use edge-triggered mode for file descriptors (except server sockets). This mode comes with some warnings in the man pages. Namely, it warns that if you use EPOLLET you should only call epoll_wait after you have received EAGAIN from read/write, and I don’t see us following this rule. We drain some available amount of bytes from the file descriptor, but we don’t drain it until we hit EAGAIN. We also don’t update our estimate of the available bytes while we are draining, so if more bytes arrive while we are reading, we ignore them, which does not work with edge-triggered mode.

    I am not sure why this problem only surfaces now, but I see that there were some changes to the kernel around epoll, which might have exposed it.

    We should either stop using ET mode or we should fix our code to follow man page guidelines.

    Assigning to @zichangg for actual implementation work.
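
The EPOLLET rule from the man page that the comment above refers to can be illustrated with a small, self-contained sketch. It is written in Python (Linux only) purely to show the syscall semantics; it is not the Dart VM's event handler code, and the names `drain` and `demo` are hypothetical helpers invented for this illustration. The sketch registers the read end of a pipe in edge-triggered mode, reads only part of the pending data, and then drains the rest until EAGAIN (surfaced in Python as `BlockingIOError`).

```python
import os
import select


def drain(fd, chunk_size=4096):
    """Read from a non-blocking fd until EAGAIN or EOF.

    Returns (data, eof). Only after EAGAIN is it safe to go back to
    epoll_wait when the fd is registered with EPOLLET.
    """
    chunks = []
    while True:
        try:
            chunk = os.read(fd, chunk_size)
        except BlockingIOError:
            # EAGAIN: the buffer is empty, so the next write is a new edge.
            return b"".join(chunks), False
        if not chunk:
            # A zero-length read means the writer closed its end.
            return b"".join(chunks), True
        chunks.append(chunk)


def demo():
    r, w = os.pipe()
    os.set_blocking(r, False)
    ep = select.epoll()
    try:
        ep.register(r, select.EPOLLIN | select.EPOLLET)
        os.write(w, b"x" * 10000)

        # The write made the pipe readable, so one event is reported.
        first = ep.poll(1.0)

        # Partial read, as in the bug description: 9900 bytes stay buffered.
        partial = os.read(r, 100)

        # No new write happened, so edge-triggered mode typically reports
        # nothing here even though data is still pending. With an infinite
        # timeout, this is where a reader would hang.
        second = ep.poll(0.2)

        # Draining until EAGAIN recovers the remaining bytes.
        rest, eof = drain(r)
        return first, second, len(partial), len(rest), eof
    finally:
        ep.close()
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    first, second, partial_len, rest_len, eof = demo()
    print("events after write:", first)
    print("events after partial read:", second)
    print("partial:", partial_len, "drained:", rest_len, "eof:", eof)
```

The alternative mentioned in the comment, dropping ET mode, would correspond to registering with `select.EPOLLIN` alone in this sketch; in level-triggered mode the second `poll` keeps reporting the descriptor as long as unread data remains.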
