Fix crashes due to python GIL released too early
authorPhilippe Waroquiers <philippe.waroquiers@skynet.be>
Sat, 23 Nov 2019 10:08:12 +0000 (11:08 +0100)
committerPhilippe Waroquiers <philippe.waroquiers@skynet.be>
Tue, 26 Nov 2019 20:01:58 +0000 (21:01 +0100)
When running GDB tests under Valgrind, various tests are failing due
to invalid memory access.
Here is the stack trace reported by Valgrind, for gdb.base/freebpcmd.exp :
  ==18658== Invalid read of size 8
  ==18658==    at 0x7F9107: is_main (signalmodule.c:195)
  ==18658==    by 0x7F9107: PyOS_InterruptOccurred (signalmodule.c:1730)
  ==18658==    by 0x3696E2: check_quit_flag() (extension.c:829)
  ==18658==    by 0x36980B: restore_active_ext_lang(active_ext_lang_state*) (extension.c:782)
  ==18658==    by 0x48F617: gdbpy_enter::~gdbpy_enter() (python.c:235)
  ==18658==    by 0x47BB71: add_thread_object(thread_info*) (object.h:470)
  ==18658==    by 0x53A84D: operator() (std_function.h:687)
  ==18658==    by 0x53A84D: notify (observable.h:106)
  ==18658==    by 0x53A84D: add_thread_silent(ptid_t) (thread.c:311)
  ==18658==    by 0x3CD954: inf_ptrace_target::create_inferior(char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&
  , char**, int) (inf-ptrace.c:139)
  ==18658==    by 0x3FE644: linux_nat_target::create_inferior(char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&,
   char**, int) (linux-nat.c:1094)
  ==18658==    by 0x3D5727: run_command_1(char const*, int, run_how) (infcmd.c:633)
  ==18658==    by 0x2C05D1: cmd_func(cmd_list_element*, char const*, int) (cli-decode.c:1948)
  ==18658==    by 0x53F29F: execute_command(char const*, int) (top.c:639)
  ==18658==    by 0x3638EB: command_handler(char const*) (event-top.c:586)
  ==18658==    by 0x36468C: command_line_handler(std::unique_ptr<char, gdb::xfree_deleter<char> >&&) (event-top.c:771)
  ==18658==    by 0x36407C: gdb_rl_callback_handler(char*) (event-top.c:217)
  ==18658==    by 0x5B2A1F: rl_callback_read_char (callback.c:281)
  ==18658==    by 0x36346D: gdb_rl_callback_read_char_wrapper_noexcept() (event-top.c:175)
  ==18658==    by 0x363F70: gdb_rl_callback_read_char_wrapper(void*) (event-top.c:192)
  ==18658==    by 0x3633AF: stdin_event_handler(int, void*) (event-top.c:514)
  ==18658==    by 0x362504: gdb_wait_for_event (event-loop.c:857)
  ==18658==    by 0x362504: gdb_wait_for_event(int) (event-loop.c:744)
  ==18658==    by 0x362676: gdb_do_one_event() [clone .part.11] (event-loop.c:321)
  ==18658==    by 0x3627AD: gdb_do_one_event (event-loop.c:303)
  ==18658==    by 0x3627AD: start_event_loop() (event-loop.c:370)
  ==18658==    by 0x41D35A: captured_command_loop() (main.c:381)
  ==18658==    by 0x41F2A4: captured_main (main.c:1224)
  ==18658==    by 0x41F2A4: gdb_main(captured_main_args*) (main.c:1239)
  ==18658==    by 0x227D0A: main (gdb.c:32)
  ==18658==  Address 0x10 is not stack'd, malloc'd or (recently) free'd

The problem seems to be created by gdbpy_enter::~gdbpy_enter () releasing the GIL lock
too early:
~gdbpy_enter () does:
      ...
      PyGILState_Release (m_state);
      python_gdbarch = m_gdbarch;
      python_language = m_language;

      restore_active_ext_lang (m_previous_active);
    }

So, it releases the GIL lock, does 2 assignments and then leads to the following
call sequence:
  restore_active_ext_lang => check_quit_flag => python.c gdbpy_check_quit_flag
     => PyOS_InterruptOccurred => is_main.
is_main code is:
    static int
    is_main(_PyRuntimeState *runtime)
    {
        unsigned long thread = PyThread_get_thread_ident();
        PyInterpreterState *interp = _PyRuntimeState_GetThreadState(runtime)->interp;
        return (thread == runtime->main_thread
                && interp == runtime->interpreters.main);
    }

The macros and functions to access the thread state are documented as:
    /* Variable and macro for in-line access to current thread
       and interpreter state */

    #define _PyRuntimeState_GetThreadState(runtime) \
        ((PyThreadState*)_Py_atomic_load_relaxed(&(runtime)->gilstate.tstate_current))

    /* Get the current Python thread state.

       Efficient macro reading directly the 'gilstate.tstate_current' atomic
       variable. The macro is unsafe: it does not check for error and it can
       return NULL.

       The caller must hold the GIL.

       See also PyThreadState_Get() and PyThreadState_GET(). */
    #define _PyThreadState_GET() _PyRuntimeState_GetThreadState(&_PyRuntime)

So, we see that GDB releases the GIL and then potentially calls
_PyRuntimeState_GetThreadState that needs the GIL.

It is not very clear why the problem is only observed when running under
Valgrind.  Probably caused by the slowdown due to Valgrind and/or to the 'single
thread' scheduling by Valgrind.

This patch fixes the crashes by releasing the GIT lock later.

2019-11-26  Philippe Waroquiers  <philippe.waroquiers@skynet.be>

* python/python.c (gdbpy_enter::~gdbpy_enter): Release GIL after
restore_active_ext_lang, as GIL is needed for (indirectly)
called PyOS_InterruptOccurred.

gdb/ChangeLog
gdb/python/python.c

index 79c09f95e7f9353dc48640deeb17e0ff1d37c3a6..b061e887f5716dda586e6cd19c10a49a56126f7c 100644 (file)
@@ -1,3 +1,9 @@
+2019-11-26  Philippe Waroquiers  <philippe.waroquiers@skynet.be>
+
+       * python/python.c (gdbpy_enter::~gdbpy_enter): Release GIL after
+       restore_active_ext_lang, as GIL is needed for (indirectly)
+       called PyOS_InterruptOccurred.
+
 2019-11-26  Simon Marchi  <simon.marchi@efficios.com>
 
        * sparc-nat.c (sparc_xfer_wcookie): Sync declaration with
index 7b561a1c35f099f0e0d5a2d83f6e3d487e4a4319..1cc5ebb7a733c83e03645cb00475c4b4c8c778b0 100644 (file)
@@ -228,11 +228,11 @@ gdbpy_enter::~gdbpy_enter ()
 
   m_error->restore ();
 
-  PyGILState_Release (m_state);
   python_gdbarch = m_gdbarch;
   python_language = m_language;
 
   restore_active_ext_lang (m_previous_active);
+  PyGILState_Release (m_state);
 }
 
 /* Set the quit flag.  */
This page took 0.031064 seconds and 4 git commands to generate.