From aa36950904393728b2d5e75fb5bca7a25418c00f Mon Sep 17 00:00:00 2001 From: Philippe Waroquiers Date: Sat, 23 Nov 2019 11:08:12 +0100 Subject: [PATCH] Fix crashes due to python GIL released too early When running GDB tests under Valgrind, various tests are failing due to invalid memory access. Here is the stack trace reported by Valgrind, for gdb.base/freebpcmd.exp : ==18658== Invalid read of size 8 ==18658== at 0x7F9107: is_main (signalmodule.c:195) ==18658== by 0x7F9107: PyOS_InterruptOccurred (signalmodule.c:1730) ==18658== by 0x3696E2: check_quit_flag() (extension.c:829) ==18658== by 0x36980B: restore_active_ext_lang(active_ext_lang_state*) (extension.c:782) ==18658== by 0x48F617: gdbpy_enter::~gdbpy_enter() (python.c:235) ==18658== by 0x47BB71: add_thread_object(thread_info*) (object.h:470) ==18658== by 0x53A84D: operator() (std_function.h:687) ==18658== by 0x53A84D: notify (observable.h:106) ==18658== by 0x53A84D: add_thread_silent(ptid_t) (thread.c:311) ==18658== by 0x3CD954: inf_ptrace_target::create_inferior(char const*, std::__cxx11::basic_string, std::allocator > const& , char**, int) (inf-ptrace.c:139) ==18658== by 0x3FE644: linux_nat_target::create_inferior(char const*, std::__cxx11::basic_string, std::allocator > const&, char**, int) (linux-nat.c:1094) ==18658== by 0x3D5727: run_command_1(char const*, int, run_how) (infcmd.c:633) ==18658== by 0x2C05D1: cmd_func(cmd_list_element*, char const*, int) (cli-decode.c:1948) ==18658== by 0x53F29F: execute_command(char const*, int) (top.c:639) ==18658== by 0x3638EB: command_handler(char const*) (event-top.c:586) ==18658== by 0x36468C: command_line_handler(std::unique_ptr >&&) (event-top.c:771) ==18658== by 0x36407C: gdb_rl_callback_handler(char*) (event-top.c:217) ==18658== by 0x5B2A1F: rl_callback_read_char (callback.c:281) ==18658== by 0x36346D: gdb_rl_callback_read_char_wrapper_noexcept() (event-top.c:175) ==18658== by 0x363F70: gdb_rl_callback_read_char_wrapper(void*) (event-top.c:192) ==18658== by 0x3633AF: stdin_event_handler(int, void*) (event-top.c:514) ==18658== by 0x362504: gdb_wait_for_event (event-loop.c:857) ==18658== by 0x362504: gdb_wait_for_event(int) (event-loop.c:744) ==18658== by 0x362676: gdb_do_one_event() [clone .part.11] (event-loop.c:321) ==18658== by 0x3627AD: gdb_do_one_event (event-loop.c:303) ==18658== by 0x3627AD: start_event_loop() (event-loop.c:370) ==18658== by 0x41D35A: captured_command_loop() (main.c:381) ==18658== by 0x41F2A4: captured_main (main.c:1224) ==18658== by 0x41F2A4: gdb_main(captured_main_args*) (main.c:1239) ==18658== by 0x227D0A: main (gdb.c:32) ==18658== Address 0x10 is not stack'd, malloc'd or (recently) free'd The problem seems to be created by gdbpy_enter::~gdbpy_enter () releasing the GIL lock too early: ~gdbpy_enter () does: ... PyGILState_Release (m_state); python_gdbarch = m_gdbarch; python_language = m_language; restore_active_ext_lang (m_previous_active); } So, it releases the GIL lock, does 2 assignments and then leads to the following call sequence: restore_active_ext_lang => check_quit_flag => python.c gdbpy_check_quit_flag => PyOS_InterruptOccurred => is_main. is_main code is: static int is_main(_PyRuntimeState *runtime) { unsigned long thread = PyThread_get_thread_ident(); PyInterpreterState *interp = _PyRuntimeState_GetThreadState(runtime)->interp; return (thread == runtime->main_thread && interp == runtime->interpreters.main); } The macros and functions to access the thread state are documented as: /* Variable and macro for in-line access to current thread and interpreter state */ #define _PyRuntimeState_GetThreadState(runtime) \ ((PyThreadState*)_Py_atomic_load_relaxed(&(runtime)->gilstate.tstate_current)) /* Get the current Python thread state. Efficient macro reading directly the 'gilstate.tstate_current' atomic variable. The macro is unsafe: it does not check for error and it can return NULL. The caller must hold the GIL. See also PyThreadState_Get() and PyThreadState_GET(). */ #define _PyThreadState_GET() _PyRuntimeState_GetThreadState(&_PyRuntime) So, we see that GDB releases the GIL and then potentially calls _PyRuntimeState_GetThreadState that needs the GIL. It is not very clear why the problem is only observed when running under Valgrind. Probably caused by the slowdown due to Valgrind and/or to the 'single thread' scheduling by Valgrind. This patch fixes the crashes by releasing the GIT lock later. 2019-11-26 Philippe Waroquiers * python/python.c (gdbpy_enter::~gdbpy_enter): Release GIL after restore_active_ext_lang, as GIL is needed for (indirectly) called PyOS_InterruptOccurred. --- gdb/ChangeLog | 6 ++++++ gdb/python/python.c | 2 +- 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/gdb/ChangeLog b/gdb/ChangeLog index 79c09f95e7..b061e887f5 100644 --- a/gdb/ChangeLog +++ b/gdb/ChangeLog @@ -1,3 +1,9 @@ +2019-11-26 Philippe Waroquiers + + * python/python.c (gdbpy_enter::~gdbpy_enter): Release GIL after + restore_active_ext_lang, as GIL is needed for (indirectly) + called PyOS_InterruptOccurred. + 2019-11-26 Simon Marchi * sparc-nat.c (sparc_xfer_wcookie): Sync declaration with diff --git a/gdb/python/python.c b/gdb/python/python.c index 7b561a1c35..1cc5ebb7a7 100644 --- a/gdb/python/python.c +++ b/gdb/python/python.c @@ -228,11 +228,11 @@ gdbpy_enter::~gdbpy_enter () m_error->restore (); - PyGILState_Release (m_state); python_gdbarch = m_gdbarch; python_language = m_language; restore_active_ext_lang (m_previous_active); + PyGILState_Release (m_state); } /* Set the quit flag. */ -- 2.34.1