git.efficios.com Git - deliverable/linux.git/log

projects / deliverable / linux.git / log

Mathieu Desnoyers [Mon, 31 Oct 2016 15:44:30 +0000 (11:44 -0400)]

Restartable sequences: tests: introduce simple rseq start/finish

Introduce rseq_start/rseq_finish that do not require the rseq lock. The
new rseq_start_rlock and rseq_finish_rlock deal with it.

This is useful for use-cases that do not require locking to handle
fallback. For instance, using split counter techniques for fallback when
possibe, or using reference counting to do a lock-free fallback.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

commit | commitdiff | tree

Mathieu Desnoyers [Sun, 16 Oct 2016 06:22:56 +0000 (08:22 +0200)]

Restartable sequences: don't clear rseq_cs after c.s.

We can remove this rseq_cs=NULL clearing step at the end of the critical
section, since the kernel checks for lower and upper address ranges when
it preempts or delivers a signals. The kernel will either take care of
clearing it at the next preemption or signal delivery, or the next
critical section will either change its value, or set it to the same
value.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

commit | commitdiff | tree

Mathieu Desnoyers [Sun, 16 Oct 2016 08:37:05 +0000 (10:37 +0200)]

Fix: Restartable sequences: tests: powerpc 32-bit use stw

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

commit | commitdiff | tree

Mathieu Desnoyers [Sun, 16 Oct 2016 06:16:25 +0000 (08:16 +0200)]

Restartable sequences: clear rseq_cs even if non-nested

We can speed up the user-space fast path by removing the final
"rseq_cs=NULL" assignment at the end of the critical section, since we
check for both upper and lower bounds anyway in the fixup code.

Take care of setting rseq_cs back to NULL even if we are not nested over
an actual rseq critical section.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

commit | commitdiff | tree

Mathieu Desnoyers [Tue, 20 Sep 2016 15:41:31 +0000 (11:41 -0400)]

Fix: rseq: arm branch to failure

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

commit | commitdiff | tree

Mathieu Desnoyers [Mon, 5 Sep 2016 22:20:26 +0000 (18:20 -0400)]

Fix: Restartable Sequences: testing: ip-relative x86-64

Allow rseq to be used in PIC code (library code) on x86-64.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

commit | commitdiff | tree

Mathieu Desnoyers [Mon, 30 May 2016 07:46:53 +0000 (09:46 +0200)]

Restartable sequences: self-tests

Implements two basic tests of RSEQ functionality, and one more
exhaustive parameterizable test.

The first, "basic_test" only asserts that RSEQ works moderately
correctly.
E.g. that:
- The CPUID pointer works
- Code infinitely looping within a critical section will eventually be
  interrupted.
- Critical sections are interrupted by signals.

"basic_percpu_ops_test" is a slightly more "realistic" variant,
implementing a few simple per-cpu operations and testing their
correctness.

"param_test" is a parametrizable restartable sequences test. See
the "--help" output for usage.

As part of those tests, a helper library "rseq" implements a user-space
API around restartable sequences. It takes care of ensuring progress in
case of debugger single-stepping with a fall-back to locking, and
exposes the instruction pointer addresses where the rseq assembly blocks
begin and end, as well as the associated abort instruction pointer, in
the __rseq_table section. This section allows debuggers may know where
to place breakpoints when single-stepping through assembly blocks which
may be aborted at any point by the kernel.

The following rseq APIs are implemented in this helper library:
- do_rseq(): Restartable sequence made of zero or more loads, completed
  by a word-sized store,
- do_rseq2(): Restartable sequence made of zero or more loads, one
  speculative word-sized store, completed by a word-sized store,
- do_rseq_memcpy(): Restartable sequence made of zero or more loads,
  a speculative copy of a variable length memory region, completed by a
  word-sized store.

PowerPC tests have been implemented by Boqun Feng.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-api@vger.kernel.org

commit | commitdiff | tree

Boqun Feng [Wed, 27 Jul 2016 15:05:16 +0000 (23:05 +0800)]

Restartable sequences: Wire up powerpc system call

Wire up the rseq system call on powerpc.

This provides an ABI improving the speed of a user-space getcpu
operation on powerpc by skipping the getcpu system call on the fast
path, as well as improving the speed of user-space operations on per-cpu
data compared to using load-reservation/store-conditional atomics.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

commit | commitdiff | tree

Boqun Feng [Wed, 27 Jul 2016 15:05:15 +0000 (23:05 +0800)]

Restartable sequences: powerpc architecture support

Call the rseq_handle_notify_resume() function on return to userspace if
TIF_NOTIFY_RESUME thread flag is set.

Increment the event counter and perform fixup on the pre-signal when a
signal is delivered on top of a restartable sequence critical section.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

commit | commitdiff | tree

Mathieu Desnoyers [Wed, 27 Jan 2016 22:13:45 +0000 (17:13 -0500)]

Restartable sequences: wire up x86 32/64 system call

Wire up the rseq system call on x86 32/64.

This provides an ABI improving the speed of a user-space getcpu
operation on x86 by removing the need to perform a function call, "lsl"
instruction, or system call on the fast path, as well as improving the
speed of user-space operations on per-cpu data.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-api@vger.kernel.org

commit | commitdiff | tree

Mathieu Desnoyers [Wed, 27 Jan 2016 22:12:07 +0000 (17:12 -0500)]

Restartable sequences: x86 32/64 architecture support

Call the rseq_handle_notify_resume() function on return to userspace if
TIF_NOTIFY_RESUME thread flag is set.

Increment the event counter and perform fixup on the pre-signal frame
when a signal is delivered on top of a restartable sequence critical
section.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-api@vger.kernel.org

commit | commitdiff | tree

Mathieu Desnoyers [Wed, 27 Jan 2016 22:10:44 +0000 (17:10 -0500)]

Restartable sequences: wire up ARM 32 system call

Wire up the rseq system call on 32-bit ARM.

This provides an ABI improving the speed of a user-space getcpu
operation on ARM by skipping the getcpu system call on the fast path, as
well as improving the speed of user-space operations on per-cpu data
compared to using load-linked/store-conditional.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-api@vger.kernel.org

commit | commitdiff | tree

Mathieu Desnoyers [Wed, 27 Jan 2016 22:09:24 +0000 (17:09 -0500)]

Restartable sequences: ARM 32 architecture support

Call the rseq_handle_notify_resume() function on return to
userspace if TIF_NOTIFY_RESUME thread flag is set.

Increment the event counter and perform fixup on the pre-signal frame
when a signal is delivered on top of a restartable sequence critical
section.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-api@vger.kernel.org

commit | commitdiff | tree

Mathieu Desnoyers [Mon, 6 Jun 2016 13:37:36 +0000 (09:37 -0400)]

tracing: instrument restartable sequences

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Michael Kerrisk <mtk.manpages@gmail.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-api@vger.kernel.org

commit | commitdiff | tree

Mathieu Desnoyers [Tue, 22 Dec 2015 13:29:56 +0000 (08:29 -0500)]

Restartable sequences system call (v8)

Expose a new system call allowing each thread to register one userspace
memory area to be used as an ABI between kernel and user-space for two
purposes: user-space restartable sequences and quick access to read the
current CPU number value from user-space.

* Restartable sequences (per-cpu atomics)

Restartables sequences allow user-space to perform update operations on
per-cpu data without requiring heavy-weight atomic operations.

The restartable critical sections (percpu atomics) work has been started
by Paul Turner and Andrew Hunter. It lets the kernel handle restart of
critical sections. [1] [2] The re-implementation proposed here brings a
few simplifications to the ABI which facilitates porting to other
architectures and speeds up the user-space fast path. A locking-based
fall-back, purely implemented in user-space, is proposed here to deal
with debugger single-stepping. This fallback interacts with rseq_start()
and rseq_finish(), which force retries in response to concurrent
lock-based activity.

Here are benchmarks of counter increment in various scenarios compared
to restartable sequences:

ARMv7 Processor rev 4 (v7l)
Machine model: Cubietruck

                      Counter increment speed (ns/increment)
                             1 thread    2 threads
global increment (baseline)      6           N/A
percpu rseq increment           50            52
percpu rseq spinlock            94            94
global atomic increment         48            74 (__sync_add_and_fetch_4)
global atomic CAS               50           172 (__sync_val_compare_and_swap_4)
global pthread mutex           148           862

ARMv7 Processor rev 10 (v7l)
Machine model: Wandboard

                      Counter increment speed (ns/increment)
                             1 thread    4 threads
global increment (baseline)      7           N/A
percpu rseq increment           50            50
percpu rseq spinlock            82            84
global atomic increment         44           262 (__sync_add_and_fetch_4)
global atomic CAS               46           316 (__sync_val_compare_and_swap_4)
global pthread mutex           146          1400

x86-64 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz:

                      Counter increment speed (ns/increment)
                              1 thread           8 threads
global increment (baseline)      3.0                N/A
percpu rseq increment            3.6                3.8
percpu rseq spinlock             5.6                6.2
global LOCK; inc                 8.0              166.4
global LOCK; cmpxchg            13.4              435.2
global pthread mutex            25.2             1363.6

* Reading the current CPU number

Speeding up reading the current CPU number on which the caller thread is
running is done by keeping the current CPU number up do date within the
cpu_id field of the memory area registered by the thread. This is done
by making scheduler preemption set the TIF_NOTIFY_RESUME flag on the
current thread. Upon return to user-space, a notify-resume handler
updates the current CPU value within the registered user-space memory
area. User-space can then read the current CPU number directly from
memory.

Keeping the current cpu id in a memory area shared between kernel and
user-space is an improvement over current mechanisms available to read
the current CPU number, which has the following benefits over
alternative approaches:

- 35x speedup on ARM vs system call through glibc
- 20x speedup on x86 compared to calling glibc, which calls vdso
  executing a "lsl" instruction,
- 14x speedup on x86 compared to inlined "lsl" instruction,
- Unlike vdso approaches, this cpu_id value can be read from an inline
  assembly, which makes it a useful building block for restartable
  sequences.
- The approach of reading the cpu id through memory mapping shared
  between kernel and user-space is portable (e.g. ARM), which is not the
  case for the lsl-based x86 vdso.

On x86, yet another possible approach would be to use the gs segment
selector to point to user-space per-cpu data. This approach performs
similarly to the cpu id cache, but it has two disadvantages: it is
not portable, and it is incompatible with existing applications already
using the gs segment selector for other purposes.

Benchmarking various approaches for reading the current CPU number:

ARMv7 Processor rev 4 (v7l)
Machine model: Cubietruck
- Baseline (empty loop):                                    8.4 ns
- Read CPU from rseq cpu_id:                               16.7 ns
- Read CPU from rseq cpu_id (lazy register):               19.8 ns
- glibc 2.19-0ubuntu6.6 getcpu:                           301.8 ns
- getcpu system call:                                     234.9 ns

x86-64 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz:
- Baseline (empty loop):                                    0.8 ns
- Read CPU from rseq cpu_id:                                0.8 ns
- Read CPU from rseq cpu_id (lazy register):                0.8 ns
- Read using gs segment selector:                           0.8 ns
- "lsl" inline assembly:                                   13.0 ns
- glibc 2.19-0ubuntu6 getcpu:                              16.6 ns
- getcpu system call:                                      53.9 ns

- Speed

Running 10 runs of hackbench -l 100000 seems to indicate, contrary to
expectations, that enabling CONFIG_RSEQ slightly accelerates the
scheduler:

Configuration: 2 sockets * 8-core Intel(R) Xeon(R) CPU E5-2630 v3 @
2.40GHz (directly on hardware, hyperthreading disabled in BIOS, energy
saving disabled in BIOS, turboboost disabled in BIOS, cpuidle.off=1
kernel parameter), with a Linux v4.6 defconfig+localyesconfig,
restartable sequences series applied.

* CONFIG_RSEQ=n

avg.:      41.37 s
std.dev.:   0.36 s

* CONFIG_RSEQ=y

avg.:      40.46 s
std.dev.:   0.33 s

- Size

On x86-64, between CONFIG_RSEQ=n/y, the text size increase of vmlinux is
2855 bytes, and the data size increase of vmlinux is 1024 bytes.

* CONFIG_RSEQ=n

   text    data     bss     dec     hex filename
9964559 4256280 962560 15183399 e7ae27 vmlinux.norseq

* CONFIG_RSEQ=y

   text    data     bss     dec     hex filename
9967414 4257304 962560 15187278 e7bd4e vmlinux.rseq

[1] https://lwn.net/Articles/650333/
[2] http://www.linuxplumbersconf.org/2013/ocw/system/presentations/1695/original/LPC%20-%20PerCpu%20Atomics.pdf

Link: http://lkml.kernel.org/r/20151027235635.16059.11630.stgit@pjt-glaptop.roam.corp.google.com
Link: http://lkml.kernel.org/r/20150624222609.6116.86035.stgit@kitami.mtv.corp.google.com
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Michael Kerrisk <mtk.manpages@gmail.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-api@vger.kernel.org
---

Changes since v1:
- Return -1, errno=EINVAL if cpu_cache pointer is not aligned on
  sizeof(int32_t).
- Update man page to describe the pointer alignement requirements and
  update atomicity guarantees.
- Add MAINTAINERS file GETCPU_CACHE entry.
- Remove dynamic memory allocation: go back to having a single
  getcpu_cache entry per thread. Update documentation accordingly.
- Rebased on Linux 4.4.

Changes since v2:
- Introduce a "cmd" argument, along with an enum with GETCPU_CACHE_GET
  and GETCPU_CACHE_SET. Introduce a uapi header linux/getcpu_cache.h
  defining this enumeration.
- Split resume notifier architecture implementation from the system call
  wire up in the following arch-specific patches.
- Man pages updates.
- Handle 32-bit compat pointers.
- Simplify handling of getcpu_cache GETCPU_CACHE_SET compiler barrier:
  set the current cpu cache pointer before doing the cache update, and
  set it back to NULL if the update fails. Setting it back to NULL on
  error ensures that no resume notifier will trigger a SIGSEGV if a
  migration happened concurrently.

Changes since v3:
- Fix __user annotations in compat code,
- Update memory ordering comments.
- Rebased on kernel v4.5-rc5.

Changes since v4:
- Inline getcpu_cache_fork, getcpu_cache_execve, and getcpu_cache_exit.
- Add new line between if() and switch() to improve readability.
- Added sched switch benchmarks (hackbench) and size overhead comparison
  to change log.

Changes since v5:
- Rename "getcpu_cache" to "thread_local_abi", allowing to extend
  this system call to cover future features such as restartable critical
  sections. Generalizing this system call ensures that we can add
  features similar to the cpu_id field within the same cache-line
  without having to track one pointer per feature within the task
  struct.
- Add a tlabi_nr parameter to the system call, thus allowing to extend
  the ABI beyond the initial 64-byte structure by registering structures
  with tlabi_nr greater than 0. The initial ABI structure is associated
  with tlabi_nr 0.
- Rebased on kernel v4.5.

Changes since v6:
- Integrate "restartable sequences" v2 patchset from Paul Turner.
- Add handling of single-stepping purely in user-space, with a
  fallback to locking after 2 rseq failures to ensure progress, and
  by exposing a __rseq_table section to debuggers so they know where
  to put breakpoints when dealing with rseq assembly blocks which
  can be aborted at any point.
- make the code and ABI generic: porting the kernel implementation
  simply requires to wire up the signal handler and return to user-space
  hooks, and allocate the syscall number.
- extend testing with a fully configurable test program. See
  param_spinlock_test -h for details.
- handling of rseq ENOSYS in user-space, also with a fallback
  to locking.
- modify Paul Turner's rseq ABI to only require a single TLS store on
  the user-space fast-path, removing the need to populate two additional
  registers. This is made possible by introducing struct rseq_cs into
  the ABI to describe a critical section start_ip, post_commit_ip, and
  abort_ip.
- Rebased on kernel v4.7-rc7.

Changes since v7:
- Documentation updates.
- Integrated powerpc architecture support.
- Compare rseq critical section start_ip, allows shriking the user-space
  fast-path code size.
- Added Peter Zijlstra, Paul E. McKenney and Boqun Feng as
  co-maintainers.
- Added do_rseq2 and do_rseq_memcpy to test program helper library.
- Code cleanup based on review from Peter Zijlstra, Andy Lutomirski and
  Boqun Feng.
- Rebase on kernel v4.8-rc2.

Man page associated:

RSEQ(2)                 Linux Programmer's Manual                 RSEQ(2)

NAME
       rseq - Restartable sequences and cpu number cache

SYNOPSIS
       #include <linux/rseq.h>

       int rseq(struct rseq * rseq, int flags);

DESCRIPTION
       The  rseq()  ABI accelerates user-space operations on per-cpu data
       by defining a shared data structure ABI  between  each  user-space
       thread and the kernel.

       It  allows user-space to perform update operations on per-cpu data
       without requiring heavy-weight atomic operations.

       Restartable sequences are atomic with respect to preemption  (mak‐
       ing  it  atomic  with respect to other threads running on the same
       CPU), as well as signal delivery  (user-space  execution  contexts
       nested over the same thread).

       It is suited for update operations on per-cpu data.

       It  can be used on data structures shared between threads within a
       process, and on data structures shared between threads across dif‐
       ferent processes.

       Some examples of operations that can be accelerated by this ABI:

       · Querying the current CPU number,

       · Incrementing per-CPU counters,

       · Modifying data protected by per-CPU spinlocks,

       · Inserting/removing elements in per-CPU linked-lists,

       · Writing/reading per-CPU ring buffers content.

       The  rseq argument is a pointer to the thread-local rseq structure
       to be shared between kernel and  user-space.  A  NULL  rseq  value
       unregisters the current thread rseq structure.

       The layout of struct rseq is as follows:

       Structure alignment
              This structure is aligned on multiples of 128 bytes.

       Structure size
              This structure has a fixed size of 128 bytes.

       Fields

           cpu_id
              Cache of the CPU number on which the current thread is run‐
              ning.

           event_counter
              Counter guaranteed  to  be  incremented  when  the  current
              thread  is  preempted  or when a signal is delivered to the
              current thread.

           rseq_cs
              The rseq_cs field is a pointer to a struct rseq_cs.  Is  is
              NULL when no rseq assembly block critical section is active
              for the current thread.  Setting it to point to a  critical
              section  descriptor (struct rseq_cs) marks the beginning of
              the critical section. It is cleared after the  end  of  the
              critical section.

       The layout of struct rseq_cs is as follows:

       Structure alignment
              This structure is aligned on multiples of 256 bytes.

       Structure size
              This structure has a fixed size of 256 bytes.

       Fields

           start_ip
              Instruction pointer address of the first instruction of the
              sequence of consecutive assembly instructions.

           post_commit_ip
              Instruction pointer address after the last  instruction  of
              the sequence of consecutive assembly instructions.

           abort_ip
              Instruction  pointer  address  where  to move the execution
              flow in case of abort of the sequence of consecutive assem‐
              bly instructions.

       Upon registration, the flags argument is currently unused and must
       be specified as 0. Upon unregistration, the flags argument can  be
       either  specified  as  0,  or as RSEQ_FORCE_UNREGISTER, which will
       force unregistration of  the  current  rseq  address  rather  than
       requiring each registration to be matched by an unregistration.

       Libraries  and  applications  should  keep the rseq structure in a
       thread-local storage variable.  Since only one rseq address can be
       registered  per  thread,  applications and libraries should define
       their struct rseq as a volatile thread-local storage variable with
       the weak symbol __rseq_abi.  This allows using rseq from an appli‐
       cation executable and from multiple shared libraries linked to the
       same executable. The cpu_id field should be initialized to -1.

       Each  thread  is responsible for registering and unregistering its
       rseq structure. No more than one rseq  structure  address  can  be
       registered  per  thread  at  a given time. The same address can be
       registered more than once for  a  thread,  and  each  registration
       needs  to  have  a  matching  unregistration before the address is
       effectively unregistered. After the rseq  address  is  effectively
       unregistered for a thread, a new address can be registered. Unreg‐
       istration of associated rseq  structure  is  implicitly  performed
       when a thread or process exits.

       In  a  typical  usage  scenario,  the  thread registering the rseq
       structure will be performing loads and stores from/to that  struc‐
       ture. It is however also allowed to read that structure from other
       threads.  The rseq field updates performed by the  kernel  provide
       relaxed  atomicity  semantics,  which guarantee that other threads
       performing relaxed atomic reads  of  the  cpu  number  cache  will
       always observe a consistent value.

RETURN VALUE
       A  return  value of 0 indicates success. On error, -1 is returned,
       and errno is set appropriately.

ERRORS
       EINVAL Either flags contains an invalid value, or rseq contains an
              address which is not appropriately aligned.

       ENOSYS The rseq() system call is not implemented by this kernel.

       EFAULT rseq is an invalid address.

       EBUSY  The rseq argument contains a non-NULL address which differs
              from  the  memory  location  already  registered  for  this
              thread.

       EOVERFLOW
              Registering  the  rseq  address  is  not allowed because it
              would cause a reference counter overflow.

       ENOENT The rseq argument is NULL, but no memory location  is  cur‐
              rently registered for this thread.

VERSIONS
       The rseq() system call was added in Linux 4.X (TODO).

CONFORMING TO
       rseq() is Linux-specific.

ALGORITHM
       The restartable sequences mechanism is the overlap of two distinct
       restart mechanisms: a sequence  counter  tracking  preemption  and
       signal  delivery for high-level code, and an ip-fixup-based mecha‐
       nism for the final assembly instruction sequence.

       A high-level summary of the algorithm to use rseq from  user-space
       is as follows:

       The  high-level  code between rseq_start() and rseq_finish() loads
       the current value of the sequence  counter  in  rseq_start(),  and
       then  it  gets  compared  with  the  new  current value within the
       rseq_finish()   restartable    instruction    sequence.    Between
       rseq_start()  and  rseq_finish(),  the high-level code can perform
       operations that do not have side-effects, such as getting the cur‐
       rent CPU number, and loading from variables.

       Stores  are  performed at the very end of the restartable sequence
       assembly block. Each  assembly  block  defines  a  struct  rseq_cs
       structure   which   describes   the  start_ip  and  post_commit_ip
       addresses, as well as the abort_ip address where the kernel should
       move  the  thread  instruction  pointer if a rseq critical section
       assembly block is preempted or if a signal is delivered on top  of
       a rseq critical section assembly block.

       Detailed algorithm of rseq use:

       rseq_start()

           0. Userspace  loads  the  current event counter value from the
              event_counter field of the registered struct rseq TLS area,

       rseq_finish()

              Steps [1]-[3] (inclusive) need to be a sequence of instruc‐
              tions  in  userspace  that  can  handle  being moved to the
              abort_ip between any of those instructions.

              The abort_ip address needs to be  less  than  start_ip,  or
              greater-or-equal  the  post_commit_ip.   Step  [4]  and the
              failure code step [F1] need to be at addresses lesser  than
              start_ip, or greater-or-equal the post_commit_ip.

           [ start_ip ]

           1. Userspace stores the address of the struct rseq_cs assembly
              block descriptor into the rseq_cs field of  the  registered
              struct rseq TLS area.

           2. Userspace  tests  to  see whether the current event_counter
              value match the value loaded at [0].  Manually  jumping  to
              [F1] in case of a mismatch.

              Note  that  if  we are preempted or interrupted by a signal
              after [1] and before post_commit_ip, then the  kernel  also
              performs the comparison performed in [2], and conditionally
              clears the rseq_cs field of struct rseq, then jumps  us  to
              abort_ip.

           3. Userspace   critical   section   final  instruction  before
              post_commit_ip is the commit. The critical section is self-
              terminating.

           [ post_commit_ip ]

           4. Userspace  clears  the rseq_cs field of the struct rseq TLS
              area.

           5. Return true.

           On failure at [2]:

           F1.
              Userspace clears the rseq_cs field of the struct  rseq  TLS
              area. Followed by step [F2].

           [ abort_ip ]

           F2.
              Return false.

EXAMPLE
       The following code uses the rseq() system call to keep a thread-local
       storage variable up to date with the current CPU number, with a fall‐
       back on sched_getcpu(3) if the cache is not  available.  For  example
       simplicity,  it  is  done in main(), but multithreaded programs would
       need to invoke rseq() from each program thread.

           #define _GNU_SOURCE
           #include <stdlib.h>
           #include <stdio.h>
           #include <unistd.h>
           #include <stdint.h>
           #include <sched.h>
           #include <stddef.h>
           #include <errno.h>
           #include <string.h>
           #include <stdbool.h>
           #include <sys/syscall.h>
           #include <linux/rseq.h>

           __attribute__((weak)) __thread volatile struct rseq __rseq_abi = {
               .u.e.cpu_id = -1,
           };

           static int
           sys_rseq(volatile struct rseq *rseq_abi, int flags)
           {
               return syscall(__NR_rseq, rseq_abi, flags);
           }

           static int32_t
           rseq_current_cpu_raw(void)
           {
               return __rseq_abi.u.e.cpu_id;
           }

           static int32_t
           rseq_current_cpu(void)
           {
               int32_t cpu;

               cpu = rseq_current_cpu_raw();
               if (cpu < 0)
                   cpu = sched_getcpu();
               return cpu;
           }

           static int
           rseq_register_current_thread(void)
           {
               int rc;

               rc = sys_rseq(&__rseq_abi, 0);
               if (rc) {
                   fprintf(stderr,
                       "Error: sys_rseq(...) register failed(%d): %s\n",
                       errno, strerror(errno));
                   return -1;
               }
               return 0;
           }

           static int
           rseq_unregister_current_thread(void)
           {
               int rc;

               rc = sys_rseq(NULL, 0);
               if (rc) {
                   fprintf(stderr,
                       "Error: sys_rseq(...) unregister failed(%d): %s\n",
                       errno, strerror(errno));
                   return -1;
               }
               return 0;
           }

           int
           main(int argc, char **argv)
           {
               bool rseq_registered = false;

               if (!rseq_register_current_thread()) {
                   rseq_registered = true;
               } else {
                   fprintf(stderr,
                       "Unable to register restartable sequences.\n");
                   fprintf(stderr, "Using sched_getcpu() as fallback.\n");
               }

               printf("Current CPU number: %d\n", rseq_current_cpu());

               if (rseq_registered && rseq_unregister_current_thread()) {
                   exit(EXIT_FAILURE);
               }
               exit(EXIT_SUCCESS);
           }

SEE ALSO
       sched_getcpu(3)

Linux                           2016-08-19                        RSEQ(2)

commit | commitdiff | tree

Greg Kroah-Hartman [Thu, 20 Oct 2016 08:03:41 +0000 (10:03 +0200)]

Linux 4.8.3

commit | commitdiff | tree

Linus Torvalds [Thu, 13 Oct 2016 20:07:36 +0000 (13:07 -0700)]

mm: remove gup_flags FOLL_WRITE games from __get_user_pages()

commit 19be0eaffa3ac7d8eb6784ad9bdbc7d67ed8e619 upstream.

This is an ancient bug that was actually attempted to be fixed once
(badly) by me eleven years ago in commit 4ceb5db9757a ("Fix
get_user_pages() race for write access") but that was then undone due to
problems on s390 by commit f33ea7f404e5 ("fix get_user_pages bug").

In the meantime, the s390 situation has long been fixed, and we can now
fix it by checking the pte_dirty() bit properly (and do it better). The
s390 dirty bit was implemented in abf09bed3cce ("s390/mm: implement
software dirty bits") which made it into v3.9. Earlier kernels will
have to look at the page state itself.

Also, the VM has become more scalable, and what used a purely
theoretical race back then has become easier to trigger.

To fix it, we introduce a new internal FOLL_COW flag to mark the "yes,
we already did a COW" rather than play racy games with FOLL_WRITE that
is very fundamental, and then use the pte dirty flag to validate that
the FOLL_COW flag is still valid.

Reported-and-tested-by: Phil "not Paul" Oester <kernel@linuxace.com>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Artem Savkov [Tue, 13 Sep 2016 21:40:35 +0000 (07:40 +1000)]

Make __xfs_xattr_put_listen preperly report errors.

commit 791cc43b36eb1f88166c8505900cad1b43c7fe1a upstream.

Commit 2a6fba6 "xfs: only return -errno or success from attr ->put_listent"
changes the returnvalue of __xfs_xattr_put_listen to 0 in case when there is
insufficient space in the buffer assuming that setting context->count to -1
would be enough, but all of the ->put_listent callers only check seen_enough.
This results in a failed assertion:
XFS: Assertion failed: context->count >= 0, file: fs/xfs/xfs_xattr.c, line: 175
in insufficient buffer size case.

This is only reproducible with at least 2 xattrs and only when the buffer
gets depleted before the last one.

Furthermore if buffersize is such that it is enough to hold the last xattr's
name, but not enough to hold the sum of preceeding xattr names listxattr won't
fail with ERANGE, but will suceed returning last xattr's name without the
first character. The first character end's up overwriting data stored at
(context->alist - 1).

Signed-off-by: Artem Savkov <asavkov@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Heiner Kallweit [Wed, 3 Aug 2016 19:49:03 +0000 (21:49 +0200)]

scsi: configure runtime pm before calling device_add in scsi_add_host_with_dma

commit 0d5644b7d8daa3c1d91acb4367731f568c9c9469 upstream.

Runtime PM should be configured already once we call device_add. See
also the description in this mail thread
https://lists.linuxfoundation.org/pipermail/linux-pm/2009-November/023198.html
or the order of calls e.g. in usb_new_device.

The changed order also helps to avoid scenarios where runtime pm for
&shost->shost_gendev is activated whilst the parent is suspended,
resulting in error message "runtime PM trying to activate child device
hostx but parent yyy is not active".

In addition properly reverse the runtime pm calls in the error path.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Laurent Pinchart [Wed, 17 Aug 2016 12:57:37 +0000 (09:57 -0300)]

v4l: rcar-fcp: Don't force users to check for disabled FCP support

commit fd44aa9a254b18176ec3792a18e7de6977030ca8 upstream.

The rcar_fcp_enable() function immediately returns successfully when the
FCP device pointer is NULL to avoid forcing the users to check the FCP
device manually before every call. However, the stub version of the
function used when the FCP driver is disabled returns -ENOSYS
unconditionally, resulting in a different API contract for the two
versions of the function.

As a user that requires FCP support will fail at probe time when calling
rcar_fcp_get() if the FCP driver is disabled, the stub version of the
rcar_fcp_enable() function will only be called with a NULL FCP device.
We can thus return 0 unconditionally to align the behaviour with the
normal version of the function.

Reported-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Greg Kroah-Hartman [Sun, 16 Oct 2016 16:03:56 +0000 (18:03 +0200)]

Linux 4.8.2

commit | commitdiff | tree

Jarkko Sakkinen [Fri, 2 Sep 2016 19:34:17 +0000 (22:34 +0300)]

tpm_crb: fix crb_req_canceled behavior

commit 72fd50e14e46dc0edf360631bdece87c2f066a97 upstream.

The req_canceled() callback is used by tpm_transmit() periodically to
check whether the request has been canceled while it is receiving a
response from the TPM.

The TPM_CRB_CTRL_CANCEL register was cleared already in the crb_cancel
callback, which has two consequences:

* Cancel might not happen.
* req_canceled() always returns zero.

A better place to clear the register is when starting to send a new
command. The behavior of TPM_CRB_CTRL_CANCEL is described in the
section 5.5.3.6 of the PTP specification.

Fixes: 30fc8d138e91 ("tpm: TPM 2.0 CRB Interface")
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Jarkko Sakkinen [Tue, 16 Aug 2016 19:00:38 +0000 (22:00 +0300)]

tpm: fix a race condition in tpm2_unseal_trusted()

commit d4816edfe706497a8525480c1685ceb9871bc118 upstream.

Unseal and load operations should be done as an atomic operation. This
commit introduces unlocked tpm_transmit() so that tpm2_unseal_trusted()
can do the locking by itself.

Fixes: 0fe5480303a1 ("keys, trusted: seal/unseal with TPM 2.0 chips")
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Miklos Szeredi [Fri, 16 Sep 2016 10:44:20 +0000 (12:44 +0200)]

ima: use file_dentry()

commit e71b9dff0634edb127f449e076e883ef24a8c76c upstream.

Ima tries to call ->setxattr() on overlayfs dentry after having locked
underlying inode, which results in a deadlock.

Reported-by: Krisztian Litkey <kli@iki.fi>
Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Dmitry Tunin [Wed, 21 Sep 2016 16:13:08 +0000 (19:13 +0300)]

Bluetooth: Add a new 04ca:3011 QCA_ROME device

commit 1144a4eed04b2c3e7d20146d1b76f7669b55971d upstream.

BugLink: https://bugs.launchpad.net/bugs/1535802
T:  Bus=01 Lev=02 Prnt=02 Port=04 Cnt=01 Dev#=  3 Spd=12  MxCh= 0
D:  Ver= 1.10 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=04ca ProdID=3011 Rev=00.01
C:  #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
I:  If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
I:  If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb

Signed-off-by: Dmitry Tunin <hanipouspilot@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Christophe Jaillet [Thu, 11 Aug 2016 13:02:30 +0000 (15:02 +0200)]

ARM: cpuidle: Fix error return code

commit af48d7bc3756a0cd882d65bff14ab39746ba57fe upstream.

We know that 'ret = 0' because it has been tested a few lines above.
So, if 'kzalloc' fails, 0 will be returned instead of an error code.
Return -ENOMEM instead.

Fixes: a0d46a3dfdc3 ("ARM: cpuidle: Register per cpuidle device")
Signed-off-by: Christophe Jaillet <christophe.jaillet@wanadoo.fr>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Linus Walleij [Fri, 5 Aug 2016 08:38:38 +0000 (10:38 +0200)]

ARM: dts: MSM8660 remove flags from SPMI/MPP IRQs

commit dcf5907e0e09a160a57160729f920add5df8e358 upstream.

The Qualcomm SPMI GPIO and MPP lines are problematic: the
are fetched from the main MFD driver with platform_get_irq()
which means that at this point they will all be assigned the
flags set up for the interrupts in the device tree.

That is problematic since these are flagged as rising edge
and an this point the interrupt descriptor is assigned a
rising edge, while the only thing the GPIO/MPP drivers really
do is issue irq_get_irqchip_state() on the line to read it
out and to provide a .to_irq() helper for *other* IRQ
consumers.

If another device tree node tries to flag the same IRQ
for use as something else than rising edge, the kernel
irqdomain core will protest like this:

type mismatch, failed to map hwirq-NN for <FOO>!

Which is what happens when the device tree defines two
contradictory flags for the same interrupt line.

To work around this and alleviate the problem, assign 0
as flag for the interrupts taken by the PM GPIO and MPP
drivers. This will lead to the flag being unset, and a
second consumer requesting rising, falling, both or level
interrupts will be respected. This is what the qcom-pm*.dtsi
files already do.

Switched to using the symbolic name IRQ_TYPE_NONE so that
we get this more readable.

This misconfiguration was caused by a copy/pasting the
APQ8064 set-up, the latter has been fixed in a separate
patch.

Tested with one of the SPMI GPIOs: after this I can
successfully request one of these GPIOs as falling edge
from the device tree.

Fixes: 0840ea9e4457 ("ARM: dts: add GPIO and MPP to MSM8660 PMIC")
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Björn Andersson <bjorn.andersson@linaro.org>
Cc: Ivan T. Ivanov <ivan.ivanov@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Andy Gross <andy.gross@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Andy Gross <andy.gross@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Linus Walleij [Fri, 5 Aug 2016 08:38:37 +0000 (10:38 +0200)]

ARM: dts: MSM8064 remove flags from SPMI/MPP IRQs

commit ca88696e8b73a9fa2b1de445747e9235c3a7bd50 upstream.

The Qualcomm PMIC GPIO and MPP lines are problematic: the
are fetched from the main MFD driver with platform_get_irq()
which means that at this point they will all be assigned the
flags set up for the interrupts in the device tree.

That is problematic since these are flagged as rising edge
and an this point the interrupt descriptor is assigned a
rising edge, while the only thing the GPIO/MPP drivers really
do is issue irq_get_irqchip_state() on the line to read it
out and to provide a .to_irq() helper for *other* IRQ
consumers.

If another device tree node tries to flag the same IRQ
for use as something else than rising edge, the kernel
irqdomain core will protest like this:

type mismatch, failed to map hwirq-NN for <FOO>!

Which is what happens when the device tree defines two
contradictory flags for the same interrupt line.

To work around this and alleviate the problem, assign 0
as flag for the interrupts taken by the PM GPIO and MPP
drivers. This will lead to the flag being unset, and a
second consumer requesting rising, falling, both or level
interrupts will be respected. This is what the qcom-pm*.dtsi
files already do.

Switched to using the symbolic name IRQ_TYPE_NONE so that
we get this more readable.

Fixes: bce360469676 ("ARM: dts: apq8064: add pm8921 mpp support")
Fixes: 874443fe9e33 ("ARM: dts: apq8064: Add pm8921 mfd and its gpio node")
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Björn Andersson <bjorn.andersson@linaro.org>
Cc: Ivan T. Ivanov <ivan.ivanov@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Andy Gross <andy.gross@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Andy Gross <andy.gross@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Grzegorz Jaszczyk [Thu, 4 Aug 2016 10:14:08 +0000 (12:14 +0200)]

ARM: dts: mvebu: armada-390: add missing compatibility string and bracket

commit 061492cfad9f11dbc32df741a7164f307b69b6e6 upstream.

The armada-390.dtsi was broken since the first patch which adds Device Tree
files for Armada 39x SoC was introduced.

Signed-off-by: Grzegorz Jaszczyk <jaz@semihalf.com>
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Fixes 538da83 ("ARM: mvebu: add Device Tree files for Armada 39x SoC and board")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>

commit | commitdiff | tree

Russell King [Wed, 5 Oct 2016 22:40:43 +0000 (23:40 +0100)]

ARM: fix delays

commit fb833b1fbb68461772dbf5e91bddea5e839187e9 upstream.

Commit 215e362dafed ("ARM: 8306/1: loop_udelay: remove bogomips value
limitation") tried to increase the bogomips limitation, but in doing
so messed up udelay such that it always gives about a 5% error in the
delay, even if we use a timer.

The calculation is:

loops = UDELAY_MULT * us_delay * ticks_per_jiffy >> UDELAY_SHIFT

Originally, UDELAY_MULT was ((UL(2199023) * HZ) >> 11) and UDELAY_SHIFT
30. Assuming HZ=100, us_delay of 1000 and ticks_per_jiffy of 1660000
(eg, 166MHz timer, 1ms delay) this would calculate:

((UL(2199023) * HZ) >> 11) * 1000 * 1660000 >> 30
=> 165999

With the new values of 2047 * HZ + 483648 * HZ / 1000000 and 31, we get:

(2047 * HZ + 483648 * HZ / 1000000) * 1000 * 1660000 >> 31
=> 158269

which is incorrect. This is due to a typo - correcting it gives:

(2147 * HZ + 483648 * HZ / 1000000) * 1000 * 1660000 >> 31
=> 165999

i.o.w, the original value.

Fixes: 215e362dafed ("ARM: 8306/1: loop_udelay: remove bogomips value limitation")
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Josh Poimboeuf [Thu, 18 Aug 2016 15:59:06 +0000 (10:59 -0500)]

x86/dumpstack: Fix x86_32 kernel_stack_pointer() previous stack access

commit 72b4f6a5e903b071f2a7c4eb1418cbe4eefdc344 upstream.

On x86_32, when an interrupt happens from kernel space, SS and SP aren't
pushed and the existing stack is used.  So pt_regs is effectively two
words shorter, and the previous stack pointer is normally the memory
after the shortened pt_regs, aka '&regs->sp'.

But in the rare case where the interrupt hits right after the stack
pointer has been changed to point to an empty stack, like for example
when call_on_stack() is used, the address immediately after the
shortened pt_regs is no longer on the stack.  In that case, instead of
'&regs->sp', the previous stack pointer should be retrieved from the
beginning of the current stack page.

kernel_stack_pointer() wants to do that, but it forgets to dereference
the pointer.  So instead of returning a pointer to the previous stack,
it returns a pointer to the beginning of the current stack.

Note that it's probably outside of kernel_stack_pointer()'s scope to be
switching stacks at all.  The x86_64 version of this function doesn't do
it, and it would be better for the caller to do it if necessary.  But
that's a patch for another day.  This just fixes the original intent.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 0788aa6a23cb ("x86: Prepare removal of previous_esp from i386 thread_info structure")
Link: http://lkml.kernel.org/r/472453d6e9f6a2d4ab16aaed4935f43117111566.1471535549.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Nicolas Iooss [Sat, 10 Sep 2016 18:30:45 +0000 (20:30 +0200)]

x86/mm/pkeys: Do not skip PKRU register if debug registers are not used

commit ba6d018e3d2f6a0fad58a668cadf66b2d1f80f59 upstream.

__show_regs() fails to dump the PKRU state when the debug registers are in
their default state because there is a return statement on the debug
register state.

Change the logic to report PKRU value even when debug registers are in
their default state.

Fixes:c0b17b5bd4b7 ("x86/mm/pkeys: Dump PKRU with other kernel registers")
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20160910183045.4618-1-nicolas.iooss_linux@m4x.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Prarit Bhargava [Mon, 3 Oct 2016 17:07:12 +0000 (13:07 -0400)]

arch/x86: Handle non enumerated CPU after physical hotplug

commit 2a51fe083eba7f99cbda72f5ef90cdf2f4df882c upstream.

When a CPU is physically added to a system then the MADT table is not
updated.

If subsequently a kdump kernel is started on that physically added CPU then
the ACPI enumeration fails to provide the information for this CPU which is
now the boot CPU of the kdump kernel.

As a consequence, generic_processor_info() is not invoked for that CPU so
the number of enumerated processors is 0 and none of the initializations,
including the logical package id management, are performed.

We have code which relies on the correctness of the logical package map and
other information which is initialized via generic_processor_info().
Executing such code will result in undefined behaviour or kernel crashes.

This problem applies only to the kdump kernel because a normal kexec will
switch to the original boot CPU, which is enumerated in MADT, before
jumping into the kexec kernel.

The boot code already has a check for num_processors equal 0 in
prefill_possible_map(). We can use that check as an indicator that the
enumeration of the boot CPU did not happen and invoke generic_processor_info()
for it. That initializes the relevant data for the boot CPU and therefore
prevents subsequent failure.

[ tglx: Refined the code and rewrote the changelog ]

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Fixes: 1f12e32f4cd5 ("x86/topology: Create logical package id")
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: dyoung@redhat.com
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: kexec@lists.infradead.org
Link: http://lkml.kernel.org/r/1475514432-27682-1-git-send-email-prarit@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Denys Vlasenko [Tue, 13 Sep 2016 18:12:32 +0000 (20:12 +0200)]

x86/apic: Get rid of apic_version[] array

commit cff9ab2b291e64259d97add48fe073c081afe4e2 upstream.

The array has a size of MAX_LOCAL_APIC, which can be as large as 32k, so it
can consume up to 128k.

The array has been there forever and was never used for anything useful
other than a version mismatch check which was introduced in 2009.

There is no reason to store the version in an array. The kernel is not
prepared to handle different APIC versions anyway, so the real important
part is to detect a version mismatch and warn about it, which can be done
with a single variable as well.

[ tglx: Massaged changelog ]

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Borislav Petkov <bp@alien8.de>
CC: Brian Gerst <brgerst@gmail.com>
CC: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/r/20160913181232.30815-1-dvlasenk@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Andy Shevchenko [Thu, 8 Sep 2016 10:32:32 +0000 (13:32 +0300)]

x86/platform/intel-mid: Keep SRAM powered on at boot

commit f43ea76cf310c3be95cb75ae1350cbe76a8f2380 upstream.

On Penwell SRAM has to be powered on, otherwise it prevents booting.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: ca22312dc840 ("x86/platform/intel-mid: Extend PWRMU to support Penwell")
Link: http://lkml.kernel.org/r/20160908103232.137587-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Andy Shevchenko [Thu, 8 Sep 2016 10:32:31 +0000 (13:32 +0300)]

x86/platform/intel-mid: Add Intel Penwell to ID table

commit 8e522e1d321b12829960c9b26668c92f14c68d7f upstream.

Commit:

ca22312dc840 ("x86/platform/intel-mid: Extend PWRMU to support Penwell")

... enabled the PWRMU driver on platforms based on Intel Penwell, but
unfortunately this is not enough.

Add Intel Penwell ID to pci-mid.c driver as well. To avoid confusion in the
future add a comment to both drivers.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: ca22312dc840 ("x86/platform/intel-mid: Extend PWRMU to support Penwell")
Link: http://lkml.kernel.org/r/20160908103232.137587-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Andy Shevchenko [Tue, 6 Sep 2016 18:42:54 +0000 (21:42 +0300)]

x86/cpu: Rename Merrifield2 to Moorefield

commit f5fbf848303c8704d0e1a1e7cabd08fd0a49552f upstream.

Merrifield2 is actually Moorefield.

Rename it accordingly and drop tail digit from Merrifield1.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160906184254.94440-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Dave Hansen [Fri, 7 Oct 2016 16:23:42 +0000 (09:23 -0700)]

x86/pkeys: Make protection keys an "eager" feature

commit d4b05923f579c234137317cdf9a5eb69ddab76d1 upstream.

Our XSAVE features are divided into two categories: those that
generate FPU exceptions, and those that do not.  MPX and pkeys do
not generate FPU exceptions and thus can not be used lazily.  We
disable them when lazy mode is forced on.

We have a pair of masks to collect these two sets of features, but
XFEATURE_MASK_PKRU was added to the wrong mask: XFEATURE_MASK_LAZY.
Fix it by moving the feature to XFEATURE_MASK_EAGER.

Note: this only causes problem if you boot with lazy FPU mode
(eagerfpu=off) which is *not* the default.  It also only affects
hardware which is not currently publicly available.  It looks like
eager mode is going away, but we still need this patch applied
to any kernel that has protection keys and lazy mode, which is 4.6
through 4.8 at this point, and 4.9 if the lazy removal isn't sent
to Linus for 4.9.

Fixes: c8df40098451 ("x86/fpu, x86/mm/pkeys: Add PKRU xsave fields and data structures")
Signed-off-by: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20161007162342.28A49813@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Mika Westerberg [Mon, 3 Oct 2016 10:17:08 +0000 (13:17 +0300)]

x86/irq: Prevent force migration of irqs which are not in the vector domain

commit db91aa793ff984ac048e199ea1c54202543952fe upstream.

When a CPU is about to be offlined we call fixup_irqs() that resets IRQ
affinities related to the CPU in question. The same thing is also done when
the system is suspended to S-states like S3 (mem).

For each IRQ we try to complete any on-going move regardless whether the
IRQ is actually part of x86_vector_domain. For each IRQ descriptor we fetch
its chip_data, assume it is of type struct apic_chip_data and manipulate it
by clearing old_domain mask etc. For irq_chips that are not part of the
x86_vector_domain, like those created by various GPIO drivers, will find
their chip_data being changed unexpectly.

Below is an example where GPIO chip owned by pinctrl-sunrisepoint.c gets
corrupted after resume:

  # cat /sys/kernel/debug/gpio
  gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00:
   gpio-511 (                    |sysfs               ) in  hi

  # rtcwake -s10 -mmem
  <10 seconds passes>

  # cat /sys/kernel/debug/gpio
  gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00:
   gpio-511 (                    |sysfs               ) in  ?

Note '?' in the output. It means the struct gpio_chip ->get function is
NULL whereas before suspend it was there.

Fix this by first checking that the IRQ belongs to x86_vector_domain before
we try to use the chip_data as struct apic_chip_data.

Reported-and-tested-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Link: http://lkml.kernel.org/r/20161003101708.34795-1-mika.westerberg@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Dan Williams [Wed, 21 Sep 2016 19:50:45 +0000 (12:50 -0700)]

x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation

commit 917db484dc6a69969d317b3e57add4208a8d9d42 upstream.

In commit:

ec776ef6bbe1 ("x86/mm: Add support for the non-standard protected e820 type")

Christoph references the original patch I wrote implementing pmem support.
The intent of the 'max_pfn' changes in that commit were to enable persistent
memory ranges to be covered by the struct page memmap by default.

However, that approach was abandoned when Christoph ported the patches [1], and
that functionality has since been replaced by devm_memremap_pages().

In the meantime, this max_pfn manipulation is confusing kdump [2] that
assumes that everything covered by the max_pfn is "System RAM". This
results in kdump hanging or crashing.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-March/000348.html
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1351098

So fix it.

Reported-by: Zhang Yi <yizhan@redhat.com>
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Tested-by: Zhang Yi <yizhan@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-nvdimm@lists.01.org
Fixes: ec776ef6bbe1 ("x86/mm: Add support for the non-standard protected e820 type")
Link: http://lkml.kernel.org/r/147448744538.34910.11287693517367139607.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Mark Rutland [Fri, 23 Sep 2016 16:55:05 +0000 (17:55 +0100)]

arm64: fix dump_backtrace/unwind_frame with NULL tsk

commit b5e7307d9d5a340d2c9fabbe1cee137d4c682c71 upstream.

In some places, dump_backtrace() is called with a NULL tsk parameter,
e.g. in bug_handler() in arch/arm64, or indirectly via show_stack() in
core code. The expectation is that this is treated as if current were
passed instead of NULL. Similar is true of unwind_frame().

Commit a80a0eb70c358f8c ("arm64: make irq_stack_ptr more robust") didn't
take this into account. In dump_backtrace() it compares tsk against
current *before* we check if tsk is NULL, and in unwind_frame() we never
set tsk if it is NULL.

Due to this, we won't initialise irq_stack_ptr in either function. In
dump_backtrace() this results in calling dump_mem() for memory
immediately above the IRQ stack range, rather than for the relevant
range on the task stack. In unwind_frame we'll reject unwinding frames
on the IRQ stack.

In either case this results in incomplete or misleading backtrace
information, but is not otherwise problematic. The initial percpu areas
(including the IRQ stacks) are allocated in the linear map, and dump_mem
uses __get_user(), so we shouldn't access anything with side-effects,
and will handle holes safely.

This patch fixes the issue by having both functions handle the NULL tsk
case before doing anything else with tsk.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Fixes: a80a0eb70c358f8c ("arm64: make irq_stack_ptr more robust")
Acked-by: James Morse <james.morse@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Dan Carpenter [Thu, 14 Jul 2016 10:15:46 +0000 (13:15 +0300)]

KVM: PPC: BookE: Fix a sanity check

commit ac0e89bb4744d3882ccd275f2416d9ce22f4e1e7 upstream.

We use logical negate where bitwise negate was intended. It means that
we never return -EINVAL here.

Fixes: ce11e48b7fdd ('KVM: PPC: E500: Add userspace debug stub support')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Christoffer Dall [Tue, 27 Sep 2016 16:53:35 +0000 (18:53 +0200)]

KVM: arm/arm64: vgic: Don't flush/sync without a working vgic

commit 0099b7701f5296a758d9e6b945ec96f96847cc2f upstream.

If the vgic hasn't been created and initialized, we shouldn't attempt to
look at its data structures or flush/sync anything to the GIC hardware.

This fixes an issue reported by Alexander Graf when using a userspace
irqchip.

Fixes: 0919e84c0fc1 ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework")
Reported-by: Alexander Graf <agraf@suse.de>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Christoffer Dall [Tue, 27 Sep 2016 01:51:47 +0000 (18:51 -0700)]

KVM: arm64: Require in-kernel irqchip for PMU support

commit 6fe407f2d18a4f94216263f91cb7d1f08fa5887c upstream.

If userspace creates a PMU for the VCPU, but doesn't create an in-kernel
irqchip, then we end up in a nasty path where we try to take an
uninitialized spinlock, which can lead to all sorts of breakages.

Luckily, QEMU always creates the VGIC before the PMU, so we can
establish this as ABI and check for the VGIC in the PMU init stage.
This can be relaxed at a later time if we want to support PMU with a
userspace irqchip.

Cc: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

James Hogan [Thu, 15 Sep 2016 16:20:06 +0000 (17:20 +0100)]

KVM: MIPS: Drop other CPU ASIDs on guest MMU changes

commit 91e4f1b6073dd680d86cdb7e42d7cccca9db39d8 upstream.

When a guest TLB entry is replaced by TLBWI or TLBWR, we only invalidate
TLB entries on the local CPU. This doesn't work correctly on an SMP host
when the guest is migrated to a different physical CPU, as it could pick
up stale TLB mappings from the last time the vCPU ran on that physical
CPU.

Therefore invalidate both user and kernel host ASIDs on other CPUs,
which will cause new ASIDs to be generated when it next runs on those
CPUs.

We're careful only to do this if the TLB entry was already valid, and
only for the kernel ASID where the virtual address it mapped is outside
of the guest user address range.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Thomas Huth [Wed, 21 Sep 2016 13:06:45 +0000 (15:06 +0200)]

KVM: PPC: Book3s PR: Allow access to unprivileged MMCR2 register

commit fa73c3b25bd8d0d393dc6109a1dba3c2aef0451e upstream.

The MMCR2 register is available twice, one time with number 785
(privileged access), and one time with number 769 (unprivileged,
but it can be disabled completely). In former times, the Linux
kernel was using the unprivileged register 769 only, but since
commit 8dd75ccb571f3c92c ("powerpc: Use privileged SPR number
for MMCR2"), it uses the privileged register 785 instead.
The KVM-PR code then of course also switched to use the SPR 785,
but this is causing older guest kernels to crash, since these
kernels still access 769 instead. So to support older kernels
with KVM-PR again, we have to support register 769 in KVM-PR, too.

Fixes: 8dd75ccb571f3c92c48014b3dabd3d51a115ab41
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Boris Ostrovsky [Wed, 5 Oct 2016 17:09:33 +0000 (13:09 -0400)]

xen/x86: Update topology map for PV VCPUs

commit a6a198bc60e6c980a56eca24d33dc7f29139f8ea upstream.

Early during boot topology_update_package_map() computes
logical_pkg_ids for all present processors.

Later, when processors are brought up, identify_cpu() updates
these values based on phys_pkg_id which is a function of
initial_apicid. On PV guests the latter may point to a
non-existing node, causing logical_pkg_ids to be set to -1.

Intel's RAPL uses logical_pkg_id (as topology_logical_package_id())
to index its arrays and therefore in this case will point to index
65535 (since logical_pkg_id is a u16). This could lead to either a
crash or may actually access random memory location.

As a workaround, we recompute topology during CPU bringup to reset
logical_pkg_id to a valid value.

(The reason for initial_apicid being bogus is because it is
initial_apicid of the processor from which the guest is launched.
This value is CPUID(1).EBX[31:24])

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Uwe Kleine-König [Fri, 29 Jul 2016 19:29:15 +0000 (21:29 +0200)]

mfd: wm8350-i2c: Make sure the i2c regmap functions are compiled

commit 88003fb10f1fc606e1704611c62ceae95fd1d7da upstream.

This fixes a compile failure:

drivers/built-in.o: In function `wm8350_i2c_probe':
core.c:(.text+0x828b0): undefined reference to `__devm_regmap_init_i2c'
Makefile:953: recipe for target 'vmlinux' failed

Fixes: 52b461b86a9f ("mfd: Add regmap cache support for wm8350")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Dan Carpenter [Thu, 4 Aug 2016 05:26:56 +0000 (08:26 +0300)]

mfd: 88pm80x: Double shifting bug in suspend/resume

commit 9a6dc644512fd083400a96ac4a035ac154fe6b8d upstream.

set_bit() and clear_bit() take the bit number so this code is really
doing "1 << (1 << irq)" which is a double shift bug. It's done
consistently so it won't cause a problem unless "irq" is more than 4.

Fixes: 70c6cce04066 ('mfd: Support 88pm80x in 80x driver')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Boris Brezillon [Tue, 6 Sep 2016 12:19:29 +0000 (14:19 +0200)]

mfd: atmel-hlcdc: Do not sleep in atomic context

commit 2c2469bc03d569c49119db2cccb5cb3f0c6a5b33 upstream.

readl_poll_timeout() calls usleep_range(), but
regmap_atmel_hlcdc_reg_write() is called in atomic context (regmap
spinlock held).

Replace the readl_poll_timeout() call by readl_poll_timeout_atomic().

Fixes: ea31c0cf9b07 ("mfd: atmel-hlcdc: Implement config synchronization")
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Lu Baolu [Thu, 11 Aug 2016 02:39:03 +0000 (10:39 +0800)]

mfd: rtsx_usb: Avoid setting ucr->current_sg.status

commit 8dcc5ff8fcaf778bb57ab4448fedca9e381d088f upstream.

Member "status" of struct usb_sg_request is managed by usb core. A
spin lock is used to serialize the change of it. The driver could
check the value of req->status, but should avoid changing it without
the hold of the spinlock. Otherwise, it could cause race or error
in usb core.

This patch could be backported to stable kernels with version later
than v3.14.

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Roger Tseng <rogerable@realtek.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Takashi Sakamoto [Sun, 25 Sep 2016 13:00:20 +0000 (22:00 +0900)]

ALSA: usb-line6: use the same declaration as definition in header for MIDI manufacturer ID

commit 8da08ca03b73593d5299893bf29fc08569c3fb5f upstream.

Currently, usb-line6 module exports an array of MIDI manufacturer ID and
usb-pod module uses it. However, the declaration is not the definition in
common header. The difference is explicit length of array. Although
compiler calculates it and everything goes well, it's better to use the
same representation between definition and declaration.

This commit fills the length of array for usb-line6 module. As a small
good sub-effect, this commit suppress below warnings from static analysis
by sparse v0.5.0.

sound/usb/line6/driver.c:274:43: error: cannot size expression
sound/usb/line6/driver.c:275:16: error: cannot size expression
sound/usb/line6/driver.c:276:16: error: cannot size expression
sound/usb/line6/driver.c:277:16: error: cannot size expression

Fixes: 705ececd1c60 ("Staging: add line6 usb driver")
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Anssi Hannula [Fri, 23 Sep 2016 03:43:47 +0000 (06:43 +0300)]

ALSA: usb-audio: Extend DragonFly dB scale quirk to cover other variants

commit eb1a74b7bea17eea31915c4f76385cefe69d9795 upstream.

The DragonFly quirk added in 42e3121d90f4 ("ALSA: usb-audio: Add a more
accurate volume quirk for AudioQuest DragonFly") applies a custom dB map
on the volume control when its range is reported as 0..50 (0 .. 0.2dB).

However, there exists at least one other variant (hw v1.0c, as opposed
to the tested v1.2) which reports a different non-sensical volume range
(0..53) and the custom map is therefore not applied for that device.

This results in all of the volume change appearing close to 100% on
mixer UIs that utilize the dB TLV information.

Add a fallback case where no dB TLV is reported at all if the control
range is not 0..50 but still 0..N where N <= 1000 (3.9 dB). Also
restrict the quirk to only apply to the volume control as there is also
a mute control which would match the check otherwise.

Fixes: 42e3121d90f4 ("ALSA: usb-audio: Add a more accurate volume quirk for AudioQuest DragonFly")
Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi>
Reported-by: David W <regulars@d-dub.org.uk>
Tested-by: David W <regulars@d-dub.org.uk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Takashi Iwai [Wed, 21 Sep 2016 12:38:02 +0000 (14:38 +0200)]

ALSA: ali5451: Fix out-of-bound position reporting

commit db68577966abc1aeae4ec597b3dcfa0d56e92041 upstream.

The pointer callbacks of ali5451 driver may return the value at the
boundary occasionally, and it results in the kernel warning like
snd_ali5451 0000:00:06.0: BUG: , pos = 16384, buffer size = 16384, period size = 1024

It seems that folding the position offset is enough for fixing the
warning and no ill-effect has been seen by that.

Reported-by: Enrico Mioso <mrkiko.rs@gmail.com>
Tested-by: Enrico Mioso <mrkiko.rs@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Chen-Yu Tsai [Fri, 9 Sep 2016 03:58:18 +0000 (11:58 +0800)]

phy: sun4i-usb: Use spinlock to guard phyctl register access

commit 919ab2524c52e5f801d8873f09145ce822cdd43a upstream.

The musb driver calls into this phy driver to disable/enable squelch
detection. This function was introduced in 24fe86a617c5 ("phy: sun4i-usb:
Add a sunxi specific function for setting squelch-detect"). This
function in turn calls sun4i_usb_phy_write, which uses a mutex to
guard the common access register. Unfortunately musb does this
in atomic context, which results in the following warning with lock
debugging enabled:

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:97
in_atomic(): 1, irqs_disabled(): 128, pid: 96, name: kworker/0:2
CPU: 0 PID: 96 Comm: kworker/0:2 Not tainted 4.8.0-rc4-00181-gd502f8ad1c3e #13
Hardware name: Allwinner sun8i Family
Workqueue: events musb_deassert_reset
[<c010bc01>] (unwind_backtrace) from [<c0109237>] (show_stack+0xb/0xc)
[<c0109237>] (show_stack) from [<c02a669b>] (dump_stack+0x67/0x74)
[<c02a669b>] (dump_stack) from [<c05d68c9>] (mutex_lock+0x15/0x2c)
[<c05d68c9>] (mutex_lock) from [<c02c3589>] (sun4i_usb_phy_write+0x39/0xec)
[<c02c3589>] (sun4i_usb_phy_write) from [<c03e6327>] (musb_port_reset+0xfb/0x184)
[<c03e6327>] (musb_port_reset) from [<c03e4917>] (musb_deassert_reset+0x1f/0x2c)
[<c03e4917>] (musb_deassert_reset) from [<c012ecb5>] (process_one_work+0x129/0x2b8)
[<c012ecb5>] (process_one_work) from [<c012f5e3>] (worker_thread+0xf3/0x424)
[<c012f5e3>] (worker_thread) from [<c0132dbd>] (kthread+0xa1/0xb8)
[<c0132dbd>] (kthread) from [<c0105f31>] (ret_from_fork+0x11/0x20)

Since the register access is mmio, we can use a spinlock to guard this
specific access, rather than the mutex that guards the entire phy.

Fixes: ba4bdc9e1dc0 ("PHY: sunxi: Add driver for sunxi usb phy")
Cc: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Lu Baolu [Fri, 9 Sep 2016 04:51:27 +0000 (12:51 +0800)]

usb: dwc3: fix Clear Stall EP command failure

commit 5e6c88d28ccbe72bedee1fbf4f9fea4764208598 upstream.

Commit 50c763f8c1bac ("usb: dwc3: Set the ClearPendIN bit on Clear
Stall EP command") sets ClearPendIN bit for all IN endpoints of
v2.60a+ cores. This causes ClearStall command fails on 2.60+ cores
operating in HighSpeed mode.

In page 539 of 2.60a specification:

"When issuing Clear Stall command for IN endpoints in SuperSpeed
mode, the software must set the "ClearPendIN" bit to '1' to
clear any pending IN transcations, so that the device does not
expect any ACK TP from the host for the data sent earlier."

It's obvious that we only need to apply this rule to those IN
endpoints that currently operating in SuperSpeed mode.

Fixes: 50c763f8c1bac ("usb: dwc3: Set the ClearPendIN bit on Clear Stall EP command")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

John Stultz [Wed, 5 Oct 2016 02:55:48 +0000 (19:55 -0700)]

timekeeping: Fix __ktime_get_fast_ns() regression

commit 58bfea9532552d422bde7afa207e1a0f08dffa7d upstream.

In commit 27727df240c7 ("Avoid taking lock in NMI path with
CONFIG_DEBUG_TIMEKEEPING"), I changed the logic to open-code
the timekeeping_get_ns() function, but I forgot to include
the unit conversion from cycles to nanoseconds, breaking the
function's output, which impacts users like perf.

This results in bogus perf timestamps like:
swapper     0 [000]   253.427536:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper     0 [000]   254.426573:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper     0 [000]   254.426687:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper     0 [000]   254.426800:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper     0 [000]   254.426905:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper     0 [000]   254.427022:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper     0 [000]   254.427127:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper     0 [000]   254.427239:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper     0 [000]   254.427346:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper     0 [000]   254.427463:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper     0 [000]   255.426572:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])

Instead of more reasonable expected timestamps like:
swapper     0 [000]    39.953768:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper     0 [000]    40.064839:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper     0 [000]    40.175956:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper     0 [000]    40.287103:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper     0 [000]    40.398217:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper     0 [000]    40.509324:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper     0 [000]    40.620437:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper     0 [000]    40.731546:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper     0 [000]    40.842654:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper     0 [000]    40.953772:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
swapper     0 [000]    41.064881:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])

Add the proper use of timekeeping_delta_to_ns() to convert
the cycle delta to nanoseconds as needed.

Thanks to Brendan and Alexei for finding this quickly after
the v4.8 release. Unfortunately the problematic commit has
landed in some -stable trees so they'll need this fix as
well.

Many apologies for this mistake. I'll be looking to add a
perf-clock sanity test to the kselftest timers tests soon.

Fixes: 27727df240c7 "timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING"
Reported-by: Brendan Gregg <bgregg@netflix.com>
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Tested-and-reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1475636148-26539-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Heiner Kallweit [Wed, 3 Aug 2016 19:46:47 +0000 (21:46 +0200)]

usb: storage: fix runtime pm issue in usb_stor_probe2

commit a094760b9a77f81ee3cbeff323ee77c928f41106 upstream.

Since commit 71723f95463d "PM / runtime: print error when activating a
child to unactive parent" I see the following error message:

scsi host2: usb-storage 1-3:1.0
scsi host2: runtime PM trying to activate child device host2 but parent
(1-3:1.0) is not active

Digging into it it seems to be related to the problem described in the
commit message for cd998ded5c12 "i2c: designware: Prevent runtime
suspend during adapter registration" as scsi_add_host also calls
device_add and after the call to device_add the parent device is
suspended.

Fix this by using the approach from the mentioned commit and getting
the runtime pm reference before calling scsi_add_host.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Greg Kroah-Hartman [Fri, 7 Oct 2016 13:03:33 +0000 (15:03 +0200)]

Linux 4.8.1

commit | commitdiff | tree

Takashi Iwai [Tue, 27 Sep 2016 14:44:49 +0000 (16:44 +0200)]

ALSA: hda - Add the top speaker pin config for HP Spectre x360

commit 0eec880966e77bdbee0112989a2be67d92e39929 upstream.

HP Spectre x360 with CX20724 codec has two speaker outputs while the
BIOS sets up only the bottom one (NID 0x17) and disables the top one
(NID 0x1d).

This patch adds a fixup simply defining the proper pincfg for NID 0x1d
so that the top speaker works as is.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=169071
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Hui Wang [Sun, 11 Sep 2016 03:26:16 +0000 (11:26 +0800)]

ALSA: hda - Fix headset mic detection problem for several Dell laptops

commit 3f640970a41429f0a076c01270bbd014c9eae61c upstream.

One of the laptops has the codec ALC256 on it, applying the
ALC255_FIXUP_DELL1_MIC_NO_PRESENCE can fix the problem, the rest
of laptops have the codec ALC295 on them, they are similar to machines
with ALC225, applying the ALC269_FIXUP_DELL1_MIC_NO_PRESENCE can fix
the problem.

Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Hui Wang [Mon, 26 Sep 2016 02:59:38 +0000 (10:59 +0800)]

ALSA: hda - Adding one more ALC255 pin definition for headset problem

commit 392c9da24a994f238c5d7ea611c6245be4617014 upstream.

We have two new Dell laptop models, they have the same ALC255 pin
definition, but not in the pin quirk table yet, as a result, the
headset microphone can't work. After adding the definition in the
table, the headset microphone works well.

Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Greg Kroah-Hartman [Wed, 28 Sep 2016 09:48:44 +0000 (11:48 +0200)]

Revert "usbtmc: convert to devm_kzalloc"

commit ab21b63e8aedfc73565dd9cdd51eb338341177cb upstream.

This reverts commit e6c7efdcb76f11b04e3d3f71c8d764ab75c9423b.

Turns out it was totally wrong. The memory is supposed to be bound to
the kref, as the original code was doing correctly, not the
device/driver binding as the devm_kzalloc() would cause.

This fixes an oops when read would be called after the device was
unbound from the driver.

Reported-by: Ladislav Michl <ladis@linux-mips.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Kyle Jones [Fri, 23 Sep 2016 18:28:37 +0000 (13:28 -0500)]

USB: serial: cp210x: Add ID for a Juniper console

commit decc5360f23e9efe0252094f47f57f254dcbb3a9 upstream.

Signed-off-by: Kyle Jones <kyle@kf5jwc.us>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Nicolas Iooss [Tue, 23 Aug 2016 15:13:29 +0000 (17:13 +0200)]

usb: usbip: vudc: fix left shift overflow

commit 238b7bd91b16d5a08326f858db42229b212e53d8 upstream.

In v_recv_cmd_submit(), urb_p->urb->pipe has the type unsigned int
(which is 32-bit long on x86_64) but 11<<30 results in a 34-bit integer.
Therefore the 2 leading bits are truncated and

    urb_p->urb->pipe &= ~(11 << 30);

has the same meaning as

    urb_p->urb->pipe &= ~(3 << 30);

This second statement seems to be how the code was intended to be
written, as PIPE_ constants have values between 0 and 3.

The overflow has been detected with a clang warning:

    drivers/usb/usbip/vudc_rx.c:145:27: warning: signed shift result
    (0x2C0000000) requires 35 bits to represent, but 'int' only has 32
    bits [-Wshift-overflow]
            urb_p->urb->pipe &= ~(11 << 30);
                                  ~~ ^  ~~

Fixes: 79c02cb1fd5c ("usbip: vudc: Add vudc_rx")
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Ksenija Stanojevic [Sun, 2 Oct 2016 15:42:35 +0000 (17:42 +0200)]

Staging: fbtft: Fix bug in fbtft-core

commit fc1e2c8ea85e109acf09e74789e9b852f6eed251 upstream.

Commit 367e8560e8d7a62d96e9b1d644028a3816e04206 introduced a bug
in fbtft-core where fps is always 0, this is because variable
update_time is not assigned correctly.

Signed-off-by: Ksenija Stanojevic <ksenija.stanojevic@gmail.com>
Fixes: 367e8560e8d7 ("Staging: fbtbt: Replace timespec with ktime_t")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Greg Kroah-Hartman [Mon, 19 Sep 2016 18:09:51 +0000 (19:09 +0100)]

usb: misc: legousbtower: Fix NULL pointer deference

commit 2fae9e5a7babada041e2e161699ade2447a01989 upstream.

This patch fixes a NULL pointer dereference caused by a race codition in
the probe function of the legousbtower driver. It re-structures the
probe function to only register the interface after successfully reading
the board's firmware ID.

The probe function does not deregister the usb interface after an error
receiving the devices firmware ID. The device file registered
(/dev/usb/legousbtower%d) may be read/written globally before the probe
function returns. When tower_delete is called in the probe function
(after an r/w has been initiated), core dev structures are deleted while
the file operation functions are still running. If the 0 address is
mappable on the machine, this vulnerability can be used to create a
Local Priviege Escalation exploit via a write-what-where condition by
remapping dev->interrupt_out_buffer in tower_write. A forged USB device
and local program execution would be required for LPE. The USB device
would have to delay the control message in tower_probe and accept
the control urb in tower_open whilst guest code initiated a write to the
device file as tower_delete is called from the error in tower_probe.

This bug has existed since 2003. Patch tested by emulated device.

Reported-by: James Patrick-Evans <james@jmp-e.com>
Tested-by: James Patrick-Evans <james@jmp-e.com>
Signed-off-by: James Patrick-Evans <james@jmp-e.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Linus Torvalds [Tue, 4 Oct 2016 04:03:48 +0000 (21:03 -0700)]

Using BUG_ON() as an assert() is _never_ acceptable

commit 21f54ddae449f4bdd9f1498124901d67202243d9 upstream.

That just generally kills the machine, and makes debugging only much
harder, since the traces may long be gone.

Debugging by assert() is a disease. Don't do it. If you can continue,
you're much better off doing so with a live machine where you have a
much higher chance that the report actually makes it to the system logs,
rather than result in a machine that is just completely dead.

The only valid situation for BUG_ON() is when continuing is not an
option, because there is massive corruption. But if you are just
verifying that something is true, you warn about your broken assumptions
(preferably just once), and limp on.

Fixes: 22f2ac51b6d6 ("mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()")
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Will Deacon [Fri, 26 Aug 2016 10:36:39 +0000 (11:36 +0100)]

arm64: debug: avoid resetting stepping state machine when TIF_SINGLESTEP

commit 3a402a709500c5a3faca2111668c33d96555e35a upstream.

When TIF_SINGLESTEP is set for a task, the single-step state machine is
enabled and we must take care not to reset it to the active-not-pending
state if it is already in the active-pending state.

Unfortunately, that's exactly what user_enable_single_step does, by
unconditionally setting the SS bit in the SPSR for the current task.
This causes failures in the GDB testsuite, where GDB ends up missing
expected step traps if the instruction being stepped generates another
trap, e.g. PTRACE_EVENT_FORK from an SVC instruction.

This patch fixes the problem by preserving the current state of the
stepping state machine when TIF_SINGLESTEP is set on the current thread.

Cc: <stable@vger.kernel.org>
Reported-by: Yao Qi <yao.qi@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Linus Torvalds [Sun, 2 Oct 2016 23:24:33 +0000 (16:24 -0700)]

Linux 4.8

commit | commitdiff | tree

Linus Torvalds [Sun, 2 Oct 2016 22:23:00 +0000 (15:23 -0700)]

Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:
"Three relatively small fixes for ARM:

   - Roger noticed that dma_max_pfn() was calculating the upper limit
     wrongly, by adding the PFN offset of memory twice.

   - A fix from Robin to correct parsing of MPIDR values when the
     address size is larger than one BE32 unit.

   - A fix from Srinivas to ensure that we do not rely on the boot
     loader (or previous Linux kernel) setting the translation table
     base register a certain way in the decompressor, which can lead to
     crashes"

* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 8618/1: decompressor: reset ttbcr fields to use TTBR0 on ARMv7
  ARM: 8617/1: dma: fix dma_max_pfn()
  ARM: 8616/1: dt: Respect property size when parsing CPUs

commit | commitdiff | tree

Srinivas Ramana [Fri, 30 Sep 2016 14:03:31 +0000 (15:03 +0100)]

ARM: 8618/1: decompressor: reset ttbcr fields to use TTBR0 on ARMv7

If the bootloader uses the long descriptor format and jumps to
kernel decompressor code, TTBCR may not be in a right state.
Before enabling the MMU, it is required to clear the TTBCR.PD0
field to use TTBR0 for translation table walks.

The commit dbece45894d3a ("ARM: 7501/1: decompressor:
reset ttbcr for VMSA ARMv7 cores") does the reset of TTBCR.N, but
doesn't consider all the bits for the size of TTBCR.N.

Clear TTBCR.PD0 field and reset all the three bits of TTBCR.N to
indicate the use of TTBR0 and the correct base address width.

Fixes: dbece45894d3 ("ARM: 7501/1: decompressor: reset ttbcr for VMSA ARMv7 cores")
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Srinivas Ramana <sramana@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

commit | commitdiff | tree

Linus Torvalds [Sun, 2 Oct 2016 18:04:29 +0000 (11:04 -0700)]

Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
"The last regression fixes for 4.8 final:

   - Two patches addressing the fallout of the CR4 optimizations which
     caused CR4-less machines to fail.

   - Fix the VDSO build on big endian machines

   - Take care of FPU initialization if no CPUID is available otherwise
     task struct size ends up being zero

   - Fix up context tracking in case load_gs_index fails"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry/64: Fix context tracking state warning when load_gs_index fails
  x86/boot: Initialize FPU and X86_FEATURE_ALWAYS even if we don't have CPUID
  x86/vdso: Fix building on big endian host
  x86/boot: Fix another __read_cr4() case on 486
  x86/init: Fix cr4_init_shadow() on CR4-less machines

commit | commitdiff | tree

Linus Torvalds [Sun, 2 Oct 2016 17:53:38 +0000 (10:53 -0700)]

Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus

Pull MIPS fixes from Ralf Baechle:
"Another round of fixes:

   - CM: Fix mips_cm_max_vp_width for non-MT kernels on MT systems
   - CPS: Avoid BUG() when offlining pre-r6 CPUs
   - DEC: Avoid gas warnings due to suspicious instruction scheduling by
     manually expanding assembler macros.
   - FTLB: Fix configuration by moving confiuguratoin after probing
   - FTLB: clear execution hazard after changing FTLB enable
   - Highmem: Fix detection of unsupported highmem with cache aliases
   - I6400: Don't touch FTLBP chicken bits
   - microMIPS: Fix BUILD_ROLLBACK_PROLOGUE
   - Malta: Fix IOCU disable switch read for MIPS64
   - Octeon: Fix probing of devices attached to GPIO lines
   - uprobes: Misc small fixes"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: CM: Fix mips_cm_max_vp_width for non-MT kernels on MT systems
  MIPS: Fix detection of unsupported highmem with cache aliases
  MIPS: Malta: Fix IOCU disable switch read for MIPS64
  MIPS: Fix BUILD_ROLLBACK_PROLOGUE for microMIPS
  MIPS: clear execution hazard after changing FTLB enable
  MIPS: Configure FTLB after probing TLB sizes from config4
  MIPS: Stop setting I6400 FTLBP
  MIPS: DEC: Avoid la pseudo-instruction in delay slots
  MIPS: Octeon: mark GPIO controller node not populated after IRQ init.
  MIPS: uprobes: fix use of uninitialised variable
  MIPS: uprobes: remove incorrect set_orig_insn
  MIPS: fix uretprobe implementation
  MIPS: smp-cps: Avoid BUG() when offlining pre-r6 CPUs

commit | commitdiff | tree

Linus Torvalds [Sun, 2 Oct 2016 17:42:26 +0000 (10:42 -0700)]

Merge git://git./linux/kernel/git/davem/sparc

Pull sparc fixes from David Miller:

1) Fix section mismatches in some builds, from Paul Gortmaker.

2) Need to count huge zero page mappings when doing TSB sizing, from
    Mike Kravetz.

3) Fix handing of cpu_possible_mask when nr_cpus module option is
    specified, from Atish Patra.

4) Don't allocate irq stacks until nr_irqs has been processed, also
    from Atish Patra.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Fix non-SMP build.
  sparc64: Fix irq stack bootmem allocation.
  sparc64: Fix cpu_possible_mask if nr_cpus is set
  sparc64 mm: Fix more TSB sizing issues
  sparc64: fix section mismatch in find_numa_latencies_for_group

commit | commitdiff | tree

Linus Torvalds [Sun, 2 Oct 2016 17:36:41 +0000 (10:36 -0700)]

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Fix wrong TCP checksums on MTU probing when checksum offloading is
    disabled, from Douglas Caetano dos Santos.

2) Fix qdisc backlog updates in qfq and sfb schedulers, from Cong Wang.

3) Route lookup flow key protocol value is wrong in ip6gre_xmit_other(),
    fix from Lance Richardson.

4) Scheduling while atomic in multicast routing code of ipv4 and ipv6,
    fix from Nikolay Aleksandrov.

5) Fix packet alignment in fec driver, from Eric Nelson.

6) Fix perf regression in sctp due to struct layout and cache misses,
    from Xin Long.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock
  sctp: change to check peer prsctp_capable when using prsctp polices
  sctp: remove prsctp_param from sctp_chunk
  sctp: move sent_count to the memory hole in sctp_chunk
  tg3: Avoid NULL pointer dereference in tg3_io_error_detected()
  act_ife: Fix false encoding
  act_ife: Fix external mac header on encode
  VSOCK: Don't dec ack backlog twice for rejected connections
  Revert "net: ethernet: bcmgenet: use phydev from struct net_device"
  net: fec: align IP header in hardware
  net: fec: remove QUIRK_HAS_RACC from i.mx27
  net: fec: remove QUIRK_HAS_RACC from i.mx25
  ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route
  ip6_gre: fix flowi6_proto value in ip6gre_xmit_other()
  tcp: fix a compile error in DBGUNDO()
  tcp: fix wrong checksum calculation on MTU probing
  sch_sfb: keep backlog updated with qlen
  sch_qfq: keep backlog updated with qlen
  can: dev: fix deadlock reported after bus-off

commit | commitdiff | tree

Paul Burton [Fri, 30 Sep 2016 16:25:01 +0000 (17:25 +0100)]

MIPS: CM: Fix mips_cm_max_vp_width for non-MT kernels on MT systems

When discovering the number of VPEs per core, smp_num_siblings will be
incorrect for kernels built without support for the MIPS MultiThreading
(MT) ASE running on systems which implement said ASE. This leads to
accesses to VPEs in secondary cores being performed incorrectly since
mips_cm_vp_id calculates the wrong ID to write to the local "other"
registers. Fix this by examining the number of VPEs in the core as
reported by the CM.

This patch presumes that the number of VPEs will be the same in each
core of the system. As this path only applies to systems with CM version
2.5 or lower, and this property is true of all such known systems, this
is likely to be fine but is described in a comment for good measure.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14338/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

commit | commitdiff | tree

Linus Torvalds [Sat, 1 Oct 2016 14:37:15 +0000 (07:37 -0700)]

Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fix from James Bottomley:
"One final fix before 4.8.

  There was a memory leak triggered by turning scsi mq off due to the
  fact that we assume on host release that the already running hosts
  weren't mq based because that's the state of the global flag (even
  though they were).

  Fix it by tracking this on a per host host basis"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: Avoid that toggling use_blk_mq triggers a memory leak

commit | commitdiff | tree

Linus Torvalds [Sat, 1 Oct 2016 04:25:09 +0000 (21:25 -0700)]

Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

Pull input fix from Dmitry Torokhov:
"One small change to make joydev (which is used by older games) to bind
to devices that export Z axis but not X or Y (such as TRC rudder)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: joydev - recognize devices with Z axis as joysticks

commit | commitdiff | tree

Linus Torvalds [Fri, 30 Sep 2016 22:51:10 +0000 (15:51 -0700)]

Merge branch 'akpm' (patches from Andrew)

Merge more fixes from Andrew Morton:
"Three fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  include/linux/property.h: fix typo/compile error
  ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()
  mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()

commit | commitdiff | tree

John Youn [Fri, 30 Sep 2016 22:11:35 +0000 (15:11 -0700)]

include/linux/property.h: fix typo/compile error

This fixes commit d76eebfa175e ("include/linux/property.h: fix build
issues with gcc-4.4.4").

With that commit we get the following compile error when using the
PROPERTY_ENTRY_INTEGER_ARRAY macro.

include/linux/property.h:201:39: error: `u32_data' undeclared (first
                 use in this function)
  PROPERTY_ENTRY_INTEGER_ARRAY(_name_, u32, _val_)
                                       ^
include/linux/property.h:193:17: note: in definition of macro
                 `PROPERTY_ENTRY_INTEGER_ARRAY'
  { .pointer = { _type_##_data = _val_ } },  \
                 ^

This needs a '.' to reference the union member.  It seems this was just
overlooked here since it is done correctly in similar constructs in
other parts of the original commit.

This fix is in preparation of upcoming commits that will use this macro.

Fixes: commit d76eebfa175e ("include/linux/property.h: fix build issues with gcc-4.4.4")
Link: http://lkml.kernel.org/r/2de3b929290d88a723ed829a3e3cbd02044714df.1475114627.git.johnyoun@synopsys.com
Signed-off-by: John Youn <johnyoun@synopsys.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Eric Ren [Fri, 30 Sep 2016 22:11:32 +0000 (15:11 -0700)]

ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()

The testcase "mmaptruncate" of ocfs2-test deadlocks occasionally.

In this testcase, we create a 2*CLUSTER_SIZE file and mmap() on it;
there are 2 process repeatedly performing the following operations
respectively: one is doing memset(mmaped_addr + 2*CLUSTER_SIZE - 1, 'a',
1), while the another is playing ftruncate(fd, 2*CLUSTER_SIZE) and then
ftruncate(fd, CLUSTER_SIZE) again and again.

This is the backtrace when the deadlock happens:

   __wait_on_bit_lock+0x50/0xa0
   __lock_page+0xb7/0xc0
   ocfs2_write_begin_nolock+0x163f/0x1790 [ocfs2]
   ocfs2_page_mkwrite+0x1c7/0x2a0 [ocfs2]
   do_page_mkwrite+0x66/0xc0
   handle_mm_fault+0x685/0x1350
   __do_page_fault+0x1d8/0x4d0
   trace_do_page_fault+0x37/0xf0
   do_async_page_fault+0x19/0x70
   async_page_fault+0x28/0x30

In ocfs2_write_begin_nolock(), we first grab the pages and then allocate
disk space for this write; ocfs2_try_to_free_truncate_log() will be
called if -ENOSPC is returned; if we're lucky to get enough clusters,
which is usually the case, we start over again.

But in ocfs2_free_write_ctxt() the target page isn't unlocked, so we
will deadlock when trying to grab the target page again.

Also, -ENOMEM might be returned in ocfs2_grab_pages_for_write().
Another deadlock will happen in __do_page_mkwrite() if
ocfs2_page_mkwrite() returns non-VM_FAULT_LOCKED, and along with a
locked target page.

These two errors fail on the same path, so fix them by unlocking the
target page manually before ocfs2_free_write_ctxt().

Jan Kara helps me clear out the JBD2 part, and suggest the hint for root
cause.

Changes since v1:
1. Also put ENOMEM error case into consideration.

Link: http://lkml.kernel.org/r/1474173902-32075-1-git-send-email-zren@suse.com
Signed-off-by: Eric Ren <zren@suse.com>
Reviewed-by: He Gang <ghe@suse.com>
Acked-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Johannes Weiner [Fri, 30 Sep 2016 22:11:29 +0000 (15:11 -0700)]

mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()

Antonio reports the following crash when using fuse under memory pressure:

  kernel BUG at /build/linux-a2WvEb/linux-4.4.0/mm/workingset.c:346!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: all of them
  CPU: 2 PID: 63 Comm: kswapd0 Not tainted 4.4.0-36-generic #55-Ubuntu
  Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
  task: ffff88040cae6040 ti: ffff880407488000 task.ti: ffff880407488000
  RIP: shadow_lru_isolate+0x181/0x190
  Call Trace:
    __list_lru_walk_one.isra.3+0x8f/0x130
    list_lru_walk_one+0x23/0x30
    scan_shadow_nodes+0x34/0x50
    shrink_slab.part.40+0x1ed/0x3d0
    shrink_zone+0x2ca/0x2e0
    kswapd+0x51e/0x990
    kthread+0xd8/0xf0
    ret_from_fork+0x3f/0x70

which corresponds to the following sanity check in the shadow node
tracking:

  BUG_ON(node->count & RADIX_TREE_COUNT_MASK);

The workingset code tracks radix tree nodes that exclusively contain
shadow entries of evicted pages in them, and this (somewhat obscure)
line checks whether there are real pages left that would interfere with
reclaim of the radix tree node under memory pressure.

While discussing ways how fuse might sneak pages into the radix tree
past the workingset code, Miklos pointed to replace_page_cache_page(),
and indeed there is a problem there: it properly accounts for the old
page being removed - __delete_from_page_cache() does that - but then
does a raw raw radix_tree_insert(), not accounting for the replacement
page.  Eventually the page count bits in node->count underflow while
leaving the node incorrectly linked to the shadow node LRU.

To address this, make sure replace_page_cache_page() uses the tracked
page insertion code, page_cache_tree_insert().  This fixes the page
accounting and makes sure page-containing nodes are properly unlinked
from the shadow node LRU again.

Also, make the sanity checks a bit less obscure by using the helpers for
checking the number of pages and shadows in a radix tree node.

Fixes: 449dd6984d0e ("mm: keep page cache radix tree nodes in check")
Link: http://lkml.kernel.org/r/20160919155822.29498-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Antonio SJ Musumeci <trapexit@spawn.link>
Debugged-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <stable@vger.kernel.org> [3.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Javi Merino [Fri, 30 Sep 2016 12:14:28 +0000 (13:14 +0100)]

MAINTAINERS: Switch to email address for Javi Merino

Change my email address to my kernel.org account instead of the ARM one.

Signed-off-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Wanpeng Li [Fri, 30 Sep 2016 01:01:06 +0000 (09:01 +0800)]

x86/entry/64: Fix context tracking state warning when load_gs_index fails

This warning:

WARNING: CPU: 0 PID: 3331 at arch/x86/entry/common.c:45 enter_from_user_mode+0x32/0x50
CPU: 0 PID: 3331 Comm: ldt_gdt_64 Not tainted 4.8.0-rc7+ #13
Call Trace:
  dump_stack+0x99/0xd0
  __warn+0xd1/0xf0
  warn_slowpath_null+0x1d/0x20
  enter_from_user_mode+0x32/0x50
  error_entry+0x6d/0xc0
  ? general_protection+0x12/0x30
  ? native_load_gs_index+0xd/0x20
  ? do_set_thread_area+0x19c/0x1f0
  SyS_set_thread_area+0x24/0x30
  do_int80_syscall_32+0x7c/0x220
  entry_INT80_compat+0x38/0x50

... can be reproduced by running the GS testcase of the ldt_gdt test unit in
the x86 selftests.

do_int80_syscall_32() will call enter_form_user_mode() to convert context
tracking state from user state to kernel state. The load_gs_index() call
can fail with user gsbase, gsbase will be fixed up and proceed if this
happen.

However, enter_from_user_mode() will be called again in the fixed up path
though it is context tracking kernel state currently.

This patch fixes it by just fixing up gsbase and telling lockdep that IRQs
are off once load_gs_index() failed with user gsbase.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1475197266-3440-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

commit | commitdiff | tree

Andy Lutomirski [Wed, 28 Sep 2016 23:06:33 +0000 (16:06 -0700)]

x86/boot: Initialize FPU and X86_FEATURE_ALWAYS even if we don't have CPUID

Otherwise arch_task_struct_size == 0 and we die. While we're at it,
set X86_FEATURE_ALWAYS, too.

Reported-by: David Saggiorato <david@saggiorato.net>
Tested-by: David Saggiorato <david@saggiorato.net>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: aaeb5c01c5b ("x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86")
Link: http://lkml.kernel.org/r/8de723afbf0811071185039f9088733188b606c9.1475103911.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

commit | commitdiff | tree

Segher Boessenkool [Thu, 29 Sep 2016 11:51:00 +0000 (11:51 +0000)]

x86/vdso: Fix building on big endian host

We need to call GET_LE to read hdr->e_type.

Fixes: 57f90c3dfc75 ("x86/vdso: Error out if the vDSO isn't a valid DSO")
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: linux-next@vger.kernel.org
Link: http://lkml.kernel.org/r/20160929193442.GA16617@gate.crashing.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

commit | commitdiff | tree

Andy Lutomirski [Thu, 29 Sep 2016 19:48:11 +0000 (12:48 -0700)]

x86/boot: Fix another __read_cr4() case on 486

The condition for reading CR4 was wrong: there are some CPUs with
CPUID but not CR4. Rather than trying to make the condition exact,
use __read_cr4_safe().

Fixes: 18bc7bd523e0 ("x86/boot: Synchronize trampoline_cr4_features and mmu_cr4_features directly")
Reported-by: david@saggiorato.net
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Link: http://lkml.kernel.org/r/8c453a61c4f44ab6ff43c29780ba04835234d2e5.1475178369.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

commit | commitdiff | tree

Xin Long [Wed, 28 Sep 2016 18:55:44 +0000 (02:55 +0800)]

sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock

When sctp dumps all the ep->assocs, it needs to lock_sock first,
but now it locks sock in rcu_read_lock, and lock_sock may sleep,
which would break rcu_read_lock.

This patch is to get and hold one sock when traversing the list.
After that and get out of rcu_read_lock, lock and dump it. Then
it will traverse the list again to get the next one until all
sctp socks are dumped.

For sctp_diag_dump_one, it fixes this issue by holding asoc and
moving cb() out of rcu_read_lock in sctp_transport_lookup_process.

Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Fri, 30 Sep 2016 06:07:10 +0000 (02:07 -0400)]

Merge branch 'sctp-fixes'

Xin Long says:

====================
sctp: a bunch of fixes for prsctp polices

This patchset is to fix 2 issues for prsctp polices:

  1. patch 1 and 2 fix "netperf-Throughput_Mbps -37.2% regression" issue
     when overloading the CPU.

  2. patch 3 fix "prsctp polices should check both sides' prsctp_capable,
     instead of only local side".
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Xin Long [Wed, 28 Sep 2016 18:37:28 +0000 (02:37 +0800)]

sctp: change to check peer prsctp_capable when using prsctp polices

Now before using prsctp polices, sctp uses asoc->prsctp_enable to
check if prsctp is enabled. However asoc->prsctp_enable is set only
means local host support prsctp, sctp should not abandon packet if
peer host doesn't enable prsctp.

So this patch is to use asoc->peer.prsctp_capable to check if prsctp
is enabled on both side, instead of asoc->prsctp_enable, as asoc's
peer.prsctp_capable is set only when local and peer both enable prsctp.

Fixes: a6c2f792873a ("sctp: implement prsctp TTL policy")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Xin Long [Wed, 28 Sep 2016 18:37:27 +0000 (02:37 +0800)]

sctp: remove prsctp_param from sctp_chunk

Now sctp uses chunk->prsctp_param to save the prsctp param for all the
prsctp polices, we didn't need to introduce prsctp_param to sctp_chunk.
We can just use chunk->sinfo.sinfo_timetolive for RTX and BUF polices,
and reuse msg->expires_at for TTL policy, as the prsctp polices and old
expires policy are mutual exclusive.

This patch is to remove prsctp_param from sctp_chunk, and reuse msg's
expires_at for TTL and chunk's sinfo.sinfo_timetolive for RTX and BUF
polices.

Note that sctp can't use chunk's sinfo.sinfo_timetolive for TTL policy,
as it needs a u64 variables to save the expires_at time.

This one also fixes the "netperf-Throughput_Mbps -37.2% regression"
issue.

Fixes: a6c2f792873a ("sctp: implement prsctp TTL policy")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Xin Long [Wed, 28 Sep 2016 18:37:26 +0000 (02:37 +0800)]

sctp: move sent_count to the memory hole in sctp_chunk

Now pahole sctp_chunk, it has 2 memory holes:
   struct sctp_chunk {
struct list_head           list;
atomic_t                   refcnt;
/* XXX 4 bytes hole, try to pack */
...
long unsigned int          prsctp_param;
int                        sent_count;
/* XXX 4 bytes hole, try to pack */

This patch is to move up sent_count to fill the 1st one and eliminate
the 2nd one.

It's not just another struct compaction, it also fixes the "netperf-
Throughput_Mbps -37.2% regression" issue when overloading the CPU.

Fixes: a6c2f792873a ("sctp: implement prsctp TTL policy")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Milton Miller [Thu, 29 Sep 2016 16:24:08 +0000 (13:24 -0300)]

tg3: Avoid NULL pointer dereference in tg3_io_error_detected()

While the driver is probing the adapter, an error may occur before the
netdev structure is allocated and attached to pci_dev. In this case,
not only netdev isn't available, but the tg3 private structure is also
not available as it is just math from the NULL pointer, so dereferences
must be skipped.

The following trace is seen when the error is triggered:

  [1.402247] Unable to handle kernel paging request for data at address 0x00001a99
  [1.402410] Faulting instruction address: 0xc0000000007e33f8
  [1.402450] Oops: Kernel access of bad area, sig: 11 [#1]
  [1.402481] SMP NR_CPUS=2048 NUMA PowerNV
  [1.402513] Modules linked in:
  [1.402545] CPU: 0 PID: 651 Comm: eehd Not tainted 4.4.0-36-generic #55-Ubuntu
  [1.402591] task: c000001fe4e42a20 ti: c000001fe4e88000 task.ti: c000001fe4e88000
  [1.402742] NIP: c0000000007e33f8 LR: c0000000007e3164 CTR: c000000000595ea0
  [1.402787] REGS: c000001fe4e8b790 TRAP: 0300   Not tainted  (4.4.0-36-generic)
  [1.402832] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28000422  XER: 20000000
  [1.403058] CFAR: c000000000008468 DAR: 0000000000001a99 DSISR: 42000000 SOFTE: 1
  GPR00: c0000000007e3164 c000001fe4e8ba10 c0000000015c5e00 0000000000000000
  GPR04: 0000000000000001 0000000000000000 0000000000000039 0000000000000299
  GPR08: 0000000000000000 0000000000000001 c000001fe4e88000 0000000000000006
  GPR12: 0000000000000000 c00000000fb40000 c0000000000e6558 c000003ca1bffd00
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000000d52768
  GPR24: c000000000d52740 0000000000000100 c000003ca1b52000 0000000000000002
  GPR28: 0000000000000900 0000000000000000 c00000000152a0c0 c000003ca1b52000
  [1.404226] NIP [c0000000007e33f8] tg3_io_error_detected+0x308/0x340
  [1.404265] LR [c0000000007e3164] tg3_io_error_detected+0x74/0x340

This patch avoids the NULL pointer dereference by moving the access after
the netdev NULL pointer check on tg3_io_error_detected(). Also, we add a
check for netdev being NULL on tg3_io_resume() [suggested by Michael Chan].

Fixes: 0486a063b1ff ("tg3: prevent ifup/ifdown during PCI error recovery")
Fixes: dfc8f370316b ("net/tg3: Release IRQs on permanent error")
Tested-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Milton Miller <miltonm@us.ibm.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Linus Torvalds [Fri, 30 Sep 2016 03:16:57 +0000 (20:16 -0700)]

Merge tag 'drm-fixes-for-v4.8-final' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"drm fixes for final 4.8.

  One big regression fix for udl, along with two amdgpu fixes and two
  nouveau fixes.

  All seems pretty safe and useful"

* tag 'drm-fixes-for-v4.8-final' of git://people.freedesktop.org/~airlied/linux:
  drm/udl: fix line iterator in damage handling
  drm/radeon/si/dpm: add workaround for for Jet parts
  drm/amdgpu: disable CRTCs before teardown
  drm/nouveau: Revert "bus: remove cpu_coherent flag"
  drm/nouveau/fifo/nv04: avoid ramht race against cookie insertion

commit | commitdiff | tree

Linus Torvalds [Thu, 29 Sep 2016 21:59:11 +0000 (14:59 -0700)]

Merge branch 'libnvdimm-fixes' of git://git./linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fixes from Dan Williams:

- Four fixes for "flush hint" support.

   Flush hints are addresses advertised by the ACPI 6+ NFIT (NVDIMM
   Firmware Interface Table) that when written and fenced guarantee that
   writes pending in platform write buffers (outside the cpu) have been
   flushed to media.  They might also be used by hypervisors as a
   trigger condition to flush guest-persistent memory ranges to storage.

    Fix a potential data corruption issue, a broken definition of the
    hint array, a wrong allocation size for the unit test implementation
    of the flush hint table, and missing NULL check in an error path.

    The unit test, while it did not prevent these bugs from being
    merged, at least triggered occasional crashes in advance of
    production usages.

- Fix handling of ACPI DSM error status results.  The DSM mechanism
   allows communication with platform and memory device firmware.  We
   correctly parse known errors, but were silently ignoring others.

   Fix it to consistently fail any command with a non-zero status return
   that we otherwise do not interpret / handle.

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  libnvdimm, region: fix flush hint table thinko
  nfit: fail DSMs that return non-zero status by default
  libnvdimm: fix devm_nvdimm_memremap() error path
  tools/testing/nvdimm: fix allocation range for mock flush hint tables
  nvdimm: fix PHYS_PFN/PFN_PHYS mixup

commit | commitdiff | tree

Andy Lutomirski [Wed, 28 Sep 2016 19:34:14 +0000 (12:34 -0700)]

x86/init: Fix cr4_init_shadow() on CR4-less machines

cr4_init_shadow() will panic on 486-like machines without CR4. Fix
it using __read_cr4_safe().

Reported-by: david@saggiorato.net
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: 1e02ce4cccdc ("x86: Store a per-cpu shadow copy of CR4")
Link: http://lkml.kernel.org/r/43a20f81fb504013bf613913dc25574b45336a61.1475091074.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

commit | commitdiff | tree

Paul Burton [Fri, 2 Sep 2016 14:17:31 +0000 (15:17 +0100)]

MIPS: Fix detection of unsupported highmem with cache aliases

The paging_init() function contains code which detects that highmem is
in use but unsupported due to dcache aliasing. However this code was
ineffective because it was being run before the caches are probed,
meaning that cpu_has_dc_aliases would always evaluate to false (unless a
platform overrides it to a compile-time constant) and the detection of
the unsupported case is never triggered. The kernel would then go on to
attempt to use highmem & either hit coherency issues or trigger the
BUG_ON in flush_kernel_dcache_page().

Fix this by running paging_init() later than cpu_cache_init(), such that
the cpu_has_dc_aliases macro will evaluate correctly & the unsupported
highmem case will be detected successfully.

This then leads to a formerly hidden issue in that
mem_init_free_highmem() will attempt to free all highmem pages, even
though we're avoiding use of them & don't have valid page structs for
them. This leads to an invalid pointer dereference & a TLB exception.
Avoid this by skipping the loop in mem_init_free_highmem() if
cpu_has_dc_aliases evaluates true.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Rabin Vincent <rabinv@axis.com>
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Jaedon Shin <jaedon.shin@gmail.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Cc: Jonas Gorski <jogo@openwrt.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/14184/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

commit | commitdiff | tree

Paul Burton [Fri, 2 Sep 2016 15:07:10 +0000 (16:07 +0100)]

MIPS: Malta: Fix IOCU disable switch read for MIPS64

Malta boards used with CPU emulators feature a switch to disable use of
an IOCU. Software has to check this switch & ignore any present IOCU if
the switch is closed. The read used to do this was unsafe for 64 bit
kernels, as it simply casted the address 0xbf403000 to a pointer &
dereferenced it. Whilst in a 32 bit kernel this would access kseg1, in a
64 bit kernel this attempts to access xuseg & results in an address
error exception.

Fix by accessing a correctly formed ckseg1 address generated using the
CKSEG1ADDR macro.

Whilst modifying this code, define the name of the register and the bit
we care about within it, which indicates whether PCI DMA is routed to
the IOCU or straight to DRAM. The code previously checked that bit 0 was
also set, but the least significant 7 bits of the CONFIG_GEN0 register
contain the value of the MReqInfo signal provided to the IOCU OCP bus,
so singling out bit 0 makes little sense & that part of the check is
dropped.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Fixes: b6d92b4a6bdb ("MIPS: Add option to disable software I/O coherency.")
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/14187/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

commit | commitdiff | tree

Paul Burton [Fri, 19 Aug 2016 17:15:40 +0000 (18:15 +0100)]

MIPS: Fix BUILD_ROLLBACK_PROLOGUE for microMIPS

When the kernel is built for microMIPS, branches targets need to be
known to be microMIPS code in order to result in bit 0 of the PC being
set. The branch target in the BUILD_ROLLBACK_PROLOGUE macro was simply
the end of the macro, which may be pointing at padding rather than at
code. This results in recent enough GNU linkers complaining like so:

    mips-img-linux-gnu-ld: arch/mips/built-in.o: .text+0x3e3c: Unsupported branch between ISA modes.
    mips-img-linux-gnu-ld: final link failed: Bad value
    Makefile:936: recipe for target 'vmlinux' failed
    make: *** [vmlinux] Error 1

Fix this by changing the branch target to be the start of the
appropriate handler, skipping over any padding.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14019/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

Linux kernel - Deliverables

RSS Atom

This page took 0.082515 seconds and 5 git commands to generate.