J. Bruce Fields [Mon, 12 Sep 2016 20:00:47 +0000 (16:00 -0400)]
nfsd: randomize SETCLIENTID reply to help distinguish servers
NFSv4.1 has built-in trunking support that allows a client to determine
whether two connections to two different IP addresses are actually to
the same server. NFSv4.0 does not, but RFC 7931 attempts to provide
clients a means to do this by suggesting that they perform SETCLIENTIDs
to the two servers and comparing the clientids and verifiers.
Linux clients since 05f4c350ee02 "NFS: Discover NFSv4 server trunking
when mounting" implement this suggestion. It is possible that other
clients do to.
knfsd generates the 64-bit clientid by concatenating the 32-bit boot
time (in seconds) and a counter. This makes collisions between
clientids generated by the same server extremely unlikely. But
collisions are very likely between clientids generated by servers that
boot at the same time, and it's quite common for multiple servers to
boot at the same time. The verifier is generated in the same way, so
has the same problem.
Therefore recent NFSv4.0 clients may decide two different servers are
really the same, and mount a filesystem from the wrong server.
The fault is really with RFC 7931, and needs a client fix, but it's not
clear what that fix will be. In the meantime, mitigate the chance of
these collisions by randomizing the starting value of the counters used
to generate clientids and verifiers.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
netfilter: nf_nat: handle NF_DROP from nfnetlink_parse_nat_setup()
nf_nat_setup_info() returns NF_* verdicts, so convert them to error
codes that is what ctnelink expects. This has passed overlook without
having any impact since this nf_nat_setup_info() has always returned
NF_ACCEPT so far. Since 870190a9ec90 ("netfilter: nat: convert nat bysrc
hash to rhashtable"), this is problem.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nft_numgen: fix race between num generate and store it
After we generate a new number, we still use the priv->counter and
store it to the dreg. This is not correct, another cpu may already
change it to a new number. So we must use the generated number, not
the priv->counter itself.
Fixes: 91dbc6be0a62 ("netfilter: nf_tables: add number generator expression") Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
These counters sit in hot path and do show up in perf, this is especially
true for 'found' and 'searched' which get incremented for every packet
processed.
- on busy systems found and searched will overflow every few hours
(these are 32bit integers), other more busy ones every few days.
- for debugging there are better methods, such as iptables' trace target,
the conntrack log sysctls. Nowadays we also have perf tool.
This removes packet path stat counters except those that
are expected to be 0 (or close to 0) on a normal system, e.g.
'insert_failed' (race happened) or 'invalid' (proto tracker rejects).
The insert stat is retained for the ctnetlink case.
The found stat is retained for the tuple-is-taken check when NAT has to
determine if it needs to pick a different source address.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nf_tables: don't drop IPv6 packets that cannot parse transport
This is overly conservative and not flexible at all, so better let them
go through and let the filtering policy decide what to do with them. We
use skb_header_pointer() all over the place so we would just fail to
match when trying to access fields from malformed traffic.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nf_tables_bridge: use nft_set_pktinfo_ipv{4, 6}_validate
Consolidate pktinfo setup and validation by using the new generic
functions so we converge to the netdev family codebase.
We only need a linear IPv4 and IPv6 header from the reject expression,
so move nft_bridge_iphdr_validate() and nft_bridge_ip6hdr_validate()
to net/bridge/netfilter/nft_reject_bridge.c.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
These functions are extracted from the netdev family, they initialize
the pktinfo structure and validate that the IPv4 and IPv6 headers are
well-formed given that these functions are called from a path where
layer 3 sanitization did not happen yet.
These functions are placed in include/net/netfilter/nf_tables_ipv{4,6}.h
so they can be reused by a follow up patch to use them from the bridge
family too.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nf_tables: ensure proper initialization of nft_pktinfo fields
This patch introduces nft_set_pktinfo_unspec() that ensures proper
initialization all of pktinfo fields for non-IP traffic. This is used
by the bridge, netdev and arp families.
This new function relies on nft_set_pktinfo_proto_unspec() to set a new
tprot_set field that indicates if transport protocol information is
available. Remain fields are zeroed.
The meta expression has been also updated to check to tprot_set in first
place given that zero is a valid tprot value. Even a handcrafted packet
may come with the IPPROTO_RAW (255) protocol number so we can't rely on
this value as tprot unset.
Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nft_dynset: allow to invert match criteria
The dynset expression matches if we can fit a new entry into the set.
If there is no room for it, then it breaks the rule evaluation.
This patch introduces the inversion flag so you can add rules to
explicitly drop packets that don't fit into the set. For example:
# nft filter input flow table xyz size 4 { ip saddr timeout 120s counter } overflow drop
This is useful to provide a replacement for connlimit.
For the rule above, every new entry uses the IPv4 address as key in the
set, this entry gets a timeout of 120 seconds that gets refresh on every
packet seen. If we get new flow and our set already contains 4 entries
already, then this packet is dropped.
You can already express this in positive logic, assuming default policy
to drop:
Russell King [Wed, 7 Sep 2016 16:15:23 +0000 (17:15 +0100)]
Merge tag 'arm-plt-optimizations-for-v4.9' of git://git.linaro.org/people/ard.biesheuvel/linux-arm into devel-stable
This series of 4 patches optimizes the ARM PLT generation code that
is invoked at module load time, to get rid of the O(n^2) algorithm
that results in pathological load times of 10 seconds or more for
large modules on certain STB platforms
Pull networking fixes from David Miller:
"Mostly small sets of driver fixes scattered all over the place.
1) Mediatek driver fixes from Sean Wang. Forward port not written
correctly during TX map, missed handling of EPROBE_DEFER, and
mistaken use of put_page() instead of skb_free_frag().
2) Fix socket double-free in KCM code, from WANG Cong.
3) QED driver fixes from Sudarsana Reddy Kalluru, including a fix for
using the dcbx buffers before initializing them.
4) Mellanox Switch driver fixes from Jiri Pirko, including a fix for
double fib removals and an error handling fix in
mlxsw_sp_module_init().
5) Fix kernel panic when enabling LLDP in i40e driver, from Dave
Ertman.
6) Fix padding of TSO packets in thunderx driver, from Sunil Goutham.
7) TCP's rcv_wup not initialized properly when using fastopen, from
Neal Cardwell.
8) Don't use uninitialized flow keys in flow dissector, from Gao
Feng.
9) Use after free in l2tp module unload, from Sabrina Dubroca.
10) Fix interrupt registry ordering issues in smsc911x driver, from
Jeremy Linton.
11) Fix crashes in bonding having to do with enslaving and rx_handler,
from Mahesh Bandewar.
12) AF_UNIX deadlock fixes from Linus.
13) In mlx5 driver, don't read skb->xmit_mode after it might have been
freed from the TX reclaim path. From Tariq Toukan.
14) Fix a bug from 2015 in TCP Yeah where the congestion window does
not increase, from Artem Germanov.
15) Don't pad frames on receive in NFP driver, from Jakub Kicinski.
16) Fix chunk fragmenting in SCTP wrt. GSO, from Marcelo Ricardo
Leitner.
17) Fix deletion of VRF routes, from Mark Tomlinson.
18) Fix device refcount leak when DAD fails in ipv6, from Wei Yongjun"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (101 commits)
net/mlx4_en: Fix panic on xmit while port is down
net/mlx4_en: Fixes for DCBX
net/mlx4_en: Fix the return value of mlx4_en_dcbnl_set_state()
net/mlx4_en: Fix the return value of mlx4_en_dcbnl_set_all()
net: ethernet: renesas: sh_eth: add POST registers for rz
drivers: net: phy: mdio-xgene: Add hardware dependency
dwc_eth_qos: do not register semi-initialized device
sctp: identify chunks that need to be fragmented at IP level
mlxsw: spectrum: Set port type before setting its address
mlxsw: spectrum_router: Fix error path in mlxsw_sp_router_init
nfp: don't pad frames on receive
nfp: drop support for old firmware ABIs
nfp: remove linux/version.h includes
tcp: cwnd does not increase in TCP YeAH
net/mlx5e: Fix parsing of vlan packets when updating lro header
net/mlx5e: Fix global PFC counters replication
net/mlx5e: Prevent casting overflow
net/mlx5e: Move an_disable_cap bit to a new position
net/mlx5e: Fix xmit_more counter race issue
tcp: fastopen: avoid negative sk_forward_alloc
...
Merge branches 'pnp' and 'device-properties' into linux-next
* pnp:
PNP: isapnp: make core more explicitly non-modular
* device-properties:
serial: 8250_dw: Add quirk for APM X-Gene SoC
ACPI / LPSS: Provide build-in properties of the UART
ACPI / APD: Provide build-in properties of the UART
driver core: Don't leak secondary fwnode on device removal
Merge branches 'pm-cpufreq' and 'pm-cpufreq-sched' into linux-next
* pm-cpufreq:
cpufreq: Drop unnecessary check from cpufreq_policy_alloc()
cpufreq-SCPI: Delete unnecessary assignment for the field "owner"
cpufreq: dt: Add exynos5433 compatible to use generic cpufreq driver
* pm-cpufreq-sched:
cpufreq / sched: ignore SMT when determining max cpu capacity
cpufreq / sched: Pass runqueue pointer to cpufreq_update_util()
cpufreq / sched: Pass flags to cpufreq_update_util()
Merge branches 'acpi-button', 'acpi-tables' and 'acpi-battery' into linux-next
* acpi-button:
ACPI / button: Add document for ACPI control method lid device restrictions
ACPI / button: Fix an issue in button.lid_init_state=ignore mode
* acpi-tables:
ACPI / tables: do not report the number of entries ignored by acpi_parse_entries()
ACPI / tables: fix acpi_parse_entries_array() so it traverses all subtables
ACPI / tables: fix incorrect counts returned by acpi_parse_entries_array()
Merge branches 'acpi-ec' and 'acpi-cppc' into linux-next
* acpi-ec:
ACPI / EC: Fix issues related to boot_ec
ACPI / EC: Fix a gap that ECDT EC cannot handle EC events
ACPI / EC: Fix a memory leakage issue in acpi_ec_add()
ACPI / EC: Cleanup first_ec/boot_ec code
ACPI / EC: Enable event freeze mode to improve event handling for suspend process
ACPI / EC: Add PM operations to improve event handling for suspend process
ACPI / EC: Add PM operations to improve event handling for resume process
ACPI / EC: Fix an issue that SCI_EVT cannot be detected after event is enabled
ACPI / EC: Add EC_FLAGS_QUERY_ENABLED to reveal a hidden logic
ACPI / EC: Add PM operations for suspend/resume noirq stage
* acpi-cppc:
ACPI / CPPC: Add prefix cppc to cpudata structure name
ACPI / CPPC: Add support for functional fixed hardware address
ACPI / CPPC: Don't return on CPPC probe failure
ACPI / CPPC: Allow build with ACPI_CPU_FREQ_PSS config
ACPI / CPPC: check for error bit in PCC status field
ACPI / CPPC: move all PCC related information into pcc_data
ACPI / CPPC: add sysfs support to compute delivered performance
ACPI / CPPC: set a non-zero value for transition_latency
ACPI / CPPC: support for batching CPPC requests
ACPI / CPPC: acquire pcc_lock only while accessing PCC subspace
ACPI / CPPC: restructure read/writes for efficient sys mapped reg ops
mailbox: pcc: Support HW-Reduced Communication Subspace type 2