From e379e5f385f874adb0b414f917adb1fc50e20de9 Mon Sep 17 00:00:00 2001 From: "H.J. Lu" Date: Thu, 12 Dec 2019 11:56:06 -0800 Subject: [PATCH] i386: Align branches within a fixed boundary Add 3 command-line options to align branches within a fixed boundary with segment prefixes or NOPs: 1. -malign-branch-boundary=NUM aligns branches within NUM byte boundary. 2. -malign-branch=TYPE[+TYPE...] specifies types of branches to align. The supported branches are: a. Conditional jump. b. Fused conditional jump. c. Unconditional jump. d. Call. e. Ret. f. Indirect jump and call. 3. -malign-branch-prefix-size=NUM aligns branches with NUM segment prefixes per instruction. 3 new rs_machine_dependent frag types are added: 1. BRANCH_PADDING. The variable size frag to insert NOP before branch. 2. BRANCH_PREFIX. The variable size frag to insert segment prefixes to an instruction. The choices of prefixes are: a. Use the existing segment prefix if there is one. b. Use CS segment prefix in 64-bit mode. c. In 32-bit mode, use SS segment prefix with ESP/EBP base register and use DS segment prefix without ESP/EBP base register. 3. FUSED_JCC_PADDING. The variable size frag to insert NOP before fused conditional jump. The new rs_machine_dependent frags aren't inserted if the previous item is a prefix or a constant directive, which may be used to hardcode an instruction, since there is no clear instruction boundary. Segment prefixes and NOP padding are disabled before relaxable TLS relocations and tls_get_addr calls to keep TLS instruction sequence unchanged. md_estimate_size_before_relax() and i386_generic_table_relax_frag() are used to handled BRANCH_PADDING, BRANCH_PREFIX and FUSED_JCC_PADDING frags. i386_generic_table_relax_frag() grows or shrinks sizes of segment prefix and NOP to align the next branch frag: 1. First try to add segment prefixes to instructions before a branch. 2. If there is no sufficient room to add segment prefixes, NOP will be inserted before a branch. * config/tc-i386.c (_i386_insn): Add has_gotpc_tls_reloc. (tls_get_addr): New. (last_insn): New. (align_branch_power): New. (align_branch_kind): New. (align_branch_bit): New. (align_branch): New. (MAX_FUSED_JCC_PADDING_SIZE): New. (align_branch_prefix_size): New. (BRANCH_PADDING): New. (BRANCH_PREFIX): New. (FUSED_JCC_PADDING): New. (i386_generate_nops): Support BRANCH_PADDING and FUSED_JCC_PADDING. (md_begin): Abort if align_branch_prefix_size < MAX_FUSED_JCC_PADDING_SIZE. (md_assemble): Set last_insn. (maybe_fused_with_jcc_p): New. (add_fused_jcc_padding_frag_p): New. (add_branch_prefix_frag_p): New. (add_branch_padding_frag_p): New. (output_insn): Generate a BRANCH_PADDING, FUSED_JCC_PADDING or BRANCH_PREFIX frag and terminate each frag to align branches. (output_disp): Set i.has_gotpc_tls_reloc to TRUE for GOTPC and relaxable TLS relocations. (output_imm): Likewise. (i386_next_non_empty_frag): New. (i386_next_jcc_frag): New. (i386_classify_machine_dependent_frag): New. (i386_branch_padding_size): New. (i386_generic_table_relax_frag): New. (md_estimate_size_before_relax): Handle COND_JUMP_PADDING, FUSED_JCC_PADDING and COND_JUMP_PREFIX frags. (md_convert_frag): Handle BRANCH_PADDING, BRANCH_PREFIX and FUSED_JCC_PADDING frags. (OPTION_MALIGN_BRANCH_BOUNDARY): New. (OPTION_MALIGN_BRANCH_PREFIX_SIZE): New. (OPTION_MALIGN_BRANCH): New. (md_longopts): Add -malign-branch-boundary=, -malign-branch-prefix-size= and -malign-branch=. (md_parse_option): Handle -malign-branch-boundary=, -malign-branch-prefix-size= and -malign-branch=. (md_show_usage): Display -malign-branch-boundary=, -malign-branch-prefix-size= and -malign-branch=. (i386_target_format): Set tls_get_addr. (i386_cons_align): New. * config/tc-i386.h (i386_cons_align): New. (md_cons_align): New. (i386_generic_table_relax_frag): New. (md_generic_table_relax_frag): New. (i386_tc_frag_data): Add u, padding_address, length, max_prefix_length, prefix_length, default_prefix, cmp_size, classified and branch_type. (TC_FRAG_INIT): Initialize u, padding_address, length, max_prefix_length, prefix_length, default_prefix, cmp_size, classified and branch_type. * doc/c-i386.texi: Document -malign-branch-boundary=, -malign-branch= and -malign-branch-prefix-size=. --- gas/ChangeLog | 60 +++ gas/config/tc-i386.c | 1046 +++++++++++++++++++++++++++++++++++++++++- gas/config/tc-i386.h | 31 ++ gas/doc/c-i386.texi | 26 ++ 4 files changed, 1160 insertions(+), 3 deletions(-) diff --git a/gas/ChangeLog b/gas/ChangeLog index ef7369cee9..7b83a0abd1 100644 --- a/gas/ChangeLog +++ b/gas/ChangeLog @@ -1,3 +1,63 @@ +2019-12-12 H.J. Lu + + * config/tc-i386.c (_i386_insn): Add has_gotpc_tls_reloc. + (tls_get_addr): New. + (last_insn): New. + (align_branch_power): New. + (align_branch_kind): New. + (align_branch_bit): New. + (align_branch): New. + (MAX_FUSED_JCC_PADDING_SIZE): New. + (align_branch_prefix_size): New. + (BRANCH_PADDING): New. + (BRANCH_PREFIX): New. + (FUSED_JCC_PADDING): New. + (i386_generate_nops): Support BRANCH_PADDING and FUSED_JCC_PADDING. + (md_begin): Abort if align_branch_prefix_size < + MAX_FUSED_JCC_PADDING_SIZE. + (md_assemble): Set last_insn. + (maybe_fused_with_jcc_p): New. + (add_fused_jcc_padding_frag_p): New. + (add_branch_prefix_frag_p): New. + (add_branch_padding_frag_p): New. + (output_insn): Generate a BRANCH_PADDING, FUSED_JCC_PADDING or + BRANCH_PREFIX frag and terminate each frag to align branches. + (output_disp): Set i.has_gotpc_tls_reloc to TRUE for GOTPC and + relaxable TLS relocations. + (output_imm): Likewise. + (i386_next_non_empty_frag): New. + (i386_next_jcc_frag): New. + (i386_classify_machine_dependent_frag): New. + (i386_branch_padding_size): New. + (i386_generic_table_relax_frag): New. + (md_estimate_size_before_relax): Handle COND_JUMP_PADDING, + FUSED_JCC_PADDING and COND_JUMP_PREFIX frags. + (md_convert_frag): Handle BRANCH_PADDING, BRANCH_PREFIX and + FUSED_JCC_PADDING frags. + (OPTION_MALIGN_BRANCH_BOUNDARY): New. + (OPTION_MALIGN_BRANCH_PREFIX_SIZE): New. + (OPTION_MALIGN_BRANCH): New. + (md_longopts): Add -malign-branch-boundary=, + -malign-branch-prefix-size= and -malign-branch=. + (md_parse_option): Handle -malign-branch-boundary=, + -malign-branch-prefix-size= and -malign-branch=. + (md_show_usage): Display -malign-branch-boundary=, + -malign-branch-prefix-size= and -malign-branch=. + (i386_target_format): Set tls_get_addr. + (i386_cons_align): New. + * config/tc-i386.h (i386_cons_align): New. + (md_cons_align): New. + (i386_generic_table_relax_frag): New. + (md_generic_table_relax_frag): New. + (i386_tc_frag_data): Add u, padding_address, length, + max_prefix_length, prefix_length, default_prefix, cmp_size, + classified and branch_type. + (TC_FRAG_INIT): Initialize u, padding_address, length, + max_prefix_length, prefix_length, default_prefix, cmp_size, + classified and branch_type. + * doc/c-i386.texi: Document -malign-branch-boundary=, + -malign-branch= and -malign-branch-prefix-size=. + 2019-12-12 H.J. Lu * write.c (md_generic_table_relax_frag): New. Defined to diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c index accb634203..41a6a0817a 100644 --- a/gas/config/tc-i386.c +++ b/gas/config/tc-i386.c @@ -368,6 +368,9 @@ struct _i386_insn /* Has ZMM register operands. */ bfd_boolean has_regzmm; + /* Has GOTPC or TLS relocation. */ + bfd_boolean has_gotpc_tls_reloc; + /* RM and SIB are the modrm byte and the sib byte where the addressing modes of this insn are encoded. */ modrm_byte rm; @@ -562,6 +565,8 @@ static enum flag_code flag_code; static unsigned int object_64bit; static unsigned int disallow_64bit_reloc; static int use_rela_relocations = 0; +/* __tls_get_addr/___tls_get_addr symbol for TLS. */ +static const char *tls_get_addr; #if ((defined (OBJ_MAYBE_COFF) && defined (OBJ_MAYBE_AOUT)) \ || defined (OBJ_ELF) || defined (OBJ_MAYBE_ELF) \ @@ -622,6 +627,21 @@ static int omit_lock_prefix = 0; "lock addl $0, (%{re}sp)". */ static int avoid_fence = 0; +/* Type of the previous instruction. */ +static struct + { + segT seg; + const char *file; + const char *name; + unsigned int line; + enum last_insn_kind + { + last_insn_other = 0, + last_insn_directive, + last_insn_prefix + } kind; + } last_insn; + /* 1 if the assembler should generate relax relocations. */ static int generate_relax_relocations @@ -635,6 +655,44 @@ static enum check_kind } sse_check, operand_check = check_warning; +/* Non-zero if branches should be aligned within power of 2 boundary. */ +static int align_branch_power = 0; + +/* Types of branches to align. */ +enum align_branch_kind + { + align_branch_none = 0, + align_branch_jcc = 1, + align_branch_fused = 2, + align_branch_jmp = 3, + align_branch_call = 4, + align_branch_indirect = 5, + align_branch_ret = 6 + }; + +/* Type bits of branches to align. */ +enum align_branch_bit + { + align_branch_jcc_bit = 1 << align_branch_jcc, + align_branch_fused_bit = 1 << align_branch_fused, + align_branch_jmp_bit = 1 << align_branch_jmp, + align_branch_call_bit = 1 << align_branch_call, + align_branch_indirect_bit = 1 << align_branch_indirect, + align_branch_ret_bit = 1 << align_branch_ret + }; + +static unsigned int align_branch = (align_branch_jcc_bit + | align_branch_fused_bit + | align_branch_jmp_bit); + +/* The maximum padding size for fused jcc. CMP like instruction can + be 9 bytes and jcc can be 6 bytes. Leave room just in case for + prefixes. */ +#define MAX_FUSED_JCC_PADDING_SIZE 20 + +/* The maximum number of prefixes added for an instruction. */ +static unsigned int align_branch_prefix_size = 5; + /* Optimization: 1. Clear the REX_W bit with register operand if possible. 2. Above plus use 128bit vector instruction to clear the full vector @@ -738,12 +796,19 @@ int x86_cie_data_alignment; /* Interface to relax_segment. There are 3 major relax states for 386 jump insns because the different types of jumps add different sizes to frags when we're - figuring out what sort of jump to choose to reach a given label. */ + figuring out what sort of jump to choose to reach a given label. + + BRANCH_PADDING, BRANCH_PREFIX and FUSED_JCC_PADDING are used to align + branches which are handled by md_estimate_size_before_relax() and + i386_generic_table_relax_frag(). */ /* Types. */ #define UNCOND_JUMP 0 #define COND_JUMP 1 #define COND_JUMP86 2 +#define BRANCH_PADDING 3 +#define BRANCH_PREFIX 4 +#define FUSED_JCC_PADDING 5 /* Sizes. */ #define CODE16 1 @@ -1384,6 +1449,12 @@ i386_generate_nops (fragS *fragP, char *where, offsetT count, int limit) case rs_fill_nop: case rs_align_code: break; + case rs_machine_dependent: + /* Allow NOP padding for jumps and calls. */ + if (TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == BRANCH_PADDING + || TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == FUSED_JCC_PADDING) + break; + /* Fall through. */ default: return; } @@ -1528,7 +1599,7 @@ i386_generate_nops (fragS *fragP, char *where, offsetT count, int limit) return; } } - else + else if (fragP->fr_type != rs_machine_dependent) fragP->fr_var = count; if ((count / max_single_nop_size) > max_number_of_nops) @@ -3011,6 +3082,11 @@ md_begin (void) x86_dwarf2_return_column = 8; x86_cie_data_alignment = -4; } + + /* NB: FUSED_JCC_PADDING frag must have sufficient room so that it + can be turned into BRANCH_PREFIX frag. */ + if (align_branch_prefix_size > MAX_FUSED_JCC_PADDING_SIZE) + abort (); } void @@ -4539,6 +4615,17 @@ md_assemble (char *line) /* We are ready to output the insn. */ output_insn (); + + last_insn.seg = now_seg; + + if (i.tm.opcode_modifier.isprefix) + { + last_insn.kind = last_insn_prefix; + last_insn.name = i.tm.name; + last_insn.file = as_where (&last_insn.line); + } + else + last_insn.kind = last_insn_other; } static char * @@ -8196,11 +8283,206 @@ encoding_length (const fragS *start_frag, offsetT start_off, return len - start_off + (frag_now_ptr - frag_now->fr_literal); } +/* Return 1 for test, and, cmp, add, sub, inc and dec which may + be macro-fused with conditional jumps. */ + +static int +maybe_fused_with_jcc_p (void) +{ + /* No RIP address. */ + if (i.base_reg && i.base_reg->reg_num == RegIP) + return 0; + + /* No VEX/EVEX encoding. */ + if (is_any_vex_encoding (&i.tm)) + return 0; + + /* and, add, sub with destination register. */ + if ((i.tm.base_opcode >= 0x20 && i.tm.base_opcode <= 0x25) + || i.tm.base_opcode <= 5 + || (i.tm.base_opcode >= 0x28 && i.tm.base_opcode <= 0x2d) + || ((i.tm.base_opcode | 3) == 0x83 + && ((i.tm.extension_opcode | 1) == 0x5 + || i.tm.extension_opcode == 0x0))) + return (i.types[1].bitfield.class == Reg + || i.types[1].bitfield.instance == Accum); + + /* test, cmp with any register. */ + if ((i.tm.base_opcode | 1) == 0x85 + || (i.tm.base_opcode | 1) == 0xa9 + || ((i.tm.base_opcode | 1) == 0xf7 + && i.tm.extension_opcode == 0) + || (i.tm.base_opcode >= 0x38 && i.tm.base_opcode <= 0x3d) + || ((i.tm.base_opcode | 3) == 0x83 + && (i.tm.extension_opcode == 0x7))) + return (i.types[0].bitfield.class == Reg + || i.types[0].bitfield.instance == Accum + || i.types[1].bitfield.class == Reg + || i.types[1].bitfield.instance == Accum); + + /* inc, dec with any register. */ + if ((i.tm.cpu_flags.bitfield.cpuno64 + && (i.tm.base_opcode | 0xf) == 0x4f) + || ((i.tm.base_opcode | 1) == 0xff + && i.tm.extension_opcode <= 0x1)) + return (i.types[0].bitfield.class == Reg + || i.types[0].bitfield.instance == Accum); + + return 0; +} + +/* Return 1 if a FUSED_JCC_PADDING frag should be generated. */ + +static int +add_fused_jcc_padding_frag_p (void) +{ + /* NB: Don't work with COND_JUMP86 without i386. */ + if (!align_branch_power + || now_seg == absolute_section + || !cpu_arch_flags.bitfield.cpui386 + || !(align_branch & align_branch_fused_bit)) + return 0; + + if (maybe_fused_with_jcc_p ()) + { + if (last_insn.kind == last_insn_other + || last_insn.seg != now_seg) + return 1; + if (flag_debug) + as_warn_where (last_insn.file, last_insn.line, + _("`%s` skips -malign-branch-boundary on `%s`"), + last_insn.name, i.tm.name); + } + + return 0; +} + +/* Return 1 if a BRANCH_PREFIX frag should be generated. */ + +static int +add_branch_prefix_frag_p (void) +{ + /* NB: Don't work with COND_JUMP86 without i386. Don't add prefix + to PadLock instructions since they include prefixes in opcode. */ + if (!align_branch_power + || !align_branch_prefix_size + || now_seg == absolute_section + || i.tm.cpu_flags.bitfield.cpupadlock + || !cpu_arch_flags.bitfield.cpui386) + return 0; + + /* Don't add prefix if it is a prefix or there is no operand in case + that segment prefix is special. */ + if (!i.operands || i.tm.opcode_modifier.isprefix) + return 0; + + if (last_insn.kind == last_insn_other + || last_insn.seg != now_seg) + return 1; + + if (flag_debug) + as_warn_where (last_insn.file, last_insn.line, + _("`%s` skips -malign-branch-boundary on `%s`"), + last_insn.name, i.tm.name); + + return 0; +} + +/* Return 1 if a BRANCH_PADDING frag should be generated. */ + +static int +add_branch_padding_frag_p (enum align_branch_kind *branch_p) +{ + int add_padding; + + /* NB: Don't work with COND_JUMP86 without i386. */ + if (!align_branch_power + || now_seg == absolute_section + || !cpu_arch_flags.bitfield.cpui386) + return 0; + + add_padding = 0; + + /* Check for jcc and direct jmp. */ + if (i.tm.opcode_modifier.jump == JUMP) + { + if (i.tm.base_opcode == JUMP_PC_RELATIVE) + { + *branch_p = align_branch_jmp; + add_padding = align_branch & align_branch_jmp_bit; + } + else + { + *branch_p = align_branch_jcc; + if ((align_branch & align_branch_jcc_bit)) + add_padding = 1; + } + } + else if (is_any_vex_encoding (&i.tm)) + return 0; + else if ((i.tm.base_opcode | 1) == 0xc3) + { + /* Near ret. */ + *branch_p = align_branch_ret; + if ((align_branch & align_branch_ret_bit)) + add_padding = 1; + } + else + { + /* Check for indirect jmp, direct and indirect calls. */ + if (i.tm.base_opcode == 0xe8) + { + /* Direct call. */ + *branch_p = align_branch_call; + if ((align_branch & align_branch_call_bit)) + add_padding = 1; + } + else if (i.tm.base_opcode == 0xff + && (i.tm.extension_opcode == 2 + || i.tm.extension_opcode == 4)) + { + /* Indirect call and jmp. */ + *branch_p = align_branch_indirect; + if ((align_branch & align_branch_indirect_bit)) + add_padding = 1; + } + + if (add_padding + && i.disp_operands + && tls_get_addr + && (i.op[0].disps->X_op == O_symbol + || (i.op[0].disps->X_op == O_subtract + && i.op[0].disps->X_op_symbol == GOT_symbol))) + { + symbolS *s = i.op[0].disps->X_add_symbol; + /* No padding to call to global or undefined tls_get_addr. */ + if ((S_IS_EXTERNAL (s) || !S_IS_DEFINED (s)) + && strcmp (S_GET_NAME (s), tls_get_addr) == 0) + return 0; + } + } + + if (add_padding + && last_insn.kind != last_insn_other + && last_insn.seg == now_seg) + { + if (flag_debug) + as_warn_where (last_insn.file, last_insn.line, + _("`%s` skips -malign-branch-boundary on `%s`"), + last_insn.name, i.tm.name); + return 0; + } + + return add_padding; +} + static void output_insn (void) { fragS *insn_start_frag; offsetT insn_start_off; + fragS *fragP = NULL; + enum align_branch_kind branch = align_branch_none; #if defined (OBJ_ELF) || defined (OBJ_MAYBE_ELF) if (IS_ELF && x86_used_note) @@ -8291,6 +8573,31 @@ output_insn (void) insn_start_frag = frag_now; insn_start_off = frag_now_fix (); + if (add_branch_padding_frag_p (&branch)) + { + char *p; + /* Branch can be 8 bytes. Leave some room for prefixes. */ + unsigned int max_branch_padding_size = 14; + + /* Align section to boundary. */ + record_alignment (now_seg, align_branch_power); + + /* Make room for padding. */ + frag_grow (max_branch_padding_size); + + /* Start of the padding. */ + p = frag_more (0); + + fragP = frag_now; + + frag_var (rs_machine_dependent, max_branch_padding_size, 0, + ENCODE_RELAX_STATE (BRANCH_PADDING, 0), + NULL, 0, p); + + fragP->tc_frag_data.branch_type = branch; + fragP->tc_frag_data.max_bytes = max_branch_padding_size; + } + /* Output jumps. */ if (i.tm.opcode_modifier.jump == JUMP) output_branch (); @@ -8329,6 +8636,41 @@ output_insn (void) i.prefix[LOCK_PREFIX] = 0; } + if (branch) + /* Skip if this is a branch. */ + ; + else if (add_fused_jcc_padding_frag_p ()) + { + /* Make room for padding. */ + frag_grow (MAX_FUSED_JCC_PADDING_SIZE); + p = frag_more (0); + + fragP = frag_now; + + frag_var (rs_machine_dependent, MAX_FUSED_JCC_PADDING_SIZE, 0, + ENCODE_RELAX_STATE (FUSED_JCC_PADDING, 0), + NULL, 0, p); + + fragP->tc_frag_data.branch_type = align_branch_fused; + fragP->tc_frag_data.max_bytes = MAX_FUSED_JCC_PADDING_SIZE; + } + else if (add_branch_prefix_frag_p ()) + { + unsigned int max_prefix_size = align_branch_prefix_size; + + /* Make room for padding. */ + frag_grow (max_prefix_size); + p = frag_more (0); + + fragP = frag_now; + + frag_var (rs_machine_dependent, max_prefix_size, 0, + ENCODE_RELAX_STATE (BRANCH_PREFIX, 0), + NULL, 0, p); + + fragP->tc_frag_data.max_bytes = max_prefix_size; + } + /* Since the VEX/EVEX prefix contains the implicit prefix, we don't need the explicit prefix. */ if (!i.tm.opcode_modifier.vex && !i.tm.opcode_modifier.evex) @@ -8476,9 +8818,105 @@ output_insn (void) if (j > 15) as_warn (_("instruction length of %u bytes exceeds the limit of 15"), j); + else if (fragP) + { + /* NB: Don't add prefix with GOTPC relocation since + output_disp() above depends on the fixed encoding + length. Can't add prefix with TLS relocation since + it breaks TLS linker optimization. */ + unsigned int max = i.has_gotpc_tls_reloc ? 0 : 15 - j; + /* Prefix count on the current instruction. */ + unsigned int count = i.vex.length; + unsigned int k; + for (k = 0; k < ARRAY_SIZE (i.prefix); k++) + /* REX byte is encoded in VEX/EVEX prefix. */ + if (i.prefix[k] && (k != REX_PREFIX || !i.vex.length)) + count++; + + /* Count prefixes for extended opcode maps. */ + if (!i.vex.length) + switch (i.tm.opcode_length) + { + case 3: + if (((i.tm.base_opcode >> 16) & 0xff) == 0xf) + { + count++; + switch ((i.tm.base_opcode >> 8) & 0xff) + { + case 0x38: + case 0x3a: + count++; + break; + default: + break; + } + } + break; + case 2: + if (((i.tm.base_opcode >> 8) & 0xff) == 0xf) + count++; + break; + case 1: + break; + default: + abort (); + } + + if (TYPE_FROM_RELAX_STATE (fragP->fr_subtype) + == BRANCH_PREFIX) + { + /* Set the maximum prefix size in BRANCH_PREFIX + frag. */ + if (fragP->tc_frag_data.max_bytes > max) + fragP->tc_frag_data.max_bytes = max; + if (fragP->tc_frag_data.max_bytes > count) + fragP->tc_frag_data.max_bytes -= count; + else + fragP->tc_frag_data.max_bytes = 0; + } + else + { + /* Remember the maximum prefix size in FUSED_JCC_PADDING + frag. */ + unsigned int max_prefix_size; + if (align_branch_prefix_size > max) + max_prefix_size = max; + else + max_prefix_size = align_branch_prefix_size; + if (max_prefix_size > count) + fragP->tc_frag_data.max_prefix_length + = max_prefix_size - count; + } + + /* Use existing segment prefix if possible. Use CS + segment prefix in 64-bit mode. In 32-bit mode, use SS + segment prefix with ESP/EBP base register and use DS + segment prefix without ESP/EBP base register. */ + if (i.prefix[SEG_PREFIX]) + fragP->tc_frag_data.default_prefix = i.prefix[SEG_PREFIX]; + else if (flag_code == CODE_64BIT) + fragP->tc_frag_data.default_prefix = CS_PREFIX_OPCODE; + else if (i.base_reg + && (i.base_reg->reg_num == 4 + || i.base_reg->reg_num == 5)) + fragP->tc_frag_data.default_prefix = SS_PREFIX_OPCODE; + else + fragP->tc_frag_data.default_prefix = DS_PREFIX_OPCODE; + } } } + /* NB: Don't work with COND_JUMP86 without i386. */ + if (align_branch_power + && now_seg != absolute_section + && cpu_arch_flags.bitfield.cpui386) + { + /* Terminate each frag so that we can add prefix and check for + fused jcc. */ + frag_wane (frag_now); + frag_new (0); + } + #ifdef DEBUG386 if (flag_debug) { @@ -8588,6 +9026,7 @@ output_disp (fragS *insn_start_frag, offsetT insn_start_off) if (!object_64bit) { reloc_type = BFD_RELOC_386_GOTPC; + i.has_gotpc_tls_reloc = TRUE; i.op[n].imms->X_add_number += encoding_length (insn_start_frag, insn_start_off, p); } @@ -8599,6 +9038,27 @@ output_disp (fragS *insn_start_frag, offsetT insn_start_off) insn, and that is taken care of in other code. */ reloc_type = BFD_RELOC_X86_64_GOTPC32; } + else if (align_branch_power) + { + switch (reloc_type) + { + case BFD_RELOC_386_TLS_GD: + case BFD_RELOC_386_TLS_LDM: + case BFD_RELOC_386_TLS_IE: + case BFD_RELOC_386_TLS_IE_32: + case BFD_RELOC_386_TLS_GOTIE: + case BFD_RELOC_386_TLS_GOTDESC: + case BFD_RELOC_386_TLS_DESC_CALL: + case BFD_RELOC_X86_64_TLSGD: + case BFD_RELOC_X86_64_TLSLD: + case BFD_RELOC_X86_64_GOTTPOFF: + case BFD_RELOC_X86_64_GOTPC32_TLSDESC: + case BFD_RELOC_X86_64_TLSDESC_CALL: + i.has_gotpc_tls_reloc = TRUE; + default: + break; + } + } fixP = fix_new_exp (frag_now, p - frag_now->fr_literal, size, i.op[n].disps, pcrel, reloc_type); @@ -8740,6 +9200,7 @@ output_imm (fragS *insn_start_frag, offsetT insn_start_off) reloc_type = BFD_RELOC_X86_64_GOTPC32; else if (size == 8) reloc_type = BFD_RELOC_X86_64_GOTPC64; + i.has_gotpc_tls_reloc = TRUE; i.op[n].imms->X_add_number += encoding_length (insn_start_frag, insn_start_off, p); } @@ -10365,6 +10826,362 @@ elf_symbol_resolved_in_segment_p (symbolS *fr_symbol, offsetT fr_var) } #endif +/* Return the next non-empty frag. */ + +static fragS * +i386_next_non_empty_frag (fragS *fragP) +{ + /* There may be a frag with a ".fill 0" when there is no room in + the current frag for frag_grow in output_insn. */ + for (fragP = fragP->fr_next; + (fragP != NULL + && fragP->fr_type == rs_fill + && fragP->fr_fix == 0); + fragP = fragP->fr_next) + ; + return fragP; +} + +/* Return the next jcc frag after BRANCH_PADDING. */ + +static fragS * +i386_next_jcc_frag (fragS *fragP) +{ + if (!fragP) + return NULL; + + if (fragP->fr_type == rs_machine_dependent + && (TYPE_FROM_RELAX_STATE (fragP->fr_subtype) + == BRANCH_PADDING)) + { + fragP = i386_next_non_empty_frag (fragP); + if (fragP->fr_type != rs_machine_dependent) + return NULL; + if (TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == COND_JUMP) + return fragP; + } + + return NULL; +} + +/* Classify BRANCH_PADDING, BRANCH_PREFIX and FUSED_JCC_PADDING frags. */ + +static void +i386_classify_machine_dependent_frag (fragS *fragP) +{ + fragS *cmp_fragP; + fragS *pad_fragP; + fragS *branch_fragP; + fragS *next_fragP; + unsigned int max_prefix_length; + + if (fragP->tc_frag_data.classified) + return; + + /* First scan for BRANCH_PADDING and FUSED_JCC_PADDING. Convert + FUSED_JCC_PADDING and merge BRANCH_PADDING. */ + for (next_fragP = fragP; + next_fragP != NULL; + next_fragP = next_fragP->fr_next) + { + next_fragP->tc_frag_data.classified = 1; + if (next_fragP->fr_type == rs_machine_dependent) + switch (TYPE_FROM_RELAX_STATE (next_fragP->fr_subtype)) + { + case BRANCH_PADDING: + /* The BRANCH_PADDING frag must be followed by a branch + frag. */ + branch_fragP = i386_next_non_empty_frag (next_fragP); + next_fragP->tc_frag_data.u.branch_fragP = branch_fragP; + break; + case FUSED_JCC_PADDING: + /* Check if this is a fused jcc: + FUSED_JCC_PADDING + CMP like instruction + BRANCH_PADDING + COND_JUMP + */ + cmp_fragP = i386_next_non_empty_frag (next_fragP); + pad_fragP = i386_next_non_empty_frag (cmp_fragP); + branch_fragP = i386_next_jcc_frag (pad_fragP); + if (branch_fragP) + { + /* The BRANCH_PADDING frag is merged with the + FUSED_JCC_PADDING frag. */ + next_fragP->tc_frag_data.u.branch_fragP = branch_fragP; + /* CMP like instruction size. */ + next_fragP->tc_frag_data.cmp_size = cmp_fragP->fr_fix; + frag_wane (pad_fragP); + /* Skip to branch_fragP. */ + next_fragP = branch_fragP; + } + else if (next_fragP->tc_frag_data.max_prefix_length) + { + /* Turn FUSED_JCC_PADDING into BRANCH_PREFIX if it isn't + a fused jcc. */ + next_fragP->fr_subtype + = ENCODE_RELAX_STATE (BRANCH_PREFIX, 0); + next_fragP->tc_frag_data.max_bytes + = next_fragP->tc_frag_data.max_prefix_length; + /* This will be updated in the BRANCH_PREFIX scan. */ + next_fragP->tc_frag_data.max_prefix_length = 0; + } + else + frag_wane (next_fragP); + break; + } + } + + /* Stop if there is no BRANCH_PREFIX. */ + if (!align_branch_prefix_size) + return; + + /* Scan for BRANCH_PREFIX. */ + for (; fragP != NULL; fragP = fragP->fr_next) + { + if (fragP->fr_type != rs_machine_dependent + || (TYPE_FROM_RELAX_STATE (fragP->fr_subtype) + != BRANCH_PREFIX)) + continue; + + /* Count all BRANCH_PREFIX frags before BRANCH_PADDING and + COND_JUMP_PREFIX. */ + max_prefix_length = 0; + for (next_fragP = fragP; + next_fragP != NULL; + next_fragP = next_fragP->fr_next) + { + if (next_fragP->fr_type == rs_fill) + /* Skip rs_fill frags. */ + continue; + else if (next_fragP->fr_type != rs_machine_dependent) + /* Stop for all other frags. */ + break; + + /* rs_machine_dependent frags. */ + if (TYPE_FROM_RELAX_STATE (next_fragP->fr_subtype) + == BRANCH_PREFIX) + { + /* Count BRANCH_PREFIX frags. */ + if (max_prefix_length >= MAX_FUSED_JCC_PADDING_SIZE) + { + max_prefix_length = MAX_FUSED_JCC_PADDING_SIZE; + frag_wane (next_fragP); + } + else + max_prefix_length + += next_fragP->tc_frag_data.max_bytes; + } + else if ((TYPE_FROM_RELAX_STATE (next_fragP->fr_subtype) + == BRANCH_PADDING) + || (TYPE_FROM_RELAX_STATE (next_fragP->fr_subtype) + == FUSED_JCC_PADDING)) + { + /* Stop at BRANCH_PADDING and FUSED_JCC_PADDING. */ + fragP->tc_frag_data.u.padding_fragP = next_fragP; + break; + } + else + /* Stop for other rs_machine_dependent frags. */ + break; + } + + fragP->tc_frag_data.max_prefix_length = max_prefix_length; + + /* Skip to the next frag. */ + fragP = next_fragP; + } +} + +/* Compute padding size for + + FUSED_JCC_PADDING + CMP like instruction + BRANCH_PADDING + COND_JUMP/UNCOND_JUMP + + or + + BRANCH_PADDING + COND_JUMP/UNCOND_JUMP + */ + +static int +i386_branch_padding_size (fragS *fragP, offsetT address) +{ + unsigned int offset, size, padding_size; + fragS *branch_fragP = fragP->tc_frag_data.u.branch_fragP; + + /* The start address of the BRANCH_PADDING or FUSED_JCC_PADDING frag. */ + if (!address) + address = fragP->fr_address; + address += fragP->fr_fix; + + /* CMP like instrunction size. */ + size = fragP->tc_frag_data.cmp_size; + + /* The base size of the branch frag. */ + size += branch_fragP->fr_fix; + + /* Add opcode and displacement bytes for the rs_machine_dependent + branch frag. */ + if (branch_fragP->fr_type == rs_machine_dependent) + size += md_relax_table[branch_fragP->fr_subtype].rlx_length; + + /* Check if branch is within boundary and doesn't end at the last + byte. */ + offset = address & ((1U << align_branch_power) - 1); + if ((offset + size) >= (1U << align_branch_power)) + /* Padding needed to avoid crossing boundary. */ + padding_size = (1U << align_branch_power) - offset; + else + /* No padding needed. */ + padding_size = 0; + + /* The return value may be saved in tc_frag_data.length which is + unsigned byte. */ + if (!fits_in_unsigned_byte (padding_size)) + abort (); + + return padding_size; +} + +/* i386_generic_table_relax_frag() + + Handle BRANCH_PADDING, BRANCH_PREFIX and FUSED_JCC_PADDING frags to + grow/shrink padding to align branch frags. Hand others to + relax_frag(). */ + +long +i386_generic_table_relax_frag (segT segment, fragS *fragP, long stretch) +{ + if (TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == BRANCH_PADDING + || TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == FUSED_JCC_PADDING) + { + long padding_size = i386_branch_padding_size (fragP, 0); + long grow = padding_size - fragP->tc_frag_data.length; + + /* When the BRANCH_PREFIX frag is used, the computed address + must match the actual address and there should be no padding. */ + if (fragP->tc_frag_data.padding_address + && (fragP->tc_frag_data.padding_address != fragP->fr_address + || padding_size)) + abort (); + + /* Update the padding size. */ + if (grow) + fragP->tc_frag_data.length = padding_size; + + return grow; + } + else if (TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == BRANCH_PREFIX) + { + fragS *padding_fragP, *next_fragP; + long padding_size, left_size, last_size; + + padding_fragP = fragP->tc_frag_data.u.padding_fragP; + if (!padding_fragP) + /* Use the padding set by the leading BRANCH_PREFIX frag. */ + return (fragP->tc_frag_data.length + - fragP->tc_frag_data.last_length); + + /* Compute the relative address of the padding frag in the very + first time where the BRANCH_PREFIX frag sizes are zero. */ + if (!fragP->tc_frag_data.padding_address) + fragP->tc_frag_data.padding_address + = padding_fragP->fr_address - (fragP->fr_address - stretch); + + /* First update the last length from the previous interation. */ + left_size = fragP->tc_frag_data.prefix_length; + for (next_fragP = fragP; + next_fragP != padding_fragP; + next_fragP = next_fragP->fr_next) + if (next_fragP->fr_type == rs_machine_dependent + && (TYPE_FROM_RELAX_STATE (next_fragP->fr_subtype) + == BRANCH_PREFIX)) + { + if (left_size) + { + int max = next_fragP->tc_frag_data.max_bytes; + if (max) + { + int size; + if (max > left_size) + size = left_size; + else + size = max; + left_size -= size; + next_fragP->tc_frag_data.last_length = size; + } + } + else + next_fragP->tc_frag_data.last_length = 0; + } + + /* Check the padding size for the padding frag. */ + padding_size = i386_branch_padding_size + (padding_fragP, (fragP->fr_address + + fragP->tc_frag_data.padding_address)); + + last_size = fragP->tc_frag_data.prefix_length; + /* Check if there is change from the last interation. */ + if (padding_size == last_size) + { + /* Update the expected address of the padding frag. */ + padding_fragP->tc_frag_data.padding_address + = (fragP->fr_address + padding_size + + fragP->tc_frag_data.padding_address); + return 0; + } + + if (padding_size > fragP->tc_frag_data.max_prefix_length) + { + /* No padding if there is no sufficient room. Clear the + expected address of the padding frag. */ + padding_fragP->tc_frag_data.padding_address = 0; + padding_size = 0; + } + else + /* Store the expected address of the padding frag. */ + padding_fragP->tc_frag_data.padding_address + = (fragP->fr_address + padding_size + + fragP->tc_frag_data.padding_address); + + fragP->tc_frag_data.prefix_length = padding_size; + + /* Update the length for the current interation. */ + left_size = padding_size; + for (next_fragP = fragP; + next_fragP != padding_fragP; + next_fragP = next_fragP->fr_next) + if (next_fragP->fr_type == rs_machine_dependent + && (TYPE_FROM_RELAX_STATE (next_fragP->fr_subtype) + == BRANCH_PREFIX)) + { + if (left_size) + { + int max = next_fragP->tc_frag_data.max_bytes; + if (max) + { + int size; + if (max > left_size) + size = left_size; + else + size = max; + left_size -= size; + next_fragP->tc_frag_data.length = size; + } + } + else + next_fragP->tc_frag_data.length = 0; + } + + return (fragP->tc_frag_data.length + - fragP->tc_frag_data.last_length); + } + return relax_frag (segment, fragP, stretch); +} + /* md_estimate_size_before_relax() Called just before relax() for rs_machine_dependent frags. The x86 @@ -10381,6 +11198,14 @@ elf_symbol_resolved_in_segment_p (symbolS *fr_symbol, offsetT fr_var) int md_estimate_size_before_relax (fragS *fragP, segT segment) { + if (TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == BRANCH_PADDING + || TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == BRANCH_PREFIX + || TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == FUSED_JCC_PADDING) + { + i386_classify_machine_dependent_frag (fragP); + return fragP->tc_frag_data.length; + } + /* We've already got fragP->fr_subtype right; all we have to do is check for un-relaxable symbols. On an ELF system, we can't relax an externally visible symbol, because it may be overridden by a @@ -10514,6 +11339,106 @@ md_convert_frag (bfd *abfd ATTRIBUTE_UNUSED, segT sec ATTRIBUTE_UNUSED, unsigned int extension = 0; offsetT displacement_from_opcode_start; + if (TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == BRANCH_PADDING + || TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == FUSED_JCC_PADDING + || TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == BRANCH_PREFIX) + { + /* Generate nop padding. */ + unsigned int size = fragP->tc_frag_data.length; + if (size) + { + if (size > fragP->tc_frag_data.max_bytes) + abort (); + + if (flag_debug) + { + const char *msg; + const char *branch = "branch"; + const char *prefix = ""; + fragS *padding_fragP; + if (TYPE_FROM_RELAX_STATE (fragP->fr_subtype) + == BRANCH_PREFIX) + { + padding_fragP = fragP->tc_frag_data.u.padding_fragP; + switch (fragP->tc_frag_data.default_prefix) + { + default: + abort (); + break; + case CS_PREFIX_OPCODE: + prefix = " cs"; + break; + case DS_PREFIX_OPCODE: + prefix = " ds"; + break; + case ES_PREFIX_OPCODE: + prefix = " es"; + break; + case FS_PREFIX_OPCODE: + prefix = " fs"; + break; + case GS_PREFIX_OPCODE: + prefix = " gs"; + break; + case SS_PREFIX_OPCODE: + prefix = " ss"; + break; + } + if (padding_fragP) + msg = _("%s:%u: add %d%s at 0x%llx to align " + "%s within %d-byte boundary\n"); + else + msg = _("%s:%u: add additional %d%s at 0x%llx to " + "align %s within %d-byte boundary\n"); + } + else + { + padding_fragP = fragP; + msg = _("%s:%u: add %d%s-byte nop at 0x%llx to align " + "%s within %d-byte boundary\n"); + } + + if (padding_fragP) + switch (padding_fragP->tc_frag_data.branch_type) + { + case align_branch_jcc: + branch = "jcc"; + break; + case align_branch_fused: + branch = "fused jcc"; + break; + case align_branch_jmp: + branch = "jmp"; + break; + case align_branch_call: + branch = "call"; + break; + case align_branch_indirect: + branch = "indiret branch"; + break; + case align_branch_ret: + branch = "ret"; + break; + default: + break; + } + + fprintf (stdout, msg, + fragP->fr_file, fragP->fr_line, size, prefix, + (long long) fragP->fr_address, branch, + 1 << align_branch_power); + } + if (TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == BRANCH_PREFIX) + memset (fragP->fr_opcode, + fragP->tc_frag_data.default_prefix, size); + else + i386_generate_nops (fragP, (char *) fragP->fr_opcode, + size, 0); + fragP->fr_fix += size; + } + return; + } + opcode = (unsigned char *) fragP->fr_opcode; /* Address we want to reach in file space. */ @@ -11072,6 +11997,9 @@ const char *md_shortopts = "qnO::"; #define OPTION_MFENCE_AS_LOCK_ADD (OPTION_MD_BASE + 24) #define OPTION_X86_USED_NOTE (OPTION_MD_BASE + 25) #define OPTION_MVEXWIG (OPTION_MD_BASE + 26) +#define OPTION_MALIGN_BRANCH_BOUNDARY (OPTION_MD_BASE + 27) +#define OPTION_MALIGN_BRANCH_PREFIX_SIZE (OPTION_MD_BASE + 28) +#define OPTION_MALIGN_BRANCH (OPTION_MD_BASE + 29) struct option md_longopts[] = { @@ -11107,6 +12035,9 @@ struct option md_longopts[] = {"mfence-as-lock-add", required_argument, NULL, OPTION_MFENCE_AS_LOCK_ADD}, {"mrelax-relocations", required_argument, NULL, OPTION_MRELAX_RELOCATIONS}, {"mevexrcig", required_argument, NULL, OPTION_MEVEXRCIG}, + {"malign-branch-boundary", required_argument, NULL, OPTION_MALIGN_BRANCH_BOUNDARY}, + {"malign-branch-prefix-size", required_argument, NULL, OPTION_MALIGN_BRANCH_PREFIX_SIZE}, + {"malign-branch", required_argument, NULL, OPTION_MALIGN_BRANCH}, {"mamd64", no_argument, NULL, OPTION_MAMD64}, {"mintel64", no_argument, NULL, OPTION_MINTEL64}, {NULL, no_argument, NULL, 0} @@ -11117,7 +12048,7 @@ int md_parse_option (int c, const char *arg) { unsigned int j; - char *arch, *next, *saved; + char *arch, *next, *saved, *type; switch (c) { @@ -11495,6 +12426,80 @@ md_parse_option (int c, const char *arg) as_fatal (_("invalid -mrelax-relocations= option: `%s'"), arg); break; + case OPTION_MALIGN_BRANCH_BOUNDARY: + { + char *end; + long int align = strtoul (arg, &end, 0); + if (*end == '\0') + { + if (align == 0) + { + align_branch_power = 0; + break; + } + else if (align >= 16) + { + int align_power; + for (align_power = 0; + (align & 1) == 0; + align >>= 1, align_power++) + continue; + /* Limit alignment power to 31. */ + if (align == 1 && align_power < 32) + { + align_branch_power = align_power; + break; + } + } + } + as_fatal (_("invalid -malign-branch-boundary= value: %s"), arg); + } + break; + + case OPTION_MALIGN_BRANCH_PREFIX_SIZE: + { + char *end; + int align = strtoul (arg, &end, 0); + /* Some processors only support 5 prefixes. */ + if (*end == '\0' && align >= 0 && align < 6) + { + align_branch_prefix_size = align; + break; + } + as_fatal (_("invalid -malign-branch-prefix-size= value: %s"), + arg); + } + break; + + case OPTION_MALIGN_BRANCH: + align_branch = 0; + saved = xstrdup (arg); + type = saved; + do + { + next = strchr (type, '+'); + if (next) + *next++ = '\0'; + if (strcasecmp (type, "jcc") == 0) + align_branch |= align_branch_jcc_bit; + else if (strcasecmp (type, "fused") == 0) + align_branch |= align_branch_fused_bit; + else if (strcasecmp (type, "jmp") == 0) + align_branch |= align_branch_jmp_bit; + else if (strcasecmp (type, "call") == 0) + align_branch |= align_branch_call_bit; + else if (strcasecmp (type, "ret") == 0) + align_branch |= align_branch_ret_bit; + else if (strcasecmp (type, "indirect") == 0) + align_branch |= align_branch_indirect_bit; + else + as_fatal (_("invalid -malign-branch= option: `%s'"), arg); + type = next; + } + while (next != NULL); + free (saved); + break; + case OPTION_MAMD64: intel64 = 0; break; @@ -11747,6 +12752,17 @@ md_show_usage (FILE *stream) fprintf (stream, _("\ generate relax relocations\n")); fprintf (stream, _("\ + -malign-branch-boundary=NUM (default: 0)\n\ + align branches within NUM byte boundary\n")); + fprintf (stream, _("\ + -malign-branch=TYPE[+TYPE...] (default: jcc+fused+jmp)\n\ + TYPE is combination of jcc, fused, jmp, call, ret,\n\ + indirect\n\ + specify types of branches to align\n")); + fprintf (stream, _("\ + -malign-branch-prefix-size=NUM (default: 5)\n\ + align branches with NUM prefixes per instruction\n")); + fprintf (stream, _("\ -mamd64 accept only AMD64 ISA [default]\n")); fprintf (stream, _("\ -mintel64 accept only Intel64 ISA\n")); @@ -11830,15 +12846,24 @@ i386_target_format (void) { default: format = ELF_TARGET_FORMAT; +#ifndef TE_SOLARIS + tls_get_addr = "___tls_get_addr"; +#endif break; case X86_64_ABI: use_rela_relocations = 1; object_64bit = 1; +#ifndef TE_SOLARIS + tls_get_addr = "__tls_get_addr"; +#endif format = ELF_TARGET_FORMAT64; break; case X86_64_X32_ABI: use_rela_relocations = 1; object_64bit = 1; +#ifndef TE_SOLARIS + tls_get_addr = "__tls_get_addr"; +#endif disallow_64bit_reloc = 1; format = ELF_TARGET_FORMAT32; break; @@ -11955,6 +12980,21 @@ s_bss (int ignore ATTRIBUTE_UNUSED) #endif +/* Remember constant directive. */ + +void +i386_cons_align (int ignore ATTRIBUTE_UNUSED) +{ + if (last_insn.kind != last_insn_directive + && (bfd_section_flags (now_seg) & SEC_CODE)) + { + last_insn.seg = now_seg; + last_insn.kind = last_insn_directive; + last_insn.name = "constant directive"; + last_insn.file = as_where (&last_insn.line); + } +} + void i386_validate_fix (fixS *fixp) { diff --git a/gas/config/tc-i386.h b/gas/config/tc-i386.h index b02a25671f..28f03cf2db 100644 --- a/gas/config/tc-i386.h +++ b/gas/config/tc-i386.h @@ -210,12 +210,19 @@ if ((n) \ #define MAX_MEM_FOR_RS_ALIGN_CODE (alignment ? ((1 << alignment) - 1) : 1) +extern void i386_cons_align (int); +#define md_cons_align(nbytes) i386_cons_align (nbytes) + void i386_print_statistics (FILE *); #define tc_print_statistics i386_print_statistics extern unsigned int i386_frag_max_var (fragS *); #define md_frag_max_var i386_frag_max_var +extern long i386_generic_table_relax_frag (segT, fragS *, long); +#define md_generic_table_relax_frag(segment, fragP, stretch) \ + i386_generic_table_relax_frag (segment, fragP, stretch) + #define md_number_to_chars number_to_chars_littleendian enum processor_type @@ -250,10 +257,24 @@ extern i386_cpu_flags cpu_arch_isa_flags; struct i386_tc_frag_data { + union + { + fragS *padding_fragP; + fragS *branch_fragP; + } u; + addressT padding_address; enum processor_type isa; i386_cpu_flags isa_flags; enum processor_type tune; unsigned int max_bytes; + unsigned char length; + unsigned char last_length; + unsigned char max_prefix_length; + unsigned char prefix_length; + unsigned char default_prefix; + unsigned char cmp_size; + unsigned int classified : 1; + unsigned int branch_type : 3; }; /* We need to emit the right NOP pattern in .align frags. This is @@ -264,10 +285,20 @@ struct i386_tc_frag_data #define TC_FRAG_INIT(FRAGP, MAX_BYTES) \ do \ { \ + (FRAGP)->tc_frag_data.u.padding_fragP = NULL; \ + (FRAGP)->tc_frag_data.padding_address = 0; \ (FRAGP)->tc_frag_data.isa = cpu_arch_isa; \ (FRAGP)->tc_frag_data.isa_flags = cpu_arch_isa_flags; \ (FRAGP)->tc_frag_data.tune = cpu_arch_tune; \ (FRAGP)->tc_frag_data.max_bytes = (MAX_BYTES); \ + (FRAGP)->tc_frag_data.length = 0; \ + (FRAGP)->tc_frag_data.last_length = 0; \ + (FRAGP)->tc_frag_data.max_prefix_length = 0; \ + (FRAGP)->tc_frag_data.prefix_length = 0; \ + (FRAGP)->tc_frag_data.default_prefix = 0; \ + (FRAGP)->tc_frag_data.cmp_size = 0; \ + (FRAGP)->tc_frag_data.classified = 0; \ + (FRAGP)->tc_frag_data.branch_type = 0; \ } \ while (0) diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi index 589b4260f0..74296e61f6 100644 --- a/gas/doc/c-i386.texi +++ b/gas/doc/c-i386.texi @@ -423,6 +423,32 @@ R_X86_64_REX_GOTPCRELX, in 64-bit mode. relocations. The default can be controlled by a configure option @option{--enable-x86-relax-relocations}. +@cindex @samp{-malign-branch-boundary=} option, i386 +@cindex @samp{-malign-branch-boundary=} option, x86-64 +@item -malign-branch-boundary=@var{NUM} +This option controls how the assembler should align branches with segment +prefixes or NOP. @var{NUM} must be a power of 2. It should be 0 or +no less than 16. Branches will be aligned within @var{NUM} byte +boundary. @option{-malign-branch-boundary=0}, which is the default, +doesn't align branches. + +@cindex @samp{-malign-branch=} option, i386 +@cindex @samp{-malign-branch=} option, x86-64 +@item -malign-branch=@var{TYPE}[+@var{TYPE}...] +This option specifies types of branches to align. @var{TYPE} is +combination of @samp{jcc}, which aligns conditional jumps, +@samp{fused}, which aligns fused conditional jumps, @samp{jmp}, +which aligns unconditional jumps, @samp{call} which aligns calls, +@samp{ret}, which aligns rets, @samp{indirect}, which aligns indirect +jumps and calls. The default is @option{-malign-branch=jcc+fused+jmp}. + +@cindex @samp{-malign-branch-prefix-size=} option, i386 +@cindex @samp{-malign-branch-prefix-size=} option, x86-64 +@item -malign-branch-prefix-size=@var{NUM} +This option specifies the maximum number of prefixes on an instruction +to align branches. @var{NUM} should be between 0 and 5. The default +@var{NUM} is 5. + @cindex @samp{-mx86-used-note=} option, i386 @cindex @samp{-mx86-used-note=} option, x86-64 @item -mx86-used-note=@var{no} -- 2.34.1