-
Notifications
You must be signed in to change notification settings - Fork 58.4k
correct rtl_op_set_tim and rtl_update_beacon implementation missing #54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
torvalds
pushed a commit
that referenced
this pull request
Nov 12, 2013
As the new x86 CPU bootup printout format code maintainer, I am taking immediate action to improve and clean (and thus indulge my OCD) the reporting of the cores when coming up online. Fix padding to a right-hand alignment, cleanup code and bind reporting width to the max number of supported CPUs on the system, like this: [ 0.074509] smpboot: Booting Node 0, Processors: #1 #2 #3 #4 #5 #6 #7 OK [ 0.644008] smpboot: Booting Node 1, Processors: #8 #9 #10 #11 #12 #13 #14 #15 OK [ 1.245006] smpboot: Booting Node 2, Processors: #16 #17 #18 #19 #20 #21 #22 #23 OK [ 1.864005] smpboot: Booting Node 3, Processors: #24 #25 #26 #27 #28 #29 #30 #31 OK [ 2.489005] smpboot: Booting Node 4, Processors: #32 #33 #34 #35 #36 #37 #38 #39 OK [ 3.093005] smpboot: Booting Node 5, Processors: #40 #41 #42 #43 #44 #45 #46 #47 OK [ 3.698005] smpboot: Booting Node 6, Processors: #48 #49 #50 #51 #52 #53 #54 #55 OK [ 4.304005] smpboot: Booting Node 7, Processors: #56 #57 #58 #59 #60 #61 #62 #63 OK [ 4.961413] Brought up 64 CPUs and this: [ 0.072367] smpboot: Booting Node 0, Processors: #1 #2 #3 #4 #5 #6 #7 OK [ 0.686329] Brought up 8 CPUs Signed-off-by: Borislav Petkov <[email protected]> Cc: Libin <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
torvalds
pushed a commit
that referenced
this pull request
Nov 12, 2013
Turn it into (for example): [ 0.073380] x86: Booting SMP configuration: [ 0.074005] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 [ 0.603005] .... node #1, CPUs: #8 #9 #10 #11 #12 #13 #14 #15 [ 1.200005] .... node #2, CPUs: #16 #17 #18 #19 #20 #21 #22 #23 [ 1.796005] .... node #3, CPUs: #24 #25 #26 #27 #28 #29 #30 #31 [ 2.393005] .... node #4, CPUs: #32 #33 #34 #35 #36 #37 #38 #39 [ 2.996005] .... node #5, CPUs: #40 #41 #42 #43 #44 #45 #46 #47 [ 3.600005] .... node #6, CPUs: #48 #49 #50 #51 #52 #53 #54 #55 [ 4.202005] .... node #7, CPUs: #56 #57 #58 #59 #60 #61 #62 #63 [ 4.811005] .... node #8, CPUs: #64 #65 #66 #67 #68 #69 #70 #71 [ 5.421006] .... node #9, CPUs: #72 #73 #74 #75 #76 #77 #78 #79 [ 6.032005] .... node #10, CPUs: #80 #81 #82 #83 #84 #85 #86 #87 [ 6.648006] .... node #11, CPUs: #88 #89 #90 #91 #92 #93 #94 #95 [ 7.262005] .... node #12, CPUs: #96 #97 #98 #99 #100 #101 #102 #103 [ 7.865005] .... node #13, CPUs: #104 #105 #106 #107 #108 #109 #110 #111 [ 8.466005] .... node #14, CPUs: #112 #113 #114 #115 #116 #117 #118 #119 [ 9.073006] .... node #15, CPUs: #120 #121 #122 #123 #124 #125 #126 #127 [ 9.679901] x86: Booted up 16 nodes, 128 CPUs and drop useless elements. Change num_digits() to hpa's division-avoiding, cell-phone-typed version which he went at great lengths and pains to submit on a Saturday evening. Signed-off-by: Borislav Petkov <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
ddstreet
pushed a commit
to ddstreet/linux
that referenced
this pull request
May 19, 2014
also fix spellos WARNING: line over 80 characters torvalds#54: FILE: mm/rmap.c:988: + * pte lock(a spinlock) is held, which implies preemtion disabled. total: 0 errors, 1 warnings, 45 lines checked ./patches/mm-add-comment-for-__mod_zone_page_stat.patch has style problems, please review. If any of these errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Hugh Dickins <[email protected]> Cc: Jianyu Zhan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
swarren
pushed a commit
to swarren/linux-tegra
that referenced
this pull request
May 20, 2014
also fix spellos WARNING: line over 80 characters torvalds#54: FILE: mm/rmap.c:988: + * pte lock(a spinlock) is held, which implies preemtion disabled. total: 0 errors, 1 warnings, 45 lines checked ./patches/mm-add-comment-for-__mod_zone_page_stat.patch has style problems, please review. If any of these errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Hugh Dickins <[email protected]> Cc: Jianyu Zhan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
shr-buildhost
pushed a commit
to shr-distribution/linux
that referenced
this pull request
May 25, 2014
If an edge wake up interrupt happens during suspend path, the flow handler will mask it and set as pending. This lead to suspend sequence abort when the check for all pending edge interrupt happens. Do not disable the interrupt (wake up interrupt to CPU0 in krait SS, torvalds#54) during suspend. CRs-fixed: 583318 Change-Id: I3a5afec98d445da2dfc1f16fc20d9f3e96acc17f Signed-off-by: Murali Nalajala <[email protected]>
tom3q
pushed a commit
to tom3q/linux
that referenced
this pull request
May 26, 2014
also fix spellos WARNING: line over 80 characters torvalds#54: FILE: mm/rmap.c:988: + * pte lock(a spinlock) is held, which implies preemtion disabled. total: 0 errors, 1 warnings, 45 lines checked ./patches/mm-add-comment-for-__mod_zone_page_stat.patch has style problems, please review. If any of these errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Hugh Dickins <[email protected]> Cc: Jianyu Zhan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
ddstreet
pushed a commit
to ddstreet/linux
that referenced
this pull request
May 28, 2014
also fix spellos WARNING: line over 80 characters torvalds#54: FILE: mm/rmap.c:988: + * pte lock(a spinlock) is held, which implies preemtion disabled. total: 0 errors, 1 warnings, 45 lines checked ./patches/mm-add-comment-for-__mod_zone_page_stat.patch has style problems, please review. If any of these errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Hugh Dickins <[email protected]> Cc: Jianyu Zhan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
swarren
pushed a commit
to swarren/linux-tegra
that referenced
this pull request
Jun 23, 2014
It appears that gcc is better at optimising a double call to min and max
rather than open coded min3 and max3. This can be observed here:
$ cat min-max.c
#define min(x, y) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
(void) (&_min1 == &_min2); \
_min1 < _min2 ? _min1 : _min2; })
#define min3(x, y, z) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
typeof(z) _min3 = (z); \
(void) (&_min1 == &_min2); \
(void) (&_min1 == &_min3); \
_min1 < _min2 ? (_min1 < _min3 ? _min1 : _min3) : \
(_min2 < _min3 ? _min2 : _min3); })
int fmin3(int x, int y, int z) { return min3(x, y, z); }
int fmin2(int x, int y, int z) { return min(min(x, y), z); }
$ gcc -O2 -o min-max.s -S min-max.c; cat min-max.s
.file "min-max.c"
.text
.p2align 4,,15
.globl fmin3
.type fmin3, @function
fmin3:
.LFB0:
.cfi_startproc
cmpl %esi, %edi
jl .L5
cmpl %esi, %edx
movl %esi, %eax
cmovle %edx, %eax
ret
.p2align 4,,10
.p2align 3
.L5:
cmpl %edi, %edx
movl %edi, %eax
cmovle %edx, %eax
ret
.cfi_endproc
.LFE0:
.size fmin3, .-fmin3
.p2align 4,,15
.globl fmin2
.type fmin2, @function
fmin2:
.LFB1:
.cfi_startproc
cmpl %edi, %esi
movl %edx, %eax
cmovle %esi, %edi
cmpl %edx, %edi
cmovle %edi, %eax
ret
.cfi_endproc
.LFE1:
.size fmin2, .-fmin2
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
fmin3 function, which uses open-coded min3 macro, is compiled into total
of ten instructions including a conditional branch, whereas fmin2
function, which uses two calls to min2 macro, is compiled into six
instructions with no branches.
Similarly, open-coded clamp produces the same code as clamp using min and
max macros, but the latter is much shorter:
$ cat clamp.c
#define clamp(val, min, max) ({ \
typeof(val) __val = (val); \
typeof(min) __min = (min); \
typeof(max) __max = (max); \
(void) (&__val == &__min); \
(void) (&__val == &__max); \
__val = __val < __min ? __min: __val; \
__val > __max ? __max: __val; })
#define min(x, y) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
(void) (&_min1 == &_min2); \
_min1 < _min2 ? _min1 : _min2; })
#define max(x, y) ({ \
typeof(x) _max1 = (x); \
typeof(y) _max2 = (y); \
(void) (&_max1 == &_max2); \
_max1 > _max2 ? _max1 : _max2; })
int fclamp(int v, int min, int max) { return clamp(v, min, max); }
int fclampmm(int v, int min, int max) { return min(max(v, min), max); }
$ gcc -O2 -o clamp.s -S clamp.c; cat clamp.s
.file "clamp.c"
.text
.p2align 4,,15
.globl fclamp
.type fclamp, @function
fclamp:
.LFB0:
.cfi_startproc
cmpl %edi, %esi
movl %edx, %eax
cmovge %esi, %edi
cmpl %edx, %edi
cmovle %edi, %eax
ret
.cfi_endproc
.LFE0:
.size fclamp, .-fclamp
.p2align 4,,15
.globl fclampmm
.type fclampmm, @function
fclampmm:
.LFB1:
.cfi_startproc
cmpl %edi, %esi
cmovge %esi, %edi
cmpl %edi, %edx
movl %edi, %eax
cmovle %edx, %eax
ret
.cfi_endproc
.LFE1:
.size fclampmm, .-fclampmm
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
Linux mpn-glaptop 3.13.0-29-generic #53~precise1-Ubuntu SMP Wed Jun 4 22:06:25 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-rwx------ 1 mpn eng 51224656 Jun 17 14:15 vmlinux.before
-rwx------ 1 mpn eng 51224608 Jun 17 13:57 vmlinux.after
48 bytes reduction. The do_fault_around was a few instruction shorter
and as far as I can tell saved 12 bytes on the stack, i.e.:
$ grep -e rsp -e pop -e push do_fault_around.*
do_fault_around.before.s:push %rbp
do_fault_around.before.s:mov %rsp,%rbp
do_fault_around.before.s:push %r13
do_fault_around.before.s:push %r12
do_fault_around.before.s:push %rbx
do_fault_around.before.s:sub $0x38,%rsp
do_fault_around.before.s:add $0x38,%rsp
do_fault_around.before.s:pop %rbx
do_fault_around.before.s:pop %r12
do_fault_around.before.s:pop %r13
do_fault_around.before.s:pop %rbp
do_fault_around.after.s:push %rbp
do_fault_around.after.s:mov %rsp,%rbp
do_fault_around.after.s:push %r12
do_fault_around.after.s:push %rbx
do_fault_around.after.s:sub $0x30,%rsp
do_fault_around.after.s:add $0x30,%rsp
do_fault_around.after.s:pop %rbx
do_fault_around.after.s:pop %r12
do_fault_around.after.s:pop %rbp
or here side-by-side:
Before After
push %rbp push %rbp
mov %rsp,%rbp mov %rsp,%rbp
push %r13
push %r12 push %r12
push %rbx push %rbx
sub $0x38,%rsp sub $0x30,%rsp
add $0x38,%rsp add $0x30,%rsp
pop %rbx pop %rbx
pop %r12 pop %r12
pop %r13
pop %rbp pop %rbp
There are also fewer branches:
$ grep ^j do_fault_around.*
do_fault_around.before.s:jae ffffffff812079b7
do_fault_around.before.s:jmp ffffffff812079c5
do_fault_around.before.s:jmp ffffffff81207a14
do_fault_around.before.s:ja ffffffff812079f9
do_fault_around.before.s:jb ffffffff81207a10
do_fault_around.before.s:jmp ffffffff81207a63
do_fault_around.before.s:jne ffffffff812079df
do_fault_around.after.s:jmp ffffffff812079fd
do_fault_around.after.s:ja ffffffff812079e2
do_fault_around.after.s:jb ffffffff812079f9
do_fault_around.after.s:jmp ffffffff81207a4c
do_fault_around.after.s:jne ffffffff812079c8
And here's with allyesconfig on a different machine:
$ uname -a; gcc --version; ls -l vmlinux.*
Linux erwin 3.14.7-mn torvalds#54 SMP Sun Jun 15 11:25:08 CEST 2014 x86_64 AMD Phenom(tm) II X3 710 Processor AuthenticAMD GNU/Linux
gcc (GCC) 4.8.3
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-rwx------ 1 mina86 mina86 230616126 Jun 17 15:39 vmlinux.before*
-rwx------ 1 mina86 mina86 230614861 Jun 17 14:36 vmlinux.after*
1265 bytes reduction.
Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Hagen Paul Pfeifer <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
Cc: Hagen Paul Pfeifer <[email protected]>
Cc: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Gnurou
pushed a commit
to Gnurou/linux
that referenced
this pull request
Jun 27, 2014
It appears that gcc is better at optimising a double call to min and max
rather than open coded min3 and max3. This can be observed here:
$ cat min-max.c
#define min(x, y) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
(void) (&_min1 == &_min2); \
_min1 < _min2 ? _min1 : _min2; })
#define min3(x, y, z) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
typeof(z) _min3 = (z); \
(void) (&_min1 == &_min2); \
(void) (&_min1 == &_min3); \
_min1 < _min2 ? (_min1 < _min3 ? _min1 : _min3) : \
(_min2 < _min3 ? _min2 : _min3); })
int fmin3(int x, int y, int z) { return min3(x, y, z); }
int fmin2(int x, int y, int z) { return min(min(x, y), z); }
$ gcc -O2 -o min-max.s -S min-max.c; cat min-max.s
.file "min-max.c"
.text
.p2align 4,,15
.globl fmin3
.type fmin3, @function
fmin3:
.LFB0:
.cfi_startproc
cmpl %esi, %edi
jl .L5
cmpl %esi, %edx
movl %esi, %eax
cmovle %edx, %eax
ret
.p2align 4,,10
.p2align 3
.L5:
cmpl %edi, %edx
movl %edi, %eax
cmovle %edx, %eax
ret
.cfi_endproc
.LFE0:
.size fmin3, .-fmin3
.p2align 4,,15
.globl fmin2
.type fmin2, @function
fmin2:
.LFB1:
.cfi_startproc
cmpl %edi, %esi
movl %edx, %eax
cmovle %esi, %edi
cmpl %edx, %edi
cmovle %edi, %eax
ret
.cfi_endproc
.LFE1:
.size fmin2, .-fmin2
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
fmin3 function, which uses open-coded min3 macro, is compiled into total
of ten instructions including a conditional branch, whereas fmin2
function, which uses two calls to min2 macro, is compiled into six
instructions with no branches.
Similarly, open-coded clamp produces the same code as clamp using min and
max macros, but the latter is much shorter:
$ cat clamp.c
#define clamp(val, min, max) ({ \
typeof(val) __val = (val); \
typeof(min) __min = (min); \
typeof(max) __max = (max); \
(void) (&__val == &__min); \
(void) (&__val == &__max); \
__val = __val < __min ? __min: __val; \
__val > __max ? __max: __val; })
#define min(x, y) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
(void) (&_min1 == &_min2); \
_min1 < _min2 ? _min1 : _min2; })
#define max(x, y) ({ \
typeof(x) _max1 = (x); \
typeof(y) _max2 = (y); \
(void) (&_max1 == &_max2); \
_max1 > _max2 ? _max1 : _max2; })
int fclamp(int v, int min, int max) { return clamp(v, min, max); }
int fclampmm(int v, int min, int max) { return min(max(v, min), max); }
$ gcc -O2 -o clamp.s -S clamp.c; cat clamp.s
.file "clamp.c"
.text
.p2align 4,,15
.globl fclamp
.type fclamp, @function
fclamp:
.LFB0:
.cfi_startproc
cmpl %edi, %esi
movl %edx, %eax
cmovge %esi, %edi
cmpl %edx, %edi
cmovle %edi, %eax
ret
.cfi_endproc
.LFE0:
.size fclamp, .-fclamp
.p2align 4,,15
.globl fclampmm
.type fclampmm, @function
fclampmm:
.LFB1:
.cfi_startproc
cmpl %edi, %esi
cmovge %esi, %edi
cmpl %edi, %edx
movl %edi, %eax
cmovle %edx, %eax
ret
.cfi_endproc
.LFE1:
.size fclampmm, .-fclampmm
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
Linux mpn-glaptop 3.13.0-29-generic #53~precise1-Ubuntu SMP Wed Jun 4 22:06:25 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-rwx------ 1 mpn eng 51224656 Jun 17 14:15 vmlinux.before
-rwx------ 1 mpn eng 51224608 Jun 17 13:57 vmlinux.after
48 bytes reduction. The do_fault_around was a few instruction shorter
and as far as I can tell saved 12 bytes on the stack, i.e.:
$ grep -e rsp -e pop -e push do_fault_around.*
do_fault_around.before.s:push %rbp
do_fault_around.before.s:mov %rsp,%rbp
do_fault_around.before.s:push %r13
do_fault_around.before.s:push %r12
do_fault_around.before.s:push %rbx
do_fault_around.before.s:sub $0x38,%rsp
do_fault_around.before.s:add $0x38,%rsp
do_fault_around.before.s:pop %rbx
do_fault_around.before.s:pop %r12
do_fault_around.before.s:pop %r13
do_fault_around.before.s:pop %rbp
do_fault_around.after.s:push %rbp
do_fault_around.after.s:mov %rsp,%rbp
do_fault_around.after.s:push %r12
do_fault_around.after.s:push %rbx
do_fault_around.after.s:sub $0x30,%rsp
do_fault_around.after.s:add $0x30,%rsp
do_fault_around.after.s:pop %rbx
do_fault_around.after.s:pop %r12
do_fault_around.after.s:pop %rbp
or here side-by-side:
Before After
push %rbp push %rbp
mov %rsp,%rbp mov %rsp,%rbp
push %r13
push %r12 push %r12
push %rbx push %rbx
sub $0x38,%rsp sub $0x30,%rsp
add $0x38,%rsp add $0x30,%rsp
pop %rbx pop %rbx
pop %r12 pop %r12
pop %r13
pop %rbp pop %rbp
There are also fewer branches:
$ grep ^j do_fault_around.*
do_fault_around.before.s:jae ffffffff812079b7
do_fault_around.before.s:jmp ffffffff812079c5
do_fault_around.before.s:jmp ffffffff81207a14
do_fault_around.before.s:ja ffffffff812079f9
do_fault_around.before.s:jb ffffffff81207a10
do_fault_around.before.s:jmp ffffffff81207a63
do_fault_around.before.s:jne ffffffff812079df
do_fault_around.after.s:jmp ffffffff812079fd
do_fault_around.after.s:ja ffffffff812079e2
do_fault_around.after.s:jb ffffffff812079f9
do_fault_around.after.s:jmp ffffffff81207a4c
do_fault_around.after.s:jne ffffffff812079c8
And here's with allyesconfig on a different machine:
$ uname -a; gcc --version; ls -l vmlinux.*
Linux erwin 3.14.7-mn torvalds#54 SMP Sun Jun 15 11:25:08 CEST 2014 x86_64 AMD Phenom(tm) II X3 710 Processor AuthenticAMD GNU/Linux
gcc (GCC) 4.8.3
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-rwx------ 1 mpn eng 437027411 Jun 20 16:04 vmlinux.before
-rwx------ 1 mpn eng 437026881 Jun 20 15:30 vmlinux.after
530 bytes reduction.
Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Hagen Paul Pfeifer <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
Cc: Hagen Paul Pfeifer <[email protected]>
Cc: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
JoonsooKim
pushed a commit
to JoonsooKim/linux
that referenced
this pull request
Jul 4, 2014
It appears that gcc is better at optimising a double call to min and max
rather than open coded min3 and max3. This can be observed here:
$ cat min-max.c
#define min(x, y) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
(void) (&_min1 == &_min2); \
_min1 < _min2 ? _min1 : _min2; })
#define min3(x, y, z) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
typeof(z) _min3 = (z); \
(void) (&_min1 == &_min2); \
(void) (&_min1 == &_min3); \
_min1 < _min2 ? (_min1 < _min3 ? _min1 : _min3) : \
(_min2 < _min3 ? _min2 : _min3); })
int fmin3(int x, int y, int z) { return min3(x, y, z); }
int fmin2(int x, int y, int z) { return min(min(x, y), z); }
$ gcc -O2 -o min-max.s -S min-max.c; cat min-max.s
.file "min-max.c"
.text
.p2align 4,,15
.globl fmin3
.type fmin3, @function
fmin3:
.LFB0:
.cfi_startproc
cmpl %esi, %edi
jl .L5
cmpl %esi, %edx
movl %esi, %eax
cmovle %edx, %eax
ret
.p2align 4,,10
.p2align 3
.L5:
cmpl %edi, %edx
movl %edi, %eax
cmovle %edx, %eax
ret
.cfi_endproc
.LFE0:
.size fmin3, .-fmin3
.p2align 4,,15
.globl fmin2
.type fmin2, @function
fmin2:
.LFB1:
.cfi_startproc
cmpl %edi, %esi
movl %edx, %eax
cmovle %esi, %edi
cmpl %edx, %edi
cmovle %edi, %eax
ret
.cfi_endproc
.LFE1:
.size fmin2, .-fmin2
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
fmin3 function, which uses open-coded min3 macro, is compiled into total
of ten instructions including a conditional branch, whereas fmin2
function, which uses two calls to min2 macro, is compiled into six
instructions with no branches.
Similarly, open-coded clamp produces the same code as clamp using min and
max macros, but the latter is much shorter:
$ cat clamp.c
#define clamp(val, min, max) ({ \
typeof(val) __val = (val); \
typeof(min) __min = (min); \
typeof(max) __max = (max); \
(void) (&__val == &__min); \
(void) (&__val == &__max); \
__val = __val < __min ? __min: __val; \
__val > __max ? __max: __val; })
#define min(x, y) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
(void) (&_min1 == &_min2); \
_min1 < _min2 ? _min1 : _min2; })
#define max(x, y) ({ \
typeof(x) _max1 = (x); \
typeof(y) _max2 = (y); \
(void) (&_max1 == &_max2); \
_max1 > _max2 ? _max1 : _max2; })
int fclamp(int v, int min, int max) { return clamp(v, min, max); }
int fclampmm(int v, int min, int max) { return min(max(v, min), max); }
$ gcc -O2 -o clamp.s -S clamp.c; cat clamp.s
.file "clamp.c"
.text
.p2align 4,,15
.globl fclamp
.type fclamp, @function
fclamp:
.LFB0:
.cfi_startproc
cmpl %edi, %esi
movl %edx, %eax
cmovge %esi, %edi
cmpl %edx, %edi
cmovle %edi, %eax
ret
.cfi_endproc
.LFE0:
.size fclamp, .-fclamp
.p2align 4,,15
.globl fclampmm
.type fclampmm, @function
fclampmm:
.LFB1:
.cfi_startproc
cmpl %edi, %esi
cmovge %esi, %edi
cmpl %edi, %edx
movl %edi, %eax
cmovle %edx, %eax
ret
.cfi_endproc
.LFE1:
.size fclampmm, .-fclampmm
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
Linux mpn-glaptop 3.13.0-29-generic #53~precise1-Ubuntu SMP Wed Jun 4 22:06:25 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-rwx------ 1 mpn eng 51224656 Jun 17 14:15 vmlinux.before
-rwx------ 1 mpn eng 51224608 Jun 17 13:57 vmlinux.after
48 bytes reduction. The do_fault_around was a few instruction shorter
and as far as I can tell saved 12 bytes on the stack, i.e.:
$ grep -e rsp -e pop -e push do_fault_around.*
do_fault_around.before.s:push %rbp
do_fault_around.before.s:mov %rsp,%rbp
do_fault_around.before.s:push %r13
do_fault_around.before.s:push %r12
do_fault_around.before.s:push %rbx
do_fault_around.before.s:sub $0x38,%rsp
do_fault_around.before.s:add $0x38,%rsp
do_fault_around.before.s:pop %rbx
do_fault_around.before.s:pop %r12
do_fault_around.before.s:pop %r13
do_fault_around.before.s:pop %rbp
do_fault_around.after.s:push %rbp
do_fault_around.after.s:mov %rsp,%rbp
do_fault_around.after.s:push %r12
do_fault_around.after.s:push %rbx
do_fault_around.after.s:sub $0x30,%rsp
do_fault_around.after.s:add $0x30,%rsp
do_fault_around.after.s:pop %rbx
do_fault_around.after.s:pop %r12
do_fault_around.after.s:pop %rbp
or here side-by-side:
Before After
push %rbp push %rbp
mov %rsp,%rbp mov %rsp,%rbp
push %r13
push %r12 push %r12
push %rbx push %rbx
sub $0x38,%rsp sub $0x30,%rsp
add $0x38,%rsp add $0x30,%rsp
pop %rbx pop %rbx
pop %r12 pop %r12
pop %r13
pop %rbp pop %rbp
There are also fewer branches:
$ grep ^j do_fault_around.*
do_fault_around.before.s:jae ffffffff812079b7
do_fault_around.before.s:jmp ffffffff812079c5
do_fault_around.before.s:jmp ffffffff81207a14
do_fault_around.before.s:ja ffffffff812079f9
do_fault_around.before.s:jb ffffffff81207a10
do_fault_around.before.s:jmp ffffffff81207a63
do_fault_around.before.s:jne ffffffff812079df
do_fault_around.after.s:jmp ffffffff812079fd
do_fault_around.after.s:ja ffffffff812079e2
do_fault_around.after.s:jb ffffffff812079f9
do_fault_around.after.s:jmp ffffffff81207a4c
do_fault_around.after.s:jne ffffffff812079c8
And here's with allyesconfig on a different machine:
$ uname -a; gcc --version; ls -l vmlinux.*
Linux erwin 3.14.7-mn torvalds#54 SMP Sun Jun 15 11:25:08 CEST 2014 x86_64 AMD Phenom(tm) II X3 710 Processor AuthenticAMD GNU/Linux
gcc (GCC) 4.8.3
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-rwx------ 1 mpn eng 437027411 Jun 20 16:04 vmlinux.before
-rwx------ 1 mpn eng 437026881 Jun 20 15:30 vmlinux.after
530 bytes reduction.
Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Hagen Paul Pfeifer <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
Cc: Hagen Paul Pfeifer <[email protected]>
Cc: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ddstreet
pushed a commit
to ddstreet/linux
that referenced
this pull request
Jul 25, 2014
It appears that gcc is better at optimising a double call to min and max
rather than open coded min3 and max3. This can be observed here:
$ cat min-max.c
#define min(x, y) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
(void) (&_min1 == &_min2); \
_min1 < _min2 ? _min1 : _min2; })
#define min3(x, y, z) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
typeof(z) _min3 = (z); \
(void) (&_min1 == &_min2); \
(void) (&_min1 == &_min3); \
_min1 < _min2 ? (_min1 < _min3 ? _min1 : _min3) : \
(_min2 < _min3 ? _min2 : _min3); })
int fmin3(int x, int y, int z) { return min3(x, y, z); }
int fmin2(int x, int y, int z) { return min(min(x, y), z); }
$ gcc -O2 -o min-max.s -S min-max.c; cat min-max.s
.file "min-max.c"
.text
.p2align 4,,15
.globl fmin3
.type fmin3, @function
fmin3:
.LFB0:
.cfi_startproc
cmpl %esi, %edi
jl .L5
cmpl %esi, %edx
movl %esi, %eax
cmovle %edx, %eax
ret
.p2align 4,,10
.p2align 3
.L5:
cmpl %edi, %edx
movl %edi, %eax
cmovle %edx, %eax
ret
.cfi_endproc
.LFE0:
.size fmin3, .-fmin3
.p2align 4,,15
.globl fmin2
.type fmin2, @function
fmin2:
.LFB1:
.cfi_startproc
cmpl %edi, %esi
movl %edx, %eax
cmovle %esi, %edi
cmpl %edx, %edi
cmovle %edi, %eax
ret
.cfi_endproc
.LFE1:
.size fmin2, .-fmin2
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
fmin3 function, which uses open-coded min3 macro, is compiled into total
of ten instructions including a conditional branch, whereas fmin2
function, which uses two calls to min2 macro, is compiled into six
instructions with no branches.
Similarly, open-coded clamp produces the same code as clamp using min and
max macros, but the latter is much shorter:
$ cat clamp.c
#define clamp(val, min, max) ({ \
typeof(val) __val = (val); \
typeof(min) __min = (min); \
typeof(max) __max = (max); \
(void) (&__val == &__min); \
(void) (&__val == &__max); \
__val = __val < __min ? __min: __val; \
__val > __max ? __max: __val; })
#define min(x, y) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
(void) (&_min1 == &_min2); \
_min1 < _min2 ? _min1 : _min2; })
#define max(x, y) ({ \
typeof(x) _max1 = (x); \
typeof(y) _max2 = (y); \
(void) (&_max1 == &_max2); \
_max1 > _max2 ? _max1 : _max2; })
int fclamp(int v, int min, int max) { return clamp(v, min, max); }
int fclampmm(int v, int min, int max) { return min(max(v, min), max); }
$ gcc -O2 -o clamp.s -S clamp.c; cat clamp.s
.file "clamp.c"
.text
.p2align 4,,15
.globl fclamp
.type fclamp, @function
fclamp:
.LFB0:
.cfi_startproc
cmpl %edi, %esi
movl %edx, %eax
cmovge %esi, %edi
cmpl %edx, %edi
cmovle %edi, %eax
ret
.cfi_endproc
.LFE0:
.size fclamp, .-fclamp
.p2align 4,,15
.globl fclampmm
.type fclampmm, @function
fclampmm:
.LFB1:
.cfi_startproc
cmpl %edi, %esi
cmovge %esi, %edi
cmpl %edi, %edx
movl %edi, %eax
cmovle %edx, %eax
ret
.cfi_endproc
.LFE1:
.size fclampmm, .-fclampmm
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
Linux mpn-glaptop 3.13.0-29-generic #53~precise1-Ubuntu SMP Wed Jun 4 22:06:25 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-rwx------ 1 mpn eng 51224656 Jun 17 14:15 vmlinux.before
-rwx------ 1 mpn eng 51224608 Jun 17 13:57 vmlinux.after
48 bytes reduction. The do_fault_around was a few instruction shorter
and as far as I can tell saved 12 bytes on the stack, i.e.:
$ grep -e rsp -e pop -e push do_fault_around.*
do_fault_around.before.s:push %rbp
do_fault_around.before.s:mov %rsp,%rbp
do_fault_around.before.s:push %r13
do_fault_around.before.s:push %r12
do_fault_around.before.s:push %rbx
do_fault_around.before.s:sub $0x38,%rsp
do_fault_around.before.s:add $0x38,%rsp
do_fault_around.before.s:pop %rbx
do_fault_around.before.s:pop %r12
do_fault_around.before.s:pop %r13
do_fault_around.before.s:pop %rbp
do_fault_around.after.s:push %rbp
do_fault_around.after.s:mov %rsp,%rbp
do_fault_around.after.s:push %r12
do_fault_around.after.s:push %rbx
do_fault_around.after.s:sub $0x30,%rsp
do_fault_around.after.s:add $0x30,%rsp
do_fault_around.after.s:pop %rbx
do_fault_around.after.s:pop %r12
do_fault_around.after.s:pop %rbp
or here side-by-side:
Before After
push %rbp push %rbp
mov %rsp,%rbp mov %rsp,%rbp
push %r13
push %r12 push %r12
push %rbx push %rbx
sub $0x38,%rsp sub $0x30,%rsp
add $0x38,%rsp add $0x30,%rsp
pop %rbx pop %rbx
pop %r12 pop %r12
pop %r13
pop %rbp pop %rbp
There are also fewer branches:
$ grep ^j do_fault_around.*
do_fault_around.before.s:jae ffffffff812079b7
do_fault_around.before.s:jmp ffffffff812079c5
do_fault_around.before.s:jmp ffffffff81207a14
do_fault_around.before.s:ja ffffffff812079f9
do_fault_around.before.s:jb ffffffff81207a10
do_fault_around.before.s:jmp ffffffff81207a63
do_fault_around.before.s:jne ffffffff812079df
do_fault_around.after.s:jmp ffffffff812079fd
do_fault_around.after.s:ja ffffffff812079e2
do_fault_around.after.s:jb ffffffff812079f9
do_fault_around.after.s:jmp ffffffff81207a4c
do_fault_around.after.s:jne ffffffff812079c8
And here's with allyesconfig on a different machine:
$ uname -a; gcc --version; ls -l vmlinux.*
Linux erwin 3.14.7-mn torvalds#54 SMP Sun Jun 15 11:25:08 CEST 2014 x86_64 AMD Phenom(tm) II X3 710 Processor AuthenticAMD GNU/Linux
gcc (GCC) 4.8.3
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-rwx------ 1 mpn eng 437027411 Jun 20 16:04 vmlinux.before
-rwx------ 1 mpn eng 437026881 Jun 20 15:30 vmlinux.after
530 bytes reduction.
Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Hagen Paul Pfeifer <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
Cc: Hagen Paul Pfeifer <[email protected]>
Cc: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
aryabinin
pushed a commit
to aryabinin/linux
that referenced
this pull request
Aug 14, 2014
It appears that gcc is better at optimising a double call to min and max
rather than open coded min3 and max3. This can be observed here:
$ cat min-max.c
#define min(x, y) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
(void) (&_min1 == &_min2); \
_min1 < _min2 ? _min1 : _min2; })
#define min3(x, y, z) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
typeof(z) _min3 = (z); \
(void) (&_min1 == &_min2); \
(void) (&_min1 == &_min3); \
_min1 < _min2 ? (_min1 < _min3 ? _min1 : _min3) : \
(_min2 < _min3 ? _min2 : _min3); })
int fmin3(int x, int y, int z) { return min3(x, y, z); }
int fmin2(int x, int y, int z) { return min(min(x, y), z); }
$ gcc -O2 -o min-max.s -S min-max.c; cat min-max.s
.file "min-max.c"
.text
.p2align 4,,15
.globl fmin3
.type fmin3, @function
fmin3:
.LFB0:
.cfi_startproc
cmpl %esi, %edi
jl .L5
cmpl %esi, %edx
movl %esi, %eax
cmovle %edx, %eax
ret
.p2align 4,,10
.p2align 3
.L5:
cmpl %edi, %edx
movl %edi, %eax
cmovle %edx, %eax
ret
.cfi_endproc
.LFE0:
.size fmin3, .-fmin3
.p2align 4,,15
.globl fmin2
.type fmin2, @function
fmin2:
.LFB1:
.cfi_startproc
cmpl %edi, %esi
movl %edx, %eax
cmovle %esi, %edi
cmpl %edx, %edi
cmovle %edi, %eax
ret
.cfi_endproc
.LFE1:
.size fmin2, .-fmin2
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
fmin3 function, which uses open-coded min3 macro, is compiled into total
of ten instructions including a conditional branch, whereas fmin2
function, which uses two calls to min2 macro, is compiled into six
instructions with no branches.
Similarly, open-coded clamp produces the same code as clamp using min and
max macros, but the latter is much shorter:
$ cat clamp.c
#define clamp(val, min, max) ({ \
typeof(val) __val = (val); \
typeof(min) __min = (min); \
typeof(max) __max = (max); \
(void) (&__val == &__min); \
(void) (&__val == &__max); \
__val = __val < __min ? __min: __val; \
__val > __max ? __max: __val; })
#define min(x, y) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
(void) (&_min1 == &_min2); \
_min1 < _min2 ? _min1 : _min2; })
#define max(x, y) ({ \
typeof(x) _max1 = (x); \
typeof(y) _max2 = (y); \
(void) (&_max1 == &_max2); \
_max1 > _max2 ? _max1 : _max2; })
int fclamp(int v, int min, int max) { return clamp(v, min, max); }
int fclampmm(int v, int min, int max) { return min(max(v, min), max); }
$ gcc -O2 -o clamp.s -S clamp.c; cat clamp.s
.file "clamp.c"
.text
.p2align 4,,15
.globl fclamp
.type fclamp, @function
fclamp:
.LFB0:
.cfi_startproc
cmpl %edi, %esi
movl %edx, %eax
cmovge %esi, %edi
cmpl %edx, %edi
cmovle %edi, %eax
ret
.cfi_endproc
.LFE0:
.size fclamp, .-fclamp
.p2align 4,,15
.globl fclampmm
.type fclampmm, @function
fclampmm:
.LFB1:
.cfi_startproc
cmpl %edi, %esi
cmovge %esi, %edi
cmpl %edi, %edx
movl %edi, %eax
cmovle %edx, %eax
ret
.cfi_endproc
.LFE1:
.size fclampmm, .-fclampmm
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
Linux mpn-glaptop 3.13.0-29-generic #53~precise1-Ubuntu SMP Wed Jun 4 22:06:25 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-rwx------ 1 mpn eng 51224656 Jun 17 14:15 vmlinux.before
-rwx------ 1 mpn eng 51224608 Jun 17 13:57 vmlinux.after
48 bytes reduction. The do_fault_around was a few instruction shorter
and as far as I can tell saved 12 bytes on the stack, i.e.:
$ grep -e rsp -e pop -e push do_fault_around.*
do_fault_around.before.s:push %rbp
do_fault_around.before.s:mov %rsp,%rbp
do_fault_around.before.s:push %r13
do_fault_around.before.s:push %r12
do_fault_around.before.s:push %rbx
do_fault_around.before.s:sub $0x38,%rsp
do_fault_around.before.s:add $0x38,%rsp
do_fault_around.before.s:pop %rbx
do_fault_around.before.s:pop %r12
do_fault_around.before.s:pop %r13
do_fault_around.before.s:pop %rbp
do_fault_around.after.s:push %rbp
do_fault_around.after.s:mov %rsp,%rbp
do_fault_around.after.s:push %r12
do_fault_around.after.s:push %rbx
do_fault_around.after.s:sub $0x30,%rsp
do_fault_around.after.s:add $0x30,%rsp
do_fault_around.after.s:pop %rbx
do_fault_around.after.s:pop %r12
do_fault_around.after.s:pop %rbp
or here side-by-side:
Before After
push %rbp push %rbp
mov %rsp,%rbp mov %rsp,%rbp
push %r13
push %r12 push %r12
push %rbx push %rbx
sub $0x38,%rsp sub $0x30,%rsp
add $0x38,%rsp add $0x30,%rsp
pop %rbx pop %rbx
pop %r12 pop %r12
pop %r13
pop %rbp pop %rbp
There are also fewer branches:
$ grep ^j do_fault_around.*
do_fault_around.before.s:jae ffffffff812079b7
do_fault_around.before.s:jmp ffffffff812079c5
do_fault_around.before.s:jmp ffffffff81207a14
do_fault_around.before.s:ja ffffffff812079f9
do_fault_around.before.s:jb ffffffff81207a10
do_fault_around.before.s:jmp ffffffff81207a63
do_fault_around.before.s:jne ffffffff812079df
do_fault_around.after.s:jmp ffffffff812079fd
do_fault_around.after.s:ja ffffffff812079e2
do_fault_around.after.s:jb ffffffff812079f9
do_fault_around.after.s:jmp ffffffff81207a4c
do_fault_around.after.s:jne ffffffff812079c8
And here's with allyesconfig on a different machine:
$ uname -a; gcc --version; ls -l vmlinux.*
Linux erwin 3.14.7-mn torvalds#54 SMP Sun Jun 15 11:25:08 CEST 2014 x86_64 AMD Phenom(tm) II X3 710 Processor AuthenticAMD GNU/Linux
gcc (GCC) 4.8.3
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-rwx------ 1 mpn eng 437027411 Jun 20 16:04 vmlinux.before
-rwx------ 1 mpn eng 437026881 Jun 20 15:30 vmlinux.after
530 bytes reduction.
Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Hagen Paul Pfeifer <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
Cc: Hagen Paul Pfeifer <[email protected]>
Cc: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
andy-shev
pushed a commit
to andy-shev/linux
that referenced
this pull request
Aug 26, 2014
It appears that gcc is better at optimising a double call to min and max
rather than open coded min3 and max3. This can be observed here:
$ cat min-max.c
#define min(x, y) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
(void) (&_min1 == &_min2); \
_min1 < _min2 ? _min1 : _min2; })
#define min3(x, y, z) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
typeof(z) _min3 = (z); \
(void) (&_min1 == &_min2); \
(void) (&_min1 == &_min3); \
_min1 < _min2 ? (_min1 < _min3 ? _min1 : _min3) : \
(_min2 < _min3 ? _min2 : _min3); })
int fmin3(int x, int y, int z) { return min3(x, y, z); }
int fmin2(int x, int y, int z) { return min(min(x, y), z); }
$ gcc -O2 -o min-max.s -S min-max.c; cat min-max.s
.file "min-max.c"
.text
.p2align 4,,15
.globl fmin3
.type fmin3, @function
fmin3:
.LFB0:
.cfi_startproc
cmpl %esi, %edi
jl .L5
cmpl %esi, %edx
movl %esi, %eax
cmovle %edx, %eax
ret
.p2align 4,,10
.p2align 3
.L5:
cmpl %edi, %edx
movl %edi, %eax
cmovle %edx, %eax
ret
.cfi_endproc
.LFE0:
.size fmin3, .-fmin3
.p2align 4,,15
.globl fmin2
.type fmin2, @function
fmin2:
.LFB1:
.cfi_startproc
cmpl %edi, %esi
movl %edx, %eax
cmovle %esi, %edi
cmpl %edx, %edi
cmovle %edi, %eax
ret
.cfi_endproc
.LFE1:
.size fmin2, .-fmin2
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
fmin3 function, which uses open-coded min3 macro, is compiled into total
of ten instructions including a conditional branch, whereas fmin2
function, which uses two calls to min2 macro, is compiled into six
instructions with no branches.
Similarly, open-coded clamp produces the same code as clamp using min and
max macros, but the latter is much shorter:
$ cat clamp.c
#define clamp(val, min, max) ({ \
typeof(val) __val = (val); \
typeof(min) __min = (min); \
typeof(max) __max = (max); \
(void) (&__val == &__min); \
(void) (&__val == &__max); \
__val = __val < __min ? __min: __val; \
__val > __max ? __max: __val; })
#define min(x, y) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
(void) (&_min1 == &_min2); \
_min1 < _min2 ? _min1 : _min2; })
#define max(x, y) ({ \
typeof(x) _max1 = (x); \
typeof(y) _max2 = (y); \
(void) (&_max1 == &_max2); \
_max1 > _max2 ? _max1 : _max2; })
int fclamp(int v, int min, int max) { return clamp(v, min, max); }
int fclampmm(int v, int min, int max) { return min(max(v, min), max); }
$ gcc -O2 -o clamp.s -S clamp.c; cat clamp.s
.file "clamp.c"
.text
.p2align 4,,15
.globl fclamp
.type fclamp, @function
fclamp:
.LFB0:
.cfi_startproc
cmpl %edi, %esi
movl %edx, %eax
cmovge %esi, %edi
cmpl %edx, %edi
cmovle %edi, %eax
ret
.cfi_endproc
.LFE0:
.size fclamp, .-fclamp
.p2align 4,,15
.globl fclampmm
.type fclampmm, @function
fclampmm:
.LFB1:
.cfi_startproc
cmpl %edi, %esi
cmovge %esi, %edi
cmpl %edi, %edx
movl %edi, %eax
cmovle %edx, %eax
ret
.cfi_endproc
.LFE1:
.size fclampmm, .-fclampmm
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
Linux mpn-glaptop 3.13.0-29-generic #53~precise1-Ubuntu SMP Wed Jun 4 22:06:25 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-rwx------ 1 mpn eng 51224656 Jun 17 14:15 vmlinux.before
-rwx------ 1 mpn eng 51224608 Jun 17 13:57 vmlinux.after
48 bytes reduction. The do_fault_around was a few instruction shorter
and as far as I can tell saved 12 bytes on the stack, i.e.:
$ grep -e rsp -e pop -e push do_fault_around.*
do_fault_around.before.s:push %rbp
do_fault_around.before.s:mov %rsp,%rbp
do_fault_around.before.s:push %r13
do_fault_around.before.s:push %r12
do_fault_around.before.s:push %rbx
do_fault_around.before.s:sub $0x38,%rsp
do_fault_around.before.s:add $0x38,%rsp
do_fault_around.before.s:pop %rbx
do_fault_around.before.s:pop %r12
do_fault_around.before.s:pop %r13
do_fault_around.before.s:pop %rbp
do_fault_around.after.s:push %rbp
do_fault_around.after.s:mov %rsp,%rbp
do_fault_around.after.s:push %r12
do_fault_around.after.s:push %rbx
do_fault_around.after.s:sub $0x30,%rsp
do_fault_around.after.s:add $0x30,%rsp
do_fault_around.after.s:pop %rbx
do_fault_around.after.s:pop %r12
do_fault_around.after.s:pop %rbp
or here side-by-side:
Before After
push %rbp push %rbp
mov %rsp,%rbp mov %rsp,%rbp
push %r13
push %r12 push %r12
push %rbx push %rbx
sub $0x38,%rsp sub $0x30,%rsp
add $0x38,%rsp add $0x30,%rsp
pop %rbx pop %rbx
pop %r12 pop %r12
pop %r13
pop %rbp pop %rbp
There are also fewer branches:
$ grep ^j do_fault_around.*
do_fault_around.before.s:jae ffffffff812079b7
do_fault_around.before.s:jmp ffffffff812079c5
do_fault_around.before.s:jmp ffffffff81207a14
do_fault_around.before.s:ja ffffffff812079f9
do_fault_around.before.s:jb ffffffff81207a10
do_fault_around.before.s:jmp ffffffff81207a63
do_fault_around.before.s:jne ffffffff812079df
do_fault_around.after.s:jmp ffffffff812079fd
do_fault_around.after.s:ja ffffffff812079e2
do_fault_around.after.s:jb ffffffff812079f9
do_fault_around.after.s:jmp ffffffff81207a4c
do_fault_around.after.s:jne ffffffff812079c8
And here's with allyesconfig on a different machine:
$ uname -a; gcc --version; ls -l vmlinux.*
Linux erwin 3.14.7-mn torvalds#54 SMP Sun Jun 15 11:25:08 CEST 2014 x86_64 AMD Phenom(tm) II X3 710 Processor AuthenticAMD GNU/Linux
gcc (GCC) 4.8.3
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-rwx------ 1 mpn eng 437027411 Jun 20 16:04 vmlinux.before
-rwx------ 1 mpn eng 437026881 Jun 20 15:30 vmlinux.after
530 bytes reduction.
Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Hagen Paul Pfeifer <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
Cc: Hagen Paul Pfeifer <[email protected]>
Cc: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
aryabinin
pushed a commit
to aryabinin/linux
that referenced
this pull request
Sep 3, 2014
It appears that gcc is better at optimising a double call to min and max
rather than open coded min3 and max3. This can be observed here:
$ cat min-max.c
#define min(x, y) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
(void) (&_min1 == &_min2); \
_min1 < _min2 ? _min1 : _min2; })
#define min3(x, y, z) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
typeof(z) _min3 = (z); \
(void) (&_min1 == &_min2); \
(void) (&_min1 == &_min3); \
_min1 < _min2 ? (_min1 < _min3 ? _min1 : _min3) : \
(_min2 < _min3 ? _min2 : _min3); })
int fmin3(int x, int y, int z) { return min3(x, y, z); }
int fmin2(int x, int y, int z) { return min(min(x, y), z); }
$ gcc -O2 -o min-max.s -S min-max.c; cat min-max.s
.file "min-max.c"
.text
.p2align 4,,15
.globl fmin3
.type fmin3, @function
fmin3:
.LFB0:
.cfi_startproc
cmpl %esi, %edi
jl .L5
cmpl %esi, %edx
movl %esi, %eax
cmovle %edx, %eax
ret
.p2align 4,,10
.p2align 3
.L5:
cmpl %edi, %edx
movl %edi, %eax
cmovle %edx, %eax
ret
.cfi_endproc
.LFE0:
.size fmin3, .-fmin3
.p2align 4,,15
.globl fmin2
.type fmin2, @function
fmin2:
.LFB1:
.cfi_startproc
cmpl %edi, %esi
movl %edx, %eax
cmovle %esi, %edi
cmpl %edx, %edi
cmovle %edi, %eax
ret
.cfi_endproc
.LFE1:
.size fmin2, .-fmin2
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
fmin3 function, which uses open-coded min3 macro, is compiled into total
of ten instructions including a conditional branch, whereas fmin2
function, which uses two calls to min2 macro, is compiled into six
instructions with no branches.
Similarly, open-coded clamp produces the same code as clamp using min and
max macros, but the latter is much shorter:
$ cat clamp.c
#define clamp(val, min, max) ({ \
typeof(val) __val = (val); \
typeof(min) __min = (min); \
typeof(max) __max = (max); \
(void) (&__val == &__min); \
(void) (&__val == &__max); \
__val = __val < __min ? __min: __val; \
__val > __max ? __max: __val; })
#define min(x, y) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
(void) (&_min1 == &_min2); \
_min1 < _min2 ? _min1 : _min2; })
#define max(x, y) ({ \
typeof(x) _max1 = (x); \
typeof(y) _max2 = (y); \
(void) (&_max1 == &_max2); \
_max1 > _max2 ? _max1 : _max2; })
int fclamp(int v, int min, int max) { return clamp(v, min, max); }
int fclampmm(int v, int min, int max) { return min(max(v, min), max); }
$ gcc -O2 -o clamp.s -S clamp.c; cat clamp.s
.file "clamp.c"
.text
.p2align 4,,15
.globl fclamp
.type fclamp, @function
fclamp:
.LFB0:
.cfi_startproc
cmpl %edi, %esi
movl %edx, %eax
cmovge %esi, %edi
cmpl %edx, %edi
cmovle %edi, %eax
ret
.cfi_endproc
.LFE0:
.size fclamp, .-fclamp
.p2align 4,,15
.globl fclampmm
.type fclampmm, @function
fclampmm:
.LFB1:
.cfi_startproc
cmpl %edi, %esi
cmovge %esi, %edi
cmpl %edi, %edx
movl %edi, %eax
cmovle %edx, %eax
ret
.cfi_endproc
.LFE1:
.size fclampmm, .-fclampmm
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
Linux mpn-glaptop 3.13.0-29-generic #53~precise1-Ubuntu SMP Wed Jun 4 22:06:25 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-rwx------ 1 mpn eng 51224656 Jun 17 14:15 vmlinux.before
-rwx------ 1 mpn eng 51224608 Jun 17 13:57 vmlinux.after
48 bytes reduction. The do_fault_around was a few instruction shorter
and as far as I can tell saved 12 bytes on the stack, i.e.:
$ grep -e rsp -e pop -e push do_fault_around.*
do_fault_around.before.s:push %rbp
do_fault_around.before.s:mov %rsp,%rbp
do_fault_around.before.s:push %r13
do_fault_around.before.s:push %r12
do_fault_around.before.s:push %rbx
do_fault_around.before.s:sub $0x38,%rsp
do_fault_around.before.s:add $0x38,%rsp
do_fault_around.before.s:pop %rbx
do_fault_around.before.s:pop %r12
do_fault_around.before.s:pop %r13
do_fault_around.before.s:pop %rbp
do_fault_around.after.s:push %rbp
do_fault_around.after.s:mov %rsp,%rbp
do_fault_around.after.s:push %r12
do_fault_around.after.s:push %rbx
do_fault_around.after.s:sub $0x30,%rsp
do_fault_around.after.s:add $0x30,%rsp
do_fault_around.after.s:pop %rbx
do_fault_around.after.s:pop %r12
do_fault_around.after.s:pop %rbp
or here side-by-side:
Before After
push %rbp push %rbp
mov %rsp,%rbp mov %rsp,%rbp
push %r13
push %r12 push %r12
push %rbx push %rbx
sub $0x38,%rsp sub $0x30,%rsp
add $0x38,%rsp add $0x30,%rsp
pop %rbx pop %rbx
pop %r12 pop %r12
pop %r13
pop %rbp pop %rbp
There are also fewer branches:
$ grep ^j do_fault_around.*
do_fault_around.before.s:jae ffffffff812079b7
do_fault_around.before.s:jmp ffffffff812079c5
do_fault_around.before.s:jmp ffffffff81207a14
do_fault_around.before.s:ja ffffffff812079f9
do_fault_around.before.s:jb ffffffff81207a10
do_fault_around.before.s:jmp ffffffff81207a63
do_fault_around.before.s:jne ffffffff812079df
do_fault_around.after.s:jmp ffffffff812079fd
do_fault_around.after.s:ja ffffffff812079e2
do_fault_around.after.s:jb ffffffff812079f9
do_fault_around.after.s:jmp ffffffff81207a4c
do_fault_around.after.s:jne ffffffff812079c8
And here's with allyesconfig on a different machine:
$ uname -a; gcc --version; ls -l vmlinux.*
Linux erwin 3.14.7-mn torvalds#54 SMP Sun Jun 15 11:25:08 CEST 2014 x86_64 AMD Phenom(tm) II X3 710 Processor AuthenticAMD GNU/Linux
gcc (GCC) 4.8.3
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-rwx------ 1 mpn eng 437027411 Jun 20 16:04 vmlinux.before
-rwx------ 1 mpn eng 437026881 Jun 20 15:30 vmlinux.after
530 bytes reduction.
Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Hagen Paul Pfeifer <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
Cc: Hagen Paul Pfeifer <[email protected]>
Cc: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
swarren
pushed a commit
to swarren/linux-tegra
that referenced
this pull request
Sep 3, 2014
It appears that gcc is better at optimising a double call to min and max
rather than open coded min3 and max3. This can be observed here:
$ cat min-max.c
#define min(x, y) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
(void) (&_min1 == &_min2); \
_min1 < _min2 ? _min1 : _min2; })
#define min3(x, y, z) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
typeof(z) _min3 = (z); \
(void) (&_min1 == &_min2); \
(void) (&_min1 == &_min3); \
_min1 < _min2 ? (_min1 < _min3 ? _min1 : _min3) : \
(_min2 < _min3 ? _min2 : _min3); })
int fmin3(int x, int y, int z) { return min3(x, y, z); }
int fmin2(int x, int y, int z) { return min(min(x, y), z); }
$ gcc -O2 -o min-max.s -S min-max.c; cat min-max.s
.file "min-max.c"
.text
.p2align 4,,15
.globl fmin3
.type fmin3, @function
fmin3:
.LFB0:
.cfi_startproc
cmpl %esi, %edi
jl .L5
cmpl %esi, %edx
movl %esi, %eax
cmovle %edx, %eax
ret
.p2align 4,,10
.p2align 3
.L5:
cmpl %edi, %edx
movl %edi, %eax
cmovle %edx, %eax
ret
.cfi_endproc
.LFE0:
.size fmin3, .-fmin3
.p2align 4,,15
.globl fmin2
.type fmin2, @function
fmin2:
.LFB1:
.cfi_startproc
cmpl %edi, %esi
movl %edx, %eax
cmovle %esi, %edi
cmpl %edx, %edi
cmovle %edi, %eax
ret
.cfi_endproc
.LFE1:
.size fmin2, .-fmin2
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
fmin3 function, which uses open-coded min3 macro, is compiled into total
of ten instructions including a conditional branch, whereas fmin2
function, which uses two calls to min2 macro, is compiled into six
instructions with no branches.
Similarly, open-coded clamp produces the same code as clamp using min and
max macros, but the latter is much shorter:
$ cat clamp.c
#define clamp(val, min, max) ({ \
typeof(val) __val = (val); \
typeof(min) __min = (min); \
typeof(max) __max = (max); \
(void) (&__val == &__min); \
(void) (&__val == &__max); \
__val = __val < __min ? __min: __val; \
__val > __max ? __max: __val; })
#define min(x, y) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
(void) (&_min1 == &_min2); \
_min1 < _min2 ? _min1 : _min2; })
#define max(x, y) ({ \
typeof(x) _max1 = (x); \
typeof(y) _max2 = (y); \
(void) (&_max1 == &_max2); \
_max1 > _max2 ? _max1 : _max2; })
int fclamp(int v, int min, int max) { return clamp(v, min, max); }
int fclampmm(int v, int min, int max) { return min(max(v, min), max); }
$ gcc -O2 -o clamp.s -S clamp.c; cat clamp.s
.file "clamp.c"
.text
.p2align 4,,15
.globl fclamp
.type fclamp, @function
fclamp:
.LFB0:
.cfi_startproc
cmpl %edi, %esi
movl %edx, %eax
cmovge %esi, %edi
cmpl %edx, %edi
cmovle %edi, %eax
ret
.cfi_endproc
.LFE0:
.size fclamp, .-fclamp
.p2align 4,,15
.globl fclampmm
.type fclampmm, @function
fclampmm:
.LFB1:
.cfi_startproc
cmpl %edi, %esi
cmovge %esi, %edi
cmpl %edi, %edx
movl %edi, %eax
cmovle %edx, %eax
ret
.cfi_endproc
.LFE1:
.size fclampmm, .-fclampmm
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
Linux mpn-glaptop 3.13.0-29-generic #53~precise1-Ubuntu SMP Wed Jun 4 22:06:25 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-rwx------ 1 mpn eng 51224656 Jun 17 14:15 vmlinux.before
-rwx------ 1 mpn eng 51224608 Jun 17 13:57 vmlinux.after
48 bytes reduction. The do_fault_around was a few instruction shorter
and as far as I can tell saved 12 bytes on the stack, i.e.:
$ grep -e rsp -e pop -e push do_fault_around.*
do_fault_around.before.s:push %rbp
do_fault_around.before.s:mov %rsp,%rbp
do_fault_around.before.s:push %r13
do_fault_around.before.s:push %r12
do_fault_around.before.s:push %rbx
do_fault_around.before.s:sub $0x38,%rsp
do_fault_around.before.s:add $0x38,%rsp
do_fault_around.before.s:pop %rbx
do_fault_around.before.s:pop %r12
do_fault_around.before.s:pop %r13
do_fault_around.before.s:pop %rbp
do_fault_around.after.s:push %rbp
do_fault_around.after.s:mov %rsp,%rbp
do_fault_around.after.s:push %r12
do_fault_around.after.s:push %rbx
do_fault_around.after.s:sub $0x30,%rsp
do_fault_around.after.s:add $0x30,%rsp
do_fault_around.after.s:pop %rbx
do_fault_around.after.s:pop %r12
do_fault_around.after.s:pop %rbp
or here side-by-side:
Before After
push %rbp push %rbp
mov %rsp,%rbp mov %rsp,%rbp
push %r13
push %r12 push %r12
push %rbx push %rbx
sub $0x38,%rsp sub $0x30,%rsp
add $0x38,%rsp add $0x30,%rsp
pop %rbx pop %rbx
pop %r12 pop %r12
pop %r13
pop %rbp pop %rbp
There are also fewer branches:
$ grep ^j do_fault_around.*
do_fault_around.before.s:jae ffffffff812079b7
do_fault_around.before.s:jmp ffffffff812079c5
do_fault_around.before.s:jmp ffffffff81207a14
do_fault_around.before.s:ja ffffffff812079f9
do_fault_around.before.s:jb ffffffff81207a10
do_fault_around.before.s:jmp ffffffff81207a63
do_fault_around.before.s:jne ffffffff812079df
do_fault_around.after.s:jmp ffffffff812079fd
do_fault_around.after.s:ja ffffffff812079e2
do_fault_around.after.s:jb ffffffff812079f9
do_fault_around.after.s:jmp ffffffff81207a4c
do_fault_around.after.s:jne ffffffff812079c8
And here's with allyesconfig on a different machine:
$ uname -a; gcc --version; ls -l vmlinux.*
Linux erwin 3.14.7-mn torvalds#54 SMP Sun Jun 15 11:25:08 CEST 2014 x86_64 AMD Phenom(tm) II X3 710 Processor AuthenticAMD GNU/Linux
gcc (GCC) 4.8.3
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-rwx------ 1 mpn eng 437027411 Jun 20 16:04 vmlinux.before
-rwx------ 1 mpn eng 437026881 Jun 20 15:30 vmlinux.after
530 bytes reduction.
Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Hagen Paul Pfeifer <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
Cc: Hagen Paul Pfeifer <[email protected]>
Cc: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
andy-shev
pushed a commit
to andy-shev/linux
that referenced
this pull request
Sep 5, 2014
It appears that gcc is better at optimising a double call to min and max
rather than open coded min3 and max3. This can be observed here:
$ cat min-max.c
#define min(x, y) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
(void) (&_min1 == &_min2); \
_min1 < _min2 ? _min1 : _min2; })
#define min3(x, y, z) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
typeof(z) _min3 = (z); \
(void) (&_min1 == &_min2); \
(void) (&_min1 == &_min3); \
_min1 < _min2 ? (_min1 < _min3 ? _min1 : _min3) : \
(_min2 < _min3 ? _min2 : _min3); })
int fmin3(int x, int y, int z) { return min3(x, y, z); }
int fmin2(int x, int y, int z) { return min(min(x, y), z); }
$ gcc -O2 -o min-max.s -S min-max.c; cat min-max.s
.file "min-max.c"
.text
.p2align 4,,15
.globl fmin3
.type fmin3, @function
fmin3:
.LFB0:
.cfi_startproc
cmpl %esi, %edi
jl .L5
cmpl %esi, %edx
movl %esi, %eax
cmovle %edx, %eax
ret
.p2align 4,,10
.p2align 3
.L5:
cmpl %edi, %edx
movl %edi, %eax
cmovle %edx, %eax
ret
.cfi_endproc
.LFE0:
.size fmin3, .-fmin3
.p2align 4,,15
.globl fmin2
.type fmin2, @function
fmin2:
.LFB1:
.cfi_startproc
cmpl %edi, %esi
movl %edx, %eax
cmovle %esi, %edi
cmpl %edx, %edi
cmovle %edi, %eax
ret
.cfi_endproc
.LFE1:
.size fmin2, .-fmin2
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
fmin3 function, which uses open-coded min3 macro, is compiled into total
of ten instructions including a conditional branch, whereas fmin2
function, which uses two calls to min2 macro, is compiled into six
instructions with no branches.
Similarly, open-coded clamp produces the same code as clamp using min and
max macros, but the latter is much shorter:
$ cat clamp.c
#define clamp(val, min, max) ({ \
typeof(val) __val = (val); \
typeof(min) __min = (min); \
typeof(max) __max = (max); \
(void) (&__val == &__min); \
(void) (&__val == &__max); \
__val = __val < __min ? __min: __val; \
__val > __max ? __max: __val; })
#define min(x, y) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
(void) (&_min1 == &_min2); \
_min1 < _min2 ? _min1 : _min2; })
#define max(x, y) ({ \
typeof(x) _max1 = (x); \
typeof(y) _max2 = (y); \
(void) (&_max1 == &_max2); \
_max1 > _max2 ? _max1 : _max2; })
int fclamp(int v, int min, int max) { return clamp(v, min, max); }
int fclampmm(int v, int min, int max) { return min(max(v, min), max); }
$ gcc -O2 -o clamp.s -S clamp.c; cat clamp.s
.file "clamp.c"
.text
.p2align 4,,15
.globl fclamp
.type fclamp, @function
fclamp:
.LFB0:
.cfi_startproc
cmpl %edi, %esi
movl %edx, %eax
cmovge %esi, %edi
cmpl %edx, %edi
cmovle %edi, %eax
ret
.cfi_endproc
.LFE0:
.size fclamp, .-fclamp
.p2align 4,,15
.globl fclampmm
.type fclampmm, @function
fclampmm:
.LFB1:
.cfi_startproc
cmpl %edi, %esi
cmovge %esi, %edi
cmpl %edi, %edx
movl %edi, %eax
cmovle %edx, %eax
ret
.cfi_endproc
.LFE1:
.size fclampmm, .-fclampmm
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
Linux mpn-glaptop 3.13.0-29-generic #53~precise1-Ubuntu SMP Wed Jun 4 22:06:25 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-rwx------ 1 mpn eng 51224656 Jun 17 14:15 vmlinux.before
-rwx------ 1 mpn eng 51224608 Jun 17 13:57 vmlinux.after
48 bytes reduction. The do_fault_around was a few instruction shorter
and as far as I can tell saved 12 bytes on the stack, i.e.:
$ grep -e rsp -e pop -e push do_fault_around.*
do_fault_around.before.s:push %rbp
do_fault_around.before.s:mov %rsp,%rbp
do_fault_around.before.s:push %r13
do_fault_around.before.s:push %r12
do_fault_around.before.s:push %rbx
do_fault_around.before.s:sub $0x38,%rsp
do_fault_around.before.s:add $0x38,%rsp
do_fault_around.before.s:pop %rbx
do_fault_around.before.s:pop %r12
do_fault_around.before.s:pop %r13
do_fault_around.before.s:pop %rbp
do_fault_around.after.s:push %rbp
do_fault_around.after.s:mov %rsp,%rbp
do_fault_around.after.s:push %r12
do_fault_around.after.s:push %rbx
do_fault_around.after.s:sub $0x30,%rsp
do_fault_around.after.s:add $0x30,%rsp
do_fault_around.after.s:pop %rbx
do_fault_around.after.s:pop %r12
do_fault_around.after.s:pop %rbp
or here side-by-side:
Before After
push %rbp push %rbp
mov %rsp,%rbp mov %rsp,%rbp
push %r13
push %r12 push %r12
push %rbx push %rbx
sub $0x38,%rsp sub $0x30,%rsp
add $0x38,%rsp add $0x30,%rsp
pop %rbx pop %rbx
pop %r12 pop %r12
pop %r13
pop %rbp pop %rbp
There are also fewer branches:
$ grep ^j do_fault_around.*
do_fault_around.before.s:jae ffffffff812079b7
do_fault_around.before.s:jmp ffffffff812079c5
do_fault_around.before.s:jmp ffffffff81207a14
do_fault_around.before.s:ja ffffffff812079f9
do_fault_around.before.s:jb ffffffff81207a10
do_fault_around.before.s:jmp ffffffff81207a63
do_fault_around.before.s:jne ffffffff812079df
do_fault_around.after.s:jmp ffffffff812079fd
do_fault_around.after.s:ja ffffffff812079e2
do_fault_around.after.s:jb ffffffff812079f9
do_fault_around.after.s:jmp ffffffff81207a4c
do_fault_around.after.s:jne ffffffff812079c8
And here's with allyesconfig on a different machine:
$ uname -a; gcc --version; ls -l vmlinux.*
Linux erwin 3.14.7-mn torvalds#54 SMP Sun Jun 15 11:25:08 CEST 2014 x86_64 AMD Phenom(tm) II X3 710 Processor AuthenticAMD GNU/Linux
gcc (GCC) 4.8.3
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-rwx------ 1 mpn eng 437027411 Jun 20 16:04 vmlinux.before
-rwx------ 1 mpn eng 437026881 Jun 20 15:30 vmlinux.after
530 bytes reduction.
Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Hagen Paul Pfeifer <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
Cc: Hagen Paul Pfeifer <[email protected]>
Cc: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
aryabinin
pushed a commit
to aryabinin/linux
that referenced
this pull request
Sep 10, 2014
It appears that gcc is better at optimising a double call to min and max
rather than open coded min3 and max3. This can be observed here:
$ cat min-max.c
#define min(x, y) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
(void) (&_min1 == &_min2); \
_min1 < _min2 ? _min1 : _min2; })
#define min3(x, y, z) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
typeof(z) _min3 = (z); \
(void) (&_min1 == &_min2); \
(void) (&_min1 == &_min3); \
_min1 < _min2 ? (_min1 < _min3 ? _min1 : _min3) : \
(_min2 < _min3 ? _min2 : _min3); })
int fmin3(int x, int y, int z) { return min3(x, y, z); }
int fmin2(int x, int y, int z) { return min(min(x, y), z); }
$ gcc -O2 -o min-max.s -S min-max.c; cat min-max.s
.file "min-max.c"
.text
.p2align 4,,15
.globl fmin3
.type fmin3, @function
fmin3:
.LFB0:
.cfi_startproc
cmpl %esi, %edi
jl .L5
cmpl %esi, %edx
movl %esi, %eax
cmovle %edx, %eax
ret
.p2align 4,,10
.p2align 3
.L5:
cmpl %edi, %edx
movl %edi, %eax
cmovle %edx, %eax
ret
.cfi_endproc
.LFE0:
.size fmin3, .-fmin3
.p2align 4,,15
.globl fmin2
.type fmin2, @function
fmin2:
.LFB1:
.cfi_startproc
cmpl %edi, %esi
movl %edx, %eax
cmovle %esi, %edi
cmpl %edx, %edi
cmovle %edi, %eax
ret
.cfi_endproc
.LFE1:
.size fmin2, .-fmin2
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
fmin3 function, which uses open-coded min3 macro, is compiled into total
of ten instructions including a conditional branch, whereas fmin2
function, which uses two calls to min2 macro, is compiled into six
instructions with no branches.
Similarly, open-coded clamp produces the same code as clamp using min and
max macros, but the latter is much shorter:
$ cat clamp.c
#define clamp(val, min, max) ({ \
typeof(val) __val = (val); \
typeof(min) __min = (min); \
typeof(max) __max = (max); \
(void) (&__val == &__min); \
(void) (&__val == &__max); \
__val = __val < __min ? __min: __val; \
__val > __max ? __max: __val; })
#define min(x, y) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
(void) (&_min1 == &_min2); \
_min1 < _min2 ? _min1 : _min2; })
#define max(x, y) ({ \
typeof(x) _max1 = (x); \
typeof(y) _max2 = (y); \
(void) (&_max1 == &_max2); \
_max1 > _max2 ? _max1 : _max2; })
int fclamp(int v, int min, int max) { return clamp(v, min, max); }
int fclampmm(int v, int min, int max) { return min(max(v, min), max); }
$ gcc -O2 -o clamp.s -S clamp.c; cat clamp.s
.file "clamp.c"
.text
.p2align 4,,15
.globl fclamp
.type fclamp, @function
fclamp:
.LFB0:
.cfi_startproc
cmpl %edi, %esi
movl %edx, %eax
cmovge %esi, %edi
cmpl %edx, %edi
cmovle %edi, %eax
ret
.cfi_endproc
.LFE0:
.size fclamp, .-fclamp
.p2align 4,,15
.globl fclampmm
.type fclampmm, @function
fclampmm:
.LFB1:
.cfi_startproc
cmpl %edi, %esi
cmovge %esi, %edi
cmpl %edi, %edx
movl %edi, %eax
cmovle %edx, %eax
ret
.cfi_endproc
.LFE1:
.size fclampmm, .-fclampmm
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
Linux mpn-glaptop 3.13.0-29-generic #53~precise1-Ubuntu SMP Wed Jun 4 22:06:25 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-rwx------ 1 mpn eng 51224656 Jun 17 14:15 vmlinux.before
-rwx------ 1 mpn eng 51224608 Jun 17 13:57 vmlinux.after
48 bytes reduction. The do_fault_around was a few instruction shorter
and as far as I can tell saved 12 bytes on the stack, i.e.:
$ grep -e rsp -e pop -e push do_fault_around.*
do_fault_around.before.s:push %rbp
do_fault_around.before.s:mov %rsp,%rbp
do_fault_around.before.s:push %r13
do_fault_around.before.s:push %r12
do_fault_around.before.s:push %rbx
do_fault_around.before.s:sub $0x38,%rsp
do_fault_around.before.s:add $0x38,%rsp
do_fault_around.before.s:pop %rbx
do_fault_around.before.s:pop %r12
do_fault_around.before.s:pop %r13
do_fault_around.before.s:pop %rbp
do_fault_around.after.s:push %rbp
do_fault_around.after.s:mov %rsp,%rbp
do_fault_around.after.s:push %r12
do_fault_around.after.s:push %rbx
do_fault_around.after.s:sub $0x30,%rsp
do_fault_around.after.s:add $0x30,%rsp
do_fault_around.after.s:pop %rbx
do_fault_around.after.s:pop %r12
do_fault_around.after.s:pop %rbp
or here side-by-side:
Before After
push %rbp push %rbp
mov %rsp,%rbp mov %rsp,%rbp
push %r13
push %r12 push %r12
push %rbx push %rbx
sub $0x38,%rsp sub $0x30,%rsp
add $0x38,%rsp add $0x30,%rsp
pop %rbx pop %rbx
pop %r12 pop %r12
pop %r13
pop %rbp pop %rbp
There are also fewer branches:
$ grep ^j do_fault_around.*
do_fault_around.before.s:jae ffffffff812079b7
do_fault_around.before.s:jmp ffffffff812079c5
do_fault_around.before.s:jmp ffffffff81207a14
do_fault_around.before.s:ja ffffffff812079f9
do_fault_around.before.s:jb ffffffff81207a10
do_fault_around.before.s:jmp ffffffff81207a63
do_fault_around.before.s:jne ffffffff812079df
do_fault_around.after.s:jmp ffffffff812079fd
do_fault_around.after.s:ja ffffffff812079e2
do_fault_around.after.s:jb ffffffff812079f9
do_fault_around.after.s:jmp ffffffff81207a4c
do_fault_around.after.s:jne ffffffff812079c8
And here's with allyesconfig on a different machine:
$ uname -a; gcc --version; ls -l vmlinux.*
Linux erwin 3.14.7-mn torvalds#54 SMP Sun Jun 15 11:25:08 CEST 2014 x86_64 AMD Phenom(tm) II X3 710 Processor AuthenticAMD GNU/Linux
gcc (GCC) 4.8.3
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-rwx------ 1 mpn eng 437027411 Jun 20 16:04 vmlinux.before
-rwx------ 1 mpn eng 437026881 Jun 20 15:30 vmlinux.after
530 bytes reduction.
Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Hagen Paul Pfeifer <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
Cc: Hagen Paul Pfeifer <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: "Rustad, Mark D" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
antonblanchard
pushed a commit
to antonblanchard/linux
that referenced
this pull request
Sep 23, 2014
Firmware-assisted dump (fadump) kernel code is not LE compliant. The below patch tries to fix this issue. Tested this patch with upstream kernel. Did some sanity testing for the LE fadump vmcore generated. Below output shows crash tool successfully opening LE fadump vmcore. # crash $vmlinux vmcore crash 7.0.5 Copyright (C) 2002-2014 Red Hat, Inc. Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation Copyright (C) 1999-2006 Hewlett-Packard Co Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited Copyright (C) 2006, 2007 VA Linux Systems Japan K.K. Copyright (C) 2005, 2011 NEC Corporation Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc. Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc. This program is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Enter "help copying" to see the conditions. This program has absolutely no warranty. Enter "help warranty" for details. crash: /boot/vmlinux-3.16.0-rc7-7-default+: no .gnu_debuglink section GNU gdb (GDB) 7.6 Copyright (C) 2013 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "powerpc64le-unknown-linux-gnu"... KERNEL: /boot/vmlinux-3.16.0-rc7-7-default+ DUMPFILE: vmcore CPUS: 16 DATE: Sun Aug 24 14:31:28 2014 UPTIME: 00:02:57 LOAD AVERAGE: 0.05, 0.08, 0.04 TASKS: 256 NODENAME: linux-dhr2 RELEASE: 3.16.0-rc7-7-default+ VERSION: torvalds#54 SMP Mon Aug 18 14:08:23 EDT 2014 MACHINE: ppc64le (4116 Mhz) MEMORY: 40 GB PANIC: "Oops: Kernel access of bad area, sig: 11 [#1]" (check log for details) PID: 2234 COMMAND: "bash" TASK: c0000009652e4a30 [THREAD_INFO: c00000096777c000] CPU: 2 STATE: TASK_RUNNING (PANIC) crash> Signed-off-by: Hari Bathini <[email protected]> Reviewed-by: Mahesh Salgaonkar <[email protected]>
koct9i
pushed a commit
to koct9i/linux
that referenced
this pull request
Sep 23, 2014
It appears that gcc is better at optimising a double call to min and max
rather than open coded min3 and max3. This can be observed here:
$ cat min-max.c
#define min(x, y) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
(void) (&_min1 == &_min2); \
_min1 < _min2 ? _min1 : _min2; })
#define min3(x, y, z) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
typeof(z) _min3 = (z); \
(void) (&_min1 == &_min2); \
(void) (&_min1 == &_min3); \
_min1 < _min2 ? (_min1 < _min3 ? _min1 : _min3) : \
(_min2 < _min3 ? _min2 : _min3); })
int fmin3(int x, int y, int z) { return min3(x, y, z); }
int fmin2(int x, int y, int z) { return min(min(x, y), z); }
$ gcc -O2 -o min-max.s -S min-max.c; cat min-max.s
.file "min-max.c"
.text
.p2align 4,,15
.globl fmin3
.type fmin3, @function
fmin3:
.LFB0:
.cfi_startproc
cmpl %esi, %edi
jl .L5
cmpl %esi, %edx
movl %esi, %eax
cmovle %edx, %eax
ret
.p2align 4,,10
.p2align 3
.L5:
cmpl %edi, %edx
movl %edi, %eax
cmovle %edx, %eax
ret
.cfi_endproc
.LFE0:
.size fmin3, .-fmin3
.p2align 4,,15
.globl fmin2
.type fmin2, @function
fmin2:
.LFB1:
.cfi_startproc
cmpl %edi, %esi
movl %edx, %eax
cmovle %esi, %edi
cmpl %edx, %edi
cmovle %edi, %eax
ret
.cfi_endproc
.LFE1:
.size fmin2, .-fmin2
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
fmin3 function, which uses open-coded min3 macro, is compiled into total
of ten instructions including a conditional branch, whereas fmin2
function, which uses two calls to min2 macro, is compiled into six
instructions with no branches.
Similarly, open-coded clamp produces the same code as clamp using min and
max macros, but the latter is much shorter:
$ cat clamp.c
#define clamp(val, min, max) ({ \
typeof(val) __val = (val); \
typeof(min) __min = (min); \
typeof(max) __max = (max); \
(void) (&__val == &__min); \
(void) (&__val == &__max); \
__val = __val < __min ? __min: __val; \
__val > __max ? __max: __val; })
#define min(x, y) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
(void) (&_min1 == &_min2); \
_min1 < _min2 ? _min1 : _min2; })
#define max(x, y) ({ \
typeof(x) _max1 = (x); \
typeof(y) _max2 = (y); \
(void) (&_max1 == &_max2); \
_max1 > _max2 ? _max1 : _max2; })
int fclamp(int v, int min, int max) { return clamp(v, min, max); }
int fclampmm(int v, int min, int max) { return min(max(v, min), max); }
$ gcc -O2 -o clamp.s -S clamp.c; cat clamp.s
.file "clamp.c"
.text
.p2align 4,,15
.globl fclamp
.type fclamp, @function
fclamp:
.LFB0:
.cfi_startproc
cmpl %edi, %esi
movl %edx, %eax
cmovge %esi, %edi
cmpl %edx, %edi
cmovle %edi, %eax
ret
.cfi_endproc
.LFE0:
.size fclamp, .-fclamp
.p2align 4,,15
.globl fclampmm
.type fclampmm, @function
fclampmm:
.LFB1:
.cfi_startproc
cmpl %edi, %esi
cmovge %esi, %edi
cmpl %edi, %edx
movl %edi, %eax
cmovle %edx, %eax
ret
.cfi_endproc
.LFE1:
.size fclampmm, .-fclampmm
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
Linux mpn-glaptop 3.13.0-29-generic #53~precise1-Ubuntu SMP Wed Jun 4 22:06:25 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-rwx------ 1 mpn eng 51224656 Jun 17 14:15 vmlinux.before
-rwx------ 1 mpn eng 51224608 Jun 17 13:57 vmlinux.after
48 bytes reduction. The do_fault_around was a few instruction shorter
and as far as I can tell saved 12 bytes on the stack, i.e.:
$ grep -e rsp -e pop -e push do_fault_around.*
do_fault_around.before.s:push %rbp
do_fault_around.before.s:mov %rsp,%rbp
do_fault_around.before.s:push %r13
do_fault_around.before.s:push %r12
do_fault_around.before.s:push %rbx
do_fault_around.before.s:sub $0x38,%rsp
do_fault_around.before.s:add $0x38,%rsp
do_fault_around.before.s:pop %rbx
do_fault_around.before.s:pop %r12
do_fault_around.before.s:pop %r13
do_fault_around.before.s:pop %rbp
do_fault_around.after.s:push %rbp
do_fault_around.after.s:mov %rsp,%rbp
do_fault_around.after.s:push %r12
do_fault_around.after.s:push %rbx
do_fault_around.after.s:sub $0x30,%rsp
do_fault_around.after.s:add $0x30,%rsp
do_fault_around.after.s:pop %rbx
do_fault_around.after.s:pop %r12
do_fault_around.after.s:pop %rbp
or here side-by-side:
Before After
push %rbp push %rbp
mov %rsp,%rbp mov %rsp,%rbp
push %r13
push %r12 push %r12
push %rbx push %rbx
sub $0x38,%rsp sub $0x30,%rsp
add $0x38,%rsp add $0x30,%rsp
pop %rbx pop %rbx
pop %r12 pop %r12
pop %r13
pop %rbp pop %rbp
There are also fewer branches:
$ grep ^j do_fault_around.*
do_fault_around.before.s:jae ffffffff812079b7
do_fault_around.before.s:jmp ffffffff812079c5
do_fault_around.before.s:jmp ffffffff81207a14
do_fault_around.before.s:ja ffffffff812079f9
do_fault_around.before.s:jb ffffffff81207a10
do_fault_around.before.s:jmp ffffffff81207a63
do_fault_around.before.s:jne ffffffff812079df
do_fault_around.after.s:jmp ffffffff812079fd
do_fault_around.after.s:ja ffffffff812079e2
do_fault_around.after.s:jb ffffffff812079f9
do_fault_around.after.s:jmp ffffffff81207a4c
do_fault_around.after.s:jne ffffffff812079c8
And here's with allyesconfig on a different machine:
$ uname -a; gcc --version; ls -l vmlinux.*
Linux erwin 3.14.7-mn torvalds#54 SMP Sun Jun 15 11:25:08 CEST 2014 x86_64 AMD Phenom(tm) II X3 710 Processor AuthenticAMD GNU/Linux
gcc (GCC) 4.8.3
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-rwx------ 1 mpn eng 437027411 Jun 20 16:04 vmlinux.before
-rwx------ 1 mpn eng 437026881 Jun 20 15:30 vmlinux.after
530 bytes reduction.
Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Hagen Paul Pfeifer <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
Cc: Hagen Paul Pfeifer <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: "Rustad, Mark D" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
patch found at 'http://www.spinics.net/lists/linux-wireless/msg96128.html'