quick benchmark comparison of gcc compilation flags
Posted: 2012-03-09T19:07:52-07:00
Here is a quick and dirty comparison of various ImageMagick compile
options, with some very shaky conclusions (only one task was used for
the comparison).
-------
Summary
-------
On a 6 month old mid-line consumer laptop, bleeding edge ImageMagick
takes well under 2 seconds to produce a 40x40 thumbnail from a
4000x3000 jpeg (using -distort resize, which is not as fast as plain
-resize, but which generally gives better looking results).
------
Caveat
------
I have not tested older versions of ImageMagick. The tests were not
done on a perfectly quiet system. My test machine only has 2 cores, so
I cannot really check "multi-core parallel scaling".
--------
Hardware
--------
ThinkPad T520i
Intel Core i3-2310M CPU @ 2.10GHz
4GiB SODIMM DDR3 Synchronous 1333 MHz (0.8 ns)
ATA Disk HITACHI HTS72323 buffered disk reads: 91.46 MB/sec
L1 cache: 64KiB
L2 cache: 256KiB
L3 cache: 3MiB
--------
Software
--------
Linux Mint 12 (= Ubuntu 11.10 = Debian Unstable ???) with all updates.
Jupiter power management set to maximum performance.
All relevant software straight from the package manager except ImageMagick.
gcc version: 4.6.1
-----------
ImageMagick
-----------
Version 7.0.0, the most current svn (bleeding edge), compiled from source.
----
Task
----
Produce a 40x40 thumbnail directly from big.jpg, a 4000x3000 sRGB jpeg with
file size 1.3MB.
Command:
time ( convert big.jpg -filter Robidoux -distort resize 40x40^ -gravity center -extent 40x40 -quality 75 -sampling-factor 2x2 thumb.jpg )
(Some of the flags are not needed because they are the default. I like being explicit.)
Real time is reported below.
Comment: Using -crop instead of -extent adds something like 1% to the
run time. It may make more of a difference with larger images. -extent
is recommended over -crop here:
http://www.imagemagick.org/Usage/resize/#fill
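
Since the tests were not done on a perfectly quiet system, one way to steady the numbers is to repeat the command a few times and keep the median. What follows is only a sketch, not the procedure used for the figures in this post: the helper names time_once, median and bench are made up for illustration, and it assumes GNU date (for %N nanoseconds) plus a POSIX shell with awk and sort.

```shell
# Elapsed wall-clock seconds for one run of the given command.
# Assumption: GNU date, which supports %N (nanoseconds).
time_once() {
    t_start=$(date +%s%N)
    "$@" >/dev/null 2>&1
    t_end=$(date +%s%N)
    awk -v ns="$((t_end - t_start))" 'BEGIN { printf "%.3f\n", ns / 1e9 }'
}

# Median of the numbers read from stdin, one per line.
median() {
    sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            if (NR % 2) print v[(NR + 1) / 2]
            else print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}

# bench N command [args...]: run the command N times, print the median time.
bench() {
    runs=$1
    shift
    i=0
    while [ "$i" -lt "$runs" ]; do
        time_once "$@"
        i=$((i + 1))
    done | median
}

# Example (not executed here):
# bench 5 convert big.jpg -filter Robidoux -distort resize 40x40^ \
#     -gravity center -extent 40x40 -quality 75 -sampling-factor 2x2 thumb.jpg
```

The median is less sensitive than the mean to a single run that was disturbed by background activity.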
-------------------------------------
Results with a 1.3MB JPEG (4000x3000)
-------------------------------------
1.764 seconds with plain vanilla config/compilation: OpenMP, Q16, no arch:
./configure
1.754 seconds with -march=native:
./configure --with-gcc-arch=native
or
CFLAGS="-march=native" ./configure
1.761 seconds with -march=native, debugging (-g) off, adding -fomit-frame-pointer:
./configure CFLAGS="-fopenmp -fomit-frame-pointer -O2 -Wall -march=native -pthread" CXXFLAGS="-O2 -pthread"
(P.S. With more careful tests, I found out that removing -g and adding -fomit-frame-pointer speeds things up by about 2%.)
2.931 seconds with OpenMP turned off:
./configure --disable-openmp
2.907 seconds with -march=native and OpenMP turned off:
./configure --with-gcc-arch=native --disable-openmp
(-fomit-frame-pointer is suggested here: http://www.gentoo.org/doc/en/gcc-optimization.xml)
(Warning: make clean before make if you want your new configure parameters to stick: viewtopic.php?f=3&t=20496.)
-----------------
Results with HDRI
-----------------
2.026 seconds with -march=native, debugging (-g) off, adding -fomit-frame-pointer:
./configure CFLAGS="-fopenmp -fomit-frame-pointer -O2 -Wall -march=native -pthread" CXXFLAGS="-O2 -pthread"
-----------------------------------
Results with a 2MB JPEG (4000x3000)
-----------------------------------
1.776 seconds with -march=native, debugging (-g) off, adding -fomit-frame-pointer:
./configure CFLAGS="-fopenmp -fomit-frame-pointer -O2 -Wall -march=native -pthread" CXXFLAGS="-O2 -pthread"
-------------
"Conclusions"
-------------
(Don't quote me on this.)
The input file size does not matter very much; what matters is mostly the size in pixels.
Fancy compile flags do not make much of a difference (about 2% at best).
HDRI does not cost much more (about 15% here), even with large images.
At least on a fairly quiet 2-core machine, OpenMP wins: turning it off adds roughly 66% to the run time.
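
To put rough numbers on these conclusions, the relative differences can be computed from the timings reported above. This is a minimal sketch: pct_change is a hypothetical helper name, and the inputs are the single-run times from this post, so the percentages are only as trustworthy as those runs.

```shell
# Percent change of a new time relative to a baseline time.
pct_change() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

# Timings (seconds) are taken from the results listed in this post.
echo "-march=native vs. vanilla:     $(pct_change 1.764 1.754)%"
echo "OpenMP off vs. vanilla:        $(pct_change 1.764 2.931)%"
echo "HDRI vs. Q16 (same flags):     $(pct_change 1.761 2.026)%"
echo "2MB JPEG vs. 1.3MB JPEG:       $(pct_change 1.761 1.776)%"
```

A positive value means the second configuration was slower than the first.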