| This is a patched version of zlib, modified to use |
| Pentium-Pro-optimized assembly code in the deflation algorithm. The |
| files changed/added by this patch are: |
| |
| README.686 |
| match.S |
| |
| The speedup that this patch provides varies, depending on whether the |
| compiler used to build the original version of zlib falls afoul of the |
| PPro's speed traps. My own tests show a speedup of around 10-20% at |
| the default compression level, and 20-30% using -9, against a version |
| compiled using gcc 2.7.2.3. Your mileage may vary. |
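
The figures above can be checked on your own hardware with a small timing harness. The following is only a sketch, not the method used for the original measurements. It assumes a Python interpreter whose zlib module is dynamically linked against the libz build under test; the helper name time_deflate and the sample data are hypothetical. Run it once against the stock library and once against the patched one, then compare the timings.

```python
import time
import zlib


def time_deflate(data, level, repeats=5):
    """Return (best_seconds, compressed_size) for zlib.compress at `level`."""
    best = float("inf")
    size = 0
    for _ in range(repeats):
        start = time.perf_counter()
        out = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)  # keep the fastest run to reduce noise
        size = len(out)
    return best, size


if __name__ == "__main__":
    # Hypothetical sample input; substitute a file typical of your workload.
    sample = b"".join(
        b"line %d of some moderately repetitive text\n" % i
        for i in range(200000)
    )
    # Shows which libz the interpreter actually loaded at run time.
    print("zlib runtime version:", zlib.ZLIB_RUNTIME_VERSION)
    for level in (6, 9):  # 6 is zlib's default level; 9 corresponds to -9
        seconds, size = time_deflate(sample, level)
        print("level %d: %.4f s, %d -> %d bytes"
              % (level, seconds, len(sample), size))
```

Taking the best of several runs, rather than the mean, is a deliberate choice here: it filters out scheduling noise, which matters when the expected difference is only in the 10-30% range.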
| |
| Note that this code has been tailored for the PPro/PII in particular, |
| and will not perform particularly well on a Pentium. |
| |
| If you are using an assembler other than GNU as, you will have to |
| translate match.S to use your assembler's syntax. (Have fun.) |
| |
| Brian Raiter |
| breadbox@muppetlabs.com |
| April, 1998 |
| |
| |
| Added for zlib 1.1.3: |
| |
| The patches come from |
| http://www.muppetlabs.com/~breadbox/software/assembly.html |
| |
| To compile zlib with this asm file, copy match.S to the zlib directory, |
| then run: |
| |
|     CFLAGS="-O3 -DASMV" ./configure |
|     make OBJA=match.o |
| |
| |
| Update: |
| |
| I've been ignoring these assembly routines for years, believing that |
| gcc's generated code had caught up with them sometime around gcc 2.95 |
| and the major rearchitecting of the Pentium 4. However, I recently |
| learned that, despite what I believed, this code still has some life |
| in it. On the Pentium 4 and AMD64 chips, it continues to run about 8% |
| faster than the code produced by gcc 4.1. |
| |
| In acknowledgement of its continuing usefulness, I've altered the |
| license to match that of the rest of zlib. Share and Enjoy! |
| |
| Brian Raiter |
| breadbox@muppetlabs.com |
| April, 2007 |