My little Delphi Memory Replacement Test
2004-03-02
The Candidates
| Name | Distributable (1) | Version | Where to get? | Cost |
| --- | --- | --- | --- | --- |
| Pure Delphi | Yes | D6 | Built in | included |
| HPMM | ? | 2001-09-19 - 1.2 | formerly from www.optimalcode.com, which is down today | free |
| RecyclerMM | ? (No) | 2004-01-29 | http://glscene.sourceforge.net/index.php | free |
| BigBrain | ? (Yes) | 2004-02-16 | http://www.digitaltundra.com/ | $50 |
| NexusDB Memory Manager | Yes | 2004-01-19 | http://www.nexusdb.com/ | $70 |
| QMemory | ? | 2001-01-24 - 2.01 | via DSP | free |

(1) Read as: you may distribute your compiled commercial application with it royalty-free. "?" means that no license information could be found or that I didn't understand it.
The Test
A 'data compiler' that merges some huge data definition files and translates them from their own formats into one common format. To do so it creates a huge number of cache TLists and TStringLists, does some GetMem/FreeMem, creates TObjects from the interpreted data, and discards these lists and objects again. It is a single-threaded console application without the Forms unit and with less than 12.000 lines of code; most of the time it is busy with string manipulation, .Create and .Free. It creates 100% CPU load.
HPMM was used to collect some data about the 'data compiler' (of course not during the timed test runs). I simply added some counters to the GetMem, FreeMem and ReallocMem procedures.
Getmem-Calls: 47.915.159, Reallocmem-Calls: 6.348.579, Freemem-Calls: 47.915.060

| Blocksize | <16 | <32 | <64 | <128 | <512 | <1k | <10k | <100k | <500k | other |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Getmem | 28.317.517 | 10.947.741 | 8.400.639 | 248.320 | 904 | 2 | 30 | 5 | 0 | 1 |
| Reallocmem | 2.606.869 | 2.677.423 | 490.131 | 473.327 | 89.195 | 6.224 | 5.388 | 22 | 0 | 0 |

(Read <32 as: 17...31)

The "pure Delphi" version was run twice, once at the beginning and once at the end. This should give an idea of the accuracy of the time measurement. The average of both runs has been used as 100%.
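As a cross-check, the per-bucket counts can be summed and compared with the reported call totals. The following Python sketch does only that; the figures are copied from the numbers above, and all names in it are illustrative rather than part of the test program.

```python
# Call totals as reported by the instrumented HPMM run.
GETMEM_CALLS = 47_915_159
REALLOCMEM_CALLS = 6_348_579
FREEMEM_CALLS = 47_915_060

# Bucket labels and per-bucket counts, copied from the block-size table.
BUCKETS = ["<16", "<32", "<64", "<128", "<512", "<1k", "<10k", "<100k", "<500k", "other"]
GETMEM = [28_317_517, 10_947_741, 8_400_639, 248_320, 904, 2, 30, 5, 0, 1]
REALLOCMEM = [2_606_869, 2_677_423, 490_131, 473_327, 89_195, 6_224, 5_388, 22, 0, 0]


def share_up_to(counts, last_bucket):
    """Fraction of all calls that fall into the buckets up to and including last_bucket."""
    end = BUCKETS.index(last_bucket) + 1
    return sum(counts[:end]) / sum(counts)


if __name__ == "__main__":
    # Do the bucket counts add up to the reported totals?
    print(sum(GETMEM) == GETMEM_CALLS, sum(REALLOCMEM) == REALLOCMEM_CALLS)
    # Difference between the number of GetMem and FreeMem calls.
    print(GETMEM_CALLS - FREEMEM_CALLS)
    # Share of GetMem calls in the "<16" bucket and in the first three buckets.
    print(f"{share_up_to(GETMEM, '<16'):.1%}", f"{share_up_to(GETMEM, '<64'):.1%}")
```

With these figures both sums match the totals, the GetMem and FreeMem counts differ by 99 calls, and about 59% of all GetMem calls fall into the smallest bucket (over 99% into the first three). So the test mostly exercises the handling of small blocks.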
Peak memory usage is queried with GetProcessMemoryInfo; execution time is measured with GetTickCount.
I took each memory manager 'out of the box': no modifications, no compiler switch changes, nothing. I just added the unit at the first position of the 'uses' list, built, copied the exe into the test folder, put the next memory manager into the uses list, built, copied, and so on.
The Test Results
System1: an AMD Duron 1200 MHz, 100 MHz FSB, L1 64k, L2 64k, 512 MB/1

2004-02-28 Absolute numbers

| Name | ExeSize (bytes) | Execution Time (s) | Peak Memory Usage (bytes) |
| --- | --- | --- | --- |
| Pure Delphi | 198.144 | 2.107,671 / 2.096,295 | 149.680.128 / 149.680.128 |
| HPMM | 215.040 | 2.801,258 | 151.879.680 |
| RecyclerMM 2004-01-29 | 216.064 | 3.208,183 | 155.672.576 |
| BigBrain 2004-02-16 | 204.800 | 2.428,402 | 237.596.672 |
| NexusDB Memory Manager 2004-01-19 | 206.336 | 2.215,226 | 152.059.904 |
| QMemory | 202.752 | 2.720,803 | 152.743.936 |

2004-02-28 Relative numbers

| Name | ExeSize (extra bytes) | Execution Time | Peak Memory Usage (extra bytes) |
| --- | --- | --- | --- |
| Pure Delphi | =0 | 100,27% / 99,73% | =0 |
| HPMM | 16896 | 133,27% | 2.199.552 |
| RecyclerMM 2004-01-29 | 17920 | 152,63% | 5.992.448 |
| BigBrain 2004-02-16 | 6656 | 115,53% | 87.916.544 |
| NexusDB Memory Manager 2004-01-19 | 8192 | 105,39% | 2.379.776 |
| QMemory | 4608 | 129,44% | 3.063.808 |

System2: an AMD Barton 2500 MHz, 166 MHz FSB, L1 64k, L2 512k, 1 GB/2
2004-03-02 Absolute numbers

| Name | ExeSize (bytes) | Execution Time (s) | Peak Memory Usage (bytes) |
| --- | --- | --- | --- |
| Pure Delphi | 198.144 | 1.356,952 / 1.322,251 | 149.684.224 / 149.684.224 |
| HPMM | 215.040 | 1.782,473 | 151.883.776 |
| RecyclerMM 2004-01-29 | 216.064 | 2.048,465 | 155.676.672 |
| BigBrain 2004-02-16 | 204.800 | 1.575,846 | 237.600.768 |
| NexusDB Memory Manager 2004-01-19 | 206.336 | 1.372,413 | 152.055.808 |
| QMemory | 202.752 | 1.372,834 | 152.748.032 |

2004-03-02 Relative numbers

| Name | ExeSize (extra bytes) | Execution Time | Peak Memory Usage (extra bytes) |
| --- | --- | --- | --- |
| Pure Delphi | =0 | 101,30% / 98,70% | =0 |
| HPMM | 16896 | 133,27% | 2.199.552 |
| RecyclerMM 2004-01-29 | 17920 | 152,92% | 5.992.448 |
| BigBrain 2004-02-16 | 6656 | 117,64% | 87.916.544 |
| NexusDB Memory Manager 2004-01-19 | 8192 | 102,45% | 2.371.584 |
| QMemory | 4608 | 102,48% | 3.063.808 |
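The relative numbers follow directly from the absolute ones. The Python sketch below reproduces them for System1, assuming (as described in the test setup) that 100% is the average of the two pure Delphi runs, and that the ExeSize and Peak Memory Usage columns are plain differences to pure Delphi. The figures are copied from the System1 absolute numbers, written with `.` as the decimal separator; the names are illustrative only.

```python
# Figures copied from the System1 "Absolute numbers" (2004-02-28).
PURE_DELPHI_RUNS = (2107.671, 2096.295)  # execution time of both baseline runs
PURE_DELPHI_EXE = 198_144                # ExeSize of the pure Delphi build
PURE_DELPHI_PEAK = 149_680_128           # peak memory usage of the pure Delphi build

# name: (ExeSize, execution time, peak memory usage)
CANDIDATES = {
    "HPMM": (215_040, 2801.258, 151_879_680),
    "RecyclerMM": (216_064, 3208.183, 155_672_576),
    "BigBrain": (204_800, 2428.402, 237_596_672),
    "NexusDB Memory Manager": (206_336, 2215.226, 152_059_904),
    "QMemory": (202_752, 2720.803, 152_743_936),
}


def relative_numbers(exe_size, exec_time, peak_memory):
    """Return (extra exe size, time in % of the baseline, extra peak memory)."""
    baseline_time = sum(PURE_DELPHI_RUNS) / len(PURE_DELPHI_RUNS)
    return (
        exe_size - PURE_DELPHI_EXE,
        round(100.0 * exec_time / baseline_time, 2),
        peak_memory - PURE_DELPHI_PEAK,
    )


if __name__ == "__main__":
    for name, figures in CANDIDATES.items():
        extra_exe, percent, extra_mem = relative_numbers(*figures)
        print(f"{name:24} {extra_exe:>6} {percent:7.2f}% {extra_mem:>11}")
```

Taking the average of both baseline runs as 100% is also why the two pure Delphi runs show up as 100,27% and 99,73% instead of exactly 100%.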
Conclusion
I'm surprised.
All memory manager vendors advertise better speed and lower memory usage. Some vendors even explicitly state that they feel their MM is the fastest and that nothing can be faster, even in single-threaded environments.
All my tests show that every replacement memory manager is slower and uses more memory than the built-in one.
Again, I'm surprised.
After running the second test I'm even more surprised: while 2 MMs showed the results I assumed (almost no change), 2 MMs had a speed change of about 2%, which could be within the measuring accuracy. And one had a change of more than 25%.
Again, I'm even more surprised.
Addendum (2004-06-13):
One of the problems with multi-threaded applications: what percentage of the memory management is done in which thread? Are the threads really competing for memory requests, or are they idle most of the time (waiting for the next job)?
You might not see that much of a speed-up for 'idle mode' applications. And other benefits might be eaten up by slow implementations and higher overall memory usage (that is what this test shows).
So from a 'better' MM I expect at least the same performance and memory usage in single-threaded situations as the Borland MM has.
Explanations
The difference for QMemory is surprising. My only explanation is that it reflects the different processor platform, especially the much larger L2 cache. (Yes, I double-checked that all the executables are the right ones and ran the test multiple times.)
After publishing I got some feedback from vendors.
- One general remark was that the memory managers only show their speed in a multi-processor environment and in applications with multiple threads. My test application is single-threaded and was run in a single-processor environment.
- Another note was that I don't publish detailed data about my test program, together with a concern about 'eliminating all external influences'. I can only answer: all MMs were tested with the same test program and had to live with the same external influences (whatever these influences are; the system was doing nothing besides the test application, and running pure Delphi twice shows the accuracy). So the calculation time for the non-memory-manager-related part is the same for all tests. The difference is MM related.
The test application *is* memory stressing. What else should it be to test the quality of an MM?
Motivation
My 'data compilers' have a poor runtime. Some need about two to three days to process one file. So every speed-up, even a small percentage, is a benefit. So while running my small test 'data compilers' I had time to surf the net and look for such speed-ups.
Btw.: algorithm changes have of course brought the biggest benefit. The 'data compiler' started out with estimated runtimes of multiple weeks.
The 'data compiler' has been tested with MemCheck (Vincent Mahon) and with Range/Stack/Var checking.