OK lets move onto the where to measure the data. The answer is every where and any where it is produced. But specifically, lets look at unit and module tests. Unit tests are generally run very regularly, many times a day (especially for XP developments). One may use unit tests as profiling and run the occasional test looking for which unit tests take the longest. But there is more information to be gleaned from them... if you care to log it .. imagine a database being populated in the following manner
- Test suite resources utilisation: Contiguously (well every 5 minutes) on the test suite, the CPU, memory, network, disk utilisation of all the systems was also logged and the results automatically loaded into a database
- Test results: If when every unit tests were run, the result was automatically uploaded to a database with, date, time, package, build no, time taken to run each test and number of times or loops in the test
- Test suite configuration: On a weekly basis the test suite uploaded it's configuration, CPU type & speed, total memory, OS and version, patches, java/c/c++/c# version, number of disks and so on
- Firstly we can track how each test is performing over time, from it's first run 'till now
- Secondly we can spot any step changes (either up or down) in the way a single test is performing
- Lastly, we could pick some specific tests and use these are our KPI, a standard unit of performance, that future versions of the code can be compared to. This could be simple or a string of system states
- Correlate software measurements such as throughput, latency with resource utilisation, CPU, disk IO etc.
- look at resource utilisation over time.
So how would all this work on a day-to-day basis.
Bedding in: After an initial period of development of the infrastructure and tools, data would be collected. At first it will seem complicated, and far too much detail.
gaining acceptance: Then after a while, the general data patterns will be recognisable, and some general rules of thumb may develop (e.g. each unit test adds about 1-2ms to total test time, each unit test generates 20 IO, 15 read, 5 write and so on).
acceptance: Then deeper patterns will start to emerge. What will also become apparent is when things start to differ. If a new unit test (note new, not existing) takes more than 5ms, look carefully. Or what about a unit test that used to take 1ms to run is now consistently taking 3 ms to run. Once these features are being spotted, the performance of the system is far more quantitative, and far less subjective.
predictive: When the results are being used as engineering data. Performance of module x (as determined by one or more unit test results) is degrading. It will become a bottleneck. redesign activity required. Or even, a better disk subsystem would improve the performance of the tests by 2/3 and mean the unit tests run 1 minute faster.
historical: imagine being able to give quantitative answers to questions like "Architecture have decided to re-arrange the system layout like so (insert diagram here), what will be the effect of this layout compared to the current?". Unusually it would require re-coding the system under the various architectures and performing tests, but all that is now required is to understand the average effects of each of the components, and a good engineering estimate can be arrived at. Taking this a step further imagine "The system's performance indicates that the following architecture would be best like so (insert diagram here), and we/I believe that it would allow 20% more throughput".
Taking stock... Once upon a time tests were done infrequently, but it was realised that testing is the only sure fire way of ensuring features remain stable and bug free. The reduction of the time from bug introduction to bug detection is crucial. Why stop at bugs? Why not use the same approach to performance? When is it easiest to change the software architecture? at the beginning, when is it hardest, at the end? Why change software architecture, there are many reasons, but performance is one.
Lastly the quantification of the software performance is a key feature, because this is the only way to truly make engineering decisions. Developing software is full of contradictions and pro's and cons, it is only be quantifying them that you can confidently take the decisions .