How to investigate performance issues in software?

posted Sep 29, 2018, 1:18 PM by Jageshwar Tripathi   [ updated Dec 1, 2018, 10:06 PM ]
Performance is one of the most important quality attributes of a software system. Performance issues are often reported or detected late, and at that point finding the root cause is critical. Without a clear approach, project teams tend to look in random places and apply random fixes that don't work.

A few common types of reported performance issues

First of all, it is important to understand the difference between the following, all of which are often loosely called "performance issues":

Response time

The application takes so long to respond that users cannot perform their tasks efficiently. If the requirement specification does not define what a good response time is, arguments follow. Expecting an arbitrary response time is not reasonable.

Some companies adopt limits for their internal web applications such as 2 seconds for the login page, 9 seconds for other normal pages, and a maximum of 2 seconds for database queries. This is not a worldwide standard, and it cannot be one: one application may be self-contained and independent, while another may depend on systems beyond its boundary to fulfil its responsibilities.

Before blaming an application for poor performance, consider factors such as network latency and the response time of dependent systems or the database.

Reasons behind poor response time

  • Network
  • Hardware
  • Software (wrong architecture, design, code or algorithm)

Scalability

This is the case when an application works fine under normal user load but responds more slowly, stops responding or crashes when the user load increases.

Teams sometimes try to solve this by adding resources such as memory, or by adding more nodes (scaling horizontally), but this may not solve the problem permanently if something is wrong in the architecture, design or code.

Reasons behind scalability issues

  • Hardware
  • Bad architecture & design 
  • Code

Availability

The system crashes at a certain point in time or after running for some time. Sometimes this is purely a robustness issue that users report as a performance issue; at other times it is genuinely caused by performance problems.

Reasons behind availability issues

  • Memory leaks in code
  • Poor robustness of the application due to issues in code
  • Hardware
  • Network

How to investigate performance issues in a software application?

Before starting the investigation, a few questions need to be clearly asked and answered, because the performance issue reported by an end user often gets diluted by the time it reaches an architect or developer.
A few more questions need to be answered as part of the investigation itself.

Questions to answer

  • How many users can work on the application comfortably?
  • What is the maximum, minimum and average user load? How many users are concurrent, and how many in total per day?
  • Which use cases/scenarios have a performance problem?
  • What exactly is the performance problem? Examples: long initial loading, slow response on subsequent requests, or the server not being able to cater to a high volume of users.
  • What are the numbers (e.g. it takes 5 minutes to load, subsequent requests take 3 minutes, the server crashes when 25 concurrent users are reached)?
  • What are the server statistics right now? What is the total memory and the memory used during peak load? What is the processor utilization during peak load?
  • What performance is the application designed for?

Most performance issues lie in the database and in network round trips. The round trip may be from the application layer to the database, or from the application to other systems beyond its boundary. Hence any investigation should start with one of these areas (assuming capacity and latency have already been investigated and found sufficient).

Investigation for response time

Step 1: Database investigation

What to do? 

  • Database profiling
  • Static review of the DB code (automated + architect)

How to do?

  • Static analysis of the DB code
  • Database architecture review, e.g. DB queries/PL/SQL, indexes, configuration
  • Turn on DB profiling/tracing, using the tools available with MSSQL or Oracle

Example checks

  • Time taken by queries to return results in single-user and concurrent-user scenarios.
  • Whether the right types of indexes exist and are actually used.
  • Queries do not result in full table scans, and do not produce Cartesian products where none is intended.
  • Indexes are not created on table columns that are updated frequently. The DB has to maintain such an index on every update, which impacts its performance.
  • SELECT * is not used without need. (It should be avoided as much as possible.)
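The full table scan check can be automated by reading the query plan. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the real database; the `orders` table, the index name and the helper functions are made up for illustration, and MSSQL or Oracle expose their plans through their own tools and syntax.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]

def uses_full_scan(plan):
    """True if any plan step scans a whole table without using an index."""
    return any(step.startswith("SCAN") and "INDEX" not in step for step in plan)

# Hypothetical table, filled with sample rows
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

sql = "SELECT id, total FROM orders WHERE customer_id = ?"

# Without an index on customer_id the plan is a full table scan
plan_before = query_plan(conn, sql, (42,))

# After adding the index the same query becomes an index search
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before, "full scan:", uses_full_scan(plan_before))
print("after: ", plan_after, "full scan:", uses_full_scan(plan_after))
```

Running the same check over the queries of the slow scenarios gives a quick list of candidates for index review.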

Example recommendations

  • Use indexes on columns that queries filter or join on
  • Use joins with proper join conditions instead of simply combining two queries in a way that produces a Cartesian product
  • Avoid having too many indexes, especially unused ones
  • Avoid SELECT *

Step 2: Web/App Server side investigation

What to do?

  • Static review of the architecture, design and code
  • Performance testing per scenario
  • Memory profiling
  • Statistics of the system (processor utilization, memory utilization etc.)

How to do?

  • Static Analysis Tools
  • Profiler
  • Architect review

Example checks

  • Remote calls made in loops
  • Common data being fetched from the database server every time, or kept in each user's session; both are bad.
  • Lots of data kept in session scope
  • Whether pagination is used properly
  • Chatty remote interfaces accessed from client-side code (e.g. an Angular layer calling remote REST services that are just wrappers around fine-grained objects, needing several round trips for a single operation)
  • Whether fine-grained database access from server-side code is kept to a minimum
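The "remote calls in loops" check can be made concrete by counting round trips. This is a minimal sketch only: the in-memory SQLite database stands in for a remote database or service, and `CountingConnection`, the `users` table and both functions are hypothetical names.

```python
import sqlite3

class CountingConnection:
    """Thin wrapper that counts how many statements are sent to the database."""

    def __init__(self, conn):
        self.conn = conn
        self.round_trips = 0

    def query(self, sql, params=()):
        self.round_trips += 1
        return self.conn.execute(sql, params).fetchall()

def names_one_by_one(db, ids):
    # Anti-pattern: one call per id inside a loop
    return [db.query("SELECT name FROM users WHERE id = ?", (i,))[0][0] for i in ids]

def names_batched(db, ids):
    # One coarse-grained call that fetches all requested rows at once
    placeholders = ",".join("?" for _ in ids)
    rows = db.query(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})", tuple(ids)
    )
    by_id = dict(rows)
    return [by_id[i] for i in ids]

raw = sqlite3.connect(":memory:")
raw.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
raw.executemany("INSERT INTO users (name) VALUES (?)", [(f"user-{i}",) for i in range(1, 51)])
ids = list(range(1, 51))

chatty = CountingConnection(raw)
result_chatty = names_one_by_one(chatty, ids)

batched = CountingConnection(raw)
result_batched = names_batched(batched, ids)

print("loop:", chatty.round_trips, "round trips; batch:", batched.round_trips)
```

Both functions return the same data, but when every round trip pays network latency the looped version gets slower in direct proportion to the number of ids.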

Example recommendations

  • For coarse-grained access from the client, the server side can provide synthetic coarse-grained objects that wrap fine-grained objects
  • Avoid remote method invocations as much as possible through software architecture changes
  • Make use of batch queries, prepared statements etc.
  • Use caching for data that is common across the application
  • Avoid heavy session state
  • Use pagination (but don't load all the data in one go in order to paginate it, and don't keep it in the session)
  • Co-locate the app and DB services, with multiple nodes of such servers, only if no other choice is left. Note that in a co-located setup, horizontal scalability (multiple nodes of co-located servers) is not possible.
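Two of the recommendations above, database-side pagination and caching of common data, can be sketched as follows. This is an illustration under assumptions rather than a reference implementation: the `products` and `countries` tables are invented, SQLite stands in for the real database, and `functools.lru_cache` stands in for whatever application-level cache the platform offers.

```python
import sqlite3
from functools import lru_cache

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO products (name) VALUES (?)",
    [(f"product-{i}",) for i in range(1, 96)],  # 95 sample rows
)
conn.execute("CREATE TABLE countries (code TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO countries VALUES (?, ?)", [("IN", "India"), ("DE", "Germany")]
)

def fetch_page(page, page_size=20):
    """Fetch only the rows of the requested page (1-based) from the database."""
    offset = (page - 1) * page_size
    return conn.execute(
        "SELECT id, name FROM products ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()

@lru_cache(maxsize=1)
def country_names():
    """Common reference data: loaded once for the application, not per session."""
    return dict(conn.execute("SELECT code, name FROM countries").fetchall())

first_page = fetch_page(1)
last_page = fetch_page(5)
print(len(first_page), "rows on page 1;", len(last_page), "rows on page 5")
print(country_names()["IN"], country_names.cache_info())
```

Only one page of rows travels from the database per request, and nothing has to be stored in the user session. The cached dictionary is shared, so callers should treat it as read-only.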

Step 3: Client side investigation

What to do?

  • Static review of the code 
  • Network communication 

How to do?

  • Static Analysis Tools
  • Browser console/tools
  • Architect review

Example checks

  • Check if client side code is making multiple calls to remote services for a single operation.
  • In Angular: watchers, $watch() and ng-repeat are expensive; check that they are not overused

Example recommendations 

  • Use Java Web Start in place of applets to avoid loading the applets every time, if that is the case (though even converting applets to a standalone client is a tricky task). These are age-old scenarios and hard to find in today's world.
  • To avoid network round trips, fine-grained access to the server side can be converted into coarse-grained access (if there is any such opportunity; in most cases this has already been taken care of).
  • Use a thin client with RIA frameworks.
  • If you are using a UI/UX framework, refer to the best practices for that framework; for example, in Angular prefer $watchCollection() over $watch().

Investigation for scalability

What to do?

  • Load Testing
  • After the load test, an architect review and even static analysis may be required to find the root cause of the scalability limitations found by load testing.

How to do?

  • LoadRunner or similar
  • JMeter (open source)
  • OpenSTA
  • Manual test scenarios for a small user load (when the purpose is only an initial investigation)
  • Architect review
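For a first impression before setting up a full tool such as JMeter, a small scripted ramp can show how response time changes with the number of concurrent users. The sketch below uses only the Python standard library; `fake_scenario` is a hypothetical placeholder that would be replaced by a real request to the scenario under test, and the user levels are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def timed_call(operation):
    """Run the operation once and return its duration in seconds."""
    start = time.perf_counter()
    operation()
    return time.perf_counter() - start

def run_load(operation, concurrent_users, requests_per_user=5):
    """Run `operation` from several threads and return all response times."""
    def user_session(_user):
        return [timed_call(operation) for _ in range(requests_per_user)]

    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        sessions = list(pool.map(user_session, range(concurrent_users)))
    return [t for session in sessions for t in session]

def ramp(operation, user_levels=(1, 5, 10)):
    """Return {users: (average, max)} response times for each load level."""
    report = {}
    for users in user_levels:
        times = run_load(operation, users)
        report[users] = (sum(times) / len(times), max(times))
    return report

def fake_scenario():
    # Hypothetical stand-in for a real request, e.g. an HTTP call to one page
    time.sleep(0.01)

report = ramp(fake_scenario)
for users, (average, worst) in report.items():
    print(f"{users:>3} users: avg {average * 1000:.1f} ms, max {worst * 1000:.1f} ms")
```

If the average or maximum climbs sharply from one level to the next, that load level is a good starting point for the architect review.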

Example recommendations

  • Architecture changes
  • Adding more nodes (clustering and load balancing etc.)
  • Adding more resources, and moving to 64-bit if a 32-bit architecture is the limitation that prevents adding more resources

Fact collection

Collecting facts is important for an architect to find the solution to a performance problem and provide recommendations. Here are some sample fact collection templates:

Define performance objectives

Please describe the performance goals or objectives in measurable units.
Normally these should be part of the non-functional requirements in the specification. They represent the needs of the application owner/users.

Expected Response Time

 S.No.  Scenario/page   Response time in seconds
 1    
 2    

Throughput

 S.No.  Req/second  Transaction/second  bytes/second
 1      
 2      

Resource utilization

 S.No.  % CPU Utilisation  % Memory Utilisation  Network IO
 1      
 2      

Workload

 S.No.  Data volume  Transaction volume  Types of transactions
 1      
 2      

Key scenarios

 S.No.  Scenario  Description
 1    
 2    

Gather inputs from various teams

Application development/management team

S.No. Questions Answer
1 What is the performance problem?
2 Is it initial loading of the page or application which is slow?
3 Is every request to a page or scenario too slow? Which scenarios?
4 Is server crashing/going down/unavailable often?
5 If the answer to Q4 is yes, at how many users does the server crash?
6 If the answer to Q4 is yes, does it happen when a specific page/scenario is accessed?
7 Is caching used in DB access layer of the application?
8 Is caching used in application layer?
9 Is application using an ORM Tool?
10 Is pagination implemented in application?
11 If pagination is implemented, does it get the page data from the database or from the session every time?
12 Is remoting (RMI, Remote EJB, .NET Remoting) used in the application?
13 Is the application server co-located with the database server on the same machine/subnetwork?
14 Is load balancing used in the system? Hardware or software load balancing?
15 Is table partitioning used?
16 Are there long running transactions? Payment gateway or accessing many systems?
17 Are there federated transactions?

IT operations team

Inputs related to user load
 S.No.  Load type  Concurrent  Per day (any time in a day)  Dominating scenario/use case
 1  Maximum number of users
 2  Minimum number of users
 3  Average number of users (most common)
 4  Maximum concurrent user load beyond which performance problems appear

Inputs related to data volume

Inputs related to various systems

 S.No.  Web/App/DB Server  Software/Application deployed  Application technology  Physical Memory  Processor  Type of Hard Disk  Memory Utilisation (Average)  Memory Utilisation (Peak/Max)  Processor Utilisation (Average)  Processor Utilisation (Peak)
 1
 2

Business users

Information about poor performing scenarios
 S.No.  Use Case/Scenario  Problem  Time to complete scenario (Max)  Time to complete scenario (Most Common)  Time to complete scenario (Min)
 1
 2

Testing team

Results of performance testing of various scenarios/screens/transactions
 S.No.  Page/Screen/Transaction  Response time in seconds (Max)  Response time in seconds (Average)  Response time in seconds (Min)
 1
 2
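The Max/Average/Min columns of this template can be filled from raw measurements with a few lines of code. The following sketch assumes the testing team records (scenario, seconds) samples; the sample values are invented, and the budgets reuse the example limits quoted earlier in this article (2 seconds for login, 9 seconds for other pages).

```python
from statistics import mean

def summarise(samples):
    """Group (scenario, seconds) samples into {scenario: (max, average, min)}."""
    grouped = {}
    for scenario, seconds in samples:
        grouped.setdefault(scenario, []).append(seconds)
    return {name: (max(v), mean(v), min(v)) for name, v in grouped.items()}

def over_budget(summary, budgets):
    """Return the scenarios whose maximum response time exceeds the agreed target."""
    return sorted(
        name
        for name, (worst, _average, _best) in summary.items()
        if name in budgets and worst > budgets[name]
    )

# Invented measurements, in seconds
samples = [("login", 1.2), ("login", 2.6), ("search", 4.0), ("search", 8.0)]
budgets = {"login": 2.0, "search": 9.0}

summary = summarise(samples)
for name, (worst, average, best) in summary.items():
    print(f"{name}: max {worst:.1f}s, average {average:.1f}s, min {best:.1f}s")
print("over budget:", over_budget(summary, budgets))
```

Comparing against the maximum rather than the average is a deliberate choice in this sketch; a team may prefer the average or a percentile, depending on how the target is worded in the requirements.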

Database management team (DBAs)


 S.No.  Use case  Query/Procedure  Execution time in seconds (Max)  Execution time in seconds (Average)  Execution time in seconds (Min)
 1
 2

Interesting resources

  1. https://ieeexplore.ieee.org/document/5752531
  2. https://cdn.oreillystatic.com/en/assets/1/event/134/Forensic%20tools%20for%20in-depth%20performance%20investigation%20Presentation.pdf
  3. https://www.datadoghq.com/blog/monitoring-101-investigation/
  4. https://support.solarwinds.com/Success_Center/Server_Application_Monitor_(SAM)/SAM_Documentation/Server_Application_Monitor_Getting_Started_Guide/040_Monitor/Investigate_application_performance_with_Performance_Analysis
  5. https://techbeacon.com/perfguild-5-insights-your-performance-testing-team
  6. https://www.comparitech.com/net-admin/application-performance-management/