When will it be done, and why so late? These are questions IT people know all too well. How do you find the answer? In this article, you will learn about traditional, proven estimation techniques (some are more than 40 years old!), and what the future holds for this topic.
Estimation techniques based on historical data
The approach based on past data is very accurate, but only under certain conditions. If you run recurring projects and tasks, this technique is worth using. If, over the last few projects, the team delivered a typical CRUD module in 2 days, it is likely that this turnaround time will be repeated in subsequent similar tasks.
To get started, you need reliable data. Preferably from more than one project. On top of that, the project you price must be similar to previous ones in terms of:
- The scope to be delivered,
- The complexity of the domain,
- The skills of the team.
If any of these parameters are fundamentally different, then historical data will not be helpful and can lead to sizable estimation errors.
You can perform estimation based on previous projects at different levels of detail. For example:
- You estimate that the new project is roughly half of the previous one. So you estimate its duration and budget at 50% of what it was previously.
- You analyze historical data in detail, which shows that it takes X amount of time to build one screen displaying data, one report is Y hours, and so on. With such data, all you have to do is estimate the number of similar elements in the new system and perform a simple multiplication. The result will be much more accurate than with the previous approach.
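The second, more detailed approach boils down to simple arithmetic. A minimal sketch in Python, where the element types and per-element hour values are invented for illustration:

```python
# Sketch of per-element estimation from historical data.
# All element types and hour values below are invented for the example.
historical_hours = {
    "data_screen": 6.0,   # average hours to build one data-display screen
    "report": 10.0,       # average hours to build one report
    "crud_module": 16.0,  # average hours for a typical CRUD module
}

# Counted elements of the new system
new_project = {"data_screen": 8, "report": 3, "crud_module": 5}

estimate = sum(historical_hours[kind] * count
               for kind, count in new_project.items())
print(f"Estimated effort: {estimate:.0f} hours")
```

The accuracy of the result depends entirely on how reliable the historical averages are and how similar the new elements really are.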
Techniques using algorithms
If there is a lack of historical data, an approach using proven algorithms may be the solution. And proven over the years, because one such technique - Function Points - was developed as early as 1979 by Allan Albrecht.
Estimation using function points
Determining the complexity of a project based on function points requires several stages of calculations.
First, the specialist counts the system elements in five categories: external inputs, external outputs, external inquiries, internal logical files and external interface files.
Then, for each element, they determine its complexity - low, medium or high - and assign a point value derived from the element's category and complexity (according to predefined tables).
The next step is to estimate the influence of fourteen general system characteristics, each rated from 0 (no influence) to 5 (strong influence). The sum of the points collected this way is substituted into the appropriate formulas to obtain the final score.
AFP, or Adjusted Function Points, is the final value of a project's complexity, taking into account both its expected technical implementation and project-wide factors such as ease of installation or intensity of system operations.
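The stages described above can be sketched in a few lines of Python. The weight table uses the standard published low/average/high values from the IFPUG counting rules; the element counts and the influence ratings are invented for illustration:

```python
# Sketch of an IFPUG-style function point calculation.
# Weights follow the standard low/average/high tables;
# the counts and influence ratings are example values.
WEIGHTS = {
    "external_input":     {"low": 3, "avg": 4,  "high": 6},
    "external_output":    {"low": 4, "avg": 5,  "high": 7},
    "external_inquiry":   {"low": 3, "avg": 4,  "high": 6},
    "internal_file":      {"low": 7, "avg": 10, "high": 15},
    "external_interface": {"low": 5, "avg": 7,  "high": 10},
}

# counts[category][complexity] -> number of elements of that kind
counts = {
    "external_input":     {"low": 5, "avg": 2, "high": 0},
    "external_output":    {"low": 3, "avg": 1, "high": 1},
    "external_inquiry":   {"low": 2, "avg": 0, "high": 0},
    "internal_file":      {"low": 1, "avg": 1, "high": 0},
    "external_interface": {"low": 1, "avg": 0, "high": 0},
}

# Unadjusted function points: count x weight, summed over everything
ufp = sum(WEIGHTS[cat][cx] * n
          for cat, by_cx in counts.items()
          for cx, n in by_cx.items())

# Fourteen general system characteristics, each rated 0 (no influence)
# to 5 (strong influence); a flat example rating of 3 is used here.
gsc_ratings = [3] * 14
vaf = 0.65 + 0.01 * sum(gsc_ratings)   # value adjustment factor
afp = ufp * vaf                        # adjusted function points
print(f"UFP = {ufp}, VAF = {vaf:.2f}, AFP = {afp:.2f}")
```

Note how the adjustment factor can move the result by up to 35% in either direction, which is exactly the role of the fourteen project-wide characteristics.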
For details of the function point calculation, see for example the study "Function Point Based Estimation of Effort and Cost in Agile Software Development."
The AFP value does not directly imply the hours of work or dollars needed to fund the project. It is necessary to determine the conversion rate, which is different for different technologies or work methodologies. Such a conversion rate can be created based on previous projects. You can also use publicly available values based on projects from different industries and companies.
For example, in the article "What Is the Cost of One IFPUG Method Function Point? - A Case Study" you will find that one function point corresponds to about 128 lines of code in C, but only 53 lines in Java.
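Applying such a conversion rate is straightforward once you have a function point count. A tiny sketch using the two ratios quoted from the case study (the function point count itself is an example value):

```python
# Lines-of-code ratios quoted from the cited case study:
# 1 FP is roughly 128 LOC in C and 53 LOC in Java.
LOC_PER_FP = {"C": 128, "Java": 53}

function_points = 80  # example count for a new system
for language, ratio in LOC_PER_FP.items():
    print(f"{language}: ~{function_points * ratio} lines of code")
```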
The calculation of function points is also described in detail in the ISO/IEC 20926:2009 standard prepared by the IFPUG organization.
Time and cost estimation using COCOMO II
The COCOMO model was developed in 1981, with its latest version, COCOMO II, based on data from 161 projects.
The standard entry point for estimation using COCOMO II is the expected number of source lines of code, but the function points (FPs) described above can also be used.
When estimating the number of lines of code, you should:
- Include only code related to the product being built. Supporting code, such as testing tools, is not counted.
- Count only code written by the team working on the project. Code generated by tools is not considered.
- Omit comments.
The next step in pricing according to COCOMO II is to define levels for factors that determine the scale of the project, such as:
- Experience in building similar solutions (Precedentedness).
- Flexibility, i.e. how rigidly the requirements and end result have been defined (Development Flexibility).
- How thoroughly the architecture has been analyzed and risks resolved (Architecture / Risk Resolution).
- Team Cohesion - how well the team is aligned and effectively managed (Team Cohesion).
- Maturity of the process (Process Maturity).
The next step is to determine the value of each of the seventeen factors affecting project costs. These relate to:
- Team (experience, analytical skills, programming skills, etc.).
- Product (complexity, expected reliability, documentation to be prepared, etc.).
- The environment (constraints on storage, required response times, etc.).
- The project (e.g., the use of appropriate tools).
By substituting these values into the appropriate formulas, you obtain the effort required (i.e. the number of hours, which translates into the cost of the project) and the expected duration.
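In outline, the published COCOMO II post-architecture equations look like this. The constants A, B, C, D are the standard calibration values; the size, the scale-factor sum and the effort-multiplier product below are invented example inputs, so treat the result as illustrative only:

```python
# Sketch of the COCOMO II post-architecture equations.
# A, B, C, D are the published calibration constants;
# the size and factor ratings below are example inputs.
A, B = 2.94, 0.91   # effort equation calibration
C, D = 3.67, 0.28   # schedule equation calibration

ksloc = 50.0        # estimated size: 50,000 source lines of code

# Sum of the five scale factors (Precedentedness, Development Flexibility,
# Architecture/Risk Resolution, Team Cohesion, Process Maturity),
# each read from the COCOMO II rating tables; example value.
scale_factor_sum = 18.97

# Product of the seventeen effort multipliers; 1.0 = all rated "nominal".
effort_multiplier_product = 1.0

E = B + 0.01 * scale_factor_sum
effort_pm = A * ksloc ** E * effort_multiplier_product  # person-months
duration_m = C * effort_pm ** (D + 0.2 * (E - B))       # calendar months

print(f"Effort: {effort_pm:.1f} person-months, "
      f"duration: {duration_m:.1f} months")
```

Real use requires reading each scale factor and effort multiplier off the published rating tables, which is where most of the work (and the accuracy) lies.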
The exact values and formulas would be impossible to describe in a short article. If you want to know how to estimate using COCOMO II then:
- Look online for the "COCOMO II Model Definition Manual" (about 100 pages of description).
- Use off-the-shelf calculators that will do the calculations for you after substituting data (for example, http://softwarecost.org/tools/COCOMO/).
Other methods using algorithms
Other techniques that use algorithms to estimate project time consumption are:
- Putnam Model,
- Object Points,
- Use Case Points.
Estimation techniques using expert knowledge
Another group of estimation techniques is expert estimates. In this situation, an expert or group of experts estimates how long the work will take and how large a team will be needed to complete it.
Several approaches to such estimations can be distinguished.
Estimating by an expert
In this case, a single expert familiarizes himself with the requirements and constraints of the project, then determines the estimated duration and cost of the project based on his experience.
The correctness of these estimates therefore depends very much on the experience of the person preparing the calculations. Estimates are also susceptible to ordinary human error.
Even more importantly, the efficiency of an expert is completely different from that of a novice programmer. Therefore, calculations prepared this way should be adjusted to the average speed of the team.
The Delphi method
The origins of this technique date back to the 1950s, when it was noticed that estimates developed by a single expert are often wrong, so groups of experts began to be invited to estimate jointly. An important element of the Delphi method is the anonymity of voting, so that results given by senior managers or people of high authority do not influence the rest of the group.
Before voting, everyone is familiarized with the topic, presented for example in the form of requirements documentation. An anonymous vote is then conducted. After it is completed, each participant sees the remaining results, but without information about who gave which estimate.
If the results across the group are sufficiently close, the next issue can be addressed. If there are major discrepancies, a discussion of the topic follows, and the moderator then conducts another round of voting. The process is repeated until consensus on the estimate is reached.
Planning poker
The traditional Delphi method was time-consuming and required intensive involvement from the moderator. In 2002, James Grenning developed the planning poker technique for estimating the time consumption of IT projects, and today it is often a standard tool of agile teams.
The process should involve the entire team working on the project. Voting is done using cards. These can be special planning poker cards, traditional playing cards or dedicated virtual tools. One person presents the topic to be estimated. Then each participant assigns an estimate to the topic, but does not reveal it to the other players. Only when everyone is ready are the results compared.
If the results coincide, the final score is determined, usually as the most popular value (to avoid fractions). If there is a significant discrepancy, the participants with the highest and lowest estimates present their arguments first. After the discussion, the participants can change their ratings or repeat the entire vote. Once agreement is reached, the next task is analyzed.
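The consensus step can be illustrated with a toy sketch; the card values and the "closeness" criterion are assumptions made for this example:

```python
from collections import Counter

# Toy illustration of the planning poker consensus step.
# The votes and the closeness rule below are example assumptions.
votes = [5, 5, 8, 5, 5]  # one round of card values from the team

spread_ok = max(votes) <= 2 * min(votes)  # assumed closeness rule
if spread_ok:
    # most popular value wins, avoiding fractional averages
    final_estimate = Counter(votes).most_common(1)[0][0]
    print(f"Agreed estimate: {final_estimate} story points")
else:
    print("Highest and lowest voters present arguments, then re-vote")
```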
The main advantage of this approach is the collection of information from a wide group of people. As a result, individual mistakes are eliminated, and the average of the estimates is often close to the actual value. According to one study, the average estimation error using planning poker was half as large as when a single expert did the estimating (7.1% versus 14.8%).
Techniques such as COCOMO II give estimates in hours and days of effort. Using planning poker, you can estimate either time consumption in hours/days or relative dependencies between tasks. The latter approach is much more common.
In relative estimation, only the relationships between tasks are determined. That is, a team using the planning poker technique, for example, determines that task 1 is twice as complex as task 2, and task 3 is five times as complex as task 1, and so on.
Relationships can be expressed in points (called Story Points) or using common T-shirt sizes: S, M, L, XL, and so on. To then get the duration of tasks, you need to:
- For estimation in Story Points, determine the team's speed in delivering Story Points per sprint. You can do this by, for example, planning the first sprint precisely and, at its end, counting how many Story Points the team actually delivered.
- For estimation in T-shirt sizes, the team should determine how much time, on average, it takes to deliver one S, one M, and so on. The total expected project duration then comes from multiplying and summing.
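Both conversions amount to simple arithmetic. A sketch with invented numbers:

```python
# Converting relative estimates to time, two ways.
# All numbers below are invented for the example.

# 1) Story Points plus measured velocity
backlog_points = 240
velocity = 30                                    # SP per 2-week sprint
sprints_needed = -(-backlog_points // velocity)  # ceiling division
print(f"{sprints_needed} sprints (~{sprints_needed * 2} weeks)")

# 2) T-shirt sizes plus average hours per size
hours_per_size = {"S": 8, "M": 20, "L": 40, "XL": 80}
task_counts = {"S": 10, "M": 6, "L": 3, "XL": 1}
total_hours = sum(hours_per_size[size] * n
                  for size, n in task_counts.items())
print(f"~{total_hours} hours of work in total")
```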
Importantly, the relative estimates before converting them to time units are independent of the team and the technology. Relative task complexity is a fact, not an assessment. Therefore, this solution works well if the final shape of the team is not known, or changes frequently.
Estimation using machine learning
Natural language processing with machine learning is used for many purposes, including effort estimation. A study prepared by researchers from Australia and the United States presents the results of using various machine learning algorithms to prepare relative estimates for sixteen projects. For three projects, the average estimation error was below one story point (0.64; 0.68; 0.74). That is, the model's estimates differed on average by less than 1 SP from those prepared by a human.
The result obtained is really promising, since relative estimations by definition have some error. And even the same team estimating the number of Story Points after a few months would get a measurement error greater than zero.
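To make the idea concrete, here is a deliberately tiny sketch that predicts Story Points for a new issue from the most similar past issue, using plain word overlap as the similarity measure. Real models, such as the deep learning approach cited above, use far richer text representations; the issue titles and point values here are invented:

```python
# Tiny sketch: nearest-neighbour story point prediction by word overlap.
# History entries (title, story points) are invented example data.
history = [
    ("add login form with validation", 3),
    ("migrate database to new schema", 8),
    ("fix typo on settings page", 1),
    ("add export of reports to csv", 5),
]

def predict(title: str) -> int:
    words = set(title.split())
    # pick the past issue sharing the most words with the new title
    _, points = max(history,
                    key=lambda item: len(words & set(item[0].split())))
    return points

print(predict("add csv export of invoices"))
```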
On the Internet you will find many studies on applying machine learning to estimate the time consumption of IT projects. For example, a list of various techniques and articles on them can be found in the paper "Software Effort Estimation Accuracy Prediction of Machine Learning Techniques: A Systematic Performance Evaluation".
In Poland, a number of papers on estimating the time consumption of IT projects using machine learning have been published by Dr. Przemyslaw Pospieszny, for example this paper from 2015: "Application of Data Mining Techniques to Estimation of Labor Intensity of IT Projects".
Undoubtedly, the future in the area of project estimation will belong to tools that, based on huge data sets combined with expert support, will be able to predict the completion date and budget of projects with high accuracy.
A summary of the various techniques for estimating project time consumption
If you have historical data from previous projects and your environment is fairly stable (similar projects, low turnover in the team), your best bet is to estimate based on information from previous projects. You can do this more or less formally. And if you use tools that apply machine learning to this data, the estimates can be very precise.
When there is a lack of past data and you need to develop an accurate valuation, then algorithm-based techniques like COCOMO II can come to the rescue. You can use function points or the estimated number of lines of code as a measure of complexity. Such estimates take a long time to prepare and are not very resistant to change. That is, after major changes in scope, you need to repeat the calculations.
In a situation where you need a rough estimate and expect a lot of changes during the project, then relative estimates will work well. A tool that will help you gather these estimates quickly and accurately is planning poker.
References
- "Function Point Based Estimation of Effort and Cost in Agile Software Development," Anupam Yadava, Ashish Sharma.
- "COCOMO II Model Definition Manual," Version 2.1.
- "A Review on Software Cost and Effort Estimation Techniques for Agile Development Process," Manju Vyas et al., International Journal of Recent Research Aspects, ISSN: 2349-7688, Vol. 5, Issue 1, March 2018.
- "A Deep Learning Model for Estimating Story Points," Morakot Choetkiertikul, Hoa Khanh Dam, Truyen Tran, Trang Pham, Aditya Ghose, Tim Menzies.
- "Software Effort Estimation Accuracy Prediction of Machine Learning Techniques: A Systematic Performance Evaluation," Yasir Mahmood, Nazri Kama, Azri Azmi, Ahmad Salman Khan, Mazlan Ali.
- "Application of Data Mining Techniques to Estimation of Labor Intensity of IT Projects," Andrzej Kobylinski, Przemyslaw Pospieszny.
- "What Is the Cost of One IFPUG Method Function Point? - A Case Study," Beata Czarnacka-Chrobot.