ONTOCOM Cost Drivers

We differentiate among product, project and personnel cost drivers. The product category accounts for the influence of product properties on the overall costs. The project category states the dimensions of the engineering process which are relevant for the cost estimation, while the personnel one emphasizes the role of team experience, ability and continuity for the effort invested in the process.

1. Product Factors

1.1. Cost Drivers for Ontology Building

Complexity of the Domain Analysis: DCPLX

The domain complexity driver accounts for the additional effort that arises in an engineering project from the particularities of the ontology domain and its analysis during ontology building. The decision about which concepts will be included, and in which form they will be represented in an ontology, depends not only on the intrinsic domain to be modeled (e.g., tourism) but also on the application domain. The latter also involves the technical setting and the characteristics of the application into which the ontology is to be integrated. As a third decision field we introduced the sources which could be used as additional domain descriptions and thus as an aid for the domain analysis and the subsequent conceptualization. The global value for the DCPLX driver is a weighted sum of the aforementioned areas, which are depicted in Table 1.

Cost Driver DCPLX
DOMAIN Complexity
Rating Rating Scale
Very Low narrow scope, common-sense knowledge, low connectivity
Low narrow to moderate scope, common-sense or expert knowledge, low connectivity
Nominal moderate to wide scope, common-sense or expert knowledge, moderate connectivity
High moderate to wide scope, common-sense or expert knowledge, high connectivity
Very High wide scope, expert knowledge, high connectivity

REQUIREMENTS Complexity
Rating Rating Scale
Very Low few, simple req.
Low small number of non-conflicting req.
Nominal moderate number of req., with few conflicts, few usability req.
High high number of usability req., few conflicting req.
Very High very high number of req. with a high conflicting degree, high number of usability req.

INFORMATION SOURCES Availability
Rating Rating Scale
Very High high number of sources in various forms
High competency questions and text documents available
Nominal some text documents available
Low some unstructured information sources available
Very Low none
Table 1: The Domain Complexity Cost Driver DCPLX
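The weighted sum that yields the global DCPLX value can be sketched as follows. This is a minimal illustration only: the numeric score attached to each rating, the direction of each scale and the weights are assumptions made for the example, since this section gives neither the weights nor calibrated values. The names RATING_SCORE and dcplx_value are hypothetical.

```python
# Hypothetical numeric scores for the five ratings (not ONTOCOM's calibrated values).
RATING_SCORE = {"Very Low": 1, "Low": 2, "Nominal": 3, "High": 4, "Very High": 5}


def dcplx_value(ratings, weights):
    """Weighted sum of the ratings of the three areas of Table 1."""
    return sum(weights[area] * RATING_SCORE[ratings[area]] for area in ratings)


# Example ratings for the three areas; the weights are assumed and sum to 1.
ratings = {"domain": "High", "requirements": "Nominal", "sources": "Low"}
weights = {"domain": 0.5, "requirements": 0.3, "sources": 0.2}

score = dcplx_value(ratings, weights)  # 0.5*4 + 0.3*3 + 0.2*2
```

Giving the domain area the largest weight mirrors the statement that domain characteristics are decisive, but the concrete figures are placeholders.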
In order to realistically classify the complexity of the domain analysis phase in terms of the pre-defined ratings, we identified characteristics of the three areas which usually influence this measure. For the domain category, we considered the scope (narrow, moderate, wide), the commonality of the knowledge (common-sense or expert knowledge) and the connectivity of the domain. The latter is expressed as the number of interdependencies between domain concepts and again ranges over three levels (low, moderate and high), while the scope is related to the generality of the domain, but also to the perceived amount of knowledge it comprises by default. For example, a domain such as a department of an organization is considered narrower than a domain describing a university, while the scope of the economics domain is classified as wide. The three criteria are prioritized according to common practices in the ontology engineering area, so that the connectivity of the domain is considered decisive for establishing the rating of this cost factor.

The complexity of the requirements which are to be taken into consideration when building an ontology is characterized here by the total number of requirements in conjunction with the rate of conflicting ones and the rate of usability requirements, since the latter are seen as a fundamental source of complexity for the building process.

Finally, the availability of information sources guiding the engineering team during the building process or offering valuable insights into the domain to be modeled can be a major success factor in ontology engineering. When deciding upon the impact of the information sources on the effort required to perform the domain analysis activity, we suggest considering the number, the type and the form of the sources.

Complexity of the Conceptualization: CCPLX

The conceptualization complexity accounts for the impact of the structure of the conceptual ontology (taxonomy, conceptual graph etc.) and of supporting techniques such as modeling patterns on the overall engineering costs. On the other hand, the existence of certain naming and modeling constraints might cause cost increases (see Table 2).

Cost Driver CCPLX
Rating Rating Scale
Very Low concept list
Low taxonomy, high number of patterns, no constraints
Nominal properties, general pattern available, some constraints
High axioms, few modeling patterns, considerable number of constraints
Very High instances, no patterns, considerable number of constraints
Table 2: The Conceptualization Complexity Cost Driver CCPLX

Complexity of the Implementation: ICPLX

As mentioned, one of the basic assumptions in ONTOCOM is that the most significant factor for estimating the costs of ontology engineering projects is the size of the conceptual model, while the implementation is regarded as a matter of tools, since manually encoding a conceptualization in a particular formal representation language is not common practice. However, the original ONTOCOM model did not pay any attention to the semantic differences between the conceptual and the implementation level, differences which might appear in situations in which the usage of a specific representation language is mandatory. In this case the implementation of the ontology requires a non-trivial mapping between the knowledge level of the conceptualization and the paradigms underlying the representation language used. The costs arising during this mapping are captured by the driver ICPLX (implementation complexity), whose ratings are illustrated in Table 3. For simplicity we restricted the range of the ratings to three (from low to high).

Cost Driver ICPLX
Rating Rating Scale
Low The semantics of the conceptualization is compatible with that of the implementation language
Nominal Minor differences between the two
High Major differences between the two
Table 3: The Implementation Complexity Cost Driver ICPLX

To summarize, the complexity of the target ontology is taken into account in ONTOCOM by means of three cost drivers, associated with the effort arising in the domain analysis, conceptualization and implementation phases. We analyzed features which are responsible for cost increases in these fields, independently of the size of the final ontology, the competence of the team involved or the setting of the current project, and aligned them to ratings from very low to very high for quantification purposes.


Complexity of the Instantiation: DATA

The population of an ontology and the associated testing operations might involve considerable costs. This measure attempts to capture the effect that instance data requirements have on the overall process. In particular, the form of the instance data and the method required for its ontological formalization are significant factors for the costs of the engineering process (Table 4).

Cost Driver DATA
Rating Rating Scale
Very Low structured data, same repr. language
Low structured data with formal semantics
Nominal semi-structured data e.g. databases, XML
High semi-structured data in natural language, e.g. similar web pages
Very High unstructured data in natural language, free form
Table 4: The Instantiation Complexity Cost Driver DATA

On the basis of a survey of ontology population and learning approaches, we assume that the population of an ontology with available instance data with unambiguous semantics can be performed more cost-effectively than the processing of relational tables or XML-structured data. Furthermore, the extraction of ontology instances from poorly structured sources like natural language documents is assigned the highest rating, due to the complexity of the task itself and of the pre-processing and post-processing activities. The rating itself does not take into consideration any costs related to mapping operations which might be required to integrate data from external resources. If, for example, the data is provided as instances of a second ontology, be that in the same representation language as the one at hand or not, the estimation of the DATA cost driver should also account for the effort implied by defining a mapping between the source and the target ontology. In this case, the parameter is to be adjusted by an increment M (Mapping), as depicted in Table 5 below.

Increment for Cost Driver DATA
Required mapping between source schema and target ontology
Rating Rating Scale
0.0 no mapping necessary
0.2 direct mapping
0.4 concept mapping
0.6 taxonomy mapping
0.8 relation mapping
1.0 axiom mapping
Table 5: M Increment for DATA

The M factor increments the effect of the DATA measure: a 1.0 M increment causes a 100% increase of the DATA measure, while a 0.0 increment does not have any influence on the final value of DATA.
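The effect of the M increment can be sketched as a multiplication of the DATA value by (1 + M), which reproduces the two boundary cases stated above (no change for 0.0, doubling for 1.0). The increments are taken from Table 5; the numeric DATA value in the example and the names M_INCREMENT and adjusted_data are hypothetical.

```python
# Increments from Table 5, keyed by the required mapping type.
M_INCREMENT = {
    "no mapping necessary": 0.0,
    "direct mapping": 0.2,
    "concept mapping": 0.4,
    "taxonomy mapping": 0.6,
    "relation mapping": 0.8,
    "axiom mapping": 1.0,
}


def adjusted_data(data_value, mapping_type):
    """Increase the DATA value by the fraction M: DATA * (1 + M)."""
    return data_value * (1.0 + M_INCREMENT[mapping_type])


# Assumed DATA value of 1.2, purely for illustration.
unchanged = adjusted_data(1.2, "no mapping necessary")
doubled = adjusted_data(1.2, "axiom mapping")
```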


Required Reusability: REUSE

The measure attempts to capture the effort associated with the development of a reusable ontology. Reusability is a major issue in the ontology engineering community, due to the inherent nature of ontologies as artifacts for knowledge sharing and reuse. Currently there is no commonly agreed understanding of the criteria an ontology must fulfil in order to increase its reusability. Usually reusability is mentioned in the context of application independence, in that application-dependent ontologies are assumed to imply significant customization costs if reused. Additionally, several types of ontologies are often presumed to exhibit increased reusability: core ontologies and upper-level ontologies describing general aspects of the world are often used in alignment tasks in order to ensure high-level ontological correctness. The Formal Ontological Analysis of Guarino also mentions three levels of generality, which might be associated with different degrees of reusability: upper-level ontologies are used as ontological commitment for general-purpose domain and task ontologies, while the latter two are combined to realize so-called application ontologies, which are used for particular tasks in information systems. According to these considerations, the ratings for the REUSE measure are depicted in Table 6.

Cost Driver REUSE
Rating Rating Scale
Very Low for this application
Low for this application type
Nominal application independent domain ontology
High core ontology
Very High upper level ontology
Table 6: Required Reusability Cost Driver REUSE


Documentation Needs: DOCU

The DOCU measure is intended to state the additional costs caused by detailed documentation requirements. As in COCOMO II, we differentiate among five values from very low (many lifecycle needs uncovered) to very high (very excessive for lifecycle needs), as illustrated in Table 7.

Cost Driver DOCU
Rating Rating Scale
Very Low many lifecycle needs uncovered
Low some lifecycle needs uncovered
Nominal right-sized to lifecycle needs
High excessive for lifecycle needs
Very High very excessive for lifecycle needs
Table 7: Documentation Needs Cost Driver DOCU


Complexity of the Ontology Evaluation: OE

This cost driver captures the effort invested in evaluating ontologies, be that testing, reviewing, usability or ontological evaluation. While in a reuse situation the effort required for the evaluation of an ontology is monitored separately from the one implied by its comprehension, in the building case the level of the cost driver is determined independently of other cost factors by considering the level of activity required to test a preliminary ontology against its requirements specification document and for documentation purposes (see Table 8).

Cost Driver OE
Rating Rating Scale
Very Low small number of tests, easily generated and reviewed
Low moderate number of tests
Nominal high number of tests
High considerable tests, easy to moderately difficult to generate and review
Very High extensive testing, difficult to generate and review
Table 8: The Ontology Evaluation Cost Driver OE


Complexity of the Ontology Integration: OI

This cost driver measures the costs produced by integrating different ontologies into a common framework. The integration step is assumed to be performed on ontologies sharing the same representation language; the effort required to bring them into a common language is covered by the OT (Ontology Translation) cost driver (see below). As criteria influencing its complexity we identified the following:

  • overlapping degree among the ontologies to be integrated: this is assumed to be proportional to the effort required by the integration, since it is directly related to the number of mappings between ontological entities.

  • type of mappings between ontological primitives: 1-to-1 mappings are more easily discovered than multiple ones (1-to-n or n-to-m).

  • integration quality, in terms of precision (rate of correct mappings) and recall (rate of mappings discovered): higher quality requirements automatically imply increased effort to perform the integration task.

  • number of ontologies: the integration effort is assumed to be directly proportional to the number of sources to be integrated.

According to these considerations, the ratings for the OI cost driver were defined as depicted in Table 9 below.

Cost Driver OI
Rating Rating Scale
Very Low 1-1 mappings, approx. 50% precision and recall required, barely overlapping, 2 ontologies
Low 1-1 mappings, approx. 60% precision and recall required, barely overlapping, 2 ontologies
Nominal 1-n mappings, approx. 70% precision and recall required, some overlapping, 2 ontologies
High 1-n mappings, approx. 80% precision and recall required, high overlapping, more than 2 ontologies
Very High n-m mappings, approx. 95% precision and recall required, high overlapping, more than 2 ontologies
Table 9: The Ontology Integration Cost Driver OI
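The precision and recall levels used in the ratings can be checked against a reference set of mappings. The sketch below uses the standard definitions (precision: share of discovered mappings that are correct; recall: share of reference mappings that were discovered). The mapping pairs and the function name precision_recall are made up for illustration.

```python
def precision_recall(found, reference):
    """Return (precision, recall) of discovered mappings against a reference set."""
    found, reference = set(found), set(reference)
    correct = found & reference
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical 1-1 mappings between entities of two ontologies "a" and "b".
found = {("a:Hotel", "b:Hotel"), ("a:Room", "b:Suite"), ("a:City", "b:City")}
reference = {
    ("a:Hotel", "b:Hotel"),
    ("a:City", "b:City"),
    ("a:Price", "b:Rate"),
    ("a:Guest", "b:Customer"),
}

precision, recall = precision_recall(found, reference)
```

Here two of the three discovered mappings are correct and two of the four reference mappings were found, so this result would fall short of even the Very Low recall level of roughly 50% only marginally met.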


1.2. Cost Drivers for Ontology Reuse and Maintenance

Though there is as yet no fine-grained methodology for reusing existing ontologies in the Semantic Web community, the main steps and the associated challenges involved in the process are well accepted by current ontology-based projects. This process is, however, associated with significant costs and effort, which may currently outweigh its benefits. First, as in other engineering disciplines, reusing an existing component implies costs to find, get familiar with, adapt and update the necessary modules in a new context. Second, building a new ontology means partially translating between different representation schemes, performing schema matching, or both. For our cost estimation model we assume that relevant ontologies are available to the engineering team and, according to the mentioned top-level approach and to some case studies in ontology reuse, we examine the following two phases of the reuse process w.r.t. the corresponding cost drivers:

  • ontology evaluation: get familiar with the ontology and assess its relevance for the target ontology

  • ontology customization: translate the sources to a desired format, possibly extract relevant sub-ontologies and finally integrate them into the target ontology

For the evaluation phase, the engineering team is supposed to assess the relevance of a given ontology to particular application requirements. The success of the evaluation depends crucially on the extent to which the ontology is familiar to the assessment team. The customization phase implies the identification and extraction of sub-ontologies which are to be integrated in a direct, translated or modified form, respectively. In the first category, sub-ontologies are included directly in the target ontology. The reuse of the second category is conditioned by the availability and the associated costs of knowledge representation translators, while the last category involves modifications of the original model in the form of insertions, deletions or updates at the level of ontological primitives.


Ontology Understandability: OU

The effort of reusing an ontology depends significantly on the ability of the ontologists and domain experts to understand the ontology, which is influenced by two categories of factors: the complexity of the conceptual model and its self-descriptiveness or clarity. Additionally, in the case of the ontology engineer the comprehensibility of an ontology depends on his or her domain experience, while domain experts are assumed to provide this know-how by definition. Factors contributing to the complexity of the model are the size and expressivity of the ontology and the number of imported models together with the complexity of the import dependency graph. The clarity of the model is mainly influenced by the human-perceived readability.

Cost Driver OU
Complexity
Rating Rating Scale
Very Low complex dependency graph, large domain, complex representation language, no concept names
Low taxonomic dependency graph, large domain, complex representation language, concept names
Nominal taxonomic dependency graph, middle domain, moderate representation language, concept names
High no imports, middle domain, simple representation language, concept names
Very High no imports, small domain, simple representation language, concept names

Clarity
Rating Rating Scale
Very Low representation language know-how, no comments in natural language, no metadata
Low representation language know-how, no comments in natural language, no metadata
Nominal representation language tool, 30% comments in natural language, no metadata
High representation language tool, 60% comments in natural language, no metadata
Very High representation language tool, 90% comments in natural language
Table 10: Complexity and Clarity Levels for Ontology Understanding OU

The complexity of the ontology depends on three factors: the size of the ontology, the expressivity of the representation language used and the structure of the import graph containing the imported ontologies. The import graph structure (the dependency graph in Table 10) can be simple, as in taxonomical tree structures, or complex, as in non-tree structures. Furthermore, the complexity of the syntax used (the representation language in Table 10) is considered simple for common taxonomical hierarchies, moderate if further property types are used and complex in the case of restrictions and axioms. The third complexity factor is the size of the ontology: small ontologies are supposed to contain up to 100 ontological primitives, middle ontologies contain up to 1000 concepts, while ontologies with more than 1000 concepts are classified as large in our model (see Table 10).

The clarity categorization depends on the readability or meaningfulness of ontological primitives, the technical know-how required by the representation language and the availability of natural language comments and definitions. The understandability of an ontology can be increased significantly when ontological primitives are given meaningful names in a natural language which is familiar to the ontology engineer and the domain expert, respectively. Furthermore, a self-descriptive representation language does not cause significant impediments in dealing with an ontology, especially when user-friendly tools are available.
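The size and syntax criteria described above translate directly into two small classification helpers. The thresholds (100 and 1000) and the three syntax levels come from the text; the function names and the boolean parameters are hypothetical.

```python
def size_class(primitive_count):
    """Classify ontology size: up to 100 small, up to 1000 middle, above that large."""
    if primitive_count <= 100:
        return "small"
    if primitive_count <= 1000:
        return "middle"
    return "large"


def syntax_complexity(uses_property_types, uses_restrictions_or_axioms):
    """Simple for plain taxonomies, moderate with further property types,
    complex as soon as restrictions or axioms are used."""
    if uses_restrictions_or_axioms:
        return "complex"
    if uses_property_types:
        return "moderate"
    return "simple"


sizes = [size_class(n) for n in (80, 100, 101, 1000, 1001)]
```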


Ontologist/Domain Expert Unfamiliarity: UNFM

The effort related to ontology maintenance decreases significantly in situations where the human user works frequently with the particular ontology. This measure accounts for this dependency and distinguishes among six levels, as depicted in Table 11.

Increment for Cost Driver OU
Rating Rating Scale
0.0 self built
0.2 team built
0.4 every day usage
0.6 occasional usage
0.8 little experience
1.0 completely unfamiliar
Table 11: UNFM Increment for OU

The UNFM factor increments the effect of the Ontology Understanding measure: a 1.0 UNFM increment causes a 100% increase of the OU measure, while a 0.0 increment does not have any influence on the final value of OU (see Table 11).
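As a brief sketch, the familiarity levels of Table 11 can be looked up and applied to an OU value as OU * (1 + UNFM), which matches the two boundary cases given above. The OU value used in the example and the names UNFM_INCREMENT and adjusted_ou are hypothetical.

```python
# Increments from Table 11, keyed by the familiarity level.
UNFM_INCREMENT = {
    "self built": 0.0,
    "team built": 0.2,
    "every day usage": 0.4,
    "occasional usage": 0.6,
    "little experience": 0.8,
    "completely unfamiliar": 1.0,
}


def adjusted_ou(ou_value, familiarity):
    """Increase the OU value by the fraction UNFM: OU * (1 + UNFM)."""
    return ou_value * (1.0 + UNFM_INCREMENT[familiarity])


# Assumed OU value of 1.0, purely for illustration.
occasional = adjusted_ou(1.0, "occasional usage")
```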


Complexity of the Ontology Evaluation: OE

This measure accounts for the actual effort needed to evaluate the ontology for reuse purposes (see Table 12). The measure assumes a satisfactory level of ontology understanding and is associated solely with the effort needed to decide whether a given ontology satisfies a particular set of requirements and to integrate its description into the overall product description.

Cost Driver OE
Rating Rating Scale
Very Low small number of tests, easily generated and reviewed
Low moderate number of tests
Nominal high number of tests
High considerable tests, easy to moderately difficult to generate and review
Very High extensive testing, difficult to generate and review
Table 12: The Ontology Evaluation Cost Driver OE


Complexity of the Ontology Modifications: OM

This measure reflects the complexity of the modifications required by the reuse process after the evaluation phase has been completed (Table 12).

Cost Driver OM
Rating Rating Scale
Very Low few, simple modifications
Low some, simple modifications
Nominal some, moderate modifications
High considerable modifications
Very High excessive modifications
Table 12: The Ontology Modification Cost Driver OM


Ontology Translation: OT

Translating between knowledge representation languages is an essential part of a reuse process. Depending on the compatibility of the source and target representation languages, as well as on the availability and performance of the translating tools (amount of pre- and post-processing required), we differentiate among 5 values as depicted in Table 13.

Cost Driver OT
Rating Rating Scale
Very Low direct
Low low manual effort
Nominal some manual effort
High considerable manual effort
Very High manual effort
Table 13: The Ontology Translation Cost Driver OT


2. Personnel Factors

Ontologist/Domain Expert Capability: OCAP/DECAP

The development of an ontology requires the collaboration between a team of ontology engineers (ontologists), usually with an advanced technical background, and a team of domain experts who provide the necessary know-how in the field to be ontologically modeled. These cost drivers account for the perceived ability and efficiency of the individual actors involved in the process, as well as their teamwork capabilities.

Cost Drivers OCAP/DECAP
Rating Rating Scale
Very Low 15%
Low 35%
Nominal 55%
High 75%
Very High 95%
Table 14: The Capability for the Engineering Team Cost Drivers OCAP/DECAP


Ontologist/Domain Expert Experience: OEXP/DEEXP

These measures take into account the experience of the engineering team consisting of both ontologists and domain experts w.r.t. the ontology engineering process. They are not related to the abilities of single team members, but relate directly to the experience in constructing ontologies and in conceptualizing a specific domain respectively.

Cost Drivers OEXP/DEEXP

Very Low Low Nominal High Very High
OEXP 2 months 6 months 1 year 1.5 years 3 years
DEEXP 6 months 1 year 3 years 5 years 7 years
Table 15: The Ontologists and Domain Experts Experience Cost Drivers OEXP/DEEXP


Language and Tool Experience: LEXP/TEXP

The aim of these cost drivers is to measure the level of experience of the project team constructing the ontology w.r.t. the conceptualization language and the ontology management tools, respectively. The conceptualization phase requires the usage of knowledge representation languages with appropriate expressivity (such as Description Logics or Prolog), while the concrete implementation depends on support tools such as editors, validators and reasoners. The distinction between language and tool experience is justified by the fact that, while ontology languages rely on established knowledge representation languages from the Artificial Intelligence field and are thus possibly familiar to the ontology engineer, tool experience explicitly implies previous usage of typical ontology management tools and is not directly conditioned by the know-how of the engineering team in the KR field. The maximal time values for tool experience are adapted to the ontology management field and are thus lower than the corresponding language experience ratings (see Table 16).

Cost Drivers LEXP/TEXP

Very Low Low Nominal High Very High
LEXP 2 months 6 months 1 year 3 years 6 years
TEXP 2 months 6 months 1 year 1.5 years 3 years
Table 16: The Language and Tool Experience Cost Drivers LEXP/TEXP

Personnel Continuity: PCON

As in other engineering disciplines, frequent changes in the project team are a major obstacle to the success of an ontology engineering process within given budget and time constraints. Due to the small size of the project teams, we adapted the general ratings of the COCOMO model to a maximal team size of 10 (Table 17).

Cost Driver PCON

Very Low Low Nominal High Very High
PCON 6 years 3 years 1 year 6 months 2 months
Table 17: The Personnel Continuity Cost Driver PCON


3. Project Factors

Tool Support: TOOL

We take account of the different levels of tool support for the different phases of an ontology engineering process (domain analysis, conceptualization, implementation, ontology understanding and evaluation, ontology instantiation, ontology modification, ontology translation, ontology integration and documentation) by means of a single general-purpose cost driver and calculate the final value as the average tool support across the entire process. The ratings for tool support are defined at a general level, as shown in Table 18 below.

Cost Driver TOOL
Rating Rating Scale
Very High High quality tool support, no manual intervention needed
High Little manual processing required
Nominal Basic manual intervention needed
Low Some tool support
Very Low Minimal tool support, mostly manual processing
Table 18: The Tool Support Cost Driver TOOL

The rating of the cost driver should be specified for each of the most prominent process phases, while the importance of the corresponding phase is expressed in terms of weights. The global TOOL value for a specific project is calculated as a normalized sum of the weighted local values.
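The normalized weighted sum mentioned above can be sketched as follows. The 1 to 5 numeric scale for the local ratings and the phase weights are assumptions for the example; the function name global_tool_value is hypothetical.

```python
def global_tool_value(local_values, phase_weights):
    """Weighted sum of per-phase TOOL values, normalized by the total weight."""
    total_weight = sum(phase_weights[phase] for phase in local_values)
    if total_weight == 0:
        raise ValueError("at least one phase needs a non-zero weight")
    weighted = sum(phase_weights[phase] * local_values[phase] for phase in local_values)
    return weighted / total_weight


# Assumed local ratings on a 1 (Very Low) to 5 (Very High) scale.
local_values = {"domain analysis": 2, "conceptualization": 4, "implementation": 5}
# Assumed importance of each phase for this project.
phase_weights = {"domain analysis": 3, "conceptualization": 2, "implementation": 1}

tool_value = global_tool_value(local_values, phase_weights)  # (6 + 8 + 5) / 6
```

With equal weights the computation reduces to the plain average of the local values.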


Multisite Development: SITE

Constructing an ontology requires intensive communication between ontology engineers and domain experts on the one hand, and among domain experts for the purpose of reaching consensus on the other hand. This measure involves the assessment of the communication support tools (Table 19).

Cost Driver SITE
Rating Rating Scale
Very Low mail
Low phone, fax
Nominal email
High teleconference, occasional meetings
Very High frequent F2F meetings
Table 19: The Multisite Ontology Development Cost Driver SITE

Required Development Schedule: SCED

This cost driver takes into account the particularities of the engineering process given certain schedule constraints. Accelerated schedules (ratings below 100%, see Table 20) tend to produce more effort in the refinement and evolution steps, due to the lack of time required for an elaborate domain analysis and conceptualization. Stretch-out schedules (over 100%) generate more effort in the earlier phases of the process, while the evolution and refinement tasks are negligible in the best case.

Cost Driver SCED

Very Low Low Nominal High Very High
SCED 75% 85% 100% 130% 160%
Table 20: The Required Development Schedule Cost Driver SCED

For example, a high SCED value of 130% (Table 20) represents a 30% stretch-out of the nominal schedule and thus more resources for the domain analysis and conceptualization.
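The relation between the SCED percentages and the nominal schedule can be sketched as below. The percentages are those of Table 20; the nominal duration in the example and the function names are hypothetical.

```python
# Percentages of the nominal schedule from Table 20.
SCED_PERCENT = {"Very Low": 75, "Low": 85, "Nominal": 100, "High": 130, "Very High": 160}


def schedule_change_percent(rating):
    """Positive values are stretch-outs, negative values are accelerations."""
    return SCED_PERCENT[rating] - 100


def scheduled_duration(nominal_months, rating):
    """Duration implied by the rating, relative to the nominal schedule."""
    return nominal_months * SCED_PERCENT[rating] / 100


stretch = schedule_change_percent("High")          # 30% stretch-out
accelerated = scheduled_duration(10, "Very Low")   # assumed 10-month nominal schedule
```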