ONTOCOM Cost Drivers
We differentiate among product, project and personnel cost drivers. The
product category accounts for the influence of product properties on the
overall costs. The project category states the dimensions of the engineering
process which are relevant for the cost estimation, while the personnel
one emphasizes the role of team experience, ability and continuity for the
effort invested in the process.
1. Product Factors
1.1. Cost Drivers for Ontology Building
Complexity of the Domain Analysis: DCPLX
The domain complexity driver accounts for the additional effort arising
in the engineering project from the particularities of the ontology
domain and its analysis during ontology building. The decision which concepts
will be included, and in which form they will be represented in an ontology,
depends not only on the intrinsic domain to be modeled (e.g., tourism),
but also on the application domain. The latter involves the technical
setting and the characteristics of the application into which the ontology
is designed to be integrated. As a third decision field we introduce
the sources which could be used as additional domain descriptions
and thus as an aid for the domain analysis and the subsequent conceptualization.
The global value for the DCPLX driver is a weighted sum of the aforementioned
areas, which are depicted in Table 1.
Cost Driver DCPLX

DOMAIN Complexity
  Very Low: narrow scope, common-sense knowledge, low connectivity
  Low: narrow to moderate scope, common-sense or expert knowledge, low connectivity
  Nominal: moderate to wide scope, common-sense or expert knowledge, moderate connectivity
  High: moderate to wide scope, common-sense or expert knowledge, high connectivity
  Very High: wide scope, expert knowledge, high connectivity

REQUIREMENTS Complexity
  Very Low: few, simple requirements
  Low: small number of non-conflicting requirements
  Nominal: moderate number of requirements, with few conflicts, few usability requirements
  High: high number of usability requirements, few conflicting requirements
  Very High: very high number of requirements with a high degree of conflict, high number of usability requirements

INFORMATION SOURCES Availability
  Very High: high number of sources in various forms
  High: competency questions and text documents available
  Nominal: some text documents available
  Low: some unstructured information sources available
  Very Low: none

Table 1: The Domain Complexity Cost Driver DCPLX
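The weighted sum described above can be sketched as follows. The numeric values attached to the rating labels and the area weights are illustrative assumptions, since the model text only fixes the rating labels; note also that the availability scale of Table 1 runs in the opposite direction, so better source availability should lower the cost contribution.

```python
# Illustrative sketch of the DCPLX weighted sum over the three areas of
# Table 1. Rating values and area weights are ASSUMPTIONS, not fixed by
# the model description.

COMPLEXITY_VALUE = {  # higher complexity -> higher cost contribution
    "Very Low": 0.5, "Low": 0.75, "Nominal": 1.0, "High": 1.25, "Very High": 1.5,
}
AVAILABILITY_VALUE = {  # inverted scale: better availability -> lower cost
    "Very High": 0.5, "High": 0.75, "Nominal": 1.0, "Low": 1.25, "Very Low": 1.5,
}
WEIGHTS = {"domain": 0.5, "requirements": 0.3, "sources": 0.2}  # assumed

def dcplx(domain: str, requirements: str, sources: str) -> float:
    """Global DCPLX value as a weighted sum of the three area ratings."""
    return (WEIGHTS["domain"] * COMPLEXITY_VALUE[domain]
            + WEIGHTS["requirements"] * COMPLEXITY_VALUE[requirements]
            + WEIGHTS["sources"] * AVAILABILITY_VALUE[sources])

# Example: complex domain, nominal requirements, few information sources.
value = dcplx("High", "Nominal", "Low")
```

With all three areas at Nominal the driver evaluates to 1.0, i.e. no cost adjustment, which is the usual convention for nominal cost-driver ratings.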
Complexity of the Conceptualization: CCPLX
In order to classify the complexity of the domain analysis phase
realistically in terms of the pre-defined ratings, we identified
characteristics of the three areas which usually influence this measure.
For the domain category, we considered the scope (narrow, moderate, wide),
the commonality of the knowledge (common-sense or expert knowledge)
and the connectivity of the domain. The latter is expressed in
the number of interdependencies between domain concepts and again ranges
among three levels (low, moderate and high), while the scope is a feature
related to the generality, but also to the perceived amount of
knowledge comprised by default in a certain domain. For example, a domain
such as a department of an organization is considered narrower than a
domain describing a university, while the scope of the economics domain
is classified as wide. The three criteria are prioritized according
to common practices in the ontology engineering area, so that the connectivity
of the domain is considered decisive for establishing the rating of this
cost factor. The complexity of the requirements to be taken into
consideration when building an ontology is characterized here by the total
number of available requirements in conjunction with the rate of conflicting
ones and the rate of usability requirements, since the latter are seen as
a fundamental source of complexity for the building process.1 Finally, the
availability of information sources guiding the engineering team during
the building process or offering valuable insights into the domain to be modeled
can be a major success factor in ontology engineering. When deciding upon
the impact of the information sources on the effort required to perform
the domain analysis activity, we suggest considering the number, the type
and the form of the sources.
The conceptualization complexity accounts for
the impact of the structure of the conceptual ontology (taxonomy, conceptual
graph etc.) and of helper techniques such as modeling patterns on the overall
engineering costs. On the other hand, the existence of certain naming and
modeling constraints might cause cost increases (see Table 2).
Cost Driver CCPLX
  Very Low: concept list
  Low: taxonomy, high number of patterns, no constraints
  Nominal: properties, general pattern available, some constraints
  High: axioms, few modeling patterns, considerable number of constraints
  Very High: instances, no patterns, considerable number of constraints

Table 2: The Conceptualization Complexity Cost Driver CCPLX
The Complexity of the Implementation: ICPLX
As mentioned above, one of the basic assumptions of ONTOCOM is
that the most significant factor for estimating the costs of ontology engineering
projects is the size of the conceptual model, while the implementation issue
is regarded as a matter of tools, since manually encoding a conceptualization
in a particular formal representation language is not common practice. However,
the original ONTOCOM model did not pay any attention to the semantic differences
between the conceptual and the implementation level, differences which might
appear in situations in which the usage of a specific representation language
is mandatory. In this case the implementation of the ontology requires a
non-trivial mapping between the knowledge level of the conceptualization
and the paradigms behind the used representation language. The costs arising
during this mapping are stated in the driver ICPLX (implementation complexity),
whose ratings are illustrated in Table 3. For simplification reasons we
restricted the range of the ratings to three (from low to high).
Cost Driver ICPLX
  Low: semantics of the conceptualization compatible with that of the implementation language
  Nominal: minor differences between the two
  High: major differences between the two

Table 3: The Implementation Complexity Cost Driver ICPLX
To summarize, the complexity of the target ontology in ONTOCOM is taken into
account by means of three cost drivers, associated with the efforts arising
in the domain analysis, conceptualization and implementation phases. We analyzed
features which are responsible for cost increases in these fields, independently
of the size of the final ontology, the competence of the team involved or
the setting of the current project, and aligned them to ratings from very
low to very high for quantification purposes.
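Such complexity drivers are typically combined multiplicatively with a size-based estimate, in the style of COCOMO. The following sketch assumes that form; the constants A and B are illustrative calibration parameters, and the exact equation is an assumption here, not stated in this section.

```python
# COCOMO-style effort sketch: effort grows with ontology size, scaled by
# the product of cost-driver multipliers (DCPLX, CCPLX, ICPLX, ...).
# The constants A and B are ASSUMED calibration parameters.
from math import prod

def effort_person_months(size_kentities: float, drivers: list[float],
                         A: float = 2.5, B: float = 1.0) -> float:
    """Effort = A * Size^B * product of the cost-driver multipliers."""
    return A * size_kentities ** B * prod(drivers)

# Example: 2 kilo-entities with three complexity-driver multipliers.
pm = effort_person_months(2.0, [1.1, 1.0, 0.9])
```

A nominal project, i.e. all driver multipliers equal to 1.0, reduces the estimate to the pure size term.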
Complexity of the Instantiation: DATA
Populating an ontology and the associated testing
operations can incur considerable costs. This measure attempts
to capture the effect instance data requirements have on the overall process.
In particular, the form of the instance data and the method required for
its ontological formalization are significant factors for the costs of the
engineering process (Table 4).
Cost Driver DATA
  Very Low: structured data, same representation language
  Low: structured data with formal semantics
  Nominal: semi-structured data, e.g. databases, XML
  High: semi-structured data in natural language, e.g. similar web pages
  Very High: unstructured data in natural language, free form

Table 4: The Instantiation Complexity Cost Driver DATA
On the basis of a survey of ontology population and learning approaches,
we assume that populating an ontology with available instance data
with unambiguous semantics can be performed more cost-effectively than
processing relational tables or XML-structured data. Furthermore,
the extraction of ontology instances from poorly structured sources such as
natural language documents is assigned the highest value, due to
the complexity of the task itself and of the pre- and post-processing
activities. The rating does not take into consideration any costs related
to mapping operations which might be required to integrate data
from external resources. For example, if the data is provided as instances
of a second ontology, be that in the same representation language as the
one at hand or not, the estimation of the DATA cost driver should also account
for the efforts implied by defining a mapping between the source and the
target ontology. In this case, the parameter is to be multiplied
by an increment M (Mapping), as depicted in Table 5 below.
Increment for Cost Driver DATA (required mapping between source schema and target ontology)
  0.0: no mapping necessary
  0.2: direct mapping
  0.4: concept mapping
  0.6: taxonomy mapping
  0.8: relation mapping
  1.0: axiom mapping

Table 5: M Increment for DATA
The M factor increments the effect of the DATA measure: a 1.0 M increment
causes a 100% increase of the DATA measure, while a 0.0 one does not have
any influence on the final value of DATA.
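The increment rule is simple arithmetic and can be written down directly from Table 5; only the example input value is an assumption.

```python
# The M increment from Table 5 scales the DATA measure: a 1.0 increment
# doubles it, a 0.0 increment leaves it unchanged.

M_INCREMENT = {
    "no mapping necessary": 0.0,
    "direct mapping": 0.2,
    "concept mapping": 0.4,
    "taxonomy mapping": 0.6,
    "relation mapping": 0.8,
    "axiom mapping": 1.0,
}

def effective_data(data_value: float, mapping: str) -> float:
    """Apply the mapping increment M: DATA * (1 + M)."""
    return data_value * (1.0 + M_INCREMENT[mapping])

# Example: a nominal DATA value of 1.0 with a required taxonomy mapping.
v = effective_data(1.0, "taxonomy mapping")  # 1.6
```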
Required Reusability: REUSE
This measure attempts to capture the effort associated with
the development of a reusable ontology. Reusability is a major issue in
the ontology engineering community, due to the inherent nature of ontologies
as artifacts for knowledge sharing and reuse. Currently there is no commonly
agreed understanding of the criteria an ontology has to fulfill in order
to increase its reusability. Usually reusability is mentioned in the context
of application independence, in that application-dependent
ontologies are assumed to imply significant customization costs if reused.
Additionally, several types of ontologies are often presumed to possess
increased reusability: core ontologies and upper-level ontologies describing
general aspects of the world are often used in alignment tasks in order
to ensure high-level ontological correctness. The Formal Ontological Analysis
of Guarino also mentions three levels of generality, which might be associated
with different degrees of reusability: upper-level ontologies are used as ontological
commitments for general-purpose domain and task ontologies, while the latter
two are combined to realize so-called application ontologies, which are
used for particular tasks in information systems. According to these considerations,
the ratings for the REUSE measure are depicted in Table 6.
Cost Driver REUSE
  Very Low: for this application
  Low: for this application type
  Nominal: application-independent domain ontology
  High: core ontology
  Very High: upper-level ontology

Table 6: Required Reusability Cost Driver REUSE
Documentation Needs: DOCU
The DOCU measure is intended to state the additional costs
caused by detailed documentation requirements. As in COCOMO II, we differentiate
among five values from very low (many lifecycle needs uncovered) to very high
(very excessive for lifecycle needs), as illustrated in Table 7.
Cost Driver DOCU
  Very Low: many lifecycle needs uncovered
  Low: some lifecycle needs uncovered
  Nominal: right-sized to lifecycle needs
  High: excessive for lifecycle needs
  Very High: very excessive for lifecycle needs

Table 7: Documentation Needs Cost Driver DOCU
Complexity of the Ontology Evaluation: OE
This cost driver captures the effort invested in evaluating
ontologies, be that testing, reviewing, usability or ontological evaluation.
While in a reuse situation the effort required for the evaluation of an
ontology is monitored separately from the effort implied by its comprehension,
in the building case the level of the cost driver is determined independently
of other cost factors, by considering the level of activity required to test
a preliminary ontology against its requirements specification document and
for documentation purposes.
Cost Driver OE
  Very Low: small number of tests, easily generated and reviewed
  Low: moderate number of tests
  Nominal: high number of tests
  High: considerable number of tests, easy to moderately difficult to generate and review
  Very High: extensive testing, difficult to generate and review

Table 8: The Ontology Evaluation Cost Driver OE
Complexity of the Ontology Integration: OI
This cost driver measures the costs produced by integrating
different ontologies into a common framework. The integration step is assumed to
be performed on ontologies sharing the same representation language; the
effort required to translate between languages is covered by the OT (Ontology
Translation) cost driver (see below). As criteria influencing its complexity
we identified the following:
- overlapping degree among the ontologies to be integrated: this is assumed
  to be proportional to the integration effort, since it is directly related
  to the number of mappings between ontological entities
- type of mappings between ontological primitives: 1-to-1 mappings are more
  easily discovered than multiple ones (1-to-n or n-to-m)
- integration quality, in terms of precision (rate of correct mappings)
  and recall (rate of mappings discovered): higher quality requirements
  automatically imply increased effort to perform the integration task
- number of ontologies: the integration effort is directly proportional
  to the number of sources to be integrated
According to these considerations, the ratings for the OI cost
driver were defined as depicted in Table 9 below.
Cost Driver OI
  Very Low: 1-1 mappings, approx. 50% precision and recall required, barely overlapping, 2 ontologies
  Low: 1-1 mappings, approx. 60% precision and recall required, barely overlapping, 2 ontologies
  Nominal: 1-n mappings, approx. 70% precision and recall required, some overlapping, 2 ontologies
  High: 1-n mappings, approx. 80% precision and recall required, high overlapping, more than 2 ontologies
  Very High: n-m mappings, approx. 95% precision and recall required, high overlapping, more than 2 ontologies

Table 9: The Ontology Integration Cost Driver OI
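The rows of Table 9 can be encoded as a small lookup. The matching policy sketched here, taking the first row whose criteria cover the project and falling back to the most expensive rating, is an illustrative assumption; the criteria values themselves are taken directly from the table.

```python
# Table 9 encoded as a rule list: each OI rating is keyed by mapping
# type, required precision/recall, overlap degree and ontology count.

OI_TABLE = [
    ("Very Low",  "1-1", 0.50, "barely", "2"),
    ("Low",       "1-1", 0.60, "barely", "2"),
    ("Nominal",   "1-n", 0.70, "some",   "2"),
    ("High",      "1-n", 0.80, "high",   ">2"),
    ("Very High", "n-m", 0.95, "high",   ">2"),
]

def oi_rating(mapping: str, quality: float, overlap: str, count: str) -> str:
    """Return the first Table 9 row whose criteria cover the given project."""
    for rating, m, q, o, c in OI_TABLE:
        if mapping == m and quality <= q and overlap == o and count == c:
            return rating
    return "Very High"  # fall back to the most expensive rating

rating = oi_rating("1-n", 0.70, "some", "2")  # "Nominal"
```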
1.2. Cost Drivers for Ontology Reuse and Maintenance
Though there is as yet no fine-grained methodology for reusing
existing ontologies in the Semantic Web community, the main steps and the
associated challenges involved in the process are well accepted by current
ontology-based projects. This process is, however, related to significant
costs and efforts, which may currently outweigh its benefits. First, as
in other engineering disciplines, reusing an existing component implies
costs to find, get familiar with, adapt and update the necessary modules
in a new context. Second, building a new ontology this way means partially
translating between different representation schemes, performing schema
matching, or both. For our cost estimation model we assume that relevant
ontologies are available to the engineering team and, according to the
mentioned top-level approach and to some case studies in ontology reuse,
we examine the following two phases of the reuse process w.r.t. the
corresponding cost drivers:
- ontology evaluation: get familiar with the ontology and assess its
  relevance for the target ontology
- ontology customization: translate the sources to a desired format,
  possibly extract relevant sub-ontologies and finally integrate them
  into the target ontology
For the evaluation phase the engineering team is supposed
to assess the relevance of a given ontology to particular application requirements.
The success of the evaluation depends crucially on the extent to which the
ontology is familiar to the assessment team. The customization phase implies
the identification/extraction of sub-ontologies which are to be integrated
in a direct, translated or modified form, respectively. In the first category,
sub-ontologies are included directly in the target ontology. The reuse
of the second category is conditioned by the availability and reasonable
costs of knowledge representation translators, while the last category involves
modifications of the original model in the form of insertions, deletions or
updates at the level of ontological primitives.
Ontology Understandability: OU
Reusing an ontology, and the associated efforts, depend significantly
on the ability of the ontologists and domain experts to understand the ontology,
which is influenced by two categories of factors: the complexity of the
conceptual model and its self-descriptiveness or clarity.
Additionally, in the case of the ontology engineer the comprehensibility
of an ontology depends on his domain experience, while domain experts are
assumed to provide this know-how by definition. Factors contributing to
the complexity of the model are the size and expressivity of the ontology
and the number of imported models, together with the complexity of the import
dependency graph. The clarity of the model is mainly influenced by its
human-perceived readability.
Cost Driver OU

Complexity
  Very Low: complex dependency graph, large domain, complex representation language, no concept names
  Low: taxonomic dependency graph, large domain, complex representation language, concept names
  Nominal: taxonomic dependency graph, middle domain, moderate representation language, concept names
  High: no imports, middle domain, simple representation language, concept names
  Very High: no imports, small domain, simple representation language, concept names

Clarity
  Very Low: representation language know-how required, no comments in natural language, no metadata
  Low: representation language know-how required, no comments in natural language, no metadata
  Nominal: representation language tool, 30% comments in natural language, no metadata
  High: representation language tool, 60% comments in natural language, no metadata
  Very High: representation language tool, 90% comments in natural language, metadata

Table 10: Complexity and Clarity Levels for Ontology Understanding OU
The complexity of the ontology depends on three factors: the size of the
ontology, the expressivity of the used representation language and the structure
of the import graph containing the imported ontologies. The import graph structure
(DG, dependency graph) can be divided into simple, as in taxonomical tree
structures, and complex, as in non-tree structures. Furthermore, the complexity
of the used syntax (RL in Table 10) is termed simple for common taxonomical
hierarchies, moderate if further property types are used and complex in
the case of restrictions and axioms. The third ontology complexity driver
is related to the size of the ontology: small ontologies are supposed to
contain up to 100 ontological primitives, middle ontologies contain up to
1000 concepts, while ontologies with more than 1000 concepts are classified
as large in our model (see Table 10). The clarity categorization depends
on the readability/meaningfulness of ontological primitives, the technical
know-how required by the representation language and the availability of
natural language comments (comm.) and definitions. The understandability
of an ontology can be increased significantly when ontological primitives
are given meaningful names in a natural language which is familiar to the
ontology engineer and the domain expert respectively. Furthermore, a self-descriptive
representation language does not cause significant impediments in dealing
with an ontology, especially when user-friendly tools are available.
Ontologist/Domain Expert Unfamiliarity: UNFM
The effort related to ontology maintenance decreases significantly
in situations where the human user works frequently with the particular
ontology. This measure accounts for this dependency and distinguishes among
six levels, as depicted in Table 11.
Increment for Cost Driver OU (ontologist/domain expert unfamiliarity)
  0.0: self built
  0.2: team built
  0.4: every day usage
  0.6: occasional usage
  0.8: little experience
  1.0: completely unfamiliar

Table 11: OU Increment for UNFM
The UNFM factor increments the effect of the Ontology Understanding measure:
a 1.0 UNFM increment causes a 100% increase of the OU measure, while a
0.0 one does not have any influence on the final value of OU (see Table
11).
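The increment rule described above can be written down directly; the level names are taken from Table 11, and only the example input values are assumptions.

```python
# Table 11 increments for ontologist/domain-expert unfamiliarity (UNFM).
UNFM_INCREMENT = {
    "self built": 0.0,
    "team built": 0.2,
    "every day usage": 0.4,
    "occasional usage": 0.6,
    "little experience": 0.8,
    "completely unfamiliar": 1.0,
}

def effective_ou(ou_value: float, familiarity: str) -> float:
    """A 1.0 UNFM increment doubles OU; a 0.0 increment leaves it as-is."""
    return ou_value * (1.0 + UNFM_INCREMENT[familiarity])

# Example: nominal OU of 1.0 for a completely unfamiliar ontology.
v = effective_ou(1.0, "completely unfamiliar")  # 2.0
```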
Complexity of the Ontology Evaluation: OE
This measure accounts for the actual effort needed to evaluate
the ontology for reuse purposes (see Table 12). The measure assumes a satisfactory
level of ontology understanding and is associated solely with the efforts needed
to decide whether a given ontology satisfies a particular set of
requirements and to integrate its description into the overall product description.
Cost Driver OE
  Very Low: small number of tests, easily generated and reviewed
  Low: moderate number of tests
  Nominal: high number of tests
  High: considerable number of tests, easy to moderately difficult to generate and review
  Very High: extensive testing, difficult to generate and review

Table 12: The Ontology Evaluation Cost Driver OE
Complexity of the Ontology Modifications: OM
This measure reflects the complexity of the modifications required
by the reuse process after the evaluation phase has been completed (Table
13).
Cost Driver OM
  Very Low: few, simple modifications
  Low: some simple modifications
  Nominal: some moderate modifications
  High: considerable modifications
  Very High: excessive modifications

Table 13: The Ontology Modification Cost Driver OM
Ontology Translation: OT
Translating between knowledge representation languages is an
essential part of a reuse process. Depending on the compatibility of the
source and target representation languages, as well as on the availability
and performance of the translation tools (amount of pre- and post-processing
required), we differentiate among five values, as depicted in Table 13.
Cost Driver OT
  Very Low: direct translation
  Low: low manual effort
  Nominal: some manual effort
  High: considerable manual effort
  Very High: predominantly manual effort

Table 13: The Ontology Translation Cost Driver OT
2. Personnel Factors
Ontologist/Domain Expert Capability: OCAP/DECAP
The development of an ontology requires collaboration
between a team of ontology engineers (ontologists), usually with an advanced
technical background, and a team of domain experts who provide the necessary
know-how in the field to be ontologically modeled. These cost drivers account
for the perceived ability and efficiency of the individual actors involved
in the process, as well as for their teamwork capabilities.
Cost Drivers OCAP/DECAP
  Very Low: 15%
  Low: 35%
  Nominal: 55%
  High: 75%
  Very High: 95%

Table 14: The Capability of the Engineering Team Cost Drivers OCAP/DECAP
Ontologist/Domain Expert Experience: OEXP/DEEXP
These measures take into account the experience of the engineering
team, consisting of both ontologists and domain experts, w.r.t. the ontology
engineering process. They are not related to the abilities of single team
members, but relate directly to the experience in constructing ontologies
and in conceptualizing a specific domain, respectively.
Cost Drivers OEXP/DEEXP
          Very Low   Low        Nominal   High       Very High
  OEXP:   2 months   6 months   1 year    1.5 years  3 years
  DEEXP:  6 months   1 year     3 years   5 years    7 years

Table 15: The Ontologists and Domain Experts Experience Cost Drivers OEXP/DEEXP
Language and Tool Experience: LEXP/TEXP
The aim of these cost drivers is to measure the level of experience
of the project team constructing the ontology w.r.t. the conceptualization
language and the ontology management tools, respectively. The conceptualization
phase requires the usage of knowledge representation languages with appropriate
expressivity (such as Description Logics or Prolog), while the concrete
implementation depends on support tools such as editors, validators
and reasoners. The distinction between language and tool experience is justified
by the fact that, while ontology languages rely on established knowledge
representation languages from the Artificial Intelligence field and are
thus possibly familiar to the ontology engineer, tool experience implies
explicitly the previous usage of typical ontology management tools and is
not directly conditioned by the know-how of the engineering team in the
KR field. The maximal time values for the tool experience are adapted to
the ontology management field and are thus lower than the corresponding
language experience ratings (see Table 16).
Cost Drivers LEXP/TEXP
          Very Low   Low        Nominal   High       Very High
  LEXP:   2 months   6 months   1 year    3 years    6 years
  TEXP:   2 months   6 months   1 year    1.5 years  3 years

Table 16: The Language and Tool Experience Cost Drivers LEXP/TEXP
Personnel Continuity: PCON
As in other engineering disciplines, frequent changes in the
project team are a major obstacle to the success of an ontology engineering
process within given budget and time constraints. Due to the small size
of the project teams, we adapted the general ratings of the COCOMO model
to a maximal team size of 10 (Table 17).
Cost Driver PCON
          Very Low   Low       Nominal   High       Very High
  PCON:   6 years    3 years   1 year    6 months   2 months

Table 17: The Personnel Continuity Cost Driver PCON
3. Project Factors
Tool Support: TOOL
We take account of the different levels of tool support for
the different phases of an ontology engineering process (domain analysis,
conceptualization, implementation, ontology understanding and evaluation,
ontology instantiation, ontology modification, ontology translation, ontology
integration and documentation) by means of a single general-purpose cost
driver and calculate the final value as the average tool support across
the entire process. The ratings for tool support are defined at a general
level, as shown in Table 18 below.
Cost Driver TOOL
  Very High: high-quality tool support, no manual intervention needed
  High: little manual processing required
  Nominal: basic manual intervention needed
  Low: some tool support
  Very Low: minimal tool support, mostly manual processing

Table 18: The Tool Support Cost Driver TOOL
The rating of the cost driver should be specified for each of the most prominent
process phases, while the importance of the corresponding phase is expressed
in terms of weights. The global TOOL value for a specific project is calculated
as a normalized sum of the weighted local values.
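The normalized weighted sum described above can be sketched as follows; the phase names, ratings and weights in the example are illustrative assumptions.

```python
# Global TOOL value as a normalized sum of weighted per-phase ratings.
# Phase names and weights below are ASSUMED for illustration.

def tool_value(phase_ratings: dict[str, float],
               weights: dict[str, float]) -> float:
    """Normalized weighted average of the local TOOL ratings."""
    total_weight = sum(weights[p] for p in phase_ratings)
    weighted = sum(weights[p] * r for p, r in phase_ratings.items())
    return weighted / total_weight

ratings = {"conceptualization": 1.0, "implementation": 0.8, "evaluation": 1.2}
weights = {"conceptualization": 0.5, "implementation": 0.3, "evaluation": 0.2}
v = tool_value(ratings, weights)
```

Normalizing by the total weight keeps the global value on the same scale as the local ratings, regardless of how many phases are rated.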
Multisite Development: SITE
Constructing an ontology requires intensive communication
between ontology engineers and domain experts on the one hand, and among
domain experts for consensus-building purposes on the other. This measure
involves the assessment of the available communication support (Table 19).
Cost Driver SITE
  Very Low: mail
  Low: phone, fax
  Nominal: email
  High: teleconference, occasional meetings
  Very High: frequent face-to-face meetings

Table 19: The Multisite Ontology Development Cost Driver SITE
Required Development Schedule: SCED
This cost driver takes into account the particularities of
the engineering process under given schedule constraints. Accelerated
schedules (ratings below 100%, see Table 20) tend to produce more effort
in the refinement and evolution steps, due to the lack of time for an
elaborate domain analysis and conceptualization. Stretched-out schedules
(over 100%) generate more effort in the earlier phases of the process,
while the evolution and refinement tasks are in the best case negligible.
Cost Driver SCED
          Very Low   Low    Nominal   High   Very High
  SCED:   75%        85%    100%      130%   160%

Table 20: The Required Development Schedule Cost Driver SCED
Table 20: The Required Development Schedule Cost Driver SCED |
For example, a high SCED value of 130% (Table 20) represents a stretch-out
of the nominal schedule by 30%, and thus more resources in the domain analysis
and conceptualization.