Data model

A data model is an abstract model that describes how data is represented and accessed. Data models formally define data objects and relationships among data objects for a domain of interest. Some typical applications of database models include supporting the development of databases and enabling the exchange of data for a particular area of interest. Data models are specified in a data modeling language.


Managing large quantities of structured and unstructured data is a primary function of information systems. Data models describe structured data for storage in data management systems such as relational databases. They typically do not describe unstructured data, such as word processing documents, email messages, pictures, digital audio, and video.

The role of data models

Data models support data and computer systems by providing the definition and format of data. If this is done consistently across systems then compatibility of data can be achieved. If the same data structures are used to store and access data then different applications can share data. The results of this are indicated above. However, systems and interfaces often cost more than they should, to build, operate, and maintain. They may also constrain the business rather than support it. A major cause is that the quality of the data models implemented in systems and interfaces is poor.

  • Business rules, specific to how things are done in a particular place, are often fixed in the structure of a data model. This means that small changes in the way business is conducted lead to large changes in computer systems and interfaces.
  • Entity types are often not identified, or incorrectly identified. This can lead to replication of data, data structure, and functionality, together with the attendant costs of that duplication in development and maintenance.
  • Data models for different systems are arbitrarily different. The result of this is that complex interfaces are required between systems that share data. These interfaces can account for between 25-70% of the cost of current systems.
  • Data cannot be shared electronically with customers and suppliers, because the structure and meaning of data has not been standardised. For example, engineering design data and drawings for process plant are still sometimes exchanged on paper.

The reason for these problems is a lack of standards that will ensure that data models will both meet business needs and be consistent.

Three perspectives

A data model instance may be one of three kinds according to ANSI in 1975:

  • Conceptual schema : describes the semantics of a domain, being the scope of the model. For example, it may be a model of the interest area of an organization or industry. This consists of entity classes, representing kinds of things of significance in the domain, and relationships assertions about associations between pairs of entity classes. A conceptual schema specifies the kinds of facts or propositions that can be expressed using the model. In that sense, it defines the allowed expressions in an artificial 'language' with a scope that is limited by the scope of the model.
  • Logical schema : describes the semantics, as represented by a particular data manipulation technology. This consists of descriptions of tables and columns, object oriented classes, and XML tags, among other things.
  • Physical schema : describes the physical means by which data are stored. This is concerned with partitions, CPUs, tablespaces, and the like.

The significance of this approach, according to ANSI, is that it allows the three perspectives to be relatively independent of each other. Storage technology can change without affecting either the logical or the conceptual model. The table/column structure can change without (necessarily) affecting the conceptual model. In each case, of course, the structures must remain consistent with the other model. The table/column structure may be different from a direct translation of the entity classes and attributes, but it must ultimately carry out the objectives of the conceptual entity class structure. Early phases of many software development projects emphasize the design of a conceptual data model. Such a design can be detailed into a logical data model. In later stages, this model may be translated into physical data model. However, it is also possible to implement a conceptual model directly.


In the 1960s the concept of management information system (MIS) was initiated. During that time, the information system provided the data and information for management purposes. The first generation database system, called Integrated Data Store (IDS), was designed by Charles Bachman at General Electric. Two famous database models, the network data model and the hierarchical data model, where proposed during this periode of time. Prior to the development of the first database management system (DMMS), access to data was provided by application programs that accessed flat files. The data integrity problem and the inability of such file processing systems to represent logical data relationships lead to the first data model: the hierarchical data model. This model, which was implemented primarily by IBM's Information Management System (IMS) only allows one-to-one or one-to-many relationships between entities. Any entity at the many end of the relationship can be related only to one netity at the one end. End 1960s Edgar F. Codd worked out his theories of data arrangement, and proposed the relational model for database management based on first-order predicate logic.

In the 1970s entity relationship modeling was created as a means of graphically representing data structures. An entity-relationship model (ERM) is an abstract conceptual representation of structured data. Entity-relationship modeling is a relational schema database modeling method, used in software engineering to produce a type of conceptual data model (or semantic data model) of a system, often a relational database, and its requirements in a top-down fashion. Diagrams created using this process are called entity-relationship diagrams, or ER diagrams or ERDs for short. Originally proposed in 1976 by Peter Chen.

In the 1980s, a significantly new approach to data modeling was engineered by G.M. Nijssen. Deemed NIAM, short for “Nijssen’s Information Analysis Methodology,” it has since been re-named ORM, or “object role modeling.” The purpose is to show representations of relationships instead of showing types of entities as relational table analogs. With a focus on the use of language in making data modeling more accessible to a wider audience, ORM has a much higher potential for describing business regulations as well as constraints.

The development of the object-oriented paradigm brought about a fundamental change in the way we look at data and the procedures that operate on data. traditionally, data and procedures have been stored seperately, the data and their relationship in a database, the procedures in and application program. Object orientation, however, combined an entity's procedure with its data.

Types of data models

Database model

A database model is a theory or specification describing how a database is structured and used. Several such models have been suggested. Common models include:

  • Flat model: This may not strictly qualify as a data model. The flat (or table) model consists of a single, two-dimensional array of data elements, where all members of a given column are assumed to be similar values, and all members of a row are assumed to be related to one another.
  • Hierarchical model: In this model data is organized into a tree-like structure, implying a single upward link in each record to describe the nesting, and a sort field to keep the records in a particular order in each same-level list.
  • Network model: This model organizes data using two fundamental constructs, called records and sets. Records contain fields, and sets define one-to-many relationships between records: one owner, many members.
  • Relational model: is a database model based on first-order predicate logic. Its core idea is to describe a database as a collection of predicates over a finite set of predicate variables, describing constraints on the possible values and combinations of values.

  • Entity-relationship model: is an abstract conceptual representation of structured data, which produce a conceptual data model a system, and its requirements in a top-down fashion.
  • Object-relational model: Similar to a relational database model, but objects, classes and inheritance are directly supported in database schemas and in the query language.
  • Star schema is the simplest style of data warehouse schema. The star schema consists of a few "fact tables" (possibly only one, justifying the name) referencing any number of "dimension tables". The star schema is considered an important special case of the snowflake schema.

Geographic data model

A data model in Geographic information systems is a mathematical construct for representing geographic objects or surfaces as data. For example, the vector data model represents geography as collections of points, lines, and polygons; the raster data model represent geography as cell matrixes that store numeric values; and the Triangulated irregular network (TIN) data model represents geography as sets of contiguous, nonoverlapping triangles.

Generic data model

Generic data models are generalizations of conventional data models. They define standardised general relation types, together with the kinds of things that may be related by such a relation type. Generic data models are developed as an approach to solve some shortcomings of conventional data models. For example, different modelers usually produce different conventional data models of the same domain. This can lead to difficulty in bringing the models of different people together and is an obstacle for data exchange and data integration. Invariably, however, this difference is attributable to different levels of abstraction in the models and differences in the kinds of facts that can be instantiated (the semantic expression capabilities of the models). The modelers need to communicate and agree on certain elements which are to be rendered more concretely, in order to make the differences less significant.

Semantic data model

A semantic data model in software engineering is a technique to define the meaning of data within the context of its interrelationships with other data. A semantic data model is an abstraction which defines how the stored symbols relate to the real world. A semantic data model is sometimes called a conceptual data model.

The logical data structure of a database management system (DBMS), whether hierarchical, network, or relational, cannot totally satisfy the requirements for a conceptual definition of data because it is limited in scope and biased toward the implementation strategy employed by the DBMS. Therefore, the need to define data from a conceptual view has led to the development of semantic data modeling techniques. That is, techniques to define the meaning of data within the context of its interrelationships with other data. As illustrated in the figure. The real world, in terms of resources, ideas, events, etc., are symbolically defined within physical data stores. A semantic data model is an abstraction which defines how the stored symbols relate to the real world. Thus, the model must be a true representation of the real world.

Related models

Information model

An Information model is not a type of data model, but more or less an alternative model. Within the field of software engineering both a data model and an information model can be abstract, formal representations of entity types that includes their properties, relationships and the operations that can be performed on them. The entity types in the model may be kinds of real-world objects, such as devices in a network, or they may themselves be abstract, such as for the entities used in a billing system. Typically, they are used to model a constrained domain that can be described by a closed set of entity types, properties, relationships and operations.

According to Lee (1999) an information model in is a representation of concepts, relationships, constraints, rules, and operations to specify data semantics for a chosen domain of discourse. It can provide sharable, stable, and organized structure of information requirements for the domain context. More in general the term information model is used for models of individual things, such as facilities, buildings, process plants, etc. In those cases the concept is specialised to Facility Information Model, Building Information Model, Plant Information Model, etc. Such an information model is an integration of a model of the facility with the data and documents about the facility.

An information model provides formalism to the description of a problem domain without constraining how that description is mapped to an actual implementation in software. There may be many mappings of the information model. Such mappings are called data models, irrespective of whether they are object models (e.g. using UML), entity relationship models or XML schemas.

Method related models

A lot of the existing data modeling methods, software development methodologies, and other modeling languages in the field of computer science have defined their own type of models.

Object model

An object model in computer science is a collection of objects or classes through which a program can examine and manipulate some specific parts of its world. In other words, the object-oriented interface to some service or system. Such an interface is said to be the object model of the represented service or system. For example, the Document Object Model (DOM) is a collection of objects that represent a page in a web browser, used by script programs to examine and dynamically change the page. There is a Microsoft Excel object model for controlling Microsoft Excel from another program, and the ASCOM Telescope Driver is an object model for controlling an astronomical telescope.

In computing the term object model has a distinct second meaning of the general properties of objects in a specific computer programming language, technology, notation or methodology that uses them. For example, the Java object model, the COM object model, or the object model of OMT. Such object models are usually defined using concepts such as class, message, inheritance, polymorphism, and encapsulation. There is an extensive literature on formalized object models as a subset of the formal semantics of programming languages.

Unified Modeling Language models

The Unified Modeling Language (UML) is a standardized general-purpose modeling language in the field of software engineering. It is a graphical language for visualizing, specifying, constructing, and documenting the artifacts of a software-intensive system. The Unified Modeling Language offers a standard way to write a system's blueprints, including:

The UML is offering a mix of functional models, data models, and database models.

Data model topics

Data properties

Some important properties of data for which requirements need to be met are definition related properties:

  • relevance: the usefulness of the data in the context of your business.
  • clarity: the availability of a clear and shared definition for the data.
  • consistency: the compatibility of the same type of data from different sources.

And content related properties such as:

  • timeliness: the availability of data at the time required and how up to date that data is.
  • accuracy: how close to the truth the data is.

And finally related to both are:

  • completeness: how much of the required data is available.
  • accessibility: where, how, and to whom the data is available or not available (e.g. security).
  • cost: the cost incurred in obtaining the data, and making it available for use.

Data Models address the properties related to the definition of data.

Data organization

Another kind of data model describes how to organize data using a database management system or other data management technology. It describes, for example, relational tables and columns or object-oriented classes and attributes. Such a data model is sometimes referred to as the physical data model, but in the original ANSI three schema architecture, it is called "logical". In that architecture, the physical model describes the storage media (cylinders, tracks, and tablespaces). Ideally, this model is derived from the more conceptual data model described above. It may differ, however, to account for constraints like processing capacity and usage patterns.

While data analysis is a common term for data modeling, the activity actually has more in common with the ideas and methods of synthesis (inferring general concepts from particular instances) than it does with Analysis (identifying component concepts from more general ones). {Presumably we call ourselves systems analysts because no one can say systems synthesists.} Data modeling strives to bring the data structures of interest together into a cohesive, inseparable, whole by eliminating unnecessary data redundancies and by relating data structures with relationships.

A different approach is through the use of adaptive systems such as artificial neural networks that can autonomously create implicit models of data.

Data structure

A data structure is a way of storing data in a computer so that it can be used efficiently. It is an organization of mathematical and logical concepts of data. Often a carefully chosen data structure will allow the most efficient algorithm to be used. The choice of the data structure often begins from the choice of an abstract data type.

A data model describes the structure of the data within a given domain and, by implication, the underlying structure of that domain itself. This means that a data model in fact specifies a dedicated grammar for a dedicated artificial language for that domain. A data model represents classes of entities (kinds of things) about which a company wishes to hold information, the attributes of that information, and relationships among those entities and (often implicit) relationships among those attributes. The model describes the organization of the data to some extent irrespective of how data might be represented in a computer system.

The entities represented by a data model can be the tangible entities, but models that include such concrete entity classes tend to change over time. Robust data models often identify abstractions of such entities. For example, a data model might include an entity class called "Person", representing all the people who interact with an organization. Such an abstract entity class is typically more appropriate than ones called "Vendor" or "Employee", which identify specific roles played by those people.

Data model theory

The term data model can have two meanings:

  1. A data model theory, i.e. a formal description of how data may be structured and accessed.
  2. A data model instance, i.e. applying a data model theory to create a practical data model instance for some particular application.

A data model theory has three main components:

  • The structural part: a collection of data structures which are used to create databases representing the entities or objects modeled by the database.
  • The integrity part: a collection of rules governing the constraints placed on these data structures to ensure structural integrity.
  • The manipulation part: a collection of operators which can be applied to the data structures, to update and query the data contained in the database.

For example, in the relational model, the structural part is based on a modified concept of the mathematical relation; the integrity part is expressed in first-order logic and the manipulation part is expressed using the relational algebra, tuple calculus and domain calculus.

A Data Model Instance is created by applying a Data Model Theory. This is typically done to solve some business enterprise requirement. Business requirements are normally captured by a semantic logical data model. This is transformed into a physical Data Model Instance from which is generated a physical database. For example, a Data modeler may use a data modeling tool to create an Entity-relationship model of the Corporate data repository of some business enterprise. This model is transformed into a relational model, which in turn generates a relational database.

Zachman Framework

In an alternative framework, called the Zachman Framework, a data model instance may be one of six kinds (according to John Zachman, 1987, 1992, 2005, 2007):

  • a contextual data model (list) identifies entity classes (representing things of significance to the organization).
  • a conceptual data model (semantics) defines the meaning of the things in an organization. This consists relationships (assertions about associations between pairs of entity classes).
  • a Logical schema | logical data model (schema) describes the logic representation of the properties without regard to a particular data manipulation technology. This consists of descriptions of the attributes (role a data element plays in relation to the thing (entity) it represents.
  • a Physical schema | physical data model (blueprint) describes the physical means by which data are stored. This is concerned with partitions, CPUs, tablespaces, and the like.
  • a data definition (configuration) This is the actual language coding of the database schema in the chosen development platform.
  • a data instantiation holds the values of properties applied to the data in the schema.

The significance of this approach, according to John Zachman, is that it allows the six perspectives to be relatively independent of each other and have different contributors, audiences and purposes. In each case, of course, the structures must remain consistent with the other model instances although the details change. The table/column structure may be different from a direct translation of the entity classes, relationships and attributes, but it must ultimately carry out the objectives of the contextual entity class structure and conceptual relationship structure. Zachman regards each perspective a separate and distinct vantage point of the data: his view is not a methodology but rather a way of classifying the parts, however development projects and software tools often proceed from Contextual list, to conceptual data model, followed by the Logical schema|logical data model. In later stages when the data platform is known (whether it be database software or filing cabinets), this model may be translated into a Physical schema|physical data model followed by the data definition. When the database actually stores values and is operational data manipulation can take place.

See also


Further reading

  • David C. Hay (1996). Data Model Patterns: Conventions of Thought. New York:Dorset House Publishers, Inc.
  • Matthew West and Julian Fowler (1999). Developing High Quality Data Models The European Process Industries STEP Technical Liaison Executive (EPISTLE).
  • Len Silverston (2001). The Data Model Resource Book Volume 1/2. John Wiley & Sons.
  • RFC 3444 - On the Difference between Information Models and Data Models

External links

Search another word or see Data_modelon Dictionary | Thesaurus |Spanish
Copyright © 2015, LLC. All rights reserved.
  • Please Login or Sign Up to use the Recent Searches feature