Data Modeling Concepts for Beginners

The concepts of Data Modeling support a holistic picture of how data moves through a system. Data Modeling can be described as the process of designing a data system or part of a data system. These models can range from storage systems to databases to the organization’s entire data structure. Data models can be used as a design for implementing a new system or as reference material for systems that have already been established. 

A “complete” data model should communicate the types of data that are used and stored within a data system, the formats used, the relationships between data files, and the ways the data can be grouped and organized.

Many businesses develop unique, individual data models (and the resulting unique, individual data systems) built around the organization’s specific needs and requirements. These models can be used to visualize data movement through the system. A data model can attempt to cover all aspects of the data flow through an organization, or it can be limited to a specific part, such as showing only sales data for research purposes.

A well-designed data model will document the business rules, as well as the regulatory compliance requirements that apply to the data.

There are three phases in the Data Modeling process: the conceptual model, the logical model, and the physical model. Each phase, or stage of the model’s development, serves a specific purpose. Additionally, there are several “types” of models.

Visual data models are similar to an architect’s blueprints and can be supported with linked text to provide guidance when developing or altering the data system.

The Benefits and Challenges of Data Modeling

Developing a data model provides a map and a communications tool for creating or modifying a data system. Data Modeling concepts make the construction of a data system considerably easier. The newly built database and/or data system should support good organizational communications. It should also support real-time projects, including gathering data on spending patterns, invoices, and other business processes.

The Data Modeling process can be used to identify Data Quality issues, including duplicate, redundant, and missing data.
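As a rough illustration of the kinds of checks described above, the following Python sketch scans a small list of records for duplicates and missing values. The records, the field names, and the `find_quality_issues` helper are hypothetical and exist only for this example.

```python
from collections import Counter

# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "name": "Ada", "email": "ada@example.com"},
    {"id": 2, "name": "Bob", "email": None},
    {"id": 3, "name": "Ada", "email": "ada@example.com"},
]


def find_quality_issues(rows, key_fields):
    """Return (duplicate_keys, missing) for a list of dict records."""
    # Count how often each combination of key fields appears.
    counts = Counter(tuple(row.get(f) for f in key_fields) for row in rows)
    duplicate_keys = [key for key, n in counts.items() if n > 1]
    # Collect (record id, field name) for every empty field.
    missing = [(row["id"], f) for row in rows for f, v in row.items() if v is None]
    return duplicate_keys, missing


duplicates, missing = find_quality_issues(records, ["name", "email"])
print(duplicates)  # [('Ada', 'ada@example.com')]
print(missing)     # [(2, 'email')]
```

Which fields count as a duplicate key is itself a modeling decision; here name plus email is assumed.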

One difficulty in creating a data model is a lack of understanding of data systems – a problem normally eliminated by building the model. Another problem is that a small change in one area may require significant changes in other areas. Additionally, it can be easy to become so focused on the data system’s structure that the strengths and weaknesses of individual applications are ignored.

Important Questions to Ask

Developing a data model begins with collecting information about the organization’s needs, requirements, and goals. A model of part of the system will require fewer questions than developing a model for an entirely new system. Some basic questions to ask for a model of part of the system are: 

  • What is the purpose or goal of the changes?
  • What types of data is the system currently working with?
  • What data is needed?
  • What tools or software are needed to achieve the goal?
  • Are the tools or software compatible?

A data model should be built around the organization’s needs, which are an important factor in developing a new model or adjusting an old one. The questions asked when designing a database, or an entirely new system, often require much more extensive answers. It’s best to incorporate a five-year business plan when answering these questions: 

  • What are the business’s goals (research, sales, app development, accounting services)? This will determine the best types of software to support the business (NoSQL or graphics for research, SQL for basic sales or accounting, access to various clouds or several cloud services for app development).
  • What types of software are most appropriate and cost-effective for the organization?
  • How many people will be accessing the system simultaneously?
  • How many departments are there, and how many people are in each department?
  • Will different departments require different kinds of software?
  • Are there any unusual needs that should be considered? 
  • How much data will need to be stored?
  • Is scalability an issue?
  • Will the database connect to business intelligence tools?
  • Are online analytical processing (OLAP), online transaction processing (OLTP), or both needed?
  • Will the database integrate with the current tech stack?
  • Will the data’s format need to be transformed?
  • What are your preferred programming languages?
  • Will it be integrated with any machine learning software?

The Three Phases of Data Modeling

Data Modeling became important during the 1960s, when management information systems were first becoming popular. (Before the ’60s, there was little in the way of actual data storage. Computers of that time were basically giant calculators.) 

In terms of Data Modeling concepts, a fully developed data model is often built in three phases: the conceptual model, the logical model, and the physical model. This design process provides a clear understanding of the data system and how the data flows through it. This process also shows how the storage procedures work and helps to ensure that all data objects in the system are represented. (If data is information that has been stored electronically, then a data object is an individual collection of information stored electronically, such as a file or a data table.)

The conceptual data model is typically used to describe the system’s most basic components and how the data moves through the system. The conceptual data model communicates how information moves through one department and on to the next. It shows broad entities (representations of things that exist in reality) and their relationships (associations that exist between two or more entities). Detailed information is generally omitted.
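A conceptual model can be sketched in a few lines of Python as nothing more than a set of entities and the named relationships between them. The entity names (`Customer`, `Order`, `Product`) and the `related_to` helper are assumptions chosen for illustration.

```python
# Hypothetical conceptual model: broad entities and named relationships only.
entities = {"Customer", "Order", "Product"}

# Each relationship is (entity, verb, entity); no attributes or data types yet.
relationships = [
    ("Customer", "places", "Order"),
    ("Order", "contains", "Product"),
]

# Sanity check: every relationship must connect two known entities.
for source, _verb, target in relationships:
    assert source in entities and target in entities


def related_to(entity):
    """Return the entities directly connected to the given entity."""
    linked = set()
    for source, _verb, target in relationships:
        if source == entity:
            linked.add(target)
        elif target == entity:
            linked.add(source)
    return linked


print(sorted(related_to("Order")))  # ['Customer', 'Product']
```

Attributes, data types, and keys are deliberately absent at this level of detail.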

The logical data model normally focuses on the layout and structure of data objects within the model and establishes the relationships between them. It also provides a foundation for building the physical model. The logical data model adds useful information to the conceptual model.
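One possible way to picture a logical model is as a set of typed record definitions that name each attribute and the references between data objects, without committing to a particular database product. The sketch below uses Python dataclasses; the `Customer` and `Order` classes and their fields are hypothetical.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class Customer:
    customer_id: int
    name: str


@dataclass
class Order:
    order_id: int
    customer_id: int  # refers to Customer.customer_id
    placed_on: date


# The logical model lists attributes and their relationships,
# but says nothing yet about storage engines, indexes, or triggers.
order_attributes = [f.name for f in fields(Order)]
print(order_attributes)  # ['order_id', 'customer_id', 'placed_on']
```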

The physical data model is essentially a pre-implementation model. It is very detailed and often focused on the database design. It shows the necessary details for developing the database (but can also be used to implement a new part of the system). The physical model makes visualizing the data structure much easier by communicating database constraints, column keys, triggers, and other data management features. It also communicates access profiles, authorizations, primary and foreign keys, etc.
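To make the physical level more concrete, here is a minimal sketch using Python’s built-in `sqlite3` module. It assumes SQLite as the target database, and the table names, columns, and index name are invented for the example. It shows primary keys, a foreign key, a check constraint, and an index expressed as actual DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    total_cents INTEGER NOT NULL CHECK (total_cents >= 0)
);
CREATE INDEX idx_order_customer ON customer_order(customer_id);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.execute("INSERT INTO customer_order VALUES (10, 1, 2500)")

# An order for a customer that does not exist should be rejected.
try:
    conn.execute("INSERT INTO customer_order VALUES (11, 99, 100)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
print(fk_enforced)  # True
```

Other database systems express the same ideas with their own syntax and additional features such as access control, which SQLite does not provide.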

Different Types of Data Models

Below are some examples of the different types of data models.

The hierarchical model is fairly old and was quite popular in the 1960s and ’70s. It organizes the data into tree-like structures. Today, it is used primarily for file systems and geographic information. In the hierarchical model, the data is organized in one-to-many relationships: each parent record can have many child records, but each child has only one parent.
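A tree of this kind can be sketched in Python with a simple mapping from each node to its single parent. The folder and file names below are made up, and the helpers `children_of` and `path_to` are hypothetical.

```python
# Hypothetical file-system-like hierarchy: each node has exactly one parent.
parent_of = {
    "reports": "root",
    "images": "root",
    "q1.pdf": "reports",
    "q2.pdf": "reports",
    "logo.png": "images",
}


def children_of(node):
    """One-to-many: a parent may have many children."""
    return sorted(child for child, parent in parent_of.items() if parent == node)


def path_to(node):
    """Walk up the single chain of parents until the root is reached."""
    path = [node]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return "/".join(reversed(path))


print(children_of("reports"))  # ['q1.pdf', 'q2.pdf']
print(path_to("q1.pdf"))       # root/reports/q1.pdf
```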

The network model is similar to the hierarchical model but permits the creation of more varied relationships between linked records. The network model allows people to construct the model using sets of related records. Each record can be associated with multiple files and data objects, which allows complex relationships to be represented.
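The difference from a strict tree can be sketched as follows: a member record may belong to several owners at once. The record names and the owner/member terminology in this Python example are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical set memberships, written as (owner record, member record).
links = [
    ("Supplier:Acme", "Part:bolt"),
    ("Supplier:Bolt&Co", "Part:bolt"),
    ("Project:bridge", "Part:bolt"),
    ("Supplier:Acme", "Part:nut"),
]

owners = defaultdict(set)   # member -> all of its owners
members = defaultdict(set)  # owner  -> all of its members
for owner, member in links:
    owners[member].add(owner)
    members[owner].add(member)

# Unlike a tree, one record can have several owners.
print(sorted(owners["Part:bolt"]))
# ['Project:bridge', 'Supplier:Acme', 'Supplier:Bolt&Co']
print(sorted(members["Supplier:Acme"]))  # ['Part:bolt', 'Part:nut']
```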

The entity-relationship model is a graphical representation of data files and entities and their relationships. It attempts to represent real-world scenarios. As a data system model, the entity-relationship model defines entity sets, relationship sets, attributes, and constraints. It is often used in designing relational databases.

The graph data model requires determining which entities within your dataset should be designated nodes, which should be designated links, and which ones should be discarded. The graph data model provides a layout of the data’s entities, properties, and relationships. The process is repetitive, relies on trial and error, and can be tedious, but is worth doing right.  
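A small Python sketch can show what the result of those decisions might look like: some items become nodes with properties, and others become labeled links. The node names, the `KNOWS` and `WORKS_AT` labels, and the helper functions are all hypothetical.

```python
# Hypothetical graph: nodes carry properties, links carry a relationship label.
nodes = {
    "alice": {"type": "Person"},
    "bob": {"type": "Person"},
    "acme": {"type": "Company"},
}
edges = [
    ("alice", "KNOWS", "bob"),
    ("alice", "WORKS_AT", "acme"),
    ("bob", "WORKS_AT", "acme"),
]


def neighbors(node, label):
    """Follow outgoing links with the given label."""
    return [dst for src, lbl, dst in edges if src == node and lbl == label]


def coworkers_of(person):
    """People who share at least one WORKS_AT target with the given person."""
    companies = set(neighbors(person, "WORKS_AT"))
    return sorted(
        src
        for src, lbl, dst in edges
        if lbl == "WORKS_AT" and dst in companies and src != person
    )


print(neighbors("alice", "KNOWS"))  # ['bob']
print(coworkers_of("alice"))        # ['bob']
```

Had the employer been stored as a property of each person instead of as a node, the coworker question would need a different kind of lookup, which is one reason the modeling step tends to be iterative.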

The object-oriented database model focuses on data objects associated with methods and features. It incorporates tables but is not necessarily limited to tables. Data and its relationships are stored together as a single entity (a data object). Data objects represent real-world entities. The object-oriented database model handles a variety of formats and is used for research.
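The idea of storing data together with its behavior and its relationships can be sketched with plain Python classes. This shows the modeling style only, not an object database product; the `Invoice` and `LineItem` classes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LineItem:
    description: str
    unit_price_cents: int
    quantity: int = 1

    def subtotal(self) -> int:
        """Behavior lives next to the data it operates on."""
        return self.unit_price_cents * self.quantity


@dataclass
class Invoice:
    number: str
    # The relationship to line items is stored inside the object itself.
    items: list = field(default_factory=list)

    def total(self) -> int:
        return sum(item.subtotal() for item in self.items)


invoice = Invoice("INV-1", [LineItem("widget", 250, 4), LineItem("shipping", 500)])
print(invoice.total())  # 1500
```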

The relational model, usually queried with SQL, is currently the most popular data model. It uses two-dimensional tables for storing data and communicating relationships. Each table holds data of a certain type, with one record per row. Tables are linked through shared key values, and joining them brings the related data together. The relational database model is a mature model supported by a massive amount of software for a variety of purposes.
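A brief sketch with Python’s built-in `sqlite3` module shows two tables and a join that relates them. The `customer` and `sale` tables and their contents are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sale (
    sale_id      INTEGER PRIMARY KEY,
    customer_id  INTEGER,
    amount_cents INTEGER
);
INSERT INTO customer VALUES (1, 'Ada'), (2, 'Bob');
INSERT INTO sale VALUES (10, 1, 500), (11, 1, 700), (12, 2, 300);
""")

# Joining on the shared customer_id column relates sales to customers.
rows = conn.execute("""
SELECT c.name, SUM(s.amount_cents)
FROM customer AS c
JOIN sale AS s ON s.customer_id = c.customer_id
GROUP BY c.name
ORDER BY c.name
""").fetchall()
print(rows)  # [('Ada', 1200), ('Bob', 300)]
```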

NoSQL data models do not rely on the fixed rows and columns of relational tables and usually do not enforce a set structure. Their development and design are typically focused on creating physical data models. Scalability, with its specific quirks and problems, is a significant concern. 
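NoSQL covers several quite different designs. As one example, and only as an assumption about which design is meant, the Python sketch below imitates a document-style store in which records in the same collection do not share a fixed set of fields. The collection contents and the `find` helper are hypothetical.

```python
import json

# Hypothetical document collection; documents need not share the same fields.
collection = [
    {"_id": 1, "name": "Ada", "tags": ["vip"]},
    {"_id": 2, "name": "Bob", "address": {"city": "Oslo"}},
]


def find(docs, **criteria):
    """Return documents whose top-level fields match all given criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


matches = find(collection, name="Bob")
print(matches[0]["address"]["city"])  # Oslo

# Documents are commonly exchanged as JSON text.
serialized = json.dumps(collection[0], sort_keys=True)
print(serialized)  # {"_id": 1, "name": "Ada", "tags": ["vip"]}
```

Because nothing enforces a structure here, any rules about required fields have to be handled by the application or by the specific database in use.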

An object-relational database model combines the object-oriented database model with the relational database model. It stores objects, classes, inheritance, etc., in the same way as an object-oriented model, but also supports tabular structures like the relational database model. This design allows designers to incorporate object-oriented features into a table structure.

The Importance of Data Modeling Concepts

Data models are like blueprints, but they define the relationships, entities, and attributes of a database or data system. An organized and well-designed data model is necessary for developing an efficient physical database and data system. A good understanding of Data Modeling concepts is needed to eliminate storage problems and redundancy issues while supporting efficient data retrieval. 

Data Modeling can be a challenge, and it is important to recognize that each type of model comes with its own benefits and drawbacks. 

