2.5.13  Database Design Techniques and Deliverables

2.5.13.1  (04-01-2005)
Introduction

  1. Database design is a technique that involves the analysis, design, description, and specification of data designed for automated business data processing. This technique uses models to enhance communication between developers and customers.

  2. Data models and supporting descriptions are the tools used in database design. These tools become the deliverables that result from applying database design. There are two primary objectives for developing of these deliverables. The first objective is to produce documentation that describes a customer’s perspective of data and the relationships among this data. The second objective is to produce documentation that describes the customer organization's environment, operations and data needs. In accomplishing these objectives, the following deliverables result:

    • Decision Analysis and Description Forms

    • Task Analysis and Description Forms

    • Task/Data Element Usage Matrix

    • Data Models

    • Entity-Attribute Lists

    • Data Definition Lists

    • Physical Database Specifications Document

  3. Consider a database approach if one or more of the following conditions exist in the user environment:

    • A multiple number of applications are to be supported by the system.

    • A multiple number of processes or activities use a multiple number of data sources.

    • A multiple number of data sources are used in the reports produced.

    • The data, from the data definitions, are known to be in existing database(s).

    • The development effort is to enhance the capabilities of an existing database.

  4. If it appears that conditions would support database development, then undertake the activities of logical database analysis and design. When the logical schema and sub schemas are completed they are translated into their physical counterparts. Then the physical sub schemas are supplied as part of the data specifications for program design. The exact boundary between the last stages of logical design and the first stages of physical analysis is difficult to assess because of the lack of standard terminology. However, there seems to be general agreement that logical design encompasses a DBMS-independent view of data and that physical design results in a specification for the database structure, as it is to be physically stored. The design step between these two that produces a schema that can be processed by a DBMS is called implementation design.

  5. Do not limit database development considerations to providing random access or ad hoc query capabilities for the system. However, even if conditions appear to support database development, postpone the decision to implement or not implement a DBMS until after completing a thorough study of the current environment. This study must clarify any alternatives that may or may not be preferable to DBMS implementation.

2.5.13.1.1  (04-01-2003)
Purpose

  1. This manual establishes standards, guidelines, and other controls for documenting database systems. This manual describes techniques for analyzing, designing, and modeling databases. This manual is distributed to promote the development of database documentation that is easy to understand, change, and maintain.

2.5.13.1.2  (04-01-2003)
Scope

  1. The standards, guidelines, and other controls established in the manual apply to all databases developed for the Internal Revenue Service. This development includes that performed by government employees as well as contractors.

2.5.13.1.3  (04-01-2003)
Organization and Content

  1. The first part of this document outlines steps to be followed in performing logical database analysis and design. The second part covers the steps of logical database analysis and design in detail along with suggested guidelines on how they must be accomplished.

  2. The procedures and methodologies suggested in this handbook have been developed based on procedures recommended by several experts in database development and operation. These include S. Atre, Peter Chen, James Fry, David Jefferson, David Kroenke, James Martin, and Toby Teorey. Users who undertake a database design and are unfamiliar with the terminology used or techniques suggested in this handbook should request assistance from the Software Quality Group

2.5.13.2  (04-01-2003)
Logical Database Design

  1. To develop a logical database, analyze the business of the organization that the database would support, how the operations relate to each other, and what data is used in business operations. After this analysis, model the data. This modeling involves studying data usage and grouping data elements into logical units so that a task supported by one or more organizational units is independent of support provided for other tasks. Exhibit 2.5.13-7 provides a glossary of terms for logical database design.

  2. By providing each task with its own data groups, changes in the data requirements of one task will have minimal, if any, impact on data provided for another task. By having data managed as a synthesis, data redundancy is minimized and data consistency among tasks and activities is improved. Figure 2.5.13-1 graphically expresses this point.

    Figure 2.5.13-1

    This image is too large to be displayed in the current screen. Please click the link to view the image.

  3. Logical database design comprises two methods to derive a logical database design. The first method is used to analyze the business performed by an organization. Following this analysis, the second method is used to model the data that supports the business. These methods are:

    1. Business Analysis

    2. Data Modeling

2.5.13.2.1  (04-01-2003)
Business Analysis

  1. Business analysis is a method for analyzing and understanding a customer’s business. In applying this method, the objectives are to:

    • Gain a clear understanding of an organization's objectives and how it performs its mission.

    • Focus the analysis on identifying specific requirements that must be reflected in the database. This involves decision analysis and task analysis. In identifying each decision and task, the analyst focuses on the information requirements and how they are related. The intent is to gain understanding, not to provide a critique of the operations.

    • Identify not only stated data needs but also various indicators such as organizational structure, environmental policies, interaction between functions, etc., which may indicate additional data requirements.

    • Define the scope of the database and the environment that the database will support including any constraints on the database operation.

    • Produce documentation that presents a valid picture of the organization's operation.

  2. Prior to applying this method, acquired and study the following documentation:

    • A high level data flow diagram (DFD) depicting the major applications to be supported and the major data sources and outputs of these applications.

    • Detailed DFDs depicting the functions and tasks performed and a list of the documents, files, and informal references (e.g., memos, verbal communications, etc.) used to perform each function.

  3. Business analysis involves the following steps:

    1. Identify mission, functions and operations;

    2. Identify tasks performed and data usage;

    3. Identify task/data relationships;

    4. Develop list of constraints;

    5. Develop list of potential future changes.

2.5.13.2.1.1  (04-01-2003)
Identify Mission, Functions and Operations

  1. Identify the mission, functions and operations of the organizational element that the database is to support. The purpose of this step is to define the scope of the potential database's current and future needs and develop a reference point for further analysis. This step covers all relevant functional areas and be developed separately from any single application design effort.

  2. In examining an organizational element, which may range in size from a branch to an entire organization, the following may provide sources of information:

    • If available, the organization's "information plan" would be the best source. These plans vary widely in content but must articulate the organization's current and future management information strategy, a discussion of each system's scope, and definitions of the dependencies between major systems (both automated and manual) and groups of data. With this information, it is possible to determine which functional areas must be included within the scope of the design.

    • If an information plan is not available, or this plan does exist but does not contain diagrams of systems and data dependencies, it will be the designer's responsibility to determine the scope. In this case, persons within the relevant functional areas must be interviewed to determine how they relate to the rest of the organization. After the areas to which they relate are determined, additional interviews can be conducted in these newly identified areas to ascertain the extent to which they share data with the application(s) under design.

    • Other potential sources of information are the Requests for Information Services (RIS), mission, functional statements, internal revenue manuals, and senior staff interviews.

    • Future changes to the organization must be considered when defining the scope of the design effort, i.e., any major changes in operating policy, regulations, etc. Each potential change must be identified and further defined to determine whether it could change the definition, usage, or relationships of the data. Where a change could affect the database in the future, the design scope should be expanded to consider the effects of this change.

  3. After determining the scope of the database, construct a high-level DFD to graphically depict the boundaries.

2.5.13.2.1.2  (04-01-2003)
Identify Tasks Performed and Data Usage

  1. Identify the tasks performed in each of the functions and operations. The purpose is to identify tasks performed in each function of the organizational element that the database would support and to identify the data usage or "data needs" of these tasks. The functions and their related tasks can be divided into two categories: operational and control/planning.

  2. Decompose each function into the lowest levels of work that require, on a repetitive basis, unique sets of data. Work at this level is considered a "task", a unique unit of work consisting of a set of steps performed in sequence. All these steps are directed toward a common goal and use and/or create a common set of data.

  3. Once a task has been defined, decompose it into subtasks. This decomposition must occur if one or more of the following conditions exist:

    • More than one person is needed to carry out the task and each of them is required to have a different skill and/or carries out his/her part independently.

    • There are different levels of authorization, i.e., different people authorize different parts of the task.

    • Different frequencies or durations apply to different parts of the task.

    • Input documents are not used uniformly within the task.

    • Totally different documents are used for different parts of the task.

    • Many different operations are carried out within the task.

    • There are different primitive operations which each have separate input/output requirements.

  4. However, when a subtask has been defined, make certain it is limited to that particular task. If it spans two or more tasks, it cannot be considered a subtask.

  5. Collect all information in a precise manner using interviews and documentation techniques. This approach is especially important when identifying operational functions because they provide the basic input to the database design process. These functions and their associated tasks must be identified first. Therefore, begin by identifying the organizational areas within the scope of the design effort which perform the functions essential to conducting business. Once these functional areas have been determined, the persons to be interviewed can be specified. The recommended approach is as follows:

    1. Determine the key individuals within these areas and send out questionnaires requesting: job titles of persons within their areas of responsibility; functions performed in each job; and a brief statement of the objective(s) of each job.

    2. After receiving the results of the questionnaire, develop a document showing job title, functions performed, and the objectives of these functions. Then review and classify each job as either operational or control and planning. Once this is completed, the contact the supervisor of each job which is identified as "operational" and ask to select one, preferably two, persons performing that job who can be interviewed.

    3. Conduct the operational interviews. Keep the following three objectives in mind: identify each operational function; identify the data associated with each of these functions; and identify the implicit and explicit rules determining when and how each function occurs.

  6. When conducting operational interviews, accomplish the following steps during the interviews:

    1. Begin by having each interviewee describe, in detail, the functions and tasks that are performed on a daily or potentially daily basis. Document these major actions, decisions, and interfaces on task analysis and decision analysis forms. See Exhibits 2.5.13-1 and 2.5.13-2. These actions, decisions, and interfaces must also be reflected on a detailed data flow diagram. This documentation can subsequently be used to verify that all operational functions and their sequence are correct. Repeat this same procedure for functions that occur weekly, monthly, quarterly, and annually.

    2. As the functions, tasks, and other activities are defined, determine the documents, files and informal references (memos, verbal communications, etc.) used to perform them and indicate these in a separate numbered list. A task/document usage matrix may also be used specifying a task's inputs and outputs in terms of documents.

    3. Once the person interviewed agrees to the contents of the documentation, discuss more specifically each action, decision, and interface point to determine what specific documents or references are required. Then request a copy of each document that has been discussed.

    4. Finally, identify the data elements actually used or created on each document and compile a list of these elements. Include their definitions and lengths. See Exhibit 2.5.13-6. Any data elements that are not included in the dictionary must be entered.

  7. The second type of information required for conceptual database development involves the organization's control and planning functions and their related data needs. An in-depth investigation of the organization's explicit and implicit operating policies is necessary. Such information can be obtained through interviews with management. Since the nature of the information collected will vary according to the organization and persons involved, there is no rigid format in which the interview must be documented. However, in order to minimize the possibility of losing or missing information, it is recommended that there be two interviewers who could alternate posing questions and taking notes.

  8. Conduct interviews for control and planning functions with persons whose responsibilities include defining the goals and objectives of the organization, formulating strategies to achieve these goals, and managing plans to implement these strategies; and with those persons directly responsible for the performance of one or more operating areas. The objective of these interviews is to gain, where appropriate, an overall understanding of:

    • The basic components of the organization and how they interact with one another.

    • The external environment that affects the organization directly or indirectly (i.e., Congressional directives, Treasury policies, etc.).

    • Explicit or implicit operating policies that determine how the mission is performed; some of these may be identified when discussing the internal and external environment.

    • Information used currently or required to plan organizational activities and measure and control performance. If available, obtain examples.

    • Changes that are forecast that may affect the organization.

  9. The following are steps for conducting control and planning interviews:

    1. Present the designer's perception of functions and operations and seek confirmation and clarification, i.e., clarify which are main functions, support functions, and sub-functions or tasks.

    2. Ask what additional functions, if any, are performed.

    3. Ask what monitoring functions are performed and what critical indicators are used to trigger intervention.

    4. Ask what planning functions are performed and what data is used for planning purposes.

    5. Express appreciation by thanking the person interviewed for his/her time.

    6. If any new data elements are defined during the interviews, make certain they are incorporated in the Enterprise Data Dictionary so that they may be properly cross-referenced.

2.5.13.2.1.3  (04-01-2003)
Identify Task/Data Relationships

  1. Collect information about data usage and identify task/data relationships. Once all functions and tasks are identified as either operational or control and planning and their data usage has been determined, add specificity to the task/data relationships. A task/data relationship is defined as the unique relationship created between data items when they are used to perform a specific task. It is critical that these relationships be carefully and thoughtfully defined.

  2. The process of defining task/data relationships begins with analyzing the documentation developed during the interviews. When identifying a series of unique tasks, follow and apply these rules:

    • A task must be performed within one functional area. Each task must consist of a set of serially performed steps (or serially positioned symbols on a DFD). If a decision point occurs and one path of the decision involves a new action, in effect the current task ends and a new one begins. Each step within a single task must be performed within a reasonable period. If a significant amount of time can elapse between two steps, more than one task must be defined.

    • Each step within the task must use the same set of data. However, if new data is created in one step of the task and used in the next step, they may be considered as the same set of data.

  3. After all the data flows and other documentation have been analyzed and assigned to tasks, compare the tasks for each duplicate interview to determine if the same ones were defined - This is assuming that two persons with the same job title were interviewed in each relevant area. When conflicts are found, compare the two sets of documentation to determine if one is merely more detailed:

    • If the DFDs, etc., appear to be the same and differ only on levels of detail, choose the one that best defines a complete unit of work.

    • If real functional differences are found, review the documents (and notes) associated with each. Sometimes people with similar titles perform different functions due to their seniority or competence. When major differences are found, separate any unique tasks and add them to the list.

    • If differences are found and it is difficult to determine why they exist, request that the appropriate supervisor review the task definitions developed during the interviews. (However, do not include any portions of the interviews that are confidential).

  4. Once any conflicting definitions have been resolved, task/data relationships specifically documented. Because it is likely that redundant tasks have been defined, arrange the documentation already produced by department or area. This method increases the likelihood that redundant tasks will be identified. It is suggested that the documentation of task/data element relationships begin with a table such as the one shown in Figure 2.5.13-2. The documentation must:

    • Numerically identify each task

    • Briefly define each task by a verb-object type command (e.g., fill out error report, request alternate items, etc.)

    • Classify tasks as operational or control/planning Identify the frequency and average volume for each task

    • Relate each task to a specific functional area

    • Then construct a task/data element matrix to specify each task's inputs and outputs in terms of data elements

    Figure 2.5.13-2

    Task # Task Definition Type Frequency Average Volume Department Data Elements
    1 Examine order request operational daily 500 Order Entry 410, 200, 201 - 225
    ... ... ... ... ... ... ...
    ... ... ... ... ... ... ...

2.5.13.2.1.4  (04-01-2003)
Develop a List of Constraints

  1. Develop a list of all implicit and explicit constraints such as security, data integrity, response or cyclic processing time requirements. The purpose of developing a list of all implicit and explicit constraints is to provide information for the physical database designer to use in determining operational considerations such as access restrictions, interfaces to other packages, and recovery capabilities. Document constraints using either a tabular or a memo format. Examples of items to be considered are:

    • Data security needs

    • Access and processing cycle time requirements

    • Special display or calculation requirements

    • Special equipment utilization

2.5.13.2.1.5  (04-01-2003)
Develop a List of Potential Future Changes

  1. Develop a list of potential future changes and the way in which they may affect operations. The purpose of this step is to include in the database design considerations that may affect operations in the future. Consider future changes to include anything that may affect the scope of the organization, present operating policies, or the relationship of the organization to the external environment. When reviewing the interviews to identify changes, highlight anything that implies change, and, if possible, the effect(s) of that change.

2.5.13.2.2  (04-01-2003)
Data Modeling

  1. Data modeling is a technique that involves the analysis of data usage and the modeling the relationships among entities. These relationships are modeled independent of any particular hardware or software system. The objective of logical design is to clearly define and depict user perspectives of data relationships and information needs.

  2. The various approaches to logical database design involve two major design methodologies-entity analysis and attribute synthesis.

  3. In applying this method, the primary tool used is the data relationship diagram. This type of diagram is used to facilitate agreement between the designer and users on the specific data relationships and to convey those relationships to the physical database designer. It is a graphic representation of data relationships. The format used must be either the data structure diagram or entity-relationship diagram. See Exhibits 2.5.13-4 and 2.5.13-5.

  4. Simplify the modeling process by partitioning the model into the following four design perspectives:

    • The organizational perspective reflects senior and middle management's view of the organization's information requirements. It is based on how the organization operates.

    • The application perspective represents the processing that must be performed to meet organizational goals, i.e., reports, updates, etc.

    • The information perspective depicts the generic information relationships necessary to support decision-making and long-term information requirements. It is represented by user ad hoc queries, long-range information plans, and general management requirements.

    • The event perspective deals with time and scheduling requirements. It represents when things happen, e.g., frequency of reports.

  5. There are two general rules that provide the foundation for design perspectives:

    • The design perspectives are modeled by three types of constructs: entity, attribute and relationship;

    • In the design perspective, each component of information must be represented by one, and only one, of these constructs.

  6. An entity refers to an object about which information is collected, e.g., a person, place, thing, or event. A relationship is an association between the occurrences of two or more entities. An attribute is a property of an entity, that is, characteristic about the entity, e.g., size, color, name, age, etc.

  7. It is important to remember that data usage is dynamic. Perceptions change, situations change and rigid concepts of data use are not realistic. Data involves not only values but relationships as well and must be divided into logical groups before being molded into whatever structures are appropriate-matrices, entity relationship diagrams, data structure diagrams, etc. If at any point it becomes apparent that a database approach is definitely not suitable or practical for whatever reason, take an alternative path as soon as possible to save vital resources.

  8. Data modeling involves the following steps:

    1. Identify local views of the data.

    2. Formulate entities.

    3. Specify relationships.

    4. Add descriptive attributes.

    5. Consolidate local views and design perspectives.

    6. Verify the data model.

2.5.13.2.2.1  (04-01-2003)
Identify Local Views of the Data

  1. Identify local views of the data. Develop local views for the organization, application, information, and event design-perspectives.

  2. For each of the functions, activities and tasks identified, there exists what may be called "sub perspective" or local views of the data. Normally there will be several local views of the data depending on the perspective. These views correspond to self-contained areas of data that are related to functional areas. The selection of a local view will depend on the particular perspective and the size of the functional area. Factors which must be considered in formulating local views include a manageable scope and minimum dependence on, or interaction with, other views.

  3. The primary vehicles for determining local views will be the task/data element matrices and the task analysis and description forms constructed during logical database analysis. See Exhibits 2.5.13-2 and 2.5.13-3.

2.5.13.2.2.2  (04-01-2003)
Formulate Entities

  1. For each local view, formulate the entities that are required to capture the necessary information about that particular view.

  2. At this point the designer is confronted with two major considerations. The first consideration deals with the existence of multiple entity instances and can be addressed by using the concept of "type" or "role". For example, the population of the entity EMPLOYEE can be categorized into employees of "type": computer systems analyst, secretary, auditor, etc. It is important, at this stage of logical design to capture the relevant types and model each as a specific entity. The generalization of these types into the generic entity EMPLOYEE will be considered in the next stage of conceptual design where user views are consolidated.

  3. The second consideration deals with the use of the entity construct itself. Often a piece of information can be modeled as either an entity, attribute, or relationship. For example, the fact that two employees are married can be modeled using the entity MARRIAGE, the relationship IS-MARRIED-TO, or the attribute CURRENT-SPOUSE. Therefore, at this point in the design process the designer must be guided by two rules. First, use the construct that seems most natural. If this later proves to be wrong, it will be factored out in subsequent design steps. Second, avoid redundancy in the use of modeling constructs; use one and only one construct to model a piece of information.

  4. One rule of thumb, which has been successfully used to restrict the number of entities identified so that a local view can be properly represented, is the "magic number seven, plus or minus two." This states that the number of facts (information clusters) that a person can manage at any one time is about seven, give or take two. Therefore, when this is applied to the database design process, the number of entities contained in a local view must, at the most, be nine, but probably closer to six or seven. If this restriction cannot be met, perhaps the scope of the local view is too large.

  5. Give careful consideration to the selection and assignment of an entity name. Since an entity represents a fact, give a precise name to this fact. This is also important later when views are consolidated because that subsequent stage deals with homonyms and synonyms. If the name given to an entity does not clearly distinguish that entity, the integration and consolidation process will carry this distortion even further.

  6. Finally, select identifying attributes for each entity. Although a particular collection of attributes may be used as the basis for formulating entities, the significant attribute is the identifier (or primary key) that uniquely distinguishes the individual entity instances (occurrences), for example, employee number. This entity identifier is composed of one or more attributes whose value set is unique. This is also important later in the consolidation phase because the identifying attribute values are in a one-to-one correspondence with the entity instances. Therefore, two entities with the same identifiers may to some degree be redundant. However, this will depend on their descriptive attributes and the degree of generalization.

2.5.13.2.2.3  (04-01-2003)
Specify Relationships

  1. Identify relationships between the entities. In this step, additional information is added to the local view by forming associations among the entity instances. There are several types of relationships that can exist between entities. These include:

    • Optional relationships

    • Mandatory relationships

    • Exclusive relationships

    • Contingent relationships

    • Conditional relationships

  2. In an optional relationship the existence of either entity in the relationship is not dependent on that relationship. For example, there are two entities, OFFICE and EMPLOYEE. Although an office may be occupied by an employee, they can exist independently.

    Figure 2.5.13-3

    This image is too large to be displayed in the current screen. Please click the link to view the image.

  3. In a mandatory relationship, the existence of both entities is dependent on that relationship.

    Figure 2.5.13-4

    This image is too large to be displayed in the current screen. Please click the link to view the image.

  4. An exclusive relationship is a relationship of three entities where one is considered the prime entity that can be related to either one of the other entities but not both.

    Figure 2.5.13-5

    This image is too large to be displayed in the current screen. Please click the link to view the image.

  5. In a contingent relationship the existence of one of the entities in the relationship is dependent on that relationship. A VEHICLE is made from many PARTS.

    Figure 2.5.13-6

  6. A conditional relationship is a special case of the contingent relationship. When it occurs the arrow must be labeled with the condition of existence.

    Figure 2.5.13-7

  7. Relationships can exist in several forms. The associations can be one-to-one (1:1), one-to-many (1:N) or many-to-many (N:N). A one-to-one association is shown by a single-headed arrow and indicates that the relationship involves only one logical record, entity or entity class of each type. A one-to-many association is shown by a double-headed arrow and documents the fact that a single entity, entity class or logical record of one type can be related to more than one of another type. A many-to-many association is shown by a double-headed arrow in both directions.

  8. An informal procedure for identifying relationships is to pair each entity in the local view with all other entities contained in that view. Then for each pair, ask if a meaningful question can be proposed involving both entities or if both entities may be used in the same transaction. If the answer is yes to either question, determine the type of relationship that is needed to form the association. Next, determine which relationships are most significant and which are redundant. Of course, this can be done only with a detailed understanding of the design perspective under consideration.

2.5.13.2.2.4  (04-01-2003)
Add Descriptive Attributes

  1. Add descriptive attributes. Attributes can be divided into two classes-those that serve to identify entity instances and those that provide the descriptive properties of entities. The identifier attributes, which uniquely identify an entity, were added when the entities were formulated. Descriptive attributes help describe the entity. Examples of descriptive attributes are color, size, location, date, name and amount.

  2. In this step of local view modeling, the descriptive attributes are added to the previously defined entities. Only single-valued attributes are allowed for the description of an entity.

2.5.13.2.2.5  (04-01-2003)
Consolidate Local Views and Design Perspectives

  1. Consolidate local views and design perspectives. Consolidation of the local views into a single information' structure is the major effort in the logical database design. It is here that the separate views and applications are unified into a potential database. Three underlying concepts that form the basis for consolidating design perspectives; these concepts are identity, aggregation, and generalization.

  2. Identity is a concept which refers to synonymous elements. Two or more elements are said to be identical, or to have an identity relationship, if they are synonyms. Although the identity concept is quite simple, the determination of synonyms is not. Owing to inadequate data representation methods, the knowledge of data semantics is really quite limited. Typically, an in-depth understanding of the user environments is required to determine if synonyms exist. Determining whether similar definitions may be resolved to identical definitions, or if one of the other element relationships really applies, requires a clear and detailed understanding of user functions and data needs.

  3. Aggregation is a concept in which a relation between elements is considered to become another higher-level element. For example, EMPLOYEE may be thought of as an aggregation of NAME, SSN, and ADDRESS. Actually many aggregations are easy to identify since the major data models incorporate syntax that can represent aggregations.

  4. Generalization is a concept in which a group of similar elements is thought of as a single generic element by suppressing the differences between them. For example, the entity "EMPLOYEE" may be thought of as a generalization of "FACTORY-WORKER", "OFFICE-WORKER", and "EXECUTIVE". An instance of any of these three types is also an instance of the generalized "EMPLOYEE". This is the most difficult concept to grasp and care must be taken not to confuse it with aggregation. Whereas aggregation can be thought of as parts making up a "whole", generalization is concerned only with "wholes".

  5. Since aggregation and generalization are quite similar in structure and application, one element may participate in both aggregation and generalization relationships.

  6. Inferences can be drawn about the aggregation dimension from the generalization dimension and vice versa, e.g., it can be inferred that each instance of "EXECUTIVE" is also an aggregation of Name, SSN, and Address. See Figure 2.5.13-8.

    Figure 2.5.13-8

    This image is too large to be displayed in the current screen. Please click the link to view the image.

  7. There are three consolidation types. These types may be combined in various ways to construct any type of relationship between objects (elements) in different user views. By combining consolidation types, powerful and complex relationships can be represented. In fact, we recommend that most semantic relationships be represented by some combination of these types of consolidation. The consolidation types are:

    • Identity Consolidation - Two objects may be semantically identical with the additional option of having identical names. Homonyms must be guarded against as well as similar, but not identical, objects. Similarity is best expressed using aggregation and generalization. As a check on the consistency of the consolidation and also on user views, if an object from User is view is found to be identical to an object from User 2's view, neither of these objects can participate further in any other identity consolidations between these two views. This is true because each object is assumed to be unique within the context of its own local user view.

    • Aggregation Consolidation - This may occur in two forms. The difference depends on whether or not one of the users has specified the aggregated "whole" object. An example of the simpler form is where User 1 has specified a number of objects without making any consolidation type relationships between them, e.g., an inventory view of HANDLE BARS, WHEELS, SEATS, and FRAMES. However, User 2 has specified an object, BICYCLE, which is an aggregation of User is objects. The conceptually more difficult version of aggregation occurs when both users have specified some or all of the parts of an unmentioned "whole". As an example, when separate inventory functions are maintained for basic, non-variable parts (FRAMES, WHEELS) and for parts that may be substituted by customer request (SEATS, HANDLE BARS). This type of aggregation is more difficult to recognize since neither user has defined a BICYCLE object.

    • Generalization Consolidation - This may also occur in two forms. Again, the difference lies in whether either of the users has specified the generalized or generic object.

  8. The consolidation process comprises four steps:

    1. Select perspectives.

    2. Order local views within each perspective.

    3. Consolidate local views within each perspective.

    4. Resolve conflicts.

  9. Select perspectives. First, confirm the sequence of consolidation by following the order of design perspectives. Since this order is general, check it against the objectives of the database being designed. For example, if you are designing a database for a process-oriented organization, you might consider the key perspectives to be the application and event perspectives and therefore begin the process with these.

  10. Order local views within each perspective. Once the design perspectives have been ordered, focus the consolidation process on local views within each perspective. Several views comprise the perspective chosen and this second step orders these views for the consolidation process. The order must correspond to each local view's importance with respect to specific design objectives for the database.

  11. Consolidate local views within each perspective. This step is the heart of the consolidation process. For simplicity and convenience, use binary consolidation, i.e., integrating only two user views at a time. This avoids the confusion of trying to consolidate too many views. The order of consolidation is determined by the previous step where the local views within a perspective have been placed in a particular order. The process proceeds as follows:

    1. Take the top two views in the perspective being considered and consolidate these using the basic consolidation principles.

    2. Using the binary approach, merge the next local view with the previously consolidated local views. Continue this process until the last view is merged. When the consolidation process is completed for the first design perspective, the next design perspective is introduced and this process continues until all perspectives are integrated.

  12. Resolve conflicts. Conflicts can arise in the consolidation process for a number of reasons, primarily because of the number of people involved and the lack of semantic power in our modeling constructs. They may also be caused by incomplete or erroneous specification of requirements. Although the majority of these conflicts are dealt with in the consolidation step using the rules previously discussed, any remaining conflicts that have to be dealt with by designer decisions are taken care of in this step. When a design decision is made, it is important to "backtrack" to the point in the consolidation process where these constructs were entered into the design. At this point the implications of the design decision are considered and also their effects on the consolidation process.

2.5.13.2.2.6  (04-01-2003)
Present Data Model

  1. The purpose of this step is to present the data model. Use data relationship diagrams to document local views and their consolidation. These must take the form of an entity-relationship diagram or a data structure diagram. See Exhibits 2.5.13-4 and 2.5.13-5.

  2. When constructing either of these diagrams, use these rules of thumb:

    • Each entity and relationship must be clearly labeled.

    • Each entity must be related to at least one other entity.

    • Show attributes only if they uniquely identify the entity or are a common access path to the entity.

    • Limit entities and relationships for a given activity to a single page.

    • Identify the name of the activity supported at the top of the page.

    • If a data relationship diagram for an activity needs to span more than one page, use the same off-page connectors as used in DFDs.

2.5.13.2.2.7  (04-01-2003)
Verify Data Model

  1. Verify the data model. The purpose of this step is to verify the accuracy of the data model and obtain user concurrence on the proposed database design.

  2. The process of developing the information structure involves summarizing and interpreting large amounts of data concerning how different parts of an organization create and/or use that data. However, it is extremely difficult to identify and understand all data relationships and the conditions under which they may or may not exist.

  3. Although the design process is highly structured, it is still probable that some relationships will be missed and/or expressed incorrectly. In addition, since the development of the information structure is the only mechanism that defines explicitly how different parts of an organization use and manage data, it is reasonable to expect that management, with this newfound knowledge, might possibly consider some changes. Because of this possibility, it is necessary to provide management with an understanding of the data relationships shown in the information structure and how these relationships affect the way in which the organization performs, or can perform, its mission. Each relationship in the design and each relationship excluded from the design must be identified and expressed in very clear statements that can be reviewed and approved by management. Following management's review, the design will, if necessary, be adjusted to reflect its decisions.

  4. The verification process is separated into two parts, self-analysis and user review. In self analysis, the analyst must insure that:

    • All entities have been fully defined for each function and activity identified.

    • All entities have at least one relation to another entity.

    • All attributes have been associated with their respective entities.

    • All data elements have been defined in the Enterprise Data Dictionary.

    • All processes in the data flow diagram can be supported by the database when the respective data inputs and outputs are automated.

    • All previously identified potential changes have been assessed for their impact on the database and necessary adjustments to the database have been determined.

  5. To obtain user concurrence on the design of the database, perform the following steps, in the form of a walk-through, to interpret the information structure for the user:

    1. State what each entity is dependent upon (i.e., if an arrow points to it). Example: All ORDERS must be from CUSTOMERS with established accounts;

    2. State what attributes are used to describe each entity;

    3. Define the perceived access path for each entity;

    4. Define the implications of each arrow (i.e., one-to-one, one-to-many etc.);

    5. Define what information cannot exist if an occurrence of an entity is removed from the database.

  6. Give the user an opportunity to comment on any perceived discrepancies in the actual operations or usage. If changes need to be made, then give the user the opportunity to review the full design at the completion of the changes. Once all changes have been made and both the relationship diagram and the data definitions have been updated, obtain user concurrence on the design specifications.

2.5.13.3  (04-01-2003)
Physical Database Design

  1. The boundary between logical and physical database design is difficult to assess because of the lack of standard terminology. However, there seems to be general agreement that logical design encompasses a DBMS-independent view of data and that physical design results in a specification for the database structure, as it will be physically stored. The design step between these two that produces a schema that can be processed by a DBMS can be called implementation design. The DBMS-independent schema developed during logical design is one of the major inputs. Refinements to the database structure that occur during this design phase are developed from the viewpoint of satisfying DBMS-dependent constraints as well as the more general constraints specified in the user requirements.

  2. The major objective of implementation design is to produce a schema that satisfies the full range of user requirements and that can be processed by a DBMS. These extend from integrity and consistency constraints to the ability to efficiently handle any projected growth in the size and/or complexity of the database. However, these must be considerable interaction with the application program design activities that are going on simultaneously with database design. Analyze high-level program specifications and program design guidance supplied to correspond to the proposed database structure.

  3. The guidance provide in this section serves a dual purpose. First, it provides general guidelines for physical database design. Various techniques and options used in physical design are discussed as well as when and how they must be used to meet specific requirements. These guidelines are generic in nature and for this reason are intended to provide a basic understanding of physical database design prior to using specific vendor documentation. Second, since database management systems vary according to the physical implementation techniques and options they support, these guidelines will prove useful during the database management software procurement. They will provide a means for evaluating whether a DBMS under consideration can physically handle data in a manner that will meet user requirements.

  4. The usefulness of these guidelines is directly related to the where one is in a development life cycle and the level of expertise of a developer. This document assumes the reader is familiar with database concepts and terminology since designers will most likely be database administrators or senior computer specialists. To aid the reader, a glossary and bibliography are provided. See Exhibits 2.5.13-9 and 2.5.13-10.

  5. The criterion for determining physical design is quite different from that of logical design. Selection of placement and structure is determined by evaluating such requirements as operational efficiency, response time, system constraints and security concerns. This physical design layout must be routinely adjusted to improve the system operation, while maintaining the user's logical view of data. The physical structuring or design will often be quite different from the user's perception of how the data is stored.

  6. The following steps provide general guidance for physical database design. Since much of the effort will depend on the availability of data and resources, the sequence of these steps is flexible:

    1. Determine user requirements;

    2. Determine processing environment;

    3. Select database management system software;

    4. Design physical placement of data;

    5. Perform sizing of data;

    6. Consider security and recovery.

2.5.13.3.1  (04-01-2003)
Determine the User’s Requirements

  1. These critical factors will dictate the weight placed on the various physical design considerations to be discussed. The following are examples of user requirements. Notice that with each requirement there is an example of an associated trade-off.

  2. Retrieval time decreases with a simple database structure; however, to meet the logical design requirements, it may be necessary to implement a more complex multilevel structure.

  3. Ease of recovery increases with a simple structure but satisfying data relationships may require more complex mechanisms.

  4. Due to increased pointer or index requirements, hardware cost is increased if information is spread over many storage devices; however, data clustering and compacting degrade performance.

  5. Privacy requirements may require stringent security such as encryption or data segmentation. These procedures decrease performance, however, in terms of update and retrieval time.

  6. Active files, especially those accessed in real time, dictate high-speed devices; however, this will represent increased cost.

2.5.13.3.2  (04-01-2003)
Determine the Processing Environment

  1. By attempting to determine the primary type of processing, the designer has a framework of physical requirements with which to begin design. Three environments will be discussed; however, keep in mind that these are merely guidelines since most systems will not fit neatly into one general set of requirements. These considerations will often conflict with the user's requirements or security needs (to be discussed), thus forcing the designer to make decisions regarding priority.

  2. Normally this environment requires fast response time, and multiple run units will actively share DBMS facilities. In order to meet response time specifications, cost may increase due to the necessity for additional system resources. Recovery may be critical in such a volatile environment, and whenever possible, use a simple structure. In a CODASYL (network) structure, this time specification would translate into reduced data levels, number of network relationships, number of sets and size of set occurrences. In a high volume processing environment requests are most frequently random in nature requiring small amounts of information transfer; thus affecting page and buffering considerations.

  3. Low volume systems generally process more data per request, indicating run units may remain in the system longer. There is the likelihood of more sequential requests and reports, and response time is probably not the critical issue. Resources may be more limited in this environment, implying smaller buffers and perhaps fewer peripherals. With the possibility of fewer resources, those resources may need to be more highly utilized. On-line recovery techniques may be unavailable since the resource requirements are costly. Although the number of transactions is low in this environment, the probability of multiple simultaneous run units accessing the same data may be high.

  4. When a batch environment is indicated, the designer is left with maximum flexibility since the requirement is reasonable turnaround time and effective use of resources. Because of job scheduling options, concurrency problems can be controlled. Recovery tends to be less critical and will be determined by such factors as file volatility, the time necessary to rerun update programs, and the availability of input data. For example, if the input data is readily available, the update programs short and processing 85 percent retrieval; the choice may be made to avoid the overhead of maintaining an on-line recovery file.

2.5.13.3.3  (04-01-2003)
Select DBMS Software

  1. The DBMS must first physically support the logical design requirements. That is, based on the logical data model, the package must support the required hierarchical, network or relational structure. Early stages of analysis must provide enough information to determine this basic structure. From a physical database design point of view, an analysis must then be made as to how effectively the DBMS can handle the organizational and environmental considerations. If the proposed package fails to provide adequate support of the requirements, the project manager must be notified. The notification must include the specific point(s) of failure, anticipated impact(s), and any suggestions or alternatives for alleviating the failure(s).

2.5.13.3.4  (04-01-2003)
Design the Physical Placement of Data

  1. This procedure involves selecting the physical storage and access methods as well as secondary and multiple key implementation techniques. DBMS packages vary as to the options offered. The use of vendor documentation, providing specific software handling details, will be necessary to complete this process. Exhibit 2.3.13-8 provides a summary of access methods and their uses.

2.5.13.3.5  (04-01-2003)
Perform Sizing of Data

  1. Obtain the specifics of sizing from vendor documentation as each DBMS handles space requirements in a different manner. Consider Sizing in conjunction with designing the placement of data. Once data records, files and other DBMS specifics have been sized according to a proposed design, a decision may be made, because of the space allocation involved, to change the design. Data compaction techniques may be considered at this point. Flexibility to make changes and reevaluate trade-offs during this entire procedure is of critical importance.

2.5.13.3.6  (04-01-2003)
Consider Security and Recovery

  1. The DBMS selected must have the options necessary to implement security and recovery requirements. Implementation of these considerations will often cause trade-offs in other design areas.

2.5.13.4  (04-01-2003)
Deliverables

  1. In applying database design, the following deliverables result:

    1. Decision Analysis and Description Forms

    2. Task Analysis and Description Forms

    3. Task/Data Element Usage Matrix

    4. Data Models

    5. Entity-Attribute Lists

    6. Data Definition Lists

    7. Physical Data Base Specifications Document

2.5.13.4.1  (04-01-2003)
Decision Analysis and Description Forms

  1. Decision Analysis and Description Forms must identity such items as type of decision, the decision maker, and the nature of the decision. Exhibit 2.5.13-1 provides guidelines and shows a sample form.

2.5.13.4.2  (04-01-2003)
Task Analysis and Description Forms

  1. Task Analysis and Description Forms must include the name of the task, its description (overview), the people/departments involved, and subtasks and their relationships. Exhibit 2.5.13-2 shows a sample form and provides guidelines.

2.5.13.4.3  (04-01-2003)
Task/Data Element Usage Matrix

  1. A task/data element usage matrix relates each data element to one or more tasks. Exhibit 2.5.13-3 shows a sample matrix.

2.5.13.4.4  (04-01-2003)
Data Models

  1. Data relationship diagrams depict the relationships between entities. These are tools that provide one way of logically showing how data within an organization is related. They must be models using conventions for either data structure diagrams or entity relationship diagrams. Exhibits 2.5.13-4 and 2.5.13-5 provide samples of these diagrams and more detailed guidelines. The term "entity" refers to an object of interest-person, place, thing, event-about which information is collected. When constructing either of these diagrams it is recommended that the entities be limited to those of fundamental importance to the organization.

2.5.13.4.5  (04-01-2003)
Entity-Attribute Lists

  1. Entity-attribute relation lists may be derived from the Enterprise Data Dictionary listings.

2.5.13.4.6  (04-01-2003)
Data Definition Lists

  1. Data definition lists may be derived from Enterprise Data Dictionary listings. Exhibit 2.5.13-6 provides a sample.

2.5.13.4.7  (04-01-2003)
Physical Database Specifications Document

  1. It is not the intent of this handbook to present a standard format for documentation but rather to state the required content for Physical Database Specifications. Duplication of effort can be eliminated if there are existing documents available containing physical specifications. Use the following resources for developing documentation:

    • DBMS provided documentation - For example, a listing of the scheme from a Codasyl DBMS will provide such detail as data names, sizing, placement and access methods. Data Dictionary - The

    • Data Dictionary listing provides certain physical specifications, such as, data format and length.

    • Project documentation - All documentation submitted as Physical Database Specifications must be organized in a macro to micro manner or global to specific. That is, begin at the schema level, moving to subschema, indices, data elements, etc. The objective is to organize documentation in a manner that is clear and easy for the user to read.

2.5.13.4.7.1  (04-01-2003)
Names

  1. Where appropriate in the documentation, identify names of physical database items. Specifically, the items will be all those defined to the DBMS software, such as:

    • Schema

    • Subschema

    • Set

    • Record

    • Field

    • Key

    • Index Names

  2. Where data naming standards are applicable, these standards shall be met to the extent possible with the DBMS software. For example, if the DBMS does not permit hyphens in naming, an exception would be made to the standard "all words in a name must be separated with a hyphen".

2.5.13.4.7.2  (04-01-2003)
Data Structure/Sizing

  1. Identification of data elements, associations within and between record types as well as sizing requirements are identified and documented during the logical database design process. The physical representation of data structures will vary from the logical, however, since the physically stored data must adhere to specific DBMS characteristics. As applicable to the DBMS, the following structures must be documented:

    • Records

    • Blocks/Pages

    • Files

  2. Describe the physical record layout. In this description, include the data fields, embedded pointers, spare data fields, and database management system overhead (flags, codes, etc). Besides data record types, document any other record types such as index records. If records are handled as physical blocks or pages, provide the following:

    • Calculations determining block/page size

    • Total number of blocks/pages allocated

  3. Describe the strategy for determining block/page sizes.

  4. Specify the amount of space allocated for each database file. This must be consistent with the total record and block/page sizing documentation described above.

2.5.13.4.7.3  (04-01-2003)
Data Placement

  1. For each record type:

    • State the storage and access method used.

    • Describe the storage and assess method.

    • Where applicable, identify the physical data location (track, cylinder).

  2. Where an algorithm access method is used:

    • Give the primary record key to be used by the algorithm.

    • Describe the algorithm used, including the number and size of randomized address spaces available to the algorithm.

    • Give the packing density, and the strategy for its determination.

  3. Where an index sequential access method is used:

    • Give the primary record key.

    • State indexing strategy/levels.

    • State initial load strategy.

  4. Where a chains access method is used:

    • Give the access path to the record type (i.e., Is this primary access of detail record though the master record or is this a chain of secondary keys?)

    • List the pointer options used (i.e., forward, backward, owner, etc.)

    • Indicate whether the chain is scattered or stored contiguously with the master.

  5. Where an index access method is used, identify keys used to index a record type.

Exhibit 2.5.13-1  (04-01-2003)
Guidelines for Decision Analysis and Description Forms

This image is too large to be displayed in the current screen. Please click the link to view the image.

Exhibit 2.5.13-2  (04-01-2003)
Guidelines for Task Analysis and Description Forms

This image is too large to be displayed in the current screen. Please click the link to view the image.

Exhibit 2.5.13-3  (04-01-2003)
Sample Task/Data Element Matrix

This image is too large to be displayed in the current screen. Please click the link to view the image.

Exhibit 2.5.13-4  (04-01-2003)
Guidelines for Constructing Data Structure Diagrams

The data structure diagram is a tool for expressing logical relationships. The notation describes relationships between logical relationships. The notation describes relationships between logical records, individual entities, entity classes or a combination of these objects. A logical record is a collection of data elements related to an entity. An entity class is a group of entities that are sufficiently similar in terms of the attributes that describe the entities.
 
Two basic graphic symbols are used to build a DSD, that is, a rectangle and an arrow. A rectangle depicts an entity class. An arrow depicts a relationship. A DSD may depict as many entity classes and relationships as required to model the information structure studied.
 
The following only shows that two entity classes have been defined.
This image is too large to be displayed in the current screen. Please click the link to view the image.
These symbols do not indicate how many individual entities constitute each entity class. Relationships depict cardinality, e.g., one to one, one to many, many to many, etc. The following depicts a relationship between entity classes. This relationship indicates that each Department comprises one to many Employee .
 
This image is too large to be displayed in the current screen. Please click the link to view the image.
Data structure diagrams can represent hierarchical structure, network structure, or a combination of these structures. The following diagram depicts a hierarchical structure.
 
This image is too large to be displayed in the current screen. Please click the link to view the image.
The above diagram depicts that an employee works several projects and that each project can have only one name and one completion date. Hierarchical structures are distinguished from other relationships because every node can have only one parent node, except the root, which has no parent. The descendents of a node are called children .
 
In contrast, network structures allow a logical record, entity or entity class to have more than one parent. The following examples show a one-to-many and a many-to-many relationship. The first example shows that a student has only one advisor and one major but an advisor counsels many students and many students choose a particular academic major. The second example shows that a student has many classes and that a class comprises many students.
This image is too large to be displayed in the current screen. Please click the link to view the image.
In a DSD, only the blocks, their names, and the arrows have meaning. The position and proportion of these symbols should be selected for readability.
 
Finally, a note of caution: large data structure diagrams should not be confused with large databases having many record which have been built in response to these diagrams. Although each entity in a database needs an entity class to define it, a particular entity class may represent one, ten thousand, or a million records in storage.

Exhibit 2.5.13-5  (04-01-2003)
Guidelines for Constructing Entity Relationship Diagrams (ERD)

An entity relationship diagram describes an organization’s information structure in terms of three basic components: entities, relationships, and attributes of entities and relationships. They closely correspond to the concept of nouns, verbs, and adjectives.
 
It is sometimes a difficult task to find the best way to derive an E-R diagram. This can be true at any level whether representing a specific local view or a particular design perspective. The difficulty arises because subjective value judgments are often required, e.g., Designer A may consider PROJECT as an entity type, but Designer B sees it as an attribute of the entity DEPARTMENT. To obtain more objectivity, consider factors such as the purpose of the database and the specific system under study.
 
It is recommended that the following steps be taken prior to constructing and E-R diagram at any level. First, define a pool of entity types which are readily identifiable and important to the system. Consider these the base entity types for that level. For each of these, identify as many related entity types as possible. Develop worksheets limiting each page to one of these base entity types. Place the rectangle (block) representing this entity in the center of the page. Then show related entity types as rectangle surrounding the base entity type. One method to pair two entities is to ask if a meaningful question can be proposed involving both of them or if both entities can be used in the same transaction. If the answer is yes to either question, or to both, determine the type of relationship that is needed to form the association. For each of these related entities, indicate the relationship by connecting it to the base entity either by a straight line (or a diamond symbol) labeled with the name of the relationship. Then add connectivity to describe how many instances of the base entity, i.e., one-to-one, one-to-many, etc. The following diagrams show alternate ways of expressing the relationship between two entities, E1 and E2:
This image is too large to be displayed in the current screen. Please click the link to view the image.
Entity E1 may be associated with many instances of E2, but each instance of E2 is associated with only one instance of E1. Either of the above notations is acceptable. The labeled line is simpler, but the diamond symbol has the advantage of emphasizing the importance of the relationship.
 
There are also special relationships where one or more entity types are dependent in some way on another entity type. A subtype dependence (S) is shown by the following diagram in which products are classified into three types:
This image is too large to be displayed in the current screen. Please click the link to view the image.
 
The next diagram shows existence (E) dependence. If an Employee entity does not exist in the organization, then no information will be maintained on his/her Dependents.
 
There is currently no description available for this image. For help with this image, please call the IRS.gov Helpdesk at 1-800-876-1715.
 
The following diagram shows identifier (ID) dependence. The ID of the State must be known before the City can be uniquely identified and both the City and State must be known before the Street can be identified.
 
There is currently no description available for this image. For help with this image, please call the IRS.gov Helpdesk at 1-800-876-1715.
 
The following diagram represents an example of precedence dependence. Here a Propose Budget must precede an Approved Budget.
 
There is currently no description available for this image. For help with this image, please call the IRS.gov Helpdesk at 1-800-876-1715.
After relationships have been defined, the next step is to resolve any naming conflicts such as homonyms or synonyms, e.g., would DEPARTMENT be more meaningful as SECTION; are PROJECT and TASK really the same entity?
 
Then identify attributes. Each entity should have at least one attribute that will serve as a unique identifier. After this identifying attribute has been determined, add any descriptive attributes. Sometimes it is difficult to distinguish whether an element should be treated as an attribute of an entity type or as a separate related entity. In the following example, the designation of type-1 and type-2 is used strictly for illustration.
 
Consider an attribute of a type-1 entity. Would it be better suited as a separate entity, type-2, related to type-1? It is recommended that this attribute be treated as a separate entity if one or more of the following conditions are met.
 
1. It has significance to the system independent of its relationship to the type-1 entity.
2. As a type-2 entity, it could be related to several type-1 entities.
3. As a type-2 entity, it is related to one or more entity types in addition to type-1.
 
This image is too large to be displayed in the current screen. Please click the link to view the image.
For those readers unfamiliar with the E-R diagram conventions, a National Bureau of Standards publication deals with this representation in depth. A Logical Database Design Framework is available as publication PB82203316 from the National Technical Information Service, U.S. Department of commerce.

Exhibit 2.5.13-6  (04-01-2003)
Sample Data Definition List

Sample Data Definition List
Name Definition Length
(Name of data element) (Definition of data element) (Length of data element)
... ... ...
... ... ...

Exhibit 2.5.13-7  (04-01-2003)
Glossary of Terms for Logical Database Design

Attribute - A data item that comprises a single piece of information about an entity. It usually cannot be broken down into parts that have meanings of their own. Attributes can be divided into two classes--those that identify entity instances (occurrences) and those that provide the descriptive properties of an entity.
 
Bachman Diagram - Another name for a data structure diagram. It is named for Charles Bachman who invented it.
 
Branches - The relationship between the records in a tree (hierarchical) data structure.
 
Conceptual Design - Analysis of formally specified processing-independent information requirement and the formulation of a DBMS-independent information structure that accurately models the organization and its important data elements and relationships.
 
Conceptual Model - The overall logical structure of a database (often referred to as conceptual schema) which is independent of any software or data storage structure. (Some organizations may prefer the term "logical model" because "conceptual" might imply that the mode may never be implemented).
 
Database - A collection of interrelated data stored together with controlled redundancy to serve one or more applications; the data are stored so that they are independent of the programs which use them.
 
Database Administrator (DBA) - One or more individuals, possibly aided by a staff, who manage an organization's database resource.
 
Database Design - The process of developing an implemental database structure from user requirements.
 
Database Design Methodology - A collection of techniques and tools employed within an organizational framework that can be applied consistently to successive database development projects.
 
Database Management System (DBMS) - The collection of software required for using a database, which presents multiple views of the data to the users and programmers.
 
Database System - The combination of DBMS software, applications software, database and operating system-hardware environment brought together to provide information services for users.
 
Data Description (Definition) Language (DDL) - A language for describing data. In some cases software uses D DL only for logical data, or only for physical data or for both.
 
Data Dictionary - A catalogue of all data types, giving their names and structures, and information about data usage.
 
Data Element - Synonymous with data item.
 
Data Independence - The property of being able to change the overall logical or physical structure of the data without changing an application program view of that data.
 
Data Item - The smallest unit of data that has meaning in describing information, the smallest unit of named data.
 
Data Model - A logical representation of the data structure that forms a database. It should represent the inherent properties of the data independent of software, hardware or machine performance implications. These representations are independent of the class of software that will be used for implementation. Therefore, even if the software choice changes, the model remains a fundamental description of the data.
 
Design Tool - A software product used to automate the application of design techniques and tools.
 
Entity - A person, place, thing or concept that interests an organization. An entity is something about which data is stored. An entity has various attributes which can be recorded, e.g., COLOR, SIZE, etc.
 
Function - An action or activity performed by an organization through a series of tasks to help accomplish its mission.
 
Implementation Design - A database design activity that involves transforming and refining a conceptual schema into a schema that can be implemented through a database management system.
 
Logical Database Design - A description of the structure of logical relationships among the data elements of the system being designed.
 
Network Structure - A relationship between records (or other groupings) in which a child record can have more than one parent record. Also called "plex structure" .
 
Physical Database - A database in the form in which it is stored on the storage media, including pointers or other means of interconnecting it. Multiple logical databases may be derived from one or more physical databases.
 
Procedure - A particular course or mode of action. A subfunctlon.
 
Schema - A map of the overall logical structure of the database covering all data item and record types. The global logical view of the data.
 
Servicewide Data Dictionary - The authoritative source for the Service's standard data names, definitions and codes.
 
Sub-schema -A map of a programmer's view of data used. It is derived from the schema
 
Task - The lowest level of work that requires, on a repetitive basis, a unique set of data. A unique unit of work consisting of a set of sequential steps all directed toward a common goal and all using and/or creating a common set of data.
 
Tree Structure - A hierarchy of groups of data such that: 1) the highest level in the hierarchy has only one group called a "root"; 2) all groups except the root are related to one and only one group on a higher level than themselves.

Exhibit 2.5.13-8  (04-01-2003)
Summary of Access Methods

This image is too large to be displayed in the current screen. Please click the link to view the image.

Exhibit 2.5.13-9  (04-01-2003)
Glossary of Terms for Physical Database Design

Algorithm - A computational procedure.
 
Attribute - A field containing information on some entity type.
 
Block - Usually a combination of two or. more records which are read and written jointly by one machine instruction.
 
Bucket - An area of storage containing one or more records referred to by the same address.
 
Buffer - Storage used to temporarily hold data being transferred from one device to another.
 
Chain - A list of data items strung together by means of pointers.
 
CODASYL - A network data model developed by the Conference on Data System Languages, Database Task Group. See Network.
 
Compaction - Reduces the number of bits In data without effecting the informational content.
 
Concurrency - In data base systems, refers to the number of run units actively sharing the DBMS facilities.
 
Encryption - The process of encoding/decoding when transferring data to and from the data base.
 
Hierarchical - A tree structure where some records are subordinate to others.
 
Key - The data item which is used to address or identify a record.
 
Network - A structure in which a detail record can have more than one master record.
 
Overflow - An area of placement assigned to a record which for some reason cannot be stored in its home address (i.e. logically assigned address).
 
Packing Density - The number of records stored in a bucket compared to the number that could be stored.
 
Page - See Block.
 
Pocket - See Bucket
 
Real Time - Application or processing in which response to input is fast enough to affect subsequent Input (i.e. terminal dialogues on interactive systems).
 
Recovery - System's ability to determine status at the time of failure and reestablish processing at that point without loss of data.
 
Relational - Pertaining to a database in normalized, two-dimensional flat form. The DBMS recombines data elements giving different relations or greater flexibility.
 
Response Time - Total time between an instruction being given to access particular data and that data being available (seek time + read or write time).
 
Seek Time - Time taken to position the access mechanism to a specified location.
 
Sets - In CODASYL, refers to a collection of record types such that an owner type if defined with one of its member types.
 
Simulation - To represent the functioning of one system by another.
 
Transfer Time - Time taken to move data between a direct access device and the central processor.
 
Volatile File - File with high rate of additions and deletions.

Exhibit 2.5.13-10  (04-01-2003)
Bibliography for Physical Database Design

Martin, James. Computer Data-Base Organization. Englewood Cliffs, New Jersey.: Prentice-Hall, Inc., 1977.
 
Physical Date Base Design. Sperry Univac. Princeton, Now Jersey. 1981.
 
Lockmann, P. C., and E. J. Neuhold. ad. Systems For Large Data Bases. New York: North-Holland Publishing Company, 1977.
 
Data Base Techniques Software Selection and Systems Development. BIS Applied Systems, Ltd. Wellesley, Ma: QED Information Sciences, Inc. 1980.
 
Cohen, Leo J. Creating and Planning rice Corporate Data Base System Project. Wellesley, Ma: 1981.
 
Goldfine, Alan H. ed. Data Base Directions Information Resource Management-Strategies and
Tools. Washington: National Bureau of Standards. 1982.
 
Computer Technology Reports. Pennsauken, New Jersey: Auerbach Publishers, Inc.

More Internal Revenue Manual