Wikia

Psychology Wiki

Entity-Attribute-Value model

Talk0
34,136pages on
this wiki
Revision as of 16:52, December 28, 2006 by Dr Joe Kiff (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Assessment | Biopsychology | Comparative | Cognitive | Developmental | Language | Individual differences | Personality | Philosophy | Social |
Methods | Statistics | Clinical | Educational | Industrial | Professional items | World psychology |

Statistics: Scientific method · Research methods · Experimental design · Undergraduate statistics courses · Statistical tests · Game theory · Decision theory


Entity-Attribute-Value model (EAV) is a data model in which one row stores a single fact. In a conventional table that has one column per attribute, by contrast, one row stores a set of facts. EAV design is appropriate when the number of parameters that potentially apply to an entity is vastly more than those that actually apply to an individual entity. An EAV design represents a column-to-row transformation, because each row of such a table stores one fact about an entity. An EAV table records an entity, the attribute, and the associated value of that attribute.

Conceptually, it’s a table with three columns:

  • Entity/Object ID
  • Attribute/Parameter
  • The ‘Value’ for the attribute.

The table has one row for each Attribute-Value pair.

ExampleEdit

The following example, which does not describe physical implementation, illustrates the EAV concept. A “conventional” table of laboratory values would have patient ID and date followed by numerous columns for individual tests, such as hemoglobin, potassium, and alanine transaminase. Each column would contain values for the appropriate test. A particular row would record all tests done for a given patient at a particular date and time and would appear as follows:

(<patient XYZ>, 1/5/98 12:00 AM, 12.5 gm/dl, 4.9 Meq/L, 80 IU...)

Tests not done on that patient would have the corresponding columns empty (null). In an EAV design, the patient ID and date columns appear as before, but instead of numerous columns with the names of tests hard-coded, there would be only two more columns, “Lab TestName” (the attribute) and “Value.” Thus, to record lab tests for a patient, there would be quadruples of the following form:

(<patient XYZ>, 1/5/98 12:00 AM, “Hemoglobin,” 12.5 gm/dl)
(<patient XYZ>, 1/5/98 12:00 AM, “Potassium,” 4.9 Meq/L)

and so on. One row is created for each test performed.

A conventional (“orthodox”) database table design (one fact per column) is unsuitable for such type of data, because of database vendor limitations on the number of columns per table and the need to continually add new tables or columns whenever new facts need incorporation. Most mainstream electronic patient record systems deal with this problem through the entity-attribute-value (EAV) representation, because by using this methodology, the fact descriptors (attributes) are treated as data, so that the addition of new facts does not make database restructuring necessary.

In production systems, using EAV is something like driving a car using the stick shift -- It gives you more control than using automatic transmission, but is also somewhat trickier than using old-fashioned design approaches. Therefore, it should only be used when absolutely necessary.

BenefitsEdit

  • Flexibility. There are no arbitrary limits on the number of attributes per entity. The number of parameters can grow as the database evolves, without schema redesign.
  • Space-efficient storage for highly sparse data: One need not reserve space for attributes whose values are null.
  • A simple physical data format with partially self-describing data. Maps naturally to interchange formats like XML (the attribute name is replaced with start-attribute and end-attribute tags.)
  • For databases holding data describing rapidly evolving domains.

DownsidesEdit

  • Flabbiness. Flexibility is great, but there is a point where you no longer have any structure. Typically, you can no longer rely on built-in database features such as referential integrity. To guarantee that a column takes only values within an acceptable range, you have to code integrity checks inside your application. It doesn't help to make it maintainable.
  • Inefficient queries. Where you would execute a simple query returning 20 columns from a single table, you end up with 20 self-joins, one for each column. It makes for illegible code and dreadful performance as volumes grow (scalability is very bad).
  • Designer laziness. Adding attributes on the fly is acceptable for a prototype. But if you don't really know what data you want to work with in the first place, you are begging for trouble.

See Also Edit

This page uses Creative Commons Licensed content from Wikipedia (view authors).

Around Wikia's network

Random Wiki