
Natural keys versus artificial keys

This article presents the author's view on a problem that is regularly discussed in newsgroups devoted to application development with relational database management systems (RDBMS).

The essence of the problem


Every record in a table of an RDBMS should have a primary key (PK): a set of attributes that uniquely identifies the record within the table. A table without a primary key has a right to exist, but that case is not considered in this article.


The primary key can be either a

Natural key (NK): a set of attributes of the entity described by the record that uniquely identifies it (for example, a person's passport number);

or a

Surrogate key (SK): an automatically generated field that is in no way related to the informational content of the record. An auto-increment INTEGER field usually plays the role of the surrogate key.


There are two opinions:

1. A surrogate key should be used only if no natural key exists. If a natural key exists, records are identified inside the database by that natural key;

2. A surrogate key should be added to every table that is referenced (REFERENCES) from other tables, and relationships between tables should be built only on surrogate keys. Of course, looking up a record and presenting it to the user are still done on the basis of the natural key.


Naturally, some intermediate opinion is conceivable, but the discussion here is confined to the two stated above.

When surrogate keys appear


To understand the place and significance of surrogate keys, consider the design stage at which they are introduced into the database structure, and the technique for introducing them.


For clarity, consider a database of two relations: cities (City) and people (People). Assume that a city is characterized by a name (Name) and that all cities have different names; a person is characterized by a surname (Family), a passport number (Passport) and a city of residence (City). Also assume that every person has a unique passport number. At the stage of drawing up the infological (conceptual) model of the database, its structure is the same for natural keys and for surrogate keys.

CREATE TABLE City (
    Name VARCHAR(30) NOT NULL PRIMARY KEY
);

CREATE TABLE People (
    Passport CHAR(9) NOT NULL PRIMARY KEY,
    Family VARCHAR(20) NOT NULL,
    City VARCHAR(30) NOT NULL REFERENCES City (Name)
);



For natural keys, everything is ready. For surrogate keys there is one more stage, and the tables are transformed as follows:

CREATE TABLE City (
    /*
    Different SQL dialects express an auto-increment field differently,
    for example through IDENTITY, SEQUENCE or GENERATOR.
    Here we use the notation AUTOINCREMENT.
    */
    Id INT NOT NULL AUTOINCREMENT PRIMARY KEY,
    Name VARCHAR(30) NOT NULL UNIQUE
);

CREATE TABLE People (
    Id INT NOT NULL AUTOINCREMENT PRIMARY KEY,
    Passport CHAR(9) NOT NULL UNIQUE,
    Family VARCHAR(20) NOT NULL,
    CityId INT NOT NULL REFERENCES City (Id)
);



Note that:

- All constraints dictated by the subject domain (uniqueness of the city name and of the passport number) are still present in the database; they are simply enforced by UNIQUE constraints rather than by PRIMARY KEY;

- The keyword AUTOINCREMENT does not exist in any of the servers known to me. It is simply a notation meaning that the field is generated automatically.


In general, the algorithm for adding a surrogate key looks like this:

1. An INTEGER AUTOINCREMENT field is added to the table;

2. It is declared the PRIMARY KEY;

3. The old PRIMARY KEY (the natural key) is replaced by a UNIQUE CONSTRAINT;

4. If the table has REFERENCES to other tables, the fields taking part in each REFERENCES are replaced by a single INTEGER field that holds the referenced table's primary key (as People.City is replaced by People.CityId).


This is a mechanical operation that in no way violates the infological model or the integrity of the data. From the standpoint of the infological model, the two databases are equivalent.
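The four steps can be tried out on a small scale. The following is only a sketch, assuming SQLite through Python's standard sqlite3 module; the table names CitySK and PeopleSK and the sample rows are invented for the illustration, and SQLite's INTEGER PRIMARY KEY stands in for the AUTOINCREMENT notation. It copies a natural-key database into a surrogate-key one and checks that the same information can be read back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Natural-key schema from the article, with invented sample rows.
    CREATE TABLE City (
        Name VARCHAR(30) NOT NULL PRIMARY KEY
    );
    CREATE TABLE People (
        Passport CHAR(9) NOT NULL PRIMARY KEY,
        Family VARCHAR(20) NOT NULL,
        City VARCHAR(30) NOT NULL REFERENCES City (Name)
    );
    INSERT INTO City VALUES ('Ivanovo'), ('Novosibirsk');
    INSERT INTO People VALUES
        ('000000001', 'Ivanov', 'Ivanovo'),
        ('000000002', 'Petrov', 'Novosibirsk');

    -- Steps 1-3: an auto-generated Id becomes the PRIMARY KEY,
    -- the old natural key is kept as a UNIQUE constraint.
    CREATE TABLE CitySK (
        Id INTEGER PRIMARY KEY,
        Name VARCHAR(30) NOT NULL UNIQUE
    );
    CREATE TABLE PeopleSK (
        Id INTEGER PRIMARY KEY,
        Passport CHAR(9) NOT NULL UNIQUE,
        Family VARCHAR(20) NOT NULL,
        CityId INT NOT NULL REFERENCES CitySK (Id)
    );
    INSERT INTO CitySK (Name) SELECT Name FROM City;

    -- Step 4: the textual reference People.City becomes the integer CityId.
    INSERT INTO PeopleSK (Passport, Family, CityId)
        SELECT P.Passport, P.Family, C.Id
        FROM People P INNER JOIN CitySK C ON C.Name = P.City;
""")

# Read the same facts from both representations.
before = conn.execute(
    "SELECT Passport, Family, City FROM People ORDER BY Passport"
).fetchall()
after = conn.execute(
    "SELECT P.Passport, P.Family, C.Name "
    "FROM PeopleSK P INNER JOIN CitySK C ON P.CityId = C.Id "
    "ORDER BY P.Passport"
).fetchall()
print(before == after)
```

Both queries return the same rows, which is what "equivalent from the standpoint of the infological model" means in practice here.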

Why all this is needed


A reasonable question arises: what for? Indeed, why add fields to tables and replace one thing with another? So, here is what we gain by performing this "mechanical" operation.

Simpler maintenance


This is the area where surrogate keys show their greatest advantages. Because the relationships between tables are separated from the logic "inside the tables", each can be changed independently without touching the other.


For example, it turns out that cities have duplicate names. It is decided to add one more field, Region, to City and to make the primary key (Name, Region). With natural keys, the City table changes, the People table changes (the Region field is added, and yes, to every record; I will say nothing about the size), and every query that involves City, including those on the clients, is rewritten: the line AND XXX.Region = City.Region is added to each of them.


Oh yes, I almost forgot: most servers strongly dislike ALTER TABLE on fields that are part of a PRIMARY KEY or FOREIGN KEY.


With surrogate keys, the field is added to City and the UNIQUE CONSTRAINT is changed. That is all.
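A rough illustration of the surrogate-key case, assuming SQLite through Python's sqlite3 module. SQLite cannot alter a UNIQUE constraint in place, so this sketch enforces uniqueness with a unique index (the name City_Unique, the empty default region and the sample rows are invented); other servers would use their own ALTER TABLE syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE City (
        Id INTEGER PRIMARY KEY,
        Name VARCHAR(30) NOT NULL
    );
    -- Uniqueness of the natural key, kept as a separately managed index.
    CREATE UNIQUE INDEX City_Unique ON City (Name);
    CREATE TABLE People (
        Id INTEGER PRIMARY KEY,
        Passport CHAR(9) NOT NULL UNIQUE,
        Family VARCHAR(20) NOT NULL,
        CityId INT NOT NULL REFERENCES City (Id)
    );
    INSERT INTO City (Name) VALUES ('Ivanovo');
    INSERT INTO People (Passport, Family, CityId)
        VALUES ('000000001', 'Ivanov', 1);
""")

def people_definition():
    """Return the stored CREATE TABLE text of People."""
    return conn.execute(
        "SELECT sql FROM sqlite_master WHERE name = 'People'"
    ).fetchone()[0]

people_before = people_definition()

# The whole change: one new field in City and a wider uniqueness rule.
conn.executescript("""
    ALTER TABLE City ADD COLUMN Region VARCHAR(30) NOT NULL DEFAULT '';
    DROP INDEX City_Unique;
    CREATE UNIQUE INDEX City_Unique ON City (Name, Region);
    INSERT INTO City (Name, Region) VALUES ('Ivanovo', 'Another region');
""")

people_after = people_definition()
city_count = conn.execute("SELECT COUNT(*) FROM City").fetchone()[0]
print(people_before == people_after, city_count)
```

The People table and every query that joins on CityId stay exactly as they were; only City is touched.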


Another example: with surrogate keys, a change to the field list of a SELECT never forces the JOIN to be rewritten. With natural keys, as soon as a field that is not part of the joined table's primary key is added, the JOIN has to be rewritten.


Another example: the data type of a field that is part of a natural key has changed. Once again a pile of tables has to be altered and the indexes tuned anew...


Given ever-changing legislation, this advantage of surrogate keys is by itself enough to justify their use.

Smaller database size


Suppose that in our example the average length of a city name is 10 bytes. Then each person takes, on average, 10 bytes to store the reference to the city (actually a little more because of the VARCHAR service information, and considerably more because of the index on People.City, which has to be built for REFERENCES to work efficiently). With a surrogate key it is 4 bytes. The saving is at least 6 bytes per person, roughly 10 MB for Novosibirsk. Reducing the size of the database is not an end in itself, but in most cases it will obviously lead to higher speed as well.
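The estimate is easy to reproduce. The population figure below is an assumption made for the illustration (about 1.5 million residents); the article itself gives only the byte sizes and the rounded result.

```python
# Assumed figure, not taken from the article: about 1.5 million residents.
population = 1_500_000
natural_key_bytes = 10   # average city name length, from the text
surrogate_key_bytes = 4  # INTEGER key, from the text

# Minimum saving: the reference column only, ignoring indexes and overhead.
saving_bytes = population * (natural_key_bytes - surrogate_key_bytes)
saving_mb = saving_bytes / 1_000_000
print(f"{saving_mb:.0f} MB")
```

Under that assumption the saving comes to about 9 MB, in line with the "roughly 10 MB" quoted above.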


It has been argued that the database could optimize the storage of a natural key on its own by substituting some hash function for it in People (in effect creating a surrogate key itself). But no real commercial database server does this, and there is reason to believe none will. The simplest justification is that with such a substitution, ordinary ADD CONSTRAINT … FOREIGN KEY or DROP CONSTRAINT … FOREIGN KEY statements would cause a serious shake-up of the tables and a noticeable change to the whole database: every field included in the CONSTRAINT would have to be physically added or removed (replaced by the hash function).

Faster data retrieval


The question is rather debatable. However, assuming that:

- The database is normalized;

- The tables hold many records (tens of thousands or more);

- Queries mostly return limited data sets (a few percent of the table size at most);


a system with surrogate keys will be noticeably faster. Here is why.


Natural keys can potentially give higher speed when:

- Only the information contained in the primary keys of the related tables is needed;

- There are no WHERE conditions on fields of the related tables.


That is, in our example, a query such as:

SELECT Family, City FROM People;



With surrogate keys this query looks like:

SELECT P.Family, C.Name FROM People P INNER JOIN City C ON P.CityId = C.Id;



It would seem that the natural key gives a simpler query over fewer tables, which will run faster. But here too things are not so simple: the tables are larger with natural keys (see above), and disk activity easily eats up the advantage gained from the absence of the JOIN. The effect is even stronger if the data is filtered when it is selected (and with tables of any substantial size, filtering is always used). The point is that searches are usually made on informative fields such as CHAR, DATETIME and so on. So it is often faster to find, in the lookup table, the set of values that limits the result, and then to pick the matching records from the big table with a JOIN on a fast INTEGER index. For example:

(natural key) SELECT Family, City FROM People WHERE City = 'Ivanovo';



will run several times slower than

(surrogate key) SELECT P.Family, C.Name
      FROM People P INNER JOIN City C ON P.CityId = C.Id
     WHERE C.Name = 'Ivanovo';



With the natural key there will be an INDEX SCAN of the big People table on a CHARACTER index. With the surrogate key there will be an INDEX SCAN of the smaller City table and a JOIN on an efficient INTEGER index.


And if = 'Ivanovo' is replaced by LIKE '%vanovo', the natural key will lag behind the surrogate key by an order of magnitude or more.
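The two filtered queries can be run side by side. This sketch, assuming SQLite through Python's sqlite3 module and a handful of invented rows, only confirms that both forms return the same answer; it makes no claim about speed, which depends on the server, the data volume and the indexes.

```python
import sqlite3

# Invented sample data: (Passport, Family, city name).
people = [
    ("000000001", "Ivanov", "Ivanovo"),
    ("000000002", "Petrov", "Novosibirsk"),
    ("000000003", "Sidorov", "Ivanovo"),
]

# Natural-key database.
nk = sqlite3.connect(":memory:")
nk.executescript("""
    CREATE TABLE City (Name VARCHAR(30) NOT NULL PRIMARY KEY);
    CREATE TABLE People (
        Passport CHAR(9) NOT NULL PRIMARY KEY,
        Family VARCHAR(20) NOT NULL,
        City VARCHAR(30) NOT NULL REFERENCES City (Name)
    );
    CREATE INDEX People_City ON People (City);
""")
nk.executemany("INSERT OR IGNORE INTO City VALUES (?)",
               [(city,) for _, _, city in people])
nk.executemany("INSERT INTO People VALUES (?, ?, ?)", people)

# Surrogate-key database.
sk = sqlite3.connect(":memory:")
sk.executescript("""
    CREATE TABLE City (
        Id INTEGER PRIMARY KEY,
        Name VARCHAR(30) NOT NULL UNIQUE
    );
    CREATE TABLE People (
        Id INTEGER PRIMARY KEY,
        Passport CHAR(9) NOT NULL UNIQUE,
        Family VARCHAR(20) NOT NULL,
        CityId INT NOT NULL REFERENCES City (Id)
    );
    CREATE INDEX People_CityId ON People (CityId);
""")
sk.executemany("INSERT OR IGNORE INTO City (Name) VALUES (?)",
               [(city,) for _, _, city in people])
sk.executemany(
    "INSERT INTO People (Passport, Family, CityId) "
    "SELECT ?, ?, Id FROM City WHERE Name = ?", people)

nk_rows = nk.execute(
    "SELECT Family, City FROM People "
    "WHERE City = 'Ivanovo' ORDER BY Family").fetchall()
sk_rows = sk.execute(
    "SELECT P.Family, C.Name "
    "FROM People P INNER JOIN City C ON P.CityId = C.Id "
    "WHERE C.Name = 'Ivanovo' ORDER BY P.Family").fetchall()
print(nk_rows == sk_rows)
```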


Similarly, as soon as a query in the natural-key case needs a field from City that is not part of its primary key, the JOIN will be performed on a slow index and speed will drop noticeably below the surrogate-key level. Everyone can draw their own conclusions, but let them recall what percentage of all their queries have the form SELECT * FROM SingleTable. For me it is negligibly small.


Yes, supporters of natural keys like to cite as an advantage the "informativeness of tables", which grows when natural keys are used. Let me repeat once more that the most informative table is one that contains the whole database as a flat file. Any "increase in the informativeness of tables" is an increase in the degree of duplication of information in them, which is not good.

Faster data modification


INSERT


At first glance the natural key is faster: an INSERT does not need to generate an extra field and check its uniqueness. In general this is true, although the delay shows up only at very high transaction rates. Even this is not certain, however, since some servers optimize record insertion when a monotonically increasing CLUSTERED index is built on the key field. With a surrogate key this is elementary; with a natural key it is, alas, usually unattainable. Besides, an INSERT into the table on the MANY side (which happens more often) will be faster, since REFERENCES will be checked against a faster index.


UPDATE


When fields that are part of a natural key are updated, all related tables have to be updated in cascade as well. Thus renaming Leningrad to Saint Petersburg would, in our example, require a transaction over several million records. Updating any attribute in a system with surrogate keys updates only one record. Obviously, in a distributed system, or where archives exist, the situation only gets worse. If the updated fields are not part of the natural key, the speed is almost the same.
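The difference in the number of touched records can be seen on a toy scale. This is a sketch assuming SQLite through Python's sqlite3 module, with three invented people standing in for several million; foreign-key enforcement is switched on explicitly because SQLite leaves it off by default.

```python
import sqlite3

def connect():
    """Open an in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    return conn

families = ["Ivanov", "Petrov", "Sidorov"]  # invented sample data
rows = [(f"{i:09d}", family) for i, family in enumerate(families, 1)]

# Natural key: the city name is stored in every People record.
nk = connect()
nk.executescript("""
    CREATE TABLE City (Name VARCHAR(30) NOT NULL PRIMARY KEY);
    CREATE TABLE People (
        Passport CHAR(9) NOT NULL PRIMARY KEY,
        Family VARCHAR(20) NOT NULL,
        City VARCHAR(30) NOT NULL
            REFERENCES City (Name) ON UPDATE CASCADE
    );
    INSERT INTO City VALUES ('Leningrad');
""")
nk.executemany("INSERT INTO People VALUES (?, ?, 'Leningrad')", rows)
nk.execute("UPDATE City SET Name = 'Saint Petersburg' WHERE Name = 'Leningrad'")
# Every referencing record had to be rewritten by the cascade.
nk_rewritten = nk.execute(
    "SELECT COUNT(*) FROM People WHERE City = 'Saint Petersburg'"
).fetchone()[0]

# Surrogate key: People stores only the integer CityId.
sk = connect()
sk.executescript("""
    CREATE TABLE City (
        Id INTEGER PRIMARY KEY,
        Name VARCHAR(30) NOT NULL UNIQUE
    );
    CREATE TABLE People (
        Id INTEGER PRIMARY KEY,
        Passport CHAR(9) NOT NULL UNIQUE,
        Family VARCHAR(20) NOT NULL,
        CityId INT NOT NULL REFERENCES City (Id)
    );
    INSERT INTO City (Name) VALUES ('Leningrad');
""")
sk.executemany(
    "INSERT INTO People (Passport, Family, CityId) VALUES (?, ?, 1)", rows)
people_before = sk.execute("SELECT * FROM People ORDER BY Id").fetchall()
cursor = sk.execute(
    "UPDATE City SET Name = 'Saint Petersburg' WHERE Name = 'Leningrad'")
sk_rewritten = cursor.rowcount  # rows changed by the UPDATE itself
people_after = sk.execute("SELECT * FROM People ORDER BY Id").fetchall()

print(nk_rewritten, sk_rewritten, people_before == people_after)
```

With the natural key all three People records change; with the surrogate key one City record changes and People is left untouched.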


More about CASCADE UPDATE


By no means all database servers support it at the declarative level. The argument "your server is simply broken" is hardly valid here. This forces one to write separate update logic, which is not always simple. A good example has been given: without CASCADE UPDATE, a field that is referenced from elsewhere cannot be updated at all; one has to disable REFERENCES or create a copy of the record, which is not always permissible (other fields may be UNIQUE).


DELETE


With a surrogate key it will run faster, for the simple reason that the REFERENCES check goes through a fast index.

Is the natural key really that good?


Nothing lasts forever under the Moon. The seemingly most reliable attribute is suddenly abolished or ceases to be unique (I need not look far: the ordinary rouble and the redenominated rouble; such examples are countless). Americans complain about non-unique social security numbers, Microsoft about grey-market Chinese network cards with duplicate MAC addresses, which can lead to duplicate GUIDs; doctors perform sex-change operations, and biologists clone animals. Under these conditions (and given the law of non-decreasing entropy), building into a system the assumption that natural keys never change means planting a mine under oneself. Natural keys should be placed in a separate logical layer and isolated from the rest of the information as far as possible. Their change is then much easier to survive. And in general, unambiguously associating an entity with one of that entity's attributes is, well, rather odd. A passport number is not yet the person. A surrogate key is a kind of substance that stands for the entity itself. The entity, not one of its attributes.

Typical arguments of natural-key supporters


A system with surrogate keys loses control over the correctness of entered information


This is not so. Control would be lost only if no uniqueness constraint were imposed on the fields that make up the natural key. Obviously, if the subject domain dictates any constraints on the natural-key attributes, they will be reflected in the database in any case.
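A quick check of this point, again only a sketch assuming SQLite through Python's sqlite3 module and invented sample rows: in the surrogate-key schema the UNIQUE constraint on Passport still rejects a second person with the same passport number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE City (
        Id INTEGER PRIMARY KEY,
        Name VARCHAR(30) NOT NULL UNIQUE
    );
    CREATE TABLE People (
        Id INTEGER PRIMARY KEY,
        Passport CHAR(9) NOT NULL UNIQUE,
        Family VARCHAR(20) NOT NULL,
        CityId INT NOT NULL REFERENCES City (Id)
    );
    INSERT INTO City (Name) VALUES ('Ivanovo');
    INSERT INTO People (Passport, Family, CityId)
        VALUES ('000000001', 'Ivanov', 1);
""")

try:
    # Same passport number, different surrogate Id: must be refused.
    conn.execute(
        "INSERT INTO People (Passport, Family, CityId) "
        "VALUES ('000000001', 'Petrov', 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

people_count = conn.execute("SELECT COUNT(*) FROM People").fetchone()[0]
print(duplicate_rejected, people_count)
```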


A system with natural keys has fewer JOINs, so queries are simpler and more convenient to write


Yes, there are fewer. But in a system with surrogate keys it is trivial to write:

CREATE VIEW PeopleEK AS
SELECT P.Family, P.Passport, C.Name
FROM People P INNER JOIN City C ON P.CityId = C.Id;



and enjoy all the same charms, at a higher speed at that. It is also worth mentioning that with natural keys a lot of cascade operations have to be programmed and, God forbid in a distributed environment, speed problems have to be fought. Against that background, "short" queries no longer look so attractive.
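As a rough illustration, assuming SQLite through Python's sqlite3 module and invented sample rows, the view can then be queried exactly as the natural-key People table would be, with no JOIN in the client's query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE City (
        Id INTEGER PRIMARY KEY,
        Name VARCHAR(30) NOT NULL UNIQUE
    );
    CREATE TABLE People (
        Id INTEGER PRIMARY KEY,
        Passport CHAR(9) NOT NULL UNIQUE,
        Family VARCHAR(20) NOT NULL,
        CityId INT NOT NULL REFERENCES City (Id)
    );
    INSERT INTO City (Name) VALUES ('Ivanovo'), ('Novosibirsk');
    INSERT INTO People (Passport, Family, CityId) VALUES
        ('000000001', 'Ivanov', 1),
        ('000000002', 'Petrov', 2);

    -- The view from the article: hides the JOIN behind a flat shape.
    CREATE VIEW PeopleEK AS
    SELECT P.Family, P.Passport, C.Name
    FROM People P INNER JOIN City C ON P.CityId = C.Id;
""")

# The client query contains no JOIN and no surrogate Id.
view_rows = conn.execute(
    "SELECT Family, Name FROM PeopleEK WHERE Name = 'Ivanovo'"
).fetchall()
print(view_rows)
```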


Introducing a surrogate key violates third normal form


Recall the definition: a table is in third normal form (3NF) if it satisfies the definition of 2NF and none of its non-key fields functionally depends on any other non-key field.


That is, key fields are not mentioned there at all. Therefore adding one more key to a table cannot violate 3NF in any way. In general, for a table with several candidate keys it makes sense to speak not of 3NF but of Boyce-Codd normal form, which was introduced specifically for such tables.


So: a table is in Boyce-Codd normal form (BCNF) if and only if every functional dependency between its fields reduces to a full functional dependency on a candidate key.


Thus a table with a surrogate key can easily be normalized, even up to 5NF. More precisely, surrogate keys have nothing to do with normalization. Moreover, introducing surrogate keys reduces data redundancy in the database, which fits well with the ideology of normalization. In essence, normalization is itself a reduction of the informativeness of individual tables according to certain rules. Surrogate keys simply eliminate anomalies not within a table but at the inter-table level (for example, by eliminating cascade updates). A system with surrogate keys is, so to speak, holier than the Pope :-). Indeed, the situation in which changing one field of a table requires changing the same field in other records of the SAME table is regarded as an update anomaly. But in a system with natural keys the same has to be done in the RELATED table when a key attribute changes on the 1 side of a 1:N relationship. Obviously, from the standpoint of the physical implementation of the database, this situation is no better. In a system with surrogate keys such situations do not arise.

Tables in a system with natural keys are more informative


The most informative table is one that contains the whole database as a flat file. Any "increase in the informativeness of tables" is an increase in the degree of duplication of information in them, which is not good. And in general the term "informativeness of tables" is dubious. What probably matters more is the informativeness of the database, which is the same in both cases.

Conclusion


In general, the conclusions are obvious: introducing surrogate keys yields a database that is easier to manage, more compact and faster. Of course, it is not a panacea. In some cases (for example, a table that has no REFERENCES pointing to it and receives intensive data insertion) it is more correct to use a natural key, or no primary key at all (the latter is categorically contraindicated for many RDBMSs and client application development tools). But the discussion was about the typical technique that should be recommended for general use. Unique situations may require unique solutions (sometimes even normalization has to be given up).