Wednesday, December 31, 2008

Vitamins Fail to Prevent Cancer: Study - washingtonpost.com

"Vitamins Fail to Prevent Cancer: Study


By Randy Dotinga
HealthDay Reporter
Tuesday, December 30, 2008; 12:00 AM

TUESDAY, Dec. 30 (HealthDay News) -- In another blow to the supposed cancer-fighting powers of vitamins C and E, new research suggests that supplement forms of the vitamins don't prevent the disease in women.

And another widely touted supplement, beta carotene, didn't help either, the new study found.

'Simply taking antioxidant supplements is insufficient to prevent cancer development,' said study lead author Jennifer Lin, an assistant professor of medicine at Harvard Medical School.

But it's still a good idea to eat plenty of fruits and vegetables that are rich in nutrients such as antioxidants, Lin said.

Vitamin supplements have taken hits in a number of studies in recent years, with some research suggesting that supplements such as vitamins B, C, D, E, folic acid and calcium don't prevent cancer when taken in combinations or alone. The findings contradict other studies that had suggested the vitamins may have a protective effect due to antioxidants, which reduce damage to cells in the body.

For the new study, the researchers looked at a group of 8,171 women who were randomly assigned to take a supplement, a combination of supplements or a placebo. The supplements were vitamin C (500 milligrams a day), vitamin E (600 International Units every other day) and beta carotene (50 milligrams every other day).

The women, all over the age of 40, took part in the study from 1995 and 1996 until 2005, for an average of nine years. They all had cardiovascular disease or were at risk for it.

A total of 624 of the women developed cancer, and 176 died from it during the period of the study. The researchers didn't find any 'statistically significant' evidence that the supplements either helped or hurt a woman's risk of developing cancer.

The findings were published online Dec. 30 in the Journal of the National Cancer Institute.

Why didn't the supplements prevent cancer, as some earlier research had suggested? One theory, Lin said, is that they might be effective in people who are poorly nourished, but not in well-nourished people such as the women in the study. She said some research has shown that diets lacking in antioxidants -- found in fruits and vegetables -- can lead to higher cancer rates.

The study isn't the final word, Lin said. 'More studies need to be done to see who may benefit from antioxidant supplements. One trial study has suggested that men, compared with women, were more likely to gain benefits from supplementation with antioxidants in reducing cancer risk. However, such findings need verification.'

Dr. Demetrius Albanes, senior investigator with the U.S. National Cancer Institute, who wrote a commentary accompanying the study, acknowledged that while the study results were disappointing, it's possible that the supplements could have had positive effects in women who weren't at high risk for cardiovascular disease, as were those in the study.

He added that the study indicated that vitamin E may have some benefit at preventing colon cancer specifically, as other research has suggested.

In the big picture, Albanes said, research shows that lower-calorie diets with plenty of fruits and vegetables do have benefits. 'But right now, the issue of vitamin supplementation is still very much up in the air for men and for women,' he added.

More information

Learn more about the links between diet, exercise and cancer risk from the American Cancer Society.

SOURCES: Jennifer Lin, Ph.D., assistant professor of medicine, Brigham and Women's Hospital and Harvard Medical School, Boston; Demetrius Albanes, M.D., senior investigator, U.S. National Cancer Institute, Bethesda, Md.; Dec. 30, 2008, Journal of the National Cancer Institute, online"


Tuesday, November 18, 2008

Only the individual can think, and thereby create new values for society, nay, even set up new moral standards to which the life of the community conforms. ... The ideals which have lighted my way, and time after time have given me new courage to face life cheerfully, have been Kindness, Beauty and Truth. (Albert Einstein, 1954)
Communities tend to be guided less than individuals by conscience and a sense of responsibility. How much misery does this fact cause mankind! It is the source of wars and every kind of oppression, which fill the earth with pain, sighs and bitterness. (Albert Einstein, 1934)
Introduction
A human being is part of the whole called by us universe, a part limited in time and space. We experience ourselves, our thoughts and feelings as something separate from the rest. A kind of optical delusion of consciousness. This delusion is a kind of prison for us, restricting us to our personal desires and to affection for a few persons nearest to us. Our task must be to free ourselves from the prison by widening our circle of compassion to embrace all living creatures and the whole of nature in its beauty. The true value of a human being is determined by the measure and the sense in which they have obtained liberation from the self. We shall require a substantially new manner of thinking if humanity is to survive. (Albert Einstein, 1954)
If, then, it is true that the axiomatic basis of theoretical physics cannot be extracted from experience but must be freely invented, can we ever hope to find the right way? I answer without hesitation that there is, in my opinion, a right way, and that we are capable of finding it. I hold it true that pure thought can grasp reality, as the ancients dreamed. (Albert Einstein, 1954)
Kindness, Beauty and Truth of Albert Einstein
Over ten years I have read many hundreds of great philosophers, but of them all I have a special affection for Albert Einstein. Having now read Albert Einstein's 'Special and General Relativity' and 'Ideas and Opinions' many times, I thought it would be nice to put up a web page that presented his ideas in as simple and ordered a way as possible. Albert Einstein was a beautiful man, wise and moral, who lived in difficult times. I think all people will enjoy the great clarity and wisdom of his ideas, and they will find them very relevant and useful in our modern (and very disturbed) world. Below you will find quotations from Albert Einstein on a diversity of subjects: philosophy, religion, war, education, morality, etc.
Of most significance though are his ideas on Physics and Reality. It was from reading Einstein that I first realised that matter was not made of tiny 'particles'. And having also read Lorentz (who believed in an Absolute Space) I realised that a slight modification of Einstein's ideas on Physical Reality solved many of the problems of modern physics. Einstein represented Matter as Spherical Fields which caused 'Relative' Space-Time. This can now be explained by replacing Einstein's Spherical Force Fields with Spherical Wave Motions of Space, which cause Matter, Time and Forces.
I hope you enjoy the Kindness, Beauty and Truth of Albert Einstein.
Geoff Haselhurst

Albert Einstein Quotes on Humanity / Society
Perhaps I am a romantic, but it is my hope that in the future Humanity will live by the truth, with greater harmony between different people, their religions and cultures, and with life in all its complex beauty. As Einstein profoundly writes:
A human being is part of the whole called by us universe, a part limited in time and space. We experience ourselves, our thoughts and feelings as something separate from the rest. A kind of optical delusion of consciousness. This delusion is a kind of prison for us, restricting us to our personal desires and to affection for a few persons nearest to us. Our task must be to free ourselves from the prison by widening our circle of compassion to embrace all living creatures and the whole of nature in its beauty. The true value of a human being is determined by the measure and the sense in which they have obtained liberation from the self. We shall require a substantially new manner of thinking if humanity is to survive. (Albert Einstein, 1954)
To see with one's own eyes, to feel and judge without succumbing to the suggestive power of the fashion of the day, to be able to express what one has seen and felt in a trim sentence or even in a cunningly wrought word - is that not glorious? Is it not a proper subject for congratulation? (Albert Einstein, 1934)
When we survey our lives and endeavours, we soon observe that almost the whole of our actions and desires is bound up with the existence of other human beings. We notice that our whole nature resembles that of the social animals. We eat food that others have produced, wear clothes that others have made, live in houses that others have built. The greater part of our knowledge and beliefs has been communicated to us by other people through the medium of a language which others have created. Without language our mental capacities would be poor indeed, comparable to those of the higher animals; we have, therefore, to admit that we owe our principal advantage over the beasts to the fact of living in human society. The individual, if left alone from birth, would remain primitive and beastlike in his thoughts and feelings to a degree that we can hardly conceive. (Albert Einstein, 1934)
Why Socialism? Man is, at one and the same time, a solitary being and a social being. As a solitary being, he attempts to protect his own existence and that of those who are closest to him, to satisfy his personal desires, and to develop his innate abilities. As a social being, he seeks to gain the recognition and affection of his fellow human beings, to share in their pleasures, to comfort them in their sorrows, and to improve their conditions of life. Only the existence of these varied, frequently conflicting strivings accounts for the special character of a man, and their specific combination determines the extent to which an individual can achieve an inner equilibrium and can contribute to the well-being of society. It is quite possible that the relative strength of these two drives is, in the main, fixed by inheritance. But the personality that finally emerges is largely formed by the environment in which a man happens to find himself during his development, by the structure of the society in which he grows up, by the tradition of that society, and by its appraisal of particular types of behaviour. (Albert Einstein, 1949)
It is society which provides man with food, clothing, a home, the tools of work, language, the forms of thought, and most of the content of thought; his life is made possible through the labor and the accomplishments of the many millions past and present who are all hidden behind the small word society. It is evident, therefore, that the dependence of the individual upon society is a fact of nature which cannot be abolished - just as in the case of ants and bees. However, while the whole life process of ants and bees is fixed down to the smallest detail by rigid, hereditary instincts, the social pattern and interrelationships of human beings are very variable and susceptible to change. Memory, the capacity to make new combinations, the gift of oral communication have made possible developments among human beings which are not dictated by biological necessities. Such developments manifest themselves in traditions, institutions, and organisations; in literature; in scientific and engineering accomplishments; in works of art. This explains how it happens that, in a certain sense, man can influence his life through his own conduct, and that in this process conscious thinking and wanting can play a part. Man acquires at birth, through heredity, a biological constitution which we must consider fixed and unalterable, including the natural urges which are characteristic of the human species. In addition, during his lifetime, he acquires a cultural constitution which he adopts from society through communication and through many other types of influences. It is this cultural constitution which, with the passage of time, is subject to change and which determines to a very large extent the relationship between the individual and society. (Albert Einstein, 1949)
Modern anthropology has taught us, through comparative investigation of so-called primitive cultures, that the social behaviour of human beings may differ greatly, depending upon prevailing cultural patterns and the types of organisation which predominate in society. It is on this that those who are striving to improve the lot of man may ground their hopes: human beings are not condemned, because of their biological constitution, to annihilate each other or to be at the mercy of a cruel, self-inflicted fate. If we ask ourselves how the structure of society and the cultural attitude of man should be changed in order to make human life as satisfying as possible, we should constantly be conscious of the fact that there are certain conditions which we are unable to modify. As mentioned before, the biological nature of man is, for all practical purposes, not subject to change. (Albert Einstein, 1949)
It is only a slight exaggeration to say that mankind constitutes even now a planetary community of production and consumption. I have now reached the point where I may indicate briefly what to me constitutes the essence of the crisis in our time. It concerns the relationship of the individual to society. The individual has become more conscious than ever of his dependence upon society. But he does not experience this dependence as a positive asset, as an organic tie, as a protective force, but rather as a threat to his natural rights, or even to his economic existence. Moreover, his position in society is such that the egotistical drives of his make-up are constantly being accentuated, while his social drives, which are by nature weaker, progressively deteriorate. All human beings, whatever their position in society, are suffering from this process of deterioration. Unknowingly prisoners of their own egotism, they feel insecure, lonely, and deprived of the naive, simple and unsophisticated enjoyment of life. Man can find meaning in life, short and perilous as it is, only through devoting himself to society. The economic anarchy of capitalist society as it exists today is, in my opinion, the real source of evil. (Albert Einstein, 1949)
I, too, am in favour of abolishing large cities (Albert Einstein, 1934)
The population of the civilized countries is extremely dense as compared with former times; Europe today contains about three times as many people as it did a hundred years ago. But the number of leading personalities has decreased out of all proportion. Only a few people are known to the masses as individuals, through their creative achievements. Organisation has to some extent taken the place of leading personalities, particularly in the technical sphere, but also to a very perceptible extent in the scientific. (Albert Einstein, 1934)
Communities tend to be guided less than individuals by conscience and a sense of responsibility. How much misery does this fact cause mankind! It is the source of wars and every kind of oppression, which fill the earth with pain, sighs and bitterness. (Albert Einstein, 1934)
To inquire after the meaning or object of one's own existence or that of all creatures has always seemed absurd from an objective point of view. And yet everybody has certain ideals which determine the direction of his endeavors and judgments. In this sense I have never looked upon ease and happiness as ends in themselves - this ethical basis I call the ideal of a pigsty. The ideals which have lighted my way, and time after time have given me new courage to face life cheerfully, have been Kindness, Beauty and Truth. Without the sense of kinship with men of like mind, without the occupation with the objective world, the eternally unattainable in the field of art and scientific endeavors, life would have seemed to me empty. The trite objects of human efforts - possessions, outward success, luxury - have always seemed to me contemptible.
My passionate sense of social justice and social responsibility has always contrasted oddly with my pronounced lack of need for direct contact with other human beings and human communities. I am truly a 'lone traveler' and have never belonged to my country, my home, my friends, or even my immediate family, with my whole heart; in the face of all these ties, I have never lost a sense of distance and a need for solitude - feelings which increase with the years. One becomes sharply aware, but without regret, of the limits of mutual understanding and consonance with other people. No doubt, such a person loses some of his innocence and unconcern; on the other hand, he is largely independent of the opinions, habits, and judgments of his fellows and avoids the temptation to build his inner equilibrium upon such insecure foundations. (Albert Einstein - Ideas and Opinions, 1954)
(Albert Einstein on the occasion of Mahatma Mohandas Gandhi's 70th Birthday in 1939) A leader of his people, unsupported by any outward authority: a politician whose success rests not upon craft nor the mastery of technical devices, but simply on the convincing power of his personality; a victorious fighter who always scorned the use of force; a man of wisdom and humility, armed with resolve and inflexible consistency, who had devoted all his strength to the uplifting of his people and the betterment of their lot; a man who had confronted the brutality of Europe with the dignity of the simple human being, and thus at all times risen superior. Generations to come, it may be, will scarce believe that such a one as this ever in flesh and blood walked on this earth. (Albert Einstein, 1939)
Albert Einstein: Quotes on Government, Politics, Economics
I advocate world government because I am convinced that there is no other possible way of eliminating the most terrible danger in which man has ever found himself. The objective of avoiding total destruction must have priority over any other objective. (Albert Einstein, 1947)
Exchange Of Letters With Members Of The Russian Academy Any government is in itself an evil in so far as it carries within it the tendency to deteriorate into tyranny. However, except for a small number of anarchists, every one of us is convinced that civilized society cannot exist without a government. In a healthy nation there is a kind of dynamic balance between the will of the people and the government, which prevents its degeneration into tyranny. It is obvious that the danger of such deterioration is more acute in a country in which the government has authority not only over the armed forces but also over all the channels of education and information as well as over the economic existence of every single citizen. I say this merely to indicate that socialism as such cannot be considered the solution to all social problems but merely as a framework within which such a solution is possible. (Albert Einstein, 1947)
Is it really unavoidable that, because of our passions and our inherited customs, we should be condemned to annihilate each other so thoroughly that nothing would be left over which would deserve to be conserved? Is it not true that all the controversies and differences of opinion which we have touched upon in our strange exchange of letters are insignificant pettiness compared to the danger in which we all find ourselves? Should we not do everything in our power to eliminate the danger which threatens all nations alike? (Albert Einstein, 1947)
If two factories produce the same sort of goods, other things being equal, that factory will be able to produce them more cheaply which employs fewer workmen - i.e., makes the individual worker work as long and as hard as human nature permits. From this it follows inevitably that, with methods of production as they are today, only a portion of the available labor can be used. While unreasonable demands are made on this portion, the remainder is automatically excluded from the process of production. This leads to a fall in sales and profits. Businesses go smash, which further increases unemployment and diminishes confidence in industrial concerns and therewith public participation in the mediating banks; finally the banks become insolvent through the sudden withdrawal of accounts and the wheels of industry therewith come to a complete standstill. (Albert Einstein, 1934)
Of one thing I feel certain: this same technical progress which, in itself, might relieve mankind of the great part of the labor necessary to its subsistence, is the main cause of our present misery. Hence there are those who would in all seriousness forbid the introduction of technical improvements. This is obviously absurd. But how can we find a more rational way out of our dilemma? (Albert Einstein, 1934)
My personal opinion is that those methods are in general preferable which respect existing traditions and habits so far as that is in any way compatible with the end in view. Nor do I believe that a sudden transference of economy into government management would be beneficial from the point of view of production; private enterprise should be left its sphere of activity, in so far as it has not already been eliminated by industry itself by the device of cartelization.
There are, however, two respects in which this economic freedom ought to be limited. In each branch of industry the number of working hours per week ought so to be reduced by law that unemployment is systematically abolished. At the same time minimum wages must be fixed in such a way that the purchasing power of the workers keeps pace with production.
Further, in those industries which have become monopolistic in character through organisation on the part of the producers, prices must be controlled by the state in order to keep the issue of capital within reasonable bounds and prevent artificial strangling of production and consumption. In this way it might perhaps be possible to establish a proper balance between production and consumption without too great a limitation of free enterprise and at the same time to stop the intolerable tyranny of the owners of the means of production (land and machinery) over the wage-earners, in the widest sense of the term. (Albert Einstein, 1934)
The weakness of your plan lies, so it seems to me, in the sphere of psychology, or rather, in your neglect of it. It is no accident that capitalism has brought with it progress not merely in production but also in knowledge. Egoism and competition are, alas, stronger forces than public spirit and sense of duty. In Russia, they say, it is impossible to get a decent piece of bread. ... Perhaps I am over-pessimistic concerning state and other forms of communal enterprise, but I expect little good from them. Bureaucracy is the death of any achievement. I have seen and experienced too many dreadful warnings, even in comparatively model Switzerland.
I am inclined to the view that the state can only be of real use to industry as a limiting and regulative force. It must see to it that competition among the workers is kept within healthy limits, that all children are given a chance to develop soundly, and that wages are high enough for the goods produced to be consumed. But it can exert a decisive influence through its regulative function if its measures are framed in an objective spirit by independent experts. (Albert Einstein, 1934)
On Receiving The One World Award For all of us who are concerned for peace and the triumph of reason and justice must today be keenly aware how small an influence reason and honest good-will exert upon events in the political field. But however that may be, and whatever fate may have in store for us, yet we may rest assured that without the tireless efforts of those who are concerned with the welfare of humanity as a whole, the lot of mankind would be still worse than in fact it even now is. (Albert Einstein, 1948)
The economic anarchy of capitalist society as it exists today is, in my opinion, the real source of evil. (Albert Einstein, 1949)
Private capital tends to become concentrated in few hands, partly because of competition among the capitalists, and partly because technological development and the increasing division of labor encourage the formation of larger units of production at the expense of the smaller ones. The result of these developments is an oligarchy of private capital the enormous power of which cannot be effectively checked even by a democratically organised political society. This is true since the members of legislative bodies are selected by political parties, largely financed or otherwise influenced by private capitalists who, for all practical purposes, separate the electorate from the legislature. The consequence is that the representatives of the people do not in fact sufficiently protect the interests of the underprivileged sections of the population. Moreover, under existing conditions, private capitalists inevitably control, directly or indirectly, the main sources of information (press, radio, education). It is thus extremely difficult, and indeed in most cases quite impossible, for the individual citizen to come to objective conclusions and to make intelligent use of his political rights. (Albert Einstein, 1949)
This crippling of individuals I consider the worst evil of capitalism. Our whole educational system suffers from this evil. An exaggerated competitive attitude is inculcated into the student, who is trained to worship acquisitive success as a preparation for his future career. I am convinced there is only one way to eliminate these grave evils, namely through the establishment of a socialist economy, accompanied by an educational system which would be oriented toward social goals. In such an economy, the means of production are owned by society itself and are utilised in a planned fashion. A planned economy, which adjusts production to the needs of the community, would distribute the work to be done among all those able to work and would guarantee a livelihood to every man, woman and child. The education of the individual, in addition to promoting his own innate abilities, would attempt to develop in him a sense of responsibility for his fellow-men in place of the glorification of power and success in our present society. (Albert Einstein, 1949)
To Sigmund Freud (a private letter written by Albert Einstein around 1931-2) It is admirable how the yearning to perceive the truth has overcome every other yearning in you. You have shown with impelling lucidity how inseparably the combative and destructive instincts are bound up in the human psyche with those of love and life. But at the same time there shines through the cogent logic of your arguments a deep longing for the great goal of internal and external liberation of mankind from war. This great aim has been professed by all those who have been venerated as moral and spiritual leaders beyond the limits of their own time and country without exception, from Jesus Christ to Goethe and Kant. Is it not significant that such men have been universally accepted as leaders, even though their efforts to mould the course of human affairs were attended with small success?
I am convinced that the great men, those whose achievements in howsoever restricted a sphere set them above their fellows, share to an overwhelming extent the same ideal. But they have little influence on the course of political events. It almost looks as if this domain on which the fate of nations depends has inescapably to be given over to the violence and irresponsibility of political rulers.
Political leaders or governments owe their position partly to force and partly to popular election. They cannot be regarded as representative of the best elements, morally or intellectually, in their respective nations. The intellectual elite have no direct influence on the history of nations in these days; their lack of cohesion prevents them from taking a direct part in the solution of contemporary problems. Don't you think that a change might be brought about in this respect by a free association of people whose previous achievements and actions constitute a guarantee of their ability and purity of aim? This association of an international nature, whose members would need to keep in touch with each other by a constant interchange of opinions, might, by defining its attitude in the Press - responsibility always resting with the signatories on any given occasion - acquire a considerable and salutary moral influence over the settlement of political questions. Such an association would, of course, be a prey to all the ills which so often lead to degeneration in learned societies, dangers which are inseparably bound up with the imperfections of human nature. But should not an effort in this direction be risked in spite of this? I look upon such an attempt as nothing less than an imperative duty.
If an intellectual association of standing, such as I have described, could be formed, it would also have to make a consistent effort to mobilise the religious organisations for the fight against war. It would give countenance to many whose good intentions are paralysed today by a melancholy resignation. Finally, I believe that an association formed of persons such as I have described, each highly esteemed in his own line, would be well suited to give valuable moral support to those elements in the League of Nations which are really working toward the great objective for which that institution exists. I had rather put these proposals to you than to anyone else in the world, because you, least of all men, are the dupe of your desires and because your critical judgement is supported by a most grave sense of responsibility. (Albert Einstein, personal letter to Sigmund Freud written around 1931-32)
Albert Einstein: War, Peace, Disarmament and Patriotism
In two weeks the sheeplike masses of any country can be worked up by the newspapers into such a state of excited fury that men are prepared to put on uniforms and kill and be killed, for the sake of the sordid ends of a few interested parties. Compulsory military service seems to me the most disgraceful symptom of that deficiency in personal dignity from which civilized mankind is suffering today. (Albert Einstein, 1934)
If unrestricted sacred egoism leads to dire consequences in economic life, it is still worse as a guide in international relations. The development of mechanical methods of warfare is such that human life will become intolerable if people do not discover before long a way of preventing war. The importance of this object is only equalled by the inadequacy of the attempts hitherto made to attain it. (Albert Einstein, 1930, Address To The Students' Disarmament Meeting)
May I begin with an article of political faith? It runs as follows: the state is made for man, not man for the state. The same may be said of science. These are old sayings, coined by men for whom human personality has the highest human value. I should shrink from repeating them, were it not that they are forever threatening to fall into oblivion, particularly in these days of organisation and stereotypes. I regard it as the chief duty of the state to protect the individual and give him the opportunity to develop into a creative personality. (Albert Einstein, 1931, The Disarmament Conference of 1932)
... the greatest obstacle to international order is that monstrously exaggerated spirit of nationalism which also goes by the fair-sounding but misused name of patriotism. During the last century and a half this idol has acquired an uncanny and exceedingly pernicious power everywhere. (Albert Einstein, 1931)
The introduction of compulsory military service is therefore, to my mind, the prime cause of the moral decay of the white race, which seriously threatens not merely the survival of our civilization but our very existence. (Albert Einstein, 1931)
... unfortunate national traditions which are handed on like a hereditary disease from generation to generation through the workings of the educational system. (Albert Einstein, 1931)
Anybody who really wants to abolish war must resolutely declare himself in favour of his own country's resigning a portion of its sovereignty in favour of international institutions: he must be ready to make his own country amenable, in case of a dispute, to the award of an international court. He must, in the most uncompromising fashion, support disarmament all round, as is actually envisaged in the unfortunate Treaty of Versailles; unless military and aggressively patriotic education is abolished, we can hope for no progress. (Albert Einstein, published 1934)
Letters To Friends Of Peace
It has come to my knowledge that out of the greatness of your soul you are quietly accomplishing a splendid work, impelled by solicitude for humanity and its fate. Small is the number of them that see with their own eyes and feel with their own hearts. But it is their strength that will decide whether the human race must relapse into that state of stupor which a deluded multitude appears today to regard as the ideal. (Albert Einstein, 1934)
The armament industry is indeed one of the greatest dangers that beset mankind. It is the hidden evil power behind the nationalism which is rampant everywhere ... (Albert Einstein, 1934)
Last year I asked a well-known American diplomat why Japan was not forced by a commercial boycott to desist from her policy of force. 'Our commercial interests are too strong' was the answer. How can one help people who rest satisfied with a statement like that? You believe that a word from me would suffice to get something done in this sphere? What an illusion! People flatter me as long as I do not get in their way. But if I direct my efforts toward objects which do not suit them, they immediately turn to abuse and calumny in defence of their interests. And the onlookers mostly keep out of the limelight, the cowards! Have you ever tested the civil courage of your countrymen? The silently accepted motto is 'Leave it alone and say nothing about it.' (Albert Einstein, 1934)
Active Pacifism
We must not conceal from ourselves that no improvement in the present depressing situation is possible without a severe struggle; for the handful of those who are really determined to do something is minute in comparison with the mass of the lukewarm and the misguided. And those who have an interest in keeping the machinery of war going are a very powerful body; they will stop at nothing to make public opinion subservient to their murderous ends. (Albert Einstein, 1934)
Atomic War or Peace
I do not believe that a great era of atomic science is to be assured by organising science, in the way large corporations are organised. One can organise to apply a discovery already made, but not to make one. Only a free individual can make a discovery. There can be a kind of organising by which scientists are assured their freedom and proper conditions of work. Professors of science in American universities, for instance, should be relieved of some of their teaching so as to have time for more research. Can you imagine an organisation of scientists making the discoveries of Charles Darwin? (Albert Einstein, 1945)
National Security
The ghostlike character of this development lies in its apparently compulsory trend. Every step appears as the unavoidable consequence of the preceding one. In the end, there beckons more and more clearly general annihilation.
Is there any way out of this impasse created by man himself? All of us, and particularly those who are responsible for the attitude of the U.S.A. and the U.S.S.R., should realise that we may have vanquished an external enemy, but have been incapable of getting rid of the mentality created by the war. It is impossible to achieve peace as long as every single action is taken with a possible future conflict in view. The leading point of view of all political action should therefore be: what can we do to bring about a peaceful coexistence and even loyal cooperation of the nations? The first problem is to do away with mutual fear and distrust. Solemn renunciation of violence (not only with respect to means of mass destruction) is undoubtedly necessary. Such renunciation, however, can be effective only if at the same time a supranational judicial and executive body is set up empowered to decide questions of immediate concern to the security of the nations. Even a declaration of the nations to collaborate loyally in the realisation of such a restricted world government would considerably reduce the imminent danger of war.
In the last analysis, every kind of peaceful cooperation among men is primarily based on mutual trust and only secondly on institutions such as courts of justice and police. This holds for nations as well as for individuals. And the basis of trust is loyal give and take. (Albert Einstein's contribution to Mrs Eleanor Roosevelt's television programme concerning the implications of the H-Bomb, February 13, 1950)
The Pursuit Of Peace
A: There is a very simple answer. If we have courage to decide ourselves for peace, we will have peace.
Q: How?
A: By the firm will to reach agreement. This is axiomatic. We are not engaged in a play but in a condition of utmost danger to existence. If you are not firmly decided to resolve things in a peaceful way, you will never come to a peaceful solution. (Albert Einstein, U.N. Radio Interview, 1950)
Culture Must Be One Of The Foundations For World Understanding
It is understood that, in the long run, an all destroying conflict can be avoided only by the setting up of a world federation of nations. (Albert Einstein, 1951)
On The Abolition Of The Threat Of War
My part in producing the atomic bomb consisted in a single act: I signed a letter to President Roosevelt, pressing the need for experiments on a larger scale in order to explore the possibilities for the production of an atomic bomb. I was fully aware of the terrible danger to mankind in case this attempt succeeded. But the likelihood that the Germans were working on the same problem with a chance of succeeding forced me to this step. I could do nothing else although I have always been a convinced pacifist. To my mind, to kill in war is not a whit better than to commit ordinary murder. (Albert Einstein, 1952)
Albert Einstein: Science / Physics Quotations
Albert Einstein observed that specialization is invariably damaging to science as a whole:
The area of scientific knowledge has been enormously extended, and theoretical knowledge has become vastly more profound in every department of science. But the assimilative power of the human intellect is and remains strictly limited. Hence it was inevitable that the activity of the individual investigator should be confined to a smaller and smaller section of human knowledge. Worse still, this specialization makes it increasingly difficult to keep even our general understanding of science as a whole, without which the true spirit of research is inevitably handicapped, in step with scientific progress. Every serious scientific worker is painfully conscious of this involuntary relegation to an ever-narrowing sphere of knowledge, which threatens to deprive the investigator of his broad horizon and degrades him to the level of a mechanic ... It is just as important to make knowledge live and to keep it alive as to solve specific problems. (Albert Einstein, 1954)
To raise new questions, new possibilities, to regard old questions from a new angle, requires creative imagination and marks real advances in science. (Albert Einstein)
Scientific research is based on the idea that everything that takes place is determined by laws of nature, and therefore this holds for the action of people. (Albert Einstein, 1954)
Physics constitutes a logical system of thought which is in a state of evolution, whose basis (principles) cannot be distilled, as it were, from experience by an inductive method, but can only be arrived at by free invention. The justification (truth content) of the system rests in the verification of the derived propositions (a priori/logical truths) by sense experiences (a posteriori/empirical truths). ... Evolution is proceeding in the direction of increasing simplicity of the logical basis (principles). ... We must always be ready to change these notions - that is to say, the axiomatic basis of physics - in order to do justice to perceived facts in the most perfect way logically. (Albert Einstein, Physics and Reality, 1936)
It is the grand object of all theory to make these irreducible elements (axioms/assumptions) as simple and as few in number as possible, without having to renounce the adequate representation of any empirical content whatever. (Albert Einstein, 1954)
The supreme task of the physicist is to arrive at those universal elementary laws from which the cosmos can be built up by pure deduction. There is no logical path to these laws; only intuition, resting on sympathetic understanding of experience, can reach them. (Albert Einstein, 1918)
The development during the present century is characterized by two theoretical systems essentially independent of each other: the theory of relativity and the quantum theory. The two systems do not directly contradict each other; but they seem little adapted to fusion into one unified theory. For the time being we have to admit that we do not possess any general theoretical basis for physics which can be regarded as its logical foundation. (Albert Einstein, 1940)
If, then, it is true that the axiomatic basis of theoretical physics cannot be extracted from experience but must be freely invented, can we ever hope to find the right way? I answer without hesitation that there is, in my opinion, a right way, and that we are capable of finding it. I hold it true that pure thought can grasp reality, as the ancients dreamed. (Albert Einstein, 1954)
Can we visualize a 3D universe which is finite yet unbounded? (Albert Einstein, 1954)
The results of calculation indicate that if matter be distributed uniformly, the universe would necessarily be spherical. I must not fail to mention that a theoretical argument can be adduced in favour of the hypothesis of a finite universe. The general theory of relativity teaches that the inertia of a given body is greater as there are more ponderable masses in proximity to it; thus it seems very natural to reduce the total inertia of a body to interactions between it and the other bodies in the universe, as indeed, ever since Newton’s time, gravity has been completely reduced to interaction between bodies. (Albert Einstein, 1954)
The non-mathematician is seized by a mysterious shuddering when he hears of 'four-dimensional' things, by a feeling not unlike that awakened by thoughts of the occult. And yet there is no more common-place statement than that the world in which we live is a four-dimensional space-time continuum. Space is a three-dimensional continuum. ... Similarly, the world of physical phenomena which was briefly called 'world' by Minkowski is naturally four dimensional in the space-time sense. For it is composed of individual events, each of which is described by four numbers, namely, three space co-ordinates x, y, z, and the time co-ordinate t. (Albert Einstein, 1954)
But the path (of general relativity) was thornier than one might suppose, because it demanded the abandonment of Euclidean geometry. This is what we mean when we talk of the 'curvature of space'. The fundamental concepts of the 'straight line', the 'plane', etc., thereby lose their precise significance in physics. In the general theory of relativity the doctrine of space and time, or kinematics, no longer figures as a fundamental independent of the rest of physics. The geometrical behaviour of bodies and the motion of clocks rather depend on gravitational fields which in their turn are produced by matter. (Albert Einstein, 1919)
In the year nineteen hundred, in the course of purely theoretical (mathematical) investigation, Max Planck made a very remarkable discovery: the law of radiation of bodies as a function of temperature could not be derived solely from the Laws of Maxwellian electrodynamics. To arrive at results consistent with the relevant experiments, radiation of a given frequency f had to be treated as though it consisted of energy atoms (photons) of the individual energy hf, where h is Planck's universal constant. This discovery became the basis of all twentieth-century research in physics and has almost entirely conditioned its development ever since. Without this discovery it would not have been possible to establish a workable theory of molecules and atoms and the energy processes that govern their transformations. Moreover, it has shattered the whole framework of classical mechanics and electrodynamics and set science a fresh task: that of finding a new conceptual basis for all physics. Despite remarkable partial gains, the problem is still far from a satisfactory solution. (Albert Einstein, 1954)
Experiments on interference made with particle rays have given brilliant proof that the wave character of the phenomena of motion as assumed by the theory does, really, correspond to the facts. (Albert Einstein, 1954)
de Broglie conceived an electron revolving about the atomic nucleus as being connected with a hypothetical wave train, and made intelligible to some extent the discrete character of Bohr's 'permitted' paths by the stationary (standing) character of the corresponding waves. (Albert Einstein, 1954)
How can one assign a discrete succession of energy values E to a system specified in the sense of classical mechanics (the energy function is a given function of the co-ordinates x and the corresponding momenta mv)? Planck's constant h relates the frequency f = E/h to the energy values E. It is therefore sufficient to assign to the system a succession of discrete frequency f values. This reminds us of the fact that in acoustics a series of discrete frequency values is coordinated to a linear partial differential equation (for given boundary conditions) namely the sinusoidal periodic solutions. In corresponding manner, Schrödinger set himself the task of coordinating a partial differential equation for a scalar wave function to the given energy function E(x, mv), where the position x and time t are independent variables. (Albert Einstein, 1954)
The de Broglie-Schrödinger method, which has in a certain sense the character of a field theory, does indeed deduce the existence of only discrete states, in surprising agreement with empirical facts. It does so on the basis of differential equations applying a kind of resonance argument. (Albert Einstein, 1954)
It seems to be clear, therefore, that Born's statistical interpretation of quantum theory is the only possible one. The wave function does not in any way describe a state which could be that of a single system; it relates rather to many systems, to an 'ensemble of systems' in the sense of statistical mechanics. (Albert Einstein, 1954)
Thus the last and most successful creation of theoretical physics, namely quantum mechanics (QM), differs fundamentally from both Newton's mechanics, and Maxwell's e-m field. For the quantities which figure in QM's laws make no claim to describe physical reality itself, but only probabilities of the occurrence of a physical reality that we have in view. … I cannot but confess that I attach only a transitory importance to this interpretation. I still believe in the possibility of a model of reality - that is to say, of a theory which represents things themselves and not merely the probability of their occurrence. On the other hand, it seems to me certain that we must give up the idea of complete localization of the particle in a theoretical model. This seems to me the permanent upshot of Heisenberg's principle of uncertainty. (Albert Einstein, 1954)
Albert Einstein: Knowledge, Education & Freedom Quotations
Knowledge of the history and evolution of our ideas is absolutely vital for wise understanding. It is also important to read the original source (not a later interpretation, which often leads to misrepresentation and error); these original quotes should give confidence in the truth of what we say. As Albert Einstein astutely remarks:
Somebody who only reads newspapers and at best books of contemporary authors looks to me like an extremely near-sighted person who scorns eyeglasses. He is completely dependent on the prejudices and fashions of his times, since he never gets to see or hear anything else. And what a person thinks on his own without being stimulated by the thoughts and experiences of other people is even in the best case rather paltry and monotonous. There are only a few enlightened people with a lucid mind and style and with good taste within a century. What has been preserved of their work belongs among the most precious possessions of mankind. We owe it to a few writers of antiquity (Plato, Aristotle, etc.) that the people in the Middle Ages could slowly extricate themselves from the superstitions and ignorance that had darkened life for more than half a millennium. Nothing is more needed to overcome the modernist's snobbishness. (Albert Einstein, 1954)
Symptoms Of Cultural Decay
The free, unhampered exchange of ideas and scientific conclusions is necessary for the sound development of science, as it is in all spheres of cultural life. (Albert Einstein, 1952)
... knowledge must continually be renewed by ceaseless effort, if it is not to be lost. It resembles a statue of marble which stands in the desert and is continually threatened with burial by the shifting sand. The hands of service must ever be at work, in order that the marble continue to lastingly shine in the sun. To these serving hands mine shall also belong. (Albert Einstein, On Education, 1950)
When, after several hours reading, I came to myself again, I asked myself what it was that had so fascinated me. The answer is simple. The results were not presented as ready-made, but scientific curiosity was first aroused by presenting contrasting possibilities of conceiving matter. Only then the attempt was made to clarify the issue by thorough argument. The intellectual honesty of the author makes us share the inner struggle in his mind. It is this which is the mark of the born teacher. Knowledge exists in two forms - lifeless, stored in books, and alive, in the consciousness of men. The second form of existence is after all the essential one; the first, indispensable as it may be, occupies only an inferior position. (Albert Einstein, 1954)
My dear children: I rejoice to see you before me today, happy youth of a sunny and fortunate land. Bear in mind that the wonderful things that you learn in your schools are the work of many generations, produced by enthusiastic effort and infinite labour in every country of the world. All this is put into your hands as your inheritance in order that you may receive it, honour it, and add to it, and one day faithfully hand it on to your children. Thus do we mortals achieve immortality in the permanent things which we create in common. If you always keep that in mind you will find meaning in life and work and acquire the right attitude towards other nations and ages. (Albert Einstein talking to a group of school children. 1934.)
Numerous are the academic chairs, but rare are wise and noble teachers. Numerous and large are the lecture halls, but far from numerous the young people who genuinely thirst for truth and justice. Numerous are the wares that nature produces by the dozen, but her choice products are few. We all know that, so why complain? Was it not always thus and will it not always thus remain? Certainly, and one must take what nature gives as one finds it. But there is also such a thing as a spirit of the times, an attitude of mind characteristic of a particular generation, which is passed on from individual to individual and gives its distinctive mark to a society. Each of us has to do his little bit toward transforming this spirit of the times. (Albert Einstein, 1954)
On Freedom
1. Those instrumental goods which should serve to maintain the life and health of all human beings should be produced by the least possible labor of all.
2. The satisfaction of physical needs is indeed the indispensable precondition of a satisfactory existence, but in itself it is not enough. In order to be content, men must also have the possibility of developing their intellectual and artistic powers to whatever extent accords with their personal characteristics and abilities.
The first of these two goals requires the promotion of all knowledge relating to the laws of nature and the laws of social processes, that is, the promotion of all scientific endeavour. For scientific endeavour is a natural whole, the parts of which mutually support one another in a way which, to be sure, no one can anticipate. (Albert Einstein, 1940)
The development of science and of the creative activities of the spirit in general requires still another kind of freedom, which may be characterised as inward freedom. It is this freedom of spirit which consists in the independence of thought from the restrictions of authoritarian and social prejudices as well as from unphilosophical routinizing and habit in general. This inward freedom is an infrequent gift of nature and a worthy objective for the individual ... schools may favour such freedom by encouraging independent thought. Only if outward and inner freedom are constantly and consciously pursued is there a possibility of spiritual development and perfection and thus of improving man's outward and inner life. (Albert Einstein, 1940)
Albert Einstein on Metaphysics and Philosophy
Remarks on Bertrand Russell’s Theory of Knowledge
In the evolution of philosophical thought through the centuries the following question has played a major role: what knowledge is pure thought able to supply independently of sense perception? Is there any such knowledge? If not, what precisely is the relation between our knowledge and the raw material furnished by sense impressions?
There has been an increasing skepticism concerning every attempt by means of pure thought to learn something about the 'objective world', about the world of 'things' in contrast to the world of 'concepts and ideas'. During philosophy's childhood it was rather generally believed that it is possible to find everything which can be known by means of mere reflection. It was an illusion which anyone can easily understand if, for a moment, he dismisses what he has learned from later philosophy and from natural science; he will not be surprised to find that Plato ascribed a higher reality to 'ideas' than to empirically experienceable things. Even in Spinoza and as late as in Hegel this prejudice was the vitalising force which seems still to have played the major role.
The more aristocratic illusion concerning the unlimited penetrative power of thought has as its counterpart the more plebeian illusion of naive realism, according to which things 'are' as they are perceived by us through our senses. This illusion dominates the daily life of men and of animals; it is also the point of departure in all of the sciences, especially of the natural sciences.
As Russell wrote:
'We all start from naive realism, i.e., the doctrine that things are what they seem. We think that grass is green, that stones are hard, and that snow is cold. But physics assures us that the greenness of grass, the hardness of stones, and the coldness of snow are not the greenness, hardness, and coldness that we know in our own experience, but something very different. The observer, when he seems to himself to be observing a stone, is really, if physics is to be believed, observing the effects of the stone upon himself.'
Gradually the conviction gained recognition that all knowledge about things is exclusively a working-over of the raw material furnished by the senses. Galileo and Hume first upheld this principle with full clarity and decisiveness. Hume saw that concepts which we must regard as essential, such as, for example, causal connection, cannot be gained from material given to us by the senses. This insight led him to a skeptical attitude as concerns knowledge of any kind.
Man has an intense desire for assured knowledge. That is why Hume's clear message seemed crushing: the sensory raw material, the only source of our knowledge, through habit may lead us to belief and expectation but not to the knowledge and still less to the understanding of lawful relations.
Then Kant took the stage with an idea which, though certainly untenable in the form in which he put it, signified a step towards the solution of Hume's dilemma: whatever in knowledge is of empirical origin is never certain. If, therefore, we have definitely assured knowledge, it must be grounded in reason itself. This is held to be the case, for example, in the propositions of geometry and the principles of causality.
These and certain other types of knowledge are, so to speak, a part of the implements of thinking and therefore do not previously have to be gained from sense data (i.e. they are a priori knowledge). Today everyone knows, of course, that the mentioned concepts contain nothing of the certainty, of the inherent necessity, which Kant had attributed to them. The following, however, appears to me to be correct in Kant's statement of the problem: in thinking we use with a certain right, concepts to which there is no access from the materials of sensory experience, if the situation is viewed from the logical point of view. As a matter of fact, I am convinced that even much more is to be asserted: the concepts which arise in our thought and in our linguistic expressions are all - when viewed logically - the free creations of thought which cannot inductively be gained from sense experiences. This is not so easily noticed only because we have the habit of combining certain concepts and conceptual relations (propositions) so definitely with certain sense experiences that we do not become conscious of the gulf - logically unbridgeable - which separates the world of sensory experiences from the world of concepts and propositions. Thus, for example, the series of integers is obviously an invention of the human mind, a self-created tool which simplifies the ordering of certain sensory experiences. But there is no way in which this concept could be made to grow, as it were, directly out of sense experiences.
As soon as one is at home in Hume's critique one is easily led to believe that all those concepts and propositions which cannot be deduced from the sensory raw material are, on account of their 'metaphysical' character, to be removed from thinking. For all thought acquires material content only through its relationship with that sensory material. This latter proposition I take to be entirely true; but I hold the prescription for thinking which is grounded on this proposition to be false. For this claim - if only carried through consistently - absolutely excludes thinking of any kind as 'metaphysical'. In order that thinking might not degenerate into 'metaphysics', or into empty talk, it is only necessary that enough propositions of the conceptual system be firmly enough connected with sensory experiences and that the conceptual system, in view of its task of ordering and surveying sense experience, should show as much unity and parsimony as possible. Beyond that, however, the 'system' is (as regards logic) a free play with symbols according to (logically) arbitrarily given rules of the game. All this applies as much (and in the same manner) to the thinking in daily life as to the more consciously and systematically constructed thinking in the sciences.
By his clear critique Hume did not only advance philosophy in a decisive way but also - though through no fault of his - created a danger for philosophy in that, following his critique, a fateful 'fear of metaphysics' arose which has come to be a malady of contemporary empiricist philosophising; this malady is the counterpart to that earlier philosophising in the clouds, which thought it could neglect and dispense with what was given by the senses. ... It finally turns out that one can, after all, not get along without metaphysics.
(Albert Einstein, Ideas and Opinions, 1944)
Albert Einstein Quotes on Religion
The most beautiful and most profound experience is the sensation of the mystical. It is the sower of all true science. He to whom this emotion is a stranger, who can no longer wonder and stand rapt in awe, is as good as dead. To know that what is impenetrable to us really exists, manifesting itself as the highest wisdom and the most radiant beauty which our dull faculties can comprehend only in their primitive forms - this knowledge, this feeling is at the center of true religiousness. ( Albert Einstein - The Merging of Spirit and Science)
The religion of the future will be a cosmic religion. It should transcend personal God and avoid dogma and theology. Covering both the natural and the spiritual, it should be based on a religious sense arising from the experience of all things natural and spiritual as a meaningful unity. Buddhism answers this description. If there is any religion that could cope with modern scientific needs it would be Buddhism. (Albert Einstein)
It was, of course, a lie what you read about my religious convictions, a lie which is being systematically repeated. I do not believe in a personal God and I have never denied this but have expressed it clearly. If something is in me which can be called religious then it is the unbounded admiration for the structure of the world so far as our science can reveal it. (Albert Einstein, 1954) From Albert Einstein: The Human Side, edited by Helen Dukas and Banesh Hoffmann, Princeton University Press
Scientific research is based on the idea that everything that takes place is determined by laws of nature, and therefore this holds for the action of people. For this reason, a research scientist will hardly be inclined to believe that events could be influenced by a prayer, i.e. by a wish addressed to a Supernatural Being. (Albert Einstein, 1936) Responding to a child who wrote and asked if scientists pray. Source: Albert Einstein: The Human Side, Edited by Helen Dukas and Banesh Hoffmann
A man's ethical behaviour should be based effectually on sympathy, education, and social ties and needs; no religious basis is necessary. Man would indeed be in a poor way if he had to be restrained by fear of punishment and hope of reward after death. (Albert Einstein, Religion and Science, New York Times Magazine, 9 November 1930)
I cannot conceive of a God who rewards and punishes his creatures, or has a will of the kind that we experience in ourselves. Neither can I nor would I want to conceive of an individual that survives his physical death; let feeble souls, from fear or absurd egoism, cherish such thoughts. I am satisfied with the mystery of the eternity of life and with the awareness and a glimpse of the marvelous structure of the existing world, together with the devoted striving to comprehend a portion, be it ever so tiny, of the Reason that manifests itself in nature. (Albert Einstein, The World as I See It)
I cannot imagine a God who rewards and punishes the objects of his creation, whose purposes are modeled after our own -- a God, in short, who is but a reflection of human frailty. Neither can I believe that the individual survives the death of his body, although feeble souls harbour such thoughts through fear or ridiculous egotisms. (Albert Einstein, obituary in New York Times, 19 April 1955)
I believe in Spinoza's God who reveals himself in the orderly harmony of what exists, not in a God who concerns himself with the fates and actions of human beings. (Albert Einstein) Following his wife's advice in responding to Rabbi Herbert Goldstein of the International Synagogue in New York, who had sent Einstein a cablegram bluntly demanding 'Do you believe in God?' Quoted from and citation notes derived from Victor J. Stenger, Has Science Found God? (draft: 2001), chapter 3.
One strength of the Communist system ... is that it has some of the characteristics of a religion and inspires the emotions of a religion. (Albert Einstein, Out Of My Later Years, 1950)
http://www.positiveatheism.org/hist/quotes/quote-e.htm
Albert Einstein: Quotes on Morality & Human Rights
What the individual can do is to give a fine example, and to have the courage to uphold ethical values .. in a society of cynics. (Albert Einstein, letter to Max Born.)
Science has therefore been charged with undermining morality, but the charge is unjust. A man's ethical behaviour should be based effectually on sympathy, education, and social ties and needs; no religious basis is necessary. Man would indeed be in a poor way if he had to be restrained by fear of punishment and hope of reward after death. (Albert Einstein, 1954)
There is nothing divine about morality; it is a purely human affair. (Albert Einstein, 1954)
Realising that healthy international relations can be created only among populations made up of individuals who themselves are healthy and enjoy a measure of independence, the United Nations elaborated a Universal Declaration of Human Rights, which was adopted by the U.N. General Assembly on December 10, 1948. (Albert Einstein, 1951)
The existence and validity of human rights are not written in the stars. The ideals concerning the conduct of men toward each other and the desirable structure of the community have been conceived and taught by enlightened individuals in the course of history. Those ideals and convictions which resulted from historical experience, from the craving for beauty and harmony, have been readily accepted in theory by man- and at all times, have been trampled upon by the same people under the pressure of their animal instincts. A large part of history is therefore replete with the struggle for those human rights, an eternal struggle in which a final victory can never be won. But to tire in that struggle would mean the ruin of society. (Albert Einstein, 1954)
In talking about human rights today, we are referring primarily to the following demands: protection of the individual against arbitrary infringement by other individuals or by the government; the right to work and to adequate earnings from work; freedom of discussion and teaching; adequate participation of the individual in the formation of his government. These human rights are nowadays recognised theoretically, although, by abundant use of formalistic, legal manoeuvres, they are being violated to a much greater extent than even a generation ago. (Albert Einstein, 1954)
The Nuremberg Trial of the German war criminals was tacitly based on the recognition of the principle: criminal actions cannot be excused if committed on government orders; conscience supersedes the authority of the law of the state. (Albert Einstein, 1954)
Collection of Albert Einstein Quotes
Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius -- and a lot of courage -- to move in the opposite direction.
Imagination is more important than knowledge.
Gravitation is not responsible for people falling in love.
I want to know God's thoughts; the rest are details.
The hardest thing in the world to understand is the income tax.
Reality is merely an illusion, albeit a very persistent one.
The only real valuable thing is intuition.
A person starts to live when he can live outside himself.
I am convinced that He (God) does not play dice.
God is subtle but he is not malicious.
Weakness of attitude becomes weakness of character.
I never think of the future. It comes soon enough.
The eternal mystery of the world is its comprehensibility.
Sometimes one pays most for the things one gets for nothing.
Science without religion is lame. Religion without science is blind.
Anyone who has never made a mistake has never tried anything new.
Great spirits have often encountered violent opposition from weak minds.
Everything should be made as simple as possible, but not simpler.
Common sense is the collection of prejudices acquired by age eighteen.
Science is a wonderful thing if one does not have to earn one's living at it.
The secret to creativity is knowing how to hide your sources.
The only thing that interferes with my learning is my education.
God does not care about our mathematical difficulties. He integrates empirically.
The whole of science is nothing more than a refinement of everyday thinking.
Technological progress is like an axe in the hands of a pathological criminal.
Peace cannot be kept by force. It can only be achieved by understanding.
The most incomprehensible thing about the world is that it is comprehensible.
We can't solve problems by using the same kind of thinking we used when we created them.
Education is what remains after one has forgotten everything he learned in school.
The important thing is not to stop questioning. Curiosity has its own reason for existing.
Do not worry about your difficulties in Mathematics. I can assure you mine are still greater.
Equations are more important to me, because politics is for the present, but an equation is something for eternity.
If A is a success in life, then A equals x plus y plus z. Work is x; y is play; and z is keeping your mouth shut.
Two things are infinite: the universe and human stupidity; and I'm not sure about the universe.
As far as the laws of mathematics refer to reality, they are not certain, as far as they are certain, they do not refer to reality.
Whoever undertakes to set himself up as a judge of Truth and Knowledge is shipwrecked by the laughter of the gods.
I know not with what weapons World War III will be fought, but World War IV will be fought with sticks and stones.
In order to form an immaculate member of a flock of sheep one must, above all, be a sheep.
The fear of death is the most unjustified of all fears, for there's no risk of accident for someone who's dead.
Too many of us look upon Americans as dollar chasers. This is a cruel libel, even if it is reiterated thoughtlessly by the Americans themselves.
Heroism on command, senseless violence, and all the loathsome nonsense that goes by the name of patriotism -- how passionately I hate them!
No, this trick won't work...How on earth are you ever going to explain in terms of chemistry and physics so important a biological phenomenon as first love?
My religion consists of a humble admiration of the illimitable superior spirit who reveals himself in the slight details we are able to perceive with our frail and feeble mind.
Yes, we have to divide up our time like that, between our politics and our equations. But to me our equations are far more important, for politics are only a matter of present concern. A mathematical equation stands forever.
The release of atom power has changed everything except our way of thinking...the solution to this problem lies in the heart of mankind. If only I had known, I should have become a watchmaker.
Great spirits have always found violent opposition from mediocrities. The latter cannot understand it when a man does not thoughtlessly submit to hereditary prejudices but honestly and courageously uses his intelligence.
The most beautiful thing we can experience is the mysterious. It is the source of all true art and all science. He to whom this emotion is a stranger, who can no longer pause to wonder and stand rapt in awe, is as good as dead: his eyes are closed.
A man's ethical behaviour should be based effectually on sympathy, education, and social ties; no religious basis is necessary. Man would indeed be in a poor way if he had to be restrained by fear of punishment and hope of reward after death.
The further the spiritual evolution of mankind advances, the more certain it seems to me that the path to genuine religiosity does not lie through the fear of life, and the fear of death, and blind faith, but through striving after rational knowledge.
Now he has departed from this strange world a little ahead of me. That means nothing. People like us, who believe in physics, know that the distinction between past, present, and future is only a stubbornly persistent illusion.
You see, wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles. Do you understand this? And radio operates exactly the same way: you send signals here, they receive them there. The only difference is that there is no cat.
One had to cram all this stuff into one's mind for the examinations, whether one liked it or not. This coercion had such a deterring effect on me that, after I had passed the final examination, I found the consideration of any scientific problems distasteful to me for an entire year.
...one of the strongest motives that lead men to art and science is escape from everyday life with its painful crudity and hopeless dreariness, from the fetters of one's own ever-shifting desires. A finely tempered nature longs to escape from the personal life into the world of objective perception and thought.
He who joyfully marches to music rank and file, has already earned my contempt. He has been given a large brain by mistake, since for him the spinal cord would surely suffice. This disgrace to civilization should be done away with at once. Heroism at command, how violently I hate all this, how despicable and ignoble war is; I would rather be torn to shreds than be a part of so base an action. It is my conviction that killing under the cloak of war is nothing but an act of murder.
A human being is a part of a whole, called by us universe, a part limited in time and space. He experiences himself, his thoughts and feelings as something separated from the rest ... a kind of optical delusion of his consciousness. This delusion is a kind of prison for us, restricting us to our personal desires and to affection for a few persons nearest to us. Our task must be to free ourselves from this prison by widening our circle of compassion to embrace all living creatures and the whole of nature in its beauty.
Not everything that counts can be counted, and not everything that can be counted counts. (Sign hanging in Einstein's office at Princeton)
Collected Quotes from Albert Einstein - Kevin Harris 1995 http://rescomp.stanford.edu/~cheshire/EinsteinQuotes.html

Albert Einstein Quotes: Famous Quotations on Religion, Science, War, Peace, Education, Morality, Philosophy of Physics

Albert Einstein Quotes: Famous Quotations on Religion, Science, War, Peace, Education, Morality, Philosophy of Physics: "http://www.spaceandmotion.com/Albert-Einstein-Quotes.htm"

Sunday, November 16, 2008

The skew.org XML Tutorial

The skew.org XML Tutorial: "The skew.org XML Tutorial

A reintroduction to XML with an emphasis on character encoding

© 2002–2005 Mike J Brown.
Non-commercial distribution in whole or in part is permitted, provided authorship credit is preserved.
Latest revision: Tuesday, 23-Oct-2007 00:38:14 MDT
Foreword

The only slightly odd thing about this marvellous and indispensable document (with lots of other fine goodies on the same site) is the title, with its use of the terms 'tutorial' and 'reintroduction'. People with little or no knowledge of XML who went there might soon wander off again in puzzlement. And experienced XML practitioners might not look at the site at all, expecting it would be too elementary for them. You do need to know XML quite well to make much sense of what Mike offers; but even if you know XML very well indeed, Mike has things to say about encoding that you almost certainly either don't know at all, or haven't yet fully grasped. If only everyone writing XSLT processors, for example, had taken everything Mike says on board from the start, there would be far fewer encoding and transcoding headaches forever recurring on the relevant lists. A further point, which is not Mike's problem, but a general cultural issue. He uses a few key terms ('abstract', 'mapping' etc) in a precise mathematical/comp. sci. way, and unless you understand those precise senses you won't fully follow what he is saying. Since I'm writing for people with a Humanities background, I have regrettably to sound that warning, because so many people in the Humanities foolishly pride themselves on their militant ignorance of basic mathematical terminology and concepts. Where would we be if scientists took the same view? If I'm writing a review for the TLS, I naturally avoid some of the more technical linguistic or analytical vocabulary I freely use when writing an article for a scholarly journal. But I'd be very surprised if the TLS editor complained because my copy used hard words like 'narrator', 'sonnet' or 'parody' that a 'general reader' couldn't possibly be expected to understand.
None of Mike's use of 'technical' vocabulary in this piece goes beyond the analogous domain in his own area of expertise, so if you don't understand his terms, educate yourself rather than giving up or complaining. You know it makes sense!

-- Michael Beddow
Introduction

This is a crash course in some essential concepts for software developers who are reading and writing XML documents on a regular basis. It is oriented toward people with some sort of programming background. The intended audience should already understand some basic things like what bits and bytes are, how to read hexadecimal numbers, what characters are, and they should be comfortable with phrases like 'hierarchical data model'.

I wrote this document after having the realization, in early 2000, that no published XML reference I have seen adequately explains some fundamental concepts that, in my opinion, are essential to understand before trying to do any serious development with XML. It might seem strange that examples of how to write an XML document are not introduced until well into the tutorial. I feel strongly that the proper way to learn this material is to understand the founding concepts and principles first. Then the details of the syntax become almost incidental.
1. Characters, The Unicode Standard, and ISO/IEC 10646-1

Why you need to know this: XML is specified in terms of allowable sequences of 'characters' as defined by the ISO/IEC 10646-1:1993 international standard, which is almost, but not quite, the same thing as The Unicode Standard version 2.0.
1.1. Graphemes and characters

In written languages, a grapheme is a unit, expressed as some kind of mark, that conveys basic information essential to the language. Letters, numbers, punctuation and diacritics are examples of graphemes found in the Latin script (component of a writing system) that is used to write in English, French, Spanish, German, Vietnamese, and various other languages. Graphemes are abstract concepts; any time you write or see the capital letter 'A' rendered on paper or on your computer screen, you recognize it as being the same grapheme, the letter A, regardless of the font or handwriting or medium in which it is written. The actual marks you see are allographs, or glyphs, that represent the 'A' grapheme.

In computing and telecommunications, dividing the basic marks of a writing system into graphemes is helpful, but is not sufficient, on its own, to reproduce written text, since there is more to writing than just spewing a stream of graphemes. Therefore, in these contexts, graphemes are represented by abstract units called characters. For example, the capital letter 'A' is a grapheme that, in computing and telecom, is represented by the capital letter 'A' character.

Characters don't always represent things that people who study writing systems would consider to be true graphemes. For example, instead of representing an individual grapheme, a character might represent a particular combination of graphemes: the small letter 'e' grapheme with a grave accent grapheme over it, '`', can be, in computing and telecom, a single 'small e with grave' character, 'è'.

Characters can also embody other units that are not graphemes. For example, in the Latin script, we need to put space between words. In other scripts, such space isn't necessary, although the concept of word separation is still useful. So there exist special characters that represent different kinds of word, line, and paragraph separators and/or literal, visible space. There are also special characters that don't manifest in writing at all, but rather exist just to convey instructions to a mechanical device (tab, line feed, carriage return, and form feed characters, for example) or to provide hints for interpreting or rendering subsequent characters.
1.2. The Unicode Standard

The Unicode Standard is a character coding system designed to support the interchange, processing, and display of the written texts of the diverse languages of the modern world. It is a product of The Unicode Consortium. The Unicode Consortium is a group of major computer corporations, software producers, database vendors, research institutions, international agencies, various user groups, and interested individuals.
1.3. ISO/IEC 10646-1

Since 1991 the Unicode Consortium has worked with the International Organization for Standardization (ISO) to develop the Unicode Standard and the international standard ISO 10646 in tandem. The character encoding portion of Version 2.0 of the Unicode Standard is identical to ISO/IEC 10646-1:1993 plus its first seven published amendments. Unicode 3.0 was published in February 2000 and its relevant portions were later adopted as ISO/IEC 10646-1:2000. Although there are newer versions of Unicode that correspond with ISO/IEC 10646-1 and 10646-2 combined, this tutorial primarily addresses the versions available prior to 2000, since XML does not reference the newer versions.

ISO/IEC 10646-1 defines and is also known as the Universal Character Set, or UCS.
1.4. The Unicode Standard vs. ISO/IEC 10646-1

In general, the terms Unicode and UCS are interchangeable because the two specifications share the following characteristics:

* They both assign the same values and descriptions to all the same characters
* They both specify the same levels of implementation
* They both use a 16-bit code space (this will be explained later)
* They both specify the UTF-8 and UTF-16 character encoding forms (this will also be explained later)

Unicode differs from ISO/IEC 10646-1 in the following significant ways:

* The Unicode Standard specifies semantics, properties and rendering algorithms for characters; ISO/IEC 10646-1 does not.
* The Unicode Standard does not acknowledge (but likewise does not prohibit) the UCS-2 and UCS-4 character encoding forms found in ISO/IEC 10646-1. (explained later)
* The Unicode Standard is a relatively affordable printed publication that can be purchased through any bookseller and is supplemented by many online materials at http://www.unicode.org/. The entire Unicode Standard itself is also now available online, but ISO/IEC 10646-1 is an expensive printed publication that can only be purchased through ISO partners and has no online edition.

There are a few other less significant, but still important, differences that are described in Tony Graham's excellent book Unicode - A Primer (ISBN 0-7645-4625-2). The title of this book is somewhat misleading, as it seems to be less a primer and more a technical encyclopedia, but it's still worth perusing, as it explains many aspects of Unicode in prose that is relatively easy to comprehend.

Note: Unless otherwise stated, any further references to Unicode in this document mean The Unicode Standard, version 3.0.
2. The Unicode/UCS character encoding model

Why you need to know this: XML documents consist, at a granular level, of abstract characters that have had several encoding mechanisms applied to them. In order to consistently author, store, transmit and process XML documents, there must be an awareness of the encodings that are being or have been applied.
2.1. Character names and the need for encoding

The basic idea of Unicode and the UCS is that a set of abstract objects called characters can be represented by at least one descriptive name and also by at least one unique number. The names are not canonical because they are translated into many languages for different publications of the standard. The numbers are constant and canonical.

A character's number is abstract to computers because there are many different ways of representing numbers in an information processing architecture. So, Unicode and the UCS prescribe a model for information systems to store, exchange and process character data.
2.2. Character encodings - assignment of unique numbers to abstract characters

In general, a set of abstract characters is a character repertoire.

A code space is a set of numbers called code points, or code positions. These numbers are scalar values: non-negative, not-necessarily-contiguous integers.

A mapping of abstract characters from a character repertoire to code points is called a coded character set. Other names for such mappings are character encoding, coded character repertoire, character set definition, or code page. Each combination of an abstract character and its code point in a coded character set is an encoded character. A coded character set can reserve code points for special purposes other than mapping to abstract characters.

Aside from the Universal Character Set and Unicode, other popular coded character sets include the following subsets:

* The WGL4 (Windows Glyph List) defined by Microsoft and Agfa Monotype, which is a repertoire of 560 abstract characters implemented by most MS Windows fonts. It is a subset of Unicode plus two private use characters, encompassing the characters in ISO 6937 plus all Microsoft/IBM 8-bit code pages. Since WGL4 is defined as a subset of Unicode, it can be considered a coded character set.
* The AGL (Adobe Glyph List), a superset of WGL4.

(Despite being called 'Glyph' Lists, they are actually character-to-code point mappings.)

Note: The '0x' notation used in this document is the C language's notation for hexadecimal numbers. (Ref: ISO 9899). It is one of many possible notations for values in a base 16 system. There is no particular reason it is being used here other than that it seems to be a fairly widely recognized convention.

Unicode and the UCS define a coded character set in which each abstract character is mapped to a code point in the range 0x0..0x10FFFF (0 through 1,114,111 decimal). This code space is divided into 17 planes of 65,536 (0x10000) code points each. The first plane, encompassing code points 0x0..0xFFFF, is called the Basic Multilingual Plane, or BMP, and it covers all of the characters in common use in all of the modern languages of the world. It omits some less common characters as well as those that were used in arcane scripts; those characters are in the higher planes.
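
The plane arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the helper name plane_of and the two constants are made up for this example and do not come from the Unicode Standard or from any library.

```python
PLANE_SIZE = 0x10000        # 65,536 code points per plane
MAX_CODE_POINT = 0x10FFFF   # last code point in the Unicode/UCS code space


def plane_of(code_point):
    """Return the plane number (0..16) containing code_point.

    Hypothetical helper, written for illustration only.
    """
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError("outside the code space 0x0..0x10FFFF")
    return code_point // PLANE_SIZE


print(plane_of(0x41))       # 0  (the Basic Multilingual Plane)
print(plane_of(0x10335))    # 1
print(plane_of(0x10FFFF))   # 16 (the last of the 17 planes)
```

Integer division by 0x10000 is enough here because every plane has the same size; plane 0 is the BMP and anything above it is one of the 'higher planes'.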

The Unicode Standard calls each of the code points in the 0x0..0x10FFFF code space a Unicode scalar value. Each Unicode scalar value uniquely identifies the character assigned to that code point, if such an assignment has been made. There are certain ranges of Unicode scalar values that are not assigned to characters by the standard; they are reserved for special functions or future extension mechanisms. There are also code points that have been assigned to unspecified, privately-defined characters.

In the diagram below, each green cloud shows a code point and each beige cloud shows a character name. Each combination of code point + character name is an encoded character ('encoded' just in the sense that it has been associated with a number).
Diagram: Unicode model for character encodings

Unicode allows certain encoded characters to be combined in sequences in order to represent abstract characters that may or may not have other encoded character representations. That is, one or more encoded characters can together represent, through equivalence, a single abstract character. For example, as shown by one of the pink clouds in the diagram above, the character LATIN CAPITAL LETTER A (code point 0x41) followed by the 'combining' character COMBINING RING ABOVE (code point 0x30A) are two separate characters that are equivalent not only to the single 'compatibility' character LATIN CAPITAL LETTER A WITH RING ABOVE (code point 0xC5), but also to the single character ANGSTROM SIGN (code point 0x212B).
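
One way to observe this equivalence in practice is with Python's standard unicodedata module. Note the assumption: the sketch relies on Unicode normalization (the NFC and NFD forms), which this tutorial has not introduced, so treat it as a demonstration of the equivalence rather than as the mechanism the text describes.

```python
import unicodedata

decomposed = "\u0041\u030A"   # LATIN CAPITAL LETTER A + COMBINING RING ABOVE
precomposed = "\u00C5"        # LATIN CAPITAL LETTER A WITH RING ABOVE
angstrom = "\u212B"           # ANGSTROM SIGN

# As raw code point sequences, the three strings are all different.
print(len({decomposed, precomposed, angstrom}))   # 3

# After normalizing to a single form (NFC here), they compare equal.
nfc = {unicodedata.normalize("NFC", s)
       for s in (decomposed, precomposed, angstrom)}
print(nfc == {precomposed})                       # True
```

The practical consequence is that comparing strings code point by code point can report a mismatch between texts that a reader would consider identical.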

Here are 3 ways of representing the Unicode scalar value of the Unicode character named 'ANGSTROM SIGN':

* in the C language's hexadecimal notation: 0x212B
* in decimal notation: 8491
* in EBNF notation: \v00212B

Here is a way of representing the abstract character itself, using its scalar value:

* in Unicode's deprecated 'U-' notation, which requires 8 hex digits: U-0000212B
* in Unicode's 'U+' notation, which requires 4 to 6 hex digits: U+212B
Note: using the 'U+' notation to represent a character by its code point is a convention introduced in Unicode 3.1 (most likely due to abuse of the old convention). Prior to Unicode 3.1, the 'U+' notation could only be used for Unicode code values, as described below, and required exactly 4 hex digits.

In prose, the 'U+' notation is the preferred way of referring to characters.
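
For readers who want to produce these notations programmatically, here is a minimal Python sketch. The helper names u_plus and u_minus are invented for this example; only ord, hex and the format specifiers are standard Python.

```python
def u_plus(code_point):
    """Format a code point in 'U+' notation (at least 4 hex digits)."""
    return f"U+{code_point:04X}"


def u_minus(code_point):
    """Format a code point in the deprecated 'U-' notation (8 hex digits)."""
    return f"U-{code_point:08X}"


cp = ord("\u212B")   # ANGSTROM SIGN

print(hex(cp))       # 0x212b  (C-style hexadecimal)
print(cp)            # 8491    (decimal)
print(u_plus(cp))    # U+212B
print(u_minus(cp))   # U-0000212B
```

The 04X format pads to four digits but does not truncate, so a code point above the BMP such as 0x10335 comes out as U+10335.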
2.3. Encoding forms and code values - conversion of abstract character numbers to sequences of numbers that data processing devices can manipulate

Code values, or code units, are numbers that computers use to represent abstract objects and concepts like Unicode characters. Like code points, code values are typically non-negative integers, but code values usually only manifest in a fixed 8 bit, 16 bit, or 32 bit width. An encoding form is the mapping of a code point representing an abstract character in a coded character set to a sequence of one or more code values.

ISO/IEC 10646-1 defines a 32-bit encoding form called UCS-4, in which each encoded character in the UCS is represented by a 32-bit code value in the code space 0x0..0x7FFFFFFF (the most significant bit is not used). This encoding form is sufficient to represent all 0x10FFFF Unicode scalar values and then some. Some people consider it wasteful to reserve such a large code space for mapping a relatively small set of code points, so a new encoding form, UTF-32, was proposed. UTF-32 is a subset of UCS-4 that uses 32-bit code values only in the 0x0..0x10FFFF code space. UTF-32 became part of the Unicode Standard in 2002, with the publication of Unicode Standard Annex #19, which was later incorporated into Unicode 4.0.

ISO/IEC 10646-1 also defines a 16-bit encoding form called UCS-2, in which a 16-bit code value in the code space 0x0..0xFFFF directly corresponds to an identical scalar value, but this form is, of course, inherently limited to representing only the first 65,536 scalar values.

The Unicode Standard and ISO/IEC 10646-1 both define two more important encoding forms: UTF-8 and UTF-16.

UTF-16 is a variation on UCS-2 that maps each Unicode scalar value to a unique sequence of up to two 16-bit code values. In UTF-16, each 16-bit code value in the 0x0..0xD7FF and 0xE000..0xFFFF code spaces directly corresponds to the same Unicode scalar value. A surrogate pair of 16-bit code values from the 0xD800..0xDFFF code space algorithmically represents a single Unicode scalar value in the range 0x010000..0x10FFFF. The first half of the pair is always in the 0xD800..0xDBFF range, and the second half of the pair is in the 0xDC00..0xDFFF range.
Unicode scalar value   UCS-4 code value sequence   UCS-2 code value sequence   UTF-16 code value sequence
0x0..0xD7FF            0x00000000..0x0000D7FF      0x0000..0xD7FF              0x0000..0xD7FF
(Unicode scalar values omit 0xD800..0xDFFF)
0xE000..0xFFFF         0x0000E000..0x0000FFFF      0xE000..0xFFFF              0xE000..0xFFFF
0x10000..0x10FFFF      0x00010000..0x0010FFFF      n/a                         0xD800 0xDC00 .. 0xDBFF 0xDFFF
n/a                    0x00110000..0x7FFFFFFF      n/a                         n/a
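
The surrogate pair arithmetic summarized in the table can be sketched in a few lines of Python. This is only an illustration of the mapping, not code from any standard; the function names to_utf16 and from_surrogates are made up for this example.

```python
def to_utf16(scalar):
    """Return the UTF-16 code value sequence (a list of ints) for a scalar value."""
    if 0xD800 <= scalar <= 0xDFFF or not 0 <= scalar <= 0x10FFFF:
        raise ValueError("not a Unicode scalar value")
    if scalar < 0x10000:
        # 0x0..0xD7FF and 0xE000..0xFFFF map to the identical 16-bit code value.
        return [scalar]
    offset = scalar - 0x10000                 # 20 significant bits remain
    high = 0xD800 + (offset >> 10)            # top 10 bits -> 0xD800..0xDBFF
    low = 0xDC00 + (offset & 0x3FF)           # bottom 10 bits -> 0xDC00..0xDFFF
    return [high, low]


def from_surrogates(high, low):
    """Recombine a surrogate pair into the scalar value it represents."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


pair = to_utf16(0x10335)
print([hex(v) for v in pair])
```

The two helpers are inverses of each other for any scalar value in the 0x10000..0x10FFFF range.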

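UTF-8 algorithmically maps each Unicode scalar value to a unique sequence of one to four 8-bit code values. (As originally defined for the full UCS-4 code space, UTF-8 sequences could be up to six code values long, but no Unicode scalar value needs more than four.) The mechanism used by UTF-8 is relatively complex.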

XML developers should at least know that the first 256 Unicode scalar values 0x0..0xFF intentionally coincide with identical code points and byte values in the ASCII (0x20..0x7F), ISO/IEC 8859-1 (0xA0..0xFF), and ISO/IEC 6429 (0x00..0x1F, 0x80..0x9F) standards. The UTF-8 sequences for the same range are shown in this table:
Unicode scalar value   UTF-8 code value sequence
0x0..0x7F              0x00..0x7F
0x80..0xBF             0xC2 0x80 .. 0xC2 0xBF
0xC0..0xFF             0xC3 0x80 .. 0xC3 0xBF
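
For readers who want to see the UTF-8 mechanism itself, here is a rough Python sketch of the bit shuffling for Unicode scalar values, which need at most four bytes. The function name utf8_encode is invented for this example; in practice one would simply call str.encode('utf-8'), which the sketch is compared against.

```python
def utf8_encode(scalar):
    """Encode one Unicode scalar value as a UTF-8 byte sequence (illustrative only)."""
    if 0xD800 <= scalar <= 0xDFFF or not 0 <= scalar <= 0x10FFFF:
        raise ValueError("not a Unicode scalar value")
    if scalar < 0x80:                       # 7 bits: one byte, same as ASCII
        return bytes([scalar])
    if scalar < 0x800:                      # up to 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (scalar >> 6),
                      0x80 | (scalar & 0x3F)])
    if scalar < 0x10000:                    # up to 16 bits: 1110xxxx + 2 trailing bytes
        return bytes([0xE0 | (scalar >> 12),
                      0x80 | ((scalar >> 6) & 0x3F),
                      0x80 | (scalar & 0x3F)])
    return bytes([0xF0 | (scalar >> 18),    # up to 21 bits: 11110xxx + 3 trailing bytes
                  0x80 | ((scalar >> 12) & 0x3F),
                  0x80 | ((scalar >> 6) & 0x3F),
                  0x80 | (scalar & 0x3F)])


# Check the sketch against Python's own codec for the first 256 scalar values.
latin1_range_ok = all(utf8_encode(cp) == chr(cp).encode("utf-8") for cp in range(0x100))
print(latin1_range_ok, utf8_encode(0xFF).hex())
```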

Here are various ways to represent the abstract character named 'GOTHIC LETTER QAIRTHRA (=Q)', which is assigned to the Unicode scalar value 0x10335:

* as a Unicode scalar value, in Unicode's 'U-' notation: U-00010335
* as a Unicode scalar value, in Unicode's 'U+' notation: U+10335
* as a UCS-4 code value sequence, in C hex notation: 0x00010335
* as a UCS-2 code value sequence: illegal; out of range
* as a UTF-16 code value sequence, in C hex notation: 0xD800 0xDF35
* as a UTF-8 code value sequence, in C hex notation: 0xF0 0x90 0x8C 0xB5

2.3.1. Unicode values - representation of abstract characters as UTF-16 code value sequences

Starting with Unicode 3.1, the standard directly assigns abstract characters to Unicode scalar values (code points). Previous versions of Unicode only assigned each character to a sequence of 1 or 2 'Unicode values'. Unicode values are the code value sequences produced by the UTF-16 encoding form.

In order to retain backward compatibility with earlier versions of Unicode, Unicode 3.0 and ISO/IEC 10646-1:2000 adopted the UTF-16 encoding form as the basis for Unicode values, making UTF-16 the only official usage of the 0xD800..0xDFFF range.

Prior to version 3.1, Unicode prescribed a 'U+xxxx' notation with 4 hex digits to designate a Unicode value in printed literature. A Unicode value sequence is considered equivalent to the abstract character it represents. Since these Unicode values were UTF-16 code values, encoded characters with scalar values in the 0x0..0xFFFF range were represented with one U+xxxx designation, and encoded characters with scalar values in the 0x010000..0x10FFFF range were represented with a pair of U+xxxx designations. For example, the character at code point 0x010000 was represented in the old notation by U+D800 U+DC00, but is represented in the new notation by U+10000.

Starting with Unicode 3.1, the 'U+' notation with 4 to 6 hex digits now designates a Unicode scalar value, not a code value. Code values are now written as 4 hex digits in angle brackets, separated by spaces when there is a sequence, like <D800 DC00>.

Unicode allows certain encoded characters to be combined in sequences in order to represent abstract characters that may or may not have other encoded character representations. That is, one or more encoded characters can together represent, through equivalence, a single abstract character. For example, as shown by one of the pink clouds in the diagram above, the character LATIN CAPITAL LETTER A (code point 0x41) followed by the 'combining' character COMBINING RING ABOVE (code point 0x30A) is a sequence of two separate characters that is equivalent not only to the single 'precomposed' character LATIN CAPITAL LETTER A WITH RING ABOVE (code point 0xC5), but also to the single character ANGSTROM SIGN (code point 0x212B).
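
The equivalence described above can be observed with Python's standard unicodedata module, which implements the Unicode normalization forms. This is just a quick check, assuming Python 3; the variable names are arbitrary.

```python
import unicodedata

combining_sequence = "A\u030A"   # LATIN CAPITAL LETTER A + COMBINING RING ABOVE
single_character = "\u00C5"      # LATIN CAPITAL LETTER A WITH RING ABOVE
angstrom_sign = "\u212B"         # ANGSTROM SIGN

# Normalization Form C composes equivalent sequences into one preferred form.
normalized = {unicodedata.normalize("NFC", s)
              for s in (combining_sequence, single_character, angstrom_sign)}
print(normalized)
```

All three representations collapse to the same string after normalization, even though they compare as unequal before it.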

So here are three more ways to represent the abstract character named 'GOTHIC LETTER QAIRTHRA (=Q)':

* as a Unicode value pair, in EBNF notation: \uD800 \uDF35
* as a Unicode value pair, in Unicode 3.0's 'U+' notation: U+D800, U+DF35
* as a Unicode value pair, in Unicode 3.1's notation: <D800 DF35>

As precise as code values are, these representations are still too abstract for a computer to work with. Computers need code values to manifest as bits & bytes in a certain order. Character encoding schemes and character maps accomplish this.
2.4. Character encoding schemes - conversion of code values to byte sequences

An algorithm for converting code values to a sequence of 8-bit values (bytes or octets) for cross-platform data exchange is a character encoding scheme. Encoding forms that produce 7-bit or 8-bit code value sequences don't need additional processing, so UTF-8, for example, can be considered to be both a character encoding form and a character encoding scheme.

Other encoding forms, however, need to have a consistent mechanism applied to convert their 16-bit or 32-bit code value sequences to 8-bit sequences. Unicode 3.0 has the character encoding schemes UTF-16BE and UTF-16LE for this purpose. These work like UTF-16 but split each 16-bit code value into a pair of bytes, with each byte pair being either in Big Endian order for UTF-16BE (i.e., the byte with the most significant bits comes first) or Little Endian order for UTF-16LE.

Continuing with the example, here are representations of GOTHIC LETTER QAIRTHRA (=Q) as a sequence of octets that a computer can use:

* UTF-16BE bytes: 11011000 00000000 11011111 00110101 (0xD800 0xDF35)
* UTF-16LE bytes: 00000000 11011000 00110101 11011111 (0x00D8 0x35DF)
* UTF-8 bytes: 11110000 10010000 10001100 10110101 (0xF0 0x90 0x8C 0xB5)
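
The same octet sequences can be produced with Python's built-in codecs, which is a handy way to double-check byte order. A small sketch, assuming Python 3; the helper name bits is made up here.

```python
def bits(data):
    """Format a byte string as space-separated groups of 8 binary digits."""
    return " ".join(f"{byte:08b}" for byte in data)


character = chr(0x10335)

utf16be = character.encode("utf-16-be")   # most significant byte of each pair first
utf16le = character.encode("utf-16-le")   # least significant byte of each pair first
utf8 = character.encode("utf-8")

for label, data in (("UTF-16BE", utf16be), ("UTF-16LE", utf16le), ("UTF-8", utf8)):
    print(label, bits(data), data.hex())
```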

2.5. Character maps (character sets) - direct mappings of abstract characters to byte sequences

A character map correlates an abstract character in a character repertoire with a specific sequence of bytes, skipping the intermediate steps of code points, encoding forms, and encoding schemes. Other words for character map are character set, charset (i.e., what is used in Content-Type HTTP and MIME headers), charmap, or sometimes code page.

Character maps are what most people envision when they speak of 'character sets'. Examples of character maps are US-ASCII, ISO-8859-1, EUC-JP, KOI8-R, to name just a few.
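
To see why the choice of character map matters, the following sketch decodes one and the same byte with two of the maps named above, using the codec names that Python's standard library happens to accept for them.

```python
data = b"\xe9"                            # a single byte with hex value E9

as_latin1 = data.decode("iso-8859-1")     # interpreted with the ISO-8859-1 map
as_koi8r = data.decode("koi8-r")          # interpreted with the KOI8-R map

# The byte is identical, but the abstract character it stands for is not.
print(repr(as_latin1), repr(as_koi8r))
```

Nothing in the byte itself says which map applies, which is why the charset has to be communicated separately (for example, in a Content-Type header).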

A note about fonts: A font is, in general, just a collection of glyphs: visual representations of characters, or the necessary instructions for drawing those characters, in a particular, often decorative, style. A glyph that represents a character is no more that character than a painting of a tree is an actual tree. TrueType font files happen to contain a mapping of glyphs to Unicode code points. This makes it easy for a Unicode-aware operating system to obtain the rendering instructions for characters according to their code point.
3. XML document character syntax

Why you need to know this: In order to author XML documents, one must understand what sequences of what characters are allowed in an XML document, and how to find and interpret the syntax rules that are defined in the spec.
3.1. How to read the syntax rules in the XML 1.0 Recommendation

An XML document is a UCS character sequence that follows certain patterns. These patterns provide a means of representing a logical hierarchy (a tree) of data.

The XML 1.0 Recommendation establishes conventions for using certain UCS character sequences to represent data and certain other UCS character sequences to represent markup. The markup allows the logical hierarchy to be expressed in the document along with the data itself.

The Recommendation defines these conventions partly with prose explanations and partly with a formal grammar written as a set of 'productions' in Extended Backus-Naur Form (EBNF) notation. This notation is described briefly in section 6 of the spec. It is helpful to know how to read the EBNF productions because they are the definitive reference for proper syntax.

The EBNF productions do little more than enumerate allowable UCS character sequences. Basic sequences are assigned to symbols, which in turn are the foundation for more advanced combinations of symbols and other character sequences. These sequences build upon each other to the point where an entire XML document can be expressed with the following EBNF production:

document ::= prolog element Misc*

This production says that the symbol named document (which represents a well-formed XML document) consists simply of one prolog followed by one element followed by zero or more Miscs. Each of these symbols is defined in terms of other symbols and character sequences.

Note that the XML 1.0 Recommendation refers to UCS characters by their Unicode scalar values, using a notation of #x followed by only as many hex digits as needed. So #x9 in the EBNF productions means the abstract character that would be represented in Unicode 3.1's 'U+' notation as U+0009. It does not necessarily mean a byte with hex value 9.

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
S ::= (#x20 | #x9 | #xD | #xA)+

The first line means that Char is any one character that falls in the listed ranges. Note that characters U+0000 through U+0008 and several other ranges are not considered Chars and are not allowed in XML documents. The second line shows that S is a sequence of one or more instances of any of the 4 'whitespace' characters. The definition of a Comment is given as:

Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'

This means that Comment is the 4 characters <!-- and the 3 characters -->, in between which are 0 or more instances of either a Char that is not -, or the character - followed by a Char that is not -.
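
One way to get a feel for these productions is to transcribe them into regular expressions. The sketch below does that for Char, S, and Comment using Python's re module; it is an informal translation for experimentation, not a substitute for a real XML parser, and the constant names are invented for this example.

```python
import re

# Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
CHAR = r"[\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"

# (Char - '-'): the same set with the hyphen (U+002D) carved out.
CHAR_NO_HYPHEN = r"[\t\n\r\u0020-\u002C\u002E-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"

# S ::= (#x20 | #x9 | #xD | #xA)+
S = r"[\u0020\t\r\n]+"

# Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
COMMENT = rf"<!--(?:{CHAR_NO_HYPHEN}|-{CHAR_NO_HYPHEN})*-->"

good = re.fullmatch(COMMENT, "<!--this is a comment-->")
bad = re.fullmatch(COMMENT, "<!-- a -- b -->")   # '--' inside a comment is not allowed
print(bool(good), bool(bad))
```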

Misc ::= Comment | PI | S

This means that Misc is one of Comment, PI, or S. The definition of PI is too lengthy to include here, so we'll just leave it as it is.

Since Comment and S have been defined, it would be just as accurate to say:

Misc ::= '<!--' (((#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]) - '-') | ('-' ((#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]) - '-')))* '-->' | PI | (#x20 | #x9 | #xD | #xA)+

The other components of document are defined in the same way. It follows that a well-formed XML document is a UCS character sequence that follows certain patterns.
3.2. XML document syntax and character encoding forms

XML documents, in order to be stored or transmitted, must manifest in an encoded form as bits and bytes, using a consistent character encoding mechanism such as UTF-16 or UTF-8.

When these 'physical' documents are assembled or modified, care must be taken to ensure that encodings are consistently applied. If one encoded document is pasted into the middle of another that has a different encoding, the resulting byte sequence could represent corrupted data or could even be unparsable.

The XML 1.0 Recommendation requires that any software that reads XML documents and provides access to their content and structure must be able to support both UTF-8 and UTF-16 encoding forms. The spec further dictates that if UTF-16 encoding is used, a byte-order mark must be present at the beginning of the document. If no hints to a document's encoding are available, it is assumed that UTF-8 encoding is in effect, and it would be an error if the document were not actually encoded with UTF-8.
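
The byte-order mark rule can be illustrated with a deliberately simplified sketch. It only looks at the first bytes of a document and falls back to UTF-8; a real XML parser also examines the encoding declaration and handles more cases (for instance, UTF-32 byte-order marks, which this sketch would misread). The function name guess_encoding is hypothetical.

```python
import codecs


def guess_encoding(document_bytes):
    """Guess UTF-8 vs. UTF-16 from a leading byte-order mark (simplified)."""
    if document_bytes.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if document_bytes.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if document_bytes.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    # No hints available: UTF-8 is the assumed encoding.
    return "utf-8"


sample = codecs.BOM_UTF16_BE + "<doc/>".encode("utf-16-be")
print(guess_encoding(sample), guess_encoding(b"<doc/>"))
```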

Because in Latin-based languages the majority of the characters needed in an XML document come from the US-ASCII range (U+0000 to U+007F), UTF-8 is usually the most suitable encoding. UTF-16 may be more straightforward to implement, but it is difficult to compose UTF-16 encoded documents with most text editing software, and it is wasteful to use 2 bytes per character when most characters fall in a very small range. UTF-8 is also advantageous because the XML spec requires that it be the assumed encoding when the document contains no other cues as to its encoding.
3.3. Parsing - decoding and interpreting an XML document

Interpretation of an XML document's logical contents cannot begin until the encoded document has first been decoded into a sequence of UCS characters. Since UCS characters are intangible, decoding, to a computer, really means conversion to some other encoding form, most likely UTF-16, UCS-2 or UCS-4.

Decoding a document, comparing it to the EBNF productions, and interpreting its logical contents in a consistent manner is the job of a software application called an XML processor, also commonly referred to as an XML parser. An XML parser feeds the logical contents to another application that makes use of that info in some way. SAX (Simple API for XML) is a de facto standard that defines a convention for parsers to report the logical contents to an application.
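
As a rough illustration of a parser reporting logical contents through SAX, here is a sketch using the xml.sax package from Python's standard library. The Recorder class and the sample document are made up for this example; note that a SAX parser is allowed to deliver character data in more than one chunk, so the text pieces are joined before use.

```python
import xml.sax


class Recorder(xml.sax.ContentHandler):
    """Collects the events that the parser reports to the application."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("text", content))


handler = Recorder()
xml.sax.parseString(b"<greeting>Hello <name>Jane</name></greeting>", handler)

structure = [event for event in handler.events if event[0] != "text"]
text = "".join(value for kind, value in handler.events if kind == "text")
print(structure, repr(text))
```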
4. XML document entities

Why you need to know this: The term 'entity' is ubiquitous in XML, but has a very specific meaning. It is important to understand and distinguish between entities, entity references, character references, and character entities.

The XML 1.0 Recommendation states that an XML document can be divided into sections called entities. Each entity can exist in a different place (a block of memory or a file on a disk, for example). The entity that contains the main body of the document is the document entity.

If an entity consists of XML character data (i.e., it is a fragment of an XML document), it is called a parsed entity. An XML parser combines the document entity and parsed entities into a contiguous sequence of UCS characters. As it reads a document entity, it locates, decodes, and imports the contents of each parsed entity as replacement text that replaces references to that entity. Parsed entities can have their own character encodings.

An entity that contains non-XML data of any kind (e.g., a binary file like a JPEG or MP3) cannot be read by an XML parser and is therefore called an unparsed entity. An XML document can contain information about the location and format of an unparsed entity, and it can refer to the entity, but it cannot actually contain the entity itself. An XML parser does not replace a reference to an unparsed entity; it just passes the info about the entity to the application. Unparsed entities can only be referred to in limited contexts and are not particularly useful.

The XML 1.0 Recommendation requires that entities be declared in the Document Type Definition (DTD), which is a special part of an XML document's logical structure where document validity constraints are declared. One part of the DTD, the internal subset, exists in the document entity. Another part of the DTD, the external subset, may exist in an entity that is external to the document entity. Both parts are optional.

An entity that is only for use in the document is a general entity. An entity that is for use only within the DTD is a parameter entity. Parameter entities are useful as macros for often-repeated text that is used in a DTD, or to represent pseudo data types.

An entity is either internal or external. If the declaration of an entity identifies the entity's replacement text by its location (a URI), or if the entity is unparsed, then the entity is said to be external. If the declaration of an entity includes its replacement text (either with literal characters, entity references, or both), then it is said to be internal.

Due to the limitations on unparsed entities, the actual combinations of characteristics of a given entity are as follows:

* Internal parsed general
* Internal parsed parameter
* External parsed general
* External parsed parameter
* External unparsed general

When a parsed entity is declared in a DTD, it is given a name. This name is the basis of references to that entity. The syntax of an entity reference is the UCS character sequence &name; for general entities, and %name; for parameter entities.

There are 5 built-in internal general parsed entities that all XML processors must recognize, even if they have not been declared in a DTD. These entities are used to escape characters in character data that would otherwise be read as markup delimiters.
Built-in entity reference   Replaces character
&amp;                       &
&lt;                        <
&gt;                        >
&quot;                      "
&apos;                      '

An XML parser that is not validating an XML document is not required to read any external entities, so in some situations it is not an error for a document to refer to an entity that is declared in one of those entities. This depends on whether the document declares itself as being 'standalone', which means that it does not have markup declarations (including entity declarations) in any external entities.

In addition to entity references there are character references, each of which refers to one UCS character by its code point. The syntax of a character reference is the same as for a general entity reference, but instead of a name, the character is identified by its code point, in the form #xABCD for hex or #1234 for decimal. For example, &#xA0; and &#160; are both references to U+00A0, the non-breaking space character.
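
Both kinds of reference can be tried out with the ElementTree parser in Python's standard library. The element name t below is arbitrary, and the sketch assumes Python 3.

```python
import xml.etree.ElementTree as ET

# The five built-in entity references, recognized without any DTD.
builtin_text = ET.fromstring("<t>&amp; &lt; &gt; &quot; &apos;</t>").text

# A hex and a decimal character reference to the same character, U+00A0.
nbsp_text = ET.fromstring("<t>&#xA0;&#160;</t>").text

print(builtin_text, [hex(ord(c)) for c in nbsp_text])
```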

The term character entity is not defined by the XML spec, but since character reference and entity have definite meanings, one can infer that a character entity is a general entity that has a single character as its replacement text. The five built-in entities can be considered character entities, but a numeric character reference like &#160; is not an entity at all, so it cannot be called a character entity.

There are restrictions on what UCS characters are allowed in a parsed entity. Certain characters are disallowed, and cannot even be referenced via character references. The allowed characters are: U+0009 (tab), U+000A (linefeed), U+000D (carriage return), U+0020 through U+D7FF, U+E000 through U+FFFD, and U+10000 through U+10FFFF. Consequently, a parsed entity is not a good place to store arbitrary binary data, unless the data is pre-encoded with the Base64 or uuencode mechanisms.
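
As a sketch of the Base64 approach, the following Python fragment stores every possible byte value, including ones that are not legal XML characters, inside an element and gets them back unchanged. The element name blob is hypothetical.

```python
import base64
import xml.etree.ElementTree as ET

raw = bytes(range(256))    # arbitrary binary data, including disallowed values like 0x00

element = ET.Element("blob")
element.text = base64.b64encode(raw).decode("ascii")   # only letters, digits, '+', '/', '='
document = ET.tostring(element)

# Reading it back: parse the XML, then undo the Base64 encoding.
restored = base64.b64decode(ET.fromstring(document).text)
print(restored == raw)
```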
5. XML logical structures

Why you need to know this: This is the heart of XML; one must think of XML not just in terms of its literal, lexical structures, but also in terms of the logical, implied structures that the document's text represents.
5.1. Markup vs. character data

The allowable UCS character sequences in a decoded document fall into two main categories: markup and character data. The character data is at the very least a representation of data, and often is in fact literal data. The markup models that data as a tree, annotates the document with comments, provides information to an XML parser, declares and refers to entities, and declares certain valid logical structures for all documents of that type. Most markup is considered to be part of the 'logical' contents of a document, but entity and character references are considered 'physical' for some reason.

There are several logical structures in an XML document:

* XML Declaration or Text Declaration
* Document Type Declaration
* Processing Instructions
* Comments
* Text (Character Data)
* Elements and their Attributes
* Document Type Definition (DTD)

5.2. The prolog

An XML document must begin with markup called a prolog. A prolog consists of either an XML declaration or a text declaration, optionally followed by a Document Type Declaration, optionally followed by comments or processing instructions. Whitespace may appear after any of these components of the prolog.

A document entity's prolog begins with an XML declaration and takes the form:

<?xml version='1.0' encoding='UTF-8' standalone='no'?>

The XML version is required, but encoding and standalone declarations are optional. The prolog for any entity other than the document entity begins with a text declaration. A text declaration is in the same format as an XML declaration, but it is optional, always contains an encoding declaration, and never contains a standalone declaration.

In an XML declaration, the encoding declaration is not required, but is recommended so that an XML parser can be sure it is decoding the document correctly. Without an encoding declaration, the parser must rely on a default heuristic for determining the encoding, inevitably resulting in an assumption of either UTF-8 or UTF-16. It is considered an error if the document's encoding is not what was declared or assumed. So for example, if the encoding is declared to be iso-8859-1, the parser should reject it if any bytes in the 0x80..0x9F range are encountered, because those bytes do not exist in iso-8859-1.

Although it is allowed to have any value, an encoding declaration should use the name of a character map as defined by the Internet Assigned Numbers Authority (IANA) in their official list of 'character set' names, or else a made-up name beginning with 'x-'. The encoding name is case insensitive.
5.3. Character data

Character data can exist in one of two forms: parsed or unparsed. If it is parsed, then it is a PCDATA section and the UCS characters can be included in the document directly, provided they have instances of the markup delimiters < > and & escaped using entity references, like so:

1 &amp; 2 are &lt; three

In general, '>' does not have to be escaped, but it is good practice to escape it for the benefit of humans who might be looking at the character data. It is also good practice, and sometimes necessary, to escape the double quote (") and the apostrophe (') in attribute values.
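
Escaping by hand is error-prone, so most environments provide helpers for it. The sketch below uses escape and quoteattr from Python's xml.sax.saxutils module; the sample strings are only illustrations.

```python
from xml.sax.saxutils import escape, quoteattr

# escape() handles '&', '<' and '>' in character data.
escaped = escape("1 & 2 are < three")

# Extra replacements can be supplied, e.g. for both kinds of quotes.
escaped_quotes = escape("say \"hi\" & 'bye'", {'"': "&quot;", "'": "&apos;"})

# quoteattr() escapes a value and wraps it in suitable quotes for an attribute.
attribute = quoteattr("Rosemary's Baby")

print(escaped)
print(escaped_quotes)
print(attribute)
```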

If a section of character data is to be unparsed, then it is a CDATA section and must be enclosed in markup of the form:

<![CDATA[1 & 2 are < three]]>

People are often misinformed about what a CDATA section actually is. It is just a convenience for the document author, saving them the trouble of escaping data upon input. It does not mark a span of text with a 'Dear Application, please preserve me all the way through to output, if you can' flag. Rather, it merely says 'Dear XML Parser, if you see something in here that looks like markup, it's not really markup. Please report it to the Application as ordinary character data, as if '<' and '&' had been written as '&lt;' and '&amp;'.' Using a CDATA section does not really buy anything other than convenience, and it does nothing to make XML a good vehicle for transporting other markup, unless such markup is never going to be treated as markup ever again.
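
That a CDATA section is only an input convenience is easy to confirm: once parsed, nothing distinguishes it from the escaped form. A minimal check with Python's ElementTree (the element name t is arbitrary):

```python
import xml.etree.ElementTree as ET

from_cdata = ET.fromstring("<t><![CDATA[1 & 2 are < three]]></t>").text
from_escaped = ET.fromstring("<t>1 &amp; 2 are &lt; three</t>").text

# The application receives the same character data either way.
print(from_cdata == from_escaped, from_cdata)
```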
5.4. Elements and attributes

Character data is divided into named chunks called elements and attributes. Although the XML 1.0 Recommendation does not dictate semantics for these structures, it does imply that they define a hierarchy - a tree of data with a root, branches and leaves. It also places restrictions on attributes such that each attribute can only be a name-value pair that is associated with an element, thereby implying that an attribute is a granular, inherent property of an element.

An element or attribute can have any name that begins with a letter, underscore or colon and otherwise contains only certain other characters (letters, digits, periods, hyphens, underscores, colons, combining characters or extenders). Names beginning with the case-insensitive sequence 'xml' have special meaning. An XML element's name is its type. All elements with the same name are of the same type (this word is used a lot in the spec, so it's good to know what it means). Two attributes with the same name cannot be associated with a single element.

An element is a container for its contents, which can be character data, more elements, or both, in any combination. An XML document must have exactly one root element, also known as the document element. All character data and other elements must be contained within the document element. A parent-child relationship exists between an element and the elements contained within it.

If an element has no contents, then it is 'empty' and is denoted with an empty-element tag of the form:

<elementName/>

If an element has contents, then the contents are bounded by a start tag and end tag, like this:

<elementName>this character data is the contents</elementName>

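When other elements are in the contents, start and end tags must not overlap. Of the following two lines, the first is not well-formed because its tags overlap; the second is properly nested: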

<greeting>Hello <name>Jane, how are you?</greeting></name>
<greeting>Hello <name>Jane</name>, how are you?</greeting>

An attribute that is associated with an element is inserted in the element's start tag next to the name of the element. The name of the attribute is given, along with its value in single quotes (ASCII apostrophes, actually, not curly quotes) or ASCII double quotes. Double quotes are most common.

<greeting type='informal'>Hey Dude! What up?</greeting>
<greeting type="informal">Hey Dude! What up?</greeting>

If an attribute value contains the same kind of quotes that are quoting the entire value, then those quotes in the value must be escaped.

<movie name='Rosemary&apos;s Baby'/>

Attribute values are not the best place to store just any character data, because an XML parser will not interpret the values exactly as they appear in the document. When the document is read by an XML parser, the attribute values will be subjected to normalization: each literal whitespace character in the value (tab, linefeed, or carriage return) is replaced by a space character. In addition, if the attribute has been declared in the DTD with a type other than CDATA, spaces are removed from the beginning and end of the value, and runs of consecutive spaces elsewhere in the value are replaced by a single space character.
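
The effect of attribute value normalization can be observed with Python's ElementTree, which is built on the expat parser. In this sketch there is no DTD, so the attribute is treated as being of type CDATA: literal tabs and linefeeds come back as spaces, but nothing is trimmed or collapsed. The element and attribute names are invented for the example.

```python
import xml.etree.ElementTree as ET

# The attribute value contains a literal tab and a literal linefeed.
element = ET.fromstring('<note text=" a\tb\nc  d "/>')
value = element.get("text")

print(repr(value))
```

Whitespace that must survive unchanged can be written as character references (for example, &#9; for a tab), since referenced characters are not replaced.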

An XML document can be annotated with comments, as long as those comments are separate from other markup. An XML parser may choose to ignore comments. Text inside the comment does not need to be escaped with entity references, but a sequence of two hyphens is disallowed within the comment.

<!--this is a comment-->
<anElement>hello <!--this is another comment--> world</anElement>
5.5. Processing instructions

Looking very much like the XML declaration in the prolog is a bit of markup called a processing instruction. It provides a mechanism for a document to pass information, by way of the XML parser, to an application, but it is not considered part of the document's data. It takes the form:

<?foo bar?>

where foo is the target, an identifier for the application to which the instruction is directed. The target can optionally be formally declared in the DTD via a notation declaration. Everything that follows the target and the whitespace after it, up to the '?>' delimiter, is made available to the application. Processing instructions are not used that often.

<?xml-stylesheet href='style.css' type='text/css'?>

In this example, xml-stylesheet is the target (presumably this is meaningful to the application), and the string href='style.css' type='text/css' is the instruction. This instruction might be said to contain 'pseudo-attributes' because it resembles a series of attributes, but it is just a single opaque, meaningless string, as far as the XML parser is concerned. This string will be interpreted by the application, not the parser.
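
The split between target and instruction can be seen with the minidom module in Python's standard library, which exposes a processing instruction as a node with target and data properties. The root element name doc is made up for this sketch.

```python
from xml.dom import minidom

document = minidom.parseString(
    "<?xml-stylesheet href='style.css' type='text/css'?><doc/>"
)

# The processing instruction is the first child of the document node.
instruction = document.firstChild
print(instruction.target)
print(instruction.data)    # one opaque string; the parser does not split it further
```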
5.6. The Document Type Definition (DTD)

XML 1.0 provides for a logical structure called the Document Type Definition, or DTD. Like a processing instruction, the DTD is not part of the data in a document. The DTD contains user-defined declarations of what element and attribute names are valid, rules for contents of elements and values of attributes, and the names and locations of entities. All documents written to conform to the rules in a given DTD are considered to be of the same type, hence the name Document Type Definition.

If a document matches all the character encoding and syntax rules defined in the EBNF productions in the XML 1.0 Recommendation, then it is well-formed and can be read by any XML parser. If a document also matches the rules defined in a DTD and the validity constraints imposed by the XML 1.0 Recommendation, then it is valid.

A validating parser is required to check for well-formedness and report validity violations in an entire document, including parsed entities and the complete DTD. A non-validating parser is required to check for well-formedness only in the document entity and the internal DTD subset, and does not need to report any validity violations. A non-validating parser is also not required to read and get replacement text from external parsed entities, but it must inform the application where references to such entities occur.
5.6.1. DTD syntax

XML DTDs use an SGML-inherited syntax to define a frustratingly limited vocabulary for documents. There are a lot of subtle rules to follow when writing DTDs and the only good reference is the XML spec itself, so this tutorial will not go into too much detail.

The declarations in an XML DTD consist of a mixture of tokens and EBNF constructs, inside what look like element tags (but aren't).
5.6.2. DTD syntax: entity declarations

One major kind of declaration is for entities. Each entity is declared with an indicator of whether it is a parameter or general entity, its name, and, depending on whether it is internal or external, a literal entity value in quotes (if internal), or an identifier for where the replacement text can be found (if external and parsed):

<!ENTITY whoa 'WHOA!'>
<!ENTITY baby SYSTEM 'baby.txt'>
<!ENTITY % my-DTD-fragment SYSTEM 'http://foo.net/path/to/my.dtd'>
<!ENTITY % standard-DTD PUBLIC '//W3C-Gobbledygook/1.0'>

These examples say that there is a general entity named 'whoa' with replacement text 'WHOA!'; a general entity named 'baby' whose replacement text can be found in the file named 'baby.txt'; a parameter entity (denoted by the percent sign) named 'my-DTD-fragment' whose replacement text can be found at the location identified; and a parameter entity named 'standard-DTD' whose replacement text can be found at a location that the XML parser should know where to find, based on the public identifier given.

An XML parser will determine an internal entity's replacement text by replacing character references and parameter entity references it finds in the literal entity value (the quoted string in the entity declaration). Therefore, the built-in entities lt and amp, if declared, must have their values doubly escaped; the full set of declarations is listed below, after the explanation.

What's the deal with the double-escaping for lt and amp? It has to do with how replacement text for entity declarations is handled: General entity references are not resolved inside the declaration, but character references and parameter entity references are. The idea is that you can do this:

<!ENTITY % nombre 'Se&#xF1;or Gomez'>
<!ENTITY question '&#xBF;Como esta?'>
<!ENTITY foo '&#161;Hola %nombre;! &question;'>

These declarations indicate the following:
entity name   entity type   literal entity value              replacement text
nombre        parameter     Se&#xF1;or Gomez                  Señor Gomez
question      general       &#xBF;Como esta?                  ¿Como esta?
foo           general       &#161;Hola %nombre;! &question;   ¡Hola Señor Gomez! &question;

When you put in a document...

<greeting>&foo;</greeting>

...the replacement text goes in:

<greeting>¡Hola Señor Gomez! &question;</greeting>

Notice how '&' in the replacement text ends up in the document unchanged, so it signifies the beginning of an entity reference. Similarly, '<' would also be unchanged and would thus look like the beginning of a tag.

When the document is parsed, after the replacement text has been substituted in, general entities will be resolved and other markup will be processed normally. In effect, it is as if the document contained:

<greeting>¡Hola Señor Gomez! ¿Como esta?</greeting>

So now it should be evident why one would need to doubly escape '<' and '&' when defining entities. They only need to be escaped in this special way when writing an entity declaration, due to the way replacement text is calculated and parsed. The built-in entities, if declared, must therefore look like the following:

<!ENTITY lt '&#38;#60;'>
<!ENTITY gt '&#62;'>
<!ENTITY amp '&#38;#38;'>
<!ENTITY apos '&#39;'>
<!ENTITY quot '&#34;'>
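
The two-stage expansion of &foo; walked through above can be reproduced with Python's ElementTree, which expands internal general entities declared in the internal DTD subset. One assumption to note: because parameter entity references are not permitted inside declarations in the internal subset, this sketch writes the text of %nombre; directly into the declaration of foo instead of referencing it.

```python
import xml.etree.ElementTree as ET

document = """<!DOCTYPE greeting [
  <!ENTITY question "&#xBF;Como esta?">
  <!ENTITY foo "&#161;Hola Se&#xF1;or Gomez! &question;">
]>
<greeting>&foo;</greeting>"""

# Character references are resolved when the entity is declared;
# the nested &question; reference is resolved when &foo; is expanded.
greeting_text = ET.fromstring(document).text
print(greeting_text)
```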

If a DTD or part of a DTD is in an external entity, the entity's replacement text should begin with a text declaration, but is not required to.

External unparsed general entities are handled a little differently. Processing them is the responsibility of the application; the XML parser's responsibility ends with the reporting of information about the entity. Some information about the entity's internal format must be declared after the identifier that indicates the entity's location:

<!ENTITY selfPortrait SYSTEM 'me.jpg' NDATA JPEGformat>
<!NOTATION JPEGformat SYSTEM 'http://www.jpeg.org'>

The presence of the NDATA token indicates that the entity is unparsed. The arbitrary name following it is just a key to the notation declaration. The notation declaration just pairs the name with an identifier that the application should recognize as a cue for how to handle the entity. Notation declarations have additional applications, described below in the section on attribute declarations.

The only place in an XML document where an unparsed entity can be referenced is in the value of an attribute that has been declared to be of type ENTITY or ENTITIES. The entity name is the attribute value; no delineation with '&' and ';' is needed.
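
As a rough illustration of "the parser reports, the application processes", here is a minimal sketch using Python's xml.parsers.expat. The 'album' element and its 'cover' attribute are made up for the example; the entity and notation declarations are the ones shown above. The parser never opens me.jpg; it only passes the declared information along.

```python
from xml.parsers import expat

DOC = """<!DOCTYPE album [
<!NOTATION JPEGformat SYSTEM 'http://www.jpeg.org'>
<!ENTITY selfPortrait SYSTEM 'me.jpg' NDATA JPEGformat>
<!ELEMENT album EMPTY>
<!ATTLIST album cover ENTITY #REQUIRED>
]>
<album cover='selfPortrait'/>"""

notations = {}   # notation name -> system identifier
unparsed = {}    # entity name -> (system identifier, notation name)
attributes = {}  # element name -> attribute dict

def on_notation(name, base, system_id, public_id):
    notations[name] = system_id

def on_entity(name, is_parameter_entity, value, base, system_id, public_id, notation_name):
    # Only unparsed entities carry a notation name.
    if notation_name is not None:
        unparsed[name] = (system_id, notation_name)

def on_start(name, attrs):
    attributes[name] = attrs

parser = expat.ParserCreate()
parser.NotationDeclHandler = on_notation
parser.EntityDeclHandler = on_entity
parser.StartElementHandler = on_start
parser.Parse(DOC, True)

# The attribute value is just the entity name, without '&' and ';'.
cover = attributes["album"]["cover"]
location, notation = unparsed[cover]
```

From here it would be up to the application to look up the notation's identifier and decide how to fetch and interpret the file at 'location'.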
5.6.3. DTD syntax: element declarations

Another major kind of declaration is for elements:

<!ELEMENT greeting ( #PCDATA | name )*>

This says: An element of type 'greeting' exists (i.e., there can be elements named 'greeting'). The asterisk indicates that there can be zero or more instances of whatever precedes the asterisk, which in this case is the grouping (denoted by parentheses) of parsed character data (denoted by the token #PCDATA) or (denoted by the vertical bar) an element of type 'name'. The element type 'name' must also be declared, since it was mentioned. This will do:

<!ELEMENT name ( #PCDATA )>

If the DTD contained those two element declarations, then the following document would be valid:

<greeting>¡Hola, <name>César</name>!</greeting>

In this example, the text before and after the <name> is allowed because of the #PCDATA in the declaration for the element greeting.

The XML parser would report to the application:

* There is an element named 'greeting'.
* It contains the character data '¡Hola, ',
* followed by an element named 'name', which contains
* the character data 'César'.
* After that 'name' element, the 'greeting' element contains the character data '!'.
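
The same sequence of reports can be observed with a small Python sketch, assuming the standard library's expat bindings. Note that expat is a non-validating parser, so it reports the structure without checking it against the element declarations.

```python
from xml.parsers import expat

events = []

parser = expat.ParserCreate()
# Deliver each run of character data in one piece instead of in chunks.
parser.buffer_text = True
parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
parser.EndElementHandler = lambda name: events.append(("end", name))
parser.CharacterDataHandler = lambda data: events.append(("text", data))

parser.Parse("<greeting>¡Hola, <name>César</name>!</greeting>", True)

for kind, value in events:
    print(kind, repr(value))
```

The printed events follow the order of the list above: the 'greeting' start, its leading text, the 'name' element with its text, and then the trailing '!'.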

There is a relatively obscure feature of XML that says that a validating parser must notify the application when the DTD says that an element can contain only element content (no PCDATA). The purpose of this notification is so that the application can know whether it is OK to discard any whitespace that might appear in that element.

Continuing with the previous example, let's add a wrapper element called, simply, 'wrapper':

<!ELEMENT wrapper ( greeting )>

The valid document could then look like:

<wrapper>
  <greeting>¡Hola, <name>César</name>!</greeting>
</wrapper>

The XML parser would report to the application:

* There is an element named 'wrapper'.
* It contains the character data '(linefeed)(space)(space)'. This is insignificant whitespace.
* After that, there is an element named 'greeting', which contains
* the character data '¡Hola, ' (note the significant space),
* followed by an element named 'name', which contains
* the character data 'César'.
* After that 'name' element, the 'greeting' element contains the character data '!'.
* Following the 'greeting' element is the character data '(linefeed)'. This is insignificant whitespace.
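
A non-validating parser cannot make this distinction, since it does not use the element declarations. As a sketch of what that means in practice, Python's ElementTree (assumed here, and non-validating) hands all of the whitespace to the application as ordinary character data, and the application has to decide for itself what to discard. The helper name is_whitespace_only is made up for the example.

```python
import xml.etree.ElementTree as ET

# A linefeed and two spaces precede <greeting>; a linefeed follows it.
DOC = "<wrapper>\n  <greeting>¡Hola, <name>César</name>!</greeting>\n</wrapper>"

wrapper = ET.fromstring(DOC)
greeting = wrapper[0]

before = wrapper.text   # character data between <wrapper> and <greeting>
after = greeting.tail   # character data between </greeting> and </wrapper>

def is_whitespace_only(text):
    """True if the text consists solely of XML whitespace characters."""
    return text is not None and text.strip(" \t\r\n") == ""

# An application that knows 'wrapper' has element content could drop these.
discardable = [t for t in (before, after) if is_whitespace_only(t)]
```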

There is a way for a document author to override this behavior and force an application to recognize all whitespace in an element as being significant, using the xml:space attribute, described below.
5.6.4. DTD syntax: attribute declarations

Another major kind of declaration is for attributes. Attributes can be complicated to declare, so here is a relatively simple example:

<!ATTLIST greeting
type ( formal | informal ) #REQUIRED
length CDATA #IMPLIED>

This says that elements of type 'greeting' have two attributes: 'type', which is required to be present (denoted by the #REQUIRED token), and 'length', which is optional (denoted by the #IMPLIED token). The value of the 'type' attribute must be an NMTOKEN, a special class of parsed character data, and in this case must be either 'formal' or 'informal'. The 'length' attribute can have a value consisting of any parsed character data, denoted by the CDATA token. CDATA is just a token and should not be confused with an unparsed CDATA section in the document.

Attribute value types can be:

* CDATA (any parsed character data; may be further restricted by a default value);
* NMTOKEN (any character sequence matching the production for NMTOKEN);
* NMTOKENS (a sequence of one or more whitespace-separated NMTOKENs);
* An enumerated list of particular NMTOKENs (as in the example above);
* ID (any character sequence that matches the production for Name and that doesn't repeat as an ID value elsewhere in the document);
* IDREF (a character sequence that is the same as the value of an attribute of type ID elsewhere in the same document);
* IDREFS (one or more whitespace-separated IDREF sequences);
* NOTATION (a character sequence matching the name of a declared notation; see below for explanation and examples);
* ENTITY (the name of an unparsed entity declared elsewhere in the DTD);
* ENTITIES (a sequence of one or more whitespace-separated ENTITY names).

It is also possible to declare default values for attributes by putting the quoted value in place of the #REQUIRED or #IMPLIED token. The attribute can be declared as always existing (even if it is omitted from the document) and always having the default value by preceding the default value with the token #FIXED.
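
Default values are visible even to a non-validating parser when the declarations are in the internal DTD subset. The following sketch assumes Python's ElementTree; the 'version' attribute is a hypothetical addition used to show #FIXED.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE greeting [
<!ELEMENT greeting (#PCDATA)>
<!ATTLIST greeting
  type ( formal | informal ) 'informal'
  length CDATA #IMPLIED
  version CDATA #FIXED '1.0'>
]>
<greeting>Hola</greeting>"""

# The start tag carries no attributes, but the parser supplies the declared
# defaults. 'length' is #IMPLIED, so it has no default and stays absent.
attrs = ET.fromstring(DOC).attrib
print(attrs)
```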

The XML 1.0 Recommendation defines two attributes that have special meaning and that can be associated with any element. When these special attributes are used and the document is being checked for validity, the spec requires that they be declared in the DTD.

1. The special attribute named xml:space, as mentioned before, must be declared as an enumerated type whose only allowed values are 'default' and 'preserve' (one or both). The value 'preserve' indicates that the application should always consider any whitespace in the element contents to be significant character data; the value 'default' indicates that the application does not need to consider the element's whitespace to be significant.
2. The special CDATA-type attribute named xml:lang associates an element's contents with a human language. More information about its valid values and when to use it is in the next section, below.

There is also a rarely-used declaration called a notation, which creates a name-location pair that can be used to signal to an application that a parsed character data section is to be interpreted as having some kind of additional encoding, such as Base64 or PostScript. It can also be used to describe the target for a processing instruction. Here is an example declaring two notations, an element, and an attribute of type NOTATION. The possible value of the attribute is one of the two declared notation names:

<!NOTATION ps PUBLIC 'Postscript Level 3'>
<!NOTATION vrml SYSTEM 'http://www.web3d.org/'>
<!ELEMENT FormattedData ( #PCDATA )>
<!ATTLIST FormattedData
Format NOTATION ( ps | vrml ) #REQUIRED>

An element conforming to these declarations might look like this:

<FormattedData Format='ps'>
gsave
112 75 moveto 112 300 lineto
showpage grestore
</FormattedData>
5.7. The xml:lang attribute

As mentioned above, xml:lang is a special attribute that allows document authors to flag element content as being related to a particular language.
5.7.1. Acceptable values for xml:lang

According to the XML 1.0 Recommendation and its errata, the value of an xml:lang attribute must be an (XML) LanguageID. A LanguageID is normatively defined by IETF RFC 1766, 'Tags for the Identification of Languages'. The XML spec muddles this quite a bit by trying to provide a summary of what RFC 1766 says, and they botched their references, so it's rather confusing.

RFC 1766 provides several ways of constructing a LanguageID.

1. The simplest method is to use an ISO 639:1988 2-letter language code. These codes are case-insensitive but are typically lowercase. Examples: 'en' or 'fr'. Reference: http://ftp.std.com/obi/Standards/ISO/ISO_639.

ISO 639 has been updated a number of times since 1988 and is now in two parts, ISO 639-1 for the 2-letter codes and ISO 639-2 for 3-letter codes. It has been argued that, because XML 1.0 refers normatively to RFC 1766, and because RFC 1766 would have to be superseded in order to accommodate updates to ISO 639, we are technically stuck with using the 1988 codes. However, in a post to the IETF Languages mailing list on 02 Aug 2000, Harald Tveit Alvestrand, the author of RFC 1766, said 'The intent of RFC 1766 and the current draft is that the lists referred to are the published versions + any later changes. I refuse to put in references to unpublished documents, but that's my only religion on the matter; replacement text is welcome.'

2. ISO 639 doesn't identify some obscure languages, so RFC 1766 also allows IANA registered language identifiers to be used. These codes either begin with 'i-' followed by 3 to 8 letters identifying a language, or they begin with an ISO 639 2-letter language code, followed by a hyphen and 3 to 8 letters denoting the region in which the language is used (useful for identifying regional dialects). These codes are case-insensitive but are typically lowercase. Examples: 'i-navajo' (Navajo) or 'zh-yue' (Cantonese). Reference: http://www.isi.edu/in-notes/iana/assignments/languages/tags.

3. RFC 1766 says you can make up your own identifiers, as long as they begin with 'x-' or 'X-'. Example: 'x-piglatin'.

4. RFC 1766 allows 2-letter country codes to be appended to the 2-letter language codes, in the same way the IANA language tags append 3-to-8 letter region codes. When a 2-letter suffix is being used, it *must* be a 2-letter country code from ISO 3166:1988. These codes are case-insensitive but are typically UPPERCASE. Examples: 'en-US' or 'en-GB' or 'fr-CA'. Reference: http://ftp.std.com/obi/Standards/ISO/ISO_3166.

As with ISO 639, ISO 3166 has been updated a number of times and is now ISO 3166-1, but you're only allowed to use the 1988 codes.

5. You can go on tacking as many additional suffixes onto the end as you want, after the 2-letter country code from ISO 3166:1988. If you didn't use a 2-letter country code, you can still append any suffixes you want, as long as the first one isn't 2 letters.
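
The rules above can be sketched as a small Python helper. This is only an illustration: the function name and the labels it returns are invented for the example, and it checks the shape of a tag (RFC 1766 allows a primary tag and any number of subtags, each 1 to 8 letters) without looking anything up in the ISO 639, ISO 3166, or IANA lists.

```python
import re

# RFC 1766 grammar: a primary tag and any number of subtags, each 1 to 8 letters.
_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")

def describe_language_tag(tag):
    """Classify a language tag by its shape (hypothetical helper)."""
    if not _TAG.fullmatch(tag):
        raise ValueError("not an RFC 1766 language tag: %r" % tag)
    primary, *subtags = tag.lower().split("-")
    if primary == "i":
        kind = "iana-registered"
    elif primary == "x":
        kind = "private-use"
    elif len(primary) == 2:
        kind = "iso-639"
    else:
        kind = "unrecognized"
    # A 2-letter first suffix is read as an ISO 3166 country code.
    country = None
    if kind == "iso-639" and subtags and len(subtags[0]) == 2:
        country = subtags[0].upper()
    return {"kind": kind, "primary": primary, "country": country}

for example in ("en", "fr-CA", "zh-yue", "i-navajo", "x-piglatin"):
    print(example, describe_language_tag(example))
```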
5.7.2. When (not) to use xml:lang

1. Use the xml:lang attribute as a descriptive supplement to elements that contain language-specific character data, whether that character data is element content or in the element's other attribute values. xml:lang is intended to apply to everything contained within the element, so it's not necessary to use it on all sub-elements if it has already been declared.

If xml:lang is used in an element, it must be declared in the DTD, like any other attribute. The 'xml:' prefix does not have to be declared in an xmlns:xml attribute, though; the XML Namespaces recommendation says that xml: is by default bound to a particular namespace.

Example:

<?xml version='1.0'?>
<!DOCTYPE dialog [
<!ELEMENT question (#PCDATA)>
<!ELEMENT answer (#PCDATA)>
<!ELEMENT dialog (question,answer)>
<!ATTLIST question
by CDATA #IMPLIED
xml:lang CDATA #IMPLIED>
<!ATTLIST answer
by CDATA #IMPLIED
xml:lang CDATA #IMPLIED>
]>
<dialog>
<question by='Limey Brit' xml:lang='en-GB'>What colour is your tea, mate?</question>
<answer by='American Dork' xml:lang='en-US'>Tea comes in different colors?</answer>
</dialog>

Here is a demonstration of the inheritance principle. The entire dialog is English, and it may not be necessary to differentiate between dialects. Only the dialog element contains the xml:lang attribute, but the attribute implies that the entire contents of the element are in English, so an application will likely say that the language of the question and answer elements is English in each case:

<?xml version='1.0'?>
<dialog xml:lang='en'>
<question by='Limey Brit'>What colour is your tea, mate?</question>
<answer by='American Dork'>Tea comes in different colors?</answer>
</dialog>
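
An application might apply the inheritance principle along these lines. This is a minimal sketch assuming Python's ElementTree, which exposes xml:lang under the XML namespace URI; the function name effective_languages is made up for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the namespace bound to the 'xml' prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def effective_languages(element, inherited=None):
    """Yield (element name, language) for an element and all its descendants."""
    # An element's own xml:lang wins; otherwise it inherits from its parent.
    language = element.get(XML_LANG, inherited)
    yield element.tag, language
    for child in element:
        yield from effective_languages(child, language)

DOC = """<dialog xml:lang='en'>
<question by='Limey Brit'>What colour is your tea, mate?</question>
<answer by='American Dork'>Tea comes in different colors?</answer>
</dialog>"""

languages = dict(effective_languages(ET.fromstring(DOC)))
print(languages)
```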

2. Try to only use xml:lang as a descriptor of language-specific content of data elements. In situations where, say, a user has made a language choice in a UI and you want to record that choice in an XML document, you should make up your own element for this purpose. When the language itself is a significant piece of data rather than just a property of one other granular piece of data, you need to use something other than xml:lang.

Example:

<?xml version='1.0'?>
<Site>
<SiteProperties>
<SiteLanguage>fr</SiteLanguage>
...
</SiteProperties>
<SiteData>
<MerchantName>Violet's Violets</MerchantName>
<Slogan xml:lang='en-US'>We aim to please</Slogan>
<Slogan xml:lang='fr-CA'>Parlez-vous? Oui!</Slogan>
...
</SiteData>
</Site>

In this example, the application could select the correct Slogan for inclusion in the site by comparing the SiteLanguage with the xml:lang attributes. In XSLT/XPath, this is trivial, using the lang() function, which uses the xml:lang attribute of the context node or, failing that, of its nearest ancestor that has one, and which ignores case and suffix disparities (so a test for 'fr' would match 'fr-CA').
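
Outside of XSLT, the comparison has to be written by hand. The sketch below, which assumes Python's ElementTree, imitates the suffix-tolerant matching with a hypothetical lang_matches helper. For brevity it looks only at each Slogan's own xml:lang attribute and does not search ancestors.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def lang_matches(wanted, actual):
    """True if 'actual' equals 'wanted' or is 'wanted' plus a '-' suffix, ignoring case."""
    wanted, actual = wanted.lower(), (actual or "").lower()
    return actual == wanted or actual.startswith(wanted + "-")

DOC = """<Site>
<SiteProperties><SiteLanguage>fr</SiteLanguage></SiteProperties>
<SiteData>
<MerchantName>Violet's Violets</MerchantName>
<Slogan xml:lang='en-US'>We aim to please</Slogan>
<Slogan xml:lang='fr-CA'>Parlez-vous? Oui!</Slogan>
</SiteData>
</Site>"""

site = ET.fromstring(DOC)
site_language = site.findtext("SiteProperties/SiteLanguage")
slogans = [
    slogan.text
    for slogan in site.iter("Slogan")
    if lang_matches(site_language, slogan.get(XML_LANG))
]
print(slogans)
```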

In practice, for most applications, using just the ISO 639:1988 2-letter codes, or those codes plus the ISO 3166:1988 2-letter country code suffixes, is more than sufficient.


Acknowledgments:

This work was based upon...

1. The Unicode Standard, Version 3.0; ISBN 0-201-61633-5, which has various explanatory sections that apply to chapter 2 of this tutorial
2. Unicode Technical Report #17, which goes a bit further than chapter 2 and has excellent diagrams explaining the relationship between abstract characters and glyphs
3. Kenneth Whistler @ Sybase, who proofread a draft of chapter 2 and suggested a few edits for accuracy
4. XML 1.0, the W3C Recommendation annotated by Tim Bray.



This document is part of the skew.org XML & XSLT resources."