HTML Character Entities

This page enumerates the character entities HTML provides. These are tokens between an & (ampersand) and a ; (semicolon), of form &token;, which an HTML renderer displays as the symbol the enclosed token names. Wherever an HTML document uses a character that is not part of the character set covered by the encoding its web server will claim it uses, it should represent that character by a character entity. One can also use numeric character entities, consisting of a number (the Unicode code-point for the character) between &# and a ; (semicolon); however, the verbal character entities (when available) are more intelligible to anyone reading the page source.
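As a quick check of the claim that verbal and numeric forms are interchangeable, here is a minimal sketch using Python's standard html module (my choice of tool for illustration; the page itself doesn't depend on it). It decodes the named, decimal and hexadecimal forms of one character and looks up the code-point behind the name.

```python
import html
from html.entities import name2codepoint

# The named, decimal and hexadecimal references all denote the same character.
named = html.unescape("&eacute;")
decimal = html.unescape("&#233;")
hexadecimal = html.unescape("&#xE9;")
print(named, decimal, hexadecimal)

# name2codepoint maps HTML 4 entity names to their Unicode code-points.
code_point = name2codepoint["eacute"]
print(code_point, hex(code_point))
```

The numeric forms work for any code-point; the named form only exists where someone has blessed a name.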

When a web server responds to a request for a page, it reports a (content, as opposed to transfer) encoding, which specifies how the stream of bytes delivered should be interpreted as characters. If the page's author used an authoring tool which worked in some native encoding, but the server doesn't know about this (so doesn't report it, or reports some default at odds with it) this can confuse the user agent (though, since this not uncommonly happens, many attempt to guess the actual encoding – but the less we rely on programs to guess, the less scope they have for bugginess). While a page can contain meta-data which specifies data equivalent to that in HTTP headers, it is in principle hopeless (though in practice it may help the user agent's guess-work) to specify the encoding (and some others, such as MIME content-type) this way, since the user agent won't correctly read this meta-data unless it's correctly interpreting the byte-stream as the sequence of characters it's supposed to understand as telling it how to read the byte-stream – a classic chicken and egg problem. Consequently, web pages should be written in the character encoding the web server will report for them. HTML character entities provide a way to use characters absent from the encoding advertised by the web server.
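One way to stay within whatever encoding the server advertises is to reduce a page to pure ASCII, replacing everything else with entities. The following is only a sketch of that idea: to_ascii_html is a hypothetical helper of my own naming, it prefers HTML 4 names where Python's html.entities table has one, and it deliberately leaves &, < and > alone (so it is not a general-purpose escaper).

```python
from html.entities import codepoint2name


def to_ascii_html(text):
    """Replace each non-ASCII character with a character entity.

    Hypothetical helper for illustration: uses a named HTML 4 entity
    when one exists, and a decimal numeric entity otherwise.
    """
    pieces = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            pieces.append(ch)                         # plain ASCII passes through
        elif cp in codepoint2name:
            pieces.append(f"&{codepoint2name[cp]};")  # e.g. &eacute;
        else:
            pieces.append(f"&#{cp};")                 # e.g. &#9742;
    return "".join(pieces)


sample = to_ascii_html("caf\u00e9 \u20ac5 \u260e")
print(sample)

# The codec machinery offers a numeric-only equivalent out of the box.
numeric_only = "caf\u00e9".encode("ascii", "xmlcharrefreplace")
print(numeric_only)
```

The result is safe to serve under any ASCII-compatible encoding, which covers most of what servers are likely to claim.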

Some authoring tools will allow the user to switch character sets (and hence, typically, encodings) without stopping to warn about this issue. While this provides a convenient way to author a page using a wide repertoire of characters, the results are more or less guaranteed to display unintelligibly to the page's readers. For contrast, if the page uses a character entity that some browser does not support, it will typically display the &token; verbatim; this may not look beautiful to the reader, but at least it won't look like some arbitrary other (i.e. wrong) character. If your authoring tool leaves raw characters in web documents, you may find the demoronizer useful. It ain't perfect, but I haven't written anything better yet.

The non-experimental parts of this page are derived from the HTML 4 character entity set, which supersedes and subsumes the Web Project's description of the ISO 8859 Latin 1 "ISO 8879:1986//ENTITIES Added Latin 1//EN" character entities. I've provided illustrations, so you can see what's what (and whether your browser copes), and shuffled the order. Jukka Korpela provides something similar in a table.

In due course this page is due an update to take account of stuff I've been told about unicode; here are pages of charts and names. Ian Hickson's data: URI kitchen can also be useful … and, speaking of Ian, HTML 5 has a much expanded repertoire of character entities, that I should document some day. Some entries from it are included below.

Another update, possibly superseding the preceding: in 2010/April, the W3C MathML WG published its entity definitions for characters, which aims to be fairly comprehensive. It is way too big to assimilate here.

In an attempt to make it easier to find particular characters, I've also broken the list into logical groups:

Accented letters: acute, circumflex, grave, umlaut, other vowels and consonants.
Symbols: lone accents, legal symbols, enclosures (e.g. quotation marks), punctuation, spaces, Unicode magic, a miscellany and symbols I didn't recognise until I looked them up. So I can sympathise with the many browsers that still don't recognise them.
Mathematical Symbols: binary operators, arithmetic comparators, set relations, prefix or unary operators, special names, a miscellany and the Greek alphabet.
Arrows: Of so many kinds, with so many variations, that they needed their own section.
… and, finally, experiments.

Aside from the table of Greek letters, each entry is of form:

what: how [ = emacs ] [ comments ]

in which the bold punctuation won't be bold in actual entries, portions enclosed in […] aren't always present (and the [ and ] themselves never are),

what is an illustrative use of the relevant character entity (or entities), which will display correctly as the denoted character(s) if your browser supports them;
how is some pure ASCII which displays as the text you would need to type in your HTML document to have what appear on your web page;
comments will describe, or give a name to, the character; and
emacs is what an emacs user must type, in iso-accents-mode, to produce what; this is controlled by iso-accents-list (or was, in emacs-19.29). This, and its leading =, get omitted for characters not covered by iso-accents-list.

Note that raw, numeric and emacs forms are only provided for actual ISO 8859 Latin-1 characters; and that only browsers which claim to support HTML 5 can be complained at for failure to support the rest.

Accented Letters

Acute accents for A, E, I, O, U, Y, a, e, i, o, u and y:

Á
&Aacute; = 'A

É
&Eacute; = 'E

Í
&Iacute; = 'I

Ó
&Oacute; = 'O

Ú
&Uacute; = 'U

Ý
&Yacute; = 'Y

á
&aacute; = 'a

é
&eacute; = 'e

í
&iacute; = 'i

ó
&oacute; = 'o

ú
&uacute; = 'u

ý
&yacute; = 'y
Circumflex accents for A, E, I, O, U, a, e, i, o and u:

Â
&Acirc; = ^A

Ê
&Ecirc; = ^E

Î
&Icirc; = ^I

Ô
&Ocirc; = ^O

Û
&Ucirc; = ^U

â
&acirc; = ^a

ê
&ecirc; = ^e

î
&icirc; = ^i

ô
&ocirc; = ^o

û
&ucirc; = ^u
Grave accents for A, E, I, O, U, a, e, i, o and u:

À
&Agrave; = `A

È
&Egrave; = `E

Ì
&Igrave; = `I

Ò
&Ograve; = `O

Ù
&Ugrave; = `U

à
&agrave; = `a

è
&egrave; = `e

ì
&igrave; = `i

ò
&ograve; = `o

ù
&ugrave; = `u
Dieresis or umlaut accents for A, E, I, O, U, Y, a, e, i, o, u and y

Ä
&Auml; = "A

Ë
&Euml; = "E

Ï
&Iuml; = "I

Ö
&Ouml; = "O

Ü
&Uuml; = "U

Ÿ
&Yuml;

ä
&auml; = "a

ë
&euml; = "e

ï
&iuml; = "i

ö
&ouml; = "o

ü
&uuml; = "u

ÿ
&yuml; = "y
Aside for C programmers: including an ISO Latin-1 ÿ (represented by character code 255) in the text read by a C program is a very simple way to test whether the author was rash enough to collect answers from getc() in char variables – in which case, ÿ looks like an end-of-file marker (or the program fails to spot end of file).
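The trap can be simulated without a C compiler. This sketch assumes the common (but implementation-defined) case where char is a signed 8-bit type and EOF is -1; it uses Python's struct module to reinterpret the byte the way such a char would.

```python
import struct

EOF = -1  # assumption: the usual value of C's EOF macro

# The single byte that encodes y-with-dieresis in ISO Latin-1.
byte = "\u00ff".encode("latin-1")

# What getc() returns: the byte as an unsigned value widened to int.
as_int = byte[0]

# What is left after storing that in a signed 8-bit char.
as_signed_char = struct.unpack("b", byte)[0]

print(as_int, as_signed_char)
print("looks like EOF when stored in a signed char:", as_signed_char == EOF)
```

Where char is unsigned the opposite failure applies: a stored EOF becomes 255 and never compares equal to -1, so the loop never sees the end of the file.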
Other non-ASCII vowels: A, a, O and o with tilde accents; A and a with ring accents; slashed O and o; and the ligatures for AE and ae:

Ã
&Atilde; = ~A

ã
&atilde; = ~a

Õ
&Otilde; = ~O

õ
&otilde; = ~o

Å
&Aring; = /A

å
&aring; = /a

Ø
&Oslash; = /O

ø
&oslash; = /o

Æ
&AElig; = /E

æ
&aelig; = /e

Œ
&OElig;

œ
&oelig;
Non-ASCII consonants: the German sharp s (sz ligature); N and n with tilde; C and c with cedilla; S and s with caron; the Icelandic characters Thorn (first in upper case, then lower) and Eth (likewise).

ß
&szlig; = "s (see also β, &beta;)

Ñ
&Ntilde; = ~N

ñ
&ntilde; = ~n

Ç
&Ccedil; = ~C

ç
&ccedil; = ~c

Š
&Scaron;

š
&scaron;

Þ
&THORN; = ~T

þ
&thorn; = ~t

Ð
&ETH; = ~D

ð
&eth; = ~d
Lone accents: acute, cedilla, ring, dieresis/umlaut, circumflex and small tilde (pretty useless without a backspace [ or , I think] to place them on letters — and note that ` is the grave accent, though widely abused, e.g. on earlier editions of this page, as left quote and matched by an apostrophe ' used in place of right quote – with the result that many browsers display them as these quote marks rather than the appropriate accent and apostrophe; but see enclosures, below):

´
&acute; = ''

¸
&cedil; = ~~ cedilla

°
&deg; = // ring accent or degree sign

¨
&uml; = ""

ˆ
&circ; circumflex

˜
&tilde; small tilde, cf. ASCII ~ tilde

Symbols

Legally meaningful symbols:

€
&euro; euro sign

¥
&yen; the Yen

£
&pound; pounds sterling

¢
&cent; cent

¤
&curren; generic currency

©
&copy; copyright and

®
&reg; registered, should usually be put in a superscript, e.g. ACME^®

™
&trade; trademark, e.g. ACME™, no need to superscript.
Enclosures of various kinds, giving two symbols in each item; but note that HTML supports the use of <q>…</q> to denote quoting (and which of the following to use can be configured by the use of CSS):

“double”
&ldquo;double&rdquo; double quotes

‘single’
&lsquo;single&rsquo; single quotes, done properly

«m»
&laquo;m&raquo; = ~<m~> guillemet

‹m›
&lsaquo;m&rsaquo; single angle quotation mark

⟨m | n⟩
&lang;m | n&rang; Dirac angle bracket, bra and ket (Jukka says these aren't the proper angle brackets – U+27E8 and U+27E9 should be used instead – but mentions that HTML5 has &[lr]ang; mean these, matching common browser practice.)

{m}
{m} curly braces; but ASCII's {m} work fine.

⌈m⌉
&lceil;m&rceil; ceiling, left is a.k.a. apl upstile

⌊m⌋
&lfloor;m&rfloor; floor, left is a.k.a. apl downstile
Punctuation for ordinary text:

¡
&iexcl; = ~! the inverted exclamation mark

¿
&iquest; = ~? the inverted question mark

&
&amp; ampersand

…
&hellip; horizontal ellipsis, three dot leader

…
&mldr; mid-leader ellipsis

⋮
&vellip; vertical ellipsis

⋰
&utdot; diagonal ellipsis, bottom-left to top-right

⋱
&dtdot; diagonal ellipsis, top-left to bottom-right

3–6
3&ndash;6 en dash

—
&mdash; em dash

'
&apos; the apostrophe

"
&quot; = " double quote (but see below)

‚
&sbquo; single low-9 quotation mark

„
&bdquo; double low-9 quotation mark
Spaces of various kinds; here illustrated between a pair of underscores so you can see them !

_ _
&emsp; em space, width of letter m

_ _
&ensp; en space, width of letter n

_ _
&numsp; number space, the width of a digit

_  _
 

_ _
 

_ _
&nbsp; non-breaking, normal width (sadly, its narrow sibling, _ _ as &#8239;, is not an HTML entity);

_ _
&puncsp; punctuation space, the width of narrow punctuation

_ _
&thinsp; thin space, useful for grouping digits in long numbers, as in 12 345 678

_ _
&hairsp; very thin space a little too narrow for the previous: 12 345 678

_
&ZeroWidthSpace;
Unicode magic tokens without attempted illustration; I don't want to give your browser the magic messages these convey !
- &zwnj; zero width non-joiner
- &zwj; zero width joiner
- &rlm; right-to-left mark
- &lrm; left-to-right mark
Other symbols (see also arrows):

§
&sect; a section symbol

¶
&para; pilcrow, a.k.a. paragraph symbol

•
&bull; bullet

ƒ
&fnof; latin small f with hook, function, florin

‰
&permil; per mille sign, cf. ASCII percent %

′
&prime; prime, minutes, feet

″
&Prime; double prime, seconds (of angle), inches

‾
&oline; overline

†
&dagger; dagger

‡
&Dagger; double dagger

◊
&loz; lozenge, hollow diamond suit

♠
&spades; black spade suit

♣
&clubs; filled club suit, shamrock

♥
&hearts; filled heart suit, valentine

♦
&diams; filled diamond suit

☎
&phone; telephone (old-fashioned)

⌖
&target; cross-hairs on a target; a small circle over-laid on a large +
Since I didn't know what these were for until I looked them up, I guess it'd be worth explaining these.

¦
&brvbar; broken vertical bar

µ
&micro; micro sign – cf. μ, &mu;

¯
&macr; macron.

&shy; soft hyphen.

º
&ordm; ordinal indicator, masculine.

ª
&ordf; ordinal indicator, feminine.

Mathematical Symbols

See also floor and ceiling enclosures, above; þeoretical physicists should also see the Icelandic letter eth, above; and the section on arrows.

Binary operators:

∘
&compfn; function composition operator (I use this also to compose relations; and hack XHTML to let me type it as &on; instead).

÷
&divide; divide or subtract, depending on culture, e.g. p ÷ q; I avoid it, due to the ambiguity

p ⁄ q
p &frasl; q fraction slash; I just use the ASCII solidus, /, as it's easier to type.

±
&plusmn; plus-or-minus

∓
&mnplus; minus-or-plus

p − q
p &minus; q minus sign (cf. - the hyphen)

·
&middot; a centred dot, e.g. p·q

p⋅q
p&sdot;q dot operator

×
&times; multiply; there's also an invisible times here between x and y: x⁢y, but it lives up to its name, so isn't much help to a reader.

p∗q
p&lowast;q asterisk operator, compare the ASCII asterisk: p*q

p⊕q
p&oplus;q circled plus, direct sum

p⊗q
p&otimes;q circled times, vector product

A∧B
A&and;B logical and, wedge

A∨B
A&or;B logical or, vee, vel

A∩B
A&cap;B intersection

A∪B
A&cup;B union
Comparators, encoding arithmetic relations

<
&lt; less than

>
&gt; greater than

≤
&le; less than or equal to

≥
&ge; greater than or equal to

≫
&gg; much greater than

≪
&ll; much less than

≠
&ne; not equal to

≡
&equiv; identical to

≅
&cong; approximately equal to, isomorphic to

≈
&asymp; almost equal to, asymptotic to

∝
&prop; proportional to

∼
&sim; tilde operator, varies with, similar to
Set Relations:

⊂
&sub; is a subset of

⊃
&sup; is a superset of

⊄
&nsub; is not a subset of

⊆
&sube; is a subset of or equal to

⊇
&supe; is a superset of or equal to

a ∈A
a &isin;A is an element of

a ∉A
a &notin;A is not an element of

A ∋a
A &ni;a contains as member
Unary operators:

¬
&not; logicians' not

∀
&forall; for all

∂
&part; partial differential; see also ð for theoretical physicists

∃
&exist; there exists

∇
&nabla; nabla, backward difference, the 3-spatial vector differential operator; contrast with Greek Δ

∏
&prod; n-ary product, product sign; contrast with Greek Π

∑
&sum; n-ary summation, sum sign; contrast with Greek Σ

√
&radic; square root, radical sign

∫
&int; integral

∬
&Int; double integral

∮
&conint; contour integral

∯
&Conint; double contour integral

⨑
&awint; anticlockwise integral

∳
&awconint; anticlockwise contour integral

∴
&there4; therefore
though why the idiots couldn't have used &thus; (a perfectly good synonym for therefore, with the bonus of reading as and thus) or &so; (similarly justified) rather than the hideously clumsy pun there4 (presumably motivated by the desire for brevity, which thus and so attain better) is beyond me. But, apparently, it's ISO 8879:1986's fault, not W3C's.
Special names:

∅
&empty; empty set, null set, diameter

ℵ
&aleph; aleph (the letter of the Hebrew alphabet; ℵ₀ denotes the first transfinite cardinal)

∞
&infin; infinity

ℑ
&image; blackletter capital I, imaginary part

ℜ
&real; blackletter capital R, real part symbol

℘
&weierp; script capital P, power set, Weierstrass p

ℕ
&naturals; double-struck capital N, which denotes the set of natural (counting) numbers.

ℏ
&hbar; Dirac's constant (Planck's h divided by 2.π)
Miscellaneous mathematical symbols:

∠
&ang; angle

⊥
&perp; up tack, is orthogonal to, perpendicular

¹
&sup1; superscript 1

²
&sup2; superscript 2

³
&sup3; superscript 3

¼
&frac14; a quarter

½
&frac12; a half

¾
&frac34; three quarters

A bunch more &frac…;s have been added: 13, 15, 16, 18, 23, 25, 34, 35, 38, 45, 56 and 78.

Greek letters: the character entity for each being its name wrapped in &…; with the name capitalised for the upper case form; thus &psi; and &Psi; produce the small and capital forms of psi, for example.

name	small	big
alpha	α	Α
beta	β	Β
gamma	γ	Γ
delta	δ	Δ
epsilon	ε	Ε
zeta	ζ	Ζ
eta	η	Η
theta	θ	Θ
iota	ι	Ι
kappa	κ	Κ
lambda	λ	Λ
mu	μ	Μ
nu	ν	Ν
xi	ξ	Ξ
omicron	ο	Ο
pi	π	Π
rho	ρ	Ρ
sigma	σ	Σ
tau	τ	Τ
upsilon	υ	Υ
phi	φ	Φ
chi	χ	Χ
psi	ψ	Ψ
omega	ω	Ω
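The naming rule for the table above is regular enough to generate mechanically. This sketch rebuilds the table from the list of names, using Python's html.unescape as a stand-in for a browser (an assumption: it follows the HTML 5 name list, which includes all of these).

```python
import html

GREEK_NAMES = [
    "alpha", "beta", "gamma", "delta", "epsilon", "zeta", "eta", "theta",
    "iota", "kappa", "lambda", "mu", "nu", "xi", "omicron", "pi", "rho",
    "sigma", "tau", "upsilon", "phi", "chi", "psi", "omega",
]

# Small form: the bare name; capital form: the name with its first letter capitalised.
greek = {
    name: (html.unescape(f"&{name};"), html.unescape(f"&{name.capitalize()};"))
    for name in GREEK_NAMES
}

for name, (small, big) in greek.items():
    print(f"{name}\t{small}\t{big}")
```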

The HTML standard also blesses the following lower-case greek letter variants (with no upper-case versions):

ς: &sigmaf; final sigma
ϑ: &thetasym; theta symbol
ϒ: &upsih; upsilon with hook
ϖ: &piv; pi symbol (I've known this to be displayed as a variant on omega, with a tilde over it, that I've also known to be called pomega)

Some browsers (e.g. grail) support variants for some lower-case letters, named by adding a v to the end of the usual letter's name: the cases I know of are &thetav; for theta (cf. thetasym) and &sigmav; for sigma; but other browsers don't support these so a prudent author will abstain from using them. See also µ, &micro;, and ß, &szlig;, which are very like μ and β, respectively.

There may be more mathematical symbols in a table in W3.org's tour of HTML 3. See also W3.org's table of HTML MATH mode symbols, if it still exists.

Special type-faces

There are a few font-styles that have been widely adopted in mathematics to provide distinct forms of letters that have thereby taken on their own meanings; thus ℕ and its friends are from an 𝕠𝕡𝕖𝕟 type-face, whose letters can be obtained by putting opf; after the plain ASCII letter and an & before, e.g. ℂ, ℕ, ℚ, ℝ and ℤ. There's also a rather 𝔊𝔬𝔱𝔥𝔦𝔠 type-face, using suffix fr; in the same way, but I find it mostly unreadable: for example, 𝔄 is its A, which I would flatly fail to recognise if I hadn't just looked it up in the table; compare 𝔘 (which is U). Then there's a 𝓈𝒸𝓇𝒾𝓅𝓉 font-face, using suffix scr; to get 𝒜, ℬ, … 𝒴, 𝒵, 𝒶, … 𝒾, 𝒿, 𝓀, … 𝓏. It's OK, but I doubt I'll use it much.
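The suffix rule can be tried out directly. In this sketch, styled is a hypothetical helper of my own; it simply glues a letter and one of the three suffixes into an entity and lets Python's html.unescape (standing in for a browser) decode it.

```python
import html


def styled(letter, suffix):
    """Decode the entity formed from a plain letter plus a type-face suffix.

    Hypothetical helper for illustration; suffix is one of
    "opf" (open face), "fr" (Gothic) or "scr" (script).
    """
    return html.unescape(f"&{letter}{suffix};")


open_face = [styled(letter, "opf") for letter in "CNQRZ"]
gothic_a = styled("A", "fr")
script_ab = [styled(letter, "scr") for letter in "AB"]

print(" ".join(open_face), gothic_a, " ".join(script_ab))
```

Some of the results are old single-purpose characters (ℕ, ℬ) while others live out in Unicode's mathematical alphanumeric block, which is why fonts cope with them unevenly.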

There's also at least some of the hebrew alphabet: ℵ, ℶ, ℷ, …, but that's as far as it seems to get (early 2018).

Arrows

There are up to eight directions for arrows – each way horizontal or vertical and each of the diagonals between these – and many styles of arrow, albeit not every style has all directions.

There's a few arrows that take a corner or end at a bar, three of which commonly appear as labels of keys on a keyboard (useful, e.g., in instructions for what someone is to type to a computer):

⇤
&larrb;, the shift-tab key symbol

⇥
&rarrb; or &RightArrowBar;, the tab key symbol

↵
&crarr;, &ldsh; down-then-left carriage return symbol

↳
&rdsh; down-then-right

↱
&rsh;, &Rsh; up-then-right

↰
&lsh;, &Lsh; up-then-left
Simple arrows point in all eight directions, plus the two-way arrows for horizontal and vertical. There's a common pattern to the names; I've also listed the more verbose names for some examples (but not all):

←
&larr; (U+2190)

↑
&uarr; (U+2191)

→
&rarr;, &RightArrow;, &rightarrow;, &srarr;, &ShortRightArrow; (U+2192)

↓
&darr; (U+2193)

↔
&harr;

↕
&varr;

↙
&swarr;

↖
&nwarr;

↘
&searr;

↗
&nearr;, &nearrow;, &UpperRightArrow;
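The common pattern in these names is a direction prefix followed by arr. A small sketch that generates all ten simple arrows from their prefixes, again leaning on Python's html.unescape in place of a browser; the descriptions in the dictionary are my own labels, not part of any standard.

```python
import html

# Direction prefix -> description (the descriptions are just labels for this sketch).
DIRECTIONS = {
    "l": "left", "u": "up", "r": "right", "d": "down",
    "h": "both ways, horizontal", "v": "both ways, vertical",
    "nw": "north-west", "ne": "north-east",
    "se": "south-east", "sw": "south-west",
}

simple_arrows = {
    prefix: html.unescape(f"&{prefix}arr;") for prefix in DIRECTIONS
}

for prefix, arrow in simple_arrows.items():
    print(f"&{prefix}arr;\t{arrow}\t{DIRECTIONS[prefix]}")
```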
You can get two non-diagonal arrows, in the same or opposite directions, by using two direction letters:

⇄
&rlarr;, &RightArrowLeftArrow;, &rightleftarrows;

⇅
&udarr;

⇊
&ddarr;

⇇
&llarr;
You can double the stem by capitalising Arr (for all directions). For the one-way horizontal or vertical arrows, you double the head by making the first letter a capital. For left, right and the diagonals, you can hook the start of the arrow; just append hk for left and right, but replace arr with arhk for the diagonals. Illustrating with only a few directions for each:

⇒
&rArr;, &DoubleRightArrow; or &Rightarrow;, &Implies;, implies

⇔
&hArr; if and only if or iff

⇕
&vArr;

⇙
&swArr;

↠
&Rarr;

↟
&Uarr;

↪
&rarrhk;, &hookrightarrow;

⤤
&nearhk;
For the horizontal arrows, an x prefix extends (lengthens) the arrow; an n prefix crosses them with a diagonal line (negating their meaning, when they're used in mathematics); on double-stemmed arrows, an nv prefix crosses the stems perpendicularly, albeit hArr becomes Harr for this. A suffix w makes a both-ways or right arrow wavy; it doesn't work for left arrows, but does for the wavy right arrow. Inserting o between the direction letter and arr gets you an arrow whose head isn't filled in.

⟵
&xlarr; (U+27f5)

⟶
&xrarr; (U+27f6)

⟺
&xhArr;

↚
&nlarr;

⇏
&nrArr;

⤂
&nvlArr;

⤄
&nvHarr;

↭
&harrw;

↝̸
&nrarrw;

⇿
&hoarr;
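A short sketch for poking at the prefixed and suffixed names above, using Python's html.unescape as the decoder. One thing it shows is that not every entity is a single character: the crossed wavy arrow comes back as a wavy arrow followed by a combining slash.

```python
import html

VARIANTS = ["xlarr", "xrarr", "nlarr", "nrArr", "harrw", "nrarrw", "hoarr"]

decoded = {name: html.unescape(f"&{name};") for name in VARIANTS}

for name, text in decoded.items():
    # List the code-points, since some entities expand to more than one.
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in text)
    print(f"&{name};\t{text}\t{code_points}")
```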
Many arrows only have left and right forms.

↺, ↻
&olarr;, &orarr; anti-clockwise and clockwise arrows

↫, ↬
&larrlp;, &rarrlp; arrow starting in a loop

↶, ↷
&cularr;, &curarr; (a.k.a. &curvearrowleft;, &curvearrowright;) curved arrows

⤽, ⤼
&cularrp;, &curarrm; curved arrows around a plus (left) or minus (right) sign

⤶, ⤷
&ldca;, &rdca; downward curving off to either side

⤝, ⤞
&larrfs;, &rarrfs; arrows ending at a diamond

⤟, ⤠
&larrbfs;, &rarrbfs; arrows from a bar to a diamond

↢, ↣
&larrtl;, &rarrtl; arrow with tail

⤙, ⤚
&latail;, &ratail; headless arrow with tail

⤛, ⤜
&lAtail;, &rAtail; headless arrow with double tail

⇚, ⇛
&lAarr;, &rAarr; triple-stemmed arrow

⤌, ⤍
&lbarr;, &rbarr;, a dashed (broken) arrow

⤎, ⤏
&lBarr;, &rBarr; a twice-broken arrow

⥳, ⥴
&larrsim;, &rarrsim; arrow with a tilde below the stem
The previous group clearly should also include (but for a bug in the spec):

⥅
&rarrpl; right arrow with a plus sign below its start

⤹
&larrpl; left-side arc anticlockwise arrow (rather than ⥆ left-arrow with plus sign below its start)

⤸
&cudarrl; right-side arc anticlockwise arrow, mirror image of the previous; which should thus be named

⤵
&cudarrr; arrow pointing rightwards then curving downwards, for which Unicode offers no matching arrow pointing leftwards then curving downwards. There is a right-then-up, ⤴ (Unicode has four of the eight possibilities), but HTML doesn't have a name for it.
Some of the entries that exist left-and-right have companions that only exist on the right:

⤖
&Rarrtl;, double-headed arrow with tail

⤐
&RBarr;, a twice-broken arrow with tail

⥲
&simrarr; arrow with a tilde above the stem

⥵
&rarrap; double tilde below the arrow (approximate arrow)
Then we have the odd-balls and special cases:

⤳
&rarrc;, curvy arrow

⤳̸
&nrarrc;, diagonally crossed curvy arrow

⥱
&erarr; arrow with an equal sign by stem

↦
&maps;, &mapsto;, &RightTeeArrow;

⟼
&xmap;, &longmapsto;

⇝
&zigrarr;

⟿
&dzigrarr;

⤑
&DDotrahd;

⧴
&RuleDelayed;
There are also harpoons, which are arrows with one side of their head missing. These don't do diagonal directions, but the simple ones do go in all horizontal and vertical directions – and we have to specify which half of the head each has:

↼
&lharu;

↽
&lhard;

⇁
&rhard;, &rightharpoondown;, &DownRightVector;

⇀
&rharu;, &rightharpoonup;, &RightVector;

↾
&uharr;, &upharpoonright;, &RightUpVector;

↿
&uharl;

⇂
&dharr;, &downharpoonright;, &RightDownVector;

⇃
&dharl;
You can even have a two-way one – the arrow's ends can be on the same side or (for a horizontal harpoon) opposite:

⥊
&lurdshar;

⥋
&ldrushar;

⥎
&LeftRightVector;

⥐
&DownLeftRightVector;

⥏
&RightUpDownVector;

⥑
&LeftUpDownVector;
Two together can go in the same direction or opposite directions; in the easy form, they have heads on the outside:

⥢
&lHar;

⥤
&rHar;

⥣
&uHar;

⥥
&dHar;

⇌
&rlhar;, &rightleftharpoons;, &Equilibrium;, is in equilibrium with

⇋
&lrhar;, &ReverseEquilibrium;, &leftrightharpoons;

⥮
&udhar;, &UpEquilibrium;

⥯
&duhar;, &ReverseUpEquilibrium;
You can have both heads on the same side, instead; or only have a head on one of them;

⥩
&rdldhar;

⥨
&ruluhar;

⥧
&ldrdhar;

⥦
&luruhar;

⥬
&rharul;

⥪
&lharul;

⥭
&lrhard;

⥫
&llhard;
You can start or end them at a perpendicular line (with results that can look a lot like the digit 1, without being it) – here's a sample:

⥞
&DownLeftTeeVector;

⥕
&RightDownVectorBar;

⥓
&RightVectorBar;

⥠
&LeftUpTeeVector;

There's a whole mess of others, such as right angle with downwards zig-zag, ⍼, but this starts to get into unrecognisable gibberish.

Experiments

I lob experiments in here to see if they work. Some are inspired by TeX, others by wishful thinking and randomness. W3.org doesn't sanction them and I don't necessarily think they're a good idea.

&cdots;: &cdots; centered dots
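To see whether an experimental name has since been blessed, one can consult a copy of the HTML 5 name list. This sketch uses the one bundled with Python (html.entities.html5), which says nothing about what any particular browser does; is_known is a hypothetical helper. Like the browsers described earlier, html.unescape leaves an unrecognised &token; verbatim.

```python
import html
from html.entities import html5


def is_known(name):
    """True if name (given without & or ;) is an HTML 5 named character reference."""
    return f"{name};" in html5


results = {}
for name in ("cdots", "ctdot", "hellip"):
    source = f"&{name};"
    results[name] = (is_known(name), html.unescape(source))
    print(source, *results[name])
```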

And if you think I should have done all that with tables (I admit I've been tempted – much of it is crying out for it; and I've succumbed for the Greek alphabet), please pause to consider that even under Lynx I can use this list-form to find the right code to type into a file I'm writing. If I did it with tables, it would look a total mess under browsers that don't support them, which would greatly diminish its utility. Meanwhile the present form works fine in Arena, Mozilla and Grail – all free (as in liberty) – as well as the proprietary (but gratis) Opera and (gratis once you've paid for the operating system they want you to use) IE.

On semantic character entities

Back in 2002, I suggested to the W3C's CSS folk that maybe it'd be a good idea for style sheet mechanisms to provide for mapping style-sheet-defined &…; tokens to official ones. To illustrate why this would be useful, consider:

Mathematics uses the ∧ wedge character, an inverted v, in at least two ways; as logical and and as the antisymmetric product operator of linear algebra (where it has, again, two meanings: one specific to three-dimensional space, combining vectors to yield a vector; the other applicable to a broad class of tensors; but I'll ignore this ambiguity for the moment).
It would be desirable to allow an HTML document to use &wedge;, say, for the latter but specify, via a style sheet media="print, screen", that this is to be displayed using the same symbol as &and;
Then, for the media="aural" style sheet, the two conceptually distinct symbols can be specified for distinct pronunciations; and a theorem-prover analysing the document is saved an ambiguity problem.

Similarly, one might wish to have &tensor; map to &otimes;, &union; to &cup;, &iff; to &hArr; and so on, enabling mnemonic names even for character entities with only one (orthodox) reading. When co-opting some Unicode character to serve some particular purpose, it would likewise make sense to give it a mnemonic name, indicative of that purpose, thereby abstracting away the choice of particular Unicode character selected to denote it.

It turns out this can be achieved in various ways.

Abuse some poor innocent HTML element with a short name and no end-tag (i.e. empty content model and optional end-tag), that you don't use much, e.g. BR: specify classes for it that over-ride its content suitably, so that <br class="and"> and <br class="wedge"> will do the job, with a style sheet (for suitable media) along the lines of
```
br.and, br.wedge { content: "∧"; }
br.unite { content: "∪"; }
…
```
Downsides: requires you to use the relevant raw unicode character in your style sheet; and, officially, content only applies to the :before and :after pseudo-elements.
Invent your own HTML elements, <wedge> and similar; use a style sheet as above to specify their contents. Same downsides as above, but it's mal-formed HTML instead of abusing real HTML.
Roll your own DTD – write a DTD that pulls in the standard XHTML one and adds a few character entities. Downside: only works for XML, and browser support for reading DTDs on the fly isn't there yet …
Inline DTD fragment in DOCTYPE – a stripped down variant of the previous, that actually works (at least in Opera ≥ 8 and mayhap some other modern enough browsers): begin your XHTML document with something along the lines of:
```
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
	"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd" [ 
 <!ENTITY wedge     "&#8743;">
 <!ENTITY unite     "&#8746;">
 
]>
```
Downsides: only works for XML (including XHTML) and I haven't yet found a way to put the entity definitions in a separate file and import them to each document: so each document has to duplicate the whole mess.
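The inline DTD approach is easy to try outside a browser, since any XML parser that reads the internal subset should expand the entities. A minimal sketch with Python's xml.etree.ElementTree; I've dropped the PUBLIC identifier and the xhtml11.dtd URL from the DOCTYPE so the parser has nothing external to fetch, which is a simplification of the form shown above.

```python
import xml.etree.ElementTree as ET

# A cut-down document: internal DTD subset only, no external DTD reference.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE html [
 <!ENTITY wedge "&#8743;">
 <!ENTITY unite "&#8746;">
]>
<html><body><p>a &wedge; b, A &unite; B</p></body></html>"""

root = ET.fromstring(DOCUMENT)
paragraph = root.find("./body/p").text
print(paragraph)
```

This only demonstrates that the entity definitions do their job in an XML parser; it does nothing about the duplication problem, since each document still carries its own copy of the declarations.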

I still think, fundamentally, that the semantic web would be better served by scrapping the whole ghastly mess of character entities in the DTDs and replacing them with a style-sheet-based approach (or an approach similar to that taken by style sheets). The W3C could perfectly readily provide a standard set of style sheets specifying the present entities (and browsers could still have these built in) for backwards compatibility, but authors would be enabled to provide domain-specific semantic names for the characters they're using and @import the default specs. Doing it via style-sheets is more compatible with existing infrastructure than doing it via DTDs and, in any case, what we're doing here is specifying presentation for character entities, so it belongs in style sheets.

Valid CSS ? Not Valid HTML (due to experimental entities). Written by Eddy.