Unicode Vim
In brief
- Any Unicode character can be inserted with Ctrl-V u ####, where #### is the hexadecimal representation of the character.
- In "normal" mode, ga shows the character under the cursor as text, decimal, octal and hexadecimal.
English original: Working with Unicode
url: http://vim.sourceforge.net/tips/tip.php?tip_id=246
basic Tip #246: Working with Unicode (the same, rewritten for legibility)
tip karma Rating 119/46, Viewed by 3166
created: May 10, 2002 16:19 complexity: basic
author: Tony Mechelynck as of Vim: 6.0
Where to look for help
:h utf8
:h encoding-values
:h 'enc'
:h 'fenc'
:h 'fencs'
:h 'tenc'
:h 'bomb'
:h 'guifont'
:h ga
:h g8
:h :dig
:h i_CTRL-V_digit
:h has()
What to do (These are examples. Modify them to suit your work environment.)
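if has("multi_byte")
  " Use UTF-8 internally and as the default for new files
  set encoding=utf-8
  setglobal fileencoding=utf-8
  set bomb
  " Encoding of keyboard input and terminal output; adapt to your locale
  set termencoding=iso-8859-15
  " Tried in order when reading; an 8-bit encoding always matches, so it goes last
  set fileencodings=ucs-bom,utf-8,iso-8859-15
else
  echoerr "Sorry, this version of (g)vim was not compiled with +multi_byte"
endif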
What the above does
- has("multi_byte") checks if you have the right options compiled-in. If you haven't got what it takes, it's no use trying to use Unicode.
- 'encoding' sets how vim shall represent characters internally. Utf-8 is necessary for most flavors of Unicode.
- 'fileencoding' sets the encoding for a particular file (local to
buffer); :setglobal sets the default value. An empty value can also
be used: it defaults to the same as 'encoding'. Or you may want to set
one of the ucs encodings; it might make the same disk file bigger or
smaller depending on your particular mix of characters. Also, utf-8
is a plain byte sequence with no byte-order variants, while ucs-2 and
ucs-4 can be big-endian or little-endian, so if you use one of them
you will probably need to set 'bomb' (see below).
- 'bomb' (boolean): if set, vim will put a "byte order mark" at the
start of ucs files, and also of utf-8 files, where the mark is
optional and not welcomed by every program. This option has no effect
on 8-bit encodings (iso-8859, etc.).
- 'termencoding' defines how your keyboard encodes what you type and,
in a terminal, how displayed text is encoded. The value you put there
will depend on your locale: iso-8859-15 is Latin1 with the Euro
currency sign and a few other characters swapped in, but you may want
something else for, say, an Eastern European keyboard.
- 'fileencodings' defines the heuristic to set 'fileencoding' (local
to buffer) when reading an existing file. The first one that matches
will be used; if none matches, 'fileencoding' is left empty, which
means the value of 'encoding' is used. An 8-bit encoding such as
iso-8859-15 accepts any byte sequence, so anything listed after it is
never tried. Ucs-bom is "ucs with byte-order-mark"; it must not come
after utf-8 if you want it to be used.
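To see how 'fileencodings', 'fileencoding' and 'bomb' interact on a real file, here is a minimal sketch of checking what Vim detected and converting the buffer to utf-8. It assumes the file is already open in the current buffer and that its true encoding is iso-8859-15; substitute whatever your file really uses.

```vim
" Show what Vim decided when it read the file
:setlocal fileencoding? bomb?

" If the guess was wrong, reread the file with an explicit encoding
" (iso-8859-15 is only an assumption for this example)
:edit ++enc=iso-8859-15

" Mark the buffer to be written as utf-8 without a byte order mark,
" then write it back to disk
:setlocal fileencoding=utf-8 nobomb
:write
```

Only the bytes on disk change; inside Vim the text stays in whatever 'encoding' says.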
Additional remarks
- In "replace" mode, one utf character (one or more data bytes)
replaces one utf character (which need not use the same number of
bytes)
- In "normal" mode, ga shows the character under the cursor as text,
decimal, octal and hex; g8 shows which byte(s) is/are used to
represent it.
In "insert" or "replace" mode:
- any character defined on your keyboard can be entered the usual
way (even with dead keys if you have them, e.g. French circumflex,
German umlaut, etc.);
- any character which has a "digraph" (there are a great many of them,
see :dig after setting enc=utf-8) can be entered with a Ctrl-K
prefix;
- any utf character at all can be entered with a Ctrl-V prefix,
either <Ctrl-V> u aaaa or <Ctrl-V> U bbbbbbbb, with
0 <= aaaa <= FFFF, or 0 <= bbbbbbbb <= 7FFFFFFF.
...
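The numbers that ga and g8 report, and the hex digits that follow <Ctrl-V> u, can also be explored from the command line. The following is a small sketch using the built-in functions nr2char(), char2nr(), printf() and strlen(); it assumes 'encoding' is already utf-8, and the two code points (20AC for the Euro sign, E9 for e-acute) are just examples.

```vim
" Assumes: set encoding=utf-8

" Build the character that <Ctrl-V> u 20ac would insert
echo nr2char(0x20AC)

" Take e-acute (U+00E9) and look at it the way ga does
let c = nr2char(0xE9)
" Decimal code point
echo char2nr(c)
" Hex digits to type after <Ctrl-V> u
echo printf("%04x", char2nr(c))

" strlen() counts bytes, so this is the number of bytes g8 would list
echo strlen(c)
```

The comments sit on their own lines because :echo would read a trailing double quote as the start of a string.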
- Unicode can be used to create html "body text", at least for
Netscape 6 and probably for IE; but on my machine it doesn't display
properly as "title text" (i.e., between <title></title>
tags in the <head> part).
- Gvim will display it properly if you have the fonts for it, provided
that you set 'guifont' to some fixed-width font which has the glyphs
you want to use (Courier New is OK for French, German, Greek,
Russian and more, but I'm not sure about Hebrew or Arabic; its
glyphs are of a more "fixed" width than those of, e.g. Lucida
Console: the latter can be awkward if you need bold Cyrillic
writing).
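As an illustration of the 'guifont' remark, a vimrc could pick a font per GUI flavor roughly as follows. This is only a sketch: the font names and the size 10 are assumptions, and the fonts must actually be installed on your system.

```vim
if has("gui_running")
  if has("gui_win32")
    " Windows syntax: underscores stand for spaces, :h gives the height in points
    set guifont=Courier_New:h10
  elseif has("gui_gtk2") || has("gui_gtk3")
    " GTK syntax: font name followed by the size; spaces must be escaped
    set guifont=DejaVu\ Sans\ Mono\ 10
  endif
endif
```

Typing :set guifont=* in gvim opens a font chooser on systems that support it, which is an easy way to find a name that works.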
Happy Vimming !
Tony.