Security
The Unicode and UCS standards require that producers of UTF-8 shall use the shortest form possible; for example, producing a two-byte sequence with first byte 0xc0 is nonconforming.
This is for security reasons: if user input is checked for possible security violations, a program might check only for the ASCII version of "/../" or ";" or NUL and overlook that there are many non-ASCII ways to represent these things in a non-shortest UTF-8 encoding.
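As a rough illustration of the point above, the following Python sketch shows how a two-byte sequence starting with 0xc0 can hide an ASCII "/" from a byte-level check, and how a strict decoder refuses it. The helper names naive_decode_two_byte and is_strict_utf8 are hypothetical, introduced only for this example. The sketch assumes Python's built-in UTF-8 codec, which rejects non-shortest forms.

```python
def naive_decode_two_byte(b0: int, b1: int) -> int:
    """Decode a two-byte UTF-8 sequence WITHOUT checking for shortest form.

    Hypothetical helper: this is what a careless decoder might do.
    """
    return ((b0 & 0x1F) << 6) | (b1 & 0x3F)


def is_strict_utf8(data: bytes) -> bool:
    """Return True only if Python's strict UTF-8 codec accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xc0 0xaf is a non-shortest (overlong) encoding of "/" (0x2f).
overlong_slash = bytes([0xC0, 0xAF])

# A byte-level search for the ASCII slash does not see it ...
slash_found = b"/" in overlong_slash

# ... yet a decoder that skips the shortest-form check yields "/".
decoded = chr(naive_decode_two_byte(overlong_slash[0], overlong_slash[1]))

# A strict decoder rejects the sequence outright.
accepted = is_strict_utf8(overlong_slash)

print(slash_found, repr(decoded), accepted)
```

Rejecting the input at decode time, rather than trying to list every alternative spelling of "/../", is the design choice this sketch is meant to suggest.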
UTF-8 encoded UCS characters may be up to six bytes long; however, the Unicode standard specifies no characters above 0x10ffff, so Unicode characters can be only up to four bytes long in UTF-8.
All characters enclosed between a pair of single quote marks (''), other than a single quote, are quoted.
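The single-quote rule can be tried out with Python's shlex module, which approximates POSIX shell word splitting. This is only a sketch of the rule, not the shell's own parser, and the sample strings are made up for the example.

```python
import shlex

# Inside single quotes, "$", the backslash and double quotes lose any
# special meaning: the whole quoted run becomes one literal word.
words = shlex.split(r"""echo '$HOME \n "x"'""")
print(words)

# A single quote cannot appear inside a single-quoted string: the second
# quote ends the string, and the third opens a new one that never closes.
try:
    shlex.split("'it's'")
    unbalanced = False
except ValueError:
    unbalanced = True
print(unbalanced)
```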
Secondly, since modern terminal emulators in UTF-8 mode also support Chinese, Japanese, and Korean double-width characters as well as nonspacing combining characters, outputting a single character does not necessarily advance the cursor by one position as it did in ASCII.
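A minimal sketch of the consequence, using Python's unicodedata module: the hypothetical function display_width estimates how many columns a string occupies by counting wide and fullwidth characters as two columns and combining characters as zero. It is a rough approximation of what a function such as wcwidth() does and ignores control characters and other zero-width cases.

```python
import unicodedata


def display_width(text: str) -> int:
    """Estimate terminal columns used by text (rough approximation)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Nonspacing combining character: drawn over the previous cell.
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            # Wide or fullwidth CJK character: occupies two cells.
            width += 2
        else:
            width += 1
    return width


for sample in ("abc", "日本語", "e\u0301"):
    print(len(sample), display_width(sample))
```

For the second and third samples, the number of characters and the estimated cursor advance differ, which is why code that pads or aligns output by character count misbehaves on such text.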
Matches any one of the enclosed characters. A pair of characters separated by - matches any character lexically between the pair, inclusive.
Matches any single character.
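Assuming these two descriptions refer to the shell's bracket pattern [...] and the ? pattern (the pattern symbols themselves are missing from the text above, so this is a guess), their behaviour can be sketched with Python's fnmatch module. Note that fnmatch compares characters by code point, which only approximates "lexically between" in a given locale.

```python
from fnmatch import fnmatchcase

# [...] matches any one of the enclosed characters.
in_set = fnmatchcase("b", "[abc]")
not_in_set = fnmatchcase("d", "[abc]")

# A pair separated by "-" matches any character in the range, inclusive.
range_low = fnmatchcase("a", "[a-c]")
range_high = fnmatchcase("c", "[a-c]")
range_outside = fnmatchcase("d", "[a-c]")

# ? matches exactly one character, whatever it is.
one_char = fnmatchcase("x", "?")
two_chars = fnmatchcase("xy", "?")

print(in_set, not_in_set, range_low, range_high, range_outside,
      one_char, two_chars)
```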