Re: Unable to read UDF fs on a DVD

From: Pat LaVarre
Date: Tue Apr 27 2004 - 14:54:29 EST


> compression id 16 (search for "cid:"),
> means that the characters are coded 16 bits per character.
>
> UDF 2.1.1:
> UDF supports standard Unicode 2.0 except the 'byte-order mark'
> chars #FEFF and #FFFE.
> These characters are coded in OSTA Compressed Unicode format,
> which means 8 bits per char or 16 bits per char.
> If a file identifier contains only unicode chars with a value
> less than #0100, compression id 8 can be used.
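The rule quoted above can be sketched as a small encoder. This is a minimal Python sketch, not OSTA's sample code or the kernel's. Assumptions: the name is a Python str of 16-bit (BMP) characters, and the output is only the compression ID byte followed by the character bytes, with no length field and no d-string padding. The name cs0_encode is hypothetical. Choosing cid 8 whenever every char is below #0100 is one choice the quote permits ("can be used"), not a requirement. For cid 16 the sketch writes the most significant byte first, as UDF specifies.

```python
def cs0_encode(name: str) -> bytes:
    """Hypothetical helper: encode name as compression ID + CS0 characters."""
    codes = [ord(ch) for ch in name]
    for c in codes:
        # UDF 2.1.1 excludes the byte-order marks from CS0.
        if c in (0xFEFF, 0xFFFE):
            raise ValueError("byte-order marks are not allowed in CS0")
        # Assumption: only 16-bit (BMP) characters are representable.
        if c > 0xFFFF:
            raise ValueError("CS0 holds only 16-bit characters")
    if all(c < 0x100 for c in codes):
        # Every char fits in one byte: compression ID 8, 8 bits per char.
        return bytes([8] + codes)
    # Otherwise compression ID 16, 16 bits per char, most significant byte first.
    out = bytearray([16])
    for c in codes:
        out += c.to_bytes(2, "big")
    return bytes(out)
```

For example, cs0_encode("A") gives b"\x08A", while cs0_encode("A\u20ac") has to fall back to cid 16 and gives b"\x10\x00\x41\x20\xac".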

Link! Thank you. I clicked thru to:

--- http://www.osta.org/specs/pdf/udf250.pdf
--- (page 17 of 165)

2.1.1 Character Sets

The character set used by UDF for the structures defined in this
document is the ... OSTA CS0 character set ... defined as follows:
...

---

Between your English and their English, I conclude,

I should expect to see 8 or 16 bits per char. Specifically, when I'm
looking at hex bytes, if I see x08 then I should see 8 bits per char
thereafter, but if I see x10 then I should see x10 (that is, 16) bits
per char thereafter.
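That hex-dump reading rule can be sketched as a decoder. Assumptions: the byte string starts with the compression ID and holds nothing else (no length field), only cids 8 and 16 are handled, and cs0_decode is a hypothetical name, not a kernel or OSTA function.

```python
def cs0_decode(data: bytes) -> str:
    """Hypothetical helper: decode compression ID + CS0 characters."""
    if not data:
        return ""
    cid, body = data[0], data[1:]
    if cid == 8:
        # x08: one byte per char; the byte value is the Unicode code point.
        return "".join(chr(b) for b in body)
    if cid == 16:
        # x10: two bytes per char, most significant byte first.
        if len(body) % 2:
            raise ValueError("cid 16 needs an even number of character bytes")
        return "".join(chr(int.from_bytes(body[i:i + 2], "big"))
                       for i in range(0, len(body), 2))
    raise ValueError(f"unsupported compression ID {cid}")
```

So a dump reading 08 41 42 decodes to "AB", and one reading 10 00 41 20 ac decodes to "A\u20ac".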

That sure sounds easier to decode visually from a hexdump than UTF-8 is.
For example, I now think that with "OSTA Compressed Unicode", also known
as the "OSTA CS0 character set", the x20AC € "EURO SIGN" (in UTF-8,
$'\xE2\x82\xAC') will always appear as the plain hex byte pair x20 xAC.

With this much context in place, the 2004-04-23 guess of "a problem
with 16 bit characters vs 8 bit characters" now makes sense. That guess
says cid 8 names may work better than cid 16 names, perhaps especially
when a name needs cid 16 to express a char outside the x00..FF range.

Pat LaVarre


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/