org.jmol.adapter.readers.cifpdb

Class CifReader.RidiculousFileFormatTokenizer

class CifReader.RidiculousFileFormatTokenizer extends Object

A special tokenizer class for dealing with quoted strings in CIF files.

Regarding the treatment of single quotes vs. primes in CIF files, PMR wrote:

* There is a formal grammar for CIF (see http://www.iucr.org/iucr-top/cif/index.html) which confirms this. The textual explanation is:

14. Matching single or double quote characters (' or ") may be used to bound a string representing a non-simple data value provided the string does not extend over more than one line.

15. Because data values are invariably separated from other tokens in the file by white space, such a quote-delimited character string may contain instances of the character used to delimit the string provided they are not followed by white space. For example, the data item _example 'a dog's life' is legal; the data value is a dog's life.

[PMR - the terminating character(s) are quote+whitespace. That would mean that _example 'Jones' life' would be an error.]
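
The quoting rule above can be sketched in a few lines of Java. This is a minimal illustration, not the Jmol implementation: the class name CifQuoteSketch and the method readQuoted are hypothetical, and the sketch assumes that the closing delimiter is the first matching quote character followed by white space or by the end of the line.

```java
// Hypothetical helper illustrating CIF rules 14 and 15; not part of Jmol.
final class CifQuoteSketch {

  /**
   * Reads a quote-delimited value that starts at line.charAt(start).
   * Returns the text between the quotes, or null if no closing quote
   * (a quote followed by white space or end of line) is found.
   */
  static String readQuoted(String line, int start) {
    char quote = line.charAt(start);
    if (quote != '\'' && quote != '"')
      throw new IllegalArgumentException("not a quoted value");
    for (int i = start + 1; i < line.length(); i++) {
      if (line.charAt(i) != quote)
        continue;
      boolean atEnd = (i + 1 == line.length());
      // An embedded quote is allowed unless white space follows it.
      if (atEnd || Character.isWhitespace(line.charAt(i + 1)))
        return line.substring(start + 1, i);
    }
    return null; // unterminated on this line
  }

  public static void main(String[] args) {
    System.out.println(readQuoted("'a dog's life'", 0));
    System.out.println(readQuoted("'Jones' life'", 0));
  }
}
```

For 'Jones' life' the sketch stops at the quote after Jones and leaves life' behind as a stray token, which is the error case PMR describes.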

The CIF format was developed in the late 1980s under the aegis of the International Union of Crystallography (I am a consultant to the COMCIFS committee). It was ratified by the Union and there have been several workshops. mmCIF is an extension of CIF which includes a relational structure. The formal publications are:

Hall, S. R. (1991). "The STAR File: A New Format for Electronic Data Transfer and Archiving", J. Chem. Inform. Comp. Sci., 31, 326-333.

Hall, S. R., Allen, F. H. and Brown, I. D. (1991). "The Crystallographic Information File (CIF): A New Standard Archive File for Crystallography", Acta Cryst., A47, 655-685.

Hall, S. R. and Spadaccini, N. (1994). "The STAR File: Detailed Specifications", J. Chem. Info. Comp. Sci., 34, 505-508.

Field Summary
int cch
int ich
int ichPeeked
String str
String strPeeked
boolean wasUnQuoted
Method Summary
String fullTrim(String str)
Intended for names that might span multiple lines.
boolean getData()
General reader for loop data; fills loopData with fieldCount fields.
String getNextDataToken()
First checks whether the next token is an unquoted control code and, if so, returns null.
String getNextToken()
String getTokenPeeked()
boolean hasMoreTokens()
String nextToken()
Assumes that hasMoreTokens() has been called and that ich is pointing at a non-white-space character.
String peekToken()
Looks at the next token without consuming it.
void setString(String str)
Sets a string to be parsed from the beginning.
String setStringNextLine()
Sets the string for parsing to be the next line when the token buffer is empty; if ';' is at the beginning of that line, extends the string to include the full multiline string.

Field Detail

cch

int cch

ich

int ich

ichPeeked

int ichPeeked

str

String str

strPeeked

String strPeeked

wasUnQuoted

boolean wasUnQuoted

Method Detail

fullTrim

String fullTrim(String str)
Intended for names that might span multiple lines.

Parameters: str

Returns: str without any leading/trailing white space, and no '\n'
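
A minimal sketch of such a trim follows. It is not the Jmol source: the class name FullTrimSketch is hypothetical, and the sketch assumes that "white space" means space, tab, carriage return, and newline, and that only the two ends of the string are stripped. Whether interior newlines are also removed is not stated here, so the sketch leaves them alone.

```java
// Hypothetical illustration of a trim that also strips '\n' at both ends.
final class FullTrimSketch {

  // Assumed white-space set; the real method may use a different one.
  private static final String WHITE = " \t\r\n";

  static String fullTrim(String str) {
    int start = 0;
    int end = str.length();
    // Advance past leading white space.
    while (start < end && WHITE.indexOf(str.charAt(start)) >= 0)
      start++;
    // Back up over trailing white space.
    while (end > start && WHITE.indexOf(str.charAt(end - 1)) >= 0)
      end--;
    return str.substring(start, end);
  }

  public static void main(String[] args) {
    System.out.println("[" + fullTrim("\n  _atom_site_label \n") + "]");
  }
}
```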

getData

boolean getData()
General reader for loop data; fills loopData with fieldCount fields.

Returns: false if EOF

Throws: Exception

getNextDataToken

String getNextDataToken()
First checks whether the next token is an unquoted control code and, if so, returns null.

Returns: next data token or null
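
The control-code check can be sketched as below. The class ControlCodeSketch and both method names are hypothetical, and the set of codes is an assumption: the sketch treats data names (leading underscore) and the STAR/CIF keywords loop_, data_, save_, stop_, and global_ as control codes, compared case-insensitively. The actual reader may test a different set.

```java
// Hypothetical illustration; the list of control codes is an assumption.
final class ControlCodeSketch {

  private static final String[] KEYWORDS =
      { "loop_", "data_", "save_", "stop_", "global_" };

  /** True only for an unquoted token that looks like a data name or keyword. */
  static boolean isControlCode(String token, boolean wasUnQuoted) {
    if (!wasUnQuoted || token.isEmpty())
      return false; // quoted text is always plain data
    if (token.charAt(0) == '_')
      return true; // a data name such as _cell_length_a
    for (String k : KEYWORDS)
      if (token.regionMatches(true, 0, k, 0, k.length()))
        return true; // case-insensitive prefix match
    return false;
  }

  /** Mirrors the documented contract: null instead of a control code. */
  static String dataTokenOrNull(String token, boolean wasUnQuoted) {
    return (token == null || isControlCode(token, wasUnQuoted)) ? null : token;
  }

  public static void main(String[] args) {
    System.out.println(dataTokenOrNull("loop_", true));  // unquoted keyword
    System.out.println(dataTokenOrNull("loop_", false)); // quoted, so data
  }
}
```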

Throws: Exception

getNextToken

String getNextToken()

Returns: the next token of any kind, or null

Throws: Exception

getTokenPeeked

String getTokenPeeked()

Returns: the token last acquired; may be null

hasMoreTokens

boolean hasMoreTokens()

Returns: true if there are more tokens in the line buffer

nextToken

String nextToken()
Assumes that hasMoreTokens() has been called and that ich is pointing at a non-white-space character. Also sets the boolean wasUnQuoted, because the caller needs to know whether to check for a control keyword: a quoted 'loop_' is data, whereas an unquoted loop_ is a keyword.

Returns: null if no more tokens, "\0" if '.' or '?', or next token
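
The calling contract described here (call hasMoreTokens(), then nextToken(), then inspect wasUnQuoted) can be illustrated with a small stand-alone tokenizer. MiniCifTokenizer is a hypothetical class written for this example, not the Jmol code. It reuses the field names cch, ich, str, and wasUnQuoted from this page, and it assumes that '#' starts a comment and that an unterminated quote simply runs to the end of the line.

```java
// Hypothetical single-line tokenizer illustrating the documented contract.
final class MiniCifTokenizer {
  private String str = "";
  private int ich; // index of the next character to read
  private int cch; // length of str
  boolean wasUnQuoted;

  void setString(String s) {
    str = (s == null ? "" : s);
    cch = str.length();
    ich = 0;
  }

  /** Skips white space; false at end of line or at a '#' comment (assumption). */
  boolean hasMoreTokens() {
    while (ich < cch && Character.isWhitespace(str.charAt(ich)))
      ich++;
    return ich < cch && str.charAt(ich) != '#';
  }

  String nextToken() {
    if (ich >= cch)
      return null;
    int start = ich;
    char ch = str.charAt(ich);
    if (ch != '\'' && ch != '"') {
      wasUnQuoted = true;
      while (ich < cch && !Character.isWhitespace(str.charAt(ich)))
        ich++;
      String token = str.substring(start, ich);
      // A bare '.' or '?' is reported as "\0".
      return (token.equals(".") || token.equals("?")) ? "\0" : token;
    }
    wasUnQuoted = false;
    // Closing quote: the same quote character followed by white space or end of line.
    while (++ich < cch) {
      if (str.charAt(ich) == ch
          && (ich + 1 == cch || Character.isWhitespace(str.charAt(ich + 1)))) {
        ich++; // step past the closing quote
        return str.substring(start + 1, ich - 1);
      }
    }
    return str.substring(start + 1); // unterminated: rest of the line
  }

  public static void main(String[] args) {
    MiniCifTokenizer t = new MiniCifTokenizer();
    t.setString("_example 'a dog's life' ? 'loop_'");
    while (t.hasMoreTokens()) {
      String token = t.nextToken();
      System.out.println((t.wasUnQuoted ? "unquoted: " : "quoted:   ") + token);
    }
  }
}
```

In this sketch a quoted '.' or '?' is returned as ordinary text; only the bare forms map to "\0".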

peekToken

String peekToken()
Looks at the next token without consuming it, and saves it for retrieval using getTokenPeeked().

Returns: next token or null if EOF

Throws: Exception

setString

private void setString(String str)
Sets a string to be parsed from the beginning.

Parameters: str

setStringNextLine

String setStringNextLine()
Sets the string for parsing to be the next line when the token buffer is empty; if ';' is at the beginning of that line, extends the string to include the full multiline string. Uses \1 to indicate that this is a special quotation.

Returns: the next line or null if EOF
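
The handling of semicolon-delimited text fields can be sketched as follows. MultilineSketch and nextLineOrTextField are hypothetical names, and the line source is modeled as an Iterator<String> rather than the reader's own line buffer. The sketch assumes that the \1 marker replaces both the opening and the closing ';'; the exact placement in the real code is not documented on this page.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical illustration of joining a ';'-delimited multiline value.
final class MultilineSketch {

  /**
   * Returns the next line, or the whole text field if that line starts
   * with ';'. Returns null when the input is exhausted.
   */
  static String nextLineOrTextField(Iterator<String> lines) {
    if (!lines.hasNext())
      return null;
    String line = lines.next();
    if (line.isEmpty() || line.charAt(0) != ';')
      return line; // ordinary line, returned unchanged
    // '\1' stands in for the opening ';' (assumed placement).
    StringBuilder sb = new StringBuilder("\1").append(line.substring(1));
    while (lines.hasNext()) {
      line = lines.next();
      if (line.startsWith(";")) {
        // Closing delimiter: mark the end, keep the rest of that line.
        sb.append('\1').append(line.substring(1));
        break;
      }
      sb.append('\n').append(line);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    Iterator<String> lines =
        Arrays.asList(";first line", "second line", ";", "_next 1").iterator();
    String field = nextLineOrTextField(lines);
    // Show the marker visibly when printing.
    System.out.println(field.replace('\1', '|'));
    System.out.println(nextLineOrTextField(lines));
  }
}
```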

Throws: Exception