Package com.fasterxml.aalto.in
Class StreamScanner
java.lang.Object
com.fasterxml.aalto.in.XmlScanner
com.fasterxml.aalto.in.ByteBasedScanner
com.fasterxml.aalto.in.StreamScanner
- All Implemented Interfaces:
XmlConsts, NamespaceContext, XMLStreamConstants
- Direct Known Subclasses:
Utf8Scanner
Base class for various byte stream based scanners (generally one
for each type of encoding supported).
-
Field Summary
Fields
Modifier and Type | Field | Description
protected final XmlCharTypes | _charTypes | This is a simple container object that is used to access the decoding tables for characters.
protected InputStream | _in | Underlying InputStream to use for reading content.
protected byte[] | _inputBuffer |
protected int[] | _quadBuffer | This buffer is used for name parsing.
protected final ByteBasedPNameTable | _symbols | For now, symbol table contains prefixed names.
Fields inherited from class com.fasterxml.aalto.in.ByteBasedScanner
_inputEnd, _inputPtr, _tmpChar, BYTE_a, BYTE_A, BYTE_AMP, BYTE_APOS, BYTE_C, BYTE_CR, BYTE_D, BYTE_EQ, BYTE_EXCL, BYTE_g, BYTE_GT, BYTE_HASH, BYTE_HYPHEN, BYTE_l, BYTE_LBRACKET, BYTE_LF, BYTE_LT, BYTE_m, BYTE_NULL, BYTE_o, BYTE_p, BYTE_P, BYTE_q, BYTE_QMARK, BYTE_QUOT, BYTE_RBRACKET, BYTE_s, BYTE_S, BYTE_SEMICOLON, BYTE_SLASH, BYTE_SPACE, BYTE_t, BYTE_T, BYTE_TAB, BYTE_u, BYTE_x
Fields inherited from class com.fasterxml.aalto.in.XmlScanner
_attrCollector, _attrCount, _cfgCoalescing, _cfgLazyParsing, _config, _currElem, _currNsCount, _currRow, _currToken, _defaultNs, _depth, _entityPending, _isEmptyTag, _lastNsContext, _lastNsDecl, _nameBuffer, _nsBindingCache, _nsBindingCount, _nsBindings, _nsBindMisses, _pastBytesOrChars, _publicId, _rowStartOffset, _startColumn, _startRawOffset, _startRow, _systemId, _textBuilder, _tokenIncomplete, _tokenName, _xml11, CDATA_STR, INT_0, INT_9, INT_a, INT_A, INT_AMP, INT_APOS, INT_COLON, INT_CR, INT_EQ, INT_EXCL, INT_f, INT_F, INT_GT, INT_HYPHEN, INT_LBRACKET, INT_LF, INT_LT, INT_NULL, INT_QMARK, INT_QUOTE, INT_RBRACKET, INT_SLASH, INT_SPACE, INT_TAB, INT_z, MAX_UNICODE_CHAR, TOKEN_EOI
Fields inherited from interface com.fasterxml.aalto.util.XmlConsts
CHAR_CR, CHAR_LF, CHAR_NULL, CHAR_SPACE, STAX_DEFAULT_OUTPUT_ENCODING, STAX_DEFAULT_OUTPUT_VERSION, XML_DECL_KW_ENCODING, XML_DECL_KW_STANDALONE, XML_DECL_KW_VERSION, XML_SA_NO, XML_SA_YES, XML_V_10, XML_V_10_STR, XML_V_11, XML_V_11_STR, XML_V_UNKNOWN
Fields inherited from interface javax.xml.stream.XMLStreamConstants
ATTRIBUTE, CDATA, CHARACTERS, COMMENT, DTD, END_DOCUMENT, END_ELEMENT, ENTITY_DECLARATION, ENTITY_REFERENCE, NAMESPACE, NOTATION_DECLARATION, PROCESSING_INSTRUCTION, SPACE, START_DOCUMENT, START_ELEMENT
-
Constructor Summary
Constructors
Constructor | Description
StreamScanner(ReaderConfig cfg, InputStream in, byte[] buffer, int ptr, int last) |
-
Method Summary
Modifier and Type | Method | Description
protected void | _closeSource |
protected int | _nextEntity() | Helper method used to isolate things that need to be (re)set in cases where
protected void | _releaseBuffers() |
protected final PName | addPName(int hash, int[] quads, int qlen, int lastQuadBytes) |
protected final int | checkInTreeIndentation(int c) | Note: consecutive white space is only considered indentation if the following token seems like a tag (start/end).
protected final int | checkPrologIndentation(int c) |
private final PName | findPName(int onlyQuad, int lastByteCount) | Method called to process a sequence of bytes that is likely to be a PName.
private final PName | findPName(int lastQuad, int[] quads, int qlen, int lastByteCount) | Method called to process a sequence of bytes that is likely to be a PName.
private final PName | findPName(int firstQuad, int secondQuad, int lastByteCount) | Method called to process a sequence of bytes that is likely to be a PName.
private final PName | findPName(int lastQuad, int lastByteCount, int firstQuad, int qlen, int[] quads) | Method called to process a sequence of bytes that is likely to be a PName.
protected final int | handleCharEntity |
private final int | handleCommentOrCdataStart |
private final int | handleDtdStart |
protected final int | handleEndElement | Note that this method is currently also shareable for all ASCII-based encodings, and at least between UTF-8 and ISO-Latin1.
private final int | handleEndElementSlow(int size) |
protected abstract int | handleEntityInText(boolean inAttr) |
private final int | handlePIStart | Method called after leading '<?' has been parsed; needs to parse target.
private final int | handlePrologDeclStart(boolean isProlog) |
protected abstract int | handleStartElement(byte b) | Parsing of start element requires parsing of the element name (and attribute names), and is thus encoding-specific.
protected final boolean | loadAndRetain(int nrOfChars) |
protected final boolean | loadMore() |
protected final byte | loadOne() |
protected final byte | loadOne(int type) |
private final void | matchAsciiKeyword(String keyw) |
protected final byte | nextByte() |
protected final byte | nextByte(int tt) |
final int | nextFromProlog(boolean isProlog) |
final int | nextFromTree |
protected final PName | parsePName(byte b) | This method can (for now?) be shared between all ASCII-based encodings, since it only does coarse validity checking; the real checks are done in a different method.
protected final PName | parsePNameLong(int q, int[] quads) |
protected PName | parsePNameMedium(int i2, int q1) |
protected final PName | parsePNameSlow(byte b) |
protected abstract String | parsePublicId(byte quoteChar) |
protected abstract String | parseSystemId(byte quoteChar) |
protected byte | skipInternalWs(boolean reqd, String msg) |
Methods inherited from class com.fasterxml.aalto.in.ByteBasedScanner
addUTFPName, decodeCharForError, getCurrentColumnNr, getCurrentLocation, getEndingByteOffset, getEndingCharOffset, getStartingByteOffset, getStartingCharOffset, markLF, markLF, reportInvalidInitial, reportInvalidOther, setStartLocation
Methods inherited from class com.fasterxml.aalto.in.XmlScanner
bindName, bindNs, checkImmutableBinding, close, decodeAttrBinaryValue, decodeAttrValue, decodeAttrValues, decodeElements, findAttrIndex, findOrCreateBinding, finishCData, finishCharacters, finishComment, finishDTD, finishPI, finishSpace, finishToken, fireSaxCharacterEvents, fireSaxCommentEvent, fireSaxEndElement, fireSaxPIEvent, fireSaxSpaceEvents, fireSaxStartElement, getAttrCollector, getAttrCount, getAttrLocalName, getAttrNsURI, getAttrPrefix, getAttrPrefixedName, getAttrQName, getAttrType, getAttrValue, getAttrValue, getConfig, getCurrentLineNr, getDepth, getDTDPublicId, getDTDSystemId, getEndLocation, getInputPublicId, getInputSystemId, getName, getNamespacePrefix, getNamespaceURI, getNamespaceURI, getNamespaceURI, getNonTransientNamespaceContext, getNsCount, getPrefix, getPrefixes, getQName, getStartLocation, getText, getText, getTextCharacters, getTextCharacters, getTextLength, handleInvalidXmlChar, hasEmptyStack, isAttrSpecified, isEmptyTag, isTextWhitespace, loadMoreGuaranteed, loadMoreGuaranteed, reportDoubleHyphenInComments, reportDuplicateNsDecl, reportEntityOverflow, reportEofInName, reportIllegalCDataEnd, reportIllegalNsDecl, reportIllegalNsDecl, reportInputProblem, reportInvalidNameChar, reportInvalidNsIndex, reportInvalidXmlChar, reportMissingPISpace, reportMultipleColonsInName, reportPrologProblem, reportPrologUnexpChar, reportPrologUnexpElement, reportTreeUnexpChar, reportUnboundPrefix, reportUnexpandedEntityInAttr, reportUnexpectedEndTag, resetForDecoding, skipCData, skipCharacters, skipCoalescedText, skipComment, skipPI, skipSpace, skipToken, throwInvalidSpace, throwNullChar, throwUnexpectedChar, verifyXmlChar
-
Field Details
-
_in
Underlying InputStream to use for reading content. -
_inputBuffer
protected byte[] _inputBuffer -
_charTypes
This is a simple container object that is used to access the decoding tables for characters. Indirection is needed since we actually support multiple utf-8 compatible encodings, not just utf-8 itself. -
_symbols
For now, symbol table contains prefixed names. In future it is possible that they may be split into prefixes and local names? -
_quadBuffer
protected int[] _quadBuffer
This buffer is used for name parsing. Will be expanded if/as needed; 32 ints can hold names 128 ASCII chars long.
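The "32 ints, 128 ASCII chars" figure follows from storing four name bytes in each int. The following is a minimal sketch of that arithmetic, not Aalto's code: the class `QuadPackingSketch`, its method names, and the byte order within a quad (earlier bytes in the higher-order bits) are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical illustration of quad packing; not part of Aalto. */
final class QuadPackingSketch {

    /** Packs name bytes into ints, four bytes per int; the last int may hold fewer than four. */
    static int[] pack(byte[] name) {
        int[] quads = new int[(name.length + 3) / 4];
        for (int i = 0; i < name.length; i++) {
            // Assumption: earlier bytes end up in the higher-order bits of a quad.
            quads[i >> 2] = (quads[i >> 2] << 8) | (name[i] & 0xFF);
        }
        return quads;
    }

    /** Returns a buffer with room for neededQuads ints, growing it only if required. */
    static int[] ensureCapacity(int[] buffer, int neededQuads) {
        if (neededQuads <= buffer.length) {
            return buffer;
        }
        return Arrays.copyOf(buffer, Math.max(neededQuads, buffer.length * 2));
    }

    public static void main(String[] args) {
        int[] quads = pack("xml:lang".getBytes(StandardCharsets.US_ASCII));
        System.out.println(quads.length + " quads, first = 0x" + Integer.toHexString(quads[0]));
    }
}
```

With this layout a 128-byte name fills exactly 32 ints, and a 129th byte forces the buffer to grow.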
-
-
Constructor Details
-
StreamScanner
-
-
Method Details
-
_releaseBuffers
protected void _releaseBuffers()
- Overrides:
_releaseBuffers
in class XmlScanner
-
_closeSource
- Specified by:
_closeSource
in class ByteBasedScanner
- Throws:
IOException
-
handleEntityInText
- Throws:
XMLStreamException
-
parsePublicId
- Throws:
XMLStreamException
-
parseSystemId
- Throws:
XMLStreamException
-
nextFromProlog
- Specified by:
nextFromProlog
in class XmlScanner
- Throws:
XMLStreamException
-
nextFromTree
- Specified by:
nextFromTree
in class XmlScanner
- Throws:
XMLStreamException
-
_nextEntity
protected int _nextEntity()
Helper method used to isolate things that need to be (re)set in cases where
-
handlePrologDeclStart
- Throws:
XMLStreamException
-
handleDtdStart
- Throws:
XMLStreamException
-
handleCommentOrCdataStart
- Throws:
XMLStreamException
-
handlePIStart
Method called after leading '<?' has been parsed; needs to parse target.
- Throws:
XMLStreamException
-
handleCharEntity
- Returns:
- Code point for the entity that expands to a valid XML content character.
- Throws:
XMLStreamException
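To make the return contract concrete, here is a minimal sketch of decoding a numeric character reference and checking the result against the XML 1.0 Char production. It is not the scanner's implementation: the class `CharEntitySketch` is hypothetical, it works on a String rather than on the byte buffer, and it throws `IllegalArgumentException` where the real method throws `XMLStreamException`.

```java
/** Hypothetical illustration of character reference decoding; not part of Aalto. */
final class CharEntitySketch {

    /** XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Decodes the text between '&' and ';', for example "#x41" or "#65". */
    static int decode(String body) {
        if (body.length() < 2 || body.charAt(0) != '#') {
            throw new IllegalArgumentException("Not a character reference: " + body);
        }
        boolean hex = body.charAt(1) == 'x';
        int radix = hex ? 16 : 10;
        int start = hex ? 2 : 1;
        if (start >= body.length()) {
            throw new IllegalArgumentException("No digits in: " + body);
        }
        int value = 0;
        for (int i = start; i < body.length(); i++) {
            char c = body.charAt(i);
            // Only ASCII digits are legal here, so reject anything outside 7-bit range first.
            int digit = (c < 128) ? Character.digit(c, radix) : -1;
            if (digit < 0) {
                throw new IllegalArgumentException("Bad digit in: " + body);
            }
            value = value * radix + digit;
            if (value > 0x10FFFF) {
                throw new IllegalArgumentException("Beyond Unicode range: " + body);
            }
        }
        if (!isXml10Char(value)) {
            throw new IllegalArgumentException("Not a valid XML character: " + body);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(decode("#x41"));
    }
}
```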
-
handleStartElement
Parsing of start element requires parsing of the element name (and attribute names), and is thus encoding-specific.
- Throws:
XMLStreamException
-
handleEndElement
Note that this method is currently also shareable for all ASCII-based encodings, and at least between UTF-8 and ISO-Latin1. Because the exact bytes that need to be matched are already known, there is no danger of encountering invalid encodings or the like. So, for now, this method stays in the base class.
- Throws:
XMLStreamException
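The reasoning above can be illustrated with a plain byte comparison: the end tag must repeat the bytes of the open element's name, so no decoding is involved. This is a sketch under assumptions, not the actual method; `EndTagMatchSketch` and `matchName` are hypothetical names, and a real scanner would load more input instead of giving up when the buffer runs short.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of end-tag name matching; not part of Aalto. */
final class EndTagMatchSketch {

    /**
     * Compares input[ptr..] against the already-known name bytes.
     * Returns the index just past the name on a match, or -1 otherwise.
     */
    static int matchName(byte[] input, int ptr, int end, byte[] expected) {
        if (end - ptr < expected.length) {
            return -1; // not enough buffered input in this simplified version
        }
        for (int i = 0; i < expected.length; i++) {
            if (input[ptr + i] != expected[i]) {
                return -1;
            }
        }
        return ptr + expected.length;
    }

    public static void main(String[] args) {
        byte[] doc = "</root>".getBytes(StandardCharsets.UTF_8);
        byte[] name = "root".getBytes(StandardCharsets.UTF_8);
        System.out.println(matchName(doc, 2, doc.length, name));
    }
}
```

A caller would still need to check that the byte after the match is `>` or white space, so that a longer name sharing the same prefix is not accepted.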
-
handleEndElementSlow
- Throws:
XMLStreamException
-
parsePName
This method can (for now?) be shared between all ASCII-based encodings, since it only does coarse validity checking; the real checks are done in a different method.
Some notes about the assumptions the implementation makes:
- Well-formed XML content cannot end with a name: as such, end-of-input is an error and we can throw an exception
- Throws:
XMLStreamException
-
parsePNameMedium
- Throws:
XMLStreamException
-
parsePNameLong
- Throws:
XMLStreamException
-
parsePNameSlow
- Throws:
XMLStreamException
-
findPName
Method called to process a sequence of bytes that is likely to be a PName. At this point we encountered an end marker, and may either hit a formerly seen well-formed PName; an as-of-yet unseen well-formed PName; or a non-well-formed sequence (containing one or more non-name chars without any valid end markers).
- Parameters:
onlyQuad - Word with 1 to 4 bytes that make up PName
lastByteCount - Number of actual bytes contained in onlyQuad; 0 to 3.
- Throws:
XMLStreamException
-
findPName
private final PName findPName(int firstQuad, int secondQuad, int lastByteCount) throws XMLStreamException
Method called to process a sequence of bytes that is likely to be a PName. At this point we encountered an end marker, and may either hit a formerly seen well-formed PName; an as-of-yet unseen well-formed PName; or a non-well-formed sequence (containing one or more non-name chars without any valid end markers).
- Parameters:
firstQuad - First 1 to 4 bytes of the PName
secondQuad - Word with last 1 to 4 bytes of the PName
lastByteCount - Number of bytes contained in secondQuad; 0 to 3.
- Throws:
XMLStreamException
-
findPName
private final PName findPName(int lastQuad, int[] quads, int qlen, int lastByteCount) throws XMLStreamException
Method called to process a sequence of bytes that is likely to be a PName. At this point we encountered an end marker, and may either hit a formerly seen well-formed PName; an as-of-yet unseen well-formed PName; or a non-well-formed sequence (containing one or more non-name chars without any valid end markers).
- Parameters:
lastQuad - Word with last 0 to 3 bytes of the PName; not included in the quad array
quads - Array that contains all the quads, except for the last one, for names with more than 8 bytes (i.e. more than 2 quads)
qlen - Number of quads in the array, except if less than 2 (in which case only firstQuad and lastQuad are used)
lastByteCount - Number of bytes contained in lastQuad; 0 to 3.
- Throws:
XMLStreamException
-
findPName
private final PName findPName(int lastQuad, int lastByteCount, int firstQuad, int qlen, int[] quads) throws XMLStreamException
Method called to process a sequence of bytes that is likely to be a PName. At this point we encountered an end marker, and may either hit a formerly seen well-formed PName; an as-of-yet unseen well-formed PName; or a non-well-formed sequence (containing one or more non-name chars without any valid end markers).
- Parameters:
lastQuad - Word with last 0 to 3 bytes of the PName; not included in the quad array
lastByteCount - Number of bytes contained in lastQuad; 0 to 3.
firstQuad - First 1 to 4 bytes of the PName (4 if length at least 4 bytes; less only if not).
qlen - Number of quads in the array, except if less than 2 (in which case only firstQuad and lastQuad are used)
quads - Array that contains all the quads, except for the last one, for names with more than 8 bytes (i.e. more than 2 quads)
- Throws:
XMLStreamException
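The three outcomes described for the findPName overloads (a previously seen name, a new well-formed name, or a malformed byte sequence) can be sketched with an ordinary map. Everything here is an assumption made for illustration: the class `PNameLookupSketch`, the use of a `HashMap` in place of `ByteBasedPNameTable`, the ASCII-only name check, and the convention that `qlen` counts every quad and `lastByteCount` (1 to 4) is the number of bytes in the final quad.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of name lookup by quads; not part of Aalto. */
final class PNameLookupSketch {
    private final Map<List<Integer>, String> symbols = new HashMap<>();

    /** Returns the canonical name for the quads, adding it to the table on first sight. */
    String find(int[] quads, int qlen, int lastByteCount) {
        List<Integer> key = new ArrayList<>(qlen + 1);
        for (int i = 0; i < qlen; i++) {
            key.add(quads[i]);
        }
        key.add(lastByteCount); // keeps the key unambiguous about the name's byte length
        String seen = symbols.get(key);
        if (seen != null) {
            return seen; // outcome 1: formerly seen well-formed name
        }
        String name = decode(quads, qlen, lastByteCount); // outcome 3 throws from here
        symbols.put(key, name); // outcome 2: unseen well-formed name, remember it
        return name;
    }

    int size() {
        return symbols.size();
    }

    private static String decode(int[] quads, int qlen, int lastByteCount) {
        StringBuilder sb = new StringBuilder(qlen * 4);
        for (int i = 0; i < qlen; i++) {
            int bytes = (i == qlen - 1) ? lastByteCount : 4;
            // Assumption: earlier bytes sit in the higher-order bits of each quad.
            for (int shift = (bytes - 1) * 8; shift >= 0; shift -= 8) {
                char c = (char) ((quads[i] >>> shift) & 0xFF);
                if (!isAsciiNameChar(c, sb.length() == 0)) {
                    throw new IllegalArgumentException("Invalid name character 0x" + Integer.toHexString(c));
                }
                sb.append(c);
            }
        }
        return sb.toString();
    }

    private static boolean isAsciiNameChar(char c, boolean first) {
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || c == ':') {
            return true;
        }
        return !first && ((c >= '0' && c <= '9') || c == '-' || c == '.');
    }

    public static void main(String[] args) {
        PNameLookupSketch table = new PNameLookupSketch();
        System.out.println(table.find(new int[] { 0x61626364, 0x65 }, 2, 1));
    }
}
```

Looking the name up by its raw quads before decoding means a repeated element or attribute name costs one table probe and no character decoding.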
-
addPName
protected final PName addPName(int hash, int[] quads, int qlen, int lastQuadBytes) throws XMLStreamException
- Throws:
XMLStreamException
-
skipInternalWs
- Returns:
- First byte following skipped white space
- Throws:
XMLStreamException
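The return contract (the first byte after the skipped white space) can be sketched as follows. The behaviour attached to `reqd` and `msg` is an assumption inferred from the parameter names only: here `reqd` means at least one white space byte must be present, and `msg` is appended to the error text. The class `SkipWsSketch` is hypothetical, and it reads from a fixed array instead of refilling from a stream.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of skipping white space inside a declaration; not part of Aalto. */
final class SkipWsSketch {
    private final byte[] input;
    private int ptr;

    SkipWsSketch(byte[] input) {
        this.input = input;
    }

    /** Skips XML white space and returns the first byte that follows it. */
    byte skipInternalWs(boolean reqd, String msg) {
        int start = ptr;
        while (ptr < input.length && isXmlSpace(input[ptr])) {
            ptr++;
        }
        if (reqd && ptr == start) {
            // Assumption: reqd demands at least one white space byte.
            throw new IllegalStateException("Expected white space " + msg);
        }
        if (ptr >= input.length) {
            throw new IllegalStateException("Unexpected end of input " + msg);
        }
        return input[ptr++];
    }

    static boolean isXmlSpace(byte b) {
        return b == ' ' || b == '\t' || b == '\n' || b == '\r';
    }

    public static void main(String[] args) {
        SkipWsSketch s = new SkipWsSketch("  \t=x".getBytes(StandardCharsets.US_ASCII));
        System.out.println((char) s.skipInternalWs(true, "before '='"));
    }
}
```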
-
matchAsciiKeyword
- Throws:
XMLStreamException
-
checkInTreeIndentation
Note: consecutive white space is only considered indentation if the following token seems like a tag (start/end). This is so that if a CDATA section follows, it can be coalesced in coalescing mode. Although we could check whether coalescing mode is enabled, this should seldom have a significant effect either way, so skipping the check removes one possible source of problems in coalescing mode.
- Returns:
- -1, if indentation was handled; offset in the output buffer, if not
- Throws:
XMLStreamException
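A rough sketch of that decision is shown below. It is not the scanner's code: `IndentationSketch` is a hypothetical class, it takes the whole input array instead of a single leading character, and "seems like a tag" is approximated as a `<` that is not followed by `!`, which is an assumption.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of the indentation check; not part of Aalto. */
final class IndentationSketch {

    /**
     * Looks at the white space run starting at ptr.
     * Returns -1 if it is treated as indentation; otherwise copies the run into out
     * and returns the offset in out just past the copied characters.
     * The caller must supply an out array large enough for the run.
     */
    static int check(byte[] in, int ptr, char[] out) {
        int p = ptr;
        while (p < in.length && (in[p] == ' ' || in[p] == '\t' || in[p] == '\n' || in[p] == '\r')) {
            p++;
        }
        // Assumption: '<' not followed by '!' is taken to be a start or end tag,
        // so comments and CDATA sections do not count.
        boolean tagFollows = p + 1 < in.length && in[p] == '<' && in[p + 1] != '!';
        if (tagFollows) {
            return -1;
        }
        int n = 0;
        for (int i = ptr; i < p; i++) {
            out[n++] = (char) in[i];
        }
        return n;
    }

    public static void main(String[] args) {
        byte[] doc = "\n  <a>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(check(doc, 0, new char[16]));
    }
}
```

When a CDATA section follows, the white space is kept as ordinary text in the output buffer, which leaves it available for coalescing with the section's content.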
-
checkPrologIndentation
- Returns:
- -1, if indentation was handled; offset in the output buffer, if not
- Throws:
XMLStreamException
-
loadMore
- Specified by:
loadMore
in class XmlScanner
- Throws:
XMLStreamException
-
nextByte
- Throws:
XMLStreamException
-
nextByte
- Throws:
XMLStreamException
-
loadOne
- Throws:
XMLStreamException
-
loadOne
- Throws:
XMLStreamException
-
loadAndRetain
- Throws:
XMLStreamException
-