Class ByteBasedPNameTable

java.lang.Object
com.fasterxml.aalto.util.NameTable
com.fasterxml.aalto.in.ByteBasedPNameTable

public final class ByteBasedPNameTable extends NameTable
This is a symbol table implementation used for storing byte-based PNames, specifically, instances of (ByteBasedPName).
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    (package private) static final class 
     
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    (package private) static final int
     
    (package private) static final int
    Bucket index is 8 bits, and value 0 is reserved to represent 'empty' status.
    private int
    Total number of PNames in collision buckets (included in mCount along with primary entries)
    private int
    Index of the first unused collision bucket entry (== size of the used portion of collision list): less than or equal to 0xFF (255), since max number of entries is 255 (8-bit, minus 0 used as 'empty' marker)
    Array of heads of collision bucket chains; size dynamically
    private boolean
    Flag that indicates whether underlying data structures for the collision list are shared or not.
    private int
    Total number of PNames in the symbol table
    (package private) static final int
     
    private int[]
    Array of 2^N size, which contains combination of 24-bits of hash (0 to indicate 'empty' slot), and 8-bit collision bucket index (0 to indicate empty collision bucket chain; otherwise subtract one from index)
    private int
    Mask used to truncate 32-bit hash value to current hash array size; essentially, hash array size - 1 (since hash array sizes are 2^N).
    private boolean
    Flag that indicates whether underlying data structures for the main hash area are shared or not.
    private ByteBasedPName[]
    Array that contains PName instances matching entries in mMainHash.
    private boolean
     
    private boolean
    This flag is set if, after adding a new entry, it is deemed that a rehash is warranted if any more entries are to be added.
  • Constructor Summary

    Constructors
    Constructor
    Description
    ByteBasedPNameTable(int hashSize)
     
    Constructor used when creating a child instance
  • Method Summary

    Modifier and Type
    Method
    Description
    addSymbol(int hash, String symbolStr, int colonIx, int[] quads, int qlen)
     
    addSymbol(int hash, String symbolStr, int colonIx, int firstQuad, int secondQuad)
     
    static final int
    calcHash(int firstQuad)
     
    static final int
    calcHash(int[] quads, int qlen)
     
    static final int
    calcHash(int firstQuad, int secondQuad)
     
    static int[]
    calcQuads(byte[] wordBytes)
     
    private void
    doAddSymbol(int hash, ByteBasedPName symbol)
     
    private void
     
    private int
    Method called to find the best bucket to spill a PName over to: usually the first bucket that has only one entry, but in general first one of the buckets with least number of entries
    findSymbol(int hash, int[] quads, int qlen)
    Finds and returns name matching the specified symbol, if such name already exists in the table; or if not, creates name object, adds to the table, and returns it.
    findSymbol(int hash, int firstQuad, int secondQuad)
    Finds and returns name matching the specified symbol, if such name already exists in the table; or if not, creates name object, adds to the table, and returns it.
    void
     
    boolean
    Method called to check to quickly see if a child symbol table may have gotten additional entries.
    boolean
     
    void
    Method used by test code, to reset state of the name table.
    private void
     
    int
     
     
    private void
     
    private void
    Method that needs to be called, if the main hash structure is (may be) shared.
    private void
     

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Field Details

    • MIN_HASH_SIZE

      static final int MIN_HASH_SIZE
      See Also:
    • INITIAL_COLLISION_LEN

      static final int INITIAL_COLLISION_LEN
      See Also:
    • LAST_VALID_BUCKET

      static final int LAST_VALID_BUCKET
      Bucket index is 8 bits, and value 0 is reserved to represent 'empty' status.
      See Also:
    • mCount

      private int mCount
      Total number of PNames in the symbol table
    • mMainHashMask

      private int mMainHashMask
      Mask used to truncate 32-bit hash value to current hash array size; essentially, hash array size - 1 (since hash array sizes are 2^N).
    • mMainHash

      private int[] mMainHash
      Array of 2^N size, which contains combination of 24-bits of hash (0 to indicate 'empty' slot), and 8-bit collision bucket index (0 to indicate empty collision bucket chain; otherwise subtract one from index)
    • mMainNames

      private ByteBasedPName[] mMainNames
      Array that contains PName instances matching entries in mMainHash. Contains nulls for unused entries.
    • mCollList

      private ByteBasedPNameTable.Bucket[] mCollList
      Array of heads of collision bucket chains; size dynamically
    • mCollCount

      private int mCollCount
      Total number of PNames in collision buckets (included in mCount along with primary entries)
    • mCollEnd

      private int mCollEnd
      Index of the first unused collision bucket entry (== size of the used portion of collision list): less than or equal to 0xFF (255), since max number of entries is 255 (8-bit, minus 0 used as 'empty' marker)
    • mNeedRehash

      private transient boolean mNeedRehash
      This flag is set if, after adding a new entry, it is deemed that a rehash is warranted if any more entries are to be added.
    • mMainHashShared

      private boolean mMainHashShared
      Flag that indicates whether underlying data structures for the main hash area are shared or not. If they are, then they need to be handled in copy-on-write way, i.e. if they need to be modified, a copy needs to be made first; at this point it will not be shared any more, and can be modified.

      This flag needs to be checked both when adding new main entries, and when adding new collision list queues (i.e. creating a new collision list head entry)

    • mMainNamesShared

      private boolean mMainNamesShared
    • mCollListShared

      private boolean mCollListShared
      Flag that indicates whether underlying data structures for the collision list are shared or not. If they are, then they need to be handled in copy-on-write way, i.e. if they need to be modified, a copy needs to be made first; at this point it will not be shared any more, and can be modified.

      This flag needs to be checked when adding new collision entries.

  • Constructor Details

    • ByteBasedPNameTable

      public ByteBasedPNameTable(int hashSize)
    • ByteBasedPNameTable

      ByteBasedPNameTable(ByteBasedPNameTable parent)
      Constructor used when creating a child instance
  • Method Details

    • mergeFromChild

      public boolean mergeFromChild(ByteBasedPNameTable child)
    • markAsShared

      public void markAsShared()
    • nuke

      public void nuke()
      Method used by test code, to reset state of the name table.
    • size

      public int size()
      Specified by:
      size in class NameTable
    • maybeDirty

      public boolean maybeDirty()
      Method called to check to quickly see if a child symbol table may have gotten additional entries. Used for checking to see if a child table should be merged into shared table.
      Specified by:
      maybeDirty in class NameTable
    • findSymbol

      public ByteBasedPName findSymbol(int hash, int firstQuad, int secondQuad)
      Finds and returns name matching the specified symbol, if such name already exists in the table; or if not, creates name object, adds to the table, and returns it.

      Note: separate methods to optimize common case of relatively short element/attribute names (8 or less ascii characters)

      Parameters:
      firstQuad - int32 containing first 4 bytes of the pname; if the whole name less than 4 bytes, padded with zero bytes in front (zero MSBs, ie. right aligned)
      secondQuad - int32 containing bytes 5 through 8 of the pname; if less than 8 bytes, padded with up to 4 zero bytes in front (zero MSBs, ie. right aligned)
      Returns:
      PName matching the symbol passed (or constructed for it)
    • findSymbol

      public ByteBasedPName findSymbol(int hash, int[] quads, int qlen)
      Finds and returns name matching the specified symbol, if such name already exists in the table; or if not, creates name object, adds to the table, and returns it.

      Note: this is the general purpose method that can be called for names of any length. However, if name is less than 9 bytes long, it is preferable to call the version optimized for short names.

      Parameters:
      quads - Array of int32s, each of which contain 4 bytes of encoded name
      qlen - Number of int32s, starting from index 0, in quads parameter
      Returns:
      PName matching the symbol passed (or constructed for it)
    • addSymbol

      public ByteBasedPName addSymbol(int hash, String symbolStr, int colonIx, int firstQuad, int secondQuad)
    • addSymbol

      public ByteBasedPName addSymbol(int hash, String symbolStr, int colonIx, int[] quads, int qlen)
    • calcHash

      public static final int calcHash(int firstQuad)
    • calcHash

      public static final int calcHash(int firstQuad, int secondQuad)
    • calcHash

      public static final int calcHash(int[] quads, int qlen)
    • calcQuads

      public static int[] calcQuads(byte[] wordBytes)
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • doAddSymbol

      private void doAddSymbol(int hash, ByteBasedPName symbol)
    • rehash

      private void rehash()
    • findBestBucket

      private int findBestBucket()
      Method called to find the best bucket to spill a PName over to: usually the first bucket that has only one entry, but in general first one of the buckets with least number of entries
    • unshareMain

      private void unshareMain()
      Method that needs to be called, if the main hash structure is (may be) shared. This happens every time something is added, even if addition is to the collision list (since collision list index comes from lowest 8 bits of the primary hash entry)
    • unshareCollision

      private void unshareCollision()
    • unshareNames

      private void unshareNames()
    • expandCollision

      private void expandCollision()