Tokenizer Improvements
-
Changed the default TokenizerFunction for tokenizer to char_separator.
-
Added more tests to the examples, including tests that use only InputIterators.
-
Changed offset_separator, char_delimiters_separator, and char_separator to use
tok.assign when the iterator for the sequence they are dealing with is at least
a forward iterator. If it is not, they use tok+= as they do currently.
-
The TokenizerFunction members operator() and reset() are now const functions.
If a TokenizerFunction needs to maintain information beyond the position of the
iterators, it typedefs mutable_type to the type of the variable it needs, and a
reference to an object of that type is passed in to operator() and reset().
Making operator() and reset() const allows one TokenizerFunction to be shared
across all iterator instances of one tokenizer.
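The last two changes can be sketched together in a few lines. The code below is
only an illustration under stated assumptions, not the library's code:
fixed_width_separator, split_fields, and example are hypothetical names, and the
four-argument form of operator() is an assumed way of passing the mutable_type
reference. It shows a const function object whose per-iterator state is owned by
the caller, and which picks tok.assign for forward iterators and tok+= for
single-pass input iterators by dispatching on the iterator category.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical fixed-width splitter, loosely modeled on offset_separator.
// It is an illustration, not the library's implementation.
class fixed_width_separator {
public:
    // Per-iterator state lives outside the function object.
    typedef std::size_t mutable_type;

    explicit fixed_width_separator(const std::vector<std::size_t>& widths)
        : widths_(widths) {}

    // const: only the caller-owned state is modified.
    void reset(mutable_type& field) const { field = 0; }

    // Assumed signature: the state arrives as a trailing reference parameter.
    template <typename Iterator, typename Token>
    bool operator()(Iterator& next, Iterator end, Token& tok,
                    mutable_type& field) const {
        if (next == end || field >= widths_.size()) return false;
        typedef typename std::iterator_traits<Iterator>::iterator_category
            category;
        fill(next, end, tok, widths_[field], category());
        ++field;
        return true;
    }

private:
    // Forward iterators or better: find the end of the field, assign once.
    template <typename Iterator, typename Token>
    static void fill(Iterator& next, Iterator end, Token& tok,
                     std::size_t width, std::forward_iterator_tag) {
        Iterator start = next;
        for (std::size_t i = 0; i < width && next != end; ++i) ++next;
        tok.assign(start, next);
    }

    // Input iterators: the range cannot be revisited, so append each element.
    template <typename Iterator, typename Token>
    static void fill(Iterator& next, Iterator end, Token& tok,
                     std::size_t width, std::input_iterator_tag) {
        tok.clear();
        for (std::size_t i = 0; i < width && next != end; ++i, ++next)
            tok += *next;
    }

    std::vector<std::size_t> widths_;
};

// Hypothetical driver: each traversal owns its own mutable_type object.
template <typename Iterator>
std::vector<std::string> split_fields(const fixed_width_separator& f,
                                      Iterator first, Iterator last) {
    fixed_width_separator::mutable_type state = 0;
    f.reset(state);
    std::vector<std::string> out;
    std::string tok;
    while (f(first, last, tok, state)) out.push_back(tok);
    return out;
}

// The same const function object serves both kinds of iterator.
inline void example() {
    const std::vector<std::size_t> widths = {2, 2, 4};
    const fixed_width_separator sep(widths);

    const std::string date = "12252001";
    const std::vector<std::string> a =
        split_fields(sep, date.begin(), date.end());
    assert(a.size() == 3 && a[0] == "12" && a[1] == "25" && a[2] == "2001");

    std::istringstream in(date);
    std::istreambuf_iterator<char> first(in), last;
    assert(split_fields(sep, first, last) == a);
}
```

Because nothing inside sep changes during tokenizing, several traversals can
use the same object at once, each with its own state variable. That is the
sharing the const interface is meant to allow.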
Acknowledgements
I would like to thank Gennadiy E. Rozental for suggesting using tok.assign
instead of tok+=. I would also like to thank George A. Heintzelman
for his idea of distinguishing the const and non-const aspects of
TokenizerFunction.
Downloading and Usage
Download from here. To use, unzip it and
put it in your include path ahead of the regular Boost library.
Comments
I would love to hear whatever comments you have. I can be reached at
jbandela@ufl.edu and I also read the Boost list. Please post comments
to the Boost list with "Tokenizer Improvements" somewhere in the subject
line so I can easily find them.