** stringi package NEWS and CHANGELOG **
========================================


* 0.3-1 (2014-11-06) **CRAN**

   * [IMPORTANT CHANGE] #87: `%>%` overlapped with the pipe operator from
     the `magrittr` package; all such operators have been renamed with
     an `s` prefix, e.g., `%>%` is now `%s>%`.

   * [IMPORTANT CHANGE] #108: Now the BreakIterator (for text boundary analysis)
     may be better controlled via `stri_opts_brkiter()` (see options `type`
     and `locale` which aim to replace now-removed `boundary` and `locale` parameters
     to `stri_locate_boundaries`, `stri_split_boundaries`, `stri_trans_totitle`,
     `stri_extract_words`, `stri_locate_words`).

   * [NEW FUNCTIONS] #109: `stri_count_boundaries` and `stri_count_words`
     count the number of text boundaries in a string.

   * [NEW FUNCTIONS] #41: `stri_startswith_*` and `stri_endswith_*`
     determine whether a string starts or ends with a given pattern.

   * [NEW FEATURE] #102: `stri_replace_all_*` gained a `vectorize_all` parameter,
     which defaults to `TRUE` for backward compatibility.

   * [NEW FUNCTION] #91: `stri_subset_*`, a convenient and more efficient
     substitute for `str[stri_detect_*(str, ...)]`, added.

   * [NEW FEATURE] #100: `stri_split_fixed`, `stri_split_charclass`,
     `stri_split_regex`, `stri_split_coll` gained a `tokens_only` parameter,
     which defaults to `FALSE` for backward compatibility.

   * [NEW FUNCTION] #105: `stri_list2matrix` converts lists of atomic vectors
     to character matrices, useful in connection with `stri_split`
     and `stri_extract`.

   * [NEW FEATURE] #107: `stri_split_*` now allow setting an `omit_empty=NA` argument.

   * [NEW FEATURE] #106: `stri_split` and `stri_extract_all` gained a `simplify`
     argument (if `TRUE`, then `stri_list2matrix(..., byrow=TRUE)`
     is called on the resulting list).

   * [NEW FUNCTION] #77: `stri_rand_lipsum` generates
     (pseudo)random dummy *lorem ipsum* text.

   * [NEW FEATURE] #98: `stri_trans_totitle` gained an `opts_brkiter`
     parameter; it indicates which ICU BreakIterator should be used when
     performing case mapping.

   * [NEW FEATURE] `stri_wrap` gained a new parameter: `normalize`.

   * [BUGFIX] #86: `stri_*_fixed`, `stri_*_coll`, and `stri_*_regex` could
     give incorrect results if one of the search strings was of length 0.

   * [BUGFIX] #99: `stri_replace_all` did not use the `replacement` arg.

   * [BUGFIX] #94: `R CMD check` should no longer fail if `icudt` download failed.

   * [BUGFIX] #112: Some of the objects were not PROTECTed from
     being garbage collected, which might have caused spontaneous SEGFAULTS.

   * [BUGFIX] Some collator options were not passed correctly to ICU services.

   * [BUGFIX] Causes of memory leaks detected by
     `valgrind --tool=memcheck --leak-check=full` have been removed.

   * [DOCUMENTATION] Significant extensions/cleanups in the stringi manual.
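
A minimal sketch of a few of the 0.3-1 additions listed above. It assumes that stringi 0.3-1 or later is installed; the input strings are made up for illustration only.

```r
library(stringi)

x <- c("apple", "banana", "cherry")

# stri_subset_* is a shortcut for x[stri_detect_*(x, ...)]
stri_subset_fixed(x, "an")                 # "banana"

# prefix and suffix tests
stri_startswith_fixed(x, "ba")             # FALSE TRUE FALSE
stri_endswith_fixed(x, "rry")              # FALSE FALSE TRUE

# the renamed operators carry an "s" prefix (formerly %<%)
"a" %s<% "b"                               # TRUE

# vectorize_all=FALSE applies every pattern/replacement pair to each string
stri_replace_all_fixed("abc", c("a", "b"), c("x", "y"),
                       vectorize_all=FALSE)   # "xyc"

# count words using the word BreakIterator
stri_count_words("The quick brown fox")    # 4
```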


* 0.2-5 (2014-05-16) **CRAN**

   * icudt-dependent examples are no longer run if `icudt` is not available.


* 0.2-4 (2014-05-15) **CRAN**

   * [BUGFIX] Issues with loading of misaligned addresses in `stri_*_fixed`.


* 0.2-3 (2014-05-14) **CRAN**

   * [IMPORTANT CHANGE] `stri_cmp*` no longer allow passing
      `opts_collator=NA`. From now on, `stri_cmp_eq`, `stri_cmp_neq`,
      and the new operators `%===%`, `%!==%`, `%stri===%`, and `%stri!==%`
      are locale-independent operations, which are based on code point
      comparisons. New functions `stri_cmp_equiv` and `stri_cmp_nequiv`
      (and from now on also `%==%`, `%!=%`, `%stri==%`, and `%stri!=%`)
      test for canonical equivalence.

   * [IMPORTANT CHANGE] `stri_*_fixed` search functions now perform
      a locale-independent exact (byte-wise, after conversion to UTF-8)
      pattern search. All the Collator-based, locale-dependent search routines
      are now available via `stri_*_coll`. The reason for this is that
      ICU USearch currently performs very poorly, and in many search tasks
      exact pattern matching is in fact sufficient.

   * `stri_*_fixed` now use a tweaked Knuth-Morris-Pratt search algorithm,
      which improves the search performance drastically.
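
The distinction drawn above between code point comparison, canonical equivalence, exact search, and Collator-based search can be sketched as follows. The snippet assumes a recent stringi release (argument names may have differed in 0.2-3) and uses the function forms rather than the operators, which were renamed in a later version.

```r
library(stringi)

composed   <- "\u0105"    # LATIN SMALL LETTER A WITH OGONEK, one code point
decomposed <- "a\u0328"   # "a" followed by COMBINING OGONEK, two code points

# code point comparison: the two strings differ
stri_cmp_eq(composed, decomposed)       # FALSE

# canonical equivalence: both denote the same character
stri_cmp_equiv(composed, decomposed)    # TRUE

# exact, locale-independent search: case matters
stri_detect_fixed("HELLO world", "hello")    # FALSE

# Collator-based search; strength=1 ignores case differences
stri_detect_coll("HELLO world", "hello",
                 opts_collator=stri_opts_collator(strength=1))   # TRUE
```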


* 0.2-2 (2014-05-01) **devel**

   * [IMPORTANT CHANGE] `stri_enc_nf*` and `stri_enc_isnf*` function families
      have been renamed to `stri_trans_nf*` and `stri_trans_isnf*`,
      respectively. This is because they deal with text transformation,
      and not with character encoding. Moreover, all such operations may
      also be performed by ICU's Transliterator (see below).

   * [NEW FUNCTION] `stri_trans_general` and `stri_trans_list` give access
      to ICU's Transliterator, which may be used to perform very general
      text transforms.

   * [NEW FUNCTION] `stri_split_boundaries` utilizes ICU's BreakIterator
      to split strings at specific text boundaries. Moreover,
      `stri_locate_boundaries` indicates positions of these boundaries.

   * [NEW FUNCTION] `stri_extract_words` uses ICU's BreakIterator to
      extract all words from a text. Additionally, `stri_locate_words`
      locates start and end positions of words in a text.

   * [NEW FUNCTION] `stri_pad`, `stri_pad_left`, `stri_pad_right`,
      and `stri_pad_both` pad a string with a specific code point.

   * [NEW FUNCTION] `stri_wrap` breaks paragraphs of text into lines.
     Two algorithms (greedy and minimal raggedness) are available.
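
A short sketch of the functions introduced in 0.2-2, assuming a recent stringi release (the argument names `width` and `pad` are those of current versions and may have differed in 0.2-2); the inputs are illustrative only.

```r
library(stringi)

# Transliterator: map Latin text to plain ASCII
stri_trans_general("gro\u00df", "Latin-ASCII")    # "gross"

# list some of the available transliterator identifiers
head(stri_trans_list())

# word extraction; the result is a list with one element per input string
stri_extract_words("The quick brown fox")

# padding with a given code point
stri_pad_left("7", width=3, pad="0")       # "007"
stri_pad_both("ab", width=6, pad="*")      # "**ab**"

# break a paragraph into lines of limited width
stri_wrap("The quick brown fox jumps over the lazy dog", width=20)
```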


* 0.2-1 (2014-04-18) **devel**

   * [IMPORTANT CHANGE] `stri_*_charclass` search functions now
     rely solely on ICU's UnicodeSet patterns. All previously accepted
     charclass identifiers became invalid. However, new patterns
     should now be more familiar to the users (they are regex-like).
     Moreover, we observe a very nice performance gain.

   * [IMPORTANT CHANGE] `stri_sort` now does not include `NA`s
     in output vectors by default, for compatibility with `sort()`.
     Moreover, currently none of the input vector's attributes are preserved.

   * [NEW FUNCTION] `stri_unique` extracts unique elements from
     a character vector.

   * [NEW FUNCTIONS] `stri_duplicated` and `stri_duplicated_any`
     determine duplicate elements in a character vector.

   * [NEW FUNCTION] `stri_replace_na` replaces `NA`s in a character vector
      with a given string, useful for emulating e.g. R's `paste()` behavior.

   * [NEW FUNCTION] `stri_rand_shuffle` generates a random permutation
     of code points in a string.

   * [NEW FUNCTION] `stri_rand_strings` generates random strings.

   * [NEW FUNCTIONS] New functions and binary operators for string comparison:
     `stri_cmp_eq`, `stri_cmp_neq`, `stri_cmp_lt`, `stri_cmp_le`,
     `stri_cmp_gt`, `stri_cmp_ge`, `%==%`, `%!=%`, `%<%`, `%<=%`, `%>%`, `%>=%`.

   * [NEW FUNCTION] `stri_enc_mark` reads declared encodings of character
     strings as seen by stringi.

   * [NEW FUNCTION] `stri_enc_tonative(str)` is an alias to
     `stri_encode(str, NULL, NULL)`.

   * [NEW FEATURE] `stri_order` and `stri_sort` now have an additional argument
     `na_last` (defaults to `TRUE` and `NA`, respectively).

   * [NEW FEATURE] `stri_replace_all_charclass`, `stri_extract_all_charclass`,
     and `stri_locate_all_charclass` now have a new arg, `merge`
     (defaults to `FALSE` for backward-compatibility). It may be used
     to e.g. replace sequences of white spaces with a single space.

   * [NEW FEATURE] `stri_enc_toutf8` now has a new `validate` arg (defaults
     to `FALSE` for backward-compatibility). It may be used in a (rare) case
     in which a user wants to fix an invalid UTF-8 byte sequence.
     `stri_length` (among others) now detects invalid UTF-8 byte sequences.

   * [NEW FEATURE] All binary operators `%???%` now also have aliases `%stri???%`.

   * Performance improvements in `StriContainerUTF8` and `StriContainerUTF16`
     (they affect most other functions).

   * Significant performance improvements in `stri_join`, `stri_flatten`,
     `stri_cmp`, `stri_trans_to*`, and others.

   * Added 3rd mirror site for our icudt binary distribution.

   * `U_MISSING_RESOURCE_ERROR` message in `StriException` now suggests
     calling `stri_install_check()`.

   * [BUGFIX] UTF-8 BOMs are now silently removed from input strings.

   * [BUGFIX] no more attempts to re-encode UTF-8 encoded strings
     if native encoding=UTF-8 in `StriContainerUTF8`.

   * [BUGFIX] possible memory leaks when throwing errors via `Rf_error()`.

   * [BUGFIX] `stri_order` and `stri_cmp` could return incorrect results
     for `opts_collator=NA`.

   * [BUGFIX] `stri_sort` did not guarantee to return strings in UTF-8.
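
Some of the 0.2-1 changes above can be illustrated with the following sketch. It assumes a recent stringi release, and the inputs are made up for demonstration.

```r
library(stringi)

# UnicodeSet-style character class; merge=TRUE collapses runs of matches
stri_replace_all_charclass("a\nb\tc   d", "\\p{WHITE_SPACE}", " ",
                           merge=TRUE)          # "a b c d"

# replace missing values with a given string
stri_replace_na(c("a", NA), "?")                # "a" "?"

# duplicated and unique elements
v <- c("a", "b", "a")
stri_duplicated(v)                              # FALSE FALSE TRUE
stri_unique(v)                                  # "a" "b"

# NAs are dropped by default, as in sort(); na_last=TRUE keeps them at the end
stri_sort(c("b", NA, "a"))                      # "a" "b"
stri_sort(c("b", NA, "a"), na_last=TRUE)        # "a" "b" NA

# three random strings of 5 code points each
stri_rand_strings(3, 5)
```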


* 0.1-25 (2014-03-12) **CRAN**

    * LICENCE tweaks.

    * Initial CRAN release.


* 0.1-24 (2014-03-11) **devel**

    * Fixed bugs detected with `ASan` and `UBSan`,
      e.g. fixed `CharClass::gcmask` type (`enum` -> `uint32_t`)
      (reported by `UBSan`).

    * Fixed array over-runs detected with `valgrind` in `string8.h`.

    * Fixed uninitialized class fields in `StriContainerUTF8`
      (reported by `valgrind`).


* 0.1-23 (2014-03-11) **devel**

    * License changed to BSD-3-clause, COPYRIGHTS updated.

    * icudt is not shipped with stringi anymore;
      it is now downloaded in install.libs.R from one of our servers.

    * New functions: `stri_install_check()`, `stri_install_icudt()`.


* 0.1-22 (2014-02-20) **devel**

   * System ICU is used on systems that have one (version >= 50 needed).
     ICU is autodetected with `pkg-config` in `./configure`.
     Pass `'--disable-pkg-config'` to `./configure` to force building
     ICU from sources.

   * icudt52b (custom subset) is now shipped with stringi
     (for big-endian, ASCII systems).


* 0.1-21 (2014-02-19) **devel**

   * Fixed some Solaris-related issues while preparing stringi
     for CRAN submission.


* 0.1-20 (2014-02-17) **devel**

   * ICU4C 52.1 sources included (common, i18n, stubdata + icu52dt.dat
     loaded dynamically). Compilation via Makevars.

   * stringi now does not depend on any external libraries.


* 0.1-11 (2013-11-16) **devel**

   * ICU4C is now statically linked on Windows.

   * First OS X binary build.

   * The package is being intensively tested by our students @ FMIS WUT.


* 0.1-10 (2013-11-13) **devel**

   * Using `pkg-config` via `./configure` to look for ICU4C libs.


* 0.1-6 (2013-07-05) **devel**

   * First Windows binary build.

   * Compilation passed on Oracle Sun Studio compiler collection.

   * By now we have implemented most of the functionality
     scheduled for milestone 0.1.


* 0.1-1 (2013-01-05) **devel**

   * The stringi project has been established on GitHub.
