aboutsummaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorIvan Enderlin <ivan.enderlin@hoa-project.net>2015-01-23 19:27:04 +0100
committerIvan Enderlin <ivan.enderlin@hoa-project.net>2015-01-23 19:27:04 +0100
commitda2d2d9b39133cac522592cf62511b58346b5546 (patch)
tree2626af80ce7175b0d3a9883396216141ac1af2b4
parent1bb808ab1b282fafd47ac7012c34f6e681abab9b (diff)
parent2f826c2436f9bc495c8310bfe3127d9c3997d8fd (diff)
downloadUstring-da2d2d9b39133cac522592cf62511b58346b5546.zip
Ustring-da2d2d9b39133cac522592cf62511b58346b5546.tar.gz
Ustring-da2d2d9b39133cac522592cf62511b58346b5546.tar.bz2
Merge branch 'doc_en' into incoming
-rw-r--r--Documentation/En/Index.xyl604
-rw-r--r--Documentation/Fr/Index.xyl33
2 files changed, 617 insertions, 20 deletions
diff --git a/Documentation/En/Index.xyl b/Documentation/En/Index.xyl
index 6fa3e8f..1fd45ca 100644
--- a/Documentation/En/Index.xyl
+++ b/Documentation/En/Index.xyl
@@ -3,8 +3,608 @@
<overlay xmlns="http://hoa-project.net/xyl/xylophone">
<yield id="chapter">
- <p class="warning">This chapter is not yet translated. Contributions are
- welcomed!</p>
+ <p>Strings can sometimes be <strong>complex</strong>, especially when they use
+ the <code>Unicode</code> encoding format. The <code>Hoa\String</code> library
+ provides several operations on UTF-8 strings.</p>
+
+ <h2 id="Table_of_contents">Table of contents</h2>
+
+ <tableofcontents id="main-toc" />
+
+ <h2 id="Introduction" for="main-toc">Introduction</h2>
+
+ <p>When we manipulate strings, the <a href="http://unicode.org/">Unicode</a>
+ format establishes itself because of its <strong>compatibility</strong> with
+ historical formats (like ASCII) and its capacity to understand a
+ <strong>large</strong> range of characters and symbols for all cultures and
+ all regions in the world. PHP provides several tools to manipulate such
+ strings, like the following extensions:
+ <a href="http://php.net/mbstring"><code>mbstring</code></a>,
+ <a href="http://php.net/iconv"><code>iconv</code></a> or also the excellent
+ <a href="http://php.net/intl"><code>intl</code></a> which is based on
+ <a href="http://icu-project.org/">ICU</a>, the reference implementation of
+ Unicode. Unfortunately, sometimes we have to mix these extensions to achieve
+ our aims and at the cost of a certain <strong>complexity</strong> along with
+ a regrettable <strong>verbosity</strong>.</p>
+ <p>The <code>Hoa\String</code> library answers to these issues by providing a
+ <strong>simple</strong> way to manipulate strings with
+ <strong>performance</strong> and <strong>efficiency</strong> in minds. It
+ also provides some evoluated algorithms to perform <strong>search</strong>
+ operations on strings.</p>
+
+ <h2 id="Unicode_strings" for="main-toc">Unicode strings</h2>
+
+ <p>The <code>Hoa\String\String</code> class represents a
+ <strong>UTF-8</strong> Unicode strings and allows to manipulate it easily.
+ This class implements the
+ <a href="http://php.net/arrayaccess"><code>ArrayAccess</code></a>,
+ <a href="http://php.net/countable"><code>Countable</code></a> and
+ <a href="http://php.net/iteratoraggregate"><code>IteratorAggregate</code></a>
+ interfaces. We are going to use three examples in three different languages:
+ French, Arab and Japanese. Thus:</p>
+ <pre><code class="language-php">$french = new Hoa\String\String('Je t\'aime');
+$arabic = new Hoa\String\String('أحبك');
+$japanese = new Hoa\String\String('私はあなたを愛して');</code></pre>
+ <p>Now, let's see what we can do on these three strings.</p>
+
+ <h3 id="String_manipulation" for="main-toc">String manipulation</h3>
+
+ <p>Let's start with <strong>elementary</strong> operations. If we would like
+ to <strong>count</strong> the number of characters (not bytes), we will use
+ the <a href="http://php.net/count"><code>count</code> function</a>. Thus:</p>
+ <pre><code class="language-php">var_dump(
+ count($french),
+ count($arabic),
+ count($japanese)
+);
+
+/**
+ * Will output:
+ * int(9)
+ * int(4)
+ * int(9)
+ */</code></pre>
+ <p>When we speak about text position, it is not suitable to speak about the
+ right or the left, but rather about a <strong>beginning</strong> or an
+ <strong>end</strong>, and based on the <strong>direction</strong> of writing.
+ We can know this direction thanks to the
+ <code>Hoa\String\String::getDirection</code> method. It returns the value of
+ one of the following constants:</p>
+ <ul>
+ <li><code>Hoa\String\String::LTR</code>, for left-to-right, if the text is
+ written from the left to the right,</li>
+ <li><code>Hoa\String\String::RTL</code>, for right-to-left, if the text is
+ written from the right to the left.</li>
+ </ul>
+ <p>Let's observe the result with our examples:</p>
+ <pre><code class="language-php">var_dump(
+ $french->getDirection() === Hoa\String\String::LTR, // is left-to-right?
+ $arabic->getDirection() === Hoa\String\String::RTL, // is right-to-left?
+ $japanese->getDirection() === Hoa\String\String::LTR // is left-to-right?
+);
+
+/**
+ * Will output:
+ * bool(true)
+ * bool(true)
+ * bool(true)
+ */</code></pre>
+ <p>The result of this method is computed thanks to the
+ <code>Hoa\String\String::getCharDirection</code> static method which computes
+ the direction of only one character.</p>
+ <p>If we would like to <strong>concatenate</strong> another string to the end
+ or to the beginning, we will respectively use the
+ <code>Hoa\String\String::append</code> and
+ <code>Hoa\String\String::prepend</code> methods. These methods, like most of
+ the ones which modifies the string, return the object itself, in order to
+ chain the calls. For instance:</p>
+ <pre><code class="language-php">echo $french->append('… et toi, m\'aimes-tu ?')->prepend('Mam\'zelle ! ');
+
+/**
+ * Will output:
+ * Mam'zelle ! Je t'aime… et toi, m'aimes-tu ?
+ */</code></pre>
+ <p>We also have the <code>Hoa\String\String::toLowerCase</code> and
+ <code>Hoa\String\String::toUpperCase</code> methods to, respectively, set the
+ case of the string to lower or upper. For instance:</p>
+ <pre><code class="language-php">echo $french->toUpperCase();
+
+/**
+ * Will output:
+ * MAM'ZELLE ! JE T'AIME… ET TOI, M'AIMES-TU ?
+ */</code></pre>
+ <p>We can also add characters to the beginning or to the end of the string to
+ reach a <strong>minimum</strong> length. This operation is frequently called
+ the <em>padding</em> (for historical reasons dating back to typewriters).
+ That's why we have the <code>Hoa\String\String::pad</code> method which takes
+ three arguments: the minimum length, characters to add and a constant
+ indicating whether we have to add at the end or at the beginning of the string
+ (respectively <code>Hoa\String\String::END</code>, by default, and
+ <code>Hoa\String\String::BEGINNING</code>).</p>
+ <pre><code class="language-php">echo $arabic->pad(20, ' ');
+
+/**
+ * Will output:
+ * أحبك
+ */</code></pre>
+ <p>A similar operation allows to remove, by default, <strong>spaces</strong>
+ at the beginning and at the end of the string thanks to the
+ <code>Hoa\String\String::trim</code> method. For example, to retreive our
+ original Arabic string:</p>
+ <pre><code class="language-php">echo $arabic->trim();
+
+/**
+ * Will output:
+ * أحبك
+ */</code></pre>
+ <p>If we would like to remove other characters, we can use its first argument
+ which must be a regular expression. Finally, its second argument allows to
+ specify from what side we would like to remove character: at the beginning, at
+ the end or both, still by using the <code>Hoa\String\String::BEGINNING</code>
+ and <code>Hoa\String\String::END</code> constants.</p>
+ <p>If we would like to remove other characters, we can use its first argument
+ which must be a regular expression. Finally, its second argument allows to
+ specify the side where to remove characters: at the beginning, at the end or
+ both, still by using the <code>Hoa\String\String::BEGINNING</code> and
+ <code>Hoa\String\String::END</code> constants. We can combine these constants
+ to express “both sides”, which is the default value:
+ <code class="language-php">Hoa\String\String::BEGINNING |
+ Hoa\String\String::END</code>. For example, to remove all the numbers and the
+ spaces only at the end, we will write:</p>
+ <pre><code class="language-php">$arabic->trim('\s|\d', Hoa\String\String::END);</code></pre>
+ <p>We can also <strong>reduce</strong> the string to a
+ <strong>sub-string</strong> by specifying the position of the first character
+ followed by the length of the sub-string to the
+ <code>Hoa\String\String::reduce</code> method:</p>
+ <pre><code class="language-php">echo $french->reduce(3, 6)->reduce(2, 4);
+
+/**
+ * Will output:
+ * aime
+ */</code></pre>
+ <p>If we would like to get a specific character, we can rely on the
+ <code>ArrayAccess</code> interface. For instance, to get the first character
+ of each of our examples (from their original definitions):</p>
+ <pre><code class="language-php">var_dump(
+ $french[0],
+ $arabic[0],
+ $japanese[0]
+);
+
+/**
+ * Will output:
+ * string(1) "J"
+ * string(2) "أ"
+ * string(3) "私"
+ */</code></pre>
+ <p>If we would like the last character, we will use the -1 index. The index is
+ not bounded to the length of the string. If the index exceeds this length,
+ then a <em>modulo</em> will be applied.</p>
+ <p>We can also modify or remove a specific character with this method. For
+ example:</p>
+ <pre><code class="language-php">$french->append(' ?');
+$french[-1] = '!';
+echo $french;
+
+/**
+ * Will output:
+ * Je t'aime !
+ */</code></pre>
+ <p>Another very useful method is the <strong>ASCII</strong> transformation.
+ Be careful, this is not always possible, according to your settings. For
+ example:</p>
+ <pre><code class="language-php">$title = new Hoa\String\String('Un été brûlant sur la côte');
+echo $title->toAscii();
+
+/**
+ * Will output:
+ * Un ete brulant sur la cote
+ */</code></pre>
+ <p>We can also transform from Arabic or Japanese to ASCII. Symbols, like
+ Mathemeticals symbols or emojis, are also transformed:</p>
+ <pre><code class="language-php">$emoji = new Hoa\String\String('I ❤ Unicode');
+$maths = new Hoa\String\String('∀ i ∈ ℕ');
+
+echo $arabic->toAscii(), "\n",
+ $japanese->toAscii(), "\n",
+ $emoji->toAscii(), "\n",
+ $maths->toAscii(), "\n";
+
+/**
+ * Will output:
+ * ahbk
+ * sihaanatawo aishite
+ * I (heavy black heart)️ Unicode
+ * (for all) i (element of) N
+ */</code></pre>
+ <p>In order this method to work correctly, the
+ <a href="http://php.net/intl"><code>intl</code></a> extension needs to be
+ present, so that the
+ <a href="http://php.net/transliterator"><code>Transliterator</code></a> class
+ is present. If it does not exist, the
+ <a href="http://php.net/normalizer"><code>Normalizer</code></a> class must
+ exist. If this class does not exist neither, the
+ <code>Hoa\String\String::toAscii</code> method can still try a transformation,
+ but it is less efficient. To activate this last solution, <code>true</code>
+ must be passed as a single argument. This <em lang="fr">tour de force</em> is
+ not recommended in most cases.</p>
+ <p>We also find the <code>getTransliterator</code> method which returns a
+ <code>Transliterator</code> object, or <code>null</code> if this class does
+ not exist. This method takes a transliteration identifier as argument. We
+ suggest to <a href="http://userguide.icu-project.org/transforms/general">read
+ the documentation about the transliterator of ICU</a> to understand this
+ identifier. The <code>transliterate</code> method allows to transliterate the
+ current string based on an identifier and a beginning index and an end
+ one. This method works the same way than the
+ <a href="http://php.net/transliterator.transliterate"><code>Transliterator::transliterate</code></a>
+ method.</p>
+ <p>More generally, to change the <strong>encoding</strong> format, we can use
+ the <code>Hoa\String\String::transcode</code> static method, with a string as
+ first argument, the original encoding format as second argument and the
+ expected encoding format as third argument (UTF-8 by default). The get the
+ list of encoding formats, we have to refer to the
+ <a href="http://php.net/iconv"><code>iconv</code></a> extension or to use the
+ following command line in a terminal:</p>
+ <pre><code class="language-php">$ iconv --list</code></pre>
+ <p>To know if a string is encoded in UTF-8, we can use the
+ <code>Hoa\String\String::isUtf8</code> static method; for instance:</p>
+ <pre><code class="language-php">var_dump(
+ Hoa\String\String::isUtf8('a'),
+ Hoa\String\String::isUtf8(Hoa\String\String::transcode('a', 'UTF-8', 'UTF-16'))
+);
+
+/**
+ * Will output:
+ * bool(true)
+ * bool(false)
+ */</code></pre>
+ <p>We can <strong>split</strong> the string into several sub-strings by using
+ the <code>Hoa\String\String::split</code> method. As first argument, we have a
+ regular expression (of kind <a href="http://pcre.org/">PCRE</a>), then an
+ integer representing the maximum number of elements to return and finally a
+ combination of constants. These constants are the same as the ones of
+ <a href="http://php.net/preg_split"><code>preg_split</code></a>.</p>
+ <p>By default, the second argument is set to -1, which means infinity, and the
+ last argument is set to <code>PREG_SPLIT_NO_EMPTY</code>. Thus, if we would
+ like to get all the words of a string, we will write:</p>
+ <pre><code class="language-php">print_r($title->split('#\b|\s#'));
+
+/**
+ * Will output:
+ * Array
+ * (
+ * [0] => Un
+ * [1] => ete
+ * [2] => brulant
+ * [3] => sur
+ * [4] => la
+ * [5] => cote
+ * )
+ */</code></pre>
+ <p>If we would like to <strong>iterate</strong> over all the
+ <strong>characters</strong>, it is recommended to use the
+ <code>IteratorAggregate</code> method, being the
+ <code>Hoa\String\String::getIterator</code> method. Let's see on the Arabic
+ example:</p>
+ <pre><code class="language-php">foreach($arabic as $letter)
+ echo $letter, "\n";
+
+/**
+ * Will output:
+ * أ
+ * ح
+ * ب
+ * ك
+ */</code></pre>
+ <p>We notice that the iteration is based on the text direction, it means that
+ the first element of the iteration is the first letter of the string starting
+ from the beginning.</p>
+ <p>Of course, if we would like to get an array of characters, we can use the
+ <a href="http://php.net/iterator_to_array"><code>iterator_to_array</code></a>
+ PHP function:</p>
+ <pre><code class="language-php">print_r(iterator_to_array($arabic));
+
+/**
+ * Will output:
+ * Array
+ * (
+ * [0] => أ
+ * [1] => ح
+ * [2] => ب
+ * [3] => ك
+ * )
+ */</code></pre>
+
+ <h3 id="Comparison_and_search" for="main-toc">Comparison and search</h3>
+
+ <p>Strings can also be <strong>compared</strong> thanks to the
+ <code>Hoa\String\String::compare</code> method:</p>
+ <pre><code class="language-php">$string = new Hoa\String\String('abc');
+var_dump(
+ $string->compare('wxyz')
+);
+
+/**
+ * Will output:
+ * string(-1)
+ */</code></pre>
+ <p>This methods returns -1 if the initial string comes before (in the
+ alphabetical order), 0 if it is identical and 1 if it comes after. If we
+ would like to use all the power of the underlying mechanism, we can call the
+ <code>Hoa\String\String::getCollator</code> static method (if the
+ <a href="http://php.net/Collator"><code>Collator</code></a> class exists, else
+ <code>Hoa\String\String::compare</code> will use a simple byte to bytes
+ comparison without taking care of the other parameters). Thus, if we would
+ like to sort an array of strings, we will write:</p>
+ <pre><code class="language-php">$strings = array('c', 'Σ', 'd', 'x', 'α', 'a');
+Hoa\String\String::getCollator()->sort($strings);
+print_r($strings);
+
+/**
+ * Could output:
+ * Array
+ * (
+ * [0] => a
+ * [1] => c
+ * [2] => d
+ * [3] => x
+ * [4] => α
+ * [5] => Σ
+ * )
+ */</code></pre>
+ <p>Comparison between two strings depends on the <strong>locale</strong>, it
+ means of the localization of the system, like the language, the country, the
+ region etc. We can use the
+ <a href="@hack:chapter=Locale"><code>Hoa\Locale</code> library</a> to modify
+ these data, but it's not a dependence of <code>Hoa\String</code>.</p>
+ <p>We can also know if a string <strong>matches</strong> a certain pattern,
+ still expressed with a regular expression. To achieve that, we will use the
+ <code>Hoa\String\String::match</code> method. This method relies on the
+ <a href="http://php.net/preg_match"><code>preg_match</code></a> and
+ <a href="http://php.net/preg_match_all"><code>preg_match_all</code></a> PHP
+ functions, but by modifying the pattern's options to ensure the Unicode
+ support. We have the following parameters: the pattern, a variable passed by
+ reference to collect the matches, flags, an offset and finally a boolean
+ indicating whether the search is global or not (respectively if we have to use
+ <code>preg_match_all</code> or <code>preg_match</code>). By default, the
+ search is not global.</p>
+ <p>Thus, we will check that our French example contains <code>aime</code> with
+ a direct object complement:</p>
+ <pre><code class="language-php">$french->match('#(?:(?&amp;lt;direct_object>\w)[\'\b])aime#', $matches);
+var_dump($matches['direct_object']);
+
+/**
+ * Will output:
+ * string(1) "t"
+ */</code></pre>
+ <p>This method returns <code>false</code> if an error is raised (for example
+ if the pattern is not correct), 0 if no match has been found, the number of
+ matches else.</p>
+ <p>Similarly, we can <strong>search</strong> and <strong>replace</strong>
+ sub-strings by other sub-strings based on a pattern, still expressed with a
+ regular expression. To achieve that, we will use the
+ <code>Hoa\String\String::replace</code> method. This method uses the
+ <a href="http://php.net/preg_replace"><code>preg_replace</code></a> and
+ <a href="http://php.net/preg_replace_callback"><code>preg_replace_callback</code></a>
+ PHP functions, but still by modifying the pattern's options to ensure the
+ Unicode support. As first argument, we find one or more patterns, as second
+ argument, one or more replacements and as last argument the limit of
+ replacements to apply. If the replacement is a callable, then the
+ <code>preg_replace_callback</code> function will be used.</p>
+ <p>Thus, we will modify our French example to be more polite:</p>
+ <pre><code class="language-php">$french->replace('#(?:\w[\'\b])(?&amp;lt;verb>aime)#', function ( $matches ) {
+
+ return 'vous ' . $matches['verb'];
+});
+
+echo $french;
+
+/**
+ * Will output:
+ * Je vous aime
+ */</code></pre>
+ <p>The <code>Hoa\String\String</code> class provides constants which are
+ aliases of existing PHP constants and ensure a better readability of the
+ code:</p>
+ <ul>
+ <li><code>Hoa\String\String::WITHOUT_EMPTY</code>, alias of
+ <code>PREG_SPLIT_NO_EMPTY</code>,</li>
+ <li><code>Hoa\String\String::WITH_DELIMITERS</code>, alias of
+ <code>PREG_SPLIT_DELIM_CAPTURE</code>,</li>
+ <li><code>Hoa\String\String::WITH_OFFSET</code>, alias of
+ <code>PREG_OFFSET_CAPTURE</code> and
+ <code>PREG_SPLIT_OFFSET_CAPTURE</code>,</li>
+ <li><code>Hoa\String\String::GROUP_BY_PATTERN</code>, alias of
+ <code>PREG_PATTERN_ORDER</code>,</li>
+ <li><code>Hoa\String\String::GROUP_BY_TUPLE</code>, alias of
+ <code>PREG_SET_ORDER</code>.</li>
+ </ul>
+ <p>Because they are strict aliases, we can write:</p>
+ <pre><code class="language-php">$string = new Hoa\String\String('abc1 defg2 hikl3 xyz4');
+$string->match(
+ '#(\w+)(\d)#',
+ $matches,
+ Hoa\String\String::WITH_OFFSET
+ | Hoa\String\String::GROUP_BY_TUPLE,
+ 0,
+ true
+);</code></pre>
+
+ <h3 id="Characters" for="main-toc">Characters</h3>
+
+ <p>The <code>Hoa\String\String</code> class offers static methods working on a
+ single Unicode character. We have already mentionned the
+ <code>getCharDirection</code> method which allows to know the
+ <strong>direction</strong> of a character. We also have the
+ <code>getCharWidth</code> which counts the <strong>number of columns</strong>
+ necessary to print a single character. Thus:</p>
+ <pre><code class="language-php">var_dump(
+ Hoa\String\String::getCharWidth(Hoa\String\String::fromCode(0x7f)),
+ Hoa\String\String::getCharWidth('a'),
+ Hoa\String\String::getCharWidth('㽠')
+);
+
+/**
+ * Will output:
+ * int(-1)
+ * int(1)
+ * int(2)
+ */</code></pre>
+ <p>This method returns -1 or 0 if the character is not
+ <strong>printable</strong> (for instance, if this is a control character, like
+ <code>0x7f</code> which corresponds to <code>DELETE</code>), 1 or more if this
+ is a character that can be printed. In our example, <code>㽠</code> requires
+ 2 columns to be printed.</p>
+ <p>To get more semantics, we have the
+ <code>Hoa\String\String::isCharPrintable</code> method which allows to know
+ whether a character is printable or not.</p>
+ <p>If we would like to count the number of columns necessary for a whole
+ string, we have to use the <code>Hoa\String\String::getWidth</code> method.
+ Thus:</p>
+ <pre><code class="language-php">var_dump(
+ $french->getWidth(),
+ $arabic->getWidth(),
+ $japanese->getWidth()
+);
+
+/**
+ * Will output:
+ * int(9)
+ * int(4)
+ * int(18)
+ */</code></pre>
+ <p>Try this in your terminal with a <strong>monospaced</strong> font. You will
+ observe that Japanese requires 18 columns to be printed. This measure is very
+ useful if we would like to know the length of a string to position it
+ efficiently.</p>
+ <p>The <code>getCharWidth</code> method is different of <code>getWidth</code>
+ because it includes control characters. This method is intended to be used,
+ for example, with terminals (please, see the
+ <a href="@hack:chapter=Console"><code>Hoa\Console</code> library</a>).</p>
+ <p>Finally, if this time we are not interested by Unicode characters but
+ rather by <strong>machine</strong> characters <code>char</code> (being
+ 1 byte), we have an extra operation. The
+ <code>Hoa\String\String::getBytesLength</code> method will count the
+ <strong>length</strong> of the string in bytes:</p>
+ <pre><code class="language-php">var_dump(
+ $arabic->getBytesLength(),
+ $japanese->getBytesLength()
+);
+
+/**
+ * Will output:
+ * int(8)
+ * int(27)
+ */</code></pre>
+ <p>If we compare these results with the ones of the
+ <code>Hoa\String\String::count</code> method, we understand that the Arabic
+ characters are encoded with 2 bytes whereas Japanese characteres are encoded
+ with 3 bytes. We can also get a specific byte thanks to the
+ <code>Hoa\String\String::getByteAt</code> method. Once again, the index is not
+ bounded.</p>
+
+ <h3 id="Code-point" for="main-toc">Code-point</h3>
+
+ <p>Each character is represented by an integer, called a
+ <strong>code-point</strong>. To get the code-point of a character, we can
+ use the <code>Hoa\String\String::toCode</code> static method, and to get a
+ character based on its code-point, we can use the
+ <code>Hoa\String\String::fromCode</code> static method. We also have the
+ <code>Hoa\String\String::toBinaryCode</code> method which returns the binary
+ representation of a character. Let's take an example:</p>
+ <pre><code class="language-php">var_dump(
+ Hoa\String\String::toCode('Σ'),
+ Hoa\String\String::toBinaryCode('Σ'),
+ Hoa\String\String::fromCode(0x1a9)
+);
+
+/**
+ * Will output:
+ * int(931)
+ * string(32) "1100111010100011"
+ * string(2) "Σ"
+ */</code></pre>
+
+ <h2 id="Search_algorithms" for="main-toc">Search algorithms</h2>
+
+ <p>The <code>Hoa\String</code> library provides sophisticated
+ <strong>search</strong> algorithms on strings through the
+ <code>Hoa\String\Search</code> class.</p>
+ <p>We will study the <code>Hoa\String\Search::approximated</code> algorithm
+ which searches a sub-string in a string up to <strong><em>k</em>
+ differences</strong> (a difference is an addition, a deletion or a
+ modification). Let's take the classical example of a DNA representation: We
+ will search all the sub-strings approximating <code>GATAA</code> with
+ 1 difference (maximum) in <code>CAGATAAGAGAA</code>. So, we will write:</p>
+ <pre><code class="language-php">$x = 'GATAA';
+$y = 'CAGATAAGAGAA';
+$k = 1;
+$search = Hoa\String\Search::approximated($y, $x, $k);
+$n = count($search);
+
+echo 'Try to match ', $x, ' in ', $y, ' with at most ', $k, ' difference(s):', "\n";
+echo $n, ' match(es) found:', "\n";
+
+foreach($search as $position)
+ echo ' • ', substr($y, $position['i'], $position['l'), "\n";
+
+/**
+ * Will output:
+ * Try to match GATAA in CAGATAAGAGAA with at most 1 difference(s):
+ * 4 match(es) found:
+ * • AGATA
+ * • GATAA
+ * • ATAAG
+ * • GAGAA
+ */</code></pre>
+ <p>This methods returns an array of arrays. Each sub-array represents a result
+ and contains three indexes: <code>i</code> for the position of the first
+ character (byte) of the result, <code>j</code> for the position of the last
+ character and <code>l</code> for the length of the result (simply
+ <code>j</code> - <code>i</code>). Thus, we can compute the results by using
+ our initial string (here <code class="language-php">$y</code>) and its
+ indexes.</p>
+ <p>With our example, we have four results. The first is <code>AGATA</code>,
+ being <code>GATA<em>A</em></code> with one moved character, and
+ <code>AGATA</code> exists in <code>C<em>AGATA</em>AGAGAA</code>. The second
+ result is <code>GATAA</code>, our sub-string, which well and truly exists in
+ <code>CA<em>GATAA</em>GAGAA</code>. The third result is <code>ATAAG</code>,
+ being <code><em>G</em>ATAA</code> with one moved character, and
+ <code>ATAAG</code> exists in <code>CAG<em>ATAAG</em>AGAA</code>. Finally, the
+ last result is <code>GAGAA</code>, being <code>GA<em>T</em>AA</code> with one
+ modified character, and <code>GAGAA</code> exists in
+ <code>CAGATAA<em>GAGAA</em></code>.</p>
+ <p>Another example, more concrete this time. We will consider the
+ <code>--testIt --foobar --testThat --testAt</code> string (which represents
+ possible options of a command line), and we will search <code>--testot</code>,
+ an option that should have been given by the user. This option does not exist
+ as it is. We will then use our search algorithm with at most 1 difference.
+ Let's see:</p>
+ <pre><code class="language-php">$x = 'testot';
+$y = '--testIt --foobar --testThat --testAt';
+$k = 1;
+$search = Hoa\String\Search::approximated($y, $x, $k);
+$n = count($search);
+
+// …
+
+/**
+ * Will output:
+ * Try to match testot in --testIt --foobar --testThat --testAt with at most 1 difference(s)
+ * 2 match(es) found:
+ * • testIt
+ * • testAt
+ */</code></pre>
+ <p>The <code>testIt</code> and <code>testAt</code> results are true options,
+ so we can suggest them to the user. This is a mechanism user by
+ <code>Hoa\Console</code> to suggest corrections to the user in case of a
+ mistyping.</p>
+
+ <h2 id="Conclusion" for="main-toc">Conclusion</h2>
+
+ <p>The <code>Hoa\String</code> library provides facilities to manipulate
+ strings encoded with the Unicode format, but also to make sophisticated search
+ on strings.</p>
</yield>
</overlay>
diff --git a/Documentation/Fr/Index.xyl b/Documentation/Fr/Index.xyl
index 2632e03..ca146c5 100644
--- a/Documentation/Fr/Index.xyl
+++ b/Documentation/Fr/Index.xyl
@@ -8,7 +8,7 @@
La bibliothèque <code>Hoa\String</code> propose plusieurs opérations sur des
chaînes de caractères UTF-8.</p>
- <h2 id="Table_des_matieres">Table des matières</h2>
+ <h2 id="Table_of_contents">Table des matières</h2>
<tableofcontents id="main-toc" />
@@ -35,8 +35,7 @@
des opérations de <strong>recherche</strong> sur des chaînes de
caractères.</p>
- <h2 id="Chaine_de_caracteres_Unicode" for="main-toc">Chaîne de caractères
- Unicode</h2>
+ <h2 id="Unicode_strings" for="main-toc">Chaîne de caractères Unicode</h2>
<p>La classe <code>Hoa\String\String</code> représente une chaîne de
caractères Unicode <strong>UTF-8</strong> et permet de la manipuler
@@ -51,8 +50,7 @@ $arabic = new Hoa\String\String('أحبك');
$japanese = new Hoa\String\String('私はあなたを愛して');</code></pre>
<p>Maintenant, voyons les opérations possibles sur ces trois chaînes.</p>
- <h3 id="Manipulation_de_la_chaine" for="main-toc">Manipulation de la
- chaîne</h3>
+ <h3 id="String_manipulation" for="main-toc">Manipulation de la chaîne</h3>
<p>Commençons par les opérations <strong>élémentaires</strong>. Si nous
voulons <strong>compter</strong> le nombre de caractères (et non pas
@@ -208,8 +206,8 @@ echo $title->toAscii();
* Un ete brulant sur la cote
*/</code></pre>
<p>Nous pouvons aussi transformer de l'arabe ou du japonais vers de l'ASCII.
- Les symboles, comme les symboles Mathématique ou les emojis, sont aussi
- transformer :</p>
+ Les symboles, comme les symboles Mathématiques ou les emojis, sont aussi
+ transformés :</p>
<pre><code class="language-php">$emoji = new Hoa\String\String('I ❤ Unicode');
$maths = new Hoa\String\String('∀ i ∈ ℕ');
@@ -225,7 +223,6 @@ echo $arabic->toAscii(), "\n",
* I (heavy black heart)️ Unicode
* (for all) i (element of) N
*/</code></pre>
-
<p>Pour que cette méthode fonctionne correctement, il faut que l'extension
<a href="http://php.net/intl"><code>intl</code></a> soit présente, pour que la
classe <a href="http://php.net/transliterator"><code>Transliterator</code></a>
@@ -327,7 +324,7 @@ echo $arabic->toAscii(), "\n",
* )
*/</code></pre>
- <h3 id="Comparaison_et_recherche" for="main-toc">Comparaison et recherche</h3>
+ <h3 id="Comparison_and_search" for="main-toc">Comparaison et recherche</h3>
<p>Les chaînes peuvent également être <strong>comparées</strong> entre elles
grâce à la méthode <code>Hoa\String\String::compare</code> :</p>
@@ -341,12 +338,12 @@ var_dump(
* string(-1)
*/</code></pre>
<p>Cette méthode retourne -1 si la chaîne initiale vient avant (par ordre
- alphabétique), 0 si elle est identique, et 1 si elle vient après. Si nous
+ alphabétique), 0 si elle est identique et 1 si elle vient après. Si nous
voulons utiliser la pleine
puissance du mécanisme sous-jacent, nous pouvons appeler la méthode statique
<code>Hoa\String\String::getCollator</code> (si la classe
- <a href="http://php.net/Collator"><code>Collator</code></a> existe, auquel
- cas, <code>Hoa\String\String::compare</code> utilisera une comparaison simple
+ <a href="http://php.net/Collator"><code>Collator</code></a> existe, sinon
+ <code>Hoa\String\String::compare</code> utilisera une comparaison simple
octet par octets sans tenir compte d'autres paramètres). Ainsi, si nous
voulons trier un tableau de chaînes, nous écrirons plutôt :</p>
<pre><code class="language-php">$strings = array('c', 'Σ', 'd', 'x', 'α', 'a');
@@ -449,7 +446,7 @@ $string->match(
<h3 id="Characters" for="main-toc">Caractères</h3>
<p>La classe <code>Hoa\String\String</code> offre des méthodes statiques
- travaillant sur un seul caractère Unicode. Nous avons déjà évoquer la méthode
+ travaillant sur un seul caractère Unicode. Nous avons déjà évoqué la méthode
<code>getCharDirection</code> qui permet de connaître la
<strong>direction</strong> d'un caractère. Nous trouvons aussi
<code>getCharWidth</code> qui calcule le <strong>nombre de colonnes</strong>
@@ -492,7 +489,7 @@ $string->match(
<p>Essayez dans un terminal avec une police <strong>mono-espacée</strong>.
Vous verrez que le japonais demande 18 colonnes pour s'afficher. Cette mesure
est très utile si nous voulons connaître la largeur d'une chaîne pour la
- positionner efficacement.</p>
+ positionner correctement.</p>
<p>La méthode <code>getCharWidth</code> est différente de
<code>getWidth</code> car elle prend en compte des caractères de contrôles.
Elle est destinée à être utilisée, par exemple, avec des terminaux (voir
@@ -542,7 +539,7 @@ $string->match(
* string(2) "Σ"
*/</code></pre>
- <h2 id="Algorithmes_de_recherche" for="main-toc">Algorithmes de recherche</h2>
+ <h2 id="Search_algorithms" for="main-toc">Algorithmes de recherche</h2>
<p>La bibliothèque <code>Hoa\String</code> propose des algorithmes de
<strong>recherches</strong> sophistiquées sur les chaînes de caractères à
@@ -624,9 +621,9 @@ $n = count($search);
<h2 id="Conclusion" for="main-toc">Conclusion</h2>
- <p>La classe <code>Hoa\String</code> propose des facilités pour manipuler des
- chaînes encodées au format Unicode, mais aussi pour effectuer des recherches
- sophistiquées sur des chaînes.</p>
+ <p>La bibliothèque <code>Hoa\String</code> propose des facilités pour
+ manipuler des chaînes encodées au format Unicode, mais aussi pour effectuer
+ des recherches sophistiquées sur des chaînes.</p>
</yield>
</overlay>