author Ivan Enderlin <ivan.enderlin@hoa-project.net> 2015-01-08 18:15:49 +0100
committer Ivan Enderlin <ivan.enderlin@hoa-project.net> 2015-01-09 21:28:14 +0100
commit 6d7fbc73e4db46bf28c19bf00d032cb13c469ace
tree 091f0baab8acfb4223b258a88d14b4b1aa358c31 /Documentation
parent 16860c720979e169d90301d964c12aa890ee2a50
Translate #String_manipulation.
Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/En/Index.xyl 282
1 files changed, 282 insertions, 0 deletions
diff --git a/Documentation/En/Index.xyl b/Documentation/En/Index.xyl
index 2cc7090..4cef07e 100644
--- a/Documentation/En/Index.xyl
+++ b/Documentation/En/Index.xyl
@@ -32,5 +32,287 @@
also provides some advanced algorithms to perform <strong>search</strong>
operations on strings.</p>
+ <h2 id="Unicode_strings" for="main-toc">Unicode strings</h2>
+
+ <p>The <code>Hoa\String\String</code> class represents a
+ <strong>UTF-8</strong> Unicode string and makes it easy to manipulate.
+ This class implements the
+ <a href="http://php.net/arrayaccess"><code>ArrayAccess</code></a>,
+ <a href="http://php.net/countable"><code>Countable</code></a> and
+ <a href="http://php.net/iteratoraggregate"><code>IteratorAggregate</code></a>
+ interfaces. We are going to use three examples in three different languages:
+ French, Arabic and Japanese. Thus:</p>
+ <pre><code class="language-php">$french = new Hoa\String\String('Je t\'aime');
+$arabic = new Hoa\String\String('أحبك');
+$japanese = new Hoa\String\String('私はあなたを愛して');</code></pre>
+ <p>Now, let's see what we can do with these three strings.</p>
+
+ <h3 id="String_manipulation" for="main-toc">String manipulation</h3>
+
+ <p>Let's start with <strong>elementary</strong> operations. To
+ <strong>count</strong> the number of characters (not bytes), we use the
+ <a href="http://php.net/count"><code>count</code> function</a>. Thus:</p>
+ <pre><code class="language-php">var_dump(
+ count($french),
+ count($arabic),
+ count($japanese)
+);
+
+/**
+ * Will output:
+ * int(9)
+ * int(4)
+ * int(9)
+ */</code></pre>
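+ <p>To see why characters and bytes differ, here is a small sketch in plain
+ PHP, independent of <code>Hoa\String\String</code>: <code>strlen</code>
+ counts bytes, whereas counting the matches of “any character” with the
+ <code>u</code> (UTF-8) flag of PCRE counts code points.</p>
+ <pre><code class="language-php">foreach (['Je t\'aime', 'أحبك', '私はあなたを愛して'] as $string) {
+    // strlen() counts bytes; preg_match_all() returns the number of matches,
+    // and with the u flag the dot matches one whole UTF-8 character.
+    echo strlen($string), ' bytes, ', preg_match_all('#.#us', $string), " characters\n";
+}
+
+/**
+ * Will output:
+ *     9 bytes, 9 characters
+ *     8 bytes, 4 characters
+ *     27 bytes, 9 characters
+ */</code></pre>
+ <p>With the <code>mbstring</code> extension,
+ <code>mb_strlen($string, 'UTF-8')</code> gives the same character count.</p>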
+ <p>When talking about positions in a text, it is not appropriate to speak
+ of right or left, but rather of a <strong>beginning</strong> and an
+ <strong>end</strong>, which depend on the <strong>direction</strong> of
+ writing. We can get this direction with the
+ <code>Hoa\String\String::getDirection</code> method. It returns the value of
+ one of the following constants:</p>
+ <ul>
+ <li><code>Hoa\String\String::LTR</code>, for left-to-right, if the text is
+ written from the left to the right,</li>
+ <li><code>Hoa\String\String::RTL</code>, for right-to-left, if the text is
+ written from the right to the left.</li>
+ </ul>
+ <p>Let's observe the result with our examples:</p>
+ <pre><code class="language-php">var_dump(
+ $french->getDirection() === Hoa\String\String::LTR, // is left-to-right?
+ $arabic->getDirection() === Hoa\String\String::RTL, // is right-to-left?
+ $japanese->getDirection() === Hoa\String\String::LTR // is left-to-right?
+);
+
+/**
+ * Will output:
+ * bool(true)
+ * bool(true)
+ * bool(true)
+ */</code></pre>
+ <p>This method relies on the
+ <code>Hoa\String\String::getCharDirection</code> static method, which
+ computes the direction of a single character.</p>
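+ <p>As a rough illustration of what a direction check can look like, the
+ hypothetical <code>guessDirection</code> function below only looks at the
+ first letter of a string and treats the Arabic and Hebrew scripts as
+ right-to-left. This is a simplification assumed for the example: a complete
+ implementation would rely on the Unicode <em>Bidi_Class</em> property of each
+ character, and the sketch is not meant to reproduce the
+ <code>Hoa\String\String</code> implementation.</p>
+ <pre><code class="language-php">// Hypothetical helper, a simplified sketch: only the first letter (\p{L})
+// is inspected, and only the Arabic and Hebrew scripts count as RTL.
+function guessDirection($string)
+{
+    if (1 !== preg_match('#\p{L}#u', $string, $match)) {
+        return 'ltr'; // no letter found: default to left-to-right
+    }
+
+    return 1 === preg_match('#\p{Arabic}|\p{Hebrew}#u', $match[0]) ? 'rtl' : 'ltr';
+}
+
+var_dump(
+    guessDirection('Je t\'aime'),
+    guessDirection('أحبك'),
+    guessDirection('私はあなたを愛して')
+);
+
+/**
+ * Will output:
+ *     string(3) "ltr"
+ *     string(3) "rtl"
+ *     string(3) "ltr"
+ */</code></pre>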
+ <p>To <strong>concatenate</strong> another string to the end or to the
+ beginning, we respectively use the
+ <code>Hoa\String\String::append</code> and
+ <code>Hoa\String\String::prepend</code> methods. Like most of the methods
+ that modify the string, they return the object itself so that calls can be
+ chained. For instance:</p>
+ <pre><code class="language-php">echo $french->append('… et toi, m\'aimes-tu ?')->prepend('Mam\'zelle ! ');
+
+/**
+ * Will output:
+ * Mam'zelle ! Je t'aime… et toi, m'aimes-tu ?
+ */</code></pre>
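+ <p>Chaining works because each of these methods returns <code>$this</code>.
+ The hypothetical <code>FluentString</code> class below is a minimal sketch of
+ this pattern (often called a fluent interface), not the actual implementation
+ of <code>Hoa\String\String</code>.</p>
+ <pre><code class="language-php">// Hypothetical class: each mutator updates the internal string and returns
+// $this, which is what makes the calls chainable.
+class FluentString
+{
+    private $string;
+
+    public function __construct($string)
+    {
+        $this->string = $string;
+    }
+
+    public function append($substring)
+    {
+        $this->string .= $substring;
+
+        return $this;
+    }
+
+    public function prepend($substring)
+    {
+        $this->string = $substring . $this->string;
+
+        return $this;
+    }
+
+    public function __toString()
+    {
+        return $this->string;
+    }
+}
+
+echo (new FluentString('Je t\'aime'))->append(' !')->prepend('Mam\'zelle, ');
+
+/**
+ * Will output:
+ *     Mam'zelle, Je t'aime !
+ */</code></pre>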
+ <p>We also have the <code>Hoa\String\String::toLowerCase</code> and
+ <code>Hoa\String\String::toUpperCase</code> methods to, respectively, set the
+ case of the string to lower or upper. For instance:</p>
+ <pre><code class="language-php">echo $french->toUpperCase();
+
+/**
+ * Will output:
+ * MAM'ZELLE ! JE T'AIME… ET TOI, M'AIMES-TU ?
+ */</code></pre>
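+ <p>Case conversion is another place where bytes and characters differ. In
+ plain PHP, <code>strtoupper</code> works byte by byte and does not recognize
+ accented letters encoded on several bytes, whereas
+ <code>mb_strtoupper</code>, from the <code>mbstring</code> extension (assumed
+ to be installed here), works on characters:</p>
+ <pre><code class="language-php">var_dump(
+    strtoupper('été'),            // the two-byte é is left untouched
+    mb_strtoupper('été', 'UTF-8') // é is converted to É
+);
+
+/**
+ * Will output:
+ *     string(5) "éTé"
+ *     string(5) "ÉTÉ"
+ */</code></pre>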
+ <p>We can also add characters to the beginning or to the end of the string to
+ reach a <strong>minimum</strong> length. This operation is frequently called
+ <em>padding</em> (for historical reasons dating back to typewriters). This is
+ the purpose of the <code>Hoa\String\String::pad</code> method, which takes
+ three arguments: the minimum length, the characters to add and a constant
+ indicating whether to add them at the end or at the beginning of the string
+ (respectively <code>Hoa\String\String::END</code>, the default, and
+ <code>Hoa\String\String::BEGINNING</code>).</p>
+ <pre><code class="language-php">echo $arabic->pad(20, ' ');
+
+/**
+ * Will output:
+ * أحبك
+ */</code></pre>
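+ <p>The minimum length is a number of characters, not of bytes. PHP's
+ <code>str_pad</code> counts bytes: padding our 4-character (8-byte) Arabic
+ string to 20 with it only adds 12 spaces. The hypothetical
+ <code>padEnd</code> helper below sketches a character-aware padding at the
+ end of the string, assuming a single padding character:</p>
+ <pre><code class="language-php">// Hypothetical helper: pads at the end up to $length characters (code
+// points). $char is assumed to be exactly one character.
+function padEnd($string, $length, $char = ' ')
+{
+    $missing = $length - preg_match_all('#.#us', $string);
+
+    return $missing > 0 ? $string . str_repeat($char, $missing) : $string;
+}
+
+var_dump(
+    preg_match_all('#.#us', str_pad('أحبك', 20)), // byte-based padding
+    preg_match_all('#.#us', padEnd('أحبك', 20))   // character-based padding
+);
+
+/**
+ * Will output:
+ *     int(16)
+ *     int(20)
+ */</code></pre>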
+ <p>A similar operation, the <code>Hoa\String\String::trim</code> method,
+ removes <strong>spaces</strong> (by default) at the beginning and at the end
+ of the string. For example, to retrieve our original Arabic string:</p>
+ <pre><code class="language-php">echo $arabic->trim();
+
+/**
+ * Will output:
+ * أحبك
+ */</code></pre>
+ <p>To remove other characters, we can pass a regular expression as its first
+ argument. Its second argument specifies the side where characters are
+ removed: at the beginning, at the end or both, still by using the
+ <code>Hoa\String\String::BEGINNING</code> and
+ <code>Hoa\String\String::END</code> constants. These constants can be
+ combined to express “both sides”, which is the default value:
+ <code class="language-php">Hoa\String\String::BEGINNING |
+ Hoa\String\String::END</code>. For example, to remove all the digits and
+ spaces only at the end, we write:</p>
+ <pre><code class="language-php">$arabic->trim('\s|\d', Hoa\String\String::END);</code></pre>
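+ <p>In plain PHP, the same end-only trimming can be sketched with
+ <code>preg_replace</code>, anchoring the pattern at the very end of the
+ string with <code>\z</code>; the leading digits are kept. This is only an
+ equivalent written for illustration, not the code of
+ <code>Hoa\String\String::trim</code>:</p>
+ <pre><code class="language-php">// Remove any run of whitespace or digits at the very end of the string only:
+// \z anchors at the absolute end, and the u flag makes PCRE work on UTF-8.
+echo preg_replace('#(?:\s|\d)+\z#u', '', '12 أحبك 2015  ');
+
+/**
+ * Will output:
+ *     12 أحبك
+ */</code></pre>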
+ <p>We can also <strong>reduce</strong> the string to a
+ <strong>sub-string</strong> by passing the position of the first character
+ and the length of the sub-string to the
+ <code>Hoa\String\String::reduce</code> method. Starting again from the
+ original definition of <code>$french</code>:</p>
+ <pre><code class="language-php">echo $french->reduce(3, 6)->reduce(2, 4);
+
+/**
+ * Will output:
+ * aime
+ */</code></pre>
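+ <p>Because positions and lengths are expressed in characters,
+ <code>substr</code>, which works on bytes, is not suitable here. A minimal
+ sketch with the hypothetical <code>reduceChars</code> helper splits the string
+ into code points first (<code>mb_substr</code> from the
+ <code>mbstring</code> extension would give the same result):</p>
+ <pre><code class="language-php">// Hypothetical helper: sub-string by character position and length.
+function reduceChars($string, $start, $length)
+{
+    // Split into an array of UTF-8 characters (code points).
+    $chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
+
+    return implode('', array_slice($chars, $start, $length));
+}
+
+echo reduceChars(reduceChars('Je t\'aime', 3, 6), 2, 4);
+
+/**
+ * Will output:
+ *     aime
+ */</code></pre>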
+ <p>If we would like to get a specific character, we can rely on the
+ <code>ArrayAccess</code> interface. For instance, to get the first character
+ of each of our examples (from their original definitions):</p>
+ <pre><code class="language-php">var_dump(
+ $french[0],
+ $arabic[0],
+ $japanese[0]
+);
+
+/**
+ * Will output:
+ * string(1) "J"
+ * string(2) "أ"
+ * string(3) "私"
+ */</code></pre>
+ <p>To get the last character, we use the -1 index. The index is not bounded
+ by the length of the string: if it exceeds this length, a <em>modulo</em> is
+ applied.</p>
+ <p>The same interface also lets us modify or remove a specific character. For
+ example:</p>
+ <pre><code class="language-php">$french->append(' ?');
+$french[-1] = '!';
+echo $french;
+
+/**
+ * Will output:
+ * Je t'aime !
+ */</code></pre>
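+ <p>The modulo mentioned above can be sketched with the hypothetical
+ <code>normaliseIndex</code> helper. A second modulo is needed because PHP's
+ <code>%</code> operator keeps the sign of its left operand
+ (<code>-1 % 9</code> is <code>-1</code>). The same computation gives the
+ position to overwrite when assigning a character:</p>
+ <pre><code class="language-php">// Hypothetical helper: maps any integer index (negative or too large) to a
+// position between 0 and $count - 1. $count must be strictly positive.
+function normaliseIndex($index, $count)
+{
+    return (($index % $count) + $count) % $count;
+}
+
+$chars = preg_split('//u', 'Je t\'aime ?', -1, PREG_SPLIT_NO_EMPTY);
+$chars[normaliseIndex(-1, count($chars))] = '!'; // replace the last character
+echo implode('', $chars);
+
+/**
+ * Will output:
+ *     Je t'aime !
+ */</code></pre>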
+ <p>Another very useful method is the transformation to
+ <strong>ASCII</strong>. Be careful: this is not always possible, depending on
+ your settings. For example:</p>
+ <pre><code class="language-php">$title = new Hoa\String\String('Un été brûlant sur la côte');
+echo $title->toAscii();
+
+/**
+ * Will output:
+ * Un ete brulant sur la cote
+ */</code></pre>
+ <p>We can also transform Arabic or Japanese to ASCII. Symbols, like
+ mathematical symbols or emojis, are also transformed:</p>
+ <pre><code class="language-php">$emoji = new Hoa\String\String('I ❤ Unicode');
+$maths = new Hoa\String\String('∀ i ∈ ℕ');
+
+echo $arabic->toAscii(), "\n",
+ $japanese->toAscii(), "\n",
+ $emoji->toAscii(), "\n",
+ $maths->toAscii(), "\n";
+
+/**
+ * Will output:
+ * ahbk
+ * sihaanatawo aishite
+ * I (heavy black heart)️ Unicode
+ * (for all) i (element of) N
+ */</code></pre>
+ <p>For this method to work correctly, the
+ <a href="http://php.net/intl"><code>intl</code></a> extension needs to be
+ present, so that the
+ <a href="http://php.net/transliterator"><code>Transliterator</code></a> class
+ is available. If it does not exist, the
+ <a href="http://php.net/normalizer"><code>Normalizer</code></a> class must
+ exist. If this class does not exist either, the
+ <code>Hoa\String\String::toAscii</code> method can still try a transformation,
+ but it is less efficient. To activate this last solution, <code>true</code>
+ must be passed as its single argument. This <em lang="fr">tour de force</em> is
+ not recommended in most cases.</p>
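+ <p>The fallback order described above can be sketched as follows. The
+ <code>toAsciiSketch</code> function is hypothetical and only illustrates the
+ idea with the <code>intl</code> extension: ICU's
+ <code>Any-Latin; Latin-ASCII</code> transform when <code>Transliterator</code>
+ exists, otherwise a canonical decomposition with <code>Normalizer</code>
+ followed by the removal of combining marks. The second branch only removes
+ accents and cannot transliterate Arabic or Japanese. It is not the code of
+ <code>Hoa\String\String::toAscii</code>.</p>
+ <pre><code class="language-php">function toAsciiSketch($string)
+{
+    if (class_exists('Transliterator')) {
+        // ICU transform: any script to Latin, then Latin to plain ASCII.
+        return transliterator_transliterate('Any-Latin; Latin-ASCII', $string);
+    }
+
+    if (class_exists('Normalizer')) {
+        // Decompose (é becomes e followed by a combining accent), then drop
+        // the combining marks (\p{Mn}). Only accents are handled here.
+        $decomposed = Normalizer::normalize($string, Normalizer::FORM_D);
+
+        return preg_replace('#\p{Mn}#u', '', $decomposed);
+    }
+
+    return $string; // without intl, the string is returned unchanged
+}
+
+echo toAsciiSketch('Un été brûlant sur la côte');
+
+/**
+ * Will output (with the intl extension):
+ *     Un ete brulant sur la cote
+ */</code></pre>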
+ <p>We also find the <code>getTransliterator</code> method, which returns a
+ <code>Transliterator</code> object, or <code>null</code> if this class does
+ not exist. This method takes a transliteration identifier as argument. We
+ suggest <a href="http://userguide.icu-project.org/transforms/general">reading
+ the documentation about the ICU transliterator</a> to understand this
+ identifier. The <code>transliterate</code> method transliterates the current
+ string based on an identifier, a beginning index and an end index. It works
+ the same way as the
+ <a href="http://php.net/transliterator.transliterate"><code>Transliterator::transliterate</code></a>
+ method.</p>
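+ <p>For reference, here is the underlying <code>intl</code> API used directly
+ (assuming the extension is installed): <code>Transliterator::create</code>
+ builds a transliterator from an ICU identifier, and its
+ <code>transliterate</code> method accepts a beginning and an end index,
+ counted in UTF-16 code units, the end being excluded:</p>
+ <pre><code class="language-php">// Requires the intl extension (Transliterator is part of it).
+if (class_exists('Transliterator')) {
+    $transliterator = Transliterator::create('Latin-ASCII');
+
+    echo $transliterator->transliterate('été'), "\n",       // the whole string
+         $transliterator->transliterate('été', 0, 1), "\n"; // only the first é
+}
+
+/**
+ * Will output:
+ *     ete
+ *     eté
+ */</code></pre>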
+ <p>More generally, to change the <strong>encoding</strong> format, we can use
+ the <code>Hoa\String\String::transcode</code> static method, with a string as
+ first argument, the original encoding format as second argument and the
+ expected encoding format as third argument (UTF-8 by default). To get the
+ list of encoding formats, we can refer to the
+ <a href="http://php.net/iconv"><code>iconv</code></a> extension or run the
+ following command in a terminal:</p>
+ <pre><code class="language-shell">$ iconv --list</code></pre>
+ <p>To know if a string is encoded in UTF-8, we can use the
+ <code>Hoa\String\String::isUtf8</code> static method; for instance:</p>
+ <pre><code class="language-php">var_dump(
+ Hoa\String\String::isUtf8('a'),
+ Hoa\String\String::isUtf8(Hoa\String\String::transcode('a', 'UTF-8', 'UTF-16'))
+);
+
+/**
+ * Will output:
+ * bool(true)
+ * bool(false)
+ */</code></pre>
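+ <p>Outside of <code>Hoa\String\String</code>, both operations can be
+ sketched with PHP's own functions: <code>iconv</code> (assuming the
+ <code>iconv</code> extension) to transcode, and the <code>u</code> flag of
+ PCRE to check validity, since matching fails on a subject that is not valid
+ UTF-8. The <code>isValidUtf8</code> function is hypothetical:</p>
+ <pre><code class="language-php">// Hypothetical helper: preg_match() returns false (not 1) when the subject
+// is not valid UTF-8 and the u flag is set.
+function isValidUtf8($string)
+{
+    return 1 === preg_match('##u', $string);
+}
+
+$latin1 = "\xE9t\xE9";                           // “été” encoded in ISO-8859-1
+$utf8   = iconv('ISO-8859-1', 'UTF-8', $latin1); // “été” encoded in UTF-8
+
+var_dump(
+    isValidUtf8($latin1),
+    isValidUtf8($utf8)
+);
+
+/**
+ * Will output:
+ *     bool(false)
+ *     bool(true)
+ */</code></pre>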
+ <p>We can <strong>split</strong> the string into several sub-strings by using
+ the <code>Hoa\String\String::split</code> method. Its first argument is a
+ regular expression (<a href="http://pcre.org/">PCRE</a> syntax), the second
+ is an integer representing the maximum number of elements to return, and the
+ last is a combination of constants. These constants are the same as the ones
+ of <a href="http://php.net/preg_split"><code>preg_split</code></a>.</p>
+ <p>By default, the second argument is set to -1, which means no limit, and the
+ last argument is set to <code>PREG_SPLIT_NO_EMPTY</code>. Thus, to get all
+ the words of a string, we write:</p>
+ <pre><code class="language-php">print_r($title->split('#\b|\s#'));
+
+/**
+ * Will output:
+ * Array
+ * (
+ * [0] => Un
+ * [1] => ete
+ * [2] => brulant
+ * [3] => sur
+ * [4] => la
+ * [5] => cote
+ * )
+ */</code></pre>
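+ <p>For comparison, here is the same split with PHP's <code>preg_split</code>
+ directly. It shows why <code>PREG_SPLIT_NO_EMPTY</code> is a sensible
+ default: the zero-width <code>\b</code> matches produce empty pieces, for
+ instance at the beginning of the string, which this flag drops:</p>
+ <pre><code class="language-php">$sentence = 'Un ete brulant sur la cote';
+
+$words = preg_split('#\b|\s#', $sentence, -1, PREG_SPLIT_NO_EMPTY);
+$raw   = preg_split('#\b|\s#', $sentence); // keeps the empty pieces
+
+echo implode(',', $words), "\n";
+var_dump(in_array('', $raw, true));
+
+/**
+ * Will output:
+ *     Un,ete,brulant,sur,la,cote
+ *     bool(true)
+ */</code></pre>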
+ <p>To <strong>iterate</strong> over all the <strong>characters</strong>, it
+ is recommended to use the <code>IteratorAggregate</code> interface, that is,
+ the <code>Hoa\String\String::getIterator</code> method. Let's see on the
+ Arabic example:</p>
+ <pre><code class="language-php">foreach($arabic as $letter)
+ echo $letter, "\n";
+
+/**
+ * Will output:
+ * أ
+ * ح
+ * ب
+ * ك
+ */</code></pre>
+ <p>Notice that the iteration follows the text direction: the first element
+ of the iteration is the first letter of the string, starting from its
+ beginning (for Arabic, the right-most letter once displayed).</p>
+ <p>Of course, if we would like to get an array of characters, we can use the
+ <a href="http://php.net/iterator_to_array"><code>iterator_to_array</code></a>
+ PHP function:</p>
+ <pre><code class="language-php">print_r(iterator_to_array($arabic));
+
+/**
+ * Will output:
+ * Array
+ * (
+ * [0] => أ
+ * [1] => ح
+ * [2] => ب
+ * [3] => ك
+ * )
+ */</code></pre>
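+ <p>To show what <code>IteratorAggregate</code> asks for, here is a minimal
+ hypothetical <code>CharacterString</code> class (not
+ <code>Hoa\String\String</code>): <code>getIterator</code> only has to return
+ a <code>Traversable</code>, here an <code>ArrayIterator</code> over the
+ characters, and both <code>foreach</code> and
+ <code>iterator_to_array</code> rely on it:</p>
+ <pre><code class="language-php">// Hypothetical minimal class exposing characters (code points), not bytes.
+class CharacterString implements IteratorAggregate, Countable
+{
+    private $characters;
+
+    public function __construct($string)
+    {
+        $this->characters = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
+    }
+
+    public function getIterator(): Iterator
+    {
+        return new ArrayIterator($this->characters);
+    }
+
+    public function count(): int
+    {
+        return count($this->characters);
+    }
+}
+
+$word = new CharacterString('أحبك');
+
+var_dump(count($word));            // uses Countable::count()
+print_r(iterator_to_array($word)); // uses IteratorAggregate::getIterator()</code></pre>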
</yield>
</overlay>