Tokenizer defaults are not friendly to non-English strings. If you enable the tokenizer on content with Arabic strings, you risk losing all of the Arabic letters and ending up with an English-only search index.
To resolve this issue, the following values can be useful for tokenizing long Arabic strings:
Whitespace Characters:
[\p{P}\p{C}\p{Z}\p{S}]
Ignorable Characters:
[\p{M}ـ]
These settings worked well with mixed Arabic-English content, as well as with Arabic-only or English-only content.
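As a quick illustration of how the two classes interact, here is a minimal sketch in Python using the third-party regex module (the standard re module does not understand \p{…} property classes). The two patterns are the ones above; the tokenize helper and the sample string are only illustrative:

import regex  # third-party: pip install regex -- unlike re, it supports \p{...} classes

# The two patterns from the settings above.
IGNORABLE = regex.compile(r"[\p{M}ـ]")                   # combining marks + Arabic tatweel
WHITESPACE = regex.compile(r"[\p{P}\p{C}\p{Z}\p{S}]+")   # punctuation, control, separators, symbols

def tokenize(text):
    # Delete the ignorable characters first, then split on the whitespace class.
    stripped = IGNORABLE.sub("", text)
    return [token for token in WHITESPACE.split(stripped) if token]

# Mixed Arabic-English content keeps both scripts, with diacritics stripped:
print(tokenize("اَلْعَرَبِيَّة is Arabic, hello world"))
# -> ['العربية', 'is', 'Arabic', 'hello', 'world']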
The only problem found so far with these settings is that the decimal point gets stripped, since "." falls under \p{P}; a string like "1234.4567" becomes "1234 4567".
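If decimal numbers matter for your content, one possible workaround (my own assumption, not part of the settings above, and only usable if your tokenizer accepts lookarounds) is to exclude a period that sits between two digits from the whitespace class:

import regex  # third-party: pip install regex

# Hypothetical variant: match the whitespace classes, except a "." between two digits.
WHITESPACE_KEEP_DECIMALS = regex.compile(
    r"(?:(?!(?<=\d)\.(?=\d))[\p{P}\p{C}\p{Z}\p{S}])+"
)

print(WHITESPACE_KEEP_DECIMALS.split("total: 1234.4567 items"))
# -> ['total', '1234.4567', 'items']

Whether this is worth doing depends on how numbers are searched in your index.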
These settings use Unicode character properties: \p{P} matches punctuation, \p{C} control and other invisible characters, \p{Z} separators such as spaces, \p{S} symbols, and \p{M} combining marks (including Arabic diacritics); ـ is the Arabic tatweel (U+0640), a decorative stretching character.