11/19/2014

esProc Helps with Computation in MongoDB – Sorting in Local Language

MongoDB sorts strings by their Unicode encoding rather than by the collation rules of a particular language (e.g. Chinese). Combined with esProc, however, MongoDB data can easily be sorted in a local language (e.g. Chinese names sorted by the Chinese phonetic alphabet, pinyin). The following explains the method in detail, taking Chinese as an example.

person, a collection in MongoDB, stores names and genders as follows:
> db.person.find()
{ "_id" : ObjectId("544e4e070f03ad39eb2bf498"), "name" : "宋江", "gender" : ""}
{ "_id" : ObjectId("544e4e070f03ad39eb2bf499"), "name" : "李逵", "gender" : ""}
{ "_id" : ObjectId("544e4e070f03ad39eb2bf49a"), "name" : "吴用", "gender" : ""}
{ "_id" : ObjectId("544e4e070f03ad39eb2bf49b"), "name" : "晁盖", "gender" : ""}
{ "_id" : ObjectId("544e4e070f03ad39eb2bf49c"), "name" : "公孙胜", "gender" : "" }
{ "_id" : ObjectId("544e4e070f03ad39eb2bf49d"), "name" : "鲁智深", "gender" : "" }
{ "_id" : ObjectId("544e4e070f03ad39eb2bf49e"), "name" : "武松", "gender" : ""}
{ "_id" : ObjectId("544e4e070f03ad39eb2bf49f"), "name" : "阮小二", "gender" : "" }
{ "_id" : ObjectId("544e4e070f03ad39eb2bf4a0"), "name" : "杨志", "gender" : ""}
{ "_id" : ObjectId("544e4e070f03ad39eb2bf4a1"), "name" : "孙二娘", "gender" : "" }
{ "_id" : ObjectId("544e4e070f03ad39eb2bf4a2"), "name" : "扈三娘", "gender" : "" }
{ "_id" : ObjectId("544e4e080f03ad39eb2bf4a3"), "name" : "燕青", "gender" : ""}
MongoDB's own sort function orders the names by Unicode encoding, not by the Chinese phonetic alphabet:
> db.person.find({},{"name":1,"gender":1,"_id":0}).sort({"name":1})
{ "name" : "公孙胜", "gender" : "" }
{ "name" : "吴用", "gender" : "" }
{ "name" : "孙二娘", "gender" : "" }
{ "name" : "宋江", "gender" : "" }
{ "name" : "扈三娘", "gender" : "" }
{ "name" : "晁盖", "gender" : "" }
{ "name" : "李逵", "gender" : "" }
{ "name" : "杨志", "gender" : "" }
{ "name" : "武松", "gender" : "" }
{ "name" : "燕青", "gender" : "" }
{ "name" : "阮小二", "gender" : "" }
{ "name" : "鲁智深", "gender" : "" }
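The order above is what a plain code-point comparison produces. As a minimal sketch (written in Python for illustration, not part of the original esProc solution), sorting the same names with the default string comparison, which compares Unicode code points, gives the same sequence as the MongoDB query:

```python
# Names from the person collection, in the order they were stored.
names = ["宋江", "李逵", "吴用", "晁盖", "公孙胜", "鲁智深",
         "武松", "阮小二", "杨志", "孙二娘", "扈三娘", "燕青"]

# Python compares str values code point by code point, so a plain sort
# knows nothing about pinyin.
by_code_point = sorted(names)

for name in by_code_point:
    # Print the code point of the first character, which decides the order here.
    print(f"U+{ord(name[0]):04X}  {name}")
```

The first characters are all distinct, so the code point of the first character alone decides each name's position.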


The esProc script that performs the sorting for MongoDB is as follows:

A1: Connect to the MongoDB database. The host and port are localhost:27017, the database name is test, and both the user name and the password are test. If any other parameters are needed, write them in the format mongo://ip:port/db?arg=value&…

A2: Use the find function to fetch data from MongoDB and create a cursor. The collection is person, the filter condition is null, and the specified keys are name and gender. This find function is similar to MongoDB's own find function. Because an esProc cursor fetches and processes data in batches, it avoids the memory overflow that importing a large amount of data at once can cause.

A3: Since the data set here is small, the fetch function retrieves all of it at once.

A4: Use the sort function to sort the data by name in ascending order, with Chinese as the sorting language. The other languages esProc supports are listed below.
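Steps A2 to A4 can be imitated outside esProc to see what each one contributes. The following is only a sketch in Python under stated assumptions: `batches` and `fetch_all` are hypothetical stand-ins for a batch-reading cursor and its fetch, and `PINYIN` is a hand-written romanization table (syllable plus tone number) covering just these twelve names. esProc relies on locale collation data instead, so its ordering of individual characters may differ from this sketch.

```python
from itertools import islice

# Documents as they would arrive from the person collection (gender omitted).
DOCS = [{"name": n} for n in ["宋江", "李逵", "吴用", "晁盖", "公孙胜", "鲁智深",
                              "武松", "阮小二", "杨志", "孙二娘", "扈三娘", "燕青"]]

# Hypothetical lookup table: one (syllable, tone) pair per character.
PINYIN = {
    "宋江": [("song", 4), ("jiang", 1)],
    "李逵": [("li", 3), ("kui", 2)],
    "吴用": [("wu", 2), ("yong", 4)],
    "晁盖": [("chao", 2), ("gai", 4)],
    "公孙胜": [("gong", 1), ("sun", 1), ("sheng", 4)],
    "鲁智深": [("lu", 3), ("zhi", 4), ("shen", 1)],
    "武松": [("wu", 3), ("song", 1)],
    "阮小二": [("ruan", 3), ("xiao", 3), ("er", 4)],
    "杨志": [("yang", 2), ("zhi", 4)],
    "孙二娘": [("sun", 1), ("er", 4), ("niang", 2)],
    "扈三娘": [("hu", 4), ("san", 1), ("niang", 2)],
    "燕青": [("yan", 1), ("qing", 1)],
}


def batches(docs, size):
    """Yield lists of at most `size` documents, the way a cursor reads in batches."""
    iterator = iter(docs)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def fetch_all(cursor):
    """Drain every batch into one list; reasonable only when the data is small."""
    return [doc for batch in cursor for doc in batch]


rows = fetch_all(batches(DOCS, size=5))
# Sort by romanization instead of by code point.
rows.sort(key=lambda doc: PINYIN[doc["name"]])
print([doc["name"] for doc in rows])
```

With this table the names come out in alphabetical order of their romanization, from 晁盖 (chao) to 杨志 (yang), which is the kind of ordering a Chinese reader expects and which the plain MongoDB sort does not give.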

The result of the operation is:

Note that esProc does not ship with MongoDB's Java driver. To access MongoDB from esProc, put the driver (version 2.12.2 or above, e.g. mongo-java-driver-2.12.2.jar) into [esProc installation directory]\common\jdbc beforehand.
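"Version 2.12.2 or above" has to be checked numerically, because a plain text comparison would rank 2.9 above 2.12. A small sketch in Python of such a check follows. The file-name pattern is an assumption taken from the single example name above, and `driver_version` and `is_supported` are hypothetical helper names, not part of esProc or the driver.

```python
import re

MIN_VERSION = (2, 12, 2)  # minimum driver version stated in the text


def driver_version(jar_name):
    """Return the version in a name like mongo-java-driver-2.12.2.jar as a tuple of ints, or None."""
    match = re.fullmatch(r"mongo-java-driver-(\d+(?:\.\d+)*)\.jar", jar_name)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_supported(jar_name):
    """True when the jar name carries a version of at least MIN_VERSION."""
    version = driver_version(jar_name)
    return version is not None and version >= MIN_VERSION


for jar in ["mongo-java-driver-2.12.2.jar", "mongo-java-driver-2.9.3.jar"]:
    print(jar, is_supported(jar))
```

Comparing tuples of integers treats 12 as larger than 9, which is why the version is split into numbers before the comparison.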

The esProc script is easy to integrate with a Java program. Adding one more line of code, A5, which is result A4, outputs the result to the Java program in the form of a result set. For detailed code, please refer to the esProc Tutorial. Likewise, a Java program that accesses MongoDB by calling esProc code must have MongoDB's Java driver on its classpath.

MongoDB's Java driver can be downloaded from the following URL: https://github.com/mongodb/mongo-java-driver/releases

esProc supports the following languages:
Code      Language    Country                 Variant
ja_JP     Japanese    Japan
es_PE     Spanish     Peru
en        English
ja_JP_JP  Japanese    Japan
es_PA     Spanish     Panama
sr_BA     Serbian     Bosnia and Herzegovina
mk        Macedonian
es_GT     Spanish     Guatemala
ar_AE     Arabic      United Arab Emirates
no_NO     Norwegian   Norway
sq_AL     Albanian    Albania
bg        Bulgarian
ar_IQ     Arabic      Iraq
ar_YE     Arabic      Yemen
hu        Hungarian
pt_PT     Portuguese  Portugal
el_CY     Greek       Cyprus
ar_QA     Arabic      Qatar
mk_MK     Macedonian  Macedonia
sv        Swedish
de_CH     German      Switzerland
en_US     English     United States
fi_FI     Finnish     Finland
is        Icelandic
cs        Czech
en_MT     English     Malta
sl_SI     Slovenian   Slovenia
sk_SK     Slovak      Slovakia
it        Italian
tr_TR     Turkish     Turkey
zh        Chinese
th        Thai
ar_SA     Arabic      Saudi Arabia
no        Norwegian
en_GB     English     United Kingdom
sr_CS     Serbian     Serbia and Montenegro
lt        Lithuanian
ro        Romanian
en_NZ     English     New Zealand
no_NO_NY  Norwegian   Norway                  Nynorsk
lt_LT     Lithuanian  Lithuania
es_NI     Spanish     Nicaragua
nl        Dutch
ga_IE     Irish       Ireland
fr_BE     French      Belgium
es_ES     Spanish     Spain
ar_LB     Arabic      Lebanon
ko        Korean
fr_CA     French      Canada
et_EE     Estonian    Estonia
ar_KW     Arabic      Kuwait
sr_RS     Serbian     Serbia
es_US     Spanish     United States
es_MX     Spanish     Mexico
ar_SD     Arabic      Sudan
in_ID     Indonesian  Indonesia
ru        Russian
lv        Latvian
es_UY     Spanish     Uruguay
lv_LV     Latvian     Latvia
iw        Hebrew
pt_BR     Portuguese  Brazil
ar_SY     Arabic      Syria
hr        Croatian
et        Estonian
es_DO     Spanish     Dominican Republic
fr_CH     French      Switzerland
hi_IN     Hindi       India
es_VE     Spanish     Venezuela
ar_BH     Arabic      Bahrain
en_PH     English     Philippines
ar_TN     Arabic      Tunisia
fi        Finnish
de_AT     German      Austria
es        Spanish
nl_NL     Dutch       Netherlands
es_EC     Spanish     Ecuador
zh_TW     Chinese     Taiwan
ar_JO     Arabic      Jordan
be        Belarusian
is_IS     Icelandic   Iceland
es_CO     Spanish     Colombia
es_CR     Spanish     Costa Rica
es_CL     Spanish     Chile
ar_EG     Arabic      Egypt
en_ZA     English     South Africa
th_TH     Thai        Thailand
el_GR     Greek       Greece
it_IT     Italian     Italy
ca        Catalan
hu_HU     Hungarian   Hungary
fr        French
en_IE     English     Ireland
uk_UA     Ukrainian   Ukraine
pl_PL     Polish      Poland
fr_LU     French      Luxembourg
nl_BE     Dutch       Belgium
en_IN     English     India
ca_ES     Catalan     Spain
ar_MA     Arabic      Morocco
es_BO     Spanish     Bolivia
en_AU     English     Australia
sr        Serbian
zh_SG     Chinese     Singapore
pt        Portuguese
uk        Ukrainian
es_SV     Spanish     El Salvador
ru_RU     Russian     Russia
ko_KR     Korean      South Korea
vi        Vietnamese
ar_DZ     Arabic      Algeria
vi_VN     Vietnamese  Vietnam
sr_ME     Serbian     Montenegro
sq        Albanian
ar_LY     Arabic      Libya
ar        Arabic
zh_CN     Chinese     China
be_BY     Belarusian  Belarus
zh_HK     Chinese     Hong Kong
ja        Japanese
iw_IL     Hebrew      Israel
bg_BG     Bulgarian   Bulgaria
in        Indonesian
mt_MT     Maltese     Malta
es_PY     Spanish     Paraguay
sl        Slovenian
fr_FR     French      France
cs_CZ     Czech       Czech Republic
it_CH     Italian     Switzerland
ro_RO     Romanian    Romania
es_PR     Spanish     Puerto Rico
en_CA     English     Canada
de_DE     German      Germany
ga        Irish
de_LU     German      Luxembourg
de        German
es_AR     Spanish     Argentina
sk        Slovak
ms_MY     Malay       Malaysia
hr_HR     Croatian    Croatia
en_SG     English     Singapore
da        Danish
mt        Maltese
pl        Polish
ar_OM     Arabic      Oman
tr        Turkish
th_TH_TH  Thai        Thailand                TH
el        Greek
ms        Malay
sv_SE     Swedish     Sweden
da_DK     Danish      Denmark
es_HN     Spanish     Honduras
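The identifiers in the list follow a visible pattern: a lowercase language code, optionally followed by an uppercase country code and a variant, joined by underscores. As a sketch of that pattern only (how esProc itself interprets the identifier is not shown here, and `LocaleId` and `parse_locale` are hypothetical names), the parts can be separated like this in Python:

```python
from typing import NamedTuple


class LocaleId(NamedTuple):
    language: str
    country: str
    variant: str


def parse_locale(identifier):
    """Split an identifier such as 'zh_CN' or 'no_NO_NY' into its three parts."""
    # Pad with empty strings so that missing parts come back as "".
    parts = identifier.split("_", 2) + ["", ""]
    return LocaleId(parts[0], parts[1], parts[2])


for identifier in ["zh", "zh_CN", "no_NO_NY"]:
    print(identifier, parse_locale(identifier))
```

An identifier with only a language part, such as zh, leaves the country and variant empty.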
