UTF-8: a character encoding
Unicode supports virtually every writing system in use. The best way to encode Unicode characters is UTF-8. It is compatible with ASCII, resistant to data corruption, efficient, and easy to process. But first things first.
ifomu lokufaka ikhowudi
Ezamakhompiyutha usebenza izinto nje kuphela njengoba izinombolo abstract zezibalo, kanye nenhlanganisela amayunithi wesitoreji nokusingatha fixed-usayizi idatha - ibhayithi futhi amazwi 32-bhithi. Ifaka ikhodi ejwayelekile kumelwe athathe lesi uma sinquma indlela ukwethula inani lezinhlamvu.
Lwezinhlelo zama-computer, i-integers egcinwe kumemori amaseli 8 izingcezu (1 ibhayithi), 16 noma 32 izingcezu. ifomu ngamunye uchaza Unicode ukuqonda, okuyinto ukulandelana inkumbulo cell inenombolo elihambisana uphawu oluthile. Esikhathini ejwayelekile kukhona izinhlobo ezintathu ezahlukene esephepheni ze-Unicode 8, 16 no-32-bit amabhlogo. Ngakho, baziwa ngokuthi UTF-8, UTF-16 futhi UTF-32. Igama UTF imele Unicode Uguquko Ifomethi. Ngamunye amafomu ezintathu ngamakhodi izindlela ilingana ukumelwa Unicode uhlamvu kunezinzuzo zokusebenza ezihlukile.
Idatha sobhalomfihlo kungenziwa isetshenziselwa ukumelela lonke izinhlamvu Unicode ejwayelekile. Ngakho, ababona nje ehambisanayo ngokugcwele izixazululo yezizathu ezihlukahlukene, usebenzisa izindlela ezihlukene zokubiza esephepheni. esephepheni ngalinye lingaba nangokunembile ibe kunoma yiziphi ezinye ezimbili ngaphandle kokulahleka kwedatha.
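A quick way to see this interchangeability is a round trip through Python's built-in codecs. The sketch below uses only the standard `str.encode` and `bytes.decode` methods; the sample text is arbitrary, and the `-le` codec names are chosen so that no BOM is added.

```python
# Round trip: UTF-8 -> UTF-16 -> UTF-32 -> UTF-8.
# Every form covers all of Unicode, so nothing is lost on the way.
text = "ASCII, Кириллица, 漢字, 😀"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")  # 16-bit code units, little-endian, no BOM
utf32 = text.encode("utf-32-le")  # 32-bit code units, little-endian, no BOM

via16 = utf8.decode("utf-8").encode("utf-16-le")
via32 = via16.decode("utf-16-le").encode("utf-32-le")
back8 = via32.decode("utf-32-le").encode("utf-8")

assert (via16, via32, back8) == (utf16, utf32, utf8)
print(len(utf8), len(utf16), len(utf32))  # bytes used by each form
# 39 48 92
```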
Non-overlap principle
Each Unicode encoding form was designed with non-overlap in mind. Contrast this with Windows-932 (Microsoft's variant of Shift-JIS), which represents characters with one or two bytes. The length of a sequence depends on its first byte, and the values of the lead bytes of two-byte sequences are disjoint from the values of single bytes. However, a single byte and the trailing byte of a two-byte sequence can have the same value. This means, for example, that a search for the character "D" (0x44) can mistakenly match the second byte of the two-byte sequence for the Cyrillic character "Д" (0x84 0x44). To determine which interpretation is correct, a program has to take the preceding bytes into account.
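The false match is easy to reproduce with Python's built-in `cp932` codec (Python's name for Windows-932). This is only a demonstration; the characters are the ones from the example above, and the UTF-8 comparison is added for contrast.

```python
# In Windows-932, Cyrillic "Д" is the two-byte sequence 0x84 0x44,
# and its trailing byte 0x44 has the same value as the single byte for "D".
sjis = "Д".encode("cp932")
print(sjis.hex(" "))    # 84 44
print(sjis.find(b"D"))  # 1: a naive byte search "finds" D inside Д

# In UTF-8, "Д" is 0xD0 0x94: neither byte can be mistaken for ASCII "D".
utf8 = "Д".encode("utf-8")
print(utf8.hex(" "))    # d0 94
print(utf8.find(b"D"))  # -1: no false match
```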
The situation is even more complicated when the values of lead bytes and trailing bytes overlap. Then, to resolve the ambiguity, a program must scan backward until it reaches the start of the text or an unambiguous byte sequence. This is not only inefficient but also fragile, since a single incorrect byte can make the whole text unreadable.
The Unicode transformation formats avoid this problem because the values of lead units, trailing units, and single code units never overlap. This guarantees that searching and comparing Unicode text never produces false matches caused by parts of different character codes coinciding. The fact that these encoding forms follow the non-overlap principle distinguishes them from other East Asian multi-byte encodings.
Another consequence of non-overlapping Unicode encodings is that every character has a clear, unambiguous boundary. This removes the need to scan back through an indefinite number of preceding characters. This property is sometimes called self-synchronization. Corrupting a code unit damages only one character; the surrounding characters remain intact. In the 8-bit transformation format, if a pointer points to a byte of the form 10xxxxxx (binary), finding the start of the character takes one to three steps backward.
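The backward search can be sketched in a few lines of Python. The helper name `char_start` is hypothetical; it assumes well-formed UTF-8 and simply steps back over bytes of the form 10xxxxxx, never more than three times.

```python
def char_start(data: bytes, pos: int) -> int:
    """Index of the first byte of the UTF-8 character that contains data[pos].

    Continuation bytes look like 10xxxxxx, i.e. (b & 0xC0) == 0x80. In
    well-formed UTF-8 at most three of them follow a lead byte, so the loop
    steps back at most three times.
    """
    start = pos
    while start > 0 and pos - start < 3 and (data[start] & 0xC0) == 0x80:
        start -= 1
    return start


data = "aД€😀".encode("utf-8")  # 1 + 2 + 3 + 4 bytes
print([char_start(data, i) for i in range(len(data))])
# [0, 1, 1, 3, 3, 3, 6, 6, 6, 6]

# Drop one byte from "€": only that character is lost; its neighbours survive.
damaged = data[:4] + data[5:]
repaired = damaged.decode("utf-8", errors="replace")
print(repaired)
```

The decoder resynchronizes at the next lead byte, which is why "a", "Д" and "😀" come through unchanged.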
Consistency
The Unicode Consortium fully supports all three encoding forms. It is important not to equate UTF-8 with Unicode: all of the transformation formats are equally valid ways of representing Unicode code points.
Byte orientation
Representing a character in UTF-32 takes one 32-bit code unit, whose value equals the character's Unicode code point. UTF-16 uses one or two 16-bit units. UTF-8 uses from one to four bytes.
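The unit counts can be checked directly. In this sketch the helper `code_units` is a hypothetical name; it divides the byte length of each encoding by the size of that form's code unit.

```python
def code_units(ch: str) -> tuple[int, int, int]:
    """Code units needed for one character in UTF-8, UTF-16 and UTF-32."""
    return (
        len(ch.encode("utf-8")),           # 8-bit units, i.e. bytes
        len(ch.encode("utf-16-le")) // 2,  # 16-bit units ("-le" adds no BOM)
        len(ch.encode("utf-32-le")) // 4,  # 32-bit units ("-le" adds no BOM)
    )


for ch in ["A", "Д", "€", "😀"]:
    print(f"U+{ord(ch):04X}", code_units(ch))
# U+0041 (1, 1, 1)
# U+0414 (2, 1, 1)
# U+20AC (3, 1, 1)
# U+1F600 (4, 2, 1)
```

Only characters above U+FFFF, such as the emoji, need two UTF-16 units (a surrogate pair).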
UTF-8 was designed to be compatible with byte-oriented, ASCII-based systems. Much existing software and long-standing IT practice relies on representing characters as sequences of bytes. Many protocols depend on ASCII codes keeping their meaning and on using or avoiding particular control characters. The simplest way to adapt such systems to Unicode is an 8-bit encoding form in which every Unicode character that has an ASCII equivalent, whether a printable character or a control character, is represented exactly as in ASCII. UTF-8 is precisely such an encoding.
Variable length
UTF-8 is a variable-length encoding form built from 8-bit storage units whose high-order bits indicate which part of a sequence each byte belongs to. One range of values is reserved for the first byte of a code sequence and another for the bytes that follow. This is what makes the encoding non-overlapping.
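The role of each byte can be read off its high-order bits alone. The function below is a minimal sketch (its name, `byte_role`, is hypothetical) that classifies bytes by bit pattern, as described above.

```python
def byte_role(b: int) -> str:
    """Classify one UTF-8 byte by its high-order bits.

    This checks bit patterns only; it does not reject values such as 0xC0,
    0xC1 or 0xF5-0xF7 that well-formed UTF-8 never uses either.
    """
    if b < 0x80:              # 0xxxxxxx
        return "single"
    if (b & 0xC0) == 0x80:    # 10xxxxxx
        return "continuation"
    if (b & 0xE0) == 0xC0:    # 110xxxxx
        return "lead of 2"
    if (b & 0xF0) == 0xE0:    # 1110xxxx
        return "lead of 3"
    if (b & 0xF8) == 0xF0:    # 11110xxx
        return "lead of 4"
    return "invalid"          # 11111xxx


for b in "A€".encode("utf-8"):
    print(f"{b:08b}", byte_role(b))
# 01000001 single
# 11100010 lead of 3
# 10000010 continuation
# 10101100 continuation
```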
ASCII
UTF-8 is fully compatible with the ASCII codes (0x00-0x7F). Unicode characters U+0000-U+007F are converted to the single bytes 0x00-0x7F in UTF-8 and are therefore identical to ASCII. Moreover, to avoid ambiguity, the values 0x00-0x7F are never used in any byte of a multi-byte sequence representing another Unicode character. Characters U+0080-U+07FF, which cover many non-ideographic scripts such as Greek, Cyrillic, Hebrew and Arabic, are encoded as two-byte sequences. Characters from U+0800 to U+FFFF are represented by three bytes, and supplementary code points above U+FFFF require four bytes.
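These ranges translate directly into a small encoder. The hand-written `utf8_encode` below is an illustrative sketch, not a replacement for the built-in codec: it is checked against Python's own encoder and does not reject surrogate code points.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point by hand, following the UTF-8 length ranges.

    Sketch only: it does not reject surrogate code points U+D800-U+DFFF,
    which are not allowed in UTF-8.
    """
    if cp < 0:
        raise ValueError("negative code point")
    if cp < 0x80:                     # U+0000-U+007F: 1 byte, same as ASCII
        return bytes([cp])
    if cp < 0x800:                    # U+0080-U+07FF: 2 bytes
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                  # U+0800-U+FFFF: 3 bytes
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:                # U+10000-U+10FFFF: 4 bytes
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("code point above U+10FFFF")


# Compare with Python's built-in encoder.
for cp in (0x41, 0x7F, 0x80, 0x414, 0x7FF, 0x800, 0x20AC, 0xFFFF, 0x10000, 0x1F600):
    assert utf8_encode(cp) == chr(cp).encode("utf-8")

# Bytes 0x00-0x7F never appear inside a multi-byte sequence.
assert all(b >= 0x80 for cp in range(0x80, 0x800) for b in utf8_encode(cp))
```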
Areas of use
UTF-8 is generally the preferred encoding for HTML and similar formats and protocols.
XML was among the first standards to support UTF-8 fully, and standards organizations recommended it. The problem of supporting URLs containing non-ASCII characters was solved when the W3C and the IETF (Internet Engineering Task Force) agreed to encode such characters in URLs as UTF-8.
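In practice this means a non-ASCII character in a URL is first encoded as UTF-8 and each resulting byte is then written as a `%XX` escape. The sketch uses the standard `urllib.parse.quote` and `unquote` functions, whose default encoding is UTF-8; the path is a made-up example.

```python
from urllib.parse import quote, unquote

path = "/wiki/Юникод"          # hypothetical URL path with Cyrillic letters
encoded = quote(path)          # "/" is kept; each UTF-8 byte becomes %XX
print(encoded)
# /wiki/%D0%AE%D0%BD%D0%B8%D0%BA%D0%BE%D0%B4

assert unquote(encoded) == path  # decoding reverses the mapping exactly
```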
Compatibility with ASCII made it easier to adopt new software. UTF-8 is supported by many text editors, including jEdit, Emacs, BBEdit, Eclipse, and the Windows Notepad application. No other Unicode encoding form can boast comparable tool support.
Another advantage of the encoding is that it is simply a sequence of bytes. UTF-8 strings are easy to handle in C and other programming languages. UTF-8 is also the only encoding form that requires neither a byte order mark (BOM) nor an XML encoding declaration.
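Python stands in for C here; the point is the byte-level property that C string functions rely on. UTF-8 never produces a zero byte except for U+0000, and it has a single byte order, whereas UTF-16 output contains zero bytes for ASCII text and depends on endianness.

```python
text = "Hello, мир"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

# UTF-8 produces a 0x00 byte only for U+0000 itself, so routines that treat
# 0x00 as a string terminator (such as C's strlen) still see the whole text.
print(0 in utf8)    # False
# UTF-16 stores "H" as the 16-bit unit 0x0048, i.e. the bytes 48 00.
print(0 in utf16)   # True

# UTF-16 bytes also depend on byte order; UTF-8 has only one byte order.
print(text.encode("utf-16-le")[:2].hex(), text.encode("utf-16-be")[:2].hex())
# 4800 0048
```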
Self-synchronization
In environments that process text in 8-bit units, UTF-8 has the following advantages over other multi-byte encodings:
- The first byte of a code sequence indicates the length of the sequence. This makes forward scanning efficient.
- The start of a character is easy to find from any byte position, because lead bytes occupy a fixed range of values and are at most three bytes back.
- Byte values never overlap.
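Forward scanning needs nothing but the lead byte of each sequence. This is a sketch assuming well-formed input; the helper names `seq_length` and `split_chars` are illustrative and not taken from any library.

```python
def seq_length(lead: int) -> int:
    """Length of a UTF-8 sequence, read from its first byte alone."""
    if lead < 0x80:
        return 1
    if (lead & 0xE0) == 0xC0:
        return 2
    if (lead & 0xF0) == 0xE0:
        return 3
    if (lead & 0xF8) == 0xF0:
        return 4
    raise ValueError(f"not a lead byte: {lead:#04x}")


def split_chars(data: bytes) -> list[bytes]:
    """Split well-formed UTF-8 into per-character byte chunks, scanning forward."""
    out, i = [], 0
    while i < len(data):
        n = seq_length(data[i])
        out.append(data[i:i + n])
        i += n
    return out


print(split_chars("aД€😀".encode("utf-8")))
# [b'a', b'\xd0\x94', b'\xe2\x82\xac', b'\xf0\x9f\x98\x80']
```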
Comparative advantages
UTF-8 is compact. However, East Asian ideographs (the Chinese characters used in Chinese, Japanese and Korean) take 3-byte sequences in UTF-8, whereas most of them need only 2 bytes in UTF-16. UTF-8 can also be slower to process than the other encoding forms, because its characters vary in length. On the other hand, sorting UTF-8 strings as plain byte sequences gives the same order as sorting them by Unicode code points.
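Both claims can be checked with a handful of strings. In this sketch Python's own `str` comparison serves as the reference, since it compares by code point; the sample words are arbitrary and include one character above U+FFFF to show where UTF-16 byte order diverges.

```python
words = ["z", "é", "€", "\uff21", "😀"]  # U+007A, U+00E9, U+20AC, U+FF21, U+1F600

by_code_point = sorted(words)                                # str compares code points
by_utf8 = sorted(words, key=lambda s: s.encode("utf-8"))     # plain byte comparison
by_utf16 = sorted(words, key=lambda s: s.encode("utf-16-be"))

print(by_utf8 == by_code_point)   # True
print(by_utf16 == by_code_point)  # False: U+1F600 starts with D8 3D, U+FF21 with FF 21

# A common CJK ideograph: 3 bytes in UTF-8, 2 bytes in UTF-16.
print(len("漢".encode("utf-8")), len("漢".encode("utf-16-le")))  # 3 2
```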
Character encoding scheme
A character encoding scheme combines an encoding form with a way of serializing its code units into bytes. To identify the encoding scheme, the Unicode standard provides for a byte order mark (BOM) at the beginning of the data.
For UTF-8, the BOM serves only as a signature indicating that this encoding form is in use. Byte order (endianness) is not an issue in UTF-8, because its code unit is a single byte. A BOM is neither required nor recommended for this encoding form. It may appear in text converted from other encodings that use a byte order mark, or it may be used as a UTF-8 signature. It is the three-byte sequence EF BB BF (hexadecimal).
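Python exposes the signature as `codecs.BOM_UTF8` and offers a separate `utf-8-sig` codec for text that may carry it. The sketch below shows the difference between the two codecs; the sample word is arbitrary.

```python
import codecs

print(codecs.BOM_UTF8.hex(" "))   # ef bb bf

data = codecs.BOM_UTF8 + "Привет".encode("utf-8")
# The plain "utf-8" codec keeps the signature as the character U+FEFF ...
print(data.decode("utf-8")[0] == "\ufeff")    # True
# ... while "utf-8-sig" strips a leading BOM if present (and writes one on encode).
print(data.decode("utf-8-sig"))               # Привет
print("Привет".encode("utf-8-sig")[:3] == codecs.BOM_UTF8)  # True
```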
How to set UTF-8 encoding
In HTML, UTF-8 is specified with the following code:
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
In PHP, UTF-8 is set with the header() function at the very top of the file, after setting the error reporting level:
<?php
error_reporting(-1);
header('Content-Type: text/html; charset=utf-8');
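To use UTF-8 for a MySQL database connection, set the connection character set (MySQL's 'utf8' stores at most three bytes per character, so 'utf8mb4' is needed for full UTF-8):
<?php
// mysql_set_charset() was removed in PHP 7.0; mysqli_set_charset() replaces it.
// $link is assumed to be an open connection returned by mysqli_connect().
mysqli_set_charset($link, 'utf8mb4');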
The encoding of a CSS file is declared as UTF-8 like this (the rule must be the very first thing in the file):
@charset "utf-8";
When saving files of any type, choose UTF-8 without BOM; otherwise the site may not work correctly (in PHP, for example, the BOM bytes at the start of a file are sent as output before any header() call). In Dreamweaver, select the menu item "Modify - Page Properties - Title/Encoding" and change the document encoding to UTF-8. Then reload the page, clear the "Include Unicode Signature (BOM)" check box, and apply the changes. If any text on the page was typed or saved in another encoding, it must be retyped or re-encoded. When working with regular expressions in PHP's preg_* functions, be sure to use the u modifier.
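Files that were already saved with a BOM can also be fixed with a short script. The function below is a hypothetical helper, not part of any editor: it removes a leading EF BB BF from a file in place and reports whether it did so.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path: Path) -> bool:
    """Remove a leading UTF-8 BOM from a file in place.

    Returns True if a BOM was found and removed, False otherwise.
    """
    raw = path.read_bytes()
    if not raw.startswith(codecs.BOM_UTF8):
        return False
    path.write_bytes(raw[len(codecs.BOM_UTF8):])  # keep everything after EF BB BF
    return True
```

For example, `strip_utf8_bom(Path("index.php"))` (the file name is only an illustration) rewrites the file without its first three bytes if they are the BOM. Because the file is modified in place, it is safer to try it on a copy first.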
You can also save a file in UTF-8 in Windows Notepad: choose "File - Save As...", select UTF-8 in the Encoding field, and save the file. Note that older versions of Notepad write a BOM when saving as UTF-8.
In the Notepad++ text editor, if the encoding is not already UTF-8, use the menu item "Convert to UTF-8 without BOM" to change it, then save the file in UTF-8.
No alternative
In a globalized world, where political and linguistic borders are blurring, character sets with local features are of little use. Unicode is the one character set that supports all locales. UTF-8 is an example of a sound implementation of Unicode that:
- supports a wide range of tools, including compatibility with ASCII;
- is resistant to data corruption;
- is easy to process;
- is platform-independent.
With the arrival of UTF-8, the debate over which encoding form or character set is better has become pointless.