What are vector databases, how do they work, and what is the potential market

A vector database is a type of database that stores data as high-dimensional vectors, which are mathematical representations of features or attributes.

These vectors are usually generated by applying some kind of embedding function to raw data such as text, images, audio, or video.

A vector database can be defined as a tool that indexes and stores vector embeddings for fast retrieval and similarity search, with features such as metadata filtering and horizontal scaling.

Growing Investor Interest

In recent weeks, investor interest in vector databases has been growing. Since the beginning of 2023 we have seen that:

  • vector database startup Weaviate raised $50 million in Series B funding;
  • Pinecone raised $100 million in Series B funding at a valuation of $750 million;
  • Chroma, an open source project, raised $18 million for its embeddings database.

Let's take a closer look at what vector databases are.

Vectors as data representation

Vector databases rely heavily on vector embeddings, a type of data representation that carries the semantic information an AI needs in order to understand content and to maintain a long-term memory it can draw on when performing complex tasks.

Vector embeddings

Vector embeddings are like a map, but instead of showing us where things are in the world, they show us where things are in something called vector space. Vector space is a kind of big playground where everything has its own place to play. Imagine you have a group of animals: a cat, a dog, a bird, and a fish. We can create a vector embedding for each picture by giving it a specific position in the playground. The cat might be in one corner, the dog on the other side. The bird could be in the sky and the fish could be in the pond. This playground is a multidimensional space. Each dimension corresponds to a different attribute of the animals: for example, fish have fins, birds have wings, and cats and dogs have legs. Another attribute might be that fish belong to the water, birds mainly to the sky, and cats and dogs to the ground. Once we have these vectors, we can use mathematical techniques to group them by similarity, based on the information we have.

So, vector embeddings are like a map that helps us find similarities between things in vector space. Just as a map helps us navigate the world, vector embeddings help us navigate the vector playground.

The key idea is that embeddings that are semantically similar to each other have a smaller distance between them. To find out how similar they are, we can use vector distance functions such as Euclidean distance, cosine distance, and so on.
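As a minimal sketch of these two distance functions, the snippet below uses made-up three-dimensional embeddings for the animals in the example. The dimensions (fins, wings, legs) and all the values are assumptions chosen for illustration, not the output of a real embedding model.

```python
import math

# Hypothetical embeddings: (has_fins, has_wings, has_legs).
# Real embeddings come from a model and have hundreds of dimensions.
animals = {
    "cat":  [0.0, 0.0, 1.0],
    "dog":  [0.1, 0.0, 0.9],
    "bird": [0.0, 1.0, 0.4],
    "fish": [1.0, 0.0, 0.0],
}

def euclidean_distance(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def closest(name):
    """Return the other animal with the smallest Euclidean distance."""
    return min(
        (other for other in animals if other != name),
        key=lambda other: euclidean_distance(animals[name], animals[other]),
    )

print(closest("cat"))
print(cosine_distance(animals["cat"], animals["fish"]))
```

With these toy values the cat ends up closest to the dog, while the cat and the fish share no attribute, so their cosine distance is the largest possible for vectors with non-negative components.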

Vector databases vs vector libraries

Vector libraries store vector embeddings in in-memory indexes in order to perform similarity searches. Vector libraries have the following characteristics and limitations:

  1. Store vectors only: Vector libraries store only the vector embeddings and not the objects they were generated from. This means that when we run a query, a vector library responds with the relevant vectors and object IDs. This is limiting, because the actual information is stored in the object and not in the ID. To solve this problem, we need to keep the objects in secondary storage. We can then take the IDs returned by the query and match them to the objects to make sense of the results.
  2. Index data is immutable: Indexes created by vector libraries are immutable. This means that once we have imported our data and built the index, we cannot make any changes (no new inserts, deletions, or updates). To change the index, we have to rebuild it from scratch.
  3. No querying during import: Most vector libraries cannot be queried while data is being imported. We need to import all of our data objects first, and the index is built only after the objects have been imported. This can be a problem for applications that need to import millions or even billions of objects.
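The first two limitations can be sketched with a toy, library-style index. `FrozenVectorIndex` and the `objects` dictionary are hypothetical names invented for this illustration; real libraries use approximate indexes rather than the brute-force scan shown here.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class FrozenVectorIndex:
    """Toy stand-in for a vector library index: built once, returns IDs only."""

    def __init__(self, vectors):
        # Tuples make the point that nothing can be added or changed
        # after the build; any change means constructing a new index.
        self._items = tuple((obj_id, tuple(vec)) for obj_id, vec in vectors.items())

    def search(self, query, k=1):
        ranked = sorted(self._items, key=lambda item: euclidean(query, item[1]))
        return [obj_id for obj_id, _ in ranked[:k]]

# Secondary storage holding the actual objects, keyed by the same IDs.
objects = {
    1: "A photo of a cat",
    2: "A photo of a dog",
    3: "A photo of a fish",
}

index = FrozenVectorIndex({1: [0.0, 1.0], 2: [0.1, 0.9], 3: [1.0, 0.0]})

ids = index.search([0.9, 0.1], k=1)   # the index only knows IDs
results = [objects[i] for i in ids]   # a second lookup recovers the objects
print(ids, results)
```

The query returns only an ID, and a separate lookup in `objects` is needed to find out what that ID refers to.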

There are many vector search libraries available: Facebook's FAISS, Spotify's Annoy, and Google's ScaNN. FAISS uses a clustering method, Annoy uses trees, and ScaNN uses vector compression. Each comes with its own performance tradeoff, and we can choose among them based on our application and performance metrics.

CRUD

The main feature that distinguishes vector databases from vector libraries is the ability to store, update, and delete data. Vector databases have full CRUD support (create, read, update, and delete), which addresses the limitations of a vector library.

  1. Store vectors and objects: Databases can store both the data objects and the vectors. Since both are stored, we can combine vector search with structured filters. Filters let us make sure that the nearest neighbors match the metadata filter.
  2. Mutability: Since vector databases fully support CRUD, we can easily add, remove, or update entries in our index after it has been created. This is especially useful when working with constantly changing data.
  3. Real-time search: Unlike vector libraries, databases allow us to query and modify our data during the import process. As we load millions of objects, the data already imported remains fully accessible and operational, so we do not have to wait for the import to finish before working with what is already there.
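The three points above can be sketched as a toy in-memory store. `ToyVectorStore`, its method names, and the `where` filter argument are assumptions made for this illustration and do not correspond to the API of any particular product; the search is a plain linear scan.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class ToyVectorStore:
    """Keeps vectors and their objects together and allows changes at any time."""

    def __init__(self):
        self._rows = {}  # id -> (vector, metadata)

    def upsert(self, obj_id, vector, metadata):
        # Create or update: the entry is searchable as soon as it is written.
        self._rows[obj_id] = (list(vector), dict(metadata))

    def read(self, obj_id):
        return self._rows.get(obj_id)

    def delete(self, obj_id):
        self._rows.pop(obj_id, None)

    def search(self, query, k=1, where=None):
        # Keep only rows whose metadata matches every key in the filter.
        candidates = [
            (obj_id, vec, meta)
            for obj_id, (vec, meta) in self._rows.items()
            if where is None or all(meta.get(key) == value for key, value in where.items())
        ]
        candidates.sort(key=lambda row: euclidean(query, row[1]))
        return [(obj_id, meta) for obj_id, _, meta in candidates[:k]]

store = ToyVectorStore()
store.upsert("a", [0.0, 1.0], {"species": "cat", "habitat": "land"})
store.upsert("b", [0.1, 0.9], {"species": "dog", "habitat": "land"})
store.upsert("c", [0.2, 0.8], {"species": "fish", "habitat": "water"})

nearest_any = store.search([0.2, 0.8], k=1)
nearest_land = store.search([0.2, 0.8], k=1, where={"habitat": "land"})
store.delete("b")
after_delete = store.search([0.2, 0.8], k=1, where={"habitat": "land"})
print(nearest_any, nearest_land, after_delete)
```

The same query returns a different neighbor once the metadata filter is applied, and again after an entry is deleted, without rebuilding anything.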

In short, a vector database provides a superior solution for handling vector embeddings by addressing the limitations of standalone vector indexes discussed in the points above.

But what makes vector databases better than traditional databases?

Vector databases vs traditional databases

Traditional databases are designed to store and retrieve structured data using relational models, which means they are optimized for queries based on the columns and rows of the data. While it is possible to store vector embeddings in a traditional database, these databases are not optimized for vector operations and cannot efficiently perform similarity searches or other complex operations on large datasets.

This is because traditional databases use indexing techniques based on simple data types, such as strings or numbers. These indexing techniques are not suitable for vector data, which has high dimensionality and requires specialized indexing techniques such as inverted indexes or spatial trees.

In addition, traditional databases are not designed to handle the large amounts of unstructured or semi-structured data often associated with vector embeddings. For example, an image or an audio file can contain millions of data points, which traditional databases cannot handle efficiently.

Vector databases, on the other hand, are specifically designed to store and retrieve vector data and are optimized for similarity search and other complex operations on large datasets. They use specialized indexing techniques and algorithms designed to work with high-dimensional data, which makes them much more efficient than traditional databases at storing and retrieving vector embeddings.

Now that you have read this much about vector databases, you may be wondering how they work. Let's take a look.

How does a vector database work?

We all know how relational databases work: they store strings, numbers, and other types of scalar data in rows and columns. A vector database, on the other hand, operates on vectors, so the way it is optimized and queried is quite different.

In traditional databases, we usually query for rows whose value matches our query exactly. In vector databases, we apply a similarity metric to find the vector that is most similar to our query.

A vector database uses a combination of several algorithms that all take part in approximate nearest neighbor (ANN) search. These algorithms optimize the search through hashing, quantization, or graph-based search.
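To make the hashing idea concrete, here is a minimal sketch of one well-known variant, locality-sensitive hashing with random hyperplanes. The function names, the number of hyperplanes, and the toy vectors are all assumptions for illustration; production systems use several hash tables and carefully tuned parameters.

```python
import random

def make_hyperplanes(num_planes, dim, seed=0):
    """Draw random hyperplanes; each one contributes one bit to the hash."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]

def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(v * h for v, h in zip(vector, plane)) >= 0 else 0
        for plane in hyperplanes
    )

def build_buckets(vectors, hyperplanes):
    """Group object IDs by signature so a lookup scans a single bucket."""
    buckets = {}
    for obj_id, vec in vectors.items():
        buckets.setdefault(signature(vec, hyperplanes), []).append(obj_id)
    return buckets

planes = make_hyperplanes(num_planes=8, dim=3)
vectors = {"cat": [0.0, 0.0, 1.0], "dog": [0.1, 0.0, 0.9], "fish": [1.0, 0.0, 0.0]}
buckets = build_buckets(vectors, planes)

query = [0.0, 0.0, 2.0]  # same direction as "cat", different length
candidates = buckets.get(signature(query, planes), [])
print(candidates)
```

Vectors pointing in similar directions tend to land in the same bucket, so a query compares itself only against that bucket instead of the whole dataset. The price is that a true neighbor can occasionally fall into a different bucket and be missed, which is why the search is approximate.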

These algorithms are assembled into a pipeline that provides fast and accurate retrieval of the neighbors of a queried vector. Since a vector database returns approximate results, the main tradeoff we consider is between accuracy and speed: the more accurate the result, the slower the query. However, a good system can provide ultra-fast search with near-perfect accuracy.

  • Indexing: The vector database indexes vectors using an algorithm such as PQ, LSH, or HNSW. This step maps the vectors to a data structure that enables faster searching.
  • Querying: The vector database compares the indexed query vector with the indexed vectors in the dataset to find the nearest neighbors (applying the similarity metric used by that index).
  • Post-processing: In some cases, the vector database retrieves the final nearest neighbors from the dataset and post-processes them to return the final results. This step can include re-ranking the nearest neighbors using a different similarity measure.
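The three stages above can be sketched end to end as follows. Rounding each coordinate onto a coarse grid is a deliberately crude stand-in for a real index such as PQ, LSH, or HNSW, and the names `quantize`, `search`, and `oversample` are assumptions made for this illustration.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def quantize(vector, step=0.5):
    """Compress a vector by snapping each coordinate to a coarse grid."""
    return tuple(round(x / step) * step for x in vector)

vectors = {
    "a": [0.10, 0.90],
    "b": [0.20, 0.80],
    "c": [0.90, 0.10],
    "d": [0.30, 0.70],
}

# 1. Indexing: store a compressed form of every vector.
index = {obj_id: quantize(vec) for obj_id, vec in vectors.items()}

def search(query, k=2, oversample=3):
    # 2. Querying: compare the compressed query with the compressed vectors
    #    and keep a shortlist somewhat larger than k.
    q = quantize(query)
    shortlist = sorted(index, key=lambda i: euclidean(q, index[i]))[:oversample]
    # 3. Post-processing: re-rank the shortlist with exact distances
    #    computed on the original, uncompressed vectors.
    reranked = sorted(shortlist, key=lambda i: euclidean(query, vectors[i]))
    return reranked[:k]

top = search([0.18, 0.82])
print(top)
```

In this toy run the compressed index cannot tell "a" and "b" apart, because both snap to the same grid point. The re-ranking step is what puts "b" ahead of "a". Fetching a larger shortlist costs extra time but lowers the chance of missing a true neighbor, which is the accuracy versus speed tradeoff described above.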

Benefits

Vector databases are a powerful tool for similarity search and other complex operations on large datasets, which cannot be performed efficiently with traditional databases. Embeddings are essential to building a working vector database, because they capture the semantic meaning of the data and make similarity search possible. Unlike vector libraries, vector databases are designed to fit our use case, which makes them ideal for applications where performance and scalability are critical. With the rise of machine learning and artificial intelligence, vector databases are becoming increasingly important for a wide range of applications, including recommender systems, image search, semantic similarity, and more. As the field continues to evolve, we can expect to see even more applications of vector databases in the future.

Ercole Palmeri
