In recent years, we have seen great advances in machine learning and artificial intelligence that could usher in a new era of progress. In the area of natural language processing, three algorithms have been the cornerstone of this innovation: GPT, BERT, and T5.
Over the past couple of years, machine learning (ML) and artificial intelligence (AI) have advanced dramatically. These advances show that ML and AI are moving from science fiction to science fact and can drive transformational change across many industries. DALL-E and Lensa have shown that machines can create art, and ChatGPT has shown that they can write articles, poetry, song lyrics, and even programming code; the field is on the cusp of further large advances.
Underlying these amazing demonstrations of the business value of ML and AI is a set of technologies that fall into the family of neural networks called transformers. As an analytics leader, you don’t necessarily have to understand all the technical details associated with how these are programmed and the inner workings of their code, but it is important to understand what they are and what makes them unique.
In 2017, a group of researchers at Google and the University of Toronto developed a new type of neural network architecture: the transformer. The team's original goal was machine translation, but the results have reached well beyond translation and reshaped multiple areas of ML. The recurrent neural nets (RNNs) of the past processed a sequence one element at a time, with each step depending on the one before it, which made training hard to parallelize. Transformers process all positions of a sequence together, so the work can be distributed and parallelized. This means they can be trained on huge amounts of data, which makes very large models practical.
What Makes the Transformer Special
There are three concepts that enable these transformers to succeed where the RNN didn’t: positional encoding, attention, and self-attention.
Positional encoding removes the need to process one word of a sentence at a time. Each word in the corpus is encoded to have both the text of the word and the position in the sentence. This allows the model to be built in a distributed fashion across multiple processors and leverage mass parallelization.
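To make the idea concrete, here is a minimal sketch of one common scheme, the sinusoidal encoding used in the original transformer paper. The article does not say which scheme a given model uses, so treat this as an illustration only; the function names `positional_encoding` and `add_positions` are hypothetical.

```python
import math

def positional_encoding(num_positions, d_model):
    """Return a num_positions x d_model table of sinusoidal position signals."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            # Each pair of dimensions (2k, 2k+1) shares one frequency.
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            # Even dimensions use sine, odd dimensions use cosine.
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

def add_positions(embeddings):
    """Add the position signal to a list of word-embedding vectors."""
    d_model = len(embeddings[0])
    table = positional_encoding(len(embeddings), d_model)
    return [[e + p for e, p in zip(emb, row)]
            for emb, row in zip(embeddings, table)]
```

Because each vector now carries its own position, the words of a sentence no longer have to be fed in one after another, which is what allows the parallel processing described above.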
Attention is a central concept in machine translation. Translating word by word is not enough: the model needs to learn, from the training data, how words in the input sentence relate to words in the output sentence, and then apply those patterns when translating new phrases. For each word it produces, attention lets the model weigh the input words that matter most for that word. Beyond word order, the same pattern matching covers grammatical gender, plurality, and other rules of grammar that affect translation.
Self-attention applies the same idea within a single sequence: each word is weighed against the other words in the same sentence, so the features the model uses are identified from within the data itself. This is loosely analogous to computer vision, where convolutional neural nets (CNNs) learn features such as object edges and shapes from the data rather than from hand-written rules. In natural language processing (NLP), self-attention picks up patterns in unlabeled text that correspond to parts of speech, grammar rules, homonyms, synonyms, and antonyms. These learned features are then used to better train the neural network for future processing.
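As a rough illustration of self-attention, the sketch below scores every word vector in a sentence against every other word vector, turns the scores into weights that sum to 1, and builds each output as a weighted mix of the inputs. It is a simplification under a stated assumption: the queries, keys, and values are the input vectors themselves, whereas real transformers first pass the inputs through learned projection matrices. The function names are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention over a list of equal-length vectors.

    Assumption: queries, keys, and values are all the raw input vectors.
    Returns (outputs, weights); weights[i][j] is how much word i attends to word j.
    """
    d = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score this word against every word in the same sentence.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                  for key in vectors]
        weights = softmax(scores)
        # The output is a weighted average of all word vectors.
        mixed = [sum(w * vec[j] for w, vec in zip(weights, vectors))
                 for j in range(d)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights
```

Each row of the weight table can be computed independently of the others, which is one reason this mechanism parallelizes well.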
With these concepts, multiple groups have built large language models that leverage these transformers to do some incredible machine learning tasks related to NLP.
The Top Three Transformer Models
GPT stands for Generative Pre-trained Transformer. GPT-3 is the third generation of this transformer model and is the one gaining momentum today, with an anticipated GPT-4 on the near-term horizon. GPT-3 was developed by OpenAI from roughly 45TB of raw text, much of it crawled from the public web, which was then filtered down to a smaller, higher-quality training set.
GPT-3 is a neural network with about 175 billion parameters, which allow it to perform natural language processing and natural language generation (NLG) effectively. The output of the GPT model is very human-like in word usage, sentence structure, and grammar. The GPT-3 family of models is the foundation of ChatGPT, which OpenAI released to demonstrate how this kind of model can solve real-world problems.
BERT stands for Bidirectional Encoder Representations from Transformers. In this neural net, every output element is connected to every input element, which is what makes the language model bidirectional. Past language models processed text sequentially, either left-to-right or right-to-left, but only in a single direction. The BERT framework was pre-trained by Google on unlabeled text from English Wikipedia and a large corpus of books, and it can be fine-tuned on other data, such as question-and-answer data sets.
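The difference between bidirectional and single-direction processing can be sketched as an attention mask, a table saying which positions each word is allowed to look at. This is a minimal illustration, not BERT's actual code; the function names and the example sentence are hypothetical.

```python
def bidirectional_mask(n):
    """Every position may look at every position (BERT-style encoder)."""
    return [[True] * n for _ in range(n)]

def left_to_right_mask(n):
    """Position i may look only at positions 0..i (single-direction model)."""
    return [[j <= i for j in range(n)] for i in range(n)]

def visible_context(tokens, mask, i):
    """Tokens that position i is allowed to use under the given mask."""
    return [tok for tok, allowed in zip(tokens, mask[i]) if allowed]

tokens = ["the", "bank", "of", "the", "river"]

# Bidirectional: "bank" can see "river", which helps resolve its meaning.
print(visible_context(tokens, bidirectional_mask(5), 1))
# prints ['the', 'bank', 'of', 'the', 'river']

# Left-to-right: "bank" sees only what came before it.
print(visible_context(tokens, left_to_right_mask(5), 1))
# prints ['the', 'bank']
```

With the bidirectional mask, the words on both sides of an ambiguous word are available as context.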
The BERT model aims to understand the context and meaning of words within a sentence. BERT can be leveraged for tasks such as semantic role labeling of words, sentence classification, or word disambiguation based on the sentence context. BERT can support interaction in over 70 languages. Google leverages BERT as a core component of many of its products, including developer-facing services in the Google Cloud Platform.
T5 stands for Text-to-Text Transfer Transformer and was developed by Google in 2019. The researchers wanted an NLP model that combined transfer learning with the transformer architecture, hence the name transfer transformer. Unlike BERT, which uses only an encoder, T5 uses both an encoder and a decoder, so its inputs and outputs are both text strings. This is where the text-to-text part of the name comes from.
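The text-to-text framing can be illustrated by how training examples are laid out: every task, whatever its type, becomes an input string and a target string. The sketch below uses task prefixes in the style described in the T5 paper; the `TASK_PREFIXES` table, the task keys, and the `make_example` helper are hypothetical names chosen for illustration.

```python
# Assumed prefixes, written in the style the T5 paper uses to name a task.
TASK_PREFIXES = {
    "translate_en_de": "translate English to German: ",
    "summarize": "summarize: ",
}

def make_example(task, source_text, target_text):
    """Frame any task as a pair of plain text strings."""
    return {
        "input": TASK_PREFIXES[task] + source_text,
        "target": target_text,
    }

example = make_example("translate_en_de", "That is good.", "Das ist gut.")
print(example["input"])   # prints: translate English to German: That is good.
print(example["target"])  # prints: Das ist gut.
```

Because inputs and outputs are always strings, the same model and training procedure can be reused across tasks by changing only the data.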
The model was trained with both unsupervised and supervised methods on a large portion of the Common Crawl data set. T5 was designed for transfer: the pre-trained model serves as a base that is then fine-tuned to solve domain-specific tasks.
Common Use Cases
Because of the transformer revolution we are experiencing, many NLP problems and use cases are being solved using these new and improved methods. This makes it possible for businesses to more effectively perform tasks that require text summarization, question answering, automatic text classification, text comparison, text and sentence prediction, natural language querying (including voice search), and message blocking based on policy violations (e.g., offensive or vulgar material, profanity).
As companies experience the power of these new models, many additional use cases will be identified, and businesses will find ways to derive value from integrating them into their existing and new products. We will see more products arrive on the market with intelligent features leveraging these three models.
Looking Forward
At this stage, many of these algorithms are still in the demonstration and experimentation phase, but companies such as Microsoft and Google are actively looking at ways to incorporate them into other products to make those products better, smarter, and more capable of interacting intelligently with users. The AI revolution now under way may define the coming decade in much the same way that the introduction of the internet defined the 1990s and 2000s. It is therefore important to understand what these algorithms are and to start identifying where they belong on your strategic road map.