Let's Play with the Gemini AI Model from Google DeepMind to Our Heart's Content

7 min read · Dec 17, 2023

Generated from Bing DALL-E 3 with Microsoft Designer Editor

The latest Large Language Model from Google DeepMind is finally out. This is the heavyweight that once shook the AI world with AlphaGo, which left everyone stunned, amazed, and a little thrilled when it beat Ke Jie, the world champion of Go. So everyone has been waiting to see just how impressive an LLM from Google DeepMind would turn out to be.

According to what Google brags about on its site (https://deepmind.google/technologies/gemini), the Gemini Ultra model outperforms both human experts and OpenAI's GPT-4 on the MMLU (Massive Multitask Language Understanding) benchmark, which makes us itch to try it even more. Google hasn't released it yet, though; we have to wait until early 2024. For now we can use Gemini Pro through Bard and Google AI Studio (formerly MakerSuite) to play with in the meantime. So today we're taking it for a spin to see how impressive it really is :)

https://deepmind.google/technologies/gemini/#capabilities

Getting to Know the Twins Constellation (Gemini)

First of all, let's get to know the Twins constellation of everyone born under the Gemini sign. Err.. wait, no.. let's get to know the Gemini model first.

Google pitches this as a state-of-the-art (SOTA) model built from the ground up to be multimodal. That means it was pretrained not only on text but also on images, video, audio, and code together, which is supposed to make Gemini smarter than other models. It also supports as many as 38 languages (Thai included, so we have to put that to the test). And if you read Google's documentation, the model now has function calling. Yay!

https://ai.google.dev/available_regions#available_languages

Function calling | Vertex AI | Google Cloud (cloud.google.com)
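To make the idea of function calling concrete, here is a minimal sketch of the general pattern, not the Gemini SDK itself. Everything in it is an assumption for illustration: the `get_weather` function, the plain-dict declaration, and the simulated model reply are all hypothetical, and the real API defines its own schema types (see the documentation linked above).

```python
import json

# Hypothetical tool description written as a plain dict for illustration.
# Real SDKs define their own schema classes for this.
GET_WEATHER_DECLARATION = {
    "name": "get_weather",
    "description": "Return the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def get_weather(city: str) -> dict:
    """Stand-in implementation; a real one would call a weather service."""
    return {"city": city, "forecast": "sunny", "temp_c": 31}


# Registry mapping declared function names to local Python callables.
TOOLS = {"get_weather": get_weather}


def dispatch(function_call: dict) -> str:
    """Run the function the model asked for and return the result as JSON."""
    name = function_call["name"]
    if name not in TOOLS:
        raise KeyError(f"model requested an unknown function: {name}")
    result = TOOLS[name](**function_call["args"])
    return json.dumps(result)


# Simulated model output: instead of prose, the model names a function
# and the arguments to call it with.
simulated_call = {"name": "get_weather", "args": {"city": "Bangkok"}}
print(dispatch(simulated_call))
```

The point of the pattern is that the model never runs code itself: it only proposes a call, and our program decides whether to execute it and what to send back.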

On top of that, Google provides an API and SDKs so devs like us can use Gemini for free. The 60 QPM (queries per minute) limit is plenty for devs who just want to scratch the itch. (What's interesting is that Google's pay-as-you-go pricing is charged per 1K characters, unlike other providers that charge per token. For Thai this may not make much difference, because the GPT models' tokenizer already splits Thai into nearly one token per character anyway.) Getting itchy just looking at this, right? Let's not waste time; let's go scratch it.

https://ai.google.dev/pricing (as of Dec 2023)
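As a rough illustration of per-character pricing, the sketch below estimates the cost of one call from string lengths. The function name and the two prices in the usage lines are placeholders I made up for the example, so check the pricing page for real numbers. It also assumes that a "character" is what Python's `len()` counts (Unicode code points), which may not match how Google meters Thai text with its combining vowels and tone marks.

```python
def estimate_cost_usd(prompt: str, completion: str,
                      input_price_per_1k_chars: float,
                      output_price_per_1k_chars: float) -> float:
    """Estimate the cost of one call under per-1K-character pricing."""
    input_cost = len(prompt) / 1000 * input_price_per_1k_chars
    output_cost = len(completion) / 1000 * output_price_per_1k_chars
    return input_cost + output_cost


# Placeholder prices for illustration only; not taken from the pricing page.
INPUT_PRICE = 0.00025
OUTPUT_PRICE = 0.0005

cost = estimate_cost_usd("a" * 1000, "b" * 2000, INPUT_PRICE, OUTPUT_PRICE)
print(f"estimated cost: ${cost:.5f}")
```

With token-based pricing you would need the provider's tokenizer to do the same estimate; with character-based pricing a plain length count gets you most of the way.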

Gemini x LangChain

For today's experiment we'll test Gemini together with the popular LLM orchestrator LangChain, because LangChain has just released a new version that supports the Gemini models through Google's SDK. So we're taking the opportunity to test it the same way we've played with the GPT models before.

First, let's get our tools ready. Once the installation is done, verify that langchain-core is installed and that langchain-google-genai shows up alongside it (for example, in the Required-by field of the pip show output). That's all it takes, and your machine is ready to roll.

pip install langchain_experimental langchain_core
pip install google-generativeai
pip install langchain-google-genai

pip show langchain langchain-core
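If you'd rather check from inside Python (handy in a notebook), here is a small sketch using the standard library's `importlib.metadata`. The helper name `installed_versions` is my own; the package names are the ones installed above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Report what is installed in the current environment.
for name, ver in installed_versions(
    ["langchain", "langchain-core", "langchain-google-genai", "google-generativeai"]
).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```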

Another important step is getting an API key from Google AI Studio. (Folks, don't go through Vertex AI. I once used PaLM 2's Bison model through Vertex AI and it just went ahead and charged me, lol.) Once you have the API key, keep it handy so we can use it to our heart's content.

https://makersuite.google.com/app/apikey
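Since the key ends up in code sooner or later, here is one hedged way to avoid pasting it into a script: read it from the environment and only prompt when it is missing. The helper `load_api_key` and its parameters are my own names for this sketch; only the `GOOGLE_API_KEY` variable name comes from the code in this post.

```python
import os
from getpass import getpass


def load_api_key(env=os.environ, var_name="GOOGLE_API_KEY", prompt_fn=getpass):
    """Return the API key from the environment, prompting only if it is missing."""
    key = env.get(var_name, "").strip()
    if not key:
        # getpass hides the typed key so it does not land in terminal history.
        key = prompt_fn(f"Enter {var_name}: ").strip()
    if not key:
        raise ValueError(f"{var_name} is empty")
    return key
```

A typical use would be `os.environ["GOOGLE_API_KEY"] = load_api_key()` at the top of a notebook, so the rest of the code can keep reading the environment variable.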

Let's start with the basic SDK. They say it supports Thai, so let's have a chat with it in Thai.

import os
import google.generativeai as genai

# Replace the placeholder with the API key from Google AI Studio
os.environ["GOOGLE_API_KEY"] = "<API-KEY>"
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# Thai prompt: "Hello there, who are you and what is your name?"
prompt = 'สวัสดีเจ้า เธอคือใคร ชื่ออะไร'
model = genai.GenerativeModel('gemini-pro')
response = model.generate_content(prompt)
print(response.text)

And here is the very first greeting Gemini sent back to us:

(Translated from the Thai response) Hello (สวัสดีคะ), I am Ai-Assistant. I am an intelligent assistant that uses artificial intelligence to help people, like a helpful friend who can chat and assist with a wide variety of tasks. I can answer many kinds of questions, search for information on the internet, and even translate languages. I have been trained to understand a wide range of languages and contexts so that I can communicate with people naturally and effectively. I am ready to help you with anything I can.

It even speaks as a woman, but it won't tell us its name, so that's an incomplete answer: minus one point. Also, the greeting should be spelled 'สวัสดีค่ะ', not 'สวัสดีคะ', so minus another point, haha. Now let's try a basic call through LangChain.

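from langchain_google_genai import ChatGoogleGenerativeAI

# Reads the API key from the GOOGLE_API_KEY environment variable set earlier
llm = ChatGoogleGenerativeAI(model="gemini-pro", temperature=0.5)
# Thai prompt: "What can Google Gemini Pro do?"
result = llm.invoke("Google Gemini Pro ทำอะไรได้บ้าง?")
print(result.content)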

And it answered with something a bit hallucinated, like this:

(Translated from the Thai response) Google Gemini Pro is an AI-powered device designed to help users carry out various tasks more efficiently, especially in data management and communication.
1. Email management: Gemini Pro can help manage email efficiently. It can sort email by importance, separate important email from unimportant email, and even reply to email automatically.
2. Calendar management: Gemini Pro can help manage calendars efficiently. It can help schedule appointments, send reminders for upcoming appointments, and even help find free time slots for appointments.
3. Document management: Gemini Pro can help manage documents efficiently. It can help organize documents, store documents, and even help find the documents you need quickly.
4. Communication: Gemini Pro can help with communication efficiently. It can help send messages, make calls, and even help manage video meetings.
5. Information search: Gemini Pro can help search for information efficiently. It can help search the internet, search databases, and even help search documents.
6. Translation: Gemini Pro can help translate languages efficiently. It can help translate from one language to another quickly and accurately.
7. Data analysis: Gemini Pro can help analyze data efficiently. It can help sort data, display data graphically, and even help find important data.
8. Decision making: Gemini Pro can help with decision making efficiently. It can help analyze data and present options to the user.

From this first answer, it feels like the model may not have been trained on enough Thai data, so it answers oddly, as if the text had been run through Google Translate. In contrast, if we ask the same question in English at the same temperature of 0.5, the answer looks pretty good:

Google Gemini Pro is a next-generation natural language processing (NLP) model that has demonstrated remarkable capabilities in various language-related tasks. Here are some of the key capabilities of Google Gemini Pro:
1. Language Understanding: Gemini Pro possesses a deep understanding of human language, enabling it to comprehend the meaning and context of text effectively. It can analyze and extract information from unstructured text data, including sentiment, intent, entities, and relationships.
2. Text Generation: Gemini Pro is proficient in generating human-like text that is both coherent and informative. It can produce a wide range of text formats, including summaries, translations, creative content, and code snippets, based on the input it receives.
3. Machine Translation: Gemini Pro excels in translating text across multiple languages, preserving the meaning and context of the original content. It supports a diverse range of languages and can handle complex translations involving idioms, cultural references, and technical terminology.
4. Question Answering: Gemini Pro can answer questions posed in natural language by retrieving relevant information from a given context or knowledge base. It leverages its understanding of language and world knowledge to provide accurate and comprehensive answers.
5. Summarization: Gemini Pro is capable of summarizing long texts into concise and informative summaries. It identifies the key points and extracts the most relevant information, preserving the overall meaning and structure of the original text.
6. Paraphrasing: Gemini Pro can paraphrase text to convey the same meaning using different words or phrases. This capability is useful for generating alternative versions of text, improving clarity, or avoiding plagiarism.
7. Text Classification: Gemini Pro can classify text into predefined categories or labels based on its content. It analyzes the text’s features and assigns it to the most appropriate category, making it valuable for tasks like sentiment analysis, spam detection, and topic classification.
8. Named Entity Recognition: Gemini Pro can identify and extract named entities from text, such as people, organizations, locations, dates, and quantities. This capability enables various downstream tasks, including information extraction, knowledge graph construction, and question answering.
9. Sentiment Analysis: Gemini Pro can determine the sentiment or emotion expressed in a piece of text. It classifies text as positive, negative, or neutral based on its linguistic features and context, providing insights into public opinion, customer feedback, and brand sentiment.
10. Conversational AI: Gemini Pro is well-suited for building conversational AI systems, such as chatbots and virtual assistants. It can understand user queries, generate natural language responses, and engage in coherent and informative conversations.
These capabilities make Google Gemini Pro a powerful tool for a wide range of NLP applications, including machine translation, question answering, text summarization, sentiment analysis, and conversational AI.
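Both runs above used a temperature of 0.5. As a reminder of what that knob does, here is a sketch of the textbook softmax-with-temperature formula. This is the standard formulation, not something taken from Google's documentation, and the logits are made-up scores for three candidate tokens.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

A lower temperature pushes more probability onto the top-scoring token, so answers become more repeatable; a higher one spreads probability out and makes the output more varied.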

From here on I'll switch to English so we can see Gemini's true potential, starting with a simple multi-step chain.

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser


model = ChatGoogleGenerativeAI(model="gemini-pro",
                             temperature=0.7)
prompt = ChatPromptTemplate.from_template(
    "Compare EV car between {car1} and {car2}"
)

output_parser = StrOutputParser()
chain = prompt | model | output_parser
chain.invoke({"car1": "Tesla Model 3", "car2": "Tesla Model Y"})

And the answer came out quite okay, although who knows which year this data is from:

Tesla Model 3
* Price: Starting at $46,990
* Range: Up to 353 miles
* Acceleration: 0–60 mph in 3.1 seconds
* Top speed: 162 mph
* Seating: 5 passengers
* Cargo space: 15 cubic feet
* Features: Autopilot, Full Self-Driving, Premium Interior, Performance Package
Tesla Model Y
* Price: Starting at $62,990
* Range: Up to 326 miles
* Acceleration: 0–60 mph in 3.5 seconds
* Top speed: 155 mph
* Seating: 5 passengers (standard), 7 passengers (optional)
* Cargo space: 68 cubic feet (with rear seats folded down)
* Features: Autopilot, Full Self-Driving, Premium Interior, Performance Package, Third Row Seating
Comparison
The Tesla Model 3 is a smaller, more affordable car than the Tesla Model Y. It has a shorter range, slower acceleration, and less cargo space. However, it is also less expensive and more efficient.
The Tesla Model Y is a larger, more versatile car than the Tesla Model 3. It has a longer range, faster acceleration, and more cargo space. It also has the option for a third row of seats. However, it is also more expensive and less efficient.
Which car is right for you?
The best car for you depends on your needs and budget. If you are looking for a small, affordable car with a long range, the Tesla Model 3 is a good option. If you are looking for a larger, more versatile car with more cargo space and the option for a third row of seats, the Tesla Model Y is a good option.
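The `prompt | model | output_parser` line used for this comparison can look like magic, so here is a toy sketch of how a pipe operator can compose steps. It is only an illustration of the idea and not LangChain's actual implementation; the `Step` class and the fake model are hypothetical, and no real model is called.

```python
class Step:
    """Wrap a function so steps can be chained with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Compose left to right: the output of self feeds into other.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Stand-ins for the three stages of the chain.
toy_prompt = Step(lambda v: f"Compare EV car between {v['car1']} and {v['car2']}")
toy_model = Step(lambda text: {"content": f"[model reply to: {text}]"})
toy_parser = Step(lambda message: message["content"])

toy_chain = toy_prompt | toy_model | toy_parser
print(toy_chain.invoke({"car1": "Tesla Model 3", "car2": "Tesla Model Y"}))
```

Read left to right: the dict fills the template, the filled prompt goes to the model, and the parser pulls the plain string out of the model's message.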

So I'd say it knows Tesla well enough to pass as a decent Tesla fan, haha. Next, let's try a quick RAG (Retrieval Augmented Generation) setup with LangChain, just for fun.

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnableMap
from langchain.vectorstores import DocArrayInMemorySearch  # needs the docarray package installed

model = ChatGoogleGenerativeAI(model="gemini-pro", temperature=0.5)
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
output_parser = StrOutputParser()

vectorstore = DocArrayInMemorySearch.from_texts(
    ["The Tesla Model S is a battery electric full-size car with a liftback body style built by Tesla, Inc. since 2012.",
     "The Tesla Model 3 is a battery electric compact executive car with a fastback body style built by Tesla, Inc. since 2017.",
     "The Tesla Model X is a battery electric mid-size luxury crossover SUV built by Tesla, Inc. since 2015.",
     "The Tesla Model Y is a battery electric mid-size crossover SUV built by Tesla, Inc. since 2020."],
    embedding=embeddings
)

retriever = vectorstore.as_retriever()

template = """Answer the question in a full sentence, based only on the following context:
{context}

Return your answer following this question

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
chain = RunnableMap({
    "context": lambda x: retriever.get_relevant_documents(x["question"]),
    "question": lambda x: x["question"]
}) | prompt | model | output_parser

chain.invoke({"question": "Which SUV car is first develop by Tesla?"})

And boom!! The answer matches the data we provided exactly:

The first SUV car developed by Tesla is the Tesla Model X.

But if we don't give it that context, the model answers like this (Gemini Pro as of Dec 2023):

Tesla has not developed an SUV car yet.
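What the retriever contributes in the RAG example is picking the passages closest to the question before the model sees anything. Below is a toy sketch of that step under loud assumptions: word counts stand in for a real embedding model (which returns dense vectors), and `embed`, `cosine`, and `retrieve` are my own hypothetical helpers, not LangChain or Google APIs.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts instead of a dense vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, docs, k=2):
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


docs = [
    "The Tesla Model S is a battery electric full-size car with a liftback body style built by Tesla, Inc. since 2012.",
    "The Tesla Model 3 is a battery electric compact executive car with a fastback body style built by Tesla, Inc. since 2017.",
    "The Tesla Model X is a battery electric mid-size luxury crossover SUV built by Tesla, Inc. since 2015.",
    "The Tesla Model Y is a battery electric mid-size crossover SUV built by Tesla, Inc. since 2020.",
]

# The retrieved passages are what gets pasted into {context} in the prompt.
context = retrieve("crossover SUV", docs)
print("\n".join(context))
```

The model then only has to read the retrieved passages and compare the years, which is why the grounded answer can differ so much from the ungrounded one.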

One more thing we can't skip is showing the model a picture and asking whether it can really see images the way gpt-4-vision can. After all that bragging, it deserves a challenge. We'll use the photo of Elon Musk's embarrassing moment when the Cybertruck window shattered and see what it has to say.

https://gizmodo.com/elon-musk-tesla-cybertruck-on-sale-2023-1849313166

from langchain_core.messages import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-pro-vision")
image_url = "https://i.kinja-img.com/image/upload/c_fit,q_60,w_1315/3f85cb3d9cd994aee6ceba7f3c1fad2e.jpg"
message = HumanMessage(
    content=[
        {
            "type": "text",
            "text": "Why window is broken?",
        },
        {
            "type": "image_url",
            "image_url": image_url
        },
    ]
)

llm.invoke([message])

And the model came back with an amazing answer:

The bulletproof windows of the Tesla Cybertruck were broken during its unveiling. Elon Musk said that the window was hit by a metal ball bearing, which is harder than a bullet.

After trying this several times, the answers vary from run to run, and some of them are a bit hallucinated. Overall, though, Gemini was probably trained on news about the Cybertruck unveiling, so it knows about the event and can answer in line with what many news sites wrote.
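If the picture lives on your own machine rather than at a public URL, one option is to embed it as a base64 data URL in the same text-plus-image message shape used in the vision call. This is a sketch under an assumption: I believe the LangChain Google integration accepts `data:` URLs in `image_url`, but check the docs for your installed version. The helper name and the placeholder bytes are hypothetical.

```python
import base64


def image_message_content(question: str, image_bytes: bytes,
                          mime_type: str = "image/jpeg"):
    """Build a text + image content list with the image as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return [
        {"type": "text", "text": question},
        {"type": "image_url", "image_url": f"data:{mime_type};base64,{encoded}"},
    ]


# Placeholder bytes, not a real JPEG; read your own file with open(path, "rb").
content = image_message_content("Why window is broken?", b"\xff\xd8\xff")
print(content[1]["image_url"][:30])
```

The resulting list would be passed as `HumanMessage(content=content)`, mirroring the URL-based message.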

That's enough of a taste for today's itch-scratching. From here on, feel free to keep playing as much as you like, because it's free, haha. No more topping up credit on OpenAI. See you next time in #แอลแอลเอ็มเดอะซีรี่ย์ (LLM The Series) 👋🏻

Reference ✨

Google AI chat models | 🦜️🔗 Langchain

Access Google AI's gemini and gemini-vision models, as well as other

python.langchain.com