Model Garden of Vertex AI


Unlocking the Power of Google’s Vertex AI: Exploring the World of Pre-Built Models for AI Tasks

Introduction:

Artificial Intelligence (AI) has transformed numerous industries, from healthcare and finance to e-commerce, logistics, education, and entertainment. But the complexity of developing machine learning models often poses a challenge. As demand for AI-powered solutions continues to rise, data scientists look for efficient ways to reuse pre-trained models or build custom models for specific tasks. Google’s Vertex AI addresses this need with an extensive selection of pre-built models for a wide range of AI tasks.

Vertex AI also brings Large Language Models (LLMs) and prompt engineering into the ML workflow. Data scientists can use state-of-the-art language models, such as PaLM 2, to accelerate development, and prompt engineering lets them guide those models toward precise, relevant results. From computer vision and natural language processing to speech processing and structured tabular data analysis, the Vertex AI catalog includes over 100 models covering diverse application domains. This article surveys that catalog in four groups: foundation models, fine-tunable models, task-specific solutions, and task-specific LLM prompts.

Foundation models:

Pre-trained multi-task models that can be further tuned or customized for specific tasks.

| Sno. | Name | Details | Task Name | Vision/Language | Input Data Type | Model Name |
|---|---|---|---|---|---|---|
| 1 | PaLM 2 for Text | Fine-tuned to follow natural language instructions and suitable for a variety of language tasks, such as classification, extraction, summarization, and content generation. | Text Gen. | Language | Text | text-bison@001 |
| 2 | PaLM 2 for Chat | Fine-tuned to conduct natural conversation. Use this model to build and customize your own chatbot application. | Text Gen. | Language | Text | chat-bison@001 |
| 3 | Embeddings for text | Text embedding is an important NLP technique that converts textual data into numerical vectors that can be processed by machine learning algorithms, especially large models. These vector representations are designed to capture the semantic meaning and context of the words they represent. | Embedding | Language | Text | textembedding-gecko@001 |
| 4 | Codey for Code Completion | Generates code based on code prompts. Good for code suggestions and minimizing bugs in code. | Code Gen. | Language | Text | code-gecko@001 |
| 5 | Codey for Code Generation | Generates code based on natural language input. Good for writing functions, classes, unit tests, and more. | Code Gen. | Language | Text | code-bison@001 |
| 6 | Codey for Code Chat | Get code-related assistance through natural conversation. Good for questions about an API, syntax in a supported language, and more. | Code Gen. | Language | Text | codechat-bison@001 |
| 7 | BERT | Neural network-based technique for natural language processing. Use it to train your own question answering system and more. | Text Gen. | Language | Text | google/bert-base-001 |
| 8 | InstructPix2Pix | Given an input image and a text prompt that tells the model what to do, the instruct-pix2pix model follows the prompt to edit the image by generating a new one. | Image Gen. | Vision, Language | Text+Image | timbrooks/instruct-pix2pix |
| 9 | ControlNet | Control image generation with a text prompt and a control image. | Image Gen. | Vision, Language | Text | lllyasviel/ControlNet |
| 10 | BLIP2 | BLIP2 is for image captioning and visual question answering tasks. | Text Gen. | Vision, Language | Image | Salesforce/blip2-opt-2.7b |
| 11 | Stable Diffusion 1.4 (Keras) | KerasCV implementation of stability.ai’s text-to-image model, Stable Diffusion 1.4. | Image Gen. | Vision, Language | Text | keras/stable-diffusion-v1-4 |
| 12 | Embeddings for Image | Generates vectors based on images, which can be used for downstream tasks like image classification, image search, and so on. | Embedding | Vision | Image | imageembedding-001 |
| 13 | Label detector (PaLI zero-shot) | Label Detector Zero-shot classifies images based on labels, represented as a list of text prompt strings provided by the user, and calculates the confidence score of each label’s presence in the image. | Classification | Vision | Image | imagezeroshot-001 |
| 14 | Stable Diffusion v1-5 | Latent text-to-image diffusion model capable of generating photo-realistic images given a text input. | Image Gen. | Vision | Text | runwayml/stable-diffusion-v1-5 |
| 15 | Stable Diffusion Inpainting | Stable Diffusion Inpainting is a latent diffusion model capable of inpainting images given any text input and a mask image. | Image Gen. | Vision | Text | runwayml/stable-diffusion-inpainting |
| 16 | BLIP image captioning | A Vision-Language Pre-training (VLP) framework for image captioning. | Text Gen. | Vision | Image | Salesforce/blip-image-captioning-base |
| 17 | BLIP VQA | A Vision-Language Pre-training (VLP) framework for visual question answering (VQA). | Text Gen. | Vision | Image | Salesforce/blip-vqa-base |
| 18 | CLIP | Neural network capable of classifying images without prior training on the classes. | Classification | Vision | Image | openai/clip-vit-base-patch32 |
| 19 | OWL-ViT | Zero-shot, text-conditioned object detection model that can query an image with one or multiple text queries. | Text Gen. | Vision | Text+Image | google/owlvit-base-patch32 |
| 20 | ViT GPT2 | Image captioning model. | Text Gen. | Vision | Image | nlpconnect/vit-gpt2-image-captioning |
| 21 | ViLT VQA | Vision-and-Language Transformer (ViLT) model fine-tuned on VQAv2. | Text Gen. | Vision | Image | dandelin/vilt-b32-finetuned-vqa |
| 22 | LayoutLM for VQA | Fine-tuned for document understanding and information extraction tasks like form and receipt understanding. | Info. Extraction | Vision | Scan Doc | impira/layoutlm-document-qa |
| 23 | T5-FLAN | T5 (Text-To-Text Transfer Transformer) model with the T5-FLAN checkpoint. | Text Gen. | Language | Text | google/t5-flan-001 |
| 24 | Sec-PaLM2 | The sec-palm model is a foundational model that has been pretrained on a variety of security-specific tasks. The model has broad security understanding across a number of topics, such as threat intelligence, security operations, and malware analysis. It is ideal for analyzing, summarizing, and aggregating information across multiple security data sources, as well as generating rules and search queries from natural language input. | Info. Extraction | Language | Text | google/sec-palm-000 |
| 25 | Chirp | Chirp is a version of a Universal Speech Model that has over 2B parameters and can transcribe in over 100 languages in a single model. | Speech Gen. | | Speech | chirp-rnnt1 |
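
To make the "Embeddings for text" entry more concrete: once text has been converted to vectors, semantic closeness is commonly measured with cosine similarity. The sketch below is a minimal illustration using only the Python standard library. The three short vectors and the `most_similar` helper are made up for this example and are not real output from textembedding-gecko@001; real embedding vectors have hundreds of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 4-dimensional embeddings, invented for illustration only.
embeddings = {
    "refund my order": [0.9, 0.1, 0.0, 0.2],
    "I want my money back": [0.8, 0.2, 0.1, 0.3],
    "weather in Paris": [0.0, 0.9, 0.4, 0.1],
}


def most_similar(query: str, table: dict[str, list[float]]) -> str:
    """Return the key (other than query) whose vector is closest to query's."""
    query_vec = table[query]
    candidates = [key for key in table if key != query]
    return max(candidates, key=lambda key: cosine_similarity(query_vec, table[key]))


print(most_similar("refund my order", embeddings))
```

With these placeholder vectors the two refund-related sentences score as nearest neighbours, which is the behaviour an embedding model is meant to provide for semantically related text.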

Fine-tunable models:

Models that data scientists can further fine-tune through a custom notebook or pipeline.

| Sno. | Name | Details | Task Name | Vision/Language | Input Data Type | Model Name |
|---|---|---|---|---|---|---|
| 1 | Stable Diffusion Inpainting | Stable Diffusion Inpainting is a latent diffusion model capable of inpainting images given any text input and a mask image. | Image Gen. | Vision, Language | Text | runwayml/stable-diffusion-inpainting |
| 2 | ControlNet | Control image generation with a text prompt and a control image. | Image Gen. | Vision | Text+Image | lllyasviel/ControlNet |
| 3 | tfhub/EfficientNetV2 | EfficientNet V2 is a family of image classification models that achieve better parameter efficiency and faster training speed than prior models. | Classification | Vision | Image | tensorflow-hub/efficientnetv2 |
| 4 | tfvision/vit | The Vision Transformer (ViT) is a transformer-based architecture for image classification. | Classification | Vision | Image | tfvision/vit-s16 |
| 5 | tfvision/SpineNet | SpineNet is an image object detection model generated using Neural Architecture Search. | Detection | Vision | Image | tfvision/spinenet49 |
| 6 | tfvision/YOLO | YOLO is a one-stage object detection algorithm that can achieve real-time performance on a single GPU. | Detection | Vision | Image | tfvision/scaled-yolo |
| 7 | DeepLabv3+ (with checkpoint) | Semantic segmentation is the task of assigning a label to each pixel in an image, where each label corresponds to a specific class of object or scene element. | Segmentation | Vision | Image | deeplabv3plus-cityscapes-20230315 |
| 8 | ResNet (with checkpoint) | Image classification model as described in the paper “Deep Residual Learning for Image Recognition”. | Classification | Vision | Image | resnet50 |
| 9 | ResNet-RS (with checkpoint) | Image classification model as described in the paper “Revisiting ResNets: Improved Training and Scaling Strategies”. | Classification | Vision | Image | ResNet-RS-50 |
| 10 | Faster R-CNN (Detectron2) | Faster R-CNN is a deep convolutional network used for image object detection. | Detection | Vision | Image | detectron2/faster-r-cnn |
| 11 | MobileNet (TIMM) | Small but powerful models optimized for mobile and embedded vision applications. | Classification | Vision | Image | timm/mobilenetv2_100 |
| 12 | EfficientNet (TIMM) | A family of convolutional neural networks (CNNs) designed to be both accurate and efficient. | Classification | Vision | Image | timm/efficientnetv2_rw_s |
| 13 | DeiT | A convolution-free transformer for image classification. | Classification | Vision | Image | timm/deit_base_patch16_224 |
| 14 | BEiT | A self-supervised learning framework for image representation learning inspired by BERT. | Classification | Vision | Image | timm/beit_base_patch16_224 |
| 15 | ViT (TIMM) | Transformer-like architecture for image classification. | Classification | Vision | Image | timm/vit_base_patch16_224 |
| 16 | RetinaNet (Detectron2) | RetinaNet is a one-stage object detection model that utilizes a feature pyramid network (FPN) on top of a ResNet and adds a focal loss function to address class imbalance during training. | Detection | Vision | Image | detectron2/retinanet |
| 17 | Mask R-CNN (Detectron2) | Mask R-CNN is an instance segmentation model which extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. | Detection | Vision | Image | detectron2/mask-r-cnn |
| 18 | ResNet (TIMM) | A type of artificial neural network that is made up of residual blocks with skip connections. | Classification | Vision | Image | timm/resnet50 |
| 19 | ResNeSt (TIMM) | An extension of the ResNet architecture that uses a new attention mechanism called split-attention. | Classification | Vision | Image | timm/resnest50d |
| 20 | ConvNeXt (TIMM) | A pure convolutional model that modernizes the ResNet architecture with design choices borrowed from the Swin Transformer. | Classification | Vision | Image | timm/convnext_base |
| 21 | CspNet (TIMM) | A type of deep neural network that extends architectures such as ResNet with cross stage partial connections, reducing the number of parameters and computation cost without sacrificing accuracy. | Classification | Vision | Image | timm/cspdarknet53 |
| 22 | Inception (TIMM) | Inception network is a deep neural network with an architectural design that consists of repeating components referred to as Inception modules. | Classification | Vision | Image | timm/inception_v4 |

Task-specific solutions:

Most of these pre-built models are ready to use off the shelf, and many can be customized using your own data.

| Sno. | Name | Details | Task Name | Vision/Language | Input Data Type | Model Name |
|---|---|---|---|---|---|---|
| 1 | Entity analysis | Inspect text to identify and label persons, organizations, locations, events, products and more. | Classification | Language | Text | google/language_v1-analyze_entities |
| 2 | Content classification | Uses Google’s state-of-the-art language technology to analyze text content and return content categories for it. The latest version of Content Classification supports over 1,000 categories. | Classification | Language | Text | google/language_v1-classify_text_v1 |
| 3 | Sentiment analysis | Sentiment analysis attempts to determine the overall attitude (positive or negative) expressed within the text. Sentiment is represented by numerical score and magnitude values. | Classification | Language | Text | google/language_v1-analyze_sentiment |
| 4 | Entity sentiment analysis | Entity Sentiment Analysis inspects the given text for known entities (proper nouns and common nouns), returns information about those entities, and identifies the prevailing emotional opinion of the entity within the text, especially to determine a writer’s attitude toward the entity as positive, negative, or neutral. | Classification | Language | Text | google/language_v1-analyze_entity_sentiment |
| 5 | Syntax analysis | Syntactic analysis extracts linguistic information, breaking up the given text into a series of sentences and tokens (generally, word boundaries), providing further analysis on those tokens. | Extraction | Language | Text | google/language_v1-analyze_syntax |
| 6 | Text Moderation | Text moderation analyzes a document and returns a list of harmful and sensitive categories that apply to the text found in the document. | Classification | Language | Text | google/language_v1-moderate_text |
| 7 | Text Translation | Use Google’s proven pre-trained text model to get text translations for 100+ languages. | Translation | Language | Text | Text Translation |
| 8 | Occupancy analytics | Detect people and vehicles in a video or image, plus zone detection, dwell time, and more. | Detection | Vision | Image, Video | google/occupancy-analytics-001 |
| 9 | Person/vehicle detector | Detects and counts people and vehicles in video. | Detection | Vision | Video | People/vehicle detector |
| 10 | Object detector | Identify and locate objects in video. | Detection | Vision | Video | Object detector |
| 11 | PPE detector | Identify people and personal protective equipment (PPE). | Detection | Vision | Image | PPE detector |
| 12 | Person blur | Mask or blur a person’s appearance in video. | Detection | Vision | Video | People blur |
| 13 | Product recognizer | Identify products at the GTIN or UPC level. | Recognition | Vision | Image | Product recognizer |
| 14 | Tag recognizer | Extract text in product and price tags. | Recognition | Vision | Scan Doc | Tag recognizer |
| 15 | Content moderation (Vision) | Content Moderator (Vision) detects objectionable or unwanted content across predefined content labels (e.g., adult, violence, spoof) or custom labels provided by the user. | Classification | Vision | Scan Doc | Content Moderation |
| 16 | Face detector (Vision API) | Face detector is a prebuilt Vision API model that detects multiple faces in media (images, video) and provides bounding polygons for the face and other facial “landmarks” along with their corresponding confidence values. | Detection | Vision | Image, Video | Face Detector |
| 17 | Watermark detector | Watermark detector is a prebuilt model that detects watermarks in the input image. | Detection | Vision | Scan Doc | imagewatermarkdetector-001 |
| 18 | Text detector (Vision API) | Text detector detects and extracts text from images. It uses optical character recognition (OCR) to recognize text in an image and convert it to machine-coded text. | Detection | Vision, Language | Scan Doc | Text Detector |
| 19 | AutoML E2E | Tabular Workflow for End-to-End AutoML is the complete AutoML pipeline for classification and regression tasks. | Classification | | Tabular | AutoML E2E |
| 20 | Document AI OCR processor | Document OCR can identify and extract text from documents in over 200 printed languages and 50 handwritten languages. | Extraction | | Document | pretrained-ocr-v1.2-2022-11-10 |
| 21 | Form Parser | Document AI Form Parser applies advanced machine learning technologies to extract key-value pairs, checkboxes, and tables from documents in over 200 languages. | Extraction | | Document | pretrained-form-parser-v1.0-2020-09-23 |
| 22 | TabNet | TabNet is a general model which performs well on a wide range of classification and regression tasks. | Classification | | Tabular | TabNet |
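
The sentiment analysis entry above describes results as a score and a magnitude. In the Cloud Natural Language API the score ranges from -1.0 (negative) to 1.0 (positive), and the magnitude is a non-negative number that grows with the amount of emotional content in the text. The sketch below shows one possible way to turn that pair into a readable label. The `describe_sentiment` function and both threshold values are assumptions chosen for illustration, not values defined by the API.

```python
def describe_sentiment(score: float, magnitude: float,
                       score_threshold: float = 0.25,
                       mixed_magnitude: float = 2.0) -> str:
    """Map a (score, magnitude) pair to a coarse label.

    Both thresholds are hypothetical defaults; tune them for your own data.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be between -1.0 and 1.0")
    if magnitude < 0.0:
        raise ValueError("magnitude must be non-negative")
    if score >= score_threshold:
        return "positive"
    if score <= -score_threshold:
        return "negative"
    # A near-zero score with high magnitude suggests positive and negative
    # statements cancelling out, rather than an absence of emotion.
    if magnitude >= mixed_magnitude:
        return "mixed"
    return "neutral"


print(describe_sentiment(0.8, 3.0))
print(describe_sentiment(0.0, 4.0))
```

Keeping the thresholds as parameters makes it easy to calibrate the labels against a sample of your own documents.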

Task-specific LLM Prompts:

Customize language model outputs to meet specific needs. Prompts help to refine or enrich the outputs of the large language model selected.

| Sno. | Name | Details | Task Name | Vision/Language | Input Data Type | Model Name |
|---|---|---|---|---|---|---|
| 1 | Object classification | Classify an object using a small number of examples (few-shot prompting). | Classification | Vision | Structured | LLM Prompt |
| 2 | Kindergarten Science Teacher | Your name is Miles. You are an astronomer who is knowledgeable about the solar system. Respond in short sentences. Shape your response as if talking to a 10-year-old. | Text Gen. | Language | Freeform | LLM Prompt |
| 3 | Online Return Customer Service | A customer service chatbot that provides basic customer support and makes decisions on simple tasks. | Text Gen. | Language | Freeform | LLM Prompt |
| 4 | Gluten Free Advisor | A chatbot that provides gluten free cooking recipes and diet plans. | Text Gen. | Language | Freeform | LLM Prompt |
| 5 | Company Information Guide | An informative chatbot that is given a simple company background and lets customers ask questions about the company’s products. | Text Gen. | Language | Freeform | LLM Prompt |
| 6 | Fictional Captain from the 1700s | Chat with a fictional character from the 1700s without any modern knowledge. | Text Gen. | Language | Freeform | LLM Prompt |
| 7 | Support rep chat summarization | You are a customer support manager and would like to quickly see what your team’s support calls are about. | Summarization | Language | Freeform | LLM Prompt |
| 8 | Summarize news article | News takes too much time to read. You want a quicker way to get the summary. Let Vertex help you. | Summarization | Language | Freeform | LLM Prompt |
| 9 | Chat agent summarization | You are a customer service center manager and you need to quickly see what your agents are talking about. | Summarization | Language | Freeform | LLM Prompt |
| 10 | Chat agent follow up | You are a customer service center manager. Sometimes your agents forget to note down follow-ups. You want to automate follow-up lists. | Info. Extraction | Language | Freeform | LLM Prompt |
| 11 | Transcript summarization | Summarize a block of text. | Summarization | Language | Structured | LLM Prompt |
| 12 | Dialog summarization | Summarize a conversation. | Summarization | Language | Structured | LLM Prompt |
| 13 | Hashtag tokenization | Create and tokenize hashtags based on the provided text. | Text Gen. | Language | Structured | LLM Prompt |
| 14 | Title generation | Generate a title based on the provided text. | Classification | Language | Structured | LLM Prompt |
| 15 | Sentiment analysis about a person | You would like to see how reporters write about certain people. You have articles and would like to see if a certain person is written about positively or negatively. | Classification | Language | Freeform | LLM Prompt |
| 16 | Customer request classification, few-shot | Based on your customer’s answer, you want to automate routing of the customer to the proper service queue. Use few-shot learning. | Classification | Language | Structured | LLM Prompt |
| 17 | Text classification few-shot | You are an intern at a library and your job is to classify hundreds of articles every day. You’d rather automate this and do something else. | Classification | Language | Structured | LLM Prompt |
| 18 | Article classification | You are an intern at a library and your job is to classify hundreds of articles every day. You’d rather automate this and do something else. | Classification | Language | Freeform | LLM Prompt |
| 19 | Classification headline | Few-shot classification on a given topic. | Classification | Language | Structured | LLM Prompt |
| 20 | Sentiment analysis | Explain the sentiment expressed in a body of text. | Classification | Language | Structured | LLM Prompt |
| 21 | Pixel Technical Specifications, one-shot | Extract the technical specifications of a Pixel phone from text into JSON, one-shot. | Info. Extraction | Language | Structured | LLM Prompt |
| 22 | Wifi troubleshooting | Given a description of the different status lights on the Google WiFi router, suggest the troubleshooting step. | Text Gen. | Language | Freeform | LLM Prompt |
| 23 | Contract analysis | You are a partner of a law firm. Your associates are bored of reading contracts to find specific provisions when they could be working on more intellectually challenging tasks. | Info. Extraction | Language | Freeform | LLM Prompt |
| 24 | Extractive Question Answering | Answer questions from given background texts. | Text Gen. | Language | Structured | LLM Prompt |
| 25 | Marketing generation Pixel | You work in Google’s device marketing team and you need to create a marketing pitch for the new Pixel 7 Pro. You have writer’s block and need help. | Text Gen. | Language | Freeform | LLM Prompt |
| 26 | Ad copy generation | You are a marketer and want to create different versions of the same ad to target different audiences. You would like some suggestions. | Text Gen. | Language | Freeform | LLM Prompt |
| 27 | Essay outline | Generate an outline for an essay on a particular topic. | Text Gen. | Language | Freeform | LLM Prompt |
| 28 | Correct grammar | Correct grammar in the text. | Text Gen. | Language | Freeform | LLM Prompt |
| 29 | Ad copy from description | Write ad copy for something based on a description. | Text Gen. | Language | Freeform | LLM Prompt |
| 30 | Write emails and letters | Write an email or letter based on the specified content. | Text Gen. | Language | Freeform | LLM Prompt |
| 31 | Reading comprehension test | Your child is preparing for the SAT verbal exam and needs more practice in reading comprehension. | Summarization | Language | Freeform | LLM Prompt |
| 32 | Generate memes | Generate memes based on a certain topic. | Text Gen. | Language | Freeform | LLM Prompt |
| 33 | Interview questions | Generate a list of interview questions targeting a specific position. | Text Gen. | Language | Freeform | LLM Prompt |
| 34 | Naming | Generate ideas for names of a specified entity. | Text Gen. | Language | Freeform | LLM Prompt |
| 35 | General tips and advice | Get tips and advice on general topics. | Text Gen. | Language | Freeform | LLM Prompt |
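
Several prompts in the table rely on few-shot prompting, where a handful of labeled examples precede the item to be classified. The sketch below assembles such a prompt for the customer request routing case using plain Python. The queue names, the example requests, and the `build_few_shot_prompt` helper are hypothetical; the resulting string would be sent to whichever text model you choose.

```python
def build_few_shot_prompt(instruction: str,
                          examples: list[tuple[str, str]],
                          query: str) -> str:
    """Assemble instruction, labeled examples, and the new query into one prompt."""
    lines = [instruction.strip(), ""]
    for text, label in examples:
        lines.append(f"Request: {text}")
        lines.append(f"Queue: {label}")
        lines.append("")  # blank line separates examples
    # The final entry leaves the label empty so the model completes it.
    lines.append(f"Request: {query}")
    lines.append("Queue:")
    return "\n".join(lines)


# Hypothetical queues and requests, invented for illustration only.
examples = [
    ("My invoice shows the wrong amount.", "billing"),
    ("The app crashes when I log in.", "technical_support"),
]

prompt = build_few_shot_prompt(
    instruction="Route each customer request to the correct service queue.",
    examples=examples,
    query="I was charged twice this month.",
)
print(prompt)
```

Ending the prompt with an empty `Queue:` line nudges the model to answer with a single label in the same format as the examples, which keeps the output easy to parse.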

Conclusion:

Platforms like Google’s Vertex AI have made AI development markedly more accessible. By providing pre-built models that span computer vision, natural language processing, speech processing, and ML tasks on structured tabular data, Vertex AI simplifies the development of AI solutions for many tasks. Its catalog lets data scientists tackle image classification, object detection, sentiment analysis, speech recognition, and much more without starting from scratch. Whether the goal is creating voice assistants, automating customer support, analyzing visual data, or making data-driven predictions, the models listed above cover a broad range of needs. As AI continues to transform industries, Vertex AI offers a practical starting point for applying it across diverse domains.

By harnessing the power of Vertex AI and its pre-built models, businesses and developers can pave the way for intelligent applications that enhance efficiency, accuracy, and user experiences. With a commitment to ongoing research and development, Google’s Vertex AI is poised to continuously expand its model offerings, ensuring that users have access to cutting-edge AI capabilities and enabling them to push the boundaries of what is possible in the world of artificial intelligence.

Dr. Hari Thapliyaal

Dr. Hari Thapliyal is a seasoned professional and prolific blogger with a multifaceted background that spans the realms of Data Science, Project Management, and Advait-Vedanta Philosophy. Holding a Doctorate in AI/NLP from SSBM (Geneva, Switzerland), Hari has earned Master's degrees in Computers, Business Management, Data Science, and Economics, reflecting his dedication to continuous learning and a diverse skill set. With over three decades of experience in management and leadership, Hari has proven expertise in training, consulting, and coaching within the technology sector. His extensive 16+ years in all phases of software product development are complemented by a decade-long focus on course design, training, coaching, and consulting in Project Management. In the dynamic field of Data Science, Hari stands out with more than three years of hands-on experience in software development, training course development, training, and mentoring professionals. His areas of specialization include Data Science, AI, Computer Vision, NLP, complex machine learning algorithms, statistical modeling, pattern identification, and extraction of valuable insights. Hari's professional journey showcases his diverse experience in planning and executing multiple types of projects. He excels in driving stakeholders to identify and resolve business problems, consistently delivering excellent results. Beyond the professional sphere, Hari finds solace in long meditation, often seeking secluded places or immersing himself in the embrace of nature.
