In this post, we will take a look at how we can use Google Cloud Vision from a Spring Boot application. With Google Cloud Vision, it is possible to extract all kinds of information from images, such as labels, faces, text and landmarks. As a bonus, some examples with Python are provided too.

1. Introduction

A good starting point for experimenting with Cloud Vision is the Cloud Vision API Documentation. The documentation is comprehensive and the examples actually do work 😉. In the next paragraphs, we will explore some of the image processing capabilities. We will do the following:

  • Retrieve labels of an image;
  • Detect text in an image;
  • Recognize a landmark;
  • Detect faces in an image.

You can use the provided client libraries, which work just fine, but if you are already using Spring Boot, you can also use the provided starter, which makes the API even easier to use. We will use the Spring starter, and at the end we recreate the same examples in Python using the client library provided by Google.

A good reference for samples is the Spring Cloud GCP Vision API Sample.

The Java source code and the Python source code used in this post are available at GitHub.

2. Create the GCP project

Before getting started, we need to set up some things in the Google Cloud Platform (GCP). If you don’t have an account already, you can create one for free. You will not be charged if you are only experimenting.

Create a new project in GCP.

Cloud Vision - new project

In the menu, navigate to ‘APIs & Services – Dashboard’ and click the ‘Enable APIs and Services’ button.

Cloud Vision - Enable APIs and Services button

Search for vision in the search bar and click the ‘Cloud Vision API’.

Cloud Vision - Cloud Vision API search result

Click the ‘Enable’ button in order to enable the API.

Cloud Vision - Enable API

We are going to run the applications from our development machine. Therefore, we need to set up a service account. See paragraph 3.4 of a previous post for how to do this.
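Before starting the application, it can help to check that the service account key is actually reachable. The sketch below assumes the key is passed through the GOOGLE_APPLICATION_CREDENTIALS environment variable, as we do for the Python samples later in this post; the CredentialsCheck class and its problemWith method are hypothetical helpers of our own and are not part of any Google or Spring library.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper, not part of any Google or Spring library.
final class CredentialsCheck {

  // Returns a description of the problem, or an empty Optional
  // when the value points to a readable regular file.
  static Optional<String> problemWith(String value) {
    if (value == null || value.isBlank()) {
      return Optional.of("GOOGLE_APPLICATION_CREDENTIALS is not set");
    }
    Path path = Paths.get(value);
    if (!Files.isRegularFile(path) || !Files.isReadable(path)) {
      return Optional.of("No readable file at " + path);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Assumption: the key location is passed via this environment variable.
    String value = System.getenv("GOOGLE_APPLICATION_CREDENTIALS");
    System.out.println(problemWith(value).orElse("Credentials file found"));
  }
}
```

The check only verifies that the file exists and is readable. It does not validate the contents of the key or the permissions of the service account.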

3. Spring Boot and Cloud Vision

As always, our starting point is the Spring Initializr. We select JDK 11 and Spring Web MVC. We will create some endpoints that trigger our experiments.

In the pom, we need to add the spring-cloud-gcp-starter-vision dependency and we also add the Spring Cloud GCP BOM.



Next, we add a VisionController where we inject the CloudVisionTemplate which we will use to access the Cloud Vision API. The VisionController will be extended with various endpoints for our experiments.

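@RestController
public class VisionController {

  @Autowired
  private ResourceLoader resourceLoader; // used to load the sample images
  @Autowired
  private CloudVisionTemplate cloudVisionTemplate;
}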

3.1 Retrieve Labels of an Image

Our first experiment is to retrieve the labels of an image. We will use the image of this cute little cat to do so.

Cloud Vision - cat

We add the getLabelDetection method to the VisionController. The CloudVisionTemplate has an analyzeImage method which takes the image and the kind of information we want to retrieve from Cloud Vision as input parameters. To retrieve labels, we pass Feature.Type.LABEL_DETECTION. From the response, we can retrieve the results with the getLabelAnnotationsList method.

@GetMapping("/getLabelDetection")
public String getLabelDetection() {

  // Load the image from the file system
  Resource imageResource = this.resourceLoader.getResource("file:src/main/resources/cat.jpg");
  // Ask Cloud Vision for label annotations only
  AnnotateImageResponse response = this.cloudVisionTemplate.analyzeImage(
                                      imageResource, Feature.Type.LABEL_DETECTION);

  return response.getLabelAnnotationsList().toString();
}

Run the application:

$ mvn spring-boot:run

Go to the URL http://localhost:8080/getLabelDetection to retrieve the results:

mid: "/m/01yrx" 
description: "Cat" 
score: 0.99598557 
topicality: 0.99598557 , 

We only list a small part of the results. The mid parameter is a unique machine-generated identifier corresponding to the entity’s Google Knowledge Graph entry. You can go to the Knowledge Graph, enter the mid and retrieve more information about this item. The description parameter is a short description. The score parameter indicates how confident the API is about the detected item (a score close to 1 means high confidence, a score close to 0 means low confidence). The topicality parameter should indicate how relevant the label is to the image and should therefore differ from the score, but there seems to be a bug in the Cloud Vision API causing both values to be the same.
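Since every label carries a score, a natural next step is to keep only the labels above a confidence threshold. The following is a minimal sketch of that idea. It uses a hypothetical stand-in class Label instead of the real annotation type, so that it runs without the Cloud Vision libraries; in the application you would read the same values with getDescription and getScore on the returned annotations. The threshold of 0.9 and the second label are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class LabelFilter {

  // Hypothetical stand-in for the two annotation fields used here.
  static final class Label {
    final String description;
    final float score;

    Label(String description, float score) {
      this.description = description;
      this.score = score;
    }
  }

  // Returns the descriptions of all labels whose score is at least
  // minScore, in the order in which they were received.
  static List<String> confidentDescriptions(List<Label> labels, float minScore) {
    List<String> result = new ArrayList<>();
    for (Label label : labels) {
      if (label.score >= minScore) {
        result.add(label.description);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Label> labels = List.of(
        new Label("Cat", 0.99598557f),  // score taken from the output above
        new Label("Example", 0.42f));   // made-up entry for illustration
    System.out.println(confidentDescriptions(labels, 0.9f)); // prints [Cat]
  }
}
```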

3.2 Detect Text in an Image

In our second experiment, we will try to retrieve some text from an image. We use the following image.

Cloud Vision - Text

We add the getTextDetection method to the VisionController. The only thing we change compared to the label detection is using Feature.Type.DOCUMENT_TEXT_DETECTION indicating that we are interested in the text this time. In the response, we retrieve the results with the getTextAnnotationsList method.

@GetMapping("/getTextDetection")
public String getTextDetection() {

  // Load the image from the file system
  Resource imageResource = this.resourceLoader.getResource("file:src/main/resources/text.jpeg");
  // Ask Cloud Vision for text annotations only
  AnnotateImageResponse response = this.cloudVisionTemplate.analyzeImage(
          imageResource, Feature.Type.DOCUMENT_TEXT_DETECTION);

  return response.getTextAnnotationsList().toString();
}

Go to the URL http://localhost:8080/getTextDetection to retrieve the results:

locale: "en" 
description: "MAKE\nTHIS DAY\nGREAT!\n" 
bounding_poly { 
vertices { x: 100 y: 74 } 
vertices { x: 404 y: 74 } 
vertices { x: 404 y: 272 } 
vertices { x: 100 y: 272 } } ,

Now the locale parameter indicates that the language is English and the location of the text block is indicated with the bounding_poly parameter. A full list of the returned parameters can be found here.
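The four vertices of the bounding_poly can be reduced to a position and a size, which is often easier to work with. Below is a minimal sketch that computes the enclosing axis-aligned box. The BoundingBox class is a hypothetical helper of our own and the vertices are passed as plain {x, y} pairs so that the example runs without the Cloud Vision libraries; in the application the values would come from getX and getY on each vertex.

```java
import java.util.List;

// Hypothetical helper, not part of the Cloud Vision API.
final class BoundingBox {
  final int x;
  final int y;
  final int width;
  final int height;

  BoundingBox(int x, int y, int width, int height) {
    this.x = x;
    this.y = y;
    this.width = width;
    this.height = height;
  }

  // Computes the axis-aligned box that encloses all vertices.
  // Each vertex is an {x, y} pair.
  static BoundingBox of(List<int[]> vertices) {
    if (vertices.isEmpty()) {
      throw new IllegalArgumentException("at least one vertex is required");
    }
    int minX = Integer.MAX_VALUE;
    int minY = Integer.MAX_VALUE;
    int maxX = Integer.MIN_VALUE;
    int maxY = Integer.MIN_VALUE;
    for (int[] vertex : vertices) {
      minX = Math.min(minX, vertex[0]);
      maxX = Math.max(maxX, vertex[0]);
      minY = Math.min(minY, vertex[1]);
      maxY = Math.max(maxY, vertex[1]);
    }
    return new BoundingBox(minX, minY, maxX - minX, maxY - minY);
  }

  public static void main(String[] args) {
    // Vertices taken from the text detection output above
    BoundingBox box = BoundingBox.of(List.of(
        new int[] {100, 74}, new int[] {404, 74},
        new int[] {404, 272}, new int[] {100, 272}));
    System.out.println(box.width + " x " + box.height); // prints 304 x 198
  }
}
```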

3.3 Recognize a Landmark

In our third experiment, we will verify whether a landmark can be recognized. We will be using the Atomium which is located in Brussels, Belgium.

Cloud Vision - Landmark

We add the getLandmarkDetection method to the VisionController. Again, the only thing we change is the feature type, which is now Feature.Type.LANDMARK_DETECTION. In order to retrieve the results, we invoke the getLandmarkAnnotationsList method.

@GetMapping("/getLandmarkDetection")
public String getLandmarkDetection() {

  // Load the image from the file system
  Resource imageResource = this.resourceLoader.getResource("file:src/main/resources/landmark.jpeg");
  // Ask Cloud Vision for landmark annotations only
  AnnotateImageResponse response = this.cloudVisionTemplate.analyzeImage(
          imageResource, Feature.Type.LANDMARK_DETECTION);

  return response.getLandmarkAnnotationsList().toString();
}

Go to the URL http://localhost:8080/getLandmarkDetection to retrieve the results:

mid: "/m/0kfhm" 
description: "Atomium" 
score: 0.65761214 
bounding_poly { 
vertices { x: 28 y: 227 } 
vertices { x: 424 y: 227 } 
vertices { x: 424 y: 595 } 
vertices { x: 28 y: 595 } } 
locations { lat_lng { latitude: 50.894919 longitude: 4.341466 } } ]

We now also retrieve the lat/lon coordinates and they are spot on 😉

3.4 Detect Faces in an Image

In our last experiment, we will detect faces in an image. We will use the following image for this.

Cloud Vision - Faces

We add the getFaceDetection method to the VisionController. Again, the only thing we change is the feature type, which is now Feature.Type.FACE_DETECTION. In order to retrieve the results, we invoke the getFaceAnnotationsList method.

@GetMapping("/getFaceDetection")
public String getFaceDetection() throws IOException {

  Resource imageResource = this.resourceLoader.getResource("file:src/main/resources/faces.jpeg");
  Resource outputImageResource = this.resourceLoader.getResource("file:src/main/resources/output.jpg");
  AnnotateImageResponse response = this.cloudVisionTemplate.analyzeImage(
          imageResource, Feature.Type.FACE_DETECTION);

  // Save a copy of the input image with the detected faces marked
  writeWithFaces(imageResource.getFile().toPath(), outputImageResource.getFile().toPath(), response.getFaceAnnotationsList());

  return response.getFaceAnnotationsList().toString();
}

We will also draw rectangles around the faces based on the Vertices information that is returned and save the image.

private static void writeWithFaces(Path inputPath, Path outputPath, List<FaceAnnotation> faces)
        throws IOException {
  // Read the original image, draw the face outlines and write the result
  BufferedImage img = ImageIO.read(inputPath.toFile());
  annotateWithFaces(img, faces);
  ImageIO.write(img, "jpg", outputPath.toFile());
}

public static void annotateWithFaces(BufferedImage img, List<FaceAnnotation> faces) {
  for (FaceAnnotation face : faces) {
      annotateWithFace(img, face);
  }
}

private static void annotateWithFace(BufferedImage img, FaceAnnotation face) {
  Graphics2D gfx = img.createGraphics();
  // Build a polygon from the vertices of the face bounding box
  Polygon poly = new Polygon();
  for (Vertex vertex : face.getFdBoundingPoly().getVerticesList()) {
      poly.addPoint(vertex.getX(), vertex.getY());
  }
  gfx.setStroke(new BasicStroke(5));
  gfx.setColor(new Color(0x00ff00));
  gfx.draw(poly);
  gfx.dispose();
}

Go to the URL http://localhost:8080/getFaceDetection in order to retrieve the results. We are not listing the output here because it is pretty large and similar to the previous results. And here is our resulting output image, which is pretty cool, isn’t it?

Cloud Vision - faces recognized
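The drawing step can also be tried on its own, without calling the API. The sketch below applies the same stroke and color as annotateWithFace to an empty in-memory image and then inspects two pixels. The PolygonOverlay class, the image size and the coordinates are our own assumptions for illustration and do not come from an actual API response.

```java
import java.awt.BasicStroke;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Polygon;
import java.awt.image.BufferedImage;

// Hypothetical helper for trying the drawing code in isolation.
final class PolygonOverlay {

  // Draws a closed green outline through the given points onto img.
  // xs and ys must have the same length.
  static void drawOutline(BufferedImage img, int[] xs, int[] ys) {
    Graphics2D gfx = img.createGraphics();
    try {
      gfx.setStroke(new BasicStroke(5));
      gfx.setColor(new Color(0x00ff00));
      gfx.draw(new Polygon(xs, ys, xs.length));
    } finally {
      gfx.dispose(); // release the graphics context
    }
  }

  public static void main(String[] args) {
    // A new TYPE_INT_RGB image is completely black
    BufferedImage img = new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB);
    // Made-up square standing in for a face bounding box
    drawOutline(img, new int[] {20, 80, 80, 20}, new int[] {20, 20, 80, 80});
    // One pixel on the top edge of the outline and one in the centre
    System.out.printf("edge=%06x centre=%06x%n",
        img.getRGB(50, 20) & 0xffffff, img.getRGB(50, 50) & 0xffffff);
  }
}
```

Only the outline is drawn, so the pixel on the edge takes the stroke color while the centre of the square keeps the original image content.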

4. Python Samples

We also tried to create the above examples in Python. The Python code can be found at GitHub and the examples are mainly taken from the provided samples from Google.

We will only list what we have done in order to make them run.

We have used PyCharm and have set up a virtual environment in our project with Python 3. We verified the Python version from within the PyCharm Terminal:

$ python --version
Python 3.7.1

We have installed the Google Cloud Vision packages:

$ pip install google-cloud-vision

We added the service-account.json file in our project directory and set the GOOGLE_APPLICATION_CREDENTIALS environment variable:

$ export GOOGLE_APPLICATION_CREDENTIALS=<path to the service-account.json file>/service-account.json

We also needed to install the Pillow package:

$ pip install Pillow

Now run the Python file and the detect_labels function shows us the following results:

$ python 
# Detect Labels #
Small to medium-sized cats
Domestic long-haired cat
Norwegian forest cat

5. Conclusion

In this post, we looked at how we can use the Google Cloud Vision API from a Spring Boot application. We experimented with features like label detection, text detection, landmark detection and face detection. The API is quite easy to use and the documentation is very good. Comprehensive sample code is provided for different programming languages and can be used as is.