1 Introduction

The location of the eye and the extraction of its features, such as the iris, corners, and sclera, are important tasks in computer vision and machine learning. They can be used in applications such as facial recognition, safety control, and driver behavior analysis [1].

Several works have been produced in the field of eye location. Many of them are based on the physical properties of eyes, as in [2, 3]. The work in [4] uses template matching for eye detection, computing the correlation of an eye template T with various overlapping regions of the face image.

The segmentation of the sclera has been studied mainly in the field of biometric systems. The approach implemented in [5] uses Fuzzy C-means, a clustering method that divides one cluster of data into two or more related clusters. Abhijit [6] uses the CDM to segment the skin around the sclera and applies a threshold on the saturation channel of the HSV color space based on the intensity of its pixels.

This paper proposes a new process for human eye detection in face images and sclera segmentation using the Histogram of Oriented Gradients (HOG) descriptor, Random Forest, and the Image Foresting Transform (IFT).

The paper is organized as follows: Sect. 2 presents the proposed methodology, Sect. 3 reports the results, and Sect. 4 presents the conclusions.

2 Methodology

The approach proposed in this paper is shown in Fig. 1. The method starts with a preprocessing step to reduce lighting problems present in the images. The eye candidates are located based on the Color Distance Map (CDM) [7], and their features are extracted with the Histogram of Oriented Gradients (HOG) [9] and selected with Best First (BF) [11]. The classification is performed by Random Forest. The detected eyes have their scleras segmented with the Image Foresting Transform (IFT) [13]. The classifier was selected with the Auto-WEKA tool [16] based on the performance obtained.

Fig. 1. Methodology

2.1 Eye Location

In this section, we present the approach used to locate eyes in images of faces. The method starts with a preprocessing step to reduce lighting problems; next, the skin is segmented, followed by the detection of eye candidates and, finally, by the classification.

Preprocessing. To reduce or eliminate lighting problems in the images, Color Badge [8] is applied. Color Badge is a tone mapping operator based on the Light Random Sprays Retinex algorithm; it converts high dynamic range images into low dynamic range images.
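Color Badge itself relies on the Light Random Sprays Retinex machinery; as a simple stand-in that only illustrates the goal (compressing the dynamic range into a displayable range), a global log tone-mapping operator can be sketched as:

```python
import numpy as np

def log_tonemap(img_rgb: np.ndarray) -> np.ndarray:
    """Compress a high-dynamic-range RGB image (float, arbitrary range)
    into the displayable [0, 255] range with a global log curve.
    NOTE: this is an illustrative substitute, not the Color Badge
    operator used in the paper."""
    img = img_rgb.astype(np.float64)
    out = np.log1p(img) / np.log1p(img.max())  # map to [0, 1]
    return (out * 255).astype(np.uint8)
```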

Skin Segmentation and Detection of Eye Candidates. The second step in the eye localization is the skin segmentation. In this stage, a mask (Fig. 2a) is created by classifying each pixel with a skin or non-skin label using the Color Distance Map (CDM) [7]. The mask is composed of two maps that capture natural lighting and flash lighting conditions, respectively. The maps are defined by Eqs. 1 and 2.

$$\begin{aligned} MAP1 =\left\{ \begin{array}{@{}ll@{}} 1, &{} \text {if} \ ((R> 95, G> 40, B> 20 ) \text { and} \\ &{} (\max \ (R,G,B) - \min \ (R,G,B)> 15) \text { and}\\ &{} (| R - G |> 15, R> G, R > B)) \\ 0, &{} \text {otherwise} \end{array}\right. \end{aligned}$$
(1)
$$\begin{aligned} MAP2 =\left\{ \begin{array}{@{}ll@{}} 1, &{} \text {if}\ ((R> 220, G> 210, B> 170) \text { and}\\ &{} (| R - G | \le 15, B < R, B > G)) \\ 0, &{} \text {otherwise} \end{array}\right. \end{aligned}$$
(2)

where R, G, and B are the red, green, and blue component values of an RGB image. Next, noise removal (Fig. 2b), hole filling with successive closing and opening operations, and extraction of the largest skin-colored region (Fig. 2d) are applied. The following step applies arithmetic operations to extract the eye candidates: using Fig. 2c and the mask (Fig. 2d) obtained before, the operation described in Eq. 3 is applied.

$$\begin{aligned} IMAGE = \left\{ \begin{array}{@{}ll@{}} 1, &{} \text {if}\ (MAP1 \text { and } MAP2) == 0 \text { and } MASK == 255\\ 0, &{} \text {otherwise} \end{array}\right. \end{aligned}$$
(3)

The remaining sets of pixels have their centers of mass located. These centers are considered the eye candidates (Fig. 2f) and have their features extracted with the HOG descriptor.
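The maps of Eqs. 1 and 2 and the candidate image of Eq. 3 can be sketched with NumPy. Note that Eq. 3's condition "(MAP1 and MAP2) == 0" is read here as "marked as skin by neither map", which is our interpretation:

```python
import numpy as np

def skin_maps(img):
    """Eqs. 1 and 2: CDM skin maps for an H x W x 3 uint8 RGB image."""
    R = img[..., 0].astype(int)
    G = img[..., 1].astype(int)
    B = img[..., 2].astype(int)
    map1 = ((R > 95) & (G > 40) & (B > 20)
            & (img.max(axis=2).astype(int) - img.min(axis=2).astype(int) > 15)
            & (np.abs(R - G) > 15) & (R > G) & (R > B))
    map2 = ((R > 220) & (G > 210) & (B > 170)
            & (np.abs(R - G) <= 15) & (B < R) & (B > G))
    return map1, map2

def eye_candidate_image(map1, map2, mask):
    """Eq. 3: pixels marked as skin by neither map (our reading of
    '(MAP1 and MAP2) == 0') that lie inside the largest skin region
    (mask == 255)."""
    return ~(map1 | map2) & (mask == 255)
```

The centers of mass of the remaining connected components would then be taken as the eye candidates.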

Fig. 2. The skin segmentation process: (a) image, (b) segmented skin, (c) noise removal, (d) largest skin region, (e) Eq. 3 result and (f) ROI based on centers in original image. The black labels were used to preserve the identity of the individuals.

Feature Extraction and Selection. The HOG descriptor was introduced by Dalal and Triggs [9] as a feature for pedestrian recognition, although it has been shown to be capable of describing other objects as well [10].

The result of the HOG algorithm is a discrete set of features that describe the image. The number of cells and orientation bins defines the number of features. The configuration used in this work, defined empirically, was a window size of 16\(\,\times \,\)16 and a cell size of 8\(\,\times \,\)8, generating 144 attributes. Every eye candidate has its features extracted with HOG.
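A minimal HOG computation over a 16\(\,\times \,\)16 patch with 8\(\,\times \,\)8 cells can be sketched as follows. This simplified version uses 9 unsigned orientation bins and omits block normalization, so it yields 36 features rather than the paper's 144; the exact block configuration behind that count is not specified in the text.

```python
import numpy as np

def hog_patch(patch, cell=8, nbins=9):
    """Minimal unsigned-gradient HOG for a square grayscale patch:
    one nbins-bin, magnitude-weighted orientation histogram per
    cell x cell cell, with no block normalization (a sketch, not
    the paper's exact 144-feature configuration)."""
    patch = patch.astype(np.float64)
    gx = np.zeros_like(patch)
    gy = np.zeros_like(patch)
    gx[:, 1:-1] = patch[:, 2:] - patch[:, :-2]    # central differences
    gy[1:-1, :] = patch[2:, :] - patch[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    h, w = patch.shape
    feats = []
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            m = mag[i:i + cell, j:j + cell].ravel()
            a = ang[i:i + cell, j:j + cell].ravel()
            hist, _ = np.histogram(a, bins=nbins, range=(0, 180), weights=m)
            feats.append(hist)
    return np.concatenate(feats)
```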

With the HOG features extracted, we perform feature selection to retain only the best features, using the Best First (BF) algorithm [11]. BF searches the space of attribute subsets by exploring the most promising set with a backtracking facility [20]. The BF parameters were defined by Auto-WEKA.

Classification. The classification of the candidates was performed using Random Forest (RF). RF is an ensemble learning algorithm for classification that works by building a set of predictor trees, where each tree depends on the values of a randomly sampled vector [19]. The RF implementation in WEKA [20] was used to generate the classification model. The dataset used was [12]. The Auto-WEKA tool [16] was used to estimate the parameters.
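The classification step can be sketched with scikit-learn's Random Forest as a stand-in for the WEKA implementation used in the paper. The data below is synthetic; in the actual pipeline the feature vectors come from the HOG/BF steps and the hyper-parameters from Auto-WEKA.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in data: one feature vector per eye candidate,
# label 1 = eye, label 0 = non-eye.
rng = np.random.default_rng(0)
X_eye = rng.normal(loc=1.0, size=(40, 36))
X_not = rng.normal(loc=-1.0, size=(40, 36))
X = np.vstack([X_eye, X_not])
y = np.array([1] * 40 + [0] * 40)

# Train the ensemble of predictor trees and classify a new candidate.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
pred = clf.predict(rng.normal(loc=1.0, size=(1, 36)))
```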

2.2 Sclera Segmentation

Using the region of interest (ROI) of the eye obtained in the previous step, the sclera is segmented with the Image Foresting Transform (IFT). The IFT is a tool created to transform an image processing problem into a minimum-cost path forest problem on a graph derived from the image [13]. It works by creating a minimum-cost path that connects each node of the graph to a seed, based on the similarity between neighbors, building a forest that covers the whole image. Each pixel of the image has three attributes: the cost of the path between its seed and the pixel, its predecessor pixel on the path, and the label of its seed. Thus, if a new minimum-cost path is found, the label, cost, and predecessor are updated [13]. With the set of seeds defined, the IFT can separate foreground pixels from background pixels. To place the seeds in the sclera and in the rest of the eye, the iris must be located first.
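A minimal seeded IFT on a 4-connected pixel grid can be sketched as a Dijkstra-style search. The path-cost function here (maximum absolute intensity difference along the path, a common IFT choice) and the omission of the predecessor map are simplifications of the formulation in [13]:

```python
import heapq
import numpy as np

def ift_segment(img, seeds):
    """Seeded IFT on a 4-connected grid.
    img: 2-D float array (e.g. the HSV saturation channel).
    seeds: dict {(row, col): label}.
    Path cost: maximum absolute intensity difference along the path."""
    h, w = img.shape
    cost = np.full((h, w), np.inf)
    label = np.zeros((h, w), dtype=int)
    heap = []
    for (r, c), lab in seeds.items():
        cost[r, c] = 0.0
        label[r, c] = lab
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        c0, r, c = heapq.heappop(heap)
        if c0 > cost[r, c]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                new_cost = max(c0, abs(img[nr, nc] - img[r, c]))
                if new_cost < cost[nr, nc]:
                    # A cheaper path was found: update cost and label.
                    cost[nr, nc] = new_cost
                    label[nr, nc] = label[r, c]
                    heapq.heappush(heap, (new_cost, nr, nc))
    return label
```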

The algorithm is executed on the saturation channel of the HSV color model (Fig. 3).

Fig. 3. Example of an eye image in the S channel of the HSV model

Iris Location. The iris is located using the Hough Transform to find circumferences in the grayscale image. The circle found by the Hough Transform has its center and radius refined by two additional steps; this procedure is based on [17].

Fig. 4. Circumference (A) and Radius (B) shift examples. (Color figure online)

With the approximate location of the iris center obtained, the first step is the circumference shift. It consists in moving the center of the found circumference to the nearest neighbor whose periphery has the lowest pixel intensity. An example of this step is shown in Fig. 4a, where the yellow circumference is the initial location of the iris and the red circle is the location after applying the circumference shift.

The second step is to increase and decrease the radius of the circle, looking for the radius where the intensity of the perimeter pixels is smallest. An example of this step is shown in Fig. 4b, where the yellow circumference has the initial radius of the iris and the red circle has the radius after applying the radius shift.
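The two refinement steps can be sketched as a greedy search that minimizes the mean gray level along the circle perimeter (the iris is darker than its surroundings). The neighborhood size and radius search range below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def circle_mean_intensity(img, cx, cy, r, n=360):
    """Mean gray level along the circle perimeter (nearest-pixel sampling)."""
    t = np.linspace(0, 2 * np.pi, n, endpoint=False)
    xs = np.clip(np.round(cx + r * np.cos(t)).astype(int), 0, img.shape[1] - 1)
    ys = np.clip(np.round(cy + r * np.sin(t)).astype(int), 0, img.shape[0] - 1)
    return img[ys, xs].mean()

def refine_iris(img, cx, cy, r, shift=3, dr=3):
    # Circumference shift: move the center to the neighbor whose
    # periphery is darkest.
    best = (circle_mean_intensity(img, cx, cy, r), cx, cy)
    for dx in range(-shift, shift + 1):
        for dy in range(-shift, shift + 1):
            m = circle_mean_intensity(img, cx + dx, cy + dy, r)
            if m < best[0]:
                best = (m, cx + dx, cy + dy)
    _, cx, cy = best
    # Radius shift: grow/shrink the radius toward the darkest perimeter.
    best_r = min(range(r - dr, r + dr + 1),
                 key=lambda rr: circle_mean_intensity(img, cx, cy, rr))
    return cx, cy, best_r
```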

Seeds Placement. The background seeds are placed on the border of the image. The sclera seeds are obtained based on an adaptation of [14], which uses the eye geometry to find the location of the sclera. The two seeds corresponding to the sclera are placed on the right and left sides of the iris, at a distance of 1.1 times the found radius and at 25 degrees from the horizontal, as illustrated in Fig. 5, where the black points indicate the two seeds to be used. With the background and foreground pixels defined, the regions grow until the image is completely segmented.
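Under our reading of this geometry (a distance of 1.1 times the radius from the iris center, 25 degrees above the horizontal, with the image y-axis pointing down), the two sclera seed positions can be computed as:

```python
import math

def sclera_seeds(cx, cy, r, dist=1.1, angle_deg=25):
    """Place one seed on each side of the iris center (cx, cy), at
    dist * r from it and angle_deg above the horizontal axis (image
    y grows downward, so 'above' subtracts from y). The direction of
    the angle is our interpretation of the paper's description."""
    d = dist * r
    a = math.radians(angle_deg)
    left = (cx - d * math.cos(a), cy - d * math.sin(a))
    right = (cx + d * math.cos(a), cy - d * math.sin(a))
    return left, right
```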

Fig. 5. Seeds used for the sclera on the IFT

3 Results and Discussion

A number of experiments were performed to measure the reliability of all steps of the proposed method. The following sections explain the image databases used as well as the tests applied using the proposed method.

3.1 Image Databases

Two image databases were utilized. The dataset presented in [12] was used both for eye location and for sclera segmentation. Its images have a dimension of 2048\(\,\times \,\)1536 pixels and contain faces of 45 individuals in 5 different poses (225 images). This dataset was used in a strabismus study and is composed of individuals who have ocular deviations. The UBIRIS.v2 database [15] was used only for sclera segmentation because it contains exclusively eye images. This dataset simulates unconstrained conditions with realistic noise factors (200 images).

3.2 Experiments on the Eye Localization Method

The detection of the eye candidates was measured based on the percentage of eyes in the dataset that were not considered candidates. The database contains 450 eyes in its 225 images, of which 97.7% were found by the method. Figure 6a shows an example of an eye missed by the algorithm, where the blue rectangles indicate correctly found candidates.

Fig. 6. Example of eye detection method (Color figure online)

The eye location had an accuracy of 98.13%, a precision of 98.1%, and a recall of 98.1%. Incorrect classifications are shown in Fig. 6b-c. The false negatives and false positives occurred mainly for candidates where the shape of the eye is not fully contained inside the ROI. Figure 6b shows false-negative samples, while Fig. 6c shows false-positive samples; the red rectangles correspond to candidates classified as non-eye, while the green rectangles correspond to candidates classified as eye.

3.3 Experiments on the Sclera Segmentation Method

To evaluate the sclera segmentation method, we compare the areas of the scleras segmented manually and automatically. The segmentation was measured based on [18], which defines two equations to calculate precision and recall (Eq. 4). Examples of segmented scleras are shown in Fig. 7.

$$\begin{aligned} \begin{aligned} precision = NPAM/NPRS \\ recall = NPAM/NRMS \end{aligned} \end{aligned}$$
(4)

where NPAM = Number of pixels retrieved in the sclera region by the automatically segmented mask, NPRS = Number of pixels retrieved in the automatically segmented mask, and NRMS = Number of pixels in the sclera region in the manually segmented mask.
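Given binary masks for the automatic and manual segmentations, Eq. 4 can be computed directly:

```python
import numpy as np

def sclera_precision_recall(auto_mask, manual_mask):
    """Eq. 4: precision and recall of an automatic sclera mask against
    a manual one. Masks are boolean arrays, True = sclera pixel."""
    npam = np.logical_and(auto_mask, manual_mask).sum()  # true positives
    nprs = auto_mask.sum()    # pixels retrieved by the automatic mask
    nrms = manual_mask.sum()  # sclera pixels in the manual mask
    return npam / nprs, npam / nrms
```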

Fig. 7. Examples of sclera segmentation on the UBIRISv2 database. (A) Original image, (B) Manually segmented mask and (C) Automatically generated mask

The method achieved a precision of 86.02% and a recall of 84.15% on the UBIRISv2 database, and a precision of 80.39% and a recall of 79.95% on the database presented in [12]. Comparing these results with those reported in [18], an improvement can be observed over the best segmentation presented there, which achieved 85.21% precision and 80.21% recall. Several papers also use the UBIRISv2 database for sclera segmentation, but those articles mostly target biometric recognition and do not present measures focused on evaluating the quality of the sclera segmentation, as in [15].

4 Conclusions

In this paper, a new eye location and sclera segmentation method has been proposed. The method detects the eyes and the iris and segments the sclera in face images.

The results for the eye location method on the dataset from [12] showed that it has high accuracy and is robust on images of faces without a fixed gaze reference point. The sclera segmentation method was tested on both the dataset from [12] and the UBIRIS.v2 database. The segmented scleras show that the method tends to segment the whitest parts of the sclera and tends to miss scleras where the white appearance is absent or less evident.

The results showed that the variation of illumination in the images compromised the segmentation in both databases.

For future work, it is important to investigate more robust preprocessing techniques in order to reduce the influence of lighting on the segmentation. The number of seeds used for the sclera segmentation can also be evaluated, as there is no limit on the number of seeds.

Our research group acknowledges financial support from FAPEMA (Grant Number: UNIVERSAL-01082/16), CNPq (Grant Number: 423493/2016-7) and FAPEMA/CAPES.