• Open access
  • Published: 05 December 2018

Application research of digital media image processing technology based on wavelet transform

  • Lina Zhang 1 ,
  • Lijuan Zhang 2 &
  • Liduo Zhang 3  

EURASIP Journal on Image and Video Processing volume 2018, Article number: 138 (2018)


Abstract

With the development of information technology, people increasingly rely on networks to access information, and more than 80% of network information is multimedia, represented chiefly by images. Research on image processing technology is therefore very important, yet most of it focuses on a single aspect; results that unify the modeling of the various aspects of image processing are still rare. To this end, this paper builds a unified model covering image denoising, watermarking, encryption and decryption, and image compression, using the wavelet transform as the common method, and runs simulations on 300 photographs taken from daily life. The results show that the unified model achieves good results in all aspects of image processing.

1 Introduction

With the increase of computer processing power, the objects people process by computer have slowly shifted from characters to images. According to statistics, more than 80% of the information transmitted and stored today, especially on the Internet, is image information. Compared with character information, image information is much more complicated, so processing images on a computer is more complicated than processing characters. Therefore, to make the use of image information safer and more convenient, applied research on digital media images is particularly important. Digital media image processing technology mainly includes denoising, encryption, compression, storage, and many other aspects.

The purpose of image denoising is to remove natural-frequency noise from the image so as to highlight the meaning of the image itself. Image acquisition, processing, and similar operations damage the original signal of the image, and noise is an important factor interfering with image clarity. The sources of noise are varied, arising mainly from the transmission and quantization processes. According to the relationship between noise and signal, noise can be divided into additive noise, multiplicative noise, and quantization noise. Commonly used image denoising methods include the mean filter, the adaptive Wiener filter, the median filter, and the wavelet transform. For example, the neighborhood averaging used in the literature [1, 2, 3] is a mean filtering method suitable for removing particle noise from scanned images; neighborhood averaging strongly suppresses noise but also causes blurring, the degree of which is proportional to the radius of the neighborhood. The Wiener filter adjusts its output based on the local variance of the image and performs best on images with white noise; the literature [4, 5] uses this method for image denoising and obtains good results. Median filtering is a commonly used nonlinear smoothing filter that is very effective at removing salt-and-pepper noise; it can remove noise while protecting image edges, giving satisfactory recovery, and it requires no statistical characteristics of the image, which is very convenient in practice. The literature [6, 7, 8] gives successful cases of image denoising with median filtering. Wavelet analysis denoises the image using the wavelet's layered coefficients, so image details are well preserved, as in the literature [9, 10].

Image encryption is another important application area of digital image processing technology, comprising two main aspects: digital watermarking and image encryption. Digital watermarking technology embeds identification information (the digital watermark) directly into a digital carrier (multimedia, documents, software, etc.) without affecting the use value of the original carrier and without being easily perceived by the human perceptual system (visual or auditory). Through the information hidden in the carrier, it is possible to confirm the content creator or purchaser, transmit secret information, or determine whether the carrier has been tampered with. Digital watermarking is an important research direction of information hiding. For example, the literature [11, 12] studies image digital watermarking methods. Some researchers have applied wavelet methods to digital watermarking: AH Paquet [13] et al. used wavelet packets for personal authentication watermarking in 2003, successfully introducing wavelet theory into digital watermark research and opening a new direction for image-based digital watermarking technology. To keep digital images secret, in practice the two-dimensional image is generally converted into one-dimensional data and then encrypted with a conventional encryption algorithm. Unlike ordinary text, images and video are temporal, spatial, visually perceptible, and tolerant of lossy compression; these features make it possible to design more efficient and secure encryption algorithms for images. For example, Z Wen [14] et al. use a key value to generate real-valued chaotic sequences and then encrypt the image by spatial scrambling; experiments show the technique is effective and safe. YY Wang [15] et al. proposed a new optical image encryption method using a binary Fourier transform computer-generated hologram (CGH) and pixel scrambling, in which the pixel scrambling order and the encrypted image serve as keys for decrypting the original image. Zhang X Y [16] et al. combined the mathematics of two-dimensional cellular automata (CA) with image encryption and proposed a new image encryption algorithm that is easy to implement and secure, with a large key space, a good avalanche effect, strong confusion and diffusion characteristics, simple operation, low computational complexity, and high speed.

To transmit image information quickly, image compression is another research direction of image application technology. The information age has brought an "information explosion" and a corresponding surge in data volume, so data must be compressed effectively for both transmission and storage. In remote sensing, for example, space probes use compression coding to send huge amounts of information back to the ground. Image compression is the application of data compression to digital images; its purpose is to reduce redundancy in image data so that data can be stored and transmitted more efficiently. Through the unremitting efforts of researchers, image compression technology is now maturing. For example, Lewis A S [17] hierarchically encodes transformed coefficients and designs a new image compression method based on the human visual system's (HVS) local estimation of noise sensitivity; the algorithm maps easily to a 2-D orthogonal wavelet transform that decomposes the image into spatially and spectrally local coefficients. Devore R A [18] introduced a novel theory for analyzing image compression methods based on wavelet decomposition. Buccigrossi R W [19] developed a probabilistic model of natural images based on empirical observation of statistics in the wavelet transform domain; the wavelet coefficient pairs of basis functions at adjacent spatial locations, orientations, and scales were found to be non-Gaussian in both their marginal and joint statistical properties. They proposed a Markov model that explains these dependencies with linear predictors, in which amplitude is combined with multiplicative and additive uncertainty, and showed that it explains the statistics of various images, including photographic, graphic, and medical images. To demonstrate the model's efficacy directly, they constructed an image encoder called the Embedded Prediction Wavelet Image Coder (EPWIC), in which subband coefficients are encoded one bit plane at a time with a non-adaptive arithmetic coder. The encoder sorts the bit planes with a greedy algorithm using conditional probabilities computed from the model, considering the MSE reduction per coded bit; the decoder predicts coefficient values from the bits it has received using the statistical model. Although the model is simple, the encoder's rate-distortion performance is roughly equivalent to the best image encoders in the literature.

From the existing research results, we find that today's digital-image application research has achieved fruitful results. However, these results focus mainly on individual methods, such as deep learning [20, 21], genetic algorithms [22, 23], and fuzzy theory [24, 25], which also include the method of wavelet analysis. The biggest problem in existing image application research is that, although good results have been achieved, digital multimedia processing is an organic whole: denoising, compression, storage, encryption, decryption, and retrieval should be treated as one system, yet current results basically study only one part of this whole. Even if a method is superior in one link, it does not follow that the same method will suit the other links. To solve this problem, this paper takes the digital image as the research object, realizes unified modeling over the main steps of encryption, compression, and retrieval in image processing, and studies the ability of a single method to handle multiple steps.

Wavelet transform is a commonly used digital signal processing method. Since most digital signals are composed of multiple frequency components, a signal contains noise components, secondary components, and main components. In image processing, many research teams have used the wavelet transform as a processing method and achieved good results. So, can we use the wavelet transform to build one model suitable for a variety of image processing applications?

In this paper, the wavelet transform is used to establish a unified denoising, encryption, and compression model for the image processing process, and captured images are simulated. The results show that the same wavelet transform parameters achieve good results across the different image processing applications.

2 Method

2.1 Image binarization processing method

The gray values of an image's pixels range from 0 to 255. To facilitate further processing, the frame of the image is first highlighted by binarization. Binarization maps each pixel's gray value from the range 0–255 to either 0 or 255, and threshold selection is the key step in this process. The threshold used in this paper is chosen by the maximum between-class variance method (OTSU). For an image whose foreground/background segmentation threshold is t, let the proportion of foreground pixels be w0 with mean u0, and the proportion of background pixels be w1 with mean u1. Then the mean of the entire image is:

$$ u = w_0 u_0 + w_1 u_1 $$

The objective function can be established according to formula 1:

$$ g(t) = w_0 (u_0 - u)^2 + w_1 (u_1 - u)^2 = w_0 w_1 (u_0 - u_1)^2 $$

The OTSU algorithm takes the t at which g(t) attains its global maximum as the optimal threshold.
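As a concrete illustration, the OTSU search can be sketched in Python with NumPy (a minimal sketch; the paper's own experiments were run in MATLAB, and the function names here are illustrative):

```python
import numpy as np

def otsu_threshold(img):
    """Return the threshold t that maximizes the between-class
    variance g(t) = w0 * w1 * (u0 - u1)^2 for an 8-bit image."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()                  # gray-level probabilities
    levels = np.arange(256, dtype=np.float64)
    best_t, best_g = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()  # class proportions
        if w0 == 0.0 or w1 == 0.0:
            continue
        u0 = (levels[:t] * p[:t]).sum() / w0   # foreground mean
        u1 = (levels[t:] * p[t:]).sum() / w1   # background mean
        g = w0 * w1 * (u0 - u1) ** 2           # between-class variance
        if g > best_g:
            best_g, best_t = g, t
    return best_t

def binarize(img, t):
    """Map every gray value to 0 or 255 using threshold t."""
    return np.where(img >= t, 255, 0).astype(np.uint8)
```

On a bimodal image the returned t falls between the two modes, and `binarize` then maps every pixel to 0 or 255.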

2.2 Wavelet transform method

Wavelet transform (WT) grew out of Fourier analysis. Whereas the Fourier transform only decomposes a signal into its frequency components, the wavelet transform retains the localization property while also providing frequency information, and its analysis window adapts with scale rather than remaining fixed. It is therefore better suited to time-frequency analysis than the Fourier transform. The greatest strength of the wavelet transform is that it represents local features of a signal at different scales well: dividing the signal into low-frequency and high-frequency parts concentrates the features. This paper mainly uses the wavelet transform to analyze the image in different frequency bands, achieving the effect of frequency analysis. The (continuous) wavelet transform of a signal f(t) can be expressed as follows:

$$ WT_f(a,\tau) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} f(t)\, \overline{\psi\!\left(\frac{t-\tau}{a}\right)}\, dt $$

Where ψ ( t ) is the mother wavelet, a is the scale factor, and τ is the translation factor.

Because the image signal is a two-dimensional signal, using the wavelet transform for image analysis requires generalizing it to a two-dimensional wavelet transform. Suppose the image signal is represented by f(x, y), ψ(x, y) denotes a two-dimensional basic wavelet, and ψ_{a,b,c}(x, y) denotes the basic wavelet scaled by a and translated by (b, c); that is, ψ_{a,b,c}(x, y) can be calculated by the following formula:

$$ \psi_{a,b,c}(x,y) = \frac{1}{a}\, \psi\!\left(\frac{x-b}{a}, \frac{y-c}{a}\right) $$

According to the above definition of the continuous wavelet, the two-dimensional continuous wavelet transform can be calculated by the following formula:

$$ WT_f(a,b,c) = \frac{1}{a} \iint f(x,y)\, \overline{\psi\!\left(\frac{x-b}{a}, \frac{y-c}{a}\right)}\, dx\, dy $$

Where \( \overline{\psi \left(x,y\right)} \) is the conjugate of ψ ( x ,  y ).
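To make the two-dimensional transform concrete, the following sketch implements one level of the 2-D discrete wavelet transform with the Haar wavelet (chosen here only because it is the simplest orthogonal wavelet; the paper itself uses a Daubechies basis):

```python
import numpy as np

def haar_dwt2(x):
    """One level of 2-D Haar DWT. Returns the approximation LL and
    the (horizontal, vertical, diagonal) detail subbands.
    Image dimensions must be even."""
    s = np.sqrt(2.0)
    # transform along rows (pairs of columns)
    lo = (x[:, 0::2] + x[:, 1::2]) / s
    hi = (x[:, 0::2] - x[:, 1::2]) / s
    # transform along columns (pairs of rows)
    LL = (lo[0::2, :] + lo[1::2, :]) / s
    LH = (lo[0::2, :] - lo[1::2, :]) / s
    HL = (hi[0::2, :] + hi[1::2, :]) / s
    HH = (hi[0::2, :] - hi[1::2, :]) / s
    return LL, (LH, HL, HH)

def haar_idwt2(LL, details):
    """Inverse of haar_dwt2 (perfect reconstruction)."""
    LH, HL, HH = details
    s = np.sqrt(2.0)
    lo = np.empty((LL.shape[0] * 2, LL.shape[1]))
    hi = np.empty_like(lo)
    lo[0::2, :], lo[1::2, :] = (LL + LH) / s, (LL - LH) / s
    hi[0::2, :], hi[1::2, :] = (HL + HH) / s, (HL - HH) / s
    x = np.empty((lo.shape[0], lo.shape[1] * 2))
    x[:, 0::2], x[:, 1::2] = (lo + hi) / s, (lo - hi) / s
    return x
```

Because the transform is orthogonal, applying `haar_idwt2` to the output of `haar_dwt2` recovers the original image exactly.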

2.3 Digital watermarking

According to different methods of use, digital watermarking technology can be divided into the following types:

Spatial-domain methods: a typical watermarking algorithm of this type embeds information into the least significant bits (LSB) of randomly selected image points, which keeps the embedded watermark invisible. However, because it uses pixel bits that are unimportant to the image, the algorithm's robustness is poor, and the watermark information is easily destroyed by filtering, image quantization, and geometric deformation operations. Another common method uses the statistical characteristics of the pixels to embed the information in the luminance values of the pixels.

Transform-domain methods: first compute the discrete cosine transform (DCT) of the image, then superimpose the watermark on the k coefficients with the largest amplitude in the DCT domain (excluding the DC component), which usually correspond to the low-frequency components of the image. If the k largest DCT coefficients are denoted D = {d_i}, i = 1, ..., k, and the watermark is a random real sequence W = {w_i}, i = 1, ..., k drawn from a Gaussian distribution, then the embedding rule is d_i' = d_i(1 + a·w_i), where the constant a is a scale factor controlling the strength of the watermark. The watermarked image I* is then obtained by inverse transforming with the new coefficients. The decoder computes the DCT of the original image I and the watermarked image I*, extracts the embedded watermark W*, and performs a correlation test to determine the presence or absence of the watermark.
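A sketch of the multiplicative embedding rule d_i' = d_i(1 + a·w_i) and its recovery, operating directly on a vector standing in for the DCT coefficients (the DCT itself is omitted for brevity; the function names are illustrative):

```python
import numpy as np

def embed(d, w, a=0.1):
    """Embed watermark w into the len(w) largest-magnitude
    coefficients of d via d_i' = d_i * (1 + a * w_i)."""
    idx = np.argsort(np.abs(d))[::-1][:len(w)]  # k largest coefficients
    d2 = d.copy()
    d2[idx] = d[idx] * (1.0 + a * w)
    return d2, idx

def recover(d2, d, idx, a=0.1):
    """Invert the embedding rule to estimate the watermark,
    given the original coefficients (non-blind detection)."""
    return (d2[idx] - d[idx]) / (a * d[idx])

rng = np.random.default_rng(1)
d = rng.normal(0.0, 10.0, 1024)   # stand-in for DCT coefficients
w = rng.normal(0.0, 1.0, 64)      # Gaussian watermark sequence
d2, idx = embed(d, w, a=0.1)
w_est = recover(d2, d, idx, a=0.1)
```

In the correlation test described above, a high normalized correlation between `w_est` and a candidate sequence indicates the watermark is present.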

Compressed-domain algorithms: compressed-domain digital watermarking systems based on the JPEG and MPEG standards avoid a costly full decode-and-re-encode cycle and have great practical value in digital TV broadcasting and video on demand (VOD). Correspondingly, watermark detection and extraction can also be performed directly on compressed-domain data.

The wavelet transform used in this paper is a transform-domain method. The main process is as follows. Assume x(m, n) is a grayscale picture of size M × N with 2^a gray levels, where M, N, and a are positive integers and 1 ≤ m ≤ M, 1 ≤ n ≤ N. Decomposing this image with an L-level wavelet transform (L a positive integer) yields 3L high-frequency detail subimages and one low-frequency approximation subimage. The wavelet coefficients are denoted X_{K,L}, where L is the number of decomposition layers and K ∈ {H, V, D} indexes the horizontal, vertical, and diagonal subimages. Because embedding in the low-frequency subimage causes large distortion, the watermark is embedded in the subimages other than the low-frequency one.

To embed the digital watermark, X_{K,L}(m_i, n_j) is first partitioned into blocks of a certain size; let B(s, t) denote a coefficient block of size s × t in X_{K,L}(m_i, n_j). The average value of the block can then be expressed by the following formula:

$$ AVG = \frac{1}{s \cdot t} \sum \left| B(s,t) \right| $$

Where ∑|B(s, t)| is the cumulative sum of the magnitudes of the coefficients within the block.

The embedding of the watermark sequence w is achieved by the quantization of AVG.

The quantization interval, denoted Δl, is chosen as a trade-off between robustness and concealment. For the coarsest layer L, the coefficient amplitudes are large, so a larger interval can be set; for the other layers, starting from layer L−1, the interval is successively decreased.

According to w_i ∈ {0, 1}, AVG is quantized to the nearest odd or even quantization point. Let D(i, j) denote the wavelet coefficients in the block and D(i, j)' the quantized coefficients, where i = 1, 2, ..., s; j = 1, 2, ..., t. Suppose T = AVG/Δl and TD = rem(⌊T⌋, 2), where ⌊·⌋ denotes rounding and rem(·, 2) denotes the remainder after division by 2.

Depending on whether TD equals w_i, the quantized wavelet coefficient D(i, j)' is calculated accordingly.

Using the same wavelet base, an image containing the watermark is generated by inverse wavelet transform, and the wavelet base, the wavelet decomposition layer number, the selected coefficient region, the blocking method, the quantization interval, and the parity correspondence are recorded to form a key.
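The parity quantization of the block average can be sketched as follows (a simplified scheme in the spirit of the method above, using the plain block mean and a fixed interval Δ; the names and the interval value are illustrative):

```python
import numpy as np

def embed_bit(block, bit, delta):
    """Shift every coefficient in the block so that its mean
    quantizes to an interval whose parity equals the watermark bit."""
    avg = block.mean()
    q = int(np.floor(avg / delta))
    if q % 2 != bit:                # move to the adjacent interval
        q += 1
    target = (q + 0.5) * delta      # center of the chosen interval
    return block + (target - avg)

def extract_bit(block, delta):
    """Read the bit back from the parity of the quantized mean."""
    return int(np.floor(block.mean() / delta)) % 2

rng = np.random.default_rng(2)
bits = [0, 1, 1, 0, 1]
delta = 4.0
blocks = [rng.normal(0.0, 10.0, (4, 4)) for _ in bits]
marked = [embed_bit(b, bit, delta) for b, bit in zip(blocks, bits)]
recovered = [extract_bit(b, delta) for b in marked]
```

Quantizing to the interval center gives the scheme a margin of Δ/2 against perturbations of the block mean, which is the robustness/concealment trade-off governed by the interval size mentioned above.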

The extraction of the watermark is determined by the embedding method and is its inverse. First, a wavelet transform is performed on the image to be detected; the position of the embedded watermark is determined according to the key, and the inverse of the scrambling process is applied to the watermark.

2.4 Evaluation method

Normalized mean square error

To measure the effect before and after filtering, this paper uses the normalized mean square error M, calculated as follows:

$$ M = \frac{\sum_{i,j} \left[ N_1(i,j) - N_2(i,j) \right]^2}{\sum_{i,j} N_1(i,j)^2} $$

where N_1 and N_2 are the pixel values before and after filtering, respectively.

Normalized cross-correlation function

The normalized cross-correlation function is a classic image matching algorithm that can be used to represent the similarity of images. The normalized cross-correlation is obtained by computing a cross-correlation metric between the reference image and the template image, generally written NC(i, j); the larger the NC value, the greater the similarity between the two. The cross-correlation metric is calculated as follows:

$$ R(i,j) = \sum_{m}\sum_{n} S^{i,j}(m,n)\, T(m,n) $$

where T(m, n) is the pixel value at row m, column n of the template image, S^{i,j} is the part of the reference image S covered by the template, and (i, j) is the coordinate of the subimage's lower-left corner in S.

The metric above is normalized to NC according to the following formula:

$$ NC(i,j) = \frac{\sum_m \sum_n S^{i,j}(m,n)\, T(m,n)}{\sqrt{\sum_m \sum_n \left[ S^{i,j}(m,n) \right]^2 \; \sum_m \sum_n \left[ T(m,n) \right]^2}} $$

Peak signal-to-noise ratio

Peak signal-to-noise ratio is often used as a measure of signal reconstruction quality in areas such as image compression, and it is commonly defined via the mean square error (MSE). For two m × n monochrome images I and K, where one is a noisy approximation of the other, the mean square error is defined as:

$$ MSE = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left[ I(i,j) - K(i,j) \right]^2 $$

Then the peak signal-to-noise ratio PSNR is calculated as:

$$ PSNR = 10 \cdot \log_{10}\!\left( \frac{Max^2}{MSE} \right) $$

Where Max is the maximum possible pixel value of the image (255 for an 8-bit image).

Information entropy

For the digital signal of an image, each pixel value occurs with a different frequency, so the image signal can be regarded as an uncertain signal. For image encryption, the higher the uncertainty of the image, the closer the image is to a random signal and the harder it is to crack; the lower the uncertainty, the more regular the image and the easier it is to crack. For a 256-level grayscale image, the maximum information entropy is 8, so the closer the computed value is to 8, the better.

The information entropy is calculated as follows:

$$ H = -\sum_{i=0}^{255} p_i \log_2 p_i $$

where p_i is the probability of gray level i.

Correlation

Correlation is a parameter describing the relationship between two vectors. This paper uses correlation to describe the relationship between the images before and after encryption. Let p(x, y) denote the correlation between pixels before and after encryption; p(x, y) can be calculated by the following formula:

$$ p(x,y) = \frac{\operatorname{cov}(x,y)}{\sqrt{D(x)}\,\sqrt{D(y)}} $$

where cov(x, y) is the covariance of x and y, and D(·) denotes variance.
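The four evaluation measures of this section can be sketched compactly in Python (a minimal sketch; Max is taken as 255 for 8-bit images):

```python
import numpy as np

def psnr(I, K, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((I.astype(np.float64) - K.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def information_entropy(img):
    """Shannon entropy of an 8-bit image; at most 8 bits."""
    p = np.bincount(img.ravel(), minlength=256) / img.size
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def ncc(T, S):
    """Normalized cross-correlation between two equal-size arrays."""
    T = T.astype(np.float64).ravel()
    S = S.astype(np.float64).ravel()
    return float((T * S).sum() / np.sqrt((T ** 2).sum() * (S ** 2).sum()))

def correlation(x, y):
    """Pearson correlation coefficient between two images."""
    return float(np.corrcoef(x.ravel(), y.ravel())[0, 1])
```

For a uniformly random 8-bit image the entropy approaches the maximum of 8, and identical images give NCC and correlation equal to 1.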

3 Experiment

3.1 Image parameters

The images used in this article are all photographs from daily life, taken with a Huawei Mate 10. Each picture is 1440 × 1920 pixels at a resolution of 96 dpi, with a bit depth of 24, shot without flash. There are 300 such pictures used as simulation images; all are ordinary life photos with no special processing.

3.2 System environment

The computer system used in this simulation is Windows 10, and the simulation software is MATLAB R2014b.

3.3 Wavelet transform-related parameters

For unified modeling, this paper uses a three-layer wavelet decomposition with a Daubechies wavelet as the basis. The Daubechies wavelets, constructed by the renowned wavelet analyst Ingrid Daubechies, are generally abbreviated dbN, where N is the order of the wavelet. The support width of the wavelet function Ψ(t) and the scaling function ϕ(t) is 2N − 1, and Ψ(t) has N vanishing moments. The dbN wavelets have good regularity, i.e., the smoothing error they introduce as a sparse basis is hard to detect, which makes signal reconstruction smoother. As the order N increases, so does the number of vanishing moments; the more vanishing moments, the better the smoothness, the stronger the frequency-domain localization, and the better the band division. However, the time-domain support becomes longer, the amount of computation increases greatly, and real-time performance deteriorates. In addition, except for N = 1, the dbN wavelets are not symmetric (i.e., they have nonlinear phase), so some phase distortion is introduced when the signal is analyzed and reconstructed. N = 3 is used in this article.
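Using the PyWavelets library (assumed available here; the paper's own experiments were run in MATLAB), a three-level db3 decomposition of an image yields one low-frequency approximation subimage plus 3·L = 9 high-frequency detail subimages:

```python
import numpy as np
import pywt

rng = np.random.default_rng(4)
img = rng.random((64, 64))

# three-level 2-D decomposition with the Daubechies-3 wavelet
coeffs = pywt.wavedec2(img, wavelet='db3', level=3)

cA = coeffs[0]        # low-frequency approximation subimage
details = coeffs[1:]  # [(cH3, cV3, cD3), (cH2, cV2, cD2), (cH1, cV1, cD1)]
n_detail = sum(len(d) for d in details)  # 3 * L = 9 detail subimages

# the full coefficient set reconstructs the image exactly
rec = pywt.waverec2(coeffs, wavelet='db3')
```

The same coefficient layout (one approximation plus H/V/D details per level) is what the watermarking and compression steps below operate on.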

4 Results and discussion

4.1 Results 1: Image filtering using wavelet transform

In the process of recording, transmission, storage, and processing, the image signal may be polluted, and the transmitted digital signal then carries noise. These noisy data often appear as isolated pixels. Although such isolated points do not destroy the overall frame of the image, they tend to be high-frequency and show up as bright specks that greatly degrade the viewing quality, so to ensure the effect of image processing, the image must be denoised. An effective denoising method is to remove noise of certain frequencies by filtering, but the denoising must remove the noise data without destroying the image. Figure 1 shows the result of filtering an image with the wavelet transform method. To test the wavelet filtering effect, Gaussian white noise was added to the original image (at a level of 20%). Comparing the frequency analysis of the noisy image with that of the original, it can be seen that after the noise is added, the main frequency band of the original image is disturbed by the noise frequencies, but after wavelet filtering, the frequency band of the original image's main frame appears again, while the filtered image shows no significant visual change compared with the original. The normalized mean square error before and after filtering is M = 0.0071, showing that the wavelet transform protects the image details well while removing the noise data.
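A minimal sketch of this denoising step, soft-thresholding the detail coefficients of a db3 decomposition with PyWavelets; the universal-threshold rule used here is an assumption, since the paper does not state its thresholding rule:

```python
import numpy as np
import pywt

def wavelet_denoise(img, wavelet='db3', level=3):
    """Soft-threshold the detail coefficients with the universal
    threshold sigma * sqrt(2 log n), sigma estimated from the finest
    diagonal subband (median absolute deviation estimator)."""
    coeffs = pywt.wavedec2(img, wavelet=wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1][-1])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(img.size))
    kept = [coeffs[0]] + [
        tuple(pywt.threshold(d, thr, mode='soft') for d in details)
        for details in coeffs[1:]
    ]
    rec = pywt.waverec2(kept, wavelet=wavelet)
    return rec[:img.shape[0], :img.shape[1]]

rng = np.random.default_rng(5)
x = np.linspace(0, 3, 64)
clean = np.outer(np.sin(x), np.cos(x))             # smooth test image
noisy = clean + rng.normal(0.0, 0.1, clean.shape)  # Gaussian white noise
restored = wavelet_denoise(noisy)
```

Because the noise spreads evenly over the small detail coefficients while the image concentrates in a few large ones, thresholding suppresses the noise and preserves the image's main frame.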

Figure 1: Image denoising results comparison. (First row, left to right: the original image, the noisy image, and the filtered image. Second row, left to right: the frequency distributions of the original, noisy, and filtered images.)

4.2 Results 2: digital watermark encryption based on wavelet transform

As shown in Fig. 2, the watermark encryption process based on the wavelet transform can be seen from the figure: watermarking the image by wavelet transform does not affect the structure of the original image. The added noise is 40% salt-and-pepper noise. For both the original image and the noisy image, the wavelet transform method extracts the watermark well.

Figure 2: Comparison of digital watermark before and after. (First row, left to right: the original image, the image with noise and watermark, and the image after denoising. Second row: the original watermark, the watermark extracted from the noisy watermarked image, and the watermark extracted after denoising.)

According to the method described in this paper, the correlation coefficient and peak signal-to-noise ratio of the image before and after watermarking are calculated. The correlation coefficient between the original image and the watermarked image is 0.9871 (the first and third columns of the first row in the figure), so the watermark does not destroy the structure of the original image. The signal-to-noise ratio of the original picture is 33.5 dB and that of the watermarked picture is 31.58 dB, which shows that the wavelet transform hides the watermark well. From the second row of watermarking results, the correlation coefficients between the original watermark and the watermarks extracted from the noisy and the denoised images are 0.9745 and 0.9652, respectively. This shows that the watermark signal can be extracted well after being hidden by the wavelet transform.

4.3 Results 3: image encryption based on wavelet transform

In image transmission, the most common way to protect image content is to encrypt the image. Figure 3 shows the process of encrypting and decrypting an image using the wavelet transform. It can be seen from the figure that the encrypted image has no correlation with the original image at all, yet decrypting the encrypted image reproduces the original.

Figure 3: Image encryption and decryption process comparison. (Left: the original image; middle: the encrypted image; right: the decrypted image.)

The information entropy of Fig. 3 is calculated. The results show that the information entropy of the original image is 3.05, that of the decrypted image is 3.07, and that of the encrypted image is 7.88. The image information entropy is thus basically unchanged before encryption and after decryption, while the entropy of the encrypted image rises to 7.88, indicating that the encrypted image is close to a random signal and has good confidentiality.

4.4 Results 4: Image compression

Image data can be compressed because of redundancy in the data. The redundancy of image data mainly manifests as spatial redundancy, caused by correlation between adjacent pixels in an image; temporal redundancy, due to correlation between different frames in an image sequence; and spectral redundancy, due to correlation between different color planes or spectral bands. The purpose of data compression is to reduce the number of bits required to represent the data by removing these redundancies. Since the amount of image data is huge and difficult to store, transfer, and process, compression of image data is very important. Figure 4 shows the result of compressing the original image twice. Although the image is compressed, its main frame does not change, but the sharpness is significantly reduced. Table 1 shows the compressed image properties.

Figure 4: Image comparison before and after compression. (Left: the original image; middle: after the first compression; right: after the second compression.)

It can be seen from the results in Table 1 that with repeated compression the size of the image is significantly reduced. The original image needs 2,764,800 bytes; after one compression this falls to 703,009 bytes, a reduction of 74.6%; after the second compression only 182,161 bytes remain, a further reduction of 74.1%. It can be seen that the wavelet transform achieves image compression well.
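The compression step can be mimicked by zeroing all but the largest wavelet coefficients and reconstructing (a sketch with PyWavelets; the 10% keep-ratio is illustrative, not the paper's setting):

```python
import numpy as np
import pywt

def wavelet_compress(img, keep=0.10, wavelet='db3', level=3):
    """Keep only the largest `keep` fraction of wavelet coefficients
    (by magnitude), zero the rest, and reconstruct the lossy image."""
    coeffs = pywt.wavedec2(img, wavelet=wavelet, level=level)
    arr, slices = pywt.coeffs_to_array(coeffs)
    cutoff = np.quantile(np.abs(arr), 1.0 - keep)  # magnitude cutoff
    arr_c = np.where(np.abs(arr) >= cutoff, arr, 0.0)
    ratio = np.count_nonzero(arr_c) / arr_c.size   # fraction kept
    rec = pywt.waverec2(
        pywt.array_to_coeffs(arr_c, slices, output_format='wavedec2'),
        wavelet=wavelet)
    return rec[:img.shape[0], :img.shape[1]], ratio

# smooth test image: its energy concentrates in few coefficients
x = np.linspace(0, 3, 64)
img = 100.0 * np.outer(np.sin(x), np.cos(x))
rec, ratio = wavelet_compress(img)
```

Because smooth images concentrate their energy in a small number of coefficients, discarding the remaining ~90% changes the main frame very little, which is the behavior reported in Table 1.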

5 Conclusion

With the development of informatization, today's era is full of information. As the visual basis of human perception of the world, images are an important means for humans to obtain, express, and transmit information. Digital image processing, that is, processing images by computer, has a long history of development: it originated in the 1920s, when a photo was transmitted from London to New York via submarine cable using digital compression technology. Digital image processing technology helps people understand the world more objectively and accurately. The human visual system delivers more than three quarters of the information humans receive from the outside world, and images and graphics are the carriers of all this visual information. Although the human eye is very powerful and can recognize thousands of colors, in many cases an image is blurred or even invisible to it, and image enhancement technology can make such blurred or invisible images clear and bright. Relevant research results on this aspect already exist, which proves that such research is feasible [26, 27].

It is precisely because of the importance of image processing technology that many researchers have studied it and achieved fruitful results. However, as research deepens, today's work tends to go deep into individual aspects of image processing technology. Yet the application of image processing technology is a systems engineering problem: in addition to depth, it also has systemic requirements. Unified-model research covering multiple aspects of image applications will therefore undoubtedly promote the application of image processing technology. The wavelet transform has been successfully applied in many fields of image processing. This paper therefore uses the wavelet transform to establish a unified model and carries out simulation research on the filtering, watermark hiding, encryption and decryption, and compression steps of image processing. The results show that the model achieves good results.

Abbreviations

CA: Cellular automata
CGH: Computer-generated hologram
DCT: Discrete cosine transform
EPWIC: Embedded Prediction Wavelet Image Coder
HVS: Human visual system
LSB: Least significant bits
VOD: Video on demand
WT: Wavelet transform

References

1. H.W. Zhang, The research and implementation of image denoising method based on Matlab[J]. Journal of Daqing Normal University 36(3), 1–4 (2016)
2. J.H. Hou, J.W. Tian, J. Liu, Analysis of the errors in locally adaptive wavelet domain Wiener filter and image denoising[J]. Acta Photonica Sinica 36(1), 188–191 (2007)
3. M. Lebrun, An analysis and implementation of the BM3D image denoising method[J]. Image Processing on Line 2(25), 175–213 (2012)
4. A. Fathi, A.R. Naghsh-Nilchi, Efficient image denoising method based on a new adaptive wavelet packet thresholding function[J]. IEEE Trans. Image Process. 21(9), 3981 (2012)
5. X. Zhang, X. Feng, W. Wang, et al., Gradient-based Wiener filter for image denoising[J]. Comput. Electr. Eng. 39(3), 934–944 (2013)
6. T. Chen, K.K. Ma, L.H. Chen, Tri-state median filter for image denoising[J]. IEEE Trans. Image Process. 8(12), 1834 (1999)
7. S.M.M. Rahman, M.K. Hasan, Wavelet-domain iterative center weighted median filter for image denoising[J]. Signal Process. 83(5), 1001–1012 (2003)
8. H.L. Eng, K.K. Ma, Noise adaptive soft-switching median filter for image denoising[C]// IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '00), vol. 4 (2000), pp. 2175–2178
9. S.G. Chang, B. Yu, M. Vetterli, Adaptive wavelet thresholding for image denoising and compression[J]. IEEE Trans. Image Process. 9(9), 1532 (2000)
10. M. Kivanc Mihcak, I. Kozintsev, K. Ramchandran, et al., Low-complexity image denoising based on statistical modeling of wavelet coefficients[J]. IEEE Signal Processing Letters 6(12), 300–303 (1999)
11. J.H. Wu, F.Z. Lin, Image authentication based on digital watermarking[J]. Chinese Journal of Computers 9, 1153–1161 (2004)
12. A. Wakatani, Digital watermarking for ROI medical images by using compressed signature image[C]// Hawaii International Conference on System Sciences (2002), pp. 2043–2048
13. A.H. Paquet, R.K. Ward, I. Pitas, Wavelet packets-based digital watermarking for image verification and authentication[J]. Signal Process. 83(10), 2117–2132 (2003)
14. Z. Wen, L.I. Taoshen, Z. Zhang, An image encryption technology based on chaotic sequences[J]. Comput. Eng. 31(10), 130–132 (2005)
15. Y.Y. Wang, Y.R. Wang, Y. Wang, et al., Optical image encryption based on binary Fourier transform computer-generated hologram and pixel scrambling technology[J]. Optics & Lasers in Engineering 45(7), 761–765 (2007)
16. X.Y. Zhang, C. Wang, S.M. Li, et al., Image encryption technology on two-dimensional cellular automata[J]. Journal of Optoelectronics Laser 19(2), 242–245 (2008)
17. A.S. Lewis, G. Knowles, Image compression using the 2-D wavelet transform[J]. IEEE Trans. Image Process. 1(2), 244–250 (2002)
18. R.A. DeVore, B. Jawerth, B.J. Lucier, Image compression through wavelet transform coding[J]. IEEE Trans. Inf. Theory 38(2), 719–746 (1992)
19. R.W. Buccigrossi, E.P. Simoncelli, Image compression via joint statistical characterization in the wavelet domain[J]. IEEE Trans. Image Process. 8(12), 1688–1701 (1999)
20. A.A. Cruz-Roa, J.E. Arevalo Ovalle, A. Madabhushi, et al., A deep learning architecture for image representation, visual interpretability and automated basal-cell carcinoma cancer detection. Med Image Comput Comput Assist Interv. 16, 403–410 (2013)
21. S.P. Mohanty, D.P. Hughes, M. Salathé, Using deep learning for image-based plant disease detection[J]. Front. Plant Sci. 7, 1419 (2016)
22. B. Sahiner, H. Chan, D. Wei, et al., Image feature selection by a genetic algorithm: application to classification of mass and normal breast tissue[J]. Med. Phys. 23(10), 1671 (1996)
23. B. Bhanu, S. Lee, J. Ming, Adaptive image segmentation using a genetic algorithm[J]. IEEE Transactions on Systems, Man & Cybernetics 25(12), 1543–1567 (2002)
24. Y. Egusa, H. Akahori, A. Morimura, et al., An application of fuzzy set theory for an electronic video camera image stabilizer[J]. IEEE Trans. Fuzzy Syst. 3(3), 351–356 (1995)
25. K. Hasikin, N.A.M. Isa, Enhancement of the low contrast image using fuzzy set theory[C]// UKSim International Conference on Computer Modelling and Simulation (2012), pp. 371–376
26. P. Yang, Q. Li, Wavelet transform-based feature extraction for ultrasonic flaw signal classification. Neural Comput. & Applic. 24(3–4), 817–826 (2014)
27. R.K. Lama, M.-R. Choi, G.-R. Kwon, Image interpolation for high-resolution display based on the complex dual-tree wavelet transform and hidden Markov model. Multimedia Tools Appl. 75(23), 16487–16498 (2016)


Acknowledgements

The authors thank the editor and anonymous reviewers for their helpful comments and valuable suggestions.

This work was supported by:

  • Shandong Social Science Planning Research Project in 2018, "The Application of Shandong Folk Culture in Animation in The View of Digital Media" (No. 18CCYJ14).
  • Shandong Education Science 12th Five-Year Plan 2015, "Innovative Research on Stop-motion Animation in The Digital Media Age" (No. YB15068).
  • Shandong Education Science 13th Five-Year Plan 2016–2017, "Ports and Arts Education Special Fund", "Reform of Teaching Methods of Hand Drawn Presentation Techniques" (No. BCA2017017).
  • National Research Youth Project of the State Ethnic Affairs Commission in 2018, "Protection and Development of Villages with Ethnic Characteristics Under the Background of Rural Revitalization Strategy" (No. 2018-GMC-020).

Availability of data and materials

The authors can provide the data.

About the authors

Zaozhuang University, No. 1 Beian Road., Shizhong District, Zaozhuang City, Shandong, P.R. China.

Lina Zhang was born in Jining, Shandong, P.R. China, in 1983. She received a Master's degree from Bohai University, P.R. China. She now works in the School of Media, Zaozhuang University, P.R. China. Her research interests include animation and digital media art.

Lijuan Zhang was born in Jining, Shandong, P.R. China, in 1983. She received a Master's degree from Jingdezhen Ceramic Institute, P.R. China. She now works in the School of Fine Arts and Design, Zaozhuang University, P.R. China. Her research interests include interior design and digital media art.

Liduo Zhang was born in Zaozhuang, Shandong, P.R. China, in 1982. He received a Master's degree from Monash University, Australia. He now works in the School of Economics and Management, Zaozhuang University. His research interests include Internet finance and digital media.

Author information

Authors and Affiliations

School of Media, Zaozhuang University, Zaozhuang, Shandong, China

Lina Zhang

School of Fine Arts and Design, Zaozhuang University, Zaozhuang, Shandong, China

Lijuan Zhang

School of Economics and Management, Zaozhuang University, Zaozhuang, Shandong, China

Liduo Zhang


Contributions

All authors took part in the discussion of the work described in this paper. L. Zhang wrote the first version of the paper; L. Zhang and L. Zhang performed part of the experiments; and L. Zhang revised the successive versions of the paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Lijuan Zhang .

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Reprints and permissions

About this article

Cite this article

Zhang, L., Zhang, L. & Zhang, L. Application research of digital media image processing technology based on wavelet transform. J Image Video Proc. 2018 , 138 (2018). https://doi.org/10.1186/s13640-018-0383-6

Download citation

Received : 28 September 2018

Accepted : 23 November 2018

Published : 05 December 2018

DOI : https://doi.org/10.1186/s13640-018-0383-6


Keywords

  • Image processing
  • Digital watermark
  • Image denoising
  • Image encryption
  • Image compression

Specialty Grand Challenge Article

Grand Challenges in Image Processing

  • Université Paris-Saclay, CNRS, CentraleSupélec, Laboratoire des signaux et Systèmes, Gif-sur-Yvette, France

Introduction

The field of image processing has been the subject of intensive research and development activities for several decades. This broad area encompasses topics such as image/video processing, image/video analysis, image/video communications, image/video sensing, modeling and representation, computational imaging, electronic imaging, information forensics and security, 3D imaging, medical imaging, and machine learning applied to these respective topics. Hereafter, we will consider both image and video content (i.e., sequences of images), and more generally all forms of visual information.

Rapid technological advances, especially in terms of computing power and network transmission bandwidth, have resulted in many remarkable and successful applications. Nowadays, images are ubiquitous in our daily life. Entertainment is one class of applications that has greatly benefited, including digital TV (e.g., broadcast, cable, and satellite TV), Internet video streaming, digital cinema, and video games. Beyond entertainment, imaging technologies are central in many other applications, including digital photography, video conferencing, video monitoring and surveillance, satellite imaging, but also in more distant domains such as healthcare and medicine, distance learning, digital archiving, cultural heritage or the automotive industry.

In this paper, we highlight a few research grand challenges for future imaging and video systems, in order to achieve breakthroughs to meet the growing expectations of end users. Given the vastness of the field, this list is by no means exhaustive.

A Brief Historical Perspective

We first briefly discuss a few key milestones in the field of image processing. Key inventions in the development of photography and motion pictures can be traced to the 19th century. The earliest surviving photograph of a real-world scene was made by Nicéphore Niépce in 1827 ( Hirsch, 1999 ). The Lumière brothers made the first cinematographic film in 1895, with a public screening the same year ( Lumiere, 1996 ). After decades of remarkable developments, the second half of the 20th century saw the emergence of new technologies launching the digital revolution. While the first prototype digital camera using a Charge-Coupled Device (CCD) was demonstrated in 1975, the first commercial consumer digital cameras started appearing in the early 1990s. These digital cameras quickly surpassed film cameras, and the digital revolution in the field of imaging was underway. As a key consequence, the digital process enabled computational imaging, in other words the use of sophisticated processing algorithms in order to produce high quality images.

In 1992, the Joint Photographic Experts Group (JPEG) released the JPEG standard for still image coding ( Wallace, 1992 ). In parallel, in 1993, the Moving Picture Experts Group (MPEG) published its first standard for coding of moving pictures and associated audio, MPEG-1 ( Le Gall, 1991 ), and a few years later MPEG-2 ( Haskell et al., 1996 ). By guaranteeing interoperability, these standards have been essential in many successful applications and services, for both the consumer and business markets. In particular, it is remarkable that, almost 30 years later, JPEG remains the dominant format for still images and photographs.

In the late 2000s and early 2010s, we could observe a paradigm shift with the appearance of smartphones integrating a camera. Thanks to advances in computational photography, these new smartphones soon became capable of rivaling the quality of consumer digital cameras at the time. Moreover, these smartphones were also capable of acquiring video sequences. Almost concurrently, another key evolution was the development of high bandwidth networks. In particular, the launch of 4G wireless services circa 2010 enabled users to quickly and efficiently exchange multimedia content. From this point, most of us are carrying a camera, anywhere and anytime, allowing us to capture images and videos at will and to exchange them seamlessly with our contacts.

As a direct consequence of the above developments, we are currently observing a boom in the usage of multimedia content. It is estimated that today 3.2 billion images are shared each day on social media platforms, and 300 h of video are uploaded every minute on YouTube 1 . In a 2019 report, Cisco estimated that video content represented 75% of all Internet traffic in 2017, and this share is forecasted to grow to 82% in 2022 ( Cisco, 2019 ). While Internet video streaming and Over-The-Top (OTT) media services account for a significant bulk of this traffic, other applications are also expected to see significant increases, including video surveillance and Virtual Reality (VR)/Augmented Reality (AR).

Hyper-Realistic and Immersive Imaging

A major direction and key driver to research and development activities over the years has been the objective to deliver an ever-improving image quality and user experience.

For instance, in the realm of video, we have observed constantly increasing spatial and temporal resolutions, with the emergence nowadays of Ultra High Definition (UHD). Another aim has been to provide a sense of the depth in the scene. For this purpose, various 3D video representations have been explored, including stereoscopic 3D and multi-view ( Dufaux et al., 2013 ).

In this context, the ultimate goal is to be able to faithfully represent the physical world and to deliver an immersive and perceptually hyperrealistic experience. For this purpose, we discuss hereafter some emerging innovations. These developments are also very relevant in VR and AR applications ( Slater, 2014 ). Finally, while this paper focuses only on the visual information processing aspects, it is obvious that emerging display technologies ( Masia et al., 2013 ) and audio also play key roles in many application scenarios.

Light Fields, Point Clouds, Volumetric Imaging

In order to wholly represent a scene, the light information coming from all the directions has to be represented. For this purpose, the 7D plenoptic function is a key concept ( Adelson and Bergen, 1991 ), although it is unmanageable in practice.

By introducing additional constraints, the light field representation collects radiance from rays in all directions. Therefore, it contains much richer information when compared to traditional 2D imaging, which captures a 2D projection of the light in the scene by integrating over the angular domain. For instance, this allows post-capture processing such as refocusing and changing the viewpoint. However, it also entails several technical challenges, in terms of acquisition and calibration, as well as computational image processing steps including depth estimation, super-resolution, compression and image synthesis ( Ihrke et al., 2016 ; Wu et al., 2017 ). The trade-off between spatial and angular resolutions is a fundamental issue. With a significant fraction of the earlier work focusing on static light fields, it is also expected that dynamic light field videos will stimulate more interest in the future. In particular, dense multi-camera arrays are becoming more tractable. Finally, the development of efficient light field compression and streaming techniques is a key enabler in many applications ( Conti et al., 2020 ).

Another promising direction is to consider a point cloud representation. A point cloud is a set of points in the 3D space represented by their spatial coordinates and additional attributes, including color pixel values, normals, or reflectance. They are often very large, easily ranging in the millions of points, and are typically sparse. One major distinguishing feature of point clouds is that, unlike images, they do not have a regular structure, calling for new algorithms. To remove the noise often present in acquired data, while preserving the intrinsic characteristics, effective 3D point cloud filtering approaches are needed ( Han et al., 2017 ). It is also important to develop efficient techniques for Point Cloud Compression (PCC). For this purpose, MPEG is developing two standards: Geometry-based PCC (G-PCC) and Video-based PCC (V-PCC) ( Graziosi et al., 2020 ). G-PCC considers the point cloud in its native form and compresses it using 3D data structures such as octrees. Conversely, V-PCC projects the point cloud onto 2D planes and then applies existing video coding schemes. More recently, deep learning-based approaches for PCC have been shown to be effective ( Guarda et al., 2020 ). Another challenge is to develop generic and robust solutions able to handle potentially widely varying characteristics of point clouds, e.g. in terms of size and non-uniform density. Efficient solutions for dynamic point clouds are also needed. Finally, while many techniques focus on the geometric information or the attributes independently, it is paramount to process them jointly.
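The irregular structure of point clouds can be made concrete with a minimal voxel-grid downsampling sketch: points are quantized to a regular grid (an octree applies the same quantization hierarchically) and each occupied cell is replaced by the centroid of its points. The function name and voxel size are illustrative assumptions; this is a common preprocessing simplification, not the G-PCC algorithm itself.

```python
import numpy as np

def voxel_downsample(points, voxel_size):
    # quantize each 3D point to an integer voxel index; an octree refines the
    # same grid hierarchically, here we use a single flat level
    idx = np.floor(points / voxel_size).astype(np.int64)
    keys, inverse = np.unique(idx, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    # replace every occupied voxel by the centroid of the points it contains
    counts = np.bincount(inverse).astype(np.float64)
    out = np.empty((len(keys), points.shape[1]))
    for dim in range(points.shape[1]):
        out[:, dim] = np.bincount(inverse, weights=points[:, dim]) / counts
    return out
```

Because the output has one point per occupied voxel, the density of the result is bounded regardless of how non-uniform the input sampling is.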

High Dynamic Range and Wide Color Gamut

The human visual system is able to perceive, using various adaptation mechanisms, a broad range of luminous intensities, from very bright to very dark, as experienced every day in the real world. Nonetheless, current imaging technologies are still limited in terms of capturing or rendering such a wide range of conditions. High Dynamic Range (HDR) imaging aims at addressing this issue. Wide Color Gamut (WCG) is also often associated with HDR in order to provide a wider colorimetry.

HDR has reached some levels of maturity in the context of photography. However, extending HDR to video sequences raises scientific challenges in order to provide high quality and cost-effective solutions, impacting the whole imaging processing pipeline, including content acquisition, tone reproduction, color management, coding, and display ( Dufaux et al., 2016 ; Chalmers and Debattista, 2017 ). Backward compatibility with legacy content and traditional systems is another issue. Despite recent progress, the potential of HDR has not been fully exploited yet.
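The tone reproduction step mentioned above can be illustrated with the classic global operator from Reinhard et al.'s photographic tone reproduction: scene luminance is scaled relative to its log-average and then compressed with L/(1+L). The key value 0.18 (conventional middle grey) and the epsilon guard are standard assumptions; real HDR pipelines use far more elaborate, often local, operators.

```python
import numpy as np

def reinhard_tonemap(hdr_lum, key=0.18, eps=1e-6):
    # scale scene luminance by the key value relative to the log-average
    # luminance, then compress highlights with the global operator L / (1 + L)
    log_avg = np.exp(np.mean(np.log(hdr_lum + eps)))
    L = key * hdr_lum / log_avg
    return L / (1.0 + L)
```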

Coding and Transmission

Three decades of standardization activities have continuously improved the hybrid video coding scheme based on the principles of transform coding and predictive coding. The Versatile Video Coding (VVC) standard has been finalized in 2020 ( Bross et al., 2021 ), achieving approximately 50% bit rate reduction for the same subjective quality when compared to its predecessor, High Efficiency Video Coding (HEVC). While substantially outperforming VVC in the short term may be difficult, one encouraging direction is to rely on improved perceptual models to further optimize compression in terms of visual quality. Another direction, which has already shown promising results, is to apply deep learning-based approaches ( Ding et al., 2021 ). Here, one key issue is the ability to generalize these deep models to a wide diversity of video content. The second key issue is the implementation complexity, both in terms of computation and memory requirements, which is a significant obstacle to a widespread deployment. Besides, the emergence of new video formats targeting immersive communications is also calling for new coding schemes ( Wien et al., 2019 ).
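The transform-coding principle at the heart of the hybrid scheme can be sketched in a few lines: an orthonormal block DCT decorrelates the pixels, and uniform scalar quantization of the coefficients introduces the (controlled) loss. The 8x8 block size and the quantization step q are illustrative assumptions; real codecs add prediction, entropy coding, and much more.

```python
import numpy as np

def dct_matrix(n=8):
    # orthonormal DCT-II basis used by block transforms in JPEG/HEVC-style codecs
    k = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] /= np.sqrt(2.0)
    return C

def transform_quantize(block, q=16.0):
    C = dct_matrix(block.shape[0])
    coeffs = C @ block @ C.T        # decorrelating transform stage
    return np.round(coeffs / q)     # uniform scalar quantization (the lossy stage)

def dequantize_inverse(levels, q=16.0):
    # decoder-side reconstruction: rescale the levels and invert the transform
    C = dct_matrix(levels.shape[0])
    return C.T @ (levels * q) @ C
```

Because the transform is orthonormal, the reconstruction error is exactly the quantization error carried back to the pixel domain, which is what makes the rate/distortion trade-off controllable via q.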

Considering that in many application scenarios, videos are processed by intelligent analytic algorithms rather than viewed by users, another interesting track is the development of video coding for machines ( Duan et al., 2020 ). In this context, the compression is optimized taking into account the performance of video analysis tasks.

The push toward hyper-realistic and immersive visual communications entails most often an increasing raw data rate. Despite improved compression schemes, more transmission bandwidth is needed. Moreover, some emerging applications, such as VR/AR, autonomous driving, and Industry 4.0, bring a strong requirement for low latency transmission, with implications on both the imaging processing pipeline and the transmission channel. In this context, the emergence of 5G wireless networks will positively contribute to the deployment of new multimedia applications, and the development of future wireless communication technologies points toward promising advances ( Da Costa and Yang, 2020 ).

Human Perception and Visual Quality Assessment

It is important to develop effective models of human perception. On the one hand, it can contribute to the development of perceptually inspired algorithms. On the other hand, perceptual quality assessment methods are needed in order to optimize and validate new imaging solutions.

The notion of Quality of Experience (QoE) relates to the degree of delight or annoyance of the user of an application or service ( Le Callet et al., 2012 ). QoE is strongly linked to subjective and objective quality assessment methods. Many years of research have resulted in the successful development of perceptual visual quality metrics based on models of human perception ( Lin and Kuo, 2011 ; Bovik, 2013 ). More recently, deep learning-based approaches have also been successfully applied to this problem ( Bosse et al., 2017 ). While these perceptual quality metrics have achieved good performances, several significant challenges remain. First, when applied to video sequences, most current perceptual metrics are applied on individual images, neglecting temporal modeling. Second, whereas color is a key attribute, there are currently no widely accepted perceptual quality metrics explicitly considering color. Finally, new modalities, such as 360° videos, light fields, point clouds, and HDR, require new approaches.
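The limitation these perceptual metrics address is easy to see against the classic signal-fidelity baseline, PSNR, which compares images pixel-by-pixel with no perceptual model at all. A minimal definition (the peak value of 255 assumes 8-bit images):

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    # Peak Signal-to-Noise Ratio: a pure fidelity measure that correlates only
    # loosely with perceived quality, which is what motivates perceptual metrics
    mse = np.mean((np.asarray(ref, dtype=np.float64) - np.asarray(test, dtype=np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
```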

Another closely related topic is image esthetic assessment ( Deng et al., 2017 ). The esthetic quality of an image is affected by numerous factors, such as lighting, color, contrast, and composition. It is useful in different application scenarios such as image retrieval and ranking, recommendation, and photo enhancement. While earlier attempts have used handcrafted features, most recent techniques to predict esthetic quality are data driven and based on deep learning approaches, leveraging the availability of large annotated datasets for training ( Murray et al., 2012 ). One key challenge is the inherently subjective nature of esthetics assessment, resulting in ambiguity in the ground-truth labels. Another important issue is to explain the behavior of deep esthetic prediction models.

Analysis, Interpretation and Understanding

Another major research direction has been the objective to efficiently analyze, interpret and understand visual data. This goal is challenging, due to the high diversity and complexity of visual data. This has led to many research activities, involving both low-level and high-level analysis, addressing topics such as image classification and segmentation, optical flow, image indexing and retrieval, object detection and tracking, and scene interpretation and understanding. Hereafter, we discuss some trends and challenges.

Keypoints Detection and Local Descriptors

Local image matching has been the cornerstone of many analysis tasks. It involves the detection of keypoints, i.e. salient visual points that can be robustly and repeatedly detected, and descriptors, i.e. a compact signature locally describing the visual features at each keypoint. Pairwise matching between the features can then be computed to reveal local correspondences. In this context, several frameworks have been proposed, including Scale Invariant Feature Transform (SIFT) ( Lowe, 2004 ) and Speeded Up Robust Features (SURF) ( Bay et al., 2008 ), and later binary variants including Binary Robust Independent Elementary Feature (BRIEF) ( Calonder et al., 2010 ), Oriented FAST and Rotated BRIEF (ORB) ( Rublee et al., 2011 ) and Binary Robust Invariant Scalable Keypoints (BRISK) ( Leutenegger et al., 2011 ). Although these approaches exhibit scale and rotation invariance, they are less suited to deal with large 3D distortions such as perspective deformations, out-of-plane rotations, and significant viewpoint changes. Besides, they tend to fail under significantly varying and challenging illumination conditions.
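The matching step for the binary variants (BRIEF/ORB/BRISK) reduces to nearest-neighbour search under Hamming distance on bit strings. A brute-force sketch is below; the 256-bit descriptor length (32 packed bytes) and the distance threshold are illustrative assumptions.

```python
import numpy as np

def hamming_match(desc_a, desc_b, max_dist=64):
    # desc_a: (Na, 32) uint8, desc_b: (Nb, 32) uint8 -- 256-bit binary
    # descriptors in the BRIEF/ORB style, packed 8 bits per byte
    bits_a = np.unpackbits(desc_a, axis=1)[:, None, :]
    bits_b = np.unpackbits(desc_b, axis=1)[None, :, :]
    dist = np.sum(bits_a != bits_b, axis=2)   # pairwise Hamming distances
    nn = np.argmin(dist, axis=1)              # nearest neighbour in desc_b
    return [(i, int(j)) for i, j in enumerate(nn) if dist[i, j] <= max_dist]
```

In practice libraries replace the all-pairs table with popcount instructions or locality-sensitive hashing, but the geometry-verification stages that follow consume exactly this kind of (index, index) correspondence list.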

These traditional approaches based on handcrafted features have been successfully applied to problems such as image and video retrieval, object detection, visual Simultaneous Localization And Mapping (SLAM), and visual odometry. Besides, the emergence of new imaging modalities as introduced above can also be beneficial for image analysis tasks, including light fields ( Galdi et al., 2019 ), point clouds ( Guo et al., 2020 ), and HDR ( Rana et al., 2018 ). However, when applied to high-dimensional visual data for semantic analysis and understanding, these approaches based on handcrafted features have been supplanted in recent years by approaches based on deep learning.

Deep Learning-Based Methods

Data-driven deep learning-based approaches ( LeCun et al., 2015 ), and in particular the Convolutional Neural Network (CNN) architecture, represent nowadays the state-of-the-art in terms of performances for complex pattern recognition tasks in scene analysis and understanding. By combining multiple processing layers, deep models are able to learn data representations with different levels of abstraction.
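The layered representations described above are built from one core operation. A minimal single-channel "valid" convolution is sketched below (strictly speaking, cross-correlation, which is what deep learning frameworks implement under the name convolution); the function name is illustrative.

```python
import numpy as np

def conv2d_valid(x, k):
    # single-channel 'valid' cross-correlation, the core op of a CNN layer:
    # slide the kernel over the input and take an inner product at each offset
    kh, kw = k.shape
    H, W = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out
```

A CNN layer applies many such kernels (with learned weights) across many channels, followed by a nonlinearity; stacking layers yields the increasingly abstract representations mentioned above.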

Supervised learning is the most common form of deep learning. It requires a large and fully labeled training dataset, a typically time-consuming and expensive process needed whenever tackling a new application scenario. Moreover, in some specialized domains, e.g. medical data, it can be very difficult to obtain annotations. To alleviate this major burden, methods such as transfer learning and weakly supervised learning have been proposed.

In another direction, deep models have been shown to be vulnerable to adversarial attacks ( Akhtar and Mian, 2018 ). Those attacks consist in introducing subtle perturbations to the input, such that the model predicts an incorrect output. For instance, in the case of images, imperceptible pixel differences are able to fool deep learning models. Such adversarial attacks are definitively an important obstacle to the successful deployment of deep learning, especially in applications where safety and security are critical. While some early solutions have been proposed, a significant challenge is to develop effective defense mechanisms against those attacks.
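The perturbation mechanism can be illustrated without a deep network: the Fast Gradient Sign Method (FGSM) takes one eps-sized step along the sign of the input gradient of the loss. The sketch below applies it to a plain logistic model, where the gradient has a closed form; the model, names, and eps value are illustrative assumptions, not an attack on a real network.

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps=0.05):
    # FGSM on a logistic model p = sigmoid(w.x + b): perturb the input along
    # the sign of the cross-entropy gradient with respect to x
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    grad_x = (p - y) * w   # d(loss)/dx, derived analytically for this model
    return x + eps * np.sign(grad_x)
```

Each coordinate moves by at most eps, so the perturbation stays small in the max norm while still pushing the prediction toward the wrong class; on images, the same budget is typically imperceptible.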

Finally, another challenge is to enable low complexity and efficient implementations. This is especially important for mobile or embedded applications. For this purpose, further interactions between signal processing and machine learning can potentially bring additional benefits. For instance, one direction is to compress deep neural networks in order to enable their more efficient handling. Moreover, by combining traditional processing techniques with deep learning models, it is possible to develop low complexity solutions while preserving high performance.

Explainability in Deep Learning

While data-driven deep learning models often achieve impressive performances on many visual analysis tasks, their black-box nature often makes it inherently very difficult to understand how they reach a predicted output and how it relates to particular characteristics of the input data. However, this is a major impediment in many decision-critical application scenarios. Moreover, it is important not only to have confidence in the proposed solution, but also to gain further insights from it. Based on these considerations, some deep learning systems aim at promoting explainability ( Adadi and Berrada, 2018 ; Xie et al., 2020 ). This can be achieved by exhibiting traits related to confidence, trust, safety, and ethics.

However, explainable deep learning is still in its early phase. More developments are needed, in particular to develop a systematic theory of model explanation. Important aspects include the need to understand and quantify risk, to comprehend how the model makes predictions for transparency and trustworthiness, and to quantify the uncertainty in the model prediction. This challenge is key in order to deploy and use deep learning-based solutions in an accountable way, for instance in application domains such as healthcare or autonomous driving.

Self-Supervised Learning

Self-supervised learning refers to methods that learn general visual features from large-scale unlabeled data, without the need for manual annotations. Self-supervised learning is therefore very appealing, as it allows exploiting the vast amount of unlabeled images and videos available. Moreover, it is widely believed that it is closer to how humans actually learn. One common approach is to use the data to provide the supervision, leveraging its structure. More generally, a pretext task can be defined, e.g. image inpainting, colorizing grayscale images, predicting future frames in videos, by withholding some parts of the data and by training the neural network to predict it ( Jing and Tian, 2020 ). By learning an objective function corresponding to the pretext task, the network is forced to learn relevant visual features in order to solve the problem. Self-supervised learning has also been successfully applied to autonomous vehicles perception. More specifically, the complementarity between analytical and learning methods can be exploited to address various autonomous driving perception tasks, without the prerequisite of an annotated data set ( Chiaroni et al., 2021 ).
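The pretext-task idea is simple to demonstrate: withhold a transformation of the data and ask the network to predict it. A well-known instance is rotation prediction, sketched below as a batch generator; the function name and batch layout are illustrative assumptions.

```python
import numpy as np

def rotation_pretext_batch(images, rng=None):
    # rotation-prediction pretext task: rotate each image by a random multiple
    # of 90 degrees and use the rotation index as a free supervision label
    rng = rng or np.random.default_rng()
    labels = rng.integers(0, 4, size=len(images))
    rotated = np.stack([np.rot90(img, int(k)) for img, k in zip(images, labels)])
    return rotated, labels
```

A classifier trained to recover the rotation index must learn object orientation and shape cues, i.e. generally useful visual features, without a single manual annotation.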

While good performances have already been obtained using self-supervised learning, further work is still needed. A few promising directions are outlined hereafter. Combining self-supervised learning with other learning methods is a first interesting path. For instance, semi-supervised learning (Van Engelen and Hoos, 2020) and few-shot learning (Fei-Fei et al., 2006) methods have been proposed for scenarios where limited labeled data is available. The performance of these methods can potentially be boosted by incorporating self-supervised pre-training. The pretext task can also serve to add regularization. Another interesting trend in self-supervised learning is to train neural networks with synthetic data. The challenge here is to bridge the domain gap between synthetic and real data. Finally, another compelling direction is to exploit data from different modalities. A simple example is to consider both the video and audio signals in a video sequence. As another example, in the context of autonomous driving, vehicles are typically equipped with multiple sensors, including cameras, LIght Detection And Ranging (LIDAR), Global Positioning System (GPS), and Inertial Measurement Units (IMU). In such cases, it is easy to acquire large unlabeled multimodal datasets, where the different modalities can be effectively exploited in self-supervised learning methods.

Reproducible Research and Large Public Datasets

The reproducible research initiative is another way to further ensure high-quality research for the benefit of our community (Vandewalle et al., 2009). Reproducibility, referring to the ability of someone else, working independently, to accurately reproduce the results of an experiment, is a key principle of the scientific method. In the context of image and video processing, it is usually not sufficient to provide a detailed description of the proposed algorithm. Most often, it is essential to also provide access to the code and data. This is even more imperative in the case of deep learning-based models.

In parallel, the availability of large public datasets is also highly desirable in order to support research activities. This is especially critical for new emerging modalities or specific application scenarios, where it is difficult to get access to relevant data. Moreover, with the emergence of deep learning, large datasets, along with labels, are often needed for training, which can be another burden.

Conclusion and Perspectives

The field of image processing is very broad and rich, with many successful applications in both the consumer and business markets. However, many technical challenges remain in order to further push the limits in imaging technologies. Two main trends are, on the one hand, to keep improving the quality and realism of image and video content, and on the other hand, to effectively interpret and understand this vast and complex amount of visual data. This list is certainly not exhaustive, and there are many other interesting problems, e.g. related to computational imaging, information security and forensics, or medical imaging. Key innovations will be found at the crossroads of image processing, optics, psychophysics, communication, computer vision, artificial intelligence, and computer graphics. Multi-disciplinary collaborations involving actors from both academia and industry are therefore critical moving forward in order to drive these breakthroughs.

The “Image Processing” section of Frontiers in Signal Processing aims at giving the research community a forum to exchange, discuss and improve new ideas, with the goal of contributing to the further advancement of the field of image processing and bringing exciting innovations in the foreseeable future.

Author Contributions

The author confirms being the sole contributor of this work and has approved it for publication.

Conflict of Interest

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

1 https://www.brandwatch.com/blog/amazing-social-media-statistics-and-facts/ (accessed on Feb. 23, 2021).

Adadi, A., and Berrada, M. (2018). Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE access 6, 52138–52160. doi:10.1109/access.2018.2870052


Adelson, E. H., and Bergen, J. R. (1991). “The plenoptic function and the elements of early vision” Computational models of visual processing . Cambridge, MA: MIT Press , 3-20.


Akhtar, N., and Mian, A. (2018). Threat of adversarial attacks on deep learning in computer vision: a survey. IEEE Access 6, 14410–14430. doi:10.1109/access.2018.2807385

Bay, H., Ess, A., Tuytelaars, T., and Van Gool, L. (2008). Speeded-up robust features (SURF). Computer Vis. image understanding 110 (3), 346–359. doi:10.1016/j.cviu.2007.09.014

Bosse, S., Maniry, D., Müller, K. R., Wiegand, T., and Samek, W. (2017). Deep neural networks for no-reference and full-reference image quality assessment. IEEE Trans. Image Process. 27 (1), 206–219. doi:10.1109/TIP.2017.2760518


Bovik, A. C. (2013). Automatic prediction of perceptual image and video quality. Proc. IEEE 101 (9), 2008–2024. doi:10.1109/JPROC.2013.2257632

Bross, B., Chen, J., Ohm, J. R., Sullivan, G. J., and Wang, Y. K. (2021). Developments in international video coding standardization after AVC, with an overview of Versatile Video Coding (VVC). Proc. IEEE . doi:10.1109/JPROC.2020.3043399

Calonder, M., Lepetit, V., Strecha, C., and Fua, P. (2010). Brief: binary robust independent elementary features. In K. Daniilidis, P. Maragos, and N. Paragios (eds) European conference on computer vision . Berlin, Heidelberg: Springer , 778–792. doi:10.1007/978-3-642-15561-1_56

Chalmers, A., and Debattista, K. (2017). HDR video past, present and future: a perspective. Signal. Processing: Image Commun. 54, 49–55. doi:10.1016/j.image.2017.02.003

Chiaroni, F., Rahal, M.-C., Hueber, N., and Dufaux, F. (2021). Self-supervised learning for autonomous vehicles perception: a conciliation between analytical and learning methods. IEEE Signal. Process. Mag. 38 (1), 31–41. doi:10.1109/msp.2020.2977269

Cisco (2019). Cisco visual networking index: forecast and trends, 2017–2022 (white paper). Indianapolis, IN: Cisco Press.

Conti, C., Soares, L. D., and Nunes, P. (2020). Dense light field coding: a survey. IEEE Access 8, 49244–49284. doi:10.1109/ACCESS.2020.2977767

Da Costa, D. B., and Yang, H.-C. (2020). Grand challenges in wireless communications. Front. Commun. Networks 1 (1), 1–5. doi:10.3389/frcmn.2020.00001

Deng, Y., Loy, C. C., and Tang, X. (2017). Image aesthetic assessment: an experimental survey. IEEE Signal. Process. Mag. 34 (4), 80–106. doi:10.1109/msp.2017.2696576

Ding, D., Ma, Z., Chen, D., Chen, Q., Liu, Z., and Zhu, F. (2021). Advances in video compression system using deep neural network: a review and case studies . Ithaca, NY: Cornell university .

Duan, L., Liu, J., Yang, W., Huang, T., and Gao, W. (2020). Video coding for machines: a paradigm of collaborative compression and intelligent analytics. IEEE Trans. Image Process. 29, 8680–8695. doi:10.1109/tip.2020.3016485

Dufaux, F., Le Callet, P., Mantiuk, R., and Mrak, M. (2016). High dynamic range video - from acquisition, to display and applications . Cambridge, Massachusetts: Academic Press .

Dufaux, F., Pesquet-Popescu, B., and Cagnazzo, M. (2013). Emerging technologies for 3D video: creation, coding, transmission and rendering . Hoboken, NJ: Wiley .

Fei-Fei, L., Fergus, R., and Perona, P. (2006). One-shot learning of object categories. IEEE Trans. Pattern Anal. Mach Intell. 28 (4), 594–611. doi:10.1109/TPAMI.2006.79

Galdi, C., Chiesa, V., Busch, C., Lobato Correia, P., Dugelay, J.-L., and Guillemot, C. (2019). Light fields for face analysis. Sensors 19 (12), 2687. doi:10.3390/s19122687

Graziosi, D., Nakagami, O., Kuma, S., Zaghetto, A., Suzuki, T., and Tabatabai, A. (2020). An overview of ongoing point cloud compression standardization activities: video-based (V-PCC) and geometry-based (G-PCC). APSIPA Trans. Signal Inf. Process. 9, 2020. doi:10.1017/ATSIP.2020.12

Guarda, A., Rodrigues, N., and Pereira, F. (2020). Adaptive deep learning-based point cloud geometry coding. IEEE J. Selected Top. Signal Process. 15, 415-430. doi:10.1109/mmsp48831.2020.9287060

Guo, Y., Wang, H., Hu, Q., Liu, H., Liu, L., and Bennamoun, M. (2020). Deep learning for 3D point clouds: a survey. IEEE transactions on pattern analysis and machine intelligence . doi:10.1109/TPAMI.2020.3005434

Han, X.-F., Jin, J. S., Wang, M.-J., Jiang, W., Gao, L., and Xiao, L. (2017). A review of algorithms for filtering the 3D point cloud. Signal. Processing: Image Commun. 57, 103–112. doi:10.1016/j.image.2017.05.009

Haskell, B. G., Puri, A., and Netravali, A. N. (1996). Digital video: an introduction to MPEG-2 . Berlin, Germany: Springer Science and Business Media .

Hirsch, R. (1999). Seizing the light: a history of photography . New York, NY: McGraw-Hill .

Ihrke, I., Restrepo, J., and Mignard-Debise, L. (2016). Principles of light field imaging: briefly revisiting 25 years of research. IEEE Signal. Process. Mag. 33 (5), 59–69. doi:10.1109/MSP.2016.2582220

Jing, L., and Tian, Y. (2020). Self-supervised visual feature learning with deep neural networks: a survey. IEEE Trans. Pattern Anal. Mach. Intell.

Le Callet, P., Möller, S., and Perkis, A. (2012). Qualinet white paper on definitions of quality of experience. European network on quality of experience in multimedia systems and services (COST Action IC 1003), 3(2012) .

Le Gall, D. (1991). Mpeg: A Video Compression Standard for Multimedia Applications. Commun. ACM 34, 46–58. doi:10.1145/103085.103090

LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. nature 521 (7553), 436–444. doi:10.1038/nature14539

Leutenegger, S., Chli, M., and Siegwart, R. Y. (2011). “BRISK: binary robust invariant scalable keypoints,” IEEE International conference on computer vision , Barcelona, Spain , 6-13 Nov, 2011 ( IEEE ), 2548–2555.

Lin, W., and Jay Kuo, C.-C. (2011). Perceptual visual quality metrics: a survey. J. Vis. Commun. image representation 22 (4), 297–312. doi:10.1016/j.jvcir.2011.01.005

Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60 (2), 91–110. doi:10.1023/b:visi.0000029664.99615.94

Lumière, L. (1996). The Lumière cinematograph (1936). J. SMPTE 105 (10), 608–611. doi:10.5594/j17187

Masia, B., Wetzstein, G., Didyk, P., and Gutierrez, D. (2013). A survey on computational displays: pushing the boundaries of optics, computation, and perception. Comput. & Graphics 37 (8), 1012–1038. doi:10.1016/j.cag.2013.10.003

Murray, N., Marchesotti, L., and Perronnin, F. (2012). “AVA: a large-scale database for aesthetic visual analysis,” IEEE conference on computer vision and pattern recognition , Providence, RI , June, 2012 . ( IEEE ), 2408–2415. doi:10.1109/CVPR.2012.6247954

Rana, A., Valenzise, G., and Dufaux, F. (2018). Learning-based tone mapping operator for efficient image matching. IEEE Trans. Multimedia 21 (1), 256–268. doi:10.1109/TMM.2018.2839885

Rublee, E., Rabaud, V., Konolige, K., and Bradski, G. (2011). “ORB: an efficient alternative to SIFT or SURF,” IEEE International conference on computer vision , Barcelona, Spain , November, 2011 ( IEEE ), 2564–2571. doi:10.1109/ICCV.2011.6126544

Slater, M. (2014). Grand challenges in virtual environments. Front. Robotics AI 1, 3. doi:10.3389/frobt.2014.00003

Van Engelen, J. E., and Hoos, H. H. (2020). A survey on semi-supervised learning. Mach Learn. 109 (2), 373–440. doi:10.1007/s10994-019-05855-6

Vandewalle, P., Kovacevic, J., and Vetterli, M. (2009). Reproducible research in signal processing. IEEE Signal. Process. Mag. 26 (3), 37–47. doi:10.1109/msp.2009.932122

Wallace, G. K. (1992). The JPEG still picture compression standard. IEEE Trans. Consumer Electron. 38 (1), xviii–xxxiv. doi:10.1109/30.125072

Wien, M., Boyce, J. M., Stockhammer, T., and Peng, W.-H. (2019). Standardization status of immersive video coding. IEEE J. Emerg. Sel. Top. Circuits Syst. 9 (1), 5–17. doi:10.1109/JETCAS.2019.2898948

Wu, G., Masia, B., Jarabo, A., Zhang, Y., Wang, L., Dai, Q., et al. (2017). Light field image processing: an overview. IEEE J. Sel. Top. Signal. Process. 11 (7), 926–954. doi:10.1109/JSTSP.2017.2747126

Xie, N., Ras, G., van Gerven, M., and Doran, D. (2020). Explainable deep learning: a field guide for the uninitiated. Ithaca, NY: Cornell University.

Keywords: image processing, immersive, image analysis, image understanding, deep learning, video processing

Citation: Dufaux F (2021) Grand Challenges in Image Processing. Front. Sig. Proc. 1:675547. doi: 10.3389/frsip.2021.675547

Received: 03 March 2021; Accepted: 10 March 2021; Published: 12 April 2021.


Copyright © 2021 Dufaux. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Frédéric Dufaux, [email protected]


  • Open access
  • Published: 07 December 2013

Digital image processing techniques for detecting, quantifying and classifying plant diseases

  • Jayme Garcia Arnal Barbedo 1  

SpringerPlus volume  2 , Article number:  660 ( 2013 ) Cite this article

66k Accesses

212 Citations

12 Altmetric

Metrics details

This paper presents a survey of methods that use digital image processing techniques to detect, quantify and classify plant diseases from digital images in the visible spectrum. Although disease symptoms can manifest in any part of the plant, only methods that explore visible symptoms in leaves and stems were considered. This was done for two main reasons: to limit the length of the paper, and because methods dealing with roots, seeds and fruits have some peculiarities that would warrant a specific survey. The selected proposals are divided into three classes according to their objective: detection, severity quantification, and classification. Each of those classes, in turn, is subdivided according to the main technical solution used in the algorithm. This paper is expected to be useful to researchers working on both vegetable pathology and pattern recognition, providing a comprehensive and accessible overview of this important field of research.

Introduction

Agriculture has become much more than simply a means to feed ever growing populations. Plants have become an important source of energy, and are a fundamental piece in the puzzle to solve the problem of global warming. There are several diseases that affect plants with the potential to cause devastating economical, social and ecological losses. In this context, diagnosing diseases in an accurate and timely way is of the utmost importance.

There are several ways to detect plant pathologies. Some diseases have no visible symptoms, or the symptoms appear only when it is too late to act. In those cases, some kind of sophisticated analysis, usually by means of powerful microscopes, is normally necessary. In other cases, the signs can only be detected in parts of the electromagnetic spectrum that are not visible to humans. A common approach in this case is the use of remote sensing techniques that explore multi- and hyperspectral image captures. The methods that adopt this approach often employ digital image processing tools to achieve their goals. However, due to their many peculiarities and to the extent of the literature on the subject, they will not be treated in this paper. A large amount of information on the subject can be found in the papers by Bock et al. (2010), Mahlein et al. (2012) and Sankaran et al. (2010).

Most diseases, however, generate some kind of manifestation in the visible spectrum. In the vast majority of cases, the diagnosis, or at least a first guess about the disease, is performed visually by humans. Trained raters may be efficient in recognizing and quantifying diseases; however, this approach has some associated disadvantages that may undermine the effort in many cases. Bock et al. (2010) list some of those disadvantages:

Raters may tire and lose concentration, thus decreasing their accuracy.

There can be substantial inter- and intra-rater variability (subjectivity).

There is a need to develop standard area diagrams to aid assessment.

Training may need to be repeated to maintain quality.

Raters are expensive.

Visual rating can be destructive if samples are collected in the field for assessment later in the laboratory.

Raters are prone to various illusions (for example, lesion number/size and area infected).

Besides those disadvantages, it is important to consider that some crops may extend for extremely large areas, making monitoring a challenging task.

Depending on the application, many of those problems may be solved, or at least reduced, by the use of digital images combined with some kind of image processing and, in some cases, pattern recognition and automatic classification tools. Many systems have been proposed in the last three decades, and this paper tries to organize and present those in a meaningful and useful way, as will be seen in the next section. Some critical remarks about the directions taken by the research on this subject are presented in the concluding section.

Literature review

Vegetable pathologies may manifest in different parts of the plant. There are methods exploring visual cues present in almost all of those parts, like roots (Smith and Dickson 1991), kernels (Ahmad et al. 1999), fruits (Aleixos et al. 2002; Corkidi et al. 2005; López-García et al. 2010), stems and leaves. As commented before, this work concentrates on the latter two, particularly leaves.

This section is divided into three subsections according to the main purpose of the proposed methods. The subsections, in turn, are divided according to the main technical solution employed in the algorithm. A summarizing table containing information about the cultures considered and technical solutions adopted by each work is presented in the concluding section.

Some characteristics are shared by most methods presented in this section: the images are captured using consumer-level cameras in a controlled laboratory environment, and the format used for the images is RGB quantized with 8 bits. Therefore, unless stated otherwise, those are the conditions under which the described methods operate. Also, virtually all methods cited in this paper apply some kind of preprocessing to clean up the images, thus this information will be omitted from now on, unless some peculiarity warrants more detailing.

Because the information gathered by applying image processing techniques often allows not only detecting the disease, but also estimating its severity, there are not many methods focused only on the detection problem. There are two main situations in which simple detection applies:

Partial classification: when a disease has to be identified amidst several possible pathologies, it may be convenient to perform a partial classification, in which candidate regions are classified as being the result of the disease of interest or not, instead of applying a complete classification into any of the possible diseases. This is the case of the method by Abdullah et al. ( 2007 ), which is described in Section ‘Neural networks’.

Real-time monitoring: in this case, the system continuously monitors the crops and issues an alarm as soon as the disease of interest is detected in any of the plants. The papers by Sena Jr et al. (2003) and Story et al. (2010) fit into this context. Both proposals are also described in the following.

Neural networks

The method proposed by Abdullah et al. (2007) tries to discriminate a given disease (corynespora) from other pathologies that affect rubber tree leaves. The algorithm does not employ any kind of segmentation. Instead, Principal Component Analysis is applied directly to the RGB values of the pixels of a low-resolution (15×15 pixels) image of the leaves. The first two principal components are then fed to a Multilayer Perceptron (MLP) Neural Network with one hidden layer, whose output reveals whether the sample is infected by the disease of interest or not.
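The PCA step can be sketched with plain NumPy. Here each training sample is a flattened 15×15 RGB image (675 values), and the two-component projection is what would feed the MLP; the classifier itself, and real leaf data, are omitted, so the synthetic arrays below are illustrative assumptions:

```python
import numpy as np

def pca_features(samples, n_components=2):
    """Project each flattened 15x15 RGB leaf image (675 values per
    row) onto its first principal components, as done before the
    MLP stage."""
    centered = samples - samples.mean(axis=0)
    # right singular vectors of the centered data are the principal axes
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(1)
leaves = rng.random((20, 15 * 15 * 3))   # synthetic stand-ins for leaf images
z = pca_features(leaves)
print(z.shape)                           # (20, 2)
```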

Thresholding

The method proposed by Sena Jr et al. (2003) aims to discriminate between maize plants affected by fall armyworm and healthy ones using digital images. They divided their algorithm into two main stages: image processing and image analysis. In the image processing stage, the image is transformed to gray scale, thresholded, and filtered to remove spurious artifacts. In the image analysis stage, the whole image is divided into 12 blocks. Blocks whose leaf area is less than 5% of the total area are discarded. For each remaining block, the number of connected objects, representing the diseased regions, is counted. The plant is considered diseased if this number is above a threshold, which, after empirical evaluation, was set to ten.
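The image-analysis stage can be sketched as below. The 3×4 arrangement of the 12 blocks is an assumption (the paper only states the number of blocks), and `scipy.ndimage.label` stands in for the connected-object counting:

```python
import numpy as np
from scipy import ndimage

def is_diseased(disease_mask, leaf_mask, rows=3, cols=4,
                min_leaf_frac=0.05, max_objects=10):
    """Split the thresholded image into blocks, discard blocks whose
    leaf area is below 5% of the block area, count connected diseased
    regions in the remaining blocks, and flag the plant when the
    count exceeds the threshold."""
    h, w = disease_mask.shape
    bh, bw = h // rows, w // cols
    total = 0
    for r in range(rows):
        for c in range(cols):
            sl = (slice(r * bh, (r + 1) * bh), slice(c * bw, (c + 1) * bw))
            if leaf_mask[sl].mean() < min_leaf_frac:
                continue                       # block holds too little leaf
            _, n_objects = ndimage.label(disease_mask[sl])
            total += n_objects                 # connected diseased regions
    return total > max_objects
```

The function takes two binary masks (diseased pixels and leaf pixels) as produced by the thresholding and filtering steps described above.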

Dual-segmented regression analysis

Story et al. (2010) proposed a method for monitoring and early detection of calcium deficiency in lettuce. The first step of the algorithm is plant segmentation by thresholding, so that the canopy region is isolated. The outlines of the region of interest are applied back to the original image, so that only the area of interest is considered. From that, a number of color features (RGB and HSL) and texture features (from the gray-level co-occurrence matrix) are extracted. After that, the separation point identifying the onset of stress due to the calcium deficiency is calculated by identifying the mean difference between the treatment and control containers at each measured time for all features. Dual-segmented regression analysis is then performed to identify the point in time at which a change occurred between the nutrient-deficient group of plants and the healthy group. The authors concluded by arguing that their system can be used to monitor plants in greenhouses during the night, but that more research is needed for its use during the day, when lighting conditions vary more intensely.

Quantification

The methods presented in this section aim to quantify the severity of a given disease. Such a severity may be inferred either by the area of the leaves that are affected by the disease, or by how deeply rooted is the affection, which can be estimated by means of color and texture features. Most quantification algorithms include a segmentation step to isolate the symptoms, from which features can be extracted and properly processed in order to provide an estimate for the severity of the disease.

It is worth noting that the problem of determining the severity of a disease by analyzing and measuring its symptoms is difficult even when performed manually by one or more specialists, who have to match the symptoms against the diagnosis guidelines as accurately as possible. As a result, the manual measurements will always contain some degree of subjectivity, which in turn means that the references used to validate the automatic methods are not exactly “ground truth”. It is important to take this into consideration when assessing the performance of those methods.

The methods presented in the following are grouped according to the main strategies they employ to estimate the severity of the diseases.

One of the first methods to use digital image processing was proposed by Lindow and Webb (1983). The images were captured using an analog video camera, under red light illumination to highlight the necrotic areas. Those images were later digitized and stored in a computer. The tests were performed using leaves from tomatoes, bracken fern, sycamore and California buckeye. The identification of the necrotic regions is done by a simple thresholding. The algorithm then applies a correction factor to compensate for pixel variations in the healthy parts of the leaves, so that at least some of the pixels from healthy regions that were misclassified as part of the diseased areas can be reassigned to the correct set.

Price et al. ( 1993 ) compared visual and digital image-processing methods in quantifying the severity of coffee leaf rust. They tested two different imaging systems. In the first one, the images were captured by a black and white charge coupled device (CCD) camera, and in the second one, the images were captured with a color CCD camera. In both cases, the segmentation was performed by a simple thresholding. According to the authors, the image processing-based systems had better performance than visual evaluations, especially for cases with more severe symptoms. They also observed that the color imaging had greater potential in discriminating between rusted and non-rusted foliage.

The method proposed by Tucker and Chakraborty ( 1997 ) aims to quantify and identify diseases in sunflower and oat leaves. The first step of the algorithm is a segmentation whose threshold varies according to the disease being considered (blight or rust). The resulting pixels are connected into clusters representing the diseased regions. Depending on the characteristics of the lesions, they are classified into the appropriate category (type a or b in case of blight and by size in case of rust). The authors reported good results, but observed some errors due to inappropriate illumination during the capture of the images.

Martin and Rybicki ( 1998 ) proposed a method to quantify the symptoms caused by the maize streak virus. The thresholding scheme adopted by the authors was based on the strategy described by Lindow and Webb ( 1983 ) and briefly explained in the previous paragraph. The authors compared the results obtained by visual assessment, by using a commercial software package and by employing a custom system implemented by themselves. They concluded that the commercial and custom software packages had approximately the same performance, and that both computer-based methods achieved better accuracy and precision than the visual approach.

The method proposed by Skaloudova et al. (2006) measures the damage caused in leaves by spider mites. The algorithm is based on a two-stage thresholding. The first stage discriminates the leaf from the background, and the second stage separates damaged regions from healthy surface. The final estimate is given by the ratio of the number of pixels in damaged regions to the total number of pixels of the leaf. The authors compared the results with two other methods, based on the leaf damage index and on chlorophyll fluorescence. They concluded that their method and the leaf damage index provided superior results compared with chlorophyll fluorescence.
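A sketch of this two-stage thresholding follows; the threshold values and the convention that darker pixels correspond to leaf and damage are illustrative assumptions, not values from the paper:

```python
import numpy as np

def damage_ratio(gray, leaf_thresh, damage_thresh):
    """Two-stage thresholding: stage 1 separates leaf from background,
    stage 2 separates damaged tissue within the leaf. Severity is the
    fraction of leaf pixels classified as damaged."""
    leaf = gray < leaf_thresh                 # stage 1: leaf vs. background
    damaged = leaf & (gray < damage_thresh)   # stage 2: damage within leaf
    return damaged.sum() / leaf.sum()

# synthetic example: bright background, mid-gray leaf, dark damage
g = np.full((10, 10), 0.9)
g[:5, :] = 0.5     # leaf tissue (50 pixels)
g[0, :] = 0.1      # damaged strip (10 of them)
print(damage_ratio(g, leaf_thresh=0.7, damage_thresh=0.3))   # 0.2
```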

In their work, Weizheng et al. (2008) presented a strategy to quantify lesions in soybean leaves. The algorithm is basically composed of a two-step thresholding. The first threshold aims to separate the leaf from the background. After that, the image containing only the leaf is converted to the HSI color space, and the Sobel operator is applied to identify the lesion edges. A second threshold is applied to the resulting Sobel gradient image. Finally, small objects in the binary image are discarded and holes enclosed by white pixels are filled. The resulting objects reveal the diseased regions.
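The Sobel-based step can be sketched as follows; the gradient threshold is an illustrative assumption, and the HSI conversion, small-object removal and hole filling are omitted:

```python
import numpy as np
from scipy import ndimage

def lesion_edges(channel, grad_thresh):
    """Compute the Sobel gradient magnitude of one color channel and
    threshold it to expose lesion boundaries."""
    gx = ndimage.sobel(channel, axis=0)   # derivative along rows
    gy = ndimage.sobel(channel, axis=1)   # derivative along columns
    magnitude = np.hypot(gx, gy)
    return magnitude > grad_thresh
```

Applied to an image with a sharp intensity step, the returned mask is true only along the step, which is the behavior the second threshold exploits to outline lesions.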

Camargo and Smith ( 2009a ) proposed a method to identify regions of leaves containing lesions caused by diseases. The tests were performed using leaves from a variety of plants, like bananas, maize, alfalfa, cotton and soybean. Their algorithm is based on two main operations. First, a color transformation to the HSV and I1I2I3 spaces is performed, from which only H and two modified versions of I3 are used in the subsequent steps. After that, a thresholding based on the histogram of intensities technique (Prewitt 1970 ) is applied in order to separate healthy and diseased regions. According to the authors, their approach was able to properly discriminate between diseased and healthy areas for a wide variety of conditions and species of plants.
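For reference, the I1I2I3 space mentioned above (Ohta's decorrelated color space) is a fixed linear map of RGB; a minimal implementation:

```python
import numpy as np

def rgb_to_i1i2i3(rgb):
    """Ohta's I1I2I3 color space: I1 carries intensity, I2 a red-blue
    opponent channel, I3 a green-vs-magenta channel. `rgb` is any
    array with the color channels in the last axis."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    i1 = (r + g + b) / 3.0
    i2 = (r - b) / 2.0
    i3 = (2.0 * g - r - b) / 4.0
    return np.stack([i1, i2, i3], axis=-1)

# a gray pixel has zero chromatic components (I2 = I3 = 0)
print(rgb_to_i1i2i3(np.array([0.5, 0.5, 0.5])))
```

The modified versions of I3 used by Camargo and Smith are not specified here; the sketch only shows the standard transform they start from.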

The method proposed by Macedo-Cruz et al. ( 2011 ) aimed to quantify the damage caused by frost in oat crops. The images used by the authors were captured directly in the crop fields. The first step of the algorithm is the conversion from RGB to the L*a*b* representation. The authors employed three different thresholding strategies: Otsu’s method, Isodata algorithm, and fuzzy thresholding. Each strategy generates a threshold value for each color channel, which are combined by a simple average so a single threshold value is assigned to each channel. If necessary, the resulting partitions may be thresholded again, and so on, until some stopping criteria are met. The final resulting partitions give rise to a number of classes that, after properly labeled, reveal the extent of the damage suffered by the crops.
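Of the three strategies, Otsu's method is the easiest to sketch: it scans candidate thresholds over the histogram and keeps the one maximizing the between-class variance. The binning choice is an illustrative assumption; the Isodata and fuzzy variants, and the per-channel averaging of the three thresholds, are omitted:

```python
import numpy as np

def otsu_threshold(values, nbins=256):
    """Return the histogram bin center that maximizes the between-class
    variance, i.e. Otsu's optimal two-class split of the values."""
    hist, edges = np.histogram(values, bins=nbins)
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(p)                 # weight of the lower class
    mu = np.cumsum(p * centers)       # cumulative mean
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * w0 - mu) ** 2 / (w0 * (1.0 - w0))
    sigma_b[np.isnan(sigma_b)] = 0.0  # degenerate splits (w0 = 0 or 1)
    return centers[np.argmax(sigma_b)]
```

On a clearly bimodal channel, the returned value falls between the two modes and separates the classes, which is the property each per-channel threshold in the method relies on.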

Lloret et al. (2011) proposed a system to monitor the health of vineyards. The images were captured by means of webcams scattered throughout the field. The main objective was to detect and quantify diseased leaves. Their system has five stages: 1) leaf size estimation, which is necessary due to the variation of the distance between the cameras and the plants; 2) thresholding, which separates diseased leaves and ground from healthy leaves using both the RGB and HSV color representations of the image; 3) a set of morphological operations, aiming to reduce noise without eliminating useful features; 4) a detection step, which aims to discriminate between ground and actual diseased leaves; 5) calculation of the ratio of diseased leaves. Depending on the value of this ratio, the system emits a warning that the plant requires some attention.

Patil and Bodhe ( 2011 ) proposed a method for assessing the severity of fungi-related disease in sugar cane leaves. The method performs two segmentations. The first one aims to separate the leaves from the rest of the scene, and is performed by means of a simple thresholding. In the second segmentation, the image is converted from the RGB to the HSI color space, and a binarization is applied in order to separate the diseased regions. The threshold for the binarization is calculated by the so-called triangle thresholding method, which is based on the gray-scale histogram of the image. The binary image is finally used to determine the ratio of the infection with respect to the entire leaf.

Color analysis

Boese et al. (2008) proposed a method to estimate the severity of eelgrass leaf injury, which can be caused by desiccation, wasting disease, and micro-herbivory feeding. The first step of the algorithm is the unsupervised segmentation of the leaves into a number of classes (six to ten). In the following, an expert labels the classes into one of five possibilities (the three types of injuries, plus healthy tissue and background). After that, the quantification is just a matter of measuring the areas occupied by each of the injuries. According to the authors, their approach still has a number of problems that limit its utility, but it is an improvement over other approaches to quantifying complex leaf injuries from multiple stressors.

The method proposed by Pagola et al. (2009) deals with the problem of quantifying nitrogen deficiency in barley leaves. They use some color channel manipulations in the RGB space and apply Principal Component Analysis (PCA) to obtain a measure for the “greenness” of the pixels. In order to aggregate the results of all pixels into a single estimate, the authors tested four strategies, whose main goal was to emphasize relevant regions and reduce the influence of the regions that are not photosynthetically active, like veins and leaf spots. The authors concluded that their method had high correlation with the largely adopted approach based on non-destructive hand-held chlorophyll meters.
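The general idea of a PCA-derived "greenness" score can be illustrated by projecting pixel colors onto the first principal component of the RGB pixel cloud. This is a hedged sketch of the concept, not Pagola et al.'s exact channel manipulations; the sign convention (greener pixels score higher) is our assumption:

```python
import numpy as np

def greenness_scores(rgb):
    """Project pixels onto the first principal component of their RGB values.

    `rgb` is an (H, W, 3) array; returns an (H, W) score map whose sign is
    fixed so that green-dominant pixels receive larger values.
    """
    pixels = rgb.reshape(-1, 3).astype(float)
    centered = pixels - pixels.mean(axis=0)
    # principal axes of the pixel cloud via eigen-decomposition of covariance
    cov = np.cov(centered.T)
    vals, vecs = np.linalg.eigh(cov)
    pc1 = vecs[:, np.argmax(vals)]
    # orient the axis so that the green channel contributes positively
    if pc1[1] < 0:
        pc1 = -pc1
    return (centered @ pc1).reshape(rgb.shape[:2])
```

Aggregating this score map into a single estimate (the step where Pagola et al. tested four strategies) could then be done by, e.g., a trimmed mean over the leaf mask.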

Contreras-Medina et al. ( 2012 ) proposed a system to quantify five different types of symptoms in plant leaves. Their system is actually composed of five independent modules: 1) chlorosis algorithm, which combines the red and green components of the image in order to determine the yellowness of the leaf, which indicates the severity of the chlorosis; 2) necrosis algorithm, which uses the blue component to discriminate leaves from background, and the green component to identify and quantify the necrotic regions; 3) leaf deformation algorithm, which uses the blue component to segment the leaf and calculates the sphericity of the leaf as a measure for its deformation; 4) white spots algorithm, which applies a thresholding to the blue component of the image to estimate the area occupied by those spots; 5) mosaic algorithm, which uses the blue channel, a number of morphological operations and the Canny edge detector to identify and quantify the venations present in the leaf.

Fuzzy logic

In their paper, Sannakki et al. (2011) presented a method to quantify disease symptoms based on fuzzy logic. The tests were performed using pomegranate leaves. The algorithm begins by converting the images to the L*a*b* color space. The pixels are grouped into a number of classes through K-means clustering. According to the authors, one of the groups will correspond to the diseased areas; however, the paper does not provide any information on how the correct group is identified. In the following, the program calculates the percentage of the leaf that is infected. Finally, a Fuzzy Inference System is employed for the final estimation of the disease rating. The details on how such a system is applied are also absent.
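The pixel-clustering step can be illustrated with a plain K-means over color vectors. For brevity this sketch clusters raw color values rather than L*a*b*, and the deterministic initialization is our own choice:

```python
import numpy as np

def kmeans_pixels(image, k, iters=20):
    """Plain K-means over pixel color vectors (a stand-in for the L*a*b*
    clustering step; the color-space conversion is omitted here).

    Returns a label map of shape (H, W) and the k cluster centers.
    """
    pts = image.reshape(-1, image.shape[-1]).astype(float)
    # deterministic initialization: k points spread evenly through the array
    centers = pts[np.linspace(0, len(pts) - 1, k).astype(int)]
    labels = np.zeros(len(pts), dtype=int)
    for _ in range(iters):
        # assign each pixel to its nearest center
        d = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute centers as the mean of their assigned pixels
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pts[labels == j].mean(axis=0)
    return labels.reshape(image.shape[:2]), centers
```

Identifying which cluster is the diseased one (the step the paper leaves unspecified) would require an extra heuristic, e.g. picking the cluster whose center is least green.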

The method of Sekulska-Nalewajko and Goclawski (2011) aims to detect and quantify disease symptoms in pumpkin and cucumber leaves. The images used in the tests were captured using a flatbed scanner. The leaves were detached from the plants, treated and stained prior to the imaging. The authors used functions present in the Matlab toolboxes to implement their ideas. The first step of the algorithm is the isolation of the leaf by thresholding. In the following, the image is transformed from the RGB to the HSV color space. The brightness component (V) is discarded. Then, a fuzzy c-means algorithm is applied in order to group the pixels into two main clusters, representing healthy and diseased regions. The authors argued that their approach is a better solution than using third-party packages which, according to them, require too many operations to achieve the desired results.
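A minimal fuzzy c-means of the kind used above might look as follows (the fuzzifier m = 2 and the simple initialization are our assumptions; the original work relied on Matlab toolbox functions rather than this code):

```python
import numpy as np

def fuzzy_cmeans(pts, c=2, m=2.0, iters=30):
    """Minimal fuzzy c-means: returns (centers, U) where U[i, j] is the
    degree to which point j belongs to cluster i (columns of U sum to 1)."""
    pts = np.asarray(pts, dtype=float)
    # deterministic initialization: c points spread evenly through the array
    centers = pts[np.linspace(0, len(pts) - 1, c).astype(int)]
    for _ in range(iters):
        d = np.linalg.norm(pts[None, :, :] - centers[:, None, :], axis=2)
        d = np.maximum(d, 1e-12)             # avoid division by zero
        # membership update: u_ij proportional to d_ij^(-2/(m-1))
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=0, keepdims=True)
        # center update: weighted mean with weights u^m
        w = u ** m
        centers = (w @ pts) / w.sum(axis=1, keepdims=True)
    return centers, u
```

With c = 2, thresholding the membership matrix at 0.5 yields the healthy/diseased split the paper describes.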

Zhou et al. ( 2011 ) proposed a method to evaluate the degree of hopper infestation in rice crops. The presence of rice plant-hoppers manifests more intensely in the stem, so that was the part of the plant focused by the authors. In the algorithm, after the regions of interest are extracted, fractal-dimension value features are extracted using the box-counting dimension method. These features are used to derive a regression model. Finally, a fuzzy C-means algorithm is used to classify the regions into one of four classes: no infestation, mild infestation, moderate infestation and severe infestation.
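The box-counting estimate of the fractal dimension mentioned above can be sketched as follows (a generic implementation over dyadic box sizes, not the authors' code):

```python
import numpy as np

def box_counting_dimension(binary):
    """Estimate the fractal dimension of a binary image by box counting:
    count occupied boxes at dyadic box sizes and fit log N against log(1/s)."""
    binary = np.asarray(binary, dtype=bool)
    n = min(binary.shape)
    sizes, counts = [], []
    s = n // 2
    while s >= 1:
        # crop so the image tiles exactly into s-by-s boxes
        h = (binary.shape[0] // s) * s
        w = (binary.shape[1] // s) * s
        grid = binary[:h, :w].reshape(h // s, s, w // s, s)
        occupied = grid.any(axis=(1, 3)).sum()
        sizes.append(s)
        counts.append(max(occupied, 1))
        s //= 2
    # slope of log N(s) vs log(1/s) is the box-counting dimension
    coeffs = np.polyfit(np.log(1.0 / np.asarray(sizes)), np.log(counts), 1)
    return coeffs[0]
```

A filled region yields a dimension near 2 and a thin curve near 1, which is why the feature helps discriminate infestation patterns of differing density.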

Knowledge-based system

The aim of the work by Boissard et al. ( 2008 ) was a little different from the others presented in this paper, as their method tries to quantify the number of whiteflies in rose leaves as part of an early pest detection system. The method employs two knowledge-based systems (KBS) to estimate the number of insects. The first system, the so-called classification KBS, takes the numerical results from some image processing operations, and interprets them into higher level concepts which, in turn, are explored to assist the algorithm to choose and retain only the regions containing insects. The second system, the so-called supervision KBS, selects the image processing tools to be applied, as well as the parameters to be used, in order to collect and feed the most meaningful information to the first system. According to the authors, their proposal had some problems, but it was a good addition to the efforts towards the automation of greenhouse operations.

Region growing

Pang et al. ( 2011 ) proposed a method to segment lesions caused by six types of diseases that affect maize crops. The algorithm begins by identifying all pixels for which the level of the red channel (R) is higher than the level of the green channel (G). According to the authors, those pixels are part of a diseased region in 98% of the cases. The connected regions are then identified and labeled. The second part of the algorithm tries to identify the pixels for which R < G that are actually part of the lesions. To do that, the algorithm takes the connected regions as seeds and applies a region growing technique to more accurately define the diseased regions. The termination condition for the growing procedure is given by the threshold values obtained by applying Otsu’s method to each connected region.
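The combination of Otsu's method with seeded region growing can be illustrated as follows. This is a generic 4-connected sketch, not Pang et al.'s exact procedure; the assumption that lesion pixels lie at or below the threshold is ours:

```python
import numpy as np
from collections import deque

def otsu_threshold(gray):
    """Otsu's method on an 8-bit grayscale array: pick the threshold that
    maximizes the between-class variance of the histogram."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()
    omega = np.cumsum(p)                      # class-0 probability
    mu = np.cumsum(p * np.arange(256))        # class-0 cumulative mean
    mu_t = mu[-1]                             # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0
    return int(np.argmax(sigma_b))

def region_grow(gray, seed, threshold):
    """Grow a region from `seed`, absorbing 4-connected neighbours whose
    intensity is at or below `threshold` (lesions assumed darker)."""
    h, w = gray.shape
    grown = np.zeros((h, w), dtype=bool)
    queue = deque([seed])
    grown[seed] = True
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w and not grown[rr, cc] \
                    and gray[rr, cc] <= threshold:
                grown[rr, cc] = True
                queue.append((rr, cc))
    return grown
```

In the paper's setting, each R > G connected region would serve as a seed, with a per-region Otsu threshold as the termination condition.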

Third party image processing packages

Olmstead et al. ( 2001 ) compared two different methods (one visual and one computational) for quantifying powdery mildew infection in sweet cherry leaves. The images were captured using a flatbed scanner. The image analysis, which is basically the application of thresholding, was performed using the SigmaScan Pro (v. 4.0) software package. In order to generate a standard for comparison of the two methods, the fungi colonies were manually painted white and submitted to the image analysis, providing the reference values. According to the authors, the visual assessment provided far superior estimates in comparison with the computational one.

The method proposed by Berner and Paxson ( 2003 ) aimed at quantifying the symptoms in infected yellow starthistle. The images were captured using a flatbed scanner, and the images were analyzed by the SigmaScan Pro (v.5.0) software package. The operations applied to the image are simple: brightness and contrast adjustments, transformation to gray scale, and application of color overlays. Those overlays emphasize both diseased regions (pustules) and dark areas along venations, so a shape-based selection is carried out in order to keep only the diseased regions. Finally, the pustules are counted.

Moya et al. ( 2005 ) compared the results obtained by visual and image processing-based assessment of squash leaves infected with powdery mildew. They used a commercial software package, the ArcView GIS 3.2, to segment the leaf images into either five or ten classes. The assigned classes were then manually compared to the original images, and the regions corresponding to disease were properly labeled and measured. Finally, the severity of the disease was given by the number of selected pixels divided by the total number of pixels in the leaf. The authors compared these results to those obtained entirely manually. They also compared the results according to the type of device used for capturing the images (digital camera or scanner).

In their proposals, Bock et al. (2008, 2009) aimed at quantifying the severity of the Foliar Citrus Canker in grapefruit leaves. To perform the image analysis, the authors employed a package called Assess V1.0: Image Analysis Software for plant disease quantification (Lamari 2002). In their approach, the images are first converted to the HSI format, and then thresholded to separate the diseased parts from the rest of the scene. The value of the threshold was initially tuned manually by visually comparing the resulting segmentation with the actual image. After the ideal segmentation is achieved, estimating the severity is just a matter of calculating the healthy and diseased areas and finding their ratio. The authors later tried to automate the thresholding process, achieving mixed results due to tone and lighting variations that prevent fixed thresholds from being valid in all cases.
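Several of the methods above begin by converting the image from RGB to HSI before thresholding. One common formulation of that conversion (there are several variants in the literature; this sketch assumes RGB values in [0, 1] and returns H normalized to [0, 1]) is:

```python
import numpy as np

def rgb_to_hsi(rgb):
    """Convert an (H, W, 3) RGB image with values in [0, 1] to HSI."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    i = (r + g + b) / 3.0
    # saturation: 1 minus the ratio of the minimum channel to intensity
    s = 1.0 - np.min(rgb, axis=-1) / np.maximum(i, 1e-12)
    # hue from the angular formula, normalized to [0, 1]
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b))
    theta = np.arccos(np.clip(num / np.maximum(den, 1e-12), -1.0, 1.0))
    h = np.where(b <= g, theta, 2 * np.pi - theta) / (2 * np.pi)
    return np.stack([h, s, i], axis=-1)
```

Hue is undefined for gray pixels (den = 0), which is one reason intensity or saturation masks are usually applied before hue-based thresholding.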

Goodwin and Hsiang (2010) and Wijekoon et al. (2008) used a freely available software called Scion Image to quantify fungal infection in leaves of lilies-of-the-valley, apple trees, phlox and golden rod. The images were captured both in the laboratory and in situ, using flatbed scanners for detached leaves and consumer-level digital cameras for attached leaves. The use of the Scion software was almost entirely based upon the method proposed by Murakami (2005), in which the color of a targeted area is manually adjusted in order to maximize the discrimination between healthy and diseased surfaces. The symptoms of several fungal diseases were tested, like powdery mildew, rust, anthracnose and scab.

The Assess software (v. 2.0) was used by Coninck et al. (2012) to determine the severity of Cercospora leaf spot (CLS) disease in sugar beet breeding. Their approach was related to that used by Bock et al. (2009), with the images being converted to the HSI representation and with a proper threshold being determined by means of practical experiments. The main purpose of the authors was not to develop a novel method for disease symptom quantification, but to compare the accuracy of three very different ways of estimating the disease severity: visual assessment, real-time Polymerase Chain Reaction (PCR) and image processing. The authors concluded that the use of both image analysis and real-time PCR had the potential to increase accuracy and sensitivity of assessments of CLS in sugar beet, while reducing bias in the evaluations.

The software package ImageJ was used by Peressotti et al. ( 2011 ) to quantify grapevine downy mildew sporulation. The authors wrote a macro for ImageJ, which properly adjusts color balance and contrast prior to presenting the image to the user. After that, the user can test several different values of threshold to segment the image, until a satisfactory result is achieved. The authors reported good correlation between the results obtained by their method and by visual assessment.

Classification

The classification methods can be seen as extensions of the detection methods, but instead of trying to detect only one specific disease amidst different conditions and symptoms, these try to identify and label whichever pathology is affecting the plant. As in the case of quantification, classification methods almost always include a segmentation step, which is normally followed by the extraction of a number of features that feed some kind of classifier. The methods presented in the following are grouped according to the kind of classification strategy employed.

A very early attempt to monitor plant health was carried out by Hetzroni et al. (1994). Their system tried to identify iron, zinc and nitrogen deficiencies by monitoring lettuce leaves. The images were captured by an analog video camera and digitized only afterwards. The first step of the proposed algorithm is the segmentation of the images into leaf and background. In the following, a number of size and color features are extracted from both the RGB and HSI representations of the image. Those parameters are finally fed to neural networks and statistical classifiers, which are used to determine the plant condition.

Pydipati et al. ( 2005 ) compared two different approaches to detect and classify three types of citrus diseases. The authors collected 39 texture features, and created four different subsets of those features to be used in two different classification approaches. The first approach was based on a Mahalanobis minimum distance classifier, using the nearest neighbor principle. The second approach used radial basis functions (RBF) neural network classifiers trained with the backpropagation algorithm. According to the authors, both classification approaches performed equally well when using the best of the four subsets, which contained ten hue and saturation texture features.
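A minimum-distance classifier of the kind used in the first approach can be sketched as follows. Pooling a single covariance matrix across classes is one of several possible design choices, so treat this as an illustration rather than the authors' implementation:

```python
import numpy as np

class MahalanobisClassifier:
    """Minimum Mahalanobis-distance classifier: each class is summarized by
    its mean, with a single covariance matrix pooled over all classes."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # pooled within-class covariance, with a small ridge for stability
        centered = np.vstack([X[y == c] - X[y == c].mean(axis=0)
                              for c in self.classes_])
        cov = np.cov(centered.T) + 1e-6 * np.eye(X.shape[1])
        self.inv_cov_ = np.linalg.inv(cov)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        diff = X[:, None, :] - self.means_[None, :, :]
        # squared Mahalanobis distance to every class mean
        d2 = np.einsum("nkd,de,nke->nk", diff, self.inv_cov_, diff)
        return self.classes_[d2.argmin(axis=1)]
```

In the citrus study, the inputs would be the selected hue and saturation texture features rather than the toy 2-D points used below.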

Huang (2007) proposed a method to detect and classify three different types of diseases that affect Phalaenopsis orchid seedlings. The segmentation procedure adopted by the author is significantly more sophisticated than those found in other papers, and is composed of four steps: removal of the plant vessel using a Bayes classifier, equalization of the image using an exponential transform, a rough estimation for the location of the diseased region, and equalization of the sub-image centered at that rough location. A number of color and texture features are then extracted from the gray level co-occurrence matrix (Haralick et al. 1973). Finally, those features are submitted to an MLP artificial neural network with one hidden layer, which performs the final classification.
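Texture features extracted from the gray level co-occurrence matrix can be illustrated as follows; the quantization to 8 gray levels and the choice of contrast and energy as example features are our assumptions, not Huang's exact feature set:

```python
import numpy as np

def glcm(gray, dx=1, dy=0, levels=8):
    """Normalized gray-level co-occurrence matrix for one displacement.

    `gray` holds 8-bit values; pixel pairs separated by (dx, dy) are
    accumulated after quantization to `levels` gray levels.
    """
    q = (np.asarray(gray, float) / 256.0 * levels).astype(int)
    q = q.clip(0, levels - 1)
    h, w = q.shape
    # slice the two co-occurring pixel grids according to the displacement
    a = q[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    b = q[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1)
    return m / m.sum()

def haralick_contrast(p):
    """Contrast: expected squared gray-level difference of co-occurring pairs."""
    i, j = np.indices(p.shape)
    return ((i - j) ** 2 * p).sum()

def haralick_energy(p):
    """Energy (angular second moment): sum of squared matrix entries."""
    return (p ** 2).sum()
```

A smooth region gives low contrast and high energy; a busy, high-frequency region gives the opposite, which is what makes these features useful for separating lesion textures from healthy tissue.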

Sanyal et al. ( 2007 ) tackled the problem of detecting and classifying six types of mineral deficiencies in rice crops. First, the algorithm extracts a number of texture and color features. Each kind of feature (texture and color) is submitted to its own specific MLP neural network. Both networks have one hidden layer, but the number of neurons in the hidden layer is different (40 for texture and 70 for color). The results returned by both networks are then combined, yielding the final classification. A very similar approach is used by the same authors in another paper (Sanyal and Patel 2008 ), but in this case the objective is to identify two kinds of diseases (blast and brown spots) that affect rice crops.

The method proposed by Al Bashish et al. (2010) tries to identify five different plant diseases. The authors did not specify the species of plants used in the tests, and the images were captured in situ. After a preprocessing stage to clean up the image, a K-means clustering algorithm is applied in order to divide the image into four clusters. According to the authors, at least one of the clusters must correspond to one of the diseases. After that, for each cluster a number of color and texture features are extracted by means of the so-called Color Co-Occurrence Method, which operates with images in the HSI format. Those features are fed to an MLP neural network with ten hidden layers, which performs the final classification.

Kai et al. ( 2011 ) proposed a method to identify three types of diseases in maize leaves. First, the images are converted to the YCbCr color representation. Apparently, some rules are applied during the thresholding in order to properly segment the diseased regions. However, due to a lack of clarity, it is not possible to infer exactly how this is done. The authors then extract a number of texture features from the gray level co-occurrence matrix. Finally, the features are submitted to an MLP neural network with one hidden layer.

Wang et al. ( 2012 ) proposed a method to discriminate between pairs of diseases in wheat and grapevines. The images are segmented by a K-means algorithm, and then 50 color, shape and texture features are extracted. For the purpose of classification, the authors tested four different kinds of neural networks: Multilayer Perceptron, Radial Basis Function, Generalized Regression, and Probabilistic. The authors reported good results for all kinds of neural networks.

Support vector machines

Meunkaewjinda et al. ( 2008 ) proposed a method to identify and classify diseases that affect grapevines. The method uses several color representations (HSI, L*a*b*, UVL and YCbCr) throughout its execution. The separation between leaves and background is performed by an MLP neural network, which is coupled with a color library built a priori by means of an unsupervised self organizing map (SOM). The colors present on the leaves are then clustered by means of an unsupervised and untrained self-organizing map. A genetic algorithm determines the number of clusters to be adopted in each case. Diseased and healthy regions are then separated by a Support Vector Machine (SVM). After some additional manipulations, the segmented image is submitted to a multiclass SVM, which performs the classification into either scab, rust, or no disease.

Youwen et al. (2008) proposed a method to identify two diseases that can manifest in cucumber leaves. The segmentation into healthy and diseased regions is achieved using a statistical pattern recognition approach. In the following, some color, shape and texture features are extracted. Those features feed an SVM, which performs the final classification. The authors stated that the results provided by the SVM are far better than those achieved using neural networks.

The system proposed by Yao et al. (2009) aimed to identify and classify three types of diseases that affect rice crops. The algorithm first applies a particular color transformation to the original RGB image, resulting in two channels (y1 and y2). Then, the image is segmented by Otsu’s method, after which the diseased regions are isolated. Color, shape and texture features are extracted, the latter ones from the HSV color space. Finally, the features are submitted to a Support Vector Machine, which performs the final classification.

The method proposed by Camargo and Smith (2009b) tries to identify three different kinds of diseases that affect cotton plants. The authors used images not only of leaves, but also of fruits and stems. The segmentation of the image is performed using a technique developed by the authors (Camargo and Smith 2009a), which was described earlier in this paper (Section ‘Thresholding’). After that, a number of features are extracted from the diseased regions. Those features are then used to feed an SVM. The one-against-one method (Hsu and Lin 2002) was used to allow the SVM to deal with multiple classes. The authors concluded that the texture features have the best discrimination potential.

Jian and Wei ( 2010 ) proposed a method to recognize three kinds of cucumber leaf diseases. As in most approaches, the separation between healthy and diseased regions is made by a simple thresholding procedure. In the following, a variety of color, shape and texture features are extracted. Those features are submitted to an SVM with Radial Basis Function (RBF) as kernel, which performs the final classification.

Fuzzy classifier

The method proposed by Hairuddin et al. ( 2011 ) tries to identify four different nutritional deficiencies in oil palm plants. The image is segmented according to color similarities, but the authors did not provide any detail on how this is done. After the segmentation, a number of color and texture features are extracted and submitted to a fuzzy classifier which, instead of outputting the deficiencies themselves, reveals the amounts of fertilizers that should be used to correct those deficiencies. Unfortunately, the technical details provided in this paper are superficial, making it difficult to reach a clear understanding about the approach adopted by the authors.

Xu et al. ( 2011 ) proposed a method to detect nitrogen and potassium deficiencies in tomato plants. The algorithm begins extracting a number of features from the color image. The color features are all based on the b* component of the L*a*b* color space. The texture features are extracted using three different methods: difference operators, Fourier transform and Wavelet packet decomposition. The selection and combination of the features was carried out by means of a genetic algorithm. Finally, the optimized combination of features is used as the input of a fuzzy K-nearest neighbor classifier, which is responsible for the final identification.

Feature-based rules

In their two papers, Kurniawati et al. (2009a, 2009b) proposed a method to identify and label three different kinds of diseases that affect paddy crops. As in many other methods, the segmentation of healthy and diseased regions is performed by means of thresholding. The authors tested two kinds of thresholding, Otsu’s and local entropy, with the best results being achieved by the latter. Afterwards, a number of shape and color features are extracted. Those features are the basis for a set of rules that determine the disease that best fits the characteristics of the selected region.

Zhang (2010) proposed a method for identifying and classifying lesions in citrus leaves. The method is mostly based on two sets of features. The first set was selected having as main goal to separate lesions from the rest of the scene, which is achieved by setting thresholds to each feature and applying a weighted voting scheme. The second set aims to provide as much information as possible about the lesions, so a discrimination between diseases becomes possible. The final classification is, again, achieved by means of feature thresholds and a weighted voting system. A more detailed version of Zhang (2010) can be found in Zhang and Meng (2011).

The method proposed by Wiwart et al. ( 2009 ) aims to detect and discriminate among four types of mineral deficiencies (nitrogen, phosphorus, potassium and magnesium). The tests were performed using faba bean, pea and yellow lupine leaves. Prior to the color analysis, the images are converted to the HSI and L*a*b* color spaces. The presence or absence of the deficiencies is then determined by the color differences between healthy leaves and the leaves under test. Those differences are quantified by Euclidean distances calculated in both color spaces.

Pugoy and Mariano ( 2011 ) proposed a system to identify two different types of diseases that attack rice leaves. The algorithm first converts the image from RGB to HSI color space. The K-means technique is applied to cluster the pixels into a number of groups. Those groups are then compared to a library that relates colors to the respective diseases. This comparison results in values that indicate the likelihood of each region being affected by each of the diseases.

Self organizing maps

The method proposed by Phadikar and Sil (2008) detects and differentiates two diseases that affect rice crops, blast and brown spots. First, the image is converted to the HSI color space. Then, an entropy-based thresholding is used to segment the image. An edge detector is applied to the segmented image, and the intensity of the green components is used to detect the spots. Each region containing a detected spot is then resized by interpolation, so all regions have a size of 80×100 pixels. The pixel values (gray scale) are finally fed to a self organizing map (SOM), which performs the final classification.

Discriminant analysis

The method of Pydipati et al. (2006) aims to detect and classify three different types of citrus diseases. The method relies heavily on the color co-occurrence method (CCM), which, in turn, was developed through the use of spatial gray-level dependence matrices (SGDMs) (Shearer and Holmes 1990). The resulting CCM matrices, which are generated from the HSI color representation of the images, are used to extract 39 texture features. The number of features was then reduced by means of a redundancy reduction procedure. The authors observed that the elimination of intensity features improved the results, as hue and saturation features are more robust to ambient light variations than intensity features. The final classification was performed using discriminant analysis.

Membership function

Anthonys and Wickramarachchi ( 2009 ) proposed a method to discriminate among three different diseases that attack paddy plants. The image is segmented by a thresholding procedure – the grayscale version of the image used in such a procedure is obtained after assigning different weights to each component of its RGB representation. The resulting images, containing only the regions supposedly containing the symptoms of the diseases, are then converted to the L*a*b* format, and a number of color and shape features are extracted. The values of those features are compared to some reference value intervals stored in a lookup table by means of the so-called Membership Function, which outputs a single similarity score for each possible disease. The highest score determines the disease affecting the plant.
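The lookup-table comparison can be sketched with a crisp interval-membership score. The actual Membership Function of the paper may weight partial matches differently, so this is only a simplified stand-in, and the feature names below are hypothetical:

```python
def classify_by_membership(features, reference):
    """Score each disease by the fraction of its reference feature intervals
    that the measured features fall into, and return the best match.

    `reference` maps disease -> {feature_name: (lo, hi)};
    `features` maps feature_name -> measured value.
    """
    scores = {}
    for disease, intervals in reference.items():
        hits = sum(lo <= features[f] <= hi
                   for f, (lo, hi) in intervals.items())
        scores[disease] = hits / len(intervals)
    best = max(scores, key=scores.get)
    return best, scores
```

Replacing the crisp `lo <= x <= hi` test with a triangular or trapezoidal membership function would give the graded similarity scores that the paper's approach implies.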

Table 1 shows an overview of all methods presented in this paper, together with the type of plant considered in each research and the main technical solution used in the algorithm.

Despite the importance of the subject of identifying plant diseases using digital image processing, and although this has been studied for at least 30 years, the advances achieved seem to be a little timid. Some facts lead to this conclusion:

Methods are too specific. The ideal method would be able to identify any disease in any kind of plant. Evidently, this is unfeasible given the current technological level. However, many of the methods that are being proposed not only are able to deal with only one species of plant, but those plants also need to be at a certain growth stage for the algorithm to be effective. That is acceptable if the disease only attacks the plant in that specific stage, but it is very limiting otherwise. Many of the papers do not state this kind of information explicitly, but if their training and test sets include only images of a certain growth stage, which is often the case, the validity of the results cannot be extended to other stages.

Operation conditions are too strict. Many images used to develop new methods are collected under very strict conditions of lighting, angle of capture, distance between object and capture device, among others. This is a common practice and is perfectly acceptable in the early stages of research. However, in most real world applications, those conditions are almost impossible to be enforced, especially if the analysis is expected to be carried out in a non-destructive way. Thus, it is a problem that many studies never get to the point of testing and upgrading the method to deal with more realistic conditions, because this limits their scope greatly.

Lack of knowledge about more sophisticated tools. The simplest solution for a problem is usually the preferable one. In the case of image processing, some problems can be solved by using only morphological mathematical operations, which are easy to implement and understand. However, more complex problems often demand more sophisticated approaches. Techniques like neural networks, genetic algorithms and support vector machines can be very powerful if properly applied. Unfortunately, that is often not the case. In many cases, it seems that the use of those techniques is founded more in the hype they generate in the scientific community than in their technical appropriateness with respect to the problem at hand. As a result, problems like overfitting, overtraining, undersized sample sets, sample sets with low representativeness, and bias, among others, seem to be a widespread plague. Those problems, although easily identifiable by a knowledgeable individual on the topic, seem to go widely overlooked by the authors, probably due to the lack of knowledge about the tools they are employing. The result is a whole group of technically flawed solutions.

Evidently, there are some high quality manuscripts in which the authors rigorously take into account most factors that could harm the validity of their results, but unfortunately those still seem to be the exception, not the rule. As a result, the technology evolves slower than it could. The underlying conclusion is that the authors should spend a little more time learning about the tools they intend to use. A better understanding of the concepts behind those tools can potentially lead to more solid results and to less time wasted, improving the overall quality of the literature of the area.

The wide-ranging variety of applications on the subject of detecting, quantifying and classifying plant diseases in digital images makes it difficult for someone to prospect all possible useful ideas present in the literature, which can cause potential solutions for problematic issues to be missed. In this context, this paper tried to present a comprehensive survey on the subject, aiming at being a starting point for those conducting research on the issue. Due to the large number of references, the descriptions are short, providing a quick overview of the ideas underlying each of the solutions. It is important to highlight that the work on the subject is not limited to what was shown here. Many papers on the subject could not be included in order to keep the paper length under control – the papers were selected so as to consider the largest possible number of different problems. Thus, if the reader wishes to attain a more complete understanding of a given application or problem, he/she can refer to the bibliographies of the respective articles.

Abdullah NE, Rahim AA, Hashim H, Kamal MM: Classification of rubber tree leaf diseases using multilayer perceptron neural network. In 2007 5th student conference on research and development . Selangor: IEEE; 2007:1-6.

Ahmad IS, Reid JF, Paulsen MR, Sinclair JB: Color classifier for symptomatic soybean seeds using image processing. Plant Dis 1999, 83(4):320-327. 10.1094/PDIS.1999.83.4.320

Al Bashish D, Braik M, Bani-Ahmad S: A framework for detection and classification of plant leaf and stem diseases. In 2010 international conference on signal and image processing . Chennai: IEEE; 2010:113-118.

Aleixos N, Blasco J, Navarron F, Molto E: Multispectral inspection of citrus in real-time using machine vision and digital signal processors. Comput Electron Agric 2002, 33(2):121-137. 10.1016/S0168-1699(02)00002-9

Anthonys G, Wickramarachchi N: An image recognition system for crop disease identification of paddy fields in Sri Lanka. In 2009 International Conference on Industrial and Information Systems (ICIIS) . Sri Lanka: IEEE; 2009:403-407.

Berner DK, Paxson LK: Use of digital images to differentiate reactions of collections of yellow starthistle (Centaurea solstitialis) to infection by Puccinia jaceae. Biol Control 2003, 28(2):171-179. 10.1016/S1049-9644(03)00096-3

Bock CH, Parker PE, Cook AZ, Gottwald TR: Visual rating and the use of image analysis for assessing different symptoms of citrus canker on grapefruit leaves. Plant Dis 2008, 92(4):530-541. 10.1094/PDIS-92-4-0530

Bock CH, Cook AZ, Parker PE, Gottwald TR: Automated image analysis of the severity of foliar citrus canker symptoms. Plant Dis 2009, 93(6):660-665. 10.1094/PDIS-93-6-0660

Bock CH, Poole GH, Parker PE, Gottwald TR: Plant disease severity estimated visually, by digital photography and image analysis, and by hyperspectral imaging. Critical Rev Plant Sci 2010, 29(2):59-107. 10.1080/07352681003617285

Boese BL, Clinton PJ, Dennis D, Golden RC, Kim B: Digital image analysis of Zostera marina leaf injury. Aquat Bot 2008, 88: 87-90. 10.1016/j.aquabot.2007.08.016

Boissard P, Martin V, Moisan S: A cognitive vision approach to early pest detection in greenhouse crops. Comput Electron Agric 2008, 62(2):81-93. 10.1016/j.compag.2007.11.009

Camargo A, Smith JS: An image-processing based algorithm to automatically identify plant disease visual symptoms. Biosyst Eng 2009a, 102: 9-21. 10.1016/j.biosystemseng.2008.09.030

Camargo A, Smith JS: Image pattern classification for the identification of disease causing agents in plants. Comput Electron Agric 2009b, 66(2):121-125. 10.1016/j.compag.2009.01.003

Coninck BMA, Amand O, Delauré SL, Lucas S, Hias N, Weyens G, Mathys J, De Bruyne E, Cammue BPA: The use of digital image analysis and real-time PCR fine-tunes bioassays for quantification of Cercospora leaf spot disease in sugar beet breeding. Plant Pathol 2012, 61: 76-84. 10.1111/j.1365-3059.2011.02497.x

Contreras-Medina LM, Osornio-Rios RA, Torres-Pacheco I, Romero-Troncoso RJ, Guevara-González RG, Millan-Almaraz JR: Smart sensor for real-time quantification of common symptoms present in unhealthy plants. Sensors (Basel, Switzerland) 2012, 12: 784-805. 10.3390/s120100784

Corkidi G, Balderas-Ruíz KA, Taboada B, Serrano-Carreón L, Galindo E: Assessing mango anthracnose using a new three-dimensional image-analysis technique to quantify lesions on fruit. Plant Pathol 2005, 55(2):250-257.

Goodwin PH, Hsiang T: Quantification of fungal infection of leaves with digital images and Scion Image software. Methods Mol Biol 2010, 638: 125-135. 10.1007/978-1-60761-611-5_9

Hairuddin MA, Tahir NM, Baki SRS: Overview of image processing approach for nutrient deficiencies detection in Elaeis Guineensis. In 2011 IEEE international conference on system engineering and technology . Shah Alam: IEEE; 2011:116-120.

Haralick RM, Shanmugam K, Dinstein I: Textural features for image classification. IEEE Trans Syst Man Cybern SMC-3 1973, 3: 610-621.

Hetzroni A, Miles GE, Engel BA, Hammer PA, Latin RX: Machine vision monitoring of plant health. Adv Space Res 1994, 14(11):203-212. 10.1016/0273-1177(94)90298-4

Hsu CW, Lin CJ: A comparison of methods for multi-class support vector machines. IEEE Trans Neural Netw 2002, 13: 415-425. 10.1109/72.991427

Huang KY: Application of artificial neural network for detecting Phalaenopsis seedling diseases using color and texture features. Comput Electron Agric 2007, 57: 3-11. 10.1016/j.compag.2007.01.015

Jian Z, Wei Z: Support vector machine for recognition of cucumber leaf diseases. In 2010 2nd international conference on advanced computer control . Shenyang: IEEE; 2010:264-266.

Kai S, Zhikun L, Hang S, Chunhong G: A research of maize disease image recognition of corn based on BP networks. In 2011 third international conference on measuring technology and mechatronics automation . Shangshai: IEEE; 2011:246-249.

Kurniawati NN, Abdullah SNHS, Abdullah S, Abdullah S: Investigation on image processing techniques for diagnosing paddy diseases. In 2009 international conference of soft computing and pattern recognition . Malacca: IEEE; 2009a:272-277.

Kurniawati NN, Abdullah SNHS, Abdullah S, Abdullah S: Texture analysis for diagnosing paddy disease. In 2009 International conference on electrical engineering and informatics . Selangor: IEEE; 2009b:23-27.

Lamari L: Assess: image analysis software for plant disease quantification . St. Paul: APS Press; 2002.

Lindow SE, Webb RR: Quantification of foliar plant disease symptoms by microcomputer-digitized video image analysis. Phytopathology 1983, 73(4):520-524. 10.1094/Phyto-73-520

Lloret J, Bosch I, Sendra S, Serrano A: A wireless sensor network for vineyard monitoring that uses image processing. Sensors 2011, 11(6):6165-6196.

López-García F, Andreu-García G, Blasco J, Aleixos N, Valiente JM: Automatic detection of skin defects in citrus fruits using a multivariate image analysis approach. Comput Electron Agric 2010, 71(2):189-197. 10.1016/j.compag.2010.02.001

Macedo-Cruz A, Pajares G, Santos M, Villegas-Romero I: Digital image sensor-based assessment of the status of oat (Avena sativa L.) crops after frost damage. Sensors 2011, 11(6):6015-6036.

Mahlein AK, Oerke EC, Steiner U, Dehne HW: Recent advances in sensing plant diseases for precision crop protection. Eur J Plant Pathol 2012, 133: 197-209. 10.1007/s10658-011-9878-z

Martin DP, Rybicki EP: Microcomputer-based quantification of maize streak virus symptoms in zea mays. Phytopathology 1998, 88(5):422-427. 10.1094/PHYTO.1998.88.5.422

Meunkaewjinda A, Kumsawat P, Attakitmongcol K, Srikaew A: Grape leaf disease detection from color imagery using hybrid intelligent system. In 2008 5th international conference on electrical engineering/electronics, computer, telecommunications and information technology . Krabi: IEEE; 2008:513-516.

Moya EA, Barrales LR, Apablaza GE: Assessment of the disease severity of squash powdery mildew through visual analysis, digital image analysis and validation of these methodologies. Crop Protect 2005, 24(9):785-789. 10.1016/j.cropro.2005.01.003

Murakami PF: An instructional guide for leaf color analysis using digital imaging software. 2005.

Olmstead JW, Lang GA, Grove GG: Assessment of severity of powdery mildew infection of sweet cherry leaves by digital image analysis. Hortscience 2001, 36: 107-111.

Pagola M, Ortiz R, Irigoyen I, Bustince H, Barrenechea E, Aparicio-Tejo P, Lamsfus C, Lasa B: New method to assess barley nitrogen nutrition status based on image colour analysis. Comput Electron Agric 2009, 65(2):213-218. 10.1016/j.compag.2008.10.003

Pang J, Bai Zy, Lai Jc, Li Sk: Automatic segmentation of crop leaf spot disease images by integrating local threshold and seeded region growing. In 2011 international conference on image analysis and signal processing . Hubei: IEEE; 2011:590-594.

Patil SB, Bodhe SK: Leaf disease severity measurement using image processing. Int J Eng Technol 2011, 3(5):297-301.

Peressotti E, Duchêne E, Merdinoglu D, Mestre P: A semi-automatic non-destructive method to quantify grapevine downy mildew sporulation. J Microbiol Methods 2011, 84(2):265-271. 10.1016/j.mimet.2010.12.009

Phadikar S, Sil J: Rice disease identification using pattern recognition techniques . Khulna: IEEE; 2008. pp 420–423


Prewitt J: Object enhancement and extraction. In Picture processing and psychopictorics . Orlando: Academic Press; 1970.

Price TV, Gross R, Wey JH, Osborne CF: A comparison of visual and digital image-processing methods in quantifying the severity of coffee leaf rust (Hemileia vastatrix). Aust J Exp Agric 1993, 33: 97-101. 10.1071/EA9930097

Pugoy RADL, Mariano VY: Automated rice leaf disease detection using color image analysis. In 3rd international conference on digital image processing, volume 8009 . Chengdu: SPIE; 2011:F1-F7.

Pydipati R, Burks TF, Lee WS: Statistical and neural network classifiers for citrus disease detection using machine vision. Trans ASAE 2005, 48(5):2007-2014.

Pydipati R, Burks TF, Lee WS: Identification of citrus disease using color texture features and discriminant analysis. Comput Electron Agric 2006, 52(1–2):49-59.

Sankaran S, Mishra A, Ehsani R, Davis C: A review of advanced techniques for detecting plant diseases. Comput Electron Agric 2010, 72: 1-13. 10.1016/j.compag.2010.02.007

Sannakki SS, Rajpurohit VS, Nargund VB, Kumar A: Leaf disease grading by machine vision and fuzzy logic. Int J 2011, 2(5):1709-1716.

Sanyal P, Patel SC: Pattern recognition method to detect two diseases in rice plants. Imaging Sci J 2008, 56(6):7.

Sanyal P, Bhattacharya U, Parui SK, Bandyopadhyay SK, Patel S: Color texture analysis of rice leaves diagnosing deficiency in the balance of mineral levels towards improvement of crop productivity. In 10th International Conference on Information Technology (ICIT 2007) . Orissa: IEEE; 2007:85-90.

Sekulska-Nalewajko J, Goclawski J: A semi-automatic method for the discrimination of diseased regions in detached leaf images using fuzzy c-means clustering. In VII international conference on perspective technologies and methods in MEMS design . Polyana-Svalyava: IEEE; 2011:172-175.

Sena DG Jr, Pinto FAC, Queiroz DM, Viana PA: Fall armyworm damaged maize plant identification using digital images. Biosyst Eng 2003, 85(4):449-454. 10.1016/S1537-5110(03)00098-9

Shearer SA, Holmes RG: Plant identification using color co-occurrence matrices. Trans ASAE 1990, 33(6):2037-2044.

Skaloudova B, Krvan V, Zemek R: Computer-assisted estimation of leaf damage caused by spider mites. Comput Electron Agric 2006, 53(2):81-91. 10.1016/j.compag.2006.04.002

Smith SE, Dickson S: Quantification of active vesicular-arbuscular mycorrhizal infection using image analysis and other techniques. Aust J Plant Physiol 1991, 18(6):637-648. 10.1071/PP9910637

Story D, Kacira M, Kubota C, Akoglu A, An L: Lettuce calcium deficiency detection with machine vision computed plant features in controlled environments. Comput Electron Agric 2010, 74(2):238-243. 10.1016/j.compag.2010.08.010

Tucker CC, Chakraborty S: Quantitative assessment of lesion characteristics and disease severity using digital image processing. J Phytopathol 1997, 145(7):273-278. 10.1111/j.1439-0434.1997.tb00400.x

Wang H, Li G, Ma Z, Li X: Application of neural networks to image recognition of plant diseases. In Proceedings of the 2012 International Conference on Systems and Informatics (ICSAI) . Yantai: IEEE; 2012:2159-2164.

Weizheng S, Yachun W, Zhanliang C, Hongda W: Grading method of leaf spot disease based on image processing. In 2008 international conference on computer science and software engineering . Wuhan: IEEE; 2008:491-494.

Wijekoon CP, Goodwin PH, Hsiang T: Quantifying fungal infection of plant leaves by digital image analysis using scion image software. J Microbiol Methods 2008, 74(2–3):94-101.

Wiwart M, Fordonski G, Zuk-Golaszewska K, Suchowilska E: Early diagnostics of macronutrient deficiencies in three legume species by color image analysis. Comput Electron Agric 2009, 65: 125-132. 10.1016/j.compag.2008.08.003

Xu G, Zhang F, Shah SG, Ye Y, Mao H: Use of leaf color images to identify nitrogen and potassium deficient tomatoes. Pattern Recognit Lett 2011, 32(11):1584-1590. 10.1016/j.patrec.2011.04.020

Yao Q, Guan Z, Zhou Y, Tang J, Hu Y, Yang B: Application of support vector machine for detecting rice diseases using shape and color texture features. In 2009 international conference on engineering computation . Hong Kong: IEEE; 2009:79-83.

Youwen T, Tianlai L, Yan N: The recognition of cucumber disease based on image processing and support vector machine. In 2008 congress on image and signal processing . Sanya: IEEE; 2008:262-267.

Zhang M: Citrus canker detection based on leaf images analysis. In The 2nd international conference on information science and engineering . Hangzhou: IEEE; 2010:3584-3587.

Zhang M, Meng Q: Automatic citrus canker detection from leaf images captured in field. Pattern Recognit Lett 2011, 32(15):2036-2046. 10.1016/j.patrec.2011.08.003

Zhou Z, Zang Y, Li Y, Zhang Y, Wang P, Luo X: Rice plant-hopper infestation detection and classification algorithms based on fractal dimension values and fuzzy C-means. Math Comput Model 2011, 58: 701-709.

Download references

Author information

Authors and Affiliations

Embrapa Agricultural Informatics, Campinas, SP, Brazil

Jayme Garcia Arnal Barbedo


Corresponding author

Correspondence to Jayme Garcia Arnal Barbedo .

Additional information

Competing interests

The author declares that he has no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Arnal Barbedo, J.G. Digital image processing techniques for detecting, quantifying and classifying plant diseases. SpringerPlus 2, 660 (2013). https://doi.org/10.1186/2193-1801-2-660


Received : 14 June 2013

Accepted : 26 September 2013

Published : 07 December 2013

DOI : https://doi.org/10.1186/2193-1801-2-660


  • Radial Basis Function
  • Powdery Mildew
  • Texture Feature
  • Diseased Region


Real-time intelligent image processing for security applications

  • Guest Editorial
  • Published: 05 September 2021
  • Volume 18, pages 1787–1788 (2021)

Cite this article

  • Akansha Singh 1 ,
  • Ping Li 2 ,
  • Krishna Kant Singh 3 &
  • Vijayalakshmi Saravana 4  

4089 Accesses

5 Citations


The advent of machine learning and image processing techniques has opened new research opportunities in security. Machine learning enables the automatic extraction and analysis of information from images, and its convergence with image processing is useful in a variety of security applications. Image processing plays a significant role in physical as well as digital security. Physical security applications include homeland security, surveillance, identity authentication, and so on. Digital security means protecting digital data; techniques such as digital watermarking, network security, and steganography enable it.
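Digital watermarking and steganography both hide information inside an image, most simply in its least significant bits. As a concrete illustration (a generic sketch, not a technique from any paper in this issue, with a fabricated toy image and payload), the following snippet embeds and recovers a bit stream using LSB substitution:

```python
import numpy as np

def embed_lsb(cover: np.ndarray, bits: np.ndarray) -> np.ndarray:
    """Hide a bit stream in the least significant bits of a grayscale image."""
    flat = cover.flatten().copy()
    if bits.size > flat.size:
        raise ValueError("payload too large for cover image")
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits  # overwrite LSBs
    return flat.reshape(cover.shape)

def extract_lsb(stego: np.ndarray, n_bits: int) -> np.ndarray:
    """Recover the first n_bits hidden bits."""
    return stego.flatten()[:n_bits] & 1

cover = np.random.randint(0, 256, size=(8, 8), dtype=np.uint8)
payload = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)
stego = embed_lsb(cover, payload)
recovered = extract_lsb(stego, payload.size)
assert np.array_equal(recovered, payload)
# Each pixel changes by at most 1 intensity level, so the stego image
# is visually indistinguishable from the cover.
assert np.max(np.abs(stego.astype(int) - cover.astype(int))) <= 1
```

Because the perturbation is invisible, LSB embedding (and, conversely, its detection) is exactly the kind of digital-security topic the editorial refers to.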


1 Accepted papers

The rapidly increasing capabilities of imaging systems and techniques have opened new research areas in the security domain, and the rise of cyber and physical crime requires novel techniques to control it. In both physical and digital security, real-time performance is crucial: the availability of the right image information at the right time enables situational awareness. Real-time image processing techniques perform the required operations within a bounded latency. Physical security applications such as surveillance and object tracking are practical only if they run in real time. Similarly, biometric authentication, watermarking, and network security are time-restricted applications that require real-time image processing. This special issue aims to bring together researchers to present novel tools and techniques for real-time image processing in security applications, augmented by machine learning techniques.
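To make the latency requirement concrete, a pipeline can be audited against a per-frame time budget. The sketch below is illustrative only: the processing step is a placeholder, and the 30 FPS budget is an assumption, not a figure from any paper in this issue.

```python
import time

FRAME_BUDGET_S = 1 / 30  # 30 FPS target: each frame must finish within ~33 ms

def process_frame(frame):
    # Placeholder for the actual image-processing work.
    return sum(frame) / len(frame)

def run_pipeline(frames, budget_s=FRAME_BUDGET_S):
    """Process frames, returning the indices of frames that missed the budget."""
    late = []
    for i, frame in enumerate(frames):
        start = time.perf_counter()
        process_frame(frame)
        elapsed = time.perf_counter() - start
        if elapsed > budget_s:
            late.append(i)
    return late

frames = [list(range(1000)) for _ in range(10)]
late = run_pipeline(frames)
print(f"{len(late)} of {len(frames)} frames missed the 33 ms budget")
```

A deployed system would react to budget violations (dropping frames or degrading quality) rather than merely logging them, but the measurement loop is the same.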

This special issue on Real-Time Intelligent Image Processing for Security Applications comprises contributions, in both theory and applications, on the latest developments in security applications of image processing. Real-time imaging and video processing can be used to solve a variety of security problems, and the articles in this issue address such problems.

The paper entitled “RGB + D and deep learning-based real-time detection of suspicious event in Bank ATMs” presents a real-time method for detecting human activities, applied to enhance the surveillance and security of bank Automated Teller Machines (ATMs) [ 1 ]. The increasing number of illicit activities at ATMs has become a security concern.

Existing surveillance methods that involve human operators are not very efficient, since they depend heavily on the behavior of the security personnel. The proposed solution achieves real-time surveillance of these machines: the authors present a deep learning-based method that detects different kinds of motion in the video stream and classifies a motion as abnormal in case of suspicious activity.
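The authors' detector is a deep learning model; as a minimal point of comparison, the classical baseline such systems improve on, frame differencing, fits in a few lines. The thresholds and toy frames below are fabricated for illustration.

```python
import numpy as np

def detect_motion(prev: np.ndarray, curr: np.ndarray,
                  diff_thresh: int = 25, area_thresh: float = 0.01) -> bool:
    """Flag motion when enough pixels change between consecutive frames."""
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    changed = np.count_nonzero(diff > diff_thresh)
    return changed / diff.size > area_thresh

rng = np.random.default_rng(0)
frame1 = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
frame2 = frame1.copy()
frame2[10:30, 10:30] += 100           # simulate an object entering the scene
print(detect_motion(frame1, frame1))  # False: identical frames
print(detect_motion(frame1, frame2))  # True: a 20x20 region changed
```

Deep models replace the hand-set thresholds with learned features, which is what allows them to distinguish normal motion from suspicious activity rather than just detecting change.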

The paper entitled “A real-time person tracking system based on SiamMask network for intelligent video surveillance” presents a real-time surveillance system by tracking persons. The proposed solution can be applied to various public places, offices, buildings, etc., for tracking persons [ 2 ]. The authors have presented a person tracking and segmentation system using an overhead camera perspective.

The paper entitled “Adaptive and stabilized real-time super-resolution control for UAV-assisted smart harbor surveillance platforms” presents a method for smart harbor surveillance [ 3 ]. The method uses drones for flexible localization of nodes and proposes an algorithm for scheduling the data transmitted between the drones and multi-access edge computing systems. In the second stage of the algorithm, all drones transmit their own data, which are then used for surveillance. The authors further apply super-resolution to improve the quality of the data and of the surveillance, and use a Lyapunov optimization-based method to maximize the time-average performance of the system subject to the stability of the self-adaptive super-resolution control.
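Lyapunov optimization of this "drift-plus-penalty" kind can be illustrated with a toy single-queue scheduler. This is not the paper's algorithm: the arrival statistics, the quadratic cost model, and the trade-off parameter V are all assumptions made for the sketch.

```python
import random

def drift_plus_penalty(arrivals, rates, V=5.0):
    """Greedy Lyapunov scheduling: each slot, pick the service rate r that
    minimizes V*cost(r) - Q*r, trading energy cost against queue backlog."""
    Q = 0.0
    backlog = []
    for a in arrivals:
        # cost(r) = r**2 models energy; minimize V*r**2 - Q*r over candidates
        r = min(rates, key=lambda c: V * c * c - Q * c)
        Q = max(Q + a - r, 0.0)  # virtual queue update
        backlog.append(Q)
    return backlog

random.seed(1)
arrivals = [random.uniform(0, 2) for _ in range(500)]  # ~1 unit/slot on average
backlog = drift_plus_penalty(arrivals, rates=[0.0, 1.0, 2.0])
print(f"final backlog: {backlog[-1]:.2f}, peak backlog: {max(backlog):.2f}")
```

Larger V favors lower cost at the price of larger average backlog; keeping that backlog (and hence latency) bounded while maximizing time-average performance is the trade-off the paper tunes for its super-resolution control.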

The paper entitled “Real-Time Video Summarizing using Image Semantic Segmentation for CBVR” presents a real-time video summarization method using image semantic segmentation for content-based video retrieval (CBVR) [ 4 ]. The method summarizes videos frame-wise using stacked generalization, an ensemble of different machine learning algorithms, and ranks videos by how long a particular building or monument appears in them. Videos are retrieved using a KD tree. The method can be applied in security surveillance: a summary built from the prominent objects in a scene is used to query the video and extract the required frames, with labeling done by machine learning and image-matching algorithms.
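Retrieval with a KD tree means organizing per-video feature vectors so that the closest match to a query descriptor is found without scanning every video. Below is a self-contained sketch; the two-dimensional descriptors are fabricated for illustration, and a real system would index higher-dimensional summaries.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively build a KD tree over 2-D points (splitting axis alternates)."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
        "axis": axis,
    }

def nearest(node, query, best=None):
    """Return the stored point closest to `query` (Euclidean distance)."""
    if node is None:
        return best
    point, axis = node["point"], node["axis"]
    if best is None or math.dist(query, point) < math.dist(query, best):
        best = point
    near, far = ((node["left"], node["right"]) if query[axis] < point[axis]
                 else (node["right"], node["left"]))
    best = nearest(near, query, best)
    # Only descend the far side if the splitting plane is closer than `best`.
    if abs(query[axis] - point[axis]) < math.dist(query, best):
        best = nearest(far, query, best)
    return best

# Hypothetical per-video descriptors, e.g. (monument screen time, object count).
descriptors = [(2.0, 3.0), (5.0, 4.0), (9.0, 6.0), (4.0, 7.0), (8.0, 1.0)]
tree = build_kdtree(descriptors)
print(nearest(tree, (5.1, 4.2)))  # → (5.0, 4.0)
```

The plane-distance pruning is what gives KD-tree retrieval its speed advantage over a linear scan on low-dimensional descriptors.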

The paper entitled “A real-time classification model based on joint sparse-collaborative representation” presents a classification model based on joint sparse-collaborative representation [ 5 ]. It proposes a two-phase test-sample representation method that improves the first phase of the traditional two-phase approach. Because the second phase suffers from an imbalance in the training samples, the authors also include the unselected training samples in the modeling. Applied to numerous face databases, the method shows good recognition accuracy.
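A generic collaborative-representation classifier (not the authors' two-phase model) codes a probe over all training samples with a ridge penalty and assigns the class whose samples best reconstruct it. The toy data below are fabricated.

```python
import numpy as np

def crc_classify(train: np.ndarray, labels: np.ndarray, y: np.ndarray,
                 lam: float = 0.01) -> int:
    """Collaborative representation: code y over ALL training samples with a
    ridge penalty, then assign the class whose samples best reconstruct y."""
    A = train  # columns are training samples
    coef = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)
    best_cls, best_res = -1, np.inf
    for cls in np.unique(labels):
        mask = labels == cls
        residual = np.linalg.norm(y - A[:, mask] @ coef[mask])
        if residual < best_res:
            best_cls, best_res = cls, residual
    return int(best_cls)

rng = np.random.default_rng(42)
# Two toy classes living near different directions in R^20.
c0 = rng.normal(0, 0.1, (20, 5)) + np.linspace(0, 1, 20)[:, None]
c1 = rng.normal(0, 0.1, (20, 5)) + np.linspace(1, 0, 20)[:, None]
train = np.hstack([c0, c1])
labels = np.array([0] * 5 + [1] * 5)
probe = np.linspace(0, 1, 20) + rng.normal(0, 0.1, 20)  # resembles class 0
print(crc_classify(train, labels, probe))  # → 0
```

The paper's contribution lies in how the representation is split into two phases and rebalanced; the residual-based decision rule shown here is the common core of this family of classifiers.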

The paper entitled “Recognizing Human Violent Action Using Drone Surveillance within Real-Time Proximity” presents a method for recognizing violent human actions in drone surveillance footage [ 6 ]. The authors present machine-driven recognition and classification of human actions from drone videos, along with a database created in an unconstrained environment using drones. Key points are extracted and 2D skeletons are generated for the persons in each frame; the extracted key points are then fed as features to the classification module, which uses SVM and Random Forest classifiers to recognize violent actions.
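The final stage of such a pipeline (keypoint features in, class label out) can be mimicked with a minimal linear SVM trained by sub-gradient descent on the hinge loss. The two "pose features" and both clusters below are fabricated stand-ins for real skeleton descriptors, and this sketch is not the authors' trained model.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Sub-gradient descent on the regularized hinge loss (Pegasos-style,
    but with a fixed learning rate). X: (n, d) features; y: labels in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) < 1:   # margin violated: hinge gradient step
                w = (1 - lr * lam) * w + lr * yi * xi
                b += lr * yi
            else:                       # only the L2 penalty contributes
                w = (1 - lr * lam) * w
    return w, b

rng = np.random.default_rng(7)
# Fabricated "pose features", e.g. mean joint speed and limb-angle variance.
calm = rng.normal([0.2, 0.3], 0.05, (20, 2))      # non-violent clips
violent = rng.normal([0.8, 0.9], 0.05, (20, 2))   # violent clips
X = np.vstack([calm, violent])
y = np.array([-1] * 20 + [1] * 20)

w, b = train_linear_svm(X, y)
predict = lambda x: 1 if x @ w + b > 0 else -1
preds = np.array([predict(xi) for xi in X])
print("training accuracy:", (preds == y).mean())
```

A Random Forest, the authors' other classifier, would replace the single hyperplane with an ensemble of decision trees but consume the same keypoint feature vectors.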

2 Conclusion

The editors believe that the papers selected for this special issue will enhance the body of knowledge in the field of security using real-time imaging. We would like to thank the authors for contributing their works to this special issue. The editors would like to acknowledge and thank the reviewers for their insightful comments. These comments have been a guiding force in improving the quality of the papers. The editors would also like to thank the editorial staff for their support and help. We are especially thankful to the Journal of Real-Time Image Processing Chief Editors, Nasser Kehtarnavaz and Matthias F. Carlsohn, who provided us the opportunity to offer this special issue.

References

1. Khaire, P.A., Kumar, P.: RGB + D and deep learning-based real-time detection of suspicious event in Bank-ATMs. J Real-Time Image Proc 23, 1–3 (2021)

2. Ahmed, I., Jeon, G.: A real-time person tracking system based on SiamMask network for intelligent video surveillance. J Real-Time Image Proc 28, 1–2 (2021)

3. Jung, S., Kim, J.: Adaptive and stabilized real-time super-resolution control for UAV-assisted smart harbor surveillance platforms. J Real-Time Image Proc 17, 1–1 (2021)

4. Jain, R., Jain, P., Kumar, T., Dhiman, G.: Real-time video summarizing using image semantic segmentation for CBVR. J Real-Time Image Proc (2021)

5. Li, Y., Jin, J., Chen, C.L.P.: A real-time classification model based on joint sparse-collaborative representation. J Real-Time Image Proc (2021)

6. Srivastava, A., Badal, T., Garg, A., Vidyarthi, A., Singh, R.: Recognizing human violent action using drone surveillance within real-time proximity. J Real-Time Image Proc (2021)


Author information

Authors and Affiliations

Computer Science Engineering Department, Bennett University, Greater Noida, India

Akansha Singh

Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong

Ping Li

Faculty of Engineering and Technology, Jain (Deemed-To-Be University), Bengaluru, India

Krishna Kant Singh

Department of Computer Science, University of South Dakota, Vermillion, USA

Vijayalakshmi Saravana


Corresponding author

Correspondence to Akansha Singh .

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Singh, A., Li, P., Singh, K.K. et al. Real-time intelligent image processing for security applications. J Real-Time Image Proc 18, 1787–1788 (2021). https://doi.org/10.1007/s11554-021-01169-w


Published : 05 September 2021

Issue Date : October 2021

DOI : https://doi.org/10.1007/s11554-021-01169-w




New infrastructure for the era of AI: Emerging technology and trends in 2024

By Omar Khan General Manager, Azure Product Marketing

Posted on April 1, 2024 4 min read


This is part of a larger series on the new infrastructure of the era of AI, highlighting emerging technology and trends in large-scale compute. This month, we’re sharing the 2024 edition of the State of AI Infrastructure report to help businesses harness the power of AI now. 

The era of AI is upon us. You’ve heard about the latest advancements in our technology, the new AI solutions powered by Microsoft, our partners, and our customers, and the excitement is just beginning. To continue the pace of these innovations, companies need the best hardware that matches the workloads they are trying to run. This is what we call purpose-built infrastructure for AI—it’s infrastructure that is customized to meet your business needs. Now, let’s explore how Microsoft cloud infrastructure has evolved to support these emerging technologies.


The State of AI Infrastructure

An annual report on trends and developments in AI infrastructure based on Microsoft commissioned surveys conducted by Forrester Consulting and Ipsos

Looking back at Microsoft’s biggest investments in AI infrastructure

2023 brought huge advancements in AI infrastructure. From new virtual machines to updated services, we’ve paved the way for AI advancements that include custom-built silicon and powerful supercomputers.

Some of the highlights of Microsoft AI infrastructure innovations in 2023 include:

  • Launching new Azure Virtual Machines powered by AMD Instinct and NVIDIA Hopper graphics processing units (GPUs), optimized for different AI and high-performance computing (HPC) workloads, such as large language models, mid-range AI training, and generative AI inferencing.
  • Introducing Azure confidential VMs with NVIDIA H100 GPUs—enabling secure and private AI applications on the cloud.
  • Developing custom-built silicon for AI and enterprise workloads, such as Azure Maia AI accelerator series, an AI accelerator chip, and Azure Cobalt CPU series, a cloud-native chip based on Arm architecture.
  • Building the third most powerful supercomputer in the world, Azure Eagle, with 14,400 NVIDIA H100 GPUs and Intel Xeon Sapphire Rapids processors and achieving the second best MLPerf Training v3.1 record submission using 10,752 H100 GPUs.

Understanding the state of AI and demand for new infrastructure

2024 is shaping up to be an even more promising year for AI than its predecessor. With the rapid pace of technological advancements, AI infrastructure is becoming more diverse and widespread than ever before. From cloud to edge, CPUs to GPUs, and application-specific integrated circuits (ASICs), the AI hardware and software landscape is expanding at an impressive rate.

To help you keep up with the current state of AI, its trends and challenges, and to learn about best practices for building and deploying scalable and efficient AI systems, we’ve recently published our Microsoft Azure: The State of AI Infrastructure report. The report addresses the following key themes:

  • Using AI for organizational and personal advancement. AI is revolutionizing the way businesses operate, with an overwhelming 95% of organizations planning to expand their usage in the next two years. Recent research commissioned by Microsoft highlights the role of AI in driving innovation and competition. Beyond mandates, individuals within these organizations recognize the value AI brings to their roles and the success of their companies. IT professionals are at the forefront of AI adoption and use, with 68% of those surveyed already implementing it in their professional work. And it doesn’t stop there: AI is also being used in their personal lives, with 66% of those surveyed incorporating it into their daily routines. AI’s transformative potential spans industries, from improving diagnostic accuracy in healthcare to optimizing customer service through intelligent chatbots. As AI shapes the future of work, it’s essential for organizations to embrace its adoption to stay competitive in an ever-evolving business landscape.
  • Navigating from AI exploration to implementation. The implementation of AI in businesses is still in its early stages, with one-third of companies exploring and planning their approach. However, a significant segment has progressed to pilot testing, experimenting with AI’s capabilities in real-world scenarios and taking the next critical step towards full-scale implementation. This phase is crucial, as it allows businesses to gauge the effectiveness of AI, tailor it to their specific needs, and identify any potential issues before a wider rollout. Because of this disparity in adoption, organizations have a unique opportunity to differentiate themselves and gain a competitive advantage by accelerating their AI initiatives. However, many organizations will need to make significant tech and infrastructure changes before they can fully leverage AI’s benefits. Those who can quickly navigate from exploration to implementation will establish themselves as leaders in leveraging AI for innovation, efficiency, and enhanced decision-making.
  • Acknowledging the challenges of building and maintaining AI infrastructure. To fully leverage AI’s potential, companies need to ensure they have a solid foundation to support their AI strategies and drive innovation. As in the transportation industry, solid infrastructure to manage everyday congestion is crucial. However, AI infrastructure skilling remains the largest challenge, both within companies and in the job market. This challenge is multifaceted, encompassing issues such as the complexity of orchestrating AI workloads, a shortage of skilled personnel to manage AI systems, and the rapid pace at which AI technology evolves. These hurdles can impede an organization’s ability to fully leverage AI’s potential, leading to inefficiencies and missed opportunities.
  • Leveraging partners to accelerate AI innovation. Strategic partnerships play a pivotal role in the AI journey of organizations. As companies delve deeper into AI, they often seek out solution providers with deep AI expertise and a track record of proven AI solutions. These partnerships are instrumental in accelerating AI production and addressing the complex challenges of AI infrastructure. Partners are expected to assist with a range of needs, including infrastructure design, training, security, compliance, and strategic planning. As businesses progress in their AI implementation, their priorities shift towards performance, optimization, and cloud provider integration. Engaging the right partner can significantly expedite the AI journey for businesses of any size and at any stage of AI implementation. This presents a substantial opportunity for partners to contribute, but it also places a responsibility on them to ensure their staff is adequately prepared to provide consulting, strategy, and training services.

Discover more

To drive major AI innovation, companies must overcome many challenges at a breakneck pace. Our insights in The State of AI Infrastructure report underscore the need for a strategic approach to building and maintaining AI infrastructure that is agile, scalable, and capable of adapting to the latest technological advancements. By addressing these infrastructure challenges, companies can ensure they have a solid foundation to support their AI strategies and drive innovation.

  • Annual Roundup of AI Infrastructure Breakthroughs for 2023



