
Specialty Grand Challenge Article: Grand Challenges in Image Processing


  • Université Paris-Saclay, CNRS, CentraleSupélec, Laboratoire des Signaux et Systèmes, Gif-sur-Yvette, France

Introduction

The field of image processing has been the subject of intensive research and development for several decades. This broad area encompasses topics such as image/video processing, image/video analysis, image/video communications, image/video sensing, modeling and representation, computational imaging, electronic imaging, information forensics and security, 3D imaging, medical imaging, and machine learning applied to these respective topics. Hereafter, we will consider both image and video content (i.e., sequences of images), and more generally all forms of visual information.

Rapid technological advances, especially in terms of computing power and network transmission bandwidth, have resulted in many remarkable and successful applications. Nowadays, images are ubiquitous in our daily life. Entertainment is one class of applications that has greatly benefited, including digital TV (e.g., broadcast, cable, and satellite TV), Internet video streaming, digital cinema, and video games. Beyond entertainment, imaging technologies are central in many other applications, including digital photography, video conferencing, video monitoring and surveillance, and satellite imaging, as well as in more distant domains such as healthcare and medicine, distance learning, digital archiving, cultural heritage, and the automotive industry.

In this paper, we highlight a few research grand challenges for future imaging and video systems, in order to achieve breakthroughs to meet the growing expectations of end users. Given the vastness of the field, this list is by no means exhaustive.

A Brief Historical Perspective

We first briefly discuss a few key milestones in the field of image processing. Key inventions in the development of photography and motion pictures can be traced to the 19th century. The earliest surviving photograph of a real-world scene was made by Nicéphore Niépce in 1827 ( Hirsch, 1999 ). The Lumière brothers made the first cinematographic film in 1895, with a public screening the same year ( Lumiere, 1996 ). After decades of remarkable developments, the second half of the 20th century saw the emergence of new technologies launching the digital revolution. While the first prototype digital camera using a Charge-Coupled Device (CCD) was demonstrated in 1975, the first commercial consumer digital cameras only started appearing in the early 1990s. These digital cameras quickly surpassed film cameras, and the digital revolution in the field of imaging was underway. As a key consequence, the digital process enabled computational imaging, that is, the use of sophisticated processing algorithms to produce high-quality images.

In 1992, the Joint Photographic Experts Group (JPEG) released the JPEG standard for still image coding ( Wallace, 1992 ). In parallel, in 1993, the Moving Picture Experts Group (MPEG) published its first standard for coding of moving pictures and associated audio, MPEG-1 ( Le Gall, 1991 ), and a few years later MPEG-2 ( Haskell et al., 1996 ). By guaranteeing interoperability, these standards have been essential in many successful applications and services, for both the consumer and business markets. In particular, it is remarkable that, almost 30 years later, JPEG remains the dominant format for still images and photographs.

In the late 2000s and early 2010s, we could observe a paradigm shift with the appearance of smartphones integrating a camera. Thanks to advances in computational photography, these new smartphones soon became capable of rivaling the quality of consumer digital cameras of the time. Moreover, these smartphones were also capable of acquiring video sequences. Almost concurrently, another key evolution was the development of high-bandwidth networks. In particular, the launch of 4G wireless services circa 2010 enabled users to quickly and efficiently exchange multimedia content. Since then, most of us carry a camera anywhere and anytime, allowing us to capture images and videos at will and to seamlessly exchange them with our contacts.

As a direct consequence of the above developments, we are currently observing a boom in the usage of multimedia content. It is estimated that today 3.2 billion images are shared each day on social media platforms, and 300 hours of video are uploaded every minute on YouTube 1 . In a 2019 report, Cisco estimated that video content represented 75% of all Internet traffic in 2017, a share forecast to grow to 82% by 2022 ( Cisco, 2019 ). While Internet video streaming and Over-The-Top (OTT) media services account for a significant share of this traffic, other applications are also expected to see significant increases, including video surveillance and Virtual Reality (VR)/Augmented Reality (AR).

Hyper-Realistic and Immersive Imaging

A major direction and key driver of research and development activities over the years has been the objective of delivering ever-improving image quality and user experience.

For instance, in the realm of video, spatial and temporal resolutions have constantly increased, with the emergence nowadays of Ultra High Definition (UHD). Another aim has been to provide a sense of depth in the scene. For this purpose, various 3D video representations have been explored, including stereoscopic 3D and multi-view ( Dufaux et al., 2013 ).

In this context, the ultimate goal is to faithfully represent the physical world and to deliver an immersive and perceptually hyper-realistic experience. For this purpose, we discuss hereafter some emerging innovations. These developments are also very relevant in VR and AR applications ( Slater, 2014 ). Finally, while this paper focuses only on the visual information processing aspects, it is obvious that emerging display technologies ( Masia et al., 2013 ) and audio also play key roles in many application scenarios.

Light Fields, Point Clouds, Volumetric Imaging

In order to wholly represent a scene, the light information coming from all directions has to be captured. For this purpose, the 7D plenoptic function is a key concept ( Adelson and Bergen, 1991 ), although it is unmanageable in practice.

By introducing additional constraints, the light field representation collects the radiance along rays in all directions. It therefore contains much richer information than traditional 2D imaging, which captures a 2D projection of the light in the scene by integrating over the angular domain. For instance, this allows post-capture processing such as refocusing and changing the viewpoint. However, it also entails several technical challenges, in terms of acquisition and calibration, as well as computational image processing steps including depth estimation, super-resolution, compression and image synthesis ( Ihrke et al., 2016 ; Wu et al., 2017 ). The trade-off between spatial and angular resolution is a fundamental issue. While a significant fraction of earlier work has focused on static light fields, dynamic light field video is expected to attract more interest in the future. In particular, dense multi-camera arrays are becoming more tractable. Finally, the development of efficient light field compression and streaming techniques is a key enabler in many applications ( Conti et al., 2020 ).
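The dimensionality reduction underlying this representation can be made explicit. The full plenoptic function records radiance for every position, direction, wavelength, and time; assuming a static scene, a single spectral band, and radiance that is constant along rays in free space yields the familiar 4D two-plane light field (standard notation, shown here for illustration):

```latex
% 7D plenoptic function (Adelson and Bergen, 1991):
% radiance as a function of viewing position (x, y, z),
% viewing direction (\theta, \phi), wavelength \lambda, and time t
P = P(x, y, z, \theta, \phi, \lambda, t)

% Under the constraints above, a ray is fully determined by its
% intersections (u, v) and (s, t) with two parallel reference planes,
% giving the 4D light field used in practice:
L = L(u, v, s, t)
```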

Another promising direction is to consider a point cloud representation. A point cloud is a set of points in 3D space represented by their spatial coordinates and additional attributes, including color, normals, or reflectance. Point clouds are often very large, easily ranging in the millions of points, and are typically sparse. One major distinguishing feature of point clouds is that, unlike images, they do not have a regular structure, calling for new algorithms. To remove the noise often present in acquired data, while preserving the intrinsic characteristics, effective 3D point cloud filtering approaches are needed ( Han et al., 2017 ). It is also important to develop efficient techniques for Point Cloud Compression (PCC). For this purpose, MPEG is developing two standards: Geometry-based PCC (G-PCC) and Video-based PCC (V-PCC) ( Graziosi et al., 2020 ). G-PCC considers the point cloud in its native form and compresses it using 3D data structures such as octrees, whereas V-PCC projects the point cloud onto 2D planes and then applies existing video coding schemes. More recently, deep learning-based approaches for PCC have been shown to be effective ( Guarda et al., 2020 ). Another challenge is to develop generic and robust solutions able to handle the potentially widely varying characteristics of point clouds, e.g., in terms of size and non-uniform density. Efficient solutions for dynamic point clouds are also needed. Finally, while many techniques focus on the geometric information or the attributes independently, it is paramount to process them jointly.
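To make the octree idea behind G-PCC concrete, the sketch below (illustrative code, not the actual G-PCC algorithm; function names are our own) recursively subdivides the bounding cube and emits one occupancy byte per occupied node, which an entropy coder would then compress:

```python
import numpy as np

def octree_occupancy(points, origin, size, depth):
    """Encode point positions as a stream of octree occupancy bytes.

    points : (N, 3) float array inside the cube [origin, origin + size)^3.
    Returns one byte per occupied internal node (depth-first), a simplified
    view of how geometry-based point cloud coding regularizes an
    irregular point set.
    """
    if depth == 0 or len(points) == 0:
        return []
    half = size / 2.0
    # Classify each point into one of the 8 child octants (3-bit index).
    idx = ((points - origin) >= half).astype(int)
    child = idx[:, 0] * 4 + idx[:, 1] * 2 + idx[:, 2]
    occupancy = 0
    children = []
    for c in range(8):
        mask = child == c
        if mask.any():
            occupancy |= 1 << c        # mark octant c as occupied
            child_origin = origin + half * np.array(
                [(c >> 2) & 1, (c >> 1) & 1, c & 1])
            children.append((points[mask], child_origin))
    stream = [occupancy]
    for pts, child_origin in children:
        stream += octree_occupancy(pts, child_origin, half, depth - 1)
    return stream

# Toy cloud: two well-separated points in a unit cube.
pts = np.array([[0.1, 0.1, 0.1], [0.9, 0.9, 0.9]])
codes = octree_occupancy(pts, np.array([0.0, 0.0, 0.0]), 1.0, depth=3)
```

Real codecs add entropy coding of the occupancy symbols, attribute coding, and many optimizations; the sketch only illustrates how the irregular point set is mapped to a compact, regular structure.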

High Dynamic Range and Wide Color Gamut

The human visual system is able to perceive, using various adaptation mechanisms, a broad range of luminous intensities, from very bright to very dark, as experienced every day in the real world. Nonetheless, current imaging technologies are still limited in terms of capturing or rendering such a wide range of conditions. High Dynamic Range (HDR) imaging aims at addressing this issue. Wide Color Gamut (WCG) is also often associated with HDR in order to provide a wider colorimetry.

HDR has reached some level of maturity in the context of photography. However, extending HDR to video sequences raises scientific challenges in providing high-quality and cost-effective solutions, impacting the whole image processing pipeline, including content acquisition, tone reproduction, color management, coding, and display ( Dufaux et al., 2016 ; Chalmers and Debattista, 2017 ). Backward compatibility with legacy content and traditional systems is another issue. Despite recent progress, the potential of HDR has not yet been fully exploited.
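As a minimal illustration of the tone reproduction step, the sketch below applies a global Reinhard-style operator mapping scene-referred HDR luminance into a displayable range (a simplified textbook version; the `key` value is an assumption):

```python
import numpy as np

def reinhard_tone_map(luminance, key=0.18, eps=1e-6):
    """Global Reinhard-style tone mapping: compress HDR luminance to [0, 1).

    luminance : 2D array of scene-referred (linear) luminance values.
    key       : target "middle gray" of the tone-mapped image.
    """
    # Scale the image so its log-average luminance maps to the chosen key.
    log_avg = np.exp(np.mean(np.log(luminance + eps)))
    scaled = key * luminance / log_avg
    # Simple sigmoidal compression: large values saturate toward 1.
    return scaled / (1.0 + scaled)

# Synthetic HDR frame spanning five orders of magnitude of luminance.
hdr = np.logspace(-2, 3, 64).reshape(8, 8)
ldr = reinhard_tone_map(hdr)
```

The operator is monotonic, so relative brightness ordering is preserved while the dynamic range is compressed into the displayable interval.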

Coding and Transmission

Three decades of standardization activities have continuously improved the hybrid video coding scheme based on the principles of transform coding and predictive coding. The Versatile Video Coding (VVC) standard was finalized in 2020 ( Bross et al., 2021 ), achieving approximately 50% bit rate reduction for the same subjective quality when compared to its predecessor, High Efficiency Video Coding (HEVC). While substantially outperforming VVC in the short term may be difficult, one encouraging direction is to rely on improved perceptual models to further optimize compression in terms of visual quality. Another direction, which has already shown promising results, is to apply deep learning-based approaches ( Ding et al., 2021 ). Here, one key issue is the ability to generalize these deep models to a wide diversity of video content. A second key issue is implementation complexity, in terms of both computation and memory requirements, which is a significant obstacle to widespread deployment. Besides, the emergence of new video formats targeting immersive communications also calls for new coding schemes ( Wien et al., 2019 ).
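The transform-plus-quantization core of the hybrid coding scheme mentioned above can be sketched in a few lines (an illustrative toy, far removed from an actual VVC implementation): an 8x8 block is transformed with an orthonormal DCT, uniformly quantized, and reconstructed, with the quantization step controlling the rate/distortion trade-off:

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix, as applied to blocks in hybrid codecs."""
    k = np.arange(n)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)   # DC row has a different normalization
    return c

def code_block(block, q_step):
    """Transform an 8x8 block, uniformly quantize, then reconstruct."""
    c = dct_matrix(8)
    coeffs = c @ block @ c.T                  # forward 2D DCT
    quantized = np.round(coeffs / q_step)     # uniform quantization (lossy step)
    return c.T @ (quantized * q_step) @ c     # dequantize + inverse DCT

rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(8, 8)).astype(float)
rec = code_block(block, q_step=16.0)
```

A larger `q_step` yields coarser coefficients (fewer bits after entropy coding) at the cost of higher reconstruction error; real codecs add prediction, entropy coding, and in-loop filtering around this core.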

Considering that in many application scenarios videos are processed by intelligent analytic algorithms rather than viewed by users, another interesting track is the development of video coding for machines ( Duan et al., 2020 ). In this context, compression is optimized taking into account the performance of the targeted video analysis tasks.

The push toward hyper-realistic and immersive visual communications most often entails an increasing raw data rate. Despite improved compression schemes, more transmission bandwidth is needed. Moreover, some emerging applications, such as VR/AR, autonomous driving, and Industry 4.0, bring strong requirements for low-latency transmission, with implications for both the image processing pipeline and the transmission channel. In this context, the emergence of 5G wireless networks will positively contribute to the deployment of new multimedia applications, and the development of future wireless communication technologies points toward promising advances ( Da Costa and Yang, 2020 ).

Human Perception and Visual Quality Assessment

It is important to develop effective models of human perception. On the one hand, such models can contribute to the development of perceptually inspired algorithms. On the other hand, perceptual quality assessment methods are needed in order to optimize and validate new imaging solutions.

The notion of Quality of Experience (QoE) relates to the degree of delight or annoyance of the user of an application or service ( Le Callet et al., 2012 ). QoE is strongly linked to subjective and objective quality assessment methods. Many years of research have resulted in the successful development of perceptual visual quality metrics based on models of human perception ( Lin and Kuo, 2011 ; Bovik, 2013 ). More recently, deep learning-based approaches have also been successfully applied to this problem ( Bosse et al., 2017 ). While these perceptual quality metrics have achieved good performance, several significant challenges remain. First, when applied to video sequences, most current perceptual metrics operate on individual frames, neglecting temporal modeling. Second, whereas color is a key attribute, there are currently no widely accepted perceptual quality metrics explicitly considering color. Finally, new modalities, such as 360° videos, light fields, point clouds, and HDR, require new approaches.
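For contrast, the classic fidelity baseline that such perceptual metrics aim to improve upon is the Peak Signal-to-Noise Ratio (PSNR), which reduces quality to a pixel-wise mean squared error and correlates only loosely with perceived quality:

```python
import numpy as np

def psnr(reference, distorted, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two images.

    A purely signal-based fidelity measure: it ignores spatial structure,
    masking, and other perceptual effects that QoE-oriented metrics model.
    """
    mse = np.mean((reference.astype(float) - distorted.astype(float)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)

# Synthetic 8-bit gradient image and a version with a small uniform offset.
ref = np.tile(np.arange(256, dtype=np.uint8), (16, 1))
noisy = np.clip(ref.astype(float) + 4.0, 0, 255)
```

A uniform +4 offset is barely visible yet yields a finite PSNR around 36 dB, hinting at why perceptually grounded metrics are needed.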

Another closely related topic is image esthetic assessment ( Deng et al., 2017 ). The esthetic quality of an image is affected by numerous factors, such as lighting, color, contrast, and composition. It is useful in different application scenarios such as image retrieval and ranking, recommendation, and photo enhancement. While earlier attempts used handcrafted features, most recent techniques to predict esthetic quality are data-driven and based on deep learning approaches, leveraging the availability of large annotated datasets for training ( Murray et al., 2012 ). One key challenge is the inherently subjective nature of esthetic assessment, resulting in ambiguity in the ground-truth labels. Another important issue is to explain the behavior of deep esthetic prediction models.

Analysis, Interpretation and Understanding

Another major research direction has been the objective to efficiently analyze, interpret and understand visual data. This goal is challenging, due to the high diversity and complexity of visual data. This has led to many research activities, involving both low-level and high-level analysis, addressing topics such as image classification and segmentation, optical flow, image indexing and retrieval, object detection and tracking, and scene interpretation and understanding. Hereafter, we discuss some trends and challenges.

Keypoints Detection and Local Descriptors

Local image matching has been the cornerstone of many analysis tasks. It involves the detection of keypoints, i.e., salient visual points that can be robustly and repeatedly detected, and descriptors, i.e., compact signatures locally describing the visual features at each keypoint. Pairwise matching between these features then reveals local correspondences. In this context, several frameworks have been proposed, including Scale Invariant Feature Transform (SIFT) ( Lowe, 2004 ) and Speeded Up Robust Features (SURF) ( Bay et al., 2008 ), and later binary variants including Binary Robust Independent Elementary Features (BRIEF) ( Calonder et al., 2010 ), Oriented FAST and Rotated BRIEF (ORB) ( Rublee et al., 2011 ) and Binary Robust Invariant Scalable Keypoints (BRISK) ( Leutenegger et al., 2011 ). Although these approaches exhibit scale and rotation invariance, they are less suited to dealing with large 3D distortions such as perspective deformations, out-of-plane rotations, and significant viewpoint changes. Besides, they tend to fail under significantly varying and challenging illumination conditions.
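For the binary variants above (BRIEF/ORB/BRISK-style), pairwise matching reduces to Hamming distances plus a ratio test to discard ambiguous correspondences. A minimal sketch with synthetic descriptors (all names and sizes are illustrative; real pipelines extract descriptors from detected keypoints):

```python
import numpy as np

def match_binary_descriptors(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbor matching of binary descriptors (BRIEF/ORB style).

    desc_a, desc_b : (N, 32) uint8 arrays (256-bit descriptors packed in bytes).
    Returns (i, j) index pairs passing a Lowe-style ratio test
    on the Hamming distance.
    """
    # Hamming distance = popcount of XOR, via an 8-bit lookup table.
    popcount = np.array([bin(v).count("1") for v in range(256)], dtype=np.int32)
    matches = []
    for i, d in enumerate(desc_a):
        dists = popcount[np.bitwise_xor(desc_b, d)].sum(axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        # Ratio test: keep only matches clearly better than the runner-up.
        if dists[best] < ratio * dists[second]:
            matches.append((i, best))
    return matches

rng = np.random.default_rng(1)
bank = rng.integers(0, 256, size=(10, 32), dtype=np.uint8)
# Query descriptors: copies of the bank with one bit flipped each.
query = bank.copy()
query[:, 0] ^= 1
pairs = match_binary_descriptors(query, bank)
```

Binary descriptors owe much of their popularity to this matching step: XOR plus popcount is far cheaper than the floating-point distances needed for SIFT or SURF.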

These traditional approaches based on handcrafted features have been successfully applied to problems such as image and video retrieval, object detection, visual Simultaneous Localization And Mapping (SLAM), and visual odometry. Besides, the emergence of new imaging modalities as introduced above can also be beneficial for image analysis tasks, including light fields ( Galdi et al., 2019 ), point clouds ( Guo et al., 2020 ), and HDR ( Rana et al., 2018 ). However, when applied to high-dimensional visual data for semantic analysis and understanding, these approaches based on handcrafted features have been supplanted in recent years by approaches based on deep learning.

Deep Learning-Based Methods

Data-driven deep learning-based approaches ( LeCun et al., 2015 ), and in particular the Convolutional Neural Network (CNN) architecture, represent nowadays the state of the art in performance for complex pattern recognition tasks in scene analysis and understanding. By combining multiple processing layers, deep models are able to learn data representations with different levels of abstraction.

Supervised learning is the most common form of deep learning. It requires a large, fully labeled training dataset; building one is a typically time-consuming and expensive process that must be repeated whenever tackling a new application scenario. Moreover, in some specialized domains, e.g., medical data, it can be very difficult to obtain annotations. To alleviate this major burden, methods such as transfer learning and weakly supervised learning have been proposed.

In another direction, deep models have been shown to be vulnerable to adversarial attacks ( Akhtar and Mian, 2018 ). Such attacks introduce subtle perturbations to the input so that the model predicts an incorrect output. For instance, in the case of images, imperceptible pixel differences are able to fool deep learning models. Adversarial attacks are definitely an important obstacle to the successful deployment of deep learning, especially in applications where safety and security are critical. While some early solutions have been proposed, a significant challenge is to develop effective defense mechanisms against these attacks.
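The flavor of such attacks can be shown on a toy logistic classifier with a Fast Gradient Sign Method (FGSM)-style perturbation, one of the simplest attack recipes: each input entry is nudged by a small eps in the direction that increases the loss (the toy model and all parameter values are our own, purely for illustration):

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps):
    """FGSM-style attack on a logistic classifier p = sigmoid(w.x + b).

    Moves every entry of x by eps in the sign of the loss gradient,
    i.e., the direction that most increases the loss for true label y.
    """
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    grad_x = (p - y) * w            # d(cross-entropy)/dx for this model
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
w = rng.normal(size=100)
b = 0.0
x = 0.1 * np.sign(w)                # input confidently classified as y = 1
score_clean = w @ x + b
x_adv = fgsm_perturb(x, w, b, y=1.0, eps=0.2)
score_adv = w @ x_adv + b
```

A perturbation bounded by 0.2 per entry, tiny relative to the input range, is enough to flip the classifier's decision, because the per-entry nudges all align with the gradient and accumulate across dimensions.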

Finally, another challenge is to enable low-complexity and efficient implementations. This is especially important for mobile and embedded applications. For this purpose, further interactions between signal processing and machine learning can potentially bring additional benefits. For instance, one direction is to compress deep neural networks so that they can be handled more efficiently. Moreover, by combining traditional processing techniques with deep learning models, it is possible to develop low-complexity solutions while preserving high performance.
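As a minimal example of the network compression direction, the sketch below applies symmetric post-training quantization of a weight tensor to 8-bit integers, trading a small reconstruction error for a roughly 4x size reduction versus float32 (illustrative code; practical schemes add per-channel scales, calibration, and quantization-aware training):

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric post-training quantization of a weight tensor to int8.

    Stores 8-bit integers plus a single float scale factor, in place of
    32-bit floats, so the tensor shrinks roughly 4x.
    """
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.05, size=(64, 64)).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
max_err = np.max(np.abs(w - w_hat))
```

The round-trip error is bounded by half the quantization step, which for typical weight distributions translates into a negligible accuracy drop on many tasks.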

Explainability in Deep Learning

While data-driven deep learning models often achieve impressive performance on many visual analysis tasks, their black-box nature often makes it inherently very difficult to understand how they reach a predicted output and how it relates to particular characteristics of the input data. This opacity is a major impediment in many decision-critical application scenarios. Moreover, it is important not only to have confidence in the proposed solution, but also to gain further insights from it. Based on these considerations, some deep learning systems aim at promoting explainability ( Adadi and Berrada, 2018 ; Xie et al., 2020 ). This can be achieved by exhibiting traits related to confidence, trust, safety, and ethics.

However, explainable deep learning is still in its early phase. More developments are needed, in particular to develop a systematic theory of model explanation. Important aspects include the need to understand and quantify risk, to comprehend how the model makes predictions for transparency and trustworthiness, and to quantify the uncertainty in the model prediction. This challenge is key in order to deploy and use deep learning-based solutions in an accountable way, for instance in application domains such as healthcare or autonomous driving.

Self-Supervised Learning

Self-supervised learning refers to methods that learn general visual features from large-scale unlabeled data, without the need for manual annotations. Self-supervised learning is therefore very appealing, as it allows exploiting the vast amount of unlabeled images and videos available. Moreover, it is widely believed to be closer to how humans actually learn. One common approach is to use the data itself to provide the supervision, leveraging its structure. More generally, a pretext task can be defined, e.g., image inpainting, colorizing grayscale images, or predicting future frames in videos, by withholding some parts of the data and training the neural network to predict them ( Jing and Tian, 2020 ). By learning an objective function corresponding to the pretext task, the network is forced to learn relevant visual features in order to solve the problem. Self-supervised learning has also been successfully applied to autonomous vehicle perception. More specifically, the complementarity between analytical and learning methods can be exploited to address various autonomous driving perception tasks, without the prerequisite of an annotated dataset ( Chiaroni et al., 2021 ).
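The "data provides its own supervision" idea can be made concrete with a rotation-prediction pretext task, one common choice in the literature: each unlabeled image is rotated by a multiple of 90 degrees and the network is trained to predict which rotation was applied, so labels come for free (the sketch below only builds such a self-labeled batch; the downstream network is omitted):

```python
import numpy as np

def rotation_pretext_batch(images):
    """Build a self-supervised batch from unlabeled images.

    Each image is rotated by 0/90/180/270 degrees, and the rotation index
    serves as a free label: no manual annotation is needed, yet solving
    the task requires the network to learn meaningful visual features.
    """
    xs, ys = [], []
    for img in images:
        for k in range(4):                  # k quarter-turns
            xs.append(np.rot90(img, k))
            ys.append(k)
    return np.stack(xs), np.array(ys)

# Five unlabeled "images" stand in for a large unannotated corpus.
unlabeled = np.random.default_rng(0).random((5, 32, 32))
x, y = rotation_pretext_batch(unlabeled)
```

The resulting (input, label) pairs can feed a standard supervised training loop; after pre-training, the learned features are transferred to the actual target task.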

While good performances have already been obtained using self-supervised learning, further work is still needed. A few promising directions are outlined hereafter. Combining self-supervised learning with other learning methods is a first interesting path. For instance, semi-supervised learning ( Van Engelen and Hoos, 2020 ) and few-shot learning ( Fei-Fei et al., 2006 ) methods have been proposed for scenarios where limited labeled data is available. The performance of these methods can potentially be boosted by incorporating self-supervised pre-training. The pretext task can also serve to add regularization. Another interesting trend in self-supervised learning is to train neural networks with synthetic data. The challenge here is to bridge the domain gap between the synthetic and real data. Finally, another compelling direction is to exploit data from different modalities. A simple example is to consider both the video and audio signals in a video sequence. In another example in the context of autonomous driving, vehicles are typically equipped with multiple sensors, including cameras, LIght Detection And Ranging (LIDAR), Global Positioning System (GPS), and Inertial Measurement Units (IMU). In such cases, it is easy to acquire large unlabeled multimodal datasets, where the different modalities can be effectively exploited in self-supervised learning methods.

Reproducible Research and Large Public Datasets

The reproducible research initiative is another way to further ensure high-quality research for the benefit of our community ( Vandewalle et al., 2009 ). Reproducibility, referring to the ability of someone else working independently to accurately reproduce the results of an experiment, is a key principle of the scientific method. In the context of image and video processing, it is usually not sufficient to provide a detailed description of the proposed algorithm. Most often, it is essential to also provide access to the code and data. This is even more imperative in the case of deep learning-based models.

In parallel, the availability of large public datasets is also highly desirable in order to support research activities. This is especially critical for new emerging modalities or specific application scenarios, where it is difficult to get access to relevant data. Moreover, with the emergence of deep learning, large datasets, along with labels, are often needed for training, which can be another burden.

Conclusion and Perspectives

The field of image processing is very broad and rich, with many successful applications in both the consumer and business markets. However, many technical challenges remain in order to further push the limits of imaging technologies. Two main trends are, on the one hand, to keep improving the quality and realism of image and video content and, on the other hand, to effectively interpret and understand this vast and complex amount of visual data. This list is certainly not exhaustive, and there are many other interesting problems, e.g., related to computational imaging, information security and forensics, or medical imaging. Key innovations will be found at the crossroads of image processing, optics, psychophysics, communication, computer vision, artificial intelligence, and computer graphics. Multi-disciplinary collaborations involving actors from both academia and industry are therefore critical to drive these breakthroughs.

The “Image Processing” section of Frontiers in Signal Processing aims at giving the research community a forum to exchange, discuss and improve new ideas, with the goal of contributing to the further advancement of the field of image processing and bringing exciting innovations in the foreseeable future.

Author Contributions

The author confirms being the sole contributor of this work and has approved it for publication.

Conflict of Interest

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

1 https://www.brandwatch.com/blog/amazing-social-media-statistics-and-facts/ (accessed on Feb. 23, 2021).

Adadi, A., and Berrada, M. (2018). Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE access 6, 52138–52160. doi:10.1109/access.2018.2870052


Adelson, E. H., and Bergen, J. R. (1991). “The plenoptic function and the elements of early vision,” in Computational models of visual processing . Cambridge, MA: MIT Press , 3–20.


Akhtar, N., and Mian, A. (2018). Threat of adversarial attacks on deep learning in computer vision: a survey. IEEE Access 6, 14410–14430. doi:10.1109/access.2018.2807385

Bay, H., Ess, A., Tuytelaars, T., and Van Gool, L. (2008). Speeded-up robust features (SURF). Computer Vis. image understanding 110 (3), 346–359. doi:10.1016/j.cviu.2007.09.014

Bosse, S., Maniry, D., Müller, K. R., Wiegand, T., and Samek, W. (2017). Deep neural networks for no-reference and full-reference image quality assessment. IEEE Trans. Image Process. 27 (1), 206–219. doi:10.1109/TIP.2017.2760518


Bovik, A. C. (2013). Automatic prediction of perceptual image and video quality. Proc. IEEE 101 (9), 2008–2024. doi:10.1109/JPROC.2013.2257632

Bross, B., Chen, J., Ohm, J. R., Sullivan, G. J., and Wang, Y. K. (2021). Developments in international video coding standardization after AVC, with an overview of Versatile Video Coding (VVC). Proc. IEEE . doi:10.1109/JPROC.2020.3043399

Calonder, M., Lepetit, V., Strecha, C., and Fua, P. (2010). Brief: binary robust independent elementary features. In K. Daniilidis, P. Maragos, and N. Paragios (eds) European conference on computer vision . Berlin, Heidelberg: Springer , 778–792. doi:10.1007/978-3-642-15561-1_56

Chalmers, A., and Debattista, K. (2017). HDR video past, present and future: a perspective. Signal. Processing: Image Commun. 54, 49–55. doi:10.1016/j.image.2017.02.003

Chiaroni, F., Rahal, M.-C., Hueber, N., and Dufaux, F. (2021). Self-supervised learning for autonomous vehicles perception: a conciliation between analytical and learning methods. IEEE Signal. Process. Mag. 38 (1), 31–41. doi:10.1109/msp.2020.2977269

Cisco (2019). Cisco visual networking index: forecast and trends, 2017–2022 (white paper) , Indianapolis, Indiana: Cisco Press .

Conti, C., Soares, L. D., and Nunes, P. (2020). Dense light field coding: a survey. IEEE Access 8, 49244–49284. doi:10.1109/ACCESS.2020.2977767

Da Costa, D. B., and Yang, H.-C. (2020). Grand challenges in wireless communications. Front. Commun. Networks 1 (1), 1–5. doi:10.3389/frcmn.2020.00001

Deng, Y., Loy, C. C., and Tang, X. (2017). Image aesthetic assessment: an experimental survey. IEEE Signal. Process. Mag. 34 (4), 80–106. doi:10.1109/msp.2017.2696576

Ding, D., Ma, Z., Chen, D., Chen, Q., Liu, Z., and Zhu, F. (2021). Advances in video compression system using deep neural network: a review and case studies . Ithaca, NY: Cornell university .

Duan, L., Liu, J., Yang, W., Huang, T., and Gao, W. (2020). Video coding for machines: a paradigm of collaborative compression and intelligent analytics. IEEE Trans. Image Process. 29, 8680–8695. doi:10.1109/tip.2020.3016485

Dufaux, F., Le Callet, P., Mantiuk, R., and Mrak, M. (2016). High dynamic range video - from acquisition, to display and applications . Cambridge, Massachusetts: Academic Press .

Dufaux, F., Pesquet-Popescu, B., and Cagnazzo, M. (2013). Emerging technologies for 3D video: creation, coding, transmission and rendering . Hoboken, NJ: Wiley .

Fei-Fei, L., Fergus, R., and Perona, P. (2006). One-shot learning of object categories. IEEE Trans. Pattern Anal. Mach Intell. 28 (4), 594–611. doi:10.1109/TPAMI.2006.79

Galdi, C., Chiesa, V., Busch, C., Lobato Correia, P., Dugelay, J.-L., and Guillemot, C. (2019). Light fields for face analysis. Sensors 19 (12), 2687. doi:10.3390/s19122687

Graziosi, D., Nakagami, O., Kuma, S., Zaghetto, A., Suzuki, T., and Tabatabai, A. (2020). An overview of ongoing point cloud compression standardization activities: video-based (V-PCC) and geometry-based (G-PCC). APSIPA Trans. Signal Inf. Process. 9, 2020. doi:10.1017/ATSIP.2020.12

Guarda, A., Rodrigues, N., and Pereira, F. (2020). Adaptive deep learning-based point cloud geometry coding. IEEE J. Selected Top. Signal Process. 15, 415-430. doi:10.1109/mmsp48831.2020.9287060

Guo, Y., Wang, H., Hu, Q., Liu, H., Liu, L., and Bennamoun, M. (2020). Deep learning for 3D point clouds: a survey. IEEE transactions on pattern analysis and machine intelligence . doi:10.1109/TPAMI.2020.3005434

Han, X.-F., Jin, J. S., Wang, M.-J., Jiang, W., Gao, L., and Xiao, L. (2017). A review of algorithms for filtering the 3D point cloud. Signal. Processing: Image Commun. 57, 103–112. doi:10.1016/j.image.2017.05.009

Haskell, B. G., Puri, A., and Netravali, A. N. (1996). Digital video: an introduction to MPEG-2 . Berlin, Germany: Springer Science and Business Media .

Hirsch, R. (1999). Seizing the light: a history of photography . New York, NY: McGraw-Hill .

Ihrke, I., Restrepo, J., and Mignard-Debise, L. (2016). Principles of light field imaging: briefly revisiting 25 years of research. IEEE Signal. Process. Mag. 33 (5), 59–69. doi:10.1109/MSP.2016.2582220

Jing, L., and Tian, Y. (2020). “Self-supervised visual feature learning with deep neural networks: a survey,” IEEE transactions on pattern analysis and machine intelligence , Ithaca, NY: Cornell University .

Le Callet, P., Möller, S., and Perkis, A. (2012). Qualinet white paper on definitions of quality of experience. European network on quality of experience in multimedia systems and services (COST Action IC 1003), 3(2012) .

Le Gall, D. (1991). MPEG: a video compression standard for multimedia applications. Commun. ACM 34, 46–58. doi:10.1145/103085.103090

LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature 521 (7553), 436–444. doi:10.1038/nature14539

Leutenegger, S., Chli, M., and Siegwart, R. Y. (2011). “BRISK: binary robust invariant scalable keypoints,” IEEE International conference on computer vision , Barcelona, Spain , 6-13 Nov, 2011 ( IEEE ), 2548–2555.

Lin, W., and Jay Kuo, C.-C. (2011). Perceptual visual quality metrics: a survey. J. Vis. Commun. image representation 22 (4), 297–312. doi:10.1016/j.jvcir.2011.01.005

Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60 (2), 91–110. doi:10.1023/b:visi.0000029664.99615.94

Lumiere, L. (1996). The Lumière cinematograph. J. SMPTE 105 (10), 608–611. doi:10.5594/j17187

Masia, B., Wetzstein, G., Didyk, P., and Gutierrez, D. (2013). A survey on computational displays: pushing the boundaries of optics, computation, and perception. Comput. & Graphics 37 (8), 1012–1038. doi:10.1016/j.cag.2013.10.003

Murray, N., Marchesotti, L., and Perronnin, F. (2012). “AVA: a large-scale database for aesthetic visual analysis,” IEEE conference on computer vision and pattern recognition , Providence, RI , June, 2012 . ( IEEE ), 2408–2415. doi:10.1109/CVPR.2012.6247954

Rana, A., Valenzise, G., and Dufaux, F. (2018). Learning-based tone mapping operator for efficient image matching. IEEE Trans. Multimedia 21 (1), 256–268. doi:10.1109/TMM.2018.2839885

Rublee, E., Rabaud, V., Konolige, K., and Bradski, G. (2011). “ORB: an efficient alternative to SIFT or SURF,” IEEE International conference on computer vision , Barcelona, Spain , November, 2011 ( IEEE ), 2564–2571. doi:10.1109/ICCV.2011.6126544

Slater, M. (2014). Grand challenges in virtual environments. Front. Robotics AI 1, 3. doi:10.3389/frobt.2014.00003

Van Engelen, J. E., and Hoos, H. H. (2020). A survey on semi-supervised learning. Mach Learn. 109 (2), 373–440. doi:10.1007/s10994-019-05855-6

Vandewalle, P., Kovacevic, J., and Vetterli, M. (2009). Reproducible research in signal processing. IEEE Signal. Process. Mag. 26 (3), 37–47. doi:10.1109/msp.2009.932122

Wallace, G. K. (1992). The JPEG still picture compression standard. IEEE Trans. Consum. Electron. 38 (1), xviii–xxxiv. doi:10.1109/30.125072

Wien, M., Boyce, J. M., Stockhammer, T., and Peng, W.-H. (2019). Standardization status of immersive video coding. IEEE J. Emerg. Sel. Top. Circuits Syst. 9 (1), 5–17. doi:10.1109/JETCAS.2019.2898948

Wu, G., Masia, B., Jarabo, A., Zhang, Y., Wang, L., Dai, Q., et al. (2017). Light field image processing: an overview. IEEE J. Sel. Top. Signal. Process. 11 (7), 926–954. doi:10.1109/JSTSP.2017.2747126

Xie, N., Ras, G., van Gerven, M., and Doran, D. (2020). Explainable deep learning: a field guide for the uninitiated. Ithaca, NY: Cornell University.

Keywords: image processing, immersive, image analysis, image understanding, deep learning, video processing

Citation: Dufaux F (2021) Grand Challenges in Image Processing. Front. Sig. Proc. 1:675547. doi: 10.3389/frsip.2021.675547

Received: 03 March 2021; Accepted: 10 March 2021; Published: 12 April 2021.

Copyright © 2021 Dufaux. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Frédéric Dufaux, [email protected]




Developments in Image Processing Using Deep Learning and Reinforcement Learning

Jorge Valente

1 Techframe-Information Systems, SA, 2785-338 São Domingos de Rana, Portugal; [email protected] (J.V.); [email protected] (J.A.)

João António

Carlos Mora

2 Smart Cities Research Center, Polytechnic Institute of Tomar, 2300-313 Tomar, Portugal; [email protected]

Sandra Jardim

Associated Data

The data presented in this study are available upon request from the corresponding author. The data are not publicly available due to company privacy matters; however, all data contained in the dataset mentioned in the manuscript is publicly available.

The growth in the volume of data generated, consumed, and stored, which is estimated to exceed 180 zettabytes in 2025, represents a major challenge both for organizations and for society in general. In addition to being larger, datasets are increasingly complex, bringing new theoretical and computational challenges. Alongside this evolution, data science tools have exploded in popularity over the past two decades due to their myriad of applications when dealing with complex data, their high accuracy, flexible customization, and excellent adaptability. When it comes to images, data analysis presents additional challenges because as the quality of an image increases, which is desirable, so does the volume of data to be processed. Although classic machine learning (ML) techniques are still widely used in different research fields and industries, there has been great interest from the scientific community in the development of new artificial intelligence (AI) techniques. The resurgence of neural networks has boosted remarkable advances in areas such as the understanding and processing of images. In this study, we conducted a comprehensive survey regarding advances in AI design and the optimization solutions proposed to deal with image processing challenges. Despite the good results that have been achieved, there are still many challenges to face in this field of study. In this work, we discuss the main and more recent improvements, applications, and developments when targeting image processing applications, and we propose future research directions in this field of constant and fast evolution.

1. Introduction

Images constitute one of the most important forms of communication in society and carry a large amount of information. The human visual system is usually our first point of contact with media and can naturally extract important, and sometimes subtle, information, enabling tasks ranging from the simple, such as identifying objects, to the complex, such as creating and integrating knowledge. However, this system is limited to the visible range of the electromagnetic spectrum. Computer systems, by contrast, can operate across a much broader range, from gamma rays to radio waves, making it possible to process a wide spectrum of images in a wide and varied field of applications. Moreover, the exponential growth in the volume of images created and stored daily makes their analysis and processing impractical without computational support. Computational image processing therefore plays a fundamental role in extracting the relevant information needed to carry out different tasks in different contexts and application areas.

Image processing originated in 1964 with the processing of images of the lunar surface. In simple terms, image processing can be defined as an area of signal processing dedicated to the development of computational techniques for the analysis, enhancement, compression, restoration, and extraction of information from digital images. With its wide range of applications, image processing has attracted great interest from both the scientific community and industry. This interest, combined with the technological evolution of computer systems and the need for ever better performance, in terms of both accuracy and reliability and of processing speed, has driven a major evolution of image processing techniques, from non-learning-based methods to machine learning techniques.

Having emerged in the mid-twentieth century, machine learning (ML) is a subset of artificial intelligence (AI), a field of computer science that focuses on designing machines and computational solutions capable of executing, ideally automatically, tasks such as natural language understanding, speech understanding, and image recognition [ 1 ]. In providing new ways to design AI models [ 2 ], ML, like other scientific computing applications, commonly uses linear algebra operations on multidimensional arrays, the computational data structures for representing vectors, matrices, and higher-order tensors. ML is a data analysis method that automates the construction of analytical models and computer algorithms, which are used on a large range of data types [ 1 ] and are particularly useful for analyzing data and establishing patterns with which to predict new information [ 3 ]. This suite of techniques has exploded in use and as a topic of research over the past decade, to the point where almost everyone interacts with modern AI models many times every day [ 4 ].
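The linear algebra on multidimensional arrays mentioned above can be made concrete with a small NumPy sketch (the shapes and values are illustrative only):

```python
import numpy as np

# A vector, a matrix, and a rank-3 tensor: the basic data structures of ML.
vector = np.array([1.0, 2.0, 3.0])          # shape (3,)
matrix = np.arange(6.0).reshape(2, 3)       # shape (2, 3)
tensor = np.arange(24.0).reshape(2, 3, 4)   # shape (2, 3, 4)

# A typical linear-algebra building block: an affine map y = Wx + b,
# the core operation inside a single neural-network layer.
W = matrix                                  # weights, shape (2, 3)
b = np.array([0.5, -0.5])                   # bias, shape (2,)
y = W @ vector + b                          # result, shape (2,)
print(y)
```

The same `@` (matrix multiplication) operation generalizes to batches of inputs and to higher-order tensors, which is why such array libraries underpin most ML frameworks.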

AI, in particular ML, has revolutionized many areas of technology. One of the areas where the impact of such techniques is noticeable is image processing. The advancement of algorithms and computational capabilities has driven and enabled the performance of complex tasks in the field of image processing, such as facial recognition, object detection and classification, generation of synthetic images, semantic segmentation, image restoration, and image retrieval. The application of ML techniques in image processing brings a set of benefits that impact different sectors of society. This technology has the potential to optimize processes, improve the accuracy of data analysis, and provide new possibilities in different areas. With ML techniques, it is possible to analyze and interpret images with high precision. The advances that have been made in the use of neural networks have made it possible to identify objects, recognize patterns, and carry out complex analyses on images with a high accuracy rate. Pursuing ever-increasing precision is essential in areas such as medicine, where accurate diagnosis can make a difference in patients’ lives.

By applying ML techniques and models to image processing, it is possible to automate tasks that were previously performed manually. In this context, and as an example, we have quality control processes in production lines, where ML allows for the identification of defects in products quickly and accurately, eliminating the need for human inspection, leading to an increase in process efficiency, as well as to a reduction in errors inherent to the human factor and costs.

The recognized ability of ML models to extract valuable information from images enables advanced analysis in several areas, namely public safety, where facial recognition algorithms can be used to identify individuals, and scientific research, such as the inspection of astronomical images, the classification of tissues or tumor cells, and the detection of patterns in large volumes of data.

With so much new research and proposed approaches being published with high frequency, it is a daunting task to keep up with current trends and new research topics, especially if they occur in a research field one is not familiar with. For this purpose, we propose to explore and review publications discussing the new techniques available and the current challenges and point out some of the possible directions for the future. We believe this research can prove helpful for future researchers and provide a modern vision of this fascinating and vast research subject.

On the other hand, and as far as it was possible to verify from the analysis of works published in recent years, there is a lack of studies that highlight machine learning techniques applied to image processing in different areas. There are several works that focus on reviewing the work that has been developed in a given area, and the one that seems to arouse the most interest is the area of medical imaging [ 5 , 6 , 7 , 8 , 9 ]. Therefore, this paper also contributes to presenting an analysis and discussion of ML techniques in a broad context of application.

This document is divided into sections and subsections. Section 2 —Methodology: describes the research methodology used to carry out this review. Section 3 —Technical Background: presents an overview of the AI models most used in image processing. Section 4 —Image Processing Developments: describes related work and different state-of-the-art approaches used by researchers to solve modern-day challenges. Section 5 —Discussion and Future Directions: presents the main challenges and limitations that remain in this area and points out possible directions for the evolution of the models proposed to date. Finally, Section 6 provides brief concluding remarks with the main conclusions that can be drawn from our study.

2. Methodology

In order to carry out this review, we considered a vast number of scientific publications within the scope of ML, particularly those involving image processing methods that use deep learning (DL) and reinforcement learning (RL) techniques applied to real-world problems.

2.1. Search Process and Sources of Information

To guarantee the reliability of the documents, the information sources were validated: only reputable journals and university repositories were considered. From the selected sources, we attempted to include research from multiple areas and topics to give a general yet detailed picture of how image processing research has developed and can be used. Nevertheless, some areas appear to have developed a greater interest in some of the ML methods previously described. The search process involved querying popular scientific search engines such as Springer, ScienceDirect, and CORE with a selection of keywords closely related to image processing. These search engines were selected because they allowed us to make comparable searches, targeting specific terms and filtering results by research area. To cover a broad range of topics and journals, the only search filter used was one ensuring that the subjects were related to data science and/or artificial intelligence.

As of February 2023, a search using the prompt “image processing AI” returned manuscripts related mostly to “Medicine”, “Computer Science”, and “Engineering”. Across the three different research aggregators, the results remained fairly consistent. A summary of the results obtained can be observed in Figure 1 .

[Figure 1. Main research areas for the tested search inputs for three different academic engines.]

Because more research is available on some topics than on others, those topics are also more prevalent among the cases described below.

2.2. Inclusion and Exclusion Criteria for Article Selection

The searches in the different repositories returned a large number of research works by different authors. Considering the constant advances in this subject and the amount of research produced, we opted to focus mainly on research from the last five years. We analyzed and selected the sources that provided novel and/or interesting applications of ML in image processing, with the objective of presenting a broad yet concise representation of recent trends in ML research.

3. Technical Background

The growing use of the internet in general, and of social networks in particular, has led to a large increase in the number of digital images available; as a privileged means of expressing emotions and sharing information, images enable many diverse applications [ 10 ]. Identifying the interesting parts of a scene is a fundamental step in recognizing and interpreting an image [ 11 ]. To explain how the different techniques are applied to process images and extract their features, as well as the main concepts and technicalities of the different types of AI models, we provide a general technical background review of machine learning and image processing, which will give relevant context for the scope of this review and guide the reader through the covered topics.

3.1. Graphics Processing Units

Many of the advances covered in this paper build on classical ML and scalable general-purpose graphics processing unit (GPU) computing, which have become critical components of AI [ 1 , 10 ], enabling the processing of the massive amounts of data generated each day and lowering the barrier to adoption [ 1 ]. In particular, the usage of GPUs revolutionized the landscape of classical ML and DL models. From the 1990s to the late 2000s, ML research was predominantly focused on support vector machines (SVMs), then considered state-of-the-art [ 1 ]. In the following decade, starting in 2010, GPUs brought new life into the field of DL, jumpstarting a surge of research and development [ 1 ]. State-of-the-art DL algorithms tend to have high computational complexity, requiring many iterations for the parameters to converge to an optimal value [ 12 , 13 ]. Nevertheless, the relevance of DL has only grown over the years, as this technology has gradually become one of the main focuses of ML research [ 14 ].

While research into the use of ML on GPUs predates the recent resurgence of DL, general-purpose GPU computing (GPGPU) became widespread when CUDA was released in 2007 [ 1 ]. Shortly after, convolutional neural networks (CNNs) started to be implemented on top of GPUs, demonstrating dramatic end-to-end speedups, even over highly optimized CPU implementations. CNNs can be used, for example, for image restoration, where they have demonstrated outstanding performance [ 1 ]. Some studies have shown that, compared with traditional neural networks and SVMs, the recognition accuracy of CNNs is notably higher [ 12 ].

Some of these performance gains were accomplished even before the existence of dedicated GPU-accelerated Basic Linear Algebra Subprograms (BLAS) libraries. The release of the first CUDA Toolkit brought new life to general-purpose parallel computing with GPUs, one of the main benefits being the single-instruction, multiple-thread (SIMT) programming paradigm, which offers higher throughput and more parallelism than SIMD. The hardware exposes several blocks of multiprocessors, each with many parallel cores (threads), with access to high-speed memory [ 1 ].
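The SIMT grid/block/thread decomposition described above can be sketched in plain Python (a purely conceptual illustration, not real GPU code; the kernel, block size, and array length are invented for the example, and execution is serialized here where a GPU would run it in parallel):

```python
# Conceptual model of SIMT execution: every (block, thread) pair runs the
# same kernel body on a different data element, indexed by a global id.
def saxpy_kernel(block_idx, thread_idx, block_dim, a, x, y, out):
    i = block_idx * block_dim + thread_idx  # global element index
    if i < len(x):                          # guard against out-of-range threads
        out[i] = a * x[i] + y[i]

def launch(grid_dim, block_dim, kernel, *args):
    # On a GPU these iterations execute in parallel; here we serialize them.
    for b in range(grid_dim):
        for t in range(block_dim):
            kernel(b, t, block_dim, *args)

n = 10
x = list(range(n))
y = [1.0] * n
out = [0.0] * n
grid = (n + 3) // 4                         # enough 4-thread blocks to cover n
launch(grid, 4, saxpy_kernel, 2.0, x, y, out)
print(out)
```

The bounds check inside the kernel mirrors real CUDA practice, since the grid usually launches more threads than there are data elements.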

3.2. Image Processing

For humans, an image is a visual and meaningful arrangement of regions and objects [ 11 ]. Recent advances in image processing methods find application in different contexts of our daily lives, both as citizens and in the professional field, such as compression, enhancement, and noise removal from images [ 10 , 15 ]. In classification tasks, an image may comprise millions of pixels, which makes data processing very difficult [ 2 ]. As a complex and difficult image processing task, segmentation is of high importance and has application in several areas, namely in automatic visual systems, where precision affects not only the segmentation results but also the results of the subsequent tasks that, directly or indirectly, depend on it [ 11 ]. In segmentation, the goal is to divide an image into its constituent parts (or objects), sometimes referred to as regions of interest (ROI), without overlapping [ 16 , 17 ], which can be achieved through different feature descriptors, such as texture, color, and edges, as well as a histogram of oriented gradients (HOG) and a global image descriptor (GIST) [ 11 , 17 ]. While the human vision system segments images naturally, without special effort, automatic segmentation is one of the most complex tasks in image processing and computer vision [ 16 ].
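As a toy instance of the segmentation task discussed above, the following sketch performs global thresholding, one of the simplest non-learning approaches, on a fabricated image (the image, threshold value, and resulting ROI are illustrative assumptions, not taken from the surveyed works):

```python
import numpy as np

# A tiny synthetic grayscale "image": a bright rectangular object on a
# dark background.
image = np.zeros((8, 8))
image[2:6, 3:7] = 200.0

# Global thresholding: the crudest segmentation into object vs background.
threshold = 100.0
mask = image > threshold                    # boolean region of interest (ROI)

# Bounding box of the segmented region.
rows, cols = np.where(mask)
bbox = (rows.min(), rows.max(), cols.min(), cols.max())
print(mask.sum(), bbox)
```

Real segmenters replace the fixed threshold with learned decision rules, but the output contract is the same: a per-pixel labeling from which regions and their descriptors are derived.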

Given its high applicability and importance, object detection has been a subject of high interest in the scientific community. Depending on the objective, it may be necessary to detect objects with a significant size compared to the image where they are located or to detect several objects of different sizes. The results of object detection in images vary depending on their dimensions and are generally better for large objects [ 18 ]. Image processing techniques and algorithms find application in the most diverse areas. In the medical field, image processing has grown in many directions, including computer vision, pattern recognition, image mining, and ML [ 19 ].

To use some ML models on image processing problems, it is often necessary to reduce the number of data entries in order to quickly extract valuable information from the data [ 10 ]. To facilitate this process, the image can be transformed into a reduced set of features, in an operation that selects and measures representative data properties in a reduced form, representing the original data up to a certain degree of precision and mimicking the high-level features of the source [ 2 ]. While deep neural networks (DNNs) are often used for processing images, some traditional ML techniques can be applied to improve the data obtained. For example, in Zeng et al. [ 20 ], a deep convolutional neural network (CNN) was used to extract image features, and principal component analysis (PCA) was applied to reduce the dimensionality of the data.
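The features-then-PCA pattern cited above can be sketched, under assumptions, with plain NumPy: random vectors stand in for the CNN features, and PCA is implemented directly via the singular value decomposition rather than a library call:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for CNN feature vectors: 100 samples, 64-dimensional.
# A real pipeline would extract these from a trained network.
features = rng.normal(size=(100, 64))

def pca_reduce(X, k):
    """Project X onto its top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)                 # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                    # shape (n_samples, k)

reduced = pca_reduce(features, k=8)
print(reduced.shape)
```

Reducing 64 dimensions to 8 keeps the directions of greatest variance, which is exactly the property that makes PCA a common post-processing step for learned features.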

3.3. Machine Learning Overview

ML draws inspiration from a conceptual understanding of how the human brain works, focusing on specific tasks that often involve pattern recognition, including image processing [ 1 ], targeted marketing, guiding business decisions, or finding anomalies in business processes [ 4 ]. Its flexibility has allowed it to be used in many fields owing to its high precision, flexible customization, and excellent adaptability, and it has become increasingly common in fields such as environmental science and engineering, especially in recent years [ 3 ]. By learning from data, such systems acquire the ability to identify and classify patterns, making decisions with minimal human intervention [ 2 ]. Classical techniques are still fairly widespread across different research fields and industries, particularly when working with datasets not appropriate for modern deep learning (DL) methods and architectures [ 1 ]. In fact, some data scientists like to stress that no single ML algorithm fits all data, with proper model selection depending on the problem being solved [ 21 , 22 ]. In diagnosis modeling using the classification paradigm, the learning process is based on observing data as examples; the model is constructed by learning from the data along with its annotated labels [ 2 ].

While ML models are an important part of data handling, other steps need to be taken in preparation, like data acquisition, the selection of the appropriate algorithm, model training, and model validation [ 3 ]. The selection of relevant features is one of the key prerequisites to designing an efficient classifier, which allows for robust and focused learning models [ 23 ].

There are two main classes of methods in ML: supervised and unsupervised learning, with the primary difference being the presence of labels in the datasets.

  • In supervised learning, predictive functions are determined from labeled training datasets, meaning each data object instance must include both the input values and the expected labels or output values [ 21 ]. This class of algorithms tries to identify the relationships between input and output values and to generate a predictive model able to determine the result based only on the corresponding input data [ 3 , 21 ]. Supervised learning methods are suitable for regression and data classification, encompassing algorithms such as linear regression, artificial neural networks (ANNs), decision trees (DTs), support vector machines (SVMs), k-nearest neighbors (KNNs), random forest (RF), and others [ 3 ]. As an example, systems using RF and DT algorithms have had a huge impact on areas such as computational biology and disease prediction, while SVM has also been used to study drug–target interactions and to predict several life-threatening diseases, such as cancer or diabetes [ 23 ].
  • Unsupervised learning is typically used to solve pattern recognition problems based on unlabeled training datasets. Unsupervised learning algorithms are able to group the training data into different categories according to their characteristics [ 21 , 24 ], mainly by means of clustering algorithms [ 24 ]. The number of categories is unknown, and the meaning of each category is unclear; unsupervised learning is therefore usually applied to clustering problems and association mining. One commonly employed algorithm is K-means [ 3 ]. Data processing tools like PCA, which is used for dimensionality reduction, are often necessary prerequisites before attempting to cluster a set of data.
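The contrast between the two classes of methods can be sketched in a few lines of standard-library Python. The toy 2-D points, labels, and parameter choices below are made up for illustration; a real project would typically rely on library implementations such as scikit-learn's KNeighborsClassifier and KMeans.

```python
import math

def knn_predict(train, labels, point, k=3):
    """Supervised: predict a label from the k nearest labeled examples."""
    nearest = sorted(range(len(train)),
                     key=lambda i: math.dist(train[i], point))
    votes = [labels[i] for i in nearest[:k]]
    return max(set(votes), key=votes.count)

def kmeans(points, k=2, iters=10):
    """Unsupervised: group unlabeled points into k clusters."""
    centers = points[:k]                      # naive initialization
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[j].append(p)
        centers = [[sum(x) / len(g) for x in zip(*g)] if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers

train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train, labels, [0.5, 0.5]))   # nearest neighbors vote "a"
print(kmeans(train))   # two centers, near (0.33, 0.33) and (5.33, 5.33)
```

The supervised routine needs the `labels` list; the unsupervised one discovers the two groups from the coordinates alone, which mirrors the distinction drawn above.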

Some studies make reference to semi-supervised learning, in which a combination of unsupervised and supervised learning methods is used. In theory, a mixture of labeled and unlabeled data is used to help reduce the costs of labeling a large amount of data. The advantage is that the existence of some labeled data should make these models perform better than strictly unsupervised learning [ 21 ].

In addition to the previously mentioned classes of methods, reinforcement learning (RL) can be regarded as another class of machine learning (ML) algorithms. This class concerns the ability of a machine to generalize, that is, to correctly answer problems it has not explicitly learned [ 3 ].

The current availability of large amounts of data has revolutionized data processing and statistical modeling techniques but, in turn, has brought new theoretical and computational challenges. Some problems have complex solutions due to scale, high dimensions, or other factors, which might require the application of multiple ML models [ 4 ] and large datasets [ 25 ]. ML has also drawn attention as a tool in resource management to dynamically manage resource scaling. It can provide data-driven methods for future insights and has been regarded as a promising approach for predicting workload quickly and accurately [ 26 ]. As an example, ML applications in biological fields are growing rapidly in several areas, such as genome annotation, protein binding, and recognizing the key factors of cancer disease prediction [ 23 ]. The deployment of ML algorithms on cloud servers has also offered opportunities for more efficient resource management [ 26 ].

Most classical ML techniques were developed to target structured data, meaning data in a tabular form with data objects stored as rows and the features stored as columns. In contrast, DL is specifically useful when working with larger, unstructured datasets, such as text and images [ 1 ]. Additional hindrances may apply in certain situations, as, for example, in some engineering design applications, heterogeneous data sources can lead to sparsity in the training data [ 25 ]. Since modern problems often require libraries that can scale for larger data sizes, a handful of ML algorithms can be parallelized through multiprocessing. Nevertheless, the final scale of these algorithms is still limited by the amount of memory and number of processing cores available on a single machine [ 1 ].

Some of the limitations in using ML algorithms come from the size and quality of the data. Real datasets are a challenge for ML algorithms since the user may face skewed label distributions [ 1 ]. Such class imbalances can lead to strong predictive biases, as models can optimize the training objective by learning to predict the majority label most of the time. The term “ensemble techniques” in ML is used for combinations of multiple ML algorithms or models. These are known and widely used for providing stability, increasing model performance, and controlling the bias-variance trade-off [ 1 ]. Hyperparameter tuning is also a fundamental use case in ML, which requires the training and testing of a model over many different configurations to be able to find the model with the best predictive performance. The ability to train multiple smaller models in parallel, especially in a distributed environment, becomes important when multiple models are being combined [ 1 ].
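Both ideas, combining models by majority vote and sweeping configurations to keep the best one, can be sketched in standard-library Python. The threshold-rule "models" and the toy data below are hypothetical stand-ins for real trained classifiers.

```python
def make_threshold_clf(feature_index, threshold):
    """A stand-in 'model': predict 1 when one feature exceeds a threshold."""
    return lambda x: 1 if x[feature_index] > threshold else 0

def majority_vote(models, x):
    """Ensemble prediction: the label most of the members agree on."""
    votes = [m(x) for m in models]
    return 1 if sum(votes) > len(votes) / 2 else 0

def accuracy(model, data, labels):
    return sum(model(x) == y for x, y in zip(data, labels)) / len(labels)

data = [[0.2, 0.9], [0.8, 0.1], [0.9, 0.7], [0.1, 0.2]]
labels = [0, 1, 1, 0]

# "Hyperparameter tuning": try several thresholds, keep the best model.
best = max((make_threshold_clf(0, t) for t in [0.1, 0.3, 0.5, 0.7]),
           key=lambda m: accuracy(m, data, labels))

# Ensemble: combine three rules so individual errors can cancel out.
ensemble = [make_threshold_clf(0, 0.5),
            make_threshold_clf(0, 0.7),
            make_threshold_clf(1, 0.6)]
pred = [majority_vote(ensemble, x) for x in data]
```

Because each configuration (and each ensemble member) is evaluated independently, the sweep parallelizes naturally across processes or machines, which is the point made above about training multiple smaller models in parallel.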

Over the past few years, frequent advances have occurred in AI research, driven by a resurgence in neural network methods that has fueled breakthroughs in areas like image understanding and natural language processing, among others [ 27 ]. One area of AI research that appears particularly inviting from this perspective is deep reinforcement learning (DRL), which marries neural network modeling with RL techniques. This technique has exploded within the last 5 years into one of the most intense areas of AI research, generating very promising results that mimic human-level performance in tasks ranging from poker [ 28 ] and video games [ 29 ] to multiplayer contests and complex board games, including Go and Chess [ 27 ]. Beyond its inherent interest as an AI topic, DRL might hold special interest for research in psychology and neuroscience, since the mechanisms that drive learning in DRL were partly inspired by animal conditioning research and are believed to relate closely to neural mechanisms for reward-based learning centered on dopamine [ 27 ].

3.3.1. Deep Learning Concepts

DL is a heuristic learning framework and a sub-area of ML that involves learning patterns in data using neural networks built from many artificial neurons, called perceptrons [ 10 , 19 , 30 ] (see Figure 2 ). An artificial neuron can take several inputs and apply a mathematical calculation, returning a result in a process similar to that of a biological neuron [ 19 ]. The simplest neural network, known as a single-layer perceptron [ 30 ], is composed of at least one input, one output, and a processor [ 31 ]. Three different types of DL algorithms can be distinguished: the multilayer perceptron (MLP) with more than one hidden layer, the CNN, and recurrent neural networks (RNNs) [ 32 ].
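The single-layer perceptron described above can be sketched in a few lines of standard-library Python. The AND-gate training task, learning rate, and epoch count below are illustrative choices, not taken from the cited works.

```python
def predict(weights, bias, x):
    """Step activation: fire if the weighted sum exceeds the threshold."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0

def train(samples, targets, lr=0.1, epochs=20):
    """Classic perceptron rule: nudge weights by the prediction error."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            err = t - predict(weights, bias, x)
            weights = [w + lr * err * xi for w, xi in zip(weights, x)]
            bias += lr * err
    return weights, bias

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]                      # logical AND, a linearly separable task
w, b = train(X, y)
print([predict(w, b, x) for x in X])  # → [0, 0, 0, 1]
```

The perceptron converges here because AND is linearly separable; learning tasks that are not (e.g., XOR) is precisely what motivates the hidden layers of the MLP mentioned above.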

Figure 2. Differences in the progress stages between traditional ML methods and DL methods.

One important consideration regarding generic neural networks is that they are extremely low-bias learning systems. As dictated by the bias–variance trade-off, this means that neural networks, in the most generic form employed in the first DRL models, tend to be sample-inefficient and to require large amounts of data to learn. A narrow hypothesis set can speed up the learning process if it contains the correct hypothesis, or if the specific biases the learner adopts happen to fit the material to be learned [ 27 ]. Several algorithms and models have emerged, some of which have been used extensively in different contexts, such as CNNs, autoencoders, and multilayer feedback RNNs [ 10 ]. For datasets of images, speech, and text, among others, different network models are needed to maximize system performance [ 33 ]. DL models are often used for image feature extraction and recognition, given their higher performance on some of the traditional ML problems [ 10 ].

DL techniques differ from traditional ML in some notable ways (see also Figure 2 ):

  • Training a DNN implies the definition of a loss function, which is responsible for quantifying the error made in the process, given by the difference between the expected output value and that produced by the network. One of the most used loss functions in regression problems is the mean squared error (MSE) [ 30 ]. In the training phase, the weight vector is adjusted to minimize the loss function; since analytical solutions are generally not obtainable, the minimization method usually used is gradient descent [ 30 ].
  • Activation functions are fundamental to the learning process of neural network models, as well as to the representation of complex nonlinear functions. The activation function adds nonlinear features to the model, allowing it to represent more than a linear function, which would otherwise be impossible no matter how many layers the network had. The Sigmoid function is the most commonly used activation function in the early stages of studying neural networks [ 30 ].
  • As their capacity to learn and adjust to data is greater than that of traditional ML models, it is more likely that overfitting situations will occur in DL models. For this reason, regularization represents a crucial and highly effective set of techniques used to reduce the generalization errors in ML. Some other techniques that can contribute to achieving this goal are increasing the size of the training dataset, stopping at an early point in the training phase, or randomly discarding a portion of the output of neurons during the training phase [ 30 ].
  • In order to increase stability and reduce convergence times in DL algorithms, optimizers are used, with which greater efficiency in the hyperparameter adjustment process is also possible [ 30 ].
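The first point above can be made concrete with a minimal sketch: gradient descent minimizing the MSE loss of a one-parameter linear model y = w*x, written with the standard library only. The data and learning rate are made-up illustrative values.

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]          # generated by the "true" weight w = 2

def mse(w):
    """Mean squared error of the model y = w*x on the data."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(w):
    """Derivative of the MSE with respect to w: 2 * mean(x * (w*x - y))."""
    return 2 * sum(x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

w, lr = 0.0, 0.02
for _ in range(200):
    w -= lr * grad(w)              # step against the gradient

print(round(w, 3))                 # → 2.0, the weight minimizing the loss
```

No closed-form solve is used: the weight simply walks downhill on the loss surface, which is the same mechanism a DNN uses, just with millions of weights and backpropagated gradients.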

In the last decades, three main mathematical tools have been studied for image modeling and representation, mainly because of their proven modeling flexibility and adaptability: methods based on probability and statistics, wavelet analysis, and partial differential equations [ 34 , 35 ]. In image processing procedures, it is sometimes necessary to reduce the amount of input data. An image can translate into millions of pixels, and for tasks such as classification, feeding in this raw data would make processing very difficult. To overcome this, the image can be transformed into a reduced set of features, selecting and measuring representative properties of the raw input data in a more compact form [ 2 ]. Since DL technologies can automatically mine and analyze the characteristics of labeled data [ 13 , 14 ], DL is very suitable for image processing and segmentation applications [ 14 ]. Several approaches use autoencoders, a family of unsupervised algorithms, for feature selection and data dimensionality reduction [ 31 ].

Among the many DL models, CNNs have been widely used in image processing problems, demonstrating more powerful capabilities than traditional algorithms [ 36 ]. As shown in Figure 3 , a CNN, like a typical neural network, comprises an input layer, an output layer, and several hidden layers [ 37 ]. The hidden layers of a CNN typically consist of convolutional layers, pooling layers, fully connected layers [ 38 ], and normalization layers.
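The convolution and pooling operations that make up these hidden layers can be illustrated with a small standard-library sketch; the 5x5 "image" and the edge-detecting kernel below are made-up examples.

```python
def conv2d(img, kernel):
    """Valid 2-D convolution (strictly, cross-correlation, as in most DL libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(img) - kh + 1):
        row = []
        for j in range(len(img[0]) - kw + 1):
            row.append(sum(img[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

def maxpool2x2(img):
    """2x2 max pooling with stride 2: keep the strongest response per patch."""
    return [[max(img[i][j], img[i][j + 1], img[i + 1][j], img[i + 1][j + 1])
             for j in range(0, len(img[0]) - 1, 2)]
            for i in range(0, len(img) - 1, 2)]

image = [[0, 0, 1, 1, 0],
         [0, 0, 1, 1, 0],
         [0, 0, 1, 1, 0],
         [0, 0, 1, 1, 0],
         [0, 0, 1, 1, 0]]
edge_kernel = [[1, -1]]            # responds to horizontal intensity changes
feature_map = conv2d(image, edge_kernel)   # marks the two vertical edges
pooled = maxpool2x2(feature_map)           # keeps the strongest responses
```

In a real CNN the kernel values are learned rather than hand-picked, and many such kernels are stacked per layer, but the sliding-window arithmetic is exactly this.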

Figure 3. Illustration of the structure of a CNN.

Additionally, the number of image-processing applications based on CNNs is increasing daily [ 10 ]. Among the different DL structures, CNNs have proven to be the most efficient in image recognition problems [ 20 ]. They can also be used to improve image resolution, enhancing their applicability to real problems such as the transmission or storage of images and videos [ 39 ].

DL models are frequently used in image segmentation and classification problems, as well as in object recognition, and they have shown good results in natural language processing. For example, face recognition has been deployed extensively in real-life settings such as airport and bank security and surveillance systems, as well as in mobile phone functionalities [ 10 ].

There are several possible applications for image-processing techniques. The fast development of surveillance tools like CCTV cameras has made inspecting and analyzing footage increasingly difficult for human operators. Several studies show that human operators can miss a significant portion of the on-screen action after 20 to 40 minutes of intensive monitoring [ 18 ]. In fact, object detection has become a demanding field of study in the last decade. The proliferation of high-powered computers and the availability of high-speed internet have enabled new computer vision-based detection, which has been used, for example, in human activity recognition [ 18 ], marine surveillance [ 40 ], pedestrian identification [ 18 ], and weapon detection [ 41 ].

One alternative application of ML in image-processing problems is image super-resolution (SR), a family of technologies that involve recovering a super-resolved image from a single image or a sequence of images of the same scene. ML applications have become the most mainstream topic in the single-image SR field, being effective at generating a high-resolution image from a single low-resolution input. The quality of training data and the computational demand remain the two major obstacles in this process [ 42 ].
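Learned SR methods are usually compared against naive interpolation baselines. The following standard-library sketch shows the simplest such baseline, nearest-neighbor upscaling of a hypothetical tiny grayscale image, which learned models aim to beat in perceived quality.

```python
def upscale_nn(img, factor):
    """Nearest-neighbor upscaling: repeat each pixel 'factor' times
    in both directions, producing a blocky high-resolution image."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(factor)]   # widen the row
        out.extend([wide[:] for _ in range(factor)])     # repeat it vertically
    return out

low_res = [[10, 200],
           [60, 120]]
high_res = upscale_nn(low_res, 2)
# high_res is 4x4: each source pixel becomes a 2x2 block
```

No new detail is created here, only replication; a learned SR model instead hallucinates plausible high-frequency detail from patterns seen in its training data, which is why training data quality is highlighted above as a major obstacle.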

3.3.2. Reinforcement Learning Concepts

RL is a set of ML algorithms that use a mathematical framework to learn control strategies directly from data [ 4 , 43 ], based on a reward function in a Markov decision process [ 44 , 45 ]. The Markov decision process (MDP) is a stochastic process used to model the decision-making of a dynamic system. The decision process is sequential: actions/decisions depend on the current state and the system environment, influencing not only the immediate rewards but also the entire subsequent process [ 4 ]. One commonly referenced RL problem is the multi-armed bandit , in which an agent selects one of n different options and receives a reward depending on the selection. This problem illustrates the trade-off RL must manage between exploration (trying different arms) and exploitation (playing the arm with the best results so far) [ 44 ]. This group of algorithms is derived from behaviorist psychology, in which an agent explores the external environment and updates its strategy using feedback signals in order to maximize the cumulative reward [ 43 , 46 ].
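The exploration/exploitation trade-off of the multi-armed bandit can be sketched with a simple epsilon-greedy agent in standard-library Python. The arm reward probabilities below are hypothetical, and a real agent would not know them.

```python
import random

random.seed(0)
true_probs = [0.2, 0.5, 0.8]   # arm 2 is best; hidden from the agent
counts = [0, 0, 0]             # pulls per arm
values = [0.0, 0.0, 0.0]       # running mean reward per arm
epsilon = 0.1                  # fraction of steps spent exploring

for _ in range(5000):
    if random.random() < epsilon:
        arm = random.randrange(3)                     # explore: random arm
    else:
        arm = max(range(3), key=lambda a: values[a])  # exploit: best so far
    reward = 1.0 if random.random() < true_probs[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

best_arm = max(range(3), key=lambda a: values[a])  # the arm the agent settles on
```

Pure exploitation could lock onto a mediocre arm forever; the small epsilon guarantees every arm keeps being sampled, so the value estimates converge toward the true probabilities.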

In RL, the behavior of the Markov decision process is determined by a reward function [ 4 ]. The basis of a DRL network is made up of an agent and an environment, following an action-reward type of operation. The interaction begins in the environment with the sending of its state to the agent, which takes an action consistent with the state received, according to which it is subsequently rewarded or penalized by the environment [ 4 , 44 , 46 , 47 , 48 ]. RL is considered an autonomous learning technique that does not require labeled data but for which search and value function approximation are vital tools [ 4 ]. Often, the success of RL algorithms depends on a well-designed reward function [ 45 ]. Current RL methods still present some challenges, namely the efficiency of the learning data and the ability to generalize to new scenarios [ 49 ]. Nevertheless, this group of techniques has been used with tremendous theoretical and practical achievements in diverse research topics such as robotics, gaming, biological systems, autonomous driving, computer vision, healthcare, and others [ 44 , 48 , 50 , 51 , 52 , 53 ].
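The agent-environment loop described above can be sketched as tabular Q-learning on a made-up toy environment: a four-state corridor where only reaching the right end yields a reward. All states, rewards, and parameter values here are illustrative.

```python
import random

random.seed(1)
N_STATES, ACTIONS = 4, (-1, +1)       # move left or right along the corridor
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration

for _ in range(300):                  # episodes
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy action selection by the agent
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N_STATES - 1)            # environment transition
        r = 1.0 if s2 == N_STATES - 1 else 0.0           # environment sends reward
        best_next = max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])  # value update
        s = s2

# After training, the greedy policy moves right in every non-terminal state.
policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)]
```

The reward function alone shapes the learned behavior, which is why the text stresses that RL success often hinges on a well-designed reward: here a single terminal reward is enough because the discount factor propagates value back along the corridor.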

One common technique in RL is random exploration, where the agent decides what to do randomly, regardless of its progress [ 46 ]. This can become impractical in real-world applications, since learning times can grow very large. Recently, exploratory RL methods have nevertheless shown significant performance improvements compared to non-exploratory algorithms [ 46 , 54 ]. Another technique, inverse reinforcement learning (IRL), uses the opposite strategy, aiming to find a reward function that can explain a desired behavior [ 45 ]. In a recent study using IRL, Hwang et al. [ 45 ] proposed a new RL method, named option compatible reward inverse reinforcement learning , which applies an alternative framework to the compatible reward method. The purpose was to assign reward functions to a hierarchical IRL problem while making knowledge transfer easier by converting the information contained in the options into a numerical reward value. While the authors concluded that their novel algorithm was valid in several classical benchmark domains, they remarked that applying it to real-world problems still required extended evaluation.

RL models have been used in a wide variety of practical applications. For example, the COVID-19 pandemic was one of the most impactful health emergencies humans have encountered in the past century, and many studies were directed at this topic, including many that used ML techniques to various ends. Zong and Luo (2022) [ 55 ] employed a custom epidemic simulation environment for COVID-19, in which they applied a new multi-agent RL-based framework to explore optimal lockdown resource allocation strategies. The authors used real epidemic transmission data to calibrate the environment, obtaining results more consistent with the real situation. Their results indicate that the proposed approach can adopt a flexible allocation strategy according to the age distribution of the population and economic conditions. These insights could be extremely valuable for decision-makers in supply chain management.

Some technical challenges blocked the combination of DNNs with RL until 2015, when breakthrough research demonstrated how the integration could work in complex domains, such as Atari video games [ 29 , 56 ], leading to rapid progress in improving and scaling DRL [ 27 ]. Some of the first successful applications of DRL came with the deep Q network algorithm [ 56 ]. Currently, the application of DRL models to computer vision problems, such as object detection and tracking or image segmentation, has gained emphasis, given the good results it has produced [ 31 ]. RL, supervised learning, and unsupervised learning constitute the three main pattern recognition paradigms used in research [ 57 ].

The initial advances in DRL were boosted by the good performance of the experience replay algorithm [ 56 ], as well as by the use of two networks: one with fixed weights, which serves as the basis for a second network whose weights are iteratively updated during training and which replaces the first when the learning process ends. With the aim of reducing the high convergence times of DRL algorithms, several distributed framework approaches [ 58 ] have been proposed. This suite of methods has been successfully used in computer vision [ 59 ] and in robotics [ 58 ].
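The experience replay idea credited to [ 56 ] can be sketched as a bounded buffer from which random minibatches are drawn, breaking the temporal correlation of consecutive transitions. The transition contents below are hypothetical placeholders.

```python
import random
from collections import deque

random.seed(2)

class ReplayBuffer:
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions fall out

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Uniform random minibatch for a decorrelated gradient step."""
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=100)
for t in range(250):                       # store more than the capacity
    buf.push(state=t, action=t % 4, reward=0.0, next_state=t + 1)

batch = buf.sample(batch_size=8)
print(len(buf), len(batch))               # → 100 8
```

Sampling uniformly from this pool, rather than learning from each frame in order, is what stabilized the early deep Q network training mentioned above.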

3.4. Current Challenges

Considering everything discussed previously, some of the main challenges that AI image processing faces are common across multiple subjects. Most applications require a large volume of images that are difficult to obtain. Indeed, due to the large amount of data, extracting features from a dataset can become very time- and resource-consuming. Some models, such as CNNs, can have millions of parameters to learn, which may require considerable effort to obtain sufficient labeled data [ 60 ]. Since AI models are heavily curated for a given purpose, the resulting model will likely be inapplicable outside the specific domain in which it was trained. The performance of a model can be heavily impacted by the data available, meaning the accuracy of the outcome can also vary widely [ 61 ]. An additional limitation identified during research is the sensitivity of models to noisy or biased data [ 60 ]. A meticulous and properly designed data-collection plan is essential, often complemented by a preprocessing phase to ensure good-quality data. Some researchers have turned their attention to improving the understanding of the many models; in particular, the weights of a neural network can be difficult to decipher and extract useful information from, which can lead to wrong assumptions and decisions [ 62 ]. To facilitate communication and discussion, some authors have also attempted to provide a categorization system for DL methodologies based on their applications [ 31 ].

4. Image Processing Developments

The topic of ML has been studied with very broad applications and in multiple areas that require data collection and processing. Considering recent publications from the last 7 years (2017–2023), several studies have been developed dealing with different subjects, with proposals of many different models. In particular, we found a considerable number of research papers showing interest in using DL in medicine, engineering, and biology. When we consider the volume of research developed, there is a clear increase in published research papers targeting image processing and DL over the last decades. A search using the terms “image processing deep learning” in SpringerLink showed an increase from 1309 articles in 2005 to 30,905 articles in 2022, considering only review and research papers. In the aggregator Science Direct , we saw a similar result: an increase from 1173 scientific manuscripts in 2005 to 27,393 in 2022. The full results across this timeline can be observed in Figure 4 . These results confirm an upward trend in attention to DL methods, as also described in the previous section.

Figure 4. Number of research articles found using the search query “image processing deep learning” for two different aggregators.

Much of the recent literature, especially in the medical field, has attempted to address the biggest challenges, mainly those derived from data scarcity and model performance [ 14 , 61 , 62 , 63 , 64 ]. Some research has focused on improving performance or reducing the computational requirements of models such as CNNs [ 60 , 65 , 66 ] using techniques such as model pruning or compression, which aim to reduce the model’s overall size or operating cost. In the next section, we discuss relevant approaches taken on the subject, to illustrate how the scientific community has been using ML methods to solve specific data-driven problems, and examine some of the implications.

4.1. Domains

Studies involving image processing can be found on topics such as infrastructure monitoring applications [ 13 , 67 , 68 ] including road pavement [ 69 , 70 , 71 ], remote sensing images [ 12 ], image reconstruction [ 72 ], detecting and quantifying plant diseases [ 73 , 74 , 75 , 76 , 77 ], identification of pests in plant crops [ 17 , 78 , 79 ], automated bank cheque verification [ 80 ], and even graphical search [ 11 , 81 , 82 , 83 ]. There is also an ample amount of research using ML algorithms in the medical field. DL techniques have been applied in infection monitoring [ 64 , 84 , 85 ], in developing personalized treatment advice [ 19 , 86 ], in diagnosing diseases like COVID-19 [ 63 , 87 , 88 , 89 ], in imaging procedures including radiology [ 14 , 63 , 90 , 91 ] and pathology imaging [ 19 ], and in cancer screening [ 91 , 92 , 93 , 94 ].

While most modern research has not focused on traditional ML techniques, there are still valuable lessons to be taken from these studies, with interesting results obtained in engineering subjects. In 2022, Pratap and Sardana [ 21 ] published a review on image processing in materials science and engineering using ML. In this study, the authors reviewed research focusing on ML, ML model selection, and the image processing technique used, along with the context of the problem. The authors suggested SimpleCV as a possible framework, specifically for digital image processing. They justified this type of approach by noting that materials have a 3D structure, whereas most image processing analysis performed so far is of 2D images [ 21 ]. Image super-resolution (SR) is another interesting application of ML concepts to image processing challenges that has attracted attention in the past decades [ 15 , 42 ]. In 2016, Zhao et al. [ 42 ] proposed a framework for single-image super-resolution tasks, based on blur kernel estimation, to improve training quality as well as model performance. Using the estimated blur kernel, the authors adopted a selective patch processing strategy combined with sparse recovery. While their results indicated better performance than several super-resolution approaches, some of the optimization problems encountered were themselves extraordinarily time-consuming and, as such, not a suitable solution for efficiency improvement. Research such as this can often serve as inspiration for nuanced engineering problems that may be specific to certain research subjects. As an example, in the last decade, the automobile industry has made a concerted shift towards intelligent vehicles equipped with driving assistance systems, with new vision systems in some higher-end cars.
Some vision systems include cameras mounted in the car, which can be used by engineers to obtain large quantities of images and develop many of the future self-driving car functionalities [ 66 ].

Some advanced driver assistance systems (ADAS) that use AI have been proposed to assist drivers and attempt to significantly decrease the number of accidents. These systems often employ technologies such as image sensors, global positioning, radar imaging, and computer vision techniques. Studies testing a number of different image processing techniques to understand their accuracy and limitations have found good results with traditional ML methods such as SVM and the optimum-path forest classifier [ 95 ] or K-means clustering [ 11 ]. One potential benefit of this approach is that some traditional methods can be less costly to apply and can be used as complements in many different subjects. Rodellar et al. [ 16 ] surveyed existing research on the analysis of blood cells using image processing. The authors acknowledged the existence of subtle morphological differences for some lymphoma and leukemia cell types that are difficult to identify in routine screening. Among their most notable findings was that the methods most commonly used in the classification of peripheral blood (PB) cells were neural networks, decision trees (DT), and SVM. The authors noted that image-based automatic recognition systems could position themselves as new modules of existing analyzers, or even that new systems could be built and combined with other well-established ones.

4.1.1. Research Using Deep Learning

Regarding Deep Learning methodologies, many studies attempt to improve the performance of DL models; we highlight several next. Monga et al. [ 96 ] reviewed usage and research involving Deep Neural Networks (DNNs), covering some of the most popular techniques for algorithm unrolling in several domains of signal and image processing. The authors extensively covered research on a technique called algorithm unrolling, or unfolding, which provides a concrete and systematic connection between iterative algorithms, used widely in signal processing, and DNNs. This type of application has recently attracted enormous attention, both in theoretical investigations and in practical applications. The authors noted that while substantial progress has been made, more work is needed to comprehend the mechanisms behind unrolled network behavior; in particular, they highlight the need to clarify why some state-of-the-art networks perform so well on several recognition tasks. In a study published by Zeng et al. [ 20 ], a correction neural network model named Boundary Regulated Network (BR-Net) was proposed. It used high-resolution remote satellite images as the source, and the features of the image were extracted through convolution, pooling, and classification. The model accuracy was further increased through training on an experimental dataset from a particular area. The authors reported a performance improvement of 15% and a recognition speed increase of 20% compared with recently researched models, while noting that, for a considerably large amount of data, the model would have poor generalization ability. In Farag [ 66 ], the investigation focused on the ability of a CNN model to learn safe driving maneuvers from data collected using a front-facing camera. Data collection was carried out on urban routes by an experienced driver.
The author developed a 17-layer behavior cloning CNN model with four drop-out layers added to prevent overfitting during training. The results looked promising: a small amount of training data from a few tracks was sufficient to train the car to drive safely on multiple tracks. One possible shortcoming is that this approach may require a massive number of tracks in order to generalize correctly for actual street deployment.

Some modern research has focused on expanding the practical applications of DL models in image processing:

  • One of the first DL models used for video prediction, inspired by the sequence-to-sequence models common in natural language processing [ 97 ], uses a recurrent long short-term memory (LSTM) network to predict future images based on a sequence of images encoded during video data processing [ 97 ].
  • In their research, Salahzadeh et al. [ 98 ] presented a novel mechatronics platform for static and real-time posture analysis, combining three components: a mechanical structure with cameras, a software module for data collection and semi-automatic image analysis, and a network to deliver the raw data to the DL server. The authors concluded that their device, in addition to being inexpensive and easy to use, allows postural assessment with great stability and in a non-invasive way, proving to be a useful tool in the rehabilitation of patients.
  • Studies on graphical search engines and content-based image retrieval (CBIR) systems have also been successfully developed recently [ 11 , 82 , 99 , 100 ], with processing times that might be compatible with real-time applications. Most importantly, the results of these studies appeared to show adequate image retrieval capabilities, displaying a clear similarity between input and output on both a semantic and a graphical basis [ 82 ]. In a review by Latif et al. [ 101 ], the authors concluded that an image cannot be adequately described by a single feature representation; instead, a combination of low-level features should be used, since these represent the image in the form of patches and thereby increase performance.
  • In their publication, Rani et al. [ 102 ] reviewed the literature on this topic from 1995 to 2021, finding that researchers in microbiology have employed ML techniques for the image recognition of four types of micro-organisms: bacteria, algae, protozoa, and fungi. Kasinathan and Uyyala [ 17 ] applied computer vision and knowledge-based approaches to improve insect detection and classification in dense image scenarios. Image processing techniques were applied to extract features, and classification models were built using ML algorithms. The approach used different feature descriptors, such as texture, color, shape, histograms of oriented gradients (HOG), and global image descriptors (GIST), and ML was used to analyze multi-variety insect data to achieve efficient resource utilization and improved classification accuracy for field-crop insects with a similar appearance.
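
As a rough illustration of the HOG-style descriptors mentioned above, the sketch below (Python/NumPy assumed) computes a single global histogram of gradient orientations; the real HOG pipeline additionally divides the image into cells and normalizes over overlapping blocks:

```python
import numpy as np

def gradient_orientation_histogram(image, n_bins=9):
    """Simplified HOG-style descriptor: histogram of gradient
    orientations over the whole image (real HOG uses local cells
    and block normalization)."""
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    # Orientations folded into [0, 180) degrees, as in standard HOG.
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(orientation, bins=n_bins,
                           range=(0.0, 180.0), weights=magnitude)
    # L2-normalize so the descriptor is invariant to global contrast.
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# A vertical step edge produces mostly horizontal (0 degree) gradients.
img = np.zeros((16, 16))
img[:, 8:] = 1.0
descriptor = gradient_orientation_histogram(img)
```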

As the most popular research area for image processing, studies using DL in the medical field span a wide variety of subjects. Automatic classifiers for imaging purposes can be used in many different medical domains, often with very good results. However, the variety of devices, locations, and sampling techniques used can often lead to undesired or misinterpreted results. One clear advantage of these approaches is that some exams and analyses are based on human inspection, which can be time-consuming, require extensive training of personnel, and be subject to observer subjectivity and variability [ 16 , 103 , 104 ]. In 2023, Luis et al. applied explainable artificial intelligence (xAI) to test different classifiers for monkeypox detection and to better understand the results [ 62 ]. With a greater focus on properly interpreting model results, approaches such as these are increasingly common. Recently, Melanthota et al. [ 32 ] reviewed research on DL-based image processing in optical microscopy. DL techniques can be particularly useful in this area, since manual image analysis of tissue samples tends to be tedious and time-consuming due to the complex nature of the biological entities, while the results can also be highly subjective. The authors concluded that DL models perform well in improving image resolution in smartphone-based microscopy, making them an asset in the development of healthcare solutions for remote locations. The authors also identified an interesting application of DL in monitoring gene expression and protein localization in organisms. Overall, CNN-based DL networks have emerged as models with great potential for medical image processing.

Brain image segmentation is a subject addressed by a vast number of researchers who seek to develop systems for accurate cancer diagnosis able to differentiate cancer cells from healthy ones [ 105 , 106 , 107 , 108 , 109 , 110 , 111 ]. A problem that such approaches can mitigate is that human verification of magnetic resonance imaging to locate tumors can be prone to errors. In a recent study, Devunooru et al. [ 105 ] provided a taxonomy system for the key components needed to develop an innovative brain tumor diagnosis system based on DL models. The taxonomy system, named data image segmentation processing and viewing (DIV), comprised research that had been developed since 2016. The results indicated that the majority of the proposed approaches only applied two factors from the taxonomy system, namely data and image segmentation, ignoring a third important factor, which is "view". The comprehensive framework developed by the authors considers all three factors to overcome the limitations of state-of-the-art solutions. Finally, the authors consider that efforts should be made to increase the efficiency of approaches used in image segmentation problems, as well as in problems processing large quantities of medical images.

In their review, Yedder et al. [ 112 ] surveyed state-of-the-art medical image reconstruction algorithms, focusing on DL-based methods. The central theme of their research was the reconstruction of biomedical images as an important source of information for medical diagnosis. The authors’ work examined the differences between conventional reconstruction methods and learning-based methods. They showed particular interest in the success of DL in computer vision and medical imaging problems, as well as its recent rise in popularity, concluding that DL-based methods appeared to adequately address the noise sensitivity and computational inefficiency of iterative methods. Furthermore, the authors noted that the use of DL methods in medical image reconstruction encompasses an ever-increasing number of modalities, with a clear trend in newer work toward unsupervised approaches, primarily driven by the constraints on realistic or real-world training data.

4.1.2. Research Using Reinforcement Learning

Finally, we conclude our state-of-the-art review by referencing research that used reinforcement learning approaches, mostly in combination with deep learning methods. RL research has been developed in several topics, including robotics [ 113 , 114 , 115 ], design automation [ 25 ], energy management strategies for hybrid vehicles [ 43 ], parameter estimation in biological systems [ 44 , 116 , 117 ], facial motion learning [ 48 , 50 , 118 ], and closed-world environments such as games, where it has been applied very successfully [ 51 , 54 , 119 , 120 ]. In image processing, several pertinent studies were found, especially using DRL [ 31 , 47 , 57 , 121 ], and many novel applications continue to be proposed. A study conducted in 2022 by Dai et al. [ 122 ] explored effective healthcare strategies for simulated human bodies by combining DRL methods with conceptual embedding techniques. In this instance, a DNN architecture was used to recreate the transformation function of the input-output characteristics of a human body, using a dataset containing 990 tongue images of nine body constitution (BC) types. The authors concluded that the proposed framework could be particularly useful when applied to a high-dimensional dynamic system of the human body. Amongst the most relevant research encountered, we highlight the following:

To overcome challenges in computer vision related to data efficiency and generalization to new environments, a 2020 study by Laskin et al. [ 49 ] presented a reinforcement learning module leveraging augmented data, which can be incorporated into typical RL systems to improve their overall performance with little effort. The authors remarked that data augmentation can improve data efficiency in RL methods operating from pixels, even without significant changes to the underlying RL algorithm, and could help make deep RL more practical for solving real-world problems. In a different example, Khayyat and Elrefaei [ 47 ] successfully developed a system for retrieving ancient images from Arabic manuscripts through an RL agent. The main benefit of this approach was the reduction of data dimensionality, which leads to increased accuracy in image classification and retrieval tasks. Image visual features, extracted using a pre-trained VGG19 convolutional neural network, are fused with textual features through a concatenation and hash merge layer. The success achieved in this scenario suggests the model may also be applicable to other types of images.
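
Pixel-level augmentations of the kind studied by Laskin et al. can be as simple as random cropping of observation batches before they reach the policy network. The following is a minimal, hypothetical sketch (not the authors' implementation):

```python
import numpy as np

def random_crop(batch, out_size, rng=None):
    """Randomly crop each image in a batch of shape (N, H, W, C) to
    (N, out_size, out_size, C) -- a typical pixel-level augmentation
    applied to RL observations before they reach the policy network."""
    rng = rng or np.random.default_rng(0)
    n, h, w, c = batch.shape
    cropped = np.empty((n, out_size, out_size, c), dtype=batch.dtype)
    for i in range(n):
        # Independent crop position per observation in the batch.
        top = rng.integers(0, h - out_size + 1)
        left = rng.integers(0, w - out_size + 1)
        cropped[i] = batch[i, top:top + out_size, left:left + out_size]
    return cropped

obs = np.random.default_rng(1).random((8, 84, 84, 3))  # a batch of pixel observations
aug = random_crop(obs, out_size=64)
```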

Amongst the recent advancements in DRL focusing on computational optimization is the work presented by Ren et al. [ 57 ], which proposed a system for image stereo-matching with rule constraints and parallax estimation. Initially, edge-pixel constraint rules were established and adjustments were made to the image blocks; then, image parallax estimation was performed, and a DRL analysis was executed iteratively by a CNN. The results showed the proposed algorithm converged quickly, with accuracy above 95%. However, the matching targets were not clearly defined, particularly for small objects with curved surfaces, which could limit the approach's practicality. Given the large number of existing models, in 2022, Le et al. [ 31 ] conducted an extensive review of state-of-the-art advances using DRL in computer vision research. The main objective was to propose a categorization of DRL methodologies, present their potential advantages and limitations in computer vision, and discuss future research directions. The authors divide DRL methods into seven categories, depending on their applications: (i) landmark localization, (ii) object detection, (iii) object tracking, (iv) registration of both 2D image and 3D volumetric data, (v) image segmentation, (vi) video analysis, and (vii) other applications. Some of the most promising approaches selected by the authors to create new insights in this research field included inverse DRL, multi-agent DRL, meta DRL, and imitation learning.

5. Discussion and Future Directions

Although the advances and successes of ML are undeniable, particularly in the field of digital image processing, there are still important limitations, both in its operational mode and in its design. One of the most important is the fact that, for the most part, the algorithms developed to date are trained to perform a specific task, solving one particular problem. The generalization capacity of existing ML models is limited, making it difficult to apply them to problems other than those for which they were trained. Although it is possible to apply transfer learning techniques with the aim of using existing models in new contexts, the results still fall short of what is needed.

As previously noted, another one of the limitations we identified concerns the models’ efficiency. ML, in particular DL techniques, requires a large amount of data and computational resources to train and run the models, which may be infeasible or impractical in some scenarios or applications. This requires techniques that can reduce the cost and time of training and inference, as well as increase the robustness and generalization of the models. Some examples of these techniques are model compression, model pruning, model quantization, and knowledge distillation, among others.
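
As a concrete example of one of these techniques, the sketch below implements simple magnitude-based pruning in NumPy (framework-specific pruning APIs differ; this only illustrates the idea):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.
    Magnitude pruning is one of the simplest compression techniques:
    the pruned weight matrix can then be stored and applied sparsely."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

w = np.random.default_rng(0).standard_normal((64, 64))
pruned = magnitude_prune(w, sparsity=0.9)  # ~90% of entries become zero
```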

Additionally, it is important to highlight the difficulty of interpreting DL models: their complexity and opacity make it hard to understand their internal functioning and the results they produce. This requires techniques that can explain the behavior, logic, and reliability of models, as well as the factors that influence their decisions. Some examples of these techniques are visualization of activations, sensitivity analysis, importance attribution, and generation of counterfactuals, among others.
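
A minimal sketch of sensitivity analysis is occlusion testing: slide a patch over the image and measure how much the model's score drops. Here `predict` is a toy stand-in for a trained classifier, used only for illustration:

```python
import numpy as np

def occlusion_sensitivity(image, predict, patch=4):
    """Slide a mean-valued patch over the image and record how much the
    model's score drops at each location -- a simple sensitivity-analysis
    explanation. `predict` is any function mapping an image to a scalar
    score (a stand-in for a trained classifier's class probability)."""
    base = predict(image)
    h, w = image.shape
    heatmap = np.zeros((h // patch, w // patch))
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = image.mean()
            heatmap[i // patch, j // patch] = base - predict(occluded)
    return heatmap

# Toy "model": score is the mean brightness of the top-left quadrant,
# so only occlusions in that quadrant should register in the heatmap.
score = lambda im: im[:8, :8].mean()
img = np.zeros((16, 16))
img[:8, :8] = 1.0
heat = occlusion_sensitivity(img, score)
```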

No less important, and deserving reflection, are the limitations related to ethics and responsibility, since DL has a major impact on society, business, and people. This requires techniques that can guarantee the privacy, security, transparency, fairness, and accountability of models, as well as avoid or mitigate their possible negative effects. Some examples of techniques that can help mitigate such limitations are homomorphic encryption, federated learning, algorithmic auditing, and bias detection.

6. Conclusions

In this review, we analyzed some of the most recent works developed in ML, particularly using DL and RL methods or combinations of these. It is becoming increasingly obvious that image processing systems are applied in the most diverse contexts and have seen increasingly more impressive results as the methods have matured. Some of the observed trends appear to indicate a prevalence of certain techniques in certain research topics, which is not surprising. Amongst these trends, we observed:

  • Interest in image-processing systems using DL methods has exponentially increased over the last few years. The most common research disciplines for image processing and AI are medicine, computer science, and engineering.
  • Traditional ML methods are still extremely relevant and are frequently used in fields such as computational biology and disease diagnosis and prediction or to assist in specific tasks when coupled with other more complex methods. DL methods have become of particular interest in many image-processing problems, particularly because of their ability to circumvent some of the challenges that more traditional approaches face.
  • A lot of attention from researchers seems to focus on improving model performance, reducing computational resources and time, and expanding the application of ML models to solve concrete real-world problems.
  • The medical field seems to have developed a particular interest in research using multiple classes and methods of learning algorithms. DL image processing has been useful in analyzing medical exams and other imaging applications. Some areas have also still found success using more traditional ML methods.
  • Another area of interest appears to be autonomous driving and driver profiling, possibly powered by the increased access to information available both for the drivers and the vehicles alike. Indeed, modern driving assistance systems have already implemented features such as (a) road lane finding, (b) free driving space finding, (c) traffic sign detection and recognition, (d) traffic light detection and recognition, and (e) road-object detection and tracking. This research field will undoubtedly be responsible for many more studies in the near future.
  • Graphical search engines and content-based image retrieval systems also present themselves as an interesting topic of research for image processing, with a diverse body of work and innovative approaches.

We found interesting applications using a mix of DL and RL models. The main advantage of these approaches is having the potential of DL to process and classify the data and use reinforcement methods to capitalize on the historical feedback of the performed actions to fine-tune the learning hyperparameters. This is one area that seems to have become a focus point of research, with an increasing number of studies being developed in an area that is still recent. This attention will undoubtedly lead to many new developments and breakthroughs in the following years, particularly in computer vision problems, as this suite of methods becomes more mature and more widely used.

Acknowledgments

We thank the reviewers for their very helpful comments.

Abbreviations

The following abbreviations are used in this manuscript:

Funding Statement

This manuscript is a result of the research project “DarwinGSE: Darwin Graphical Search Engine”, with code CENTRO-01-0247-FEDER-045256, co-financed by Centro 2020, Portugal 2020 and the European Union through the European Regional Development Fund.

Author Contributions

Conceptualization, S.J., J.V. and J.A.; formal analysis, S.J., J.V. and J.A.; funding acquisition, C.M.; Investigation, S.J. and J.V.; methodology, S.J., J.V. and J.A.; project administration, C.M.; supervision, S.J. and C.M.; validation, S.J., J.V., J.A. and C.M.; writing—original draft, J.V. and J.A.; writing—review and editing, S.J., J.V., J.A. and C.M. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

The authors declare no conflict of interest.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Top 10 Digital Image Processing Project Topics

We guide research scholars in choosing novel digital image processing project topics. What is meant by digital image processing? Digital image processing is a method of handling images to gain different insights into a digital image. It comprises a set of technologies for analyzing an image in multiple aspects, for better human/machine image interpretation. To be clearer, digital image processing projects aim either to improve the actual quality of the image or to extract the essential features from the entire picture.

This page is about new and upcoming digital image processing project topics, for scholars who wish to create a masterpiece in their research career.

Generally, a digital image is represented as pixels arranged in an array. The dimensions of this rectangular array give the size of the image (M×N), where M denotes the columns and N the rows. Further, x and y coordinates are used to signify the position of a single pixel in the image: the x value increases from left to right, and the y value increases from top to bottom. When you get into the DIP research field, you need to know the following key terminologies.
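
In code (assuming a Python/NumPy workflow, purely for illustration), this pixel-array representation looks like:

```python
import numpy as np

# A grayscale image stored as a 2-D array of pixels.
# NumPy indexes as image[row, column], i.e. image[y, x] -- y grows
# downward and x grows rightward, matching the coordinate convention
# described above.
image = np.arange(12, dtype=np.uint8).reshape(3, 4)  # 3 rows, 4 columns

height, width = image.shape   # number of rows, number of columns
pixel = image[1, 2]           # pixel at y=1 (row), x=2 (column)
```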


Important Digital Image Processing Terminologies  

  • Stereo Vision and Super Resolution
  • Multi-Spectral Remote Sensing and Imaging
  • Digital Photography and Imaging
  • Acoustic Imaging and Holographic Imaging
  • Computer Vision and Graphics
  • Image Manipulation and Retrieval
  • Quality Enrichment in Volumetric Imaging
  • Color Imaging and Bio-Medical Imaging
  • Pattern Recognition and Analysis
  • Imaging Software Tools, Technologies and Languages
  • Image Acquisition and Compression Techniques
  • Mathematical Morphological Image Segmentation

Image Processing Algorithms

In general, image processing techniques are used to perform certain operations on input images and to extract the desired information from them: the input is an image, and the result is an improved image or the information associated with the task. Image processing algorithms play a crucial role in current real-time applications, and various algorithms are used for various purposes, as follows:

  • Digital Image Detection
  • Image Reconstruction
  • Image Restoration
  • Image Enhancement
  • Image Quality Estimation
  • Spectral Image Estimation
  • Image Data Compression

For the above image processing tasks, algorithms can be tailored to the number of training and testing samples and can also be used for real-time/online processing. To date, filtering techniques have been widely used for image processing and enhancement; their main functions are as follows:

  • Brightness Correction
  • Contrast Enhancement
  • Resolution and Noise Level of Image
  • Contouring and Image Sharpening
  • Blurring, Edge Detection and Embossing
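
Brightness and contrast correction, the first two functions above, reduce to a simple point operation on each pixel; a NumPy sketch (illustrative only):

```python
import numpy as np

def adjust_brightness_contrast(image, gain=1.0, bias=0.0):
    """Classic point operation: output = gain * input + bias,
    where gain controls contrast and bias controls brightness.
    Results are clipped to the valid 8-bit range [0, 255]."""
    out = gain * image.astype(float) + bias
    return np.clip(out, 0, 255).astype(np.uint8)

img = np.array([[0, 64], [128, 255]], dtype=np.uint8)
brighter = adjust_brightness_contrast(img, bias=40)   # brightness correction
punchier = adjust_brightness_contrast(img, gain=1.5)  # contrast enhancement
```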

Some of the commonly used techniques for image processing can be classified as follows:

  • Low-Level Image Processing Techniques – Noise Elimination and Color Contrast Enhancement
  • Medium-Level Image Processing Techniques – Binarization and Compression
  • Higher-Level Image Processing Techniques – Image Segmentation
  • Recognition and Detection Image Processing Algorithms – Semantic Analysis

Next, let’s look at some of the traditional image processing algorithms. Our research team will guide you in handpicking apt solutions for research problems. If needed, we are also ready to design our own hybrid algorithms and techniques for sorting out complicated models.

Types of Digital Image Processing Algorithms

  • Hough Transform Algorithm
  • Canny Edge Detector Algorithm
  • Scale-Invariant Feature Transform (SIFT) Algorithm
  • Generalized Hough Transform Algorithm
  • Speeded Up Robust Features (SURF) Algorithm
  • Marr–Hildreth Algorithm
  • Connected-component labeling algorithm: identifies and labels the disconnected regions of an image
  • Histogram equalization algorithm: enhances the contrast of an image by utilizing its histogram
  • Adaptive histogram equalization algorithm: performs slight, region-wise alterations in contrast when equalizing the histogram
  • Error Diffusion Algorithm
  • Ordered Dithering Algorithm
  • Floyd–Steinberg Dithering Algorithm
  • Riemersma Dithering Algorithm
  • Richardson–Lucy deconvolution algorithm: also known as a deblurring algorithm; removes the misrepresentation of the image to recover the original image
  • Seam carving algorithm: differentiates edges based on the image background information; also known as the content-aware image resizing algorithm
  • Region Growing Algorithm
  • GrowCut Algorithm
  • Watershed Transformation Algorithm
  • Random Walker Algorithm
  • Elser difference-map algorithm: a search-based algorithm, primarily used in X-ray diffraction microscopy, for solving general constraint satisfaction problems
  • Blind deconvolution algorithm: similar to Richardson–Lucy deconvolution; reconstructs the sharp points of a blurred image, i.e., deblurs the image, without a known blur kernel
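
As an example, the histogram equalization algorithm listed above can be sketched in a few lines of NumPy (a simplified version of the standard CDF-based mapping):

```python
import numpy as np

def histogram_equalize(image):
    """Histogram equalization via the cumulative distribution function
    (CDF): remap gray levels so the output histogram is approximately
    uniform, enhancing global contrast."""
    hist, _ = np.histogram(image.ravel(), bins=256, range=(0, 256))
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    # Standard equalization mapping, rescaled back to [0, 255].
    lut = np.round((cdf - cdf_min) / (image.size - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[image]

# A low-contrast image confined to [100, 120) spreads to the full range.
rng = np.random.default_rng(0)
dull = rng.integers(100, 120, size=(32, 32), dtype=np.uint8)
eq = histogram_equalize(dull)
```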

Nowadays, various industries also utilize digital image processing by developing customized procedures to satisfy their requirements, either from scratch or through hybrid algorithmic functions. As a result, it is clear that image processing has driven revolutionary developments in many information technology sectors and applications.


Digital Image Processing Techniques

  • Smoothing: substitute the neighborhood median / common value for the actual pixel value; performed in cases of weak edge sharpness or blurring
  • Geometric correction: eliminate distortion in an image through scaling, warping, translation, and rotation
  • Content analysis: differentiate the in-depth image content to uncover hidden data, or convert a color image into a gray-scale image
  • Segmentation: break the image up into multiple regions based on certain constraints, for instance foreground and background
  • Thresholding: enhance the image display through pixel-based threshold operations
  • Noise averaging: reduce the noise in an image by averaging multiple images of diverse quality
  • Sharpening: sharpen the image by increasing the pixel values along edges
  • Feature extraction: extract specific features for the removal of noise in an image
  • Image arithmetic: perform arithmetic operations (addition, subtraction, division, and multiplication) to identify variations between images
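
The smoothing operation described in the first bullet can be sketched as a median filter (illustrative NumPy version; production code would use an optimized library routine):

```python
import numpy as np

def median_filter(image, k=3):
    """Replace each pixel by the median of its k x k neighborhood
    (edges handled by reflection) -- the smoothing operation described
    in the first bullet above."""
    pad = k // 2
    padded = np.pad(image, pad, mode="reflect")
    out = np.empty_like(image)
    h, w = image.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = np.median(padded[y:y + k, x:x + k])
    return out

# A single "salt" pixel in a flat region is removed entirely.
img = np.full((7, 7), 10, dtype=np.uint8)
img[3, 3] = 255
smooth = median_filter(img)
```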

Beyond this, the field offers numerous digital image processing project topics for current and upcoming scholars. Below, we have mentioned some research ideas that help you classify, analyze, represent, and display images or particular characteristics of an image.

Latest 11 Interesting Digital Image Processing Project Topics

  • Acoustic and Color Image Processing
  • Digital Video and Signal Processing
  • Multi-spectral and Laser Polarimetric Imaging
  • Image Processing and Sensing Techniques
  • Super-resolution Imaging and Applications
  • Passive and Active Remote Sensing
  • Time-Frequency Signal Processing and Analysis
  • 3-D Surface Reconstruction using Remote Sensed Image
  • Digital Image based Steganalysis and Steganography
  • Radar Image Processing for Remote Sensing Applications
  • Adaptive Clustering Algorithms for Image processing

Moreover, if you want to know more about digital image processing project topics for your research, communicate with our team. We will give detailed information on current trends, future developments, and real-time challenges in digital image processing research.


Research Topics

Biomedical Imaging

The current plethora of imaging technologies such as magnetic resonance imaging (MR), computed tomography (CT), position emission tomography (PET), optical coherence tomography (OCT), and ultrasound provide great insight into the different anatomical and functional processes of the human body.

Computer Vision

Computer vision is the science and technology of teaching a computer to interpret images and video as well as a typical human. Technically, computer vision encompasses the fields of image/video processing, pattern recognition, biological vision, artificial intelligence, augmented reality, mathematical modeling, statistics, probability, optimization, 2D sensors, and photography.

Image Segmentation/Classification

Extracting information from a digital image often depends on first identifying desired objects or breaking down the image into homogeneous regions (a process called 'segmentation') and then assigning these objects to particular classes (a process called 'classification'). This is a fundamental part of computer vision, combining image processing and pattern recognition techniques.
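
A classic example of threshold-based segmentation is Otsu's method, which picks the gray-level threshold maximizing the between-class variance; a NumPy sketch (illustrative, not the lab's code):

```python
import numpy as np

def otsu_threshold(image):
    """Otsu's method: choose the threshold that maximizes between-class
    variance, splitting the image into foreground and background --
    a classic first step for segmentation."""
    hist, _ = np.histogram(image.ravel(), bins=256, range=(0, 256))
    p = hist / hist.sum()
    global_mean = np.dot(np.arange(256), p)
    best_t, best_var = 0, -1.0
    cum_p, cum_mean = 0.0, 0.0
    for t in range(256):
        cum_p += p[t]
        cum_mean += t * p[t]
        if cum_p < 1e-12 or 1.0 - cum_p < 1e-12:
            continue  # one class would be empty
        between = (global_mean * cum_p - cum_mean) ** 2 / (cum_p * (1 - cum_p))
        if between > best_var:
            best_var, best_t = between, t
    return best_t

# Two well-separated intensity populations.
rng = np.random.default_rng(0)
img = np.concatenate([rng.integers(0, 50, 500), rng.integers(200, 256, 500)])
t = otsu_threshold(img.astype(np.uint8))
mask = img > t  # foreground pixels
```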

Multiresolution Techniques

The VIP lab has a particularly extensive history with multiresolution methods, and a significant number of research students have explored this theme. Multiresolution methods are very broad, essentially meaning that an image or video is modeled, represented, or has features extracted at more than one scale, allowing both local and non-local phenomena to be captured.
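
A minimal multiresolution representation is an image pyramid: repeatedly low-pass filter and downsample. The sketch below (illustrative NumPy, not the lab's code) uses 2×2 block averaging as a crude low-pass filter:

```python
import numpy as np

def build_pyramid(image, levels=3):
    """Simple multiresolution pyramid: repeatedly blur (2x2 mean)
    and downsample by 2, so each level captures coarser, more
    non-local structure."""
    pyramid = [image.astype(float)]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        h, w = prev.shape[0] // 2 * 2, prev.shape[1] // 2 * 2
        # Average non-overlapping 2x2 blocks (a crude low-pass filter).
        coarse = prev[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        pyramid.append(coarse)
    return pyramid

levels = build_pyramid(np.random.default_rng(0).random((64, 64)))
```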

Remote Sensing

Remote sensing, or the science of capturing data of the earth from airplanes or satellites, enables regular monitoring of land, ocean, and atmosphere expanses, representing data that cannot be captured using any other means. A vast amount of information is generated by remote sensing platforms and there is an obvious need to analyze the data accurately and efficiently.

Scientific Imaging

Scientific Imaging refers to working on two- or three-dimensional imagery taken for a scientific purpose, in most cases acquired either through a microscope or remotely-sensed images taken at a distance.

Stochastic Models

In many image processing, computer vision, and pattern recognition applications, there is often a large degree of uncertainty associated with factors such as the appearance of the underlying scene within the acquired data, the location and trajectory of the object of interest, the physical appearance (e.g., size, shape, color, etc.) of the objects being detected, etc.

Video Analysis

Video analysis is a field within computer vision that involves the automatic interpretation of digital video using computer algorithms. Although humans are readily able to interpret digital video, developing algorithms for the computer to perform the same task has proven highly elusive and is now an active research field.

Evolutionary Deep Intelligence

Deep learning has shown considerable promise in recent years, producing tremendous results and significantly improving the accuracy of a variety of challenging problems when compared to other machine learning methods.

Discovery Radiomics

Radiomics, which involves the high-throughput extraction and analysis of a large amount of quantitative features from medical imaging data to characterize tumor phenotype in a quantitative manner, is ushering in a new era of imaging-driven quantitative personalized cancer decision support and management. 


Sports Analytics

Sports Analytics is a growing field in computer vision that analyzes visual cues from images to provide statistical data on players, teams, and games. Want to know how a player's technique improves the quality of the team? Can a team, based on their defensive position, increase their chances to the finals? These are a few out of a plethora of questions that are answered in sports analytics.


Digital image processing realized by memristor-based technologies

  • Open access
  • Published: 28 September 2023
  • Volume 18, article number 120 (2023)

  • Lei Wang 1 ,
  • Qingyue Meng 1 ,
  • Huihui Wang 1 ,
  • Jiyuan Jiang 1 ,
  • Xiang Wan 1 ,
  • Xiaoyan Liu 1 ,
  • Xiaojuan Lian 1 &
  • Zhikuang Cai 1  


Today, the performance and operational efficiency of computer systems for digital image processing are strained by the increasing complexity of image processing tasks. It is also difficult for image processors based on complementary metal–oxide–semiconductor (CMOS) transistors to keep increasing integration density, owing to underlying physical restrictions and economic costs. Such obstacles can, however, be overcome by non-volatile resistive memory technologies (known as memristors), thanks to their compact area, high speed, power efficiency, and in-memory computing capability. This review begins by presenting image processing methods based on pure algorithms and conventional CMOS-based digital image processing strategies. Subsequently, the current issues faced by digital image processing, and the strategies adopted for overcoming them, are discussed. State-of-the-art memristor technologies and their challenges in digital image processing applications are also introduced, including memristor-based image compression, memristor-based edge and line detection, and voice and image recognition using memristors. The review finally envisages the prospects for successful implementation of memristor devices in digital image processing.


Introduction

Digital image processing technology, a technique for processing image information with computers or real-time hardware, mainly involves image coding and compression, image enhancement and restoration, image segmentation, image recognition, and so on. Common algorithms include the familiar single/multi-scale retinex algorithm for image enhancement [ 1 , 2 ], the Sobel [ 3 ] and Canny operators for image edge detection [ 4 ], and the Gaussian filtering algorithm for image denoising [ 5 ]. Table 1 summarizes the classical algorithms used in image enhancement and restoration, image segmentation, and recognition/classification processing. Although computer-based image processing is sophisticated and mature, its processing speed falls short in complex or real-time situations. Therefore, researchers are keen to explore image processing schemes that conform to the human visual system, inspired by the neuromorphic system.
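To make one of the classical operators above concrete, here is a minimal NumPy sketch of Sobel edge detection. The kernel values are the standard 3 × 3 Sobel masks; the convolution helper is a deliberately simple illustrative implementation, not taken from any library:

```python
import numpy as np

def convolve2d(img, kernel):
    """Valid-mode 2D correlation via an explicit sliding window."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def sobel_magnitude(img):
    """Gradient magnitude from the horizontal and vertical Sobel operators."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    gx = convolve2d(img, kx)
    gy = convolve2d(img, ky)
    return np.hypot(gx, gy)

# A vertical step edge: the response peaks along the discontinuity.
img = np.zeros((5, 5))
img[:, 2:] = 255.0
mag = sobel_magnitude(img)
```

Production code would typically use an optimized library routine with boundary handling; the point here is only the kernel structure and the magnitude computation.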

Owing to the increasing demand for high-performance computing in the digital image processing field, traditional von Neumann computer architectures have become highly inefficient. In recent years, neuromorphic hardware systems have gained significant attention. Such systems can potentially provide bio-perception and information processing capabilities within a compact and energy-efficient platform [ 6 , 7 ]. Inspired by the human brain, neuromorphic computing can address the inherent limitations of traditional von Neumann architectures [ 8 ]. Therefore, the construction of biorealistic synaptic primitives with rich spatiotemporal dynamics is indispensable for low-power neuromorphic hardware [ 9 ]. Neuromorphic vision has special advantages over conventional machine vision and has attracted considerable interest due to its ability to imitate human visual perception [ 10 , 11 , 12 ].

Traditional digital image processing architectures that incorporate neuromorphic systems inherit the parallel processing capabilities of cellular nonlinear network (CNN)-based architectures, making them effective platforms for various image processing tasks [ 13 , 14 ] and bringing scalability, simplicity, and power efficiency to VLSI implementations [ 15 ]. Sufficient CNN templates have been found to perform detail-extraction tasks such as edge detection in vision systems [ 16 ]. Moreover, Johnson et al. [ 17 , 18 , 19 , 20 ] developed a pulse-coupled neural network (PCNN) by studying the γ-band synchronous spike dynamics. The concept of "pulse-coupled neural network" first appeared in [ 17 ], and the classical PCNN appeared in [ 21 , 22 , 23 ]. Pulse-coupled neural networks simulate the behavior of optic nerve cells in the visual cortex of mammals (e.g., cats) [ 18 , 19 , 24 ]. PCNNs are single-layer neural networks with a two-dimensional matrix structure, whose size and layout depend on the input image. They are promising for real-time image processing as they can perform image fusion [ 25 , 26 ], image segmentation [ 27 , 28 ], and object detection [ 29 , 30 ] without requiring a training process.

However, the architectures mentioned above involve complex software algorithms, which intangibly add processing time and reduce system efficiency. Therefore, researchers have been inspired by the human visual system to establish a novel digital image processing architecture. The human visual system (HVS) integrates perception and processing: the retina filters or suppresses noise and enhances target features, followed by parallel high-level image processing in the visual cortex [ 11 , 12 , 41 , 42 , 43 ]. In digital general-purpose processors, many image processing applications require multiple operations per second, even though these applications do not require floating-point precision [ 44 ]. In a memristor-based image processing network, the image processing time and the iterations required by the program are directly reduced on account of the fast switching speed and low power consumption of the memristor, which can not only store information but also compute and process it.

Algorithms incorporating memristor behavior have an impact on digital image processing, as they introduce nonlinear effects that may lead to more complex and diverse processing results. In addition, some complicated algorithms require considerable computational resources, resulting in slower operation and the need to optimize the algorithm or use better computer hardware. Therefore, when applying memristor behaviors to digital image processing algorithms, it is necessary to carefully consider their effects on the processing results and to select appropriate processing methods to control and adjust those effects to meet specific image processing demands. In terms of results, the quality of images processed this way is not optimal when dealing with images generated under special conditions (e.g., poor lighting, excessive noise) or with large-scale image data. Therefore, to reduce power consumption and training costs, hardware digital image processing architectures based on memristor networks, which enable massive parallelism and minimize data transfers, have emerged.

The memristor is the crucial component for the analog visual system's enhancement and inhibition effects. The principle of lateral inhibition of biological neurons is shown in Fig.  1 a: when one neuron is excited by a stimulus and a neighboring neuron is then stimulated, the excitation occurring in the neighboring neuron has an inhibitory effect on the former, and this feature coincides with the properties of the memristor. The memristor, introduced by L. Chua in 1971 [ 45 ] from the completeness of circuit theory, is the fourth elementary two-terminal circuit element, characterized by a nonlinear constitutive relationship between flux and charge; it was first physically realized by Strukov and co-workers in 2008 in nanoscale metal oxides [ 46 ]. Moreover, memristors have favorable write energy and standby power: most phase-change memory (PCM) devices and resistive random access memories (RRAM) have write energies of about 10–100 pJ and 100 fJ–10 pJ [ 47 ], respectively. Several studies have shown that the computational energy efficiency of memristors exceeds that of today's graphics processing units by two orders of magnitude [ 48 ]. Here, the enhancement of a 3 × 3 grayscale image is used to explain how a memristor array structure is applied to hardware digital image processing, as shown in Fig.  1 b. A two-dimensional image consists of many pixels. For a grayscale image, each grayscale value is mapped to an input voltage (current) of the array; the output voltage (current) is obtained through vector operations or interactions (enhancement or suppression) between the memristors of an equal-sized array; the reverse mapping is then performed, converting the voltage or current back to grayscale values of 0 to 255. Finally, the processed result is obtained.

figure 1

Digital image processing architecture for neuromorphic systems combined with memristors. a Diagram of the interaction between biological neurons. b Process of 3 × 3 grayscale image processing based on a memristor array, taking enhancement as an example
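The grayscale-to-voltage mapping described above can be sketched in a few lines of Python. All device values here (read voltage, conductance matrix) are hypothetical placeholders chosen for illustration; a real crossbar would add nonidealities such as wire resistance and device variation:

```python
import numpy as np

V_READ = 0.2  # assumed full-scale read voltage (arbitrary illustrative value)

def crossbar_vmm(pixels, conductance):
    """Column currents of an ideal linear memristor crossbar: I = G^T @ V.
    Multiplication is Ohm's law per cell; summation is Kirchhoff's current law."""
    v = (pixels / 255.0) * V_READ        # grayscale -> row voltages
    return conductance.T @ v             # column-wise current summation

def to_grayscale(currents):
    """Normalize column currents back to the 0-255 grayscale range."""
    scaled = currents / currents.max()
    return np.round(scaled * 255).astype(np.uint8)

# One row of a 3x3 image block and a hypothetical diagonal conductance map
# (siemens), so each column simply reads back one pixel.
block = np.array([10.0, 128.0, 250.0])
G = np.diag([1e-3, 1e-3, 1e-3])
out = to_grayscale(crossbar_vmm(block, G))
```

With a non-diagonal conductance matrix the same read-out performs an arbitrary linear filter (e.g., enhancement or suppression) in a single step.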

In this article, we focus on helping the reader understand the current status of memristor devices and of image processing based on memristor circuits. We present recent research on the application of memristors in hardware image processing and compare pure-software image processing with memristor-based implementations; their advantages, disadvantages, and open problems are then analyzed. This paper is divided into four parts. " Memristor " section describes the theory of memristors and presents the reasons for their application in the field of neuromorphics. " Image quality assessment metrics " section introduces several commonly used image evaluation metrics to facilitate later comparisons of the effects of memristor-based hardware digital image processing. " Discussion " section lists current research on the application of memristor-based circuits to various aspects of image processing. Finally, we conclude with a discussion of the prospects for hardware digital image processing and summarize the work of this paper.

Memristor

With the continuous development of big data, the Internet of Things, artificial intelligence, and other technologies, there is an urgent need for a new computing system to deal with dense data. The human brain can process and store data simultaneously, reducing energy consumption and greatly improving computing efficiency. Therefore, building brain-like operations and developing intelligent brain-like devices is an essential breakthrough in AI research [ 49 ]. Researchers at HP Labs experimentally confirmed that memristors are a new type of nonlinear two-terminal nanoscale component with switching characteristics, memory capability, and continuous input–output properties [ 46 ]. Owing to their inherently analog inputs and outputs, memristor-based memories can allow for higher accuracy than conventional binary memories. Compared with dynamic random access memory, memristors maintain their state after power loss, making memristor-based memories non-volatile [ 50 , 51 ]. Notably, the combination of memristors with nanowire crossbar interconnection has become a topic of great interest to researchers [ 52 , 53 ]. The memristor crossbar array structure combines the high storage density, high precision, and fast access speed of memristors with the massively parallel processing of crossbar arrays, giving the structure strong information processing capability and easy compatibility with very-large-scale integrated (VLSI) circuits. Given these advantages, it has broad application prospects in arithmetic operations, pattern comparison, information processing, and virtual reality. This section introduces the memristors commonly applied in hardware architectures for digital image processing and their working mechanisms. At the end of this section, we also summarize the electrical performance of memristors with different structures at the current stage (Table 2 ).

Memristors for image processing

A memristor is a nonlinear resistor with memory capability whose resistance is affected by the amount of charge or magnetic flux that has passed through it. In 1971, Chua [ 45 , 54 ] theoretically proposed the memristor (short for memory resistor) based on a symmetry argument in circuit theory. Memristance (the resistance of a memristor) was defined by Chua as the ratio between the magnetic flux φ and the charge q passing through the memristor, i.e., \(M = {\text{d}}\varphi /{\text{d}}q\) (Fig.  2 a). As φ and q are the time integrals of voltage and current, respectively, the memristance can equivalently be written as

$$M(q(t)) = \frac{{{\text{d}}\varphi /{\text{d}}t}}{{{\text{d}}q/{\text{d}}t}} = \frac{v(t)}{i(t)}$$

figure 2

a Four basic circuit elements and their respective relationships. b A typical hysteresis loop of the memristor. c Diagram illustrating the structure of a neuromorphic crossbar comprised of memristor synapses and CMOS neurons [ 58 ]. d TEM cross-section of the Ta/HfO2/Pt device. Measurements run with the top Ta electrode biased and the bottom Pt electrode grounded [ 59 ]. e Typical I–V curve showing resistive switching behavior, with black arrows indicating device switching direction [ 59 ]. f High- and low-resistance states demonstrated for devices over 120 billion switching cycles at −3.05 V/100 ns RESET and 1.3 V/100 ns SET pulses [ 59 ]. g Retention testing of eight different levels at 150 °C (> 10^4 s) confirmed the non-volatile characteristics and demonstrated the device's suitability for multi-level memory [ 59 ]. h 2^20 enhancement/inhibition epochs were realized, each epoch comprising 39 pulses [ 59 ]. i Device structure and cross-sectional TEM image of the Ag–TiO2 nanocomposite-based memristor [ 35 ]. j Schematic of optically gated, electrically driven synaptic modulation operation [ 35 ]. k I–V curve of a memristor device after 15 min of exposure to visible light [ 35 ]. l Long-term conductance augmentation and inhibition stimulated by 50 positive/negative pulses (± 2 V, 50 ms) [ 35 ]

This equation shows that the unit of M is the same as that of resistance, i.e., ohm (Ω). In 1976, Chua and Kang elucidated the strong dependence of memristive systems on the implementation of state variables and provided a generalized definition of memristive systems derived from memristors [ 54 ], which can be mathematically defined as:

$$v(t) = R\left( {w,i} \right)\,i(t)$$

$$\frac{{{\text{d}}w}}{{{\text{d}}t}} = f\left( {w,i} \right)$$

where w is an internal state variable, and in general R and f are explicit functions of time. If an arbitrary periodic voltage (current) signal is applied to an ideal memristor and the excitation and response are plotted against each other, a slanted "8"-shaped pinched hysteresis loop is obtained, as shown in Fig.  2 b, which Chua used as a landmark criterion for memristive phenomena [ 55 ]. This definition was eventually refined in Chua's later publications [ 56 , 57 ]. This pinched current–voltage ( i  −  v ) hysteresis loop has become the most representative feature of the memristor. The shape of the loop varies with the amplitude and frequency of the input waveform, but its common feature is that in every cycle it passes through, and is pinched at, the origin of the coordinates. Meanwhile, direct experimental support for memristor neuromorphic systems such as spike-timing-dependent plasticity originated from a hybrid system of memristor synapses and CMOS neurons (Fig.  2 c).
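As an illustration of the constitutive relations above, the linear ion-drift model often used to emulate the HP device can be integrated numerically: the resulting voltage–current trace passes through the origin whenever the drive crosses zero, reproducing the pinched-hysteresis signature. The parameter values below are illustrative, not fitted to any reported device:

```python
import math

# Linear ion-drift sketch: M(w) = Ron*w + Roff*(1 - w),  dw/dt = k * i(t).
# Ron/Roff/k and the drive are illustrative placeholder values.
Ron, Roff, k = 100.0, 16e3, 1e4
dt, steps = 1e-5, 2000

w = 0.5          # normalized internal state, clipped to [0, 1]
trace = []       # (v, i) samples of the hysteresis loop
for n in range(steps):
    t = n * dt
    v = math.sin(2 * math.pi * 100 * t)        # 100 Hz sinusoidal drive
    m = Ron * w + Roff * (1.0 - w)             # instantaneous memristance
    i = v / m
    w = min(1.0, max(0.0, w + k * i * dt))     # bounded Euler state update
    trace.append((v, i))
# The (v, i) trace forms a pinched loop: i = 0 exactly when v = 0.
```

A larger drive amplitude or lower frequency widens the loop, matching the frequency dependence noted above.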

Here we focus on two typical types of memristor structures and device performance applied in digital image processing. Jiang et al. reported a Ta/HfO2/Pt memristor (Fig.  2 d) [ 59 ] with low programming voltages (Fig.  2 e), fast switching speeds (≤ 5 ns), high endurance (120 billion cycles) (Fig.  2 f) and reliable retention (> 10 years extrapolated at 85 °C). In addition, potentiation and depression were demonstrated over 2^20 epochs (Fig.  2 h), indicating that the device can be used for multi-level non-volatile memories (Fig.  2 g) and neuromorphic computing applications. Shan et al. developed a plasmonic optoelectronic memristor [ 35 ] (Fig.  2 i) that relies on optical excitation in an Ag–TiO2 nanocomposite film and the effects of localized surface plasmon resonance (LSPR). Fully light-induced and light-gated synaptic plasticity functions were achieved in a single device (Fig.  2 j), including reversible synaptic potentiation/suppression under visible and ultraviolet illumination and modulation of the STDP learning rule (Fig.  2 k, l), which can be utilized for visual sensing and low-level image pre-processing (including contrast enhancement and noise reduction).

Working mechanism

The mechanism lies in the fact that synapses are intrinsically two-terminal devices, which share a striking similarity with memristive devices [ 45 , 46 ]. The advantage of this structure is that it can potentially provide connectivity and functional density comparable to biological systems, rather than operating in the manner of a digital computer [ 60 ]. These devices consist of a simple metal–insulator–metal (MIM) layer structure. The forming process creates localized conducting filaments (CFs), and the formation and movement of these filaments lead to discrete and abrupt resistive switching characteristics [ 51 , 61 , 62 , 63 , 64 ]. Specifically, the switching kinetics dominated by anion migration in semiconductors can be understood as follows. Some mobile oxygen ions exist in the p-type storage medium, as schematically illustrated in Fig.  3 a-i. These mobile oxygen ions migrate toward the top electrode (TE) when it is positively biased and accumulate near the TE, thus creating a large number of cation vacancies (Fig. 3a-ii). Once a full p-type semiconductor conducting filament is formed, the device switches to the low-resistance state (LRS) (Fig. 3a-iii). When the TE is negatively biased, most of the Joule heat is generated at the thinnest part of the conducting filament, greatly accelerating the movement of oxygen ions in that region. Driven by the electric field, the oxygen ions in this region rapidly migrate toward the BE; as a result, the concentration of cation vacancies at the thinnest part of the p-type semiconductor CF is significantly reduced, causing the CF to rupture there, at which point the device is in the high-resistance state (HRS) (Fig. 3a-iv). When two dynamic metal (Pt)/semiconductor (TiOx) junctions are operated in series, a range of device states occurs.

figure 3

a Schematic of anion migration dominated switching kinetics in p-type semiconductors. (i) The initial state with random distribution of mobile oxygen ions. (ii) The nucleation and subsequent growth of p-type CFs composed of cation vacancies from anode to cathode during the forming process. (iii) Full CF LRS in the thinnest region near the cathode. (iv) The thinnest region of the CF portion ruptured by the HRS [ 65 ]. b Schematic of BMThCE-based device, and the chemical structures of the photochromic diarylethene (UV: ultraviolet light; VIS: visible light) [ 35 ]. c I–V characteristics of the BMThCE-based memories ITO/o-BMThCE/Al and d ITO/c-BMThCE/Al [ 35 ]

Slightly different from electrically induced RS memories, the physical mechanisms of optical effects in optical memristors include photovoltaic effects and light-induced chemical reactions/configuration changes. Photovoltaic effects typically involve the creation of free carriers, the separation of photogenerated electron–hole pairs, and the generation of voltages or currents from incident photons [ 66 ]. The separation of electron–hole pairs is highly correlated with the Schottky barrier between metal and semiconductor or with the internal electric field induced at a heterojunction interface [ 67 ]. This causes the holes to move toward the positive electrode and the electrons toward the negative electrode, which subsequently extracts charge to the external circuit and generates an open-circuit voltage. Photochemical reactions entail photon absorption, which excites molecules and causes chemical changes such as ionization and isomerization [ 68 ] (Fig.  3 b). The photo-induced switching behavior is tightly linked to conformational changes within the photoactive material, which may alter chemical bonds and energy bands. The photo-induced transition between conformational structures has a remarkable impact on the RS type as the energy levels change, which can greatly modulate the device performance in a precise and energy-efficient manner (Fig.  3 c,d).

Reportedly, memristors respond to both light and electrical stimuli [ 69 , 70 , 71 , 72 ]. Neuromorphic computing implementations spanning the electrical and optical domains require combining the integrated processing power of the electrical domain with the low energy consumption and high bandwidth of the optical domain. Owing to their particular characteristics, memristors can act as both state modulators and photodetectors, capable of processing both electrical and optical signals. Common methods to realize synaptic or neuronal behavior include modulating the memristor state, i.e., its resistance or optical transmittance, with electrical and optical programming signals. In addition, the programming input and readout signals can lie in different domains, enabling direct conversion of optoelectronic signals, which is extremely attractive: for example, an electrical (optical) signal can change the optical (electrical) signal in a state modulator (photodetector).

Image quality assessment metrics

Image quality assessment metrics play an important role in various image processing applications. Digital images suffer various distortions during acquisition, processing, compression, storage, transmission, and reproduction, any of which may lead to a degradation of visual quality. Image quality assessment metrics are used to optimize the algorithms and parameter settings of image processing systems, to benchmark them, and to dynamically monitor and adjust image quality. Two types of metrics exist for assessing image quality: subjective and objective. They are briefly described below.

Subjective image quality assessment metrics

Subjective assessment, also called subjective evaluation, judges the quality of an image through the subjective perception of a human observer, and thus most faithfully reflects human visual perception. Common subjective evaluations are absolute and relative. In the former, observers rate the image to be evaluated against the original image; in the latter, observers compare the given images based on their own subjective impressions, without any reference. For both methods, the final evaluation score is the average of the individual scores.

The subjective evaluation criterion uses the Mean Opinion Score (MOS):

$${\text{MOS}} = \frac{{\sum\nolimits_{k = 1}^{K} {S_{k} N_{k} } }}{{\sum\nolimits_{k = 1}^{K} {N_{k} } }}$$

where \(k \in \left\{ {1,2, \ldots ,K} \right\}\) is the evaluation level, S k is the evaluation score corresponding to level k , and N k is the number of evaluators choosing that level.
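A minimal computation of the MOS described above; the 5-level scale and panel counts are made-up illustrative values:

```python
def mean_opinion_score(scores, counts):
    """MOS: average of level scores S_k weighted by observer counts N_k."""
    assert len(scores) == len(counts)
    total = sum(counts)
    return sum(s * n for s, n in zip(scores, counts)) / total

# Hypothetical 5-level scale (1 = bad ... 5 = excellent), panel of 10 viewers
S = [1, 2, 3, 4, 5]
N = [0, 1, 2, 4, 3]
mos = mean_opinion_score(S, N)
```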

Objective image quality assessment metrics

Unlike the subjective assessment of images, objective evaluation assesses image quality by establishing a mathematical model that scores aspects such as texture, sharpness, and focus and computes a result approximating the human eye's subjective perception of the image. According to whether a corresponding reference image is available, it can be divided into full-reference, reduced-reference, and no-reference image quality assessment methods [ 87 ]. This section presents several common objective image quality assessment metrics, as follows.

Mean Square Error (MSE): the expected value of the squared difference between the true and estimated values of a parameter. Assume the reference image is f , the image to be measured is g , and both images have size M  ×  N . With the grayscale values of the pixels denoted f ( i , j ) and g ( i , j ), the mean squared error can be expressed as:

$${\text{MSE}} = \frac{1}{MN}\sum\limits_{i = 1}^{M} {\sum\limits_{j = 1}^{N} {\left[ {f(i,j) - g(i,j)} \right]^{2} } }$$

Peak Signal to Noise Ratio (PSNR): the ratio of the maximum possible power of a signal to the power of the noise. The larger the value, the smaller the distortion. The PSNR is calculated as:

$${\text{PSNR}} = 10\log_{10} \frac{{\left( {2^{n} - 1} \right)^{2} }}{{{\text{MSE}}}}$$

where \(2^{n} - 1\) is the maximum pixel value (255 for 8-bit images).

Structural Similarity (SSIM): a well-known quality metric developed by Wang et al. [ 87 ] for measuring the similarity between two images, thought to correlate with the perceptual quality of the HVS. SSIM models any image distortion as a mixture of three factors: loss of correlation, contrast distortion, and luminance distortion. The SSIM is defined as:

$${\text{SSIM}}(f,g) = l(f,g) \cdot c(f,g) \cdot s(f,g)$$

with

$$l(f,g) = \frac{{2\mu_{f} \mu_{g} + C_{1} }}{{\mu_{f}^{2} + \mu_{g}^{2} + C_{1} }},\quad c(f,g) = \frac{{2\sigma_{f} \sigma_{g} + C_{2} }}{{\sigma_{f}^{2} + \sigma_{g}^{2} + C_{2} }},\quad s(f,g) = \frac{{\sigma_{fg} + C_{3} }}{{\sigma_{f} \sigma_{g} + C_{3} }}$$

Note that C 1 , C 2 , C 3 are positive constants that prevent the denominators from being 0, and σ fg is the covariance between f and g . The first factor is the luminance comparison function, which indicates the proximity of the average brightness of the two images ( μ f and μ g ); it attains its maximum of 1 only if μ f  =  μ g . The second is the contrast comparison function, which measures how close the contrasts of the two images are, with contrast measured by the standard deviations σ f and σ g ; its maximum value of 1 is reached only when σ f  =  σ g . The last is the structure comparison function, representing the correlation coefficient between the two images f and g . Hence, the positive value range of SSIM is [0, 1], where 1 means f  =  g and 0 means no correlation between the images.
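The three metrics above can be sketched directly in NumPy. Note that this SSIM uses a single global window for brevity; the reference implementation of Wang et al. averages SSIM over local sliding windows, so values will differ on real images:

```python
import numpy as np

def mse(f, g):
    """Mean squared error over an M x N grayscale image pair."""
    f, g = f.astype(float), g.astype(float)
    return np.mean((f - g) ** 2)

def psnr(f, g, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    e = mse(f, g)
    return float('inf') if e == 0 else 10.0 * np.log10(peak ** 2 / e)

def ssim(f, g, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Global single-window SSIM (luminance * contrast/structure terms,
    with C3 = C2/2 folded in, as in the common two-constant form)."""
    f, g = f.astype(float), g.astype(float)
    mu_f, mu_g = f.mean(), g.mean()
    var_f, var_g = f.var(), g.var()
    cov = ((f - mu_f) * (g - mu_g)).mean()
    return ((2 * mu_f * mu_g + c1) * (2 * cov + c2)) / \
           ((mu_f ** 2 + mu_g ** 2 + c1) * (var_f + var_g + c2))

# A gradient image and a uniformly offset copy as a toy distortion
ref = np.tile(np.arange(16, dtype=float), (16, 1))
noisy = ref + 1.0
```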

Applications of memristor in digital image processing

Memristors have been widely employed to simulate artificial synapses because of their complex analog behavior since the rediscovery of the reversible resistive switching effect. Meanwhile, memristors can also be integrated with CMOS logic devices to serve as programmable switches [ 88 ], logic units [ 89 , 90 ], etc. The development of CMOS technology has allowed the large-scale integration of integrate-and-fire (I&F) neurons on a single chip [ 91 , 92 , 93 , 94 ]. To overcome the need for CMOS circuits with numerous transistors and high power consumption, memristor (memory + resistor) devices have been successfully used as synapses with low power consumption and high speed [ 95 , 96 , 97 , 98 ]. The computing and storage capabilities exhibited by the memristor crossbar array are expected to relieve the von Neumann bottleneck, save area and energy, and increase computing speed, which can be exploited in image processing and neural networks.

Image logical operation

Image logic operations, also known as image Boolean operations, are performed pixel-wise between two or more images [ 99 ], and the grayscale distribution of the result differs from that of the participating images; the main operations are AND, OR, NOT, and XOR. In image processing, AND and OR usually act as templates to extract sub-images from an image. The NOT operation inverts an image, thereby enhancing it. The XOR operation is applied to encrypt and decrypt an image. These operations are commonly employed in the pre-processing stages of complex image processing tasks such as image segmentation, target detection, and recognition.
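The four Boolean operations just described reduce to pixel-wise bit operations, e.g. in NumPy on two small binary images:

```python
import numpy as np

# Two tiny binary (0/255) images standing in for mask and source
a = np.array([[0, 0, 255], [255, 255, 0]], dtype=np.uint8)
b = np.array([[0, 255, 255], [255, 0, 0]], dtype=np.uint8)

and_img = np.bitwise_and(a, b)   # template/mask extraction
or_img = np.bitwise_or(a, b)     # union of foreground regions
not_img = np.bitwise_not(a)      # inversion: 255 - pixel for 8-bit images
xor_img = np.bitwise_xor(a, b)   # XOR with a key image encrypts;
                                 # XOR-ing again with the same key decrypts
```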

A class of advanced vision microprocessors was previously integrated on small hardware platforms based on cellular neural networks (CNNs) [ 100 ], general-purpose computers [ 101 ], and CMOS light-detection matrices. In these intelligent sensor arrays, photodetector-acquired data were processed under a set of local and global rules applied simultaneously and equally to each cell's neighborhood, endowing CNN general-purpose machines with massively parallel computing capabilities. Although these advanced processors handled images at a relatively fast speed, their spatial resolution is rather low because of the limited number of processing units that can be integrated within the circuit area.

To address the above problems, Zhou et al. [ 102 ] proposed a memristor-based architecture combining memory and image processing functionalities. Unlike conventional memory systems, the architecture can perform image logic operations with the assistance of extra memristors; the system is composed of a memory array, a computing array, an analog voltage generator, an address system, and a read/write system. They considered the NOT, AND, OR, and XOR operations on images. In Fig.  4 a, b, the experimental results showed that the architecture is functionally correct and can considerably reduce memory accesses compared with the scheme proposed in [ 103 ]. However, this architecture has the drawback that the operating memristor must be reset before the next image processing operation; how to correctly select the operating memristor remains an open problem.

figure 4

a Resulting image of the XOR operation stored in memristor memory [ 102 ]. b Images used in logic operations [ 102 ]. c Schematic of proposed memristor-based one-bit approximate full adder [ 104 ]. d Circuit implementation of the M-CNN processing element C ( i , j ) ( i  ∈ {1, …, M}, j  ∈ {1, …, N}). The capacitor, memristor, and resistor are unvaried from cell to cell, i.e., Cx( i , j ) = Cx, mx( i , j ) = mx, and Ry( i , j ) = Ry [ 105 ]. e Input binary image and output binary image visualizing the data stored in the memristances at steady state [ 105 ]

Subsequently, Muthulakshmi et al. proposed a plausible approach [ 104 ] to approximate computing with memristors, designing an 8-bit Ripple Carry Adder (RCA) that performs bitwise pixel addition of two grayscale images of the same size, and compared the design with images obtained by the exact addition method. The delay of the one-bit accurate and approximate adders was found to be 32.56 ns and 0.2316 ns, respectively, demonstrating a significant reduction. Figure  4 c sketches a one-bit approximate full adder based on the memristor. For the Lena image, parameter values such as Structural Content (SC), MSE, PSNR, Mean Absolute Error (MAE), and Normalized Absolute Error (NAE) were found to be low, as all bits were approximated under the worst-case scenario, in contrast to the findings of Almurib et al. [ 106 ].
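The logical function realized by such an adder, bitwise ripple-carry addition of two pixel values, can be sketched as follows. This models only the exact arithmetic; the specific per-bit simplifications of the approximate memristor design in [104] are not reproduced here:

```python
def full_adder(a, b, cin):
    """Exact one-bit full adder: returns (sum bit, carry-out bit)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_carry_add(x, y, bits=8):
    """8-bit ripple-carry addition of two grayscale pixel values.
    The carry from each bit position feeds the next, as in the RCA;
    the result wraps at 8 bits, as in fixed-width hardware."""
    carry, result = 0, 0
    for k in range(bits):
        s, carry = full_adder((x >> k) & 1, (y >> k) & 1, carry)
        result |= s << k
    return result & ((1 << bits) - 1)

total = ripple_carry_add(100, 57)
```

An approximate adder would replace `full_adder` at the low-order bit positions with cheaper logic, trading PSNR for delay and energy.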

In recent years, Ronald Tetzlaff et al. proposed a new memristive computing paradigm based on CNNs [ 105 ], laying a theoretical foundation for memristor-CNNs (M-CNNs). In this paradigm, inserting non-volatile memristors endows the dynamic array with the ability to store data into, and retrieve data from, resistively switching memristors, reducing the need for additional memory modules. Figure  4 d shows the circuit diagram of the M-CNN cell: the left side of the circuit is an independent power supply, and the original linear resistor is replaced by a non-volatile memristor. Figure  4 e shows the binary raw input image and the encoded image stored in the memristors, which is the complement of the original image. Images can thus be encoded in the memristor cellular array, opening a new direction for future image processing. Nevertheless, the proposed framework relies on combining memristors with logic-operation algorithms rather than on a system-level program for an entire memristor network, and it is not generalized to arbitrary digital image processing; it nonetheless provides a useful reference for memristor-based hardware digital image processing.

Image compression

Image compression is an important image processing technique that eliminates redundant data and converts captured data into a manageable size for efficient processing and transmission. It is achieved by reducing the spatial, spectral, and temporal redundancy in the data, a process that requires many matrix operations. The discrete wavelet transform (DWT) and discrete cosine transform (DCT) are widely used for image compression. DCT-based compression retains the low-frequency signals, which carry the main information content, and discards the high-frequency signals, which encode image details and edges. Hong et al. [107] proposed one-dimensional DCT (1D-DCT) and inverse DCT (1D-IDCT) computational circuits from an analog-circuit perspective. A 2D circuit was built from the 1D circuit using parallel processing (Fig. 5a); circuit simulations showed an average computational accuracy above 98.4%, with the average computation time reduced from milliseconds in MATLAB to the microsecond level. The proposed 2D circuit was then applied to image compression. Because the complexity of a full-image DCT is high, the common practice is to divide the image into blocks, apply the DCT and inverse DCT within each block, and then recombine the blocks, which improves the efficiency of the transform; 8 × 8 blocks are mostly used, since larger blocks reduce blocking artifacts but raise complexity. The compression effect was found to be significant at high speed.
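The block-DCT pipeline described above can be sketched numerically. This is a software illustration of the principle (orthonormal DCT-II, discard small coefficients, inverse transform), not the analog circuit of [107]:

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix D, so the 2D DCT of block B is D @ B @ D.T."""
    k = np.arange(n)
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    D[0, :] = np.sqrt(1.0 / n)
    return D

def compress_block(block, keep=16):
    """2D DCT of a square block, keep only the `keep` largest-magnitude
    coefficients (low frequencies dominate), then inverse-transform."""
    D = dct_matrix(block.shape[0])
    C = D @ block @ D.T
    thresh = np.sort(np.abs(C).ravel())[-keep]
    C[np.abs(C) < thresh] = 0.0          # discard high-frequency detail
    return D.T @ C @ D                   # inverse DCT (D is orthogonal)

rng = np.random.default_rng(0)
block = rng.uniform(0, 255, (8, 8))
rec = compress_block(block.copy(), keep=64)   # keeping all 64 coeffs is lossless
print(np.max(np.abs(rec - block)))            # ~0 up to floating-point error
```

Lowering `keep` trades reconstruction error for compression, which is exactly the knob the 15%-of-spectrum experiment in Fig. 5e, f turns.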

figure 5

a Circuit design of DCT and IDCT [107]. b Schematic diagram of the vector-matrix multiplication (VMM) operation. Multiplication is performed by Ohm's law, where the injected current is the product of the voltage applied across the row and the conductance of the crossing cell, and the currents on each column are summed according to Kirchhoff's current law. The total current from each column is converted to a voltage by a transimpedance amplifier (TIA), which also provides a virtual ground for the column wires [108]. c The original image used for compression was input into the crossbar for two-dimensional (2D) DCT block by block. The white arrow shows the block processing sequence. The lower image shows a representative image block to be processed [108]. d Left: Image blocks were converted to voltages applied to the row lines of the crossbar, with neighboring lines carrying voltage pairs of the same amplitude, representing pixel intensities, but opposite polarity. Right: Differential DCT written to a 128 × 64 array, with a small number of stuck "on" or "off" memristors visibly disrupting the pattern [108]. e Images decoded from the 2D DCT by software and f experiment. Before decoding, only the frequencies representing the top 15% of the spectral intensity (20:3 compression ratio) were retained [108]. g (i) Reference image. (ii) Images obtained using the direct mapping in [108, 109, 110]. (iii) Images obtained using the proposed 2D DCT reconstruction [111]. h Schematic illustration of a proposed physical crossbar array implementation and read circuitry [112]. i From top to bottom: the initial 50 k-byte image as the simulated input, the intermediate image representing fuzzy-logic-level processing, and the final 25 k-byte image after mapping back to the binary bitmap [112]

DCT has superior energy-compaction performance, but its full calculation process is more complicated, which increases the computational burden. Compared with DCT, DWT exhibits a higher peak signal-to-noise ratio (PSNR) and faster image compression. However, traditional image compression methods, such as JPEG2000, require complex hardware to implement. Reducing computing energy and circuit area while preserving image quality has therefore become a research hotspot.

Li et al. proposed large-scale memristor crossbars for analog computing [108, 110], achieving image compression with an array of up to 128 × 64 hafnium oxide (HfO2) memristors [59] with sufficient accuracy, speed, and energy efficiency to realize analog vector-matrix multiplication. The proposed memristor array structure is presented in Fig. 5b: the researchers adopt a "1T1R" scheme, i.e., each cell monolithically integrates a memristor on top of a metal-oxide-semiconductor transistor that serves as an access device, allowing the conductance of each memristor in the crossbar to be precisely adjusted. The image to be compressed was input into the array for pre-processing (Fig. 5c), with voltages corresponding to the memristor conductance values, and vector-matrix multiplication was performed (Fig. 5d); the compression results of software and hardware are compared in Fig. 5e, f. The advantage of this framework is that the memristor crossbar VMM can directly process the analog signal acquired from a sensor, without additional peripherals such as analog-to-digital converters (ADCs) and their time and energy overhead. In addition, if only specific features need to be detected in the signal, it can provide threshold gating at considerably lower latency and energy cost. This flexibility, together with low latency and high energy efficiency, makes analog crossbar computing attractive for diverse edge and IoT computing.
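The analog VMM of Fig. 5b can be emulated numerically. Signed matrix entries are encoded as differential pairs of non-negative conductances on neighboring lines (as in Fig. 5d), and the column currents follow Ohm's and Kirchhoff's laws; the mapping below is a generic sketch, with `g_max` an assumed device limit:

```python
import numpy as np

def crossbar_vmm(weights, v_in, g_max=1e-4):
    """Emulate a memristor-crossbar vector-matrix multiply.

    Signed weights are split into a differential pair of non-negative
    conductances (G+ on one line, G- on its neighbor); the column currents
    I = G+.T @ v - G-.T @ v are what the TIAs would read out.
    """
    scale = g_max / np.max(np.abs(weights))
    g_pos = np.clip(weights, 0, None) * scale    # conductances for positive parts
    g_neg = np.clip(-weights, 0, None) * scale   # conductances for negative parts
    i_out = g_pos.T @ v_in - g_neg.T @ v_in      # Kirchhoff current summation
    return i_out / scale                         # rescale back to weight units

W = np.array([[1.0, -2.0], [0.5, 3.0]])
v = np.array([0.2, 0.1])
print(crossbar_vmm(W, v), W.T @ v)  # the two should agree
```

Adding noise to `g_pos`/`g_neg` before the multiply is a simple way to study the device-variability effects the later sections discuss.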

To overcome the vulnerability of serial computations to errors, Zhang et al. [111] fundamentally rethought how to implement image compression using resistive crossbar arrays (RCAs). The key idea is to reorganize the computation so that it natively matches the characteristics of the underlying resistive hardware, while the employed spectral optimization, quantization optimization, and 2D DCT reconstruction techniques improve robustness to errors for high-speed, efficient small-module processing. Simulation results showed that image quality was significantly improved (Fig. 5g), while latency and power were reduced by 21% and 62%, respectively, facilitating large-scale utilization of RCAs where cost reduction is required.

We compared multiple image compression techniques on two datasets (the Berkeley segmentation dataset and a standard dataset) in terms of quality metrics (MSE, PSNR, and SSIM), compression ratio, latency, power, and area, as shown in Table 3. Here, D denotes direct mapping [108, 110]; D-P denotes the D method with a pipelined implementation that maximizes throughput [109]; R denotes the proposed framework applied only with 2D DCT reconstruction; RF extends R with spectral optimization; and RFQ extends RF with hardware-friendly quantization. The normalized performance is shown in bold in the table; according to Eqs. (5), (6), (7), high image quality corresponds to lower MSE and higher PSNR and SSIM. The tabulated results show that the proposed model yields image quality nearly indistinguishable to the human eye, despite a slightly higher MSE and slightly lower PSNR and SSIM. Compared to previous work [108, 110], image quality is improved while latency and power consumption are reduced by 51% and 24%, or by 3% and 61%, respectively.

Currently, the integrated circuits that perform mathematical operations in artificial visual perception and image processing are mainly built from traditional digital logic gates. However, Boolean logic is not the best fit for brain-like computing, given the ambiguity inherent in biological neural networks. Previous studies have exploited the optoelectronic properties of memristors, using a single optically gated memristor to build logic gates realizing OR and AND operations; however, the important NOT operation is missing from these gates and requires complicated measures to perform. Berco et al. proposed a programmable photo-memristor gate [112] that can compress images immediately during acquisition, with no additional memory modules required (Fig. 5h, i). This design significantly reduces the number of processors and memories and the time spent on their interactions. The smallest module of the designed structure consists of two memristors and a resistor, used as building blocks in the design and simulation of a matrix multiplication unit that applies logical operations (NIMPLY-AND) to achieve effective in-situ image compression. However, the framework only performed single-channel image processing, and its effectiveness for more complex images remains unclear. The photoelectric properties of memristors demonstrated here nonetheless offer a new way of thinking about intelligent vision.
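For reference, the NIMPLY operation named above (material nonimplication) is easy to tabulate; the snippet below shows its truth table and how it supplies the NOT operation that OR/AND-only gate sets lack. This illustrates the logic primitive, not the device-level gate of [112]:

```python
def nimply(a, b):
    """Material nonimplication: true only when a is 1 and b is 0."""
    return a & (b ^ 1)

# Truth table; note that nimply(1, b) == NOT b, which is exactly the
# missing NOT operation mentioned above.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, nimply(a, b))
```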

Image segmentation

Image segmentation, the separation of target regions in an image from the remaining regions, is a crucial pre-processing step for image recognition and computer vision. Edge detection has become a dominant approach to segmentation because boundaries are distinguished by their differing gray levels. Edge information is frequently used in image analysis, recognition, and understanding. Edge detection and extraction are therefore particularly important in image processing, with common applications in medical imaging, face and fingerprint recognition, traffic control systems, and so on.

Image edge detection and extraction contribute to clinical diagnosis. To address the shortcomings of traditional medical image fusion algorithms, Zhu et al. [34] constructed a memristive pulse-coupled neural network (M-PCNN) for medical image processing; the memristive threshold-generator circuit is shown in Fig. 6a. The principle is that when an image is input to the M-PCNN, spiking neurons transmit stimuli to neighboring neurons and impel them to release pulses [114], detecting grayscale mutations at edges and thereby enabling edge detection. The edges obtained with the M-PCNN in medical image edge extraction were clearer and richer (Fig. 6b). In addition, integrating memristors into PCNNs significantly reduces the network's size while giving it biological functionality, which may facilitate hardware implementations of neural networks. Although the M-PCNN core can simultaneously exploit specific linear additivity and nonlinear multiplicative coupling, bringing the network closer to a biological neural network, the peripheral circuits must be redesigned for each image processing task, and their impact on each processing method cannot be ignored.
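The pulse-coupling mechanism described above can be sketched as one iteration of a simplified PCNN. The parameter values and the periodic boundary handling are illustrative assumptions, not the circuit constants of [34]:

```python
import numpy as np

def pcnn_step(S, F, L, theta, Y, beta=0.2, aF=0.1, aL=0.3, aT=0.4,
              VF=0.5, VL=0.2, VT=20.0):
    """One iteration of a simplified pulse-coupled neural network.

    S: input image (stimulus); Y: binary pulse map from the previous step.
    Neighbor pulses arrive through a 3x3 linking kernel, so a firing neuron
    pushes its neighbors toward firing, the mechanism that flags grayscale
    mutations at edges. Periodic boundaries are used for brevity.
    """
    K = np.array([[0.5, 1.0, 0.5], [1.0, 0.0, 1.0], [0.5, 1.0, 0.5]])
    P = np.zeros_like(Y, dtype=float)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            P += K[dy + 1, dx + 1] * np.roll(np.roll(Y, dy, 0), dx, 1)
    F = np.exp(-aF) * F + VF * P + S          # feeding input
    L = np.exp(-aL) * L + VL * P              # linking input
    U = F * (1.0 + beta * L)                  # internal activity
    Ynew = (U > theta).astype(float)          # pulse when activity crosses threshold
    theta = np.exp(-aT) * theta + VT * Ynew   # dynamic threshold recharges after firing
    return F, L, theta, Ynew

S = np.full((4, 4), 1.0)                      # uniform bright stimulus
F = np.zeros((4, 4)); L = np.zeros((4, 4))
theta = np.full((4, 4), 0.5); Y = np.zeros((4, 4))
F, L, theta, Y = pcnn_step(S, F, L, theta, Y)
print(Y)  # every neuron fires on the first step, then its threshold recharges
```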

figure 6

a Circuit diagram of the proposed M-PCNN structure and the memristive threshold-generator circuit [34]. b Extraction of edges of CT images using different methods, in order from top to bottom and left to right: source image, LoG operator, Canny operator, and the proposed M-PCNN with different memristor parameters [34]. c (i) Flow-based computing with crossbar circuits. (ii) Crossbar design. (iii) Crossbars for edge detection on input-aware pixel pairs (median PSNR = 6 dB) [113]. d The input grayscale image, the computed edge image, and the output edge image obtained via majority-based combination of approximately correct input-aware crossbar outputs, respectively [113]. e Schematic of the 3D circuits composed of high-density staircase output electrodes (blue) and pillar input electrodes (red). Side view of 3D row banks and column side showing unique staircase electrodes. Each row bank in the 3D array operates independently [36]. f Comparison between the hardware and software edge detection of video frames [36]

Chakraborty et al. [113] explored the design of input-aware, approximately correct flow-based crossbar circuits, producing multiple 8 × 8 crossbar circuits (Fig. 6c-iii) in two groups: one performs approximate edge detection for an application-specific subset of input values, and the other executes threshold-based edge detection for all possible pixel pairs with high accuracy (~85%). Outputs from the individual crossbars are combined using a majority function to yield the final output image. Figure 6c-i depicts flow-based computation of the simple Boolean formula "a AND b" [115]: the data are applied to a two-dimensional array of nanoscale memristors, and the current through the crossbar performs the desired computation; current flows from the rightmost nanowire to the leftmost if and only if "a AND b" is true. An example crossbar realizing the Boolean formula ¬A∧¬B∧¬C is shown in Fig. 6c-ii, where a green circle indicates a memristor in the ON state, a gray circle a memristor in the OFF state, and a blue circle takes the value of a literal. The team tested the edge-extraction performance of the architecture on the BSD500 database [116], using the PSNR metric to assess output quality, and found that the results (Fig. 6d) of the input-aware approximate computation were significantly better than those of the more accurate general-purpose crossbars. Although the approximate crossbar array is effective in terms of accuracy and overall edge-extraction quality, it adopts standard peripheral circuits without exploring their effect, even though the peripherals affect the method's overall efficiency and correctness.

Notably, most existing memristive systems are based on 2D arrays. As the units are connected only horizontally and vertically, such 2D designs sometimes cannot accommodate the complex topology of CNNs. Li et al. designed a 3D memristor circuit implementing a complex neural network, shown in Fig. 6e [36], successfully extracted fine edge features using the 3D array, and again obtained comparable results between software and hardware implementations of the kernel operations (Fig. 6f). Despite the variability inherent in memristors, the actual processing results are comparable to software, while offering pixel-level parallelism. The 3D array can be further expanded for parallel processing across different pixels, channels, and filters over multiple convolutional layers. Compared to its 2D counterpart, this structure can conduct all computations in real time and can be vertically integrated directly with a 2D image sensor array, providing a significant speedup when running complex neural network models. This promises application to cloud-edge processors in IoT networks.

The randomness of ion transport in traditional oxide-based memristors introduces variability into the system, which makes it challenging for CNNs to operate in memristor arrays and affects learning accuracy. To overcome this challenge, researchers have developed memristors with new structures, such as 2D-material and 3D devices. Li et al. proposed a 2D heterostructure memristor array [37]; owing to the unique physical properties of 2D materials, such memristors exhibit better scalability. The team confirmed that the nine memristors in a 3 × 3 crossbar (Fig. 7a) achieved a uniform and consistent five-state map by adjusting the compliance current. The intensities of the original image were converted to voltages and input to the memristor array for convolution to extract the image edges, as shown in Fig. 7c, where the hardware processing results closely match the software results. Figure 7b shows further processing results, such as Gaussian softening, sharpening, and embossing. These demonstrate the potential of CNNs operating in memristor arrays.
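The Prewitt-kernel edge extraction performed by the crossbar can be sketched as a plain 2D correlation; on hardware each kernel tap would be a conductance, here the multiply-accumulate is computed numerically:

```python
import numpy as np

# Prewitt kernels for vertical (X) and horizontal (Y) edges.
PREWITT_X = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=float)
PREWITT_Y = PREWITT_X.T

def conv2_valid(img, k):
    """'Valid' 2D correlation: slide the 3x3 kernel over the image and take
    the multiply-accumulate at every position, i.e. the MAC a crossbar performs."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(img[i:i + 3, j:j + 3] * k)
    return out

img = np.zeros((6, 6)); img[:, 3:] = 255.0     # vertical step edge
gx = conv2_valid(img, PREWITT_X)               # responds strongly to the edge
gy = conv2_valid(img, PREWITT_Y)               # flat: no horizontal edge
print(np.abs(gx).max(), np.abs(gy).max())
```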

figure 7

Convolutional image processing implemented using the PdSeOx/PdSe2 memristor crossbar array. a The whole process of convolutional image processing using the memristor crossbar array. b Results of image processing in five states implemented by adjusting the compliance current, mapping the weights to [−4, 4]. c Hardware- and software-processed vertical and horizontal edge extractions. The Prewitt kernels are for horizontal and vertical edge detection [37]

Image enhancement and restoration

Enhancement and fusion

Image enhancement is a major branch of digital image processing. Many images are captured with poor visual quality because of the environment and other conditions, and image enhancement techniques improve them for human viewing, for example by extracting characteristic parameters of target objects from digital images or highlighting certain features of targets, which benefits the tracking, recognition, and understanding of targets in images. Image enhancement technology has gradually become involved in many aspects of human life and industrial production, such as aerospace, biomedicine, manufacturing, and public safety. To obtain good performance, traditional image enhancement algorithms, such as the dehazing algorithms of Tan and Oakley [117], Tan [118], He et al. [119], Tarel et al. [120], Nishino et al. [121], Meng et al. [122], and Sulami et al. [123], must pay a relatively heavy computational cost.

To avoid as far as possible the complex calculations that image enhancement generates, Zhu et al. [31] introduced memristor arrays into the enhancement algorithm, subtly processing images twice via the inherent properties of memristors. The algorithm uses a coarse transmission map together with the nonlinear memristor characteristic, greatly reducing computational cost, and image-quality evaluation shows performance comparable to the classical algorithm (Fig. 8a). In addition, memristor-based image enhancement (MIEA) was found to be more efficient than the classical algorithm in computational complexity and fine-transmission-map speed: it takes 0.047 s on an Intel i7-9700K CPU (14 nm), a 90% reduction in execution time compared with the 0.542 s of [119]. In [39, 48], the computational energy efficiency exceeds that of today's graphics processing units by two orders of magnitude. The presented processing fully exploits memristor features, but the whole framework remains essentially algorithmic; it cannot be considered complete hardware-based digital image processing, because the memristor-based pre-processing is only one step in the overall structure.
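The "coarse transmission map" above comes from dehazing: a minimal software sketch in the spirit of the dark-channel prior of He et al. [119] is shown below. The patch size, the `omega` weight, and the simplistic choice of atmospheric light are illustrative assumptions:

```python
import numpy as np

def coarse_transmission(img, omega=0.95, patch=3):
    """Coarse transmission map via a simplified dark-channel estimate:
    t = 1 - omega * dark_channel(I / A).
    img: HxWx3 float array in [0, 1]; atmospheric light A is taken as the
    per-channel maximum for brevity."""
    A = img.reshape(-1, 3).max(axis=0) + 1e-6
    norm = (img / A).min(axis=2)               # per-pixel channel minimum
    h, w = norm.shape
    dark = norm.copy()
    r = patch // 2
    for i in range(h):                          # local-minimum (erosion) filter
        for j in range(w):
            dark[i, j] = norm[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1].min()
    return 1.0 - omega * dark

def dehaze(img, t0=0.1):
    """Invert the haze model J = (I - A) / t + A, with t floored at t0."""
    A = img.reshape(-1, 3).max(axis=0)
    t = np.clip(coarse_transmission(img), t0, 1.0)[..., None]
    return np.clip((img - A) / t + A, 0.0, 1.0)

rng = np.random.default_rng(0)
hazy = rng.uniform(0.3, 1.0, (8, 8, 3))        # bright, low-contrast patch
out = dehaze(hazy)
```

The expensive parts here are exactly the per-pixel minimum filtering and division, which is the workload the memristor array is used to offload.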

figure 8

a MIEA overview: red route: fine-tuning of the memristor crossbar array using a rough image; blue route: second fine-tuning of the architecture based on the original image. The final image was derived from current normalization [31]. b Structure of the device and the fabricated 32 × 128 memristor array [32]. c Memristive array hardware system applied to image processing. (i) The original DCT matrix; (ii) array read currents after processing by the original DCT matrix; (iii) the programming error matrix of (ii) [32]. d Specific image processing flow with Ag2S memristor arrays: encoding 3 × 3 convolution kernel values as array inputs, recording post-synaptic currents from the bottom electrodes after the multiply-accumulate (MAC) operation as outputs, and mapping them to image grayscale values [33]. e Group 1: sharpening operation. (i) Result of software-based simulation; hardware outputs of the filament-type memristor (FTM) (ii) and interface-type memristor (ITM) (iii). Group 2: softening operation: (iv) software simulation result; hardware outcomes of FTM (v) and ITM (vi) [33]. f The fusion structure of the NSCT-based M-PCNN [34]. g Source images: left: CT image; right: MRT image [34]. h Comparison of M-PCNN (i) and PCNN (ii) fusion results. From left to right: the fused images, the difference between the fused image and the CT image, and the difference between the fused image and the MRT image [34]

Later, Zhang et al. [32] proposed an array-level enhancement method that uses flexible combinations of multiple arrays to handle different layers of varying accuracy importance. 4096 1T1R cells, arranged as 32 × 128, were fabricated, as shown in Fig. 8b; this memristor array exhibits measured multi-level characteristics. The discrete cosine transform matrix was matched to the size of the memristor array, each matrix element was mapped to the array, voltages were applied to the rows, and the outputs were generated by accumulating the currents of each column. Comparing the original discrete matrix with the current matrix yields the programming error (Fig. 8c). With the transformation matrix stored in the array, the image-processing performance was sensitive to conductance variations, which suggests that the array-level enhancement method can reduce programmed multi-level data variation.

Zhu et al. [33] applied Ag2S flexible memristors to digital image sharpening and blurring, subtly exploiting the switching mechanism of the device's different interface resistances. The processing principle was that the original image pixel values (0 to 255) were linearly mapped to read voltages (0 to 25.5 mV in amplitude), and two convolution kernels were mapped into the crossbar (Fig. 8d) to modulate the device conductances. The structure shares a common bottom electrode, so that two kinds of currents could be collected: the summed currents and the output currents generated by multiplying voltage by conductance. The grayscale images of the final processed results (Fig. 8e: software (i and iv) and hardware (ii, iii, v, and vi)) were generated from the output currents. Evidently, the sharpening operation made the outline of the "horse" clearer, while the softening operation blurred the outline and its surroundings. However, this method inevitably relies on convolution, which increases computational complexity, requires multiple iterations of training, and consumes more space for larger convolution kernels.
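The pixel-to-voltage mapping and the two operations can be sketched as follows; the specific kernels (a Laplacian-style sharpener and a box blur) are typical choices assumed for illustration, not necessarily the ones programmed into the Ag2S array:

```python
import numpy as np

def pixel_to_voltage_mV(p):
    """Linear mapping used above: pixel 0..255 -> read voltage 0..25.5 mV."""
    return p * 25.5 / 255.0     # i.e. 0.1 mV per gray level

# Assumed example kernels; both sum to 1, so flat regions are preserved.
SHARPEN = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=float)
SOFTEN = np.full((3, 3), 1.0 / 9.0)   # box blur

def apply_kernel(img, k):
    """3x3 'valid' correlation with clamping to the grayscale range."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(img[i:i + 3, j:j + 3] * k)
    return np.clip(out, 0, 255)

flat = np.full((5, 5), 100.0)
print(apply_kernel(flat, SHARPEN)[0, 0], apply_kernel(flat, SOFTEN)[0, 0])
# a flat region is unchanged by both kernels
```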

Image fusion, another form of image enhancement, refers to extracting image data of the same target acquired by multiple source channels through image processing and computer technology, maximizing the favorable information of the respective channels and synthesizing them into a single high-quality image, reducing uncertainty and redundancy in the output. Research on image fusion technology is on the rise, with applications spanning remote-sensing, visible-light, medical, and infrared image processing. Zhu et al. [34] fused CT and MRI images using an M-PCNN, whose structure is shown in Fig. 8f. The fusion process consists of four steps: first, to avoid mixing of the low-frequency subbands and thereby overcome the pseudo-Gibbs phenomenon [124], the CT and MRI images were decomposed using the non-subsampled contourlet transform (NSCT); second, M-PCNN neurons were stimulated by the spatial frequencies of the NSCT-domain coefficients; third, the coefficients with high firing counts were selected as the coefficients of the fused images; finally, after the high- and low-frequency coefficients of the two images were fused by the two-channel M-PCNN, the new image was synthesized using the inverse NSCT. The results of fusing the source images (Fig. 8g) using PCNN and M-PCNN are shown in Fig. 8h. Subjective visual comparison shows that M-PCNN has superior fusion performance for medical images, from which more detailed information can also be obtained.

Restoration

Similar to image enhancement, image restoration also aims at improving the overall quality of the image. Unlike enhancement techniques, which increase contrast and process the image according to the receiver's preference, image restoration focuses on removing blur and repairing or reconstructing the degraded image; it can be considered the reverse process of image degradation.

Noise generated during medical imaging degrades image quality and blurs the observed tissue boundaries, affecting medical diagnosis, so it is important to remove the noise while preserving boundary and structural information. Zhu et al. [34] used the proposed M-PCNN structure for image denoising, where each neuron is connected to the corresponding pixel and also to the adjacent 3 × 3 neurons [125]. In most cases, the correlation between a noise pixel's value and those of its surroundings is weak and significantly different from normal, and neurons have two main states: stimulated and unstimulated. The principle of M-PCNN denoising is to identify the noise, i.e., to judge whether each neuron and its neighbors are stimulated, and then to adjust the brightness of the corresponding pixel values for noise reduction and image recovery. Comparison showed M-PCNN to be well suited for removing salt-and-pepper noise while retaining edge information (Fig. 9a).
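The weak-correlation test for noise pixels is, in spirit, what the median filter used as a baseline in Fig. 9a exploits; a minimal software sketch (not the M-PCNN circuit):

```python
import numpy as np

def median_denoise(img):
    """3x3 median filter: a salt-and-pepper pixel correlates weakly with its
    neighborhood, so the neighborhood median replaces it, while edges
    (supported by several consistent neighbors) largely survive."""
    h, w = img.shape
    out = img.copy()
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            out[i, j] = np.median(img[i - 1:i + 2, j - 1:j + 2])
    return out

img = np.full((5, 5), 128.0)
img[2, 2] = 255.0                     # an isolated "salt" pixel
clean = median_denoise(img)
print(clean[2, 2])                    # restored to 128.0
```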

figure 9

a M-PCNN for medical image denoising. In order: the source CT images, images with salt-and-pepper noise, images denoised by median filtering, images denoised by averaging filtering, and images denoised by M-PCNN [34]. b New LPF based on the memristor bridge [126]. c Testing images and filter results. (i) Standard clean images. (ii) Images with white Gaussian noise. (iii) Images denoised by the proposed adaptive Gaussian filter [126]. d Illustration of a neuromorphic visual system utilizing plasmonic photoelectric memristors for visual sensing, low-level image pre-processing, and high-level image processing (i.e., recognition) [35]. e Images comprising the ideal image (i), the real image with 10% random noise (ii), and the noise-reduced image (iii) after pre-processing. Image taken from the Yale Face Database B [127] [35]

Suppressing high-frequency signals while passing low-frequency signals is the main function of the low-pass filter (LPF), an important component of image denoising [128]. Yu et al. [126] designed a new LPF based on the memristor bridge circuit described in [129], consisting of four identical memristors that can implement zero, positive, and negative synaptic weightings (Fig. 9b). They analyzed the memristor-bridge LPF, whose cutoff frequency varies over time, and from this characteristic derived a memristive Gaussian filter and its application to image processing: the adaptive Gaussian kernel changes with the situation, and the variable-cutoff LPF can be combined with a Gaussian filter to denoise the image. The researchers added Gaussian white noise with standard deviation σ = 10/255 to 512 × 512-pixel images, with the Gaussian template size set to 11 × 11; the Gaussian filter proposed in [130] and the designed adaptive Gaussian filter were then each used to denoise the noisy images, with the filtering results shown in Fig. 9c. The designed filter circuit combines memristor-bridge-based variable parameters with Gaussian filtering, providing a new idea for image-filtering algorithms. However, the memristor remains only a piece of the puzzle rather than the overall system architecture, so the goal of a full hardware implementation is not completely reached.
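The Gaussian template and the noise level quoted above (11 × 11 kernel, σ = 10/255) can be reproduced in a few lines; varying the kernel's `sigma` plays the role of the memristor bridge's variable cutoff frequency. This is a numerical sketch, not the bridge circuit itself:

```python
import numpy as np

def gaussian_kernel(size=11, sigma=1.5):
    """Normalized 2D Gaussian template; a larger sigma means a lower
    effective cutoff frequency (stronger smoothing)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_denoise(img, sigma=1.5, size=11):
    """Correlate the image with the Gaussian template (edge-padded)."""
    h, w = img.shape
    r = size // 2
    padded = np.pad(img, r, mode="edge")
    k = gaussian_kernel(size, sigma)
    out = np.zeros_like(img)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + size, j:j + size] * k)
    return out

rng = np.random.default_rng(0)
noisy = 0.5 + rng.normal(0.0, 10.0 / 255.0, (32, 32))   # white Gaussian noise
print(noisy.std(), gaussian_denoise(noisy).std())        # the noise variance shrinks
```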

Recently, the authors developed a plasmonic photomemristor based on Ag-TiO2 nanocomposite films that relies on optical excitation and localized surface plasmon resonance effects [35]. Such a device can integrate visual perception, low-level image pre-processing (including noise reduction and contrast enhancement), and high-level image processing functions (Fig. 9d). They utilized an 80 × 80 phototransistor array to construct a neuromorphic vision system comprising two components: the preprocessed images were fed into a two-layer artificial neural network based on photomemristors to implement image learning and recognition (Fig. 9d-ii), and the low-level preprocessed images were obtained (Fig. 9e). The images show that the background noise was further smoothed after noise reduction.

Image recognition and classification

Image recognition, one of the mechanisms of computer vision, analyzes an image as a whole based on its main features to predict the category it belongs to. In human image recognition, redundant information must be excluded from the input, key information extracted, and the information obtained at each stage integrated to form a complete impression; machine image recognition proceeds similarly. One of the most important models for image recognition is the convolutional neural network, but it has not yet been fully implemented in hardware via memristor crossbars [131], arrays with a memristor device at each intersection. In addition, achieving software-equivalent results is extremely challenging because of high variability, low device yield, and other non-ideal characteristics [132, 133, 134, 135].

Chu et al. [38] designed a neuromorphic hardware system for visual pattern recognition (Fig. 10a), consisting of an artificial photoreceptor, a PCMO-based memristor array, and CMOS neurons. The artificial photoreceptor converts images into input voltage pulses, the memristor array [136] provides the synaptic connections, and leaky integrate-and-fire (I&F) neurons serve as output neurons. An improved spike-timing-dependent plasticity algorithm was proposed to adjust the memristor states, i.e., the synaptic weights, during system training. The system operates on the principle that one of the output neurons, integrating the current flowing through the memristors, reaches a threshold earlier than the others; an inhibitory signal from the firing neuron then freezes all neurons and resets their internal states so that recognition can restart with the next test image. The system was successfully trained to recognize digit images from 0 to 9. When random noise was added to the training images before recognition, the recognition rate decreased correspondingly with the noise level, for example to 85% at a 10% noise level. However, adjusting the memristors' resistive states or synaptic weights algorithmically during training invariably increases training time and cost, a point that future memristor-based systems for hardware digital image processing will also need to address.

figure 10

a Neuromorphic system for visual pattern recognition [ 38 ]. b Architecture of the simulated memristor-based neural processing unit [ 39 ]. c Schematic illustration of the high-level in-sensor computing by employing the sensing memristor as a synapse to implement weight updating [ 40 ]. d Output image after 60,000 training epochs for the sensing memristor under different RH levels [ 40 ]

Yao et al. [ 39 ] then fabricated a memristor crossbar implementation of CNNs that integrated eight 2048-cell one-transistor-one-memristor (1T1R) arrays into a complete memristor-based five-layer CNN for MNIST image recognition [ 137 ], achieving an experimental recognition rate of 96.19% on the entire test dataset with a hybrid training scheme. In addition, the convolution kernel was replicated across three parallel memristor convolvers to reduce the mCNN latency by roughly one-third. The memristor-based CNN neuromorphic system (Fig.  10 b) was shown to be more than two orders of magnitude more energy efficient than state-of-the-art graphics processing units, and its highly integrated design offers a feasible non-von Neumann hardware solution for edge computing and deep neural networks while improving CNN computational efficiency. Present research on the neuromorphic applications of memristors has mainly focused on a single sensory modality, such as vision, hearing, smell, or touch, whereas the human perceptual system perceives and processes diverse types of information simultaneously in complex environments. Wang et al. therefore proposed an MXene-ZnO-based multimodal flexible sensing memristor that combines visual sensing, relative humidity (RH) sensing, and pre-processing functions [ 40 ]. In their simulation, a single-layer perceptron (SLP) consisted of 785 input neurons (28 × 28 pixels plus one bias), ten output neurons (ten categories, from 0 to 9), and 785 × 10 fully connected synaptic weights. The presynaptic neurons received 28 × 28 MNIST digits as input and transmitted them as forward synaptic potentials; the synaptic weights were modulated by humidity and by light, and the post-synaptic neurons then responded and performed the perceptual task (Fig.  10 c).
The researchers found that the artificial vision network trained at 60% RH captured more features from the selected alphabets than networks trained at lower or higher humidity (Fig.  10 d) and achieved high recognition accuracy after 60,000 training cycles. These results show that multimodal sensing memristors can support both low- and high-level in-sensor computing while reducing power consumption and chip area.
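The simulated perceptron described above maps naturally onto a crossbar read. Below is a minimal sketch assuming the common differential-pair scheme, in which each signed weight is split across two positive conductances so that the column current difference reproduces the signed dot product; the weight values and scheme are illustrative, not taken from Wang et al.:

```python
import numpy as np

N_IN, N_OUT = 785, 10          # 784 pixel inputs + 1 bias, ten digit classes
rng = np.random.default_rng(1)

# signed software weights mapped onto two all-positive conductance arrays
W = rng.normal(0, 0.1, (N_IN, N_OUT))
G_pos = np.clip(W, 0, None)    # positive part on one crossbar column
G_neg = np.clip(-W, 0, None)   # negative part on the paired column

def forward(pixels):
    """One crossbar read: currents sum along each column (Kirchhoff's law)."""
    v = np.append(pixels.ravel(), 1.0)   # bias input held at 1
    i = v @ G_pos - v @ G_neg            # differential current equals v @ W
    return int(np.argmax(i))             # predicted digit class

x = rng.random((28, 28))                 # a stand-in for one input image
print(forward(x))
```

Since `G_pos - G_neg` equals `W` exactly, a single parallel read of the two arrays computes the full 785 × 10 weighted sum, which is what makes the in-sensor SLP attractive in the first place.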

The memristor, a new type of non-volatile storage element, is expected to form the basis of a new class of image processors that accelerate image processing. Applying memristor networks to image processing offers several advantages. At the device level, the non-volatility of the memristor (stored information is retained after power-off) allows model parameters and intermediate results to be stored in place, enabling more efficient data transfer and persistence in image processing tasks. For tasks that require real-time performance, such as real-time target detection and tracking, the very fast response of memristors allows high-speed image processing and analysis. Moreover, because of their particular properties, memristor networks can closely mimic the synaptic connections of the biological nervous system, enabling processing that more closely resembles the human visual system. At the architectural level, the low energy consumption of memristors helps prolong battery life for image processing on mobile and embedded devices. For robust image processing in noisy environments, the nonlinear characteristics of the memristor can be exploited for noise suppression and filtering, giving tolerance to noise in the input data. Additionally, the structure and characteristics of memristor networks support massively parallel computation for tasks that involve large amounts of data, further accelerating image processing.

Despite these advantages, several challenges remain in implementing hardware digital image processing in memristor-based neuromorphic systems. Because of the nonlinear properties of memristors, incorporating them into the Modified Nodal Analysis (MNA) framework turns the circuit equations into a nonlinear system and increases the complexity of the solution. Solving such a nonlinear system usually requires numerical methods, such as Newton's method or quasi-Newton methods, which consume more computational resources and time than solving a linear system. In addition, the memristance may vary with time and operating state, which further complicates the analysis of the circuit's dynamic behavior. Matching the I–V characteristics of real memristors may require more precise and complicated device models, which also adds to the challenge of circuit analysis. Computational cost, and the accuracy and efficiency of circuit analysis, must therefore be weighed together when designing a large-scale memristor network.
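As an illustration of the extra cost that nonlinearity introduces, the following sketch applies Newton's method to a single-node circuit: a voltage source driving a memristive element through a series resistor, using a generic a·sinh(b·v) I–V law that is often used as an empirical memristor model. All parameter values here are hypothetical:

```python
import math

def solve_node(Vs, R, a=1e-6, b=3.0, v0=0.0, tol=1e-12, max_iter=50):
    """Newton's method for the single-node KCL equation
         f(v) = a*sinh(b*v) - (Vs - v)/R = 0,
    where a*sinh(b*v) is an assumed empirical memristor I-V law.
    A linear resistor would need no iteration at all; the nonlinear
    branch forces an iterative solve with an analytic Jacobian."""
    v = v0
    for _ in range(max_iter):
        f = a * math.sinh(b * v) - (Vs - v) / R
        df = a * b * math.cosh(b * v) + 1.0 / R   # df/dv, the 1x1 Jacobian
        step = f / df
        v -= step
        if abs(step) < tol:
            return v
    raise RuntimeError("Newton iteration did not converge")

v = solve_node(Vs=1.0, R=10e3)
residual = 1e-6 * math.sinh(3.0 * v) - (1.0 - v) / 10e3   # KCL check
print(v, residual)
```

Every node connected to a memristor contributes such a nonlinear row to the MNA system, so a large crossbar multiplies both the Jacobian assembly cost and the iteration count, which is the resource overhead discussed above.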

In the studies discussed above, we found that the offline training approach to hardware neural networks uses computer-assisted training to obtain the weight updates and then adjusts the resistance values of the memristor arrays accordingly; such frequent hardware-software data exchange cannot provide real-time weight updates. Moreover, as the number of cells in the network grows, for example when a cellular neural network must process a large image, the circuit structure becomes complicated, which makes updating the weight templates inconvenient. It would therefore be more practical to move the training algorithm from software into hardware so that the memristor resistance values can be updated in real time.
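The offline-training flow described above can be summarized as: train the weights in software, then quantize each weight to one of the device's discrete conductance states and program it into the array. The sketch below models only the quantization step, with a hypothetical conductance range and state count; on real hardware each write would additionally be a pulse sequence with verification, which is precisely the slow software-hardware interaction criticized above:

```python
import numpy as np

G_MIN, G_MAX, LEVELS = 1e-6, 1e-4, 32   # hypothetical device range, 32 states

def program_array(W):
    """Map trained signed weights onto quantized device conductances.

    Returns the sign plane (which of a differential column pair each
    weight would occupy) and the quantized conductance targets; only the
    quantization error that offline training must tolerate is modeled."""
    w_max = np.abs(W).max()
    # linear map: |w| in [0, w_max] -> conductance in [G_MIN, G_MAX]
    target = G_MIN + (np.abs(W) / w_max) * (G_MAX - G_MIN)
    step = (G_MAX - G_MIN) / (LEVELS - 1)
    quantized = G_MIN + np.round((target - G_MIN) / step) * step
    return np.sign(W), quantized

rng = np.random.default_rng(2)
W = rng.normal(0, 1, (8, 4))             # stand-in for trained weights
sign, G = program_array(W)
step = (G_MAX - G_MIN) / (LEVELS - 1)
print(float(np.abs(G - (G_MIN + np.abs(W) / np.abs(W).max()
                        * (G_MAX - G_MIN))).max()) <= step / 2)
```

Every retraining pass repeats this whole map-quantize-write cycle for the full array, which is why on-chip training that updates conductances in place is the more practical long-term goal.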

Meanwhile, it remains open how to fabricate memristor networks on a large scale and how to relate the size of such a network to the actual image size in pixels. Current approaches to processing larger images with memristor networks almost always slice the image into tiles matched to small memristor arrays, which creates a new problem: there is no connection between individual blocks, so the image cannot be processed as a whole. How to couple each pixel of a digital image to its surrounding pixels for better performance must therefore be addressed in subsequent studies of this system architecture. To this end, the interactions that exist between memristors could be exploited in digital image processing, reducing the computational burden on the network and the influence of peripheral circuits. When a memristor network processes images, how to feed data into the array efficiently at scale, and how to integrate the whole memristor-based digital image processing pipeline, also remain to be solved in subsequent research. To tackle this challenge, a batch write-and-read module is a natural choice; however, designing, simulating, and debugging such a module in SPICE is itself a considerable amount of work. Researchers still have a long way to go in the field of memristor-network-based digital image processing.
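One software-level workaround for the missing inter-block connections is to tile the image with an overlapping halo, so that the border pixels of each crossbar-sized block still see their neighbors. The block size, halo width, and zero padding below are illustrative choices, not a scheme from the cited works:

```python
import numpy as np

def tile_with_halo(img, block=8, halo=1):
    """Split an image into crossbar-sized blocks with an overlap (halo)
    so each block's border pixels keep their neighbours; a 1-pixel halo
    suffices for an assumed 3x3 neighbourhood. Edges are zero-padded."""
    padded = np.pad(img, halo)
    h, w = img.shape
    tiles = []
    for r in range(0, h, block):
        for c in range(0, w, block):
            # each tile carries `halo` extra rows/cols on every side
            tiles.append(padded[r:r + block + 2 * halo,
                                c:c + block + 2 * halo])
    return tiles

img = np.arange(16 * 16, dtype=float).reshape(16, 16)
tiles = tile_with_halo(img, block=8, halo=1)
print(len(tiles), tiles[0].shape)   # 4 tiles of shape (10, 10)
```

Each (block + 2·halo)² tile maps onto one small array; after processing, only the central block × block region of each result is kept, so neighboring pixels influence each other across tile boundaries at the cost of some redundant computation.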

We are convinced that the image processing capability of memristor networks will continue to improve with advances in memristor technology, and we see the following future trends. First, thanks to their efficient data storage and processing, memristors could rapidly process and analyze large-scale image data, and may be widely applied in real-time image processing such as video processing and machine vision. Their low power consumption and high energy efficiency also allow tight integration with other low-power image processing technologies, such as deep learning, to achieve highly efficient and energy-saving performance. In addition, memristors can be combined with deep learning and other technologies to realize more efficient artificial intelligence algorithms, with broad uses in image recognition, autonomous driving, smart homes, and other domains. Studies of the experimental and physical mechanisms of memristors will lay a firm foundation for their application in digital image processing. We believe that experimental optimization of the memristor fabrication process, combined with theoretical device simulation tools, can provide a workable non-von Neumann hardware solution for hardware image processing and neuromorphic networks.

The conventional von Neumann computer architecture has become gradually less efficient as the demand for high-performance computing in digital image processing grows. After reviewing the classical algorithms commonly employed in digital image processing, this paper surveyed a hardware alternative: memristor-based networks, whose operating principle uses the device physics of the memristor itself to emulate the mechanisms of the retina in the human visual system, i.e., the excitation and inhibition between neurons. We then elaborated on the properties of the memristors used in digital image processing and their possible physical mechanisms. In addition, we detailed state-of-the-art applications of memristor networks in digital image processing, covering image logic operations, image compression, image segmentation, image enhancement and restoration, and image classification and recognition. Compared with conventional digital image processing architectures, networks based on memristors, a key component of artificial visual systems, offer lower power consumption, smaller size, higher integration, and better flexibility. Image processing networks built with memristors also feature faster processing, higher efficiency, and better reliability than pure software algorithm architectures alone. This review is intended to guide the study and application of memristors and their neuromorphic properties in hardware digital image processing.

Data availability

Data can be obtained from the authors upon reasonable request.

Code availability

Not applicable.

Xie SJ, et al. Intensity variation normalization for finger vein recognition using guided filter based singe scale retinex. Sensors. 2015;15(7):17089–105. https://doi.org/10.3390/s150717089 .


Rahman Z-u, Jobson DJ, Woodell GA. Multi-scale retinex for color image enhancement. Paper presented at the Proceedings of 3rd IEEE international conference on image processing; 1996. https://doi.org/10.1109/ICIP.1996.560995 .

Kanopoulos N, Vasanthavada N, Baker RL. Design of an image edge detection filter using the Sobel operator. IEEE J Solid-State Circuits. 1988;23(2):358–67. https://doi.org/10.1109/4.996 .

Yuan L, Xu X. Adaptive image edge detection algorithm based on canny operator. In: Paper presented at the 2015 4th international conference on Advanced Information Technology and Sensor Application (AITS); 2015. https://doi.org/10.1109/AITS.2015.14 .

Deng G, Cahill L. An adaptive Gaussian filter for noise reduction and edge detection. Paper presented at the 1993 IEEE conference record nuclear science symposium and medical imaging conference; 1993. https://doi.org/10.1109/NSSMIC.1993.373563 .

Camilleri P, et al. A Neuromorphic aVLSI network chip with configurable plastic synapses. Paper presented at the 7th International Conference on Hybrid Intelligent Systems (HIS 2007); 2007. https://doi.org/10.1109/HIS.2007.60 .

Partzsch J, Schuffny R. Analyzing the scaling of connectivity in neuromorphic hardware and in models of neural networks. IEEE Trans Neural Netw. 2011;22(6):919–35. https://doi.org/10.1109/TNN.2011.2134109 .

Tian H, et al. A novel artificial synapse with dual modes using bilayer graphene as the bottom electrode. Nanoscale. 2017;9(27):9275–83. https://doi.org/10.1039/C7NR03106H .


Du C, et al. Biorealistic implementation of synaptic functions with oxide memristors through internal ionic dynamics. Adv Func Mater. 2015;25(27):4290–9. https://doi.org/10.1002/adfm.201501427 .

Wu L, et al. Emulation of biphasic plasticity in retinal electrical synapses for light-adaptive pattern pre-processing. Nanoscale. 2021;13(6):3483–92. https://doi.org/10.1039/D0NR08012H .

Wang Y, et al. Data-driven deep learning for automatic modulation recognition in cognitive radios. IEEE Trans Veh Technol. 2019;68(4):4074–7. https://doi.org/10.1109/TVT.2019.2900460 .

Egmont-Petersen M, de Ridder D, Handels H. Image processing with neural networks—a review. Pattern Recogn. 2002;35(10):2279–301. https://doi.org/10.1016/S0031-3203(01)00178-9 .

Shi BE, Chua LO. Resistive grid image filtering: input/output analysis via the CNN framework. IEEE Trans Circuits Syst I Fundam Theory Appl. 1992;39(7):531–48. https://doi.org/10.1109/81.257286 .

Kinget P, Steyaert MS. A programmable analog cellular neural network CMOS chip for high speed image processing. IEEE J Solid-State Circuits. 1995;30(3):235–43. https://doi.org/10.1109/4.364437 .

Li H, et al. Edge detection of noisy images based on cellular neural networks. Commun Nonlinear Sci Numer Simul. 2011;16(9):3746–59. https://doi.org/10.1016/j.cnsns.2010.12.017 .

Aizenberg IN. Processing of noisy and small-detailed gray-scale images using cellular neural networks. J Electron Imaging. 1997;6(3):272–85. https://doi.org/10.1117/12.269905 .

Johnson JL, Ritter D. Observation of periodic waves in a pulse-coupled neural network. Opt Lett. 1993;18(15):1253–5. https://doi.org/10.1364/OL.18.001253 .

Johnson JL. Pulse-coupled neural networks. Paper presented at the adaptive computing: mathematics, electronics, and optics: a critical review; 1994. https://doi.org/10.1117/12.171194 .

Johnson JL. Pulse-coupled neural nets: translation, rotation, scale, distortion, and intensity signal invariance for images. Appl Opt. 1994;33(26):6239–53. https://doi.org/10.1364/AO.33.006239 .

Johnson JL. Time signatures of images. Paper presented at the proceedings of 1994 IEEE international conference on neural networks (ICNN'94); 1994. https://doi.org/10.1109/ICNN.1994.374368 .

Kinser JM, Johnson JL. Object isolation. Opt Mem Neural Netw. 1996;5:137–46.


Kinser JM, Johnson JL. Stabilized input with a feedback pulse-coupled neural network. Opt Eng. 1996;35(8):2158–61. https://doi.org/10.1117/1.600797 .

Kinser JM. Simplified pulse-coupled neural network. Paper presented at the applications and science of artificial neural networks II; 1996. https://doi.org/10.1117/12.235951 .

Zhan K, et al. Computational mechanisms of pulse-coupled neural networks: a comprehensive review. Arch Comput Methods Eng. 2017;24(3):573–88. https://doi.org/10.1007/s11831-016-9182-3 .

Huang W, Jing Z. Multi-focus image fusion using pulse coupled neural network. Pattern Recogn Lett. 2007;28(9):1123–32. https://doi.org/10.1016/j.patrec.2007.01.013 .

Wang Z, Wang S, Guo L. Novel multi-focus image fusion based on PCNN and random walks. Neural Comput Appl. 2018;29(11):1101–14. https://doi.org/10.1007/s00521-016-2633-9 .

Fu J, et al. Image segmentation by EM-based adaptive pulse coupled neural networks in brain magnetic resonance imaging. Comput Med Imaging Graph. 2010;34(4):308–20. https://doi.org/10.1016/j.compmedimag.2009.12.002 .

Wang M, et al. Medical images segmentation based on improved three-dimensional pulse coupled neural network. Int J Wirel Mob Comput. 2017;13(1):72–7. https://doi.org/10.1504/IJWMC.2017.087358 .

Chen Y, et al. Region-based object recognition by color segmentation using a simplified PCNN. IEEE Trans Neural Netw Learn Syst. 2014;26(8):1682–97. https://doi.org/10.1109/TNNLS.2014.2351418 .

Ni Q, Gu X. Video attention saliency mapping using pulse coupled neural network and optical flow. Paper presented at the 2014 International joint conference on neural networks (IJCNN); 2014. https://doi.org/10.1109/IJCNN.2014.6889424 .

Zhu R, et al. Memristor-based image enhancement: high efficiency and robustness. IEEE Trans Electron Devices. 2020;68(2):602–9. https://doi.org/10.1109/TED.2020.3045684 .

Zhang W, et al. Array-level boosting method with spatial extended allocation to improve the accuracy of memristor based computing-in-memory chips. Sci China Inf Sci. 2021;64(6):160–406. https://doi.org/10.1007/s11432-020-3198-9 .

Zhu Y, et al. Full-inorganic flexible Ag2S memristor with interface resistance-switching for energy-efficient computing. ACS Appl Mater Interfaces. 2022;14(38):43482–9. https://doi.org/10.1021/acsami.2c11183 .

Zhu S, Wang L, Duan S. Memristive pulse coupled neural network with applications in medical image processing. Neurocomputing. 2017;227:149–57. https://doi.org/10.1016/j.neucom.2016.07.068 .

Shan X, et al. Plasmonic optoelectronic memristor enabling fully light-modulated synaptic plasticity for neuromorphic vision. Adv Sci. 2022;9(6):2104632. https://doi.org/10.1002/advs.202104632 .

Lin P, et al. Three-dimensional memristor circuits as complex neural networks. Nat Electron. 2020;3(4):225–32. https://doi.org/10.1038/s41928-020-0397-9 .

Li Y, et al. In-memory computing using memristor arrays with ultrathin 2D PdSeOx/PdSe2 heterostructure. Adv Mater. 2022. https://doi.org/10.1002/adma.202201488 .

Chu M, et al. Neuromorphic hardware system for visual pattern recognition with memristor array and CMOS neuron. IEEE Trans Industr Electron. 2014;62(4):2410–9. https://doi.org/10.1109/TIE.2014.2356439 .

Yao P, et al. Fully hardware-implemented memristor convolutional neural network. Nature. 2020;577(7792):641–6. https://doi.org/10.1038/s41586-020-1942-4 .

Wang Y, et al. MXene-ZnO memristor for multimodal in-sensor computing. Adv Func Mater. 2021;31(21):2100144. https://doi.org/10.1002/adfm.202100144 .

Choi C, et al. Human eye-inspired soft optoelectronic device using high-density MoS2-graphene curved image sensor array. Nat Commun. 2017;8(1):1–11. https://doi.org/10.1038/s41467-017-01824-6 .

Lee W, et al. High-resolution spin-on-patterning of perovskite thin films for a multiplexed image sensor array. Adv Mater. 2017;29(40):1702902. https://doi.org/10.1002/adma.201702902 .

Zhou F, Chai Y. Near-sensor and in-sensor computing. Nat Electron. 2020;3(11):664–71. https://doi.org/10.1038/s41928-020-00501-9 .

Roska T. Analogic CNN computing: architectural, implementation, and algorithmic advances—a review. Paper presented at the 1998 Fifth IEEE international workshop on cellular neural networks and their applications. Proceedings (Cat. No. 98TH8359); 1998. https://doi.org/10.1109/CNNA.1998.685320

Chua L. Memristor-the missing circuit element. IEEE Trans Circuit Theory. 1971;18(5):507–19. https://doi.org/10.1109/TCT.1971.1083337 .

Strukov DB, et al. The missing memristor found. Nature. 2008;453(7191):80–3. https://doi.org/10.1038/nature06932 .

Yu S. Neuro-inspired computing with emerging nonvolatile memorys. Proc IEEE. 2018;106(2):260–85. https://doi.org/10.1109/JPROC.2018.2790840 .

Ambrogio S, et al. Equivalent-accuracy accelerated neural-network training using analogue memory. Nature. 2018;558(7708):60–7. https://doi.org/10.1038/s41586-018-0180-5 .

Xue W, et al. Optoelectronic memristor for neuromorphic computing. Chin Phys B. 2020;29(4):048401. https://doi.org/10.1088/1674-1056/ab75da .

Lu W, et al. A scanning probe microscopy based assay for single-walled carbon nanotube metallicity. Nano Lett. 2009;9(4):1668–72. https://doi.org/10.1021/nl900194j .

Yang JJ, et al. Memristive switching mechanism for metal/oxide/metal nanodevices. Nat Nanotechnol. 2008;3(7):429–33. https://doi.org/10.1038/nnano.2008.160 .

Jo SH, Kim K-H, Lu W. High-density crossbar arrays based on a Si memristive system. Nano Lett. 2009;9(2):870–4. https://doi.org/10.1021/nl8037689 .

Afifi A, Ayatollahi A, Raissi F. Implementation of biologically plausible spiking neural network models on the memristor crossbar-based CMOS/nano circuits. Paper presented at the 2009 European conference on circuit theory and design; 2009. https://doi.org/10.1109/ECCTD.2009.5275035 .

Chua LO, Kang SM. Memristive devices and systems. Proc IEEE. 1976;64(2):209–23. https://doi.org/10.1109/PROC.1976.10092 .

Chua LO. The fourth element. Proc IEEE. 2012;100(6):1920–7. https://doi.org/10.1109/JPROC.2012.2190814 .

Chua L. Memristor, Hodgkin-Huxley, and edge of chaos. Nanotechnology. 2013;24(38):383001. https://doi.org/10.1088/0957-4484/24/38/383001 .

Chua L. If it’s pinched it’s a memristor. Semicond Sci Technol. 2014;29(10):104001. https://doi.org/10.1088/0268-1242/29/10/104001 .

Jo SH, et al. Nanoscale memristor device as synapse in neuromorphic systems. Nano Lett. 2010;10(4):1297–301. https://doi.org/10.1021/nl904092h .

Jiang H, et al. Sub-10 nm Ta channel responsible for superior performance of a HfO2 memristor. Sci Rep. 2016;6(1):28525. https://doi.org/10.1038/srep28525 .

Lu W, Lieber CM. Nanoelectronics from the bottom up. Nat Mater. 2007;6(11):841–50. https://doi.org/10.1038/nmat2028 .

Waser R, Aono M. Nanoionics-based resistive switching memories. Nat Mater. 2007;6(11):833–40. https://doi.org/10.1038/nmat2023 .

Jo SH, Lu W. CMOS compatible nanoscale nonvolatile resistance switching memory. Nano Lett. 2008;8(2):392–7. https://doi.org/10.1021/nl073225h .

Jo SH, Kim K-H, Lu W. Programmable resistance switching in nanoscale two-terminal devices. Nano Lett. 2009;9(1):496–500. https://doi.org/10.1021/nl803669s .

Liu M, et al. Multilevel resistive switching with ionic and metallic filaments. Appl Phys Lett. 2009;94(23):233106. https://doi.org/10.1063/1.3151822 .

Pan F, et al. Recent progress in resistive random access memories: materials, switching mechanisms, and performance. Mater Sci Eng R Rep. 2014;83:1–59. https://doi.org/10.1016/j.mser.2014.06.002 .

Mazzio KA, Luscombe CK. The future of organic photovoltaics. Chem Soc Rev. 2015;44(1):78–90. https://doi.org/10.1039/C4CS00227J .

Xiao Z, et al. Giant switchable photovoltaic effect in organometal trihalide perovskite devices. Nat Mater. 2015;14(2):193–8. https://doi.org/10.1038/nmat4150 .

Tasdelen MA, Yagci Y. Light-induced click reactions. Angew Chem Int Ed. 2013;52(23):5930–8. https://doi.org/10.1002/anie.201208741 .

Wang H, et al. A ferroelectric/electrochemical modulated organic synapse for ultraflexible, artificial visual-perception system. Adv Mater. 2018;30(46):1803961. https://doi.org/10.1002/adma.201803961 .

He HK, et al. Photonic potentiation and electric habituation in ultrathin memristive synapses based on monolayer MoS2. Small. 2018;14(15):1800079. https://doi.org/10.1002/smll.201800079 .

Berco D, Ang DS, Zhang HZJAIS. An optoneuronic device with realistic retinal expressions for bioinspired machine vision. Adv Intell Syst. 2020;2(2):1900115. https://doi.org/10.1002/aisy.201900115 .

Mennel L, et al. Ultrafast machine vision with 2D material neural network image sensors. Nature. 2020;579(7797):62–6. https://doi.org/10.1038/s41586-020-2038-x .

Lee M-J, et al. A fast, high-endurance and scalable non-volatile memory device made from asymmetric Ta2O5−x/TaO2−x bilayer structures. Nat Mater. 2011;10(8):625–30. https://doi.org/10.1038/nmat3070 .

Govoreanu B, et al. High-performance metal-insulator-metal tunnel diode selectors. IEEE Electron Device Lett. 2013;35(1):63–5. https://doi.org/10.1109/LED.2013.2291911 .

Choi BJ, et al. Electrical performance and scalability of Pt dispersed SiO2 nanometallic resistance switch. Nano Lett. 2013;13(7):3213–7. https://doi.org/10.1021/nl401283q .

Wu S, et al. Bipolar resistance switching in transparent ITO/LaAlO3/SrTiO3 memristors. ACS Appl Mater Interfaces. 2014;6(11):8575–9. https://doi.org/10.1021/am501387w .

Jang J-W, et al. Optimization of conductance change in Pr1−xCaxMnO3-based synaptic devices for neuromorphic systems. IEEE Electron Device Lett. 2015;36(5):457–9. https://doi.org/10.1109/LED.2015.2418342 .

Baek K, et al. In situ TEM observation on the interface-type resistive switching by electrochemical redox reactions at a TiN/PCMO interface. Nanoscale. 2017;9(2):582–93. https://doi.org/10.1039/C6NR06293H .

Raeis-Hosseini N, et al. Reliable Ge2Sb2Te5-integrated high-density nanoscale conductive bridge random access memory using facile nitrogen-doping strategy. Adv Electron Mater. 2018;4(11):1800360. https://doi.org/10.1002/aelm.201800360 .

Poddar S, et al. Down-scalable and ultra-fast memristors with ultra-high density three-dimensional arrays of perovskite quantum wires. Nano Lett. 2021;21(12):5036–44. https://doi.org/10.1021/acs.nanolett.1c00834 .

Bhattacharjee S, et al. Insights into multilevel resistive switching in monolayer MoS2. ACS Appl Mater Interfaces. 2020;12(5):6022–9. https://doi.org/10.1021/acsami.9b15677 .

Xu X, et al. A bioinspired artificial injury response system based on a robust polymer memristor to mimic a sense of pain, sign of injury, and healing. Adv Sci. 2022;9(15):2200629. https://doi.org/10.1002/advs.202200629 .

Guo L, et al. Stacked two-dimensional MXene composites for an energy-efficient memory and digital comparator. ACS Appl Mater Interfaces. 2021;13(33):39595–605. https://doi.org/10.1021/acsami.1c11014 .

Xiong W, et al. Flexible poly(vinyl alcohol)–graphene oxide hybrid nanocomposite based cognitive memristor with pavlovian-conditioned reflex activities. Adv Electron Mater. 2020;6(5):1901402. https://doi.org/10.1002/aelm.201901402 .

Kim H, et al. Quasi-2D halide perovskites for resistive switching devices with ON/OFF ratios above 10⁹. NPG Asia Mater. 2020;12(1):21. https://doi.org/10.1038/s41427-020-0202-2 .

Albano LG, et al. Ambipolar resistive switching in an ultrathin surface-supported metal–organic framework vertical heterojunction. Nano Lett. 2020;20(2):1080–8. https://doi.org/10.1021/acs.nanolett.9b04355 .

Wang Z, et al. Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process. 2004;13(4):600–12. https://doi.org/10.1109/TIP.2003.819861 .

Strukov DB, Likharev KK. CMOL FPGA: a reconfigurable architecture for hybrid digital circuits with two-terminal nanodevices. Nanotechnology. 2005;16(6):888. https://doi.org/10.1088/0957-4484/16/6/045 .

Xia Q, et al. Memristor-CMOS hybrid integrated circuits for reconfigurable logic. Nano Lett. 2009;9(10):3640–5. https://doi.org/10.1021/nl901874j .

Cho K, Lee S-J, Eshraghian K. Memristor-CMOS logic and digital computational components. Microelectron J. 2015;46(3):214–20. https://doi.org/10.1016/j.mejo.2014.12.006 .

Brink S, et al. A learning-enabled neuron array IC based upon transistor channel models of biological phenomena. IEEE Trans Biomed Circuits Syst. 2012;7(1):71–81. https://doi.org/10.1109/TBCAS.2012.2197858 .

Indiveri G, Chicca E, Douglas R. A VLSI array of low-power spiking neurons and bistable synapses with spike-timing dependent plasticity. IEEE Trans Neural Netw. 2006;17(1):211–21. https://doi.org/10.1109/TNN.2005.860850 .

Bofill-i-Petit A, Murray AF. Synchrony detection and amplification by silicon neurons with STDP synapses. IEEE Trans Neural Netw. 2004;15(5):1296–304. https://doi.org/10.1109/TNN.2004.832842 .

Chicca E, et al. A VLSI recurrent network of integrate-and-fire neurons connected by plastic synapses with long-term memory. IEEE Trans Neural Netw. 2003;14(5):1297–307. https://doi.org/10.1109/TNN.2003.816367 .

Kim H, et al. Neural synaptic weighting with a pulse-based memristor circuit. IEEE Trans Circuits Syst I Regul Pap. 2011;59(1):148–58. https://doi.org/10.1109/TCSI.2011.2161360 .

Adhikari SP, et al. Memristor bridge synapse-based neural network and its learning. IEEE Trans Neural Netw Learn Syst. 2012;23(9):1426–35. https://doi.org/10.1109/TNNLS.2012.2204770 .

Cantley KD, et al. Hebbian learning in spiking neural networks with nanocrystalline silicon TFTs and memristive synapses. IEEE Trans Nanotechnol. 2011;10(5):1066–73. https://doi.org/10.1109/TNANO.2011.2105887 .

Cantley KD, et al. Neural learning circuits utilizing nano-crystalline silicon transistors and memristors. IEEE Trans Neural Netw Learn Syst. 2012;23(4):565–73. https://doi.org/10.1109/TNNLS.2012.2184801 .

Castleman KR. Digital image processing. Englewood cliffs: Prentice Hall; 1996. https://doi.org/10.5555/225496 .


Chua LO, Yang L. Cellular neural networks: theory. IEEE Trans Circuits Syst. 1988;35(10):1257–72. https://doi.org/10.1109/31.7600 .

Roska T, Chua LO. The CNN universal machine: an analogic array computer. IEEE Trans Circuits Syst II Analog Digit Signal Process. 1993;40(3):163–73. https://doi.org/10.1109/82.222815 .

Zhou J, et al. A memristor-based architecture combining memory and image processing. Sci China Inf Sci. 2014;57(5):1–12. https://doi.org/10.1007/s11432-013-4887-5 .

Hu X, et al. Memristive crossbar array with applications in image processing. Sci China Inf Sci. 2012;55(2):461–72. https://doi.org/10.1007/s11432-011-4410-9 .

Muthulakshmi S, Dash CS, Prabaharan S. Memristor augmented approximate adders and subtractors for image processing applications: an approach. AEU Int J Electron Commun. 2018;91:91–102. https://doi.org/10.1016/j.aeue.2018.05.003 .


Acknowledgements

We acknowledge the financial support of the NSFC (Grants No. 61974073, 61964012, and 62204128) and of the Science and Technology Department of Jiangsu Province (Grants No. BK20211273, BZ2021031, and BK20220399).

Author information

Authors and Affiliations

College of Integrated Circuit Science and Engineering, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China

Lei Wang, Qingyue Meng, Huihui Wang, Jiyuan Jiang, Xiang Wan, Xiaoyan Liu, Xiaojuan Lian & Zhikuang Cai


Contributions

Conceptualization contributed by LW and YM; methodology contributed by LW, YM, HW, and YJ; software contributed by LW, YM, and YL; validation contributed by LW, YM, and XW; writing–original draft preparation contributed by LW, YM, and JL; writing–review and editing contributed by KC. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Lei Wang , Xiaojuan Lian or Zhikuang Cai .

Ethics declarations

Competing interests.

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Wang, L., Meng, Q., Wang, H. et al. Digital image processing realized by memristor-based technologies. Discover Nano 18 , 120 (2023). https://doi.org/10.1186/s11671-023-03901-w


Received : 14 August 2023

Accepted : 19 September 2023

Published : 28 September 2023

DOI : https://doi.org/10.1186/s11671-023-03901-w


  • Image processing
  • Array-level networks
  • Neuromorphic system
  • Computer-in-memory
Open Access | Published: 05 December 2018

Application research of digital media image processing technology based on wavelet transform

  • Lina Zhang 1 ,
  • Lijuan Zhang 2 &
  • Liduo Zhang 3  

EURASIP Journal on Image and Video Processing, volume 2018, Article number: 138 (2018)


Abstract

With the development of information technology, people increasingly rely on networks to access information, and more than 80% of network traffic is multimedia content dominated by images. Research on image processing technology is therefore very important, yet most of it focuses on a single aspect; results that unify several aspects of image processing in one model remain rare. To this end, this paper builds a unified model covering image denoising, watermarking, encryption and decryption, and image compression, using the wavelet transform as the common method, and evaluates it on 300 photographs taken from daily life. The results show that the unified model achieves good results in every aspect of image processing considered.

1 Introduction

As computer processing power has increased, the objects people process on computers have gradually shifted from text to images. According to statistics, more than 80% of the information transmitted and stored today, especially on the Internet, is image information. Compared with textual information, image information is far more complex, so processing images on a computer is correspondingly more complicated than processing text. Therefore, to make the use of image information safer and more convenient, application research on digital media images is particularly important. Digital media image processing technology mainly includes denoising, encryption, compression, storage, and many other aspects.

The purpose of image denoising is to remove noise at unwanted frequencies from the image so as to highlight the meaningful content of the image itself. Image acquisition, processing, and transmission can all damage the original image signal, and noise is an important factor that degrades the clarity of an image. Its sources are varied, arising mainly in the transmission and quantization processes. According to the relationship between noise and signal, noise can be divided into additive noise, multiplicative noise, and quantization noise. Commonly used denoising methods include the mean filter, the adaptive Wiener filter, the median filter, and the wavelet transform. For example, the neighborhood-averaging denoising used in the literature [ 1 , 2 , 3 ] is a mean filtering method suited to removing grain noise from scanned images; it suppresses noise strongly but also introduces blurring, and the degree of blurring is proportional to the radius of the neighborhood. The Wiener filter adjusts its output based on the local variance of the image and performs best on images corrupted by white noise; the literature [ 4 , 5 ] uses this method for image denoising and obtains good results. Median filtering is a commonly used nonlinear smoothing filter that is very effective at removing salt-and-pepper noise while preserving image edges, and it requires no statistical characteristics of the image, which is convenient in practice; the literature [ 6 , 7 , 8 ] reports successful image denoising with median filtering. Wavelet analysis denoises the image through its multi-level decomposition coefficients, so image details can be well preserved, as in the literature [ 9 , 10 ].

Image encryption is another important application area of digital image processing, comprising two main aspects: digital watermarking and image encryption proper. Digital watermarking embeds identification information (the digital watermark) directly into a digital carrier (multimedia, documents, software, etc.) without affecting the carrier's usability and without being easily perceived by the human visual or auditory system. Through the information hidden in the carrier, one can confirm the content creator or purchaser, transmit secret information, or determine whether the carrier has been tampered with. Digital watermarking is an important research direction of information hiding; the literature [ 11 , 12 ], for example, studies image digital watermarking methods. Some researchers have applied wavelet methods to watermarking: AH Paquet [ 13 ] et al. used wavelet packets for personal authentication watermarking in 2003, successfully introducing wavelet theory into digital watermark research and opening a new line of image-based watermarking technology. To keep digital images secret, in practice the two-dimensional image is generally converted into one-dimensional data and then encrypted with a conventional encryption algorithm. Unlike ordinary text, images and video are temporal, spatial, visually perceptible, and tolerant of lossy compression; these features make it possible to design more efficient and more secure encryption algorithms specifically for images. For example, Z Wen [ 14 ] et al. use a key value to generate real-valued chaotic sequences and then encrypt the image by spatial scrambling, with experiments showing the technique to be effective and safe. YY Wang [ 15 ] et al. proposed a new optical image encryption method using a binary Fourier-transform computer-generated hologram (CGH) and pixel scrambling, in which the pixel scrambling order and the encrypted image serve as the keys for decrypting the original image. Zhang X Y [ 16 ] et al. combined the mathematics of two-dimensional cellular automata (CA) with image encryption and proposed a new image encryption algorithm that is easy to implement and offers good security, a large key space, a good avalanche effect, strong confusion and diffusion, simple operation, low computational complexity, and high speed.

To transmit image information quickly, image compression is another research direction of image application technology. The information age has brought an "information explosion" and a corresponding surge in data volume, so data must be compressed effectively for both transmission and storage; in remote sensing, for example, space probes rely on compression coding to send huge amounts of information back to the ground. Image compression is the application of data compression to digital images: its purpose is to reduce redundant information in image data so that data can be stored and transmitted in a more efficient format. Through researchers' unremitting efforts, image compression technology is now maturing. For example, Lewis A S [ 17 ] hierarchically encodes transformed coefficients and designs a new image compression method based on the local noise sensitivity of the human visual system (HVS); the algorithm maps easily onto the 2-D orthogonal wavelet transform, which decomposes the image into spatially and spectrally local coefficients. DeVore R A [ 18 ] introduced a novel theory for analyzing image compression methods based on wavelet decomposition. Buccigrossi R W [ 19 ] developed a probabilistic model of natural images from empirical statistics in the wavelet transform domain: the coefficients of basis functions at adjacent spatial locations, orientations, and scales were found to be non-Gaussian in their marginal and joint statistics, and they proposed a Markov model with linear predictors, in which amplitude combines multiplicative and additive uncertainty, that explains the statistics of photographic, graphic, and medical images alike. To demonstrate the model directly, they built an image encoder called the Embedded Predictive Wavelet Image Coder (EPWIC), in which subband coefficients are encoded one bit plane at a time by a non-adaptive arithmetic coder; the encoder orders bit planes with a greedy algorithm using conditional probabilities computed from the model, considering the MSE reduction per coded bit, while the decoder predicts coefficient values from the bits it has received. Although the model is simple, the encoder's rate-distortion performance is roughly equivalent to the best image encoders in the literature.

From the existing results we find that digital-image application research has been fruitful, but these results mainly concern individual methods, such as deep learning [ 20 , 21 ], genetic algorithms [ 22 , 23 ], and fuzzy theory [ 24 , 25 ], alongside wavelet analysis. The biggest problem in existing work is this: digital multimedia processing is an organic whole, from denoising, compression, storage, encryption, and decryption through to retrieval, yet current results basically study only one part of that whole. A method that is superior in one link is therefore not necessarily suitable for the other links. To address this problem, this thesis takes the digital image as its research object, realizes unified modeling over the main steps of denoising, encryption, and compression in image processing, and studies the capability of a single method across multiple steps.

The wavelet transform is a commonly used digital signal processing method. Since most real digital signals contain components at multiple frequencies, a signal typically mixes noise components, secondary components, and the main signal. Many research teams have used the wavelet transform as a processing method in image processing and achieved good results. So, can the wavelet transform be used to build one model suitable for a variety of image processing applications?

In this paper, the wavelet transform is used to establish a unified denoising, encryption, and compression model for the image processing pipeline, and captured images are used for simulation. The results show that the same wavelet transform parameters achieve good results across the different image processing applications.

2 Methods

2.1 Image binarization processing method

The gray value of an image pixel ranges from 0 to 255. To facilitate further processing, the outline of the image is first highlighted by binarization, which maps each pixel's gray value from the range 0–255 to either 0 or 255. In binarization, threshold selection is the key step; the threshold used in this paper is given by the maximum between-class variance method (Otsu's method, OTSU). For an image whose foreground/background segmentation threshold is t, let the proportion of foreground pixels be w0 with mean u0, and the proportion of background pixels be w1 with mean u1. Then the mean of the entire image is

u = w0·u0 + w1·u1 (1)

The objective function can be established from formula (1) as the between-class variance

g(t) = w0·(u0 − u)² + w1·(u1 − u)² (2)

The OTSU algorithm takes the t at which g(t) reaches its global maximum as the optimal threshold.
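The search for the optimal threshold can be sketched in a few lines. The following is an illustrative NumPy implementation (not the paper's MATLAB code): it scans every candidate threshold t and keeps the one maximizing the between-class variance, computed here in the algebraically equivalent form w0·w1·(u0 − u1)².

```python
import numpy as np

def otsu_threshold(img):
    """Exhaustive search for the t that maximizes the between-class variance g(t)."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                       # gray-level probabilities
    best_t, best_g = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()       # class proportions
        if w0 == 0 or w1 == 0:
            continue
        u0 = (np.arange(t) * p[:t]).sum() / w0        # foreground mean
        u1 = (np.arange(t, 256) * p[t:]).sum() / w1   # background mean
        g = w0 * w1 * (u0 - u1) ** 2            # equivalent form of g(t)
        if g > best_g:
            best_g, best_t = g, t
    return best_t

def binarize(img):
    """Map gray values to 0 or 255 using the Otsu threshold."""
    t = otsu_threshold(img)
    return np.where(img >= t, 255, 0).astype(np.uint8)
```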

2.2 Wavelet transform method

The wavelet transform (WT) grew out of Fourier transform technology. Whereas the Fourier transform only resolves a signal into its frequency components, the wavelet transform retains the frequency information while also providing localization in time, and its resolution does not depend on a fixed window size. The wavelet transform is therefore better suited to time-frequency analysis than the Fourier transform. Its greatest strength is that it can represent the local features of a signal at different frequencies, and by varying the scale it separates the low-frequency and high-frequency content of the signal, making features more concentrated. This paper mainly uses the wavelet transform to analyze the image in different frequency bands. The continuous wavelet transform of a signal f(t) can be expressed as

WT(a, τ) = (1/√a) ∫ f(t) ψ*((t − τ)/a) dt (3)

where ψ(t) is the mother wavelet, a is the scale factor, and τ is the translation factor.

Because an image is a two-dimensional signal, the wavelet transform must be generalized to two dimensions for image analysis. Let the image signal be f(x, y), let ψ(x, y) be a two-dimensional basic wavelet, and let ψ_{a,b,c}(x, y) denote the scaled and shifted basic wavelet:

ψ_{a,b,c}(x, y) = (1/a) ψ((x − b)/a, (y − c)/a) (4)

where a is the scale factor and b, c are the translations along x and y. Following the definition of the continuous wavelet above, the two-dimensional continuous wavelet transform is

W_f(a, b, c) = (1/a) ∬ f(x, y) \( \overline{\psi}\left((x-b)/a,(y-c)/a\right) \) dx dy (5)

where \( \overline{\psi \left(x,y\right)} \) is the conjugate of ψ(x, y).
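In practice the continuous transform above is implemented as a discrete, separable filter-bank step. As a self-contained illustration (using the simple Haar basis rather than the db3 wavelet employed later in the paper), one decomposition level producing the approximation and the horizontal, vertical, and diagonal detail sub-images can be written as:

```python
import numpy as np

def haar_dwt2(x):
    """One level of the separable 2-D discrete wavelet transform (Haar basis).

    Returns the low-frequency approximation LL and the detail sub-images
    LH, HL, HH (horizontal, vertical, diagonal). Assumes even dimensions.
    """
    x = np.asarray(x, dtype=float)
    # Transform along rows: pairwise sums (low-pass) and differences (high-pass)
    lo = (x[:, 0::2] + x[:, 1::2]) / np.sqrt(2)
    hi = (x[:, 0::2] - x[:, 1::2]) / np.sqrt(2)
    # Transform along columns of each half
    ll = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2)
    lh = (lo[0::2, :] - lo[1::2, :]) / np.sqrt(2)
    hl = (hi[0::2, :] + hi[1::2, :]) / np.sqrt(2)
    hh = (hi[0::2, :] - hi[1::2, :]) / np.sqrt(2)
    return ll, lh, hl, hh
```

Because the filters are orthonormal, the total energy of the four sub-images equals that of the input, and a constant image yields zero detail coefficients; this concentration of structure in few coefficients is what the later denoising and watermarking steps exploit.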

2.3 Digital watermark

According to different methods of use, digital watermarking technology can be divided into the following types:

Spatial domain approach: A typical watermarking algorithm of this type embeds information into the least significant bits (LSB) of randomly selected image pixels, which ensures that the embedded watermark is invisible. However, because it uses pixel bits that carry little image information, the algorithm's robustness is poor: the watermark is easily destroyed by filtering, image quantization, and geometric deformation. Another common method uses the statistical characteristics of the pixels to embed information in the luminance values of the pixels.

Transform domain approach: first compute the discrete cosine transform (DCT) of the image, then superimpose the watermark on the k coefficients of largest amplitude in the DCT domain (excluding the DC component), usually the low-frequency components of the image. If the k largest DCT coefficients are D = {d_i}, i = 1, ..., k, and the watermark is a random real sequence W = {w_i}, i = 1, ..., k, drawn from a Gaussian distribution, then the embedding rule is d_i' = d_i(1 + a·w_i), where the constant a is a scale factor controlling the strength of the watermark. The watermarked image I* is then obtained by inverse transforming the new coefficients. Decoding computes the DCT of the original image I and of the watermarked image I*, extracts the embedded watermark W*, and performs a correlation test to decide whether the watermark is present.

Compressed domain algorithm: compressed-domain digital watermarking systems based on the JPEG and MPEG standards avoid a full decode and re-encode cycle, which has great practical value in digital TV broadcasting and video on demand (VOD). Correspondingly, watermark detection and extraction can be performed directly on the compressed-domain data.

The wavelet transform used in this paper is a transform-domain method. The main process is as follows. Assume x(m, n) is a grayscale picture of size M × N with 2^a gray levels, where M, N, and a are positive integers and 1 ≤ m ≤ M, 1 ≤ n ≤ N. Decomposing this image with an L-level wavelet transform (L a positive integer) yields 3L high-frequency detail subgraphs and one low-frequency approximation subgraph. The wavelet coefficients are denoted X_{K,L}, where L is the decomposition level and K ∈ {H, V, D} indicates the horizontal, vertical, or diagonal subgraph. Because distortion of the low-frequency subgraph would be large, the watermark is embedded only in subgraphs other than the low-frequency one.

To embed the digital watermark, X_{K,L}(m_i, n_j) is first partitioned into blocks, with B(s, t) denoting a coefficient block of size s × t in X_{K,L}(m_i, n_j). The average value of a block can then be expressed as

AVG = ∑B(s, t) / (s·t) (6)

where ∑B(s, t) is the cumulative sum of the magnitudes of the coefficients within the block.

The embedding of the watermark sequence w is achieved by the quantization of AVG.

The quantization interval Δ_l is chosen per level according to robustness and concealment considerations. For the coarsest level L, since the coefficient amplitudes are large, a larger interval can be set; for the other levels, starting from level L − 1, the interval is successively decreased.

According to w_i ∈ {0, 1}, AVG is quantized to the nearest odd or even quantization point. Let D(i, j) denote the wavelet coefficients in the block and D(i, j)' the quantized coefficients, i = 1, 2, ..., s; j = 1, 2, ..., t. Let T = AVG/Δ_l and TD = rem(⌊T⌋, 2), where ⌊·⌋ denotes rounding and rem denotes the remainder after division by 2.

Depending on whether TD equals w_i, the quantized wavelet coefficients D(i, j)' are obtained as follows: if TD = w_i, the coefficients are left unchanged; otherwise AVG is shifted to the nearest quantization point of the required parity, and the same offset is added to every D(i, j) in the block.

Using the same wavelet basis, an image containing the watermark is generated by the inverse wavelet transform, and the wavelet basis, the number of decomposition levels, the selected coefficient region, the blocking scheme, the quantization intervals, and the parity correspondence are recorded to form a key.

Watermark extraction is the inverse of embedding. First, the wavelet transform is applied to the image under test; the position of the embedded watermark is determined from the key, and the inverse of the scrambling operation is applied to recover the watermark.
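The embedding rule above amounts to quantization index modulation on the block mean. A minimal sketch, assuming one fixed quantization interval Δ (the per-level intervals Δ_l of the paper are collapsed into a single value here) and omitting the scrambling step:

```python
import numpy as np

DELTA = 8.0  # quantization interval; a tuning choice, not taken from the paper

def embed_bit(block, bit, delta=DELTA):
    """Embed one watermark bit by quantizing the block mean to a point
    whose parity (odd/even multiple of delta) encodes the bit."""
    avg = block.mean()
    q = int(round(avg / delta))
    if q % 2 != bit:                      # move to nearest point of the right parity
        q += 1 if avg >= q * delta else -1
    return block + (q * delta - avg)      # same offset added to every coefficient

def extract_bit(block, delta=DELTA):
    """Recover the bit from the parity of the quantized block mean."""
    return int(round(block.mean() / delta)) % 2
```

Because every coefficient in the block is shifted by the same small offset, the block's texture is preserved, and the bit survives perturbations smaller than half the quantization interval.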

2.4 Evaluation method

Filtering normalized mean square error

To measure the effect of filtering, this paper uses the normalized mean square error M, computed as

M = ∑_{i,j} [f(i, j) − f'(i, j)]² / ∑_{i,j} f(i, j)² (7)

where f and f' are the pixel values before and after filtering.

Normalized cross-correlation function

The normalized cross-correlation function is a classic image matching measure that can be used to represent the similarity of images. It is obtained by computing a cross-correlation metric between the reference image and the template image, generally written NC(i, j); the larger the NC value, the greater the similarity. The cross-correlation metric is

R(i, j) = ∑_m ∑_n S^{i,j}(m, n) · T(m, n) (8)

where T(m, n) is the m-th pixel value in the n-th row of the template image, S^{i,j}(m, n) is the part of the reference picture S covered by the template, and (i, j) is the coordinate of the lower-left corner of the subgraph in S.

R is normalized to NC according to

NC(i, j) = R(i, j) / sqrt( ∑_{m,n} [S^{i,j}(m, n)]² · ∑_{m,n} [T(m, n)]² ) (9)

Peak signal-to-noise ratio

The peak signal-to-noise ratio is often used to measure signal reconstruction quality in areas such as image compression and is defined via the mean square error (MSE). For two m × n monochrome images I and K, where one is a noisy approximation of the other, the mean square error is

MSE = (1/(m·n)) ∑_{i=0}^{m−1} ∑_{j=0}^{n−1} [I(i, j) − K(i, j)]² (10)

The peak signal-to-noise ratio PSNR is then

PSNR = 10 · log10( Max² / MSE ) (11)

where Max is the maximum possible pixel value of the image.

Information entropy

For the digital signal of an image, each pixel value occurs with a different frequency, so the image signal can be regarded as an uncertain (random) signal. For image encryption, the higher the uncertainty of the image, the more random it appears and the harder it is to crack; the more regular it is, the easier it is to crack. For a 256-level grayscale image, the maximum information entropy is 8, so the closer the result is to 8, the better.

Information entropy is computed as

H = − ∑_{i=0}^{255} p(i) · log2 p(i) (12)

where p(i) is the probability of gray level i.

Correlation

Correlation is a parameter describing the relationship between two vectors; this paper uses it to describe the relationship between the images before and after encryption. Let p(x, y) denote the correlation between the pixel sequences x and y before and after encryption; it can be computed as

p(x, y) = cov(x, y) / ( sqrt(D(x)) · sqrt(D(y)) ) (13)

where cov(x, y) is the covariance of x and y and D(·) denotes the variance.
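The evaluation metrics above translate directly into code. A sketch of PSNR, information entropy, and the normalized correlation (applied here to whole images rather than a sliding template):

```python
import numpy as np

def psnr(img, ref, max_val=255.0):
    """Peak signal-to-noise ratio in dB, per formulas (10)-(11)."""
    mse = np.mean((np.asarray(img, float) - np.asarray(ref, float)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def entropy(img):
    """Shannon entropy of an 8-bit image; at most 8 bits for 256 gray levels."""
    hist = np.bincount(np.asarray(img, dtype=np.uint8).ravel(), minlength=256)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def nc(a, b):
    """Normalized cross-correlation between two equally sized images."""
    a = np.asarray(a, float).ravel()
    b = np.asarray(b, float).ravel()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))
```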

3 Experiment

3.1 Image parameters

The images used in this article are all photos from daily life, taken with a Huawei Mate 10. The picture size is 1440 × 1920, the resolution is 96 dpi, the bit depth is 24, and no flash was used. There are 300 simulation pictures in all, all ordinary life photos with no special subjects.

3.2 System environment

The computer system used in this simulation is Windows 10, and the simulation software is MATLAB R2014b.

3.3 Wavelet transform-related parameters

For unified modeling, this paper uses a three-level wavelet decomposition with a Daubechies wavelet as the basis. The Daubechies wavelets, constructed by the renowned wavelet analyst Ingrid Daubechies, are generally abbreviated dbN, where N is the order of the wavelet. The support of the wavelet function Ψ(t) and of the scaling function ϕ(t) is 2N − 1, and Ψ(t) has N vanishing moments. The dbN wavelets have good regularity: the smoothing error they introduce as a sparse basis is hard to detect, which makes the signal reconstruction process smoother. As the order N increases, the number of vanishing moments increases; more vanishing moments mean better smoothness, stronger frequency-domain localization, and better band division, but weaker time-domain localization, a much greater amount of computation, and worse real-time performance. In addition, except for N = 1, the dbN wavelets are not symmetric (i.e., they have nonlinear phase), so some phase distortion arises when a signal is analyzed and reconstructed. N = 3 is used in this article.

4 Results and discussion

4.1 Results 1: Image filtering using wavelet transform

During image recording, transmission, storage, and processing, the image signal can be polluted: noise appears in the transmitted digital signal, often as isolated pixels. Although such isolated points do not destroy the overall frame of the image, they tend to be high in frequency and show up as bright spots, which greatly degrades viewing quality, so the image must be denoised to ensure the effect of subsequent processing. An effective approach is to remove noise at certain frequencies of the image by filtering, but denoising must remove the noise data without destroying the image. Figure 1 shows the result of filtering the image with the wavelet transform method. To test the effect, Gaussian white noise (20%) was added to the original image. Comparing the frequency analysis of the noisy and original images shows that after noise is added, the main frequency band of the original image is disturbed by the noise frequencies, but after wavelet filtering the frequency band of the image's main frame reappears, while the filtered image shows no significant visual change from the original. The normalized mean square error M before and after filtering is 0.0071, showing that the wavelet transform protects image detail well while removing the noise data.

Figure 1

Image denoising results comparison. (First row, left to right: the original image, the noisy image, and the filtered image. Second row, left to right: the frequency distribution of the original image, of the noisy image, and of the filtered image.)
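
The filtering experiment can be imitated on synthetic data. The sketch below is a 1-D stand-in (the 2-D case is analogous): a clean signal plus Gaussian white noise is Haar-transformed, the detail coefficients are soft-thresholded, and the normalized mean-square error (the M value used above) is compared before and after. The signal, noise level, and threshold are illustrative assumptions, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Clean smooth signal plus Gaussian white noise.
t = np.linspace(0, 1, 256)
clean = np.sin(2 * np.pi * 4 * t)
noisy = clean + rng.normal(0.0, 0.2, t.size)

# One level of the orthonormal 1-D Haar DWT.
avg = (noisy[0::2] + noisy[1::2]) / np.sqrt(2)
det = (noisy[0::2] - noisy[1::2]) / np.sqrt(2)

# Soft-threshold the detail coefficients, where most of the noise lives.
thr = 0.3
det_d = np.sign(det) * np.maximum(np.abs(det) - thr, 0.0)

# Inverse transform.
rec = np.empty_like(noisy)
rec[0::2] = (avg + det_d) / np.sqrt(2)
rec[1::2] = (avg - det_d) / np.sqrt(2)

def nmse(ref, est):
    """Normalized mean-square error, as used for the M value in the text."""
    return np.sum((ref - est) ** 2) / np.sum(ref ** 2)

err_noisy = nmse(clean, noisy)
err_denoised = nmse(clean, rec)
```

Thresholding only the detail band removes roughly the half of the noise energy that lands there while barely touching the smooth signal, so `err_denoised` comes out below `err_noisy`.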

4.2 Results 2: digital watermark encryption based on wavelet transform

Figure 2 shows the watermark encryption process based on the wavelet transform. Watermarking the image via the wavelet transform does not affect the structure of the original image. Salt-and-pepper noise with a density of 40% was added; for both the original image and the noisy image, the wavelet transform method extracts the watermark well.

Figure 2

Comparison of the digital watermark before and after. (First row, left to right: the original image, the image with noise and watermark added, and the denoised image. Second row: the original watermark, the watermark extracted from the noisy watermarked image, and the watermark extracted after denoising.)

The image correlation coefficient and peak signal-to-noise ratio of the watermarked image were calculated according to the method described in this paper. The correlation coefficient between the original image and the watermarked image (first row, first and third columns in the figure) is 0.9871, so the watermark does not destroy the structure of the original image. The signal-to-noise ratio of the original image is 33.5 dB and that of the watermarked image is 31.58 dB, which shows that the wavelet transform achieves watermark hiding well. From the watermarking results in the second row, the correlation coefficients between the original watermark and the watermarks extracted from the noisy and denoised images are 0.9745 and 0.9652, respectively. This shows that the watermark signal can be extracted well after being hidden by the wavelet transform.
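
The embedding-and-extraction pipeline can be sketched as follows, with a one-level Haar DWT standing in for the paper's wavelet and an assumed embedding strength `alpha`; the host image and binary watermark are synthetic, and extraction here is informed (it uses the original HH band), which is a simplification of the paper's scheme.

```python
import numpy as np

rng = np.random.default_rng(1)
host = rng.random((64, 64))                       # stand-in host image in [0, 1]
wm = rng.integers(0, 2, (32, 32)).astype(float)   # binary watermark, half size

# One-level 2-D Haar DWT of the host (averaging convention).
a = (host[0::2, :] + host[1::2, :]) / 2
d = (host[0::2, :] - host[1::2, :]) / 2
LL, LH = (a[:, 0::2] + a[:, 1::2]) / 2, (a[:, 0::2] - a[:, 1::2]) / 2
HL, HH = (d[:, 0::2] + d[:, 1::2]) / 2, (d[:, 0::2] - d[:, 1::2]) / 2

alpha = 0.05                 # embedding strength (assumed)
HH_w = HH + alpha * wm       # hide the watermark in the diagonal details

# Inverse transform -> watermarked image.
a_w = np.empty((32, 64)); d_w = np.empty((32, 64))
a_w[:, 0::2], a_w[:, 1::2] = LL + LH, LL - LH
d_w[:, 0::2], d_w[:, 1::2] = HL + HH_w, HL - HH_w
marked = np.empty((64, 64))
marked[0::2, :], marked[1::2, :] = a_w + d_w, a_w - d_w

# Peak signal-to-noise ratio of the watermarked image (peak value 1.0).
psnr = 10 * np.log10(1.0 / np.mean((host - marked) ** 2))

# Informed extraction: transform the marked image, subtract the known HH band.
d2 = (marked[0::2, :] - marked[1::2, :]) / 2
HH2 = (d2[:, 0::2] - d2[:, 1::2]) / 2
wm_est = (HH2 - HH) / alpha
corr = np.corrcoef(wm.ravel(), wm_est.ravel())[0, 1]
```

Because the transform is linear and invertible, the watermark is recovered exactly here; in the noisy cases reported above, the correlation coefficient drops below 1 but remains high.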

4.3 Results 3: image encryption based on wavelet transform

In image transmission, the most common way to protect image content is to encrypt the image. Figure 3 shows the process of encrypting and decrypting an image using the wavelet transform. After encryption the image shows no correlation with the original at all, while decrypting the encrypted image reproduces the original.

Figure 3

Image encryption and decryption process comparison. (Left: the original image; middle: the encrypted image; right: the decrypted image.)

The information entropy of Fig. 3 was calculated. The information entropy of the original image is 3.05, that of the decrypted image is 3.07, and that of the encrypted image is 7.88. The entropy is thus essentially unchanged between the original and decrypted images, while the entropy of the encrypted image rises to 7.88, indicating that the encrypted image is close to a random signal and therefore has good confidentiality.
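
Information entropy itself is straightforward to compute from the gray-level histogram. A sketch on synthetic data (the 3.05/3.07/7.88 values above belong to the paper's images, which are not reproduced here):

```python
import numpy as np

def entropy(img):
    """Shannon information entropy (bits/pixel) of an 8-bit image."""
    hist = np.bincount(img.ravel(), minlength=256)
    p = hist / hist.sum()
    p = p[p > 0]                       # drop empty bins before taking logs
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
# A low-contrast image uses few gray levels -> low entropy.
plain = rng.integers(100, 110, (128, 128)).astype(np.uint8)
# A well-encrypted image should look uniformly random -> entropy near 8 bits.
cipher = rng.integers(0, 256, (128, 128)).astype(np.uint8)

e_plain, e_cipher = entropy(plain), entropy(cipher)
```

For an 8-bit image the maximum possible entropy is 8 bits/pixel, which is why a value of 7.88 indicates a near-random cipher image.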

4.4 Results 4: image compression

Image data can be compressed because of redundancy in the data: spatial redundancy from the correlation between adjacent pixels in an image; temporal redundancy from the correlation between different frames in an image sequence; and spectral redundancy from the correlation between different color planes or spectral bands. The purpose of compression is to reduce the number of bits required to represent the data by removing these redundancies. Since image data volumes are huge and thus difficult to store, transfer, and process, compression is very important. Figure 4 shows the result of compressing the original image twice. Although the image is compressed, its main structure does not change, but its sharpness is significantly reduced. Table 1 lists the compressed image properties.

Figure 4

Image comparison before and after compression. (Left: the original image; middle: after the first compression; right: after the second compression.)

The results in Table 1 show that repeated compression significantly reduces the size of the image. The original image requires 2,764,800 bytes; after one compression this falls to 703,009 bytes, a reduction of 74.5%; after the second compression only 182,161 bytes remain, a further reduction of 74.1%. The wavelet transform thus achieves image compression well.
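
The mechanism behind wavelet compression can be sketched by zeroing small detail coefficients of a smooth synthetic image; only the surviving coefficients (plus the approximation band) would need to be stored. The test image and threshold below are assumptions for illustration, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Smooth synthetic image: most wavelet energy concentrates in the LL band.
x, y = np.meshgrid(np.linspace(0, 1, 64), np.linspace(0, 1, 64))
img = np.sin(4 * x) * np.cos(3 * y) + 0.01 * rng.standard_normal((64, 64))

# One-level 2-D Haar DWT.
a = (img[0::2, :] + img[1::2, :]) / 2
d = (img[0::2, :] - img[1::2, :]) / 2
LL, LH = (a[:, 0::2] + a[:, 1::2]) / 2, (a[:, 0::2] - a[:, 1::2]) / 2
HL, HH = (d[:, 0::2] + d[:, 1::2]) / 2, (d[:, 0::2] - d[:, 1::2]) / 2

# "Compress" by discarding small detail coefficients; a real coder would
# then quantize and entropy-code only the survivors.
thr = 0.05
kept = sum(int(np.count_nonzero(np.abs(b) > thr)) for b in (LH, HL, HH))
total = 3 * LH.size
```

For a smooth image almost all detail coefficients fall below the threshold, which is exactly the redundancy a wavelet coder exploits.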

5 Conclusion

With the advance of informatization, today's era is full of information. As the visual basis of human perception of the world, the image is an important means for humans to obtain, express, and transmit information. Digital image processing, that is, processing images with a computer, has a long history: it originated in the 1920s, when a photograph was first transmitted from London to New York via submarine cable using digital compression technology. Digital image processing can help people understand the world more objectively and accurately. The human visual system supplies more than three quarters of the information humans receive from the outside world, and images and graphics are the carriers of all visual information. Although the human eye is very powerful and can distinguish thousands of colors, in many cases an image is blurred or even invisible to it; image enhancement technology can make such images clear and bright. Relevant research results on this topic already exist, which shows that such research is feasible [26, 27].

It is precisely because of the importance of image processing technology that many researchers have taken up research on it and achieved fruitful results. However, as this research has deepened, it has tended to develop along individual directions in depth, while the application of image processing technology is a systems engineering problem: besides depth, it also has systematic requirements. A unified model covering multiple aspects of image applications will therefore undoubtedly promote the application of image processing technology. Since the wavelet transform has been applied successfully in many fields of image processing, this paper takes the wavelet transform as the common method and establishes a unified model based on it. Simulation studies were carried out on image filtering, watermark hiding, encryption and decryption, and image compression. The results show that the model achieves good results.

Abbreviations

CA: Cellular automata

CGH: Computer generated hologram

DCT: Discrete cosine transform

EPWIC: Embedded Prediction Wavelet Image Coder

HVS: Human visual system

LSB: Least significant bits

VOD: Video on demand

WT: Wavelet transform

References

1. H.W. Zhang, The research and implementation of image denoising method based on Matlab. Journal of Daqing Normal University 36(3), 1–4 (2016)

2. J.H. Hou, J.W. Tian, J. Liu, Analysis of the errors in locally adaptive wavelet domain Wiener filter and image denoising. Acta Photonica Sinica 36(1), 188–191 (2007)

3. M. Lebrun, An analysis and implementation of the BM3D image denoising method. Image Processing on Line 2(25), 175–213 (2012)

4. A. Fathi, A.R. Naghsh-Nilchi, Efficient image denoising method based on a new adaptive wavelet packet thresholding function. IEEE Trans. Image Process. 21(9), 3981 (2012)

5. X. Zhang, X. Feng, W. Wang, et al., Gradient-based Wiener filter for image denoising. Comput. Electr. Eng. 39(3), 934–944 (2013)

6. T. Chen, K.K. Ma, L.H. Chen, Tri-state median filter for image denoising. IEEE Trans. Image Process. 8(12), 1834 (1999)

7. S.M.M. Rahman, M.K. Hasan, Wavelet-domain iterative center weighted median filter for image denoising. Signal Process. 83(5), 1001–1012 (2003)

8. H.L. Eng, K.K. Ma, Noise adaptive soft-switching median filter for image denoising, in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), vol. 4 (2000), pp. 2175–2178

9. S.G. Chang, B. Yu, M. Vetterli, Adaptive wavelet thresholding for image denoising and compression. IEEE Trans. Image Process. 9(9), 1532 (2000)

10. M. Kivanc Mihcak, I. Kozintsev, K. Ramchandran, et al., Low-complexity image denoising based on statistical modeling of wavelet coefficients. IEEE Signal Process. Lett. 6(12), 300–303 (1999)

11. J.H. Wu, F.Z. Lin, Image authentication based on digital watermarking. Chinese Journal of Computers 9, 1153–1161 (2004)

12. A. Wakatani, Digital watermarking for ROI medical images by using compressed signature image, in Proc. Hawaii Int. Conf. System Sciences (2002), pp. 2043–2048

13. A.H. Paquet, R.K. Ward, I. Pitas, Wavelet packets-based digital watermarking for image verification and authentication. Signal Process. 83(10), 2117–2132 (2003)

14. Z. Wen, L.I. Taoshen, Z. Zhang, An image encryption technology based on chaotic sequences. Comput. Eng. 31(10), 130–132 (2005)

15. Y.Y. Wang, Y.R. Wang, Y. Wang, et al., Optical image encryption based on binary Fourier transform computer-generated hologram and pixel scrambling technology. Opt. Lasers Eng. 45(7), 761–765 (2007)

16. X.Y. Zhang, C. Wang, S.M. Li, et al., Image encryption technology on two-dimensional cellular automata. Journal of Optoelectronics Laser 19(2), 242–245 (2008)

17. A.S. Lewis, G. Knowles, Image compression using the 2-D wavelet transform. IEEE Trans. Image Process. 1(2), 244–250 (1992)

18. R.A. DeVore, B. Jawerth, B.J. Lucier, Image compression through wavelet transform coding. IEEE Trans. Inf. Theory 38(2), 719–746 (1992)

19. R.W. Buccigrossi, E.P. Simoncelli, Image compression via joint statistical characterization in the wavelet domain. IEEE Trans. Image Process. 8(12), 1688–1701 (1999)

20. A.A. Cruz-Roa, J.E. Arevalo Ovalle, A. Madabhushi, et al., A deep learning architecture for image representation, visual interpretability and automated basal-cell carcinoma cancer detection. Med. Image Comput. Comput. Assist. Interv. 16, 403–410 (2013)

21. S.P. Mohanty, D.P. Hughes, M. Salathé, Using deep learning for image-based plant disease detection. Front. Plant Sci. 7, 1419 (2016)

22. B. Sahiner, H. Chan, D. Wei, et al., Image feature selection by a genetic algorithm: application to classification of mass and normal breast tissue. Med. Phys. 23(10), 1671 (1996)

23. B. Bhanu, S. Lee, J. Ming, Adaptive image segmentation using a genetic algorithm. IEEE Trans. Syst. Man Cybern. 25(12), 1543–1567 (1995)

24. Y. Egusa, H. Akahori, A. Morimura, et al., An application of fuzzy set theory for an electronic video camera image stabilizer. IEEE Trans. Fuzzy Syst. 3(3), 351–356 (1995)

25. K. Hasikin, N.A.M. Isa, Enhancement of the low contrast image using fuzzy set theory, in Proc. UKSim Int. Conf. Computer Modelling and Simulation (2012), pp. 371–376

26. P. Yang, Q. Li, Wavelet transform-based feature extraction for ultrasonic flaw signal classification. Neural Comput. Appl. 24(3–4), 817–826 (2014)

27. R.K. Lama, M.-R. Choi, G.-R. Kwon, Image interpolation for high-resolution display based on the complex dual-tree wavelet transform and hidden Markov model. Multimed. Tools Appl. 75(23), 16487–16498 (2016)


Acknowledgements

The authors thank the editor and anonymous reviewers for their helpful comments and valuable suggestions.

This work was supported by:

  • Shandong Social Science Planning Research Project, 2018: The Application of Shandong Folk Culture in Animation in the View of Digital Media (No. 18CCYJ14).
  • Shandong Education Science 12th Five-Year Plan, 2015: Innovative Research on Stop-motion Animation in the Digital Media Age (No. YB15068).
  • Shandong Education Science 13th Five-Year Plan, 2016–2017, "Ports and Arts Education Special Fund": Reform of Teaching Methods of Hand-Drawn Presentation Techniques (No. BCA2017017).
  • National Research Youth Project of the State Ethnic Affairs Commission, 2018: Protection and Development of Villages with Ethnic Characteristics Under the Background of Rural Revitalization Strategy (No. 2018-GMC-020).

Availability of data and materials

The data are available from the authors on request.

About the authors

Zaozhuang University, No. 1 Beian Road., Shizhong District, Zaozhuang City, Shandong, P.R. China.

Lina Zhang was born in Jining, Shandong, P.R. China, in 1983. She received a master's degree from Bohai University, P.R. China. She now works in the School of Media, Zaozhuang University, P.R. China. Her research interests include animation and digital media art.

Lijuan Zhang was born in Jining, Shandong, P.R. China, in 1983. She received a master's degree from Jingdezhen Ceramic Institute, P.R. China. She now works in the School of Fine Arts and Design, Zaozhuang University, P.R. China. Her research interests include interior design and digital media art.

Liduo Zhang was born in Zaozhuang, Shandong, P.R. China, in 1982. He received a master's degree from Monash University, Australia. He now works in the School of Economics and Management, Zaozhuang University. His research interests include Internet finance and digital media.

Author information

Authors and affiliations

School of Media, Zaozhuang University, Zaozhuang, Shandong, China

Lina Zhang

School of Fine Arts and Design, Zaozhuang University, Zaozhuang, Shandong, China

Lijuan Zhang

School of Economics and Management, Zaozhuang University, Zaozhuang, Shandong, China

Liduo Zhang


Contributions

All authors took part in the discussion of the work described in this paper. LZ wrote the first version of the paper; LZ and LZ performed part of the experiments; and LZ revised the successive versions of the paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Lijuan Zhang .

Ethics declarations

Competing interests.

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article

Cite this article.

Zhang, L., Zhang, L. & Zhang, L. Application research of digital media image processing technology based on wavelet transform. J Image Video Proc. 2018 , 138 (2018). https://doi.org/10.1186/s13640-018-0383-6


Received : 28 September 2018

Accepted : 23 November 2018

Published : 05 December 2018



Keywords

  • Image processing
  • Digital watermark
  • Image denoising
  • Image encryption
  • Image compression


M.Tech/Ph.D Thesis Help in Chandigarh | Thesis Guidance in Chandigarh


What is Digital Image Processing?

Digital image processing is the process of using computer algorithms to perform image processing on digital images. The latest topics in digital image processing for research and thesis work are based on these algorithms. As a subcategory of digital signal processing, digital image processing carries many advantages over analog image processing: it permits a much wider range of algorithms to be applied to the input data, and it avoids problems such as the build-up of noise and signal distortion during processing. Since images are defined over two or more dimensions, digital image processing can be modeled as a multidimensional system. The history of digital image processing dates back to the early 1920s, when its first applications appeared. Many students choose this field for their M.Tech thesis as well as for their Ph.D. thesis. There are various thesis topics in digital image processing for M.Tech, M.Phil and Ph.D. students, listed here. Before going into topics in image processing, you should have some basic knowledge of image processing.


Latest research topics in image processing for research scholars:

  • The hybrid classification scheme for plant disease detection in image processing
  • The edge detection scheme in image processing using ant and bee colony optimization
  • To improve PNLM filtering scheme to denoise MRI images
  • The classification method for the brain tumor detection
  • The CNN approach for the lung cancer detection in image processing
  • The neural network method for the diabetic retinopathy detection
  • The copy-move forgery detection approach using textual feature extraction method
  • Design face spoof detection method based on eigen feature extraction and classification
  • The classification and segmentation method for the number plate detection

Formation of Digital Images

First, the image is captured by a camera, using light (e.g., sunlight) as the source of energy. A sensor array is used for image acquisition: each sensor measures the amount of light reflected by the object when light falls on it. The sensed data is a continuous voltage signal, which is converted into digital form by sampling (discretizing the spatial coordinates) and quantization (discretizing the amplitude values). The result is a 2-dimensional array of numbers: a digital image.
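
The sampling and quantization steps described above can be sketched as follows; the continuous scene is modeled by an assumed analytic brightness function:

```python
import numpy as np

def scene(x, y):
    """Continuous scene brightness over [0, 1) x [0, 1), values in [0, 1]."""
    return 0.5 + 0.5 * np.sin(2 * np.pi * x) * np.cos(2 * np.pi * y)

# Sampling: evaluate the scene on an M x N grid of sensor positions.
M = N = 32
ys, xs = np.meshgrid(np.arange(M) / M, np.arange(N) / N, indexing="ij")
samples = scene(xs, ys)                         # continuous-valued "voltages"

# Quantization: map each sample to one of 256 discrete gray levels.
digital = np.clip(np.round(samples * 255), 0, 255).astype(np.uint8)

# digital is now a 2-D array of numbers, i.e. a digital image.
```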

Why is Image Processing Required?

Image processing serves the following main purposes:

  • Visualization of hidden objects in the image.
  • Enhancement of the image through sharpening and restoration.
  • Extraction of valuable information from the image.
  • Measurement of different patterns of objects in the image.
  • Distinguishing different objects in the image.

Applications of Digital Image Processing

There are various applications of digital image processing, which can also make good thesis topics in image processing. The main applications are:

  • Enhancing image quality through techniques like image sharpening and restoration; images can be altered to achieve the desired result.
  • Medical imaging: gamma-ray imaging, PET scans, X-ray imaging, and UV imaging.
  • Transmission and encoding.
  • Color processing, in which colored images are processed using different color spaces.
  • Pattern recognition in machine learning.

List of topics in image processing for thesis and research

There are various topics in digital image processing for thesis and research. Here is a list of the latest thesis and research topics in digital image processing:

  • Image Acquisition
  • Image Enhancement
  • Image Restoration
  • Color Image Processing
  • Wavelets and Multi-Resolution Processing
  • Compression
  • Morphological Processing
  • Segmentation
  • Representation and Description
  • Object Recognition
  • Knowledge Base

1. Image Acquisition:

Image acquisition is the first and most important step of digital image processing. In the simplest case, an image is already available in digital form and acquisition merely involves preprocessing such as scaling. In general, acquisition starts with capturing an image with a sensor (such as a monochrome or color TV camera) and digitizing it; if the output of the camera or sensor is not in digital form, an analog-to-digital converter (ADC) digitizes it. If the image is not properly acquired, the tasks that follow cannot be achieved. Customized hardware is used for advanced image acquisition techniques and methods; 3D image acquisition is one such advanced method. Students can choose this method for their master's thesis and research.

2. Image Enhancement:

Image enhancement is one of the easiest and most important areas of digital image processing. The core idea is to bring out information that is obscured, or to highlight specific features of an image according to requirements, such as changing its brightness and contrast. Basically, it involves manipulating an image to obtain one more suitable than the original for a specific application. Many algorithms have been designed to change an image's contrast, brightness, and other such properties. Image enhancement aims to change the human perception of images. Enhancement techniques are of two types: spatial-domain and frequency-domain methods.
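
As a concrete spatial-domain example, a simple linear contrast stretch maps the gray levels of a low-contrast image onto the full [0, 255] range; the input image below is synthetic:

```python
import numpy as np

def stretch_contrast(img):
    """Linear contrast stretch: map [min, max] of the image onto [0, 255]."""
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:                       # flat image: nothing to stretch
        return img.astype(np.uint8)
    return ((img - lo) / (hi - lo) * 255).round().astype(np.uint8)

rng = np.random.default_rng(0)
dull = rng.integers(90, 140, (64, 64)).astype(np.uint8)   # low-contrast image
vivid = stretch_contrast(dull)
```

The stretched image spans the full dynamic range, which is exactly the brightness-and-contrast manipulation the paragraph describes.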

3. Image Restoration:

Image restoration involves improving the appearance of an image. In contrast to image enhancement, which is subjective, image restoration is completely objective, in the sense that restoration techniques are based on probabilistic or mathematical models of image degradation. Restoration removes blur and noise from images to produce a clean image, and it can be a good choice for an M.Tech thesis on image processing. The image information lost during blurring is recovered through a reversal of the degradation process, which differs from image enhancement. A deconvolution technique, performed in the frequency domain, is commonly used; the main defects that degrade the image are restored here.
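
A standard model-based restoration of this kind is Wiener deconvolution in the frequency domain. The sketch below degrades a synthetic scene with an assumed 3×3 box blur plus noise and then inverts the model with an assumed noise-to-signal constant K:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
img = np.zeros((n, n)); img[24:40, 24:40] = 1.0         # simple test scene

# Degradation model: circular blur with a known kernel plus a little noise.
kernel = np.zeros((n, n)); kernel[:3, :3] = 1.0 / 9.0   # 3x3 box blur
H = np.fft.fft2(kernel)
blurred = np.real(np.fft.ifft2(np.fft.fft2(img) * H))
noisy = blurred + 0.01 * rng.standard_normal((n, n))

# Wiener deconvolution: objective restoration from the degradation model.
K = 0.01                                 # noise-to-signal constant (assumed)
G = np.conj(H) / (np.abs(H) ** 2 + K)
restored = np.real(np.fft.ifft2(np.fft.fft2(noisy) * G))

err_blur = np.mean((img - noisy) ** 2)
err_rest = np.mean((img - restored) ** 2)
```

The regularizing constant K keeps the filter bounded where the blur's frequency response is near zero, which is what distinguishes Wiener filtering from naive inverse filtering.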

4. Color Image Processing:

Color image processing has attracted great interest because of the significant increase in the use of digital images on the Internet. It includes color modeling and processing in a digital domain. Various color models are used to specify a color in a 3-D coordinate system: the RGB, CMY, HSI, and YIQ models. Color image processing matters because humans can perceive thousands of colors. It has two areas: full-color processing, in which the image is processed in full color, and pseudo-color processing, in which grayscale images are converted to colored images. It is an interesting topic in image processing.

