A Comprehensive Review of Stereo Matching Algorithms for Depth Map Application
Monther Yousef1, Ahmad Fauzan2, Rostam Affendi2, Mohd Saad2, Kamarul Hawari3, Nabil Jazli4
1Faculty of Electronic and Computer Technology and Engineering, Universiti Teknikal Malaysia Melaka, Durian Tunggal, 76100 Melaka, Malaysia
2Centre for Telecommunication Research and Innovation (CETRI), Universiti Teknikal Malaysia Melaka, Durian Tunggal, 76100 Melaka, Malaysia
3Faculty of Electrical and Electronic Engineering Technology, Universiti Malaysia Pahang Al-Sultan Abdullah, Lebuh Persiaran Tun Khalil Yaakob, 26300 Kuantan, Pahang, Malaysia
4IT Support Department, Amcorp Services Sdn Bhd, Petaling Jaya, 46050 Selangor, Malaysia.
*Corresponding Author
DOI: https://dx.doi.org/10.47772/IJRISS.2025.909000322
Received: 17 August 2025; Accepted: 22 August 2025; Published: 09 October 2025
ABSTRACT
Accurate depth map estimation from stereo images plays a central role in many computer vision applications, including autonomous navigation, robotic perception, and 3D surface reconstruction. Despite extensive research, traditional stereo matching methods continue to face challenges in weakly textured areas, at depth discontinuities, and within occluded regions, which reduces their reliability when applied in complex and unstructured environments. This study provides a structured review of stereo matching techniques, outlining the transition from purely local and global approaches to more advanced hybrid and segment-based frameworks. A particular emphasis is placed on the role of multi-cost matching functions, which integrate complementary descriptors to improve robustness against texture variations. In addition, edge-preserving cost aggregation strategies, such as segment-based and side-window filtering, are highlighted for their effectiveness in maintaining object boundaries while suppressing noise. To further improve performance, disparity optimization methods such as semi-global matching and adaptive refinement are examined for their ability to strike a balance between computational efficiency and accuracy. Experimental analysis using benchmark datasets demonstrates that combining multi-cost descriptors with adaptive filtering enhances both the consistency and quality of reconstructed depth maps. Overall, the findings suggest that segment-aware and texture-robust stereo matching approaches offer strong potential for enabling scalable and real-time stereo vision systems suited to practical deployment.
Keywords— Stereo Matching, Disparity Estimation, Depth Map, Cost Aggregation, Texture Robustness.
INTRODUCTION
Stereo vision is a critical enabler of 3D applications such as autonomous navigation, augmented reality, and robotics, where accurate depth perception is required. Stereo matching, the process of producing a disparity map from stereo image pairs, is central to stereo vision and serves as the foundation for reconstructing 3D scenes [1], [2]. However, this task remains difficult, especially in areas with weak texture, significant depth discontinuities, occlusions, and variable illumination conditions [3], [4]. Traditional approaches based on fixed or adaptive support windows often reduce boundary sharpness or produce mismatches [5]. This paper provides a comprehensive review of stereo matching algorithms. It also introduces a framework that combines multi-cost combination using pyramid fusion, hybrid random learning-based cost aggregation, and refinement techniques such as left-right consistency and Side Window Filtering. This multi-stage approach is designed to increase disparity accuracy, robustness in low-textured areas, and pose estimation precision without relying on motion constraints. By combining classical methods and recent advances, and validating performance on benchmark datasets such as Middlebury and KITTI, the work contributes to the development of stereo vision systems with high-accuracy disparity maps suitable for applications such as smart agriculture, 3D printing, and augmented reality [6], [7]. Stereo vision systems are widely used in computer vision to estimate depth by simulating the human binocular vision mechanism. As illustrated in Fig. 1, the setup consists of two cameras placed at a fixed baseline distance, each capturing an image of the same scene from slightly different viewpoints.
Fig. 1 (a) Stereo-vision system principle [5], (b) The stereo matching network [8].
The disparity, defined as the difference in the horizontal position of corresponding pixels between the left and right images, is directly related to the depth of real-world objects. By using the geometric relationship between the focal length, baseline, and disparity, the depth of each point in the scene can be reconstructed. This principle provides the foundation for generating disparity maps, which are then converted into depth maps for various applications, including 3D reconstruction, object recognition, robotics, and autonomous navigation [5].
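For a rectified pair this geometric relationship reduces to \(Z = fB/d\), where f is the focal length in pixels, B the baseline, and d the disparity. The short Python/NumPy sketch below converts a disparity map to metric depth under these assumptions; the function name, the eps guard, and the infinity convention for zero disparity are illustrative choices rather than part of any cited method.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Convert a disparity map to depth via Z = f * B / d (rectified setup).

    disparity  : H x W disparity map in pixels.
    focal_px   : focal length in pixels (assumed identical for both cameras).
    baseline_m : distance between the two camera centres in metres.
    Pixels with (near-)zero disparity are mapped to infinity.
    """
    d = np.asarray(disparity, dtype=np.float64)
    depth = np.full_like(d, np.inf)
    valid = d > eps
    depth[valid] = focal_px * baseline_m / d[valid]
    return depth
```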
This review paper contributes by systematically analyzing and synthesizing the limitations of existing stereo matching algorithms for disparity map generation, with particular attention to the challenges posed by low-texture regions, depth discontinuities, illumination variations, and occlusions. Unlike prior surveys that provide broad overviews, this work critically evaluates state-of-the-art methods to reveal how many solutions remain limited in preserving edge details, handling radiometric inconsistencies, or balancing accuracy across different problematic regions. By consolidating these insights, the paper identifies key research gaps where current approaches either underperform or trade off one performance aspect for another, such as texture robustness versus boundary accuracy. Furthermore, the review emphasizes the need for algorithms that integrate multi-cost representations with adaptive refinement strategies to achieve reliable and edge-preserving depth estimation. In doing so, the paper provides a clear roadmap for researchers to develop more effective stereo matching frameworks that directly address unresolved challenges, thereby advancing the reliability of disparity map estimation for 3D pose estimation and related computer vision applications.
Section II introduces the taxonomy of stereo matching methods, followed by Section III on matching cost and aggregation. Section IV discusses disparity optimization, while Section V covers refinement and filtering techniques. Section VI examines segment-based and multi-cost approaches, and Section VII reviews related works. Finally, Section VIII concludes the paper with key insights and future directions.
Taxonomy of stereo matching
Stereo matching algorithms are often organised into a multi-stage pipeline that calculates disparity maps by identifying correspondences between left and right stereo image pairs. Hamzah and Ibrahim (2016) presented a taxonomy that divides the stereo matching process into four basic stages: matching cost computation (MCC), cost aggregation (CA), disparity selection (DS), and disparity refinement (DR). This methodology has been widely adopted in stereo vision research, providing a systematic way to develop and evaluate depth estimation algorithms. Each stage is crucial for reducing errors and improving robustness, especially in images with weak texture or complicated geometry, as shown in Fig. 2.
Fig. 2 A framework for the steps of stereo matching algorithms [7].
The first stage, matching cost computation, calculates similarity scores between pixels in the left and right images. This can be accomplished via pixel-based approaches such as Absolute Difference (AD), Squared Difference (SD), or Truncated Absolute Difference (TAD), which are susceptible to noise and textureless areas [4]. To overcome these constraints, area-based approaches such as the Sum of Absolute Differences (SAD), Sum of Squared Differences (SSD), and Normalised Cross-Correlation (NCC) employ local windows to increase robustness [9], [10]. After computing the initial costs, the aggregation stage merges these values over a support region, which can be a fixed, adaptive, or edge-aware window. This stage is critical for reducing mismatches caused by noise, occlusions, and brightness fluctuations. Following cost aggregation, the disparity selection step chooses the best match using an optimisation strategy. Local approaches frequently employ the Winner-Take-All (WTA) strategy, which chooses the disparity with the lowest cost at each pixel individually. In contrast, global approaches treat disparity estimation as an energy minimisation problem, using techniques such as Dynamic Programming (DP), Graph Cuts, or Belief Propagation [11], [12]. The final stage, disparity refinement, resolves residual problems caused by occlusions, streaking artefacts, or mismatches using methods such as sub-pixel interpolation, filtering (e.g., the Hybrid Median Filter), and consistency checks [13]. These four stages provide the essential framework of most stereo matching systems and lay the groundwork for sophisticated algorithmic advances, such as the one suggested in the reviewed research. To better understand the development of stereo matching frameworks, various taxonomies have been proposed, each organizing the stereo matching pipeline in different ways. These taxonomies differ in how they define the key stages of disparity estimation, the level of detail they provide, and their adaptability to modern techniques such as deep learning. A comparative overview of these taxonomies is presented in Table 1, highlighting their main classification stages, contributions, and limitations.
Table 1. Comparative Analysis of Stereo Matching Taxonomies
Taxonomy | Main Stages | Strengths (+) / Limitations (–) |
---|---|---|
Early Vision-Based | 1. Feature detection; 2. Correspondence; 3. Depth recovery | (+) Conceptual foundation for stereo vision. (–) Too abstract, no clear pipeline. |
Classical Four-Stage | 1. MCC; 2. CA; 3. Disparity Optimization; 4. Refinement | (+) Standard, widely used for benchmarking. (–) Cannot handle modern learning methods. |
Modified Four-Stage | 1. MCC; 2. CA; 3. Disparity Selection; 4. Refinement | (+) Clearer separation of local/global methods. (–) Still limited to traditional approaches. |
Deep Learning-Oriented | 1. Feature Extraction; 2. Cost Volume; 3. Aggregation; 4. Output | (+) End-to-end CNN stereo matching. (–) Narrow focus, less generalizable. |
Hybrid | 1. Traditional pipeline; 2. Deep Learning modules | (+) Combines classical and modern strategies. (–) Still evolving, no unified framework. |
Table 1 shows the evolution of stereo matching taxonomies from abstract vision-based models to structured pipelines and modern hybrid approaches. Early schemes lacked clear computational steps, while the classical and modified four-stage taxonomies standardized the process but remained limited to traditional methods. Deep learning taxonomies introduced end-to-end architecture but were less generalizable. Hybrid models now combine classical pipelines with learning modules, offering a more flexible framework, though still without a unified standard.
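To make the classical four-stage pipeline of Fig. 2 concrete, the following Python/NumPy sketch wires the stages together with deliberately simple placeholder choices (absolute-difference cost, box-filter aggregation, WTA selection, and median-filter refinement). It is an illustration of the pipeline structure only, not the method of any particular taxonomy in Table 1.

```python
import numpy as np
from scipy.ndimage import uniform_filter, median_filter

def stereo_pipeline(left, right, max_disp=64):
    """Skeleton of the classical four-stage pipeline: MCC -> CA -> DS -> DR."""
    # 1. Matching cost computation: AD cost for every candidate disparity.
    cost = np.stack(
        [np.abs(left - np.roll(right, -d, axis=1)) for d in range(max_disp)],
        axis=2)

    # 2. Cost aggregation: a simple 3x3 box filter over each disparity slice.
    aggregated = uniform_filter(cost, size=(3, 3, 1))

    # 3. Disparity selection: Winner-Take-All along the disparity axis.
    disparity = np.argmin(aggregated, axis=2).astype(np.float64)

    # 4. Disparity refinement: a 3x3 median filter to suppress isolated outliers.
    return median_filter(disparity, size=3)
```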
Matching cost and aggregation
Matching cost computation is the first and most important stage in stereo matching since it directly impacts the accuracy of the disparity estimate. It measures the similarity of corresponding pixels in a stereo image pair. Traditional pixel-based approaches like Absolute Difference (AD), Squared Difference (SD), and Truncated Absolute Difference (TAD) are popular due to their simplicity and speed [4]. However, these approaches are susceptible to noise, repetitive textures, and lighting changes, making them less effective in real-world scenarios. Area-based techniques have been devised to address these limitations. These strategies broaden the comparison region to a neighbourhood of pixels, increasing matching reliability. The Sum of Absolute Differences (SAD), Sum of Squared Differences (SSD), and Normalised Cross-Correlation (NCC) are three common area-based approaches, each with different trade-offs in terms of speed, complexity, and robustness [9], [10]. SAD is computationally efficient and well suited to real-time applications, whereas NCC performs better under varying illumination conditions but is more complex. Fig. 3 shows the matching cost function of a pixel on the left image \(I_{l}\) with a pixel on the right image \(I_{r}\), where \(I_{l}\) and \(I_{r}\) denote the intensities of the corresponding pixels on the left and right image planes for the same scene point. The x and y indicate the coordinate position of a pixel, while d represents the position increment (disparity) of a pixel.
Fig. 3 Overview of pixel-based matching algorithm.
Equation (1) of the Absolute Difference (AD) shows that the absolute difference of intensity values between the reference pixel and the candidate pixel is computed [4].
\(AD(x,y,d) = \left| I_{l}(x,y) - I_{r}(x + d,y) \right|\) (1)
Equation (2) of the Squared Difference (SD) shows that the difference between the reference pixel and the candidate pixel is squared [14].
\(SD(x,y,d) = \left| I_{l}(x,y) - I_{r}(x + d,y) \right|^{2}\) (2)
Equation (3) of the Truncated Absolute Difference (TAD) mainly limits the influence of outliers by capping the cost at a truncation value r [15].
\(TAD(x,y,d) = \min\left( \left| I_{l}(x,y) - I_{r}(x + d,y) \right|,\ r \right)\) (3)
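As a hedged illustration of Equations (1)–(3), the NumPy sketch below evaluates the three pixel-based costs for a single candidate disparity d; the horizontal shift via np.roll and the wrap-around caveat at the image border are implementation conveniences, not part of the cited definitions.

```python
import numpy as np

def pixel_costs(left, right, d, r=20.0):
    """AD, SD and TAD costs (Eqs. 1-3) for one candidate disparity d.

    left, right : H x W grayscale images as float arrays.
    d           : candidate disparity, so that shifted[x] = I_r(x + d, y).
    r           : truncation value used by the TAD cost.
    Note: np.roll wraps around at the border; the last d columns are invalid.
    """
    shifted = np.roll(right, -d, axis=1)   # approximates I_r(x + d, y)
    ad = np.abs(left - shifted)            # Absolute Difference, Eq. (1)
    sd = ad ** 2                           # Squared Difference, Eq. (2)
    tad = np.minimum(ad, r)                # Truncated Absolute Difference, Eq. (3)
    return ad, sd, tad
```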
Once the matching cost has been calculated, cost aggregation reduces ambiguities and strengthens consistency. This procedure entails summing or averaging the matching costs over a support region, which is commonly specified by a window. Fixed windows are simple but frequently fail at depth discontinuities and in low-textured regions, resulting in blurred or incorrect disparities. Adaptive windows improve performance by altering their size or shape based on local image attributes, improving edge preservation and matching precision [16], [17]. More complex approaches, such as bilateral or guided filtering, retain edges while aggregating costs, but may increase the computational load. The reviewed research proposes a hybrid random learning (HRL) aggregation approach that uses dynamic aggregation behaviour to reduce errors more efficiently. This approach seeks to enhance disparity estimates in historically challenging areas, such as occluded or textureless surfaces, distinguishing it from previous techniques.
The Sum of Absolute Differences (SAD) is the most straightforward metric for computing the similarity between two stereo pair images. SAD works by comparing a square neighbourhood of pixels between the reference and candidate images: it takes the absolute difference between the intensities of each pixel pair in the reference window and sums these differences over the aggregation window to produce a simple measure of block similarity. The method is suitable for real-time implementation since it is comparatively inexpensive, as demonstrated by [18], who computed and evaluated the performance of the Sum of Absolute Differences (SAD) for real-time pose images. Thus, various applications apply the SAD algorithm to obtain the best match. Square-window aggregation is applied to the absolute differences, followed by the optimization stage, which applies the Winner-Take-All (WTA) method for disparity selection, as shown in Fig. 4, where \(I_{l}\) indicates a pixel on the left image and \(I_{r}\) represents a pixel on the right image. The d represents the position increment of a pixel, while (x, y) are the coordinate parameters of both images. However, the SAD method is limited in that unique matches are enforced only for the reference image, while other points of the stereo pair may be matched to multiple points. The method also does not perform well for images with high texture [10], [19].
\(SAD(x,y,d) = \sum_{(i,j) \in W(x,y)} \left| I_{l}(i,j) - I_{r}(i + d,j) \right|\) (4)
where W(x, y) denotes the square aggregation window centred at (x, y).
Fig. 4 Winner Takes All (WTA) for disparity selection [20].
The Sum of Squared Differences (SSD) algorithm performs the summation, particularly over the squared difference of pixel intensity values between two corresponding pixels. It then aggregates them within a square window and optimizes the result using the Winner-Take-All (WTA) method. Recent advancements in SSD-based matching cost computation have continued to emphasize its accuracy despite computational expense [18]. The SSD similarity measure involves higher computational complexity than the Sum of Absolute Differences (SAD) due to the multiplication operations involved.
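The following sketch combines the windowed SAD aggregation and WTA selection described above into a basic block-matching routine; the window size, disparity range, border handling, and the use of SciPy's uniform_filter as a box filter are illustrative assumptions rather than a prescribed implementation.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sad_wta_disparity(left, right, max_disp=64, win=5):
    """Block matching: windowed SAD aggregation followed by WTA selection.

    left, right : H x W grayscale images as float arrays.
    max_disp    : number of candidate disparities to evaluate.
    win         : side length of the square aggregation window (odd).
    """
    H, W = left.shape
    cost = np.zeros((H, W, max_disp))
    for d in range(max_disp):
        shifted = np.roll(right, -d, axis=1)       # I_r(x + d, y), as in Eq. (4)
        ad = np.abs(left - shifted)
        if d > 0:
            ad[:, W - d:] = ad.max()               # penalise columns that wrapped around
        # Box filtering equals the windowed SAD up to a constant scale factor.
        cost[:, :, d] = uniform_filter(ad, size=win)
    # Winner-Take-All: the disparity with the lowest aggregated cost wins.
    return np.argmin(cost, axis=2)
```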
Disparity optimization approaches
After aggregating the matching costs, the next stage in stereo matching is disparity selection or optimisation, which determines the optimal disparity value for each pixel. Local and global optimisation are the two basic methodologies that dominate this stage. Local techniques, such as the Winner-Take-All (WTA) strategy, choose the disparity with the lowest cost for each pixel individually. While this approach is computationally fast and appropriate for real-time applications, it frequently fails in occluded or low-textured regions, where ambiguous matches can result in inaccurate disparities [7], [21]. These challenges demonstrate the trade-off between speed and robustness in local optimisation methods.
Global optimization methods [22], on the other hand, formulate disparity estimation as an energy minimization problem. This approach seeks to minimize a global cost function, typically expressed as the sum of a data term and a smoothness term,
\(E(d) = E_{\text{data}}(d) + E_{\text{smooth}}(d)\) (5)
These approaches seek a disparity function d that minimizes the global energy over the disparity computation, such as a pixel-based matching cost, by selecting the proper surface in the disparity space image. Based on this global formulation, the stereo matching algorithm aims to determine the optimal-energy disparity assignment, as given by (6) [22].
\(E(d) = E_{\text{data}}(d) + E_{\text{smooth}}(d)\) (6)
Several techniques have been developed to address this optimisation problem, including Dynamic Programming (DP), Graph Cuts (GC), and Belief Propagation. DP is especially popular because of its simplicity and low memory requirements, and it has been used for scanline optimisation and hierarchical structures [12]. On the other hand, traditional DP is restricted by the streaking effect created by inter-scanline independence, making it less successful at preserving vertical consistency. The research investigates various DP variants to overcome these restrictions, including 1D optimisation, 2D scanline optimisation, and dynamic programming on tree topologies. Among these, dynamic programming on trees improves accuracy by enabling disparity information to propagate non-linearly throughout the whole image [23], [24]. This approach promotes vertical and horizontal consistency, making it more useful in scenes with complex depth variation. In the proposed framework, DP is combined with a winner-update technique to improve the disparity space image (DSI) while minimizing computation. This combination strikes a balance between optimisation quality and performance, allowing the proposed approach to handle occlusions and depth discontinuities more consistently than traditional local or global techniques. Table 2 provides a comparative overview of commonly used similarity measures in stereo matching.
Table 2. Summary of advantages and disadvantages of similarity measures [9], [18].
Method | Advantages | Disadvantages |
---|---|---|
Sum of Absolute Differences (SAD) | High speed; reasonable quality; low algorithmic complexity | Sensitive to outliers; poor performance in high-texture areas |
Sum of Squared Differences (SSD) | Fast and simple; effective in grayscale applications | Sensitive to outliers; lower accuracy in complex scenes |
Normalized Cross-Correlation (NCC) | Robust under illumination changes; widely used in object recognition | Computationally expensive; complex operations |
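As a concrete illustration of the scanline dynamic programming discussed above, the sketch below optimises each image row independently over a precomputed cost volume with a constant smoothness penalty. The penalty value and the per-row independence are deliberate simplifications: they reproduce exactly the inter-scanline streaking limitation noted earlier, which the tree-based variants are designed to avoid.

```python
import numpy as np

def scanline_dp(cost_volume, p_smooth=0.1):
    """1-D dynamic programming along each row of an H x W x D cost volume.

    Minimises, per row, the sum of matching costs plus a constant penalty
    p_smooth whenever the disparity label changes between neighbouring pixels.
    Returns an H x W integer disparity map.
    """
    H, W, D = cost_volume.shape
    disparity = np.zeros((H, W), dtype=np.int32)
    # Transition penalty matrix: zero on the diagonal, p_smooth elsewhere.
    trans = np.full((D, D), p_smooth)
    np.fill_diagonal(trans, 0.0)
    for y in range(H):
        acc = np.zeros((W, D))                   # accumulated path costs
        back = np.zeros((W, D), dtype=np.int32)  # back-pointers for the trace
        acc[0] = cost_volume[y, 0]
        for x in range(1, W):
            total = acc[x - 1][:, None] + trans  # total[d_prev, d]
            back[x] = np.argmin(total, axis=0)
            acc[x] = cost_volume[y, x] + np.min(total, axis=0)
        # Backtrack from the cheapest label in the last column.
        d = int(np.argmin(acc[W - 1]))
        disparity[y, W - 1] = d
        for x in range(W - 1, 0, -1):
            d = back[x, d]
            disparity[y, x - 1] = d
    return disparity
```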
Table 3 summarizes key past research studies related to stereo matching algorithms, highlighting the similarity measures, optimization techniques, and main contributions of each work.
Table 3. A summary of past research studies on stereo matching algorithms cited in this review
Year | Similarity Measuring Method | Optimizing method | Study focuses and remarks | Ref. |
---|---|---|---|---|
2009 | Multiple cost aggregation (L1 Distance + cost function) | WTA | Focused on the performance of different cost aggregation methods. However, the output result of the disparity map suffers from noise and low accuracy. | [25] |
2013 | Multi-view estimation based (DP) | DP | A novel cost function that focused on discontinuity and occlusion problems. The disparity maps obtained from this method contain horizontal streaks. | [26] |
2016 | Distinct similarity measure (DSM) | WTA | Focused on solving the mismatching problems. However, it works well only for feature selection under the ambiguity point. | [27] |
2017 | New feature based on DP | DP | Applied color stereo matching in order to increase accuracy. While the obtained disparity maps contain horizontal streaks due to the inter-scanline process | [12] |
2011 | SAD | DP | Focused on using the available information during DP more efficiently. However, the obtained disparity maps contain horizontal streaks. | [28] |
2016 | Joint bilateral stereo | WTA | A cost aggregation strategy based on joint bilateral filtering to gain accurate inference of disparity maps. However, the bilateral filter has a high processing time. | [29] |
2020 | AD | DP | Highlighting the limitations and strengths of performance obtained with GPU and FPGA in real-time stereo vision systems. However, a single match cost gives an error in low-texture regions. | [30] |
2013 | SAD | WTA | Obtaining a real-time disparity map by calculating the matching error based on the rank transform. However, the window base causes fattening effects on the obtained disparity maps | [31] |
2022 | SSD | DP | Obtaining a disparity map based on suboptimal cost with 1D dynamic programming. Meanwhile, the disparity maps that were obtained contain horizontal streaks. | [32] |
2012 | SAD | WTA | Binary stereo matches for higher computational efficiency. However, the method is only applicable for global optimization. | [33] |
2019 | Laplacian Gaussian pyramid | DP | A new dynamic programming approach to improve the matching quality. However, the method produced fattening effects on the obtained disparity map. | [34] |
2014 | SAD | – | New interpolation and reverse mapping to estimate the disparity map. At the same time, the method suffers from low accuracy of the disparity map. | [35] |
2015 | Block matching-based DP | DP | Stereo matching method to estimate the disparity map for satellite stereo images. However, the process has low accuracy in the generated disparity map. | [36] |
2017 | SAD | BP | Using the census transform and the sum of absolute differences algorithm to achieve high matching accuracy. However, the method is complex due to the hardware architecture. | [37] |
2017 | SAD | Bilateral Filter (BF) | Applying the SAD algorithm and bilateral filter to produce an accurate disparity map for the textured regions. | [38] |
Refinement and filtering methods
The next step in the stereo matching process is disparity refinement, which is critical for enhancing the disparity map's accuracy and visual quality. Despite efficient cost computation and optimisation, the final disparity map frequently contains noise, mismatches, and artefacts, especially around occlusion zones, object borders, and low-texture areas. To overcome these concerns, refinement approaches are used to smooth the disparity map, preserve crucial structural information, and remove false disparities [39]. Sub-pixel interpolation is one of the oldest and most widely used refinement techniques, improving disparity resolution by estimating intermediate values between discrete disparity levels. This increases the precision of depth information, particularly in high-gradient locations, and is commonly accomplished with a parabola-fitting model [40]. Filtering methods are commonly employed in refinement because they can reduce noise and increase disparity smoothness. The Median Filter is a nonlinear filter that replaces the value of each pixel with the median of its neighbours. It works well for correcting isolated mismatches, but it can blur edges and lose detail in thin structures [41]. To address this shortcoming, the Hybrid Median Filter (HMF) was developed, which combines several directional medians to retain corners and edge transitions [42]. This approach is practical in disparity refinement, where maintaining object boundaries is critical. The HMF technique is the principal filtering solution in the reviewed work, allowing it to minimise noise while preserving essential visual elements.
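The parabola-fitting model mentioned above admits a closed-form refinement: fitting a parabola through the aggregated costs at d−1, d, and d+1 and taking its vertex yields a sub-pixel offset. The sketch below applies this to a WTA disparity map; the clipping of border disparities and the guard against a flat cost curve are illustrative choices.

```python
import numpy as np

def subpixel_parabola(cost_volume, disparity):
    """Refine integer disparities by parabola fitting over neighbouring costs.

    cost_volume : H x W x D aggregated cost volume.
    disparity   : H x W integer (WTA) disparity map.
    Returns a float disparity map with sub-pixel precision.
    """
    H, W, D = cost_volume.shape
    d = np.clip(disparity, 1, D - 2)          # both neighbours must exist
    ys, xs = np.mgrid[0:H, 0:W]
    c_m = cost_volume[ys, xs, d - 1]          # cost at d - 1
    c_0 = cost_volume[ys, xs, d]              # cost at the WTA disparity
    c_p = cost_volume[ys, xs, d + 1]          # cost at d + 1
    denom = c_m - 2.0 * c_0 + c_p
    safe = np.where(np.abs(denom) > 1e-9, denom, 1.0)
    # Vertex of the fitted parabola: offset lies in (-0.5, 0.5) around d.
    offset = np.where(np.abs(denom) > 1e-9, 0.5 * (c_m - c_p) / safe, 0.0)
    return d + offset
```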
In addition to filtering, [43] uses consistency-checking approaches to enhance the disparity map. Mismatches are detected and corrected using the Left-Right Consistency check (LRC) and Confidence Disparity Filling (CDF), which compare disparities in both directions. Pixels that fail the consistency test are marked as unreliable and restored using neighbourhood-based filling techniques [43]. Furthermore, using K-means clustering with the Side Window Filter (SWF) improves sharpness at object edges while lowering noise in low-textured regions. The SWF, in particular, avoids over-smoothing at borders by applying the filter to one side of an edge, allowing for crisp transitions. These refinement strategies contribute to a more accurate and artifact-free disparity map, improving the system's performance in real-world depth estimation scenarios.
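A minimal sketch of the Left-Right Consistency check is given below, assuming two disparity maps that both follow the left-reference convention (a left pixel at column x maps to column x − d in the right image); the tolerance and border clipping are illustrative parameters. Pixels flagged as inconsistent would then be refilled, for instance from reliable neighbours, as described above.

```python
import numpy as np

def left_right_consistency(disp_left, disp_right, max_diff=1.0):
    """Return a boolean mask that is True where left/right disparities agree.

    disp_left[y, x]  : disparity estimated with the left image as reference.
    disp_right[y, x] : disparity estimated with the right image as reference.
    A left pixel (x, y) is projected to (x - d, y) in the right view and the
    two disparity values are compared against the tolerance max_diff.
    """
    H, W = disp_left.shape
    xs = np.tile(np.arange(W), (H, 1))
    ys = np.repeat(np.arange(H)[:, None], W, axis=1)
    x_right = np.clip(xs - np.round(disp_left).astype(np.int64), 0, W - 1)
    diff = np.abs(disp_left - disp_right[ys, x_right])
    return diff <= max_diff
```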
More recently, deep learning–based refinement has gained attention, where convolutional neural networks are trained to correct disparity errors and enhance detail, outperforming traditional hand-crafted filters in challenging regions. These approaches often incorporate uncertainty estimation, allowing unreliable disparities to be identified and refined adaptively. Together, classical and learning-based refinement strategies highlight the ongoing progression toward more accurate and artifact-free disparity maps, enabling stereo vision systems to achieve reliable depth estimation in real-world scenarios.
Segment-based and multi-cost techniques
Traditional stereo matching algorithms frequently struggle in low-textured areas, at depth discontinuities, and under illumination variations. Researchers have developed segment-based techniques to overcome these restrictions, which divide the image into regions based on colour or intensity and assign disparity values to each region rather than to individual pixels. This strategy improves robustness by enforcing local smoothness and reducing ambiguity in flat or repetitive areas [44]. Mean-shift segmentation, paired with region-based cost functions and disparity plane fitting, improves edge retention and structural consistency in depth maps. However, these approaches may be restricted by the intricacy of segment-to-disparity mapping, and artefacts may occur if segments are not adequately aligned with depth discontinuities.
The reviewed study proposes a hybrid stereo matching framework that improves on segment-based approaches by integrating them with a multi-cost pyramid fusion mechanism. Rather than depending on a single cost metric, this approach combines multiple cost functions, including gradient-based and intensity-based measures, across a multi-resolution pyramid. This method provides more precise matching in both highly textured and textureless areas, lowering the likelihood of local minima and mismatches. The pyramid captures global structural cues at coarse levels, whereas finer levels resolve local disparities, particularly around edges and occlusions. The result is a more robust and precise cost volume that more accurately represents depth variations across different parts of the image [45].
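A two-term example of such a multi-cost combination is sketched below, blending a truncated intensity cost with a truncated horizontal-gradient cost at a single pyramid level; the weights, truncation values, and the restriction to two descriptors and one scale are illustrative simplifications of the multi-cost pyramid fusion described above, not the reviewed framework itself.

```python
import numpy as np

def fused_cost(left, right, d, alpha=0.4, t_int=20.0, t_grad=10.0):
    """Fuse truncated intensity and gradient costs for one candidate disparity.

    left, right : H x W grayscale images as float arrays.
    d           : candidate disparity (right image shifted as in Eq. (1)).
    alpha       : weight of the intensity term; (1 - alpha) weights the gradient term.
    """
    shifted = np.roll(right, -d, axis=1)                   # I_r(x + d, y)
    c_int = np.minimum(np.abs(left - shifted), t_int)      # truncated intensity cost

    gx_left = np.gradient(left, axis=1)                    # horizontal gradients
    gx_shift = np.gradient(shifted, axis=1)
    c_grad = np.minimum(np.abs(gx_left - gx_shift), t_grad)

    # Weighted fusion of the two complementary descriptors.
    return alpha * c_int + (1.0 - alpha) * c_grad
```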
To improve the accuracy of disparity assignment, the article incorporates numerous innovative components: Hybrid Random Learning (HRL) is used during the cost aggregation stage to refine costs based on neighbourhood structure intelligently; K-means clustering segments the disparity space for better region classification; and the Side Window Filter (SWF) reduces boundary blurring by applying directional filtering only on one side of edges. Furthermore, Left-Right Consistency (LRC) and Confidence Disparity Filling (CDF) guarantee that disparity values are consistent across the two viewpoints, fixing mismatches and filling in doubtful regions. These combined approaches provide a significant addition to the stereo vision field by providing a solution that combines the benefits of segment-based reasoning, adaptive filtering, and multi-cost fusion to create high-quality disparity maps, especially in challenging circumstances, compared to conventional algorithms. To better understand the strengths and limitations of existing segmentation strategies, Table 4 summarizes several widely used methods along with their main processes, advantages, and disadvantages. Each technique offers unique contributions, ranging from robustness to noise and automation to high accuracy in specialized domains, but also faces challenges such as computational cost, sensitivity to parameters, or limited generalization. This comparison highlights the trade-offs among different segmentation approaches and provides context for selecting suitable methods in stereo vision applications.
The segmentation methods summarized in Table 4 demonstrate distinct trade-offs in performance. Region-based methods generally achieve higher robustness to noise and intensity variations, particularly in medical imaging tasks, but require extensive preprocessing and are domain-specific. Threshold-based approaches provide fast and fully automated processing, making them suitable for real-time applications; however, their sensitivity to histogram distributions can cause over-segmentation in complex scenes. Fuzzy clustering methods, such as FGFCM, perform well on high-resolution and noisy data, but their reliance on parameter tuning limits adaptability to diverse environments. Neural network-based methods achieve the highest accuracy and generalization across different data types, yet they incur high computational cost and risk of overfitting without sufficient training data. Edge-based approaches are advantageous for unsupervised evaluation and noise resilience, though their reliance on edge detector quality may lead to misclassification of fine details.
Table 4. Comparison of common segmentation methods
Segmentation methods | Main function process | Advantages | Disadvantages | Ref. |
---|---|---|---|---|
Region-based | Combines region-based and histogram methods for brain MRI tissue classification. | Auto seed selection; robust to noise and bias; outperforms SPM/FSL in noise; uses spatial + intensity features. | Complex preprocessing; T1-weighted MRI only; lower GM accuracy. | [46] |
Thresholds | Uses multimodal PSO to detect histogram peaks/valleys for auto image thresholding. | Fully automatic; faster than Otsu/Kapur; no objective function needed; works on noisy images. | Sensitive to histogram shape; grayscale only; may over-segment closely spaced peaks. | [47] |
Fuzzy | Segments remote sensing images using FGFCM with TCR to auto-optimize clusters. | Fast for high-res images; noise resistant; auto cluster selection; works with indices like NDVI. | Struggles with ultra-high-res data; needs parameter tuning; limited non-urban testing. | [48] |
Neural network | Classifies network traffic using a BiRNN with attention on byte sequences. | No protocol knowledge needed; works for all protocol types; high accuracy (95.82% F1); supports multi-class and novel protocols. | High computation cost; risk of overfitting; needs preprocessing. | [49] |
Edge-based | Evaluates image segmentation quality using edge alignment, without needing ground truth. | No reference needed (unsupervised); robust to noise and texture; detects over/under-segmentation; works for full images and single regions. | Depends on edge detector quality; needs parameter tuning; may misjudge small regions. | [50] |
Recent advances, particularly in instance segmentation (e.g., Mask R-CNN and Transformer-based models), show improved adaptability by integrating semantic understanding with pixel- and region-level accuracy. Including such methods in future evaluations would provide a more comprehensive benchmark and demonstrate performance across broader application domains.
Review of related works
Stereo matching has advanced significantly in recent years thanks to both traditional algorithms and learning-based solutions [51], [52]. Researchers have proposed several strategies for improving depth accuracy, eliminating disparity mismatches, and increasing performance in difficult settings such as occluded or low-texture regions. This section focuses on several major works contributing to stereo vision from different angles, such as algorithm design, dataset generation, and application-specific implementation. Each study is assessed in terms of its methodological approach, evaluation dataset, important contributions, limitations, and practical significance. The comparative insights gained from this research serve as a contextual framework for assessing the novelty and impact of the technique presented in this study. Several studies have investigated advanced stereo matching techniques using publicly accessible datasets and benchmarks [53], [54]. These works cover the fundamental concepts of stereopsis, such as the perspective camera model and epipolar geometry, and present methods for assessing disparity confidence and matching precision. Their analysis of disparity estimation challenges in real-world settings sheds light on the limitations of current approaches, including the absence of ground-truth disparity, difficulties with domain adaptation, and problems that arise in occluded regions. The techniques covered are relevant to various applications, including mobile computing, autonomous vehicles, and UAV systems, where real-time and adaptive depth perception is essential. In [55], an enhanced stereo matching method was proposed that uses a joint similarity measure and an adaptive weighting approach. The work uses stereo images and CT data from Imperial College London's heart models for medical imaging applications, namely 3D reconstruction of soft heart tissue. The approach significantly reduces mismatches by improving correspondence accuracy in complex anatomical structures. However, the algorithm's processing cost restricts its suitability for real-time implementation. Despite this restriction, the technique represents a significant step forward in precise depth estimation for clinical applications, indicating potential for future optimisation and real-world adaptation. The authors in [56] and [57] presented a convolutional network architecture for disparity and scene flow estimation, based on the well-known FlowNet framework. Their solution includes a joint training mechanism that learns flow and disparity simultaneously, dramatically improving scene perception in motion-rich contexts. Their work contributes significantly by creating and using large-scale synthetic datasets, notably FlyingThings3D, Monkaa, and Driving, which provide high-quality ground truth, diverse scenarios, and greater realism, all of which are critical for training strong deep learning models. However, the paper acknowledges difficulties in generalisation across datasets; models fine-tuned on certain benchmarks, such as KITTI, may underperform when applied to others. Despite this, the research provides a framework for stereo matching in advanced driver assistance systems and self-driving cars, where precise depth and motion estimation are required for real-time reconstruction and navigation tasks.
The authors in [58] introduced a deep learning framework for 3D human pose estimation that combines a modified dynamic graph convolutional neural network (DGCNN) with the PointNet architecture. The approach transforms 2.5D depth images into 3D point clouds, which are then used to predict human joint locations with better spatial awareness. When evaluated on the ITOP dataset, the proposed methodology outperformed existing approaches for predicting body key points. However, the system has difficulty detecting fast or occluded limb motions, particularly in reliably locating the hands and elbows, where depth data is frequently sparse or noisy. Despite these limitations, the model has high potential for use in rehabilitation, sports tracking, and motion analysis in exercise or training settings. The authors in [59] contributed to the stereo image super-resolution (SR) area by presenting the Flickr1024 dataset, a large-scale collection of 1,024 high-resolution stereo image pairs. This dataset is intended to help with the training and assessment of SR algorithms by providing a variety of real-world scenarios. Flickr1024 reduces overfitting and improves SR model generalisation compared to previous datasets such as KITTI and Middlebury. Although the dataset shows substantial advances in variety and quality, its early evaluation lacked depth, and it omitted stereo images without vertical calibration, limiting its application potential. Nonetheless, this contribution is especially important for furthering stereo image SR in mobile photography and image enhancement, fields where resolution and stereo consistency are vital [60].
To synthesise the findings of these investigations, a detailed comparison table has been created to summarise their essential elements. Table 5 contains information on the authors and year, the type of stereo matching method employed, the dataset or application domain, strengths, limitations, and specific contributions to disparity estimation. By arranging the assessed works side by side, the table emphasises trends in algorithm design, the evolution of benchmark datasets, and common issues across diverse techniques. It also helps to contextualise the rationale for the technique suggested in this review, which seeks to bridge the identified gaps using a hybrid, multi-cost, and segment-aware stereo matching strategy.
Table 5. Comparison of Reviewed Stereo Matching Studies.
Ref. | Design Type | Dataset / Domain | Strengths | Limitations | Contribution to Disparity Estimation |
---|---|---|---|---|---|
[53] | Disparity evaluation methods & taxonomy | General (mobile, UAV, robotics) | Covers fundamental models and limitations | Limited ground truth; occlusion issues | Identifies gaps in confidence and robustness |
[55] | Joint similarity + adaptive weighting | CT heart model dataset (medical) | High correspondence accuracy in anatomy | High computational cost | Enhances matching via joint similarity metrics |
[57] | CNN-based disparity & flow estimation | Synthetic datasets (FlyingThings3D, etc.) | Joint training improves scene understanding | Generalization is limited across datasets | Provides training data and scene flow estimation |
[58] | DGCNN + PointNet for pose estimation | ITOP (3D human pose) | Good accuracy on human key points | Inaccurate on fast or occluded limbs | Links depth maps to 3D human posture |
[59] | Stereo image super-resolution dataset | Flickr1024 (Stereo SR) | A large variety improves generalization | No vertical calibration; limited evaluation | Supports SR training with realistic stereo pairs |
CONCLUSION AND INSIGHTS
This review examined a wide range of stereo matching algorithms, tracing their development from classical pixel-based approaches to more advanced deep learning and hybrid solutions. The stereo vision pipeline, encompassing matching cost computation, cost aggregation, disparity optimisation, and refinement, has been extensively studied across the literature. Prior studies have explored strategies such as segment-based methods and multi-cost fusion, which remain central to many approaches. However, persistent challenges were observed, including handling occlusions, improving disparity estimation in low-texture regions, and preserving depth boundaries in complex scenes. Despite considerable progress, significant limitations remain. Many methods face high computational costs, making real-time deployment difficult. Others show limited transferability across datasets and domains, leading to reduced robustness in unconstrained environments. Additionally, there is a lack of comprehensive benchmarking for consistency checks and filtering strategies, which constrains comparative analysis across techniques. Future research should address these gaps by focusing on real-time implementation, particularly through hardware accelerators and lightweight architectures. Moreover, advances in multi-scale pyramid fusion and hybrid random learning could offer promising directions for improving robustness and generalisation. Further attention should also be given to domain adaptation, energy-efficient inference, and handling dynamic or unstructured environments, which remain underexplored in the current literature. Together, these directions highlight the key opportunities for advancing stereo matching in next-generation computer vision systems.
ACKNOWLEDGMENT
The authors express their gratitude to the Ministry of Higher Education (MoHE) Malaysia and Universiti Teknikal Malaysia Melaka (UTeM) for providing the funding necessary to complete this study through the Fundamental Research Grant Scheme No. FRGS/1/2024/ICT09/UTEM/03/1/F00561 and the UTeM Kesidang Scholarship.
REFERENCES
- H. Ma et al., “TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation,” 32nd Br. Mach. Vis. Conf. BMVC 2021, pp. 1–16, 2021.
- H. Sang, Q. Wang, and Y. Zhao, “Multi-Scale Context Attention Network for Stereo Matching,” IEEE Access, vol. 7, pp. 15152–15161, 2019, doi: 10.1109/ACCESS.2019.2895271.
- J. Huang, Z. Zhu, F. Guo, and G. Huang, “The devil is in the details: Delving into unbiased data processing for human pose estimation,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 5699–5708, 2020, doi: 10.1109/CVPR42600.2020.00574.
- R. A. Hamzah, M. G. Y. Wei, and N. S. N. Anwar, “Development of stereo matching algorithm based on sum of absolute RGB color differences and gradient matching,” Int. J. Electr. Comput. Eng., vol. 10, no. 3, pp. 2375–2382, 2020, doi: 10.11591/ijece.v10i3.pp2375-2382.
- M. Aboali, N. Abd Manap, R. A. Hamzah, and A. M. Darsono, “A New Post-Processing Method for Stereo Matching Algorithm,” Seybold Rep. J., May 2021, [Online]. Available: https://zenodo.org/records/6553698
- Y. Zhong, T. Jia, K. Xi, W. Li, and D. Chen, “Dual-stream stereo network for depth estimation,” Vis. Comput., vol. 39, no. 11, pp. 5343–5357, 2023, doi: 10.1007/s00371-022-02663-3.
- R. A. Hamzah and H. Ibrahim, “Literature survey on stereo vision disparity map algorithms,” J. Sensors, vol. 2016, 2016, doi: 10.1155/2016/8742920.
- B. Zhang, X. Wu, S. Liu, and W. Wan, “A Fast Stereo Matching Network Based on Multi-Scale Feature Fusion,” in 2024 IEEE 7th International Conference on Electronic Information and Communication Technology (ICEICT), IEEE, Jul. 2024, pp. 491–496. doi: 10.1109/ICEICT61637.2024.10670969.
- L. Yu, Y. Wang, Y. Wu, and Y. Jia, “Deep stereo matching with explicit cost aggregation sub-architecture,” 32nd AAAI Conf. Artif. Intell. AAAI 2018, pp. 7517–7524, 2018, doi: 10.1609/aaai.v32i1.12267.
- M. Yao, W. Ouyang, and B. Xu, “Hybrid cost aggregation for dense stereo matching,” Multimed. Tools Appl., vol. 79, no. 31–32, pp. 23189–23202, 2020, doi: 10.1007/s11042-020-09127-7.
- M. G. Mozerov and J. van de Weijer, “Accurate Stereo Matching by Two-Step Energy Minimization,” IEEE Trans. Image Process., vol. 24, no. 3, pp. 1153–1163, Mar. 2015, doi: 10.1109/TIP.2015.2395820.
- S. Cheng, F. Da, J. Yu, Y. Huang, and S. Gai, “A cross-scale constrained dynamic programming algorithm for stereo matching,” Fifth Int. Conf. Opt. Photonics Eng., vol. 10449, p. 1044923, 2017, doi: 10.1117/12.2270830.
- J. Heng, Z. Xu, Y. Zheng, and Y. Liu, “Disparity refinement using merged super-pixels for stereo matching,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10666 LNCS, 2017, pp. 295–305. doi: 10.1007/978-3-319-71607-7_26.
- Y. Lin, Y. Gao, and Y. Wang, “An Improved Sum of Squared Difference Algorithm for Automated Distance Measurement,” Front. Phys., vol. 9, no. August, pp. 1–7, 2021, doi: 10.3389/fphy.2021.737336.
- R. A. Hamzah and H. Ibrahim, “Improvement of stereo matching algorithm based on sum of gradient magnitude differences and semi-global method with refinement step,” Electron. Lett., vol. 54, no. 14, pp. 876–878, 2018, doi: 10.1049/el.2017.3956.
- F. Chen, J. Zhang, M. Zheng, J. Wu, and N. Ling, “Long-term rate control for concurrent multipath real-time video transmission in heterogeneous wireless networks,” J. Vis. Commun. Image Represent., vol. 77, no. March 2020, p. 102999, 2021, doi: 10.1016/j.jvcir.2020.102999.
- K. Zhang, W. Zuo, and L. Zhang, “FFDNet: Toward a Fast and Flexible Solution for CNN-Based Image Denoising,” IEEE Trans. Image Process., 2018.
- C. Zhu and Y. Z. Chang, “Simplified High-Performance Cost Aggregation for Stereo Matching,” Appl. Sci., vol. 13, no. 3, 2023, doi: 10.3390/app13031791.
- J. Agustín Tortolero Osuna and A. Jorge Rosales Silva, “Parallel Peer Group Filter for Impulse Denoising in Digital Images on GPU,” Comput. Informatics, vol. 38, pp. 1320–1340, 2019, doi: 10.31577/cai.
- H. Guan, W. Xiaoye, and R. Wang, “Efficient stereo matching using the weighted polygon window and the HSV color space,” pp. 0–8, 2023.
- L. Li, J. Wang, S. Yang, and H. Gong, “Binocular stereo vision based illuminance measurement used for intelligent lighting with LED,” Optik (Stuttg)., vol. 237, no. March, p. 166651, 2021, doi: 10.1016/j.ijleo.2021.166651.
- M. G. Mozerov and J. van de Weijer, “Accurate Stereo Matching by Two-Step Energy Minimization,” IEEE Trans. Image Process., vol. 24, no. 3, pp. 1153–1163, Mar. 2015, doi: 10.1109/TIP.2015.2395820.
- H. Proenca, J. C. Neves, and G. Santos, “Segmenting the periocular region using a hierarchical graphical model fed by texture / shape information and geometrical constraints,” in IEEE International Joint Conference on Biometrics, IEEE, Sep. 2014, pp. 1–7. doi: 10.1109/BTAS.2014.6996228.
- J. Witt and U. Weltin, “Sparse stereo by edge-based search using dynamic programming,” Proc. – Int. Conf. Pattern Recognit., no. November 2012, pp. 3631–3635, 2012.
- Q. Yang, L. Wang, R. Yang, H. Stewénius, and D. Nistér, “Stereo matching with color-weighted correlation, hierarchical belief propagation, and occlusion handling,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 3, pp. 492–504, 2009, doi: 10.1109/TPAMI.2008.99.
- B. Salehian, A. A. Raie, A. M. Fotouhi, and M. Norouzi, “Efficient interscanline consistency enforcing method for dynamic programming-based dense stereo matching algorithms,” J. Electron. Imaging, vol. 22, no. 4, p. 043028, 2013, doi: 10.1117/1.jei.22.4.043028.
- S. DeForte and V. N. Uversky, “Resolving the ambiguity: Making sense of intrinsic disorder when PDB structures disagree,” Protein Sci., vol. 25, no. 3, pp. 676–688, 2016, doi: 10.1002/pro.2864.
- A. P.R. and G. V.K., “Stereo Correspondence Using Census Based Dynamic Programming and Segmentation,” in Communications in Computer and Information Science, vol. 157 CCIS, 2011, pp. 631–638. doi: 10.1007/978-3-642-22786-8_79.
- L. Fu, G. Peng, and W. Song, “Histogram‐based cost aggregation strategy with joint bilateral filtering for stereo matching,” IET Comput. Vis., vol. 10, no. 3, pp. 173–181, Apr. 2016, doi: 10.1049/iet-cvi.2014.0411.
- S. Shrivastava, Z. Choudhury, S. Khandelwal, and S. Purini, “FPGA Accelerator for Stereo Vision using Semi-Global Matching through Dependency Relaxation,” Proc. – 30th Int. Conf. Field-Programmable Log. Appl. FPL 2020, no. 1, pp. 304–309, 2020, doi: 10.1109/FPL50879.2020.00057.
- W. Zhang, K. Hao, Q. Zhang, and H. Li, “A Novel Stereo Matching Method based on Rank Transformation,” vol. 10, no. 2, pp. 39–44, 2013.
- L. Duguet, J. Calve, C. Cauchois, and P. Weiss, “A Fast Dejittering Approach for Line Scanning Microscopy,” Proc. – Int. Conf. Image Process. ICIP, pp. 3441–3445, 2022, doi: 10.1109/ICIP46576.2022.9897890.
- K. Zhang, J. Li, Y. Li, W. Hu, L. Sun, and S. Yang, “Binary stereo matching,” Proc. – Int. Conf. Pattern Recognit., pp. 356–359, 2012.
- Z. Zhao, Y. Piao, and C. Liu, “A study of disparity map based on improved dynamic programming algorithm,” 2019 IEEE/CIC Int. Conf. Commun. China, ICCC 2019, pp. 567–571, 2019, doi: 10.1109/ICCChina.2019.8855829.
- S. Zhu, Z. Li, and Y. Yu, “Virtual view synthesis using stereo vision based on the sum of absolute difference,” Comput. Electr. Eng., vol. 40, no. 8, pp. 236–246, 2014, doi: 10.1016/j.compeleceng.2014.03.015.
- A. Qayyum, A. S. Malik, M. N. B. M. Saad, F. Abdullah, and M. Iqbal, “Disparity map estimation based on optimization algorithms using satellite stereo imagery,” in 2015 IEEE International Conference on Signal and Image Processing Applications (ICSIPA), IEEE, Oct. 2015, pp. 127–132. doi: 10.1109/ICSIPA.2015.7412176.
- K. Bae and B. Moon, “An accurate and cost-effective stereo matching algorithm and processor for real-time embedded multimedia systems,” Multimed. Tools Appl., vol. 76, no. 17, pp. 17907–17922, Sep. 2017, doi: 10.1007/s11042-016-3248-y.
- R. A. Hamzah, S. F. A. Ghani, A. F. Kadmin, M. S. Hamid, S. Salam, and T. M. F. T. Wook, “Disparity map estimation uses block matching algorithm and bilateral filter,” 2017 Int. Conf. Inf. Technol. Syst. Innov. ICITSI 2017 – Proc., vol. 2018-Janua, pp. 151–154, 2017, doi: 10.1109/ICITSI.2017.8267934.
- M. Zahari, R. A. Hamzah, N. A. Manap, and A. I. Herman, “Stereo matching algorithm based on combined matching cost computation and edge preserving filters,” Indones. J. Electr. Eng. Comput. Sci., vol. 26, no. 3, pp. 1415–1422, 2022, doi: 10.11591/ijeecs.v26.i3.pp1415-1422.
- P. Ochs, J. Malik, and T. Brox, “Segmentation of moving objects by long term video analysis,” IEEE Trans. Pattern Anal. Mach. Intell., 2014, doi: 10.1109/TPAMI.2013.242.
- S. Park, M. Park, and K. Yoon, “Confidence-based Weighted Median Filter for Effective Disparity Map Refinement,” no. Urai, pp. 573–575, 2015.
- S. Sreejith and J. Nayak, “Study of hybrid median filter for the removal of various noises in digital image,” J. Phys. Conf. Ser., vol. 1706, no. 1, p. 012079, Dec. 2020, doi: 10.1088/1742-6596/1706/1/012079.
- Deepa and K. Jyothi, “A Robust Disparity Map Estimation for Handling Outliers in Stereo Images,” in 2021 5th International Conference on Electrical, Electronics, Communication, Computer Technologies and Optimization Techniques (ICEECCOT), IEEE, Dec. 2021, pp. 38–43. doi: 10.1109/ICEECCOT52851.2021.9708034.
- X. Qin, M. Yang, L. Zhang, T. Yang, and M. Liao, “Health Diagnosis of Major Transportation Infrastructures in Shanghai Metropolis Using High-Resolution Persistent Scatterer Interferometry,” Sensors, vol. 17, no. 12, p. 2770, Nov. 2017, doi: 10.3390/s17122770.
- S. Poddar, V. Kumar, H. Sahu, and A. Das, “Gradient and Color Intensity based Dense Disparity Estimation Using Adaptive Weight Aggregation,” in 2022 8th International Conference on Advanced Computing and Communication Systems (ICACCS), IEEE, Mar. 2022, pp. 816–822. doi: 10.1109/ICACCS54159.2022.9785264.
- S. Yazdani, R. Yusof, A. Karimian, Y. Mitsukira, and A. Hematian, “Automatic Region-Based Brain Classification of MRI-T1 Data.,” PLoS One, vol. 11, no. 4, p. e0151326, Apr. 2016, doi: 10.1371/journal.pone.0151326.
- T. Rahkar Farshi and R. Demirci, “Multilevel image thresholding with multimodal optimization,” Multimed. Tools Appl., vol. 80, no. 10, pp. 15273–15289, Apr. 2021, doi: 10.1007/s11042-020-10432-4.
- T. Okoshi, “Segmentation Method,” in Planar Circuits for Microwaves and Lightwaves, Berlin, Heidelberg: Springer Berlin Heidelberg, 1985, pp. 87–96. doi: 10.1007/978-3-642-70083-5_5.
- R. Li, X. Xiao, S. Ni, H. Zheng, and S. Xia, “Byte Segment Neural Network for Network Traffic Classification,” in 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS), IEEE, Jun. 2018, pp. 1–10. doi: 10.1109/IWQoS.2018.8624128.
- Z. Cai, Y. Liang, and H. Huang, “Unsupervised segmentation evaluation: an edge-based method,” Multimed. Tools Appl., vol. 76, no. 8, pp. 11097–11110, Apr. 2017, doi: 10.1007/s11042-016-3542-8.
- Z. Zhou and M. Pang, “Stereo Matching Algorithm of Multi-Feature Fusion Based on Improved Census Transform,” Electron., vol. 12, no. 22, p. 4594, Nov. 2023, doi: 10.3390/electronics12224594.
- X. Guo et al., “OpenStereo: A Comprehensive Benchmark for Stereo Matching and Strong Baseline,” Dec. 2023, [Online]. Available: http://arxiv.org/abs/2312.00343
- C. W. Liu, H. Wang, S. Guo, M. J. Bocus, Q. Chen, and R. Fan, “Stereo Matching: Fundamentals, State-of-the-Art, and Existing Challenges,” in Advances in Computer Vision and Pattern Recognition, vol. Part F1566, 2023, pp. 63–100. doi: 10.1007/978-981-99-4287-9_3.
- F. Tosi, L. Bartolomei, and M. Poggi, “A Survey on Deep Stereo Matching in the Twenties,” Int. J. Comput. Vis., vol. 133, no. 7, pp. 4245–4276, Jul. 2025, doi: 10.1007/s11263-024-02331-0.
- X. Lai et al., “An Improved Stereo Matching Algorithm Based on Joint Similarity Measure and Adaptive Weights,” Appl. Sci., vol. 13, no. 1, p. 514, Dec. 2023, doi: 10.3390/app13010514.
- T. La, L. Tao, C. Minh Tran, T. Nguyen Duc, E. Kamioka, and P. X. Tan, “Hourglass 3D CNN for Stereo Disparity Estimation for Mobile Robots,” Appl. Sci., vol. 13, no. 19, p. 10677, Sep. 2023, doi: 10.3390/app131910677.
- N. Mayer et al., “A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, Jun. 2016, pp. 4040–4048. doi: 10.1109/CVPR.2016.438.
- Y. Zhou, H. Dong, and A. El Saddik, “Learning to Estimate 3D Human Pose from Point Cloud,” IEEE Sens. J., vol. 20, no. 20, pp. 12334–12342, Oct. 2020, doi: 10.1109/JSEN.2020.2999849.
- Y. Wang, L. Wang, J. Yang, W. An, and Y. Guo, “Flickr1024: A large-scale dataset for stereo image super-resolution,” in Proceedings – 2019 International Conference on Computer Vision Workshop, ICCVW 2019, IEEE, Oct. 2019, pp. 3852–3857. doi: 10.1109/ICCVW.2019.00478.
- L. Wang et al., “NTIRE 2023 Challenge on Stereo Image Super-Resolution: Methods and Results,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, IEEE, Jun. 2023, pp. 1346–1372. doi: 10.1109/CVPRW59228.2023.00141.