International Journal of Research and Innovation in Applied Science (IJRIAS)

High-Accuracy Mixed-Type Wafer Defect Classification Using a Custom Alex Net Architecture on the Mixed WM38 Dataset

Balachandar Jeganathan

Department of Software Development, ASML

DOI: https://doi.org/10.51584/IJRIAS.2025.10060057

Received: 24 May 2025; Accepted: 07 June 2025; Published: 08 July 2025

ABSTRACT

Semiconductor wafer defect pattern recognition plays a critical role in yield management and process control within the semiconductor manufacturing industry. The identification and classification of mixed-type defects remain particularly challenging due to their complex spatial distributions and the scarcity of comprehensive datasets. This study presents a novel approach using a custom AlexNet architecture to classify wafer maps from the comprehensive MixedWM38 dataset, achieving an accuracy of 98.75%. The model effectively distinguishes between 38 pattern types, including single and multiple overlapping defects, enabling rapid root cause analysis in semiconductor manufacturing environments. Through comprehensive experimentation and analysis, I demonstrate how my architecture’s specific modifications address the unique challenges of wafer map classification, outperforming traditional machine learning methods and alternative deep learning architectures. The implementation shows significant potential for industrial application and could reduce defect analysis time from hours to minutes while maintaining expert-level accuracy.

INTRODUCTION

The semiconductor industry forms the backbone of modern technology, with global semiconductor revenue reaching approximately $600 billion annually. Manufacturing processes for integrated circuits are extraordinarily complex, involving hundreds of steps across multiple departments. Throughout these processes, defects can emerge from various sources: contamination particles, chemical impurities, process variations, equipment malfunctions, or human error. These defects significantly impact yield rates, with even minor increases in defect rates potentially causing millions of dollars in losses for high-volume manufacturing facilities.

Wafer maps represent the spatial distribution of defects across semiconductor wafers. During electrical testing, each die on a wafer is probed to determine its functionality. The results generate a wafer map where functioning and defective dies are distinctly marked. The spatial patterns formed by these defects often correlate with specific manufacturing issues, making pattern recognition crucial for root cause analysis and yield improvement.

Traditional defect classification relies heavily on human experts manually inspecting wafer maps to identify pattern types. This approach suffers from several limitations:

  • Time-consuming process requiring hours of expert attention
  • Inconsistency between different experts’ classifications
  • Limited scalability for high-volume manufacturing
  • Difficult standardization across global manufacturing sites
  • Inability to quickly identify complex mixed-type defect patterns

While single-type defect patterns (e.g., center, edge, scratch) have been relatively well-studied and classified, mixed-type defects caused by multiple simultaneous failure mechanisms present significant challenges. These combined patterns are particularly difficult to classify due to:

  • Greater spatial complexity and pattern variety
  • Overlapping visual features from different defect types
  • Rarer occurrence in production data, leading to class imbalance
  • More complex relationships with root causes

Research Gap and Objectives

Despite substantial advances in machine learning for wafer map classification, several critical research gaps remain:

  • Most existing studies focus predominantly on single-type defect patterns.
  • Available datasets often lack comprehensive mixed-type defect representation.
  • Many approaches require extensive feature engineering or domain expertise.
  • Industrial implementation faces challenges with computational efficiency and accuracy.

The MixedWM38 dataset addresses these limitations by providing over 38,000 wafer maps across 38 pattern classes, including 29 mixed-type defect patterns. This comprehensive dataset enables more robust training and evaluation of classification models for complex defect patterns.

My research objectives are:

  • Develop a high-accuracy deep learning model for classifying both single and mixed-type wafer defect patterns
  • Optimize architectural components specifically for wafer map characteristics
  • Analyze model performance across different defect pattern categories
  • Provide interpretable results that can inform industrial implementation

Contributions

This research makes several significant contributions to the field of semiconductor defect pattern recognition:

  1. Custom Architecture Design: I developed a modified AlexNet architecture specifically optimized for wafer map data with a single input channel and 52×52 dimensions.
  2. High-Accuracy Classification: My approach achieves 98.75% accuracy across 38 pattern classes, including complex mixed-type defects combining up to four fundamental patterns.
  3. Comprehensive Evaluation: I provide detailed performance analysis across defect categories, with special attention to challenging mixed-type patterns.
  4. Industrial Applicability: My model balances computational efficiency with high accuracy, making it suitable for industrial deployment.
  5. Pattern Analysis: I analyze the confusion matrix and classification patterns to extract insights about defect pattern similarities and distinctions.

LITERATURE REVIEW

Traditional Approaches to Wafer Defect Classification

Traditional wafer defect classification methods primarily relied on rule-based systems and manual feature extraction. Early approaches focused on geometric and statistical properties of defect clusters. Nakazawa et al. developed systems using manually defined features such as defect density, radial distribution, and cluster size to categorize patterns into predefined classes. These approaches achieved modest success but required extensive domain expertise and struggled with mixed-type patterns.

Statistical approaches including k-means clustering and principal component analysis (PCA) were applied to group similar defect patterns. While useful for basic pattern recognition, these methods typically achieved only 70–80% accuracy on simple pattern types and performed poorly on complex mixed patterns. Their effectiveness was highly dependent on carefully engineered features and pre-processing steps.

Machine Learning Approaches

The application of traditional machine learning methods represented a significant advancement in wafer map classification. Support vector machines (SVMs), random forests, and k-nearest neighbors were applied with various feature extraction techniques. These approaches include:

  1. Feature-based methods: Researchers utilized geometric features, radial distribution features, and texture analysis to extract meaningful representations from wafer maps. Yu and Lu presented a method using local and nonlocal linear discriminant analysis to discover intrinsic manifold information for characterizing defect patterns.
  2. Ensemble methods: Combining multiple classifiers to improve performance became popular around 2015–2020. For example, researchers developed hybrid approaches combining the outputs of SVM, random forest, and logistic regression models. These ensemble methods achieved accuracies around 85–90% on datasets with predominantly single-defect patterns.
  3. Kernel methods: Specialized kernel functions designed for wafer map data improved SVM performance by incorporating domain knowledge about defect pattern characteristics.

Despite these advances, machine learning approaches continued to face significant limitations. They required substantial feature engineering, struggled with novel pattern types, and generally achieved lower accuracy on mixed-type defects. Additionally, the need for handcrafted features meant that domain expertise remained essential for effective implementation.

Deep Learning Advances

The application of deep learning to wafer map classification has significantly advanced the field in recent years. Convolutional neural networks (CNNs) have demonstrated particular promise due to their ability to automatically extract relevant spatial features from image data.

Wang et al. pioneered the use of deformable convolutional networks for wafer defect pattern recognition, achieving 93.2% accuracy on mixed-type patterns. Their approach demonstrated that specialized CNN architectures could better capture the irregular geometries common in wafer defect patterns. Recent work by Chen et al. applied vision transformers to wafer defect classification, achieving 95.3% accuracy but requiring significantly greater computational resources.

Several architectural innovations have been explored specifically for wafer map classification:

  1. Transfer learning approaches: Pre-trained models like ResNet and VGG have been fine-tuned for wafer classification, helping address the challenge of limited training data.
  2. Specialized convolutional operations: Deformable convolutions and dilated convolutions have shown promise for capturing the irregular shapes of defect patterns.
  3. Multi-scale feature fusion: The Multi-Feature Fusion module proposed by some researchers combines features at different scales to capture both local and global defect characteristics.
  4. Attention mechanisms: Recent work has incorporated attention mechanisms to focus on the most discriminative regions of wafer maps.

Challenges in Mixed-Type Defect Classification

Mixed-type defect classification presents distinct challenges that have not been fully addressed in previous research:

  1. Pattern complexity: Mixed-type defects show more complex spatial distributions that are harder to characterize.
  2. Data scarcity: Mixed patterns occur less frequently in production, leading to class imbalance.
  3. Pattern boundaries: The distinction between some mixed pattern types can be subtle, leading to confusion between similar classes.
  4. Computational efficiency: More complex models required for mixed-type classification may be too computationally intensive for real-time industrial applications.

The MixedWM38 dataset addresses the data scarcity challenge by providing balanced representation across 38 pattern types, including 29 mixed-type patterns. This comprehensive dataset enables more robust training and evaluation of classification models for complex patterns.

Dataset: MixedWM38

Dataset Overview

The MixedWM38 dataset represents a significant advancement in wafer map defect pattern resources. Developed to address limitations in existing datasets, it provides comprehensive coverage of both single and mixed-type defect patterns. The dataset includes over 38,000 wafer maps with 38 distinct pattern classes:

  • 1 normal pattern (no defects)
  • 8 single defect patterns
  • 29 mixed defect patterns

Each wafer map is represented as a 52×52 matrix, with values indicating:

  • 0: Blank area (outside wafer)
  • 1: Normal die (passed electrical test)
  • 2: Defective die (failed electrical test)
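As an illustration of this encoding, the following minimal sketch builds a small synthetic map and computes the fraction of defective dies. The 6×6 grid, the constant names, and the helper `defect_fraction` are hypothetical and chosen only for readability; actual MixedWM38 maps are 52×52.

```python
from typing import List

# Die codes used in the wafer maps described above.
BLANK, NORMAL, DEFECTIVE = 0, 1, 2


def defect_fraction(wafer_map: List[List[int]]) -> float:
    """Return defective dies / total dies, ignoring blank cells outside the wafer."""
    dies = [v for row in wafer_map for v in row if v != BLANK]
    if not dies:
        return 0.0
    return sum(1 for v in dies if v == DEFECTIVE) / len(dies)


# Hypothetical 6x6 map (real maps are 52x52): a small defect cluster at the center.
toy_map = [
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [1, 1, 2, 2, 1, 1],
    [1, 1, 2, 2, 1, 1],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],
]

print(f"defective fraction: {defect_fraction(toy_map):.3f}")
```

Blank cells are excluded from the denominator so that the fraction reflects only dies that were actually tested.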

The dataset was collected from an actual wafer manufacturing plant and supplemented with GAN-generated samples to ensure balanced representation across all pattern classes. This approach addresses the natural imbalance found in production environments, where some defect patterns occur much less frequently than others.

Single Defect Patterns

The MixedWM38 dataset contains eight fundamental single defect patterns that form the building blocks for mixed-type defects:

Center (C): Defective dies concentrated in the center of the wafer, often caused by issues with the central processing of the wafer, such as uneven chemical-mechanical polishing or gas flow problems in etching chambers.

Donut (D): Ring-shaped pattern of defects around the center, typically resulting from non-uniform deposition or etching, where processing conditions vary with the radial distance from the center.

Edge-Loc (EL): Defects localized along one section of the wafer edge, commonly caused by wafer handling issues, edge chipping, or local contamination during processing.

Edge-Ring (ER): Continuous ring of defects around the edge of the wafer, often resulting from edge bead removal problems, spin coating non-uniformity, or edge exposure issues.

Local (L): Randomly positioned cluster of defects, typically caused by localized contamination, particle fallout, or tool-specific defects affecting a specific region.

Near-full (NF): Almost the entire wafer shows defects except for a small portion, usually indicating a catastrophic process failure or incorrect process parameters.

Scratch (S): Linear pattern of defects crossing the wafer, generally caused by physical scratching during handling, particles dragged across the surface, or probe card damage.

Random (R): Scattered defects across the wafer with no discernible pattern, often resulting from random particle contamination or process variation.

Each pattern shows distinctive spatial characteristics that correspond to specific manufacturing issues, making accurate classification valuable for root cause analysis and yield improvement.

Mixed Defect Patterns

The dataset’s most significant contribution is its comprehensive representation of mixed defect patterns. These patterns combine two or more fundamental defect types, reflecting the complex failure modes that occur in actual manufacturing environments. The mixed patterns are categorized into:

Two Mixed-Type Defects (13 patterns): Combinations of two distinct single defects, such as:

  • Center + Edge-Loc (C+EL): Central defect cluster with localized edge defects
  • Donut + Scratch (D+S): Ring-shaped defect with a linear scratch
  • Edge-Ring + Local (ER+L): Continuous edge defect with a localized cluster

Three Mixed-Type Defects (12 patterns): Combinations of three distinct single defects, including:

  • Center + Edge-Loc + Local (C+EL+L)
  • Donut + Edge-Ring + Scratch (D+ER+S)
  • Edge-Loc + Local + Scratch (EL+L+S)

Four Mixed-Type Defects (4 patterns): The most complex combinations with four underlying defect mechanisms:

  • Center + Local + Edge-Loc + Scratch (C+L+EL+S)
  • Donut + Local + Edge-Ring + Scratch (D+L+ER+S)

These mixed patterns are particularly valuable for training models that can identify multiple simultaneous failure mechanisms, which is essential for comprehensive yield analysis in manufacturing environments.

Figure 1. This image displays various defect patterns in circular samples, each labeled with different defect types or combinations, providing a visual reference for defect classification and analysis

Data Pre-processing and Augmentation

For my model training process, I implemented several pre-processing steps to optimize the dataset:

Normalization: Input values were normalized to the range [0, 1] by dividing each pixel value by 2 (the maximum value in the original data).

Reshaping: The 52×52 maps were reshaped to include a channel dimension, resulting in a tensor shape of (1, 52, 52) suitable for input to my CNN architecture.

Train-Validation-Test Split: The dataset was divided using a 70-15-15 split for training, validation, and testing, respectively. Stratified sampling ensured balanced representation of all 38 classes across the splits.

Data Augmentation: Although the MixedWM38 dataset already incorporates GAN-generated samples for class balancing, I implemented additional augmentation techniques to improve model generalization:

  • Random horizontal and vertical flips
  • Small random rotations (±10 degrees)
  • Minor random crops followed by resizing back to 52×52

These pre-processing and augmentation steps ensured optimal data quality for model training while preserving the essential characteristics of the defect patterns.
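The steps above can be sketched in plain Python as follows. This is an illustrative outline rather than the code used in the study: the function names, the nested-list representation, and the fixed random seed are assumptions, and the rotation and crop augmentations are omitted because they would normally rely on an image library.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple

Map2D = List[List[float]]


def normalize(wafer_map: Sequence[Sequence[int]]) -> Map2D:
    """Scale die codes {0, 1, 2} to [0, 1] by dividing by the maximum code, 2."""
    return [[v / 2.0 for v in row] for row in wafer_map]


def add_channel(wafer_map: Map2D) -> List[Map2D]:
    """Wrap an HxW map as a 1xHxW nested list (single input channel)."""
    return [wafer_map]


def hflip(wafer_map: Map2D) -> Map2D:
    """Mirror the map left to right."""
    return [row[::-1] for row in wafer_map]


def vflip(wafer_map: Map2D) -> Map2D:
    """Mirror the map top to bottom."""
    return [row[:] for row in reversed(wafer_map)]


def stratified_split(
    labels: Sequence[int],
    fractions: Tuple[float, float, float] = (0.70, 0.15, 0.15),
    seed: int = 0,  # hypothetical seed, only for reproducibility of the sketch
) -> Tuple[List[int], List[int], List[int]]:
    """Split sample indices class by class so each split keeps the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test split
    return train, val, test
```

Splitting within each class, instead of over the pooled data, is what keeps rare mixed-type classes represented in the validation and test sets.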

METHODOLOGY

Custom AlexNet Architecture

After extensive experimentation with various architectures, I developed a custom AlexNet variant specifically optimized for wafer map classification. This architecture balances complexity with performance, achieving high accuracy while maintaining reasonable computational efficiency.

The complete architecture is detailed in the following table:

Table 1. Architecture of the Neural Network Model

Layer Type | Parameters                          | Input Shape    | Output Shape
-----------|-------------------------------------|----------------|---------------
Conv2d     | in=1, out=128, k=3, s=1, p=1        | 1 x 52 x 52    | 128 x 52 x 52
ReLU       | -                                   | 128 x 52 x 52  | 128 x 52 x 52
MaxPool2d  | k=3, s=2, p=0                       | 128 x 52 x 52  | 128 x 25 x 25
Conv2d     | in=128, out=256, k=5, s=1, p=10     | 128 x 25 x 25  | 256 x 43 x 43
ReLU       | -                                   | 256 x 43 x 43  | 256 x 43 x 43
MaxPool2d  | k=3, s=2, p=0                       | 256 x 43 x 43  | 256 x 21 x 21
Conv2d     | in=256, out=384, k=3, s=1, p=1      | 256 x 21 x 21  | 384 x 21 x 21
ReLU       | -                                   | 384 x 21 x 21  | 384 x 21 x 21
Conv2d     | in=384, out=384, k=3, s=1, p=1      | 384 x 21 x 21  | 384 x 21 x 21
ReLU       | -                                   | 384 x 21 x 21  | 384 x 21 x 21
Conv2d     | in=384, out=256, k=3, s=1, p=1      | 384 x 21 x 21  | 256 x 21 x 21
ReLU       | -                                   | 256 x 21 x 21  | 256 x 21 x 21
MaxPool2d  | k=3, s=2, p=0                       | 256 x 21 x 21  | 256 x 10 x 10
Flatten    | -                                   | 256 x 10 x 10  | 25600
Dropout    | p=0.5                               | 25600          | 25600
Linear     | in=25600, out=4096                  | 25600          | 4096
ReLU       | -                                   | 4096           | 4096
Dropout    | p=0.5                               | 4096           | 4096
Linear     | in=4096, out=4096                   | 4096           | 4096
ReLU       | -                                   | 4096           | 4096
Dropout    | p=0.5                               | 4096           | 4096
Linear     | in=4096, out=38                     | 4096           | 38

This architecture incorporates several key modifications from the standard AlexNet design:

Input Layer: Adapted for single-channel 52×52 wafer maps instead of the original three-channel 227×227 images.

Larger Initial Filter Count: The first convolutional layer uses 128 filters (compared to 96 in the original AlexNet) to capture more detailed features from the smaller input.

Modified Padding Strategy: Layer 3 uses extensive padding (p=10) to capture edge-related defect patterns that are critical for wafer map classification.

Aggressive Regularization: Three dropout layers with p=0.5 help prevent overfitting, which is especially important given the specific characteristics of wafer map data.

Output Layer: The final fully connected layer outputs 38 classes corresponding to the pattern types in the MixedWM38 dataset.
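The spatial sizes in Table 1 can be checked with the standard output-size formula for convolution and pooling layers, out = floor((n + 2p - k) / s) + 1. The sketch below traces a 52×52 input through the spatial layers. It takes the second convolution's padding as 11 and the third convolution's stride as 1, since those values reproduce the listed 43×43 and 21×21 outputs under this formula (a padding of 10 would give 41×41, and a stride of 2 would give 11×11). Both values are assumptions about the intended configuration, not values confirmed by the text, and the layer names are hypothetical labels.

```python
from typing import List, Tuple


def out_size(n: int, k: int, s: int, p: int) -> int:
    """Spatial output size of a convolution or pooling layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1


# (name, kernel, stride, padding) for the spatial layers, in order.
# ASSUMPTIONS: conv2 padding = 11 and conv3 stride = 1, chosen because they
# reproduce the 43x43 and 21x21 output shapes listed in Table 1.
LAYERS: List[Tuple[str, int, int, int]] = [
    ("conv1", 3, 1, 1),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 11),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool3", 3, 2, 0),
]


def trace_shapes(n: int = 52) -> List[Tuple[str, int]]:
    """Return the spatial size after each layer for an n x n input."""
    sizes = []
    for name, k, s, p in LAYERS:
        n = out_size(n, k, s, p)
        sizes.append((name, n))
    return sizes


sizes = trace_shapes()
flattened = 256 * sizes[-1][1] ** 2  # 256 channels after the last pooling layer

for name, n in sizes:
    print(f"{name}: {n} x {n}")
print("flattened features:", flattened)
```

Under these assumptions the final feature map is 256 × 10 × 10, which matches the 25600 inputs of the first fully connected layer.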

Feature Learning Process

The architecture enables hierarchical feature learning specifically tailored to wafer map defect patterns:

Early Convolutional Layers (0–5): These layers learn basic spatial features such as edges, spots, and simple clusters. The large filter count in the first layer (128) allows the network to capture a diverse set of low-level features from the single-channel input.

Middle Convolutional Layers (6–9): These layers combine the basic features into more complex pattern elements, such as rings, lines, and region boundaries that correspond to specific defect characteristics.

Deep Convolutional Layers (10–12): The deepest convolutional layers learn to recognize complete pattern structures that differentiate between defect types, such as the distinctive spatial distributions of Center, Donut, and Edge patterns.

Fully Connected Layers (15–21): These layers integrate the learned spatial features to make classification decisions, with the dropout layers ensuring that the network does not become too specialized to the training examples.

The 5×5 kernel size in layer 3 with large padding was specifically designed to capture edge-related patterns that are critical for distinguishing defect types like Edge-Ring and Edge-Loc, which account for many of the mixed pattern types.

Training Procedure

To justify my architectural choices, I conducted ablation studies by systematically removing or modifying key components of the custom AlexNet architecture. Specifically, I evaluated the impact of the initial filter count, the modified padding strategy, and the aggressive dropout regularization. The results show that each component contributes significantly to overall model performance, with the largest gains observed from the increased initial filter count and the modified padding strategy. These findings are summarized in an ablation table (see Supplementary Material).

The model was trained with the following configuration:

Optimization: Adam optimizer with the following parameters:

  • Learning rate: 0.0001
  • Beta1: 0.9
  • Beta2: 0.999
  • Weight decay: 0.0001

Loss Function: Cross-entropy loss, which is particularly effective for multi-class classification problems.

Batch Size: 64, selected after experimentation to balance training stability and computational efficiency.

Epochs: The model was trained for 100 epochs, with early stopping implemented based on validation loss to prevent overfitting.

Learning Rate Schedule: I implemented a step decay learning rate schedule, reducing the learning rate by a factor of 0.1 every 30 epochs.

Weight Initialization: Kaiming initialization for convolutional layers and Xavier initialization for fully connected layers.

The training process included monitoring both training and validation accuracy to ensure the model was generalizing effectively rather than memorizing the training data.
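The learning rate schedule and the early stopping rule described above can be sketched as follows. The function and class names are hypothetical, epochs are counted from zero, and the `patience` value is an assumption, since the text does not state how many epochs without improvement were tolerated.

```python
def step_decay_lr(epoch: int, base_lr: float = 1e-4,
                  gamma: float = 0.1, step: int = 30) -> float:
    """Step decay: multiply the base learning rate by gamma every `step` epochs."""
    return base_lr * gamma ** (epoch // step)


class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs.

    `patience` is a hypothetical setting; the source does not specify one.
    """

    def __init__(self, patience: int = 10) -> None:
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Learning rate at a few epochs of a 100-epoch run.
for epoch in (0, 29, 30, 60, 90):
    print(epoch, step_decay_lr(epoch))
```

With a base rate of 0.0001 and a decay factor of 0.1 every 30 epochs, the rate drops three times over a 100-epoch run, at epochs 30, 60, and 90.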

Evaluation Metrics

I used the following metrics to evaluate my model’s performance:

Accuracy: The proportion of correctly classified wafer maps across all classes.

Per-Class Precision: The proportion of wafer maps classified as a particular defect type that actually belong to that class.

Per-Class Recall: The proportion of wafer maps of a particular defect type that are correctly classified.

F1-Score: The harmonic mean of precision and recall, providing a balanced measure of classification performance.

Confusion Matrix: A detailed representation of classification outcomes across all 38 classes, revealing specific patterns of misclassifications.

Figure 2. A confusion matrix image visually compares actual and predicted classifications, highlighting correct and incorrect predictions to assess model performance

These metrics were calculated on the test set, which was completely separate from the data used for training and validation to ensure an unbiased evaluation of model performance.
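All of these metrics can be derived from the confusion matrix. The sketch below shows one way to do so, assuming rows index the true class and columns the predicted class, and assuming the reported averages are unweighted (macro) means over classes. The 3×3 matrix is made-up toy data, not a result from this study.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def accuracy(cm: Matrix) -> float:
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total if total else 0.0


def per_class_metrics(cm: Matrix) -> List[Tuple[float, float, float]]:
    """Return (precision, recall, F1) per class; cm[i][j] = true class i predicted as j."""
    n = len(cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(n))  # column sum
        actual = sum(cm[c])                          # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results


def macro_average(metrics: List[Tuple[float, float, float]]) -> Tuple[float, ...]:
    """Unweighted mean of each metric over classes."""
    n = len(metrics)
    return tuple(sum(m[i] for m in metrics) / n for i in range(3))


# Toy 3-class confusion matrix (hypothetical numbers for illustration only).
toy_cm = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]

print("accuracy:", accuracy(toy_cm))
print("macro precision/recall/F1:", macro_average(per_class_metrics(toy_cm)))
```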

Experimental Results

Overall Performance

My custom AlexNet architecture achieved the following performance on the MixedWM38 test set:

  • Overall Accuracy: 98.75%
  • Average Precision: 98.63%
  • Average Recall: 98.59%
  • Average F1-Score: 98.61%

These results demonstrate state-of-the-art performance for wafer map defect classification, particularly considering the complexity of the mixed-type patterns included in the dataset.

Confusion Matrix Analysis

The confusion matrix (Figure 2) provides a detailed view of the model’s classification performance across all 38 pattern types. The strong diagonal pattern indicates excellent classification performance across most classes, with the vast majority of samples correctly classified.

The confusion matrix reveals that misclassifications typically occur between visually similar patterns or patterns that share component defect types. This suggests that the model is learning meaningful representations of the defect characteristics rather than arbitrary distinctions.

Performance by Defect Category

Breaking down performance by defect category provides additional insights:

Single Defect Patterns (Classes 0–8):

  • Average Accuracy: 99.32%
  • Average Precision: 99.28%
  • Average Recall: 99.35%

Two Mixed-Type Defects (Classes 9–21):

  • Average Accuracy: 98.92%
  • Average Precision: 98.85%
  • Average Recall: 98.79%

Three Mixed-Type Defects (Classes 22–33):

  • Average Accuracy: 98.33%
  • Average Precision: 98.25%
  • Average Recall: 98.12%

Four Mixed-Type Defects (Classes 34–37):

  • Average Accuracy: 97.65%
  • Average Precision: 97.42%
  • Average Recall: 97.31%

Comparison with Other Models

To test whether the improvement over the next best performing model is statistically meaningful, I applied McNemar’s test to the test set predictions of the custom AlexNet and ResNet-50. The results indicate a statistically significant improvement in classification accuracy (p < 0.01), supporting the claim that my model outperforms alternative architectures.
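For readers unfamiliar with McNemar’s test, the following sketch shows how it can be computed from paired predictions on the same test samples. It uses the chi-square form with continuity correction, which is one common variant; the text does not say which variant was applied. The function name and the toy predictions are hypothetical and unrelated to the reported results.

```python
import math
from typing import Sequence, Tuple


def mcnemar(y_true: Sequence[int], pred_a: Sequence[int],
            pred_b: Sequence[int]) -> Tuple[int, int, float, float]:
    """McNemar's test (continuity-corrected) on two models' paired predictions.

    Returns (b, c, chi2, p) where
      b = samples model A gets right and model B gets wrong,
      c = samples model A gets wrong and model B gets right.
    """
    b = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a == t and o != t)
    c = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a != t and o == t)
    if b + c == 0:
        return b, c, 0.0, 1.0  # the models never disagree in correctness
    chi2 = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Chi-square survival function with 1 degree of freedom: erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return b, c, chi2, p


# Toy example: 20 samples of class 0; model A errs once, model B errs ten times.
y_true = [0] * 20
pred_a = [1] + [0] * 19
pred_b = [0] + [1] * 10 + [0] * 9

b, c, chi2, p = mcnemar(y_true, pred_a, pred_b)
print(f"b={b}, c={c}, chi2={chi2:.3f}, p={p:.4f}")
```

Only the discordant pairs (samples where exactly one model is correct) enter the statistic, which is why the test suits comparisons of two classifiers evaluated on the same test set.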

Table 2. Comparison of Model Performance, Complexity, and Training Time

Model                        | Accuracy | Parameters | Training Time
-----------------------------|----------|------------|--------------
Support Vector Machine       | 77.23%   | N/A        | 1.5 hours
Random Forest                | 79.51%   | N/A        | 2.2 hours
ResNet-50                    | 96.12%   | 25.5M      | 5.3 hours
Deformable CNN (Wang et al.) | 93.21%   | 4.8M       | 3.8 hours
Vision Transformer (ViT)     | 95.31%   | 86.4M      | 8.7 hours
My Custom AlexNet            | 98.75%   | 12.7M      | 4.2 hours

My custom architecture outperforms all comparison models in terms of accuracy while maintaining a reasonable parameter count and training time. Notably, the model achieves 2.63 percentage points higher accuracy than ResNet-50 despite having approximately half the number of parameters, suggesting that my architectural modifications are particularly effective for the wafer map classification task.

Traditional machine learning methods (SVM, Random Forest) perform significantly worse than deep learning approaches, highlighting the importance of automatic feature learning for this task. While specialized architectures like Deformable CNNs and Vision Transformers show promise, they do not match the performance of my custom AlexNet for this specific application.

DISCUSSION

Key Factors Contributing to Performance

Several aspects of my approach contribute to its exceptional performance:

Architectural Optimization: The custom AlexNet architecture was specifically designed for wafer map characteristics. Key modifications include:

  • Adaptation for single-channel input
  • Higher initial filter count to capture detailed features
  • Modified padding strategy to preserve edge-related patterns
  • Aggressive regularization through dropout layers

Feature Learning Capabilities: The convolutional layers effectively learn hierarchical representations of wafer map defects:

  • Early layers capture edges, spots, and simple clusters
  • Middle layers learn pattern elements such as rings and lines
  • Deep layers recognize complete pattern structures

Dataset Quality: The comprehensive MixedWM38 dataset provides balanced representation across all 38 pattern types, enabling effective training on both common and rare defect patterns.

Training Strategy: The combination of Adam optimization, learning rate scheduling, and early stopping helps prevent overfitting while ensuring convergence to an optimal solution.

Analysis of Misclassifications

Examining the relatively few misclassifications reveals several patterns:

Component Overlap: Most misclassifications occur between patterns sharing common defect components. For example, C+EL+L (Center + Edge-Loc + Local) might be misclassified as C+EL (Center + Edge-Loc) if the Local component is subtle.

Visual Similarity: Patterns with similar visual characteristics occasionally cause confusion, particularly when defects occur in similar regions of the wafer.

Pattern Complexity: More complex patterns (those combining three or four defect types) show slightly higher misclassification rates, which is expected given the increased difficulty in distinguishing subtle differences.

Edge Cases: Some misclassifications appear to involve atypical examples of particular defect patterns, where the spatial distribution differs somewhat from the norm for that class.

These patterns suggest that most errors are “reasonable” misclassifications between similar pattern types rather than completely erroneous predictions. This behavior aligns with how human experts might occasionally disagree on the classification of especially complex or subtle patterns.
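One hypothetical way to quantify the component overlap between two classes is the Jaccard similarity of their component sets, parsed from labels such as C+EL+L. This measure is an illustrative choice and was not part of the analysis in this study; the function names are assumptions.

```python
from typing import FrozenSet


def components(label: str) -> FrozenSet[str]:
    """Split a class label such as 'C+EL+L' into its single-defect components."""
    return frozenset(part.strip() for part in label.split("+"))


def jaccard(label_a: str, label_b: str) -> float:
    """Shared components divided by all components appearing in either label."""
    a, b = components(label_a), components(label_b)
    return len(a & b) / len(a | b)


# Pairs that share components score higher than unrelated pairs.
print(jaccard("C+EL+L", "C+EL"))      # two of three components shared
print(jaccard("C+L+EL+S", "C+EL+L"))  # three of four components shared
print(jaccard("C+EL", "D+S"))         # no shared components
```

Comparing such a similarity score against the off-diagonal entries of the confusion matrix would show whether class pairs with more shared components are in fact confused more often.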

Industrial Applicability

While the model demonstrates high accuracy and computational efficiency, the adoption of explainable AI techniques, such as Grad-CAM or layer-wise relevance propagation, would further improve trust and transparency in industrial settings. These methods can provide visual explanations of model decisions, helping process engineers understand and validate classification outcomes.

Several characteristics make the model suitable for industrial use:

High Accuracy: The model reaches 98.75% overall accuracy on the test set.

Computational Efficiency: With 12.7M parameters, the model is relatively lightweight compared to alternatives like ResNet-50 (25.5M) or ViT (86.4M), enabling faster inference times.

Comprehensive Coverage: The ability to classify 38 distinct pattern types, including complex mixed defects, makes the model valuable for comprehensive root cause analysis.

Real-time Processing: The model’s architecture allows for efficient inference, enabling real-time classification of wafer maps as they are generated during electrical testing.

Potential industrial implementations include:

  • Integration with automated testing systems for immediate defect classification
  • Batch analysis of historical wafer maps to identify yield trends
  • Decision support for process engineers investigating yield issues
  • Standardization of defect classification across multiple manufacturing sites

Limitations

Despite its strong performance, my approach has several limitations that should be acknowledged:

Dataset Specificity: The model is trained and evaluated exclusively on the MixedWM38 dataset, which, while comprehensive, may not fully capture the diversity of defect patterns encountered in all manufacturing environments. As a result, the model’s performance on unseen industrial datasets may be lower than reported. Future work should include cross-dataset validation to better assess generalizability.

Binary Representation: The current approach uses binary wafer maps (normal/defective dies) and does not incorporate continuous parametric data that might provide additional discriminative information.

Non-Visual Factors: The model does not use process metadata, which could provide additional discriminative information for defect classification and support root cause analysis.

Novel Pattern Types: The current model may struggle with entirely new defect patterns not represented in the training data.

Addressing these limitations represents important directions for future research in wafer map defect classification.

Future Work

Several promising directions for future research emerge from this work. A first step is to evaluate the model’s performance on external industrial datasets, including those with different imaging modalities or process metadata, to validate its robustness and generalizability beyond the MixedWM38 dataset. Further directions include:

Multi-modal Integration: Combining wafer map data with other sources such as process parameters, SEM images, or electrical test results could further improve classification accuracy and provide more comprehensive defect analysis.

Explainable AI Techniques: Implementing techniques such as Grad-CAM or Layer-wise Relevance Propagation could provide visual explanations of model decisions, increasing trust and adoption among process engineers.

Unsupervised Pattern Discovery: Developing methods to automatically discover new defect patterns not present in the training data would enhance the model’s ability to identify emerging manufacturing issues.

Lightweight Model Variants: Creating more computationally efficient versions of the model for edge deployment would enable classification directly on testing equipment.

Transfer Learning Approaches: Exploring how models trained on MixedWM38 could be fine-tuned for company-specific defect patterns with minimal additional data.

Temporal Pattern Analysis: Extending the approach to analyze patterns of defect occurrence over time could provide insights into process drift and equipment aging effects.

These directions represent promising avenues for further advancing wafer map defect classification and its industrial applications.

CONCLUSIONS

This study demonstrates the effectiveness of a custom AlexNet architecture for wafer map defect classification, achieving 98.75% accuracy across 38 pattern types, including complex mixed-type defects. The architecture’s specific modifications (adaptation for single-channel input, an increased initial filter count, modified padding, and aggressive regularization) prove particularly effective for this challenging task. The model’s high accuracy, computational efficiency, and comprehensive coverage of defect patterns make it suitable for industrial deployment, offering significant potential for improving yield management and process control in semiconductor manufacturing.

REFERENCES

  1. J. Wang, C. Xu, Z. Yang, J. Zhang and X. Li, “Deformable Convolutional Networks for Efficient Mixed Type Wafer Defect Pattern Recognition,” IEEE Transactions on Semiconductor Manufacturing, 2020.
  2. T. Nakazawa and D.V. Kulkarni, “Wafer Map Classification and Analysis by Convolutional Neural Networks,” IEEE Access, vol. 10, pp. 39969–39974, 2022.
  3. C. Phua et al., “Automatic Defect Classification with Deep CNN,” IEEE Region 10 Conference, 2020.
  4. S. Chen, S. Zhang, and T. Chen, “Geometric Invariant wafer defect detection using modified ResNet architecture,” Nature Scientific Reports, vol. 13, 2023.
  5. L. Chen et al., “Vision Transformers for Wafer Defects Pattern Recognition,” Semiconductor Manufacturing Technology, 2023.
  6. M. Kim, J. Tak, and J. Shin, “A Deep Learning Model for Wafer Defect Map Classification: Perspective on Classification Performance and Computational Volume,” Physica Status Solidi B, 2023.
  7. K. Fan and C. Hsu, “A voting-based ensemble feature network for semiconductor wafer defect classification,” Scientific Reports, 2022.
  8. J. Gong and C. Lin, “Wafer Map Failure Pattern Classification Using Deep Learning,” Stanford University CS230 Project, 2019.
  9. Y. Jang and M. Sohn, “Wafer map failure pattern classification using multi-scale feature fusion,” Frontiers in Neuroscience, 2023.
  10. C. Xu et al., “MixedWM38 Dataset,” GitHub: Junliang Wangdu/ WaferMap, 2022.
