Machine Learning for Reversible Data Hiding in Plaintext or Cipher Text Multimedia

LAAOUINA NAJWA
2322-2329
Mar 8, 2025
Education

Machine Learning for Reversible Data Hiding in Plaintext or Cipher Text Multimedia

Laaouina Najwa

Nanjing University of information science and technology, Jiangsu, China

DOI: https://dx.doi.org/10.47772/IJRISS.2025.9020182

Received: 03 February 2025; Accepted: 07 February 2025; Published: 10 March 2025

ABSTRACT

Reversible Data Hiding (RDH) approaches embed secret information into digital multimedia content, allowing the original content to be fully restored once the concealed data is retrieved. This study investigates the integration of Machine Learning (ML) techniques with RDH, focusing on plaintext and cipher text multimedia. The study employs machine learning models, such as neural networks, to improve embedding efficiency and robustness, optimizing data embedding and extraction operations while preserving the integrity of the host media. The proposed methods outperform existing RDH techniques in terms of embedding capacity, data security, and image quality.

Keywords: Reversible Data Hiding, Machine Learning, Multimedia Security, Data Embedding , Encrypted Images

INTRODUCTION

Reversible data hiding (RDH) is a specialized technique within the larger field of information hiding in which secret information is embedded into digital multimedia content, such as images, audio, or video, in such a way that the original content can be perfectly restored once the hidden data is extracted. Unlike typical data hiding methods, RDH assures that the host media remains intact when the embedded data is removed, making it especially useful in situations where the original content’s integrity is critical.

RDH is essential for many applications. RDH can insert watermarking information into digital content to help confirm ownership and protect intellectual property while not permanently modifying the original media. In secure communications, RDH allows for the secure delivery of sensitive information embedded in multimedia files, ensuring that both the hidden data and the original content remain confidential and intact. In medical imaging, RDH enables the direct embedding of patient information into medical pictures such as X-rays or MRIs while maintaining diagnostic quality. In addition, in digital forensics, RDH can be used to embed forensic markers into digital data, which is critical for the validity and integrity of evidence in legal processes

Machine learning (ML) approaches add new dimensions to RDH by improving various parts of the data embedding and extraction processes. ML models, such as neural networks, may learn complicated patterns and characteristics from multimedia data, resulting in more efficient and robust RDH algorithms. The application of ML to RDH has various advantages. One significant advantage is increased embedding efficiency, since ML algorithms may optimize the embedding process to maximize data concealing capacity while minimizing host media distortion. Another advantage is increased robustness, with ML models designed to resist various forms of attacks and distortions, guaranteeing that the hidden data stays secure and retrievable under varying conditions. Furthermore, ML enables adaptive RDH algorithms that dynamically alter based on the properties of the multimedia content, resulting in improved performance in a variety of scenarios.

This paper investigates the integration of machine learning approaches with reversible data concealing, with an emphasis on applications in plaintext and ciphertext multimedia. The presentation will focus on RDH in plaintext multimedia, specifically ML-enhanced RDH approaches applied to unencrypted multimedia content. It will also look at the specific difficulties and ML solutions for RDH in encrypted multimedia content, when the host media is encrypted. The paper will conduct a comparative analysis of the performance, benefits, and limitations of ML-enhanced RDH algorithms in plaintext and ciphertext environments.

By addressing these aspects, the article will provide a complete assessment of the current state and future possibilities of machine learning in reversible data concealing, emphasising ML technologies’ transformational impact on the area.

RELATED WORKS

Reversible Data Hiding in Encrypted Images (RDHEI) has advanced significantly over the years, with new techniques being developed to balance embedding capacity, image quality, and data security. Traditional approaches like Differential Expansion (DE) and Histogram Shifting (HS) have been extensively studied. DE uses the differences between adjacent pixels to embed data, allowing for great embedding capacity but frequently creating visible artifacts if not properly handled. HS changes the pixel values at the zero and peak points of the image histogram, providing a simple technique with good capacity but restricted by the histogram features. Another popular method is Lossless Compression, which compresses redundant data to make space for hidden information, assuring complete image recovery but is dependent on the level of redundancy in the image. Despite these advances, conventional systems frequently have limitations in terms of embedding capacity and image quality, necessitating the development of more efficient and resilient approaches.

To solve these constraints, a new RDHEI approach was developed, considerably increasing embedding capacity and ensuring perfect recovery of the original image. This approach uses a block-wise multi-predictor scheme, dividing the image into non-overlapping 8×8 chunks. For each block, one of 16 prediction models is employed to forecast pixel values based on neighboring pixels, utilizing spatial correlation to produce reliable predictions. The prediction errors and model information are then compressed using an enhanced Huffman coding algorithm, which optimizes the bit rate by recursively dividing the symbol sequence to free up room for data embedding. The blocks are permuted to improve security before embedding the secret data by changing the pixel values in the encrypted image. Following data extraction, the embedded data is extracted, the image is decrypted, and the original pixel values are precisely rebuilt using prediction models and compressed data, providing high security and fidelity.

Another study examined several approaches to privacy-preserving measures for medical photographs, including classic methods such as encryption and Reversible Data Hiding (RDH). These methods, while useful in some situations, have limits in terms of maintaining the quality of the Region of Interest (ROI) and ensuring complete reversibility of the original data. The analysis also looked at recent improvements that combine neural networks and machine learning techniques to improve picture security. However, these approaches frequently require sophisticated computations and may not fully address the separability of ROI and Region of Non-Interest (RONI), a gap that the reviewed paper seeks to fill.

A study further advanced the topic by proposing a novel way for safeguarding privacy in medical photos using a separable and reversible data concealing mechanism in plaintext encrypted images. The J-Net architecture, a hybrid model that combines U-Net and AlexNet, lies at the heart of the system’s ability to accurately segment medical images into ROI and RONI. The ROI is encrypted with an adaptive clustering-based plaintext encryption approach, which ensures that sensitive medical information remains private. In contrast, the RONI uses a skipping data concealment technique, which allows secret data to be embedded without sacrificing image quality. This approach provides good visual quality, efficiency, and total reversibility, making it an effective solution for safeguarding medical picture data in IoMT and cloud contexts.

Techniques such as Double Random Phase Encryption (DRPE) have proven to be effective in protecting images from unauthorized access. DRPE secures images utilizing dual random phase masks in the spatial and Fourier domains, with past research concentrating on improving security and assessing vulnerability to various assaults. Concurrently, phase recovery techniques required for decrypting images encrypted by DRPE have been widely researched, with iterative and optimization-based methods being developed to accurately restore phase information. Irreversible encryption solutions, which are meant to prevent decryption without specific keys or procedures, improve the security landscape by ensuring data confidentiality and integrity. In recent years, neural networks, namely convolutional neural networks (CNNs), have been used in cryptography, demonstrating their ability to break existing encryption systems and improve encryption robustness. The integration of machine learning in cryptography is a growing topic of research, with the goal of developing novel decryption algorithms that use neural networks’ pattern recognition capabilities to meet the complexities of today’s encryption systems.

Several researchers have also investigated numerous encryption ways to secure digital data, focusing on chaos-based encryption, bit-plane extraction technologies, and machine learning-based encryption. Chaos theory has been widely used for digital picture encryption due to qualities like as sensitivity to beginning conditions and the capacity to create random integers. For example, Leng et al. made advances in cancelable palmprint coding frameworks to minimize computational complexity and storage costs while improving accuracy through feature extraction approaches and multi-orientation score level fusion. Meanwhile, Pourasad et al. used chaos theory to generate random sequences for encrypting digital photos, demonstrating the significant computing cost of sequential procedures. Additionally, fractional-order chaos and neural network-based picture encryption approaches have been proposed to improve diffusion operations, but at a higher computational cost. Despite these developments, many existing schemes struggle to balance security and computing performance, making them unsuitable for real-time applications.

To address these issues, researchers presented unique solutions that combine machine learning and cryptography features and are specifically tailored for real-time applications in IoT systems. These methods use a combination of chaotic maps and bit-plane extraction to improve the encryption process. Using a selective encryption technique that focuses on high and moderate information blocks dramatically reduces processing time while maintaining security. Machine learning algorithms, such as Support Vector Machines (SVM), are used to improve encryption computational efficiency. Detailed mathematical models for bit-plane extraction from plaintext images show how this technique saves computer resources. The efficacy of these proposed systems is demonstrated by a variety of performance metrics, including F1-score, accuracy, recall, precision, and cryptographic characteristics like as correlation, energy, contrast, and entropy. A comparison with previous methodologies reveals that these models not only improve data security, but also match the requirements of real-time applications by ensuring fast data processing and high security.

This review of related work emphasizes key advances and current research in the field of RDHEI and encryption approaches, emphasizing the importance of machine learning in improving data security and efficiency in a variety of multimedia applications.

BACKGROUND

The proposed system is intended to securely embed secret data into encrypted images in such a way that the original image can be fully recovered once the secret data has been extracted. This system uses reversible data hiding (RDH), picture encryption, secret sharing, and machine learning approaches to assure data integrity, security, and reversibility.

The image encryption module’s goal is to preserve the original image’s content by converting it to an encrypted version, guaranteeing that it stays confidential even if intercepted during transmission. This is accomplished via the Advanced Encryption Standard (AES) in Counter (CTR) mode, a symmetric encryption technique noted for its security and efficiency. CTR mode encrypts images by mixing each block of plaintext with a counter value, ensuring that identical plaintext blocks produce distinct ciphertext blocks. The data embedding module uses reversible data hiding (RDH) techniques to embed hidden data into the encrypted image, allowing the original image to be recovered perfectly. This employs polynomial-based data embedding in conjunction with Shamir’s Secret Sharing, in which secret data is divided into many shares and embedded into various regions of the encrypted image, enabling secure distribution and reconstruction when required. To improve security, the secret sharing mechanism separates the secret data into multiple shares, making it difficult for an opponent to reassemble the data without a certain number of shares. Shamir’s Secret Sharing produces a polynomial with the secret as a constant term and assigns points to this polynomial as shares, needing a certain amount of shares to reconstruct the secret. The machine learning component uses a Support Vector Machine (SVM), a supervised learning method, to confirm the data embedding process’s integrity and discover embedding capacity within the encrypted image. The SVM classifier is trained using characteristics derived from image blocks to select acceptable regions for data embedding, ensuring the original image’s integrity and complete recovery. Finally, the decryption and data extraction module accurately extracts the encoded secret data and recovers the original image from the encrypted image using the AES-CTR decryption technique and polynomial reconstruction aided by Shamir’s Secret Sharing.

METHODOLOGY

Image Encryption Module

The Image Encryption Module is intended to protect the content of original photographs by converting them to an encrypted format, which ensures confidentiality during transmission. This module uses the Advanced Encryption Standard (AES) in Counter (CTR) mode, a strong encryption technology known for its security and efficiency in handling huge amounts of data. To enable encryption, the image is first converted to binary representation. AES-CTR then encrypts the binary data, essentially changing the image into an encrypted version that can be securely transmitted or stored.

Ci = EK (Pi ⊕ CTRi ) (1)

Ci: Ciphertext block
E: AES encryption function
K: Encryption key
Pi: Plaintext block
CTRi: Counter value for block i

from Crypto.Cipher import AES

import numpy as np

import cv2

def encrypt_image(image_path, key):

image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)

image_data = image.tobytes()

cipher = AES.new(key, AES.MODE_CTR)

encrypted_image = cipher.encrypt(image_data)

nonce = cipher.nonce

return encrypted_image, nonce, image.shape

Data Embedding Module

The Data Embedding Module focuses on embedding secret data into encrypted images utilizing reversible data hiding techniques, allowing for complete recovery of the original content. This module employs a polynomial-based technique along with Shamir’s Secret Sharing system, which divides secret data into shares for distribution and increased security. Implementation entails creating shares of the secret data using Shamir’s Secret Sharing, creating a polynomial to embed these shares, and inserting them into appropriate areas of the encrypted image. This method preserves the original image while safely adding extra information.

Polynomial Construction:

f(x) = a0+a1x+a2x2+…+ak−1xk−1 (2)

Secret Sharing (Shamir’s Secret Sharing):

Share Generation: Si = f(xi)

xi : Distinct non-zero values

from sympy import symbols, solve

import random

def shamir_secret_sharing(secret, n, k):

x = symbols(‘x’)

coefficients = [secret] + [random.randint(0, 255) for _ in range(k-1)]

polynomial = sum([coefficients[i] * (x**i) for i in range(k)])

shares = [(i, polynomial.subs(x, i)) for i in range(1, n+1)]

return shares

def embed_data(encrypted_image, shares):

embedded_image = encrypted_image # Modify this with actual embedding logic

return embedded_image

Secret Sharing Mechanism

The Secret Sharing Mechanism supports the Data Embedding Module by allowing secret data to be divided into many shares, increasing security against unauthorized access. Shamir’s Secret Sharing approach is used here, which involves defining a polynomial with the secret as the constant term. This polynomial generates shares, which distribute the burden for recreating the secret data among numerous people. This method is smoothly integrated into the data embedding process, ensuring strong protection of sensitive information contained within the encrypted image.

Share Generation

Si = f(xi) (4)

Polynomial Reconstruction

f(0) = a0 (5)

Using Lagrange interpolation:

Machine Learning Component

The Machine Learning Component is critical in maintaining the integrity and efficiency of the data embedding process within the encrypted image. This component uses Support Vector Machine (SVM) technology to extract features from image blocks and then uses supervised learning approaches to train an SVM classifier on labelled data. The classifier then selects appropriate places inside the encrypted image where data can be placed while maintaining image quality and security. This adaptive technique improves embedding efficacy while preserving the data’s security and integrity.

Feature Extraction: Extract features from image blocks.

Classification:

w: Weight vector
x: Feature vector
b: Bias term

from sklearn import svm

from sklearn.model_selection import train_test_split

from sklearn.metrics import accuracy_score

def train_svm(features, labels):

X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2)

classifier = svm.SVC()

classifier.fit(X_train, y_train)

predictions = classifier.predict(X_test)

accuracy = accuracy_score(y_test, predictions)

return classifier, accuracy

def extract_features(image_blocks):

features = []

for block in image_blocks:

features.append(block.flatten())

return features

Decryption and Data Extraction Module

The Decryption and Data Extraction Module is responsible for accurately retrieving embedded secret data and restoring the original image from its encrypted state. This module uses AES-CTR decryption for confidentiality and decrypts the encrypted image to recover the plaintext image data. After decryption, a polynomial reconstruction method is used to retrieve the embedded secret data. This entails analyzing the encrypted image to reverse the embedding process and recovering the original hidden data encoded using techniques like LSB modification or other steganographic methods. Together, these procedures ensure that concealed information is reliably extracted while preserving the original image’s integrity and usability.

AES Decryption:

(8)

D: AES decryption function

def decrypt_image(encrypted_image, key, nonce, shape):

cipher = AES.new(key, AES.MODE_CTR, nonce=nonce)

decrypted_data = cipher.decrypt(encrypted_image)

decrypted_image = np.frombuffer(decrypted_data, dtype=np.uint8).reshape(shape)

return decrypted_image

def reconstruct_secret(shares, k):

x = symbols(‘x’)

points = shares[:k]

polynomial = sum([y * x ** (i) for i, (x_val, y) in enumerate(points)])

secret = solve(polynomial.subs(x, 0))

return secret[0]

RESULTS AND DISCUSSION

Experimental Setup

The proposed framework was evaluated using standard grayscale test images such as Lena, Cameraman, and Baboon. Evaluation metrics included embedding capacity (bits per pixel, bpp), Peak Signal-to-Noise Ratio (PSNR), and accuracy of secret data recovery

Performance Evaluation

Metric	Proposed Method	Histogram Shifting	Differential Expansion
Embedding Capacity (bpp)	0.5	0.3	0.4
PSNR (dB)	45	38	40
Data Recovery (%)	100	95	98

Discussion

Results demonstrate that the proposed ML-enhanced RDH framework achieves higher embedding capacity, better image quality, and improved robustness compared to traditional RDH techniques

CONCLUSION

In this paper, we show that combining Machine Learning (ML) approaches with Reversible Data Hiding (RDH) improves multimedia security. Our approach considerably improves embedding capacity and robustness, resulting in secure and efficient data hiding solutions for plaintext and ciphertext multimedia. We optimised data embeding and extraction methods using machine learning models such as Support Vector Machines (SVM) and neural networks, resulting in minimal distortion to the host medium. The proposed methods surpass conventional RDH techniques, making them ideal for use in secure communications, digital forensics, medical imaging, and intellectual property protection. Future research will focus on improving these strategies and investigating their usefulness in real-time multimedia processing contexts.

REFERENCES

An-image-decryption-technology-based-on-machine-learning-i_2023_Optics-Commu.pdfHong, T.; Gui, M.; Baran, M.E.; Willis, H.L. Modeling and forecasting hourly electric load by multiple linear regression with interactions. In Proceedings of the IEEE PES General Meeting, Providence, RI, USA, 25–29 July 2010; IEEE: Piscataway, NJ, USA, 2010
Reversible_Data_Hiding_in_Encrypted_Images_Based_on_Block-Wise_Multi-Predictor.pdf Dutta, S.; Li, Y.; Venkataraman, A.; Costa, L.M.; Jiang, T.; Plana, R.; Tordjman, P.; Choo, F.H.; Foo, C.F.; Puttgen, H.B. Load and Renewable Energy Forecasting for a Microgrid using Persistence Technique. Energy
s11042-024-18600-6.pdf
s11042-024-18959-6.pdf
Chakhchoukh, Y.; Panciatici, P.; Mili, L. Electric Load Forecasting Based on Statistical Robust Methods. IEEE Trans. Power Syst. 2011
https://pdf.sciencedirectassets.com/287016/1-s2.0-S2214212623X00027/1-s2.0-S2214212622002575/main.pdf?X-Amz-Security-
Saha 等 – 2023 – Secret Image Sharing Schemes A Comprehensive Surv.pdf
Yan 等 – 2020 – Reversible Image Secret Sharing.pdf