Warwick Image Forensics Dataset for Device Fingerprinting In Multimedia Forensics

Device fingerprints like sensor pattern noise (SPN) are widely used for provenance analysis and image authentication. Over the past few years, the rapid advancement of digital photography has greatly reshaped the image capture pipeline on consumer-level mobile devices. The flexibility of camera parameter settings and the emergence of multi-frame photography algorithms, especially high dynamic range (HDR) imaging, bring new challenges to device fingerprinting. Studying these topics requires a new, purpose-built image dataset. In this paper, we present the Warwick Image Forensics Dataset, an image dataset of more than 58,600 images captured using 14 digital cameras with various exposure settings. Special attention to the exposure settings allows the images to be used by different multi-frame computational photography algorithms and for subsequent device fingerprinting. The dataset is released as open source and is free to use for the digital forensic community.


INTRODUCTION
Image device fingerprinting is an important topic in multimedia forensics. It allows forensic investigators to establish an image's history, identify the source device and authenticate the content. Sensor Pattern Noise (SPN) [1], as its name suggests, is a noise intrinsically embedded in images, primarily due to Photo Response Non-Uniformity (PRNU). This intrinsic property makes SPN a popular candidate for device fingerprinting, and much research has been done on SPN-based source camera identification [1,2], tampering localization [3,4] and source camera clustering [5,6,7]. Public datasets like the Dresden Image Dataset [8] and the VISION Image Dataset [9], which can be used as benchmarking platforms, are very important for the study of device fingerprint analysis and the development of relevant techniques.
As the digital forensic community gains more understanding of image device fingerprinting, digital and computational photography has undergone huge development as well. Driven by the need for consumer-level devices to produce better images, we have witnessed significant advances in both hardware and software. As far as hardware is concerned, improvements in the design of electronic components like complementary metal-oxide-semiconductor (CMOS) sensors bring better noise immunity. Such improvements give cameras greater flexibility in parameter settings, especially for using high signal gain (commonly known as ISO speed in photography) without introducing too much noise into images. Thus, digital photography has become more versatile under different lighting conditions and can be used for high-speed photography. In addition, the ever-increasing computational power of consumer-level mobile devices allows more sophisticated computational photography algorithms to run in real time. Among these algorithms, merging multiple time-sequential image frames is a very popular strategy on consumer-level devices, especially for high dynamic range (HDR) imaging [10]. By processing a burst of images, the resultant image can have a higher dynamic range, less noise and often a more aesthetically appealing look. Thus, the HDR imaging mode has gained great popularity and has become available on most mobile imaging devices.
While the above-mentioned improvements are greatly appreciated by users, they pose new challenges to existing SPN-based device fingerprinting methods. These methods typically rely on the correlation between the noise residuals extracted from images. The intra-class correlations (the correlations between noise residuals of images from the same source device) can be greatly affected by the images' ISO speeds and by the alignment operation used in multi-frame computational photography algorithms. This compromises forensic accuracy when existing SPN-based methods are run on these images. Thus, in-depth investigations are required to understand the underlying problems and to develop effective forensic methods accordingly. However, the images of the existing public datasets were not purposefully collected to help answer these problems. Therefore, we have built a new dataset called the Warwick Image Forensics Dataset, which not only serves the same purposes as the existing datasets, but also includes images taken with their source cameras working in different exposure settings. It is intended to pave the way for methods that deal with the impact of exposure parameter settings and multi-frame computational photography algorithms on the accuracy of device fingerprinting.
The rest of the paper is organized as follows. In the next section, related work, including existing forensic datasets, will be discussed. The details of the Warwick Image Forensics Dataset are presented in Section 3 and experimental evaluations are carried out in Section 4. A conclusion is given in Section 5.

ISO Speed's Impact On SPN-Based Digital Forensics
SPN, as a fixed pattern noise, primarily arises from PRNU. The authors of [3] model the sensor output of an image $I$ as
$$I = g^{\gamma}\,[(1 + K)Y + \Lambda]^{\gamma} + \Theta_q,$$
where $g$ is the camera gain, $\gamma$ is the gamma correction factor and $Y$ is the scene light intensity. The model considers two major noise terms, represented by $\Lambda$ and $\Theta_q$, respectively. $\Lambda$ is a combination of noise sources including dark current, shot noise and read-out noise. $\Theta_q$ represents the quantization noise. The PRNU term of interest is represented by $K$, which describes the non-uniform response to the scene light intensity $Y$. The model is simplified in [3] by exploiting the Taylor expansion of the gamma correction (with the factor $\gamma$ absorbed into $K$) and can be written as
$$I = I^{(0)} + I^{(0)}K + \Theta,$$
with $I^{(0)} = (gY)^{\gamma}$ being the sensor output in the absence of noise, and $\Theta = \gamma I^{(0)}\Lambda/Y + \Theta_q$ being a combination of PRNU-irrelevant random noise components. Written in this form, the PRNU component $I^{(0)}K$ is a multiplicative term with the noise-free image $I^{(0)}$. However, the role of the camera gain $g$ in the sensor output model is easily overlooked. Given similar $I^{(0)}$ in different images, the magnitude of $\Theta$ differs with the camera gain $g$, because a higher $g$ requires less input intensity $Y$ to produce the same output signal $I^{(0)}$. As $\Theta = \gamma I^{(0)}\Lambda/Y + \Theta_q$, a smaller $Y$ induces more PRNU-irrelevant noise in an image's noise residual. Because SPN is often estimated as the noise residual of an image, the additional SPN-irrelevant noise makes this image's noise residual less correlated with noise residuals extracted from other intra-class images.
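The effect of the gain on the noise residual can be illustrated numerically. The following is a minimal sketch of the simplified model above, not the procedure of [3] or [11]: the function name, the parameter values and the assumption that $\Lambda$ is Gaussian with a standard deviation independent of $Y$ are all hypothetical choices made for illustration.

```python
import numpy as np

def residual_correlation(gain, target=0.5, gamma=0.45, prnu_std=0.01,
                         lambda_std=0.002, shape=(256, 256), seed=0):
    """Correlation between a simulated noise residual and the PRNU factor K.

    Implements I = I0 + I0*K + Theta with Theta = gamma*I0*Lambda/Y + Theta_q.
    All numeric defaults are illustrative assumptions, not measured values.
    """
    rng = np.random.default_rng(seed)
    K = rng.normal(0.0, prnu_std, shape)        # PRNU factor (assumed Gaussian)
    # Scene intensity needed so that (gain * Y) ** gamma equals the target output.
    Y = target ** (1.0 / gamma) / gain
    I0 = (gain * Y) ** gamma                    # noise-free output, equals target
    Lam = rng.normal(0.0, lambda_std, shape)    # dark current, shot and read-out noise
    theta_q = rng.uniform(-0.5, 0.5, shape) / 255.0  # 8-bit quantization noise
    theta = gamma * I0 * Lam / Y + theta_q
    residual = I0 * K + theta                   # I - I0
    return float(np.corrcoef(residual.ravel(), K.ravel())[0, 1])

if __name__ == "__main__":
    for g in (1.0, 8.0, 64.0):
        print(f"gain {g:5.1f}: correlation with K = {residual_correlation(g):.3f}")
```

Under these assumptions, the output level is held fixed while the gain grows, so $Y$ shrinks and the $\Lambda/Y$ term dominates the residual, which lowers its correlation with $K$.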
With the above relationship in mind, in [11], the authors empirically show that given similar contents in images taken with different ISO speed settings, the intra-class correlation distributions can vary according to ISO speeds, which directly control the camera gain g. This results in higher error rates in source camera identification for images of higher ISO speeds. Due to this phenomenon, [11] suggests that camera exposure parameters like ISO speed should be considered from a forensic perspective. It is also suggested that the construction of forensic image datasets should include images of different exposure parameter settings, which can also be beneficial for studies in steganalysis.

High Dynamic Range Imaging
HDR images can capture more details from scenes than standard dynamic range (SDR) images and hence receive much attention from computational photography researchers. From the early works in [12,13] to more recent works like HDR+ [14] and deep neural network based methods [15], different HDR imaging techniques have been developed for use under different conditions. Despite their differences, these methods share a few things in common, which make HDR images a hard subject in general for SPN-based device fingerprinting. Most HDR imaging algorithms rely on the conventional approach of taking a set of time-sequential images, although some methods use images with the same exposure time and others use images with different exposure times. A radiance map can be reconstructed from a set of time-sequential images and provides a larger dynamic range than single-exposure images. However, as it is almost impossible to avoid object or camera motion while capturing the time-sequential image sets, the reconstruction of the radiance map usually involves pixel-wise alignment to compensate for motion across image frames and avoid motion blur. Such an operation mixes the SPN signal from different pixels and causes misalignment between the SPN embedded in the resultant HDR images and the reference SPN extracted from single-exposure images taken by the same camera. Due to this misalignment, intra-class SPN pairs become less correlated, which causes difficulty in SPN-based provenance analysis.
In addition to the misalignment problem, tone mapping is another operation commonly used in HDR algorithms that can cause trouble for existing SPN-based forensic methods. Tone mapping is used to reconstruct a color image from a radiance map. Each implementation of an HDR algorithm may have its own tone mapping curve and, on top of that, different tone mapping curves can be applied either globally or locally to the same image. SPN-based forgery localization methods often use a content-dependent correlation predictor to estimate the block-wise intra-class correlations and thereby discover pixels whose SPN is absent. Without prior knowledge of the tone mapping curve, reliable predictions from the correlation predictor can hardly be expected. These problems require specific adjustments to existing SPN-based methods to make them effective on HDR images.
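The misalignment effect can be reproduced with a toy experiment. The sketch below is an illustration under strong assumptions, not an HDR pipeline: the SPN is modelled as white Gaussian noise, alignment is modelled as an integer circular shift per frame, and merging is a plain average. The names `merged_residual` and `demo` are hypothetical.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equally sized arrays."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b)))

def merged_residual(reference, shifts, noise_std=1.0, seed=0):
    """Average several frames that all carry the same SPN plus random noise.

    Each frame is shifted by (dy, dx) before merging, a crude stand-in for
    the pixel-wise alignment step of a multi-frame algorithm.
    """
    rng = np.random.default_rng(seed)
    frames = []
    for dy, dx in shifts:
        frame = reference + rng.normal(0.0, noise_std, reference.shape)
        frames.append(np.roll(frame, (dy, dx), axis=(0, 1)))
    return np.mean(frames, axis=0)

def demo():
    """Return (correlation without shifts, correlation with shifts)."""
    reference = np.random.default_rng(1).normal(size=(128, 128))  # toy SPN
    aligned = merged_residual(reference, [(0, 0), (0, 0), (0, 0)])
    shifted = merged_residual(reference, [(0, 0), (1, 0), (0, 2)])
    return ncc(aligned, reference), ncc(shifted, reference)

if __name__ == "__main__":
    same, moved = demo()
    print(f"no alignment shifts: {same:.3f}, with shifts: {moved:.3f}")
```

In this toy setting, only the unshifted frame still lines up with the reference, so the merged residual correlates far less with the reference SPN once frames are moved before merging.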

Existing Public Image Datasets
As a rapidly developing topic, device fingerprinting draws many researchers' attention, and several image datasets have been constructed over the years to facilitate the research. One of the earliest image datasets adopted for device fingerprinting is the Uncompressed Colour Image Dataset (UCID) [16]. Since then, more dedicated image datasets for provenance analysis have been constructed. Notably, the Dresden Image Dataset [8], the RAISE dataset [17] and the VISION dataset [9] are three datasets widely used for benchmarking in device fingerprinting. Each dataset consists of a large number of high-resolution images from multiple devices, either digital cameras or smartphone cameras. More recent datasets like the SOCRatES [18] and DAXING [19] datasets feature images from a large number of source devices (103 smartphone cameras in SOCRatES and 90 smartphone cameras in DAXING). Although the images from these datasets show good diversity and heterogeneity in terms of content, all the above-mentioned datasets focus on SDR images only, and the diversity in camera exposure parameter settings was not given adequate consideration during their construction.
The 'HDR dataset' from [20] is the first forensic dataset featuring HDR images. The images in this dataset are taken with 23 smartphone cameras and, for each scene included in this dataset, both an SDR image and an HDR image are provided. The images are taken under three different conditions: from a tripod, by hand and by a shaky hand. Although [20] features both SDR and HDR images, the contribution of its image pairs towards understanding the impact of HDR images on source device identification is limited. Firstly, the SDR images included in the dataset are not the SDR images used to construct the HDR images. As a result, these pairs may not best reflect the impact of HDR algorithms on device fingerprints in SDR images. Secondly, as the HDR images in this dataset are generated directly by the smartphones, the coverage of different implementations of HDR algorithms is confined by the choice of smartphones included in this dataset. As the development of new HDR algorithms continues, research findings stemming from this dataset are unlikely to be applicable to HDR images produced by future algorithms. Acknowledging this problem, our Warwick Image Forensics Dataset takes into account the flexibility of generating HDR images using different implementations of HDR algorithms, as we shall see in the following section.

DATASET DETAILS
In this section, we present the details of our Warwick Image Forensics Dataset.

The selection of cameras
The images from the Warwick Image Forensics Dataset are captured by 14 digital cameras. The details and the technical specifications of the cameras are shown in Table 1. The primary goal of this dataset is to help the digital forensic community develop a better understanding of the impact of both camera exposure parameter settings and multi-frame computational photography algorithms, especially HDR imaging, on device fingerprinting. The choice of digital cameras instead of smartphone cameras gives us better control over camera exposure parameter settings during the image capturing process. With this fine control, the captured images are suitable for different HDR algorithms, whether they use images of the same or different exposures to produce HDR images. The 14 cameras are from 11 different models and cover a good range of major camera manufacturers. The 14 cameras also show good diversity in image sensor formats, with the smallest sensor being of comparable size to the sensors used in smartphone cameras.

Image Acquisition
The images from this dataset can be categorized into the following three classes:
• Flatfield images
• SDR images
• HDR-ready SDR images
The flatfield images are mainly for reference SPN extraction. For each camera, 100 flatfield images are captured by taking photos of a flat blue board with the lenses adjusted to be out of focus. For each image shot, the camera is set to its lowest ISO speed to reduce the amount of read-out noise in the image. The exposure metering of each shot is adjusted to normal exposure, making the images neither too dark nor too saturated.
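How a reference SPN might be estimated from a stack of flatfield images can be sketched as follows. This is a minimal illustration that averages noise residuals over the stack; it assumes a simple 3 × 3 mean filter as the denoiser purely to stay self-contained, which is a stand-in and not the denoising method used in our experiments. The function names and the synthetic check are hypothetical.

```python
import numpy as np

def box_denoise(img, radius=1):
    """Mean filter with reflect padding (a deliberately simple denoiser)."""
    img = np.asarray(img, dtype=float)
    padded = np.pad(img, radius, mode="reflect")
    k = 2 * radius + 1
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def noise_residual(img):
    """Residual = image minus its denoised version."""
    img = np.asarray(img, dtype=float)
    return img - box_denoise(img)

def reference_spn(flatfields):
    """Average the noise residuals of a stack of flatfield images."""
    return np.mean([noise_residual(f) for f in flatfields], axis=0)

def synthetic_check(n_images, seed=0):
    """Correlation between the estimate and a known synthetic PRNU pattern."""
    rng = np.random.default_rng(seed)
    K = rng.normal(0.0, 0.01, (128, 128))             # synthetic PRNU factor
    stack = [0.5 * (1.0 + K) + rng.normal(0.0, 0.01, K.shape)
             for _ in range(n_images)]                # flat scene plus random noise
    estimate = reference_spn(stack)
    return float(np.corrcoef(estimate.ravel(), K.ravel())[0, 1])

if __name__ == "__main__":
    for n in (1, 10, 50):
        print(f"{n:3d} flatfield images: correlation = {synthetic_check(n):.3f}")
```

Averaging suppresses the random noise of individual shots while the fixed pattern remains, which is why a large stack of evenly lit, out-of-focus images is useful for reference extraction.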
The SDR images in this dataset are the standard dynamic range images taken with the cameras' single-shot mode and thus cannot be used for HDR merging algorithms. These images are taken with systematic control of the cameras' ISO speed. For each camera, images are taken with the ISO speed set to one of the following values: ISO 100, 200, 400, 800, 1600, 3200 and 6400, the only exceptions being the two Panasonic Lumix DC-TZ90 cameras, whose ISO speeds go only up to 3200. 30 images of different scenes in different conditions are taken. We enable the camera's Program Mode, allowing the camera to adjust its aperture size and exposure time automatically to ensure sufficient exposure. Almost all the images from this set are taken hand-held. This set of images provides good diversity in scenes as well as in camera exposure parameter settings.
The HDR-ready SDR images are the set of standard dynamic range images that can be used with different algorithms to produce HDR images. Images of 20 different scenes are taken for this set. Different HDR algorithms may require different sets of images. For example, [13] uses a set of images of varying exposure times, whereas [14] expects a burst of under-exposed images with the same exposure time. We therefore took continuous shots using three different modes. The first mode uses the auto exposure bracketing (AEB) function on each camera, which allows us to take continuous shots with varying exposure times. The second and third modes both use the fast continuous shot mode to take at least 7 continuous shots with the same exposure. However, one set is taken at normal exposure and the other is under-exposed, usually by 1 or 2 stops as measured by the cameras' exposure metering system. An example of the images taken with these three modes is shown in Fig. 1. Furthermore, to increase the diversity in exposure parameter settings, we systematically repeat these three modes with the cameras set to the 7 different ISO speeds mentioned above. Thus, for each camera, more than 120 images of the same scene with various camera parameter settings are taken. The 20 scenes included in this dataset are carefully selected, covering indoor and outdoor, daylight and night environments, still and dynamic scenes, as well as objects with different textures. The images are taken with the cameras either hand-held or mounted on a tripod. With such diversity in camera exposure parameter settings, these images can be easily used by different HDR imaging algorithms and for other studies that depend on camera exposure parameter settings.
For every image from our Warwick Image Forensics Dataset, both the unaltered RAW image file and the camera generated JPEG image file are available.

EXPERIMENTAL EVALUATIONS
In this section, we evaluate the performance of SPN-based source camera identification and clustering on the Warwick Image Forensics Dataset. In particular, we show how the performance varies when images of different ISO speeds are used for the tests.
For source camera identification, we extract the reference SPN of each camera from 100 flatfield JPEG images using the BM3D de-noising algorithm [21]. The extracted reference SPNs are processed by the spectrum equalizer from [22] to remove unwanted artefacts. We test the performance of the source camera identification method from [1] on the SDR images from the dataset. For each image, we crop a region of 512 × 512 pixels from its center to extract the noise residual and compute the correlations with the corresponding pixels from the reference SPNs. The receiver operating characteristic (ROC) curves for the method on images of ISO speed 100, 800 and 3200 are shown in Fig. 2. As the ISO speed increases, the area under the ROC curve decreases, indicating worse performance. Fig. 3 shows the correlation matrices of pairwise correlations between noise residuals extracted from SDR images. All experiments mentioned above show that different camera exposure settings have different levels of impact on the quality of SPN and on forensic analyses, which needs to be considered in forensic research and real-world investigations. Therefore, it is important to include images of diverse camera parameter settings in image datasets in order to facilitate future research.
Fig. 1. Sample images of a scene from the HDR-ready SDR images in the Warwick Image Forensics Dataset (rows: AEB, Normal Exposure Burst Shots, Under Exposure Burst Shots). These images are taken by a Canon EOS 6D Mark II with the ISO speed set to 100. From top to bottom, we show the images taken with the three different modes. The top row uses the camera's auto exposure bracketing (AEB) function and the following two rows are shots with consistent exposure time within each row. The middle row has normal exposure and the images in the bottom row are under-exposed by 1 stop as measured by the camera's exposure metering system. Due to limited space, we only show a portion of the images taken with the three modes at ISO 100.
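The matching step of the identification test described in this section can be sketched as below. This is a hedged illustration only: it assumes the noise residual and the reference SPN are already available as 2-D arrays of the same size, and the helper names (`center_crop`, `ncc`, `roc_auc`) are hypothetical. The area under the ROC curve is computed with the rank-based (Mann-Whitney) formulation, which is one common way to obtain it from two sets of scores.

```python
import numpy as np

def center_crop(arr, size=512):
    """Return the central size x size region of a 2-D (or H x W x C) array."""
    arr = np.asarray(arr)
    h, w = arr.shape[:2]
    if h < size or w < size:
        raise ValueError("array is smaller than the requested crop")
    top, left = (h - size) // 2, (w - size) // 2
    return arr[top:top + size, left:left + size]

def ncc(a, b):
    """Normalized cross-correlation between two equally sized arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_score(residual, reference, size=512):
    """Correlate the central crops of a noise residual and a reference SPN."""
    return ncc(center_crop(residual, size), center_crop(reference, size))

def roc_auc(intra_scores, inter_scores):
    """Area under the ROC curve: P(intra > inter) plus half the ties."""
    pos = np.asarray(intra_scores, dtype=float)[:, None]
    neg = np.asarray(inter_scores, dtype=float)[None, :]
    wins = (pos > neg).sum() + 0.5 * (pos == neg).sum()
    return float(wins / (pos.size * neg.size))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(600, 800))                   # toy reference SPN
    same_camera = reference + rng.normal(0.0, 3.0, reference.shape)
    other_camera = rng.normal(size=reference.shape)
    print("intra-class score:", match_score(same_camera, reference))
    print("inter-class score:", match_score(other_camera, reference))
```

Collecting `match_score` values for intra-class and inter-class pairs at each ISO speed and passing them to `roc_auc` would give one summary number per ISO setting for comparison.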

CONCLUSION
In this paper, we demonstrated the impact of camera exposure parameter settings like ISO speed on the quality of SPN and the importance of having an image dataset that can facilitate future research into better solutions for dealing with this impact. We presented the Warwick Image Forensics Dataset, a novel forensic image dataset consisting of more than 58,600 images, captured with special attention to exposure parameter settings. The images are from 14 different digital cameras. The good diversity of camera parameter settings allows studies on the impact of different exposure parameters on device fingerprinting to be carried out on this dataset. Because of the diverse ways in which these images were taken, they can easily be used by different multi-frame computational photography algorithms, including HDR imaging. Thus, HDR image related studies in device fingerprinting can be carried out using this dataset as well. In addition, the dataset can be used for other studies like steganalysis. We therefore believe that releasing the dataset as open source is beneficial for the digital forensic community.
[Figure panels: ISO 800, ISO 3200; axes indexed 1 to 14.]