THE DEVELOPMENT AND EVALUATION OF TECHNIQUES FOR USE IN MAMMOGRAPHIC SCREENING COMPUTER AIDED DETECTION SYSTEMS
Date of Award
Doctor of Philosophy
Electrical and Computer Engineering
The material presented in this dissertation details techniques developed to aid in the detection of a specific type of cancerous lesion visible on screening mammography images. These spiculated lesions most often appear as centrally bright objects with semi-defined borders. Furthermore, lesion margins are composed of indicative spiculations, or fine tendrils, projecting outward from the mass center. The techniques developed here to identify these characteristics and detect these objects are intended to operate as a processing pipeline. The first group of processing stages is responsible for converting raw mammogram pixel data into localized and described objects. A second group of processing stages categorizes these objects by manipulating their descriptors and evaluating their meaning. At the conclusion of this processing pipeline, image pixels which designate a cancerous mass are to be highlighted and presented to a human operator as an aid in the early detection of breast cancers. The initial problem of object localization is addressed with breast tissue region extraction followed by a specialized spot detection algorithm. Tissue region extraction is accomplished using dataset-specific image domain knowledge along with a simple threshold segmentation algorithm. Once this image area of interest is specified, the objects of interest it contains are identified using Iterative Disjoint Region Detection (IDRD). This specialized procedure utilizes iterative threshold segmentation to produce a three-dimensional map of each image's pixel space. In this map, two dimensions correspond directly to the spatial dimensions of the original image while the third corresponds to the normalized gray level of individual pixels. Traversing this map from the brightest pixel values to the darkest yields object "peaks", which are taken to be seeds of visible objects.
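The bright-to-dark threshold sweep described above can be illustrated with a minimal sketch. It is not the dissertation's IDRD implementation: the function names `components` and `find_seeds`, the 4-connectivity, and the rule "a connected region containing no earlier seed is a new peak" are all assumptions made for illustration.

```python
from collections import deque


def components(mask):
    """Return the 4-connected components of True cells as lists of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    found = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, region = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    region.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                found.append(region)
    return found


def find_seeds(image, levels):
    """Sweep thresholds from brightest to darkest and collect one seed per peak.

    Assumed rule (hypothetical): a thresholded region that contains no
    previously found seed is a newly emerged peak; its brightest pixel
    becomes the seed.
    """
    seeds = []
    for t in sorted(levels, reverse=True):
        mask = [[value >= t for value in row] for row in image]
        for region in components(mask):
            members = set(region)
            if not any(seed in members for seed in seeds):
                seeds.append(max(region, key=lambda p: image[p[0]][p[1]]))
    return seeds


# Toy image with normalized gray levels and two separate bright spots.
image = [
    [0.1, 0.1, 0.1, 0.1, 0.1],
    [0.1, 0.9, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.6, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.1],
]
seeds = find_seeds(image, levels=[0.8, 0.5, 0.2, 0.0])
```

In this toy case the brighter spot is seeded at the first threshold and the dimmer one at the second; at lower thresholds the regions merge and no further seeds are created.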
Seeds are further processed at each successive threshold iteration by considering the effects of combining adjacent designations. This seeding process detected every object of interest with at least one seed. Because it was designed as a general-purpose spot detection algorithm, many non-cancerous locally bright objects were detected as well. These other detections accounted for the large majority of the seeds noted in each mammogram, with approximately thirty to sixty seeds identified in most dataset images. A complementary task to object localization is the identification of each object's visible border and pixel area. This process is accomplished by a customized general-purpose region growing routine, commonly known as pixel aggregation. During this procedure, spatially attached pixels are considered for inclusion with a prototype region defined by the region's corresponding seed object. Candidate pixels must meet a gray tone similarity criterion, with the inclusion interval computed using the template region's average gray value. This process is supplemented by a leakage detection mechanism which serves to detect and recover from over-segmentation of non-target objects in the image space. Leakage detection operates by tracking pixel aggregation rates for each iteration of the region growing process. A leakage is said to occur if the aggregation rate profile exhibits telltale characteristics of object border crossing followed by segmentation of an adjacent object. Once objects have been localized and their member pixels identified through the preceding procedures, it is the purpose of the next system stage to describe these objects using various measured features. The extraction of these measurements is the final step in transforming objects from image-based visual depictions to abstract numerical representations. This new representation facilitates the forthcoming statistical treatment of these objects.
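As a rough illustration of pixel aggregation with rate tracking, the sketch below grows a region from a seed and records how many pixels are added per iteration. The names `grow_region` and `first_leak`, the symmetric tolerance `tol`, and the "dip followed by a surge" leakage rule with its `surge` factor are hypothetical; the abstract does not state the actual inclusion interval or leakage criteria.

```python
def grow_region(image, seed_pixels, tol):
    """Grow a region from seed_pixels, one layer of 4-neighbors per iteration.

    A candidate pixel is accepted when its gray value lies within +/- tol of
    the seed (template) region's average gray value. Returns the region and
    the number of pixels aggregated in each iteration.
    """
    rows, cols = len(image), len(image[0])
    mean = sum(image[r][c] for r, c in seed_pixels) / len(seed_pixels)
    region = set(seed_pixels)
    frontier = list(seed_pixels)
    rates = []
    while frontier:
        added = []
        for r, c in frontier:
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if (0 <= nr < rows and 0 <= nc < cols
                        and (nr, nc) not in region
                        and abs(image[nr][nc] - mean) <= tol):
                    region.add((nr, nc))
                    added.append((nr, nc))
        if added:
            rates.append(len(added))
        frontier = added
    return region, rates


def first_leak(rates, surge=2.0):
    """Return the first iteration index where the rate dips and then surges.

    Assumed heuristic: a falling rate suggests the object border was reached,
    and a subsequent jump of at least `surge` times suggests growth has
    spilled into an adjacent object. Returns None if no such pattern exists.
    """
    for i in range(2, len(rates)):
        if rates[i - 1] < rates[i - 2] and rates[i] >= surge * rates[i - 1]:
            return i
    return None


# Toy image: a 2x2 bright object on a dark background.
image = [
    [10, 10, 10, 10, 10],
    [10, 50, 52, 10, 10],
    [10, 51, 49, 10, 10],
    [10, 10, 10, 10, 10],
]
region, rates = grow_region(image, seed_pixels=[(1, 1)], tol=5)
```

A recovery step could then discard the pixels added from the flagged iteration onward; that part is left out because the abstract does not describe how recovery is performed.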
Feature extraction is accomplished using a number of general-use as well as special-purpose measurements which quantify characteristics such as object shape, texture, and parent seed evolution. A total of forty-one feature measurements are extracted in order to ensure full representation of detected objects and to facilitate accurate object class membership. In the next section of work, we seek to categorize the objects which have just been detected, segmented, and described using feature measurements. The role of a statistical classifier in accomplishing this is presented along with specifics as to the type of classifier used here. The use of a Bayes classifier is discussed and rationalized along with the development of the parametric Gaussian model for class conditional density estimation. Along with classifier development, a treatment of system performance evaluation is given. The Free-response Receiver Operating Characteristic (FROC) is described as an appropriate method by which to evaluate observer studies. This method suits the described CAD system, as a certain number of false positive detections are seen as acceptable and the system goal is to maximize mass sensitivity within these bounds. Our CAD system supplements the traditional classifier components by considering the effects of advanced feature vector manipulation. In total, five distinct models are developed including various iterations of feature selection and feature vector transformation. The Select model is presented as a benchmark and consists of a cumulative performance-based feature selection step. The PCT Select and the DCT Select models are used to generate new feature vectors from the original measured set as linear combinations of its elements. PCT and DCT indicate the vector transformation model, Principal Components Transform and Discrete Cosine Transform respectively.
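To make the idea of a transformed feature vector concrete, the following sketch applies an orthonormal type-II DCT to a small feature vector, so that each output element is a fixed linear combination of the measured features. The function name `dct_orthonormal`, the choice of the type-II variant with orthonormal scaling, and the ordering of the example features are assumptions; the abstract does not specify these details.

```python
import math


def dct_orthonormal(features):
    """Orthonormal DCT-II of a feature vector.

    Output element k is sum_i x_i * cos(pi / n * (i + 0.5) * k), scaled by
    sqrt(1/n) for k == 0 and sqrt(2/n) otherwise, which makes the transform
    orthogonal (it preserves the vector's squared length).
    """
    n = len(features)
    out = []
    for k in range(n):
        total = sum(x * math.cos(math.pi / n * (i + 0.5) * k)
                    for i, x in enumerate(features))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * total)
    return out


# Hypothetical four-element feature vector for one detected object.
measured = [3.0, -1.0, 2.0, 0.5]
transformed = dct_orthonormal(measured)

# A constant vector concentrates all of its energy in the first coefficient.
flat = dct_orthonormal([1.0, 1.0, 1.0, 1.0])
```

Because the transform is orthogonal, no information is lost at this stage; any compaction comes from the subsequent selection step choosing which transformed elements to keep.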
Once transformed, the resultant feature vectors are processed with the same Select feature selection routine as in the benchmark model. The goal with both Transform-Select feature manipulation models is to generate a compact feature set which retains all of the necessary discriminatory information from measured features while rejecting measured characteristics which do not support accurate object classification. Two related models are also considered which measure the impact of implementing feature pre-selection on the PCT Select and the DCT Select models. The aptly named Select PCT Select and Select DCT Select models seek to remove measured features which contain no discriminatory information from the pool of transformed data. System performance results for the five selection models are then compared to discern the contribution of each in the detection of cancerous masses. A complete analysis of the feature selection and transformation models shows that while the benchmark Select model performs reasonably, considerable performance improvements are possible using feature vector manipulation methods. Performance metrics are generated with the use of a Free-response Receiver Operating Characteristic (FROC) plot. This method compares the achievable mass detection sensitivity to the number of false positive detections per mammogram evaluated. Feature selection and classifier training are performed to maximize this sensitivity at a particular operating point, 4 FPpI (false positives per image). This point is taken to be within the range of acceptable false indications in a typical clinical setting. Overall, the best system performance is seen with the use of the Select DCT Select feature model (84.51% sensitivity at 4 FPpI). This corresponds to a net increase of eighteen mass detections with the same number of false positive indications, raising mass sensitivity from 71.53% using the benchmark Select model to 84.51%.
The other selection model using a pre-selection stage, Select PCT Select, reports similar performance results. This model detects 118 true positive masses, sixteen more than the Select model and just two fewer than the Select DCT Select model. Both of the other system configurations, PCT Select and DCT Select, were able to detect 109 true masses in the data set. This corresponds to a 76.76% mass sensitivity at 4 FPpI. Although not as impressive as the results generated with the pre-selection models, this is still an improvement of 5.23 percentage points in mass sensitivity in comparison to the benchmark.
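Reading a sensitivity off a FROC curve at a fixed false positive budget can be sketched as follows. This is an illustrative reconstruction, not the evaluation code used in the dissertation: the function name `sensitivity_at`, the `(score, mass_id)` representation of detections, and the toy data are assumptions, and tied scores are not treated specially.

```python
def sensitivity_at(detections, n_images, n_masses, max_fppi):
    """Return (sensitivity, fppi) at the most permissive score threshold
    whose false positives per image do not exceed max_fppi.

    detections: list of (score, mass_id), where mass_id is None for a false
    positive and otherwise identifies the true mass that was hit. Several
    detections of the same mass count as a single true positive.
    """
    best = (0.0, 0.0)
    hit_masses = set()
    false_positives = 0
    # Lowering the threshold admits detections in order of decreasing score.
    for score, mass_id in sorted(detections, key=lambda d: d[0], reverse=True):
        if mass_id is None:
            false_positives += 1
        else:
            hit_masses.add(mass_id)
        fppi = false_positives / n_images
        if fppi > max_fppi:
            break
        best = (len(hit_masses) / n_masses, fppi)
    return best


# Toy data: 2 images containing 3 masses in total (hypothetical values).
detections = [
    (0.9, "a"), (0.8, None), (0.7, "b"),
    (0.6, None), (0.5, None), (0.4, "c"),
]
sensitivity, fppi = sensitivity_at(detections, n_images=2, n_masses=3, max_fppi=1.0)
```

In the toy data, the budget of one false positive per image is exhausted before mass "c" is reached, so two of the three masses are counted at that operating point.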