Data Sets Used in Modern Multivariate Statistical Techniques

This page is under construction.

The data sets listed below are available for download in several formats. Many of these data sets and their descriptions can also be found elsewhere on the Internet, e.g., in the UCI Machine Learning Repository or on the StatLib website.

24 psychological tests, both schools (doc) (xls)

24 psychological tests, individual schools: Grant-White (xls), Pasteur (xls)

Alon.colon (article)

alontop.df (txt)

alontop.df2 (txt)

baseball (txt) (article)

bodyfat (xls)

British-towns: lower-triangular proximity matrix (txt) (xls)

bupa (txt)

cleveland (txt) (htm)

color-stimuli: dissimilarity matrix (txt) (xls)

COMBO-17 (csv)

COMBO-17, no missing data (csv)

covertype (data.gz) (htm)

diabetes (txt)

ecoli (data)

fetal.ecg (txt)

food (txt)

food2 (txt) (xml)

geyser (xls)

gilgaied.soil (xls)

glass (txt)

glass2 (txt)

golub (txt)

Hidalgo1872 (xls)

ionosphere (data)

iris (data)

letter-recognition (txt)

Lloyds Bank employees: notes (rtf)

Lloyds Bank employees data: samp05 (xls) (sd2) samp25 (xls) (sd2)

Lloyds Bank employees proximity matrices: samp05d (xls) (sd2) samp25d (xls) (sd2)

Morse-code proximity matrix (txt) (xls)

ncifinal (txt) (xls)

norwaypaper1 (txt)

optdigits.train (txt)

packetdata (xls)

PAH (txt)

pendigits (txt)

PET (xls)

pima-indians-diabetes (data) (txt)

pima (txt)

primate.scapulae (txt) (xls)

satimage (txt)

segmentation (data) (xls)

shuttle.train (txt)

shuttle.test (txt)

sonar_all-data (txt)

sonarB (txt)

spambase (txt) (xls)

spambase.train (txt)

spambase.test (txt)

SRBCT (txt)

steganographyStarTrek (txt) (xls)

SwissBankNotes (txt)

ticdata2000 (txt) ticeval2000 (txt) tictgts2000 (txt) description (txt)

tobacco (txt) (xls)

ushighways (txt) (xls)

vehicle (txt) (xls)

wdbc (data)

wine.test (xls)

wine.train (txt) (xls)

yeast (data) (txt)
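
Some entries above, such as British-towns, are described as lower-triangular proximity matrices, and most software expects a full symmetric matrix. The sketch below shows one way the expansion might be done. It assumes a layout of one row per line, whitespace-separated, with the diagonal included. That layout is an assumption and has not been checked against the actual files. The function names and the sample values are hypothetical.

```python
def expand_lower_triangular(rows):
    """Build a full symmetric matrix from lower-triangular rows.

    Assumes rows[i] holds the entries for columns 0..i, diagonal included.
    """
    n = len(rows)
    full = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(rows):
        if len(row) != i + 1:
            raise ValueError(f"row {i} has {len(row)} entries, expected {i + 1}")
        for j, value in enumerate(row):
            # Mirror each entry across the diagonal.
            full[i][j] = float(value)
            full[j][i] = float(value)
    return full


def parse_lower_triangular(text):
    """Parse whitespace-separated lower-triangular text, one row per line."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    return expand_lower_triangular(rows)


# Made-up 3x3 example; these values are not taken from any listed file.
sample = """0
2 0
5 3 0
"""
matrix = parse_lower_triangular(sample)
```

If a file omits the diagonal or includes row labels, the length check raises an error and the parsing step would need to be adjusted first.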