Data sets used in publications by David Meredith

First book of J. S. Bach's Das Wohltemperirte Klavier (BWV 846 - 869) (41544 notes)

This dataset was used in the following publication:

Meredith, D. (2003). Pitch spelling algorithms. In R. Kopiez, A. C. Lehmann, I. Wolther and C. Wolf (eds.), Proceedings of the Fifth Triennial ESCOM Conference, September 8-13, 2003, Hanover University of Music and Drama, Hanover, Germany, pp. 204-207. [Abstract] [Full paper]

File	Description
ps13escomopnd.zip (156KB)	Zip file containing OPND format files of all the preludes and fugues in the first book of J. S. Bach's Das Wohltemperirte Klavier (BWV 846 - 869), as used in the comparison described in the paper on pitch spelling algorithms presented at the ESCOM 5 conference in Hanover, 8-13 September 2003. For further details, see here.
ps13escomnotes.zip (196KB)	Zip file containing "notes" format files of all the preludes and fugues in the first book of J. S. Bach's Das Wohltemperirte Klavier (BWV 846 - 869), as used in the comparison described in the paper on pitch spelling algorithms presented at the ESCOM 5 conference in Hanover, 8-13 September 2003. These files can be used as input to Temperley and Sleator's Melisma programs. For further details, see here.

"8 x 25000" note data set (195972 notes)

This data set contains 195972 notes and consists of 216 movements from works by 8 baroque and classical composers (Corelli, Vivaldi, Telemann, J. S. Bach, Handel, Haydn, Mozart and Beethoven). The idea was to have a corpus containing roughly equal amounts of music from a variety of composers in order to allow more accurate measurement of the extent to which the performance of an algorithm depends on compositional style. The corpus was derived by automatic conversion (with some "cleaning up") from the MuseData collection of encoded scores. A complete listing of the music in the corpus is given on pages 14-18 of my dissertation, which is available here and, on the Oxford University Research Archive, here.

The data is available in MIDI format, OPD format (which gives, for each note, the onset time, chromatic pitch, morphetic pitch and duration) and OPNDV format (which gives, for each note, the onset time, pitch name, duration and voice). The OPNDV format files were processed so that they do not contain overlapping notes, making them suitable for use as input to Temperley's Melisma system. A "noisy" version of the corpus is also provided, in which the onset times and durations are randomly adjusted by small amounts to simulate data derived from a performance. Full information about the data set is provided on pages 13-31 of my dissertation.
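As an illustration of the pitch name field in the OPNDV format, the following Python sketch splits a pitch name into its letter, accidental and octave, following the convention given in the description of opnd-m.zip in the file table ("n" for natural, "s" for sharp, "f" for flat, with s and f repeatable). The names `PitchName` and `parse_pitch_name` are hypothetical and are not part of the data set or of any software distributed with it. Accepting a name with no accidental letter at all is an extra leniency assumed here, not something the format description states.

```python
import re
from typing import NamedTuple


class PitchName(NamedTuple):
    letter: str      # "A" to "G"
    alteration: int  # +1 per sharp, -1 per flat, 0 for natural
    octave: int      # ASA octave number


# Letter name, then "n" or a run of "s" or a run of "f", then the octave.
# The accidental is optional here (assumption: treated as natural if absent).
_PITCH_RE = re.compile(r"^([A-Ga-g])(n|s+|f+)?(-?\d+)$")


def parse_pitch_name(name: str) -> PitchName:
    """Split a pitch name such as "Cn4" or "Bff3" into its components."""
    # Strip whitespace and any surrounding double quotes before matching.
    match = _PITCH_RE.match(name.strip().strip('"'))
    if match is None:
        raise ValueError(f"unrecognised pitch name: {name!r}")
    letter, accidental, octave = match.groups()
    if accidental is None or accidental == "n":
        alteration = 0
    elif accidental[0] == "s":
        alteration = len(accidental)
    else:
        alteration = -len(accidental)
    return PitchName(letter.upper(), alteration, int(octave))
```

For example, `parse_pitch_name("Fss3")` gives letter `"F"`, alteration `2` and octave `3`, that is, F double-sharp in octave 3.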

This data set was used in the following publications:

Meredith, D. and Wiggins, G. A. (2005). Comparing pitch spelling algorithms. In Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR 2005), London, 11-15 September 2005. [Online publication]

Meredith, D. (2006). The ps13 pitch spelling algorithm. Journal of New Music Research, 35(2), pp. 121-159. [Online publication]

Meredith, D. (2007). Computing Pitch Names in Tonal Music: A Comparative Analysis of Pitch Spelling Algorithms. D.Phil. dissertation. Faculty of Music, University of Oxford. Defended 2 February 2007, final version approved 10 May 2007. [Full text (PDF, 6.19MB)] [Publication on Oxford University Research Archive]

Meredith, D. (2007). Optimizing Chew and Chen's pitch spelling algorithm. Computer Music Journal, 31(2), pp. 54-72. [Online publication]

File	Description
opnd-m.zip (771KB)	Clean version of the corpus in OPNDV format. Each file contains a Lisp list of elements. Each element represents either a note in the score or a sequence of tied notes. Each element is itself a list with the format (o p d v), where o is the onset time in tatums, p is the pitch name in standard ASA format but with "n" for natural, "s" for sharp and "f" for flat (s and f can be repeated any number of times to give multiple sharps or flats), d is the duration in tatums and v is an integer indicating the voice to which the note belongs. Where a note starts with the same pitch as another note that is already sounding, the previously started note with that pitch is stopped. This ensures that the data can be loaded into Temperley's Melisma system, which cannot deal with notes that overlap in this way (for more details, see pages 19-20 and pages 342-347 of my dissertation).
opd.zip (789KB)	Clean version of the corpus in OPD format. Each file contains a Lisp list of elements. Each element represents either a note in the score or a sequence of tied notes. Each element has the format, (o c m d v), where o is the onset time (in tatums), c is the chromatic pitch of the note, m is the morphetic pitch of the note, d is the duration of the note and v is an integer indicating the voice of the note. For more information about chromatic and morphetic pitch, see pages 37-51 of my dissertation. The dissertation also provides algorithms for converting between this "chromamorphetic pitch" encoding and pitch names.
midi.zip (381KB)	Clean corpus in MIDI format using note onsets and durations given in opnd-m-from-nts.zip.
opnd-m-from-nts.zip (817KB)	The same files as in opnd-m.zip, except that durations and onset times are expressed in milliseconds, not tatums. The note timings in these files are those used in the MIDI files in midi.zip.
opnd-m-noisy.zip (1MB)	A version of the "8 x 25000" corpus in which the onset times and durations of notes in opnd-m-from-nts.zip have been randomly adjusted by small amounts. This data set was used to evaluate how robust algorithms are to temporal deviations like those that occur in data derived from human performances. For more information, see pages 30-31 and 332-340 of my dissertation.
opnd-m-noisy-MIDI.zip (801KB)	MIDI files for the noisy version of the "8 x 25000" corpus using the onsets and durations given in opnd-m-noisy.zip.
opd-noisy.zip (967KB)	OPD format files generated from opnd-m-noisy.zip. Onset times and durations are "noisy" and in milliseconds, as in opnd-m-noisy.zip; pitch names are replaced with chromatic pitch and morphetic pitch as in opd.zip. Note that, in these files, each datapoint is on a separate line, and has the format o c m d v with no parentheses (to ease parsing in C-like languages).
opd-from-nts.zip (723KB)	OPD format files generated from opnd-m-from-nts.zip. Onset times and durations are clean and in milliseconds, as in opnd-m-from-nts.zip; pitch names are replaced with chromatic pitch and morphetic pitch as in opd.zip. Note that, in these files, each datapoint is on a separate line, and has the format o c m d v with no parentheses (to ease parsing in C-like languages).
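The two file layouts described in the table above (a single Lisp list of note lists, and one "o c m d v" datapoint per line with no parentheses) can be read with a few lines of Python. The following is only a minimal sketch: the function names `read_lisp_list` and `read_opd_lines` are hypothetical, and the sample strings in the usage note are invented for illustration rather than taken from the corpus. It assumes that the files contain no Lisp comments and that pitch names contain no spaces; whether pitch names appear as quoted strings or as bare symbols is not stated above, so both are accepted.

```python
from typing import List, Tuple, Union

Atom = Union[int, float, str]


def _atom(token: str) -> Atom:
    """Convert a token to int or float if possible, else to an unquoted string."""
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token.strip('"')


def read_lisp_list(text: str) -> list:
    """Parse one parenthesised list, e.g. ((o p d v) ...), into nested lists."""
    # Pad parentheses with spaces so that split() separates them from atoms.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    stack: List[list] = [[]]
    for token in tokens:
        if token == "(":
            stack.append([])
        elif token == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(_atom(token))
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    if len(stack[0]) != 1:
        raise ValueError("expected exactly one top-level list")
    return stack[0][0]


def read_opd_lines(text: str) -> List[Tuple[Atom, ...]]:
    """Parse the line-based layout: one 'o c m d v' datapoint per line."""
    notes = []
    for number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            raise ValueError(f"line {number}: expected 5 fields, got {len(fields)}")
        notes.append(tuple(_atom(field) for field in fields))
    return notes
```

With the invented input `((0 "Cn4" 2 1) (2 Ds4 2 1))`, `read_lisp_list` returns a list of two four-element note lists; with two lines such as `0 39 23 500 1` and `500 41 24 250 1`, `read_opd_lines` returns two five-element tuples. A full Lisp reader would be needed if the files turn out to use features beyond plain nested lists.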